Using diff to find the portions of many files that are the same? (bizzaro-diff, or inverse-diff)

Bizzaro-Diff!!!
Is there a way to do a bizzaro/inverse-diff that only displays the portions of a group of files that are the same? (i.e., way more than three files)
Odd question, I know... but I'm converting someone's ancient static pages to something a little more manageable.

You want a clone detector. It detects similar code chunks across large source systems.
See our CloneDR tool: http://www.semdesigns.com/Products/Clone/index.html

You could try the comm command (for common). It'll only compare 2 files at a time, but you should be able to do 3+ with some clever scripting.
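
One way to script the 3+ file case, sketched here in Python rather than with comm itself: intersect the line sets of every file. Like comm on sorted input, this ignores where a line sits in each file. The function names are hypothetical, and UTF-8 input is an assumption.

```python
from pathlib import Path


def common_lines(line_lists):
    """Return, sorted, the lines that are present in every list of lines."""
    common = None
    for lines in line_lists:
        seen = set(lines)
        # The first file seeds the set; each later file narrows it.
        common = seen if common is None else common & seen
    return sorted(common) if common else []


def common_lines_in_files(paths, encoding="utf-8"):
    """Read each file (the encoding is an assumption) and intersect the lines."""
    return common_lines(
        Path(p).read_text(encoding=encoding).splitlines() for p in paths
    )
```

Calling `common_lines_in_files(["a.html", "b.html", "c.html"])` would return the lines shared by all three files (the file names are placeholders).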

You could try sim. Been a few years since I've used it, but I recall it being very useful when looking for similarities within a file or in many different files.

This is a classic problem.
If I had to quick-and-dirty it, I'd probably do something like a diff -U 1000000 (assuming a version of diff that supports it), piped through sed to just get the lines in common (and strip the leading spaces). You'd have to loop through all the files, though.
Edit: I forgot there is also a Tcl implementation that would be slightly more versatile, but would require more coding. You may be able to find an implementation for your language of choice.
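
A rough Python sketch of the loop described above, using the standard library's difflib in place of diff and sed. SequenceMatcher is not the same algorithm as diff, so the lines it keeps can differ in edge cases. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from functools import reduce


def common_in_order(a, b):
    """Return the lines of a that SequenceMatcher pairs with lines of b, in order."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    kept = []
    for block in matcher.get_matching_blocks():
        # Each block is a run of identical lines; the final block has size 0.
        kept.extend(a[block.a:block.a + block.size])
    return kept


def common_across(files_lines):
    """Fold common_in_order over every file's list of lines."""
    if not files_lines:
        return []
    return reduce(common_in_order, files_lines)
```

Unlike a plain set intersection, this keeps the shared lines in document order, which matters when the goal is to recover a common page template.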

Related

What is the general term for cruft left over after a build?

When I run some commands, like certain scripts or Makefiles, a number of files and folders are generated along the way to the final output. (For the moment, let's not go into whether or not the script should tidy up after itself. Sometimes this might be a good idea, sometimes not.) What term describes these files?
(I know what I mean when I say "cruft", but I don't think this is necessarily clear, and it could come off as colloquial, which is not what I'm aiming for.)

A common term seems to be "intermediate files"; maybe you could say "intermediate build artefacts" if they are not necessarily just files.

CruiseControl refers to them as Artifacts at least.

I would just refer to them as ~TempFiles generated by your scripts.

Merging - can I change what is recognised as similar

Is it possible to control how merge tools recognise similar blocks of code? Particularly meld, but any suggestions of alternative tools also welcome. If it is relevant, I am using mercurial.
I am working on a system which has a code generator generating initial get/set functions and a fairly common situation is two developers have each added a field, and the new get/set functions are generated at the end of the library. When it comes to merging, there is an inevitable merge conflict.
What I would like is for the merge tool to recognise these as separate functions, rather than modified versions of the same function.
Meld at least starts off with a nice enough view, showing the functions added to each version.
Unfortunately, after I pull across the first one, it now thinks the functions have been modified on one system, instead of seeing them as two separate functions. This is also the same merge result as I see initially in KDiff3.

With KDiff3 you can place manual sync marks to force it to consider lines to be equal. See this answer for example and screen shots.

How difficult would it be to add a message on 1000+ html files?

I have over 1000 HTML files that I need to edit in the exact same way. I need to:
Add a simple JavaScript snippet at the top of each file.
Put some kind of message at the top (it can be anything, as long as it displays the message I want it to).
I was wondering, do I have to edit each file manually to do this? Are there no .htaccess hacks or anything like that?
Any suggestions/help would be appreciated.

If you are using Linux, or have installed Cygwin on Windows, then sed may be the quickest way to edit the files.
Combined with find, it can be used to very quickly add to (or indeed edit) many files.
For example, the following command will replace all instances of the word 'old' with 'new' in all .html files:
find . -name "*.html" -exec sed -i "s/old/new/g" '{}' \;
There are many other examples online.

You can use .htaccess to autoprepend some code, but to be honest, a global find/replace would be a better idea in many ways.
I don't know what OS you use, but as a Mac Developer, http://www.hexmonkeysoftware.com/ is a neat little tool that does find and replace over loads of files.
Otherwise, a quick Python script would be easy to write to do this.
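
Such a script might look like the sketch below. It is only an illustration: the banner markup, the `/notice.js` path, and the function names are hypothetical, the files are assumed to be UTF-8, and the `<body>` match is a simple regex that would miss unusual markup.

```python
import re
from pathlib import Path

# Hypothetical banner: a script include plus a visible message.
BANNER = (
    '<script src="/notice.js"></script>\n'
    '<div class="notice">This site has moved.</div>\n'
)

BODY_TAG = re.compile(r"<body\b[^>]*>", re.IGNORECASE)


def add_banner(html, banner=BANNER):
    """Insert banner right after the opening <body> tag, once only."""
    if banner in html:
        return html  # already patched, so re-running the script is safe
    match = BODY_TAG.search(html)
    if match is None:
        return banner + html  # no <body> tag found: prepend instead
    return html[:match.end()] + "\n" + banner + html[match.end():]


def add_banner_to_tree(root):
    """Patch every .html file under root in place (UTF-8 is an assumption)."""
    for path in Path(root).rglob("*.html"):
        original = path.read_text(encoding="utf-8")
        updated = add_banner(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
```

Running `add_banner_to_tree(".")` on a copy of the site first is the safer way to try it, since the files are rewritten in place.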

If there is any common structure to the files, and their content is valuable and going to be used further in some way, then I would consider going the opposite route and extracting all that information, storing it in a database (or something) and presenting it like normal. This would provide more flexibility in presentation, and could even make the data useful/usable in other ways.

Find and Replace in Files - UTF8

Searching for a free application for commercial usage that allows find/replace in multiple files (regular expressions are nice but not a must), that supports opening and saving in UTF-8.
Tried a few like BKReplaceEm but the application ends up saving all the files as ASCII which causes some problems with web-rendering.
Please advise.
[UPDATE] To further clarify, I am searching for a windows utility.
[UPDATE #2] This is going to be used to run through our 450 page site and replace all french characters with the much needed HTML entities.

Notepad++ supports this feature, and is a great little editor in its own right.
Edit: Actually, Notepad++ does support replace in files. Click Search -> Find in Files, then select "Replace in files" in the dialog.

In the spirit of the previous answer, you can use Perl (which has seamless native Unicode support and whose regex capabilities are unparalleled). There are Windows Perl versions available (ActivePerl, Strawberry, or you can use Cygwin), and you can even slap GUIs on top of it; for the latter, you can see what answers are given to my very recent SO question :)
Plus, Perl can select pretty much any collection of files, by using globs for simple things, File::Find for more complicated ones, and grep on the resulting file list to refine further if you need something fancier, e.g. by content or modification time.
UPDATE: For a Windows editor, you can use UltraEdit. It has a free evaluation period, and to be perfectly honest, I find the purchase price to be WELL worth paying for this very nice and powerful editor. Among its other features, it supports Unicode, and has pretty fancy search/replace abilities, including Perl regex support and S/R in multiple files.
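
For the specific job in the question (turning French characters into HTML entities while reading and writing UTF-8), a scripted version could look like this Python sketch. It is an illustration rather than any of the tools named above; the function names are hypothetical, and it converts every non-ASCII character, not only French ones.

```python
from html.entities import codepoint2name
from pathlib import Path


def to_entities(text):
    """Replace each non-ASCII character with an HTML entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)  # plain ASCII, including existing markup, is untouched
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # named entity, e.g. &eacute;
        else:
            out.append(f"&#{code};")  # fall back to a numeric reference
    return "".join(out)


def convert_file(path):
    """Rewrite one file in place; utf-8-sig also accepts a leading BOM."""
    p = Path(path)
    text = p.read_text(encoding="utf-8-sig")
    p.write_text(to_entities(text), encoding="utf-8")
```

Because the output contains only ASCII characters, it renders the same whether the server labels the page as UTF-8 or as a Latin encoding.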

Use sed.

jEdit has a feature called "HyperSearch" (just open the find dialog). You can specify a directory, a file name pattern and jEdit (being based on Java) does support lots of different encodings (and is often smart enough to figure out the correct one).

You could try my editor, Code Trowel
If it doesn't do what you want I'd probably fix it :-)

For Windows, Notepad++ is awesome. It's licensed under the GPL. It does search and replace in files and supports regular expressions.

how to configure Apache + SVN webDAV directory listing

I have a Subversion server running with Apache mod_dav_svn and it works nicely, but the browsing ability via HTML is a bit spartan. Is there a way to customize it at all?
There are two things I'd like to do that would make a huge difference:
Separate the directories from the files so all the directories are at the top. Right now everything is in alphabetical order. (The picture above happens to have all the directories preceding files in alphabetical order, but trust me, that's not the normal case.)
List the basic file statistics (file size, mod time, last updated version, etc.)
Is it possible to do either of these with mod_dav_svn?

In a vanilla Subversion install, the web interface is very spartan by design. (Remember the HTTP interface is designed for SVN clients, not human beings.)
You can customize the display somewhat via the SVNIndexXSLT directive. (Here is a good place to start).
If you want something richer (with logs and diff features), you will need to install a special front end. WebSVN and ViewVC are very popular. There is also Trac, but this is a higher-level tool.
A list of other repo browsing tools.
Just FYI, we use WebSVN for our repo instance. It took some effort to get it up and running, but once it is set up you can pretty much leave it alone.

WebSVN looks like it might help you. I tried Trac and it is very slick, but I found it complicated, and it seems like overkill for what you're looking for, imo.

Not out of the box - that is, without modifying the source code. You might be interested in tools like ViewSVN or the more sophisticated trac or redmine.
