How to annote a large file with mercurial? - mercurial

i'm trying to find out who did the last change on a certain line in a large xml file ( ~100.00 lines ).
I tried to use the log file viewer of thg but it does not give me any results.
Using hg annote file.xml -l 95000 .. took forever and eventually died with an error message.
Is there a way to annote a single line in a large file that does not take forever ?

You can use hg grep to dig into a file if you have interest in very specific text:
hg grep "text-to-search" file.txt
You will likely need to add the --all switch to get every change that matches and then limit your results to a specific changeset range with -r firstchage:lastchange syntax.
I don't have a file on hand of the size you are working with, so this may also have trouble past a certain point, particularly if the search string matches many many lines in the file. But if you can get as specific as possible with your search string, you should be able to track it down.

Related

Mercurial: Most recent change per file

I'm looking for a way to make Mercurial output a table like this:
File Most recent revision changing the file Date of that revision
==== ====================================== =====================
foo.py 44159adb0312 2018-09-16 12:24
... ... ...
This is just like github does it on the "Code" overview page. (screenshot from torvalds/linux):
"Most recent" could refer the date or to the DAG hierarchy relative to the current changeset, or maybe to the current branch. Perhaps the latter is more useful, but in my particular use case, it doesn't make a difference.
I'd also like to be able to provide a list of files or a subdirectory for which I want the table. (I don't necessarily want it for everything)
I am aware that I could do it using a small script, looping over hg log -l 1 <file>, but I was wondering if there is a more efficient / more natural solution.
You won't get around looping over all files. Yet with hg manifest you get that list of files. Then template the output as needed:
for f in $(hg ma); do hg log -l1 $f -T"$f\t\t{rev}:{node|short}\t\t{date|isodate}"; done
This gives output like
.hgignore 38289:f9c426385853 2018-06-09 13:34 +0900
.hgsigs 38289:f9c426385853 2018-06-09 13:34 +0900
.hgtags 38289:f9c426385853 2018-06-09 13:34 +0900
You might want to twiddle more with the output formatting. See the mercurial wiki for a complete overview of output templating.
Git will follow the commit DAG, because that's all it has. In Mercurial, you have (many) more options because you have more data.
Probably the ideal option here is follow(file, .) (combined with first or last as appropriate). But as hg help revset will tell you, you have the following options (I've shrunk the list to the obvious applicable ones):
ancestors(set[, depth])
Use this with the set being . to get ancestors of the current commit, for instance, if you want to do DAG-following a la Git. Or, use ::., which is basically the same.
branch(string or set)
Use this with . to get all commits in the current branch. Combine with other restrictors (e.g., parents) to avoid looking at later commits in the current branch if you're not at the tip of the current branch.
file(pattern)
Use this with a glob pattern to find changesets that affect a given file.
filelog(pattern)
Like file but faster, trading off some accuracy for speed (see documentation for further details).
follow([file[, startrev]])
To quote the documentation:
An alias for "::." (ancestors of the working directory's first parent).
If file pattern is specified, the histories of files matching given
pattern in the revision given by startrev are followed, including
copies.
modifies(pattern)
Use this (with any pattern, not just glob) to find changesets that modify some file or directory. I think this is limited to M type modifications, not addition or removal of files, as there is also adds(pattern) and removes(pattern). Use all three, or-ed together, to find any add/modify/remove operations.
first(set, [n])
last(set, [n])
limit(set[, n[, offset]])
Use this to extract a particular entry out of the revset.
When searching forwards (the default), last(follow(file, .)) seems to work nicely to locate the correct revision. As you noted, you have to do this once per file—it will definitely go faster if you write your own Mercurial plug-in to do this without reloading the rest of the system all the time.
Somehow more efficient / more natural solution can be:
create template|style for desired log output (I can't predict, which way will be better for you)
create alias for hg log -l 1 --template ... or hg log -l 1 --style ...
EDIT
A lot later, more correct solution (from recent discoveries) with hg grep
hg grep "." "set:**.py" --files-with-matches -d -q -T"{files % '{file} {date|age}\n'}"
Part of output in test-repo
hggit/__init__.py 7 weeks ago
hggit/git_handler.py 7 weeks ago
hggit/gitdirstate.py 7 weeks ago
…
You have to modify fileset in order to get results only for part of your tree (for all branches) and, maybe, template in order to fulfill your needs.
I didn't have fileset for selecting "files in branch X" just now, I think, it will be something using revs() predicate
"revs(revs, pattern)"
Evaluate set in the specified revisions. If the
revset match multiple revs, this will return file matching pattern in
any of the revision.
because some not published predicates (according to examples, see # "set:revs('wdir()'..." for referencing working directory) can be used for defining revset and I can't discover/predict the correct form for branch predicate

auto-accepting a Mercurial change chunk

I have a very large repo with thousands of files that can regularly get updated by automatic processes that are out of my control (this is for Unity 3D, for what it's worth).
For example, if I upgrade Unity to a new version, it will reimport all textures and maybe add a line in thousands of .meta files that correspond to a new serialized data that didn't exist previously.
Obviously reviewing thousands of files is terrible. Most of the time though, I can quickly identify a particular diff, and would just like to automatically check all the files that have the same diff, commit to get them out of the way, and see what's left: other diffs that I might not know about.
For example I just commited 4000+ files that all contained this diff:
So the pattern would be easy to find:
- textureFormat: -5
+ textureFormat: -1
I suppose I could write a script, or a TortoiseHg tool to do that, I just have no idea where to begin. I'd need to iterate over all changed files/chunks, match a pattern, commit the chunks...
I know of no tool to do exactly what you want. However I believe it's relatively easy to write a small bash script for such or use the command line:
hg diff --nodates --noprefix -U 0 | grep '^+' | grep -v '+++' | sort | uniq -c
will list you the inserted lines of the current diff in descending order of the number of occurences, thus the most frequently occurring diff first.
With that list you get a list of files which match the newly inserted pattern, for instance
hg files "set:grep('^ textureFormat: -1')"
should give you all files with that pattern (whether it's new or not, though). You probably want to check those files, whether their diff contains anything else:
hg diff "set:grep('^ textureFormat: -1')"
Now you can make use of the results and even exclude single files, if the diff output didn't suit you:
hg commit "set:grep('^ textureFormat: -1') and not 'unwantedFilename.cpp'"
In the above commands I made use of the fileset capability and of hg grep which accepts regular expressions. Check hg help grep, hg help fileset and hg help patterns for a more in-depth explanation.

Script to adjust history in an RCS/CVS ,v file

In preparation for a migration to Mercurial, I would like to make some systematic changes to many thousands of ,v files. (I'll be editing copies of the originals, I hasten to add.)
Examples of the sorts of changes I'm after:
For each revision whose message begins with some text that indicates a known username (e.g. [Fred Bloggs]), if the username in the comment matches the Author in the ,v file, then delete the unnecessary username text from the commit message
If the ,v contains a useful description, append it to the commit message for revision 1.1 (cvs2hg ignores the description - but lots of our CVS files actually came from RCS, where it was easy to put the initial commit message into the description field by mistake)
For edits made from certain shared user accounts, adjust the author, depending on the contents of the commit message.
Things I've considered:
Running 'cvs log' on each individual ,v file - parsing the output, and using rcs -m to change this history. Problems with this include:
there doesn't seem to be a way to pass a text file to rcs -m - so if the revision message contained singled and/or or double quotes, or spanned multiple lines, it would be quite a challenge quoting it correctly in the script
I can't see an rcs or cvs facility to change the author name associated with a revision
less importantly, it would be likely to start a huge number of processes - which I think could get slow
Writing Python to parse the ,v file, and adjust the contents. Problems with this include:
we have a mixture of line-endings in our ,v files - including some binary files that should have been text, and vice-versa - so great care would be needed to not corrupt the files
care would be needed for quoting of the # character in any commit messages, if it fell on the start of the line in a multi-line comment
care would also be needed on revisions where the last line of the committed file was changed, and doesn't have a newline - meaning that the ,v has a # at the very end of a line, instead of being preceded by \n
Clone the version of cvs2hg that we are using, and try to adjust its code to make the desired edits in-place
Are there any other approaches that would be less work, or any existing code that implements this kind of functionality?
Your first approach may be the best one. I know that in Perl, handling quotation marks and multiple lines wouldn't be a problem. For example:
my $revision = ...;
my $log_message = ...;
system('rcs', "-m$revision:$log_message", $filename);
where $log_message can contain any arbitrary text. Since the string doesn't go through the shell, newlines and other metacharacters won't be reinterpreted. I'm sure you can do the same thing in Python.
(As for your second approach, I wouldn't expect line endings to be a problem. If you have Unix-style \n endings and Windows-style \r\n endings, you can just treat the trailing \r as part of the line, and everything should stay consistent. I'm making some assumptions here about the layout of ,v files.)
I wrote a Python library, EditRCS (PyPi) that implements the RCS format so the user can load an RCS file as a tree of Python objects, modify it programmatically and save to a new RCS file.
You can apply a function to every revision using mapDeltas(), for example to change an author's name; or walk the tree using getNext() for something more complicated such as joining two file histories together.

How to make 'hg log --verbose' show files on multiple lines?

By default all files changed in a changeset are on the same line, which makes them very easily to skip one or two, and hard to read.
How to make each file show on its own separate line?
The real way to see information about changed files is to use hg status. This shows the files that were modified in revision 100:
$ hg status -c 100
But if you want to have the log messages as well, then hg log is of course a natural starting point. Unfortunately there is no built-in switch that will make it display one file per line.
However, the output of hg log is controlled by a template system and you can write your own styles for it. The default style is here and you can customize to do what you want by changing
file = ' {file}'
to
file = '{file}\n'
Then save the new style as my-default.style and add
[ui]
style = ~/path/to/my-default.style
to your configuration file. This gives you one file per line and it even works when there are spaces in your file names.
I'm aware of one problem: you lose colors in the hg log output. It turns out that Mercurial is cheating here! It doesn't actually use the default template I showed you when generating log output. It doesn't use any template system at all, it just generates the output using direct code since this is faster. The problem is that the color extension only work with the hard-coded template. When you switch to a custom template and thereby invoke the template engine, you lose the color output.
However, you can recover the colors by inserting the ANSI escape codes directly into your template (on Unix-like systems). Changing
changeset = 'changeset: {rev}:{node|short}\n{branches}...
to
changeset = '\033[33mchangeset: {rev}:{node|short}\033[0m\n{branches}...
does the trick and hard-codes a yellow header line for the changeset. Adjust the changeset_verbose and changeset_quiet lines as well and you'll have colored output with your own template.
The hg template help file has this gem in its examples.
Format lists, e.g. files:
$ hg log -r 0 --template "files:\n{files % ' {file}\n'}"
This works on Windows without any translation.
I believe there is no built-in way to achieve this, but a bit of sed (also available for Windows: http://gnuwin32.sourceforge.net/packages/sed.htm) can help:
hg log --template "Rev: {rev}:{node}\nDate: {date|isodate}\nFiles: {files}\n\n" -l 10 | sed -e '/^Files:/s/ /\n /g'
Output:
Rev: 1:2538bd4661c755ccab9b68e1d5e91144f6f97d33
Date: 2011-12-20 15:47 +0100
Files:
test1.txt
Rev: 2:853a6f3c505308c9babff5a5a2f1e09303f1689c
Date: 2011-12-20 15:44 +0100
Files:
test2.txt
test3.txt
Explanation of sed -e '/^Files:/s/ /\n /g':
/^Files:/ searches for lines starting woth "Files:", and applies the following search and replace just to those line
s/ /\n /g replaces all lines with a newline, followed by the spaces.
This solution won't work when file names contain spaces.
To see all files that changed in a revision, use:
hg status --change REV
hg log --style changelog
OR
hg log --template "Description: {desc}\n" - supported keywords like desc, files etc. are listed here

DIFF utility works for 2 files. How to compare more than 2 files at a time?

So the utility Diff works just like I want for 2 files, but I have a project that requires comparisons with more than 2 files at a time, maybe up to 10 at a time. This requires having all those files side by side to each other as well. My research has not really turned up anything, vimdiff seems to be the best so far with the ability to compare 4 at a time.
My question: Is there any utility to compare more than 2 files at a time, or a way to hack diff/vimdiff so it can do multiple comparisons? The files I will be comparing are relatively short so it should not be too slow.
Displaying 10 files side-by-side and highlighting differences can be easily done with Diffuse. Simply specify all files on the command line like this:
diffuse 1.txt 2.txt 3.txt 4.txt 5.txt 6.txt 7.txt 8.txt 9.txt 10.txt
Vim can already do this:
vim -d file1 file2 file3
But you're normally limited to 4 files. You can change that by modifying a single line in Vim's source, however. The constant DB_COUNT defines the maximum number of diffed files, and it's defined towards the top of diff.c in versions 6.x and earlier, or about two thirds of the way down structs.h in versions 7.0 and up.
diff has built-in option --from-file and --to-file, which compares one operand to all others.
--from-file=FILE1
Compare FILE1 to all operands. FILE1 can be a directory.
--to-file=FILE2
Compare all operands to FILE2. FILE2 can be a directory.
Note: argument name --to-file is optional.
e.g.
# this will compare foo with bar, then foo with baz .html files
$ diff --from-file foo.html bar.html baz.html
# this will compare src/base-main.js with all .js files in git repo,
# that has 'main' in their filename or path
$ git ls-files :/*main*.js | xargs diff -u --from-file src/base-main.js
Checkout "Beyond Compare": http://www.scootersoftware.com/
It lets you compare entire directories of files, and it looks like it runs on Linux too.
if your running multiple diff's based off one file you could probably try writing a script that has a for loop to run through each directory and run the diff. Although it wouldn't be side by side you could at least compare them quickly. hope that helped.
Not answering the main question, but here's something similar to what Benjamin Neil has suggested but diffing all files:
Store the filenames in an array, then loop over the combinations of size two and diff (or do whatever you want).
files=($(ls -d /path/of/files/some-prefix.*)) # Array of files to compare
max=${#files[#]} # Take the length of that array
for ((idxA=0; idxA<max; idxA++)); do # iterate idxA from 0 to length
for ((idxB=idxA + 1; idxB<max; idxB++)); do # iterate idxB + 1 from idxA to length
echo "A: ${files[$idxA]}; B: ${files[$idxB]}" # Do whatever you're here for.
done
done
Derived from #charles-duffy's answer: https://stackoverflow.com/a/46719215/1160428
There is a simple an good way to do this = GREP.
Depending on the size of the text you can copy and paste it, or you can redirect the input of the file to the grep command. If you make a grep -vir /path to make a reverse search or a grep -ir /path. This is my way for certification exams.