How to get the revision count for a file in Mercurial

Using templates, I want to find out how many times a file has been revised across all changesets. So, put another way, how many changesets feature that file.
Is there a way to do it? And can it be done with the Keywords extension?
And yes, I realise it's not really what Mercurial is about. I have sucky requirements:)

hg log -q filename | wc -l prints the number of changesets that touch the file, because -q emits exactly one line per changeset.

Tracking when a file was changed is a normal feature of a VCS. Just run hg log THE_FILENAME to see all changesets that affect one specific file.
To count them, run for example hg log THE_FILENAME | grep -c "^changeset".
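Since the question asks about templates, here is a minimal sketch that counts one template-generated line per changeset instead of grepping for the "changeset" label (which can change with the output locale). The helper name count_changesets is made up for illustration; it only counts non-empty lines on stdin.

```shell
# count_changesets: hypothetical helper, not part of Mercurial.
# Reads one line per changeset on stdin and prints the number of
# non-empty lines. "|| true" keeps the exit status clean when the
# count is 0 (grep -c exits with 1 in that case).
count_changesets() {
  grep -c . || true
}

# Intended use (assumes hg is on PATH and FILE is a tracked file):
#   hg log --template '{node|short}\n' FILE | count_changesets
```

Unlike wc -l on some platforms, grep -c prints the bare number without leading spaces, which is convenient when the result is fed into another script.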

I thought I'd just add one more option to the list here since grep and wc (word count) may not be available in your console (Windows users especially). There is an equivalent functionality in PowerShell:
hg log -q filename | Measure-Object
This will return the count by default (and as you can see there are other options you can play with using Measure-Object)
Count : 14
Average :
Sum :
Maximum :
Minimum :
Property :
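And if you are interested in how many commits have been made in the entire repository, omit the filename but keep -q, so that each changeset still prints exactly one line (without -q you would be counting every line of the log output):
hg log -q | Measure-Object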
Count : 492
Average :
Sum :
Maximum :
Minimum :
Property :

Related

Find out the email of a user?

Is it possible to find out the email of a certain user?
I tried:
hg log --user sherman
but that only told me all the changes that sherman made and didn't tell me his email.
The user may not have provided an email address at all when committing. If they did, the following may help, although it lists every author rather than a single user:
hg churn -c --template "{author|person} - {author|email}"
Should give you a list of all the authors in the format
username - email_address
It'll also give you the number of commits they've made.
Another option that'll give you similar output (in PowerShell) without the churn extension is:
hg log --template "{author|person} - {author|email}\n" | Sort-Object -Unique
I believe the Linux equivalent is something like:
hg log --template "{author|person} - {author|email}\n" | sort | uniq
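To narrow that list down to a single user, a small filter over the same "person - email" lines might look like the sketch below. The helper name emails_for is hypothetical, and it assumes the person name itself does not contain " - ".

```shell
# emails_for: hypothetical helper, not part of Mercurial.
# Reads "person - email" lines on stdin and prints the unique email
# addresses whose person part contains $1 (case-insensitive).
emails_for() {
  awk -F' - ' -v who="$1" '
    index(tolower($1), tolower(who)) > 0 { print $2 }
  ' | sort -u
}

# Intended use (assumes hg is on PATH):
#   hg log --template '{author|person} - {author|email}\n' | emails_for sherman
```

If the user name is known exactly, combining the --user option from the question with a template that prints only {author|email} and piping through sort -u should give the same result without the helper.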

Mercurial: Most recent change per file

I'm looking for a way to make Mercurial output a table like this:
File     Most recent revision changing the file   Date of that revision
====     ======================================   =====================
foo.py   44159adb0312                             2018-09-16 12:24
...      ...                                      ...
This is just like GitHub does it on the "Code" overview page (for example, for torvalds/linux).
"Most recent" could refer to the date or to the DAG hierarchy relative to the current changeset, or maybe to the current branch. Perhaps the latter is more useful, but in my particular use case, it doesn't make a difference.
I'd also like to be able to provide a list of files or a subdirectory for which I want the table. (I don't necessarily want it for everything)
I am aware that I could do it using a small script, looping over hg log -l 1 <file>, but I was wondering if there is a more efficient / more natural solution.
You won't get around looping over all files. Yet with hg manifest you get that list of files. Then template the output as needed:
for f in $(hg manifest); do hg log -l 1 "$f" -T "$f\t\t{rev}:{node|short}\t\t{date|isodate}\n"; done
This gives output like
.hgignore 38289:f9c426385853 2018-06-09 13:34 +0900
.hgsigs 38289:f9c426385853 2018-06-09 13:34 +0900
.hgtags 38289:f9c426385853 2018-06-09 13:34 +0900
You might want to twiddle more with the output formatting. See the mercurial wiki for a complete overview of output templating.
Git will follow the commit DAG, because that's all it has. In Mercurial, you have (many) more options because you have more data.
Probably the ideal option here is follow(file, .) (combined with first or last as appropriate). But as hg help revset will tell you, you have the following options (I've shrunk the list to the obvious applicable ones):
ancestors(set[, depth])
Use this with the set being . to get ancestors of the current commit, for instance, if you want to do DAG-following a la Git. Or, use ::., which is basically the same.
branch(string or set)
Use this with . to get all commits in the current branch. Combine with other restrictors (e.g., parents) to avoid looking at later commits in the current branch if you're not at the tip of the current branch.
file(pattern)
Use this with a glob pattern to find changesets that affect a given file.
filelog(pattern)
Like file but faster, trading off some accuracy for speed (see documentation for further details).
follow([file[, startrev]])
To quote the documentation:
An alias for "::." (ancestors of the working directory's first parent).
If file pattern is specified, the histories of files matching given
pattern in the revision given by startrev are followed, including
copies.
modifies(pattern)
Use this (with any pattern, not just glob) to find changesets that modify some file or directory. I think this is limited to M type modifications, not addition or removal of files, as there is also adds(pattern) and removes(pattern). Use all three, or-ed together, to find any add/modify/remove operations.
first(set, [n])
last(set, [n])
limit(set[, n[, offset]])
Use this to extract a particular entry out of the revset.
When searching forwards (the default), last(follow(file, .)) seems to work nicely to locate the correct revision. As you noted, you have to do this once per file—it will definitely go faster if you write your own Mercurial plug-in to do this without reloading the rest of the system all the time.
A somewhat more efficient / more natural solution could be:
create a template or style for the desired log output (I can't predict which will work better for you)
create an alias for hg log -l 1 --template ... or hg log -l 1 --style ...
EDIT
Much later: a more correct solution (from recent discoveries) using hg grep:
hg grep "." "set:**.py" --files-with-matches -d -q -T"{files % '{file} {date|age}\n'}"
Part of the output in a test repository:
hggit/__init__.py 7 weeks ago
hggit/git_handler.py 7 weeks ago
hggit/gitdirstate.py 7 weeks ago
…
You will have to modify the fileset to get results for only part of your tree (for all branches) and, perhaps, the template to suit your needs.
I don't have a fileset for selecting "files in branch X" at hand; I think it would be something using the revs() predicate:
"revs(revs, pattern)"
Evaluate set in the specified revisions. If the
revset match multiple revs, this will return file matching pattern in
any of the revision.
I can't give the exact form, because some undocumented predicates (judging by the examples; see # "set:revs('wdir()'..." for referencing the working directory) can be used to define the revset, and I couldn't work out the correct form for a branch predicate.

How can I see changesets that remove a match *in a specific branch*?

As hg help grep says:
By default, grep prints the most recent revision number for each file in
which it finds a match. To get it to print every revision that contains a
change in match status ("-" for a match that becomes a non-match, or "+"
for a non-match that becomes a match), use the --all flag.
This works as advertised: When I run hg grep --all pattern, I get a list of hits marked with :+: or :-::
plaintext.py:8055:+: ...
plaintext.py:4690:-: ...
otherfile.py:4690:-: ...
plaintext.py:4630:+: ...
plaintext.py:4630:+: ...
The problem is when I try to restrict the search to a branch or revset:
hg grep --all -r 'branch(default)' pattern
The above will no longer print the revisions in which there is a change of status. Lots of revisions that match are printed (not just the most recent or most ancient one), and many revisions that removed a match (marked with :-:) are no longer printed. (Some :-:-revisions are still printed; I don't understand when this happens.)
This seems like it could be a bug, but what do I know. I'm using mercurial 4.2 (on OS X).
I could live with filtering the output of unrestricted hg grep --all, but the default format does not include the branch (and I do not know enough to write a template that includes all the current information plus the branch).
I think you're running into this bug:
https://bz.mercurial-scm.org/show_bug.cgi?id=3885
Unfortunately "hg grep" is notoriously buggy and likely needs lots of work to get into a place where it's more usable.

auto-accepting a Mercurial change chunk

I have a very large repo with thousands of files that can regularly get updated by automatic processes that are out of my control (this is for Unity 3D, for what it's worth).
For example, if I upgrade Unity to a new version, it will reimport all textures and maybe add a line in thousands of .meta files that correspond to a new serialized data that didn't exist previously.
Obviously reviewing thousands of files is terrible. Most of the time though, I can quickly identify a particular diff, and would just like to automatically check all the files that have the same diff, commit to get them out of the way, and see what's left: other diffs that I might not know about.
For example, I just committed 4000+ files that all contained this diff:
So the pattern would be easy to find:
- textureFormat: -5
+ textureFormat: -1
I suppose I could write a script, or a TortoiseHg tool to do that, I just have no idea where to begin. I'd need to iterate over all changed files/chunks, match a pattern, commit the chunks...
I know of no tool that does exactly what you want. However, I believe it's relatively easy to write a small bash script for it, or to use the command line:
hg diff --nodates --noprefix -U 0 | grep '^+' | grep -v '+++' | sort | uniq -c | sort -rn
will list the inserted lines of the current diff in descending order of the number of occurrences, so the most frequently occurring change comes first.
Once you have picked a pattern from that list, you can list the files that match it, for instance:
hg files "set:grep('^ textureFormat: -1')"
should give you all files with that pattern (whether it's new or not, though). You probably want to check those files, whether their diff contains anything else:
hg diff "set:grep('^ textureFormat: -1')"
Now you can make use of the results and even exclude single files, if the diff output didn't suit you:
hg commit "set:grep('^ textureFormat: -1') and not 'unwantedFilename.cpp'"
In the above commands I made use of filesets and their grep() predicate, which accepts regular expressions. Check hg help grep, hg help fileset and hg help patterns for a more in-depth explanation.
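Because the grep() fileset also matches files that contain other changes, it can help to first list the files whose diff consists only of the known change. The sketch below is one way to do that by filtering the zero-context diff. The helper name files_with_only is made up, and it assumes Mercurial's default (non-git) diff format, where each file section has a "+++ path" header.

```shell
# files_with_only: hypothetical helper, not part of Mercurial.
# Reads the output of `hg diff --nodates --noprefix -U 0` on stdin and
# prints every file in which ALL added and removed lines match the
# extended regular expression given as $1.
# Note: awk -v processes backslash escapes, so keep the pattern simple.
files_with_only() {
  awk -v pat="$1" '
    /^diff /               { in_hunk = 0; next }        # new file section
    !in_hunk && /^\+\+\+ / { file = substr($0, 5)        # path after "+++ "
                             bad[file] = 0
                             order[++n] = file; next }
    /^@@/                  { in_hunk = 1; next }        # hunk content follows
    in_hunk && /^[-+]/     { if (substr($0, 2) !~ pat) bad[file] = 1 }
    END { for (i = 1; i <= n; i++) if (!bad[order[i]]) print order[i] }
  '
}

# Intended use (assumes hg is on PATH):
#   hg diff --nodates --noprefix -U 0 | files_with_only 'textureFormat: -[15]$'
```

The resulting list can be reviewed and then handed to hg commit; hg help patterns describes a listfile: pattern for reading file names from a file, which avoids command-line length limits with thousands of files.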

Counting changed lines of code over time in a repository

Is there a way to obtain the number of changed lines of code over a certain time period in a mercurial repository? Something along the lines of what statsvn does would be great, but anything counting the number of changed lines of code within 6 months will do (including a clever combination of arguments to hg log).
The hg churn extension is what you want.
You can get visual results with hg activity or hg chart.
Edit: hg diff and hg log both support a --stat option that can do this for you, only better and quicker.
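As a rough sketch of how the --stat output could be turned into totals: each changeset's diffstat ends with a summary line of the form "N files changed, X insertions(+), Y deletions(-)", and those lines can be summed. The helper name sum_diffstat is hypothetical, and a commit summary that happens to contain the same wording would be counted too.

```shell
# sum_diffstat: hypothetical helper, not part of Mercurial.
# Reads `hg log --stat` (or `hg diff --stat`) output on stdin and sums
# the per-changeset "insertions(+)" and "deletions(-)" figures.
sum_diffstat() {
  awk '
    / files? changed, / {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^insertions?\(\+\)/) added   += $(i - 1)
        if ($i ~ /^deletions?\(-\)/)   removed += $(i - 1)
      }
    }
    END { printf "%d insertions, %d deletions\n", added, removed }
  '
}

# Intended use (assumes hg is on PATH; see hg help dates for the
# "-DAYS" date form, here roughly the last six months):
#   hg log --stat --date "-180" | sum_diffstat
```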
I made an alias called lines to count changed lines (not necessarily lines of code) for me. Try putting this alias in your .hgrc file:
[alias]
lines = !echo `hg log -pr $@ | grep "^+" | wc -l` Additions; echo `hg log -pr $@ | grep "^-" | wc -l` Deletions;
Then pass it the revision first, followed by any optional arguments:
hg lines tip or hg lines 123:456 -u brian
Sometimes you want to know the number of lines changed excluding whitespace-only changes. This requires using diff -w underneath instead of log -p. I set up a linesw alias for this:
#ignore whitespace
linesw = ![[ $1 =~ : ]] && r=$1 || r="$1~1:$1"; echo `hg diff -wr $r | grep "^+\([^+]\|$\)" | wc -l` Additions; echo `hg diff -wr $r | grep "^-\([^-]\|$\)" | wc -l` Deletions;
hg linesw tip or hg linesw 123:456
Note they behave slightly differently because diff and log behave differently: for example, log will take a --user parameter while diff will not, and when passing a range, log will show changes committed in the first revision given in the range, while diff will not.
This has only been tested using bash.
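One caveat with the lines alias is that grep "^+" and grep "^-" also count the "+++" and "---" file header lines of each diff. A possible alternative, sketched here with a made-up helper name (count_diff_lines), is to count only lines that appear inside a hunk, that is, after an "@@" header:

```shell
# count_diff_lines: hypothetical helper, not part of Mercurial.
# Reads unified diff text on stdin (for example from `hg log -p` or
# `hg diff`) and counts added and removed lines inside hunks only, so
# the "---" and "+++" file headers are not counted.
count_diff_lines() {
  awk '
    /^diff /         { in_hunk = 0; next }   # file section starts, headers follow
    /^@@ /           { in_hunk = 1; next }   # hunk header, content follows
    in_hunk && /^\+/ { added++ }
    in_hunk && /^-/  { removed++ }
    END { printf "%d Additions\n%d Deletions\n", added, removed }
  '
}

# Intended use (assumes hg is on PATH):
#   hg log -p -r 123:456 | count_diff_lines
#   hg diff -w -r 123 -r 456 | count_diff_lines
```

It also reads the diff once instead of running hg twice per invocation.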
I needed to do this, and spent quite a bit of time with the hg churn extension and similar solutions.
In the end, I found that what worked best for me was CLOC (Count Lines of Code): http://cloc.sourceforge.net/
You can give it two folders containing two versions of a project, and it will count all of the lines that are the same, modified, added, removed. It recognises multiple languages and itemises code, comments and blank lines.
To use it, I pulled out the two versions of my code from Hg into two parallel folders, and then used cloc --diff --ignore-whitespace