Using Mercurial, how can I measure individual contributions? - mercurial

My team is using Mercurial, and I would like to know the relative contributions by each team member. I know that we cannot measure productivity by lines of code, but I would like to see if each person at least contributed something, even if it was overwritten by others later. So, I don't just want to see who is responsible for the current version (a la Mercurial annotate), but to do this recursively through all revisions, ideally with output that can be easily visualized or dumped into a spreadsheet.
Any tips?

There's an extension for exactly this, named churn. It is bundled with Mercurial but not enabled by default. You can find more information on the ChurnExtension wiki page.
To enable it, add the following to the [extensions] section of your mercurial.ini file:
[extensions]
churn=
Then to look at the churn of your repository, just do:
hg churn
This will output something like this (this is for the Noda-Time project):
[C:\Dev\VS.NET\Noda-Time-docs] :hg churn
skeet#pobox.com 296444 *************************************************************************************************************
james.keesey#gmail.com 203877 ***************************************************************************
James Keesey 80466 ******************************
dmitry.bulavin#gmail.com 25552 *********
Dmitry Bullavin 17657 ******
martinho.fernandes#gmail.com 16325 ******
Dmitry Bulavin 4273 **
james.keesey 2650 *
matt.scharley 768
configurator 450
lasse#vkarlsen.no 64
TeamCity#Nordrassil 2
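Since the question asks for output that can be dumped into a spreadsheet, here is a minimal Python sketch that turns the text printed by hg churn into CSV. The function names and the column names are my own, and the parsing assumes the "author, number, optional bar of asterisks" layout shown above.

```python
import csv
import io
import re

# Assumed layout of one `hg churn` line: "<author> <count> <asterisks>".
# The bar of asterisks is missing for very small contributors.
CHURN_LINE = re.compile(r"^(?P<author>.+?)\s+(?P<lines>\d+)(?:\s+\*+)?\s*$")


def parse_churn(text):
    """Return a list of (author, changed_lines) tuples from churn output."""
    rows = []
    for line in text.splitlines():
        match = CHURN_LINE.match(line)
        if match:  # prompt lines and blank lines do not match and are skipped
            rows.append((match.group("author"), int(match.group("lines"))))
    return rows


def churn_to_csv(text):
    """Render the parsed rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["author", "changed_lines"])  # hypothetical column names
    writer.writerows(parse_churn(text))
    return buffer.getvalue()


# Possible usage (needs hg on the PATH and the churn extension enabled):
#   import subprocess
#   out = subprocess.run(["hg", "churn"], capture_output=True, text=True).stdout
#   print(churn_to_csv(out), end="")
```

Authors that appear under several spellings (as in the output above) stay separate rows here; they would have to be merged by hand or in the spreadsheet.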

Churn does the job, but note that if a user moves files a lot, they will be credited with a huge number of changed lines. I just did a test; here are the results:
C:\Projects\personal\test>hg churn
darius.damalakas#gmail.com 10 *****************************************
C:\Projects\personal\test>hg mv a.a b.b
moving a.a to b.b
C:\Projects\personal\test>hg commit -m "moving 10 lines to another location"
b.b
committed changeset 1:c54200557152
C:\Projects\personal\test>hg churn
darius.damalakas#gmail.com 30 *****************************************
Note that I have only created 10 lines, but for moving a file I got 20 line changes. That does not convey a good picture.
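To make the arithmetic concrete, here is a small Python sketch of a line-based count over a plain unified diff. It is an illustration of the effect under the assumption that a move shows up as "every line removed, every line added"; it is not churn's actual source, and the helper names are made up.

```python
def count_changed_lines(diff_text):
    """Count added and removed lines in a unified diff, ignoring file headers."""
    added = removed = 0
    for line in diff_text.splitlines():
        if line.startswith("+") and not line.startswith("+++ "):
            added += 1
        elif line.startswith("-") and not line.startswith("--- "):
            removed += 1
    return added, removed


def move_diff(old_name, new_name, lines):
    """Build the plain diff of a file move: the old file removed, the new one added."""
    n = len(lines)
    parts = [f"--- a/{old_name}", "+++ /dev/null", f"@@ -1,{n} +0,0 @@"]
    parts += ["-" + text for text in lines]
    parts += ["--- /dev/null", f"+++ b/{new_name}", f"@@ -0,0 +1,{n} @@"]
    parts += ["+" + text for text in lines]
    return "\n".join(parts)


# Moving a 10-line file yields 10 removals plus 10 additions, i.e. 20 changed
# lines on top of the 10 lines that were originally written.
ten_lines = [f"line {i}" for i in range(10)]
added, removed = count_changed_lines(move_diff("a.a", "b.b", ten_lines))
total_after_move = 10 + added + removed
```

With the numbers from the test above, total_after_move comes out at 30, matching the second hg churn run.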

Related

Mercurial: Most recent change per file

I'm looking for a way to make Mercurial output a table like this:
File Most recent revision changing the file Date of that revision
==== ====================================== =====================
foo.py 44159adb0312 2018-09-16 12:24
... ... ...
This is what GitHub shows on the "Code" overview page of a repository (for example, torvalds/linux).
"Most recent" could refer to the date or to the DAG hierarchy relative to the current changeset, or maybe to the current branch. Perhaps the latter is more useful, but in my particular use case, it doesn't make a difference.
I'd also like to be able to provide a list of files or a subdirectory for which I want the table. (I don't necessarily want it for everything)
I am aware that I could do it using a small script, looping over hg log -l 1 <file>, but I was wondering if there is a more efficient / more natural solution.
You won't get around looping over all files. Yet with hg manifest you get that list of files. Then template the output as needed:
for f in $(hg manifest); do hg log -l 1 "$f" -T "$f\t\t{rev}:{node|short}\t\t{date|isodate}\n"; done
This gives output like
.hgignore 38289:f9c426385853 2018-06-09 13:34 +0900
.hgsigs 38289:f9c426385853 2018-06-09 13:34 +0900
.hgtags 38289:f9c426385853 2018-06-09 13:34 +0900
You might want to twiddle more with the output formatting. See the mercurial wiki for a complete overview of output templating.
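The same loop can be sketched in Python, which makes it easier to restrict the table to a subdirectory and to align the columns. This is only a sketch: the function names and the prefix parameter are my own, and it still starts one hg process per file, so it is no faster than the shell loop.

```python
import subprocess

# Template for one changeset: "rev:shorthash<TAB>ISO date", newline-terminated.
TEMPLATE = "{rev}:{node|short}\t{date|isodate}\n"


def run_hg(args, repo="."):
    """Run hg with the given arguments inside `repo` and return stdout lines."""
    result = subprocess.run(["hg", *args], cwd=repo,
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def latest_changes(repo=".", prefix=""):
    """Collect (file, revision, date) for every tracked file under `prefix`."""
    rows = []
    for path in run_hg(["manifest"], repo):
        if not path.startswith(prefix):
            continue
        log = run_hg(["log", "-l", "1", "-T", TEMPLATE, path], repo)
        if log:  # skip files for which hg log prints nothing
            revision, date = log[0].split("\t", 1)
            rows.append((path, revision, date))
    return rows


def format_table(rows):
    """Align (file, revision, date) rows into left-justified columns."""
    table = [("File", "Revision", "Date"), *rows]
    widths = [max(len(row[i]) for row in table) for i in range(3)]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in table
    )


# Possible usage from the repository root:
#   print(format_table(latest_changes(prefix="src/")))
```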
Git will follow the commit DAG, because that's all it has. In Mercurial, you have (many) more options because you have more data.
Probably the ideal option here is follow(file, .) (combined with first or last as appropriate). But as hg help revset will tell you, you have the following options (I've shrunk the list to the obvious applicable ones):
ancestors(set[, depth])
Use this with the set being . to get ancestors of the current commit, for instance, if you want to do DAG-following a la Git. Or, use ::., which is basically the same.
branch(string or set)
Use this with . to get all commits in the current branch. Combine with other restrictors (e.g., parents) to avoid looking at later commits in the current branch if you're not at the tip of the current branch.
file(pattern)
Use this with a glob pattern to find changesets that affect a given file.
filelog(pattern)
Like file but faster, trading off some accuracy for speed (see documentation for further details).
follow([file[, startrev]])
To quote the documentation:
An alias for "::." (ancestors of the working directory's first parent).
If file pattern is specified, the histories of files matching given
pattern in the revision given by startrev are followed, including
copies.
modifies(pattern)
Use this (with any pattern, not just glob) to find changesets that modify some file or directory. I think this is limited to M type modifications, not addition or removal of files, as there is also adds(pattern) and removes(pattern). Use all three, or-ed together, to find any add/modify/remove operations.
first(set, [n])
last(set, [n])
limit(set[, n[, offset]])
Use this to extract a particular entry out of the revset.
When searching forwards (the default), last(follow(file, .)) seems to work nicely to locate the correct revision. As you noted, you have to do this once per file—it will definitely go faster if you write your own Mercurial plug-in to do this without reloading the rest of the system all the time.
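As a rough illustration of the per-file call, here is a Python sketch that builds the last(follow(file, .)) revset and the matching hg log command line. The helper names are hypothetical, and the quoting assumes that backslash escapes are honored inside double-quoted revset strings.

```python
def last_change_revset(path, start="."):
    """Build a revset selecting the newest changeset in `path`'s followed history."""
    # Escape backslashes and double quotes so the path survives as a quoted string.
    quoted = path.replace("\\", "\\\\").replace('"', '\\"')
    return f'last(follow("{quoted}", {start}))'


def last_change_command(path, start="."):
    """Return the argument list for an hg log call that prints hash and date."""
    template = "{node|short} {date|isodate}\n"
    return ["hg", "log", "-r", last_change_revset(path, start), "-T", template]


# Possible usage:
#   import subprocess
#   subprocess.run(last_change_command("foo.py"), check=True)
```

Passing the arguments as a list avoids a second layer of shell quoting around the revset.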
A somewhat more efficient / more natural solution could be:
create a template or style for the desired log output (I can't predict which will work better for you)
create an alias for hg log -l 1 --template ... or hg log -l 1 --style ...
EDIT
A much later, more correct solution (from recent discoveries) uses hg grep:
hg grep "." "set:**.py" --files-with-matches -d -q -T"{files % '{file} {date|age}\n'}"
Part of output in test-repo
hggit/__init__.py 7 weeks ago
hggit/git_handler.py 7 weeks ago
hggit/gitdirstate.py 7 weeks ago
…
You have to modify the fileset to get results for only part of your tree (for all branches) and, maybe, the template to fit your needs.
I don't have a fileset for selecting "files in branch X" at hand. I think it would be something using the revs() predicate:
"revs(revs, pattern)"
Evaluate set in the specified revisions. If the
revset match multiple revs, this will return file matching pattern in
any of the revision.
because some unpublished predicates (according to the examples, see "set:revs('wdir()'..." for referencing the working directory) can be used to define the revset, and I can't discover or predict the correct form for a branch predicate.

Condensing a Mercurial repository - recommended way?

Let's say I have a repository 'Main', and Max and co each work on a clone. Max has some local commits ('f' and 'g') that are not yet pushed to 'Main'. This is how it looks now (pipes being pushes/pulls):
A--B1--B2--C--D1--D2--D3--E (Main)
| | | |
A--B1--B2--C--D1--D2--D3--E--f--g (Max)
'B1' and 'B2' as well as 'D1', 'D2' and 'D3' are changes that only make sense together. We would like to combine 'B1' and 'B2' to a single changeset 'B' and combine 'D1', 'D2' and 'D3' to a single changeset 'D'. The new structure should look like this:
A--B--C--D--E (Main)
| | |
A--B--C--D--E--f--g (Max)
My (main) question is: What is the recommended way of doing this?
Now let's make things worse:
We have a branch that was merged within the change-sets that we want to collapse. It would look like this:
A--B1--B2--C--D1--D2------D4--E (Main)
| | \-------D3-/ |
| | |
A--B1--B2--C--D1--D2------D4--E--f--g (Max)
\-------D3-/
The new history should look like this:
A--B--C--D--E (Main)
| | |
A--B--C--D--E--f--g (Max)
How would you do that?
Thanks in advance.
It depends on how much effort you want to put into this. While I don't know a solution within Mercurial itself (I only know history editing functions which can't cope with merges), Git does have the functionality you need:
If I really had to do such an operation, I would:
Try to convince the management that this is not worth it
Try harder to convince the management that this is not worth it
Make a backup! The following steps involve destructive operations, so consider this as not optional. You have been warned.
Export the repo with hg-git into a git repository
Export the complete (git) history into a fast-import stream with git fast-export --no-data --all > history.fi
Create a pseudo-history by editing history.fi, dropping your unwanted revisions
Import the adjusted history into the git repo with git fast-import --force < history.fi
Check extensively that the newly created history is in fact the way you want it
Clone Max into a local work repository
Remove the successors of commit A in the local work repository
Pull your updated history back from git (again with hg-git) into the local work repository
Check that the Mercurial history matches your expectation (diffs of commits between the new and old repos, metadata (time stamps, committer names, ...))
Remove the successors of commit A in every repo (Main, Max and every developer clone)
Push the partial history back to Main out of the work repository with hg push -r E Main
Push the complete history back to Max out of the work repository with hg push -r g Max

How to find changes to a section of code

So there is a bug in a bit of code I wrote a long while back. When I went to look into it, it had all been changed! I don't know which colleague changed it. I don't know when it was changed. This file has been changed many, many times. I'm not concerned with every commit that touched this file. I definitely don't want to look through all 100 commits this file has been in just to find which commits changed this area of code.
I want to find all of the commits that affected file xyz.txt between lines 250 and 300.
Better yet, I want to find all of the commits that affected the function doStuff() in file xyz.txt.
Is that possible?
As torek said, hg blame will do the job.
To filter lines between 250 and 300 you could do:
hg blame -ucd xyz.txt | cat -n | sed -n 250,300p
-u: Show user
-c: Show changeset
-d: Show date
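If you only want the list of changesets rather than the annotated lines, the output can be reduced further. Below is a minimal Python sketch that assumes the plain "changeset: line" layout printed by hg blame -c xyz.txt; the function name is made up. Keep in mind that blame reports only the most recent changeset for each line, so older changes to the same region only show up when you repeat the command on an earlier revision (hg blame -r REV).

```python
def changesets_in_range(annotate_output, first_line, last_line):
    """Return distinct changeset IDs for lines first_line..last_line (1-based, inclusive).

    Expects the text printed by `hg blame -c FILE`, where every line has the
    form "<changeset>: <content>".
    """
    seen = []
    lines = annotate_output.splitlines()
    for text in lines[first_line - 1:last_line]:
        # Everything before the first colon is the changeset column.
        changeset = text.split(":", 1)[0].strip()
        if changeset and changeset not in seen:
            seen.append(changeset)  # keep first-appearance order
    return seen


# Possible usage:
#   import subprocess
#   out = subprocess.run(["hg", "blame", "-c", "xyz.txt"],
#                        capture_output=True, text=True, check=True).stdout
#   print(changesets_in_range(out, 250, 300))
```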

Mercurial diff including first changeset

I have recently encountered the need to generate a Mercurial diff of all changes up to a particular changeset which includes the first changeset of the repo. I realize this kind of stretches the definition of a diff, but this is for uploading a new project to a code review tool.
Let's assume the following changesets:
p83jdps99shjhwop8 - second feature 12:00 PM
hs7783909dnns9097 - first feature - 11:00 AM
a299sdnnas78s9923 - original app setup - 10:00 AM
If I need a "diff" of all changes that have been committed, the only way that I can seem to achieve this is with the following diff command...
hg diff -r 00:p83jdps99shjhwop8
In this case the first changeset in the argument param (here - 00) takes the regexp form of 0[0]+
This seems to be exactly what we need based on a few tests, but I have had trouble tracking down documentation on this scenario (maybe I just can't devise the right Google query). As a result, I am unsure if this will work universally, or if it happens to be specific to my setup or the repos I have tested by chance.
Is there a suggested way to achieve what I am trying to accomplish? If not, is what I described above documented anywhere?
It appears this actually is documented, but you need to do some digging...
https://www.mercurial-scm.org/wiki/ChangeSetID
https://www.mercurial-scm.org/wiki/Nodeid
So the special nodeid you're referring to is the 'nullid'.
2 digits may not be adequate to identify the nullid as such (as it may be ambiguous if other hashes start with 2 zeros), so you may be better off specifying 4 0's or more.
E.g., hg diff -r 00:<hash of initial add changeset> has resulted in the error abort: 00changelog.i@00: ambiguous identifier!
I'm a little confused about what you need. The diff between an empty repository and the revision tip is just the content of every file at tip-- in other words, it's the state of your project at tip. In diff format, that'll consist exclusively of + lines.
Anyway, if you want a way to refer to the initial state of the repository, the documented notation for it is null (see hg help revisions). So, to get a diff between the initial (empty) state and the state of your repository at tip, you'd just say
hg diff -r null -r tip
But hg diff gives you a diff between two points in your revision graph. So this will only give you the ancestors of tip: If there are branches (named or unnamed) that have not been merged to an ancestor of tip, you will not see them.
3--6
/
0--1--2--5--7 (tip)
\ /
4
In the above example, the range from null to 7 does not include revisions 3 and 6.

What tools or techniques are available to "datamine" my mercurial repository?

We have an application with 2,000,000 lines of code in Mercurial. Obviously there is a lot of valuable information inside this repository.
Are there any tools or techniques to dig out some of that information?
For instance, over the history of the project, which five files have seen the most changes? Which five files are the most different from what they were one year ago? Have any particular lines of code seen a lot of churn?
I'm interested in that sort of thing and more.
Is there a way to extract this kind of information from our repository?
I don't know of any tools specifically made for doing this, but Mercurial's log templates are very powerful for getting data out of the system. I've done a bit of this sort of analysis in the past, and my approach was:
Use hg log to dump commits to some convenient format (xml in my case)
Write a script to import the xml into something queryable (database, or just work from the XML directly if it's not too big)
Here's an example hg log command to get you going:
mystyle.txt: (template)
changeset = '<changeset>\n<user>{author|user}</user>\n<date>{date|rfc3339date|escape}</date>\n<files>\n{file_mods}{file_adds}{file_dels}</files>\n<rev>{node}</rev>\n<desc>{desc|strip|escape}</desc>\n<branch>{branches}</branch><diffstat>{diffstat}</diffstat></changeset>\n\n'
file_mod = '<file action="modified">{file_mod|escape}</file>\n'
file_add = '<file action="added">{file_add|escape}</file>\n'
file_del = '<file action="deleted">{file_del|escape}</file>\n'
Example invocation using template and date range:
hg --repository /path/to/repo log -d "2012-01-01 to 2012-06-01" --no-merges --style mystyle.txt
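As an example of the second step, here is a short Python sketch that reads the output of the command above and answers the "which files changed most often" question. It assumes the output is a plain sequence of <changeset> elements as produced by mystyle.txt, so it wraps them in a made-up <log> root element before parsing; the function name is my own.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def most_changed_files(log_xml, top=5):
    """Return the `top` most frequently touched files as (path, count) pairs."""
    # The style emits bare <changeset> elements, so add a single root element.
    root = ET.fromstring("<log>" + log_xml + "</log>")
    counts = Counter(node.text for node in root.iter("file"))
    return counts.most_common(top)


# Possible usage, after redirecting the hg log output to log.xml:
#   with open("log.xml", encoding="utf-8") as handle:
#       for path, count in most_changed_files(handle.read()):
#           print(count, path)
```

Counting by the action attribute (modified, added, deleted) or by the <user> element works the same way.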
Try the built-in hg churn extension. One thing I like to use it for, for example, is to see a monthly bar graph of commits like this:
> hg churn -csf '%Y-%m'
2014-02 65 *************************************
2014-03 22 *************
2014-04 52 ******************************
2014-05 67 ***************************************
2014-06 31 ******************
2014-07 29 *****************
2014-08 29 *****************
2014-09 61 ***********************************
2014-10 36 *********************
2014-11 23 *************
2014-12 32 ******************
2015-01 60 ***********************************
2015-02 20 ************
(might want to set up aliases if you find you're using the command often enough)