Finding Merges in the Wrong Direction - mercurial

We have three major releases a year, with co-responding branches: e.g. 2013A, 2013B, and 2013C. When we create each branch, it is started from default. Changes in each branch should only be merged forward, e.g. 2013A -> 2013B -> 2013C -> default. We have a push hook on the server that checks if a pushed merge is in the wrong direction, i.e. default -> 2013C, 2013C -> 2013B, etc.
We also have team-specific branches, some of which are working on features for a release, and others which are working on the next release, e.g. default. While a team is working on a release, they merge to/from the release branch. When the team is ready to work on the next release, they start to merge to/from the default branch.
The other day we had a situation where a new developer merged default into his team branch before the team was ready to move on to the next release, then merged the team branch into a previous release, i.e. default -> TeamBranch -> 2013B. Unfortunately, our hook didn't take this situation into account.
Essentially, this is what happened:
2013B A---o---o---o---o---B---o
/ \ / \
Team / o---o---o---C---o---o
/ / \
Default D---o---o---o---o---o---o---o---o
A = Creation of 2013B branch
B = Merge into release branch
C = The bad merge. We want to detect and prevent these whenever we merge into a release branch.
D = The first common ancestor of the release branch and default.
So, I've re-written our hook to check that when a change is merged into a release branch, it doesn't merge backward. For each merge into a release branch, I check that there aren't any ancestor merges from a forward branch. Here's the revset query I'm using:
> hg log -r "limit(descendants(p1(first(branch('2013B')))) and reverse(p2(ancestors(branch('2013B'))) and branch('default')),1)"
This works. However, we have a large repository (111,000+ changesets,) and the check takes 30 to 40 seconds. I wanted to know if there is a quicker/faster/more efficient way of writing my revset query, or another approach I'm not seeing.

I sent this same question to the Mercurial mailing list and received an answer. The branch() query is the performance bottleneck. It causes Mercurial to unroll all the changesets on a branch. Mercurial doesn't cache the results of this, so each call will unroll changesets.
Instead of using branch() I switched to using descendants() and ancestors():
limit(children(p2(2013BBaseline:: and ::2013B and merge()) and branch(default)) and reverse(::2013B))
p2(2013BBaseline:: and ::2013B and merge()) and branch(default) grabs the second parent (the incoming branch) for all merges between the start of the 2013B branch and its head, and returns just those on the default branch. [1]
The clause above is then wrapped with children() to go back down to the children of that parent.
and reverse(::2013B) then gets the children that are ancestors of the 2013B branch, i.e. the bad merge(s).
limit() then returns just the first of those bad merges.
The query above takes about 1.5 seconds.
Thanks to Matt Mackall for suggesting the solution.
2013BBaseline is a tag which identifies the changeset in the default branch from which 2013B branch was created, otherwise I would have had to replace 2013BBaseline:: with:
p1(first(branch(2013B)))::
to discover the release branch's baseline, and that is not very performant.

To detect the merge C, you should be able to use
$ hg log -r "parents(branch(Team) and merge()) and branch(default)"
This gives you changesets on default that were merged into Team. If there are any, then someone merged into their team branch by mistake. I think you need to attack this from the point of view of the team branch: you cannot forbid merges to/from the release branch since some of them will be legitimate.
If your team branches follow a consistent naming scheme (they should), then you can use a regular expression with the branch() predicate to select them. Something like
$ hg log -r "branch('re:team-.*')"
would match branches starting with team-.

Related

What are all of Mercurial's built in commit identifiers?

I'm looking for easy ways to move around to different commits, sometimes within a branch (and not necessarily from the latest commit). For example, I'd want a way to always get to the previous commit:
# move to commit before current commit
hg checkout -r ~.1
or move to the top of the branch
hg checkout tip
But I can't figure out things like how to move to the next commit (i.e. the one above the current commit, the negation of ~.1). hg seems to have built in ways of referencing these things (e.g. tip (latest commit), . (current commit), and .~N (N-th previous commit)), but are there any others?
You have to re-read hg help revsets carefully and a) build (if needed) b) use these revsets in hg commands
If you want to use "~" notation, you have to use proper format of revset hg log -r ".~1" for immediate parent and remember "only 1-st parent is evaluated" (mergesets, f.e, have two parents)
"x~n"
The nth first ancestor of x
Top of named branch (branch head) isn't tip (tip - ltest commit in repo, can be in another branch), but branchname per se for hg up
With "x~n" revsets you can use negative numbers also: for n < 0, the nth unambiguous descendant of x.

How to update to a branch by name when there's a hash collision?

So my coworker just won the hash lottery. We create a feature branch for every ticket we resolve, following the convention b##### where ##### is the issue number.
The trouble is that when he tried updating to that branch (before it existed) via hg up branch(b29477), it took him to default instead of saying that the branch doesn't exist.
It turns out that branch(b29477) actually returns the name of branch of the thing inside the parens (instead of forcing Mercurial to evaluate the thing inside the parens as a branch name as I thought!), and there so happened to be a changeset beginning with b29477 which was on default, so instead of saying the branch didn't exist, it took him to the tip of default!
Now we can work around this problem by choosing a different branch name, but I want to know if there's any way to hg update <branch_name_and_dont_interpret_this_as_anything_else>?
BTW, hg log also lies about what it's --branch parameter does. It says:
-b --branch BRANCH [+] show changesets within the given named branch
But that's not true at all. Go ahead and run it with a hash. e.g.,
hg log --branch eea844fb
And it will turn up results. If you dig through the docs, you'll discover that it's actually the same as:
hg log -r 'branch(eea844fb)'
Try this:
hg update -r "branch('literal:b29477')"
From the Mercurial help page:
branch(string or set)
All changesets belonging to the given branch or
the branches of the given changesets.
If string starts with re:, the remainder of the name is treated as a
regular expression. To match a branch that actually starts with re:,
use the prefix literal:.
This means that if you use the literal prefix, you are specifying a string. And a string is not a set.
As the text says, if you specify a changeset, Mercurial will show:
the branches of the given changesets

Mercurial diff including first changeset

I have recently encountered the need to generate a Mercurial diff of all changes up to a particular changeset which includes the first changeset of the repo. I realize this kind of stretches the definition of a diff, but this is for uploading a new project to a code review tool.
Let's assume the following changesets:
p83jdps99shjhwop8 - second feature 12:00 PM
hs7783909dnns9097 - first feature - 11:00 AM
a299sdnnas78s9923 - original app setup - 10:00 AM
If I need a "diff" of all changes that have been committed, the only way that I can seem to achieve this is with the following diff command...
diff -r 00:p83jdps99shjhwop8
In this case the first changeset in the argument param (here - 00) takes the regexp form of 0[0]+
This seems to be exactly what we need based on a few tests, but I have had trouble tracking down documentation on this scenario (maybe I just can't devise the right Google query). As a result, I am unsure if this will work universally, or if it happens to be specific to my setup or the repos I have tested by chance.
Is there a suggested way to achieve what I am trying to accomplish? If not, is what I described above documented anywhere?
It appears this actually is documented, but you need to do some digging...
https://www.mercurial-scm.org/wiki/ChangeSetID
https://www.mercurial-scm.org/wiki/Nodeid
So the special nodeid you're referring to is the 'nullid'.
2 digits may not be adequate to identify the nullid as such (as it may be ambiguous if other hashes start with 2 zeros), so you may be better off specifying 4 0's or more.
Eg: hg diff -r 00:<hash of initial add changeset> has resulted in the abort: 00changelog.i#00: ambiguous identifier! error.
I'm a little confused about what you need. The diff between an empty repository and the revision tip is just the content of every file at tip-- in other words, it's the state of your project at tip. In diff format, that'll consist exclusively of + lines.
Anyway, if you want a way to refer to the initial state of the repository, the documented notation for it is null (see hg help revisions). So, to get a diff between the initial (empty) state and the state of your repository at tip, you'd just say
hg diff -r null -r tip
But hg diff gives you a diff between two points in your revision graph. So this will only give you the ancestors of tip: If there are branches (named or unnamed) that have not been merged to an ancestor of tip, you will not see them.
3--6
/
0--1--2--5--7 (tip)
\ /
4
In the above example, the range from null to 7 does not include revisions 3 and 6.

Mercurial commit disappeared

We have switched to Mercurial recently. All had been going well until we had two incidents of committed changes going missing. Examining the logs has not made us any wiser.
Below is an example. The files committed at (1) revert to a previous state at (2) even though those files are not mentioned in the merge.
What can I check to understand why the files reverted?
There are three interesting changesets in this graph that can influence the (2) merge:
Teal changeset: not shown, but looks like it's just below the graph. This is the first parent of (2)
Blue changeset: number five from the bottom, labelled "Fix test". This is the second parent of (2).
Common ancestor of the parents: also not shown, will be further below. Strangely, it looks like the teal changeset could be the common ancestor, but Mercurial will now allow you to make such a degenerate merge under normal circumstances.
When Mercurial does a merge, these are the only three changesets that matter: the two heads you merge and their common ancestor. In a three-way merge the logic is now:
ancestor parent1 parent2 => merge
X X Y Y (clean)
X Y X Y (clean)
X Y Y Y (clean)
X Y Z W (conflict)
Read the table like this: "if the ancestor was X, and the first parent was also X and the second parent was Y, then the merge will contain Y". In other words: a three-way merge favors change and will let a modification win.
You can find the ancestor with
$ hg log -r "ancestor(p1(changeset-2), p2(changeset-2))"
where changeset-2 is the one marked with (2) above. When you say
The files committed at (1) revert to a previous state at (2) even though those files are not mentioned in the merge.
then it's important to understand that "a merge" is just a snapshot that shows how to mix two other changesets. The change made "in" a merge is the difference between this snapshot and its two parent changesets:
$ hg status --rev "p1(changeset-2):changeset-2"
$ hg status --rev "p2(changeset-2):changeset-2"
This shows how the merge changeset is different from its first and second parent, respectively. I'm sure the files are mentioned in one of those lists — unless the merge isn't the culprit after all.
When you examine the three changesets and the differences between them, then you will probably see that someone has to resolve a conflict (the fourth line in the merge table above) and picked the wrong file at some step along the way.
The merge at 2 is between a very old branch (dark blue, forked from the mainline/green branch just after commit 1) and an even older branch (light blue, hasn't been in sync with mainline since before commit 1)
It seems likely that the merge at 2 picked the wrong version of the file - can't tell from here if that was the tool picking the wrong version of the file, or the user manually selecting the wrong version.
Edited to add:
To help track down exactly what changed at 2, you can use hg diff -r REV1 -r REV2, which will show you the line-by-line differences between any two revisions.
When you know that the badness was introduced sometime between point 1 and point 2, hg bisect may help you track down the exact source of the badness:
hg bisect [-gbsr] [-U] [-c CMD] [REV]
subdivision search of changesets
This command helps to find changesets which introduce problems. To use,
mark the earliest changeset you know exhibits the problem as bad, then mark
the latest changeset which is free from the problem as good.
Bisect will update your working directory to a revision for testing
(unless the -U/--noupdate option is specified). Once you have
performed tests, mark the working directory as good or bad, and bisect
will either update to another candidate changeset or announce that it
has found the bad revision.

Mercurial: Picking commits to be included in a release

I have a local mercurial repository (for now) within which I have already made several commits, each commit is a self contained bug fix. Is it possible to pick which of the bug fixes (commits) I want to be included when it is time to build a release version of my application.
To elaborate, assuming A, B, C, D, and E are commits I have already done to my repository and each of them relates to a bug fix like so:
A <- B <- C <- D <- E <- working dir
I need to be able to for example pick which of the bug fixes will go into the release version (this depends on the time allocated for deployment as well as testing outcomes). So for example I might get a report saying the release should only contain bug fixes A, C and D.
Is it possible to construct a release version containing only the A, C and D commits (Keeping in mind that each commit is self contained and does not depend on the other commits to actually be there)?
Probably having a branch for each bug fix and then merging into a release branch is the easiest way to accomplish this (or is it not?), but the current situation at hand is as described above with no branches.
This isn't the normal work mode of Mercurial (or git). A repository can only contain a changeset if it also contains all of that changeset's ancestors. So you can't get D into a repo without also having A, B, and C in there.
So here's:
What you Should have Done
Control the parentage of your changesets. Don't make C the parent of D just because you happen to have fixed D after C. Before you fix a bug hg update to the previous release.
Imagine A was a release and B, C, and D, were all bug fixes. If you do a loop like this:
foreach bug you have:
hg update A
... fix bug ...
hg commit
hg merge # merges with the "other" head
then you'll end up with a graph like this:
---[A]----[B2]--[C2]--[D2]----
| / / /
+-[B] / /
| / /
+-----[C] /
| /
+---------[D]
and now if you want to create a release with only, say, B and D in it you can do:
hg update B
hg merge D
and that creates a new head that has A + B + D but no C.
Tl;Dr: make a change's parent be as early in history as you can, not whatever happens to be tip at the time.
What you can do Now
That's the ideal, but fortunately it's no big thing. You can never bring exactly D across without bringing C (because C's hash is part of the calculation of D's hash), but you can bring the work that's in D into a new head easily enough. Here are some ways, any of which will work:
hg export / hg import
hg transplant
hg graft (new in 2.0)
hg rebase (only possible if you haven't yet pushed)
Any of those will let you bring that patch/delta that's in D over -- it will have a different hash ID and when some day you merge D in for real (using merge) you'll have duplicate work in two different changesets, but merge will figure it all out.
If this was my tree and it hasn't been pushed anywhere, I'd (assuming an empty patch queue and MQ enabled):
hg qimport -g -r B: # import revisions B and later into mq as "git" style patches
hg qpop -a # unapply them all
hg qpush --move C # Apply changes in C (--move rearranges the order)
hg qpush --move D # Apply changes in D
hg qfin -a # Convert C & D back to changesets
hg push <release server> # Push them out to the release branch
Then you can hg qpush -a; hg qfin -a to get B & E back into changesets.
Final Result:
---A---C---D---B---E
Advantages:
Nobody needs know you didn't do things in this order to start with (evil grin)
You could modify any of the change-sets whilst doing this
Alternatively, with graft in 2.0:
hg update -r A # Goto rev A (no need to do anything special for A)
hg graft C # Graft C on to a new anonymous branch
hg graft D # Graft D
This will give you
---A---B---C---D---E
\
--C'--D' <-You are here
An hg push -r D' should just push the new, cherry-picked, head.
You can then hg merge to get one head again with B and E included.
Advantages:
Non destructive, so true history is kept, and no chance of loss if you muck up
hg tags the new changesets with the hash of the original version, so totally trackable
Probably a little simpler.
While it's somehow strange way and release-policy, you can do it in different form. You have to manipulate with two main objects: changesets and branches
Version 1
You use two branches (default + f.e "release 1.0"). Default branch is mainline of your work - all changesets commited to this branch. At release-time, you branch first needed-for-release changeset into (new) branch, transplant or graft rest of needed in release changesets from default to this branch, head of release 1.0 will be prepared for release this way.
Next release will differ only in new branch name
Version 2
One branch used, MQ extension added. Variations:
all changesets are MQ-pathes and only needed for release are applied to repo
changesets are changesets, only unwanted for release converted to mq-pathes, later qfinish'ed and returned to repo