Mercurial - differences merging repository A->B vs B->A? - mercurial

Given repository A and repository B (created off a clone of changeset A2):
A1 - A2 - A3 - A4 - A5
       \
        B3 - B4
Say we wanted to merge these two repositories together. Is there any difference if we merged repo B into repo A versus merging repo A into repo B?
The only difference I can think of is that the merge tool's local/other arguments would be reversed depending on which option you chose. Are there any other differences to be aware of?

Generally merges are symmetric, with a few exceptions:
If the merged heads are from different named branches, the order is important. The merge
revision will be on the first parent's branch.
Starting with Mercurial 1.8.4 the same applies to bookmarks, i.e. only the bookmark
of the first parent will move forward to the merge revision.
The order of the merge revision's parents in log and diff views is different, but that
does not have any noteworthy practical implications (in my experience).

The order of the parents for the merge will be different, but that will only affect which diff you see first when you're looking at the merge changeset. Otherwise, there isn't really any difference if you update to B4 and merge to A5 or vice versa.

I'm no Mercurial expert, but after reading this question and its resolution I get the feeling that the merge direction can make a huge difference: "Backing Out a backwards merge on Mercurial".

Related

Mercurial, Get history of branch since last tag - including merged in commits

With Mercurial, I'm trying to get the history of a branch since the last tag.
BUT I want to include all the comments that were merged in as well.
Our devs usually create a branch, do some work, possibly multiple commits, then merge the branch back in.
Using: hg log -b . -r "last(tagged())::" --template "{desc|firstline}\n"
I'll get entries like "Merge" - with no information on what commits were included in that merge.
How do I get it to include the merged commits?
We also have multiple active branches, so just including ALL commits for ALL branches won't work.
There are at least two issues here that seem (after analysis) to stem from using -b . in the command. There may be more issues as well. I will take them in some order, not necessarily the best one, perhaps even the worst one. :-)
Combining -b and last(set) seems unwise in general
Your -b . constraint means you get only commits that are on the current branch. If your revset would otherwise include commits on another branch, those commits will be excluded. Or, to put it another (more set-theoretic) way, using -b . within hg log is a bit like taking whatever revset specifier you have and adding:
(revset) & branch(.)
though simply asking this question brought up one point I was unsure about: is the limiting done before calculating tagged(), or after? Some poking about in hg --debugger tells me that it's "after", which means we get:
(last(tagged())) & (branch(.))
which means that if there are tags on, e.g., revs 1, 7, and 34, we'll select rev 34 first, then select revisions whose branch is the current branch. Suppose that rev 7 is a member of the current branch, but rev 34 is not. The result of the & is then the empty set.
That's probably not the issue here—the actual final expression is, or might as well be, branch(.) & descendants(last(tagged()))—but in at least some cases, it would probably be better to use:
last(tagged() & branch(.))
so that you start with the last revision that is both tagged, and on the current branch. (If this revlist is empty it's not clear what you should do next, but that is hard to program at this level, so let's just assume the revlist has one revision in it, e.g., rev 7 in our example here.)
This is probably not what you want after all, though; see the last section below.
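The difference between the two expressions can be sketched with plain Python sets. This is only a toy model of the revset semantics described above, not Mercurial code: the tagged revisions 1, 7, and 34 come from the example, while the membership of `current_branch` and the helper name `last` are assumptions made for illustration.

```python
# Toy model: revisions are integers, revsets are Python sets.
tagged = {1, 7, 34}                 # tagged revisions from the example above
current_branch = {5, 6, 7, 8, 9}    # hypothetical: rev 7 is on the branch, rev 34 is not


def last(revs):
    """Model of last(set): the highest-numbered revision, or the empty set."""
    return {max(revs)} if revs else set()


# Pick the last tag first, then filter by branch: rev 34 is filtered out.
print(last(tagged) & current_branch)   # set()

# Filter by branch first, then pick the last tag: rev 7 survives.
print(last(tagged & current_branch))   # {7}
```
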
Combining -b and a DAG range
A DAG-range operator like X::Y in Mercurial simply means: All commits/revisions that are descendants of X, including X itself, and ancestors of Y, including Y itself. Omitting Y entirely means all descendants of X. Without a -b limiter, you will get all such commits, but with -b ., you once again restrict yourself to those commits that are on the current branch.
If you merge commits within one branch, then, you will get the merge commits and their ancestors and descendants that are on this branch. (Remember that in Mercurial, any commit is on exactly one branch, forever: this is the branch that was current when the commit itself was made.) But if you are using branches at all in Mercurial, you are probably merging commits that are in other branches. If you want to see any of those commits, you cannot use -b . here.
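To make the effect of the branch filter concrete, here is a minimal sketch of the X::Y definition on a made-up six-commit history. The graph, the branch names, and the helper names (`ancestors`, `descendants`, `dag_range`) are all hypothetical; this models the definition given above and is not how Mercurial implements it.

```python
# Toy history: rev -> list of parent revs. Rev 3 is made on a "feature" branch
# and merged back into "default" by rev 4.
parents = {0: [], 1: [0], 2: [1], 3: [1], 4: [2, 3], 5: [4]}
branch_of = {0: "default", 1: "default", 2: "default",
             3: "feature", 4: "default", 5: "default"}


def ancestors(rev):
    """Return rev plus everything reachable by following parent links."""
    seen, stack = set(), [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents[r])
    return seen


def descendants(rev):
    """Return rev plus every revision that has rev among its ancestors."""
    return {r for r in parents if rev in ancestors(r)}


def dag_range(x, y=None):
    """Model of X::Y; omitting y models X:: (all descendants of x)."""
    result = descendants(x)
    if y is not None:
        result &= ancestors(y)
    return result


full = dag_range(1)
on_branch = {r for r in full if branch_of[r] == "default"}
print(sorted(full))       # [1, 2, 3, 4, 5]
print(sorted(on_branch))  # [1, 2, 4, 5]
```

The merged-in rev 3 is part of the DAG range, but it disappears as soon as the result is limited to the current branch, which is the behavior described above.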
Getting what you want
Let's go back to your first statement above:
With Mercurial, I'm trying to get the history of a branch since the last tag. BUT I want to include all the [commits] that were merged in as well.
Let's draw a quick example or two and see which commits you might want.
Here's a horizontal graph, with newer commits toward the right. Each commit is represented by an o unless it is tagged, in which case it is represented by a *. There are several merges. Commits on the first two rows are on branch b1 and commits on the third row are on branch b2.
o--o---o--o---o--*--o--o--o
b1: \ / /
*--o /
\ /
b2: o--*--o--o--o--o
It's not clear to me which commits you wish to see. The last tagged commit on b1 is the top row *, but there is a tagged commit on b2 as well (whose rev number is probably lower than the one on b1). Or suppose we had a slightly different graph, so that the highest numbered tagged revision were one on b2:
o--o---o--o---*--o--o--o--o
b1: \ / /
*--o /
\ /
b2: o--o--*--o--o--o
If we use the expression last(tagged()) without any branch masking, we will choose the rightmost starred commit. If we then feed that into a DAG operator (e.g., as X in X:: or using descendants()), we get all the commits that are "after" that one.
When we start with the single starred commit on b2—as in the last graph—we get that commit and the remaining three commits that are on b2, plus the last two commits that are on b1. That may be what you want, but perhaps you also want some commits that are on b1 that come before the merge, but after (and maybe including) the final starred commit that is on b1 itself.
Note that this is what you get with just descendants(last(tagged())), i.e., if you remove -b . from your original hg log command.
When we start with the last starred commit on b1, though, as in the earlier graph, we get just that commit plus the final three commits on branch b1. None of the commits on branch b2 that get merged are descendants of the starred commit we chose. So the DAG-range approach itself is suspect, here. Still, if eliminating tagged commits that are directly on b1 suffices, note that we can use:
descendants(last(tagged() and not branch(b1)))
(There is no difference between "and" and "&" here; I just spelled it out because I spelled out "not".)
There is another possibility that I see here: perhaps you want any commits that are ancestors of the current branch's final commit, but stopping at:
any tagged commit, or
any predecessor merge for any other branch than the first-merged branch arrived-at by traversing ancestors.
Visualizing this last case requires a more complex branch topology, with more than two total named branches. Since it's (a) hard and (b) not at all clear to me that this is what you want, I'm not going to write an expression to produce it.

Find all log messages explaining differences between two changesets

I'd like to find all differences between two mercurial revisions. I'd primarily like to see the history of the differences (i.e. the changeset log messages), not the internal details of what changed in the files.
Example: compare revisions 105 and 106
    /---101---103---105
100        \
    \---102---104---106
Here, revision 106 includes changesets 106,104 and 102 which 105 doesn't have, and 105 in turn includes 103 and 105 that 106 doesn't have. How can I easily get this list; ideally taking into account grafts too?
The following revision set query almost works:
(ancestors(105) - ancestors(106)) + (ancestors(106) - ancestors(105))
However, that's a fairly long query for something that seems like a fairly common question: why exactly does this branch differ from my local version? I also believe it fails to take into account grafts and it unfortunately includes uninteresting changesets such as merges.
Bonus points for including the git equivalent.
Edit: The reason I want this is to explain to humans how these versions differ. I've got a complex source tree, and I need to be able to tell people that version X includes features A & B and bugfix P, but version Y includes features C & D and bugfix Q - and that they're otherwise the same.
If I go back to my example: merges themselves aren't interesting (so in the example above 104 isn't interesting), but the changesets the merges consist of are very interesting - meaning 101 and 102. Merges combine lots of changes into one changeset that lacks reasonable log information. In particular, if I just find the nearest ancestor, I'd find 101, and then it'd look like 102 isn't of particular interest. In terms of the actual patches applied, this information is complete - I don't need to see how merge changeset 104 was constructed, only the result. However, if I want to know why it contains those changes, I need the log messages from 102.
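Hrm, I've not tested it, but would:
ancestor(X,Y)::X + ancestor(X,Y)::Y
get you the same list? I think it would come close, and it would likely be faster. Note, though, that it includes the common ancestor itself, and it can miss changesets that reach only one side through a merge (such as 102 in the example).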
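A quick way to compare the two queries is to evaluate both on the example graph with plain sets. The sketch below assumes, as the question implies, that 104 is a merge of 102 and 101; the helper names are hypothetical, and picking the highest-numbered common ancestor is a simplification that only works for this toy graph.

```python
# Example graph from the question: rev -> list of parent revs.
# Assumption: 104 is the merge of 102 and 101.
parents = {100: [], 101: [100], 102: [100], 103: [101],
           104: [102, 101], 105: [103], 106: [104]}


def ancestors(rev):
    """Return rev plus everything reachable by following parent links."""
    seen, stack = set(), [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents[r])
    return seen


def dag_range(x, y):
    """Model of x::y: descendants of x that are also ancestors of y."""
    return {r for r in ancestors(y) if x in ancestors(r)}


# (ancestors(105) - ancestors(106)) + (ancestors(106) - ancestors(105))
sym_diff = ancestors(105) ^ ancestors(106)
no_merges = {r for r in sym_diff if len(parents[r]) < 2}

# ancestor(X,Y)::X + ancestor(X,Y)::Y, with a simplified common-ancestor pick
gca = max(ancestors(105) & ancestors(106))
via_gca = dag_range(gca, 105) | dag_range(gca, 106)

print(sorted(sym_diff))   # [102, 103, 104, 105, 106]
print(sorted(no_merges))  # [102, 103, 105, 106]
print(sorted(via_gca))    # [101, 103, 104, 105, 106]
```

On this graph the ancestor-based query includes the shared changeset 101 and leaves out 102, so the two are not interchangeable when merges are involved.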

sequence of branch taken or not-taken that reduces the branch misprediction rate

Increasing the size of a branch prediction table implies that the two branches in a program are less likely to share a common predictor. A single predictor predicting a single branch instruction is generally more accurate than is the same predictor serving more than one branch instruction.
List a sequence of branch taken and not-taken actions to show a simple example of a 2-bit predictor sharing (several different branch instructions are mapped into the same entry of the prediction table) that reduces the branch misprediction rate, compared to the situation where separate predictor entries are used for each branch. (Note: Be sure to show the outcomes of two different branch instructions and specifically indicate the order of these outcomes and which branch they correspond to)
Can someone explain to me what this question is asking for specifically? Also, what does "2-bit predictor sharing (several different branch instructions are mapped into the same entry of the prediction table)" and "separate predictor entries are used for each branch" mean? I've been reading and rereading my notes but I couldn't figure it out. I tried to find some branch prediction examples online but couldn't come across any.
"2-bit predictor" could be referring to either of two things, but much more likely one than the other.
The unlikely possibility is that they mean a branch table with only four entries, so two bits are used to associate a particular branch with an entry in the table. That's unlikely because a 4-entry table is so small that lots of branches would share the same table entries, so the branch predictor wouldn't be much more accurate than static branch prediction (e.g., always predicting backward branches as taken, since they're typically used to form loops).
The much more likely possibility is using two bits to indicate whether a branch is likely to be taken or not. Some of the earliest microprocessors that included branch prediction (e.g., Pentium, PowerPC 604) worked roughly this way. The basic idea is that you keep a two-bit saturating counter, and make a prediction based on its current state. Intel called the states strongly not taken, weakly not taken, weakly taken, strongly taken. These would be numbered as (say) 0, 1, 2 and 3, so you can use a two-bit counter to track the states. Every time a branch is taken, you increment the number (unless it's already 3) and every time it's not taken, you decrement it (again, unless it's already 0). When you need to predict a branch, if the counter is 0 or 1 you predict the branch not taken, and if it's 2 or 3 you predict it taken [1].
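The counter described above can be sketched in a few lines. The class and function names are hypothetical, the starting state of 0 is an assumption, and the loop branch used as input (taken nine times, then not taken once) is just an illustrative pattern.

```python
class TwoBitCounter:
    """Two-bit saturating counter: 0/1 predict not taken, 2/3 predict taken."""

    def __init__(self, state=0):
        self.state = state  # 0 = strongly not taken ... 3 = strongly taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Increment on taken, decrement on not taken, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def mispredictions(outcomes, start=0):
    """Count wrong predictions for one branch with its own counter."""
    counter = TwoBitCounter(start)
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


# A loop branch: taken 9 times, then falls through once, run 3 times in a row.
loop = [True] * 9 + [False]
print(mispredictions(loop * 3))  # 5
```

The first pass costs three mispredictions (two while the counter warms up, one at the loop exit); each later pass costs only the exit, because a single not-taken outcome moves the counter from 3 to 2 and the prediction stays "taken".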
A separate predictor entry used for each branch means each branch instruction in the program has its own entry in the branch prediction table. The alternative is some sort of mapping from branch instructions to table entries. For example, if you had a table with 2^20 entries, you could use 20 bits from a branch instruction's address, and use those bits as the index into the table. Assuming a machine with 32-bit addressing, and 32-bit instructions, you'd have up to 1024 branch instructions that could map to any one entry in the table (32-20-2 = 10, 2^10 = 1024). In reality you expect only a small percentage of instructions to be branches, some of the address space to be used for data, etc., so probably only a few branches would map to one entry in the table.
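The arithmetic above can be checked with a small sketch. Which 20 address bits form the index is an assumption here (the bits just above the two alignment bits); the constant and function names are hypothetical.

```python
ADDRESS_BITS = 32   # 32-bit addressing
INDEX_BITS = 20     # table with 2**20 entries
ALIGN_BITS = 2      # 32-bit instructions are 4-byte aligned, so the low 2 bits are 0


def table_index(address):
    """Assumed mapping: drop the alignment bits, keep the next 20 bits."""
    return (address >> ALIGN_BITS) & ((1 << INDEX_BITS) - 1)


# Instruction addresses that share one table entry: 2**(32 - 20 - 2)
aliases_per_entry = 1 << (ADDRESS_BITS - INDEX_BITS - ALIGN_BITS)
print(aliases_per_entry)  # 1024

# Two addresses that differ only in the unused upper bits collide.
a = 0x00001000
b = a + (1 << (INDEX_BITS + ALIGN_BITS))
print(table_index(a) == table_index(b))  # True
```
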
As far as the basic question of what it's asking for: they want a sequence of branch outcomes that will (by some coincidence) be predicted more accurately when two branches map to the same slot in the branch predictor table than when each maps to a separate slot in the table. To go into just slightly more detail (but hopefully without giving away the whole puzzle), start with a pattern of branches where the branch predictor will usually be wrong. What the predictor basically does is assume that if the branch was taken the last time, that indicates that it's more likely to be taken this time (and conversely, if it wasn't taken last time, it probably won't be this time either).
So, you start with a pattern of branches exactly the opposite of that. Then, you want to add a second branch mapping to the same spot in the branch prediction table that will follow a pattern of branches that will adjust the data in the branch predictor table so that it more accurately reflects the upcoming branch rather than the previous branch.
[1] Technically, the Pentium didn't actually work this way, but it's how it was documented to work, and probably intended to work; the discrepancy in how it actually did work seems to have been a bug.

Mercurial update to local revision or hash changeset

I use Mercurial and I have a weird problem: I have a very big history, and the local revision numbers in Mercurial now have 5 digits.
In Mercurial you can execute "hg up <rev>" and it can choose between the local revision number and the changeset hash (I have no idea what policy it uses to choose between them). In my case the local revision number coincides with the first 5 characters of another changeset's hash. For example:
I want to update to the local revision: 80145
If i execute:
"hg up 80145"
Mercurial doesn't update to the revision i want, it updates to an old one because its hash changeset is:
801454d1cd5e
So, does anyone know if there is a way to specify to which type of revision you want to update to? local revision or hash changeset.
Thanks all!
====
Problem solved. After some investigation I realized that Mercurial always updates to the local revision if it exists, and to the hash changeset otherwise.
In my case the local revision didn't exist, so it was updating to the hash changeset.
Sounds like you found your own answer (and should enter it as an answer instead of a comment and then select it -- that's not just allowed but encouraged around here), but for reference here's where that information lived:
$ hg help revisions
Specifying Single Revisions
Mercurial supports several ways to specify individual revisions.
A plain integer is treated as a revision number. Negative integers are
treated as sequential offsets from the tip, with -1 denoting the tip, -2
denoting the revision prior to the tip, and so forth.
A 40-digit hexadecimal string is treated as a unique revision identifier.
A hexadecimal string less than 40 characters long is treated as a unique
revision identifier and is referred to as a short-form identifier. A
short-form identifier is only valid if it is the prefix of exactly one
full-length identifier.
Any other string is treated as a bookmark, tag, or branch name. A bookmark
is a movable pointer to a revision. A tag is a permanent name associated
with a revision. A branch name denotes the tipmost revision of that
branch. Bookmark, tag, and branch names must not contain the ":"
character.
The reserved name "tip" always identifies the most recent revision.
The reserved name "null" indicates the null revision. This is the revision
of an empty repository, and the parent of revision 0.
The reserved name "." indicates the working directory parent. If no
working directory is checked out, it is equivalent to null. If an
uncommitted merge is in progress, "." is the revision of the first parent.
So, as you found, the first interpretation was as a revision number, and when that didn't match anything it was tried as the prefix of a revision ID. In theory this could happen with even the number 1 if your only changeset was revision 0 and its hash started with 1.
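The lookup order described here can be sketched as follows. This is a toy model, not Mercurial's code: it ignores tags, bookmarks, and branch names, the function name `resolve` is hypothetical, and apart from 801454d1cd5e the short hashes are made up.

```python
import re


def resolve(spec, hashes):
    """Toy lookup: hashes[i] is the hex ID of local revision i.

    A plain integer is tried as a revision number first; only if that
    fails is the string tried as a unique hash prefix.
    """
    if re.fullmatch(r"-?[0-9]+", spec):
        number = int(spec)
        if number < 0:
            number += len(hashes)  # -1 is the tip, -2 the one before, ...
        if 0 <= number < len(hashes):
            return number
    matches = [rev for rev, h in enumerate(hashes) if h.startswith(spec)]
    if len(matches) == 1:
        return matches[0]
    raise LookupError(f"unknown or ambiguous revision: {spec!r}")


hashes = ["801454d1cd5e", "a3c1f0e2b7d9", "0c44e19a5b6f"]
print(resolve("2", hashes))      # 2: a valid revision number wins
print(resolve("80145", hashes))  # 0: no revision 80145, so it matches as a prefix
```
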

Assignment of mercurial global changeset id

Apparently Mercurial assigns a global changeset id to each change. How do they ensure that this is unique?
As Zach says, the changeset ID is computed using the SHA-1 hash function. This is an example of a cryptographic hash function. Cryptographic hash functions take an input string of arbitrary length and produce a fixed-length digest from this string. In the case of SHA-1, the output length is fixed at 160 bits, of which Mercurial by default only shows you the first 48 bits (12 hexadecimal digits).
Cryptographic hash functions have the property that it is extremely difficult to find two different inputs that produce the same output, that is, it is hard to find strings x != y such that H(x) == H(y). This is called collision resistance.
Since Mercurial uses the SHA-1 function to compute the changeset ID, you get the same changeset ID for identical inputs (identical changes, identical committer names and dates). However, if you use different inputs (x != y), then in practice you will get different outputs (changeset IDs) because of the collision resistance.
Put differently, if you do not get different changeset IDs for different inputs, then you have found a collision for SHA-1! When this answer was written nobody had found one; the first public SHA-1 collision was demonstrated in 2017 using a large, deliberate computation, so an accidental collision between changesets remains extremely unlikely.
In more detail, the SHA-1 hash function is used in a recursive way in Mercurial. Each changeset hash is computed by concatenating:
manifest ID
commit username
commit date
affected files
commit message
first parent changeset ID
second parent changeset ID
and then running SHA-1 on all this (see changelog.py and revlog.py). Because the hash function is used recursively, the changeset hash will fix the entire history all the way back to the root in the changeset graph.
This also means that you won't get the same changeset ID if you add the line Hello World! to two different projects at the same time with the same commit message -- when their histories are different (different parent changesets), the two new changesets will get different IDs.
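The recursive use of the hash can be illustrated with a short sketch. The field layout below is made up for illustration and is not Mercurial's actual serialization (see changelog.py and revlog.py for that); the function name, users, dates, and manifest strings are all hypothetical.

```python
import hashlib

NULL_ID = "0" * 40  # stand-in for "no parent"


def changeset_id(manifest_id, user, date, files, message, p1, p2):
    """Hash all fields, including both parent IDs, into one 160-bit ID."""
    payload = "\n".join([manifest_id, user, date, " ".join(files), message, p1, p2])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# Two projects with different root changesets.
root_a = changeset_id("m0", "alice", "2011-01-01", ["README"], "Start project A",
                      NULL_ID, NULL_ID)
root_b = changeset_id("m0", "alice", "2011-01-01", ["README"], "Start project B",
                      NULL_ID, NULL_ID)

# The same change, committed with the same metadata on top of each root.
change = ("m1", "bob", "2011-01-02", ["hello.txt"], "Add Hello World!")
child_a = changeset_id(*change, root_a, NULL_ID)
child_b = changeset_id(*change, root_b, NULL_ID)

print(len(child_a) * 4)      # 160 bits in the full ID
print(len(child_a[:12]) * 4) # 48 bits in the 12-digit short form
print(child_a != child_b)    # True: different history, different ID
```

Because each ID covers its parents' IDs, changing anything earlier in the history changes every ID that follows it.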
Mercurial's changeset IDs are SHA-1 hashes of the "manifest" for each changeset. It only prints the first dozen hex digits of the global ID, but it uses the full SHA-1 for internal operations. There's no actual guarantee that they are unique, but it is sufficiently unlikely for practical purposes.
See here for gory details.