Does Mercurial have problems with files being added and removed? - mercurial

I accidentally pushed some code changes to the team repo, which created extra heads.
Usually I keep those changes in my own repo, but I forgot to specify the changeset to push and ended up with extra heads.
To solve this problem I used the tip in "How to merge to get rid of head with Mercurial command line, like I can do with TortoiseHg?"
It works great for me. In fact I am quite happy that this way my changes are not lost at all and that team members don't have to strip anything in their repos. As a matter of fact, even though this head is not used, I might want to refer to it in the future and take some of those changes.
After I did this, I got the following reaction from a colleague who is our Mercurial expert: [begin quote]
The main issue now will be that everyone also in MS will have these changesets, will see merge ‘arrows’, will see this in history despite that it is not included. Also the ‘merge’ commit does not mention at all that this is a dummy merge. So in case you add other heads that later on have to be dummy merged, please clearly comment this in the next merge commits to clarify this is the case.
Also the adding and removing and changing files, can become a problem now. I am not sure if this will be the case now due to the dummy merge, it might optimize that these files were not impacted, this will have to be tested.
[end quote]
My reaction is that a dummy merge is easy to spot: if you select it in TortoiseHg, you just don't see any changes. A commit message of 'dummy merge' is probably a good idea, so that point is taken.
The fact that my changes are seen by everybody is hardly a problem for me; on the contrary, this way they are kept for future reference. I don't see the problem here.
But what about the last point, about adding, removing and changing files? Can this really be a problem in Mercurial? Is this not the essence of using a version control system in the first place? What should I test, as he seems to suggest?
Is his comment correct?

Your friend's fears are unfounded. He or she is right that a good detailed comment is always appreciated, but there's no pain coming in the future.
When the Evolve mechanism graduates from experimental extension to core functionality it will offer an even better solution. You'll strip the changeset, but it won't actually be removed (as strip does now). Instead it'll be marked obsolete and folks pulling will get it and the "obsolete" marker saying not to show it in normal usage.

Related

Is it safe to use Mercurial Queues and the Share extension together as long as your working directories are on separate branches?

I've thought this through and I think I understand the implications, but I wanted to get a sanity check because the caveats on https://www.mercurial-scm.org/wiki/ShareExtension are pretty general.
Specifically, the warning is "It's probably not a good idea to mix MQ and shared clones; if you do so, you should definitely avoid pushing/popping patches in one clone while another clone has patches applied."
However based on my understanding of how Mq works, it's only unsafe to push/pop patches (create/destroy history) if you have two shares whose working directory parent would be affected by such changes. That is, if you have two shares that are updated to separate named branches, pushing/popping patches from one should only have the effect on the other of creating/destroying history that is unrelated to the working directory and thus should NOT have any undesirable side-effects.
There will be small side effects, such as revision sequence number changes in some situations, but nothing that should jeopardize correctness or cause problems with the working directory.
Is this correct or am I missing something?
I'm not sure about this, but AFAIK if you end up having the exact same file content in both branches (repos), this might still wind up as shared storage and wreak havoc.
This is obviously not a definitive answer, but I just wanted to report back in case anyone else is interested in this situation. I've been running for several months now with multiple shares on a "central copy" of a large repo, with each share being dedicated to its own branch, and using MQ freely within each share. I have not hit any problems. History changes on other branches just look the same as pulls/strips would -- unrelated changesets being added, modified, and removed.

Is cvs2hg still potentially producing corrupted repositories?

While trying to migrate a repository from CVS to hg, I found the tool cvs2hg, and it seems to do the job nicely (the conversion goes fine, and I have all the tags and branches).
However, the hg documentation warns about "fixup commits" making the repository somewhat corrupted or at least dangerous.
Is this still a problem? Maybe hg or cvs2hg have benefited from fixes since this warning was written.
If it potentially is, how can I check whether the resulting hg repository is in such a dangerous situation?
Fixup commits are good and necessary. And cvs2hg does a much better job than hg convert.
But first, about the problem. In a CVS repository you can play various dirty tricks with tags and branches. For example, you can hand-tune a tag so that it covers today's version of 3 files, yesterday's version of 4 others, and a month-old version of yet another. In practice, I did this many times to make "patch tags" (there is some old tag, I have various commits afterwards, a bug turns up, I fix the bug, and I make a fixup tag from the old tag by moving it on 1-2 files).
As a result, you get a tag which points to a release that never existed and never will exist at any point of the repository history, if the history is taken for the whole repo.
Similar tricks can be played with branches. Or branches can start from an "ugly" tag.
Any kind of "natural" conversion from CVS to HG is hopelessly lost in such cases. There is no place in the time-based history at which such a tag or branch could be hooked. So hg convert just binds such tags at more-or-less random places, and branches at very ugly places.
Fixup commits are simply those missing revisions: artificial commits which are bound at an appropriate place and introduce the changes that put the repository in the state it should have at the given tag. With those, we get both "artificial" tags and branches properly bound to the proper code.
So if you:
committed a.c(1.1), b.c(1.1) and c.c(1.1)
committed a.c(1.2), b.c(1.2)
committed c.c(1.2)
artificially created tag blah_1.0 which points to a.c(1.1), b.c(1.1) and c.c(1.2)
committed a.c(1.3), b.c(1.3)
...
then the history produced by hg convert will have 4 edit changesets (just like those above) and blah_1.0 bound at some ugly place with the wrong content. cvs2hg, in contrast, will create a "fixup commit": an artificial changeset in which we really have a.c(1.1), b.c(1.1) and c.c(1.2), and it will put the tag there. In the history, such a changeset is reasonably similar to a transplanted/grafted/cherry-picked commit.
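The example above can be sketched as a toy model in Python. This is only an illustration: the dictionaries, `find_snapshot` and `fixup_changes` are hypothetical names, not part of cvs2hg or hg convert, and basing the fixup commit on the first changeset is an assumption made purely for the demonstration.

```python
# Toy model: each changeset is the full set of file revisions after that commit.
history = [
    {"a.c": "1.1", "b.c": "1.1", "c.c": "1.1"},  # first commit
    {"a.c": "1.2", "b.c": "1.2", "c.c": "1.1"},  # second commit
    {"a.c": "1.2", "b.c": "1.2", "c.c": "1.2"},  # third commit
    {"a.c": "1.3", "b.c": "1.3", "c.c": "1.2"},  # fourth commit
]

# The hand-made CVS tag mixes revisions from different points in time.
blah_1_0 = {"a.c": "1.1", "b.c": "1.1", "c.c": "1.2"}


def find_snapshot(history, tagged):
    """Return the index of the changeset whose contents equal the tag, or None."""
    for index, snapshot in enumerate(history):
        if snapshot == tagged:
            return index
    return None


def fixup_changes(parent, tagged):
    """Files an artificial commit on top of `parent` must change to reach the tagged state."""
    return {name: rev for name, rev in tagged.items() if parent.get(name) != rev}


# No changeset in the linear history matches the tag, so there is nowhere
# "natural" to attach it.
print(find_snapshot(history, blah_1_0))

# An artificial commit on top of the first changeset (assumed base) only needs
# to bring c.c to revision 1.2 to reproduce the tagged state.
print(fixup_changes(history[0], blah_1_0))
```

The point of the sketch is only that the tagged state is absent from every real snapshot, which is why some artificial changeset is needed to represent it faithfully.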
You should carefully check the resulting repository to make sure it represents your code history and doesn't contain any of these crappy fixup commits.
BTW, it might be worthwhile to check out the newer http://www.catb.org/esr/reposurgeon/ tool.

Why does Mercurial's strip command not rewrite history?

The Mercurial help text says, that the "strip command removes the specified changesets and all their descendants." This sounds very much like rewriting history to me, and that it must cause problems if somebody has based his work on one of the changesets that suddenly is removed. But the help text also says that the command "is not a history-rewriting operation and can be used on changesets in the public phase." I am sure that the person who wrote the help text knew very well what he was doing, so what am I missing to understand this?
The key point is that if you strip a public changeset, and then pull it again from somewhere, you haven't caused any issues. You just get the original changeset back.
If you (for example) collapse two public changesets together, and then pull the original from somewhere, you now have two branches. One with the original two changesets, and one with the collapsed changeset, but both have the same changes. At that point hell breaks loose and child eating monsters roam the earth.
Hence 'history re-writing' isn't the same as 'history stripping'.
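The difference can be illustrated with a small Python sketch. It is a toy model under stated assumptions: `node_id` and `heads` are hypothetical helpers, a repository is just a `{node: parent}` dictionary, and the hashing is a simplification rather than Mercurial's real changeset format. The only property it relies on is that a changeset's identity depends on its parent and content.

```python
import hashlib


def node_id(parent, content):
    """Toy changeset ID: a hash of the parent ID and the content."""
    return hashlib.sha1(f"{parent}:{content}".encode()).hexdigest()[:12]


def heads(repo):
    """Changesets that are not the parent of any other changeset."""
    parents = set(repo.values())
    return {node for node in repo if node not in parents}


# Build a remote repository with the history root -> a -> b.
root = node_id("", "root")
a = node_id(root, "change A")
b = node_id(a, "change B")
remote = {root: "", a: root, b: a}

# Case 1: strip b locally, then pull from the remote again.
stripped = {node: parent for node, parent in remote.items() if node != b}
stripped.update(remote)  # "pull": add the changesets we are missing
print(stripped == remote, len(heads(stripped)))

# Case 2: collapse a and b into one new changeset, then pull.
ab = node_id(root, "change A + change B")
collapsed = {root: "", ab: root}
collapsed.update(remote)  # the originals come back next to the rewritten one
print(len(heads(collapsed)))
```

In the first case the pull simply restores the identical changeset, so the repository ends up exactly as before. In the second case the collapsed changeset has a different identity, so the pull leaves two heads carrying the same changes.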
davidmc24 pointed out this post by Matt Mackall (Mercurial's father) in which he says basically the same thing
I can't say it with certainty, but my guess is that it's "grandfathered in". hg strip started out as part of mq which predates the addition of phases by at least three years.
Likely better phrasing would be:
is not *considered* a history-rewriting operation and can be used on changesets in the public phase
When phases were added, a huge amount of care was taken to break no one's existing workflow. Commits start out in the draft phase and become public once pushed. Any phase-aware command knows that after pushing, the commit's phase is public, and does not allow modification of it (unless the push was to a non-publishing repository...).
However, there were people already using strip manually and in scripts to remove changesets that had been pushed, and if strip had after an upgrade suddenly said "Hey, you can't strip that it's public!" then those people would have had their backward-compatibility promise broken.
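The behaviour described here can be sketched in a few lines of Python. This is a toy model, not Mercurial's implementation: `Changeset`, `push`, `rewrite` and `strip` are hypothetical stand-ins that only mirror the rules stated above (draft by default, public after a push to a publishing repository, phase-aware commands refuse public changesets, strip does not check).

```python
class Changeset:
    def __init__(self, name):
        self.name = name
        self.phase = "draft"  # new commits start out in the draft phase


def push(changesets, publishing=True):
    """Pushing to a publishing repository turns draft changesets public."""
    if publishing:
        for cs in changesets:
            cs.phase = "public"


def rewrite(cs):
    """Stand-in for any phase-aware history-editing command."""
    if cs.phase == "public":
        raise ValueError(f"cannot rewrite public changeset {cs.name}")
    return f"rewrote {cs.name}"


def strip(repo, cs):
    """Stand-in for strip: it removes the changeset without checking its phase."""
    repo.remove(cs)


first, second = Changeset("first"), Changeset("second")
repo = [first, second]

push([first])                     # publishing repository
push([second], publishing=False)  # non-publishing repository

print(first.phase, second.phase)
print(rewrite(second))            # still draft, so rewriting is allowed

try:
    rewrite(first)
except ValueError as error:
    print(error)

strip(repo, first)                # allowed even though `first` is public
print([cs.name for cs in repo])
```

The asymmetry between `rewrite` and `strip` in the sketch is exactly the grandfathered exception the answer is describing.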
Phases is slowly growing into a pretty amazing evolve system that will be a much better choice than mq for almost all cases, but I still doubt we'll ever get Matt to remove mq and strip -- he still insists on maintaining a Python 2.4 compatible codebase and that's 9 years old!
TL;DR: Even though strip was always a disabled extension, too many people use it to change its behavior with the advent of phases.

Grouping a set of commits in Mercurial?

I'm working on a new feature branch. It's necessary to keep all the history, but for someone scouring over the history at a later date, much of it is over verbose.
For example I may have 5 commits taking through the steps of adding a new database table, its business logic, its validation and some experiments that I change my mind about etc etc. But for a co-developer all they might need to know is "this fixed bug X".
Is it possible to somehow group a set of commits, so that an overview is shown in the log while all the history can still be viewed? Not only in my local repo, but in the remote repo as well.
I'm guessing that I could have separate sub-branches and merge them as I go along. But I'll only know that I want to group a set of commit retrospectively. So I don't think that is a good route, as I'll have to keep going back and forth.
I can see that there is a group extension, but it's unmaintained. And in my experience, reaching for an unmaintained plugin usually means that I'm going about it the wrong way and that there is perhaps a better technique.
Is there any best practice around achieving this sort of thing?
For what it's worth, I think you're going down the correct route when you say you want to keep all of your history available. You could use the MQ extension to collapse your changesets into a single commit, but - although this would give you a 'clean' commit - you would lose all that juicy detail.
My way of handling this is to develop on a branch or in a separate clone, and when it's going into Production I describe the whole group of changes in the commit message of the merge, i.e. don't just use "Merge" for the commit message :).
I understand your point about only knowing if you need to group retrospectively, but I think as long as you have some rigour around your dev/test/release process then this shouldn't be too much of a limitation.
You want the collapse extension.

A mercurial merge chose the wrong changes, what is the correct way to fix this?

Changes were made to our .vcproj to fix an issue on the build machine (changeset 1700). Later, a developer merged his changes (changes 1710 through 1715) into the trunk, but the mercurial auto-merge overwrote the changes from 1700. I assume this happened because he chose the wrong branch as the "parent" of the merge (see part 2 of the question).
1) What is the "correct" mercurial way to fix this issue, considering out of all the merged files, only one file was merged incorrectly, and
2) what should the developer have done differently in order to make sure this didn't occur? Are there ways we can enforce the "correct" way?
Edit: I probably wasn't clear enough on what happened. Developer A modified a line in our .vcproj file that removed an option for the compiler. His check-in became changeset 1700. Developer B, working from a previous parent (let's say changeset 1690), made some changes to completely different parts of the project, but he did touch the .vcproj file (just not anywhere near the changes made by Developer A). When Developer B merged his changes (becoming changes 1710 through 1715), the merge process overwrote the changes from 1700.
To fix this, I just re-modified the .vcproj file to include the change again, and checked it in. I just wanted to know why Mercurial thought that it shouldn't keep the changes in 1700, and whether or not there was an "official" way to fix this.
Edit the second: Developer B swears up and down that Mercurial merged the .vcproj file without prompting him for conflict resolution, but it is of course possible that he's just misremembering, in which case this whole exercise is academic.
I will address the 2nd part of your question first...
If there is a conflict, the automated merge tools should force the programmer to decide how the merge happens. But the general assumption is that a conflict will involve two edits to the same set of lines. If somehow a conflict arises because of edits to lines that are not close to each other the automated merge will blithely choose both of the edits and a bug will appear.
The general case of a merge tool always merging properly is very hard to solve, and really can't be with current technology. Here is an example of what I mean from C:
int i; // Someone replaces this with 'short i' in one changeset stating
       // that a short is more efficient.
// ... lots of code;
// Someone else replaces all the 65000s with 100000s in another changeset,
// saying that more precision is needed.
for (i = 0; i < 65000; ++i) {
    integral_approximation_piece(start + i/65000.0, end + (i + 1) / 65000.0);
}
No merge tool is going to catch this kind of conflict. The tool would have to actually compile the code to see that those two parts of the code have anything to do with each other, and while that would likely be enough in this case, I can construct an example that would require the code to be run and the results examined to catch the conflict.
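To make this concrete, here is a minimal sketch in Python of a line-based three-way merge applied to the example. It is a deliberate simplification: `three_way_merge` is a hypothetical helper that assumes all three versions have the same number of lines, whereas real merge tools compare hunks found by a diff algorithm. The behaviour it illustrates is the same, though: edits to different lines are combined without any conflict being reported.

```python
def three_way_merge(base, ours, theirs):
    """Merge equal-length line lists; return (merged_lines, conflicting_line_numbers).

    A line is reported as a conflict only when both sides changed it differently.
    """
    merged, conflicts = [], []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t or t == b:
            merged.append(o)       # same on both sides, or only we changed it
        elif o == b:
            merged.append(t)       # only they changed it
        else:
            conflicts.append(number)
            merged.append(o)       # placeholder; a real tool would ask the user
    return merged, conflicts


base = ["int i;", "for (i = 0; i < 65000; ++i) {", "}"]
ours = ["short i;", "for (i = 0; i < 65000; ++i) {", "}"]
theirs = ["int i;", "for (i = 0; i < 100000; ++i) {", "}"]

merged, conflicts = three_way_merge(base, ours, theirs)
print(conflicts)
print("\n".join(merged))
```

The merge reports no conflicts, yet the result declares `short i` and loops up to 100000. Assuming a 16-bit short, `i` can never reach that bound, so the merged code is broken even though each change was fine on its own.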
This means that what you really ought to do is rigorously test your code after a merge, just like you should after any other change. The vast majority of merges will result in obvious conflicts that a developer will have to resolve (even though that resolution is often fairly obvious), or will merge cleanly. But the very few merges that don't fit either category can't easily be handled in an automated fashion.
This can also be fixed by development practices that encourage locality. For example a coding standard that states "Variables should be declared near where they're used.".
I'm guessing that .vcproj files are particularly prone to this problem since they are not well understood by developers and so if conflicts do appear they will not be sure what to do with them. My guess is that this happened and your developer simply did a revert back to the revision (s)he checked in.
As for part 1...
What to do in this case depends a lot on your development process. You can strip the merge changeset out and redo it, though that won't work very well if lots of people have already pulled it, and it will work especially poorly if lots of changesets based on the merge changeset have already been checked in.
You can also check in a new change that fixes the problem with the merge.
Those are basically your two options.
The tone of your post seems to me to indicate that you may have some politics surrounding this issue in your organization, and people are blaming this error on the frequent merges of Mercurial. So I will point out that any change control system can have this problem. In the case of Subversion, for example, every time a developer does an update while they have outstanding changes in their working directory, they are doing a merge, and this kind of problem can arise with any merge.
In Mercurial a merge doesn't have a single parent; by definition it has two and only two parents. When someone is merging they're making two choices:
What two changesets will constitute the two changes
Which of those changesets will be the left-parent and which will be the right-parent
Of those two questions the first is very important, and the second barely matters at all, though it took me a while to come to understand that.
You select the left-parent by using hg update X. That changes the output of hg parents (or in newer versions hg summary) and essentially determines what's in your working directory before the merge.
You select the right-parent by using hg merge Y. That says merge X (the working directory's parent) with changeset Y. As a special case, if there are only two heads in your repository and your parent is already one of them, then Y will default to the other.
I'd have to see your resulting graph to know just what the developer did, but it's possible he didn't update to one head or another before invoking merge, which would have him merging one head with some point back in history.
If your developer picked the right parents for the merge then the left vs. right doesn't much matter -- the only real difference is that when one uses hg diff or hg log -p or some other command that shows the patch for a merge changeset, it's displayed relative to the left-parent. That's, however, mostly a factor in display only. Functionally they're pretty much identical.
Assuming your developer picked the right changesets then what he should have done was test the result of the merge before committing it. Merging is software development, not an annoying VCS side effect, and not testing before committing is the error.
Fixing
To fix this, just re-do the merge correctly. Use hg update to set one parent, use hg merge to pick the other. Make sure your current working directory is correct and then commit. You can get rid of his bad merge using something like hg strip or better, just close down his branch with hg commit --close-branch after updating to it.
Avoiding
You say "mercurial auto-merge", but Mercurial doesn't really auto-merge. It does a premerge, which is an extremely cautious combination of obvious changes, but it's so careful that it won't even merge for you if each merge parent adds code in the same region, because it can't know which block of code you'd rather have first.
You can disable this premerge entirely or on a file-by-file basis using the merge tool configuration options:
https://www.mercurial-scm.org/wiki/MergeToolConfiguration?highlight=premerge