Breaking the file connection in Mercurial after bad file split - mercurial

Background:
At first there was a file base.c, and that file was in the repository which had only one branch Base.
Base was branched to make Extended-branch. Then this new branch had several changes made to base.c.
If bugs were fixed in file base.c in Base, they would be merged to Extended.
It turns out that adding too much stuff to base.c in Extended branch was not a good idea, so file is copied to ext.c. Then most of the Extended additions are removed from base.c, and Base functionality from ext.c. So at this point base.c in Extended is very similar as in Base.
Problem:
When file was split, Mercurial was informed that ext.c is a copy of base.c, because they shared a common history. Unfortunately this was not a good idea.
Now if bugs are fixed on Base branch and merged to Extended, Mercurial thinks that those changes should be applied to both base.c and ext.c, even though the latter is no longer has any similarities to former. This makes the merges very annoying.
Is there a way to tell the Mercurial that ext.c should no longer be considered same as base.c? One solution would be to replace ext.c with new file, but then history would not follow.

You can break the connection with hg forget ext.c, and then hg add ext.c as a new file. But if you do this with the tip revision of ext.c, you'll lose all history up to this point.
What you can do instead is to add an earlier version of ext.c, maybe even from before the rename, and replay (graft) its history since then. You could add the history of ext.c as a branch to a past revision, and merge it into tip:
---o---o---o---(tip)--(merge)
\ /
e---e---e ... e
Or perhaps it doesn't matter, and you can just add the history of ext.c to the current tip:
---o---o---o---(tip)--e--e--e--e
Neither of these approaches involves any history rewriting (the "forgotten" version of ext.c is just left as a dead end), so there should be no problems with propagating changes.

Related

Mercurial: How do I fix an incorrectly published rename/copy?

I've somehow marked a bunch of files as being "copy/renamed" instead of just marking them as new files.
I've also published these changes to our central repository and can cannot strip them out (we've already got some changes on top and I don't have the permissions/ability to fiddle with the central repository anyway).
Context:
I've actually done it to a whole bunch of files, but here's an example of what I did to one: I've copied a file called "Labels.properties", to a new file called "Labels_ja.properties" (the new file contains a subset of the original properties, and eventually all the values will be translated to a different language). The "Labels_ja.properties" file has been marked as a copy/rename of "Labels.properties" (the only way I've been able to see exactly what's happened is by looking with SourceTree, which shows "File copied/renamed from Labels.properties").
Our environment is a sort of central repository with automated "pull request" style merging tools built on top, so solutions involving all our developers magically knowing exactly how to drive Hg in order to resolve these conflicts aren't going to work - it's the scripts that are getting the merge conflicts.
These copy/renames are causing a lot of hassles: when people touch the original versions of the property files (they don't even know about the copied files yet) - it looks like Hg is trying to merge those changes onto the copies, but because the files are very different, those merges are failing with conflicts.
Problem:
What can I do to sort out all these merge conflicts that our automated merge scripts are getting?
Ideally, I'd like to go back in time and just mark all these files "new" - but there's no going back now that the changesets have been published.
Can I just make a big backout commit, then re-add the files (making sure that they are marked "new" and not "copy/renamed")?
I cannot think of a good solution. Even if you delete the files, Mercurial will still complain repeatedly when they get changed and merged. The simplest solution I can think of is moving the bad files into some "attic" directory where they don't hurt anyone and never touching them again. Then add the real files where they belong.
If I understand correctly, the problem is in fact that the developers are still based on a revision that does not have your move.
When they change the original file, an 'hg merge' will then smash together the changes from the developer and the changes to the new file in that new file.
I see a few solutions for this:
You can tell the developers to rebase their changes on top of the new changes. This will actually not work, since rebase will make the same mistake as merge.
The developers can turn their changes into patches and apply their patches on top of your changes. Their patches will refer to the original files, not the changed files, so this should work fine for the tools that merge later on.
The above flow can be done in a more sophisticated way using Mercurial Queues: import the existing developer commits as patches, then qpop all of these, update to your newer revision and qpush all of the revisions again.
Mark the head with your changes as 'closed' using '--close-branch'. However, if your tools are not equipped to handle this correctly (by ignoring the closed branch), this may cause issues. Branches are reopened when a new commit is done on top of the branch. So if you merge developer changes with the closed branch, it will open again.

Is cvs2hg still potentially producing corrupted repositories?

Trying to migrate a repository from cvs to hg, I found the tool cvs2hg, and it seems to do nicely he job (conversion goes fine, and I have all the tags and branches).
However, the hg documentation warns about "fixup commits" making the repository somewhat corrupted or at least dangerous.
Is this still a problem ? Maybe hg or cvs2hg have benefited from fixes since this warning was written.
If it is, potentially, how can I check if I am in such a dangerous situation, on the resulting hg repository ?
Fixup commits are good and necessary. And cvs2hg does much better job than hg convert.
But maybe first about the problem. In CVS repository you can play various dirty tricks with tags and branches. For example, you can manually fine-tune some tag tagging today's version of 3 files, yesterday's version of 4 others, and month-long version of yet another. In practice, I did it a lot of times to make "patch tags" (there is some old tag, I have various commits afterwards, there turns out to be a bug, I fix the bug, make fixup tag by old tag, moving it on 1-2 files).
In the result, you get tag which points to release which naver has existed or will exist at any point of repository history, if the history is taken for whole repo.
Similar tricks could be made with branches. Or branches can start from "ugly" tag.
Any kind of „natural” conversion of CVS to HG is dead lost on such cases. There is no place in the time-based history at which such tag or branch could be hooked. And hg convert just binds such tags at more-or-less random places, and branches at very ugly places.
Fixup commits simply are those missing revisions: artificial commits which are bound at appropriate place and introduce changes which put repository at state at which it should be at given tag. With those, we get both "artificial" tags, and branches, properly bound to proper code.
So if you:
commited a.c(1.1), b.c(1.1) and c.c(1.1)
commited a.c(1.2), b.c(1.2)
commited c.c(1.2)
artificially created tag blah_1.0 which points to a.c(1.1), b.c(1.1) and c.c(1.2)
commited a.c(1.3), b.c(1.3)
...
then hg convert based history will have 4 edit changesets (just like those above) and blah_1.0 bound at some ugly place with wrong content. At the same time, cvs2hg will create "fixup commit" which will artificially create changeset at which we really have a.c(1.1), b.c(1.1) and c.c(1.2), and tag there. In a history, such changeset is reasonably similar to transplanted/grafted/cherry-picked commit.
You should carefully check the resulting repository to make sure it represents your code history and doesn't contain any of these crappy fixup commits.
BTW, it might be worthwhile to check out the newer http://www.catb.org/esr/reposurgeon/ tool.

Why does Mercurial make you pull/update/merge for unrelated files?

For larger teams, having to pull/update/merge then commit each time makes no sense to me, specifically when the files that were changed by other developers have nothing to do with my changeset files.
i.e. I change file1.txt, and someone else changes file10.txt. Why must I merge on my computer before being allowed to push?
It makes pushing a big pain, as you have to constantly pull/update/merge if many developers are commiting.
Also, it makes your changeset look much larger than it was since it shows your merges as seperate commits.
Mercurial makes you do this since its atomic unit isn't a file but a changeset. That is a node containing a group of changes. Each changeset is an individual node in history and represents what that person did. This does result in you having to merge even if no common files where changes (which would be a simple automatic merge). These merge nodes are important since they are part of your repositories history and gives Mercurial more information for future merges with ancestral information.
That said there is an extension you can use that would clean up your history a bit (but won't resolve your issue with needing to pull before you push). It is called the rebase extension, it is shipped with Mercurial but disabled by default. It adds a new arumument to pull that looks like:
hg pull --rebase
This will pull new changes and moves your local changeset linearly above them without having a merge changset. However, I would urge against using this since you do lose information about your repository since you are re-writing its history. Read this post for information about some issues that this may cause.
Well, you could try using rebase, which will avoid the merge commits, but it is not without its own perils. You can also collapse to one step by doing "hg pull --update", rather than separate hg pull; hg update commands.
As for why you must merge on your computer: this is a direct consequence of mercurial being a distributed version control system. There is no central server which can be considered canonical (unless you create one by convention), so there is no other "place" where the merge could occur. You are the only one who can decide how the information in your repo should be combined with the information in the remote repo. The results of these decisions must be recorded, and that is the origin of the merge commit.
Also, in your example the merge would happen without user interaction since there are no conflicts (the same would be true with rebase), so I don't see why that is a problem.
Because having changes in disjunct files does not guarantee that they are independent.
When you pull in changes, even if they are in files that are untouched by your local changes, it can cause your local changes to stop working. E.g. an interface that you access from newly written code could have been changed.
This is why there is always a merge step inbetween, so that a human can review the changes, test for issues, and address them before integrating the changes back into the main repository. This step is very important, because skipping it risks blocking all those 50-100 colleagues (which is very expensive).
I would take Lasse’s advice and push less often. Merging isn’t a big deal if you only need to do it twice or thrice a day. Also maybe create smaller team repositories (or branches) that are merged with the main repository daily by a designated person.

mercurial temporarily ignore versioned files

My question is essentially the same as here but applies to mercurial. I have a set of files that are under version control, and one save operation changes quite a lot of files. Some of the resulting changes are important for revision control, and some of the changes are just junk. I can "partition" off the junk into separate files. These junk files need to be part of a basic checkout in order for it to work, but their contents (and changes over time) aren't that important for revision control. Right now I just tell all our developers not to commit these files, but we all forget and it creates a lot of extra baggage in the repository. I don't really like the svn solution proposed because there are quite a lot of files and I want a simple clone to just work without all this extra manual work, so I was wondering if mercurial has a better alternative. It's kind of like hg shelve but not quite, and kind of like ignore, but not quite. Is there some hg extension that allows for this? Can git do it?
Mercurial doesn't support this. The correct way to do it is to commit thefile.sample and then have your developers (or better you deploy script) do a copy from thefile.sample to thefile if thefile doesn't exist. That way anyone can update the example file, but there's no risk of them committing their local changes (say their personal database connect string).
Aha! So TortoiseHG's repository and global settings have an Auto Exclude List where you can define a list of files that will be unchecked by default when the status, commit, and shelve dialogs open. So they still show up, but the user has to check them in order to actually do a commit. The setting is stored in hgrc, but it's under the [tortoisehg] heading so it's not supported by mercurial per se. Nevertheless, it fits my needs.
One solution to this is to use nested tree support (submodule in git), where the "junk" would be put in a different repository (to avoid cluttering the main repo), while enabling checking out the whole thing out in a consistent manner (right version of both repos in sync).
https://www.mercurial-scm.org/wiki/Subrepository?action=show&redirect=subrepos
In git, submodules are one solution to this issue - but they are not that great UI-wise. What I do instead is to keep two completely independent repositories, and using the subtree merge strategy when I need to update the main repo with the junk repo: http://progit.org/book/ch6-7.html

A mercurial merge chose the wrong changes, what is the correct way to fix this?

Changes were made to our .vcproj to fix an issue on the build machine (changeset 1700). Later, a developer merged his changes (changes 1710 through 1715) into the trunk, but the mercurial auto-merge overwrote the changes from 1700. I assume this happened because he chose the wrong branch as the "parent" of the merge (see part 2 of the question).
1) What is the "correct" mercurial way to fix this issue, considering out of all the merged files, only one file was merged incorrectly, and
2) what should the developer have done differently in order to make sure this didn't occur? Are there ways we can enforce the "correct" way?
Edit: I probably wasn't clear enough on what happened. Developer A modified a line in our .vcproj file that removed an option for the compiler. His check-in became changeset 1700. Developer B, working from a previous parent (let's say changeset 1690), made some changes to completely different parts of the project, but he did touch the .vcproj file (just not anywhere near the changes made by Developer A). When Developer B merged his changes (becoming changes 1710 through 1715), the merge process overwrote the changes from 1700.
To fix this, I just re-modified the .vcproj file to include the change again, and checked it in. I just wanted to know why Mercurial thought that it shouldn't keep the changes in 1700, and whether or not there was an "official" way to fix this.
Edit the second: Developer B swears up and down that Mercurial merged the .vcproj file without prompting him for conflict resolution, but it is of course possible that he's just misremembering, in which case this whole exercise is academic.
I will address the 2nd part of you question first...
If there is a conflict, the automated merge tools should force the programmer to decide how the merge happens. But the general assumption is that a conflict will involve two edits to the same set of lines. If somehow a conflict arises because of edits to lines that are not close to each other the automated merge will blithely choose both of the edits and a bug will appear.
The general case of a merge tool always merging properly is very hard to solve, and really can't be with current technology. Here is an example of what I mean from C:
int i; // Someone replaces this with 'short i' in one changeset stating
// that a short is more efficient.
// ... lots of code;
// Someone else replaces all the 65000s with 100000s in another changeset,
// saying that more precision is needed.
for (i = 0; i < 65000; ++i) {
integral_approximation_piece(start + i/65000.0, end + (i + 1) / 65000.0);
}
No merge tool is going to catch this kind of conflict. The tool would have to actually compile the code to see that those two parts of the code have anything to do with eachother, and while that would likely be enough in this case, I can construct an example that would require the code to be run and the results examined to catch the conflict.
This means that what you really ought to do is rigorously test your code after a merge, just like you should after any other change. The vast majority of merges will result in obvious conflicts that a developer will have to resolve (even though that resolution is often fairly obvious), or will merge cleanly. But the very few merges that don't fit either category can't easily be handled in an automated fashion.
This can also be fixed by development practices that encourage locality. For example a coding standard that states "Variables should be declared near where they're used.".
I'm guessing that .vcproj files are particularly prone to this problem since they are not well understood by developers and so if conflicts do appear they will not be sure what to do with them. My guess is that this happened and your developer simply did a revert back to the revision (s)he checked in.
As for part 1...
What to do in this case depends a lot on your development process. You can either strip the merge changeset out and redo it, though that won't work very well if lots of people have already pulled it, and it will work especially poorly if there are lots of changesets that have already been checked in that are based on the merge changeset.
You can also check in a new change that fixes the problem with the merge.
Those are basically your two options.
The tone of your post seems to me to indicate that you may have some politics surrounding this issue in your organization, and people are blaming this error on the frequent merges of Mercurial. So I will point out that any change control system can have this problem. In the case of Subversion, for example, every time a developer does an update while they have outstanding changes in their working directory, they are doing a merge, and this kind of problem can arise with any merge.
In mercurial a merge doesn't have a single parent, it by definition has two and only two parents. When someone is merging they're making two choices:
What two changesets will constitute the two changes
Which of those changesets will be the left-parent and which will be the right-parent
Of those two questions the first is very important, and the second barely matters at all, though it took me a while to come to understand that.
You select the left-parent by using hg update X. That changes the output of hg parents (or in newer versions hg summary) and essentially determines what's in your working directory before the merge.
You select the right-parent by using hg merge Y. That says merge X (the working directory's parent) with changeset Y. As a special case, if there are only two heads in your repository and your parent is already one of them then Y will default to the the other.
I'd have to see your resulting graph to know just what the developer did, but it's possible he didn't update to one head or another before invoking merge, which would have him merging one head with some point back in history.
If your developer picked the right parents for the merge then the left vs. right doesn't much matter -- the only real difference is that when one uses hg diff or hg log -p or some other command that shows the patch for a merge changeset, it's displayed relative to the left-parent. That's, however, mostly a factor in display only. Functionally they're pretty much identical.
Assuming your developer picked the right changesets then what he should have done was test the result of the merge before committing it. Merging is software development, not an annoying VCS side effect, and not testing before committing is the error.
Fixing
To fix this, just re-do the merge correctly. Use hg update to set one parent, use hg merge to pick the other. Make sure your current working directory is correct and then commit. You can get rid of his bad merge using something like hg strip or better, just close down his branch with hg commit --close-branch after updating to it.
Avoiding
You say "mercurial auto-merge", but mercurial doesn't really auto-merge. It does a premerge which is an extremely cautious combination of obvious changes, but it's so careful it won't even merge for you if each merge parent adds code in the same region because it can't know which block of code you'd rather have first.
You can disable this premerge entirely or on a file-by-file basis using the merge tool configuration options:
https://www.mercurial-scm.org/wiki/MergeToolConfiguration?highlight=premerge