Hold on, this one's going to be rough. I'm attempting to clean up a Mercurial repo that I can only describe as pathological, and it appears the simplest 'fix' is to swap one directory between two branches. I'm not sure if I can adequately explain the situation, but I'll give it a shot.
We have a tool that generates raw source from a large set of input. (Hundreds or thousands of input files, and a similar number of output files.) This output is then hand-tweaked and optimized as needed, before being declared ready for deployment. The tool is run inside a directory that contains the configuration files, etc.
Something like:
proj/
    .hg/
    input/
    output/
    config-env/
Now, what has happened is that a branch was used for the tweaking, instead of the (to me) logical choice of a shared mq approach. (The team wasn't aware of mq at the time and is mostly CVS-trained, but to their credit, they are working to improve the workflow.)
Here's where it gets grody.
The named branch is the raw output branch, and the default branch is the hand tweaked branch.
The output of the tool was continually regenerated, then hg update raw, then hg merge, to try to fold the tool's refinements to the output into the raw branch; that was then merged back into the 'default' branch to incorporate both the refinements and the prior hand patches, using the merge failure list as the pick list for what needed further hand editing.
I.e., there's this crazy cycle between the two branches that has made development a minor nightmare of fragility and errors. That's what I'm trying to clean out. What I want is for the default branch to be the raw output, and a new 'deploy' branch that contains the hand patches, with merges going only one way, from default to deploy. Still not optimal, IMO, but until I can convince the powers that be that mq is the way to go, it's a rather large improvement.
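For what it's worth, the one-way flow I'm after would look roughly like this (just a sketch, using the branch names described above):

hg update default                 # default = the raw generator output going forward
# ...run the generator, then commit the regenerated files...
hg commit -m "regenerate output"
hg update deploy                  # deploy = the hand-patched branch
hg merge default                  # merges only ever flow default -> deploy
hg commit -m "merge regenerated output into deploy"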
But. There's more. The config-env and input directories were altered as time went on, but the changes were not kept clean and in sync between the two branches. The changes to the input and config-env are on the default branch, but the matching raw output is on the raw branch, with stale input and config-env that don't really belong there. I've inquired about removing the stale files as unnecessary, but there are some arguing that we need to retain them. For that reason, I want to swap only the output directory between the two branches.
I considered something like hg convert --branchmap swap.txt --filemap only-output.txt --datesort but then realized I would get only the output directory in the resulting repo. I might be able to literally strip out the output directories from the two branches into their own repos with the branch names swapped, then merge the two repos together (no-output, output-with-swapped-branches), but I figured that perhaps someone on here would have a better idea.
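For the record, what I had in mind for those two files was roughly this (hypothetical contents; convert's branchmap takes 'oldname newname' lines, and the filemap takes include/exclude directives), which is exactly why the result would hold nothing but output. swap.txt (the branchmap), assuming the named branch is literally called raw:

raw default
default deploy

and only-output.txt (the filemap):

include output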
Of course, what I'm aiming for is to have one branch with an mq patch list to store the hand patches, and to have that patch list be part of the shared repo, but baby steps are in order.
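For the record, the shared-mq setup I'd eventually like is roughly this (a sketch; the -c flag puts the patch queue itself under version control so it can be shared, and the patch name is hypothetical):

hg qinit -c                      # versioned patch queue in .hg/patches
hg qnew hand-tweaks.patch        # start a patch for the hand edits
# ...edit the generated output...
hg qrefresh                      # capture the edits into the patch
hg commit --mq -m "update hand-tweak patches"   # commit the queue repo itself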
If push comes to shove, I think we can eliminate most of the history of the output directories on both branches, but we do require certain tagged releases to be retained on both branches.
Related
I have a mercurial repo that now effectively contains two repos.
At some point in the past, a branch was split off from default (let's call it default-other).
Both default and default-other now have a lot of commits and there are no plans to ever merge default-other back to default. Some commits have been grafted from default to default-other.
Now, I want to create a new repo from the default-other branch while retaining the full history. i.e. a repo that contains
all commits from default up to the point where the default-other branch was created
all commits from default-other
all branches that have been branched off from default-other
no commits from default after default-other was branched off
Optimally, I would like to use different strategies for different branches. Some branches are release branches which must be kept, whereas others are feature branches which I would like to fold into the new default.
I tried using the convert extension, but I cannot figure out how to correctly set up the branch map.
I think you can do almost everything you stated using hg strip. Something like this:
Clone the original repo to a new repo
hg up null to remove anything from the working directory (probably quicker to work with it like that)
Identify all branches (named or anonymous) and changesets that you want to remove. To be precise, what you want is the changeset ID of each commit and its descendants that should be removed.
hg strip -r 12345678890 for each of those commits. This removes the commit & all its descendants (potentially including merges).
If needed, you can do this procedure twice, but the second time strip the orthogonal set of changesets (those which pertain to default-other but not to default).
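A minimal sketch of that sequence (repo names and the revision are placeholders; hg strip needs the mq or strip extension enabled):

hg clone original-repo clone1      # work on a throwaway clone
cd clone1
hg up null                         # empty working directory, faster strips
hg log -G                          # find the heads/changesets to drop (needs the graphlog extension on older Mercurial)
hg strip -r 1234                   # removes that changeset and all its descendants; repeat as needed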
That said, there are a few reasons you might still want to use hg convert:
There are files in the repo that are not relevant: for instance, the default branch might have some old stuff which has nothing to do with default-other. You could use convert to eradicate the history of such files.
You want default and default-other to look like they were always the same branch. You can use convert to rename one or both of them. (branchmap)
You want to ensure the new repo(s) cannot be pushed/pulled with the original which could mess up all your well-laid plans. By "converting" the entire repo, IIRC all the changeset IDs will be renewed and the repo will not look like a relative of the original.
My advice would be to work in stages:
Clone the original to clone1
Use strip to get as much done as you can
Make clone2 from clone1
Run convert task #1
Make clone3 from clone2
Run convert task #2
etc.
I recommend not batching many different convert things together - run them in small increments. This may seem like it will take longer, but it's much easier to get right this way.
Ultimately you can remove all the intermediate clones, but through experience I found this was the most productive & safest way to make big structural changes.
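In shell terms, that staged pipeline could look something like this (repo and file names are placeholders; each convert run writes a fresh repo):

hg clone original clone1                               # safety copy; do the strips here
hg convert --branchmap branchmap.txt clone1 clone2     # convert task #1
hg convert --filemap filemap.txt clone2 clone3         # convert task #2
# ...and so on, keeping each intermediate repo until you're happy with the result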
Just use the branchmap format described in the wiki. In your case it will be a simple plain-text file:
default-other default
...
renamed-child1 default
renamed-child2 default
...
Addition
Because the Convert extension can't stop a conversion on a "before" condition, only "after" (see the --rev option), you must prepare the repository before running convert: clone only the needed subset of the original changesets into a new intermediate repo (default-other and its descendants, without the later default-only commits).
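For example (a sketch, assuming the branchmap above is saved as branchmap.txt; repo names are placeholders): clone with -r to pull only default-other and its ancestors, which drops the later default-only commits, then run convert over that intermediate repo:

hg clone -r default-other original intermediate    # add more -r options for each child branch head you want to keep
hg convert --branchmap branchmap.txt intermediate new-repo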
I have four devs working in four separate source folders in a mercurial repo. Why do they have to merge all the time and pollute the repo with merge changesets? It annoys them and it annoys me.
Is there a better way to do this?
Assuming the changes really don't conflict, you can use the rebase extension in lieu of merging.
First, put this in your .hgrc file:
[extensions]
rebase =
Now, instead of merging, just do hg rebase. It will "detach" your local changesets and move them to be descendants of the public tip. You can also pass various arguments to modify what gets rebased.
Again, this is not a good idea if your developers are going to encounter physical merge conflicts, or logical conflicts (e.g. Alice changed a feature in file A at the same time as Bob altered related functionality in file B). In those cases, you should probably use a real merge in order to properly represent the relevant history. hg rebase can be easily aborted if physical conflicts are encountered, but it's a good idea to check for logical conflicts by hand, since the extension cannot detect those automatically.
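For reference, the day-to-day sequence would be something like this (assuming no conflicts):

hg pull          # fetch the new remote changesets
hg rebase        # replay your local changesets on top of them, no merge changeset
hg push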
Your development team are committing little and often; this is just what you want so you don't want to change that habit for the sake of a clean line of commits.
@Kevin has described using the rebase extension and I agree that can work fine. However, you'll also see all the work sequence of each developer squished together in a single line of commits. If you're working on a stable code base and just submitting quick single-commit fixes then that may be fine; if you have ongoing lines of development then you might not want to lose the continuity of a developer's commits.
Another option is to split your repository into smaller self-contained repositories.
If your developers are always working in 4 separate folders, perhaps the contents of these folders can be modularised and stored as separate Mercurial repositories. You could then have a separate master repository that brought all these smaller repositories together within the sub-repository framework.
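A sketch of the glue in the master repository's .hgsub file (paths and URLs are hypothetical):

folder-a = https://hg.example.com/folder-a
folder-b = https://hg.example.com/folder-b
folder-c = https://hg.example.com/folder-c
folder-d = https://hg.example.com/folder-d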
Mercurial is distributed, which means that if you have a central repository, every developer also has a private repository on his/her workstation, and a working copy as well, of course.
So now let's suppose that they make a change and commit it, i.e. commit to their private repository. When they want to hg push, two things can happen:
either they are the first one to push a new changeset to the central server, in which case no merge is required, or
somebody else, starting from the same version, has committed and pushed before them. We can see that there is a fork here: from the same starting point the history has gone in two different directions, so a merge is required even if there is no conflict, because we do not want four different divergent contexts on the central server. (That is, by the way, possible with Mercurial; they are called heads, and you can force the push without a merge, but you still have the divergence, no magic, and this is probably not what you want, because you want to be able to check out the sum of all the contributions.)
Now, how to avoid performing merges is quite simple: you need to tell your developers to integrate others' changes before committing their own:
$ hg pull
$ hg update
$ hg commit -m"..."
$ hg push
When the commit is made against the latest central version, no merge should be required.
If they were working on the same code, some test runs would also be required after the pull and update, to ensure that what worked in isolation still works once the other developers' work has been integrated. Pulling others' contributions frequently and pushing your own changes frequently is called continuous integration and ensures that integration issues are discovered quickly.
Hope it'll help.
For larger teams, having to pull/update/merge then commit each time makes no sense to me, specifically when the files that were changed by other developers have nothing to do with my changeset files.
i.e. I change file1.txt, and someone else changes file10.txt. Why must I merge on my computer before being allowed to push?
It makes pushing a big pain, as you have to constantly pull/update/merge if many developers are committing.
Also, it makes your changeset look much larger than it was, since it shows your merges as separate commits.
Mercurial makes you do this since its atomic unit isn't a file but a changeset, that is, a node containing a group of changes. Each changeset is an individual node in history and represents what that person did. This does result in you having to merge even if no common files were changed (which would be a simple automatic merge). These merge nodes are important since they are part of your repository's history and give Mercurial more ancestral information for future merges.
That said, there is an extension you can use that would clean up your history a bit (but won't resolve your issue of needing to pull before you push). It is called the rebase extension; it is shipped with Mercurial but disabled by default. It adds a new argument to pull that looks like:
hg pull --rebase
This will pull new changes and move your local changesets linearly on top of them without creating a merge changeset. However, I would urge against using this, since you do lose information about your repository by rewriting its history. Read this post for information about some issues that this may cause.
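If it isn't enabled yet, the .hgrc entry is the same one shown earlier on this page:

[extensions]
rebase =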
Well, you could try using rebase, which will avoid the merge commits, but it is not without its own perils. You can also collapse to one step by doing "hg pull --update", rather than separate hg pull; hg update commands.
As for why you must merge on your computer: this is a direct consequence of mercurial being a distributed version control system. There is no central server which can be considered canonical (unless you create one by convention), so there is no other "place" where the merge could occur. You are the only one who can decide how the information in your repo should be combined with the information in the remote repo. The results of these decisions must be recorded, and that is the origin of the merge commit.
Also, in your example the merge would happen without user interaction since there are no conflicts (the same would be true with rebase), so I don't see why that is a problem.
Because having changes in disjoint files does not guarantee that they are independent.
When you pull in changes, even if they are in files that are untouched by your local changes, it can cause your local changes to stop working. E.g. an interface that you access from newly written code could have been changed.
This is why there is always a merge step in between, so that a human can review the changes, test for issues, and address them before integrating the changes back into the main repository. This step is very important, because skipping it risks blocking all those 50-100 colleagues (which is very expensive).
I would take Lasse’s advice and push less often. Merging isn’t a big deal if you only need to do it twice or thrice a day. Also maybe create smaller team repositories (or branches) that are merged with the main repository daily by a designated person.
I currently use SVN for a number of things that aren't exactly code, for instance xml files, report templates, miscellaneous files, etc. I have several non-developers who are comfortable using TortoiseSVN for this. They typically work as follows:
Person A - does an SVN Update on the folder of interest to them. Or perhaps just on a single file.
Person A - edits whichever file(s) they're working on. Perhaps add or remove files.
Person B - someone else is probably working on different files at this point
Person A - does an SVN Commit to save their changes to the repository.
Very occasionally they'll hit conflicts where more than one person has edited a file. Almost always this is just because they forgot step #1. Because they're always working on separate files, there are (almost) never real conflicts. As long as they do step #1 first everything works fine.
I'd like to move to Mercurial, however something holding me back is the prospect of having to 'merge' all the time, because Mercurial looks at the state of the entire repository, not just the files of interest at a particular time. e.g. the workflow would be like this:
Person A - does a pull and update on the repository. (let's assume there are no local changes so this is straightforward).
Person A - edits whichever file(s) they're working on. Perhaps add or remove files.
Person B - someone else edits, commits, and pushes a different file at this point
Person A - commits changes. Tries to push. Gets an error about multiple heads.
Person A - does a pull and update. update doesn't work: merge required.
Person A - does a merge. If using TortoiseHg it's a bit confusing working out what to click on to do the merge. I guess this is simpler on the command line, provided there are no complications.
Person A - commits the merge.
Person A - pushes the changes.
My resistance is that there are more steps, and the merge step is somewhat hard to get your head around if you're not a developer. Is there a way I can put these steps together to make the process nice and simple?
"Very occasionally they'll hit conflicts where more than one person has edited a file. Almost always this is just because they forgot step #1. Because they're always working on separate files, there are (almost) never real conflicts. As long as they do step #1 first everything works fine."
If this is the case why do you want to use a DVCS? Mercurial is great, but the benefits of a DVCS come from the ability to merge and fork and the ease of doing either, if your workflow requires neither why would you want to switch toolset?
Sounds like the rebase extension might work for you. The workflow becomes:
hg clone
make changes
hg commit
hg pull --rebase
hg push
The local revisions get "rebased" onto the latest tip on pull, which avoids the merge.
One possible approach is to have a point person who does all the real work of merging. I'm not a big fan of letting everyone push to one shared repository, especially if they don't know what they are doing. An alternative approach is that A has local repository A, B has local repository B, and there is a repository S which combines A and B. Then, don't let A or B push to S. Instead, let an expert pull from A and B and do the merging in S. Then A and B never have to push to S. If they coordinate with the expert, then he/she will already have merged their changes into S by the time they pull updates from S, and so A and B will not have to merge either when pulling. This is actually the default mode in which a DVCS works, since by default all repositories are read-only except by their owner.
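In command terms, the expert's integration step in S might look like this (repository paths are hypothetical):

cd S
hg pull /path/to/repoA       # bring in A's changesets
hg pull /path/to/repoB       # bring in B's changesets; there may now be two heads
hg merge                     # the expert resolves everything here
hg commit -m "integrate A and B"

A and B then simply pull from S and update, with no merging on their side.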
...so I've gotten used to the simple stuff with Mercurial (add, commit, diff) and found out about the .hgignore file (yay!) and have gotten the hang of creating and switching between branches (branch, update -C).
I have two major questions though:
If I'm in branch "Branch1" and I want to pull in some but not all of the changes from branch "Branch2", how would I do that? Particularly if all the changes are in one subdirectory. (I guess I could just clone the whole repository, then use a directory-merge tool like Beyond Compare to pick&choose my edits. Seems like there ought to be a way to just isolate the changes in one file or one directory, though.)
Switching between branches with update -C seems so easy, I'm wondering why I would bother using clone. I can only think of a few reasons (see below) -- are there some other reasons I'm missing?
a. if I need to act on two versions/branches at once (e.g. do a performance-metric diff)
b. for a backup (clone the repository to a network drive in a physically different location)
c. to do the pick&choose merge like I've mentioned above.
I use clone for:
Short-lived local branches
Cloning to different development machines and servers
The former use is pretty rare for me - mainly when I'm trying an idea I might want to totally abandon. If I want to merge, I'll want to merge ALL the changes. This sort of branching is mainly for tracking different developers' branches so they don't disturb each other. Just to clarify this last point:
I keep working on my changes and pull my fellow devs changes and they pull mine.
When it's convenient for me I'll merge ALL of the changes from one (or all) of these branches into mine.
For feature branches, or longer lived branches, I use named branches which are more comfortably shared between repositories without merging. It also "feels" better when you want to selectively merge.
Basically I look at it this way:
Named branches are for developing different branches or versions of the app
Clones are for managing different contributions to the same version of the app.
That's my take, though really it's a matter of policy.
For question 1, you need to be a little clearer about what you mean by "changes". Which of these do you mean:
"I want to pull some, but not all, of the changesets in a different branch into this one."
"I want to pull the latest version of some, but not all, of the files in a different branch into this one."
If you mean item 1, you should look into the Transplant extension, specifically the idea of cherry-picking a couple of changesets.
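For example (revision numbers are hypothetical; transplant is a bundled extension that has to be enabled in .hgrc):

hg update Branch1
hg transplant 1234 1237      # copy those two Branch2 changesets onto Branch1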
If you mean item 2, you would do the following:
Update to the branch you want to pull the changes into.
Use hg revert -r <branch you want to merge> --include <files to update> to change the contents of those files to the way they are on the other branch.
Use hg commit to commit those changes to the branch as a new changeset.
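Concretely, for a whole subdirectory it might look like this (the path is hypothetical):

hg update Branch1
hg revert -r Branch2 some/subdir       # make that directory match Branch2
hg commit -m "bring some/subdir over from Branch2"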
As for question 2, I never use repository clones for branching myself, so I don't know. I use named branches or anonymous branches (sometimes with bookmarks).
I have another option for you to look into: mercurial queues.
The idea is to have a stack of patches (not commits, but "real" patches) on top of your current working directory. Then you can add or remove the applied patches: add one, remove it, add another, and so on. A single patch, or a subset of them, ends up being a new "feature", much as you would otherwise do with branches. After that, you can apply the patch for good as a normal change. Branches are probably more useful if you work with somebody else... ?
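A minimal mq session looks something like this (the patch name is hypothetical; mq ships with Mercurial but must be enabled in .hgrc):

hg qnew feature-x.patch      # start a new patch on top of the working directory
# ...edit files...
hg qrefresh                  # fold the current edits into the patch
hg qpop                      # unapply it
hg qpush                     # reapply it
hg qfinish -a                # turn all applied patches into regular changesets when done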