Help understanding the benefits of branching in Mercurial - mercurial

I've struggled to understand how branching is beneficial. I can't push to a repo with 2 heads, or 2 branches... so why would I ever need/use them?

First of all, you can push even with two heads, but since you probably don't want to do that, the default behavior is to prevent you from doing it. You can, however, force the push to go through.
Now, as for branching, let's take a simple scenario in a non-distributed version control system, like Subversion.
Let's assume you have a colleague that is working in the same project as you. The current latest changeset in the Subversion repository is revision 100, you both update to this locally so that now both of you have the same files.
Ok, now your colleague has already been working on his changes for a couple of hours now, and so he commits. This brings the central repository up to revision 101. You're still on revision 100 locally, and you're still working on your changes.
At some point, you complete, and you want to commit, but Subversion won't let you. It says you have to update first, so you start the update process.
The update process wants to take your changes, and pretend you actually started with revision 101 instead of 100. If your changes are not in conflict with whatever it was your colleague committed, all is hunky dory, but if your changes are in conflict, you have a problem.
Now you have to merge your changes with his changes, and things can go haywire. For instance, you might end up merging one file OK, the second file OK, or so you think, and then the third file, and you suddenly discover that you've got some of the details wrong, it would've been better to merge the second file differently.
Unless you made a backup of your changes before updating, and sooner or later you will forget, you have a problem.
Now, the above scenario is actually quite common. Well, perhaps not the merging part, it depends on how many is working in the same area or files at the same time, but the "must update before committing" part is quite common with Subversion.
So how does Mercurial do it?
Well, Mercurial commits locally, it doesn't talk to any remote repository at all, so it won't stop you from committing.
So, let's try the above scenario again, just in Mercurial this time.
The tipmost changeset in the remote repository is revision 100. You both have cloned this down, and you're both starting to work on the changes, from revision 100.
Your colleague completes his changes and commits, locally. He then pushes his changeset up to the central repository, bringing the tip there up to revision 101.
You then complete your changes, and commit, also locally, and then you want to push, but you get the error message you've already discovered, and is asking about.
So how is this different?
Well, your changes are now committed, there is no way, unless you try really hard to accidentally lose them or destroy them.
Here's the 3 repositories in play and their current state:
Colleague ---98---99---100---A
Central ---98---99---100---A
You ---98---99---100---B
If you were to push, and was allowed to do this (or force the push through), the Central repository would look like this:
Central ---98---99---100---A
\
+--B
Two heads. If your colleague now pulled, which one should he continue working from? This question is the reason Mercurial will by default prevent you from causing this.
So instead you pull, and you get the above state in your own repository.
In other words, you can chose to impact your own repository and create multiple heads there, but you are not imposing that problem on anyone else.
You then merge, the same type of operation you had to do in Subversion, except your changeset is safe, it was committed, and you won't accidentally corrupt or destroy it. If, mid-merge, you want to start over, you can, nothing lost, no harm done.
After the merge, your local repository looks like this:
You ---98---99---100---A----M
\ /
+--B--+
This is now safe to push, and if your colleague now pulls, he knows that he has to continue from the M changeset, the one that merged his and your changes.
The above description is what happens due to Mercurials distributed nature.
You can also name branches, to make them more permanent. For instance, you might want to name a branch "stable", to signal that any changesets on that branch have been thoroughly tested and is safe for release to customers or to put into production. Then you would only merge changes onto that branch when said testing has been completed.
The nature, however, is the same as the above description. Whenever more than one person works on a project with Mercurial, you will get branches, and that's a good thing.

Whenever more than one clone of a repo is made and commits are made in those clones, branches happen, whether you name them by using the hg branch command or not. My philosophy is, you might as well give them a name. It makes things less confusing.
A good explanation of mercurial branches: http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial/

Related

Mercurial: devs work on separate folders, why do they have to merge all the time

I have four devs working in four separate source folders in a mercurial repo. Why do they have to merge all the time and pollute the repo with merge changesets? It annoys them and it annoys me.
Is there a better way to do this?
Assuming the changes really don't conflict, you can use the rebase extension in lieu of merging.
First, put this in your .hgrc file:
[extensions]
rebase =
Now, instead of merging, just do hg rebase. It will "detach" your local changesets and move them to be descendants of the public tip. You can also pass various arguments to modify what gets rebased.
Again, this is not a good idea if your developers are going to encounter physical merge conflicts, or logical conflicts (e.g. Alice changed a feature in file A at the same time as Bob altered related functionality in file B). In those cases, you should probably use a real merge in order to properly represent the relevant history. hg rebase can be easily aborted if physical conflicts are encountered, but it's a good idea to check for logical conflicts by hand, since the extension cannot detect those automatically.
Your development team are committing little and often; this is just what you want so you don't want to change that habit for the sake of a clean line of commits.
#Kevin has described using the rebase extension and I agree that can work fine. However, you'll also see all the work sequence of each developer squished together in a single line of commits. If you're working on a stable code base and just submitting quick single-commit fixes then that may be fine - if you have ongoing lines of development then you might not won't want to lose the continuity of a developer's commits.
Another option is to split your repository into smaller self-contained repositories.
If your developers are always working in 4 separate folders, perhaps the contents of these folders can be modularised and stored as separate Mercurial repositories. You could then have a separate master repository that brought all these smaller repositories together within the sub-repository framework.
Mercurial is distributed, it means that if you have a central repository, every developer also has a private repository on his/her workstation, and also a working copy of course.
So now let's suppose that they make a change and commit it, i.e., to their private repository. When they want to hg push two things can happen:
either they are the first one to push a new changeset on the central server, then no merge will be required, or
either somebody else, starting from the same version, has committed and pushed before them. We can see that there is a fork here: from the same starting point Mercurial has two different directions, thus a merge is required, even if there is no conflict, because we do not want four different divergent contexts on the central server (which by the way is possible with Mercurial, they are called heads and you can force the push without merge, but you still have the divergence, no magic, and this is probably not what you want because you want to be able to checkout the sum of all the contributions..).
Now how to avoid performing merges is quite simple: you need to tell your developers to integrate others changes before committing their own changes:
$ hg pull
$ hg update
$ hg commit -m"..."
$ hg push
When the commit is made against the latest central version, no merge should be required.
If they where working on the same code, after pull and update some running of tests would be required as well to ensure that what was working in isolation still works when other developers work have been integrated. Taking others contributions frequently and pushing our own changes also frequently is called continuous integration and ensures that integration issues are discovered quickly.
Hope it'll help.

Pull commits on repo post-rebase

I'm looking for a simple way to pull in additional commits after rebasing or a good reason to tell someone not to rebase.
Essentially we have a project, crons. I make changes to this frequently, and the maintainer of the project pulls in changes when I request it and rebases every time.
This is usually okay, but it can lead to problems in two scenarios:
Releasing from two branches simultaneously
Having to release an additional commit afterwards.
For example, I commit revision 1000. Maintainer pulls and rebases to create revision 1000', but at around the same time I realize a horrible mistake and create revision 1001 (child of 1000). Since 1000 doesn't exist in the target branch, this creates an unusable merge, and the maintainer usually laughs at me and tells me to try again (which requires me getting a fresh checkout of the main branch at 1000' and creating and importing a patch manually from the other checkout). I'm sure you can see how the same problem could occur with me trying to release from two separate branches simultaneously as well.
Anyway, once the main branch has 1000', is there anything that can be done to pull in 1001 without having to merge the same changes again? Or does rebasing ruin this? Regardless is there anything I can say to get Maintainer to stop rebasing? Is he using it incorrectly?
Tell your maintainer to stop being a jacka**.
Rebasing is something that should only be done by you, the one that created the changesets you want to rebase, and not done to changesets that are:
already shared with someone else
gotten from someone else
Your maintainer probably wants a non-distributed version control system, like Subversion, where changesets follows a straight line, instead of the branchy nature of a DVCS. In that respect, the choice of Mercurial is wrong, or the usage of Mercurial is wrong.
Also note that rebasing is one way of changing history, and since Mercurial discourages that (changing history), rebasing is only available as an extension, not available "out of the box" of a vanilla Mercurial configuration.
So to answer your question: No, since your maintainer insists on breaking the nature of a DVCS, the tools will fight against you (and him), and you're going to have a hard time getting the tools to cooperate with you.
Tell your maintainer to embrace how a DVCS really works. Now, he may still insist on not accepting new branches or heads in his repository, and insist on you pulling and merging before pushing back a single head to his repository, but that's OK.
Rebasing shared changesets, however, is not.
If you really want to use rebasing, the correct way to do it is like this:
You pull the latest changes from some source repository
You commit a lot of changesets locally, fixing bugs, adding new features, whatnot
You then try to push, gets told that this will create new heads in the target repository. This tells you that there are new changesets in the target repository that you did not get when you last pulled, because they have been added after that
Instead, you pull, this will add a new head in your local repository. Now you have the head that was created from your new changesets, and the head that was retrieved from the source repository created by others.
You then rebase your changesets on top of the ones you got from the source repository, in essence moving your changesets in the history to appear that you started your work from the latest changeset in the current source repository
You then attempt a new push, succeeding
The end result is that the target repository, and your own repository, will have a more linear changeset history, instead of a branch and then a merge.
However, since multiple branches is perfectly fine in a DVCS, you don't have to go through all of this. You can just merge, and continue working. This is how a DVCS is supposed to work. Rebasing is just an extra tool you can use if you really want to.

Is there any way of deleting a branch that has been pushed to a repo shared by all developers?

I have accidentally pushed a branch to a repo. Is there anyway I could alter the repo ( and remove the branch )? Closing it is not a solution.
You got a couple of options, none of them easy, and none of them will leave you with a "phew, saved by the bell" feeling afterwards.
The only real way to fix this problem is to try to avoid it in the first place.
Having said that, let's explore the options here:
Eradicate the changesets
Introduce further changesets that "undo" the changes
The first option, to eradicate the changesets, is hard. Since you pushed the changesets to your central repository, you need direct access to the repositories on that server.
If this is a server where you don't have direct access to the repositories, only through a web interface, or through push/pull/clone, then your option is to hope that the web interface have methods for eradicating those changesets, otherwise go to option 2.
In order to get rid of the changesets, you can either make a new clone of the repository with the changesets, and specify options that stop just shy of introducing the changesets you want to get rid of, or you can use the MQ extension and strip the offending changesets out.
Either is good, but personally I like the clone option.
However, this option hinges on the fact that any and all developers that are using the central repository either:
Have not already pulled the offending changesets from the central repository.
Or are prepared to get rid of said changesets locally as well.
For instance, you could instruct all your developers to kill their local clones, and reclone a fresh copy after you have stripped away the changesets in the central repository.
Here's the important part:
If you cannot get all developers to help with this, you should drop this line of thought and go to option 2 instead
Why? Because now you have two problems:
You need to introduce barriers that ensure no developers can push the same changesets onto the server again, after you got rid of them. Note that relying on the warning by the server to prevent new branches being pushed is perhaps not good enough, as developers might have branches of their own they want to push, and thus not notice that they'll be pushing yours as well.
Any work any developer has done based on any of the offending changesets must either be rebased to a new place, or eradicated as well.
In short, this will give you lots of extra work. I would not do this unless the offending changesets were super-critial to get rid of.
Option 2, on the other hand, comes with its own problems, but is a bit easier to carry out.
Basically you use the hg backout command to introduce a new changeset that reverses the modifications done by the offending changesets, and commit and push that.
The problem here is that if at some point you really want to introduce those changesets, you will have to fight a bit with Mercurial in order to get the merges right.
However, there will be no more work for your fellow developers. The next time they pull, they'll get your correction changeset as well.
Let me just restate this option in different words:
Instead of getting rid of the changesets, keep them, but introduce another changeset that reverses the changes.
Neither option is good, both will generate some extra work.
We've ran into a similar problem once, when we had to remove a branch from the server repo from which all devs regularly pull. Backout wasn't an option because the problematic branch had already been pulled by everyone.
We stripped (hg strip from the MQ extension) the branch in the server repo. From now on, if a developer tried to push, he had a message “push creates new remote branches”, even though they didn't actually created any. We created a batch file with the strip command, distributed it among the devs and explained the “new remote branches” is a signal to run the batch file.
This approach takes some time and effort before everybody gets rid of the branch, but it works.
If the 'backout' option described in Jason's comment above doesn't do it for you, you can remake the repo up until the point of your mistaken push using hg convert, which (despite its name) also works with hg.
eg hg convert -r before-mistaken-push /path/to/original /path/to/new
You might have to play with the usebranchnames and clonebranches settings.

Correct (best-practise?) procedure to stay in sync with a remote Mercurial repository?

As a former user of Subversion, we've decide to move over to Mercurial for SCM and it is confusing us a little. Although Mercurial is a distributed SCM tool we are using a remote repo to keep changes we make backed up on a server but we are finding a few teething troubles.
For example, when two or three of us work on our local repo's, we commit then push to the remote repo, we find that a lot of heads(?) are created. This confused the hell out of us and we had to do some merging etc to sort it out.
What is the best way to avoid so many heads and to keep a remote repo in sync with a number of developers?
Today, i've been working like this:
Change a File.
Pull from remote repo.
Update local working copy.
Merge? (why?)
Commit my changes to local repo.
Push to the remote repo.
Is this the best proceedure?
Although this has worked fine today, i can't help that feeling that i'm doing it wrong! To be honest i don't understand why merging even needs to be done at the pull stage because other people are working on different files?
Other than to tell me to RTFM have you any tips for using Mercurial is such a way? Any good online resources for information on why we get so many heads?
NOTE: I have read the manual but it doesn't really give much detail and i don't think i want to start another book at the minute.
You should definitely find some learning resources.
I can recommend the following:
hginit.com
Tekpub: Mercurial
As for your concrete question, "is this the best procedure", then I would have to say no.
Here's some tips.
First of all, you don't need to stay "in sync" with the central repository at all times. Instead, follow these guidelines:
Push from your local repository to the central one when you're happy with the changes you've committed. Remember, this can be several changesets
Pull if you need changes others have done right away, ie. there's a bugfix a colleague of yours has fixed, that you need, in order to continue with your own work.
Pull before push
Merge any extra heads you pulled down with your own changes, before you push, or continue working
In other words, here's a typical day.
You pull the latest changes when you come in in the morning, so that you got an up to date local clone. You might not always do this, if you're in the middle of bigger changes that you didn't finish yesterday.
Then you start working. You commit small changesets with isolated changes That isn't to say that you split up a larger bugfix into many smaller commits just because you modify multiple files, but try to avoid fixing more than one bug at a time, or implementing more than one feature at a time. Try to stay focused.
Then, when you're happy with all the changesets you've added locally, you decide to push to the server. When you try to do this, you get an abort message saying that extra heads would be pushed to the server, and this isn't allowed, so the push is aborted.
Instead you pull. This can always be done, but will of course now add extra heads in your local clone, instead of at the server.
Then you merge, the extra head that you got from the server, with your own head, the one that you created during the day by committing new changesets to your clone. You resolve any merge conflicts.
Then you push, and now it should succeed. On the off chance that someone has managed to push more changesets to the central repository while you were busy merging, you will get another abort and have to rinse and repeat.
The history will now show multiple parallel branches of development, but should always stay at max 1 head in your central repository. If, later on, you start using named branches, you can have 1 head per named branch, but try to avoid this until you get the hang of just the default branch.
As for why you need to merge? Well, Mercurial always work with revisions that are snapshots of the entire project, which means two branches, even though they contain changes to different files, are really considered two different versions of the entire project, and you need to tell Mercurial that it should combine them to get back to one version.
For one, you can pull at any time; pulling does just add changesets to your repo, but not change your local working files (except if you have enabed the post-pull update).
Merging is necessary if someone else has commited changes to the same branch you're currently working on. This created an implicit branch, and merging merely brings them back together. You can see this nicely with the "railroad track" in the repository view. Basically, as long as you don't merge, you stay on your own "private" track, and when you want to add your changes (can be any amount of changesets) you merge it back into the destination branch (typically "default"). It's painless - nothing like merging in older SVN versions!
So the workflow is not as rigid as you displayed it; it's more like this:
Pull as much as you like
Make changes and commit locally as often as you like
When your changes should be integrated, merge with the destination branch (can be a lower revision than the newest), commit and push
This workflow can be tuned somewhat, for instance by using named branches and sometimes by using rebase. However, you and your team should decide on the workflow to be used; Mercurial is quite flexible in this regard.
http://hginit.com has a good tutorial.
In particular, you'll find the list of steps you have here: http://hginit.com/02.html (at the bottom of the page)
The difference between those steps and yours is that you should commit after step 1. In fact you will typically commit several times on your local repository before moving onto the pull/merge/push step. You don't need to share every commit with the rest of developers right away. It'll often make sense to do several related changes and then push that whole thing.

Mercurial repositories with many active developers?

I'm going through Bitbucket and I can't seem to find any Mercurial repositories that look like what I suspect our repository would look like, provided we switch to Mercurial.
As such, I'm wondering, is there a workflow that we're not considering here?
The thing I'm talking about is that I did a small automated test. We're 14 people that work on the same project, split into 4 scrum teams. To simulate 14 (I picked 10, round number) people working in parallel on the code, using Mercurial DVCS, pushing to the same central master repository, I wrote a script.
I created a new "master" repository, and then cloned it for 10 virtual people
I then ran a 1000 iteration loop, picking a random clone, and doing one of the following:
10% of the time, do a pull from master, merge, commit merge, and push
90% of the time, do a local change and commit
Note that I ensured that there would never be merge conflicts by simply making each virtual person work on his own file.
This would simulate people working locally by doing 1+ commits before pulling, merging, and pushing (to avoid 2+ heads in the master repo). It might be that this workflow is wrong.
This is a sample of what the repository now looks like (screenshot + link to repo):
The repository can be found here: http://hg.vkarlsen.no/hgweb.cgi/parallel_test/graph. Unfortunately this repository is no longer available and I no longer have a copy of the code due to an unfortunate backup incident, but this was just an example for people to visit, it should not be important any more
This looks awfully messy, and as I said, I can't seem to find any repositories that have similar history. By "messy", I mean that it looks like older history of the project will almost always have 10 parallel branches. Close to the top, it tapers off of course, but it will expand as people that are currently working in their local repository pushes to the master.
So I have two questions:
Can anyone show me a repository that has similar history? Since I can't seem to find any, I'm starting to wonder about what kind of conclusions I can draw from that...
Is there something wrong with our workflow (that is, the workflow I've laid out here)? Should we rebase/squash/transplant, delegate push responsibility to one person, other things, instead of the way it was done here?
Impressive preparation!
It always looks messy if you go back a bit and look at all old commits at the same time. It always tapers of, even looking at a small bit old history. See http://hg.intevation.org/mercurial/crew/graph/12402?revcount=120 for instance. This is not the most recent commit, but shows all history up to that commit.
Rebase helps quite a lot, especially if persons are working on separate areas. (I usually check the incoming commits to see if there are potential file or functionality conflicts, and if not, I do rebase.)
Rebase is not fool-proof though, so merge is the preferred "safe" action, but it leaves more "garbage" in the history. A trade-off.
Rebase is sort-of like the bog standard SVN update. The existing stuff is made baseline and your changes go on top, cross your fingers it still works. It's useful, but there are times when you feel safer having yours, theirs and the merge as separate commits in the history.
There is also commit-squashing as an option (histedit extension maybe), which squashes all in-between commits to one. This is useful when you're about to push and want to transferring many partials commits in your own repo as a single commit to the main.
I have 12 developers working in the same Mercurial repository at work, and our history looks nothing like that. There are occasional merge commits, but most merges are from merging actual branches, i.e there might be a merge in our main development branch bringing in changes from a bugfix release made on the production/release branch.
This is very easy to achieve, developers hack and commit to their local repository and when they have something stable enough to share with the rest of the team they push.
If nothing has been committed since they started committing the push goes through without problems.
If someone else has committed a change, Mercurial complains that the push will create remote heads. The developer then does a hg pull --rebase and retries the push. The push goes through and everyone is happy.
If you are using continuous integration with developers regularly pushing to a shared repository, this is the way to go. Knowing whether you have pushed changes or not is easy and you avoid lots of useless merge commits cluttering up your history.