Ponderings of a Subversion User: What is a "branch" in Mercurial terms?

Ponderings of a Subversion User: What is a "branch" in Mercurial terms? - mercurial

I'm a Subversion user, and I think I've got my head mostly around it all now. So of course now we're thinking of switching to Mercurial, and I need to start again.
In our single repository, we have the typical branches, tags, trunk layout. When I want to create a feature branch I:
Use the repo browser to copy trunk to branches/Features/[FeatureName].
Checkout a new working copy from branches/Features/[FeatureName].
Start working on it.
Occasionally commit, merge trunk in, resolve conflicts and commit.
When complete, one more merge of trunk, then "Reintegrate" the feature branch into trunk.
(Please note this process is simplified as it doesn't take into account release candidate branches etc).
So I have questions about how I'd fulfil the same requirements (i.e. feature branches rather than working on trunk) in Mercurial:
In Mercurial, is a branch still within the repository, or is it a whole new local repository?
If we each have a copy of the whole repository, does that mean we all have copies of each other's various feature branches (that's a lot of data transfer)?
I know Mercurial is a DCVS, but does that mean we push/pull changes from each other directly, rather than via a peer repository on a server?

I recommend reading this guide
http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial//

In Mercurial, is a branch still within
the repository, or is it a whole new
local repository?
The equivalent of the subversion way of working would be a repository with multiple heads in mercurial. However, this is not the idiomatic way of doing things. Typically you will have only one head in a given repository, so separate repositories for each branch.
If we each have a copy of the whole
repository, does that mean we all have
copies of each other's various feature
branches (that's a lot of data
transfer)?
Yes, if you look at the history of the head of your local repository, then you'll be able to see all the feature branches that were merged in. But mercurial repositories are remarkably space efficient. For example, I have done a hg clone https://www.mercurial-scm.org/repo/hg to get the source for mercurial itself, and it is only 34.3 MB on an NTFS file system (compared to the source code download, which is 1.8 MB). Mercurial will also make use of hardlinks if your file system supports it, so there is little overhead if you clone a repository to another location on the same disk.
I know Mercurial is a DCVS, but does
that mean we push/pull changes from
each other directly, rather than via a
peer repository on a server?
One way of working is indeed to have each developer expose a public repository in which he pushes his own changes. All other developers can then pull what they want.
However, typically you'll have one or more "blessed" repositories where all the changes are integrated. All developers then only need to pull from the blessed repository. Even if you didn't explicitly have such a blessed repository I imagine people would automatically organize themselves like that, e.g. by all pulling from a lead developer.

Steve Losh's article on branching in mercurial linked above is fantastic. I also got into some explaining of branching and how the DAG works in a presentation I gave a couple of months ago on mercurial that's out on slideshare. The pertinent slides start at slide #43.
I think that understanding that all commits to the same repository are stored in a DAG (Directed Acyclic Graph) with some simple rules really helps demystify what's going on.
a node with no child nodes is a "head"
the root node has no parents
regular nodes have a single parent
nodes that are the result of a merge have two parents
if a merge node's parents are from different branches, the child node's branch is inherited from the first parent
Named branches are really just metadata labels on commits, but really aren't any different than the anonymous branches that happen when you merge someone elses work into your repository, or if you go back to an earlier version and then make a commit there to make a new head (which you can later merge).

Related

Synchronizing actual version of Mercurial repository for multiple workplaces

I have three different Linux-based working places, each with a different computer. I need to have a repository synchronized to keep coding on the latest version each time I move from a workplace to another. You can always commit and push to, say, bitbucket and then pull from another computer, but this is not the purpose of a commit.
Other similar posts did not help, like Synchronizing a collection of Mercurial repositories.
Any suggestion?

Your two primary options for exchanging temporary work between repositories are Mercurial Queues and the evolve extension.
Mercurial Queues are documented fairly extensively here. To use them for your purpose, you have to put the patches under version control (explained near the bottom of the chapter) and can then push them to/pull them from a shared patch repository. Note that the book is a few years old and Mercurial has added some convenience features in the meantime. These days you can do operations on the patch repository directly via the --mq option (e.g., hg init --mq, hg commit --mq, hg push --mq) and don't need a bash alias for convenience.
Evolve is probably more intuitive; it provides a fairly straightforward approach to shared mutable history. You can commit changes in one repository, push the changes to a shared repository, pull from another and uncommit or alter them, then push them back.
In order to set this up, you need a shared repository somewhere that is declared as non-publishing. You do this by adding the following lines to its .hg/hgrc:
[phases]
publish = False
This prevents changesets exchanged through this repository from becoming public (at which point, they'd become immutable).
You will also need to install the extension first (unlike MQ, which is part of core Mercurial).
Note that Bitbucket currently does not support obsolescence markers, which are crucial for the functioning of changeset evolution, so you will need to host the shared repository in a different place. Evolve functions not by deleting outdated changesets, but by marking them as obsolete and hiding them (obsolescence markers also track how old and new changesets are related). Because Bitbucket does not support these markers, obsolete changesets will become visible again if pushed there. (Note that you can still use evolve locally or between evolve-aware repositories and use Bitbucket for public stuff.)

Slightly different ways:
Handwork
MQ with MQCollab extension
Commits with "classic" exchange between repos using MuliRepo extension (just don't forget hg pull on every workplace before pull - and add all remote repos into [multirepo] section on each workplace)
Automated way
Create additional "central hub" and use AutoSync extension

Mercurial: devs work on separate folders, why do they have to merge all the time

I have four devs working in four separate source folders in a mercurial repo. Why do they have to merge all the time and pollute the repo with merge changesets? It annoys them and it annoys me.
Is there a better way to do this?

Assuming the changes really don't conflict, you can use the rebase extension in lieu of merging.
First, put this in your .hgrc file:
[extensions]
rebase =
Now, instead of merging, just do hg rebase. It will "detach" your local changesets and move them to be descendants of the public tip. You can also pass various arguments to modify what gets rebased.
Again, this is not a good idea if your developers are going to encounter physical merge conflicts, or logical conflicts (e.g. Alice changed a feature in file A at the same time as Bob altered related functionality in file B). In those cases, you should probably use a real merge in order to properly represent the relevant history. hg rebase can be easily aborted if physical conflicts are encountered, but it's a good idea to check for logical conflicts by hand, since the extension cannot detect those automatically.

Your development team are committing little and often; this is just what you want so you don't want to change that habit for the sake of a clean line of commits.
#Kevin has described using the rebase extension and I agree that can work fine. However, you'll also see all the work sequence of each developer squished together in a single line of commits. If you're working on a stable code base and just submitting quick single-commit fixes then that may be fine - if you have ongoing lines of development then you might not won't want to lose the continuity of a developer's commits.
Another option is to split your repository into smaller self-contained repositories.
If your developers are always working in 4 separate folders, perhaps the contents of these folders can be modularised and stored as separate Mercurial repositories. You could then have a separate master repository that brought all these smaller repositories together within the sub-repository framework.

Mercurial is distributed, it means that if you have a central repository, every developer also has a private repository on his/her workstation, and also a working copy of course.
So now let's suppose that they make a change and commit it, i.e., to their private repository. When they want to hg push two things can happen:
either they are the first one to push a new changeset on the central server, then no merge will be required, or
either somebody else, starting from the same version, has committed and pushed before them. We can see that there is a fork here: from the same starting point Mercurial has two different directions, thus a merge is required, even if there is no conflict, because we do not want four different divergent contexts on the central server (which by the way is possible with Mercurial, they are called heads and you can force the push without merge, but you still have the divergence, no magic, and this is probably not what you want because you want to be able to checkout the sum of all the contributions..).
Now how to avoid performing merges is quite simple: you need to tell your developers to integrate others changes before committing their own changes:
$ hg pull
$ hg update
$ hg commit -m"..."
$ hg push
When the commit is made against the latest central version, no merge should be required.
If they where working on the same code, after pull and update some running of tests would be required as well to ensure that what was working in isolation still works when other developers work have been integrated. Taking others contributions frequently and pushing our own changes also frequently is called continuous integration and ensures that integration issues are discovered quickly.
Hope it'll help.

HgFlow and multiple developers

I really like the Hg Flow for Mercurial repositories. we are currently using Bitbucket, and in each product multiple developers are working. basically they can work as below:
a team might work on a single feature.
another team might work on a release/hot fix.
So do i keep the "develop" branch in BitBucket or local repositories. and how about feature branches, should i push them to the central repository and remove when required. i assume we should do so right?
Thanks

I personally neither use git flow or hg flow as tools, but I do use some of the methods for my own projects (manually).
Before going into detail, you always need to provide branches in the main/bitbucket repository when multiple people need to merge or branch from them.
This definately includes "develop" and probably also features/fixes multiple people need to work on (unless you have another repository or method to exchange branches/commits between them)
The difference between using git and mercurial/hg is relevant here, since the branching models are quite different.
See A Guide to Branching in Mercurial for details. Using hg bookmarks would be quite similar to what git does with branches, but there is no full support for the bookmark branching model on BitBucket (see this ticket).
hg flow (the tool) uses named branches. In contrast to git branches, these are not at all light-weight, but permanent and global (they can at least be closed now).
This means whenever any commit created on any (named) branch other than "default" is pushed to bitbucket (even after merging) this will create the branch in the bitbucket repository.
So you don't have any other choice than keeping all branches in the main repository.
However, You can decide when to push and when to close these.
I would advise using hg push -r to push only the branches/heads you want to push and only pushing these when they are either needed by somebody else or finished and merged.
Branches should be closed as soon they are not needed anymore. (This is probably done by hg flow automatically)
You should close branches locally whenever possible. This way they might not even appear in the bitbucket interface. Some might reach the bitbucket repository only in closed state (which hides them from the interface).
Obviously you should often push any branches multiple people need to merge from.
In my understanding of the workflow the "develop" branch is always exactly one branch per project that should be pushed frequently (after local testing).
In case you are either not using hg-flow or named branches things are a bit different.
Both, using forks/clones or bookmarks as a branching method doesn't generate permanent or necessarily global branches.
Like mentioned above, you can't use bookmarks (reliably) when you also want to use bitbucket pull requests. You have to push bookmarks separately. A normal push will only update (a head of) the branch so you might miss commits from other team members when marging later. Hg will tell you when a new head is created. In that case you might want to merge the branch with the remote bookmark into your branch before pushing.
When using forks as branches it works a bit like with bookmarks, but bitbucket has full support for that. You need to have a new fork on bitbucket for every branch.
You naturally only want to create extra forks if you need different people to work on it and you don't have other means of commit exchange for them. You will need at least a separate "develop" repository then.
I personally wouldn't use the full "flow" with hg on bitbucket.
For my projects the "develop" branch is the same as master/default, since I don't roll out releases with git (other than development builds, that wouldn't use the release branch anyways). I don't need a separate "production" branch, since tags can mostly be used for production usage.
I also don't create a separate "release-preparation" branch. There is only a point in time when I only apply bugfixes on develop and stop merging features. That obviously won't work when you need to work at the same time on features that are dependendant on features not to be released in the next release.
Always using the full "git flow" is easy because git branching is easy and light-weight.
Depending on the branching model you use and how supportive the other tools are,
using the full "hg flow" might not be "worth it".
The hg guide actually discourages use of named branches for short-lived branches.
See Feature separation through named branches.
The "easy" branching concept promoted in the guide is forking/cloning. Bookmarks would be the natural way to translate git flow if the tool/bitbucket support would be better (and bookmarks longer a core hg feature).
Disclaimer:
I prefer git when I can choose. I do use hg, but not as my personal choice.
You also might have considered most of this, but since you didn't state any of these details and accept an answer (in the comments) that is quite different to what you are asking, I wanted to elaborate a bit.
Edit:
To follow-up on the comments:
I think hg bookmarks are comparable to git branches because both are just movable pointers to commits.
The main difference is, that when you delete a branch in git, the commits are possibly lost (when not part of other branches or pointed to in a another branch before they are garbage collected). When you delete a bookmark in hg, then the commits are still part of the repository (part of the (named or default) branch) unless manually stripped.
Anonymous heads are related, but only as something the bookmarks point to. Without bookmarks pointing to them the anonymous heads are not usable as a branch to work with (for more than just a local merge) and share. When you have anonymous heads in a repository you don't know what they are supposed to be or where they came from, unless you remember or have other clues. In my eyes anonymous heads are only a workaround for late implementation of bookmarks and no good implementation of remotes/remote heads.
Named branches are rather unrelated, as the only thing they have in common with git branches is having a name. They are light-weight in comparision to cloning the whole repository (forking as branch model), but not in terms of "you can't get rid of them". They are permanent.
Most places tell you not to use named branches unless you have a very good reason or it is a long-running branch.

Mercurial and code reviews; good workflow?

I'm in a small distributed team using Mercurial for a central repository. We each clone it via ssh onto our own linux boxes. Our intent is to review each others' work before pushing changes up to the central repository, to help keep central's tip clean. What is a good way to share code between developers on different linux boxes? I'm new to Mercurial. The options I can think of (through reading, not experience) are:
1: Author commits all local changes and updates working clone with central's tip. Author uses hg bundle, if there's a way to specify which local revs to include in the bundle. (an experiment showed me "bundle" only grabs uncommited changes, even if there are previous local commits that central doesn't know about) Author gets bundle file to reviewer. Reviewer creates a new clean clone from central's tip, and imports the bundle into that clone.
or,
2: After both author and reviewer fetch from central's tip, author uses patch and reviewer imports the patch.
or,
3: Author pushes to reviewer or reviewer pulls from author (but how, exactly? What I read is only about pushing and pulling to/from the original served repository, and/or on the same box instead of between different linux boxes.)
4: Forget reviewing the code prior to pushing to central; go ahead and push, using tags to identify what's been reviewed or not, and use Hudson (already working) to tag the latest safe build so team members can know which one to pull from.
If your team uses Mercurial and does code reviews, how do you do get the reviewer to see your changes?

Most of these are possible, some are more tedious than others.
You can use bundle by specifying the tip of the central repo as the --base:
hg bundle --base 4a3b2c1d review.bundle
Might as well just use bundle. That way, the changeset data is also included.
You can push (and pull) to (from) any repository that has a common ancestor(s). If you want to pull from one of your colleagues, they just need to run hg serve on their machine, and you will be able to pull.
This also works, but you will have to maintain multiple heads and be careful about merging. If you don't, it can become easy to base a stable change on top of an unreviewed changeset, which will make it hard to undo if you need to fix that unreviewed changeset later.
Of the options you presented, #1 and #3 are probably easiest, just depending on whether or not you can reach each other's boxes.
On a related note: This is the question that got my colleague and I started on developing Kiln, our (Fog Creek's) Mercurial hosting and code review tool. Our plan, and the initial prototype, would keep multiple repositories around, one "central" repository, and a bunch of "review" repositories. The review process would be started by cloning the central repo into a review repo on the server, and then running a full repo diff between the two, with a simple web interface for getting and viewing the diffs.
We've evolved that workflow quite a bit, but the general idea, having a branch repo to push unreviewed changes to and an interface to review them before you push them into the central repo, is still the same. I don't want to advertise here, but I do recommend giving it a try.

Half answer to this question is using ReviewBoard with Mercurial extention. It allows to push certain revisions for review by issuing the following command
hg postreview tip

I'll add a 5th option - do all development work on named branches, preferably one per task. Allow anything to be committed to a "development" named branch, whether it's in working state or not.
Push to the central repository, have reviewer pull the branch. Perform the review on the branch.
When the review has passed, merge the development work into the appropriate feature branch.
This workflow, which is (to me) surprisingly unpopular, has many advantages:
All work gets committed - you do not have to wait for a review to be done before committing.
You will not build off the wrong version. You only ever build from the feature branch.
In-progress work does not interfere with other developers.
From the development branch, you can either look at the latest changes (e.g. the changesets addressing review comments), compare with the branch point, or compare with the latest feature branch - all of which can give useful information.

Mercurial Branch Repositories with SUBREPOS

I'm trying to determine how people use "branch repositories" while also using subrepos.
Let's say I have repo Main containing a solution file (.NET), and populated with subrepos A, B, C:
/Main
- A
- B
- C
MainSolution.sln
A, B, and C, while being shared between other "Main" repos, are very tightly integrated into Main project. Thus, a major feature to the Main repo will require modifications to the subrepos (i.e., they are shared libraries, but are very actively developed).
Now it is time to add a feature. This feature is too big for one person to handle, and thus the code will need to be pushed to the central repo so others can help. We'd also need to be able to go back to the last "stable" code before the feature development began in case a bugfix is needed. I believe I have two options at this point: (1) create a named branch in the Main repo, or (2) create a new clone of Main. Since there are subrepos, both of these options have repercussions not present typically.
Option 1) Creating a named branch will, I presume, allow modifications to the subrepos to be committed/pushed, but only other people who have also updated to that branch in their clone of Main will be affected, since the .hgsubstate file is tracked. However, the subrepos will get a new head, and thus the (possibly) experimental feature would end up getting pushed to the central repo. Am I understanding this correctly?
Option 2) There are numerous advocates for the "don't use named branches, use 'branch repositories'", which are literally clones of the main repo, but named differently and existing on the central server. This is a little appealing to me, as it seems to keep things separated (and thus detached from disaster as co-workers --and myself!-- are still learning Mercurial). But this workflow seems completely broken when subrepositories are involved, since creating a clone of the Main repo does not create new, separated clones of the subrepos. It's a new clone, but it's still pointing at the same subrepos, and thus changes made to them will find their way back into the subrepos! I realize this is by design, and it's one of the really cool things (to me) about Mercurial. But how on earth do people use this branch repository workflow with subrepositories? It is completely inconceivable that, for each feature/experiment/version/whatever, I'm going to create a new clone (on the central server) of the Main repo, AND create clones (on the central server) of the subrepos, AND modify all the .hgrc/.hgsub paths to point to the proper central repos.
At this point, I'm just trying to understand HOW people work on a complicated project and use subrepos with branch repositories. Any thoughts?

You have other options as well. You could use bookmarks, for example. Since version 1.9, bookmarks can be pushed and pulled, they're not just local anymore. Since you often don't want a development "branch" to stick around as a named branch after that new feature is completed, bookmarks are often a better choice for that kind of thing. I tend to use bookmarks for new development and save real branches for released versions.
You should also be aware that subrepositories don't have to be shared between multiple main repositories in the way you describe. You can actually have the subrepositories stored inside a main repository (as opposed to having them at the same level as the main repos, or stored in some other location entirely), which would make them unique to each main repository, except you can push and pull from the subrepos in other main repos when you want to share those changes. This is the way I usually do it.
Unfortunately much of this is difficult to explain without a whiteboard, so please let me know if this isn't clear.

I prefer named branches for features that will most likely eventually get merged into the default branch. It is much easier to switch branches than switch repos.
With named branches you never need to worry about accidentally pushing your unstable branch of development into the stable repo. The named branch is already there, but won't be retrieved via an update unless a developer asks for it.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008