Creating multiple heads in remote repository

Creating multiple heads in remote repository - mercurial

We are looking to move our team (~10 developers) from SVN to mercurial. We are trying to figure out how to manage our workflow. In particular, we are trying to see if creating remote heads is the right solution.
We currently have a very large repository with multiple, related projects. They share a lot of code, but pieces of the project are deployed by different teams (3 teams) independent of other portions of the code-base. So each team is working on concurrent large features.
The way we currently handles this in SVN are branches. Team1 has a branch for Feature1, same deal for the other teams. When Team1 finishes their change, it gets merged into the trunk and deployed out. The other teams follow suite when their project is complete, merging of course.
So my initial thought are using Named Branches for these situations. Team1 makes a Feature1 branch off of the default branch in Hg. Now, here is the question. Should the team PUSH that branch, in it's current/half-state to the repository. This will create a second head in the core repo.
My initial reaction was "NO!" as it seems like a bad idea. Handling multiple heads on our repository just sounds awful, but there are some advantages...
First, the teams want to setup Continuous Integration to build this branch during their development cycle (months long). This will only work if the CI can pull this branch from the repo. This is something we do now with SVN, copy a CI build and change the branch. Easy.
Second, it makes it easier for any team member to jump onto the branch and start working. Without pushing to the core repo, they would have to receive a push from a developer on that team with the changeset information. It is also possible to lose local commits to hardware failure. The chances increase a lot if it's a branch by a single developer who has followed the "don't push until finished" approach.
And lastly is just for ease of use. The developers can easily just commit and push on their branch at any time without consequence (as they do today, in their SVN branches).
Is there a better way to handle this scenario that I may be missing? I just want a veteran's opinion before moving forward with the strategy.
For bug fixes we like the general workflow of Mercurial, anonymous branches that only consist of 1-2 commits. The simplicity is great for those cases.
By the way, I've read this, great article which seems to favor Named branches.

You're definitely thinking about this right, and it sounds like you're going down a good path. I'm a branches as clones person, but named branches have come a long way.
Having a central-ish repo to which all named branches are pushed is convenient for control and backups. Teams working on only branch X can easily create their own branch X only repo by doing hg clone -r X central-ish repo.
The best thing you can do to help the teams out is to let them do clones themselves somewhere that's sitting behind a hgwebdir.cgi instance (as, presumably your central-ish repo will be). You'll find not just teams, but sub-teams and pairs of teams will set up their own repos for mini-efforts you never new about. They'll put them on the named branches that make sense to them and merge back into central as appropriate.

I would make the decision if these three projects should go into one repository by the coupling between these projects (and how many patches are interchanged within them). The more independent they are the less are the advantages of having them in one repo (backup and management aside). There are some different kind of setups:
As you showed, one repository, with one branch for the shared code, and one branch for each project. When the projects itself are generated by forking the shared code base care must be taken when merging back to common (cherry-picking). When inside of each project-branch updates to the common-branch are generated as direct ancestors of the common-branch, and get merges into the project-branch, chances are good they can also be merged back into common. But if changes to common are developed on top of the project branch, merging back will require cherry-picking. I don't have experiences with such a setup, but I fear that the merges can get problematic.
one repo for the shared code and one for each project, connected by symlinks or as subrepo. Here care must be taken to not step on each others feed. In my experience this kind of usage has the potential to grow into a very big PITA. OTOH you seem to have this setup already and your fellow developers can work with it.
one repo for shared and one for each project, with the code from the shared one used as internal releases. I would go for this setup when there are not big regular changes on the shared code base.
All these situations can also be combined with customization-branches for each project within the common part. But I would try to minimize the number of currently active branches, since every new branch requires additional care of merges.
I'm sorry to not give a concrete answer, but "The right thing" (TM) depends to much on the local details.

Related

Mercurial: How to synchronize multiple people with the intention to pull/merge/push from/with/to a central repository?

Given a central mercurial respository and mutliple teams with access to this repository, every team for once in a while has to synchronize its changes to the centralized one.
Let's say team A and team B are in the state and have the intention to pull/merge/push to the centralized repository.
A pulls.
B pulls.
A merges
B merges.
B pushes. => A has to pull and merge again before pushing!
Is there a (maybe technical) way to avoid this conflict? Maybe some kind of lock? Or is this something one has to live with? :-)
By the way this "conflict" occures in a smaller scale within the teams, too. But since the team members are in the same room one can solve this by just "shouting" his intentions throughthe room to avoid the conflict.

The simplest way may to use hg pull --rebase when pulling -- this will rebase local changes on top of the changes from the main repository and avoid having too many merges (although this actually isn't a problem for Mercurial, it may be a problem for developers trying to understand the history).
In your example, if you change B merges to B rebases, then A will still have to pull and merge again, but there will be fewer unnamed branches to try to follow.
In any case, whether rebasing or merging following a pull, I would recommend doing an immediate push so that the changes are available to everybody.
But again, I would like to emphasize that this "problematic" scenario is only problematic because of the inconvenience for the developer. This is actually the workflow DVCS's like Mercurial are designed to handle. The "standard" workflow has always been to "pull and merge/rebase" before pushing -- hence Mercurial's warning about creating new heads if you try to push without merging.

Usually you can live with it, IMHO it is common sense to work this way.
You can have a workaround, but it misuses the meaning of branches.
A and b creating branches, leaving the tip/default/master untouched.
Both can commit and push their changes, but this will create branches on the
central repository.
This could cause serious conflicts if the naming is equal, there a naming convention might help.
Someone, mostly the build engineer or system architect has to merge the changes from the different branches.
If both working on different features on the same component, it might be a good way to do so.
I would recommend you to read this chapter http://hgbook.red-bean.com/read/managing-releases-and-branchy-development.html.

Mercurial repositories with many active developers?

I'm going through Bitbucket and I can't seem to find any Mercurial repositories that look like what I suspect our repository would look like, provided we switch to Mercurial.
As such, I'm wondering, is there a workflow that we're not considering here?
The thing I'm talking about is that I did a small automated test. We're 14 people that work on the same project, split into 4 scrum teams. To simulate 14 (I picked 10, round number) people working in parallel on the code, using Mercurial DVCS, pushing to the same central master repository, I wrote a script.
I created a new "master" repository, and then cloned it for 10 virtual people
I then ran a 1000 iteration loop, picking a random clone, and doing one of the following:
10% of the time, do a pull from master, merge, commit merge, and push
90% of the time, do a local change and commit
Note that I ensured that there would never be merge conflicts by simply making each virtual person work on his own file.
This would simulate people working locally by doing 1+ commits before pulling, merging, and pushing (to avoid 2+ heads in the master repo). It might be that this workflow is wrong.
This is a sample of what the repository now looks like (screenshot + link to repo):
The repository can be found here: http://hg.vkarlsen.no/hgweb.cgi/parallel_test/graph. Unfortunately this repository is no longer available and I no longer have a copy of the code due to an unfortunate backup incident, but this was just an example for people to visit, it should not be important any more
This looks awfully messy, and as I said, I can't seem to find any repositories that have similar history. By "messy", I mean that it looks like older history of the project will almost always have 10 parallel branches. Close to the top, it tapers off of course, but it will expand as people that are currently working in their local repository pushes to the master.
So I have two questions:
Can anyone show me a repository that has similar history? Since I can't seem to find any, I'm starting to wonder about what kind of conclusions I can draw from that...
Is there something wrong with our workflow (that is, the workflow I've laid out here)? Should we rebase/squash/transplant, delegate push responsibility to one person, other things, instead of the way it was done here?

Impressive preparation!
It always looks messy if you go back a bit and look at all old commits at the same time. It always tapers of, even looking at a small bit old history. See http://hg.intevation.org/mercurial/crew/graph/12402?revcount=120 for instance. This is not the most recent commit, but shows all history up to that commit.
Rebase helps quite a lot, especially if persons are working on separate areas. (I usually check the incoming commits to see if there are potential file or functionality conflicts, and if not, I do rebase.)
Rebase is not fool-proof though, so merge is the preferred "safe" action, but it leaves more "garbage" in the history. A trade-off.
Rebase is sort-of like the bog standard SVN update. The existing stuff is made baseline and your changes go on top, cross your fingers it still works. It's useful, but there are times when you feel safer having yours, theirs and the merge as separate commits in the history.
There is also commit-squashing as an option (histedit extension maybe), which squashes all in-between commits to one. This is useful when you're about to push and want to transferring many partials commits in your own repo as a single commit to the main.

I have 12 developers working in the same Mercurial repository at work, and our history looks nothing like that. There are occasional merge commits, but most merges are from merging actual branches, i.e there might be a merge in our main development branch bringing in changes from a bugfix release made on the production/release branch.
This is very easy to achieve, developers hack and commit to their local repository and when they have something stable enough to share with the rest of the team they push.
If nothing has been committed since they started committing the push goes through without problems.
If someone else has committed a change, Mercurial complains that the push will create remote heads. The developer then does a hg pull --rebase and retries the push. The push goes through and everyone is happy.
If you are using continuous integration with developers regularly pushing to a shared repository, this is the way to go. Knowing whether you have pushed changes or not is easy and you avoid lots of useless merge commits cluttering up your history.

Moving from Subversion to Mercurial - how to adapt the workflow and staging/integration systems?

We got all psyched about from from svn to hg and as the development workflow is more or less flushed out, here remains the most difficult part - staging and integration system.
Hopefully this question goes a bit further then your common 'how do I move from xxx to Mercurial'. Please forgive long and probably poorly written question :)
We are web shop that does a lot of projects(mainly PHP and Zend), so we have one huge svn repo, with like 100+ folders, each representing a project with it's own tags,branches and trunk of course. On our integration and testing server(where QA and clients look at work results and test stuff) everything is pretty much automated - Apache is set to pick up new projects automatically creating vhost for each project/trunk; mysql migration scripts right there in trunk too and developers can apply them through simple web-interface. Long story short our workflow is this now:
Checkout code, do work, commit
Run update on the server via web interface(this basically does svn up on server on a particular project and also run db-migration script if needed)
QA changes on the server
This approach is certainly suboptimal for large projects when we have 2+ developers working on the same code. Branching in svn was only causing more headaches, well, hence moving to Mercurial. And here is where the question lies - how does one organize efficient staging/integration/testing server for this type of work(where you have many projects, say single developer could be working on 3 different projects in 1 day).
We decided to have 'default' branch tracking production essentially and then make all changes in individual branches. In this case though how can we automate staging updates for each branch? If earlier for one project we almost always were working on trunk, so we needed one DB, one vhost, etc. now we potentially talking about N-databases per project, N-vhost configs and etc. Then what about CI stuff(such as running phpDocumentor and/or unit tests)? Should it only be done on the 'default'? On branches?
I wonder how other teams solve this issue, perhaps some best practices that we're not using or overlooking?
Additional notes:
Probably worth mentioning that we've picked Kiln as a repo hosting service(mostly since we're using FogBugz anyway)

This is by no means the complete answer you'll eventually pick, but here are some tools that will likely factor into it:
repositories without working directories -- if you clone -U or hg update null you get a repository with no working directory (only the .hg). They're better on the server because they take up less room and no one is tempted to edit there
changegroup hooks
For that last one the changegroup hook runs whenever one or more changesets arrive via push or pull and you can have it do some interesting things such as:
push the changesets on to another repo depending on what has arrived
update the receiving repo's working directory
For example one could automate something like this using only the tools described above:
developer pushes five changesets to central-repo/project1/main
last changeset is on branch 'my-experiment' so csets are automatually re-pushed to optionally created repo central-repo/project1/my-experiment
central-repo/project1/my-experiment automatically does hg update tip which is certain to be on the my-expiriment branch
central-repo/project1/my-experiment automatically runs tests in its working dir and if they pass does a 'make dist' that deploys, which might set up database and vhost too
The biggie, and chapter 10 in the mercurial book covers this, is to not have the user waiting on that process. You want the user to push to a repo that contains possibly-okay-code and the automated processed do the CI and deploy work, which if it passes ends up being a likely-okay repo.
In the largest mercurial setup in which I've worked (20 or so developers) we got to the point where our CI system (Hudson) was pulling from the maybe-ok repos for each periodically then building and testing, and handling each branch separately.
Bottom line: all the tools you need to setup whatever you'd like probably already exist, but gluing them together will be one-off sort of work.

What you need to remember is that DVCS (vs. CVCS) introduces another dimension to versioning:
You don't have to rely anymore only on branching (and get a staging workspace from the right branch)
You now have with DVCS the publication workflow (push/pull between repo)
Meaning your staging environment is now a repo (with the full history of the project), checked out at a certain branch:
Many developers can push many different branches to that staging repo: the reconciliation process can be done in isolation within that repo, in a "main" branch of your choice.
Or they can pull that staging branch in their repo and test things out before pushing back.
From Joel's tutorial on Mercurial HgInit
A developer don't necessary have to commit for other to see: the publication process in a DVCS allows for him/her to pull the staging branch first, reconcile any conflict locally, and then push to the staging repo.

Mercurial Branch Repositories with SUBREPOS

I'm trying to determine how people use "branch repositories" while also using subrepos.
Let's say I have repo Main containing a solution file (.NET), and populated with subrepos A, B, C:
/Main
- A
- B
- C
MainSolution.sln
A, B, and C, while being shared between other "Main" repos, are very tightly integrated into Main project. Thus, a major feature to the Main repo will require modifications to the subrepos (i.e., they are shared libraries, but are very actively developed).
Now it is time to add a feature. This feature is too big for one person to handle, and thus the code will need to be pushed to the central repo so others can help. We'd also need to be able to go back to the last "stable" code before the feature development began in case a bugfix is needed. I believe I have two options at this point: (1) create a named branch in the Main repo, or (2) create a new clone of Main. Since there are subrepos, both of these options have repercussions not present typically.
Option 1) Creating a named branch will, I presume, allow modifications to the subrepos to be committed/pushed, but only other people who have also updated to that branch in their clone of Main will be affected, since the .hgsubstate file is tracked. However, the subrepos will get a new head, and thus the (possibly) experimental feature would end up getting pushed to the central repo. Am I understanding this correctly?
Option 2) There are numerous advocates for the "don't use named branches, use 'branch repositories'", which are literally clones of the main repo, but named differently and existing on the central server. This is a little appealing to me, as it seems to keep things separated (and thus detached from disaster as co-workers --and myself!-- are still learning Mercurial). But this workflow seems completely broken when subrepositories are involved, since creating a clone of the Main repo does not create new, separated clones of the subrepos. It's a new clone, but it's still pointing at the same subrepos, and thus changes made to them will find their way back into the subrepos! I realize this is by design, and it's one of the really cool things (to me) about Mercurial. But how on earth do people use this branch repository workflow with subrepositories? It is completely inconceivable that, for each feature/experiment/version/whatever, I'm going to create a new clone (on the central server) of the Main repo, AND create clones (on the central server) of the subrepos, AND modify all the .hgrc/.hgsub paths to point to the proper central repos.
At this point, I'm just trying to understand HOW people work on a complicated project and use subrepos with branch repositories. Any thoughts?

You have other options as well. You could use bookmarks, for example. Since version 1.9, bookmarks can be pushed and pulled, they're not just local anymore. Since you often don't want a development "branch" to stick around as a named branch after that new feature is completed, bookmarks are often a better choice for that kind of thing. I tend to use bookmarks for new development and save real branches for released versions.
You should also be aware that subrepositories don't have to be shared between multiple main repositories in the way you describe. You can actually have the subrepositories stored inside a main repository (as opposed to having them at the same level as the main repos, or stored in some other location entirely), which would make them unique to each main repository, except you can push and pull from the subrepos in other main repos when you want to share those changes. This is the way I usually do it.
Unfortunately much of this is difficult to explain without a whiteboard, so please let me know if this isn't clear.

I prefer named branches for features that will most likely eventually get merged into the default branch. It is much easier to switch branches than switch repos.
With named branches you never need to worry about accidentally pushing your unstable branch of development into the stable repo. The named branch is already there, but won't be retrieved via an update unless a developer asks for it.

Ponderings of a Subversion User: What is a "branch" in Mercurial terms?

I'm a Subversion user, and I think I've got my head mostly around it all now. So of course now we're thinking of switching to Mercurial, and I need to start again.
In our single repository, we have the typical branches, tags, trunk layout. When I want to create a feature branch I:
Use the repo browser to copy trunk to branches/Features/[FeatureName].
Checkout a new working copy from branches/Features/[FeatureName].
Start working on it.
Occasionally commit, merge trunk in, resolve conflicts and commit.
When complete, one more merge of trunk, then "Reintegrate" the feature branch into trunk.
(Please note this process is simplified as it doesn't take into account release candidate branches etc).
So I have questions about how I'd fulfil the same requirements (i.e. feature branches rather than working on trunk) in Mercurial:
In Mercurial, is a branch still within the repository, or is it a whole new local repository?
If we each have a copy of the whole repository, does that mean we all have copies of each other's various feature branches (that's a lot of data transfer)?
I know Mercurial is a DCVS, but does that mean we push/pull changes from each other directly, rather than via a peer repository on a server?

I recommend reading this guide
http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial//

In Mercurial, is a branch still within
the repository, or is it a whole new
local repository?
The equivalent of the subversion way of working would be a repository with multiple heads in mercurial. However, this is not the idiomatic way of doing things. Typically you will have only one head in a given repository, so separate repositories for each branch.
If we each have a copy of the whole
repository, does that mean we all have
copies of each other's various feature
branches (that's a lot of data
transfer)?
Yes, if you look at the history of the head of your local repository, then you'll be able to see all the feature branches that were merged in. But mercurial repositories are remarkably space efficient. For example, I have done a hg clone https://www.mercurial-scm.org/repo/hg to get the source for mercurial itself, and it is only 34.3 MB on an NTFS file system (compared to the source code download, which is 1.8 MB). Mercurial will also make use of hardlinks if your file system supports it, so there is little overhead if you clone a repository to another location on the same disk.
I know Mercurial is a DCVS, but does
that mean we push/pull changes from
each other directly, rather than via a
peer repository on a server?
One way of working is indeed to have each developer expose a public repository in which he pushes his own changes. All other developers can then pull what they want.
However, typically you'll have one or more "blessed" repositories where all the changes are integrated. All developers then only need to pull from the blessed repository. Even if you didn't explicitly have such a blessed repository I imagine people would automatically organize themselves like that, e.g. by all pulling from a lead developer.

Steve Losh's article on branching in mercurial linked above is fantastic. I also got into some explaining of branching and how the DAG works in a presentation I gave a couple of months ago on mercurial that's out on slideshare. The pertinent slides start at slide #43.
I think that understanding that all commits to the same repository are stored in a DAG (Directed Acyclic Graph) with some simple rules really helps demystify what's going on.
a node with no child nodes is a "head"
the root node has no parents
regular nodes have a single parent
nodes that are the result of a merge have two parents
if a merge node's parents are from different branches, the child node's branch is inherited from the first parent
Named branches are really just metadata labels on commits, but really aren't any different than the anonymous branches that happen when you merge someone elses work into your repository, or if you go back to an earlier version and then make a commit there to make a new head (which you can later merge).

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008