Mercurial and code reviews; good workflow? - mercurial

I'm in a small distributed team using Mercurial for a central repository. We each clone it via ssh onto our own linux boxes. Our intent is to review each others' work before pushing changes up to the central repository, to help keep central's tip clean. What is a good way to share code between developers on different linux boxes? I'm new to Mercurial. The options I can think of (through reading, not experience) are:
1: Author commits all local changes and updates working clone with central's tip. Author uses hg bundle, if there's a way to specify which local revs to include in the bundle. (an experiment showed me "bundle" only grabs uncommited changes, even if there are previous local commits that central doesn't know about) Author gets bundle file to reviewer. Reviewer creates a new clean clone from central's tip, and imports the bundle into that clone.
or,
2: After both author and reviewer fetch from central's tip, author uses patch and reviewer imports the patch.
or,
3: Author pushes to reviewer or reviewer pulls from author (but how, exactly? What I read is only about pushing and pulling to/from the original served repository, and/or on the same box instead of between different linux boxes.)
4: Forget reviewing the code prior to pushing to central; go ahead and push, using tags to identify what's been reviewed or not, and use Hudson (already working) to tag the latest safe build so team members can know which one to pull from.
If your team uses Mercurial and does code reviews, how do you do get the reviewer to see your changes?

Most of these are possible, some are more tedious than others.
You can use bundle by specifying the tip of the central repo as the --base:
hg bundle --base 4a3b2c1d review.bundle
Might as well just use bundle. That way, the changeset data is also included.
You can push (and pull) to (from) any repository that has a common ancestor(s). If you want to pull from one of your colleagues, they just need to run hg serve on their machine, and you will be able to pull.
This also works, but you will have to maintain multiple heads and be careful about merging. If you don't, it can become easy to base a stable change on top of an unreviewed changeset, which will make it hard to undo if you need to fix that unreviewed changeset later.
Of the options you presented, #1 and #3 are probably easiest, just depending on whether or not you can reach each other's boxes.
On a related note: This is the question that got my colleague and I started on developing Kiln, our (Fog Creek's) Mercurial hosting and code review tool. Our plan, and the initial prototype, would keep multiple repositories around, one "central" repository, and a bunch of "review" repositories. The review process would be started by cloning the central repo into a review repo on the server, and then running a full repo diff between the two, with a simple web interface for getting and viewing the diffs.
We've evolved that workflow quite a bit, but the general idea, having a branch repo to push unreviewed changes to and an interface to review them before you push them into the central repo, is still the same. I don't want to advertise here, but I do recommend giving it a try.

Half answer to this question is using ReviewBoard with Mercurial extention. It allows to push certain revisions for review by issuing the following command
hg postreview tip

I'll add a 5th option - do all development work on named branches, preferably one per task. Allow anything to be committed to a "development" named branch, whether it's in working state or not.
Push to the central repository, have reviewer pull the branch. Perform the review on the branch.
When the review has passed, merge the development work into the appropriate feature branch.
This workflow, which is (to me) surprisingly unpopular, has many advantages:
All work gets committed - you do not have to wait for a review to be done before committing.
You will not build off the wrong version. You only ever build from the feature branch.
In-progress work does not interfere with other developers.
From the development branch, you can either look at the latest changes (e.g. the changesets addressing review comments), compare with the branch point, or compare with the latest feature branch - all of which can give useful information.

Related

Mercurial: devs work on separate folders, why do they have to merge all the time

I have four devs working in four separate source folders in a mercurial repo. Why do they have to merge all the time and pollute the repo with merge changesets? It annoys them and it annoys me.
Is there a better way to do this?
Assuming the changes really don't conflict, you can use the rebase extension in lieu of merging.
First, put this in your .hgrc file:
[extensions]
rebase =
Now, instead of merging, just do hg rebase. It will "detach" your local changesets and move them to be descendants of the public tip. You can also pass various arguments to modify what gets rebased.
Again, this is not a good idea if your developers are going to encounter physical merge conflicts, or logical conflicts (e.g. Alice changed a feature in file A at the same time as Bob altered related functionality in file B). In those cases, you should probably use a real merge in order to properly represent the relevant history. hg rebase can be easily aborted if physical conflicts are encountered, but it's a good idea to check for logical conflicts by hand, since the extension cannot detect those automatically.
Your development team are committing little and often; this is just what you want so you don't want to change that habit for the sake of a clean line of commits.
#Kevin has described using the rebase extension and I agree that can work fine. However, you'll also see all the work sequence of each developer squished together in a single line of commits. If you're working on a stable code base and just submitting quick single-commit fixes then that may be fine - if you have ongoing lines of development then you might not won't want to lose the continuity of a developer's commits.
Another option is to split your repository into smaller self-contained repositories.
If your developers are always working in 4 separate folders, perhaps the contents of these folders can be modularised and stored as separate Mercurial repositories. You could then have a separate master repository that brought all these smaller repositories together within the sub-repository framework.
Mercurial is distributed, it means that if you have a central repository, every developer also has a private repository on his/her workstation, and also a working copy of course.
So now let's suppose that they make a change and commit it, i.e., to their private repository. When they want to hg push two things can happen:
either they are the first one to push a new changeset on the central server, then no merge will be required, or
either somebody else, starting from the same version, has committed and pushed before them. We can see that there is a fork here: from the same starting point Mercurial has two different directions, thus a merge is required, even if there is no conflict, because we do not want four different divergent contexts on the central server (which by the way is possible with Mercurial, they are called heads and you can force the push without merge, but you still have the divergence, no magic, and this is probably not what you want because you want to be able to checkout the sum of all the contributions..).
Now how to avoid performing merges is quite simple: you need to tell your developers to integrate others changes before committing their own changes:
$ hg pull
$ hg update
$ hg commit -m"..."
$ hg push
When the commit is made against the latest central version, no merge should be required.
If they where working on the same code, after pull and update some running of tests would be required as well to ensure that what was working in isolation still works when other developers work have been integrated. Taking others contributions frequently and pushing our own changes also frequently is called continuous integration and ensures that integration issues are discovered quickly.
Hope it'll help.

How to setup a DVCS system that keeps some of the stuff we like about our centralized VCS?

Right now, we have a small team of developers using TFS for our version control. I'm evaluating the possibility of us moving to a DVCS, and am wondering if we'd need to give up some of the stuff we like about our current system if we moved to DVCS, or if we can find a way to support it.
Right now we a Stable branch, and 1 branch for each developer (you can think of each dev's branch as a feature branch that is reused from feature to feature). The stuff we like is:
1) Each time a dev checks in changes to his branch, we do a build and test of everything on the servers, followed by automatic deployment of all projects to that developers test environment.
2) Merging from any dev's branch to Stable is done by me, so that I can have 1 last check on what is happening to our stable branch before committing the changes.
3) If I want to help a dev with something, I can just grab latest from their branch and look at it on my machine.
I'm trying to understand how this could work with a DVCS (specifically we are testing with Mercurial).
I'm hoping to be able to pull off something like this:
1) We setup a central repository, and we create Main and Release branches in addition to 1 branch for each developer.
2) All devs clone the repository to their local machine.
3) All work is done in their personal branch against the local repository.
4) When they are done, the would pull from the central repository, and perform a local forward integration merge from Main to to their branch, to integrate any changes that have happened in the past to Main.
5) They would then push their changes to the central repository.
6) Some CI service would pickup this change, causing a build/test/deploy-to-dev of all our projects in that branch.
7) If everything was ok the dev would shoot me an email saying their branch was ready for merging to Main.
8) I could then merge in their changes, either by somehow connecting directly to the remove repository, or by doing a pull->merge->push.
So to deal with our requests:
1) I'm assuming there is some CI tools that can watch a branch in Mercurial, and kick off a build/test/deploy process (like CC.Net).
2) I can still manage the final merge process from DevX to Main either by connecting to the remote repo, or by pulling, merging and pushing through my local repo.
3) I believe I could either pull changes directly from another dev's repo, or I could just pull from the central repo, and then update my working directory to work on their code.
So do I have this mostly right?
Yep, you have all that right. Regarding you final assumptions:
1) There's a Mercurial Source Control Block for CruiseControl.NET. Another CI server I've heard of in use with Mercurial is Jenkins.
2) Correct. For integration with Main, I would prefer pulling (from either) and merging on my own machine before pushing, rather than merging on the server.
3) Exactly so.
It sounds like your developers are fairly disciplined, but just in case you need better control certain aspects of your operations:
You can use hooks to issue warnings when someone tries to merge their branch to Main. In-process hooks have to be written in Python, but they have access to the Mercurial API that way. You could also place hooks on the server that reject pushes containing a merge to Main not done by certain users.
One way some organizations control integration is a pull-only scenario. Only a few developers can push to the official repository and other developers send them pull requests. The Mercurial book's Chapter 6 covers this a bit, too.
A branch per developer is good. A branch per feature is also useful, allowing each developer to work on multiple things in parallel, then merging each to their branch when done. They just have to remember to close that feature branch before doing so, so the branch name doesn't keep popping up. This can be done with with clones as well, but I find myself preferring named branches since I have to keep work/backup/laptop development clones all synced so I can work on whateve, whenever. I still do expendable work as a clone first.

Is there a way to review someone's code before they push in Mercurial?

I would like to be able to review other developers code before they push it to central repository. The developers are at remote locations so going to their desk is not an option.
Currently they just push and if there are issues they would rollback. But this is not a good approach since someone can pull before they get a chance to rollback.
Mercurial is distributed, and as such should be able to adapt to any workflow. Try designating someone as an integration manager or use the dictator and lieutenants workflow.
How about review repository between the developers and the main repository? Only you push from there to main.
I upvoted kelloti's answer as this is just an expansion of it, but just used tiers of repositories. Have people push their un-reviewed changesets to a needs-review central-ish repository, and have reviewers push reviewed works from there to the needs-QA central-ish repository and have QA people push the release candidate central-ish repositories.
With a distributed version control system you can do a plurality of centralized repos as easily as you can do a plurality of developer repositories.
On my last project, we followed a very branchy development model - every task had a branch named with the task number. Code reviews were performed against the named branch. We explicitly wanted these pushed to the central repository and developers pulled them.
However, no task named branch was merged to the integration branch (in our case default, but it could have been any feature branch) until it had passed code review.
A lot of mercurial developers don't like to use short-lived branches that remain in the repository, but I find it makes it easier to follow the history, especially when looking at the history of a single change - you know that the changes for a particular task are on the associated named branch.
Perhaps using a shelve extension is a good solution? I'm not very familiar with Mercurial but this might work for you.
https://www.mercurial-scm.org/wiki/ShelveExtension

Moving from Subversion to Mercurial - how to adapt the workflow and staging/integration systems?

We got all psyched about from from svn to hg and as the development workflow is more or less flushed out, here remains the most difficult part - staging and integration system.
Hopefully this question goes a bit further then your common 'how do I move from xxx to Mercurial'. Please forgive long and probably poorly written question :)
We are web shop that does a lot of projects(mainly PHP and Zend), so we have one huge svn repo, with like 100+ folders, each representing a project with it's own tags,branches and trunk of course. On our integration and testing server(where QA and clients look at work results and test stuff) everything is pretty much automated - Apache is set to pick up new projects automatically creating vhost for each project/trunk; mysql migration scripts right there in trunk too and developers can apply them through simple web-interface. Long story short our workflow is this now:
Checkout code, do work, commit
Run update on the server via web interface(this basically does svn up on server on a particular project and also run db-migration script if needed)
QA changes on the server
This approach is certainly suboptimal for large projects when we have 2+ developers working on the same code. Branching in svn was only causing more headaches, well, hence moving to Mercurial. And here is where the question lies - how does one organize efficient staging/integration/testing server for this type of work(where you have many projects, say single developer could be working on 3 different projects in 1 day).
We decided to have 'default' branch tracking production essentially and then make all changes in individual branches. In this case though how can we automate staging updates for each branch? If earlier for one project we almost always were working on trunk, so we needed one DB, one vhost, etc. now we potentially talking about N-databases per project, N-vhost configs and etc. Then what about CI stuff(such as running phpDocumentor and/or unit tests)? Should it only be done on the 'default'? On branches?
I wonder how other teams solve this issue, perhaps some best practices that we're not using or overlooking?
Additional notes:
Probably worth mentioning that we've picked Kiln as a repo hosting service(mostly since we're using FogBugz anyway)
This is by no means the complete answer you'll eventually pick, but here are some tools that will likely factor into it:
repositories without working directories -- if you clone -U or hg update null you get a repository with no working directory (only the .hg). They're better on the server because they take up less room and no one is tempted to edit there
changegroup hooks
For that last one the changegroup hook runs whenever one or more changesets arrive via push or pull and you can have it do some interesting things such as:
push the changesets on to another repo depending on what has arrived
update the receiving repo's working directory
For example one could automate something like this using only the tools described above:
developer pushes five changesets to central-repo/project1/main
last changeset is on branch 'my-experiment' so csets are automatually re-pushed to optionally created repo central-repo/project1/my-experiment
central-repo/project1/my-experiment automatically does hg update tip which is certain to be on the my-expiriment branch
central-repo/project1/my-experiment automatically runs tests in its working dir and if they pass does a 'make dist' that deploys, which might set up database and vhost too
The biggie, and chapter 10 in the mercurial book covers this, is to not have the user waiting on that process. You want the user to push to a repo that contains possibly-okay-code and the automated processed do the CI and deploy work, which if it passes ends up being a likely-okay repo.
In the largest mercurial setup in which I've worked (20 or so developers) we got to the point where our CI system (Hudson) was pulling from the maybe-ok repos for each periodically then building and testing, and handling each branch separately.
Bottom line: all the tools you need to setup whatever you'd like probably already exist, but gluing them together will be one-off sort of work.
What you need to remember is that DVCS (vs. CVCS) introduces another dimension to versioning:
You don't have to rely anymore only on branching (and get a staging workspace from the right branch)
You now have with DVCS the publication workflow (push/pull between repo)
Meaning your staging environment is now a repo (with the full history of the project), checked out at a certain branch:
Many developers can push many different branches to that staging repo: the reconciliation process can be done in isolation within that repo, in a "main" branch of your choice.
Or they can pull that staging branch in their repo and test things out before pushing back.
From Joel's tutorial on Mercurial HgInit
A developer don't necessary have to commit for other to see: the publication process in a DVCS allows for him/her to pull the staging branch first, reconcile any conflict locally, and then push to the staging repo.

Ponderings of a Subversion User: What is a "branch" in Mercurial terms?

I'm a Subversion user, and I think I've got my head mostly around it all now. So of course now we're thinking of switching to Mercurial, and I need to start again.
In our single repository, we have the typical branches, tags, trunk layout. When I want to create a feature branch I:
Use the repo browser to copy trunk to branches/Features/[FeatureName].
Checkout a new working copy from branches/Features/[FeatureName].
Start working on it.
Occasionally commit, merge trunk in, resolve conflicts and commit.
When complete, one more merge of trunk, then "Reintegrate" the feature branch into trunk.
(Please note this process is simplified as it doesn't take into account release candidate branches etc).
So I have questions about how I'd fulfil the same requirements (i.e. feature branches rather than working on trunk) in Mercurial:
In Mercurial, is a branch still within the repository, or is it a whole new local repository?
If we each have a copy of the whole repository, does that mean we all have copies of each other's various feature branches (that's a lot of data transfer)?
I know Mercurial is a DCVS, but does that mean we push/pull changes from each other directly, rather than via a peer repository on a server?
I recommend reading this guide
http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial//
In Mercurial, is a branch still within
the repository, or is it a whole new
local repository?
The equivalent of the subversion way of working would be a repository with multiple heads in mercurial. However, this is not the idiomatic way of doing things. Typically you will have only one head in a given repository, so separate repositories for each branch.
If we each have a copy of the whole
repository, does that mean we all have
copies of each other's various feature
branches (that's a lot of data
transfer)?
Yes, if you look at the history of the head of your local repository, then you'll be able to see all the feature branches that were merged in. But mercurial repositories are remarkably space efficient. For example, I have done a hg clone https://www.mercurial-scm.org/repo/hg to get the source for mercurial itself, and it is only 34.3 MB on an NTFS file system (compared to the source code download, which is 1.8 MB). Mercurial will also make use of hardlinks if your file system supports it, so there is little overhead if you clone a repository to another location on the same disk.
I know Mercurial is a DCVS, but does
that mean we push/pull changes from
each other directly, rather than via a
peer repository on a server?
One way of working is indeed to have each developer expose a public repository in which he pushes his own changes. All other developers can then pull what they want.
However, typically you'll have one or more "blessed" repositories where all the changes are integrated. All developers then only need to pull from the blessed repository. Even if you didn't explicitly have such a blessed repository I imagine people would automatically organize themselves like that, e.g. by all pulling from a lead developer.
Steve Losh's article on branching in mercurial linked above is fantastic. I also got into some explaining of branching and how the DAG works in a presentation I gave a couple of months ago on mercurial that's out on slideshare. The pertinent slides start at slide #43.
I think that understanding that all commits to the same repository are stored in a DAG (Directed Acyclic Graph) with some simple rules really helps demystify what's going on.
a node with no child nodes is a "head"
the root node has no parents
regular nodes have a single parent
nodes that are the result of a merge have two parents
if a merge node's parents are from different branches, the child node's branch is inherited from the first parent
Named branches are really just metadata labels on commits, but really aren't any different than the anonymous branches that happen when you merge someone elses work into your repository, or if you go back to an earlier version and then make a commit there to make a new head (which you can later merge).