Simplest workflow for non-developers using mercurial, working on different files, without having to think about merging? - mercurial

I currently use SVN for a number of things that aren't exactly code, for instance xml files, report templates, miscellaneous files, etc. I have several non-developers who are comfortable using TortoiseSVN for this. They typically work as follows:
Person A - does an SVN Update on the folder of interest to them. Or perhaps just on a single file.
Person A - edits whichever file(s) they're working on. Perhaps add or remove files.
Person B - someone else is probably working on different files at this point
Person A - does an SVN Commit to save their changes to the repository.
Very occasionally they'll hit conflicts where more than one person has edited a file. Almost always this is just because they forgot step #1. Because they're always working on separate files, there are (almost) never real conflicts. As long as they do step #1 first everything works fine.
I'd like to move to Mercurial; however, something holding me back is the prospect of having to 'merge' all the time, because Mercurial looks at the state of the entire repository, not just the files of interest at a particular time. For example, the workflow would be like this:
Person A - does a pull and update on the repository. (let's assume there are no local changes so this is straightforward).
Person A - edits whichever file(s) they're working on. Perhaps add or remove files.
Person B - someone else edits, commits, and pushes a different file at this point
Person A - commits changes. Tries to push. Gets an error about multiple heads.
Person A - does a pull and update. update doesn't work: merge required.
Person A - does a merge. If using TortoiseHg it's a bit confusing working out what to click on to do the merge. I guess this is simpler on the command line, provided there are no complications.
Person A - commits the merge.
Person A - pushes the changes.
My resistance is that there are more steps, and the merge step is somewhat hard to get your head around if you're not a developer. Is there a way I can put these steps together to make the process nice and simple?

"Very occasionally they'll hit conflicts where more than one person has edited a file. Almost always this is just because they forgot step #1. Because they're always working on separate files, there are (almost) never real conflicts. As long as they do step #1 first everything works fine."
If this is the case, why do you want to use a DVCS? Mercurial is great, but the benefits of a DVCS come from the ability to merge and fork and the ease of doing either; if your workflow requires neither, why would you want to switch toolsets?

Sounds like the rebase extension might work for you. The workflow becomes:
hg clone
make changes
hg commit
hg pull --rebase
hg push
The local revisions get "rebased" onto the latest tip on pull, which avoids the merge.
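Note that the rebase extension ships with Mercurial but is disabled by default, so each user needs to enable it once in their .hgrc (the same snippet that comes up again below):
[extensions]
rebase =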

One possible approach is to have a point person who does all the real work of merging. I'm not a big fan of letting everyone push to one shared repository, especially if they don't know what they are doing. An alternative approach is that A has local repository A, B has local repository B, and there is repository S, which combines A and B. Then, don't let A or B push to S. Instead, let an expert pull from A and B and do the merging in S. Then A and B never have to push to S. If they coordinate with the expert, then he/she will already have merged their changes into S by the time they pull updates from S, and so A and B will not have to merge either when pulling. This is actually the default mode in which a DVCS works, since by default all repositories are read-only except by their owner.
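As a rough sketch, the integrator might run something like this from inside S (the repository paths are illustrative):
$ hg pull path/to/repoA        # bring in A's changesets
$ hg pull path/to/repoB        # may add a second head if A and B diverged
$ hg merge                     # combine the heads
$ hg commit -m "integrate work from A and B"
A and B then pull from S, and because the expert has already done the merging there, their own updates stay linear.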

Related

Mercurial: devs work on separate folders, why do they have to merge all the time

I have four devs working in four separate source folders in a mercurial repo. Why do they have to merge all the time and pollute the repo with merge changesets? It annoys them and it annoys me.
Is there a better way to do this?
Assuming the changes really don't conflict, you can use the rebase extension in lieu of merging.
First, put this in your .hgrc file:
[extensions]
rebase =
Now, instead of merging, just do hg rebase. It will "detach" your local changesets and move them to be descendants of the public tip. You can also pass various arguments to modify what gets rebased.
Again, this is not a good idea if your developers are going to encounter physical merge conflicts, or logical conflicts (e.g. Alice changed a feature in file A at the same time as Bob altered related functionality in file B). In those cases, you should probably use a real merge in order to properly represent the relevant history. hg rebase can be easily aborted if physical conflicts are encountered, but it's a good idea to check for logical conflicts by hand, since the extension cannot detect those automatically.
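In day-to-day use, a sketch of the sequence (assuming the extension is enabled as above) looks like this:
$ hg pull
$ hg rebase            # move your local changesets on top of the newly pulled tip
$ hg push
If the rebase runs into conflicts you'd rather not resolve that way, hg rebase --abort restores the pre-rebase state and you can fall back to an ordinary merge.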
Your development team are committing little and often; this is just what you want, so you don't want to change that habit for the sake of a clean line of commits.
Kevin has described using the rebase extension, and I agree that it can work fine. However, you'll also see each developer's sequence of work squashed together into a single line of commits. If you're working on a stable code base and just submitting quick single-commit fixes, that may be fine; if you have ongoing lines of development, you might not want to lose the continuity of a developer's commits.
Another option is to split your repository into smaller self-contained repositories.
If your developers are always working in 4 separate folders, perhaps the contents of these folders can be modularised and stored as separate Mercurial repositories. You could then have a separate master repository that brought all these smaller repositories together within the sub-repository framework.
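As a minimal sketch, the master repository would carry a .hgsub file mapping local paths to the smaller repositories (the paths and URLs here are only placeholders):
templates = https://hg.example.com/templates
reports = https://hg.example.com/reports
shared-xml = https://hg.example.com/shared-xml
Mercurial then records the exact revision of each sub-repository in .hgsubstate whenever you commit in the master repository.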
Mercurial is distributed, which means that even if you have a central repository, every developer also has a private repository on their workstation, plus a working copy, of course.
So now let's suppose they make a change and commit it, that is, to their private repository. When they want to hg push, one of two things can happen:
they are the first to push a new changeset to the central server, in which case no merge is required, or
somebody else, starting from the same version, has committed and pushed before them. There is a fork here: from the same starting point the history has gone in two different directions, so a merge is required even if there is no conflict, because we do not want divergent heads accumulating on the central server. (Mercurial does allow this, by the way: they are called heads, and you can force the push without merging, but the divergence is still there, no magic, and it is probably not what you want, because you want to be able to check out the sum of all the contributions.)
Now, avoiding merges is quite simple: you need to tell your developers to integrate others' changes before committing their own:
$ hg pull
$ hg update
$ hg commit -m"..."
$ hg push
When the commit is made against the latest central version, no merge should be required.
If they were working on the same code, then after the pull and update they would also need to run the tests, to ensure that what worked in isolation still works once other developers' work has been integrated. Pulling others' contributions frequently and pushing your own changes frequently is called continuous integration, and it ensures that integration issues are discovered quickly.
Hope it'll help.

Why does Mercurial make you pull/update/merge for unrelated files?

For larger teams, having to pull/update/merge then commit each time makes no sense to me, specifically when the files that were changed by other developers have nothing to do with my changeset files.
i.e. I change file1.txt, and someone else changes file10.txt. Why must I merge on my computer before being allowed to push?
It makes pushing a big pain, as you have to constantly pull/update/merge if many developers are committing.
Also, it makes your changeset look much larger than it was, since it shows your merges as separate commits.
Mercurial makes you do this because its atomic unit isn't a file but a changeset: a node containing a group of changes. Each changeset is an individual node in history and represents what that person did. This does mean you have to merge even when no common files were changed (which would be a simple automatic merge). These merge nodes are important, since they are part of your repository's history and give Mercurial ancestry information for future merges.
That said, there is an extension you can use that will clean up your history a bit (but it won't resolve your issue with needing to pull before you push). It is called the rebase extension; it ships with Mercurial but is disabled by default. It adds a new argument to pull that looks like:
hg pull --rebase
This will pull new changes and move your local changesets linearly on top of them, without creating a merge changeset. However, I would urge against using this, since you lose information about your repository by rewriting its history. Read this post for information about some issues that this may cause.
Well, you could try using rebase, which will avoid the merge commits, but it is not without its own perils. You can also collapse to one step by doing "hg pull --update", rather than separate hg pull; hg update commands.
As for why you must merge on your computer: this is a direct consequence of mercurial being a distributed version control system. There is no central server which can be considered canonical (unless you create one by convention), so there is no other "place" where the merge could occur. You are the only one who can decide how the information in your repo should be combined with the information in the remote repo. The results of these decisions must be recorded, and that is the origin of the merge commit.
Also, in your example the merge would happen without user interaction since there are no conflicts (the same would be true with rebase), so I don't see why that is a problem.
Because having changes in disjoint files does not guarantee that they are independent.
When you pull in changes, even if they are in files untouched by your local changes, they can cause your local changes to stop working. For example, an interface that your newly written code calls could have been changed.
This is why there is always a merge step in between, so that a human can review the changes, test for issues, and address them before integrating the changes back into the main repository. This step is very important, because skipping it risks blocking all those 50-100 colleagues (which is very expensive).
I would take Lasse’s advice and push less often. Merging isn’t a big deal if you only need to do it twice or thrice a day. Also maybe create smaller team repositories (or branches) that are merged with the main repository daily by a designated person.

Correct (best-practise?) procedure to stay in sync with a remote Mercurial repository?

As former users of Subversion, we've decided to move over to Mercurial for SCM, and it is confusing us a little. Although Mercurial is a distributed SCM tool, we are using a remote repo to keep the changes we make backed up on a server, but we are finding a few teething troubles.
For example, when two or three of us work in our local repos, commit, and then push to the remote repo, we find that a lot of heads(?) are created. This confused the hell out of us, and we had to do some merging etc. to sort it out.
What is the best way to avoid so many heads and to keep a remote repo in sync with a number of developers?
Today, I've been working like this:
Change a File.
Pull from remote repo.
Update local working copy.
Merge? (why?)
Commit my changes to local repo.
Push to the remote repo.
Is this the best procedure?
Although this has worked fine today, I can't help feeling that I'm doing it wrong! To be honest, I don't understand why merging even needs to be done at the pull stage when other people are working on different files.
Other than telling me to RTFM, have you any tips for using Mercurial in such a way? Any good online resources for information on why we get so many heads?
NOTE: I have read the manual, but it doesn't really give much detail, and I don't think I want to start another book at the minute.
You should definitely find some learning resources.
I can recommend the following:
hginit.com
Tekpub: Mercurial
As for your concrete question, "is this the best procedure?", I would have to say no.
Here's some tips.
First of all, you don't need to stay "in sync" with the central repository at all times. Instead, follow these guidelines:
Push from your local repository to the central one when you're happy with the changes you've committed. Remember, this can be several changesets.
Pull if you need changes others have made right away, i.e. a colleague has fixed a bug that you need in order to continue with your own work.
Pull before push.
Merge any extra heads you pulled down with your own changes before you push or continue working.
In other words, here's a typical day.
You pull the latest changes when you come in in the morning, so that you have an up-to-date local clone. You might not always do this, if you're in the middle of bigger changes that you didn't finish yesterday.
Then you start working. You commit small changesets with isolated changes. That isn't to say you should split up a larger bugfix into many smaller commits just because you modify multiple files, but try to avoid fixing more than one bug at a time, or implementing more than one feature at a time. Try to stay focused.
Then, when you're happy with all the changesets you've added locally, you decide to push to the server. When you try to do this, you get an abort message saying that extra heads would be pushed to the server, and this isn't allowed, so the push is aborted.
Instead you pull. This can always be done, but will of course now add extra heads in your local clone, instead of at the server.
Then you merge the extra head that you got from the server with your own head, the one you created during the day by committing new changesets to your clone, and resolve any merge conflicts.
Then you push, and now it should succeed. On the off chance that someone has managed to push more changesets to the central repository while you were busy merging, you will get another abort and have to rinse and repeat.
The history will now show multiple parallel lines of development, but your central repository should always stay at a maximum of one head. If, later on, you start using named branches, you can have one head per named branch, but try to avoid this until you get the hang of just the default branch.
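Condensed into commands, the day sketched above might look roughly like this (the first push failing is expected, not an error on your part):
$ hg pull -u                              # morning: bring your clone up to date
$ hg commit -m "small focused change"     # repeat throughout the day
$ hg push                                 # aborts: it would add an extra head on the server
$ hg pull                                 # fetch the new remote head into your clone instead
$ hg merge                                # combine it with your own head, resolving conflicts
$ hg commit -m "merge"
$ hg push                                 # now succeeds; the central repo is back to one head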
As for why you need to merge? Well, Mercurial always works with revisions that are snapshots of the entire project, which means that two branches, even though they contain changes to different files, are really considered two different versions of the entire project, and you need to tell Mercurial to combine them to get back to one version.
For one, you can pull at any time; pulling just adds changesets to your repo but does not change your local working files (unless you have enabled the post-pull update).
Merging is necessary if someone else has committed changes to the same branch you're currently working on. This creates an implicit branch, and merging merely brings them back together. You can see this nicely with the "railroad track" in the repository view. Basically, as long as you don't merge, you stay on your own "private" track, and when you want to add your changes (which can be any number of changesets) you merge them back into the destination branch (typically "default"). It's painless - nothing like merging in older SVN versions!
So the workflow is not as rigid as you displayed it; it's more like this:
Pull as much as you like
Make changes and commit locally as often as you like
When your changes should be integrated, merge with the destination branch (can be a lower revision than the newest), commit and push
This workflow can be tuned somewhat, for instance by using named branches and sometimes by using rebase. However, you and your team should decide on the workflow to be used; Mercurial is quite flexible in this regard.
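As a rough sketch of that integration step, assuming your work sits on a head you want to fold into the default branch (names are just examples):
$ hg pull                        # get the latest destination head
$ hg update default              # stand on the branch you're integrating into
$ hg merge                       # or: hg merge <rev> to merge from a specific revision
$ hg commit -m "merge feature work into default"
$ hg push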
http://hginit.com has a good tutorial.
In particular, you'll find the list of steps you have here: http://hginit.com/02.html (at the bottom of the page)
The difference between those steps and yours is that you should commit after step 1. In fact, you will typically commit several times in your local repository before moving on to the pull/merge/push step. You don't need to share every commit with the rest of the developers right away. It'll often make sense to do several related changes and then push that whole thing.

How to manage concurrent development with mercurial?

This is a best practice question, and I expect the answer to be "it depends". I just hope to learn more real world scenarios and workflows.
First of all, I'm talking about different changes for the same project, so no subrepo please.
Let's say you have your code base in an hg repository. You start to work on a complicated new feature A, then a complicated bug B is reported by your trusted tester (you have testers, right?).
It's trivial if (the fix for) B depends on A. You simply ci A, then ci B.
My question is what to do when they are independent (or at least it seems now).
I can think of the following ways:
Use a separate clone for B.
Use anonymous or named branches, or bookmarks, in the same repository.
Use MQ (with B patch on top of A).
Use branched MQ (I'll explain later).
Use multiple MQ (since 1.6)
1 and 2 are covered by an excellent blog post by Steve Losh linked from a slightly related question.
The one huge advantage of 1 over the other choices is that it doesn't require any rebuild when you switch from working on one thing to the other, because the files are physically separated and independent. So it's really the only choice if, for example, A and/or B touches a header file that defines a tri-state boolean and is included by thousands of C files (don't tell me you haven't seen such a legacy code base).
3 is probably the easiest (in terms of setup and overhead), and you can flip the order of A and B if B is a small and/or urgent fix. However, it can get tricky if A and B touch the same file(s). It's easy to fix patch hunks that fail to apply if the A and B changes are orthogonal within the same file(s), but conceptually it's still a bit risky.
4 can make you dizzy, but it's the most powerful, flexible, and scalable way. I default to hg qinit -c since I want to mark work-in-progress patches and push/pull them, but it does take a conceptual leap to realize that you can branch in the MQ repo too. Here are the steps (mq = hg --mq):
hg qnew bugA; make changes for A; hg qref
mq branch branchA; hg qci
hg qpop; mq up -rtip^
hg qnew bugB; make changes for B; hg qref
mq branch branchB; hg qci
To work on A again: hg qpop; mq up branchA; hg qpush
It seems crazy to take so many steps, and whenever you need to switch work you must hg qci; hg qpop; mq up <branch>; hg qpush. But consider this: you have several named release branches in the same repository, and you need to work on several projects and bug fixes at the same time for all of them (you'd better get a guaranteed bonus for this kind of work). You'd get lost very quickly with the other approaches.
Now my fellow hg lovers, are there other/better alternatives?
(UPDATE) qqueue almost makes #4 obsolete. See Steve Losh's elegant description here.
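A rough sketch of what that looks like (qqueue requires Mercurial 1.6 or later; the queue names are illustrative):
$ hg qqueue --create projectA    # a dedicated patch queue for project A
$ hg qnew featureA.patch         # work, hg qrefresh as you go
$ hg qpop -a                     # unapply all patches before switching queues
$ hg qqueue --create bugB        # a separate queue for the bug fix
$ hg qqueue --list               # show all queues and which one is active
$ hg qqueue projectA             # switch back to project A's queue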
I would always use named branches, because that lets Mercurial do its job: to keep your project history, and to remember why you made which changes, in what order, to your source code. Whether to have one clone or two sitting on your disk is generally an easy decision, given my working style, at least:
Does your project lack a build process, so that you can test and run things right from the source code? Then I will be tempted to have just one clone, and hg up back and forth when I need to work on another branch.
But if you have a buildout, virtualenv, or other structure that gets built, and that might diverge between the two branches, then doing an hg up then waiting for the build process to re-run can be a big pain, especially if things like setting up a sample database are involved. In that case I would definitely use two clones, one sitting at the tip of trunk, and one sitting at the tip of the emergency feature branch.
It seems there are no more or better choices than the ones I listed in the question, so here they are again.
Use one clone per project.
Pros: total separation, thus no rebuild when switching projects.
Cons: toolchain needs to switch between two clones.
Use anonymous or named branches, or bookmarks, in the same repository.
Pros: standard hg (or any DVCS) practice; clean and clear.
Cons: must commit before switching and rebuild after.
Use MQ with one patch (or multiple consecutive patches) per project.
Pros: simple and easy.
Cons: must qrefresh before switching and rebuild after; tricky and risky if projects are not orthogonal.
Use one MQ branch (or qqueue in 1.6+) per project.
Pros: ultra flexible and scalable (for the number of concurrent projects)
Cons: must qrefresh and qcommit before switching and rebuild after; feels complicated.
Like always, there's no silver bullet, so pick and choose the one right for the job.
(UPDATE) For anyone who's in love with MQ, using MQ on top of regular branches (#2 + #3) is probably the most common and preferable practice.
If you have two concurrent projects with baselines on two branches (for example, the next release and the current release), it's trivial to hop between them like this:
hg qnew; {coding}; hg qrefresh; {repeat}
hg qfinish -a
hg update -r <branch/bookmark/rev>
hg qimport -r <rev>; {repeat}
For the last step, qimport should add a -a option to import a line of changesets at once. I hope Meister Geisler notices this :)
So the question is: at the point when you are told to stop working on feature A and begin the independent feature B, what alternatives are there for managing concurrent development with Mercurial?
Let's look at the problem with concurrency removed, the same way you write threaded code: define a simple workflow for solving any problem given to you, and apply it to each problem. Mercurial will join the work once it's done. So, programmer A will work on feature A, and programmer B will work on feature B. Both just happen to be you. (If only we had multi-core brains.)
I would always use named branches, because that lets Mercurial do its job: to keep your project history, and to remember why you made which changes in what order to your source code.
I agree with Brandon's sentiment, but I wonder whether he overlooked that feature A has not been tested. In the worst case, the code compiles and passes unit tests, but some methods implement the previous requirements and some implement the new ones. A diff against the previous check-in is the tool I would use to help me get back on track with feature A.
Is your code for feature A at a point where you would normally check it in? Switching from feature A to working on feature B is not a reason to commit code to the head or to a branch. Only check in code that compiles and passes your tests. My reason is that if programmer C needs to begin feature C, a fresh checkout of this branch is no longer the best place to start. Keeping your branch heads healthy means you can respond quickly, with more reliable bug fixes.
The goal is to have your (tested and verified) code running, so you want all your code to end up merged into the head (of your development and legacy branches). My point is that I've seen branching used inefficiently: code becomes stale and then unused, and the merge becomes harder than the original problem.
Only your option 1 makes sense to me. In general:
You should think your code works, before someone else sees it.
Favor the head over a branch.
Branch and check-in if someone else is picking up the problem.
Branch if your automated system or testers need your code only.
Branch if you are part of a team, working on a problem. Consider it the head, see 1-4.
With the exception of config files, the build process should be a checkout and a single build command. It should not be any more difficult to switch between clones than for a new programmer to join the project. (I'll admit my project needs some work here.)

best practices in mercurial: branch vs. clone, and partial merges?

...so I've gotten used to the simple stuff with Mercurial (add, commit, diff) and found out about the .hgignore file (yay!) and have gotten the hang of creating and switching between branches (branch, update -C).
I have two major questions though:
If I'm in branch "Branch1" and I want to pull in some but not all of the changes from branch "Branch2", how would I do that? Particularly if all the changes are in one subdirectory. (I guess I could just clone the whole repository, then use a directory-merge tool like Beyond Compare to pick&choose my edits. Seems like there ought to be a way to just isolate the changes in one file or one directory, though.)
Switching between branches with update -C seems so easy, I'm wondering why I would bother using clone. I can only think of a few reasons (see below) -- are there some other reasons I'm missing?
a. if I need to act on two versions/branches at once (e.g. do a performance-metric diff)
b. for a backup (clone the repository to a network drive in a physically different location)
c. to do the pick&choose merge like I've mentioned above.
I use clone for:
Short-lived local branches
Cloning to different development machines and servers
The former use is pretty rare for me - mainly when I'm trying an idea I might want to totally abandon. If I want to merge, I'll want to merge ALL the changes. This sort of branching is mainly for tracking different developers' branches so they don't disturb each other. Just to clarify this last point:
I keep working on my changes and pull my fellow devs' changes, and they pull mine.
When it's convenient for me I'll merge ALL of the changes from one (or all) of these branches into mine.
For feature branches, or longer lived branches, I use named branches which are more comfortably shared between repositories without merging. It also "feels" better when you want to selectively merge.
Basically I look at it this way:
Named branches are for developing different branches or versions of the app
Clones are for managing different contributions to the same version of the app.
That's my take, though really it's a matter of policy.
For question 1, you need to be a little clearer about what you mean by "changes". Which of these do you mean:
"I want to pull some, but not all, of the changesets in a different branch into this one."
"I want to pull the latest version of some, but not all, of the files in a different branch into this one."
If you mean item 1, you should look into the Transplant extension, specifically the idea of cherrypicking a couple of changesets.
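For example, with the extension enabled ([extensions] transplant = in your .hgrc), cherry-picking a couple of changesets onto the current branch looks roughly like this (the revision numbers are illustrative):
$ hg update Branch1              # the branch that should receive the changes
$ hg transplant 123 125          # copy just changesets 123 and 125 onto Branch1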
If you mean item 2, you would do the following:
Update to the branch you want to pull the changes into.
Use hg revert -r <branch you want to merge> --include <files to update> to change the contents of those files to the way they are on the other branch.
Use hg commit to commit those changes to the branch as a new changeset.
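Put together, with illustrative branch and path names, that sequence is:
$ hg update Branch1
$ hg revert -r Branch2 path/to/subdir     # take Branch2's version of those files
$ hg commit -m "bring Branch2's version of path/to/subdir into Branch1"
Naming a path directly has the same effect here as restricting the revert with --include, as described above.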
As for question 2, I never use repository clones for branching myself, so I don't know. I use named branches or anonymous branches (sometimes with bookmarks).
I have another option for you to look into: Mercurial Queues (MQ).
The idea is to have a stack of patches (not commits, "real" patches) on top of your current working directory. Then you can apply and unapply patches: add one, remove it, add another one, and so on. A single patch, or a subset of them, ends up being a new "feature", much as you would otherwise do with branches. After that, you can apply the patch as usual (since it is a change). Branches are probably more useful if you work with somebody else.
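A minimal sketch of that patch stack in action (the patch names are illustrative; MQ must be enabled in your .hgrc):
$ hg qnew feature-a.patch        # start a patch on top of the working directory
$ hg qrefresh                    # fold your current edits into the top patch
$ hg qnew feature-b.patch        # stack a second, independent patch on top
$ hg qpop                        # set the top patch aside
$ hg qpush                       # re-apply it when you come back to it
$ hg qfinish -a                  # turn all applied patches into ordinary changesets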