Just started working with Mercurial a few days ago and there's something I don't understand.
I have an experimental thing I want to do, so the normal approach would be to clone my repository, work on the clone, and if I eventually want to keep those changes, push them to my main repository.
Problem is, cloning my repository takes a lot of time (we have a lot of code), and just compiling the cloned copy would take up to an hour.
So I need to somehow work on a separate line of development while staying in my original working copy.
Enter local branches.
Problem is, just creating a local branch takes forever, and working with them isn't all that fun either. Since moving between local branches doesn't "revert" to the target branch's state, I have to issue hg purge (to remove files that were added in the branch I'm moving from) and then hg update -C (to revert files that were modified in that branch). (Note: I did try the PK11 fork of the local branch extension; even a simple local branch creation crashes with an exception.)
At the end of the day, this is just too complex. What are my options?
There are several ways of working with local branches besides cloning:
Bookmarks
Named branches
Anonymous branches
You may be interested in reading a very insightful guide to branching in Mercurial. I guess the bookmarks extension is the most appropriate way of branching in the context you've described.
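For instance, a minimal bookmark-based workflow might look like this (a sketch only; the bookmark names are placeholders, and on older Mercurial versions you first have to enable the extension in your .hgrc):
[extensions]
bookmarks =
$ hg bookmark main            # mark your current line of development
$ hg bookmark experiment      # start the experimental line at the same changeset
$ hg commit -m "try something"   # advances the active bookmark (experiment)
$ hg update main              # switch the working copy back to the stable line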
Probably not the answer you want to hear, but I would say maybe you should consider splitting your repository into somewhat more manageable chunks :). E.g. do the documentation and design files really need to be included in the same repository? Doesn't that editor tool deserve its own repository? Etc.
Since cloning creates hard links, it's pretty much already as fast as it can get; if cloning already takes 5-10 minutes, a plain file-system copy would be even worse. (Tip: keep in mind hg clone -U if you do not need a working copy; it will be much, much faster.)
Otherwise, yeah, anonymous branches with bookmarks is the usual way to do in-place switching.
How long does cloning a local repo take?
It sounds like your build process might be the weak link to me; perhaps you need something like ccache, so that you can clone and build quickly.
Couldn't you make your experiments on top of your existing clone and, when you want to make some changes to the main line, go back and update to the last revision before you started the experiments?
Related
I have four devs working in four separate source folders in a Mercurial repo. Why do they have to merge all the time and pollute the repo with merge changesets? It annoys them and it annoys me.
Is there a better way to do this?
Assuming the changes really don't conflict, you can use the rebase extension in lieu of merging.
First, put this in your .hgrc file:
[extensions]
rebase =
Now, instead of merging, just do hg rebase. It will "detach" your local changesets and move them to be descendants of the public tip. You can also pass various arguments to modify what gets rebased.
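In practice, the sequence looks something like this (a minimal sketch):
$ hg pull              # fetch the new upstream changesets
$ hg rebase            # replay your local changesets on top of the new tip
$ hg push              # a single head now, so no merge changeset is needed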
Again, this is not a good idea if your developers are going to encounter physical merge conflicts, or logical conflicts (e.g. Alice changed a feature in file A at the same time as Bob altered related functionality in file B). In those cases, you should probably use a real merge in order to properly represent the relevant history. hg rebase can be easily aborted if physical conflicts are encountered, but it's a good idea to check for logical conflicts by hand, since the extension cannot detect those automatically.
Your development team are committing little and often; this is just what you want so you don't want to change that habit for the sake of a clean line of commits.
@Kevin has described using the rebase extension and I agree that can work fine. However, you'll also see the work sequences of all the developers squished together in a single line of commits. If you're working on a stable code base and just submitting quick single-commit fixes, that may be fine; if you have ongoing lines of development, you might not want to lose the continuity of a developer's commits.
Another option is to split your repository into smaller self-contained repositories.
If your developers are always working in 4 separate folders, perhaps the contents of these folders can be modularised and stored as separate Mercurial repositories. You could then have a separate master repository that brings all these smaller repositories together within the sub-repository framework.
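As a rough sketch of how that wiring looks, the master repository contains a .hgsub file mapping each folder to its own repository (the paths and URLs below are made up purely for illustration):
# .hgsub in the master repository
teamA = https://hg.example.com/teamA
teamB = https://hg.example.com/teamB
teamC = https://hg.example.com/teamC
teamD = https://hg.example.com/teamD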
Mercurial is distributed, which means that if you have a central repository, every developer also has a private repository on his/her workstation, plus a working copy of course.
So now let's suppose that they make a change and commit it, i.e., to their private repository. When they want to hg push two things can happen:
either they are the first to push a new changeset to the central server, in which case no merge is required, or
somebody else, starting from the same version, has committed and pushed before them. There is a fork here: from the same starting point Mercurial has two different directions, so a merge is required, even if there is no conflict, because we do not want four different divergent contexts on the central server. (This is, by the way, possible with Mercurial: the divergent lines are called heads, and you can force the push without a merge, but then you still have the divergence, no magic, and this is probably not what you want, because you want to be able to check out the sum of all the contributions.)
Avoiding merges is quite simple: you need to tell your developers to integrate others' changes before committing their own:
$ hg pull
$ hg update
$ hg commit -m"..."
$ hg push
When the commit is made against the latest central version, no merge should be required.
If they were working on the same code, some test runs would also be required after pull and update, to ensure that what was working in isolation still works once the other developers' work has been integrated. Taking others' contributions frequently, and pushing your own changes frequently too, is called continuous integration, and it ensures that integration issues are discovered quickly.
Hope it'll help.
I want to realize the following setup:
AtWork:MercurialRepo <-> Internet:MercurialRepo <-> AtHome:MercurialRepo
Problem is the repository is several gigs. I already have the entire repo at home (through bundling->cdrom->unbundling). The thing is, I do not want to store the whole repository on the internet. Is there a way to temporarily exclude folders from versioning in order to push/pull only a subset of the repo I am working on through the internet? How do I best accomplish my goal? From time to time I would need to do the tedious bundling -> cdrom -> unbundling route, just to update everything else, but in general I do want to use the internet route and do not want to store the whole repo there.
So, as you've found out by now, you can't selectively clone some files from a repository. The best you can do is clone a subset of all branches; but you will get the entire past history of those branches, for all files in the repository. So, unless a lot of the big files are known only in some branches and not others, this won't help you.
Since your problem is the large size of files (rather than a long and bulky history), you probably need to break it down into several "subrepositories" of manageable size. Note that the subset you are interested in cloning must be a subrepository; cloning the main repo necessarily includes the subrepositories. The mercurial subrepository documentation recommends that you make a trivial ("thin shell") main repo, and put all your project code in subrepositories.
Subrepositories are a complex solution, and are considered a "feature of last resort" by the Mercurial team. It's a complex setup; there are various limitations (see the docs), and you'll have the extra complication of trying to convert your repo in a way that preserves file history. So, it's worth considering ways to avoid this:
a) It would be best if you can avoid the middle copy of your repo; is there no way you can set up ssh access or a proxy so that your home repo can talk to your work repo directly? (Or vice versa; it's enough if one of the locations is able to contact the other).
b) You could carry the repo on a USB stick, as @vaclav's answer suggests.
c) Or maybe you should just bite the bullet and clone the entire repo on the internet.
Is there a way to temporarily exclude folders from versioning in order to push/pull only a subset of the repo I am working on through the internet?
Not folders, but some parts of the repo, yes.
You can push -b (only some branch(es)) or push -r (a revision with its ancestors; for the latest work it will be -r tip), but the final size of the transfer depends heavily on the shape of your DAG: with a lot of cross-branch merges you will probably skip only a small part of the changesets.
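For example (the branch name here is just an assumption):
$ hg push -b my-feature    # push only the named branch my-feature
$ hg push -r tip           # push tip plus all of its ancestors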
I have a small idea, a bit different from what you asked, but...
If I had the same issue, I would think of using a USB flash drive as the whole repository (at 10 or 20 gigs it should be cheap). At work you can copy or clone the whole repo to the USB drive, pull new changes from it at home, and after your work at home is done, push it to the repo on the flash drive, then pull it into the repo at work. (I even use temporary commits for unfinished work, which I revert to the working directory and strip, so I can continue where I left off.)
But definitely the easiest way is to try to get some connection to the work servers, or to your machine at work, or to get more space for the repo on the internet. So, just another idea. HTH
It's not really possible. The closest thing would be to use sub-repositories, which will effectively allow you to have only part of your big repo on the net.
As former users of Subversion, we've decided to move over to Mercurial for SCM, and it is confusing us a little. Although Mercurial is a distributed SCM tool, we are using a remote repo to keep the changes we make backed up on a server, but we are finding a few teething troubles.
For example, when two or three of us work on our local repos, then commit and push to the remote repo, we find that a lot of heads(?) are created. This confused the hell out of us and we had to do some merging etc. to sort it out.
What is the best way to avoid so many heads and to keep a remote repo in sync with a number of developers?
Today, I've been working like this:
Change a File.
Pull from remote repo.
Update local working copy.
Merge? (why?)
Commit my changes to local repo.
Push to the remote repo.
Is this the best procedure?
Although this has worked fine today, I can't help feeling that I'm doing it wrong! To be honest, I don't understand why merging even needs to be done at the pull stage when other people are working on different files.
Other than telling me to RTFM, have you any tips for using Mercurial in such a way? Any good online resources for information on why we get so many heads?
NOTE: I have read the manual but it doesn't really give much detail, and I don't think I want to start another book at the minute.
You should definitely find some learning resources.
I can recommend the following:
hginit.com
Tekpub: Mercurial
As for your concrete question, "is this the best procedure", I would have to say no.
Here are some tips.
First of all, you don't need to stay "in sync" with the central repository at all times. Instead, follow these guidelines:
Push from your local repository to the central one when you're happy with the changes you've committed. Remember, this can be several changesets.
Pull if you need changes others have made right away, i.e. there's a bugfix a colleague of yours has made that you need in order to continue with your own work.
Pull before push
Merge any extra heads you pulled down with your own changes, before you push, or continue working
In other words, here's a typical day.
You pull the latest changes when you come in in the morning, so that you have an up-to-date local clone. You might not always do this, for instance if you're in the middle of bigger changes that you didn't finish yesterday.
Then you start working. You commit small changesets with isolated changes. That isn't to say that you should split up a larger bugfix into many smaller commits just because you modify multiple files, but try to avoid fixing more than one bug at a time, or implementing more than one feature at a time. Try to stay focused.
Then, when you're happy with all the changesets you've added locally, you decide to push to the server. When you try to do this, you get an abort message saying that extra heads would be pushed to the server, and this isn't allowed, so the push is aborted.
Instead you pull. This can always be done, but will of course now add extra heads in your local clone, instead of at the server.
Then you merge the extra head that you got from the server with your own head, the one that you created during the day by committing new changesets to your clone. You resolve any merge conflicts.
Then you push, and now it should succeed. On the off chance that someone has managed to push more changesets to the central repository while you were busy merging, you will get another abort and have to rinse and repeat.
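In commands, that rinse-and-repeat cycle is roughly this (a sketch, not the only way to do it):
$ hg push                  # aborts: extra heads would be created on the server
$ hg pull                  # bring the extra head into your local clone instead
$ hg merge                 # combine the server's head with your own
$ hg commit -m "Merge"     # record the merge changeset
$ hg push                  # should succeed now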
The history will now show multiple parallel branches of development, but the central repository should always stay at max one head. If, later on, you start using named branches, you can have one head per named branch, but try to avoid this until you get the hang of just the default branch.
As for why you need to merge? Well, Mercurial always works with revisions that are snapshots of the entire project, which means two branches, even though they contain changes to different files, are really considered two different versions of the entire project, and you need to tell Mercurial that it should combine them to get back to one version.
For one, you can pull at any time; pulling just adds changesets to your repo, but does not change your local working files (unless you have enabled the post-pull update).
Merging is necessary if someone else has committed changes to the same branch you're currently working on. This creates an implicit branch, and merging merely brings them back together. You can see this nicely with the "railroad track" in the repository view. Basically, as long as you don't merge, you stay on your own "private" track, and when you want to add your changes (which can be any number of changesets) you merge them back into the destination branch (typically "default"). It's painless; nothing like merging in older SVN versions!
So the workflow is not as rigid as you displayed it; it's more like this:
Pull as much as you like
Make changes and commit locally as often as you like
When your changes should be integrated, merge with the destination branch (can be a lower revision than the newest), commit and push
This workflow can be tuned somewhat, for instance by using named branches and sometimes by using rebase. However, you and your team should decide on the workflow to be used; Mercurial is quite flexible in this regard.
http://hginit.com has a good tutorial.
In particular, you'll find the list of steps you have here: http://hginit.com/02.html (at the bottom of the page)
The difference between those steps and yours is that you should commit after step 1. In fact, you will typically commit several times to your local repository before moving on to the pull/merge/push step. You don't need to share every commit with the rest of the developers right away; it'll often make sense to do several related changes and then push the whole thing.
I'm going through Bitbucket and I can't seem to find any Mercurial repositories that look like what I suspect our repository would look like, provided we switch to Mercurial.
As such, I'm wondering, is there a workflow that we're not considering here?
The thing I'm talking about is that I did a small automated test. We're 14 people working on the same project, split into 4 scrum teams. To simulate people working in parallel on the code (I picked 10, a round number), using Mercurial DVCS and pushing to the same central master repository, I wrote a script.
I created a new "master" repository, and then cloned it for 10 virtual people
I then ran a 1000-iteration loop, picking a random clone and doing one of the following:
10% of the time, do a pull from master, merge, commit merge, and push
90% of the time, do a local change and commit
Note that I ensured that there would never be merge conflicts by simply making each virtual person work on his own file.
This would simulate people working locally by doing 1+ commits before pulling, merging, and pushing (to avoid 2+ heads in the master repo). It might be that this workflow is wrong.
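For reference, since the original script is lost, here is a rough bash reconstruction of what such a simulation might look like (every name and detail below is an assumption, not the original code):
#!/bin/bash
# Hypothetical reconstruction of the simulation described above.
hg init master
(cd master && echo start > seed.txt && hg add seed.txt && hg commit -u master -m "initial seed")
for i in $(seq 1 10); do
    hg clone -q master "dev$i"
done
for n in $(seq 1 1000); do
    d="dev$(( RANDOM % 10 + 1 ))"
    cd "$d"
    if (( RANDOM % 10 == 0 )); then
        # 10% of the time: pull from master, merge, commit the merge, push
        hg pull -q ../master
        hg merge -q 2>/dev/null && hg commit -q -u "$d" -m "merge"
        hg push -q ../master
    else
        # 90% of the time: each virtual person only ever touches their own
        # file, so merge conflicts cannot occur
        echo "change $n" >> "$d.txt"
        hg add -q "$d.txt" 2>/dev/null
        hg commit -q -u "$d" -m "change $n by $d"
    fi
    cd ..
done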
This is a sample of what the repository ended up looking like (originally a screenshot plus a link to the repo at http://hg.vkarlsen.no/hgweb.cgi/parallel_test/graph; unfortunately that repository is no longer available, and I no longer have a copy of the code due to an unfortunate backup incident, but it was just an example for people to visit, so it should not be important any more).
This looks awfully messy, and as I said, I can't seem to find any repositories that have a similar history. By "messy", I mean that the older history of the project will almost always have 10 parallel branches. Close to the top it tapers off, of course, but it will expand again as the people currently working in their local repositories push to the master.
So I have two questions:
Can anyone show me a repository that has similar history? Since I can't seem to find any, I'm starting to wonder about what kind of conclusions I can draw from that...
Is there something wrong with our workflow (that is, the workflow I've laid out here)? Should we rebase/squash/transplant, delegate push responsibility to one person, other things, instead of the way it was done here?
Impressive preparation!
It always looks messy if you go back a bit and look at all the old commits at the same time. It always tapers off, even when looking at slightly older history. See http://hg.intevation.org/mercurial/crew/graph/12402?revcount=120 for instance. This is not the most recent commit, but it shows all history up to that commit.
Rebase helps quite a lot, especially if persons are working on separate areas. (I usually check the incoming commits to see if there are potential file or functionality conflicts, and if not, I do rebase.)
Rebase is not fool-proof though, so merge is the preferred "safe" action, but it leaves more "garbage" in the history. A trade-off.
Rebase is sort of like the bog-standard SVN update: the existing stuff is made the baseline, your changes go on top, and you cross your fingers it still works. It's useful, but there are times when you feel safer having yours, theirs and the merge as separate commits in the history.
There is also commit-squashing as an option (the histedit extension, maybe), which squashes all the in-between commits into one. This is useful when you're about to push and want to transfer the many partial commits in your own repo as a single commit to the main one.
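If you want to try that, a minimal sketch looks like this (depending on your Mercurial version, histedit may ship with Mercurial or need to be installed separately; REV is a placeholder):
[extensions]
histedit =
$ hg histedit REV    # REV is the oldest changeset to edit; choose "fold" to squash commits into one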
I have 12 developers working in the same Mercurial repository at work, and our history looks nothing like that. There are occasional merge commits, but most merges are from merging actual branches, i.e. there might be a merge in our main development branch bringing in changes from a bugfix release made on the production/release branch.
This is very easy to achieve: developers hack and commit to their local repositories, and when they have something stable enough to share with the rest of the team, they push.
If nothing else has been committed since they started, the push goes through without problems.
If someone else has committed a change, Mercurial complains that the push will create remote heads. The developer then does a hg pull --rebase and retries the push. The push goes through and everyone is happy.
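So the retry loop looks something like this (a sketch; hg pull --rebase requires the rebase extension mentioned earlier):
$ hg push              # rejected: it would create a new remote head
$ hg pull --rebase     # replay your local changesets on top of the new upstream tip
$ hg push              # succeeds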
If you are using continuous integration with developers regularly pushing to a shared repository, this is the way to go. Knowing whether you have pushed changes or not is easy and you avoid lots of useless merge commits cluttering up your history.
Subversion shop considering switching to Mercurial, trying to figure out in advance what all the complaints from developers are going to be. There's one fairly common use case here that I can't see how to handle.
I'm working on some largish feature, and I have a significant part of the code -- or possibly several significant parts of the code -- in pieces all over the garage floor, totally unsuitable for checkin, maybe not even compiling.
An urgent bugfix request comes in. The fix is nice and local and doesn't touch any of the code I've been working on.
I make the fix in my working copy.
Now what?
I've looked at "Mercurial cherry picking changes for commit" and "best practices in mercurial: branch vs. clone, and partial merges?" and all the suggestions seem to be extensions of varying complexity, from Record and Shelve to Queues.
The fact that there apparently isn't any core functionality for this makes me suspect that in some sense this working style is Doing It Wrong. What would a Mercurial-like solution to this use case look like?
Edited to add: git, by contrast, seems designed for this workflow: git add the bugfix files, don't git add anything else (or git reset HEAD anything you might have already added), git commit.
Here's how I would handle the case:
have a dev branch
have feature branches
have a personal branch
have a stable branch.
In your scenario, I would be committing frequently to my branch off the feature branch.
When the request came in, I would hg up -r XYZ, where XYZ is the rev number that they are running, then branch a new feature branch off of that (or hg up branchname, whatever).
Perform work, then merge into the stable branch after the work is tested.
Switch back to my work and merge up from the top commit node of the feature branch, thus integrating the two streams of effort.
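In hg commands, that scenario might look roughly like this (a sketch; the branch names are placeholders, and XYZ is the deployed revision as above):
$ hg up -r XYZ                    # go to the revision the users are running
$ hg branch urgent-fix            # hypothetical bugfix branch name
$ hg commit -m "fix urgent bug"   # after making and testing the fix
$ hg up stable
$ hg merge urgent-fix             # bring the fix into the stable branch
$ hg commit -m "merge fix into stable"
$ hg up my-branch                 # hypothetical personal branch; back to the feature work
$ hg merge urgent-fix             # integrate the two streams of effort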
Lots of useful functionality for Mercurial is provided in the form of extensions -- don't be afraid to use them.
As for your question, record provides what you call partial commits (it allows you to select which hunks of the changes you want to commit). On the other hand, shelve allows you to temporarily make your working copy clean while keeping the changes locally. Once you commit the bug fix, you can unshelve the changes and continue working.
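Concretely, the shelve route might look like this (a sketch; depending on your Mercurial version, shelve may need to be installed or enabled in your .hgrc first):
$ hg shelve                    # set aside the unfinished feature work; working copy is clean again
# ... make the bugfix ...
$ hg commit -m "urgent bugfix"
$ hg unshelve                  # restore the feature work and continue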
The canonical way to get around this (i.e. using only core) would probably be to make a clone (note that local clones are cheap, as hard links are created instead of copies).
You would clone the repository (i.e. create a bug-fix branch in SVN terms) and do the fix from there.
Alternatively, if it really is a quick fix, you can use the -I option on commit to explicitly check in individual files.
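For instance (the file name is just an example):
$ hg commit -I src/bugfix.c -m "fix urgent bug"    # commits only the matching file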
Like any DVCS, branching is your friend. Branching a repository multiple ways is the bread and butter of these systems. Here's a git model you might consider adopting that works quite well with Mercurial, also.
In addition to what Santa said about branching being your friend...
Small-granularity commits are your friend. Rather than making lots of code changes in a single commit, make each logically self-contained code change in its own commit. Then it will be a lot easier to cherry-pick changes to merge between branches.
Don't use Mercurial without using the MQ extension (it comes pre-packaged in the default installation). In addition to solving your specific problem, it solves a lot of other general problems and really should be the default way that you work (especially if you're using an IDE that doesn't integrate directly with Hg, making switching branches on the fly a difficult way to work).
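For the use case above, an MQ session could look roughly like this (a sketch; the patch name is an assumption):
[extensions]
mq =
$ hg qnew -f feature-wip    # fold the current uncommitted feature work into a new patch
$ hg qpop                   # unapply it; the working copy is clean again
# ... make and commit the urgent bugfix ...
$ hg qpush                  # reapply the feature work on top and carry on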