I have read the tutorial many times and I feel that I am still missing something, so I'll just try to give a concrete scenario. Please help me find where I'm wrong.
Suppose I have a repository which everyone considers "central": every new developer clones from it, and pulls from/pushes to it.
Central contains three folders:

    infra (which is intended to become shared code)
        .hg
        infra.txt
    dev1
        dev1.txt
        .hgsub (with the line: infra = (path of infra))
        infra (subrepo)
            .hg
            infra.txt
    dev2
        dev2.txt
        .hgsub (same as in dev1: infra = (path of infra))
        infra (subrepo)
            .hg
            infra.txt
Now, suppose that one developer clones dev1 and another one clones dev2. What I see is that when the dev1 developer changes infra and pushes the changes to the repository in central, the only way for the dev2 developer to find out about the change in infra is to manually search for incoming changesets in infra as a sub-repository. In general, this means that if my project has many sub-repositories (which may themselves contain more sub-repositories), I have no way of learning about the changes except by going over my sub-repositories manually.
I think that's not the way to work...
Can anyone help?
Thanks in advance,
Eyal
I think I have found something better.
You can use the --subrepos flag when checking for incoming changesets in a repository.
This searches for incoming changesets recursively, and shows the sub-repositories in which there are changesets to pull.
This way, one can see which sub-repositories have changed, and decide whether to bring the files in those sub-repositories up to date.
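For example, from the top-level clone:

    hg incoming --subrepos     # short form: hg in -S

A single command then reports pending changesets for the outer repository and for each sub-repository in turn, so you learn that infra has changed upstream without visiting it manually.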
You are going to have to pull for each repository. You might think this tedious, but there's no way Mercurial is going to make the decision to pull changes into your repository for you; this is a good thing.
What you can do is create a simple batch script that runs a 'hg pull' command against each repository. That at least automates the process so it feels less tedious when you really want to pull from all repos.
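A minimal sketch of such a script (shown here as a shell script; the repository list is invented and should be adapted to your layout):

    #!/bin/sh
    # Pull the outer repo and each subrepo in turn.
    for repo in . infra dev1 dev2; do
        echo "=== pulling in $repo ==="
        hg -R "$repo" pull
    done

The -R option runs the command against the given repository without having to cd into it.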
We moved all our subrepos into one repository, which makes it much simpler to manage a change or new feature that requires alterations to all our libraries.
I like subrepos, but I think they are best suited for pulling in entire repositories that others look after and that remain pretty stable. When there are a lot of changes, you need a lot of discipline and a certain amount of scripting to keep manual work to a minimum.
I want to realize the following setup:
AtWork:MercurialRepo <-> Internet:MercurialRepo <-> AtHome:MercurialRepo
The problem is that the repository is several gigabytes. I already have the entire repo at home (through bundling -> cdrom -> unbundling). The thing is, I do not want to store the whole repository on the internet. Is there a way to temporarily exclude folders from versioning in order to push/pull only the subset of the repo I am working on through the internet? How do I best accomplish my goal? From time to time I would take the tedious bundling -> cdrom -> unbundling route just to update everything else, but in general I want to use the internet route and do not want to store the whole repo there.
So, as you've found out by now, you can't selectively clone some files from a repository. The best you can do is clone a subset of all branches, but you will get the entire past history of these branches, for all files in the repository. So unless a lot of the big files are only known in some branches and not others, this won't help you.
Since your problem is the large size of files (rather than a long and bulky history), you probably need to break it down into several "subrepositories" of manageable size. Note that the subset you are interested in cloning must be a subrepository; cloning the main repo necessarily includes the subrepositories. The mercurial subrepository documentation recommends that you make a trivial ("thin shell") main repo, and put all your project code in subrepositories.
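As a sketch (the split and the names are invented for illustration), the thin main repo would then contain little more than an .hgsub file tying the pieces together:

    code = code
    assets = assets

Here "code" would hold the day-to-day sources and "assets" the multi-gigabyte part; since each subrepo is an ordinary repository, you could clone "code" alone over the internet and refresh "assets" by the CD-ROM route only when needed.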
Subrepositories are a complex solution, and are considered a "feature of last resort" by the Mercurial team. The setup is involved, there are various limitations (see the docs), and you'll have the extra complication of trying to convert your repo in a way that preserves file history. So it's worth considering ways to avoid this:
a) It would be best if you can avoid the middle copy of your repo; is there no way you can set up ssh access or a proxy so that your home repo can talk to your work repo directly? (Or vice versa; it's enough if one of the locations is able to contact the other).
b) You could carry the repo on a USB stick, as #vaclav's answer suggests.
c) Or maybe you should just bite the bullet and clone the entire repo on the internet.
Is there a way to temporarily exclude folders from versioning in order to push/pull only a subset of the repo I am working on through the internet?
Not folders, but some parts of the repo: yes.
You can push -b (only some branch(es)) or push -r (a revision with its ancestors; for the latest work that is -r tip), but the final transfer size depends heavily on the shape of your DAG: in the case of a lot of cross-branch merges you will probably skip only a small part of the changesets.
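For example (the remote URL is a placeholder):

    hg push -b default https://example.com/repo    # only the default branch
    hg push -r tip https://example.com/repo        # tip plus all its ancestors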
I have a small idea, a bit different from what you asked, but...
If I had the same issue, I would think about using a USB flash drive as the whole repository (for 10 or 20 GB it should be cheap). At work you can copy or clone the whole repo to the USB drive and pull new changes from it at home; after your work at home is done, push it to the repo on the flash drive, then pull from it into the repo at work. (I even use temporary commits for unfinished work, which I then revert in the working directory and strip, so I can continue where I left off.)
But definitely the easiest way is to try to get some connection to the work servers, or to your machine at work, or to get more space for the repo on the internet. So, just another idea. HTH
It is not really possible. The closest thing would be to use sub-repositories, which would effectively allow you to have only part of your big repo on the net.
I keep thinking I understand subrepos and how I can make them work for my team's workflow, but clearly I don't, because every time I try to implement some basic workflow, something doesn't work right.
I've read pretty much everything there is to read about subrepos online, and I can follow all of the trivial examples people post, but it falls apart when I try to do something more complicated. Or maybe I do understand it perfectly well, and what I'm trying to do is just not something that works well.
Let's get the basics out of the way. Let's say I have a remote "blessed" collection of repos:
    http://acme.com/BlessedRepos/ProjA
                                /LibA
                                /LibB
So I do a clone of /ProjA to C:\ProjA, clone /LibA to C:\ProjA\LibA, and /LibB to C:\ProjA\LibB. I create my .hgsub file with:

    LibA = http://acme.com/BlessedRepos/LibA
    LibB = http://acme.com/BlessedRepos/LibB
I commit everything. I can then push ProjA and all is well.
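Spelled out as commands, that setup is roughly (a sketch):

    hg clone http://acme.com/BlessedRepos/ProjA C:\ProjA
    hg clone http://acme.com/BlessedRepos/LibA C:\ProjA\LibA
    hg clone http://acme.com/BlessedRepos/LibB C:\ProjA\LibB
    cd C:\ProjA
    # create .hgsub with the two lines above
    hg add .hgsub
    hg commit -m "Register LibA and LibB as subrepos"
    hg push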
So now someone on my team can clone /ProjA to C:\dev\ProjA and it will bring down LibA and LibB as subrepos too. This person can push/pull from the "blessed repo" just like I can. So far so good.
Now I say: OK, ProjA team, stop pushing to the blessed repo; that's for me to do after reviewing your work. Starting now, I want you all to push your changes to the ProjA dev and ProjA QA remote repos located at:
    http://acme.com/Dev/ProjA
    http://acme.com/QA/ProjA
This is where we get stuck. Pushing to http://acme.com/Dev/ProjA will only push /ProjA, while /ProjA/LibA and /ProjA/LibB get pushed back to their original locations in the blessed repo, not to the desired location under http://acme.com/Dev/ProjA.
Now, I could have set up my .hgsub file as LibA = ../LibA. This would work initially, but if I then do a fresh clone of ProjA from the blessed repo, it fails to get LibA or LibB, I believe because it expects to find local repos LibA and LibB as siblings of the ProjA repo I'm cloning. That is, if I clone http://acme.com/BlessedRepos/ProjA to C:\Test\ProjA, it will fail because it expects to find an existing repo at C:\Test\LibA.
I could also have set up my .hgsub as LibA = LibA. But this fails when you try to push to the blessed repo, as LibA is not nested under ProjA in the blessed space. I could create those nested repos, but then I'm never pushing back to http://acme.com/BlessedRepos/LibA, only to http://acme.com/BlessedRepos/ProjA/LibA, and that seems to defeat the purpose of the subrepo in the first place.
I'm pretty sure my first method could work if I also had a script to run through the .hgsub file and swap all the values from the "blessed" remote locations to the "dev" or "QA" locations, but this seems less than ideal.
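Such a script could be as small as a one-liner (a sketch; it assumes a sed-like tool and the URL layout above):

    sed -i "s|http://acme.com/BlessedRepos/|http://acme.com/Dev/|" .hgsub

followed by committing the modified .hgsub, which is exactly the churn that makes it feel less than ideal.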
So, if there is anyone out there who really groks this stuff, could you either explain where I've gone wrong, or how I could achieve my original workflow using subrepos, or maybe just confirm that I'm going after something that isn't really suited to subrepos. If it helps to understand the situation: we have something like 15-20 "products/solutions" and 50 "shared" projects, and any of the 15-20 products can make use of any number of the 50 shared projects in its solution.
The key part you are missing is that you can expose the LibA and LibB repositories multiple times on the server without having multiple copies on the server. Please see my answer to another question about subrepos for the details.
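Concretely, hgweb lets several virtual paths point at the same repository on disk, so a sketch of an hgweb.config might look like this (the on-disk locations are invented):

    [paths]
    BlessedRepos/ProjA = /srv/hg/ProjA
    BlessedRepos/ProjA/LibA = /srv/hg/LibA
    BlessedRepos/ProjA/LibB = /srv/hg/LibB
    Dev/ProjA = /srv/hg/dev/ProjA
    Dev/ProjA/LibA = /srv/hg/LibA
    Dev/ProjA/LibB = /srv/hg/LibB

With relative entries like LibA = LibA in .hgsub, the subrepos then resolve under whichever parent URL was cloned, while pushes to either URL land in the single LibA repository.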
Also, just come talk to us in #mercurial if you have problems like this; that's much better than writing long posts on Stack Overflow, since that's not really where the Mercurial community is anyway. You can also use the mailing lists we have.
I've struggled to understand how branching is beneficial. I can't push to a repo with 2 heads, or 2 branches... so why would I ever need/use them?
First of all, you can push even with two heads, but since you probably don't want to do that, the default behavior is to prevent you from doing it. You can, however, force the push to go through.
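For example:

    hg push       # aborts with "push creates new remote head ..."
    hg push -f    # forces the extra head onto the remote (rarely what you want)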
Now, as for branching, let's take a simple scenario in a non-distributed version control system, like Subversion.
Let's assume you have a colleague who is working on the same project as you. The current latest changeset in the Subversion repository is revision 100; you both update to this locally, so that now both of you have the same files.
OK, now your colleague has been working on his changes for a couple of hours, and so he commits. This brings the central repository up to revision 101. You're still on revision 100 locally, and you're still working on your changes.
At some point, you complete, and you want to commit, but Subversion won't let you. It says you have to update first, so you start the update process.
The update process wants to take your changes, and pretend you actually started with revision 101 instead of 100. If your changes are not in conflict with whatever it was your colleague committed, all is hunky dory, but if your changes are in conflict, you have a problem.
Now you have to merge your changes with his changes, and things can go haywire. For instance, you might end up merging one file OK, the second file OK (or so you think), and then on the third file you suddenly discover that you got some of the details wrong: it would've been better to merge the second file differently.
Unless you made a backup of your changes before updating (and sooner or later you will forget to), you have a problem.
Now, the above scenario is actually quite common. Well, perhaps not the merging part; that depends on how many people are working in the same area or files at the same time. But the "must update before committing" part is quite common with Subversion.
So how does Mercurial do it?
Well, Mercurial commits locally; it doesn't talk to any remote repository at all, so it won't stop you from committing.
So, let's try the above scenario again, just in Mercurial this time.
The tipmost changeset in the remote repository is revision 100. You both have cloned this down, and you're both starting to work on the changes, from revision 100.
Your colleague completes his changes and commits, locally. He then pushes his changeset up to the central repository, bringing the tip there up to revision 101.
You then complete your changes and commit, also locally, and then you want to push, but you get the error message you've already discovered and are asking about.
So how is this different?
Well, your changes are now committed; there is no way, unless you try really hard, to accidentally lose or destroy them.
Here are the 3 repositories in play and their current state:

    Colleague ---98---99---100---A
    Central   ---98---99---100---A
    You       ---98---99---100---B
If you were to push, and were allowed to do so (or forced the push through), the Central repository would look like this:

    Central ---98---99---100---A
                          \
                           +--B
Two heads. If your colleague now pulled, which one should he continue working from? This question is the reason Mercurial will by default prevent you from causing this.
So instead you pull, and you get the above state in your own repository.
In other words, you can choose to impact your own repository and create multiple heads there, but you are not imposing that problem on anyone else.
You then merge; it's the same type of operation you had to do in Subversion, except your changeset is safe: it was committed, and you won't accidentally corrupt or destroy it. If, mid-merge, you want to start over, you can; nothing is lost, no harm done.
After the merge, your local repository looks like this:
    You ---98---99---100---A----M
                      \        /
                       +--B---+
This is now safe to push, and if your colleague now pulls, he knows that he has to continue from the M changeset, the one that merged his and your changes.
The above description is what happens due to Mercurial's distributed nature.
You can also name branches, to make them more permanent. For instance, you might want to name a branch "stable", to signal that any changesets on that branch have been thoroughly tested and are safe for release to customers or to put into production. Then you would only merge changes onto that branch once said testing has been completed.
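As a sketch of that flow (the branch name and messages are illustrative):

    hg branch stable                  # start committing on the named branch
    hg commit -m "Open stable branch"
    # ... later, once changes on default have been tested:
    hg update stable
    hg merge default
    hg commit -m "Merge tested changes into stable"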
The nature, however, is the same as the above description. Whenever more than one person works on a project with Mercurial, you will get branches, and that's a good thing.
Whenever more than one clone of a repo is made and commits are made in those clones, branches happen, whether you name them by using the hg branch command or not. My philosophy is, you might as well give them a name. It makes things less confusing.
A good explanation of mercurial branches: http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial/
As former users of Subversion, we've decided to move over to Mercurial for SCM, and it is confusing us a little. Although Mercurial is a distributed SCM tool, we are using a remote repo to keep the changes we make backed up on a server, but we are finding a few teething troubles.
For example, when two or three of us work on our local repos and then commit and push to the remote repo, we find that a lot of heads(?) are created. This confused the hell out of us, and we had to do some merging etc. to sort it out.
What is the best way to avoid so many heads and to keep a remote repo in sync with a number of developers?
Today, I've been working like this:

1. Change a file.
2. Pull from the remote repo.
3. Update my local working copy.
4. Merge? (why?)
5. Commit my changes to the local repo.
6. Push to the remote repo.

Is this the best procedure?
Although this has worked fine today, I can't help feeling that I'm doing it wrong! To be honest, I don't understand why merging even needs to be done at the pull stage, since other people are working on different files.
Other than telling me to RTFM, have you any tips for using Mercurial in such a way? Any good online resources on why we get so many heads?
NOTE: I have read the manual, but it doesn't really give much detail, and I don't think I want to start another book at the minute.
You should definitely find some learning resources.
I can recommend the following:
hginit.com
Tekpub: Mercurial
As for your concrete question, "is this the best procedure", I would have to say no.
Here are some tips.
First of all, you don't need to stay "in sync" with the central repository at all times. Instead, follow these guidelines:
Push from your local repository to the central one when you're happy with the changes you've committed. Remember, this can be several changesets.
Pull if you need changes others have made right away, e.g. a bugfix a colleague of yours has made that you need in order to continue with your own work.
Pull before you push.
Merge any extra heads you pulled down with your own changes before you push or continue working.
In other words, here's a typical day.
You pull the latest changes when you come in in the morning, so that you've got an up-to-date local clone. You might not always do this, e.g. if you're in the middle of bigger changes that you didn't finish yesterday.
Then you start working. You commit small changesets with isolated changes. That isn't to say you should split up a larger bugfix into many smaller commits just because you modify multiple files, but try to avoid fixing more than one bug at a time, or implementing more than one feature at a time. Try to stay focused.
Then, when you're happy with all the changesets you've added locally, you decide to push to the server. When you try to do this, you get an abort message saying that extra heads would be pushed to the server, and this isn't allowed, so the push is aborted.
Instead you pull. This can always be done, but it will of course now add the extra heads to your local clone, instead of to the server.
Then you merge the extra head that you got from the server with your own head, the one you created during the day by committing new changesets to your clone. You resolve any merge conflicts.
Then you push, and now it should succeed. On the off chance that someone has managed to push more changesets to the central repository while you were busy merging, you will get another abort and have to rinse and repeat.
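Condensed into commands, that cycle looks something like this (a sketch):

    hg push                 # aborts: would create new remote heads
    hg pull                 # bring the remote head into your clone
    hg merge                # combine it with your own head
    hg commit -m "Merge"
    hg push                 # succeeds, barring a race with a colleague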
The history will now show multiple parallel branches of development, but should always stay at max 1 head in your central repository. If, later on, you start using named branches, you can have 1 head per named branch, but try to avoid this until you get the hang of just the default branch.
As for why you need to merge: well, Mercurial always works with revisions that are snapshots of the entire project, which means two branches, even though they contain changes to different files, are really two different versions of the entire project, and you need to tell Mercurial to combine them to get back to one version.
For one, you can pull at any time; pulling just adds changesets to your repo, but does not change your local working files (unless you have enabled the post-pull update).
Merging is necessary if someone else has committed changes to the same branch you're currently working on. This creates an implicit branch, and merging merely brings them back together. You can see this nicely in the "railroad track" in the repository view. Basically, as long as you don't merge, you stay on your own "private" track, and when you want to add your changes (which can be any number of changesets) you merge them back into the destination branch (typically "default"). It's painless; nothing like merging in older SVN versions!
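For instance:

    hg pull       # fetch new changesets; your working files stay as they are
    hg pull -u    # fetch and also update the working copy (the post-pull update)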
So the workflow is not as rigid as you displayed it; it's more like this:
Pull as much as you like
Make changes and commit locally as often as you like
When your changes should be integrated, merge with the destination branch (which can be at a lower revision than the newest), then commit and push.
This workflow can be tuned somewhat, for instance by using named branches and sometimes by using rebase. However, you and your team should decide on the workflow to be used; Mercurial is quite flexible in this regard.
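To give one concrete tuning: with the bundled rebase extension enabled (in your hgrc: [extensions] rebase =), the pull-then-merge pair collapses into a single command that keeps history linear; a sketch:

    hg pull --rebase    # pull, then replay your local changesets on top of the new tip
    hg push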
http://hginit.com has a good tutorial.
In particular, you'll find the list of steps you have here: http://hginit.com/02.html (at the bottom of the page)
The difference between those steps and yours is that you should commit after step 1. In fact, you will typically commit several times in your local repository before moving on to the pull/merge/push step. You don't need to share every commit with the rest of the developers right away; it'll often make sense to make several related changes and then push the whole thing.
We are looking to move our team (~10 developers) from SVN to mercurial. We are trying to figure out how to manage our workflow. In particular, we are trying to see if creating remote heads is the right solution.
We currently have a very large repository with multiple, related projects. They share a lot of code, but pieces of the project are deployed by different teams (3 teams) independently of other portions of the code-base. So each team is working concurrently on large features.
The way we currently handle this in SVN is with branches. Team1 has a branch for Feature1, same deal for the other teams. When Team1 finishes their feature, it gets merged into the trunk and deployed. The other teams follow suit when their projects are complete, merging of course.
So my initial thought was to use named branches for these situations: Team1 makes a Feature1 branch off of the default branch in Hg. Now, here is the question: should the team push that branch, in its current, half-done state, to the repository? This will create a second head in the core repo.
My initial reaction was "NO!" as it seems like a bad idea. Handling multiple heads on our repository just sounds awful, but there are some advantages...
First, the teams want to set up Continuous Integration to build this branch during their development cycle (months long). This will only work if the CI server can pull this branch from the repo. This is something we do now with SVN: copy a CI build and change the branch. Easy.
Second, it makes it easier for any team member to jump onto the branch and start working. Without pushing to the core repo, they would have to get a push from a developer on that team to receive the changesets. It is also possible to lose local commits to hardware failure, and the chances increase a lot if it's a branch kept by a single developer who follows the "don't push until finished" approach.
And lastly, it's just for ease of use. The developers can easily commit and push on their branch at any time without consequence (as they do today, in their SVN branches).
Is there a better way to handle this scenario that I may be missing? I just want a veteran's opinion before moving forward with the strategy.
For bug fixes we like the general Mercurial workflow of anonymous branches that only consist of 1-2 commits. The simplicity is great for those cases.
By the way, I've read this great article, which seems to favor named branches.
You're definitely thinking about this right, and it sounds like you're going down a good path. I'm a branches as clones person, but named branches have come a long way.
Having a central-ish repo to which all named branches are pushed is convenient for control and backups. Teams working on only branch X can easily create their own branch-X-only repo by doing hg clone -r X <central-ish repo>.
The best thing you can do to help the teams out is to let them make clones themselves somewhere that's sitting behind an hgwebdir.cgi instance (as, presumably, your central-ish repo will be). You'll find that not just teams, but sub-teams and pairs of teams, will set up their own repos for mini-efforts you never knew about. They'll put them on the named branches that make sense to them and merge back into central as appropriate.
I would base the decision whether these three projects should go into one repository on the coupling between them (and on how many patches are interchanged between them). The more independent they are, the smaller the advantages of having them in one repo (backup and management aside). There are several kinds of setup:
As you showed: one repository, with one branch for the shared code and one branch for each project. When the projects themselves are created by forking the shared code base, care must be taken when merging back to common (cherry-picking). When, inside each project branch, updates to the common branch are made as direct descendants of the common branch and then merged into the project branch, chances are good they can also be merged back into common. But if changes to common are developed on top of the project branch, merging back will require cherry-picking. I don't have experience with such a setup, but I fear the merges could get problematic.
One repo for the shared code and one for each project, connected by symlinks or as subrepos. Here care must be taken not to step on each other's feet. In my experience this kind of setup has the potential to grow into a very big PITA. OTOH you seem to have this setup already, and your fellow developers can work with it.
One repo for the shared code and one for each project, with the code from the shared one consumed as internal releases. I would go for this setup when there are no big, regular changes to the shared code base.
All these setups can also be combined with customization branches for each project within the common part. But I would try to minimize the number of concurrently active branches, since every additional branch requires additional merge care.
I'm sorry not to give a concrete answer, but "The Right Thing" (TM) depends too much on the local details.