Keeping subrepos in sync on a central server - Mercurial

I keep thinking I understand subrepos and how I can make them work for my team's workflow, but clearly I don't, because every time I try to implement some basic workflow, something doesn't end up working right.
I've read pretty much everything there is to read about subrepos online, and I can follow all of the trivial examples people post, but things fall apart when I try to do something more complicated. Or maybe I do understand it perfectly well, and what I'm trying to do is just not something that works well.
Let's get the basics out of the way. Let's say I have a remote "blessed" collection of repos:
http://acme.com/BlessedRepos/ProjA
http://acme.com/BlessedRepos/LibA
http://acme.com/BlessedRepos/LibB
So I do a clone of /ProjA to C:\ProjA and clone /LibA to C:\ProjA\LibA and /LibB to C:\ProjA\LibB. I create my .hgsub file with
LibA = http://acme.com/BlessedRepos/LibA
LibB = http://acme.com/BlessedRepos/LibB
I commit everything. I can then push ProjA and all is well.
So now someone on my team can go clone /ProjA to C:\dev\ProjA and it will bring down LibA and LibB too as subrepos. This person can easily push/pull from the "blessed" repo just like I can. So far so good.
Now, I say: Ok, ProjA Team, stop pushing to the blessed repo, that's for me to do after reviewing your work. Starting now, I want you all to push your changes to the ProjA dev and ProjA QA remote repos located at:
http://acme.com/Dev/ProjA
http://acme.com/QA/ProjA
This is where we get stuck. Trying to push to http://acme.com/Dev/ProjA will only push /ProjA, while /ProjA/LibA and /ProjA/LibB get pushed back to their original locations in the blessed collection and not to the desired locations under http://acme.com/Dev/ProjA.
Now, I could have set up my .hgsub file as LibA = ../LibA. This would work initially, but if I were to do a clone of ProjA from the blessed repo, it fails to get LibA or LibB. I believe that's because it expects to find local repos LibA and LibB as siblings of the ProjA repo I'm cloning. What I mean is, if I'm cloning http://acme.com/BlessedRepos/ProjA to C:\Test\ProjA, it will fail because it expects to find an existing repo at C:\Test\LibA.
I could also have set up my .hgsub as LibA = LibA. But doing this fails when you try to push to the blessed repo, as LibA is not nested under ProjA in the blessed space. I could create the nested repos, but then I'm never pushing back to http://acme.com/BlessedRepos/LibA, only to http://acme.com/BlessedRepos/ProjA/LibA, and that seems to defeat the purpose of the subrepo to begin with.
I'm pretty sure my first method could work if I also had some script I would run to go through and change out all the values in the .hgsub file from the "blessed" remote locations to the "dev" and "QA" locations, but this seems less than ideal.
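(Something like the following, assuming GNU sed and assuming dev copies of the libs live directly under http://acme.com/Dev/; I'd run it before pushing to dev and swap the URLs back afterwards:

sed -i 's|http://acme.com/BlessedRepos|http://acme.com/Dev|g' .hgsub

Doable, but it's exactly the kind of manual bookkeeping I was hoping subrepos would spare me.)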
So: if there is anyone out there who really groks this stuff, could you either explain to me where I've gone wrong, or how I could achieve my original workflow using subrepos, or maybe just confirm that I'm going after something that isn't really suited for subrepos? If it helps to understand the situation, we have probably something like 15-20 "products/solutions" and 50 "shared" projects. Any of the 15-20 products can make use of any number of the 50 shared projects in its solution.

The key part you are missing is that you can expose the LibA and LibB repositories multiple times on the server without having multiple copies on the server. Please see my answer to another question about subrepos for the details.
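A sketch of what that can look like in hgweb's config (the filesystem paths are invented; the point is that several published URLs can map to the same repository on disk):

[paths]
/BlessedRepos/ProjA = /srv/hg/blessed/ProjA
/BlessedRepos/LibA = /srv/hg/blessed/LibA
/BlessedRepos/ProjA/LibA = /srv/hg/blessed/LibA
/Dev/ProjA = /srv/hg/dev/ProjA
/Dev/ProjA/LibA = /srv/hg/dev/LibA

(and likewise for LibB). Combined with relative entries in .hgsub (LibA = LibA), a clone of /BlessedRepos/ProjA finds its subrepo at /BlessedRepos/ProjA/LibA, while a clone of /Dev/ProjA pushes its subrepo to /Dev/ProjA/LibA; yet /BlessedRepos/LibA and /BlessedRepos/ProjA/LibA are the same repository on disk.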
Also, just come talk to us in #mercurial if you have problems like that -- that's much better than writing long posts on Stack Overflow, since that's not where the Mercurial community is anyway. You can also use the mailing lists we have.

Related

Mercurial: Incomplete central repository possible?

I want to realize the following setup:
AtWork:MercurialRepo <-> Internet:MercurialRepo <-> AtHome:MercurialRepo
Problem is the repository is several gigs. I already have the entire repo at home (through bundling->cdrom->unbundling). The thing is, I do not want to store the whole repository on the internet. Is there a way to temporarily exclude folders from versioning in order to push/pull only a subset of the repo I am working on through the internet? How do I best accomplish my goal? From time to time I would need to do the tedious bundling -> cdrom -> unbundling route, just to update everything else, but in general I do want to use the internet route and do not want to store the whole repo there.
So, as you've found out by now you can't selectively clone some files from a repository. The best you can do is clone a subset of all branches; but you will get the entire past history of these branches, for all files in the repository. So, unless a lot of the big files are only known in some branches and not others, this won't help you.
Since your problem is the large size of files (rather than a long and bulky history), you probably need to break it down into several "subrepositories" of manageable size. Note that the subset you are interested in cloning must be a subrepository; cloning the main repo necessarily includes the subrepositories. The Mercurial subrepository documentation recommends that you make a trivial ("thin shell") main repo and put all your project code in subrepositories.
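A minimal sketch of that "thin shell" layout (repo and directory names invented for illustration):

hg init main              # trivial top-level repo
cd main
hg init bigdata           # nested repo that will hold the large files
echo "bigdata = bigdata" > .hgsub
hg add .hgsub
hg commit -m "track bigdata as a subrepo"

After this, cloning main brings bigdata along, while cloning bigdata alone gives you just that subset.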
Subrepositories are a complex solution, and are considered a "feature of last resort" by the Mercurial team. The setup is involved, there are various limitations (see the docs), and you'll have the extra complication of trying to convert your repo in a way that preserves file history. So it's worth considering ways to avoid this:
a) It would be best if you can avoid the middle copy of your repo altogether: is there no way you can set up ssh access or a proxy so that your home repo can talk to your work repo directly? (Or vice versa; it's enough if one of the locations is able to contact the other. See the sketch after this list.)
b) You could carry the repo on a USB stick, as @vaclav's answer suggests.
c) Or maybe you should just bite the bullet and clone the entire repo on the internet.
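For option (a), assuming the work machine is reachable over ssh (host name and path are made up), no intermediate copy is needed at all:

hg pull ssh://you@work.example.com//srv/hg/bigrepo
hg update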
Is there a way to temporarily exclude folders from versioning in order to push/pull only a subset of the repo I am working on through the internet?
Not folders, but some parts of the repo - yes.
You can push -b (only some branch(es)) or push -r (a revision with its ancestors; for the latest work it will be -r tip), but the final size of the transfer depends heavily on the shape of your DAG - with a lot of cross-branch merges you will probably skip only a small part of the changesets.
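For example (the URL is a placeholder for wherever the internet repo lives):

hg push -b default https://internet-host/repo    # only the default branch
hg push -r tip https://internet-host/repo        # tip and all its ancestors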
I have a small idea, a bit different from what you asked, but...
If I had the same issue, I would think about using a USB flash drive as the whole repository (at 10 or 20 GB it should be cheap). At work you can copy or clone the whole repo to the USB drive, pull new changes from it at home, and when your work at home is done, push it to the repo on the flash drive, then pull that into the repo at work. (I even use temporary commits for unfinished work, which I later revert into the working directory and strip, so I can continue where I left off.)
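Roughly like this (paths and mount points are just examples):

hg clone /work/bigrepo /mnt/usb/bigrepo      # once, at work
hg -R ~/bigrepo pull /mnt/usb/bigrepo        # at home: get the latest
hg -R ~/bigrepo update
hg -R ~/bigrepo push /mnt/usb/bigrepo        # at home: save your day's work
hg -R /work/bigrepo pull /mnt/usb/bigrepo    # back at work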
But definitely the easiest way is to try to get some connection to the work servers, or to your machine at work, or to get more space for the repo on the internet. So, just another idea. HTH.
It's not really possible. The closest thing would be to use sub-repositories, which would effectively allow you to have only part of your big repo on the net.

Can I work in the repository in a single-user Mercurial workflow?

I use Mercurial in a single-user workflow to have the option to roll back changes if my coding or writing goes horribly wrong (I primarily use the Stata and R statistics packages and LaTeX). While working only locally, this has been easy since all I have is the main repo.
Recently I have started ssh-ing into a Linux server for more computational power. So far I have been manually copying files back and forth and using Mercurial only locally, but I would like to use Mercurial to take care of this and keep these two workflows synchronized. Also, I like the ability to code both locally (on my laptop or desktop) and on the server.
Do I need to work on a clone of the main repo on the server and keep the main repo untouched? Or can I work directly in the main repo when I am on the server? In this question @gizmo points to this workflow guide; the "single developer" discussion is helpful, but it's still not clear to me that I can work in the main repo while I'm on the server without causing some major problem that I don't yet understand.
Thanks!
Edit: I should add that I have worked through Joel Spolsky's HgInit.com tutorial and I'm comfortable pushing/pulling/cloning/etc over ssh, but I am still not sure if I can work in the main repo without causing heartache later. Or maybe this is more a philosophical question? Thanks!
Mercurial is a DVCS, which means that in each location you have both a local working copy and a local repository.
Mercurial is a DVCS, which also means that you can freely exchange (pull|push) data between repos (if they provide remote-access methods).
If you are
comfortable pushing/pulling/cloning/etc over ssh
and you don't forget to perform a pull|push cycle around your work at home (so that you don't have to run hg serve on the home host and sync from it as a source), you won't get any headache at all, and you'll have a perfectly linear aggregated history in each place. And even if you forget to sync the repos sometimes, the worst case is two heads later, which you'll be able to merge easily (I don't know the formats of Stata and R data files, but LaTeX, as text, is mergeable).
There is no problem with working directly in the repository on your server. From Mercurial's point of view, the "main" repository is just another random repository — Mercurial doesn't consider it to be special.
You don't say this directly, but one thing that people ask is "What happens when I push to the server?" The answer is that hg push only sends data into the repository (the .hg/ folder); the working copy is not touched on the server when you push to it. Since you pushed new changesets to the server, you might need to run hg update the next time you work on the server. This is just like if you had run hg pull on the server - there you'll also merge or update afterwards.
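For instance (user names and paths invented for illustration):

hg push ssh://you@server//home/you/project    # from the laptop
# later, logged in on the server:
hg update                                     # check out the newly pushed tip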
I have this situation all the time: I create a repository at home and clone it to my computer at work. I change files in either location and push/pull between the two repositories. If I need to share my work with others, then I make a repository at Bitbucket and push the code there. That way Bitbucket serves as a nice canonical repository for the code and I typically change the default path to Bitbucket in the repositories at home and at work. So at home I would have:
[paths]
default = https://bitbucket.org/mg/<repo>/
work = ssh://mg@work/<repo>
so that I can do hg push to send things to Bitbucket and hg pull work to grab things directly from work (in case I forgot to push to Bitbucket before leaving).

Mercurial sub-repositories

I read the tutorial many times and I feel that I am still missing something. I'll just try to give a concrete scenario. Please help me find where I'm wrong.
Suppose I have a repository which everyone considers as "central". This means that every new developer clones from it and pulls/pushes from/to it.
Central contains three folders:

Infra (which is about to become shared code)
    .hg
    infra.txt
dev1
    dev1.txt
    .hgsub (in which there's the line: infra = (path of infra))
    infra (subrepo)
        .hg
        infra.txt
dev2
    dev2.txt
    .hgsub (the same as in dev1: infra = (path to infra))
    infra (subrepo)
        .hg
        infra.txt
Now, suppose that one developer clones dev1, and another one clones dev2. What I see is that when the developer of dev1 changes infra and pushes the changes to the repository in central, the only way for the dev2 developer to learn about the change in infra is to manually search for incoming change-sets in infra as a sub-repository. Generally, it means that if my project has many sub-repositories (which may themselves contain more sub-repositories), I have no way to know about the changes except by going over my sub-repositories manually.
I think that's not the way to work...
Can anyone help?
Thanks in advance,
Eyal
I think I have found something better.
You can use the --subrepos flag when checking for incoming change-sets in a repository. This will search for incoming change-sets recursively and show the sub-repositories in which change-sets can be pulled. This way, one can see which sub-repositories have changed and decide whether to bring the files in those sub-repositories up to date.
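That is (-S is the short form of the same flag):

hg incoming --subrepos

which reports pending change-sets for the top repository and, recursively, for every sub-repository beneath it.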
You are going to have to pull for each repository. You might think this tedious, but there's no way Mercurial is going to make the decision to pull changes into your repository for you - and this is a good thing.
What you can do is create a simple batch script that runs a 'hg pull' command against each repository. That at least automates the process so it feels less tedious when you really want to pull from all repos.
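For example, in a Unix-ish shell, and assuming all the repos sit side by side under one directory (the path is made up):

for d in /path/to/repos/*/; do
    hg -R "$d" pull
done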
We moved all our subrepos into one repository, which makes it much simpler to manage a change or new feature that requires alterations to all our libraries.
I like subrepos, but I think they are best suited for pulling in entire repositories that others look after and that remain pretty stable. When there are a lot of changes, you need a lot of discipline and a certain amount of scripting to keep the manual work down to a minimum.

Mercurial repositories with many active developers?

I'm going through Bitbucket and I can't seem to find any Mercurial repositories that look like what I suspect our repository would look like, provided we switch to Mercurial.
As such, I'm wondering, is there a workflow that we're not considering here?
The thing I'm talking about is that I did a small automated test. We're 14 people who work on the same project, split into 4 scrum teams. To simulate 14 (I picked 10, a round number) people working in parallel on the code, using Mercurial DVCS and pushing to the same central master repository, I wrote a script.
I created a new "master" repository, and then cloned it for 10 virtual people
I then ran a 1000-iteration loop, picking a random clone and doing one of the following:
10% of the time, do a pull from master, merge, commit merge, and push
90% of the time, do a local change and commit
Note that I ensured that there would never be merge conflicts by simply making each virtual person work on his own file.
This would simulate people working locally by doing 1+ commits before pulling, merging, and pushing (to avoid 2+ heads in the master repo). It might be that this workflow is wrong.
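A rough reconstruction of that kind of script (the original is lost; this is bash, assumes hg has a username configured, and merges always succeed because each virtual person only touches their own file):

#!/bin/bash
hg init master
for i in $(seq 1 10); do
    hg clone -q master "person$i"
done
for n in $(seq 1 1000); do
    i=$((RANDOM % 10 + 1))
    cd "person$i"
    if [ $((RANDOM % 10)) -eq 0 ]; then
        hg pull -q                          # default path points back at master
        hg merge -q && hg commit -q -m "merge"   # merge aborts harmlessly if there is nothing to merge
        hg push -q
    else
        echo "change $n" >> "file$i.txt"
        hg commit -q -Am "change $n by person $i"
    fi
    cd ..
done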
This is a sample of what the repository now looks like (screenshot + link to repo):
The repository could be found here: http://hg.vkarlsen.no/hgweb.cgi/parallel_test/graph. Unfortunately this repository is no longer available, and I no longer have a copy of the code due to an unfortunate backup incident; it was just an example for people to visit, so it shouldn't be important any more.
This looks awfully messy, and as I said, I can't seem to find any repositories that have a similar history. By "messy", I mean that the older history of the project will almost always have 10 parallel branches. Close to the top it tapers off, of course, but it will expand again as the people currently working in their local repositories push to the master.
So I have two questions:
Can anyone show me a repository that has similar history? Since I can't seem to find any, I'm starting to wonder about what kind of conclusions I can draw from that...
Is there something wrong with our workflow (that is, the workflow I've laid out here)? Should we rebase/squash/transplant, delegate push responsibility to one person, other things, instead of the way it was done here?
Impressive preparation!
It always looks messy if you go back a bit and look at all the old commits at the same time. It always tapers off, even when looking at slightly older history. See http://hg.intevation.org/mercurial/crew/graph/12402?revcount=120 for instance. This is not the most recent commit, but it shows all history up to that commit.
Rebase helps quite a lot, especially if people are working on separate areas. (I usually check the incoming commits to see if there are potential file or functionality conflicts, and if not, I rebase.)
Rebase is not fool-proof though, so merge is the preferred "safe" action, but it leaves more "garbage" in the history. A trade-off.
Rebase is sort of like the bog-standard SVN update: the existing stuff is made the baseline and your changes go on top, and you cross your fingers that it still works. It's useful, but there are times when you feel safer having yours, theirs, and the merge as separate commits in the history.
There is also commit-squashing as an option (the histedit extension, perhaps), which squashes all the in-between commits into one. This is useful when you're about to push and want to transfer the many partial commits in your own repo as a single commit to the main repository.
I have 12 developers working in the same Mercurial repository at work, and our history looks nothing like that. There are occasional merge commits, but most merges come from merging actual branches, i.e. there might be a merge in our main development branch bringing in changes from a bugfix release made on the production/release branch.
This is very easy to achieve: developers hack and commit to their local repository, and when they have something stable enough to share with the rest of the team, they push.
If nothing else has been pushed in the meantime, the push goes through without problems.
If someone else has committed a change, Mercurial complains that the push will create remote heads. The developer then does a hg pull --rebase and retries the push. The push goes through and everyone is happy.
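In commands, the sequence looks like this (the rebase extension ships with Mercurial but has to be enabled, e.g. with a "rebase =" line in the [extensions] section of your hgrc):

hg push             # rejected: would create a new remote head
hg pull --rebase    # fetch the new upstream changesets, replay yours on top
hg push             # now goes through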
If you are using continuous integration with developers regularly pushing to a shared repository, this is the way to go. Knowing whether you have pushed changes or not is easy and you avoid lots of useless merge commits cluttering up your history.

Small, temporary branch in Mercurial

I've read a lot about Mercurial and branching in it, however, I am still very much a version control newbie.
I'm currently working on a project, where I have been tasked to work on a new module.
I have a "main" repository, which contains the latest code from the rest of the project, and a cloned repository (call it "task") where I am doing my work now.
I am a bunch of commits into my task, and find that I would like to do a little "experiment" with the way my program reads/stores/handles configuration data.
Now, if I understand VC best practices correctly, this would be a great time to branch.
If I start into this experiment, and I like where it's going, I will want to merge it back into my "task" repository on the "default" branch pretty quickly.
On the other hand, if I don't like how it's going, I'll probably just scrap the branch.
The way I am most comfortable branching is through cloning; however, I don't think that would be the best approach in this situation, as I'll only be changing a few files. But apparently a named branch is permanent, which doesn't seem appropriate here either.
What is your advice / best practice for this kind of situation?
I'm relatively new to Mercurial, but I know exactly the situation you are describing. I did some research on this before, and my conclusion was that the easiest way was to clone my repository.
See this answer for some more insight.
Also, this is a great guide to branching in Mercurial :)
Go with a clone, no doubt about it. A named branch in Mercurial is something that even the Mercurial folks say you don't need all that often. One of the beautiful things about DVCS is the fact that you can easily clone the repo and try some new and different things, and if they work, great, merge it back in to the main repo, otherwise, delete it all.
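A sketch of that clone-as-branch cycle (directory names are just examples):

hg clone task task-experiment    # the "branch"
cd task-experiment
# ... hack on the experiment, committing as usual ...
cd ../task
hg pull ../task-experiment       # keep it: bring the experiment back
hg merge                         # only needed if task also moved on
hg commit -m "merge experiment"
# or scrap it: just delete the task-experiment directory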
I personally use a "Branch By Feature" approach with Mercurial, which means that I will make a clone of my primary repo for each feature I'm working on. This includes spikes and experiments.