How do I use Mercurial to integrate major revisions? - mercurial

First, I am new to Mercurial and distributed source control systems as a whole. Generally I have used Perforce, so I'm going to use Perforce terminology in order to keep what I'm trying to say clear.
My issue is that I'm making a game based on an open source engine, and that engine has regular code drops. However, I am also making some changes to the engine code myself, in my own depot. I need to set things up so that I can easily merge changes from code drops into my own code, without losing my changes and without having to examine every single file manually.
In Perforce, what I'd do is have a branch for just the engine code, and then my main branch; all engine code drops would be submitted to the engine code branch, and then I would integrate the engine code branch into the main code branch. Resolve problems, submit, and voila.
I feel like this is pretty close to how it would work in Mercurial, only I'm missing some minor piece of understanding to help me figure it out. First, I'm not sure if my engine code should be in a branch, or a completely separate repository. And even if I did know that, I'm not clear as to how I'd move code back and forth and keep them properly separate.
Sorry if this is kind of a kitchen sink question. I tend to learn by tossing myself in the deep end.

First of all, I would separate the engine and the game into two repositories. It helps if you want to use the modified engine elsewhere, if you want to contribute back to the original project, if you want to put someone on the engine but not on the game(s), and so on. And to bring them back together, simply use the subrepo feature.
Now, regarding the game engine modifications: as long as there is no conflicting change, you simply have to pull, merge, then commit.
Let's imagine a scenario:
1---2---4---5--------8---A---B     <---- your changes
     \     /        /       /
      3---+--6---7-+---9---+       <---- original changes
One day you begin to use the engine (1). The engine is updated (2), but that version is fine for you and you use it as-is. Actually no, you have to change something yourself (4); at the same time, changes are made to the original engine (3). No problem, just fetch them (5): pull->merge->commit. Oh, they made a change (6) and another one (7). OK, let's include them (8): pull->merge->commit. And so on: they make changes (9), you make changes (A), and you merge them (B).
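In commands, each of the merge points above (5, 8, B) is just the usual sequence; the URL below is only a placeholder for wherever the original engine is published:
hg pull https://example.org/hg/engine-original   # fetch the new upstream changesets
hg merge                                         # merge them with your own changes
hg commit -m "merge upstream engine changes"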
One unnatural thing to remember when switching from centralized to distributed version control is that branching and merging is a normal (and lightweight) process, not an exceptional one. Some people merge hundreds of times per day.
To learn more, search for "mercurial workflow" (the one described here is a minimal one) and read the excellent book Mercurial: The Definitive Guide by Bryan O'Sullivan.
Follow-up on the comments
Consider a minimal project like this one:
mygame/
├── .hg/
├── .hgsub
├── lib/
│   └── engine/
│       ├── enginefile.cpp
│       └── .hg/
├── mygame.proj
└── src/
    └── mygamefile.cpp
And now your comments:
Also, I would like to be able to work
on all my game's content in the same
repository[...]
If I understand correctly, what you actually want is to "be able to work on all [your] game's content in the same [project]". Correct me if I guessed wrong.
Here, the two directories containing a .hg subdirectory are separate repositories (mygame and engine). But you can nest them without making separate projects in your IDE: two nested repositories, but only one project. In your build configuration (Makefiles, solutions, ...), you can even make references from mygame to engine, since the engine sub-repository is always present (typically to use headers from the engine in your game).
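As a rough sketch, the .hgsub file at the top of mygame simply maps the subrepo path to wherever it can be cloned from (the URL below is just a placeholder):
lib/engine = https://example.org/hg/engine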
[...] would it be possible to get it
slightly more specific? Example
commands, repositories, paths, etc?
For the paths, look at the project layout above.
To update the engine, work inside the engine directory (cd lib/engine): hg pull, hg merge, hg commit -m "merge new original with my modifications", then go back up to the game repository and hg commit -m "updated to new engine version". Now you have the new version with your changes included.
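Spelled out as commands, a minimal sketch assuming the layout above:
cd lib/engine
hg pull                    # fetch the new engine drop
hg merge                   # merge it with your engine modifications
hg commit -m "merge new original with my modifications"
cd ../..                   # back to the top of mygame
hg commit -m "updated to new engine version"   # records the new subrepo state in .hgsubstate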
For other basic use, it works much like other version control systems. In your case, this article could be useful for mapping Perforce commands to Mercurial ones.

It sounds like you could do almost exactly the same thing you do with perforce using named mercurial branches in a single repository:
hg branch engine-code
hg ci -m "created engine code branch"
# add the engine code drop
hg ci -m "new drop"
hg update default
hg merge engine-code # merges engine-code drop into your default branch
# test the result of the merge, then commit it:
hg commit -m "merged new engine drop"
That's the initial setup. After that, when you want to add a new drop it's similar:
hg update engine-code
# add the new drop
hg ci -m "another new drop"
hg update default
hg merge engine-code
hg ci -m "merged another new engine drop"

What you want is called a "vendor branch": a branch where you keep the clean code drops from your upstream vendor. You then make your own modifications in another branch and regularly merge in the code drops that you have put into the vendor branch. Very much like you described it yourself in the question, and like krupan described it in his answer.
I have written a set of slides that explain how vendor branches work in Mercurial. You should be able to follow them with your Perforce background. Searching the Mercurial mailing list for "vendor branch" also turns up many good hits.

Related

Mercurial: how do I create a new repository containing a subrange of revisions from an existing repo?

I have a repo with subrepos, with a long history. At some point the main subrepo became fully self-contained (doesn't depend on other sister subrepos).
I don't care anymore about the history of the whole thing before the main subrepo became self-contained. So I want to start a new repo that contains just what the subrepo has in it from that moment on. If possible, please describe in terms of TortoiseHg commands.
You probably want to make use of Mercurial's convert extension. You can specify revisions to be converted, as well as paths, branches and files to include or exclude in the newly-created repository.
hg convert from-path new-repo
Convert ships with Mercurial by default and just needs to be enabled.
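For example, a minimal sketch, where the path and revision number are placeholders and convert.hg.startrev marks the first changeset you want to keep:
# in ~/.hgrc or the repository's .hg/hgrc:
[extensions]
convert =
# then:
hg convert --config convert.hg.startrev=1234 path/to/main-subrepo new-repo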
You might be able to weed out any other changesets you don't need by using the hg strip command.
It will entirely remove changesets (and their descendants) from a repository. Most likely you would want to make a fresh clone and then work on stripping it down.
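A rough sketch of that; the revision number is a placeholder, and strip lives in an extension that has to be enabled first:
hg clone bigrepo trimmed
cd trimmed
hg strip 1234        # removes revision 1234 and all of its descendants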
One potential pitfall is that the final stripped repo would still share parentage with the original; therefore someone could accidentally pull back down the changesets that were stripped.
The hg convert command (noted in another answer) does not have this downside.

Mercurial Repo Living Archive

We have an Hg repo that is over 6GB and 150,000 changesets. It has 8 years of history on a large application. We have used a branching strategy over the last 8 years. In this approach, we create a new branch for a feature and when finished, close the branch and merge it to default/trunk. We don't prune branches after changes are pushed into default.
As our repo grows, it is getting more painful to work with. We love having the full history on each file and don't want to lose that, but we want to make our repo size much smaller.
One approach I've been looking into would be to have two separate repos, a 'Working' repo and an 'Archive' repo. The Working repo would contain the last 1 to 2 years of history and would be the repo developers cloned and pushed/pulled from on a daily basis. The Archive repo would contain the full history, including the new changesets pushed into the working repo.
I cannot find the right Hg commands to enable this. I was able to create a Working repo using hg convert <src> <dest> --config convert.hg.startrev=<rev>. However, Mercurial sees this as a completely different repo, breaking any association between our Working and Archive repos. I'm unable to find a way to merge/splice changesets pushed to the Working repo into the Archive repo and maintain a unified file history. I tried hg transplant -s <src>, but that resulted in several 'skipping emptied changeset' messages. It's not clear to me why the hg transplant command felt those changesets were empty. Also, if I were to get this working, does anyone know if it maintains a file's history, or is my repo going to see the transplanted portion as separate, maybe showing up as a delete/create or something?
Anyone have a solution to either enable this Working/Archive approach or have a different approach that may work for us? It is critical that we maintain full file history, to make historical research simple.
Thanks
You might be hitting a known bug with the underlying storage compression. 6GB for 150,000 revisions is a lot.
This storage issue is usually encountered on very branchy repositories, in an internal data structure storing the content of each revision. The current fix for this bug can reduce repository size by up to tenfold.
Possible Quick Fix
You can blindly try to apply the current fix for the issue and see if it shrinks your repository.
upgrade to Mercurial 4.7,
add the following to your repository configuration:
[format]
sparse-revlog = yes
run hg debugupgraderepo --optimize redeltaall --run (this will take a while)
Some other improvements are also turned on by default in 4.7, so upgrading to 4.7 and running debugupgraderepo should help in all cases.
Finer Diagnostic
Can you tell us the size of the .hg/store/00manifest.d file compared to the full size of .hg/store?
In addition, can you provide us with the output of hg debugrevlog -m?
Other reasons?
Another reason for a repository to grow is large files (usually binary) being committed into it. Do you have any of those?
The problem is that the hash id for each revision is calculated based on a number of items including the parent id. So when you change the parent you change the id.
As far as I'm aware there is no nice way to do this, but I have done something similar with several of my repos. The bad news is that it required a chain of repos, batch files and splice maps to get it done.
The bulk of the work I'm describing is ideally done one time only and then you just run the same scripts against the same existing repos every time you want to update it to pull in the latest commits.
The way I would do it is to have three repos:
Working
Merge
Archive
The first commit of Working is a squash of all the original commits in Archive, so you'll be throwing that commit away when you pull your Working code into the Archive, and reparenting the second Working commit onto the old tip of Archive.
STOP: If you're going to do this, back up your existing repos, especially the Archive repo, before trying it; it might get trashed if you run this over the top of it. It might also be fine, but I'm not having any problems on my conscience!
Pull both Working and Archive into the Merge repo.
You now have a Merge repo with two completely independent trees in it.
Create a splicemap. This is just a text file giving the hash of a child node and the hash of its proposed parent node, separated by a space.
So your splicemap would just be something like:
hash-of-working-commit-2 hash-of-archive-old-tip
Then run hg convert with the splicemap option to do the reparenting of the second commit of Working onto the old tip of the Archive. E.g.
hg convert --splicemap splicemapPath.txt --config convert.hg.saverev=true Merge Archive
You might want to try writing it to a different named repo rather than Archive the first time, or you could try writing it over a copy of the existing Archive, I'm not sure if it'll work but if it does it would probably be quicker.
Once you've run this setup once, you can just run the same scripts over the existing repos again and again to update with the latest Working revisions. Just pull from Working to Merge and then run the hg convert to put it into Archive.
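A minimal sketch of that repeatable step, assuming the three repositories sit next to each other and the splicemap from above has already been written:
cd Merge
hg pull ../Working        # bring the latest Working changesets into Merge
cd ..
hg convert --splicemap splicemapPath.txt --config convert.hg.saverev=true Merge Archive
Convert remembers which source revisions it has already converted, so running it again only adds the changesets it has not seen yet.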

Force pushing to new head when already pulled all changes

Is it possible to forcibly create a new remote head when pushing?
Suppose I have done some local commits on branch "default" then pulled and merged from remote.
Now, I would like to push my commits to the remote, creating a new head and a bookmark, but preserve the existing remote head and tip - i.e. my coworkers should not get my changes yet when doing hg fetch.
Basically this should be a short lived branch (thus not a named branch) for purpose of backup and code review by other before being fully merged into "main" head of default branch.
I've tried --new-branch but it didn't help - no new head was created and remote tip moved to my head.
You can use the --force option to force the creation of a new head.
The --new-branch option is used for named branches; in your case, we are talking about anonymous branching.
The reason the tip moved is that you merged the changesets you had just pulled. Once you have done that, there's no way to do what you want.
You should just pull the new changes from the remote and force push everything without merging; this will create a new head (called an anonymous branch), which can later be merged into the default branch by you or someone else after the code review.
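In other words, something like this (a minimal sketch):
hg pull            # get the remote changes, but do not merge them
hg push --force    # push your head as an additional, anonymous head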
You can also use a second repository to push your changes, but this is a totally different workflow.
You cannot preserve tip when you push: it is a pseudo-tag that always points to the newest changeset in a repository. The concept of tip is deprecated in Mercurial because tip can change meaning more or less randomly depending on the order of pushes — as you've seen.
The only way to create a new head is to, well, create it :-) By this I mean that you need two heads: one with your changes and one with the main code you want your colleagues to pull and merge with. With only a single head (the one you got after running hg merge) there's no way to signal to your colleagues that they shouldn't use it.
A much better approach is to use a separate repository on the server. Go to your repository management software and create a fork for your changes. Then push into that and tell your colleagues to review it. They'll pull from your clone and look the changes over. If they like them, then they can merge with the main code and push to the normal repo. If they don't like the changes, then they can throw away their local clone, strip the changesets, or maybe just roll back the pull.
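As a sketch, with a hypothetical fork URL:
hg push https://hg.example.com/forks/yourname/project    # push your changes to the review fork
hg pull https://hg.example.com/forks/yourname/project    # what your colleagues run to review them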
My solution to this issue is to use a previous revision as the start of the bookmark. If there is no suitable revision, or you just do not want to do that, you can make a dummy commit (like a small change to a README file) and bookmark the revision before it.
I think hg bookmarks need a lot of fine-tuning before they become like git branches, but the process I describe is pretty much what is explained in the Mercurial bookmarks kick starter.
For example, if you're currently at revision 250:
echo >>README
hg ci -m "enabling bookmark branch_xyz"
hg book my-tip # optional but nice to have
hg book -r 250 branch_xyz
hg up branch_xyz
# hack ... hack hack
hg ci -m "awesome feature xyz (in progress)"
hg push -fB branch_xyz
Now this bookmark lives on the server for others to work with... but it can easily be pruned later.

Mercurial clone cleanup to match upstream

I have an hg clone of a repository in which I have made numerous changes locally over a few months and pushed them to my clone at Google Code. Unfortunately, as a noob, I committed a whole bunch of changes on the default branch.
Now I would like to make sure my current default is EXACTLY as upstream, and then I can do proper branching off default and only work on the branches.
However, how do I do that cleanup?
For reference my clone is http://code.google.com/r/mosabua-roboguice/source/browse
PS: I got my self into the same problem with git and got that cleaned up: Cleanup git master branch and move some commit to new branch?
First, there's nothing wrong with committing on the default branch. You generally don't want to create a separate named branch for every task in Mercurial, because named branches are forever. You might want to look at the bookmark feature for something closer to git branches ("hg help bookmarks"). So if the only thing wrong with your existing changesets is that they are on the default branch, then there really is nothing wrong with them. Don't worry about it.
However, if you really want to start afresh, the obvious, straightforward thing to do is to reclone from upstream. You can keep your messy changesets by moving the existing repo aside and recloning. Then transplant the changesets from the old repo into the new one on a branch of your choosing.
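A rough sketch of that transplant step; the clone names, branch name and revision range are placeholders, and the transplant extension has to be enabled:
cd mosabua-roboguice-fresh                          # the new clone of upstream
hg branch my-old-work                               # hypothetical branch to collect the old changesets
hg transplant -s ../mosabua-roboguice-old 100:tip   # copy your old changesets across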
If you don't want to spend the time/bandwidth for a new clone, you can use the (advanced, dangerous, not for beginners) strip command. First, you have to enable the mq extension (google it or see the manual -- I'm deliberately not explaining it here because it's dangerous). Then run
hg strip 'outgoing("http://upstream/path/to/repo")'
Note that I'm using the revsets feature added in Mercurial 1.7 here. If you're using an older version, there's no easy way to do this.
The best way to do this is with two clones. When working with a remote repo I don't control I always keep a local clone called 'virgin' to which I make no changes. For example:
hg clone -U https://code.google.com/r/mosabua-roboguice-clean/ mosabua-roboguice-clean-virgin
hg clone mosabua-roboguice-clean-virgin mosabua-roboguice-clean-working
Note that because Mercurial uses hard links for local clones, and because that first clone was made with -U (no working directory; a bare repo in git terms), this takes up no additional disk space.
Work all you want in the working clone, pull in the virgin clone to see what's going on upstream, and then pull again in the working clone to get the upstream changes.
You can do something like this after the fact by creating a new clone of the remote repo and, if disk space is precious, using the relink extension to associate them.
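For example, a sketch; relink ships with Mercurial but must be enabled first:
# in ~/.hgrc:
[extensions]
relink =
# then, inside the new clone:
hg relink ../old-clone    # re-establish hard links against the old clone to save disk space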
Preface: all history changes make sense only for non-published repos. You'll have to push to the Google Code repo from scratch after editing local history (delete the repo on Google Code, create an empty one, push); otherwise you'll just get one more head in the default branch.
Manfred
Easy (but not short) way - default only+MQ
as Greg mentioned, install MQ
move all your commits into MQ-patches on top of upstream code
leave your changes as patches forever
check, edit if necessary, and re-integrate the patches after each upstream pull (this way your own Google Code repo without the MQ patches will become identical to upstream); see the sketch below
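A rough sketch of the pull-and-reapply cycle, assuming the mq extension is enabled and your own changesets start at some revision N (a placeholder):
hg qimport -r N:tip    # turn your own changesets into mq patches
hg qpop -a             # pop them all off; default now matches upstream
hg pull -u             # pull the new upstream changesets
hg qpush -a            # reapply your patches on top, fixing conflicts as needed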
More complex - MQ in the middle + separate branches
same as step 1 above
same as step 2 above
create a named branch and switch to it
"Finish" the patches (hg qfinish)
Pull upstream, merge with your branch changes (from default to yourbranch)
Commit your changes only into yourbranch
Rebasing
Enable rebase extension
Create a named branch (with a changeset in it? to be tested)
Rebase your changesets to the new ancestor, test results
See steps 5-6 from the "More complex" section
Perhaps you could try the Convert extension. It can bring a repository into better shape while preserving history. Of course, after the modifications have been done, you will have to delete the old repo and upload the converted one.

Managing long lived named branches in Mercurial

So I'm using mercurial for a project of mine, I'm the only developer.
I usually use the default branch for actual development; I use some short-lived branches for new features, and that's fine: I create them, write the new feature, and if it works well enough, I merge that branch into the default branch and never use it again.
But I'd like to write documentation in a different branch, since I don't really want to "pollute" the default branch with docs commits.
After I have written enough documentation for the stuff I have in the default branch I merge the docs branch in the main one. BUT after a while I'd like to use the docs branch again, and I have to pull the changes from the main one, or create another new branch.
What's the best workflow to deal with this? Is my approach entirely wrong?
Placing documents in source control is a little bit strange. If the documents are binary (.doc/.docx/.xlsx), Hg will not be able to merge them. If you're storing .html, .xml, or some other plain-text format, it will do a slightly better job. There are a few open-source systems that will let you use Hg and provide separate document management (Redmine, for one).
Assuming you've just merged docs into default you can continue using the docs branch by doing this:
> hg update docs # update to the docs branch
> hg merge default # merge default into docs branch
(do some work)
> hg commit -m "adding new things to docs branch"
(merge into default when ready)
By merging default into docs, you're making sure that docs has all changes that exist on default. Performing a subsequent commit on docs will effectively allow you to continue working on that branch. Another way to say this is that merging is directional in Hg: if you want docs to be up to date with default, you've got to perform that merge explicitly.
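When the documentation is ready to come back, the merge in the other direction looks the same (continuing the sketch above):
> hg update default     # switch back to the default branch
> hg merge docs         # merge the docs branch into default
> hg commit -m "merge documentation into default"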
To start using the docs branch again from a child of your merge with the original docs branch, simply update your working copy to that branch and commit to it.
If you have changes on the docs branch then you need to merge before you can commit.