I'm trying to create a bundle for a remote team. They have a copy of the depot from revision 892 and we are currently on revision 1119.
First I tried patches, but that created a ton of files that botched up when we tried to apply them (usually on the merge commits). Our repository is 17GB in size, so I'm trying to create a delta, and I figured hg bundle was perfect for this.
I generated a bundle via:
>hg bundle --rev 1119 --base 892 depot-892-to-1119.bundle
This created a bundle file that is 350MB, which is acceptable and feels right.
But when we apply it to the destination depot, which only goes up to revision 892, it barfs on:
E:\dest-depot>hg unbundle -u depot-892-to-1119.bundle
adding changesets
transaction abort!
rollback completed
abort: 00changelog.i#e5cc33458251: unknown parent!
And so far this is similar to several other questions I have seen while searching, but I'll take it one step further.
I looked up e5cc33458251 in the source (the bigger depot), and it shows up as revision 930, which is clearly after rev 892, yet the error names it as the reason for the failure. Of course the destination depot doesn't have that revision; that is why I created the bundle in the first place. So I'm not really sure why this one is causing me problems.
Now, we do have a number of branches in the depot, and rev 892 was the tip of a "Patch 2.7" branch, not default. I do not know if this should cause a problem. That patch branch was eventually merged back into default in rev 999.
930 was actually a very small, trivial code change, also on the "Patch 2.7" branch. There were actually two Patch 2.7 lines in the revision graph, and they were merged together in 932. But again, nothing strange.
I am not seeing the problem here. Any ideas on what kind of a bundle I should be generating? Or if I should be going a different path?
It sounds like you're doing this essentially right, so let's check a few possible gotchas:
Are you aware that revision numbers aren't portable across clones? It's entirely possible that "their" 892 is a different changeset from yours. So you should find out what their latest revision is by nodeid and pass that to --base.
I get that, with them being remote, using hg's own wire protocol to actually transfer the data might not be feasible, but if you can get them to run hg serve for a short while, you can just do:
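To see why, here's a toy Python model (not Mercurial's actual code): each clone numbers changesets 0, 1, 2, ... in the order they arrived locally, while the nodeid belongs to the changeset itself. The shortened nodeids below are hypothetical.

```python
# Toy model: local revision numbers come from arrival order in one clone,
# whereas a nodeid identifies the same changeset in every clone.

def assign_local_revs(arrival_order):
    """Return {nodeid: local revision number} for one clone."""
    return {node: rev for rev, node in enumerate(arrival_order)}

def node_for_rev(local_revs, rev):
    """Look up which changeset a local revision number refers to."""
    for node, number in local_revs.items():
        if number == rev:
            return node
    raise KeyError(rev)

# Hypothetical history: 'a1' is the root, 'b2' and 'c3' are siblings on two
# branches, 'd4' merges them. Both arrival orders are valid pull orders.
ours = assign_local_revs(["a1", "b2", "c3", "d4"])
theirs = assign_local_revs(["a1", "c3", "b2", "d4"])

print(node_for_rev(ours, 1), node_for_rev(theirs, 1))  # b2 c3
```

Same revision number, different changeset, which is why --base should be given a nodeid rather than a number.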
hg bundle ../depot-to-them.bundle http://THEIR_IP:8000
Then you'll have exactly the right bundle to get them everything they need without having to have them send you their nodeids.
Those aside, the only other bit of info worth mentioning is that by using --rev X --base Y you're saying "I want to send all the ancestors of X that they don't have, assuming they have only Y and its ancestors". So if there's a branch that's not yet merged into X, you're not going to send it, even if locally its revision numbers fall between Y and X. That won't, however, prevent the bundle from being applied, so it's more of a good-to-understand point than a possible cause of your troubles.
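The --rev/--base rule above is a set difference: the ancestors of X minus the ancestors of Y. Here's a minimal Python sketch on a hypothetical toy DAG (node names are made up), plus a check for the "unknown parent" condition: a changeset in the bundle whose parent is neither in the bundle nor in the destination repository. It models the semantics only; it is not how hg bundle is implemented.

```python
def ancestors(dag, node):
    """Return node plus all of its ancestors (the revset ::node)."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(dag[n])
    return seen

def bundle_contents(dag, rev, base):
    """Changesets sent by 'hg bundle --rev REV --base BASE': ::REV - ::BASE."""
    return ancestors(dag, rev) - ancestors(dag, base)

def missing_parents(dag, bundle, destination):
    """Parents found in neither the bundle nor the destination repo.
    A non-empty result corresponds to an 'unknown parent' abort."""
    return {p for n in bundle for p in dag[n]} - bundle - destination

# Hypothetical DAG: each node maps to its list of parents.
dag = {
    "r0": [],
    "r1": ["r0"],
    "x": ["r1"],        # the base the remote team is assumed to have
    "y": ["r1"],        # a sibling line that gets merged later
    "m": ["x", "y"],    # merge of both lines
    "z": ["x"],         # unmerged branch: not an ancestor of 'tip'
    "tip": ["m"],
}

bundle = bundle_contents(dag, "tip", "x")
print(sorted(bundle))  # ['m', 'tip', 'y']  ('z' is not sent)
# Applies cleanly if the destination really has ::x ...
print(missing_parents(dag, bundle, ancestors(dag, "x")))  # set()
# ... but aborts if the destination lacks 'x' (e.g. "their 892" is another node).
print(missing_parents(dag, bundle, {"r0", "r1"}))  # {'x'}
```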
This is currently a purely theoretical question (related to this one), but let me first give the background. Whenever you run hg gexport, the initial hash will vary from invocation to invocation. This is similar to when you run git init or hg init. However, since the Mercurial and Git commits correspond to each other and build on previous hashes, there should be some way to start over from a minimal common initial state (or minimal state on the Git side, for example).
Suppose I have used hg-git in the past and now I am trying to sync again between my Mercurial and my Git states, but without the original .git directory from hg gexport (or with very little of it). What I do have, though, are the two metadata files: git-mapfile and git-tags.
There is an old Git mirror, which is sort of "behind" and the Mercurial repo which is up-to-date.
Then I configure the Mercurial repo for hg-git like so (.hg/hgrc):
[git]
intree = True
[extensions]
hgext.bookmarks=
topic=
hggit=
[paths]
default = ssh://username@hgserver.tld//project/repo
gitmirror = git+ssh://username@server.tld/project/repo.git
If I now do the naive hg pull gitmirror, all I gain is a duplicate of every existing commit on an unrelated branch with unrelated history (and twice as many heads as before the pull).
Placing those two metadata files (git-mapfile and git-tags) into .hg makes little difference. The main difference is that without these files the pull succeeds (but duplicates everything), while with them it errors out at the first revision with "abort: unknown revision ..." (which makes sense).
Question: which part(s) and how much (i.e. what's the minimum!) of the Git-side data/metadata created by hg gexport do I have to keep around in order to start over syncing with hg-git? (I was unable to find this covered in the documentation.)
The core metadata is stored in .hg/git-mapfile, and the actual Git repository is stored in .hg/git or .git, depending on the intree setting. The git-mapfile is the only file needed to reproduce the full state; anything else is just cache. In order to recreate a repository from scratch, do the following:
Clone or initialise the Mercurial repository, somehow.
Clone or initialise the embedded Git repository, e.g. using git clone --bare git+ssh://username@server.tld/project/repo.git .hg/git.
Copy over the metadata from the original repository, and put it into .hg/git-mapfile.
Run hg git-cleanup to remove any commits from the map no longer known to Mercurial.
Pull from Git.
Push to Git.
These are the steps I'd use, off the top of my head. The last three steps are the most important. In particular, you must pull from Git to populate the repository before pushing; otherwise, the conversion will fail.
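Since git-mapfile is the piece you actually have to keep, here's a minimal Python sketch of reading and pruning it. It assumes each line holds a Git commit SHA and a Mercurial node separated by a space, and the pruning step only approximates what hg git-cleanup is described as doing above; it is not hg-git's code, and the hashes in the example are placeholders.

```python
def load_mapfile(text):
    """Parse git-mapfile contents into {git_sha: hg_node}.
    Assumes one 'GITSHA HGNODE' pair per line."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        git_sha, hg_node = line.split()
        mapping[git_sha] = hg_node
    return mapping

def prune_unknown(mapping, known_hg_nodes):
    """Keep only entries whose Mercurial node still exists in the repo
    (an approximation of 'hg git-cleanup', not hg-git's implementation)."""
    return {g: h for g, h in mapping.items() if h in known_hg_nodes}

def dump_mapfile(mapping):
    """Serialize back to the same one-pair-per-line format."""
    return "".join(f"{g} {h}\n" for g, h in sorted(mapping.items()))

# Placeholder hashes; real entries are full 40-character hex strings.
sample = "g1 h1\ng2 h2\n"
print(dump_mapfile(prune_unknown(load_mapfile(sample), {"h1"})), end="")
```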
We have an Hg repo that is over 6GB and 150,000 changesets. It has 8 years of history on a large application. We have used a branching strategy over the last 8 years. In this approach, we create a new branch for a feature and when finished, close the branch and merge it to default/trunk. We don't prune branches after changes are pushed into default.
As our repo grows, it is getting more painful to work with. We love having the full history on each file and don't want to lose that, but we want to make our repo size much smaller.
One approach I've been looking into would be to have two separate repos, a 'Working' repo and an 'Archive' repo. The Working repo would contain the last 1 to 2 years of history and would be the repo developers cloned and pushed/pulled from on a daily basis. The Archive repo would contain the full history, including the new changesets pushed into the working repo.
I cannot find the right Hg commands to enable this. I was able to create a Working repo using hg convert <src> <dest> --config convert.hg.startref=<rev>. However, Mercurial sees this as a completely different repo, breaking any association between our Working and Archive repos. I'm unable to find a way to merge/splice changesets pushed to the Working repo into the Archive repo while maintaining a unified file history. I tried hg transplant -s <src>, but that resulted in several 'skipping emptied changeset' messages. It's not clear to me why hg transplant considered those changesets empty. Also, if I were to get this working, does anyone know if it maintains a file's history, or is my repo going to see the transplanted portion as separate, maybe showing up as a delete/create or something?
Anyone have a solution to either enable this Working/Archive approach or have a different approach that may work for us? It is critical that we maintain full file history, to make historical research simple.
Thanks
You might be hitting a known bug with the underlying storage compression. 6GB for 150,000 revisions is a lot.
This storage issue is usually encountered on very branchy repositories, in an internal data structure that stores the content of each revision. The current fix for this bug can reduce repository size up to tenfold.
Possible Quick Fix
You can blindly try to apply the current fix for the issue and see if it shrinks your repository.
upgrade to Mercurial 4.7,
add the following to your repository configuration:
[format]
sparse-revlog = yes
run hg debugupgraderepo --optimize redeltaall --run (this will take a while)
Some other improvements are also turned on by default in 4.7, so upgrading to 4.7 and running debugupgraderepo should help in all cases.
Finer Diagnostic
Can you tell us the size of the .hg/store/00manifest.d file compared to the full size of .hg/store?
In addition, can you provide us with the output of hg debugrevlog -m?
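For the first question, a small Python helper like the one below can report the manifest's share of the store. The function names are illustrative. Note that small revlogs keep their data inline in the .i file, so 00manifest.d may not exist at all.

```python
import os

def dir_size(path):
    """Total size in bytes of all files under path (0 if it doesn't exist)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def manifest_share(repo_root):
    """Return (manifest bytes, store bytes, ratio) for a Mercurial repo."""
    store = os.path.join(repo_root, ".hg", "store")
    manifest = os.path.join(store, "00manifest.d")
    # Small revlogs are stored inline in the .i file and have no .d file.
    manifest_bytes = os.path.getsize(manifest) if os.path.exists(manifest) else 0
    store_bytes = dir_size(store)
    ratio = manifest_bytes / store_bytes if store_bytes else 0.0
    return manifest_bytes, store_bytes, ratio

if __name__ == "__main__":
    import sys
    m, total, ratio = manifest_share(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"00manifest.d: {m} bytes of {total} ({ratio:.1%})")
```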
Other Reasons?
Another common reason for repository growth is large files (usually binaries) being committed to it. Do you have any of those?
The problem is that the hash id for each revision is calculated based on a number of items including the parent id. So when you change the parent you change the id.
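Concretely, a Mercurial nodeid is a SHA-1 over the two parent nodeids (smaller first) followed by the revision's content. The Python sketch below reproduces that scheme on toy strings (a real changeset hashes a full changelog entry, not a short message) to show that identical content under a different parent gets a different id.

```python
import hashlib

NULLID = b"\0" * 20  # the null parent used for roots and non-merges

def node_hash(text, p1=NULLID, p2=NULLID):
    """SHA-1 of the two parent nodeids (smaller first) followed by the text."""
    first, second = sorted([p1, p2])
    return hashlib.sha1(first + second + text).digest()

root = node_hash(b"initial commit")
old_tip = node_hash(b"last Archive commit", root)
squash = node_hash(b"squashed history")  # Working's first commit: a new root

# Identical content, different parent -> different nodeid.
child_on_squash = node_hash(b"feature work", squash)
child_on_archive = node_hash(b"feature work", old_tip)
print(child_on_squash.hex() == child_on_archive.hex())  # False
```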
As far as I'm aware there is no nice way to do this, but I have done something similar with several of my repos. The bad news is that it required a chain of repos, batch files and splice maps to get it done.
The bulk of the work I'm describing is ideally done one time only and then you just run the same scripts against the same existing repos every time you want to update it to pull in the latest commits.
The way I would do it is to have three repos:
Working
Merge
Archive
The first commit of Working is a squash of all the original commits in Archive, so you'll be throwing that commit away when you pull your Working code into the Archive, and reparenting the second Working commit onto the old tip of Archive.
STOP: If you're going to do this, back up your existing repos, especially the Archive repo, before trying it; it might get trashed if you run this over the top of it. It might also be fine, but I don't want that on my conscience!
Pull both Working and Archive into the Merge repo.
You now have a Merge repo with two completely independent trees in it.
Create a splicemap. This is just a text file giving the hash of a child node and the hash of its proposed parent node, separated by a space.
So your splicemap would just be something like:
hash-of-working-commit-2 hash-of-archive-old-tip
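If you generate the splicemap from a script, a small Python helper like this one (hypothetical, not part of hg) keeps the format straight: the child's nodeid, a space, then one parent, or two comma-separated parents for a merge. It insists on full 40-character hex nodeids, which is the safe choice; the nodeids in the example are placeholders.

```python
import re

HEX40 = re.compile(r"[0-9a-f]{40}")

def splicemap_line(child, *parents):
    """Format one splicemap entry: 'CHILD PARENT1[,PARENT2]'."""
    if not 1 <= len(parents) <= 2:
        raise ValueError("a splicemap entry takes one or two parents")
    for node in (child, *parents):
        if not HEX40.fullmatch(node):
            raise ValueError(f"expected a full 40-character hex nodeid: {node!r}")
    return f"{child} {','.join(parents)}\n"

# Placeholder nodeids: Working's second commit reparented onto Archive's old tip.
with open("splicemapPath.txt", "w") as f:
    f.write(splicemap_line("a" * 40, "b" * 40))
```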
Then run hg convert with the splicemap option to do the reparenting of the second commit of Working onto the old tip of the Archive. E.g.
hg convert --splicemap splicemapPath.txt --config convert.hg.saverev=true Merge Archive
You might want to try writing it to a differently named repo rather than Archive the first time, or you could try writing it over a copy of the existing Archive. I'm not sure that will work, but if it does, it would probably be quicker.
Once you've run this setup once, you can just run the same scripts over the existing repos again and again to update with the latest Working revisions. Just pull from Working to Merge and then run the hg convert to put it into Archive.
I spent yesterday searching via Google and looking through SO for an answer but couldn't find anyone having the exact problem I'm having.
A while ago, I created a branch at Rev 176 in my local Mercurial repo, and at Rev 196 I created the named branch "Port to VS2010." See the TortoiseHg screen cap below. I've been able to successfully push the other branch, "006-x86 and x64 Builds", to the remote repo, but whenever I try to push the new branch, I get this error in the log:
abort: push creates new remote head 207852dab969!
hint: merge or see "hg help push" for details about pushing new heads
Short of merging or forcing it, how can I push this branch? It represents a tentative solution that will probably never be needed, but I wanted to keep it around just in case.
(Note: You'll notice three gaps in the Rev column. They represent immaterial changesets for the "006-x86 and x64 Builds" branch. I removed them in order to shorten the image.)
Update:
Per Lazy Badger:
acs_FromBuildServer_edited% hg heads -T "{node|short} {branch}\n"
24af28a99211 006-x86 and x64 Builds
69be2af28b7c Port to VS2010
207852dab969 default
86e00db4ba95 005-No Register CardContext
9df44947cc8b 004-Hack typedef boost shared_ptr
81055bcdb3cc 003-Use boost shared_ptr
6358126f4757 002-Add Meyers Fix
1e23ed012883 001-Solution
bcc01f6fbef4 default
[command completed successfully Fri Feb 12 11:15:36 2016]
acs_FromBuildServer_edited%
Actually, you only have two choices. You can merge or you can force it. I think in your case you want to force it.
Typically, you don't want to force a new head, because other developers might not be aware of it. Since this is an experiment you just want to keep for posterity, it's fine to force it. If you find out you need it, you can merge it at that time.
It seems you have a mix of two unrelated problems:
anonymous branching (two heads of the same branch)
pushing a new branch (one that doesn't exist on the remote yet)
A new named branch has to be pushed with an extra option added to the default push command (--new-branch on the CLI) or with the "Allow push to new branch" checkbox in the THG GUI.
Pushing an additional head is a bad idea in general; it's better to find the new head in the pre-existing branch and merge it.
Well, as expected, you have two heads of default branch
207852dab969 default
...
bcc01f6fbef4 default
And according to the error message, head 207852dab969 (r195) is new and not pushed yet. No, closing the branch will not allow you to push it. If you don't want to merge or force push, you can move the diverged history (from the branch point) to another named branch.
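To spot this before pushing, you can group the hg heads -T "{node|short} {branch}\n" output shown above by branch. A minimal Python sketch (helper names are made up, and the sample is an abridged copy of that output); branch names can contain spaces, so each line is split only on the first space:

```python
from collections import defaultdict

def heads_by_branch(output):
    """Group lines of the form 'NODE BRANCH NAME' by branch name."""
    groups = defaultdict(list)
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        node, branch = line.split(" ", 1)  # branch names may contain spaces
        groups[branch].append(node)
    return dict(groups)

def branches_with_multiple_heads(output):
    """Branches with more than one local head (candidates for a merge)."""
    return {b: nodes for b, nodes in heads_by_branch(output).items() if len(nodes) > 1}

heads = """\
24af28a99211 006-x86 and x64 Builds
69be2af28b7c Port to VS2010
207852dab969 default
86e00db4ba95 005-No Register CardContext
bcc01f6fbef4 default
"""
print(branches_with_multiple_heads(heads))
# {'default': ['207852dab969', 'bcc01f6fbef4']}
```

This only reflects the local repository; whether a head is actually new to the remote is what hg outgoing or the push itself tells you.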
We're currently attempting to migrate our Mercurial repository (in this case hosted on an ancient version of Kiln) to BitBucket, and we immediately ran into issues with size (if you don't know, BitBucket imposes a rather generous 2GB repo limit, which we happened to blow past).
Anyway, I've cleaned up the sins of the past:
using convert with filemaps (removing binaries/static files that should never have been in the repo)
creating separate repos for other things that shouldn't have been in the main repo
attempting to use generaldelta to reduce size (as per https://www.mercurial-scm.org/wiki/ScaleMercurial)
using branchmaps to try to consolidate old branches and their associated changesets
Even with these steps, I still have a very large manifest file: although the "data" stored for the repo has shrunk to a "manageable" size (~600MB), my manifest file is nearly 700MB.
Some additional information: in general, we practice branch-per-feature and have two branches that track out to environments:
a release branch (deployed to staging and then to prod)
default branch (originally branched off release; all features are first merged here and then to release. This branch dies and is reborn every two weeks)
One difference in this workflow is that default itself is never merged into release (a la gitflow/hgflow). Does this unidirectional flow into default cause issues?
We "only" have 120 open branch heads, so it seems like that's manageable?
I'm obviously missing some step here (or else the repo is just completely hosed).
Just for future reference, I followed Tim's suggestion above. My full script ended up looking like this:
hg --config format.generaldelta=1 clone --pull oldrepo oldrepo-generaldelta
hg --config format.generaldelta=1 clone --pull oldrepo-generaldelta oldrepo-generaldelta2
hg convert --filemap filemap.txt oldrepo-generaldelta2 newrepo
As Tim mentioned in his linked answer, our manifests went from about 700MB down to about 40MB with the second clone.
Can I optimize a Mercurial clone?
We have 2,700+ revisions, and it takes a good 30-45 seconds to load Mercurial when doing a merge, push or anything else with TortoiseHg. I'm wondering if there's a way, other than outright creating a new repository, to clean up the revision history. Say, cut off everything before revision 2,400 or so.
Not an answer to your question, but:
Maybe reducing "log batch size" to 100 (default is 500) in the settings helps.
Our 2300+ rev repo loads in 2-3 secs (off my 15k rpm SAS-disk, but never mind that), so I don't think your problem is many revs, really. There are much bigger repos out there. :)
Note that both Mercurial core and TortoiseHg developers are keen on finding performance bugs, so it might be worthwhile to ask on the mail-lists for assistance.
You can use the histedit extension to compress several changesets into one. Executing the histedit command on a range of revisions will spawn a text document that looks like this (from the histedit documentation):
pick c561b4e977df Add beta
pick 030b686bedc4 Add gamma
pick 7c2fd3b9020c Add delta
Edit history between c561b4e977df and 7c2fd3b9020c
Commands:
p, pick = use commit
e, edit = use commit, but stop for amending
f, fold = use commit, but fold into previous commit
d, drop = remove commit from history
Changing pick to fold for a certain changeset in the list above will fold it into the previous changeset. It will give you an opportunity to resolve failed merges and enter a new commit message as well.
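If the goal is to squash a whole range into one changeset, the plan can be rewritten mechanically: keep the first pick and change every later one to fold. Below is a hypothetical Python helper (not part of histedit) that does that to the plan text before you save it. It leaves help and comment lines alone.

```python
def squash_plan(plan_text):
    """Keep the first 'pick' line and turn every later one into 'fold',
    so each subsequent changeset is folded into its predecessor."""
    out, seen_first = [], False
    for line in plan_text.splitlines():
        words = line.split(" ", 1)
        if words[0] in ("pick", "p") and len(words) == 2:
            if seen_first:
                line = "fold " + words[1]
            seen_first = True
        out.append(line)
    return "\n".join(out) + "\n"

plan = """pick c561b4e977df Add beta
pick 030b686bedc4 Add gamma
pick 7c2fd3b9020c Add delta

# Edit history between c561b4e977df and 7c2fd3b9020c
"""
print(squash_plan(plan), end="")
```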
WARNING:
Using histedit will modify the repository history, including hash IDs, which will cause problems unless every developer starts over with a fresh clone after the changes have been made. Also, you would probably need to limit your histedit-ing to changesets with a single parent (i.e., non-merge changesets).