What parts of the hg-git metadata state do I need to back up in order to start over? - mercurial

This is currently a purely theoretical question (related to this one), but let me first give the background. Whenever you run hg gexport the initial hash will vary from invocation to invocation. This is similar to when you run git init or hg init. However, since the Mercurial and Git commits correspond to each other and build on previous hashes, there should be some way to start over from a minimal common initial state (or minimal state on the Git side, for example).
Suppose I have used hg-git in the past and now I am trying to sync again between my Mercurial and my Git states, but without (or very little of) the original .git directory from the hg gexport. What I do have, though, are the two metadata files: git-mapfile and git-tags.
There is an old Git mirror, which is sort of "behind" and the Mercurial repo which is up-to-date.
Then I configure the Mercurial repo for hg-git like so (.hg/hgrc):
[git]
intree = True
[extensions]
hgext.bookmarks=
topic=
hggit=
[paths]
default = ssh://username#hgserver.tld//project/repo
gitmirror = git+ssh://username#server.tld/project/repo.git
If I now do the naive hg pull gitmirror all I will gain is a duplication of every existing commit on an unrelated branch with unrelated commit history (and the double amount of heads, compared to prior to the pull).
It clearly makes no big difference to place those two metadata files (git-mapfile and git-tags) into .hg. The biggest difference is that the pull without these files will succeed (but duplicate everything) and the pull with them will error out at the first revision because of "abort: unknown revision ..." (even makes sense).
Question: which part(s) and how much (i.e. what's the minimum!) of the Git-side data/metadata created by hg gexport do I have to keep around in order to start over syncing with hg-git? (I was unable to find this covered in the documentation.)

The core metadata is stored in .hg/git-mapfile, and the actual Git repository is stored in .hg/git or .git dependending on intree. The git-mapfile is the only file needed to reproduce the full state; anything else is just cache. In order to recreate a repository from scratch, do the following:
Clone or initialise the Mercurial repository, somehow.
Clone or initialise the embedded Git repository, e.g. using git clone --bare git+ssh://username#server.tld/project/repo.git .hg/git.
Copy over the metadata from the original repository, and put it into .hg/git-mapfile.
Run hg git-cleanup to remove any commits from the map no longer known to Mercurial.
Pull from Git.
Push to Git.
These are the steps I'd use, off the top of my head. The three last steps are the most important. In particular, you must pull from Git to populate the repository prior to pushing; otherwise, the conversion will fail.

Related

How to remove largefiles from Mercurial repo

See also this question.
Without knowing what I was doing, I enabled the largefiles extension, committed a file and pushed it to kiln. Now I know the error of my ways, and I need to permanently revert this change.
I followed the guidance from SO on the subject; and I can remove largefiles locally, but this doesn't affect the remote repos in kiln. I have tried opening the repo in KilnRepositories on the Kiln server and nuking the largefiles folder (as well as deleting 'largefiles' from the requires file), but after a few pushes/pulls the folder and the require's line come back.
Is there a way to make this permanent? (Setting requires to readonly doesn't work either).
Note: This is true at least of (for Windows) TortoiseHg 2.4 (Mercurial 2.2.1) -> 2.7.1 (Mercurial 2.5.2). I will not speak for future or older versions.
After looking through the various mercurial extensions available, I have concluded that it is generally not possible to convert a repository 'back' once a file has been committed using the largefiles extension.
First, the two rationales for why you do not want largefiles in play on your repos: one and two.
Once a file has has been committed as a largefile, to remove it, all references to the '.hglf' path must be removed from the repo. A backout isn't sufficient, as its commit contents will reference the path of the file, including the '.hglf' folder. Once mercurial sees this, it will write 'largefiles' back to the /.hg/requires file and the repo is once more largefile locked. Likewise with hg forget and remove.
Option 1: If your repo is in isolation (you have end-to-end control of the repo in all of its local and remote locations AND no one else has branched from this repo), it may be possible to use the mq extension and strip the changeset. This is probably only a viable option if you have caught the mistake in time.
Option 2: If the offending changeset (the largefile commit) exists on a commit phase that is draft (or that can be forced back into draft), then it may be possible to import the commit to mq and unapply the changeset using hg qpop. This is superior to stripping, because it preserves the commit history forward from the extracted changeset. In real life, this is frequently not possible, because you've likely already performed merges and pushed/pulled from public phase branches. However, if caught soon enough, mq can offer a way to salvage the repo.
Option 3: If the offending changeset is referenced in one and only one place (the original commit), and no one has attempted to backout/remove/forget the changset (thus creating multiple references), it may be possible to use hg rebase, to fold the first child changeset after the offense with the parent changeset of the offense. In doing so, the offensive changeset becomes a new head which can then be stripped off with mq strip. This can work where attempts to import to mq have failed.
Option 4: If none of the above work, you can use transplant or graft, to export all of the non-offending changesets as patches (careful to export them in the correct sequence), then hg update to the first sane changeset before the offense, mq strip the repo of all forward changesets and then reapply your exported patches in sequence.
Option 5: (What I ultimately did). Clone the repo locally, so that you have two copies: clone_to_keep, clone_to_destroy. In clone_to_keep, update to the first sane changeset before the offense. Mq strip all forward changesets. Merge back down if left with multiple heads. In clone_to_destroy, update to the tip. In windows explorer, copy everything in /clone_to_destroy except the .hg and .hglf folders to the /clone_to_keep folder. Inside Tortoise, commit all changes in clone_to_keep as a single changeset. Preserve one remote instance of clone_to_destroy in a readonly state for historical purposes, and destroy all others.
Option 6: The nuclear option. If all else fails, and if you don't care about the integration of your repo with external systems (bug tracking, CI, etc), you can follow the aforementioned SO post and use the hg convert extension. This will create a new copy of the infected repo, removing all references to the offending changesets; however, it does so by iterating each changeset in the entire repo and committing it to the new repo as a NEW changeset. This creates a repo which is incompatible with any existing branch repos--none of the changeset ids will line up. Unless you have no branch repos, this option will probably never work.
In all cases, you then have to take your fix and manually reapply to each distinct repository instance (copy the repo folder, clone, whatever your preferred method).
In the end, it turns out that enabling largefiles is an extremely expensive mistake to make. It's time consuming and ultimately destructive to fix. I don't recommend ever allowing largefiles to make it into your repos.

Mercurial clone cleanup to match upstream

I have a hg clone of a repository into which I have done numerous changes locally over a few months and pushed them to my clone at google code. Unfortunately as a noob I committed a whole bunch of changes on the default branch.
Now I would like to make sure my current default is EXACTLY as upstream and then I can do proper branching off default and only working on the branches..
However how do I do that cleanup though?
For reference my clone is http://code.google.com/r/mosabua-roboguice/source/browse
PS: I got my self into the same problem with git and got that cleaned up: Cleanup git master branch and move some commit to new branch?
First, there's nothing wrong with committing on the default branch. You generally don't want to create a separate named branch for every task in Mercurial, because named branches are forever. You might want to look at the bookmark feature for something closer to git branches ("hg help bookmarks"). So if the only thing wrong with your existing changesets is that they are on the default branch, then there really is nothing wrong with them. Don't worry about it.
However, if you really want to start afresh, the obvious, straightforward thing to do is reclone from upstream. You can keep your messy changesets by moving the existing repo and recloning. Then transplant the changesets from the old repo into the new one on a branch of your choosing.
If you don't want to spend the time/bandwidth for a new clone, you can use the (advanced, dangerous, not for beginners) strip command. First, you have to enable the mq extension (google it or see the manual -- I'm deliberately not explaining it here because it's dangerous). Then run
hg strip 'outgoing("http://upstream/path/to/repo")'
Note that I'm using the revsets feature added in Mercurial 1.7 here. If you're using an older version, there's no easy way to do this.
The best way to do this is with two clones. When working with a remote repo I don't control I always keep a local clone called 'virgin' to which I make no changes. For example:
hg clone -U https://code.google.com/r/mosabua-roboguice-clean/ mosabua-roboguice-clean-virgin
hg clone mosabua-roboguice-clean-virgin mosabua-roboguice-clean-working
Note that because Mercurial uses hard links for local clones and because that first clone was a clone with -U (no working directory (bare repo in git terms)) this takes up no additional disk space.
Work all you want in robo-guice working and pull in robo-guice virgin to see what's going on upstream, and pull again in roboguice-working to get upstream changes.
You can do something like this after the fact by creating a new clone of the remote repo and if diskspace is precious use the relink extension to associate them.
Preface - all history changes have sense only for non-published repos. You'll have to push to GoogleCode's repo from scratch after editing local history (delete repo on GC, create empty, push) - otherwise you'll gust get one more HEAD in default branch
Manfred
Easy (but not short) way - default only+MQ
as Greg mentioned, install MQ
move all your commits into MQ-patches on top of upstream code
leave your changes as pathes forever
check, edit if nesessary and re-integrate patches after each upstream pull (this way your own CG-repo without MQ-patches will become identical to upstream)
More complex - MQ in the middle + separate branches
above
above
create named branch, switch to it
"Finish" patches
Pull upstream, merge with your branch changes (from defaut to yourbranch)
Commit your changes only into yourbranch
Rebasing
Enable rebase extension
Create named branch (with changeset in it? TBT)
Rebase your changesets to the new ancestor, test results
See 5-6 from "More complex" chapter
Perhaps you could try the Convert extension. It can bring a repository in a better shape, while preserving history. Of course, after the modifications have been done, you will have to delete the old repo and upload the converted one.

subrepo, hg clone and symlinks

I'm quite new to mercurial, I've read a lot on this topic but I've been unable to find a clear answer.
The mercurial guide says: "For efficiency, hardlinks are used for cloning whenever the source and destination are on the same filesystem (note this applies only to the repository data, not to the working directory)."
The Repository wiki page says: "All of the files and directories that coexist with the .hg directory in the repository root are said to live in the working directory".
Now, to "link" a subrepo in a main repo I do:
hg init main
cd main
echo subrepo = ../subrepo > .hgsub
hg clone ../subrepo subrepo # (1)
hg add
hg ci -m "initial rev of the main repo"
Does the definition above mean that I'm actually creating a copy of subrepo when I perform (1)?? Or am I creating just a symlink to ../subrepo? According to the output of ls, it is an actual copy. But it sounds so strange to me... If someone could put a bit of light on this subject, I'd appreciate.
First of all, that part of Mercurial, I'm not an expert, but here's what I've understood.
No, you didn't create a link to the whole directory. Instead, files were hardlinked inside it.
This means that space on disk is reserved to keep your directory structure separate, but the files are all identical, because they were just cloned, so they are constructed as links back to the original.
When you start manipulating the repository, through your add or commit (ci) commands, then the hardlinks are broken by Mercurial and separate files are constructed for each, on demand.
Now, this is purely a technical thing, you don't need to know or care about this. If it makes it easier, just think of a clone as a complete copy of the original repository, separate files and all that. The hardlink part is just to save diskspace for the things that are the same.
Since a typical project has many files, and a typical changeset only changes a few files, and a typical reason to clone is that you're going to do a fixed set of changes, hardlinks makes sense since many of the files in the repository directories will be 100% identical to their original for the lifetime of the repository.
For those that aren't, all of that is silently handled by Mercurial for you.
Let us start by looking at what happens when you clone without talking about subrepositories. When you do
$ hg clone A B
then Mercurial will make hard links for the files inside A/.hg/store/data. So if a file called x is tracked, then after the clone you will see that
A/.hg/store/data/x.i
and
B/.hg/store/data/x.i
are hard linked -- this means that the two filenames really refer to the same file. As Lasse points out, this is smart since you might never commit a change to x clone, and so there is no reason to make two different x.i files for the A and B clones. Another advantage is that it is much faster to make a hard link than to copy a file, especially if x.i is very large: the hard link is a constant time operation.
In your example above you are adding a subrepository subrepo to the main repository. A subrepository consist of two things:
the subrepository itself. This what you creates when you do
$ hg clone ../subrepo
the subrepository meta data. This is what you store in the .hgsub file. You must tell Mercurial where you want the subrepository and where Mercurial can clone it from.
You ask if you copy or symlink the repository, and you certainly copied (cloned) it, as you have also confirmed with ls. Afterwards you added some meta data to Mercurial that tells it where it can expect to find the subrepository. This has nothing to do with a symbolic link in the normal filesystem sense, it is just some meta data for Mercurial.

Mercurial Remove History

Is there a way in mercurial to remove old changesets from a database? I have a repository that is 60GB and that makes it pretty painful to do a clone. I would like to trim off everything before a certain date and put the huge database away to collect dust.
There is no simple / recommended way of doing this directly to an existing repository.
You can however "convert" your mercurial repo to a new mercurial repo and choose a revision from where to include the history onwards via the convert.hg.startrev option
hg convert --config convert.hg.startrev=1234 <source-repository> <new-repository-name>
The new repo will contain everything from the original repo minus the history previous to the starting revision.
Caveat: The new repo will have completely new changeset IDs, i.e. it is in no way related to the original repo. After creating the new repo every developer has to clone the new repo and delete their clones from the original repo.
I used this to cleanup old repos used internally within our company - combined with the --filemap option to remove unwanted files too.
You can do it, but in doing so you invalidate all the clones out there, so it's generally not wise to do unless you're working entirely alone.
Every changeset in mercurial is uniquely identified by a hashcode, which is a combination of (among other things) the source code changes, metadata, and the hashes of its one or two parents. Those parents need to exist in the repo all the way back to the start of the project. (Not having that restriction would be having shallow-clones, which aren't available (yet)).
If you're okay with changing the hashes of the newer changesets (which again breaks all the clones out there in the wild) you can do so with the commands;
hg export -o 'changeset-%r.patch' 400:tip # changesets 400 through the end for example
cd /elsewhere
hg init newrepo
cd newrepo
hg import /path/to/the/patches/*.patch
You'll probably have to do a little work to handle merge changesets, but that's the general idea.
One could also do it using hg convert with type hg as both the source and the destination types, and using a splicemap, but that's probably more involved yet.
The larger question is, how do you type up 60GB of source code, or were you adding generated files against all advice. :)

Merging changes to a workspace with uncommitted changes

We've just recently switched over from SVN to Mercurial, but now we are running into problems with our workflow. Example:
I have my local clone of the repository which I work on. I'm making some highly experimental changes to our code base, something that I don't want to commit before I'm sure it works the way it is supposed to, I don't want to commit it even locally. Now, simultaneously, my co-worker has made some significant improvements/bug fixes which I need. He pushes his commits to our main repository. The question is, how can I merge his changes to my workspace without the requirement that I have to commit all my changes, since I need his changes to test my own code?
A more day-to-day problem we have with the exact same workflow is where we have a couple of configuration files which are in the repository. Each developer makes a couple of small environment specific changes to the configuration files, but do not commit the changes. These couple of uncommitted files hinders us from making any merges to our workspace, just like with the example above. Ideally, the configuration files probably shouldn't be in the repository, unfortunately, that's just how it has to be for here unnamed reasons.
If you don't want to clone, you can do it the following way.
hg diff > mylocalchanges.txt
hg revert -a
# Do your merge here, once you are done, import back your local mods
hg import --no-commit mylocalchanges.txt
There are two operations, as you've discovered, that makes changes from one person available to someone else (or many, on either side.)
There's pulling, which takes changes from some other clone of the repository and puts them into your clone.
There's pushing, which takes changes from your repository and puts them into another clone.
In your case, your coworker has pushed his changes into what I assume is your central master of the repository.
After he has done this, you can pull the latest changes down into your repository, and merge them into your branch. This will incorporate any bugfixes or changes your coworker did into your experimental code.
This gives you the freedom of staying current on other coworkers development in your project, and not having to release your experimental code until it is ready (or even at all.)
So, as long as you stay away from the Push command, you're safe.
Of course, this also assumes nobody is pulling directly from your clone of the repository, if they do that, then of course they will get your experimental changes, but it doesn't sound like you've set it up this way (and it is highly unlikely as well.)
As for the configuration files, the typical way to do this is that you only commit a master file template into the repository, with a different name (ie. an extra extension .template or similar), and then place the name of the real configuration file into the ignore filter.
Each developer then has to make his or her own copy of the template, rename it, and change it in any way they want, without the risk of committing database connection strings, passwords, or local paths, to the repository.
If necessary, provide a script that will help the developer make the real configuration file if it is long and complex.
Regarding your experimental changes, you should commit them. Often.
Simply you commit them in a clone you don't push. You only pull to merge whatever updates you need from other repos.
As for config files, don't commit them.
Commit template files, and script able to generate complete config files from the template.
That way, developers will only modify "private" (i.e. not committed) config files with their own private values.
If you know your uncommitted changes will not collide with the merge commit that you are creating - then you can do the following...
1) Shelve the uncommitted changes
2) Do the pull and merge
3) Unshelve the uncommitted changes
Shelf effectively stores your uncommitted changes away as into diff (relative to your last commit) then rolls back those files in your local workspace. Then un-shelving then applies that diff, bringing back your uncommitted changes.
Tools such as TortoiseHg have shelf built in.