Mercurial Remove History - mercurial

Is there a way in mercurial to remove old changesets from a database? I have a repository that is 60GB and that makes it pretty painful to do a clone. I would like to trim off everything before a certain date and put the huge database away to collect dust.

There is no simple / recommended way of doing this directly to an existing repository.
You can however "convert" your mercurial repo to a new mercurial repo and choose a revision from where to include the history onwards via the convert.hg.startrev option
hg convert --config convert.hg.startrev=1234 <source-repository> <new-repository-name>
The new repo will contain everything from the original repo minus the history previous to the starting revision.
Caveat: The new repo will have completely new changeset IDs, i.e. it is in no way related to the original repo. After creating the new repo every developer has to clone the new repo and delete their clones from the original repo.
I used this to cleanup old repos used internally within our company - combined with the --filemap option to remove unwanted files too.

You can do it, but in doing so you invalidate all the clones out there, so it's generally not wise to do unless you're working entirely alone.
Every changeset in mercurial is uniquely identified by a hashcode, which is a combination of (among other things) the source code changes, metadata, and the hashes of its one or two parents. Those parents need to exist in the repo all the way back to the start of the project. (Not having that restriction would be having shallow-clones, which aren't available (yet)).
If you're okay with changing the hashes of the newer changesets (which again breaks all the clones out there in the wild) you can do so with the commands;
hg export -o 'changeset-%r.patch' 400:tip # changesets 400 through the end for example
cd /elsewhere
hg init newrepo
cd newrepo
hg import /path/to/the/patches/*.patch
You'll probably have to do a little work to handle merge changesets, but that's the general idea.
One could also do it using hg convert with type hg as both the source and the destination types, and using a splicemap, but that's probably more involved yet.
The larger question is, how do you type up 60GB of source code, or were you adding generated files against all advice. :)

Related

What parts of the hg-git metadata state do I need to back up in order to start over?

This is currently a purely theoretical question (related to this one), but let me first give the background. Whenever you run hg gexport the initial hash will vary from invocation to invocation. This is similar to when you run git init or hg init. However, since the Mercurial and Git commits correspond to each other and build on previous hashes, there should be some way to start over from a minimal common initial state (or minimal state on the Git side, for example).
Suppose I have used hg-git in the past and now I am trying to sync again between my Mercurial and my Git states, but without (or very little of) the original .git directory from the hg gexport. What I do have, though, are the two metadata files: git-mapfile and git-tags.
There is an old Git mirror, which is sort of "behind" and the Mercurial repo which is up-to-date.
Then I configure the Mercurial repo for hg-git like so (.hg/hgrc):
[git]
intree = True
[extensions]
hgext.bookmarks=
topic=
hggit=
[paths]
default = ssh://username#hgserver.tld//project/repo
gitmirror = git+ssh://username#server.tld/project/repo.git
If I now do the naive hg pull gitmirror all I will gain is a duplication of every existing commit on an unrelated branch with unrelated commit history (and the double amount of heads, compared to prior to the pull).
It clearly makes no big difference to place those two metadata files (git-mapfile and git-tags) into .hg. The biggest difference is that the pull without these files will succeed (but duplicate everything) and the pull with them will error out at the first revision because of "abort: unknown revision ..." (even makes sense).
Question: which part(s) and how much (i.e. what's the minimum!) of the Git-side data/metadata created by hg gexport do I have to keep around in order to start over syncing with hg-git? (I was unable to find this covered in the documentation.)
The core metadata is stored in .hg/git-mapfile, and the actual Git repository is stored in .hg/git or .git dependending on intree. The git-mapfile is the only file needed to reproduce the full state; anything else is just cache. In order to recreate a repository from scratch, do the following:
Clone or initialise the Mercurial repository, somehow.
Clone or initialise the embedded Git repository, e.g. using git clone --bare git+ssh://username#server.tld/project/repo.git .hg/git.
Copy over the metadata from the original repository, and put it into .hg/git-mapfile.
Run hg git-cleanup to remove any commits from the map no longer known to Mercurial.
Pull from Git.
Push to Git.
These are the steps I'd use, off the top of my head. The three last steps are the most important. In particular, you must pull from Git to populate the repository prior to pushing; otherwise, the conversion will fail.

Mercurial: how do I create a new repository containing a subrange of revisions from an existing repo?

I have a repo with subrepos, with a long history. At some point the main subrepo became fully self-contained (doesn't depend on other sister subrepos).
I don't care anymore about the history of the whole thing before the main subrepo became self-contained. So I want to start a new repo that contains just what the subrepo has in it from that moment on. If possible, please describe in terms of TortoiseHg commands.
You probably want to make use of mercurial's convert extension. You can specify revisions to be converted, paths, branches and files to include or exclude in the newly-created repository.
hg convert from-path new-repo
Convert is a default extension which just needs activation.
You might be able to weed out any other changesets you don't need by using the hg strip command.
It will entirely remove changesets (and their ancestors) from a repository. Most likely you would want to make a fresh clone and then work on stripping it down.
One potential pitfall is that the final stripped repo would still have shared parentage with the original; therefore the potential exists for someone to accidentally pull down again changesets which were stripped.
The hg convert command (noted in another answer) does not have this downside.

Rename a local-only mercurial branch?

I find a lot of questions about renaming hg branches that have already been pushed/pulled by other people, but I've got a situation here where work has been done on a set of branhces (10 in total) that all have the "wrong" name - wrong in the sense that the repo (on bitbucket.org) has restrictions on the naming of branches.
Any developer can open a new branch named app-feature-xxxx (where xxxx can be anything), but a boatload of work has been done by a new dev, and none of the branches follow this naming pattern (the branches are effectively the xxxx part without the app-feature- prefix)
Currently these branches are known only on his machine - they've never been pushed to BitBucket.org, nor pulled by anyone else
Can they be renamed in-situ, before they're pushed? Right now hg is attempting to commit his history to BitBucket with these branch names and it's failing. If the branches can be renamed before that happens, everything should be golden.. And there aren't the usual "but what about everyone else's history?" problems, because only one person has these commits..
The easiest I've been able to come up with right now is to clone the repo again, just make one app-feature-lotsofupdates branch, and then keep switching working copy in the original repo, and use a diff tool to apply the code from the original repo (with the wrong names) to this newly cloned repo, committing after every diff/copy - effectively a manual merge of all the various features into one branch (that will then be merged into production)
You can do this using the convert extension to Mercurial, and the branchmap option.
The branchmap is a file that allows you to rename a branch when it is
being brought in from whatever external repository. When used in
conjunction with a splicemap, it allows for a powerful combination to
help fix even the most badly mismanaged repositories and turn them
into nicely structured Mercurial repositories. The branchmap contains
lines of the form
original_branch_name new_branch_name
original_branch_name is the name
of the branch in the source repository, and new_branch_name is the
name of the branch is the destination repository.
The command line to do this would be something like:
hg convert --branchmap branchmap.txt path\to\source\repo path\to\converted\repo
Documentation.

Restore history to split repository

Rightly or wrongly (almost certainly wrongly) I've ended up in the following position:
We used to have one big mercurial repository. We have now moved a large portion of this code base in to a separate repository. Looking back, it seems we could have done this in such a way that kept the file history, but naively we simply copied the files in to the new repository.
We are now in a position where the two repositories have been split, but I want to attach the file history of the moved files to those in the new repository. The files have been deleted in the original repository - but this history obviously still exists.
Is there anyway to "pre-pend" the history of the files in the original repository to those in the split repository?
The convert extension allows you to convert a Mercurial repository to another one while excluding and modifying filenames (using the --filemap option).
So use the convert command twice, one for each split, and use a corresponding filemap which only involves the files supposed to be in the particular target repository.
If you already did commits in the split repos you do not want to loose, use the convert approach anyway and afterwards migrate your new commits made in the splits w/o history to your new split repos made with convert:
Suppose your old splits with lost history are os1 and os2 and your new splits made with the convert command are ns1 and ns2. Then in ns1 you would do
hg pull --force os1 (force is required because you pull from an unrelated repo),
hg rebase -s <first-new-commit-made-in-os1> -d <last-converted-commit-in-ns1> (rebase new commits onto your split made with convert)
hg strip <first-commit-at-all-in-os1> (discard everything you pulled from os1 but not needed anymore)
Do it similarly with os2 and ns2
I don't know about attaching the file history after you've already done the split, but if you can do it all over again this is one option:
In your original repository, go back to the last revision (REV) before you deleted the files:
hg up -r REV
Go up one level from the original repository and make two new clones:
hg clone OriginalRepo Split1
hg clone OriginalRepo Split2
Go into Split1, delete the files you don't want in that repository, and commit. Do the same for Split2. These two are now your new split repositories, both with a complete history. Of course, the history for all deleted files (even those you initially didn't want in the repository) will exist equally in both repositories as well.
If you've done a lot of work in your new repositories after you made the split, you might be able to add these to your Split1 and Split2 by using export/import (check hg help import and hg help export). Depending on the number of changes you've made, there might be better ways to do this (e.g. mercurial queues), but this is what came to my mind first.

subrepo, hg clone and symlinks

I'm quite new to mercurial, I've read a lot on this topic but I've been unable to find a clear answer.
The mercurial guide says: "For efficiency, hardlinks are used for cloning whenever the source and destination are on the same filesystem (note this applies only to the repository data, not to the working directory)."
The Repository wiki page says: "All of the files and directories that coexist with the .hg directory in the repository root are said to live in the working directory".
Now, to "link" a subrepo in a main repo I do:
hg init main
cd main
echo subrepo = ../subrepo > .hgsub
hg clone ../subrepo subrepo # (1)
hg add
hg ci -m "initial rev of the main repo"
Does the definition above mean that I'm actually creating a copy of subrepo when I perform (1)?? Or am I creating just a symlink to ../subrepo? According to the output of ls, it is an actual copy. But it sounds so strange to me... If someone could put a bit of light on this subject, I'd appreciate.
First of all, that part of Mercurial, I'm not an expert, but here's what I've understood.
No, you didn't create a link to the whole directory. Instead, files were hardlinked inside it.
This means that space on disk is reserved to keep your directory structure separate, but the files are all identical, because they were just cloned, so they are constructed as links back to the original.
When you start manipulating the repository, through your add or commit (ci) commands, then the hardlinks are broken by Mercurial and separate files are constructed for each, on demand.
Now, this is purely a technical thing, you don't need to know or care about this. If it makes it easier, just think of a clone as a complete copy of the original repository, separate files and all that. The hardlink part is just to save diskspace for the things that are the same.
Since a typical project has many files, and a typical changeset only changes a few files, and a typical reason to clone is that you're going to do a fixed set of changes, hardlinks makes sense since many of the files in the repository directories will be 100% identical to their original for the lifetime of the repository.
For those that aren't, all of that is silently handled by Mercurial for you.
Let us start by looking at what happens when you clone without talking about subrepositories. When you do
$ hg clone A B
then Mercurial will make hard links for the files inside A/.hg/store/data. So if a file called x is tracked, then after the clone you will see that
A/.hg/store/data/x.i
and
B/.hg/store/data/x.i
are hard linked -- this means that the two filenames really refer to the same file. As Lasse points out, this is smart since you might never commit a change to x clone, and so there is no reason to make two different x.i files for the A and B clones. Another advantage is that it is much faster to make a hard link than to copy a file, especially if x.i is very large: the hard link is a constant time operation.
In your example above you are adding a subrepository subrepo to the main repository. A subrepository consist of two things:
the subrepository itself. This what you creates when you do
$ hg clone ../subrepo
the subrepository meta data. This is what you store in the .hgsub file. You must tell Mercurial where you want the subrepository and where Mercurial can clone it from.
You ask if you copy or symlink the repository, and you certainly copied (cloned) it, as you have also confirmed with ls. Afterwards you added some meta data to Mercurial that tells it where it can expect to find the subrepository. This has nothing to do with a symbolic link in the normal filesystem sense, it is just some meta data for Mercurial.