Merging Mercurial repos after lfconvert

I want to use the largefiles extension to track binary data in my repos.
The documentation states that you have to convert your "normal" repo to a "largefiles" repo.
Like so:
hg lfconvert normal_repo largefiles_repo
Is it possible to merge two repos after they have been converted to largefiles repos, even if the two repos were not converted at the same "time"?
Or do all repos have to share a common "lfconvert event" to be able to merge between them?

It looks like lfconvert rewrites repository history from the point where the first large file appears. You could theoretically merge the resulting repo into your old repo, but then the (versions of the) large files you had in your old repository will also be in the merged repository, and you will end up with a repository sporting two roots.
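For reference, a minimal conversion sketch (the size threshold below is only an illustrative value; the largefiles extension must be enabled, here done one-off via --config):
hg --config extensions.largefiles= lfconvert --size 10 normal_repo largefiles_repo   # files over 10 MB become largefiles
Because the conversion rewrites changeset hashes, two repos converted independently do not share those rewritten changesets, which is why merging them leaves you with two roots.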

Mercurial: how do I create a new repository containing a subrange of revisions from an existing repo?

I have a repo with subrepos, with a long history. At some point the main subrepo became fully self-contained (doesn't depend on other sister subrepos).
I don't care anymore about the history of the whole thing before the main subrepo became self-contained. So I want to start a new repo that contains just what the subrepo has in it from that moment on. If possible, please describe in terms of TortoiseHg commands.
You probably want to make use of Mercurial's convert extension. You can specify which revisions to convert, as well as paths, branches, and files to include or exclude in the newly created repository.
hg convert from-path new-repo
Convert ships with Mercurial by default and just needs to be enabled.
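For example, a possible invocation (the start revision and filemap contents are placeholders; convert is enabled one-off via --config here):
hg --config extensions.convert= convert --config convert.hg.startrev=1500 --filemap keep.txt from-path new-repo
# keep.txt could contain, for instance:
#   include main-subrepo
#   rename main-subrepo .
The rename line makes the subrepo's directory the root of the new repository.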
You might be able to weed out any other changesets you don't need by using the hg strip command.
It will entirely remove changesets (and their descendants) from a repository. Most likely you would want to make a fresh clone and then work on stripping it down.
One potential pitfall is that the final stripped repo still shares parentage with the original, so someone could accidentally pull the stripped changesets back down.
The hg convert command (noted in another answer) does not have this downside.
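A minimal sketch of the clone-and-strip route (REV is a placeholder for the root of the unwanted line of work; strip requires the mq/strip extension):
hg clone original-repo stripped-repo
cd stripped-repo
hg strip -r REV   # removes REV and all of its descendants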

How to graft from another repository?

I have two repositories with two different Mercurial named branches, say V1 and V2. The branches have been diverging for about one to two years. I'd like to graft some changesets from one repo into the other, without pulling the changes.
I don't want to pull the changes for multiple reasons.
I don't want to confront developers with the history of multiple branches, because there will be enough local branches to care about.
I want single-branch central repos, and developers could accidentally push the second branch. The central repos would interact with SVN and should have only one branch per repo. I know I could use central hooks to prevent such a push, but I don't want questions like "why can't I push?" or "how can I do that?".
The size of the repo would grow to multiple gigabytes (about 700 MB before the pull). As I understand it, that's because of deficiencies in the current Mercurial storage format.
I know the transplant extension can do the job. I tried it, but I can't force other developers to handle rejects instead of simply using a merge tool. Is there another way?
In fact there are more than two repos, each with its own branch, but for simplicity's sake two should be enough for the example.
You might be able to do the work in an intermediate repo:
1. Pull in the changes.
2. Do whatever grafting/rebasing/transplanting you need.
3. Strip out the things you pulled in step 1.
Or, if that doesn't work, pull only the changes from the branch you want into the actual repo. Either way you'd end up with a repository that includes your desired changesets, but not all the history from the unwanted branch.
Follow-up to #DanMan:
1. Pull the needed branch into an intermediate repo (a clone).
2. Strip the unwanted changesets in that clone.
3. hg pull CLONE in the real target.
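One way that whole sequence could look, assuming a branch named V2 and using graft for the cherry-picking (all repo names, branch names, and revisions are placeholders):
hg clone real-target intermediate
cd intermediate
hg pull --force -b V2 ../other-repo        # --force allows pulling from an unrelated repo
hg graft -r CSET1 -r CSET2                 # copy the wanted changesets onto the current branch
hg strip -r "first(branch(V2))"            # remove everything that came in with the pulled branch
cd ../real-target
hg pull ../intermediate                    # brings over the grafted copies, not the stripped V2 history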
Write a tool, an hg extension, or extend the graft command so that it can graft from a second repository, similar to the transplant extension.
Yes, the implementation is not as easy as adding the second-repo functionality to the transplant command was: the transplant extension simply uses a patch from another repo instead of one from its own repo. But I think there is also no technical reason for graft not to do the same.
If I understand right, grafting a single file change is no more than calling the merge tool with these files:
(base) the parent of the to-graft changeset
(my changes) the target revision on which to graft
(theirs) the to-graft revision
So in order to graft a file change from another repo, you need the whole file from that repo as it was before the to-graft changeset was applied (base) and after it was applied (theirs). Technically that should be no problem.
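A rough sketch of that three-way merge for a single file, done by hand with hg cat and kdiff3 standing in for whatever merge tool is configured (paths and revisions are placeholders):
hg -R ../other-repo cat -r "p1(GRAFT_REV)" path/to/file > file.base   # the parent of the to-graft changeset
hg -R ../other-repo cat -r GRAFT_REV path/to/file > file.theirs       # the to-graft revision
hg update -r TARGET_REV                                               # the revision on which to graft
kdiff3 file.base path/to/file file.theirs -o path/to/file             # base, my changes, theirs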
Additionally, the implementation would need to:
determine which files are affected by the changeset to graft
handle deletions correctly
handle file renames (not sure how complicated that is)
All of that should be possible; I see no real technical problems.

Restore history to split repository

Rightly or wrongly (almost certainly wrongly) I've ended up in the following position:
We used to have one big Mercurial repository. We have now moved a large portion of this code base into a separate repository. Looking back, it seems we could have done this in a way that kept the file history, but naively we simply copied the files into the new repository.
We are now in a position where the two repositories have been split, but I want to attach the file history of the moved files to those in the new repository. The files have been deleted in the original repository - but this history obviously still exists.
Is there any way to "prepend" the history of the files in the original repository to those in the split repository?
The convert extension allows you to convert a Mercurial repository to another one while excluding and modifying filenames (using the --filemap option).
So use the convert command twice, once for each split, each time with a filemap that includes only the files that belong in that particular target repository.
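A hedged sketch of the double conversion (repository names and filemap contents are placeholders):
hg --config extensions.convert= convert --filemap filemap-split1.txt big-repo split1
hg --config extensions.convert= convert --filemap filemap-split2.txt big-repo split2
# filemap-split1.txt, for example:
#   include lib
#   exclude lib/experimental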
If you have already made commits in the split repos that you do not want to lose, use the convert approach anyway, and afterwards migrate the new commits made in the history-less splits to the new split repos made with convert:
Suppose your old splits with lost history are os1 and os2, and your new splits made with the convert command are ns1 and ns2. Then in ns1 you would do:
hg pull --force os1   # --force is required because you are pulling from an unrelated repo
hg rebase -s <first-new-commit-made-in-os1> -d <last-converted-commit-in-ns1>   # rebase the new commits onto your split made with convert
hg strip <first-commit-at-all-in-os1>   # discard everything you pulled from os1 that is no longer needed
Do it similarly with os2 and ns2
I don't know about attaching the file history after you've already done the split, but if you can do it all over again this is one option:
In your original repository, go back to the last revision (REV) before you deleted the files:
hg up -r REV
Go up one level from the original repository and make two new clones:
hg clone OriginalRepo Split1
hg clone OriginalRepo Split2
Go into Split1, delete the files you don't want in that repository, and commit. Do the same for Split2. These two are now your new split repositories, both with a complete history. Of course, the history for all deleted files (even those you initially didn't want in the repository) will exist equally in both repositories as well.
If you've done a lot of work in your new repositories after you made the split, you might be able to add these to your Split1 and Split2 by using export/import (check hg help import and hg help export). Depending on the number of changes you've made, there might be better ways to do this (e.g. mercurial queues), but this is what came to my mind first.
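A possible sketch of that export/import step, run from inside the old history-less split repo (the revision range and paths are placeholders):
mkdir ../patches
hg export -o ../patches/%n.patch -r FIRST_NEW_REV:tip   # export the commits made after the naive split
cd ../Split1
hg import ../patches/*.patch                            # replay them on top of the clone that kept its history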

How to remove largefiles from Mercurial repo

See also this question.
Without knowing what I was doing, I enabled the largefiles extension, committed a file, and pushed it to Kiln. Now I know the error of my ways, and I need to permanently revert this change.
I followed the guidance from SO on the subject, and I can remove largefiles locally, but this doesn't affect the remote repos in Kiln. I have tried opening the repo in KilnRepositories on the Kiln server and nuking the largefiles folder (as well as deleting 'largefiles' from the requires file), but after a few pushes/pulls the folder and the requires line come back.
Is there a way to make this permanent? (Setting requires to readonly doesn't work either).
Note: This is true at least of TortoiseHg 2.4 (Mercurial 2.2.1) through 2.7.1 (Mercurial 2.5.2) on Windows. I will not speak for future or older versions.
After looking through the various mercurial extensions available, I have concluded that it is generally not possible to convert a repository 'back' once a file has been committed using the largefiles extension.
First, the two rationales for why you do not want largefiles in play on your repos: one and two.
Once a file has been committed as a largefile, to remove it, all references to the '.hglf' path must be removed from the repo. A backout isn't sufficient, as its commit contents will reference the path of the file, including the '.hglf' folder. Once Mercurial sees this, it will write 'largefiles' back to the .hg/requires file and the repo is once more largefile-locked. Likewise with hg forget and hg remove.
Option 1: If your repo is in isolation (you have end-to-end control of the repo in all of its local and remote locations AND no one else has branched from this repo), it may be possible to use the mq extension and strip the changeset. This is probably only a viable option if you have caught the mistake in time.
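A minimal sketch of option 1, assuming the offending commit is BADREV and the mq/strip extension is enabled:
hg strip -r BADREV   # removes BADREV and all of its descendants, so only viable if caught early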
Option 2: If the offending changeset (the largefile commit) exists on a commit phase that is draft (or that can be forced back into draft), then it may be possible to import the commit to mq and unapply the changeset using hg qpop. This is superior to stripping, because it preserves the commit history forward from the extracted changeset. In real life, this is frequently not possible, because you've likely already performed merges and pushed/pulled from public phase branches. However, if caught soon enough, mq can offer a way to salvage the repo.
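A heavily hedged sketch of option 2 (BADREV is a placeholder; this assumes the involved changesets are draft and linear, and reapplying may still conflict):
hg phase --force --draft -r "BADREV:tip"   # force the changesets back into draft if needed
hg qimport -r "BADREV:tip"                 # turn the offending commit and its descendants into applied mq patches
hg qpop -a                                 # unapply them all
hg qdelete OFFENDING_PATCH                 # drop the offending patch (hg qseries shows its name)
hg qpush -a                                # reapply the remaining patches
hg qfinish -a                              # turn the reapplied patches back into regular changesets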
Option 3: If the offending changeset is referenced in one and only one place (the original commit), and no one has attempted to backout/remove/forget the changeset (thus creating multiple references), it may be possible to use hg rebase to fold the first child changeset after the offense onto the parent changeset of the offense. In doing so, the offensive changeset becomes a new head which can then be stripped off with mq strip. This can work where attempts to import to mq have failed.
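One hedged way to express option 3 (CHILD, PARENT, and BADREV are placeholders for the first child of the offending commit, the offending commit's parent, and the offending commit itself):
hg rebase -s CHILD -d PARENT   # move the descendants off the offending commit
hg strip -r BADREV             # the offending commit is now a childless head and can be stripped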
Option 4: If none of the above work, you can use transplant or graft, to export all of the non-offending changesets as patches (careful to export them in the correct sequence), then hg update to the first sane changeset before the offense, mq strip the repo of all forward changesets and then reapply your exported patches in sequence.
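A rough sketch of option 4 (revisions and paths are placeholders; merge changesets may need to be recreated by hand):
mkdir ../rescue
hg export -o ../rescue/%n.patch -r "descendants(BADREV) - BADREV"   # every later changeset except the offending one
hg update -r LAST_SANE_REV                                          # the last changeset before the offense
hg strip -r BADREV                                                  # drop the offending commit and everything after it
hg import ../rescue/*.patch                                         # replay the good changesets in sequence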
Option 5: (What I ultimately did). Clone the repo locally, so that you have two copies: clone_to_keep and clone_to_destroy. In clone_to_keep, update to the first sane changeset before the offense. Mq strip all forward changesets. Merge back down if left with multiple heads. In clone_to_destroy, update to the tip. In Windows Explorer, copy everything in /clone_to_destroy except the .hg and .hglf folders to the /clone_to_keep folder. Inside TortoiseHg, commit all changes in clone_to_keep as a single changeset. Preserve one remote instance of clone_to_destroy in a read-only state for historical purposes, and destroy all others.
Option 6: The nuclear option. If all else fails, and if you don't care about the integration of your repo with external systems (bug tracking, CI, etc), you can follow the aforementioned SO post and use the hg convert extension. This will create a new copy of the infected repo, removing all references to the offending changesets; however, it does so by iterating each changeset in the entire repo and committing it to the new repo as a NEW changeset. This creates a repo which is incompatible with any existing branch repos--none of the changeset ids will line up. Unless you have no branch repos, this option will probably never work.
In all cases, you then have to take your fix and manually reapply it to each distinct repository instance (copy the repo folder, clone, whatever your preferred method).
In the end, it turns out that enabling largefiles is an extremely expensive mistake to make. It's time consuming and ultimately destructive to fix. I don't recommend ever allowing largefiles to make it into your repos.

Mercurial Remove History

Is there a way in mercurial to remove old changesets from a database? I have a repository that is 60GB and that makes it pretty painful to do a clone. I would like to trim off everything before a certain date and put the huge database away to collect dust.
There is no simple / recommended way of doing this directly to an existing repository.
You can, however, "convert" your Mercurial repo to a new Mercurial repo and choose the revision from which onwards history is included, via the convert.hg.startrev option:
hg convert --config convert.hg.startrev=1234 <source-repository> <new-repository-name>
The new repo will contain everything from the original repo minus the history previous to the starting revision.
Caveat: The new repo will have completely new changeset IDs, i.e. it is in no way related to the original repo. After creating the new repo, every developer has to clone the new repo and delete their clones of the original repo.
I used this to cleanup old repos used internally within our company - combined with the --filemap option to remove unwanted files too.
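For example, a possible combination (the revision number and filemap contents are placeholders):
hg convert --config convert.hg.startrev=1234 --filemap prune.txt old-repo trimmed-repo
# prune.txt:
#   exclude generated
#   exclude assets/huge-dataset.bin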
You can do it, but in doing so you invalidate all the clones out there, so it's generally not wise to do unless you're working entirely alone.
Every changeset in Mercurial is uniquely identified by a hash, which is a combination of (among other things) the source code changes, metadata, and the hashes of its one or two parents. Those parents need to exist in the repo all the way back to the start of the project. (Not having that restriction would amount to shallow clones, which aren't available (yet).)
If you're okay with changing the hashes of the newer changesets (which, again, breaks all the clones out there in the wild), you can do so with these commands:
hg export -o 'changeset-%r.patch' 400:tip # changesets 400 through the end for example
cd /elsewhere
hg init newrepo
cd newrepo
hg import /path/to/the/patches/*.patch
You'll probably have to do a little work to handle merge changesets, but that's the general idea.
One could also do it using hg convert with type hg as both the source and the destination types, and using a splicemap, but that's probably more involved yet.
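For reference, a splicemap is just a text file whose lines give a changeset (identified by its full hash) the parent or parents it should have in the converted repo, so parentage can be rewritten during an hg-to-hg conversion (the hashes below are placeholders):
hg convert --splicemap splice.txt old-repo trimmed-repo
# splice.txt, one entry per line:
#   <40-char-hash-of-changeset> <40-char-hash-of-new-parent> [<40-char-hash-of-second-parent>]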
The larger question is: how did you type up 60GB of source code, or were you adding generated files against all advice? :)