How to remove largefiles from Mercurial repo

See also this question.
Without knowing what I was doing, I enabled the largefiles extension, committed a file and pushed it to Kiln. Now I know the error of my ways, and I need to permanently revert this change.
I followed the guidance from SO on the subject, and I can remove largefiles locally, but this doesn't affect the remote repos in Kiln. I have tried opening the repo in KilnRepositories on the Kiln server and nuking the largefiles folder (as well as deleting 'largefiles' from the requires file), but after a few pushes/pulls the folder and the requires line come back.
Is there a way to make this permanent? (Setting requires to readonly doesn't work either).

Note: This is true at least of TortoiseHg 2.4 (Mercurial 2.2.1) through 2.7.1 (Mercurial 2.5.2) on Windows. I will not speak for future or older versions.
After looking through the various mercurial extensions available, I have concluded that it is generally not possible to convert a repository 'back' once a file has been committed using the largefiles extension.
First, the two rationales for why you do not want largefiles in play on your repos: one and two.
Once a file has been committed as a largefile, to remove it, all references to the '.hglf' path must be removed from the repo. A backout isn't sufficient, as its commit contents will reference the path of the file, including the '.hglf' folder. Once Mercurial sees this, it will write 'largefiles' back to the /.hg/requires file and the repo is once more largefile-locked. Likewise with hg forget and remove.
Option 1: If your repo is in isolation (you have end-to-end control of the repo in all of its local and remote locations AND no one else has branched from this repo), it may be possible to use the mq extension and strip the changeset. This is probably only a viable option if you have caught the mistake in time.
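For illustration, a minimal sketch of Option 1, assuming the offending largefile commit is revision 123 (number hypothetical) and mq is enabled:
hg strip 123    # removes revision 123 and all of its descendants from this clone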
Option 2: If the offending changeset (the largefile commit) exists on a commit phase that is draft (or that can be forced back into draft), then it may be possible to import the commit to mq and unapply the changeset using hg qpop. This is superior to stripping, because it preserves the commit history forward from the extracted changeset. In real life, this is frequently not possible, because you've likely already performed merges and pushed/pulled from public phase branches. However, if caught soon enough, mq can offer a way to salvage the repo.
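A rough sketch of Option 2, assuming the offending commit is draft revision 123 and everything after it is also still draft (numbers hypothetical):
hg qimport -r '123::tip'   # place the offending commit and its descendants under mq control
hg qpop -a                 # unapply all of the imported patches
hg qdelete 123.diff        # drop the offending patch (check hg qseries for the exact name qimport assigned)
hg qpush -a                # reapply the remaining patches
hg qfinish -a              # turn the applied patches back into regular changesets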
Option 3: If the offending changeset is referenced in one and only one place (the original commit), and no one has attempted to backout/remove/forget the changset (thus creating multiple references), it may be possible to use hg rebase, to fold the first child changeset after the offense with the parent changeset of the offense. In doing so, the offensive changeset becomes a new head which can then be stripped off with mq strip. This can work where attempts to import to mq have failed.
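A sketch of Option 3, assuming revision 122 is the parent of the offending commit 123 and revision 124 is its only child (numbers hypothetical; requires the rebase extension):
hg rebase -s 124 -d 122   # move the child (and its descendants) onto the offense's parent
hg strip 123              # the offense is now a childless head and can be stripped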
Option 4: If none of the above work, you can use transplant or graft, to export all of the non-offending changesets as patches (careful to export them in the correct sequence), then hg update to the first sane changeset before the offense, mq strip the repo of all forward changesets and then reapply your exported patches in sequence.
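A sketch of Option 4, shown here with hg export/import rather than transplant/graft, under the same assumptions (122 = last sane changeset, 123 = the offense; numbers hypothetical):
hg export -o 'keep-%r.patch' 124:tip   # export the non-offending changesets, in order
hg update 122
hg strip 123                           # removes the offense and everything after it
hg import keep-*.patch                 # reapply the exported patches in sequence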
Option 5: (What I ultimately did). Clone the repo locally, so that you have two copies: clone_to_keep, clone_to_destroy. In clone_to_keep, update to the first sane changeset before the offense. Mq strip all forward changesets. Merge back down if left with multiple heads. In clone_to_destroy, update to the tip. In windows explorer, copy everything in /clone_to_destroy except the .hg and .hglf folders to the /clone_to_keep folder. Inside Tortoise, commit all changes in clone_to_keep as a single changeset. Preserve one remote instance of clone_to_destroy in a readonly state for historical purposes, and destroy all others.
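A rough command-line sketch of Option 5 (revision numbers hypothetical; the actual file copy was done in Windows Explorer):
hg clone infected clone_to_keep
hg clone infected clone_to_destroy
cd clone_to_keep
hg update 122    # first sane changeset before the offense
hg strip 123     # strip the offense and all forward changesets; merge down if left with multiple heads
cd ../clone_to_destroy
hg update tip
# copy everything except the .hg and .hglf folders from clone_to_destroy into clone_to_keep,
# then commit the result in clone_to_keep as a single changeset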
Option 6: The nuclear option. If all else fails, and if you don't care about the integration of your repo with external systems (bug tracking, CI, etc), you can follow the aforementioned SO post and use the hg convert extension. This will create a new copy of the infected repo, removing all references to the offending changesets; however, it does so by iterating each changeset in the entire repo and committing it to the new repo as a NEW changeset. This creates a repo which is incompatible with any existing branch repos--none of the changeset ids will line up. Unless you have no branch repos, this option will probably never work.
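A minimal sketch of Option 6 (paths hypothetical; convert ships with Mercurial but must be enabled, and you will likely need the filtering described in the linked SO post, e.g. a --filemap that excludes the .hglf paths):
hg --config extensions.convert= convert infected-repo clean-repo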
In all cases, you then have to take your fix and manually reapply to each distinct repository instance (copy the repo folder, clone, whatever your preferred method).
In the end, it turns out that enabling largefiles is an extremely expensive mistake to make. It's time consuming and ultimately destructive to fix. I don't recommend ever allowing largefiles to make it into your repos.

Related

Mercurial: how do I create a new repository containing a subrange of revisions from an existing repo?

I have a repo with subrepos, with a long history. At some point the main subrepo became fully self-contained (doesn't depend on other sister subrepos).
I don't care anymore about the history of the whole thing before the main subrepo became self-contained. So I want to start a new repo that contains just what the subrepo has in it from that moment on. If possible, please describe in terms of TortoiseHg commands.
You probably want to make use of mercurial's convert extension. You can specify revisions to be converted, paths, branches and files to include or exclude in the newly-created repository.
hg convert from-path new-repo
Convert is a default extension which just needs activation.
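Activation is just a couple of lines in your ~/.hgrc (or mercurial.ini on Windows):
[extensions]
convert =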
You might be able to weed out any other changesets you don't need by using the hg strip command.
It will entirely remove changesets (and their descendants) from a repository. Most likely you would want to make a fresh clone and then work on stripping it down.
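For example, a sketch assuming revision 500 starts the line of history you no longer want (number hypothetical):
hg clone bigrepo trimmed
cd trimmed
hg strip 500    # removes revision 500 and all of its descendants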
One potential pitfall is that the final stripped repo still shares parentage with the original; therefore the potential exists for someone to accidentally pull the stripped changesets back down.
The hg convert command (noted in another answer) does not have this downside.

Mercurial : how to deal with rejected pull request?

At least : how to deal with it cleanly?
Currently, the only workflow I know is: delete your clone and re-fork from the main repository.
That's really suboptimal...
The other option is merging with main repo's tip, or backing out your changeset.
But if you do that, will the backout appear in subsequent pull requests?
If so, it's unpleasant to pollute the main repo with rejected changesets and their backouts...
What's the correct workflow?
It depends.
If your fork is public or you want to keep the rejects, something like this workflow is probably ideal (a command sketch follows the list):
Update to the tip and do hg ci --close-branch.
Pull from the main repository if necessary.
Update to the most recent changeset which belongs to the main repository, and do all work there.
If your repo is using the @ bookmark, move it to the new head.
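A sketch of those steps on the command line (revision number and URL hypothetical):
hg update 456                                   # the head containing the rejected changesets
hg commit --close-branch -m "close rejected changesets"
hg pull https://example.org/main/repo           # if necessary
hg update default                               # or the most recent changeset that belongs to the main repository
hg bookmark -f @                                # only if your repo uses the @ bookmark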
If your fork is not public, you can simply strip the changesets you no longer want. Look in your repository settings for the "strip changesets" option. You will also need to execute hg strip locally on each clone of the repo; the activity feed will provide the precise command to use.
If you happen to be using the experimental Evolve extension, you can hg prune unwanted changesets instead of the procedures described above. This will leave them around, but hide them from history and prevent them from being pushed or pulled (in most circumstances). This is (intended to be) a "safe" operation that you can do to shared changesets. If anyone pulls from your repository, the changesets will be automatically pruned in their repository as well. To undo, see the hg touch command.
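For example (revision range hypothetical; requires the evolve extension):
hg prune -r '456::457'    # mark the rejected changesets as obsolete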
NB: Bitbucket will not back up changesets when you strip them. Please be careful.

Mercurial: graft vs. record vs. qrecord vs. shelve vs. transplant vs. dirstate vs. queue

I am new to Mercurial and still somehow in the evaluation process, so these four concepts are kind of confusing for me. Some are mentioned to be an equivalent to Git's Staging/Index concept, or some even a better one than Git's Staging.
How do the four commands hg graft, hg record, hg qrecord and hg shelve (and hg transplant, but this is explained in Graft vs. Transplant already) compare to each other, and how do the concepts of queues and the dirstate fit in? In which use cases is one chosen over the other?
I know there are help pages for each one, but it still is difficult to figure out what each one does as VCS in general is a new topic for me.
The design of Mercurial simply does not include the concept of a staging area. That is, there is no intermediate state between local modification and commit.
Here is an overview of each of the concepts you mentioned:
hg graft is the equivalent of git cherry-pick. It copies a commit from one branch to another. A typical use case for this feature is to copy a bug fix from one release branch to another. This command replaces the older (and now obsolete) hg transplant extension.
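For example, to copy changeset 123 from another branch onto the branch you currently have checked out (number hypothetical):
hg graft 123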
hg record and hg qrecord are similar to git add --patch. They allow you to interactively select hunks for commit. So if you modified several different areas of one file, you could select which areas (i.e. hunks) you actually want to commit and which you want to leave as local modifications.
qrecord is only available if you have mq enabled. It commits to an mq patch rather than a standard commit.
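For example (record is a bundled extension; qrecord additionally needs mq; the patch name is hypothetical):
hg record -m "commit only the hunks I pick interactively"
hg qrecord my-changes.patch    # same interactive selection, but the result is an mq patch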
hg shelve is similar to git stash. It allows you to temporarily set aside local modifications to your files (or hunks of a file). These modifications can then be unshelved when you are ready for them.
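For example:
hg shelve      # set the current local modifications aside
hg unshelve    # bring them back later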
dirstate is an internal class of the Mercurial source code. It is not exposed to the user.
Mercurial Queues (also known as mq) are probably the closest you will get to a staging area in Mercurial. Here is a description from the Mercurial wiki:
Changes are maintained as patches which are committed into Mercurial. Commits can be removed or reordered, and the underlying patch can be refreshed based on changes made in the working directory. The patch directory can also be placed under revision control, so you can have a separate history of changes made to your patches.
mq is often used to polish/rework commits that you are testing locally, but have not pushed to a public location. Some people also use it to maintain a set of modifications to 3rd party code.
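A typical mq round-trip looks roughly like this (patch name hypothetical):
hg qnew my-feature.patch   # start a new patch from the current changes
hg qrefresh                # fold further working-directory changes into the patch
hg qpop                    # unapply the patch
hg qpush                   # reapply it
hg qfinish -a              # turn all applied patches into regular changesets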

Mercurial clone cleanup to match upstream

I have a hg clone of a repository in which I have made numerous changes locally over a few months and pushed them to my clone at Google Code. Unfortunately, as a noob, I committed a whole bunch of changes on the default branch.
Now I would like to make sure my current default is EXACTLY the same as upstream, so that I can do proper branching off default and only work on the branches.
How do I do that cleanup, though?
For reference my clone is http://code.google.com/r/mosabua-roboguice/source/browse
PS: I got myself into the same problem with git and got that cleaned up: Cleanup git master branch and move some commit to new branch?
First, there's nothing wrong with committing on the default branch. You generally don't want to create a separate named branch for every task in Mercurial, because named branches are forever. You might want to look at the bookmark feature for something closer to git branches ("hg help bookmarks"). So if the only thing wrong with your existing changesets is that they are on the default branch, then there really is nothing wrong with them. Don't worry about it.
However, if you really want to start afresh, the obvious, straightforward thing to do is reclone from upstream. You can keep your messy changesets by moving the existing repo and recloning. Then transplant the changesets from the old repo into the new one on a branch of your choosing.
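A rough sketch of that approach (paths, URL and revision range are all hypothetical; transplant is an extension you must enable):
mv myrepo myrepo-messy
hg clone https://example.org/upstream/repo myrepo
cd myrepo
hg branch mywork                            # or use a bookmark, as suggested above
hg transplant -s ../myrepo-messy 100:tip    # your own changesets from the old clone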
If you don't want to spend the time/bandwidth for a new clone, you can use the (advanced, dangerous, not for beginners) strip command. First, you have to enable the mq extension (google it or see the manual -- I'm deliberately not explaining it here because it's dangerous). Then run
hg strip 'outgoing("http://upstream/path/to/repo")'
Note that I'm using the revsets feature added in Mercurial 1.7 here. If you're using an older version, there's no easy way to do this.
The best way to do this is with two clones. When working with a remote repo I don't control I always keep a local clone called 'virgin' to which I make no changes. For example:
hg clone -U https://code.google.com/r/mosabua-roboguice-clean/ mosabua-roboguice-clean-virgin
hg clone mosabua-roboguice-clean-virgin mosabua-roboguice-clean-working
Note that because Mercurial uses hard links for local clones, and because that first clone was made with -U (no working directory; a bare repo in git terms), this takes up no additional disk space.
Work all you want in the roboguice working clone, pull in the roboguice virgin clone to see what's going on upstream, and pull again in the working clone to get the upstream changes.
You can do something like this after the fact by creating a new clone of the remote repo, and if disk space is precious, use the relink extension to associate them.
Preface: all history editing makes sense only for non-published repos. You'll have to push to the Google Code repo from scratch after editing local history (delete the repo on Google Code, create an empty one, push) - otherwise you'll just get one more head in the default branch.
Manfred
Easy (but not short) way - default only+MQ
As Greg mentioned, install MQ
Move all your commits into MQ patches on top of the upstream code
Leave your changes as patches forever
Check, edit if necessary and re-integrate the patches after each upstream pull (this way your own Google Code repo without the MQ patches will become identical to upstream; see the command sketch after this list)
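A sketch of that cycle (URL hypothetical; requires mq, and assumes your own commits sit on top of upstream):
hg qimport -r 'outgoing("https://example.org/upstream/repo")'   # turn your own commits into mq patches
hg qpop -a                                                      # unapply them
hg pull -u                                                      # bring default up to date with upstream
hg qpush -a                                                     # reapply your patches on top of the new tip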
More complex - MQ in the middle + separate branches
As above (install MQ)
As above (move your commits into MQ patches)
Create a named branch and switch to it
"Finish" the patches (hg qfinish)
Pull upstream and merge (from default into yourbranch)
Commit your changes only into yourbranch (a sketch of steps 5-6 follows)
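A sketch of steps 5-6 (branch name hypothetical):
hg pull                    # get new upstream changesets on default
hg update yourbranch
hg merge default           # merge upstream into your branch
hg commit -m "merge upstream default into yourbranch"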
Rebasing
Enable the rebase extension
Create a named branch (with a changeset in it? TBT)
Rebase your changesets onto the new ancestor and test the results
See steps 5-6 from the "More complex" section above (a rebase sketch follows)
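A sketch of the rebase step (revision number and branch name hypothetical; requires the rebase extension):
hg rebase -s 100 -d yourbranch   # move your own changesets, starting at rev 100, onto the new branch head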
Perhaps you could try the Convert extension. It can get a repository into better shape while preserving history. Of course, after the modifications have been done, you will have to delete the old repo and upload the converted one.

Mercurial Remove History

Is there a way in Mercurial to remove old changesets from a repository? I have a repository that is 60GB, and that makes it pretty painful to clone. I would like to trim off everything before a certain date and put the huge old repository away to collect dust.
There is no simple / recommended way of doing this directly to an existing repository.
You can, however, "convert" your Mercurial repo to a new Mercurial repo and choose the revision from which history is kept onwards via the convert.hg.startrev option:
hg convert --config convert.hg.startrev=1234 <source-repository> <new-repository-name>
The new repo will contain everything from the original repo minus the history previous to the starting revision.
Caveat: The new repo will have completely new changeset IDs, i.e. it is in no way related to the original repo. After creating the new repo, every developer has to clone it and delete their clones of the original repo.
I used this to clean up old repos used internally within our company - combined with the --filemap option to remove unwanted files too.
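For example (file names and excluded path hypothetical):
hg convert --config convert.hg.startrev=1234 --filemap filemap.txt old-repo new-repo
where filemap.txt contains lines such as "exclude generated-assets" to drop unwanted paths from the converted history.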
You can do it, but in doing so you invalidate all the clones out there, so it's generally not wise to do unless you're working entirely alone.
Every changeset in Mercurial is uniquely identified by a hash, which is a combination of (among other things) the source code changes, metadata, and the hashes of its one or two parents. Those parents need to exist in the repo all the way back to the start of the project. (Removing that restriction would amount to shallow clones, which aren't available (yet).)
If you're okay with changing the hashes of the newer changesets (which again breaks all the clones out there in the wild), you can do so with these commands:
hg export -o 'changeset-%r.patch' 400:tip # changesets 400 through the end for example
cd /elsewhere
hg init newrepo
cd newrepo
hg import /path/to/the/patches/*.patch
You'll probably have to do a little work to handle merge changesets, but that's the general idea.
One could also do it using hg convert with type hg as both the source and the destination types, and using a splicemap, but that's probably more involved yet.
The larger question is: how do you type up 60GB of source code? Or were you adding generated files, against all advice? :)