Fetching a single file from another mercurial repository [duplicate] - mercurial

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Mercurial: copying ONE file and its history to another repository
I have several repositories in my local machine.
One is my main code, another is an assortment of useful code/tools.
These are two fundamentally different repos. It might make sense to setup a new repo and pull these two in as sub-repos, but I want to wait until Mercurial devs mark sub-repos as non-experimental before I do that.
One of the useful code files has become so useful, I want to put it into my main code area...but I want to maintain its history. This will, of course, result in some variant of a fork, but that's acceptable. (best case would be being able to push-pull it back and forth and keep updating its history).

I'd just use the subrepo feature that came online in 1.3. It might change slightly, but you won't be left high and dry backwards compatibility wise.
If you can't bring yourself to so, then what you need to do is:
use hg convert with a filemap that deletes all files except the one you want and convert from the repo with the single useful file to a new repo containing only that file and all its history
then hg pull from the new single-file-full-history repo into the target repo
hg merge in the target repo and you'll have that file with all it's history
The other option would be to hg export the entire tools repo, use grepdiff (part of difftools) to limit to only one file, and then import into the target repo, but that's crazy.

The short answer is you can't copy a file and its history simply, as stated in this thread:
https://www.mercurial-scm.org/pipermail/mercurial/2009-April/025105.html
I'm relatively new to DVCS and you really have to think of each repo as a self contained package. Not like svn or p4, where you can hang different projects off the root and configure it how you like and do partial repo checkouts. (That said, I really like the flexibility of being able to clone repos quickly to try things out. And being able to commit on a local machine.)
I'm just looking at a similar problem. I've branched a repo to make changes and I only want one file out of one changeset. And it is nice to have the history.
You could look at:
hg cat
This would probably involve writing a script to transfer history, i.e. commit N changesets in the target repo with the hg cat results from the source. Wonder if there is an extension to do this?
You could get the log of the file you want to copy and paste that into a commit comment. It's not in the metadata, but you do have a record and all the hashes etc.

may be
hg export
also can help you.

Related

Mercurial Repo Living Archive

We have an Hg repo that is over 6GB and 150,000 changesets. It has 8 years of history on a large application. We have used a branching strategy over the last 8 years. In this approach, we create a new branch for a feature and when finished, close the branch and merge it to default/trunk. We don't prune branches after changes are pushed into default.
As our repo grows, it is getting more painful to work with. We love having the full history on each file and don't want to lose that, but we want to make our repo size much smaller.
One approach I've been looking into would be to have two separate repos, a 'Working' repo and an 'Archive' repo. The Working repo would contain the last 1 to 2 years of history and would be the repo developers cloned and pushed/pulled from on a daily basis. The Archive repo would contain the full history, including the new changesets pushed into the working repo.
I cannot find the right Hg commands to enable this. I was able to create a Working repo using hg convert <src> <dest> --config convert.hg.startref=<rev>. However, Mecurial sees this as a completely different repo, breaking any association between our Working and Archive repos. I'm unable to find a way to merge/splice changesets pushed to the Working repo into the Archive repo and maintain a unified file history. I tried hg transplant -s <src>, but that resulted in several 'skipping emptied changeset' messages. It's not clear to my why the hg transplant command felt those changeset were empty. Also, if I were to get this working, does anyone know if it maintains a file's history, or is my repo going to see the transplanted portion as separate, maybe showing up as a delete/create or something?
Anyone have a solution to either enable this Working/Archive approach or have a different approach that may work for us? It is critical that we maintain full file history, to make historical research simple.
Thanks
You might be hitting a known bug with the underlying storage compression. 6GB for 150,000 revision is a lot.
This storage issue is usually encountered on very branchy repositories, on an internal data structure storing the content of each revision. The current fix for this bug can reduce repository size up to ten folds.
Possible Quick Fix
You can blindly try to apply the current fix for the issue and see if it shrinks your repository.
upgrade to Mercurial 4.7,
add the following to your repository configuration:
[format]
sparse-revlog = yes
run hg debugupgraderepo --optimize redeltaall --run (this will take a while)
Some other improvements are also turned on by default in 4.7. So upgrade to 4.7 and running the debugupgraderepo should help in all cases.
Finer Diagnostic
Can you tell us what is the size of the .hg/store/00manifest.d file compared to the full size of .hg/store ?
In addition, can you provide use with the output of hg debugrevlog -m
Other reason ?
Another reason for repository size to grow is for large (usually binary file) to be committed in it. Do you have any them ?
The problem is that the hash id for each revision is calculated based on a number of items including the parent id. So when you change the parent you change the id.
As far as I'm aware there is no nice way to do this, but I have done something similar with several of my repos. The bad news is that it required a chain of repos, batch files and splice maps to get it done.
The bulk of the work I'm describing is ideally done one time only and then you just run the same scripts against the same existing repos every time you want to update it to pull in the latest commits.
The way I would do it is to have three repos:
Working
Merge
Archive
The first commit of Working is a squash of all the original commits in Archive, so you'll be throwing that commit away when you pull your Working code into the Archive, and reparenting the second Working commit onto the old tip of Archive.
STOP: If you're going to do this, back up your existing repos, especially the Archive repo before trying it, it might get trashed if you run this over the top of it. It might also be fine, but I'm not having any problems on my conscience!
Pull both Working and Archive into the Merge repo.
You now have a Merge repo with two completely independent trees in it.
Create a splicemap. This is just a text file giving the hash of a child node and the hash of its proposed parent node, separated by a space.
So your splicemap would just be something like:
hash-of-working-commit-2 hash-of-archive-old-tip
Then run hg convert with the splicemap option to do the reparenting of the second commit of Working onto the old tip of the Archive. E.g.
hg convert --splicemap splicemapPath.txt --config convert.hg.saverev=true Merge Archive
You might want to try writing it to a different named repo rather than Archive the first time, or you could try writing it over a copy of the existing Archive, I'm not sure if it'll work but if it does it would probably be quicker.
Once you've run this setup once, you can just run the same scripts over the existing repos again and again to update with the latest Working revisions. Just pull from Working to Merge and then run the hg convert to put it into Archive.

How to completely change an hg repo's structure

I have an hg (Mercurial) repo located at, say:
http://myhg:5000/projects/fizzbuzz
This fizzbuz directory has the following basic structure:
fizzbuzz/
src/
... thousands of source files
docs/
... lots of docs
tests/
... lots of tests
I am now completely re-engineering the fizzbuzz app. The new app's project structure will be completely different (from the top down) than the existing one:
fizzbuzz/
herps/
foo/
... thousands of foos
bar/
... thousands of bars
derps/
... lots of derps
It's essentially a brand new app. I guess one solution would be to delete the fizzbuzz repo and then create a new one and add my code to the new one. But I was wondering if there's a way to basically tell hg to erase everything in a repo (but not delete the repo), and then add in the new, re-engineered, content. Or some other way to elegantly swap out the new code base for the old. Ideas? Thanks in advance!
Sure, you can wipe a repository by deleting everything and commiting all the changes (all the deletions). If you ever need to restore or review the old code it will still be available by checking out the revisions prior to the deletion. Unless your repository is very large on disk this is probably the way to go--alternatively you can start a new repository for the new version and leave the current one as-is.
In either case deleting all the code and its history is typically unwise.
Take a look at the Convert extension. While it is, by design, not possible to change the history of your current repository, you can, via hg convert, construct a new repository based on the history of an existing repository. This is very useful for scenarios like you describe, where you need to refactor the file-structure to such a degree that the old history is no longer useful.
That said, consider just making the changes directly in your current repository. What actual benefit do you get by rewriting history? Make the changes now, and Mercurial will continue doing it's job of tracking where you came from.

Creating Mercurial subrepositories, while maintaining the history

I am about to make some major changes to my Mercurial repositories. As I am going to be using a Feature of Last Resort, I am looking for some advice and reassurance that I am not doing something stupid.
Where I Am:
I have a Mercurial repository with a complete history of all of these files:
/source
/secret_subsystem
/unclassified_subsystem
/common_files
Source is the Mercurial repository.
The secret subsystem folder contains code which is intellectual property we want to keep in-house.
The unclassified subsystem folder contains code which we want to outsource to a third-party to maintain.
The common files folder contains code that both subsystems depend on. We will be keeping ownership, but we want to share it with the third-party.
Obviously, I can't just push out my whole repository to the third-party company. The third-party would see too much.
Where I Want To Be:
Having read up on subrepositories, this is where I think I need to be:
Have THREE subrepositories: secret_subsystem, unclassified_subsystem, common_files.
Ensure there are no other files at the /source level, due to this recommendation.
Have the outsourcers create a brand new respository at the source level on their machines, and two corresponding subrepositories.
Push the unclassified_subsystem and common_files to the out-sourcer, pulling back unclassified_subsystem as required, pushing out new common_files repositories as required.
Maintaining History:
I would like to maintain the commit history, as much as practical, for all of the subsystems.
To do this, I will run the hg convert extension command three times, once for each subrepository. I will filter down to only the files that belong in each subrepository. I may also need to map filenames to move the files from ./common_files/foo.py to ./foo.py (for example).
My Questions:
1) Is dividing up a repository into subrepository a reasonable way of implementing security - viz. that a third-party can only see and edit some of our files?
2) Is using hg convert a reasonable way to create a subrepository from an existing repository, while still maintaining the history?
3) Will hg convert's filter strip out (a) all commits messages about files NOT in the filtered respository? Will it filter out all diffs for files NOT in the filtered repository?
There is another implied question: Am I heading into a world of hurt? If so, I will simply give up on retaining file histories, or even make them seperate repositories and forget about cross-repository commits.
I've not used subrepos so far, but I can answer 2) and 3):
2) Yes, sounds reasonable.
3) Yes.
There was a similar question just 2 days ago: Convert mercurial repository to subrepositories with full history (like hg log -f)

"factorizing" a mercurial repository on kiln

Summarized questions:
What is the simplest (and best) way to shift a group of files from an existing repository to a new sub repository, so those files can be integrated with other parent repositories, some of which may not yet exist?
Do files in subrepositories need to be in discrete folders, or can they exist alongside other files?
Detailed Questions:
I have begun the process of creating multiple repositories representing several projects that have shared components, and that is going well, thanks to SO and some helpful answers to my question here
As I move on to adding a second project I notice there are a few files in my projects that are duplicated, and are essentially the same thing, with enough similarity to warrant taking them out of a main project repository and creating a new subrepository so they can be
used by any new projects I begin, and
removed from other existing repositories, since they are identical.
I am assuming the best way is to simply create a new repository, move the files across on the local file system, push both repositories, and then create a .hgsub file and proceeed as in the answer to my earlier question. This would obviously then shift the files concerned to a subfolder in the local file system under each main project, which i can live with, but it does raise the hypothetical question - is it possible to have a list of files in a repository that are effectively part of a sub repository but reside alongside other files (i.e. not in a sub folder).
If I wanted to (for example) have a "acme.h" file in each project that is part of another repository could I do this? as it happens, I don't need to do this at this point in time, and in my current situation it would be better from a design point of view to have the files I need to "refactor" into another repository in their own subfolder, however that might not always be the case. I use refactor in quotes here, as strictly speaking it's more about refactoring duplicated files that is refactoring code - however the same principle applies.
hopefully my questions are succinct enough to be answered without too much more explanation.
Thanks for summary, makes it much easier to answer!
What is the simplest (and best) way to shift a group of files from an existing repository to a new sub repository, so those files can be integrated with other parent repositories, some of which may not yet exist?
You can use the convert extension to extract a directory from an existing Mercurial repository. You'll want to use the --filemap flag and in the filemap you include the directory you want and rename it to the root. See hg help convert for more info.
After you get a smaller repository with the
Do files in subrepositories need to be in discrete folders, or can they exist alongside other files?
They must be in their own folders. This is simply because that's how a repository looks like in Mercurial, Git, Subversion, ... When you're dealing with subrepositories, then Mercurial is not tracking the files inside the subrepo: it's just asking some (other) system to make a checkout of repository foo at some location.
So when your .hgsub file has
foo = foo
bar = [git]bar
baz = [svn]baz
then Mercurial will notice this on hg update and run
hg clone default-path-of-this-repo/foo foo
git clone default-path-of-this-repo/bar bar
svn checkout default-path-of-this-repo/baz baz
for your. This explains why subrepostories are directories in the outer repository: that's simply what a clone/checkout looks like these days.
As you can see, subrepositories can be of different types. It's conceivable that someone could add a RCS subrepository type for tracking individual files. They would then not have to live in a directory.

Are deleted files still downloaded with an hg clone?

In our Mercurial repo we added a really big file (and did an hg push), then deleted the big file (and did another push).
Now if someone does an hg clone will they still pull down that big file? I know it won't appear in their working directory as it was deleted, but will the file still be pulled down and stored in Mercurial internal storage?
I'd like to ensure people don't have to pull down the file. I've learned that really big files should be stored outside of Mercurial, so I deleted the file. But I was wondering if people will still be pulling down the big file - in which case I guess I will recreate the repository from scratch.
Of course it will still be in the repository.
You can always update back to older revisions, and if you update back to the revision you got when you committed the file, it'll be there in all its glory.
There are two ways to mitigate this (when you're committing, not now):
One of the big-files extensions, these essentially add big files to a secondary repository and link the two, so that if you update to a revision where the file doesn't exist, and you don't already have it, it will not get updated. ie. it's more a "on-demand" style of pulling
If the file never changes, keep it available on the network and just create some kind of link to it instead of a full copy
Right now, you got four options:
Strip away the changeset that added the file, and all the changesets that came after it. You can do that using the Mercurial Queues extension. Note that you need to do this stripping in all clones. If just one of your users push back the repository that has that file in its history to the central clone, you have the changesets back.
Rebuild the repository from scratch manually
Using the hg convert command and some filtering, the --filemap option can be used for this
Leave it as is. How big is it, will be much of a problem?
Note that rebuilding the repository, either manually or through hg convert will invalidate all clones. Anyone trying to push to your new central clone from an old clone will get a message about unrelated repositories. If any of your users are stupi^H^H^H^H^Hnot smart enough to realize that forcing the push is a bad idea, then you will have problems with this approach.
Yes, the file is still in the history. If you want to delete it completely, you need to use Mercurial Queues — see Editing History on Mercurial wiki.
Just keep in mind this breaks clones as revision IDs change.