subrepo, hg clone and symlinks

subrepo, hg clone and symlinks - mercurial

I'm quite new to mercurial, I've read a lot on this topic but I've been unable to find a clear answer.
The mercurial guide says: "For efficiency, hardlinks are used for cloning whenever the source and destination are on the same filesystem (note this applies only to the repository data, not to the working directory)."
The Repository wiki page says: "All of the files and directories that coexist with the .hg directory in the repository root are said to live in the working directory".
Now, to "link" a subrepo in a main repo I do:
hg init main
cd main
echo subrepo = ../subrepo > .hgsub
hg clone ../subrepo subrepo # (1)
hg add
hg ci -m "initial rev of the main repo"
Does the definition above mean that I'm actually creating a copy of subrepo when I perform (1)?? Or am I creating just a symlink to ../subrepo? According to the output of ls, it is an actual copy. But it sounds so strange to me... If someone could put a bit of light on this subject, I'd appreciate.

First of all, that part of Mercurial, I'm not an expert, but here's what I've understood.
No, you didn't create a link to the whole directory. Instead, files were hardlinked inside it.
This means that space on disk is reserved to keep your directory structure separate, but the files are all identical, because they were just cloned, so they are constructed as links back to the original.
When you start manipulating the repository, through your add or commit (ci) commands, then the hardlinks are broken by Mercurial and separate files are constructed for each, on demand.
Now, this is purely a technical thing, you don't need to know or care about this. If it makes it easier, just think of a clone as a complete copy of the original repository, separate files and all that. The hardlink part is just to save diskspace for the things that are the same.
Since a typical project has many files, and a typical changeset only changes a few files, and a typical reason to clone is that you're going to do a fixed set of changes, hardlinks makes sense since many of the files in the repository directories will be 100% identical to their original for the lifetime of the repository.
For those that aren't, all of that is silently handled by Mercurial for you.

Let us start by looking at what happens when you clone without talking about subrepositories. When you do
$ hg clone A B
then Mercurial will make hard links for the files inside A/.hg/store/data. So if a file called x is tracked, then after the clone you will see that
A/.hg/store/data/x.i
and
B/.hg/store/data/x.i
are hard linked -- this means that the two filenames really refer to the same file. As Lasse points out, this is smart since you might never commit a change to x clone, and so there is no reason to make two different x.i files for the A and B clones. Another advantage is that it is much faster to make a hard link than to copy a file, especially if x.i is very large: the hard link is a constant time operation.
In your example above you are adding a subrepository subrepo to the main repository. A subrepository consist of two things:
the subrepository itself. This what you creates when you do
$ hg clone ../subrepo
the subrepository meta data. This is what you store in the .hgsub file. You must tell Mercurial where you want the subrepository and where Mercurial can clone it from.
You ask if you copy or symlink the repository, and you certainly copied (cloned) it, as you have also confirmed with ls. Afterwards you added some meta data to Mercurial that tells it where it can expect to find the subrepository. This has nothing to do with a symbolic link in the normal filesystem sense, it is just some meta data for Mercurial.

Related

What parts of the hg-git metadata state do I need to back up in order to start over?

This is currently a purely theoretical question (related to this one), but let me first give the background. Whenever you run hg gexport the initial hash will vary from invocation to invocation. This is similar to when you run git init or hg init. However, since the Mercurial and Git commits correspond to each other and build on previous hashes, there should be some way to start over from a minimal common initial state (or minimal state on the Git side, for example).
Suppose I have used hg-git in the past and now I am trying to sync again between my Mercurial and my Git states, but without (or very little of) the original .git directory from the hg gexport. What I do have, though, are the two metadata files: git-mapfile and git-tags.
There is an old Git mirror, which is sort of "behind" and the Mercurial repo which is up-to-date.
Then I configure the Mercurial repo for hg-git like so (.hg/hgrc):
[git]
intree = True
[extensions]
hgext.bookmarks=
topic=
hggit=
[paths]
default = ssh://username#hgserver.tld//project/repo
gitmirror = git+ssh://username#server.tld/project/repo.git
If I now do the naive hg pull gitmirror all I will gain is a duplication of every existing commit on an unrelated branch with unrelated commit history (and the double amount of heads, compared to prior to the pull).
It clearly makes no big difference to place those two metadata files (git-mapfile and git-tags) into .hg. The biggest difference is that the pull without these files will succeed (but duplicate everything) and the pull with them will error out at the first revision because of "abort: unknown revision ..." (even makes sense).
Question: which part(s) and how much (i.e. what's the minimum!) of the Git-side data/metadata created by hg gexport do I have to keep around in order to start over syncing with hg-git? (I was unable to find this covered in the documentation.)

The core metadata is stored in .hg/git-mapfile, and the actual Git repository is stored in .hg/git or .git dependending on intree. The git-mapfile is the only file needed to reproduce the full state; anything else is just cache. In order to recreate a repository from scratch, do the following:
Clone or initialise the Mercurial repository, somehow.
Clone or initialise the embedded Git repository, e.g. using git clone --bare git+ssh://username#server.tld/project/repo.git .hg/git.
Copy over the metadata from the original repository, and put it into .hg/git-mapfile.
Run hg git-cleanup to remove any commits from the map no longer known to Mercurial.
Pull from Git.
Push to Git.
These are the steps I'd use, off the top of my head. The three last steps are the most important. In particular, you must pull from Git to populate the repository prior to pushing; otherwise, the conversion will fail.

Is there any side effect if I clone a repo and rename it?

I have central mercurial repository server. I cloned repoA on my local system
initiated new repoB on central server. cloned repob to local. copied everything from repoA to repoB, commit and pushed to repoB (central server)
now i have all the changeset history from repoA on this new repoB
there was a need to do so as there were two application code on same repoA, to separate it i did the above experiment. and it is working.
my question is by doing so is there any side effects , or is there a better way to do it (recommended way ) please suggest, thank you !

When you clone a repository to your local PC, the repository lives in a folder. That name of that folder is typically how people refer to the "name of the clone" or the "name of the repository".
Other than that, the folder name itself has very little significance and is not even properly part of the Mercurial repository.
It sounds like you did several other steps, but basically if you renamed repository A to B it won't make much difference (but see notes below).
You do not need to use hg clone to clone a repository. You can literally just copy the entire repository folder and the copy will work just fine. The one difference that I am aware of when you use clone vs. operating system file copy is that the clone will point back to the repo you cloned from (for use in push/pull operations). The copy would point back to the original source. (See notes below about some related effects).
One situation where you might cause some problems by renaming the repository folder is if you have have cloned FROM it. Example: you have local repo A. You clone A to B. Now internal to the configuration data in B is a reference to the folder path including A. If you rename A to A1 then that path is obviously broken.
In such a situation you can easily edit the B/.hg/hgrc file and modify the line starting with default= to correct the path.
Based on your question it sounded like you copied a bunch of stuff from one repo to another. Presumably this also included the .hg folder. Generally speaking I recommend avoiding the contents of that folder, and always approach it with caution.
Although technically some of it is human-readable it is simpler & safer to treat it as a black box, or you risk corrupting your repository. There are occasional exceptions (like hgrc) but they are few & far between.
Of course if you are just trying to learn how it works then by all means try things & see what happens! One of the great Mercurial features is the ability to copy a repo, mess around with it, and throw it away when done.

Mercurial: how do I create a new repository containing a subrange of revisions from an existing repo?

I have a repo with subrepos, with a long history. At some point the main subrepo became fully self-contained (doesn't depend on other sister subrepos).
I don't care anymore about the history of the whole thing before the main subrepo became self-contained. So I want to start a new repo that contains just what the subrepo has in it from that moment on. If possible, please describe in terms of TortoiseHg commands.

You probably want to make use of mercurial's convert extension. You can specify revisions to be converted, paths, branches and files to include or exclude in the newly-created repository.
hg convert from-path new-repo
Convert is a default extension which just needs activation.

You might be able to weed out any other changesets you don't need by using the hg strip command.
It will entirely remove changesets (and their ancestors) from a repository. Most likely you would want to make a fresh clone and then work on stripping it down.
One potential pitfall is that the final stripped repo would still have shared parentage with the original; therefore the potential exists for someone to accidentally pull down again changesets which were stripped.
The hg convert command (noted in another answer) does not have this downside.

Mercurial repo inside a repo

Is it possible to create a mercurial repository inside an existing mercurial repository?
The idea is to handle subdirectories of a repository as different repositories, how do you do that?
I'm not talking about subrepos (at least, if I understood the purpose of subrepos...), but if this is how subrepos do exist for, I got it wrong and I'll try to get it right :)
Thanks
~Aki
EDIT: To be more clear, I'd like to know what happens, the practices and the implications of having a repository inside another one, without specifying modules/subrepos.
In other words: what happens if I just do:
hg init globalRepo
hg init globalRepo/subRepo
and use these two repositories as-are?

It works well. Long before the subrepo support was added in Mercurial 1.3, lots of folks kept their entire home directories in a mercurial repo for tracking their .bashrc files and the like. Then within their home dir they'd have many clones of other repos.
Whenever you invoke mercurial (without the -R option) it looks in the current directory for a .hg directory and then just keeps going up directories until it finds one. So if you're in a repo that is in a repo, your commands will always act on the innermost repo you're in.
The caveat is that you want to make sure not to have files added to the outer repo that end up inside the inner repo. Then you'll have two repos updating the same files.

As you can see in this SO question, you can make that kind of nested hg init, even though it is usually reserved for defining subRepo (which is not what you are after).
Normally it should work as two independant repos, but I would advise adding an hgignore rule in the globalRepo in order to ignore the subRepo content altogether.

Here are some docs on nested repositories.

Mercurial Remove History

Is there a way in mercurial to remove old changesets from a database? I have a repository that is 60GB and that makes it pretty painful to do a clone. I would like to trim off everything before a certain date and put the huge database away to collect dust.

There is no simple / recommended way of doing this directly to an existing repository.
You can however "convert" your mercurial repo to a new mercurial repo and choose a revision from where to include the history onwards via the convert.hg.startrev option
hg convert --config convert.hg.startrev=1234 <source-repository> <new-repository-name>
The new repo will contain everything from the original repo minus the history previous to the starting revision.
Caveat: The new repo will have completely new changeset IDs, i.e. it is in no way related to the original repo. After creating the new repo every developer has to clone the new repo and delete their clones from the original repo.
I used this to cleanup old repos used internally within our company - combined with the --filemap option to remove unwanted files too.

You can do it, but in doing so you invalidate all the clones out there, so it's generally not wise to do unless you're working entirely alone.
Every changeset in mercurial is uniquely identified by a hashcode, which is a combination of (among other things) the source code changes, metadata, and the hashes of its one or two parents. Those parents need to exist in the repo all the way back to the start of the project. (Not having that restriction would be having shallow-clones, which aren't available (yet)).
If you're okay with changing the hashes of the newer changesets (which again breaks all the clones out there in the wild) you can do so with the commands;
hg export -o 'changeset-%r.patch' 400:tip # changesets 400 through the end for example
cd /elsewhere
hg init newrepo
cd newrepo
hg import /path/to/the/patches/*.patch
You'll probably have to do a little work to handle merge changesets, but that's the general idea.
One could also do it using hg convert with type hg as both the source and the destination types, and using a splicemap, but that's probably more involved yet.
The larger question is, how do you type up 60GB of source code, or were you adding generated files against all advice. :)

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008