Why does my Mercurial HG repo grow substantially when renaming files?

I have a repo whose .hg folder was 50.1 MB according to the properties dialog in Windows Explorer. Then I renamed a folder containing 341 files (4 MB in total), committed the change, and the .hg folder grew to 54.5 MB. I used the following command to rename:
hg rename -vf oldFolder newFolder
What's wrong? I'm only changing references, not copying files here, am I?
edit: Any tips on how I can debug this are also appreciated.

Mercurial doesn't store renames efficiently right now.
When files are initially stored, their entire contents are put in the repository (obviously). Later modifications only take up the space of the diff (with some compression applied).
However, the storage is done using a so-called 'revlog'. This revlog stores all versions of a file. Renaming will create a new revlog, where again an 'initial storage' takes up the size of an entire file, instead of a diff.
This is not an inherent problem (so most likely it will be solved 'eventually'), but it's quite complex to solve in a nice way. See this bug for more details.
Git stores files in a different way, which handles renames without this overhead. That's why you don't see any growth. The 'shrinking' you see most likely has to do with garbage collection.

As per the documentation of the hg rename command:
hg rename [OPTION]... SOURCE... DEST
aliases: move, mv
rename files; equivalent of copy + remove
Mark dest as copies of sources; mark sources for deletion. If dest is a
directory, copies are put in that directory. If dest is a file, there can
only be one source.
As documented, the rename command is the equivalent of a copy, and then a remove, and a record of where the file was copied from.

Related

In hg clone, what's the difference between "adding changesets", "adding manifests", and "adding file changes"?

From the Mercurial documentation:
The manifest is the file that describes the contents of the repository at a particular changeset ID
https://www.mercurial-scm.org/wiki/Manifest
When cloning a Mercurial repository, I see lines of output saying:
adding changesets
adding manifests
adding file changes
I don't understand the difference between these things. I thought I understood what a changeset is, but I don't know how it would be different from a set of "file changes". And based on the description above, a manifest sounds like the same thing. So what's the difference between all of these?
Mercurial divides the information you need to keep track of in a versioning system into several levels:
Changesets -- the metadata about each revision. Who (author), when (date and time), why (the summary text) and what (the affected filenames), etc. is stored here.
Manifests -- each manifest lists the file revisions for the files at a given revision. This is like a linking table in a database; the file contents are not contained, only what version of a given file is part of this revision.
The file changes -- These files store the actual file data. It would be inefficient to store every version of a given file in full. Instead, file data is stored in delta-compressed form: changes between versions are stored, with an occasional full copy so that any version can be restored quickly.
All 3 levels need to be copied into your repository from the remote server when cloning.
See the Mercurial Wiki Design page for details.

How to diff two revisions of a file that got renamed in between and Mercurial doesn't know about the renaming?

I accidentally renamed a file outside of Mercurial. When I committed the change, Mercurial treated it as two unrelated files (i.e. a remove and an add). I need to go back and diff the two revisions, but I don't know how to do so when Mercurial sees them as two separate files in different revisions. What can I do to diff the files?
You didn't say what operating system you were using. The following will work with bash on Linux:
diff <(hg cat -r rev1 file1) <(hg cat -r rev2 file2)
You can replace diff with another program like vimdiff if you want a visual diff.
If you want to actually fix the history so that Mercurial is aware of the rename (and can use that information in future merges if needed), there's a way to do so documented on the Tips and Tricks page on the Mercurial wiki.
Current contents copied here for ease of use (and in case the link gets broken later):
Steps:
Update your working directory to before you did the rename
Do an actual "hg rename" which will create a new head
Merge that new head into the revision where you did the "manual" rename (not the head revision!)
Then finally merge the head revision into this merge result.
Advice:
Make a clone first and work on that, just in case!
After finishing the steps, use a file compare tool to check that the original and the clone are identical
Check the file history of any moved file to make sure it is now restored
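The steps above could be scripted roughly as follows. This is a sketch, not the wiki's own script: the function name and its arguments are hypothetical, it assumes a clean working directory, it assumes the manual rename is not itself the head revision (otherwise the last merge is unnecessary and Mercurial refuses it), and it assumes the merges resolve without conflicts. As advised, try it on a clone first.

```shell
# Usage: fix_untracked_rename BEFORE MANUAL HEAD OLDNAME NEWNAME
#   BEFORE: revision just before the manual rename
#   MANUAL: revision that recorded the rename as remove + add
#   HEAD:   current head revision (a descendant of MANUAL)
# Hypothetical helper; every argument is supplied by the caller.
fix_untracked_rename() {
    fr_before=$1 fr_manual=$2 fr_head=$3 fr_old=$4 fr_new=$5

    # 1. Go back to before the rename
    hg update -r "$fr_before" &&
    # 2. Do the rename through Mercurial; committing creates a new head
    hg rename "$fr_old" "$fr_new" &&
    hg commit -m "Rename $fr_old to $fr_new (tracked)" &&
    fr_fixed=$(hg log -r . --template '{node}') &&
    # 3. Merge the new head into the revision with the manual rename
    hg update -r "$fr_manual" &&
    hg merge --tool internal:merge -r "$fr_fixed" &&
    hg commit -m "Merge tracked rename into manual rename" &&
    # 4. Merge the head revision into that merge result
    hg merge --tool internal:merge -r "$fr_head" &&
    hg commit -m "Merge head into rename fix"
}
```

Afterwards, hg log --follow on the new filename is one way to check whether the history now reaches back past the rename.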
That being said, if all you want to do is compare the contents at the point in time, you can definitely accomplish that without making Mercurial aware of the rename (as mentioned in Stephen Rasku's answer). In fact, you can use a combination of "hg cat" and an external comparison tool to compare any files, not just ones that Mercurial knows about.
Fix history:
Update to the first changeset that contains the new filename and save the file outside the working copy
Update to the parent of the bad replacement changeset, replace the file correctly (with rename tracking) and commit; this creates a second head
Rebase all changesets from the old anonymous branch on top of the fresh, good changeset
Either commit with --close-branch on the bad-replacement changeset, delete that unwanted changeset, or leave the inactive head intact

How to merge old Mercurial commits into one

I'm interested in reducing the size of the .hg folder in my project. The .hg folder is about 180 MB, but the actual sources of the project are only about 6 MB, so I need a way to shrink the .hg folder.
The Mercurial history contains a lot of binary and image files.
I searched for ways to solve this problem, but what I found is not exactly what I need. I don't need to remove a specific file; I need to remove whole commits. For example:
There are commits 0 through 1360, taking up more than 150 MB for 6 MB of real sources. I would like to end up with only about 1 to 50 commits and about 10 MB for the real sources. This is a cloned repository, and I want to keep the full history on my server, so the clone with about 10 commits needs to stay related to the server repo with 1360 commits. Is there any way to do this?
It sounds like you have both source code and your binaries committed to the repository. One option is to create a second repository for only "published/compiled" code and remove anything compiled from the first repository. Use the new repository for deploying updated code to your server. This should come with a smaller .hg folder on the server.
Either way, you cannot "remove" history without actually losing it with a DVCS. You have the option of completely deleting the .hg folder, but if you want to use Mercurial to update that folder and keep things in sync with what the latest changes are in the repository, you'll need to keep that folder around.

Mercurial, ignore only object files for which the source is there

I have quite a few object (.o) files in my (future) Mercurial repository. I have the source for most of them, but some were downloaded or provided by collaborators and I don't have the sources. Of course, versioning binary files is bad, but I'd like to do that for the few files for which I don't have the sources (given that they will rarely, if ever, be modified, and that on the other hand I cannot regenerate them myself).
Assuming that I have a simple way of determining where the .c file corresponding to a .o file is if it exists (e.g. always in the same folder), is there an easy way to achieve this?
Thanks.
You can safely add *.o files to your .hgignore, and then manually hg add specific *.o files to the repository for tracking. Any automatic adding (such as hg addremove, or hg add without an argument) will still ignore *.o files, but you can manually add anything ignored and it will be tracked.

subrepo, hg clone and symlinks

I'm quite new to Mercurial. I've read a lot on this topic, but I've been unable to find a clear answer.
The mercurial guide says: "For efficiency, hardlinks are used for cloning whenever the source and destination are on the same filesystem (note this applies only to the repository data, not to the working directory)."
The Repository wiki page says: "All of the files and directories that coexist with the .hg directory in the repository root are said to live in the working directory".
Now, to "link" a subrepo in a main repo I do:
hg init main
cd main
echo subrepo = ../subrepo > .hgsub
hg clone ../subrepo subrepo # (1)
hg add
hg ci -m "initial rev of the main repo"
Does the definition above mean that I'm actually creating a copy of subrepo when I perform (1)? Or am I just creating a symlink to ../subrepo? According to the output of ls, it is an actual copy. But it sounds so strange to me... If someone could shed a bit of light on this subject, I'd appreciate it.
First of all, I'm not an expert on that part of Mercurial, but here's what I've understood.
No, you didn't create a link to the whole directory. Instead, files were hardlinked inside it.
This means that space on disk is reserved to keep your directory structure separate, but the files are all identical, because they were just cloned, so they are constructed as links back to the original.
When you start manipulating the repository, through your add or commit (ci) commands, then the hardlinks are broken by Mercurial and separate files are constructed for each, on demand.
Now, this is purely a technical thing, you don't need to know or care about this. If it makes it easier, just think of a clone as a complete copy of the original repository, separate files and all that. The hardlink part is just to save diskspace for the things that are the same.
Since a typical project has many files, a typical changeset only changes a few of them, and a typical reason to clone is that you're going to make a fixed set of changes, hardlinks make sense: many of the files in the repository directories will be 100% identical to their originals for the lifetime of the repository.
For those that aren't, all of that is silently handled by Mercurial for you.
Let us start by looking at what happens when you clone without talking about subrepositories. When you do
$ hg clone A B
then Mercurial will make hard links for the files inside A/.hg/store/data. So if a file called x is tracked, then after the clone you will see that
A/.hg/store/data/x.i
and
B/.hg/store/data/x.i
are hard linked -- this means that the two filenames really refer to the same file. As Lasse points out, this is smart since you might never commit a change to x in the clone, and so there is no reason to make two different x.i files for the A and B clones. Another advantage is that it is much faster to make a hard link than to copy a file, especially if x.i is very large: the hard link is a constant time operation.
In your example above you are adding a subrepository subrepo to the main repository. A subrepository consists of two things:
the subrepository itself. This is what you create when you do
$ hg clone ../subrepo
the subrepository meta data. This is what you store in the .hgsub file. You must tell Mercurial where you want the subrepository and where Mercurial can clone it from.
You ask if you copy or symlink the repository, and you certainly copied (cloned) it, as you have also confirmed with ls. Afterwards you added some meta data to Mercurial that tells it where it can expect to find the subrepository. This has nothing to do with a symbolic link in the normal filesystem sense, it is just some meta data for Mercurial.