Find likely base revision in a mercurial repository - mercurial

I received some code as a tar archive (without the .hg directory). I know which repository this code is based on, but not which revision was used as a base for these modifications. Is there some way to find this out by just looking at the files? This is similar to "Given a file, how to find out which revision in a mercurial repository this is?", but I cannot reach the author of the code, so I cannot control how the files are extracted from the repository. I am also dealing with modified files here, so the diff to the base revision would not be empty.
My fall-back plan would be to loop through all revisions and use the one with the smallest diff, but I'm still hoping there is a better solution.

There's no automated way to do it, but you could possibly reduce the time by using a hg bisect --command diff ... command to zero in on it.
As a tip, if you (I know it probably wasn't you) ever have to give someone a snapshot, use hg archive to make it. It includes a .hg_archive.txt file with version info that'll help if you have to do this again.
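For reference, the fall-back loop from the question could be sketched roughly like this. It is only a sketch under assumptions: the function names and the two paths are made up for illustration, it exports every revision with hg archive (slow on a big repository), and it relies on a diff that understands -N and -x.

```shell
# Sketch only: find the revision whose exported tree differs least from
# the unpacked snapshot. Function names and paths are hypothetical.

# Read "<diff-line-count> <revision>" pairs on stdin and print the
# revision with the smallest count.
pick_smallest() {
    sort -n -k1,1 | head -n 1 | awk '{print $2}'
}

# For every revision of the repository in $1, export it with hg archive
# and count the lines of a recursive diff against the snapshot in $2.
score_revisions() {
    repo=$1
    snapshot=$2
    tmp=$(mktemp -d) || return 1
    for rev in $(hg -R "$repo" log -r 'all()' --template '{node|short}\n'); do
        rm -rf "$tmp/tree"
        hg -R "$repo" archive -r "$rev" "$tmp/tree"
        # hg archive adds .hg_archive.txt, which the snapshot will not have.
        count=$(diff -rN -x .hg_archive.txt "$tmp/tree" "$snapshot" | wc -l | tr -d ' ')
        echo "$count $rev"
    done
    rm -rf "$tmp"
}

# Usage: score_revisions /path/to/repo /path/to/snapshot | pick_smallest
```

If you know roughly when the snapshot was taken, narrowing the 'all()' revset to a date or revision range should cut the run time considerably.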

Related

Mercurial Repo Living Archive

We have an Hg repo that is over 6GB and 150,000 changesets. It has 8 years of history on a large application. We have used a branching strategy over the last 8 years. In this approach, we create a new branch for a feature and when finished, close the branch and merge it to default/trunk. We don't prune branches after changes are pushed into default.
As our repo grows, it is getting more painful to work with. We love having the full history on each file and don't want to lose that, but we want to make our repo size much smaller.
One approach I've been looking into would be to have two separate repos, a 'Working' repo and an 'Archive' repo. The Working repo would contain the last 1 to 2 years of history and would be the repo developers cloned and pushed/pulled from on a daily basis. The Archive repo would contain the full history, including the new changesets pushed into the working repo.
I cannot find the right Hg commands to enable this. I was able to create a Working repo using hg convert <src> <dest> --config convert.hg.startref=<rev>. However, Mercurial sees this as a completely different repo, breaking any association between our Working and Archive repos. I'm unable to find a way to merge/splice changesets pushed to the Working repo into the Archive repo and maintain a unified file history. I tried hg transplant -s <src>, but that resulted in several 'skipping emptied changeset' messages. It's not clear to me why the hg transplant command felt those changesets were empty. Also, if I were to get this working, does anyone know if it maintains a file's history, or is my repo going to see the transplanted portion as separate, maybe showing up as a delete/create or something?
Anyone have a solution to either enable this Working/Archive approach or have a different approach that may work for us? It is critical that we maintain full file history, to make historical research simple.
Thanks
You might be hitting a known bug with the underlying storage compression. 6GB for 150,000 revisions is a lot.
This storage issue is usually encountered on very branchy repositories, in an internal data structure that stores the content of each revision. The current fix for this bug can reduce repository size up to tenfold.
Possible Quick Fix
You can blindly try to apply the current fix for the issue and see if it shrinks your repository.
1. Upgrade to Mercurial 4.7.
2. Add the following to your repository configuration:
   [format]
   sparse-revlog = yes
3. Run hg debugupgraderepo --optimize redeltaall --run (this will take a while).
Some other improvements are also turned on by default in 4.7, so upgrading to 4.7 and running debugupgraderepo should help in all cases.
Finer Diagnostic
Can you tell us the size of the .hg/store/00manifest.d file compared to the full size of .hg/store?
In addition, can you provide us with the output of hg debugrevlog -m?
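Something like the following would collect those numbers. It is a rough sketch: manifest_share is a made-up helper name, sizes are whatever du -sk reports, and it assumes the manifest data file exists as a separate 00manifest.d (very small repositories may not have one).

```shell
# Sketch: report the size of the manifest data file next to the size of
# the whole store. $1 is the repository root; sizes are in KiB (du -sk).
manifest_share() {
    store="$1/.hg/store"
    if [ ! -f "$store/00manifest.d" ]; then
        echo "no 00manifest.d in $store" >&2
        return 1
    fi
    manifest_kb=$(du -sk "$store/00manifest.d" | cut -f1)
    store_kb=$(du -sk "$store" | cut -f1)
    echo "00manifest.d: $manifest_kb KiB, store total: $store_kb KiB"
}

# Usage:
#   manifest_share /path/to/repo
#   hg -R /path/to/repo debugrevlog -m
```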
Other reasons?
Another reason for a repository to grow is large files (usually binary) being committed to it. Do you have any of them?
The problem is that the hash id for each revision is calculated based on a number of items including the parent id. So when you change the parent you change the id.
As far as I'm aware there is no nice way to do this, but I have done something similar with several of my repos. The bad news is that it required a chain of repos, batch files and splice maps to get it done.
The bulk of the work I'm describing is ideally done one time only and then you just run the same scripts against the same existing repos every time you want to update it to pull in the latest commits.
The way I would do it is to have three repos:
Working
Merge
Archive
The first commit of Working is a squash of all the original commits in Archive, so you'll be throwing that commit away when you pull your Working code into the Archive, and reparenting the second Working commit onto the old tip of Archive.
STOP: If you're going to do this, back up your existing repos, especially the Archive repo, before trying it. The Archive might get trashed if you run this over the top of it. It might also be fine, but I'm not having any problems on my conscience!
Pull both Working and Archive into the Merge repo.
You now have a Merge repo with two completely independent trees in it.
Create a splicemap. This is just a text file giving the hash of a child node and the hash of its proposed parent node, separated by a space.
So your splicemap would just be something like:
hash-of-working-commit-2 hash-of-archive-old-tip
Then run hg convert with the splicemap option to do the reparenting of the second commit of Working onto the old tip of the Archive. E.g.
hg convert --splicemap splicemapPath.txt --config convert.hg.saverev=true Merge Archive
You might want to try writing it to a different named repo rather than Archive the first time, or you could try writing it over a copy of the existing Archive, I'm not sure if it'll work but if it does it would probably be quicker.
Once you've run this setup once, you can just run the same scripts over the existing repos again and again to update with the latest Working revisions. Just pull from Working to Merge and then run the hg convert to put it into Archive.
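Put together, the repeatable part could look roughly like the script below. Treat it as a sketch: the function names, paths and hashes are placeholders, it assumes the Archive history has already been pulled into Merge once, and it enables the convert extension on the command line in case it is not enabled in your hgrc.

```shell
# Sketch of the Working -> Merge -> Archive update. All names are
# placeholders for illustration.

# Write a one-line splicemap: "<child-hash> <new-parent-hash>".
write_splicemap() {
    child=$1
    parent=$2
    out=$3
    printf '%s %s\n' "$child" "$parent" > "$out"
}

# Pull the latest Working changesets into Merge, then convert Merge into
# the Archive repository, reparenting according to the splicemap.
refresh_archive() {
    working=$1
    merge=$2
    archive=$3
    splicemap=$4
    # --force lets pull accept changesets from an unrelated repository.
    hg -R "$merge" pull --force "$working" &&
    hg --config extensions.convert= convert \
        --splicemap "$splicemap" \
        --config convert.hg.saverev=true \
        "$merge" "$archive"
}

# Usage (one-time setup, then repeat refresh_archive as needed):
#   hg -R Merge pull --force Archive
#   write_splicemap hash-of-working-commit-2 hash-of-archive-old-tip splicemapPath.txt
#   refresh_archive Working Merge Archive splicemapPath.txt
```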

Finding Large Files in Mercurial Repository

Similar to this link but for mercurial. I'd like to find the files that are most contributing to the size of my mercurial repository.
I intend to use hg convert to create a new, smaller repository. I'm just not sure yet which files are contributing to the repository size. They could be files that have already been deleted.
What is a good way to find these anywhere in the repository history? There are over 20,000 commits. I'm thinking a powershell script, but I'm not sure what the best way to go about this is.
Check hg help fileset. Something like
hg files "set:size('>1M')"
should do the trick for you. You might need to loop over all revisions, though, as it only operates on one revision at a time. In bash I'd try something like
for i in `hg log -r"all()" "set:size('>400k')" --template="{rev}\n"`; do hg files -r$i "set:size('>400k')"; done | sort | uniq
This can probably be optimized, as it currently does a fair bit of duplicate work and may run for quite a while; on the OpenTTD repository with 22000 commits it took just short of 10 minutes on my laptop.
(Also check hg help on templates, files and grep)
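A different angle, which I'd treat as a rough heuristic rather than an exact answer: each tracked file, including ones deleted long ago, has its history stored in files under .hg/store/data, so listing the biggest files there hints at what is taking up the space. The sketch below assumes that on-disk layout; largest_store_files is a made-up name, and the paths it prints are Mercurial's encoded store names (ending in .i or .d), not the exact working-copy names.

```shell
# Sketch: list the N largest files under .hg/store/data by byte size.
# $1 is the repository root, $2 the number of entries (default 20).
largest_store_files() {
    repo=$1
    n=${2:-20}
    # wc -c prints "<bytes> <path>" for each file; one process per file
    # is slow on huge repositories but keeps the output simple.
    find "$repo/.hg/store/data" -type f -exec wc -c {} \; |
        sort -rn |
        head -n "$n"
}

# Usage: largest_store_files /path/to/repo 10
```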

Mercurial - Look into history of file without updating

I'm working in this file and I come across a piece of code which I think has changed at some point in history, and I would like to know where it changed.
It's a pretty big file with a lot of history, so when I use hg diff, I get an enormous list and I don't think it's efficient to search through that.
It would be really neat if I can look into an old revision of the file, to see what the file looked like at a certain point in time. Then I can see how the code worked back then so I can conclude how the bug evolved. Of course, I want to do this without updating the file, because I'm currently working in it and have made changes in it.
So, is there any way you can look into the history of a file without updating it?
There are a few tools to help you:
To get the history of a file you can just use hg log FILE which is probably the best starting point.
You can also use hg annotate FILE, which lists every line in the file and says which revision changed it to be like it currently is. It can also take a revision using the --rev REV option to look at older versions of the file.
To just list the contents of a file at a given revision you can use hg cat FILE --rev REV.
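As a quick illustration, the commands above can be wrapped like this. The function names are made up for the example; none of these commands update the working copy, so local changes stay untouched.

```shell
# Print FILE ($1) as it was at revision REV ($2).
old_version() {
    hg cat --rev "$2" "$1"
}

# Show, for FILE ($1) as of revision REV ($2), which revision last
# changed each line.
blame_at() {
    hg annotate --rev "$2" "$1"
}

# Show the difference between revision REV ($2) of FILE ($1) and the
# current working copy, including uncommitted changes.
diff_against() {
    hg diff --rev "$2" "$1"
}

# Usage:
#   hg log FILE            # find candidate revisions first
#   old_version FILE REV | less
```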
If it proves too hard to track down the bug using those tools, you can just clone your repository somewhere else and use hg bisect to track it down.
hg bisect lets you find the changeset that introduced a problem. To start the search, run the hg bisect --reset command. It is well documented in Mercurial: The Definitive Guide.
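A bisect session in the separate clone might look like the sketch below. bisect_with is a made-up helper, and the test command is whatever reproduces your bug: hg bisect --command treats exit status 0 as good and most non-zero statuses as bad.

```shell
# Sketch: run an automated bisect between a known-good and a known-bad
# revision. $1 = good revision, $2 = bad revision, the rest = test command.
bisect_with() {
    good=$1
    bad=$2
    shift 2
    hg bisect --reset &&
    hg bisect --good "$good" &&
    hg bisect --bad "$bad" &&
    hg bisect --command "$*"
}

# Usage (inside the clone, not your working copy):
#   bisect_with 1000 tip make test
```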

using hg revert to revert a group of files in Mercurial

I'm using Mercurial to read and debug a complex project, and my modifications to the project divide clearly into different groups of files. For example, if I modified four files
src1.cc src1.hh src2.cc src2.hh
It's apparent that I can divide them into two file groups such as group src1 includes src1.cc src1.hh and group src2 includes src2.cc src2.hh.
I'm wondering if I can revert a group of files with a simple command like 'hg revert group-name-alias' instead of listing all the filenames of the group, which is an awful idea if I have modified many files?
Any help is really appreciated!
From what I can understand of your use-case, you can:
Use patterns in the hg revert command. This means that you can
run hg revert src1* to revert all the first group.
Most probably, though, your stuff is in sub-folders and thankfully
you can specify a parent folder to the revert command.
So say your files are really like: foo/src1.cc, foo/src1.hh,
bar/src2.cc, bar/src2.hh. In that case, you can revert all the
second group with hg revert bar, assuming you're in the top folder.
If you're already in the bar folder, you can run hg revert . (a single dot, meaning the current folder).
You can specify several patterns.
Use Mercurial queues if each one of your "file groups" is also
a different unit of work (a different bug fix or feature). This is not
so desirable if all files belong to the same unit of work, though.
No. To the best of my knowledge, Mercurial has no mechanism for grouping files.
You could do some trickery with aliases ([alias] revert-group-name = revert src2.cc src2.hh in ~/.hgrc), but aliases can only be prefixes, and can't perform variable expansions.
If your files are simple enough, you could use shell globbing (hg revert src2*), or a shell variable (GROUP_NAME="src2.cc src2.hh", then hg revert $GROUP_NAME).
You could also consider writing a small Mercurial extension. If you know Python, they don't take very long (my first took me about 30 minutes).
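The shell-variable idea can be stretched into a small wrapper, sketched below. The group names and file lists are just the ones from the question, the function names are made up, and it assumes filenames without spaces.

```shell
# Map a group name to its list of files (hypothetical groups).
group_files() {
    case "$1" in
        src1) echo "src1.cc src1.hh" ;;
        src2) echo "src2.cc src2.hh" ;;
        *) echo "unknown group: $1" >&2; return 1 ;;
    esac
}

# Revert every file in the named group.
revert_group() {
    files=$(group_files "$1") || return 1
    # Unquoted on purpose so the list splits into separate arguments.
    hg revert $files
}

# Usage: revert_group src2
```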
If the filenames meet patterns, you can use that pattern:
hg revert src1*
or
hg revert src1*.*
If those files are in a specific directory, you can do this:
hg revert dir\*
If the directory is more than one level deep and you want to get that directory and all its subdirectories, you can use this version of that command:
hg revert dir\**\*

HG: Update a directory to a specific revision without cloning the whole repo?

Is it possible to update a directory to a specific revision without cloning the whole repository (local or on a central server) in Mercurial, and if so, how? This would be great, because cloning the whole repo first takes too much time for me and the folder doesn't really need the whole repo. As an example: default and the b2.3 branch from which I want to update.
Thanks in advance! :)
You can pull a specific branch, say b2.3 by using hg clone -r b2.3 source-repo target-repo.
If you really need just a non-versioned copy of all the files in revision N, then for some web repositories you may download such a copy using their web interface.
clone is the preferred way to do it in Mercurial. It should take a minimal amount of time when done locally. I'm unaware of any other way to do it.
Search for "hard links" on this tutorial page for more info on the subject.