Mercurial repository identification - mercurial

I need to be able to uniquely identify a Mercurial repository and have that identifier placed in a file that is included when the repository is cloned. If I can put the identifier in a file in the .hg folder, that is preferable to simply adding a normal file to the repo.
I understand that I can get a near-certain identifier from the first changeset that is committed. I know that the hgrc file cannot be used to store the identifier, because it is not cloned.
So, my question is: Is there another file in the .hg folder that is cloned that I can use to put the identifier? Thanks.

From first read, it sounds like you want to be able to make sure that a clone of the repository is a clone of the correct repository and not some stand-in impostor. However, if the identification information you're thinking of using is cloned with everything else, then an impostor would still pass this test. You'd need to keep that identifier separate so that it can be compared against information in the clone.
Whether that is your purpose or not, you probably don't want to edit any of the files in .hg that get cloned, so you would have to add a tracked file elsewhere in the repo, outside of .hg. However, you don't really need an extra file at all: the changeset hash is not just near certain but very certain, so the information needed to identify a repository is built into the repository itself.
On the command line, you can get either the short or the full version of the very first changeset's hash identifier:
> hg id -i -r0
89abf5502e3c
> hg log -r0 --template "{node}"
89abf5502e3c5c65e532db04d8d87141f0ac8b73
If I am correct that you want to compare two identifiers, so that you or someone else can tell that a clone of the repository is a true clone and not a false one, you would publish the same changeset ID separately. Anyone can then use one of the above commands to see the ID of their clone and compare it to what you say it should be. This is much like the way many websites with downloadable executables show a hash next to the download link, so that you can hash the file yourself and compare the result with the one on the website.
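A minimal Python sketch of that comparison, assuming the ID is published either in the 12-digit short form printed by hg id -i or in the full 40-digit form printed by the {node} template. The function name ids_match is hypothetical:

```python
import re

# Changeset IDs are hex SHA-1 hashes: 40 digits in full, 12 in the short form.
_HEX_ID = re.compile(r"[0-9a-f]{12,40}")


def ids_match(published: str, observed: str) -> bool:
    """Return True if two changeset IDs (short or full) name the same changeset."""
    a = published.strip().lower()
    b = observed.strip().lower()
    for value in (a, b):
        if not _HEX_ID.fullmatch(value):
            raise ValueError(f"not a changeset ID: {value!r}")
    # The short form is a prefix of the full hash, so compare the
    # shorter ID against the start of the longer one.
    shorter, longer = sorted((a, b), key=len)
    return longer.startswith(shorter)


if __name__ == "__main__":
    full = "89abf5502e3c5c65e532db04d8d87141f0ac8b73"
    print(ids_match("89abf5502e3c", full))  # True
```

Comparing by prefix lets the publisher give out whichever form is more convenient, while the person checking can paste either command's output.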
Edit regarding your comment that sheds light on the purpose of this:
Since you need to be able to read it from a file, there are a couple options:
Tracked file in repository root
There is one file you might consider, other than creating your own: .hgtags.
hg tag -r0 ident
...would tag the very first revision, allowing you to use ident as a reference to that changeset rather than -r0. Mercurial always uses tag information from the latest version of .hgtags, no matter what changeset the working directory is updated to, but that may not matter to your app. hg tag appends a line such as this to the .hgtags file, creating the file if it doesn't exist:
a247494248c4b96a571bbd12e90eade3bf559281 ident
This is most handy if you don't have a tags file in your repos yet, because the line will be the first one in the file and easy to find. You might think you could simply write this file yourself, but then you'd still have to call hg to get the changeset ID, and again to add the file to tracking and commit it: hg tag does all of that for you.
If there may already be a tags file to consider, that's OK, too, because such files tend to be relatively short and you just need to look for the one line that ends with your chosen tag name. Mercurial is designed for append-only operations on .hgtags, but everything would still work fine if you inserted the line for this tag as the very first line of an existing .hgtags, because: 1. the tag will never be moved or removed, and 2. you'll be using a tag name not already used in the file.
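A sketch of how an application might read the tag back, assuming the ident tag name from the example and the "<40-digit hash> <tag name>" line format shown above. find_tag is a hypothetical helper. The rules that a later line for the same tag wins and that a null hash marks a removed tag are how Mercurial treats a single .hgtags file, as far as I know:

```python
from typing import Optional

# hg tag --remove records the removal by pointing the tag at the null revision.
NULL_NODE = "0" * 40


def find_tag(hgtags_text: str, tag: str) -> Optional[str]:
    """Return the changeset hash that `tag` points to in .hgtags text, or None."""
    node = None
    for line in hgtags_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first space only: everything after it is the tag name.
        hash_part, _, name = line.partition(" ")
        if name.strip() == tag:
            node = hash_part  # a later line for the same tag overrides earlier ones
    if node is None or node == NULL_NODE:
        return None
    return node
```

For the example above, find_tag(text, "ident") would return a247494248c4b96a571bbd12e90eade3bf559281.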
Reading hg's guts
There are files deeper in .hg that normally only Mercurial itself touches, and they can be read to get the first changeset's hash. I looked into Mercurial's File Formats, Revlog, and RevlogNG pages, and at least for two of my own repos, .hg\store\00changelog.i contains the first changeset's hash at offset 0x20 (20 bytes long). This is probably the same in all repos, at least since Mercurial 0.9. RevlogNG also notes that the first 4 bytes of that file indicate the revlog version number and flags. While the changeset ID is currently only 20 bytes long, the field for it is 32 bytes long, probably to allow for a longer hash in the future.
Since this option requires no alteration of existing repositories and only involves reading the first 52-64 bytes of the main index, it's the one I'd probably go with. If I were catching this requirement at an early stage of the product, before any of the repos it manages were out in the wild, I would lean toward the custom file approach, because I would probably have my own metadata file created and added from the very start of the repo.
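A minimal Python sketch of reading that hash, assuming the RevlogNG layout described above (a 64-byte index entry, the version in the low 16 bits of the first 4 bytes, the node field at offset 0x20) and a repository using the .hg/store layout. The function names are hypothetical, and a different revlog format would need a different parser:

```python
import struct
from pathlib import Path
from typing import Optional

REVLOGV1 = 1           # the "RevlogNG" format version
INDEX_ENTRY_SIZE = 64  # each RevlogNG index entry is 64 bytes
NODE_OFFSET = 0x20     # the 32-byte node field starts here...
NODE_LENGTH = 20       # ...but only 20 bytes (a SHA-1 hash) are used


def first_changeset_from_index(data: bytes) -> Optional[str]:
    """Return the hex hash of revision 0 from the start of 00changelog.i."""
    if len(data) < INDEX_ENTRY_SIZE:
        return None  # empty repository: no changesets yet
    # The first 4 bytes of entry 0 double as the header:
    # low 16 bits = format version, high 16 bits = feature flags.
    (header,) = struct.unpack(">I", data[:4])
    version = header & 0xFFFF
    if version != REVLOGV1:
        raise ValueError(f"unsupported revlog version {version}")
    return data[NODE_OFFSET:NODE_OFFSET + NODE_LENGTH].hex()


def first_changeset(repo_root: str) -> Optional[str]:
    """Read only the first index entry of the repository's changelog."""
    index = Path(repo_root, ".hg", "store", "00changelog.i")
    if not index.exists():
        return None
    with index.open("rb") as f:
        return first_changeset_from_index(f.read(INDEX_ENTRY_SIZE))
```

Checking the version field first means the code refuses to guess when it meets a layout it was not written for, rather than returning 20 arbitrary bytes.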

The "repository is unrelated" error message comes from mercurial/treediscovery.py:
base = list(base)
if base == [nullid]:
    if force:
        repo.ui.warn(_("warning: repository is unrelated\n"))
    else:
        raise util.Abort(_("repository is unrelated"))
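The base variable stores the changesets the two repositories have in common; if it holds only the null revision, they share no history. Following the same idea as the push/pull check, we can assume that two repositories are related if they have a common root, so compare the hashes printed by this command: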
$ hg log -r "roots(all())"
Note that hg log -r 0 always shows just one root, but a repository can have several. For example, FIRST_REPO may hold SECOND_REPO's history as well as its own; revision 0 of SECOND_REPO then obviously differs from revision 0 of FIRST_REPO, yet Mercurial's check passes.
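A sketch of comparing the roots of two repositories, assuming each list was produced with something like hg log -r "roots(all())" --template "{node}\n" so that there is one full hash per line. The helper names are hypothetical. A non-empty intersection means the two repositories share at least one root changeset:

```python
from typing import Set


def parse_nodes(output: str) -> Set[str]:
    """Collect the hashes printed one per line by an hg log --template command."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def shared_roots(roots_a: str, roots_b: str) -> Set[str]:
    """Return the root changesets that appear in both repositories."""
    return parse_nodes(roots_a) & parse_nodes(roots_b)
```

Comparing every root, rather than only revision 0, also covers the FIRST_REPO/SECOND_REPO case described above.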
You cannot trick the roots check by carefully crafting repositories, because building two repositories like these (with common changesets but different roots):
0 <--- SHA1-XXX <--- SHA1-YYY <--- SHA1-ZZZ
0 <--- SHA1-YYY <--- SHA1-ZZZ
is impossible in practice: each changeset hash depends on the hashes of its parents, so it would require breaking SHA-1, the hash Mercurial uses for changeset IDs.
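The dependency on parents can be illustrated with a short Python sketch. As far as I know, Mercurial computes a node ID as the SHA-1 of the two parent IDs (sorted, with the null ID standing in for a missing parent) followed by the revision's text. Here the text is just a placeholder byte string, not a real serialized changeset:

```python
import hashlib

NULL_ID = b"\0" * 20  # the null revision's ID, used for a missing parent


def node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    # Sort the parents so the ID does not depend on their order, then hash
    # them together with the revision's content.
    lo, hi = sorted((p1, p2))
    return hashlib.sha1(lo + hi + text).digest()


x = node_id(b"XXX")             # a root changeset
y_after_x = node_id(b"YYY", x)  # YYY as a child of XXX
y_as_root = node_id(b"YYY")     # the same content with no parent
print(y_after_x == y_as_root)   # False
```

Because the parent IDs are part of the hashed input, the same content placed at a different point in history gets a different ID.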

Related

In hg clone, what's the difference between "adding changesets", "adding manifests", and "adding file changes"?

From the Mercurial documentation:
The manifest is the file that describes the contents of the repository at a particular changeset ID
https://www.mercurial-scm.org/wiki/Manifest
When cloning a Mercurial repository, I see lines of output saying:
adding changesets
adding manifests
adding file changes
I don't understand the difference between these things. I thought I understood what a changeset is, but I don't know how it would be different from a set of "file changes". And based on the description above, a manifest sounds like the same thing. So what's the difference between all of these?
Mercurial divides the information you need to keep track of in a versioning system into several levels:
Changesets -- the metadata about each revision. Who (author), when (date and time), why (the summary text) and what (the affected filenames), etc. is stored here.
Manifests -- each manifest lists the file revisions for the files at a given revision. This is like a linking table in a database; the file contents are not contained, only what version of a given file is part of this revision.
The file changes -- These files store the actual file data. It is inefficient to store every version ever produced of a given file in full. Instead, file data is stored in delta-compressed form: the changes between versions are stored, with an occasional full copy to make restoring a version faster.
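A toy Python model of the three levels, meant only to show how they point at one another. The changelog entry refers to a manifest, and the manifest maps each file name to a revision in that file's own log. Everything here is a simplification and not Mercurial's on-disk format: the class names, the strictly linear history, and the full copies instead of deltas.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Changeset:
    """Changelog entry: metadata plus a pointer to one manifest."""
    user: str
    description: str
    files: List[str]   # names of the files this changeset touched
    manifest_rev: int  # index into the manifest log


# A manifest maps file name -> revision number in that file's log.
Manifest = Dict[str, int]


class ToyRepo:
    def __init__(self) -> None:
        self.changelog: List[Changeset] = []
        self.manifests: List[Manifest] = []
        # Filelogs hold full copies here for simplicity; Mercurial stores deltas.
        self.filelogs: Dict[str, List[bytes]] = {}

    def commit(self, user: str, description: str, changes: Dict[str, bytes]) -> int:
        # Start from the previous manifest (linear history assumed).
        manifest = dict(self.manifests[-1]) if self.manifests else {}
        for name, data in changes.items():
            log = self.filelogs.setdefault(name, [])
            log.append(data)
            manifest[name] = len(log) - 1
        self.manifests.append(manifest)
        self.changelog.append(
            Changeset(user, description, sorted(changes), len(self.manifests) - 1))
        return len(self.changelog) - 1

    def read(self, rev: int, name: str) -> bytes:
        """Follow changeset -> manifest -> filelog to get a file's content."""
        manifest = self.manifests[self.changelog[rev].manifest_rev]
        return self.filelogs[name][manifest[name]]
```

Note how an unchanged file gets no new filelog entry: the new manifest simply keeps pointing at the old file revision.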
All 3 levels need to be copied into your repository from the remote server when cloning.
See the Mercurial Wiki Design page for details.

Given a file, how to find out which revision in a mercurial repository this is?

Assume that there is a file under hg version control. I have a particular version of that file, and I would like to find out in which revision this file was in this version.
I suspect that there are two possible ways to do this.
Do hg update in a loop and diff the file against subsequent versions (sloooow, but should work).
Make Mercurial put the rev number in a, say, comment in the second line of the file right before committing. From what I have read, a precommit hook might be of use. Then I don't have to compare anything, just look at the file itself (I'm assuming no-one will change this, of course, but this is rather safe assumption in my case).
My use case is a joint paper, written in LaTeX, with two coauthors who have no idea about version control at all, but I prefer to use it (for obvious reasons). We communicate by email, and there's effectively a human-based lock system ("I will not work on this file until you send me the next version, ok?"). The only problem that arises is this: I send version X to author B to proofread, then author C sends me a corrected version Y and I commit it to my repo, then author B sends his corrections Z (to version X), and I start to get lost. But I can check the attachment in the email sent to B, and I only need to find out which revision it is.
So, my question is: which of the two ideas above would be better, or maybe there's yet another one to help me deal with this mess?
hg archive is a good method for future work, but I can suggest at least three alternative workflows, plus one way to find the correct version with what you have now.
Future work
You can use a separate named branch for each co-author and default for the merged results: always send a co-author the head of their branch, update their branch when you get corrections (so you always know what you sent), and merge the branches into default.
Use one branch, and mark the revision you sent to each co-author with a bookmark, which you later move to the next point.
Mercurial's keyword expansion is considered something of a "feature of last resort", but in your case it is an obvious and usable solution: just add a keyword with the hash ID to the file (using the Keyword extension instead of a hook is easier and more reliable).
Current state
To find the changeset that the file came from, you can try bisect with a test script that checks, for example, a CRC of the file: compute the CRC of the unversioned file you received, then check the versioned file against it across history.
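A simpler relative of the bisect idea is a straight scan over history, which does not need a good/bad ordering of revisions. This Python sketch assumes that history maps revision numbers to the file's content at each revision, collected for example from hg cat -r REV paper.tex (the filename is hypothetical). It also assumes that line endings may have been changed in transit by email:

```python
import zlib
from typing import Dict, List


def _normalize(data: bytes) -> bytes:
    # Mail clients often rewrite line endings, so compare with LF endings only.
    return data.replace(b"\r\n", b"\n")


def matching_revisions(received: bytes, history: Dict[int, bytes]) -> List[int]:
    """Return the revisions whose content matches the file received by email."""
    target = _normalize(received)
    target_crc = zlib.crc32(target)
    matches = []
    for rev in sorted(history):
        content = _normalize(history[rev])
        # The CRC is a cheap first filter; the byte comparison rules out collisions.
        if zlib.crc32(content) == target_crc and content == target:
            matches.append(rev)
    return matches
```

More than one revision can match if the file was later reverted to an earlier state, which is why a list is returned.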
If you're happy to rely on finding the emails you send the reviewers, why not just include the revision hashes in them along with the files?
You can get this for almost zero extra effort by generating your attachment using hg archive, which will create a file containing 1) your files for review, and 2) .hg_archival.txt, complete with revision hash.
Though I'd be surprised if there isn't a more elegant way, even if your collaborators are dead-set against using version control.
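A sketch of reading the revision back out of an archive, assuming .hg_archival.txt consists of "key: value" lines that include a node: line with the archived changeset's full hash (other keys such as repo: and branch: may vary between Mercurial versions). read_archival is a hypothetical helper:

```python
from typing import Dict


def read_archival(text: str) -> Dict[str, str]:
    """Parse the "key: value" lines of an .hg_archival.txt file."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields
```

With the archive attached to each email, read_archival(text)["node"] gives the exact changeset a reviewer received.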

Can I reconstruct Perforce/Mercurial linkage after an aborted Perfarce clone?

I have hit a problem with the Perfarce extension that I can’t seem to get past. I initially cloned part of my P4 depot by:
hg clone --startrev 71555 p4://perforce:1666/greg_nt_main-hg lwnthg
I chose a start rev that was just a few changelists behind the current head revision – trying a full clone with no startrev didn’t work, but that’s a separate issue I’ll perhaps write about separately.
During the clone I got the following error:
"abort: untracked file in working directory differs from requested revision on 'MAIN/apps/Win32/BenchMark/Jamfile'"
However, on inspection of what had appeared on my disk, it looked like all files had in fact been successfully cloned. The file mentioned was identical to that in Perforce, and the lwnthg folder was empty before the clone process. An ‘hg status’ showed a lot of files that had not yet been committed – I guess because the clone aborted? So I committed them, and all looked to be great.
I made some edits to my files, committed them to the local repo without problems. I enjoyed the loveliness of Mercurial ;)
But when I came to push my changes back to Perforce I get the following error:
abort: no p4 changelist revision found
I verified I had a valid P4 login ticket, P4 was up, etc, and all OK.
So my guess is that Perfarce stores somewhere the changelist that it last synced to from P4, and the first abort happened before this info was written out. If I try a pull operation from P4, I also get the same error. Assuming my theory is correct, is there any way to reconstruct this information in the local Perfarce config?
NOTE 'Perfarce' is NOT a typo. It is the name of the Mercurial extension to link to Perforce. The question loses some of its meaning if you change it to 'Perforce'. Appreciate the help in trying to clear up the question, but always worth checking facts first :)
To answer my own question, the answer appears to be no.
I've done some more digging and been in contact with the original author - Frank Kingswood - and the solution is to ensure your depot imports without errors in the first place. Once that's done, Perfarce works an absolute treat.
The original abort of the import was down to my usage. After following various instructions found elsewhere - including Stack Overflow - I was trying to use hg clone's destination parameter to get the right repo name. But it looks like the success of the import is sensitive to an interaction between the Perforce client spec root and the destination folder given as this final argument to hg clone.
Basically, make sure these folders do not overlap.
Depending on the files in Perforce, it may work if you have an overlap, but you could be setting yourself up for a whole heap of trouble in the future.
The recommendation is to keep the folders separate. My problem was that I did not want to take the default folder name of the hg folder as the Perforce client spec name - which is what it does if you do not supply a destination folder. But, possibly due to a bug, if you do supply the destination folder then it has to match the client spec root. Because of this behaviour, I had assumed that the folders actually had to be the same.
In Mercurial it is safe to rename the top level folder after the repo has been created. So if you don't want the name to be dictated by the name of the Perforce client spec, then you can just rename afterwards. That's the approach I took.
Hope this helps others trying to dip their toes in the Mercurial waters.
Update
Frank has updated the Perfarce extension to better trap this case. Get the latest from the Perfarce repository.

How can I ignore all directories except one using .hgignore?

I'm managing $HOME using Mercurial, to keep my dotfiles nice and tracked, or at least the ones that matter to me.
However, there's a profusion of files and directories in ~ that do not need to be tracked, and that set is ever-changing and ever-growing.
Historically, I've dealt with this by having this .hgignore:
syntax: glob
*
This keeps my status clean, as far as it goes, making only previously tracked files visible. However, I have some directories (in my case, scripts, .emacs.d) that I would like to see untracked files in; I almost always want to track new additions to those directories.
I know that I can run hg st -u scripts to identify untracked files, but I want a means whereby I can achieve the same function using plain ole hg status.
Is there a way to do this?
Try this in .hgignore instead:
syntax: regexp
^(?!(scripts|foo|bar)/)[^/]+/
^ matches start of path
(?!(scripts|foo|bar) uses negative lookahead to ignore all files except those in directories scripts, foo or bar
/) ensures that directories whose names merely begin with one of those names (such as scriptsold) are still ignored
[^/]+/ then actually matches any directory (excluding those ruled out by the lookahead), so that files in ~ aren't ignored
Credit for the central idea in this solution (the negative lookahead) goes to Michael La Voie's answer to this question
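The pattern can be checked outside Mercurial with a short Python sketch. This assumes Python re semantics (search on repository-relative paths that use "/") and that each file's path is matched as a whole. is_ignored is a hypothetical helper:

```python
import re

# The .hgignore pattern from the answer above.
IGNORE = re.compile(r"^(?!(scripts|foo|bar)/)[^/]+/")


def is_ignored(path: str) -> bool:
    """True if a repository-relative path would be hidden by the pattern."""
    return IGNORE.search(path) is not None
```

For the directories in the question, the alternation would become (scripts|\.emacs\.d), with the dots escaped.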
This question has been asked here on SO quite a few times, and you'll get a lot of convoluted answers using zero-width negative lookahead assertions, an oft-abused regex trick, but the better solutions are to either (a) just make the repo in that directory alone or (b) just add the files in that directory. For option (b) you'd just put .* in your .hgignore file to ignore everything, and then manually hg add the files you want tracked. In Mercurial, unlike svn and cvs, you can override an ignore with an add.

How good is my method of embedding version numbers into my application using Mercurial hooks?

This is not quite a specific question; I'm more looking for criticism of my current approach.
I would like to include the program version number in the program I am developing. This is not a commercial product, but a research application so it is important to know which version generated the results.
My method works as follows:
There is a "pre-commit" hook in my .hg/hgrc file linked to version_gen.sh
version_gen.sh consists solely of:
hg parent --template "r{rev}_{date|shortdate}" > version.num
In the makefile, the line version="%__VERSION__%" in the main script is replaced with the contents of the version.num file.
Are there better ways of doing this? The only real shortcoming I can see is that if I commit only a specific file, version.num will be updated but not committed, and if I tried to always commit that file as well, it would result in an infinite loop (unless I created some temp file to indicate I was already in a commit, but that seems ugly...).
The problem
As you've identified, you've really created a Catch-22 situation here.
You can't really put meaningful information in the version.num file until the changes are committed and because you are storing version.num in the repository, you can't commit changes to the repository until you have populated the version.num file.
My solution
What I would suggest is:
Get rid of the "pre-commit" hook and hg forget the version.num file.
Add version.num to your .hgignore file.
Adjust version_gen.sh to consist of:
hg parent --template "r{node|short}_{date|shortdate}" > version.num
In the makefile, make sure version_gen.sh is run before version.num is used to set the version parameter.
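A sketch of that substitution step in Python (the question does it in the makefile, so the language and the name stamp_version are assumptions). It refuses to produce output when version.num is missing or empty, or when the placeholder is absent, so a build cannot silently go out without revision information:

```python
from pathlib import Path

PLACEHOLDER = "%__VERSION__%"


def stamp_version(source: str, version_file: Path) -> str:
    """Replace the version placeholder in `source` with the contents of version.num."""
    if not version_file.is_file():
        raise RuntimeError(f"{version_file} missing: run version_gen.sh first")
    version = version_file.read_text().strip()
    if not version:
        raise RuntimeError(f"{version_file} is empty")
    if PLACEHOLDER not in source:
        raise RuntimeError(f"placeholder {PLACEHOLDER} not found")
    return source.replace(PLACEHOLDER, version)
```

A makefile rule would run version_gen.sh and then this step, so version.num is always regenerated from the working directory's parent before it is used.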
My reasons
As @Ry4an suggests, getting the build system to insert revision information into the software at build time, using information from the version control system, is a much better option. The only problem with this is if you try to compile the code from an hg archive of the repository, where the build system cannot extract the relevant information.
I would be inclined to discourage that, however; in my own build system, the build failed if revision information couldn't be extracted.
Also, as @Kai Inkinen suggests, using the revision number is not portable. Rev 21 on one machine might be rev 22 on another. While this may not be a problem right now, it could be in the future if you start collaborating with other people.
Finally, I explain my reasons for not liking the Keyword extension in a question of mine, which touches on similar issues to your own question:
I looked at Mercurial's Keyword extension, since it seemed like the obvious solution. However, the more I looked at it and read people's opinions, the more I came to the conclusion that it wasn't the right thing to do.
I also remember the problems that keyword substitution has caused me in projects at previous companies. ...
Also, I don't particularly want to have to enable Mercurial extensions to get the build to complete. I want the solution to be self-contained, so that it isn't easy for the application to be accidentally compiled without the embedded version information just because an extension isn't enabled or the right helper software hasn't been installed.
Then in comments to an answer which suggested using the keyword extension anyway:
... I rejected using the keyword extension as it would be too easy to end up with the string "$Id$" being compiled into the executable. If keyword expansion was built into mercurial rather than an extension, and on by default, I might consider it, but as it stands it just wouldn't be reliable. – Mark Booth
I don't think there can be a more reliable solution. What if someone accidentally damages .hg, or builds not from a clone but from an archive? – Mr.Cat
@Mr.Cat - I don't think there can be a less reliable solution than the keyword extension. Anywhere you haven't explicitly enabled the extension (or someone has disabled it), you get the literal string "$Id$" compiled into the object file without complaint. If Mercurial or the repo is damaged (not sure which you meant), you need to fix that first anyway. As for hg archive, my original solution fails to compile if you try to build it from an archive! That is precisely what I want. I don't want any code to be compiled into our apps without its source being under revision control! – Mark Booth
What you are trying to do is called Keyword Expansion, which is not supported in Mercurial core.
You can integrate that expansion in make file, or (simpler) with the Keyword extension.
This extension allows the expansion of RCS/CVS-like and user defined keys in text files tracked by Mercurial.
Expansion takes place in the working directory and/or when creating a distribution using "hg archive"
That you use a pre-commit hook is what's concerning. You shouldn't be putting the output of version_gen.sh into the source files themselves, just into the build/release artifacts, which you can do more accurately with an 'update' hook.
You don't want the Makefile to actually change in the repo with each commit; that just makes merges hell. You want to insert the version after checking out the files, in advance of a build, which is what an update hook does.
In distributed systems like Mercurial, the actual "version number" does not necessarily mean the same thing in every environment. Even if this is a single-person project and you are really careful to have only your central repo, you would still probably want to use the SHA-1 hash instead, since that is truly unique for the given repository state. The hash can be fetched through the template {node}.
As a suggestion, I think that a better workflow would be to use tags instead, which btw are also local to your repository until you push them upstream. Don't write your number into a file, but instead tag your release code with a meaningful tag like
RELEASE_2
or
RELEASE_2010-04-01
or maybe script this and use the template to create the tag?
You can then add the tag to your non-versioned (in .hgignore) version.num file to be added into the build. This way you can give meaningful names to the releases and you tie the release to the unique identifier.