Mercurial precommit script to change a file - mercurial

Despite the decentralized nature of Mercurial, we have a centralized server that we all push to and that does nightly builds, packaging, etc...
Here's what we want to achieve: One of the files that is source controlled contains the major+minor version numbers which ideally would have to be increased with every commit. Since centralized numbering is not possible on developer's machines, we were thinking of a precommit script on the main server that would write a new minor version number to that file for each commit that is pushed. The question is/are:
since it's precommit, can this file change be part of the same commit?
if not, can precommit cause another commit and how do you prevent it from cascading/recursing?
how would one do that?
is there a better solution?

A "precommit" script is triggered only at commit time. By the time users are pushing to the "central" server they have already committed, and it's too late for a precommit hook to do anything at all. You can have changegroup and incoming hooks that are triggered to run on the "central" server when the developers push, but those can't modify the commits -- the commits are already committed/baked/done at that point, they can only react to them.
As a suggestion don't actually put the version string in the file -- having a file that changes with every commit just makes merging a pain. Instead do one or more of these:
have a CI server (like Jenkins) do builds on every push and use the Jenkins build number, which can be passed into your build script
use the Mercurial nodeid (hash) as part of your version string so you can always knew exactly what revision is in a build -- and don't put it in a file, just query for it in your build (or deploy) script
use a changegroup hook to automatically tag-on-push, which applies a pretty (possibly sequential) name to the commits (note, this pretty much doubles your number of commits since every tag is a commit)
Personally, I use something like this in my build script:
build.sh --version_string=$(hg log -r . --template '{latesttag}.{latesttagdistance}-{node|short}')
That gets me version strings that look like "1.0.3-5fd8ed67272e" which can be roughly read as "built from the changeset three commits since version 1.0 was tagged with nodeid 5fd8ed67272e", which is pretty darn good -- and it's never saved into a file it's either baked into the compile (for compiled languages) or written into a VERSION file when my deploy script uploads it to the server.

See this page in the Mercurial documentation for some comments and ideas about this issue. See also How to expand some version keywords in Mercurial? and the other SO answers referenced there.

Related

Mercurial Repo Living Archive

We have an Hg repo that is over 6GB and 150,000 changesets. It has 8 years of history on a large application. We have used a branching strategy over the last 8 years. In this approach, we create a new branch for a feature and when finished, close the branch and merge it to default/trunk. We don't prune branches after changes are pushed into default.
As our repo grows, it is getting more painful to work with. We love having the full history on each file and don't want to lose that, but we want to make our repo size much smaller.
One approach I've been looking into would be to have two separate repos, a 'Working' repo and an 'Archive' repo. The Working repo would contain the last 1 to 2 years of history and would be the repo developers cloned and pushed/pulled from on a daily basis. The Archive repo would contain the full history, including the new changesets pushed into the working repo.
I cannot find the right Hg commands to enable this. I was able to create a Working repo using hg convert <src> <dest> --config convert.hg.startref=<rev>. However, Mecurial sees this as a completely different repo, breaking any association between our Working and Archive repos. I'm unable to find a way to merge/splice changesets pushed to the Working repo into the Archive repo and maintain a unified file history. I tried hg transplant -s <src>, but that resulted in several 'skipping emptied changeset' messages. It's not clear to my why the hg transplant command felt those changeset were empty. Also, if I were to get this working, does anyone know if it maintains a file's history, or is my repo going to see the transplanted portion as separate, maybe showing up as a delete/create or something?
Anyone have a solution to either enable this Working/Archive approach or have a different approach that may work for us? It is critical that we maintain full file history, to make historical research simple.
Thanks
You might be hitting a known bug with the underlying storage compression. 6GB for 150,000 revision is a lot.
This storage issue is usually encountered on very branchy repositories, on an internal data structure storing the content of each revision. The current fix for this bug can reduce repository size up to ten folds.
Possible Quick Fix
You can blindly try to apply the current fix for the issue and see if it shrinks your repository.
upgrade to Mercurial 4.7,
add the following to your repository configuration:
[format]
sparse-revlog = yes
run hg debugupgraderepo --optimize redeltaall --run (this will take a while)
Some other improvements are also turned on by default in 4.7. So upgrade to 4.7 and running the debugupgraderepo should help in all cases.
Finer Diagnostic
Can you tell us what is the size of the .hg/store/00manifest.d file compared to the full size of .hg/store ?
In addition, can you provide use with the output of hg debugrevlog -m
Other reason ?
Another reason for repository size to grow is for large (usually binary file) to be committed in it. Do you have any them ?
The problem is that the hash id for each revision is calculated based on a number of items including the parent id. So when you change the parent you change the id.
As far as I'm aware there is no nice way to do this, but I have done something similar with several of my repos. The bad news is that it required a chain of repos, batch files and splice maps to get it done.
The bulk of the work I'm describing is ideally done one time only and then you just run the same scripts against the same existing repos every time you want to update it to pull in the latest commits.
The way I would do it is to have three repos:
Working
Merge
Archive
The first commit of Working is a squash of all the original commits in Archive, so you'll be throwing that commit away when you pull your Working code into the Archive, and reparenting the second Working commit onto the old tip of Archive.
STOP: If you're going to do this, back up your existing repos, especially the Archive repo before trying it, it might get trashed if you run this over the top of it. It might also be fine, but I'm not having any problems on my conscience!
Pull both Working and Archive into the Merge repo.
You now have a Merge repo with two completely independent trees in it.
Create a splicemap. This is just a text file giving the hash of a child node and the hash of its proposed parent node, separated by a space.
So your splicemap would just be something like:
hash-of-working-commit-2 hash-of-archive-old-tip
Then run hg convert with the splicemap option to do the reparenting of the second commit of Working onto the old tip of the Archive. E.g.
hg convert --splicemap splicemapPath.txt --config convert.hg.saverev=true Merge Archive
You might want to try writing it to a different named repo rather than Archive the first time, or you could try writing it over a copy of the existing Archive, I'm not sure if it'll work but if it does it would probably be quicker.
Once you've run this setup once, you can just run the same scripts over the existing repos again and again to update with the latest Working revisions. Just pull from Working to Merge and then run the hg convert to put it into Archive.

Is it possible to add custom field to mercurial log?

we're moving from Subversion to Mercurial now. In Subversion there was possibility to add custom column into log (e.g. bug id) and force user to fill this column on every commit.
Is it possible to implement such feature in Mercurial?
Yes it's possible.
But before you go and do that, why isn't it enough to require bug fix commit messages to uphold to a certain pattern?
i.e. util: rename the util.localpath that uses url to urllocalpath (issue2875) (taken from Mercurial's repo)
Then you can install a hook on your central repository that scans incoming commit messages, and does whatever is needed when that pattern is found.
Furthermore, why would you want to force this on every commit? Is this for a QA team that should only commit bug fixes? If that's the case, a pre-commit hook that greps the commit message for the pattern sounds appropriate.
If you still want the extra field: when Mercurial commits something, it is possible to pass it a dictionary of strings, which you can fill with anything. See the transplant extension on how you might do that. You would also need to wrap the commit command and add a new command line option to it.
But I strongly suggest you think twice before doing this, because aside from the time consuming work involved in coding, testing (and maintaining this between Mercurial releases), you would also need to ensure that it is deployed on every environment where Mercurial is used.

mercurial temporarily ignore versioned files

My question is essentially the same as here but applies to mercurial. I have a set of files that are under version control, and one save operation changes quite a lot of files. Some of the resulting changes are important for revision control, and some of the changes are just junk. I can "partition" off the junk into separate files. These junk files need to be part of a basic checkout in order for it to work, but their contents (and changes over time) aren't that important for revision control. Right now I just tell all our developers not to commit these files, but we all forget and it creates a lot of extra baggage in the repository. I don't really like the svn solution proposed because there are quite a lot of files and I want a simple clone to just work without all this extra manual work, so I was wondering if mercurial has a better alternative. It's kind of like hg shelve but not quite, and kind of like ignore, but not quite. Is there some hg extension that allows for this? Can git do it?
Mercurial doesn't support this. The correct way to do it is to commit thefile.sample and then have your developers (or better you deploy script) do a copy from thefile.sample to thefile if thefile doesn't exist. That way anyone can update the example file, but there's no risk of them committing their local changes (say their personal database connect string).
Aha! So TortoiseHG's repository and global settings have an Auto Exclude List where you can define a list of files that will be unchecked by default when the status, commit, and shelve dialogs open. So they still show up, but the user has to check them in order to actually do a commit. The setting is stored in hgrc, but it's under the [tortoisehg] heading so it's not supported by mercurial per se. Nevertheless, it fits my needs.
One solution to this is to use nested tree support (submodule in git), where the "junk" would be put in a different repository (to avoid cluttering the main repo), while enabling checking out the whole thing out in a consistent manner (right version of both repos in sync).
https://www.mercurial-scm.org/wiki/Subrepository?action=show&redirect=subrepos
In git, submodules are one solution to this issue - but they are not that great UI-wise. What I do instead is to keep two completely independent repositories, and using the subtree merge strategy when I need to update the main repo with the junk repo: http://progit.org/book/ch6-7.html

How good is my method of embedding version numbers into my application using Mercurial hooks?

This is not quite a specifc question, and more me like for a criticism of my current approach.
I would like to include the program version number in the program I am developing. This is not a commercial product, but a research application so it is important to know which version generated the results.
My method works as follows:
There is a "pre-commit" hook in my .hg/hgrc file link to version_gen.sh
version_gen.sh consists solely of:
hg parent --template "r{rev}_{date|shortdate}" > version.num
In the makefile, the line version="%__VERSION__% in the main script is replaced with the content of the version.num file.
Are there better ways of doing this? The only real short coming I can see is that if you only commit a specfic file, version.num will be updated, but it won't be commited, and if I tried to add always committing that file, that would result in an infite loop (unless I created some temp file to indicate I was already in a commit, but that seems ugly...).
The problem
As you've identified, you've really created a Catch-22 situation here.
You can't really put meaningful information in the version.num file until the changes are committed and because you are storing version.num in the repository, you can't commit changes to the repository until you have populated the version.num file.
My solution
What I would suggest is:
Get rid of the "pre-commit" hook and hg forget the version.num file.
Add version.num to your .hgignore file.
Adjust version_gen.sh to consist of:
hg parent --template "r{node|short}_{date|shortdate}" > version.num
In the makefile, make sure version_gen.sh is run before version.num is used to set the version parameter.
My reasons
As #Ry4an suggests, getting the build system to insert revision information into the software at build time, using information from the Version Control System is a much better option. The only problem with this is if you try to compile the code from an hg archive of the repository, where the build system cannot extract the relevant information.
I would be inclined to discourage this however - in my own build system, the build failed if revision information couldn't be extracted.
Also, as #Kai Inkinen suggests, using the revision number is not portable. Rev 21 on one machine might be rev 22 on another. While this may not be a problem right now, it could be in the future, if you start colaborating with other people.
Finally, I explain my reasons for not liking the Keyword extension in a question of mine, which touches on similar issues to your own question:
I looked at Mercurials Keyword extension, since it seemed like the obvious solution. However the more I looked at it and read peoples opinions, the more that I came to the conclusion that it wasn't the right thing to do.
I also remember the problems that keyword substitution has caused me in projects at previous companies. ...
Also, I don't particularly want to have to enable Mercurial extensions to get the build to complete. I want the solution to be self contained, so that it isn't easy for the application to be accidentally compiled without the embedded version information just because an extension isn't enabled or the right helper software hasn't been installed.
Then in comments to an answer which suggested using the keyword extension anyway:
... I rejected using the keyword extension as it would be too easy to end up with the string "$Id$" being compiled into the executable. If keyword expansion was built into mercurial rather than an extension, and on by default, I might consider it, but as it stands it just wouldn't be reliable. – Mark Booth
A don't think that there can be a more reliable solution. What if someone accidentally damages .hg or builds not from a clone but from an archive? – Mr.Cat
#Mr.Cat - I don't think there can be a less reliable solution than the keywords extension. Anywhere you haven't explicitly enabled the extension (or someone has disabled it) then you get the literal string "$ID$" compiled into the object file without complaint. If mercurial or the repo is damaged (not sure which you meant) you need to fix that first anyway. As for hg archive, my original solution fails to compile if you try to build it from an archive! That is precisely what I want. I don't want any source to be compiled into our apps without it source being under revision control! – Mark Booth
What you are trying to do is called Keyword Expansion, which is not supported in Mercurial core.
You can integrate that expansion in make file, or (simpler) with the Keyword extension.
This extension allows the expansion of RCS/CVS-like and user defined keys in text files tracked by Mercurial.
Expansion takes place in the working directory or/and when creating a distribution using "hg archive"
That you use a pre-commit hook is what's concerning. You shouldn't be putting the rest of version_gen.sh into the source files thesemves, just into the build/release artifacts which you can do more accurately with an 'update' hook.
You don't want the Makefile to actually change in the repo with each commit, that just makes merges hell. You want to insert the version after checking out the files in advance of a build, which is is what an update hook does.
In distributed systems like Mercurial, the actual "version number" does not necessarily mean the same thing in every environment. Even if this is a single person project, and you are really careful with having only your central repo, you would still probably want to use the sha1-sum instead, since that is truly unique for the given repository state. The sha1 can be fetched through the template {node}
As a suggestion, I think that a better workflow would be to use tags instead, which btw are also local to your repository until you push them upstream. Don't write your number into a file, but instead tag your release code with a meaningful tag like
RELEASE_2
or
RELEASE_2010-04-01
or maybe script this and use the template to create the tag?
You can then add the tag to your non-versioned (in .hgignore) version.num file to be added into the build. This way you can give meaningful names to the releases and you tie the release to the unique identifier.

suggestions for using PATH to executables with version control (Mercurial)

So I'm pretty new to version control but I'm trying to use Mercurial on my Mac to keep a large Python data analysis program organized. I typically clone my main repository, tweak the clone's code a bit, and run the code on my data. If the changes were successful I commit and eventually push the changes back to my main repository. I guess that's a pretty typical workflow under version control.
My problem is that my code is run on the command-line, with several command-line arguments that refer to data files in the current working directory (and I have many such directories I need to test the code in, and they're outside of version control). So before using Mercurial I just kept my code in one ~/bin directory which was part of my PATH environment variable. Now, with version control, I need to either (1) after each edit, copy my current clone's executables to the ~/bin directory before running the code on the command line, or (2) each time I clone my code, add my current clone's path to the PATH, or (3) specify the entire/path/to/my/programs on the command line each time I run the code. None of these are very convenient, and I'm left feeling like there must be an elegant solution that I just don't know. Maybe something involving Mercurial's hooks? I want my under-revision code to be runnable on the command line between commits, so this seemed to rule out hooks, but I don't know... Many thanks for any suggestions!
Ry4an's answer is good if you want to continue with the multiple-clones workflow. But it's also worth being aware that Mercurial's powerful enough to allow you most of the benefits of that workflow without ever leaving your single "main" repo. I.e. you can create branches (named or anonymous) for experimental features, easily "hg update" to whatever version of the code you want to test, even use the mq extension to prune branches that didn't work out.
What I do in such a case is set up a two deep chain of symlinks to my binary in my current clone. For example I'll have:
/usr/bin/myappname
which is a symlink to
/home/me/repos/CURRENT/bin/myappname
where /home/me/repos/CURRENT is a symlink to whatever my current working clone is, for example:
/home/me/repos/myproject-expirment
After setting up the initial /usr/bin/myappname symlink all I have to do is update the CURRENT symlink when I create a new clone on which I'm working.