cvs to mercurial conversion gets tags wrong - mercurial

I've tried all the recommended conversion techniques.
Mostly they manage to get the latest version of the files right, but every one of them trashes my history. Many (most?) of the tags from my cvs project have at least one wrong file when I run "hg up $tag".
My cvs repo is not all that complicated. Why can't anything convert it?
I'd like to dump cvs and convert to mercurial, but not without history.
To recap my frustration:
I tried hg convert (with --branchsort, --timesort, and fuzz=0).
I tried cvs2svn and then hg convert.
tailor does not work with recent versions of mercurial.
fromcvs disappeared from the face of the earth.
hg-cvs-import has been abandoned for 4 years and doesn't work with recent versions of hg.
I have tried using the two most recent versions of mercurial (1.5 and 1.5.1).

Mark, it's a sub-optimal solution, but when a company I was with did a CVS->Mercurial migration we decided that all we cared about were tag snapshots, so we built a little for loop like:
for thetag in $(cat LIST_OF_RELEASE_TAGS) ; do
    cvs update -r "$thetag"
    hg commit --addremove -m "snapshot $thetag" -u "import"
    hg tag "$thetag"
done
That assumed a linear chain of tags, but we only pulled in the main/production branch. A more sophisticated loop would call 'hg update' before each commit to get parentage that reflects CVS branching.
It's definitely not "full history" but it was enough to make us feel good about continuing in Mercurial without losing our ability to say "What the hell was in version 1.1.11?!" and we could always go back to cvs if cvs blame level history was needed.

fromcvs is back. I'm testing it out on a very large repo of ours, and it's extremely fast and handles incremental conversion.

I found a solution of sorts. I'm not thrilled with it, but it will have to do for now.
I was able to detect the tags that were causing trouble and omit those tags from the conversion. Missing tags are much better than wrong tags (assuming the original cvs repo is kept for backup)
WARNING: The following assumes you have made a copy of CVSROOT and are working on that. Do not muck with your original.
This is a bash solution that works for me on my linux box. It will probably burn your house down and invite your grade school bully to move next door to you. You've been warned.
It uses cvsps to identify the problem tags, rcs to delete them and then removes the tags from the CVSROOT/history. After removing the cvsps cache, the hg conversion works as expected.
CVSROOT=/path/to/your/copy
MODULE=cvsmodule
rm -rf ~/.cvsps ~/.hg.cvsps # this cache is EVIL!
BADTAGS="`cvsps -q -x $MODULE |grep Tag: |grep -e FUNKY -e INVALID | awk '{print $2}' `"
while [ ! -z "$BADTAGS" ];do
    cd $CVSROOT/$MODULE
    for badtag in $BADTAGS;do
        echo removing tag $badtag
        # Delete the tag from every RCS file that mentions it.
        grep -lr $badtag . | xargs --no-run-if-empty -l1 rcs -q -n$badtag
        # Drop history lines containing the literal text "<tag>|<module>".
        grep -v "$badtag|$MODULE" < $CVSROOT/CVSROOT/history > $CVSROOT/CVSROOT/history_
        mv $CVSROOT/CVSROOT/history_ $CVSROOT/CVSROOT/history
    done
    # Re-scan: removing tags can expose further problem tags.
    BADTAGS="`cvsps -q -x $MODULE |grep Tag: |grep -e FUNKY -e INVALID | awk '{print $2}' `"
done
rm -rf ~/.cvsps ~/.hg.cvsps # this cache is EVIL!
mkdir ~/hgcvt
cd ~/hgcvt
cvs co $MODULE
hg convert $MODULE
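The tag-detection pipeline in the script above (grep Tag:, grep FUNKY or INVALID, awk '{print $2}') can be mirrored in Python if you want to inspect the list before deleting anything. This is only a sketch: the function name bad_tags and the sample text are made up for illustration, and the sample is merely shaped like the lines the pipeline filters on, not captured cvsps output. Unlike the pipeline, it also drops duplicate tag names.

```python
def bad_tags(cvsps_output):
    """Return tag names from lines that contain 'Tag:' and FUNKY or INVALID.

    Mirrors: grep Tag: | grep -e FUNKY -e INVALID | awk '{print $2}'
    (with duplicates removed, which the shell pipeline does not do).
    """
    tags = []
    for line in cvsps_output.splitlines():
        if "Tag:" in line and ("FUNKY" in line or "INVALID" in line):
            fields = line.split()
            # awk's $2 is the second whitespace-separated field.
            if len(fields) >= 2 and fields[1] not in tags:
                tags.append(fields[1])
    return tags


# Hypothetical sample input, for illustration only.
sample = """\
PatchSet 12
Tag: v1_0
PatchSet 13
Tag: v1_1 **INVALID**
PatchSet 14
Tag: v1_2 **FUNKY**
"""

print(bad_tags(sample))
```

Feeding it the real output of cvsps -q -x $MODULE should give the same names the BADTAGS variable holds.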

I realize now that there are certain fundamental incompatibilities between cvs tags and hg tags.
In cvs, tags are attached per file: each file carries its own mapping from tag names to revisions of that file.
In hg, a tag is an alias for a changeset, in other words the state of all the working files at one snapshot in time.
The distinction is subtle, but important.
It is possible to make a tagged release in cvs of a version that does not represent a snapshot in time. This is not possible in hg.
Of course one could apply patches to get replicas. However, this would create a lot of new heads on the repository with arguably little benefit (assuming the cvs repo is still kept around for posterity).
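To make the "snapshot in time" distinction concrete, here is a small sketch in Python. It uses a deliberately simplified model that I am assuming for illustration (a single trunk, no branches, no file deletions), and the names is_snapshot, history and the sample data are all hypothetical. A cvs tag picks one revision per file; it corresponds to a possible hg changeset only if there is some instant at which every tagged file was at its tagged revision.

```python
def is_snapshot(history, tag):
    """Check whether a per-file tag matches the tree at some single instant.

    history: {filename: [(commit_time, revision), ...]} sorted by commit_time
    tag:     {filename: revision}
    """
    latest_start = float("-inf")
    earliest_end = float("inf")
    for name, rev in tag.items():
        revisions = history[name]
        index = [r for _, r in revisions].index(rev)
        # The tagged revision is current from its own commit time ...
        start = revisions[index][0]
        # ... until the next revision of the same file is committed.
        if index + 1 < len(revisions):
            end = revisions[index + 1][0]
        else:
            end = float("inf")
        latest_start = max(latest_start, start)
        earliest_end = min(earliest_end, end)
    # The per-file validity intervals must overlap.
    return latest_start < earliest_end


# Hypothetical two-file history, times are arbitrary integers.
history = {
    "a.c": [(1, "1.1"), (5, "1.2")],
    "b.c": [(2, "1.1"), (3, "1.2")],
}

print(is_snapshot(history, {"a.c": "1.1", "b.c": "1.2"}))
print(is_snapshot(history, {"a.c": "1.2", "b.c": "1.1"}))
```

The second tag mixes a late revision of a.c with an early revision of b.c. No point in the history ever looked like that, which is the kind of tag a converter cannot map onto a single changeset.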
I'm afraid a perfect conversion from cvs to mercurial is not practical. Ry4an's solution would work for those who care only about recreating the versions. I am more interested in the history and evolution of the source files.
I wrote the following script to simply munge all the cvs tags in the $CVSROOT prior to the conversion, e.g. tag "v321" becomes "v321_prehg". That way developers will know those tags are not authoritative and that they must go back to the read-only cvs tree.
#!/usr/bin/env python3
# Append "_prehg" to every tag name in every RCS file under $CVSROOT.
import os
import stat
import sys
def die(msg):
    sys.stderr.write(msg + "\n")
    sys.exit(1)
cvsroot = os.getenv("CVSROOT")
if cvsroot is None:
    die("CVSROOT not defined")
print("CVSROOT=%s" % cvsroot)
for dirpath, _dirnames, filenames in os.walk(cvsroot):
    for name in filenames:
        if not name.endswith(",v"):
            continue
        rcsfile = os.path.join(dirpath, name)
        print("rcsfile:%s" % rcsfile)
        # RCS files are normally read-only; make this one writable first.
        st = os.stat(rcsfile)
        if st.st_mode & stat.S_IWUSR == 0:
            os.chmod(rcsfile, st.st_mode | stat.S_IWUSR)
        # Binary mode keeps the rest of the file byte-for-byte intact.
        with open(rcsfile, "rb") as f:
            inlines = f.readlines()
        outlines = []
        insymbols = False
        symbols_done = False
        for line in inlines:
            if insymbols and not symbols_done:
                if line.startswith(b"\t"):  # tag line inside the symbols block
                    line = line.replace(b":", b"_prehg:", 1)
                else:
                    symbols_done = True
            elif line == b"symbols\n":
                insymbols = True
            outlines.append(line)
        with open(rcsfile, "wb") as f:
            f.writelines(outlines)

Related

Mercurial find files that have not been modified since a revision

I am trying to find a list of source files that have not been modified for the past few years.
This is one aspect I am trying to measure to try to help us understand the amount of stability and change in a given project over time.
Is there any way in mercurial to identify the files that have not been modified since a given revision?
There is some ambiguity in the question, but it can probably be answered using the status (st) command. For example, to obtain a listing based on a comparison of the files at revision R with those in the working directory, you could run:
hg st --rev R -cn
The -c option is equivalent to "--clean" (meaning in effect "no change").
To compare the files at revision R with those in the most recent commit:
hg st --rev R:-1 -cn
There are many ways to specify "R", e.g. 0 for the initial commit.
Posting my own answer.
I cloned the repository twice into new directories.
Then I updated one to the current version and one to the original baseline revision:
hg update <rev>
Then used the diff command to find files that were identical (excluding whitespace changes)
diff -sqrbwB original current | grep "identical"
The diff flags are as follows:
-s reports identical files (facilitating the grep for "identical")
-q brief report (don't need a detailed report of differences)
-r recursively follow directories
-b ignore space changes
-w ignore all space
-B ignore Blank lines
Not sure if -b, -w and -B are all necessary, but it worked and output a list of files that have not changed.
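If diff is not available, the same two-checkout comparison can be sketched in Python with the standard library. This is an assumption-laden sketch rather than a drop-in replacement: the function name unchanged_files is made up, it skips the .hg directory, and it compares files byte for byte, so unlike the -b -w -B flags above it treats whitespace-only edits as changes.

```python
import filecmp
import os


def unchanged_files(original, current):
    """Return relative paths whose contents are byte-identical in both trees."""
    same = []
    for dirpath, dirnames, filenames in os.walk(original):
        # Do not descend into Mercurial's own metadata.
        dirnames[:] = [d for d in dirnames if d != ".hg"]
        for name in filenames:
            old_path = os.path.join(dirpath, name)
            rel = os.path.relpath(old_path, original)
            new_path = os.path.join(current, rel)
            # shallow=False forces a content comparison, not just a stat check.
            if os.path.isfile(new_path) and filecmp.cmp(old_path, new_path, shallow=False):
                same.append(rel)
    return sorted(same)


# Usage, with the two clones from the answer above:
#   for path in unchanged_files("original", "current"):
#       print(path)
```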

Mercurial, check if a particular version of a file has existed in the history before?

To somewhat elaborate on the title question, is there a way to have Mercurial search through the repository history for a particular version of a given file, and show all of the revisions (or just the most recent one) that contain that version?
For example, let's say that the current working revision is, say, 300 and a file was reverted to an earlier version (say, revision 200). I don't know this - all I can easily see is "how different" the new file is from the 300 version. How can I find out all of the possible revisions that it could have been reverted to?
And if Mercurial cannot do this natively, is there another tool that can? (TortoiseHG?)
I don't think this is possible with any of hg's built-in functionality, but you can get something like it with judicious xargs and md5sum application:
hg log --template "{rev}\\n" | xargs -I "{}" /bin/bash -c 'echo $(hg cat -r {} <filename> | md5sum) {}'
This will give you a list of md5 checksums with the revision number, which when you sort it will give you the revisions sharing versions of the file.
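The "sort and look for matching checksums" step can also be done in a few lines of Python. The sketch below only covers the grouping; the function name group_by_content and the sample contents are hypothetical, and it assumes you have already collected the file's bytes per revision (for instance from hg cat -r REV <filename>, as in the one-liner above).

```python
import hashlib
from collections import defaultdict


def group_by_content(contents):
    """Group revisions that hold identical file content.

    contents: {revision_number: file_bytes}
    Returns {md5_hexdigest: [revisions...]} for checksums shared by
    more than one revision.
    """
    groups = defaultdict(list)
    for rev in sorted(contents):
        digest = hashlib.md5(contents[rev]).hexdigest()
        groups[digest].append(rev)
    return {d: revs for d, revs in groups.items() if len(revs) > 1}


# Hypothetical contents: revision 300 was reverted to what revision 200 had.
contents = {
    198: b"old\n",
    200: b"tweak\n",
    250: b"newer\n",
    300: b"tweak\n",
}

print(sorted(group_by_content(contents).values()))
```

Any group that contains the current revision lists the earlier revisions the file could have been reverted to.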

Is there an equivalent to git's "describe" function for Mercurial?

I'm currently adding packaging to a something that is maintained in Mercurial. Currently the version is defined in the Makefile. I would like to change this so I can build daily packages or properly versioned packages.
Git provides a useful "describe" function that can give you a description of the closest tagged build and current revision. For example if I run this in the kernel:
git describe HEAD
Git returns:
v3.0-rc7-68-g51414d4
telling me that the revision is later than v3.0-rc7, with a git commitish of 51414d4
Is there something similar I can do in Mercurial?
Maybe something like this?
hg log -r . --template '{latesttag}-{latesttagdistance}-{node|short}\n'
Of course you should make an alias for that with AliasExtension.
Note however, unlike "git describe", this command will always show the "latesttagdistance" and "node|short" parts, instead of omitting them when latesttagdistance is 0.
This is a close emulation of git describe:
hg log -r . -T "{latesttag}{sub('^-0-.*', '', '-{latesttagdistance}-m{node|short}')}"
The {sub(...)} function ensures that a working copy that's exactly at tag v0.1.0 will show up as v0.1.0 and not v0.1.0-0-m123456789abc.
Note that the m before the hash is for mercurial, similar to the way git describe uses a g for git.
For convenience, create an alias by adding the following to your ~/.hgrc:
[alias]
describe = log -r . -T "{latesttag}{sub('^-0-.*', '', '-{latesttagdistance}-m{node|short}')}"
Then use the alias by simply typing hg describe.
If you'd like to emulate git describe --dirty, things get even messier – but you can still hide it all in an hg alias:
[alias]
describe = !
    dirtymark=;
    case " $1 " in " --dirty ") dirtymark=-dirty; ;; esac;
    echo $($HG log -r . --template "{latesttag}-{latesttagdistance}-m")$($HG id -i) |
    sed -r -e "s/\+\$/${dirtymark}/" -e 's/-0-m[[:xdigit:]]+//'
Now running hg describe --dirty will produce strings like:
v0.1.0
v0.1.0-dirty
v0.1.0-1-mf6caaa650816
v0.1.0-1-mf6caaa650816-dirty
Omitting the --dirty option means that you'll never get a -dirty suffix like the second and fourth examples, even when the working copy contains uncommitted changes.
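If the version string is going to be assembled in a build script anyway, the formatting rules the alias implements are easy to restate in Python. This is a sketch of the string format only; the function name describe and its parameters are my own, and it assumes you obtain the tag, distance and short node from {latesttag}, {latesttagdistance} and {node|short} yourself.

```python
def describe(latesttag, distance, node, dirty=False):
    """Format a version string the way the hg describe alias above does."""
    version = latesttag
    if distance != 0:
        # 'm' marks a Mercurial hash, like the 'g' in git describe output.
        version += "-%d-m%s" % (distance, node)
    if dirty:
        version += "-dirty"
    return version


print(describe("v0.1.0", 0, "f6caaa650816"))
print(describe("v0.1.0", 0, "f6caaa650816", dirty=True))
print(describe("v0.1.0", 1, "f6caaa650816"))
print(describe("v0.1.0", 1, "f6caaa650816", dirty=True))
```

The four calls reproduce the four example strings listed above, in the same order.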

Using mercurial to manage linux kernel, tags, and hg id

Anybody using mercurial to manage a linux kernel? This is a bit long, but I'm not sure if there's an answer to this. I wanted to give some examples for help
Here's what I'm seeing:
Within the linux kernel, there's a script used when building called scripts/setlocalversion. It sets the kernel's local version based on the repository information. Currently, it understands git, mercurial, and svn repos.
For git, it creates the tag with this code:
if head=`git rev-parse --verify --short HEAD 2>/dev/null`; then
    # If we are at a tagged commit (like "v2.6.30-rc6"), we ignore it,
    # because this version is defined in the top level Makefile.
    if [ -z "`git describe --exact-match 2>/dev/null`" ]; then
        # If we are past a tagged commit (like "v2.6.30-rc5-302-g72357d5"),
        # we pretty print it.
        if atag="`git describe 2>/dev/null`"; then
            echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}'
.....
So, here's an example I did to understand how this works:
[1536][mcrowe:test]$ git tag -a -m"Creating a tag" KernelTest
[1537][mcrowe:test]$ git rev-parse --verify --short HEAD
d024e76
[1537][mcrowe:test]$ git describe --exact-match
KernelTest
[1537][mcrowe:test]$ git describe
KernelTest
So, in this example, the local version would be set to "KernelTest" when the kernel builds.
In mercurial, however, the code to get the local version is this:
if hgid=`hg id 2>/dev/null`; then
    tag=`printf '%s' "$hgid" | cut -d' ' -f2`
    # Do we have an untagged version?
    if [ -z "$tag" -o "$tag" = tip ]; then
        id=`printf '%s' "$hgid" | sed 's/[+ ].*//'`
        printf '%s%s' -hg "$id"
    fi
....
My expectation was that I could tag a release, and have that tag be what this script uses, as happens in git. However, it appears that "hg id" never prints out the tag like this script expects:
[1546][mcrowe:test2]$ hg tag -m"Creating a tag" KernelTest -r tip
[1548][mcrowe:test2]$ hg id
3ccda5e738ae+ tip
[1548][mcrowe:test2]$ hg tags
tip 115:3ccda5e738ae
KernelTest 114:be25df80ce76
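So the act of tagging itself creates a new changeset (the commit that records the tag in .hgtags), and that changeset becomes tip. The tag points at the previous changeset, so hg id on the tip will never show the tag.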
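To see exactly what the mercurial branch of the script does with that hg id output, here is a sketch that re-implements just the excerpt quoted above in Python. The function name hg_localversion is made up, the elided part of the script ("....") is not modelled, and the second example input is an assumed hg id line for a working directory updated to the tagged changeset.

```python
import re


def hg_localversion(hgid):
    """Emulate the quoted hg excerpt of scripts/setlocalversion."""
    # cut -d' ' -f2 takes the second space-separated field; a line with
    # no space at all is passed through whole.
    fields = hgid.split(" ")
    tag = fields[1] if len(fields) > 1 else hgid
    if tag == "" or tag == "tip":
        # sed 's/[+ ].*//' keeps everything before the first '+' or space.
        node = re.split(r"[+ ]", hgid, maxsplit=1)[0]
        return "-hg" + node
    # Any other tag: the excerpt prints nothing.
    return ""


print(repr(hg_localversion("3ccda5e738ae+ tip")))
print(repr(hg_localversion("be25df80ce76 KernelTest")))
```

With the first input the result is the -hg prefix followed by the node, because the second field is tip. Nothing from this excerpt is printed when the second field is a real tag name, which matches the git branch's behaviour of ignoring an exact tag.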
Core Question: AFAIK, this would never work for the linux kernel. The question is how should this be implemented in the kernel tree to allow hg tag to perform like git tag?
Try doing:
hg log -r . --template '{latesttag}'
I think that does what you want.
There's also {latesttagdistance}, which tells you that you're N commits past a tag and is handy for building a version string.

Can Mercurial do a reverse-patch?

Scenario: I've "inherited" a program, kept under Mercurial, that only works on my system with specific tweaks to certain files that are checked in. I do not want to check these tweaks in.
My most recent solution to this is to create a mercurial patch file (hg diff > patchfile) containing these tweaks; when I need to check in my changes, I'll just reverse-apply the patch, commit, and re-apply the patch. (If I had full control of the source, I'd just move all these little tweaks to a single configuration file that isn't under version control, putting a "sample" config file under version control)
Unfortunately, it seems that while the GNU patch command supports the --reverse flag, it does not support hg's multi-file diff format as a single patch file (or maybe it does, and I just don't know the switches for it?). OTOH, hg has its own patch command that can apply the diff, but that doesn't support any kind of reverse flag.
So my question is twofold:
How should this be done in mercurial? Surely hanging on to a "tweak patch" isn't the only way to handle this situation. Maybe mercurial has a plugin or something built in for such temporary, uncommittable changes.
Aside from how things should be done, is there any way to reverse-apply such a mercurial diff-patch to a mercurial repo like this? There are other situations where such functionality would be useful.
Mercurial's patch command (really import) doesn't support reverse, but hg diff does. Use --reverse on that and you'll have a reversed patch that Mercurial can apply.
However, what you're describing is a very common vendor-branch style workflow, which mercurial can better support using features other than diff and patch.
Specifically, Mercurial Queues does exactly what you want.
I found that the --reverse approach did not work when you have subrepos, i.e. with:
hg diff --reverse -S
In case it helps anyone, this barely tested script seems to do the job:
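#!/bin/bash
# Print a reversed diff for the given paths, descending into subrepos listed in .hgsub.
DIRS="$*"
if [[ $DIRS = "" ]]; then
    DIRS=$(echo *)
fi
for arg in $DIRS; do
    # Strip a leading "./" and take the first path component as the candidate subrepo.
    arg=$(echo $arg | sed 's/^\.\/*//g')
    repo=$(echo $arg | cut -d'/' -f-1)
    grep -q "^$repo = " .hgsub 2>/dev/null
    if [ $? -eq 0 ]; then
        if [ -d $repo ]; then
            cd $repo
            # Prefix the subrepo name so the paths are relative to the parent repo.
            hg diff --reverse | sed -e "s;--- a;--- a/$repo;g" -e "s;+++ b;+++ b/$repo;g"
            cd ..
        else
            echo Error, unknown repo $repo
        fi
    else
        hg diff $arg --reverse
    fi
done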