Mercurial: Automatically tagging a build


In a mercurial set up, I'd like to automatically tag certain builds based on continuous integration scripts. For example, a tag such as branchName-buildId whenever a build of a branch is deployed, or perhaps latest-stable whenever a build passes all integration tests.
However, I'm worried that the straightforward approach of simply calling hg tag will cause problems:
Some tags may be duplicate - i.e. latest-stable. I don't really care which build gets tagged in this situation, but I don't want any conflicts because a script can't resolve those.
Tags cause commits. However, this means that those commits need to be pushed and they need to be robust in the face of concurrent pushes by humans and other scripts. In particular, the automatic push can create additional heads, which is Not Good. But by the time the additional head is detected (at push) the local tag commit has already happened, and even though the new heads are likely trivially mergeable, sometimes tags cause conflicts.
How can I automatically let the CI server tag a build robustly? Here it's more important that the end result is consistent (i.e. that it doesn't mess up the CI server or the repo), and it's less important that tags are reliably applied in the face of duplicates or conflicts (which should be very unlikely anyhow).

I think you're right to be cautious. Robots aren't always the best citizens, and can often do silly things.
What you end up doing depends on what you see the tags being used for. For example, if you only see the CI system using them, then I'd suggest keeping them local. No pull/push/merge issues at all.
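For the CI-only case, a local tag might look like the sketch below. It assumes the script runs inside the CI server's own clone; ci_local_tag is a made-up helper name, and the tag name and revision are whatever your build provides.

```shell
# Record a CI-only marker as a local tag. Local tags are stored in
# .hg/localtags, so no changeset is created and nothing has to be
# pushed or merged.
# ci_local_tag is a hypothetical helper name, not part of Mercurial.
ci_local_tag() {
    tag_name=$1
    tag_rev=$2
    # --local: do not commit the tag; --force: move it if it already exists.
    hg tag --local --force -r "$tag_rev" "$tag_name"
}

# Example: ci_local_tag latest-stable .
```

Because the tag never leaves the clone, a duplicate such as latest-stable is simply moved in place.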
Some tags may be duplicate - i.e. latest-stable. I don't really care which build gets tagged in this situation, but I don't want any conflicts because a script can't resolve those.
If a tag is already defined, and you call hg tag again, it will fail unless you force it, but what this does is add a newer, later definition of the same tag, and the latest one wins. On one hand this is good, because the merge is simple, but think about the case when you do:
hg update -r latest-stable
hg update -r latest-stable
hg update -r latest-stable
hg update -r latest-stable
Each time, you update to the changeset the tag points at, which (as normal) predates the commit that recorded the tag, and at that changeset latest-stable points to the previous latest-stable. The result is that this sequence of commands will move you back through time.
Hence I'd say it's better either to have unique tags (e.g. stable-2013-02-18) or to tag in two commits: one that removes the old tag, and one that adds the new one.
hg update -r latest-stable # You're now at the commit that removed the tag.
hg update -r latest-stable # This one will error because tag doesn't exist
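The two-commit variant suggested above could be scripted roughly as follows. This is only a sketch: move_tag is a hypothetical helper name and the commit messages are placeholders. It assumes the working directory is a clean checkout of a branch head, since hg tag commits on top of the working directory's parent.

```shell
# Move a shared tag in two commits: one that removes the old definition
# and one that adds the new one.
# move_tag is a hypothetical helper name, not part of Mercurial.
move_tag() {
    tag_name=$1
    tag_rev=$2
    # Commit 1: remove the old definition, but only if the tag exists
    # (hg tag --remove aborts on an unknown tag). The tag() revset
    # fails when no such tag is defined.
    if hg log -r "tag('$tag_name')" >/dev/null 2>&1; then
        hg tag --remove -m "CI: remove $tag_name" "$tag_name" || return 1
    fi
    # Commit 2: add the tag at its new revision.
    hg tag -r "$tag_rev" -m "CI: tag $tag_rev as $tag_name" "$tag_name"
}

# Example: move_tag latest-stable 1a2b3c4d5e6f
```

For the unique-tag variant, a name such as stable-$(date +%Y-%m-%d) passed straight to hg tag avoids the removal step entirely.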
Tags cause commits. However, this means that those commits need to be pushed and they need to be robust in the face of concurrent pushes by humans and other scripts. In particular, the automatic push can create additional heads, which is Not Good. But by the time the additional head is detected (at push) the local tag commit has already happened, and even though the new heads are likely trivially mergeable, sometimes tags cause conflicts.
The CI robot should tag; pull; merge (if necessary); push. If the merge fails, don't push, raise an alarm. If the push fails (i.e. there's been more changesets in the time it took to merge), pull and merge again. I'd just make sure your script is very explicit about the revisions it's merging. This process should leave you with no extra heads.
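A minimal sketch of that tag; pull; merge; push loop, with a bounded number of retries. The function name, the retry limit and the exit codes 2 and 3 are my own assumptions. It also assumes a clean working directory at a branch head and a configured default push path.

```shell
# Sketch of "tag; pull; merge (if necessary); push" with bounded retries.
# ci_tag_and_push is a hypothetical helper name, not part of Mercurial.
ci_tag_and_push() {
    tag_name=$1
    tag_rev=$2
    max_tries=${3:-3}

    hg tag -r "$tag_rev" -m "CI: tag $tag_rev as $tag_name" "$tag_name" || return 1

    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        tries=$((tries + 1))
        hg pull || return 1

        # Be explicit about what gets merged: list the other open heads
        # of the current branch by full hash.
        others=$(hg log -r "heads(branch(.)) and not closed() and not ." \
            --template '{node}\n') || return 1
        set -- $others   # reuse the positional parameters to count them
        case $# in
            0) ;;        # nothing new on this branch, no merge needed
            1)
                if ! hg merge --tool internal:merge -r "$1"; then
                    echo "CI: merge with $1 failed, not pushing" >&2
                    hg update --clean -r . >/dev/null 2>&1  # abandon the merge
                    return 2
                fi
                hg commit -m "CI: merge $1 after tagging $tag_name" || return 1
                ;;
            *)
                echo "CI: $# other heads found, not merging automatically" >&2
                return 2
                ;;
        esac

        # Push only our own line of history. A rejected push means the
        # remote moved again, so loop and pull once more.
        if hg push -r .; then
            return 0
        fi
    done
    echo "CI: giving up after $max_tries attempts" >&2
    return 3
}

# Example: ci_tag_and_push latest-stable 1a2b3c4d5e6f || echo "raise an alarm here"
```

Note that on any failure the local tag commit still exists in the CI clone. The simplest cleanup is probably to throw the clone away and start from a fresh one for the next build.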
I believe Mercurial treats the .hgtags file differently for merging because it knows about the content, so conflicts should be very rare. Also, tag commits are, in general, easy to merge because all that changes is .hgtags, so a merge from the CI head should never conflict. The only reason it could is that someone else is using the same tag names as the CI server, and if they are doing that then they need to have honey poured on their keyboard so they can't do any more damage.
The situation I can see causing problems is if you're doing CI tagging on multiple heads with the same tag names. e.g. Development and release branches both have CI run on them, both have tests-clean tags assigned, but to different revisions, and are then merged later. Solution is, don't do that.
Hope some of that is helpful.

If you care about history of builds then consider creating a named branch just for the build process. In Mercurial all tags from all branches are visible in whole repository.
If you don't care about history, bookmarks should do the trick. The build process can set the bookmark latest-stable after the tests have run and then execute hg push --bookmark latest-stable to push that bookmark to the server.
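As a rough sketch of the bookmark approach (publish_bookmark is a made-up helper name, and a default push path is assumed):

```shell
# Move a bookmark to the tested revision and publish it.
# publish_bookmark is a hypothetical helper name, not part of Mercurial.
publish_bookmark() {
    bm_name=$1
    bm_rev=$2
    # --force lets the bookmark move even when the new revision is not
    # a descendant of the old one.
    hg bookmark --force -r "$bm_rev" "$bm_name" || return 1
    push_rc=0
    hg push --bookmark "$bm_name" || push_rc=$?
    # Per "hg help push", exit status 1 means there was nothing to push;
    # this sketch treats that as non-fatal.
    [ "$push_rc" -le 1 ]
}

# Example: publish_bookmark latest-stable 1a2b3c4d5e6f
```

Unlike a tag, moving a bookmark creates no changeset, so there is nothing to merge afterwards.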
Either way, you have to take care not to run tests on revisions whose child has already been tested. Mercurial revsets are a very powerful query language and should help.
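One way such a revset check could look, as a sketch only: already_covered is a hypothetical helper, the candidate is assumed to be a changeset hash, and the marker is whatever bookmark or tag your build sets.

```shell
# Succeeds when the revision carrying the marker (a bookmark or tag) is
# the candidate itself or one of its descendants, i.e. when testing the
# candidate again would be redundant.
# already_covered is a hypothetical helper name, not part of Mercurial.
already_covered() {
    candidate=$1
    marker=$2
    # "X::Y" selects changesets that are descendants of X and ancestors
    # of Y; it is empty unless Y descends from (or equals) X. The names
    # are quoted so a hyphen is not parsed as the revset minus operator.
    found=$(hg log -r "'$candidate'::'$marker'" --template '{node}\n' 2>/dev/null) || return 1
    [ -n "$found" ]
}

# Example:
#   already_covered 1a2b3c4d5e6f latest-stable || run_the_tests
# (run_the_tests stands in for your own build step)
```

If the marker does not exist yet, hg aborts and the function reports "not covered", so the first build always runs.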

Related

Mercurial: devs work on separate folders, why do they have to merge all the time

I have four devs working in four separate source folders in a mercurial repo. Why do they have to merge all the time and pollute the repo with merge changesets? It annoys them and it annoys me.
Is there a better way to do this?

Assuming the changes really don't conflict, you can use the rebase extension in lieu of merging.
First, put this in your .hgrc file:
[extensions]
rebase =
Now, instead of merging, just do hg rebase. It will "detach" your local changesets and move them to be descendants of the public tip. You can also pass various arguments to modify what gets rebased.
Again, this is not a good idea if your developers are going to encounter physical merge conflicts, or logical conflicts (e.g. Alice changed a feature in file A at the same time as Bob altered related functionality in file B). In those cases, you should probably use a real merge in order to properly represent the relevant history. hg rebase can be easily aborted if physical conflicts are encountered, but it's a good idea to check for logical conflicts by hand, since the extension cannot detect those automatically.
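A sketch of how a developer's sync step might look with rebase, including the abort on physical conflicts. sync_with_rebase is a made-up name, and it assumes the rebase extension is enabled as shown above.

```shell
# Pull, rebase local work onto the newest upstream head, then push.
# sync_with_rebase is a hypothetical helper name, not part of Mercurial.
sync_with_rebase() {
    hg pull || return 1
    if ! hg rebase; then
        # A non-zero exit means either "nothing to rebase" or a conflict.
        # Aborting only succeeds when a rebase is actually in progress,
        # which is how this sketch tells the two cases apart.
        if hg rebase --abort >/dev/null 2>&1; then
            echo "rebase hit conflicts and was aborted; do a real merge" >&2
            return 1
        fi
    fi
    hg push
}

# Example: sync_with_rebase
```

Logical conflicts still need a human: run the test suite between the rebase and the push if the changes could interact.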

Your development team are committing little and often; this is just what you want so you don't want to change that habit for the sake of a clean line of commits.
@Kevin has described using the rebase extension and I agree that can work fine. However, you'll also see the work sequence of each developer squished together in a single line of commits. If you're working on a stable code base and just submitting quick single-commit fixes then that may be fine. If you have ongoing lines of development then you might not want to lose the continuity of a developer's commits.
Another option is to split your repository into smaller self-contained repositories.
If your developers are always working in 4 separate folders, perhaps the contents of these folders can be modularised and stored as separate Mercurial repositories. You could then have a separate master repository that brought all these smaller repositories together within the sub-repository framework.

Mercurial is distributed, it means that if you have a central repository, every developer also has a private repository on his/her workstation, and also a working copy of course.
So now let's suppose that they make a change and commit it, i.e., to their private repository. When they want to hg push two things can happen:
either they are the first one to push a new changeset to the central server, in which case no merge is required, or
somebody else, starting from the same version, has committed and pushed before them. Now there is a fork: from the same starting point the history goes in two different directions, so a merge is required even if there is no conflict, because we do not want four divergent lines of work on the central server. (Mercurial does allow this: the divergent ends are called heads, and you can force the push without merging. But the divergence is still there, no magic, and this is probably not what you want, because you want to be able to check out the sum of all the contributions.)
Avoiding merges is quite simple: tell your developers to integrate others' changes before committing their own:
$ hg pull
$ hg update
$ hg commit -m"..."
$ hg push
When the commit is made against the latest central version, no merge should be required.
If they were working on the same code, the tests should also be run after the pull and update, to ensure that what worked in isolation still works once the other developers' work has been integrated. Taking others' contributions frequently, and pushing our own changes frequently too, is called continuous integration, and it ensures that integration issues are discovered quickly.
Hope it'll help.

Is cvs2hg still potentially producing corrupted repositories?

Trying to migrate a repository from cvs to hg, I found the tool cvs2hg, and it seems to do the job nicely (the conversion goes fine, and I have all the tags and branches).
However, the hg documentation warns about "fixup commits" making the repository somewhat corrupted or at least dangerous.
Is this still a problem ? Maybe hg or cvs2hg have benefited from fixes since this warning was written.
If it is, potentially, how can I check if I am in such a dangerous situation, on the resulting hg repository ?

Fixup commits are good and necessary. And cvs2hg does much better job than hg convert.
But first, about the problem. In a CVS repository you can play various dirty tricks with tags and branches. For example, you can manually fine-tune a tag so that it marks today's version of 3 files, yesterday's version of 4 others, and a month-old version of yet another. In practice, I did this many times to make "patch tags" (there is some old tag, I have various commits afterwards, a bug turns up, I fix the bug, then make a fixup tag from the old tag, moving it on 1-2 files).
As a result, you get a tag that points to a release which never existed, and never will exist, at any point in the repository history, if the history is taken for the whole repo.
Similar tricks could be made with branches. Or branches can start from "ugly" tag.
Any kind of "natural" conversion from CVS to hg is hopelessly lost in such cases. There is no place in the time-based history where such a tag or branch could be hooked. hg convert just binds such tags at more-or-less random places, and branches at very ugly places.
Fixup commits simply are those missing revisions: artificial commits which are bound at appropriate place and introduce changes which put repository at state at which it should be at given tag. With those, we get both "artificial" tags, and branches, properly bound to proper code.
So if you:
committed a.c(1.1), b.c(1.1) and c.c(1.1)
committed a.c(1.2), b.c(1.2)
committed c.c(1.2)
artificially created tag blah_1.0 which points to a.c(1.1), b.c(1.1) and c.c(1.2)
committed a.c(1.3), b.c(1.3)
...
then hg convert based history will have 4 edit changesets (just like those above) and blah_1.0 bound at some ugly place with wrong content. At the same time, cvs2hg will create "fixup commit" which will artificially create changeset at which we really have a.c(1.1), b.c(1.1) and c.c(1.2), and tag there. In a history, such changeset is reasonably similar to transplanted/grafted/cherry-picked commit.

You should carefully check the resulting repository to make sure it represents your code history and doesn't contain any of these crappy fixup commits.
BTW, it might be worthwhile to check out the newer http://www.catb.org/esr/reposurgeon/ tool.

Close an unmerged wasteful branch in mercurial

I decide to start an experiment in a branch
[default] $ hg branch experiment
[experiment] $ [... some commits ...]
Aargh! does not work! I want to throw it away.
[experiment] $ hg commit -m "did not work; closing ..." --close-branch
[experiment] $ hg update default
To get the real tip back -
[default] $ [... some commits ...]
[default] $ hg push
Is this a correct workflow to destroy an experimental branch?

You've got two fine answers on how to undo your branch, but the bigger point is: don't use named branches for temporary concepts. Named branches are for long-lived entities like 'development' and 'stable'. For features, experiments, etc. you want either clones, bookmarks, or anonymous branches. All three are contrasted with named branches in this excellent article by Steve Losh:
http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial/
You can see similar advice from the Mercurial project here:
https://www.mercurial-scm.org/wiki/StandardBranching

The Mercurial wiki covers all the options for Pruning Dead Branches. Briefly, these options include:
Closing the branch (as done in your original post)
Create a new clone that does not include the dead branch
Use a no-op merge
Use the strip command that is bundled with the mq extension

Closing a branch will leave it in the repository, and the closed branch will be pushed with other changesets next time you do a push.
If you don't want this to happen, and your branch is local, just strip it.
On the other hand, if you have already pushed the experimental branch, stripping it won't help, so you can either close it or do a dummy merge (or both).
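For the local-only case, a guarded strip could look like the sketch below. drop_local_branch is a hypothetical helper. It assumes a Mercurial version with phases (the public() revset) and with strip available as its own bundled extension; on older versions strip comes with mq, as mentioned above.

```shell
# Strip a named branch that was never pushed. This destroys history,
# so the function refuses to touch changesets that are already public.
# drop_local_branch is a hypothetical helper name, not part of Mercurial.
drop_local_branch() {
    branch_name=$1
    published=$(hg log -r "branch('$branch_name') and public()" \
        --template '{node}\n') || return 1
    if [ -n "$published" ]; then
        echo "branch $branch_name was already pushed; close it instead" >&2
        return 1
    fi
    # Enable the strip extension for this one call only.
    hg --config extensions.strip= strip -r "branch('$branch_name')"
}

# Example: drop_local_branch experiment
```

The public() check is only as good as your phase data: it relies on the server being a publishing repository, which is Mercurial's default.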

In my opinion, you should just close the branch and forget about it.
In the long run, there's no harm in a "dead" branch being present in the repository. Any given branch is almost certainly tiny in comparison to the contents of your repository and any additional "noise" created by the additional changesets is going to fade into the past relatively quickly.
However, by not worrying about cleaning up the branch, you achieve two things:
You don't have to deal with any of the potential issues associated with altering history in a DVCS.
(More importantly) You have a permanent record of your attempt.
That second point is key -- you can actually make use of what you learned if the branch is still around: any fellow developers can learn from it; you can go back and try again if you learn something else; you can prevent trying the same thing again by seeing this branch in history.
A lot of developers have a hard time with keeping history that isn't "pristine" in their DVCS, especially when they recently came from a centralized VCS.* Over time, I've come to realize that there's nothing bad or wrong about that "other" history and in fact it can turn out to be remarkably useful if kept around.
*I'm not necessarily implying that you fall into either of these camps, just making an observation.

Close head without committing any change

I'm working on a project which is using mercurial and it's gotten into a bit of a mess with a number of heads which for all intents and purposes are dead.
I want to kill off these heads and bring the commit graph back to a single line.
I've been told there's a way to merge branches but at the same time ignore any file changes, so essentially just merging the tree, but I can't seem to work out the command set.
Is there a way to do this, kill off branches by doing merges and ignoring the file changes? Or alternatively is there a way to bring in the graph again without the changes (which are not massively irrelevant in the project)?

If you are using TortoiseHg and named branches, you can select the branch option in the commit dialog to close a branch and it will allow you to commit without having an actual file change.
It will still leave you with a head, but it will be marked inactive.

I think this is just what you're looking for:
Keep My or Their files when doing a merge
It'll create new merge changesets that close down the "other" head w/o taking in any of its changes. You won't end up with a linear history but you'll end up with a single head.
Other, inferior answers include using hg strip or hg clone -r to eliminate the heads/anonymous branches you don't want. They're inferior because (a) if other clones exist, stripping doesn't work at all, since the changesets come straight back with the next pull, and (b) they throw away history, which is the opposite of good version control practice -- even work you don't think you want now may be valuable someday.
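The "keep my files" merge can be scripted along these lines. It is a sketch modelled on the no-op merge recipe from the Mercurial wiki, not a tested tool. absorb_dead_head is a made-up name, and it assumes a clean working directory updated to the head you want to keep.

```shell
# Merge a dead head into the current one without taking any of its
# file changes.
# absorb_dead_head is a hypothetical helper name, not part of Mercurial.
absorb_dead_head() {
    dead_rev=$1
    merge_rc=0
    # internal:fail leaves every file that needs merging unresolved
    # instead of merging it. Exit status 1 (unresolved files) is
    # expected here; anything higher is a real failure.
    hg merge --tool internal:fail -r "$dead_rev" || merge_rc=$?
    [ "$merge_rc" -le 1 ] || return 1
    # Put every file back to the kept head's version, discarding the
    # dead head's changes, then mark leftover conflicts as resolved.
    hg revert --all --no-backup -r . || return 1
    hg resolve --all --mark || return 1
    hg commit -m "No-op merge: close head $dead_rev"
}

# Example: absorb_dead_head 1a2b3c4d5e6f
```

Run it once per dead head. Each call adds one merge changeset whose contents are identical to the head you kept.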

Branching with Mercurial SCM

So right now I'm learning Ruby on Rails, and I'm working through the book "Agile Web Development with Rails". I've also decided that I want to give Mercurial a go, because I've read up on distributed SCMs, and it seems like an ideal situation. I still, however, prefer to push my code remotely to my Linux VPS just in case my hard drive decides to take a dive.
So, my question is specific to branching in Mercurial. Right now I've got a remote repository set up and I can push changes over SSH easily (hell I even set up an Nginx FastCGI site that lets me push, too). What I'd like to do, however, is create branches for each chapter as I work on them, so I can keep a nice organized history of my progress through the book. So this is what I'm doing:
$ hg branch chapter-10
(do chapter 10 stuff)
$ hg commit -m "Chapter 10 complete"
$ hg update default
$ hg merge chapter-10
$ hg commit -m "Merging chapter 10 into default"
$ hg push
Once I execute the push statement, I get this message from Mercurial:
pushing to ssh://myserver/hg/depot
searching for changes
abort: push creates new remote branch 'chapter-10'!
(did you forget to merge? use push -f to force)
So at this point I try to do an hg merge again, and it tells me there's nothing to merge, which is obviously true because I just merged it. When I force the push with -f, everything seems fine, and even the web interface shows the appropriate branches.
To sum up, my question is simple: Am I doing this the right way? Is there a more appropriate way to do this with Mercurial (i.e. the "Mercurial way")? Honestly I just want the repository to serve as a backup. I'm a fan of the distributed SCM model, but to me it feels sorta "dirty" to force pushes. Any insight is greatly appreciated! Thanks in advance.

The push -f is the right option for your case, and there was a discussion last month to add that command when this "push creates new remote branch" warning pops up: see issue 1513.
However, issue 1974 (this month) mentions some undesirable effects (not in your case though).
See this translated article to know more about creating a second head on a remote repo.
On the more general point, you can use branches if you are writing your chapters in parallel and want to merge them only at certain (stable) points in time.
But if your writing process is more linear, you could use only one branch, and put some tags along the way.
However, should you go back to chapter 10 and add some lines, even though you already put tags 11 and 12, that would make the history harder to read. So branches are still a good idea in this case.

I don't know about your specific problem, but from your comments it seems that you use branches where you probably wanted to use tags.
Branches are generally used when multiple people cooperate on the same project and you want to create a work separation so one person can work on a stable piece of code, while the other does something experimental that temporarily breaks functionality. Alternatively branches are used to stabilize for release, while development is going on in trunk.
Tags (or labels) are used to primarily create a marker signifying some importance to the version of code. Like for example if you want to mark a completion of chapter 10, you just tag all current versions with a 'chapter-10' tag. There is no need to branch. You can branch from a tagged version at any point in future if it would be necessary for some reason.
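With tags, the per-chapter routine from the question might shrink to something like this sketch (finish_chapter is a made-up helper name; the messages are placeholders):

```shell
# Commit the chapter's work on the default branch, tag it, and push.
# finish_chapter is a hypothetical helper name, not part of Mercurial.
finish_chapter() {
    chapter=$1
    hg commit -m "Chapter $chapter complete" || return 1
    # hg tag records the tag in .hgtags and commits that change itself.
    hg tag -m "Tag chapter $chapter" "chapter-$chapter" || return 1
    hg push
}

# Example: finish_chapter 10
# Later, "hg update -r chapter-10" returns to that state.
```

No new named branch is created, so the push needs no force flag.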

In this case I feel that it's totally ok to use -f for the push. It just creates new branches, not heads. Creating remote heads is another matter entirely.
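If your Mercurial is recent enough, hg push also accepts --new-branch, which allows new named branches without also allowing new heads the way -f does. A minimal sketch, with a made-up wrapper name:

```shell
# Push and allow new named branches, but still refuse new heads on
# existing branches (unlike -f, which allows both).
# push_with_new_branches is a hypothetical helper name.
push_with_new_branches() {
    # Extra arguments (for example a destination) are passed through.
    hg push --new-branch "$@"
}

# Example: push_with_new_branches
```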
