Why does Mercurial's strip command not rewrite history?


The Mercurial help text says that the "strip command removes the specified changesets and all their descendants." This sounds very much like rewriting history to me, and it seems it must cause problems if somebody has based their work on one of the changesets that is suddenly removed. But the help text also says that the command "is not a history-rewriting operation and can be used on changesets in the public phase." I am sure that whoever wrote the help text knew very well what they were doing, so what am I missing?

The key point is that if you strip a public changeset, and then pull it again from somewhere, you haven't caused any issues. You just get the original changeset back.
If you (for example) collapse two public changesets together, and then pull the originals from somewhere, you now have two branches: one with the original two changesets and one with the collapsed changeset, and both contain the same changes. At that point hell breaks loose and child-eating monsters roam the earth.
Hence 'history re-writing' isn't the same as 'history stripping'.
davidmc24 pointed out this post by Matt Mackall (Mercurial's father) in which he says basically the same thing.
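The difference between stripping and rewriting can be sketched with a toy model. This is not how Mercurial stores or hashes changesets; `changeset_id` and `pull` are hypothetical helpers, and the only assumption is that a changeset's identity depends on its parent and its content.

```python
import hashlib

def changeset_id(parent, change):
    # Toy stand-in for a changeset hash: derived from parent ID and content.
    return hashlib.sha1(f"{parent}|{change}".encode()).hexdigest()[:12]

def pull(local, remote):
    # A pull only adds changesets the local repository does not have yet.
    return local | remote

first = changeset_id("null", "add parser")
second = changeset_id(first, "fix parser bug")
remote = {first, second}

# Strip: drop `second` locally, then pull it back from the remote.
after_strip = pull(remote - {second}, remote)

# Rewrite: replace both changesets with one collapsed changeset, then pull.
collapsed = changeset_id("null", "add parser + fix parser bug")
after_rewrite = pull({collapsed}, remote)

print(after_strip == remote)  # the stripped changeset came back unchanged
print(len(after_rewrite))     # the collapsed copy plus both originals
```

In the strip case the local repository ends up identical to the remote. In the rewrite case it holds the same changes twice under different identities, which is the duplicate-branch situation described above.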

I can't say it with certainty, but my guess is that it's "grandfathered in". hg strip started out as part of mq which predates the addition of phases by at least three years.
Likely better phrasing would be:
is not *considered* a history-rewriting operation and can be used on changesets in the public phase
When phases were added a huge amount of care was taken to break no one's existing workflow. Commits start out in the draft phase and become public once pushed. Any phase-aware command knows that after pushing, the commit's phase is public, and does not allow modification of it (unless the push was to a non-publishing repository...).
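As a rough illustration of the phase rules just described, here is a minimal sketch. The `Changeset` class, `push`, and `can_rewrite` are hypothetical names for this example only and do not correspond to Mercurial's internal API.

```python
PUBLIC, DRAFT = "public", "draft"

class Changeset:
    def __init__(self, name):
        self.name = name
        self.phase = DRAFT  # new commits start out in the draft phase

def push(changesets, publishing=True):
    # Pushing to a publishing repository turns the pushed commits public.
    # A non-publishing repository leaves their phase alone.
    if publishing:
        for cs in changesets:
            cs.phase = PUBLIC

def can_rewrite(cs):
    # A phase-aware command only agrees to modify non-public changesets.
    return cs.phase != PUBLIC

shared = Changeset("shared work")
private = Changeset("work in progress")
push([shared])
push([private], publishing=False)
print(can_rewrite(shared), can_rewrite(private))
```

In this model strip would be the one command that skips the `can_rewrite` check, which matches the backward-compatibility argument in this answer.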
However, there were people already using strip manually and in scripts to remove changesets that had been pushed, and if strip had suddenly said after an upgrade "Hey, you can't strip that, it's public!" then those people would have had their backward-compatibility promise broken.
Phases is slowly growing into a pretty amazing evolve system that will be a much better choice than mq for almost all cases, but I still doubt we'll ever get Matt to remove mq and strip -- he still insists on maintaining a Python 2.4 compatible codebase and that's 9 years old!
TL;DR: Even though strip was always a disabled extension, too many people use it to change its behavior with the advent of phases.

Related

does mercurial have problems with files being added and removed

I had some code changes that I accidentally pushed to the team repo and as such I created extra heads.
Usually I keep those changes in my own repo but I just forgot to refer to the changeset to push and ended up with extra heads.
To solve this problem I used the tip in How to merge to get rid of head with Mercurial command line, like I can do with TortoiseHg?
Works great for me; in fact I am quite happy that this way my changes are not lost at all and that team members don't have to strip anything in their repos. As a matter of fact, even though this head is not used, I might want to refer to it in the future and take some of those changes.
Now after I did this change I got a reaction from a colleague, who is our Mercurial expert, as follows: [begin quote]
The main issue now will be that everyone also in MS will have these changesets, will see merge ‘arrows’, will see this in history despite that it is not included. Also the ‘merge’ commit does not mention at all that this is a dummy merge. So in case you add other heads that later on have to be dummy merged, please clearly comment this in the next merge commits to clarify this is the case.
Also the adding and removing and changing files, can become a problem now. I am not sure if this will be the case now due to the dummy merge, it might optimize that these files were not impacted, this will have to be tested.
[end quote]
My reaction is that a dummy merge is easily seen: if you select it in TortoiseHg, you just don't see any changes. A commit message 'dummy merge' is probably a good idea, so that point is taken.
The fact that my changes are seen by everybody, is hardly a problem for me, on the contrary this way they are kept and are a future reference. I don't see the problem here.
But the last bullet in bold, about the adding, removing and changing of files? Can this really be a problem in Mercurial? Is this not the essence of using a version control system in the first place? What should I test, as he seems to suggest?
Is his comment correct?

Your friend's fears are unfounded. He or she is right that a good detailed comment is always appreciated, but there's no pain coming in the future.
When the Evolve mechanism graduates from experimental extension to core functionality it will offer an even better solution. You'll strip the changeset, but it won't actually be removed (as strip does now). Instead it'll be marked obsolete and folks pulling will get it and the "obsolete" marker saying not to show it in normal usage.

Is cvs2hg still potentially producing corrupted repositories?

While trying to migrate a repository from CVS to hg, I found the tool cvs2hg, and it seems to do the job nicely (the conversion goes fine, and I have all the tags and branches).
However, the hg documentation warns about "fixup commits" making the repository somewhat corrupted or at least dangerous.
Is this still a problem? Maybe hg or cvs2hg have benefited from fixes since this warning was written.
If it potentially is, how can I check on the resulting hg repository whether I am in such a dangerous situation?

Fixup commits are good and necessary. And cvs2hg does a much better job than hg convert.
But maybe first about the problem. In a CVS repository you can play various dirty tricks with tags and branches. For example, you can manually fine-tune a tag so that it marks today's version of 3 files, yesterday's version of 4 others, and a month-old version of yet another. In practice, I did this many times to make "patch tags" (there is some old tag, I make various commits afterwards, a bug turns up, I fix the bug, then make a fixup tag based on the old tag by moving it on 1-2 files).
As a result, you get a tag which points to a release which has never existed and never will exist at any point of the repository history, if the history is taken for the whole repo.
Similar tricks can be played with branches, or a branch can start from an "ugly" tag.
Any kind of "natural" conversion from CVS to hg is hopelessly lost in such cases. There is no place in the time-based history where such a tag or branch could be hooked. And hg convert just binds such tags at more-or-less random places, and branches at very ugly places.
Fixup commits simply are those missing revisions: artificial commits which are bound at an appropriate place and introduce the changes that put the repository into the state it should be in at the given tag. With those, we get both "artificial" tags and branches properly bound to the proper code.
So if you:
committed a.c(1.1), b.c(1.1) and c.c(1.1)
committed a.c(1.2), b.c(1.2)
committed c.c(1.2)
artificially created tag blah_1.0 which points to a.c(1.1), b.c(1.1) and c.c(1.2)
committed a.c(1.3), b.c(1.3)
...
then the hg convert-based history will have 4 edit changesets (just like those above) and blah_1.0 bound at some ugly place with the wrong content. By contrast, cvs2hg will create a "fixup commit": an artificial changeset in which we really have a.c(1.1), b.c(1.1) and c.c(1.2), and tag there. In the history, such a changeset is reasonably similar to a transplanted/grafted/cherry-picked commit.
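The example above can be checked with a small sketch that looks for a whole-repository snapshot matching the tag. This is only a model of the reasoning, not of what cvs2hg does internally; `needs_fixup` and `fixup_changes` are hypothetical helpers, and which base changeset a real conversion would pick is an assumption here.

```python
# File revisions in the repository after each of the four edit commits.
history = [
    {"a.c": "1.1", "b.c": "1.1", "c.c": "1.1"},
    {"a.c": "1.2", "b.c": "1.2", "c.c": "1.1"},
    {"a.c": "1.2", "b.c": "1.2", "c.c": "1.2"},
    {"a.c": "1.3", "b.c": "1.3", "c.c": "1.2"},
]

# The hand-made CVS tag mixes revisions from different points in time.
tag_blah_1_0 = {"a.c": "1.1", "b.c": "1.1", "c.c": "1.2"}

def needs_fixup(tag, snapshots):
    # True when no whole-repository snapshot matches the tagged revisions.
    return tag not in snapshots

def fixup_changes(tag, base):
    # Files whose revision must change to get from `base` to the tagged state.
    return {name: rev for name, rev in tag.items() if base.get(name) != rev}

print(needs_fixup(tag_blah_1_0, history))
# Assumption for illustration: hang the fixup commit off the first snapshot.
print(fixup_changes(tag_blah_1_0, history[0]))
```

No snapshot matches the tag, so a converter has to either attach the tag to the wrong content or create an artificial changeset. Starting from the first snapshot, that changeset only needs to bring c.c to revision 1.2.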

You should carefully check the resulting repository to make sure it represents your code history and doesn't contain any of these crappy fixup commits.
BTW, it might be worthwhile to check out the newer http://www.catb.org/esr/reposurgeon/ tool.

How to manage concurrent development with mercurial?

This is a best practice question, and I expect the answer to be "it depends". I just hope to learn more real world scenarios and workflows.
First of all, I'm talking about different changes for the same project, so no subrepo please.
Let's say you have your code base in an hg repository. You start to work on a complicated new feature A, then a complicated bug B is reported by your trusted tester (you have testers, right?).
It's trivial if (the fix for) B depends on A. You simply ci A then ci B.
My question is what to do when they are independent (or at least it seems now).
I can think of the following ways:
1. Use a separate clone for B.
2. Use anonymous or named branches, or bookmarks, in the same repository.
3. Use MQ (with B patch on top of A).
4. Use branched MQ (I'll explain later).
5. Use multiple MQ (since 1.6)
1 and 2 are covered by an excellent blog post by Steve Losh linked from a slightly related question.
The one huge advantage of 1 over the other choices is that it doesn't require any rebuild when you switch from working on one thing to the other, because the files are physically separated and independent. So it's really the only choice if, for example, A and/or B touches a header file that defines a tri-state boolean and is included by thousands of C files (don't tell me you haven't seen such a legacy code base).
3 is probably the easiest (in terms of setup and overhead), and you can flip the order of A and B if B is a small and/or urgent fix. However it can get tricky if A and B touch the same file(s). It's easy to fix patch hunks that failed to apply if the A and B changes are orthogonal within the same file(s), but conceptually it's still a bit risky.
4 can make you dizzy but it's the most powerful and flexible and scalable way. I default hg qinit with -c since I want to mark work-in-progress patches and push/pull them, but it does take a conceptual leap to realize that you can branch in MQ repo too. Here are the steps (mq = hg --mq):
hg qnew bugA; make changes for A; hg qref
mq branch branchA; hg qci
hg qpop; mq up -rtip^
hg qnew bugB; make changes for B; hg qref
mq branch branchB; hg qci
To work on A again: hg qpop; mq up branchA; hg qpush
It seems crazy to take so many steps, and whenever you need to switch work you must hg qci; hg qpop; mq up <branch>; hg qpush. But consider this: you have several named release branches in the same repository, and you need to work on several projects and bug fixes at the same time for all of them (you'd better get guaranteed bonus for this kind of work). You'd get lost very soon with the other approaches.
Now my fellow hg lovers, are there other/better alternatives?
(UPDATE) qqueue almost makes #4 obsolete. See Steve Losh's elegant description here.

I would always use named branches, because that lets Mercurial do its job: to keep your project history, and to remember why you made which changes in what order to your source code. Whether to have one clone or two sitting on your disk is generally an easy one, given my working style, at least:
Does your project lack a build process, so that you can test and run things right from the source code? Then I will be tempted to have just one clone, and hg up back and forth when I need to work on another branch.
But if you have a buildout, virtualenv, or other structure that gets built, and that might diverge between the two branches, then doing an hg up then waiting for the build process to re-run can be a big pain, especially if things like setting up a sample database are involved. In that case I would definitely use two clones, one sitting at the tip of trunk, and one sitting at the tip of the emergency feature branch.

It seems like there's no more or better choices than the ones I listed in the question. So here they are again.
1. Use one clone per project.
Pros: total separation, thus no rebuild when switching projects.
Cons: toolchain needs to switch between two clones.
2. Use anonymous or named branches, or bookmarks, in the same repository.
Pros: standard hg (or any DVCS) practice; clean and clear.
Cons: must commit before switching and rebuild after.
3. Use MQ with one patch (or multiple consecutive patches) per project.
Pros: simple and easy.
Cons: must qrefresh before switching and rebuild after; tricky and risky if projects are not orthogonal.
4. Use one MQ branch (or qqueue in 1.6+) per project.
Pros: ultra flexible and scalable (for the number of concurrent projects)
Cons: must qrefresh and qcommit before switching and rebuild after; feels complicated.
Like always, there's no silver bullet, so pick and choose the one right for the job.
(UPDATE) For anyone who's in love with MQ, using MQ on top of regular branches (#2 + #3) is probably the most common and preferable practice.
If you have two concurrent projects with baseline on two branches (for example next release and current release), it's trivial to hop between them like this:
hg qnew; {coding}; hg qrefresh; {repeat}
hg qfinish -a
hg update -r <branch/bookmark/rev>
hg qimport -r <rev>; {repeat}
For the last step, qimport should add a -a option to import a line of changesets at once. I hope Meister Geisler notices this :)

So the question is, at the point when you are told to stop working on feature A, and begin independent feature B, what alternative options are there, for: How to manage concurrent development with mercurial?
Let's look at the problem with concurrency removed, the same way you write threaded code: define a simple workflow for solving any problem given to you, and apply it to each problem. Mercurial will join the work once it's done. So, programmer A will work on feature A. Programmer B will work on feature B. Both just happen to be you. (If only we had multi-core brains :)
I would always use named branches, because that lets Mercurial do its job: to keep your project history, and to remember why you made which changes in what order to your source code.
I agree with Brandon's sentiment, but I wonder if he overlooked that feature A has not been tested? In the worst case, the code compiles and passes unit tests, but some methods implement the previous requirements, and some methods implement the new ones. A diff against the previous check-in is the tool I would use to help me get back on track with feature A.
Is your code for feature A at a point when you would normally check it in? Switching from feature A to working on feature B is not a reason to commit code to the head or to a branch. Only check in code that compiles and passes your tests. My reason is, if programmer C needs to begin feature C, a fresh checkout of this branch is no longer the best place to start. Keeping your branch heads healthy, means you can respond quickly, with more reliable bug fixes.
The goal is to have your (tested and verified) code running, so you want all your code to end up merged into the head (of your development and legacy branches). My point seems to be, I've seen branching used inefficiently: code becomes stale and then not used, the merge becomes harder than the original problem.
Only your option 1 makes sense to me. In general:
1. You should think your code works, before someone else sees it.
2. Favor the head over a branch.
3. Branch and check-in if someone else is picking up the problem.
4. Branch if your automated system or testers need your code only.
5. Branch if you are part of a team, working on a problem. Consider it the head, see 1-4.
With the exception of config files, the build processes should be a checkout and a single build command. It should not be any more difficult to switch between clones, than for a new programmer to join the project. (I'll admit my project needs some work here.)

Mercurial: Concrete examples of issues using hg pull --rebase

I'm struggling to find the mercurial workflow that fits the way that we work.
I'm currently favouring a clone per feature but that is quite a change in mindset moving from Subversion. We'll also have issues with the current expense we have in setting up environments.
Using hg pull --rebase seems to give us more of a Subversion-like workflow but from reading around I'm wary of using it.
I think I understand the concepts and I can see that rewriting the history is not ideal but I can't seem to come up with any scenarios which I personally would consider unacceptable.
I'd like to know what are the 'worst' scenarios that hg pull --rebase could create either theoretical or from experience. I'd like concrete examples rather than views on whether you 'should' rewrite history. Not that I'm against people having opinions, just that there already seem to be a lot of them expressed on the internet without many examples to back them up ;)

The first thing new Mercurial converts need to learn is to get comfortable committing incomplete code. Subversion taught us that you shouldn't commit broken code. Now it's time to unlearn that habit. Committing frequently gives you a lot more flexibility in your workflow.
The main problem I see with hg pull --rebase is the ability to break a merge without any way to undo. The DVCS model is based on the idea of tracking history explicitly, and rebasing subverts that idea by saying that all of my changes came after all of your changes, even though we were really working on them at the same time. And because I don't know what your changes are (because I was basing my code off of earlier changesets) it's harder for me to know that my code, on top of yours, won't break something. You also lose the branching capabilities by rebasing, which is really the whole idea behind DVCSs.
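The point about rebasing recording a different history can be sketched with a toy model in which a changeset's identity depends on its parents. The `cid` helper is hypothetical and much simpler than Mercurial's real hashing; the model only shows which parent links end up recorded.

```python
import hashlib

def cid(parents, change):
    # Toy changeset ID derived from the parent IDs and the change itself.
    data = ",".join(parents) + "|" + change
    return hashlib.sha1(data.encode()).hexdigest()[:12]

base = cid([], "base")
theirs = cid([base], "their change")
mine = cid([base], "my change")  # written in parallel, on top of base

# Merge: both originals stay, joined by a changeset with two parents.
merge = cid([mine, theirs], "merge")
merged_history = {base, theirs, mine, merge}

# Rebase: my change is re-created on top of theirs under a new ID.
mine_rebased = cid([theirs], "my change")
rebased_history = {base, theirs, mine_rebased}

print(mine in merged_history)   # the parallel work is still recorded
print(mine in rebased_history)  # the original changeset is gone
```

After the rebase the history claims that my change was written on top of theirs, and the changeset that was actually developed and tested against `base` no longer exists in it.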
Our workflow (which we've built an entire Mercurial hosting system around) is based on keeping multiple clones, or branch repositories, as we call them. Each dev or small team has their own branch repository, which is just a clone of the "central" repository. All of my new features and large bug fixes go into my personal branch repo. I can get that code peer reviewed, and once it's deemed ready, I can merge it into the central repo.
This gives me a few nice benefits. First, I won't be breaking the build, as all of my changes are in their own repo until they're "ready". Second, I can make another branch repo if I need to do a separate feature, or if I have something longer-running, like for the next major version. And third, I can easily get a change into the central repo if there's a bug that needs to be fixed quickly.
That said, there are a couple of different ways you can use this workflow. The simplest, and the one I started with, is just keeping separate clones. So I'll have website-central, website-tghw, etc. It works well, especially since you can push and pull between them locally. More recently, I've started keeping multiple heads in the same repo, using the remotebranches extension to help manage them and hg nudge to keep from pushing everything at once.
Of course, some people don't like this workflow as much, usually because their Mercurial server makes it hard to make server-side clones. In that case, you can also look at using named branches to help keep your features straight. Unfortunately, they're not quite as flexible as Git branches (which is why we prefer branch repos) but they work well once you understand how to close branches, and why you can't really get rid of them once you start one.
This is getting a bit long, so I'll wrap it up by encouraging you to embrace the superior branching and merging that Mercurial provides (over SVN). There is definitely a learning curve, but once you get the hang of it, it really does make things easier.

From the question comments, your root issue is that you have developers working on several features/bug fixes/issues at one time and having uncommitted work in their working directory along with some completed work that is ready to be pushed back to the central repository.
There's a really nice exchange that covers the issue well and leads on to a number of ways forward.
http://thread.gmane.org/gmane.comp.version-control.mercurial.general/19704
There are ways you can get around keeping your uncommitted changes, e.g. by having a separate clone to handle merges, but my advice would be to embrace the distributed way of working and commit as often as you like - if you really feel the need you can combine the last few local commits into a single changeset (using MQ, for example) before pushing.

Doing without partial commits the "Mercurial way"

Subversion shop considering switching to Mercurial, trying to figure out in advance what all the complaints from developers are going to be. There's one fairly common use case here that I can't see how to handle.
I'm working on some largish feature, and I have a significant part of the code -- or possibly several significant parts of the code -- in pieces all over the garage floor, totally unsuitable for checkin, maybe not even compiling.
An urgent bugfix request comes in. The fix is nice and local and doesn't touch any of the code I've been working on.
I make the fix in my working copy.
Now what?
I've looked at "Mercurial cherry picking changes for commit" and "best practices in mercurial: branch vs. clone, and partial merges?" and all the suggestions seem to be extensions of varying complexity, from Record and Shelve to Queues.
The fact that there apparently isn't any core functionality for this makes me suspect that in some sense this working style is Doing It Wrong. What would a Mercurial-like solution to this use case look like?
Edited to add: git, by contrast, seems designed for this workflow: git add the bugfix files, don't git add anything else (or git reset HEAD anything you might have already added), git commit.

Here's how I would handle the case:
have a dev branch
have feature branches
have a personal branch
have a stable branch.
In your scenario, I would be committing frequently to my branch off the feature branch.
When the request came in, I would hg up -r XYZ, where XYZ is the rev number that they are running, then branch a new feature branch off of that (or up branchname, whatever).
Perform work, then merge into the stable branch after the work is tested.
Switch back to my work and merge up from the top feature branch commit node, thus integrating the two streams of effort.

Lots of useful functionality for Mercurial is provided in the form of extensions -- don't be afraid to use them.
As for your question, record provides what you call partial commits (it allows you to select which hunks of changes you want to commit). On the other hand, shelve lets you temporarily make your working copy clean, while keeping the changes locally. Once you commit the bug fix, you can unshelve the changes and continue working.
The canonical way to go around this (i.e. using only core) would probably be to make a clone (note that local clones are cheap as hardlinks are created instead of copies).

You would clone the repository (i.e. create a bug-fix branch in SVN terms) and do the fix from there.
Alternatively if it really is a quick fix you can use the -I option on commit to explicitly check-in individual files.

As with any DVCS, branching is your friend. Branching a repository multiple ways is the bread and butter of these systems. Here's a git model you might consider adopting that works quite well with Mercurial, too.

In addition to what Santa said about branching being your friend...
Small-granularity commits are your friend. Rather than making lots of code changes in a single commit, make each logically self-contained code change in its own commit. Then it will be a lot easier to cherry-pick changes to merge between branches.

Don't use Mercurial without using the MQ extension (it comes pre-packaged in the default installation). In addition to solving your specific problem, it solves a lot of other general problems and really should be the default way that you work (especially if you're using an IDE that doesn't integrate directly with Hg, making switching branches on the fly a difficult way to work).
