Dealing with shared dependencies in subdir projects - mercurial

This is (I'm guessing) a not-too-rare occurrence, but how do people deal with common links in subrepos? It may just be one of the reasons subrepos are a pain in the butt to use.
Subrepo A has a subrepo B at rev 5
Subrepo C has a subrepo B at rev 10
Subrepo D has A and C. There is now a conflict between the Bs in some build systems.
So you get the dependency structure:
D___A__B
 \__C__/
Even if you managed to get A and C pointed at the same revision, there are still two copies of the code, and they conflict.
What would probably be better is to say "A requires B at rev 5", "C requires B at rev 10", "D requires A at ref X", and "D requires B at ref X", then "A is here, B is here, C is here, D is here, FIND THE CONFLICTS", but I don't think that is currently possible.
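The "FIND THE CONFLICTS" check described above can at least be sketched outside Mercurial. This is a minimal sketch, not something hg provides: the `requirements` table, the `find_conflicts` function, and the revisions that D pins for A and C are all hypothetical and would have to be filled in by hand or read from each repo's .hgsubstate.

```python
from collections import defaultdict


def find_conflicts(requirements, root):
    """Walk the dependency graph from root and report repos pinned at more than one revision.

    requirements maps a repo name to {dependency name: required revision}.
    Returns {dependency: {revision: [repos that require it]}} for conflicting pins only.
    """
    pins = defaultdict(lambda: defaultdict(list))
    seen = set()
    stack = [root]
    while stack:
        repo = stack.pop()
        if repo in seen:
            continue
        seen.add(repo)
        for dep, rev in requirements.get(repo, {}).items():
            pins[dep][rev].append(repo)
            stack.append(dep)
    return {
        dep: {rev: sorted(by) for rev, by in sorted(revs.items())}
        for dep, revs in pins.items()
        if len(revs) > 1
    }


# The layout from the question; the revisions D pins for A and C are made up.
requirements = {
    "A": {"B": 5},
    "C": {"B": 10},
    "D": {"A": 1, "C": 1},
}
conflicts = find_conflicts(requirements, "D")
print(conflicts)  # {'B': {5: ['A'], 10: ['C']}}
```

It only tells you that A and C disagree about B; deciding which revision wins is still up to you.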

You don't deal with it; this is not a common case.
First of all, it is strongly discouraged to have a sub-repo outside the parent repository's directory, and this setup requires at least one outside sub-repo.
Second, even if you set up an outside repo, it is even less advisable to have more than one repo holding a reference to the same repo.
So yes, it is possible, but getting into that state is already hard to do, and if someone actually came to me with this, I'd tell them to rethink their configuration. There is a bad smell here anyway, and it is not Mercurial's job to handle this.

Related

What is the simplest way to fix a single broken line in a committed Mercurial changeset?

I frequently run into a use case where I identify a small bug in a committed changeset far after the fact so that reverting the changeset is not an option.
I've looked at similar questions and the Mercurial docs for graft and general advice on backporting changes but either this "simple" use case isn't covered, or it's subsumed in a complex morass of DVCS rebasing/cloning/exporting/importing that is far more abuse than it's worth for what seems to be a trivial operation.
In short, in a repository consisting of
A -> B -> C -> D -> E
there's a one line bug that needs fixing in one file in changeset B which consists of a number of changes to multiple files. Is there a way to do this without reverting/fixing/reapplying all of B? Just being able to do
... -> B -> B' -> C -> ...
would solve this problem.
Note that I have zero context for the concept of rebasing so unless you're willing to spoon-feed it to me, it's not going to help much. My needs are usually pretty simple; I use Mercurial essentially in single-user mode as an advanced form of RCS or SVN, typically using only commit, branch, and merge (absolutely no push, pull, import, export, rebasing, or other 'distributed' features). Yes, I know I'm potentially ruling out a lot of options for solving this problem, but my focus is on fixing my code, not on understanding fine-grained behavior of Mercurial features I never use (sorry, just being honest here.)
If this isn't possible, please let me know so I can just commit my fix as F with the commit message that changesets B through E are broken.
If you've already pushed the change to another user, I would advise you to simply commit as F.
If you haven't, keep in mind the changeset IDs of C, D, ... will all change as well. So the commit hashes will become C', D', ...
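To see why the IDs of C, D, ... change when only B is touched, here is a toy model. It assumes a changeset ID is just a hash of the parent's ID plus the changeset's own content; real Mercurial hashes more than that (both parents, manifest, metadata), and `changeset_id` and `build_history` are made-up names for illustration only.

```python
import hashlib


def changeset_id(parent_id, content):
    """Toy ID: SHA-1 over the parent's ID and this changeset's content (not hg's real format)."""
    data = (parent_id + "\n" + content).encode("utf-8")
    return hashlib.sha1(data).hexdigest()[:12]


def build_history(contents):
    """Return the list of toy IDs for a linear history with the given contents."""
    ids = []
    parent = "0" * 12  # stand-in for the null revision
    for content in contents:
        parent = changeset_id(parent, content)
        ids.append(parent)
    return ids


original = build_history(["A", "B", "C", "D", "E"])
amended = build_history(["A", "B fixed", "C", "D", "E"])

for name, old, new in zip("ABCDE", original, amended):
    status = "same" if old == new else "changed"
    print(name, old, new, status)
```

A keeps its ID; B and everything after it get new ones even though the content of C, D and E is untouched, because each ID depends on its parent's ID.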
I've started to notice that the new 'hg evolve' functionality is very good at what you're asking.
First, a small warning: evolve is (I believe) still experimental right now. Be sure to have a backup! However, evolve really seems like the best fit for this.
First follow the setup instructions.
Then, you can do the following steps:
Update to B: hg update -r B
Make your change.
Commit the amended changeset: hg commit --amend
You will now get a warning about unstable changesets (all the changesets that are descendants of B).
Run hg evolve --all. This will modify C, D, ... to be correctly based on top of B'.
"Dumb" and obvious way:
hg up B
Fix
hg commit
hg up E
hg merge (with possible resolving of conflicts)
I.e. you commit your fix as a child of B (creating an anonymous branch and an additional head, changeset F, in the current branch) and merge F into E (as changeset G), which brings F's changes into the mainline. You can skip the update to E: with anonymous branches, the direction of the merge doesn't matter much.
My experience is with git, not mercurial, so I may not be getting the commands quite right.
What you're wanting to do is, yes, sometimes termed a "rebase" in DVCS-land, but I come from a different version control, er, vintage, so I don't think of it as "rebase". I just think of it as "rewrite history". It's only a couple of commands.
Spoon-feed, eh? Here comes the plane ...
I think your commands are going to go something like this:
Enable the rebase extension (it seems to ship with hg but is not enabled by default).
hg up B (I hope I have the command right)
This should set your working directory to the source as it was at point B, so you make your fix against the code as it was in B.
... make your fix and commit the changeset on top of B: your tree of changes will look like:
A -> B -> C -> D -> E
      \
       -> FIX   <--- you are here
Okay, now comes the rewriting history part. You want to take the history of changes leading up to E, and rewrite them as if they were based on A->B->FIX instead of on A->B.
hg up E (because this is the codeline you are interested in rewriting)
hg rebase --dest FIX
hg is smart enough to know that B is a common ancestor, so it will take subsequent changes and apply them in sequence.
That's it. After this, your history will look like this (as requested):
A -> B -> FIX -> C' -> D' -> E'
I wrote them as C' etc, because it's important to understand that the source at those points will be different than it was before: specifically, they will reflect the incorporation of FIX. The changes made to the code by changesets C, D, E should be the same as those made in C', D', E', unless there is some conflict or other overlap with the fix, in which case you'll probably have to manually interact with the application of the changes.
For example, a refactor of the code affected by the fix could mean making sure the fix also gets refactored properly. It could even mean another instance of this exercise, if the version control system is unable to track the code between files.
Important, maybe: If someone else has been tracking changes from your repo, and has pulled the original changesets, then by doing this you've taken some of their history away, which might really mess them up, which is why you see people screaming and pulling their online hair out about "rebase".

Mercurial: How to synchronize multiple people with the intention to pull/merge/push from/with/to a central repository?

Given a central Mercurial repository and multiple teams with access to it, every team has to synchronize its changes with the central repository once in a while.
Let's say team A and team B both intend to pull/merge/push against the central repository at the same time.
A pulls.
B pulls.
A merges
B merges.
B pushes. => A has to pull and merge again before pushing!
Is there a (maybe technical) way to avoid this conflict? Maybe some kind of lock? Or is this something one has to live with? :-)
By the way, this "conflict" occurs on a smaller scale within the teams, too. But since the team members are in the same room, one can avoid it by just "shouting" one's intentions through the room.
The simplest way may be to use hg pull --rebase when pulling: this will rebase local changes on top of the changes from the main repository and avoid having too many merges (although this actually isn't a problem for Mercurial, it may be a problem for developers trying to understand the history).
In your example, if you change B merges to B rebases, then A will still have to pull and merge again, but there will be fewer unnamed branches to try to follow.
In any case, whether rebasing or merging following a pull, I would recommend doing an immediate push so that the changes are available to everybody.
But again, I would like to emphasize that this "problematic" scenario is only problematic because of the inconvenience for the developer. This is actually the workflow DVCS's like Mercurial are designed to handle. The "standard" workflow has always been to "pull and merge/rebase" before pushing -- hence Mercurial's warning about creating new heads if you try to push without merging.
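The race in the question can be played through with a toy model. This is only a sketch: `CentralRepo` is a made-up class, and it reduces Mercurial's "push would create new remote heads" refusal to a comparison against a single tip counter.

```python
class CentralRepo:
    """Toy central repository: accepts a push only if it builds on the current tip."""

    def __init__(self):
        self.tip = 0  # stand-in for the newest changeset on the server

    def pull(self):
        """Return the tip the caller will merge or rebase onto."""
        return self.tip

    def push(self, based_on):
        """Accept the push only if the caller's work is based on the current tip."""
        if based_on != self.tip:
            return False  # would create a new remote head, so refuse
        self.tip += 1
        return True


central = CentralRepo()
a_base = central.pull()                # A pulls
b_base = central.pull()                # B pulls
b_pushed = central.push(b_base)        # B merges and pushes first
a_first_try = central.push(a_base)     # A's push is refused
a_base = central.pull()                # A pulls and merges again
a_second_try = central.push(a_base)    # now A's push goes through
print(b_pushed, a_first_try, a_second_try)  # True False True
```

Nobody holds a lock; whoever pushes second simply repeats the pull and merge step, which is why pushing right after merging keeps the retry window small.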
Usually you can live with it, IMHO it is common sense to work this way.
You can have a workaround, but it misuses the meaning of branches.
A and B create branches, leaving the tip/default/master untouched.
Both can commit and push their changes, but this will create branches on the central repository.
This could cause serious conflicts if the branch names are the same, so a naming convention might help.
Someone, usually the build engineer or system architect, has to merge the changes from the different branches.
If both are working on different features of the same component, this might be a good way to do it.
I would recommend you to read this chapter http://hgbook.red-bean.com/read/managing-releases-and-branchy-development.html.

Hg sub-repository dependencies

There have been a couple of questions about Hg sub-repo dependencies in the past (here and here) but the accepted answers don't seem to address the problem for me.
A project of mine has 4 dependencies: A, B, C, D. D is dependent on A, B and C; and B and C are dependent on A.
I want to use Hg sub-repositories to store them so I can track what version of each they rely on. This is because, while I am using A,B,C and D in this project, other projects will require just A and B. Therefore B and C must track what version of A they need independently of D. At the same time, in my application the versions of B and C referenced by a given version of D must always use the same version of A as that referenced by the given version of D (otherwise it will just fall over at runtime). What I really want is to allow them to reference each other as siblings in the same directory - i.e. D's .hgsub would look like the following, and B and C's would look like the first line.
..\A = https:(central kiln repo)\A
..\B = https:(central kiln repo)\B
..\C = https:(central kiln repo)\C
However this doesn't seem to work: I can see why (it'd be easy to give people enough rope to hang themselves with) but it's a shame, as I think it's the neatest solution to my dependencies. I've read a few suggested solutions which I'll quickly outline, along with why they don't work for me:
Include copies as nested sub-directories, reference these as Hg sub-repositories. This yields the following directory structure (I've removed the primary copies of A, B, C, B\A, C\A as I can accept referencing the copies inside \D instead):
project\ (all main project files)
project\D
project\D\A
project\D\B
project\D\B\A
project\D\C
project\D\C\A
Problems with this approach:
I now have 3 copies of A on disk, all of which could have independent modifications which must be synced and merged before pushing to a central repo.
I have to use other mechanisms to ensure that B, C and D are referencing the same version of A (e.g. D could use v1 while D\B could use v2)
A variation: use the above but specify the RHS of the .hgsub to point to a copy in the parent copy (i.e. B and C should have the .hgsub below):
A = ..\A
Problems with this approach:
I still have three copies on disk
The first time I clone B or C it will attempt to recursively pull the referenced version of A from "..\A", which may not exist, presumably causing an error. If it doesn't exist it gives no clue as to where the repo should be found.
When I do a recursive push of changes, the changes in D\B\A do not go into the shared central repo; they just get pushed to D\A instead. So if I push twice in a row I can guarantee that all changes will have propagated correctly, but this is quite a fudge.
Similarly if I do a (manual) recursive pull, I have to get the order right to get the latest changes (i.e. pull D\A before I pull D\B\A)
Use symlinks to point folder \D\B\A to D\A etc.
Problems with this approach:
symlinks cannot be encoded in the Hg repo itself so every time a team member clones the repo, they have to manually/with a script re-create the symlinks. This may be acceptable but I'd prefer a better solution. Also (personal preference) I find symlinks highly unintuitive.
Are these the best available solutions? Is there a good reason why my initial .hgsub (see top) is a pipe-dream, or is there a way I can request/implement this change?
UPDATED to better explain the wider usage of A,B,C,D
Instead of trying to manage your dependencies via Mercurial (or with any SCM for that matter), try using a dependency management tool instead, such as Apache Ivy.
Using an Ivy-based approach, you don't have any sub-repos; you would just have projects A, B, C and D. A produces an artifact (e.g. a .jar, .so or .dll, etc.), which is published into an artifact repository (basically a place where you keep your build artifacts) with a version. Projects B and C can then depend on a specific version of A (controlled via an ivy.xml file in each project), which Ivy will retrieve from the artifact repository. Projects B and C also produce artifacts that are published to your repository. Project D depends on B and C, and Ivy can be told to retrieve the dependencies transitively, which means it will get the artifacts for B, C and A (because they depend on A).
A similar approach can be used with Apache Maven and Gradle (the latter uses Ivy).
The main advantages are that:
it makes it very clear what versions of each component a project is using (sometimes people forget to check .hgsub, so they don't know they are working with subrepos),
it makes it impossible to change a project you depend on (as you are working with artifacts, not code),
it saves you from having to rebuild dependent projects and being unsure of what version you are using,
and it saves you from having multiple redundant copies of projects that are used by other projects.
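What "retrieve the dependencies transitively" means can be sketched in a few lines. This is not Ivy's algorithm or API, just an illustration: the `declared` table, the `resolve` function, and all version numbers are invented, and where real tools apply a conflict policy this sketch simply raises an error.

```python
def resolve(name, version, declared, resolved=None):
    """Collect name and everything it depends on, dependencies first.

    declared maps (name, version) to {dependency name: dependency version}.
    Assumes the dependency graph has no cycles. Raises ValueError when two
    projects ask for different versions of the same artifact.
    """
    if resolved is None:
        resolved = {}
    if name in resolved:
        if resolved[name] != version:
            raise ValueError(
                f"{name} requested at {version} but already resolved at {resolved[name]}"
            )
        return resolved
    for dep_name, dep_version in declared[(name, version)].items():
        resolve(dep_name, dep_version, declared, resolved)
    resolved[name] = version
    return resolved


# Hypothetical published artifacts and what each one declares.
declared = {
    ("A", "1.2"): {},
    ("B", "2.0"): {"A": "1.2"},
    ("C", "3.1"): {"A": "1.2"},
    ("D", "1.0"): {"B": "2.0", "C": "3.1"},
}
resolved = resolve("D", "1.0", declared)
print(resolved)  # {'A': '1.2', 'B': '2.0', 'C': '3.1', 'D': '1.0'}
```

D never mentions A directly, yet A ends up in the result exactly once, which is the "single copy, single version" property the subrepo layouts struggle to give you.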
EDIT: Similar answer with a slightly different spin at Best Practices for Project Feature Sub-Modules with Mercurial and Eclipse?
You say you want to track which version they each rely on but you'd also be happy with a single copy of A shared between B, C and D. These are mutually exclusive - with a single copy of A, any change to A will cause a change in the .hgsub of each of B, C and D, so there is no independence in the versioning (as all of B, C and D will commit after a change to A).
Having separate copies will be awkward too. If you make a change that affects both B's copy of A and C's copy then attempt to push the whole structure, the changes to (say) B will succeed but the changes to C will fail because they require merging with the changes you just pushed from B, to avoid creating new heads. And that will be a pain.
The way I would do this (and maybe there are better ways) would be to create a D repo with subrepos of A, B and C. Each of B and C would have some untracked A-location file (which you're prompted to enter via a post-clone hook), telling your build system where to look for its A repository. This has the advantage of working but you lose the convenience of a system which tracks concurrent versions of {B, C} and A. Again, you could do this manually with an A-version file in each of B or C updated by a hook, read from by a hook, and you could make that work, but I don't think it's possible using the subrepos implementation in hg. My suggestions really boil down to implementing a simplified subrepo system of your own.
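The manual "A-version file" idea could look roughly like this. It is only a sketch under assumptions: the file name `A-version`, the helper names, and the revision string are hypothetical, and wiring the check into an actual hg hook is left out.

```python
from pathlib import Path
import tempfile


def read_a_versions(root, components, filename="A-version"):
    """Return {component: revision of A recorded in that component's version file}."""
    return {
        component: (Path(root) / component / filename).read_text().strip()
        for component in components
    }


def check_consistent(versions):
    """Return True when every component records the same revision of A."""
    return len(set(versions.values())) <= 1


# Demo in a throwaway directory standing in for the D checkout.
with tempfile.TemporaryDirectory() as root:
    for component, rev in {"B": "a1b2c3", "C": "a1b2c3"}.items():
        (Path(root) / component).mkdir()
        (Path(root) / component / "A-version").write_text(rev + "\n")
    versions = read_a_versions(root, ["B", "C"])
    consistent = check_consistent(versions)

print(versions, consistent)
```

A hook in D could run this kind of check before commit and refuse when B and C disagree, which gives you the consistency guarantee without nested copies of A.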

Using mercurial on divergent branches

What is a good workflow for using Mercurial with two long-running branches that are slightly divergent (i.e. I never intend to entirely merge them back together)?
In my case, this is CMS software that has been customized differently for two different web sites. I started with projectA, and once that was working cloned it to projectB and made further tweaks to both A and B to customize them. Now I want to develop some features that show up in both A and B, without merging the site-specific customizations. How?
hg push will push everything, so that won't work
Transplant appears to give me different changeset hashes, which worries me
I feel like maybe the repositories should be set up differently, but I'm not sure how.
As Thilo comments, the common part would be best developed (and published in A and B) as a third repo declared as a SubRepo.
That way, you respect the first two repos which are independent (one evolution on A doesn't always mean an evolution on B), and you can develop the common part in subrepo C.
A solution for Mercurial might be to put the site-specific parts in files listed in .hgignore, but then they won't be versioned, so that may not be so good.
Another way is to use just one repo with a global flag, and select template A or B and/or include different source files depending on the flag. If the difference is small, you can use if-then-else inside the same file.
You can use hg push to push the changes back together, but you don't necessarily have to merge all the changesets into the trunk. Just take the ones you want.
As stated above, a subrepo is probably the best option. Another alternative would be to have a third branch with the common work, and merge from that branch to projectA and projectB (but never back to the common branch).
This alternative is more likely to have accidents (merging the wrong way) but you might find that it is easier to set up and get working quickly.

How to manage concurrent development with mercurial?

This is a best practice question, and I expect the answer to be "it depends". I just hope to learn more real world scenarios and workflows.
First of all, I'm talking about different changes for the same project, so no subrepo please.
Let's say you have your code base in an hg repository. You start to work on a complicated new feature A, then a complicated bug B is reported by your trusted tester (you have testers, right?).
It's trivial if (the fix for) B depends on A. You simply ci A then ci B.
My question is what to do when they are independent (or at least it seems now).
I can think of the following ways:
1. Use a separate clone for B.
2. Use anonymous or named branches, or bookmarks, in the same repository.
3. Use MQ (with B patch on top of A).
4. Use branched MQ (I'll explain later).
5. Use multiple MQ (since 1.6)
1 and 2 are covered by an excellent blog post by Steve Losh, linked from a slightly related question.
The one huge advantage of 1 over the other choices is that it doesn't require any rebuild when you switch from working on one thing to the other, because the files are physically separated and independent. So it's really the only choice if, for example, A and/or B touches a header file that defines a tri-state boolean and is included by thousands of C files (don't tell me you haven't seen such a legacy code base).
3 is probably the easiest (in terms of setup and overhead), and you can flip the order of A and B if B is a small and/or urgent fix. However it can get tricky if A and B touch the same file(s). It's easy to fix patch hunks that failed to apply if the A and B changes are orthogonal within the same file(s), but conceptually it's still a bit risky.
4 can make you dizzy but it's the most powerful and flexible and scalable way. I default hg qinit with -c since I want to mark work-in-progress patches and push/pull them, but it does take a conceptual leap to realize that you can branch in MQ repo too. Here are the steps (mq = hg --mq):
hg qnew bugA; make changes for A; hg qref
mq branch branchA; hg qci
hg qpop; mq up -rtip^
hg qnew bugB; make changes for B; hg qref
mq branch branchB; hg qci
To work on A again: hg qpop; mq up branchA; hg qpush
It seems crazy to take so many steps, and whenever you need to switch work you must hg qci; hg qpop; mq up <branch>; hg qpush. But consider this: you have several named release branches in the same repository, and you need to work on several projects and bug fixes at the same time for all of them (you'd better get guaranteed bonus for this kind of work). You'd get lost very soon with the other approaches.
Now my fellow hg lovers, are there other/better alternatives?
(UPDATE) qqueue almost makes #4 obsolete. See Steve Losh's elegant description here.
I would always use named branches, because that lets Mercurial do its job: to keep your project history, and to remember why you made which changes in what order to your source code. Whether to have one clone or two sitting on your disk is generally an easy one, given my working style, at least:
Does your project lack a build process, so that you can test and run things right from the source code? Then I will be tempted to have just one clone, and hg up back and forth when I need to work on another branch.
But if you have a buildout, virtualenv, or other structure that gets built, and that might diverge between the two branches, then doing an hg up then waiting for the build process to re-run can be a big pain, especially if things like setting up a sample database are involved. In that case I would definitely use two clones, one sitting at the tip of trunk, and one sitting at the tip of the emergency feature branch.
It seems like there's no more or better choices than the ones I listed in the question. So here they are again.
1. Use one clone per project.
Pros: total separation, thus no rebuild when switching projects.
Cons: toolchain needs to switch between two clones.
2. Use anonymous or named branches, or bookmarks, in the same repository.
Pros: standard hg (or any DVCS) practice; clean and clear.
Cons: must commit before switching and rebuild after.
3. Use MQ with one patch (or multiple consecutive patches) per project.
Pros: simple and easy.
Cons: must qrefresh before switching and rebuild after; tricky and risky if projects are not orthogonal.
4. Use one MQ branch (or qqueue in 1.6+) per project.
Pros: ultra flexible and scalable (for the number of concurrent projects)
Cons: must qrefresh and qcommit before switching and rebuild after; feels complicated.
Like always, there's no silver bullet, so pick and choose the one right for the job.
(UPDATE) For anyone who's in love with MQ, using MQ on top of regular branches (#2 + #3) is probably the most common and preferable practice.
If you have two concurrent projects with baseline on two branches (for example next release and current release), it's trivial to hop between them like this:
hg qnew; {coding}; hg qrefresh; {repeat}
hg qfinish -a
hg update -r <branch/bookmark/rev>
hg qimport -r <rev>; {repeat}
For the last step, qimport should add a -a option to import a line of changesets at once. I hope Meister Geisler notices this :)
So the question is, at the point when you are told to stop working on feature A, and begin independent feature B, what alternative options are there, for: How to manage concurrent development with mercurial?
Let's look at the problem with concurrency removed, the same way you write threaded code: define a simple workflow for solving any problem given to you, and apply it to each problem. Mercurial will join the work once it's done. So, programmer A will work on feature A. Programmer B will work on feature B. Both just happen to be you. (If only we had multi-core brains :)
"I would always use named branches, because that lets Mercurial do its job: to keep your project history, and to remember why you made which changes in what order to your source code."
I agree with Brandon's sentiment, but I wonder if he overlooked that feature A has not been tested? In the worst case, the code compiles and passes unit tests, but some methods implement the previous requirements, and some methods implement the new ones. A diff against the previous check-in is the tool I would use to help me get back on track with feature A.
Is your code for feature A at a point when you would normally check it in? Switching from feature A to working on feature B is not a reason to commit code to the head or to a branch. Only check in code that compiles and passes your tests. My reason is, if programmer C needs to begin feature C, a fresh checkout of this branch is no longer the best place to start. Keeping your branch heads healthy, means you can respond quickly, with more reliable bug fixes.
The goal is to have your (tested and verified) code running, so you want all your code to end up merged into the head (of your development and legacy branches). My point seems to be, I've seen branching used inefficiently: code becomes stale and then not used, the merge becomes harder than the original problem.
Only your option 1 makes sense to me. In general:
1. You should think your code works, before someone else sees it.
2. Favor the head over a branch.
3. Branch and check-in if someone else is picking up the problem.
4. Branch if your automated system or testers need your code only.
5. Branch if you are part of a team, working on a problem. Consider it the head, see 1-4.
With the exception of config files, the build processes should be a checkout and a single build command. It should not be any more difficult to switch between clones, than for a new programmer to join the project. (I'll admit my project needs some work here.)