TL;DR Version: Is it possible to reorganize a Mercurial repo without breaking Kiln/Fogbugz history? Or do I have to start fresh?
I have a repository that is a real mess, in need of some serious cleanup, and am trying to figure out how best to do it. The goal is to remove a few files entirely -- they should not appear in any commits, ever -- move a few directories, and split one directory out into an entirely separate repository. I know, I know -- you're not supposed to be able to change history. In this case, however, it's either change history or start from scratch with new repositories.
The repository in question is managed in Mercurial, with the remote repository hosted in Kiln. Issues are tracked in Fogbugz. Thanks to some commit link-processing rules, any references in a commit message to an issue (case) number like Case 123 are converted to links to the Fogbugz case in question. In turn, the case that was mentioned has a note appended to it with the commit message.
Current Structure
The project file structure is currently something like this:
- /
+- includes/
| +- functions-related-to-abc.php
| +- functions-related-to-xyz.php
| +- class-something.php
| +- classes-several-things.php
| +- random-file.php
| ...
|
+- development/
| +- a-plugin-folder/
| | +- some-file.php
| | +- file-with-sensitive-and-non-sensitive-info.php
| | ...
| |
| +- some-backend-functions-related-to-coding.php
| ...
|
+- index.php
+- test-config-file.php
...
Target Structure
The structure I want is something like this:
- /
+- build/
+- doc/
+- src/
| +- functions/
| | +- abc.php // renamed from includes/functions-related-to-abc.php
| | +- xyz.php // renamed from includes/functions-related-to-xyz.php
| | ...
| |
| +- classes/
| | +- something.php // renamed from includes/class-something.php
| | +- several-things.php // renamed from includes/classes-several-things.php
| | ...
| |
| +- view/
| | +- random-file.php // formerly includes/random-file.php
| ...
|
| +- development/
| | +- some-backend-functions-related-to-coding.php
| | ...
| +- index.php
| ...
|
+- test/
...
a-plugin-folder would move to its own, separate repository. test-config-file.php would no longer be tracked in the repository at all. Ideally, I will also do some minor pruning and renaming of branches while I'm at it.
In my dream world, file-with-sensitive-and-non-sensitive-info.php would somehow be tracked consistently, but with the sensitive info (a couple of passwords) yanked out into a config file that is not under version control. I realize that's probably wishful thinking.
My Current Thinking
My current thinking is that my wish list is basically impossible: I can create new, properly structured repositories from this point forward, but cannot preserve my change history and also make the radical structural changes I need to make. In this view, I should take the current code base, reorganize it all the way I want it, and commit it as changeset 1 for two new repositories (the root repository and the plugin repository). I would then just keep a copy of the old repository backed up somewhere for reference. Major downsides: (1) I lose all my history and (2) the Kiln and Fogbugz cross-references for historical commits are all toast.
My Question
So, here's the question: is there any way to do what I want -- restructure, pull a few files out, and get everything looking pretty -- without losing all of my history?
I have considered using the hg convert extension, making heavy use of the filemap, splicemap, and branchmap options. The problems I see with that approach include: (1) breaking all prior builds, (2) not having file-with-sensitive-and-non-sensitive-info.php in prior builds at all (or leaving it in, which defeats the point), and (3) rendering many of the commit messages wildly incorrect to the extent they refer to file names or repo structure. In other words, I'm not sure this option gains me much as opposed to just starting clean, properly structured repositories.
I have also considered the extreme option: writing a custom script of some sort to build a new repository by going through each existing commit, stripping sensitive information out of file-with-sensitive-and-non-sensitive-info.php, rewriting commit messages to the extent necessary, and committing the revised version of everything. This, theoretically, could solve all of my problems, but at the cost of reinventing the wheel and probably taking a ridiculous amount of time. I'm looking for something that isn't the equivalent of writing an entire hg extension.
EDIT: I am considering creating an empty repository, then writing a script that uses hg export and hg import to bring changesets over one at a time, making edits where necessary to strip sensitive information like passwords out of files. Is there a reason this wouldn't work?
Edit: I ended up taking a different approach from the one described below. My other answer explains what I ended up doing. That said, I am still very interested in a plugin like the one described below, so I am leaving this post up for reference in case I find time to do it or anyone else wants to take on the project.
I have determined that this is possible using import, export, and some patching at appropriate points in the repository's history.
The Algorithm
The short version of the algorithm looks like this:
Create a new repository
Loop through the existing repository's changesets, doing the following:
Export a changeset from the old repository
Import the changeset into the new repository without committing it
Make any necessary edits to the commit message and/or sensitive files
Commit the changeset in the new repository, preserving the (possibly modified) commit message and other metadata
Swap out the old and new repositories
Caveats:
Obviously, as with all history edits, this only works for non-public repositories which haven't been pulled by third parties.
Step 2 can and should be heavily automated to batch-process changesets with no editing required.
It will be necessary to halt execution whenever changes are required.
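The "make any necessary edits" step can be sketched for the password case. This is a minimal illustration, not the proof-of-concept batch file mentioned below: the regular expression, the sample file name config.php, and the function name redact_patch are all assumptions made up for the example. It rewrites the text produced by hg export before that text is handed to hg import.

```python
import re

# Hypothetical pattern: a PHP-style assignment such as  $db_password = 'hunter2';
# Adjust it to whatever the sensitive lines in your file actually look like.
SECRET_RE = re.compile(r"""(\$\w*password\w*\s*=\s*)(['"]).*?\2""", re.IGNORECASE)


def redact_patch(patch_text: str, placeholder: str = "REDACTED") -> str:
    """Replace secret values in an exported patch, keeping every line in place.

    Added, removed, and context lines are all rewritten the same way, so the
    line counts in each hunk header stay valid and later patches still line up
    with the already-redacted file.
    """
    return SECRET_RE.sub(rf"\g<1>\g<2>{placeholder}\g<2>", patch_text)


# A made-up patch fragment in unified diff format.
sample = (
    "--- a/config.php\n"
    "+++ b/config.php\n"
    "@@ -1,2 +1,2 @@\n"
    " <?php\n"
    "-$db_password = 'old-secret';\n"
    "+$db_password = \"new-secret\";\n"
)
print(redact_patch(sample))
```

A changeset whose only change was the password value ends up with identical removed and added lines, which is one of the cases where the loop would probably need to halt for a manual decision.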
Making it Work
I have a very basic proof of concept batch file that proves this can work.
I am working on a Mercurial plugin to make this as easy as possible. That said, I am still open to better suggestions if anyone has any.
I was able to accomplish my goals. Here's what I ended up doing:
First, I "flattened out" (straightened) the repository by eliminating all branches and merges and turning the repo into a single line of commits. I had to do this because hg histedit -- the key to the whole cleanup -- doesn't work on history containing merges. This was okay with me, because there were no really meaningful branches or merges in this particular repository and there is only one author in the relevant history. I probably could have retained the branches and merged again as necessary later, but this was easier for my purposes. To do this I used hg rebase and the MQ extension. (Special thanks to @tghw for this extremely helpful answer, which helped me understand for the first time how MQ really works.)
Next, I used hg convert to create several repositories from the original repository -- one for each library/plugin that I needed to put into its own repository and one main repository for the rest of the code. In the process, I used --filemap and --branchmap to reorganize everything as necessary.
Third, I used hg histedit on each new repository to (1) clean up irrelevant commit messages as needed and (2) remove sensitive information.
Fourth, I pushed all of the new repositories to Kiln, which automatically linked them to FogBugz cases using the same rules I had in place for the original repository (e.g., Case 123 in the commit message creates a link to FogBugz case # 123).
Finally, I "deleted" the original repository in Kiln. Kiln doesn't truly and permanently delete repositories as of right now, though I have proposed a use case for making that possible. Instead, it delinks FogBugz cases and puts the "deleted" repository into cold storage; an account administrator can restore it, but it is otherwise invisible.
All told, it took about 10 hours to split the original repository into 6 pieces and clean each part thereof. Some of that was learning curve; I could probably do the whole thing in more like 6 hours if I had to do it again. A long day, but worth it for the dramatically improved repository structure and cleaned-up code.
Everything is now as it should be. Hopefully, this will help other users. Please feel free to post a comment if you have a similar issue and would like additional insight from my experience.
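As a rough illustration of the hg convert step, the filemaps can be generated from a small table of moves. The sketch below is an assumption about how one might script it, not the exact files I used; build_filemap is a made-up helper, and the paths come from the example layout in the question. It relies only on the three documented filemap directives (include, exclude, rename) and on "." as the rename target for moving a subdirectory to the repository root.

```python
from typing import Dict, Sequence


def build_filemap(renames: Dict[str, str],
                  excludes: Sequence[str] = (),
                  includes: Sequence[str] = ()) -> str:
    """Return the text of a filemap for `hg convert --filemap`.

    One directive per line: `include path`, `exclude path`,
    or `rename from to`.
    """
    lines = [f"include {path}" for path in includes]
    lines += [f"exclude {path}" for path in excludes]
    lines += [f"rename {src} {dst}" for src, dst in renames.items()]
    return "\n".join(lines) + "\n"


# Main repository: drop the config file and the plugin, move the rest under src/.
main_map = build_filemap(
    renames={
        "includes/functions-related-to-abc.php": "src/functions/abc.php",
        "includes/class-something.php": "src/classes/something.php",
        "development": "src/development",
        "index.php": "src/index.php",
    },
    excludes=["test-config-file.php", "development/a-plugin-folder"],
)

# Plugin repository: keep only the plugin folder and hoist it to the root.
plugin_map = build_filemap(
    renames={"development/a-plugin-folder": "."},
    includes=["development/a-plugin-folder"],
)

print(main_map)
print(plugin_map)
```

Each text would be saved to a file and passed along the lines of hg convert --filemap main.map old-repo new-main-repo, where the file and repository names are placeholders.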
Related
I just lost all of the changes in a Mercurial patch (fortunately, I had a backup), and I would like to figure out what went wrong.
The Setup
I had a pair of patches, call them patch1.diff and patch2.diff. They were both based on revision 123, but affected completely different files, with no overlap. So, my repository looked something like this in TortoiseHg (where p is a patch and r is a regular revision):
Graph Rev Branch Tags Message
p 125 develop patch2.diff Change to existing file baz.php
p 124 develop patch1.diff Add new files foo.php and bar.php
r 123 develop Last committed changeset
|
r 122 develop Old changes
...
What I Did
I wanted to switch the order of the patches, because my work on patch2.diff was complete and I wanted to commit those changes. So I tried rebasing that patch onto revision 123. That didn't work, and I ended up with something like this:
Graph Rev Branch Tags Message
r Working directory - not a head revision!
r 126 develop Change to existing file baz.php
|
p | 125 develop patch2.diff Change to existing file baz.php
|
p | 124 develop patch1.diff Add new files foo.php and bar.php
|
r-+ 123 develop Last committed changeset
|
r 122 develop Old changes
...
That was clearly wrong. I now had a revision 126 with the same changes as those in patch2.diff, but I also still had a patch2.diff, which wasn't rebased as I expected. On top of that, I was getting the "not a head revision" message, even though there weren't actually any changes in my working directory.
So I stripped revision 126. At that point, things went completely off the rails, leaving me with this:
Graph Rev Branch Tags Message
p 125 develop patch2.diff Change to existing file baz.php
p 124 develop patch1.diff
r 123 develop Last committed changeset
|
r 122 develop Old changes
...
patch1.diff still appeared in TortoiseHg, but the changes and commit message were gone. I tried hg qpush --all, and got these messages:
applying patch1.diff
unable to read patch1.diff
I couldn't even find patch1.diff on my file system anymore. Ultimately, I had to run hg qdelete --keep patch1.diff and then restore my lost changes from offsite backups.
I ended up where I wanted to be, but nearly lost hours of work on a new feature. I was able to recover only because I had an offsite backup of the new files. That was terrifying.
The Question
What in the world happened? Why did I lose patch1.diff? I could understand if I lost the changes in patch2.diff given the way I used hg strip, but I have no idea why patch1.diff got nuked.
You stumbled over the very issues that are why mq might soon no longer be recommended. It wants to retain control over the csets it manages, and it loses that control when you modify history that is under mq control. Thus mq does not work well with rebase, strip, histedit...
The better way is to simply stop using mq at all. Make your default phase for new commits secret (or draft). Commit your patches as normal changesets - then mq cannot interfere with the proper working of rebase, and what you tried to do would simply have worked.
hg rebase -s125 -d123
hg rebase -s124 -d126
(given the state of your repo as in the first quote, and assuming r124 and r125 are normal csets, not under mq control)
And if you're a little daring, you take a look at the evolve extension which is very useful for people who maintain patch queues with respect to upstream repos or juggle draft changesets with collaborators.
See http://www.logilab.org/blogentry/88203 for an introduction to mercurial phases
What we have for now:
(1) we have a single product with few components developed in separate Mercurial repositories (components are Desktop Client, Mobile Client, Server, etc).
(2) we use a revision number in version, like 1.0.0.REV
What we want to have:
(3) have shared libraries between components (without committing from one repository to another)
(4) have a common REV number in version for all components
Question: is it possible to have (3) and (4) without merging all repositories into one?
This looks like it can be solved using sub-repositories.
I suggest setting up all your different repositories as sub-repositories of a main repo, which could include your shared libraries (as a sub-repository or directly in the main repo) and the revision file containing your current revision number. Your current repos can remain intact with this method.
Main repo
|-.hg
|
|-Shared libraries
|
|-Desktop Client
|--.hg
|
|-Server
|--.hg
|
...
|
|-.hgsubstate
|-revision.xml
|
At every change in the default branch of any sub-repo, you will have to also commit a change in the main repo to point to the new head of its sub-repo.
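To make the pointer idea concrete: the main repo records each sub-repo's pinned changeset in .hgsubstate, one line per sub-repo in the form "<changeset hash> <path>". The following is a small sketch, assuming that format, which reports the sub-repos whose pinned changeset moved between two versions of the file. The function names and the placeholder hashes are invented for the example.

```python
def parse_hgsubstate(text):
    """Map each sub-repo path to the changeset hash pinned in .hgsubstate.

    Each non-empty line has the form "<40-hex-digit hash> <path>".
    The path may itself contain spaces, so split only once.
    """
    pinned = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        node, path = line.split(" ", 1)
        pinned[path] = node
    return pinned


def changed_subrepos(old_text, new_text):
    """Return the sub-repo paths whose pinned changeset differs."""
    old = parse_hgsubstate(old_text)
    new = parse_hgsubstate(new_text)
    return sorted(path for path in new if old.get(path) != new[path])


# Placeholder hashes; real ones are 40 hexadecimal digits.
before = f"{'a' * 40} Desktop Client\n{'b' * 40} Server\n"
after = f"{'a' * 40} Desktop Client\n{'c' * 40} Server\n"

print(changed_subrepos(before, after))
```

Every commit in the main repo that changes .hgsubstate is then a natural point at which to bump the shared REV number kept in revision.xml.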
We develop .NET Enterprise Software in C#. We are looking to improve our version control system. I have used mercurial before and have been experimenting using it at our company. However, since we develop enterprise products we have a big focus on reusable components or modules. I have been attempting to use mercurial's sub-repos to manage components and dependencies but am having some difficulties. Here are the basic requirements for source control/dependency management:
Reusable components
Shared by source (for debugging)
Have dependencies on 3rd party binaries and other reusable components
Can be developed and committed to source control in the context of a consuming product
Dependencies
Products have dependencies on 3rd party binaries and other reusable components
Dependencies have their own dependencies
Developers should be notified of version conflicts in dependencies
Here is the structure in mercurial that I have been using:
A reusable component:
SHARED1_SLN-+-docs
|
+-libs----NLOG
|
+-misc----KEY
|
+-src-----SHARED1-+-proj1
| +-proj2
|
+-tools---NANT
A second reusable component, consuming the first:
SHARED2_SLN-+-docs
|
+-libs--+-SHARED1-+-proj1
| | +-proj2
| |
| +-NLOG
|
+-misc----KEY
|
+-src-----SHARED2-+-proj3
| +-proj4
|
+-tools---NANT
A product that consumes both components:
PROD_SLN----+-docs
|
+-libs--+-SHARED1-+-proj1
| | +-proj2
| |
| +-SHARED2-+-proj3
| | +-proj4
| |
| +-NLOG
|
+-misc----KEY
|
+-src-----prod----+-proj5
| +-proj6
|
+-tools---NANT
Notes
Repos are in CAPS
All child repos are assumed to be subrepos
3rd party (binary) libs and internal (source) components are all subrepos located in the libs folder
3rd party libs are kept in individual mercurial repos so that consuming projects can reference particular versions of the libs (i.e. an old project may reference NLog v1.0, and a newer project may reference NLog v2.0).
All Visual Studio .csproj files are at the 4th level (proj* folders) allowing for relative references to dependencies (i.e. ../../../libs/NLog/NLog.dll for all Visual Studio projects that reference NLog)
All Visual Studio .sln files are at the 2nd level (src folders) so that they are not included when "sharing" a component into a consuming component or product
Developers are free to organize their source files as they see fit, as long as the sources are children of proj* folder of the consuming Visual Studio project (i.e., there can be n children to the proj* folders, containing various sources/resources)
If Bob is developing SHARED2 component and PROD1 product, it is perfectly legal for him to make changes to the SHARED2 source (say sources belonging to proj3) within the PROD1_SLN repository and commit those changes. We don't mind if someone develops a library in the context of a consuming project.
Internally developed components (SHARED1 and SHARED2) are generally included by source in consuming project (in Visual Studio adding a reference to a project rather than browsing to a dll reference). This allows for enhanced debugging (stepping into library code), allows Visual Studio to manage when it needs to rebuild projects (when dependencies are modified), and allows the modification of libraries when required (as described in the above note).
Questions
If Bob is working on PROD1 and Alice is working on SHARED1, how can Bob know when Alice commits changes to SHARED1? Currently with Mercurial, Bob is forced to manually pull and update within each subrepo. If he pushes/pulls to the server from PROD_SLN repo, he never knows about updates to subrepos. This is described at Mercurial wiki. How can Bob be notified of updates to subrepos when he pulls the latest of PROD_SLN from the server? Ideally, he should be notified (preferably during the pull) and then manually decide which subrepos he wants to update.
Assume SHARED1 references NLog v1.0 (commit/rev abc in mercurial) and SHARED2 references NLog v2.0 (commit/rev xyz in mercurial). If Bob is absorbing these two components in PROD1, he should be made aware of this discrepancy. While technically Visual Studio/.NET would allow 2 assemblies to reference different versions of dependencies, my structure does not allow this because the path to NLog is fixed for all .NET projects that depend on NLog. How can Bob know that two of his dependencies have version conflicts?
If Bob is setting up the repository structure for PROD1 and wants to include SHARED2, how can he know what dependencies are required for SHARED2? With my structure, he would have to manually clone (or browse on the server) the SHARED2_SLN repo and either look in the libs folder, or peek at the .hgsub file to determine what dependencies he needs to include. Ideally this would be automated. If I include SHARED2 in my product, SHARED1 and NLog are auto-magically included too, notifying me if there is a version conflict with some other dependency (see question 2 above).
Bigger Questions
Is mercurial the correct solution?
Is there a better mercurial structure?
Is this a valid use for subrepos (i.e. Mercurial developers marked subrepos as a feature of last resort)?
Does it make sense to use mercurial for dependency management? We could use yet another tool for dependency management (maybe an internal NuGet feed?). While this would work well for 3rd party dependencies, it really would create a hassle for internally developed components: if they are actively developed, developers would have to constantly update the feed, we would have to serve them internally, and it would not allow components to be modified by a consuming project (Note 8 and Question 2).
Do you have a better solution for Enterprise .NET software projects?
References
I have read several SO questions and found this one to be helpful, but the accepted answer suggests using a dedicated tool for dependencies. While I like the features of such a tool, it does not allow dependencies to be modified and committed from a consuming project (see Bigger Question 4).
This may not be the answer you were looking for, but we have recent experience of novice Mercurial users using sub-repos, and I've been looking for an opportunity to pass on our experience...
In summary, my advice based on experience is: however appealing Mercurial sub-repos may be, do not use them. Instead, find a way to lay out your directories side-by-side, and to adjust your builds to cope with that.
However appealing it seems to be to tie together revisions in the sub-repo with revisions in the parent repo, it just doesn't work in practice.
During all the preparation for the conversion, we received advice from multiple different sources that sub-repos were fragile and not well-implemented - but we went ahead anyway, as we wanted atomic commits between repo and sub-repo. The advice - or my understanding of it - talked more about the principles rather than the practical consequences.
It was only once we went live with Mercurial and a sub-repo, that I really understood the advice properly. Here (from memory) are examples of the sorts of problems we encountered.
Your users will end up fighting the update and merge process.
Some people will update the parent repo and not the sub-repo
Some people will push from the sub-repo, and .hgsubstate won't get updated.
You will end up "losing" revisions that were made in the sub-repo, because someone will manage to leave the .hgsubstate in an incorrect state after a merge.
Some users will get into the situation where the .hgsubstate has been updated but the sub-repo hasn't, and then you'll get really cryptic error messages, and will spend many hours trying to work out what's going on.
And if you do tagging and branching for releases, the instructions for how to get this right for both parent and sub-repo will be many dozens of lines long. (And I even had a nice, tame Mercurial expert help me write the instructions!)
All of these things are annoying enough in the hands of expert users - but when you are rolling out Mercurial to novice users, they are a real nightmare, and the source of much wasted time.
So, having put in a lot of time to get a conversion with a sub-repo, several weeks later we then converted the sub-repo to a repo. Because we had large amounts of history in the conversion that referred to the sub-repo, via .hgsubstate, it's left us with something much more complicated.
I only wish I'd really appreciated the practical consequences of all the advice much earlier on, e.g. in Mercurial's Features of Last Resort page:
But I need to have managed subprojects!
Again, don't be so sure. Significant projects like Mozilla that have tons of dependencies do just fine without using subrepos. Most smaller projects will almost certainly be better off without using subrepos.
Edit: Thoughts on shell repos
With the disclaimer I don't have any experience of them...
No, I don't think many of them are. You are still using sub-repos, so all the same user issues apply (unless you can provide a wrapper script for every step, of course, to remove the need for humans to supply the correct options to handle sub-repos.)
Also note that the wiki page you quoted does list some specific issues with shell repos:
overly-strict tracking of relationship between project/ and somelib/
impossible to check or push project/ if somelib/ source repo becomes unavailable
lack of well-defined support for recursive diff, log, and status
recursive nature of commit surprising
Edit 2 - do a trial, involving all your users
The point at which we really started realising we had an issue was once multiple users started making commits, and pulling and pushing - including changes to the sub-repo. For us, it was too late in the day to respond to these issues. If we'd known them sooner, we could have responded much more easily and simply.
So at this point, the best advice I think I can offer is to recommend that you do a trial run of the project layout before the layout is carved in stone.
We left the full-scale trial until too late to make changes, and even then people only made changes in the parent repo, and not the sub-repos - so we still didn't see the full picture until too late.
In other words, whatever layout you consider, create a repository structure in that layout, and get lots of people making edits. Try to put enough real code into the various repos/sub-repos so that people can make real edits, even though they will be throw-away ones.
Possible outcomes:
You might find it all works fine - in which case, you'll have spent some time to gain certainty.
On the other hand, you might identify issues much more quickly than you would by trying to work out in advance what the outcomes would be.
And your users will learn a lot too.
Question 1:
This command, when executed in the parent "shell" repo, will traverse all subrepos and list the changesets at the default pull location that are not yet present locally:
hg incoming --subrepos
The same thing can be accomplished by clicking on the "Incoming" button on the "Synchronize" pane in TortoiseHg if you have the "--subrepos" option checked (on the same pane).
Thanks to the users in the mercurial IRC channel for helping here.
Questions 2 & 3:
First I need to modify my repo structures so that the parent repos are truly "shell" repos as recommended on the hg wiki. I will take this to the extreme and say that the shell should contain no content, only subrepos as children. In summary, rename src to main, move docs into the subrepo under main, and change the prod folder to a subrepo.
SHARED1_SLN:
SHARED1_SLN-+-libs----NLOG
|
+-misc----KEY
|
+-main----SHARED1-+-docs
| +-proj1
| +-proj2
|
+-tools---NANT
SHARED2_SLN:
SHARED2_SLN-+-libs--+-SHARED1-+-docs
| | +-proj1
| | +-proj2
| |
| +-NLOG
|
+-misc----KEY
|
+-main----SHARED2-+-docs
| +-proj3
| +-proj4
|
+-tools---NANT
PROD_SLN:
PROD_SLN----+-libs--+-SHARED1-+-docs
| | +-proj1
| | +-proj2
| |
| +-SHARED2-+-docs
| | +-proj3
| | +-proj4
| |
| +-NLOG
|
+-misc----KEY
|
+-main----PROD----+-docs
| +-proj5
| +-proj6
|
+-tools---NANT
All shared libs and products have their own repo (SHARED1, SHARED2, and PROD).
If you need to work on a shared lib or product independently, there is a shell available (my repos ending with _SLN) that uses hg to manage the revisions of the dependencies. The shell is only for convenience because it contains no content, only subrepos.
When rolling a release of a shared lib or product, the developer should list all of the dependencies and their hg revs/changesets (or preferably human friendly tags) that were used to create the release. This list should be saved in a file in the repo for the lib or product (SHARED1, SHARED2, or PROD), not the shell. See Note A below for how this could solve Questions 2 & 3.
If I roll a release of a shared lib or product I should put matching tags in the project's repo and its shell for convenience. However, if the shell gets out of whack (a concern expressed from real experience in @Clare's answer), it really should not matter because the shell itself is dumb and contains no content.
Visual Studio sln files go into the root of the shared lib or product's repo (SHARED1, SHARED2, or PROD), again, not the shell. The result is that if I include SHARED1 in PROD, I may end up with some extra solutions that I never open, but it doesn't matter. Furthermore, if I really want to work on SHARED1 and run its unit tests (while working in PROD_SLN shell), it's really easy: just open the said solution.
Note A:
In regards to point 3 above, if the dependency file uses a format similar to .hgsub but with the addition of the rev/changeset/tag, then getting the dependencies could be automated. For example, I want SHARED1 in my new product. Clone SHARED1 to my libs folder and update to the tip or the last release label. Now, I need to look at the dependencies file and a) clone the dependency to the correct location and b) update to the specified rev/changeset/tag. Very feasible to automate this. To take it further, it could even track the rev/changeset/tag and alert the developer if there is a dependency conflict between shared libs.
A hole remains if Alice is actively developing SHARED1 while Bob is developing PROD. If Alice updates SHARED1_SLN to use NLog v3.0, Bob may not ever know this. If Alice updates her dependency file to reflect the change then Bob does have the info, he just has to be made aware of the change.
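The conflict check from Note A could look roughly like this. It is only a sketch under stated assumptions: the dependency file format ("path = source @ rev", modeled on .hgsub), the relative source paths, the tags, and the function names are all hypothetical, since no such file exists yet.

```python
def parse_deps(text):
    """Parse a hypothetical dependency file.

    Each non-empty line looks like:  libs/NLOG = ../NLOG @ v1.0
    and maps to  {"libs/NLOG": ("../NLOG", "v1.0")}.
    """
    deps = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        path, rest = (part.strip() for part in line.split("=", 1))
        source, rev = (part.strip() for part in rest.rsplit("@", 1))
        deps[path] = (source, rev)
    return deps


def find_conflicts(components):
    """Find dependency paths that different components pin at different revs.

    `components` maps a component name to the text of its dependency file.
    Returns {path: {rev: [component names]}} for conflicting paths only.
    """
    pins = {}
    for name, text in components.items():
        for path, (_source, rev) in parse_deps(text).items():
            pins.setdefault(path, {}).setdefault(rev, []).append(name)
    return {path: revs for path, revs in pins.items() if len(revs) > 1}


components = {
    "SHARED1": "libs/NLOG = ../NLOG @ v1.0\n",
    "SHARED2": "libs/NLOG = ../NLOG @ v2.0\nlibs/SHARED1 = ../SHARED1 @ release-1\n",
}
print(find_conflicts(components))
```

Because the path to NLog is fixed in my structure, keying the check on the path is enough to catch the SHARED1/SHARED2 situation from Question 2. Running the same check whenever a dependency file changes would also narrow the hole described above.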
Bigger Questions 1 & 4:
I believe that this is a source control issue and not something that can be solved with a dependency management tool, since such tools generally work with binaries and only get dependencies (they don't allow committing changes back to the dependencies). My dependency problems are not unique to Mercurial. From my experience, all source control tools have the same problem. One solution in SVN would be to just use svn:externals (or svn copies) and recursively have every component include its dependencies, creating a possibly huge tree to build a product. However, this falls apart in Visual Studio where I really only want to include one instance of a shared project and reference it everywhere. As implied by @Clare's answer and Greg's response to my email to the hg mail list, keep components as flat as possible.
Bigger Questions 2 & 3:
There is a better structure, as I have laid out above. I believe we have a strong use case for using subrepos and I do not see a viable alternative. As mentioned in @Clare's answer, there is a camp that believes dependencies can be managed without subrepos. However, I have yet to see any evidence or actual references to back this statement up.
Bigger Question 5:
Still open to better ideas...
I want to diff all the changes I made over the last 7 or 10 days without seeing the changes of other team members, so I keep a clone, say
c:\dev\proj1
and then I keep another clone that is
c:\dev\proj2
so I can change code in proj1, and then in another shell, pull code from it, merge with other team members' changes, and run tests. And then 10 days later, I can still diff all the code written by me and nobody else by going to the shell of proj1 and running hg diff or hg vdiff.
I think this can be done by using branch as well. Does having 2 clones like this work exactly the same as having 2 branches? Any advantage of one over the other method?
The short answer is: Yes.
Mercurial doesn't care where the changesets come from, when you merge. In that sense, branches and clones work equally well when it comes time to merge changes.
Even better: The workflow you described is exactly the strategy in Chapter 3 of the Mercurial book.
The only advantage of branches is that they have a name, so you have less incentive to merge right off. If you want to keep those proj2 changes separate, while still pushing and pulling them from proj1, give them a real branch. Again, functionally, they're the same.
And yes, this is characteristic of DVCS, not uniquely Mercurial.
Note: I'm more familiar with git than hg but the ideas should be the same.
The difference will become apparent if you update both the clones (which are both editing the same branch) e.g. for a quick bug fix on the integration sandbox.
The right way would be for you to have a topic branch (your first clone) which is where you do your development and another one for integration (your second clone). Then you can merge changes from one to another as you please. If you do make a change on the integration branch, you'll know that it was made there.
hg diff -r <startrev> -r <endrev> can be used to compare any two points in Mercurial's history.
Example history:
rev author description
--- ------ ----------------------
# 6 me Merge
|\
| o 5 others More other changes.
| |
| o 4 others Other changes.
| |
o | 3 me More of my changes.
| |
o | 2 me My changes.
|/
o 1 others More Common Changes
|
o 0 others Common Changes
If revision 1 was the original clone:
Revs 2 and 3 represent your changes.
Revs 4 and 5 are other changes made during your branch development. They are pulled and merged into your changes at rev 6.
At this point, to see only changes by me before the merge, run hg diff -r 1 -r 3 to display those changes only.
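To see why rev 3, and not the merge at rev 6, is the right end point, the example history can be modeled as a small graph. This is only an illustration of the reasoning: hg diff compares two snapshots rather than walking a list of revisions, and the helper names below are made up.

```python
# Parent links for the example history above (rev -> list of parents).
PARENTS = {0: [], 1: [0], 2: [1], 3: [2], 4: [1], 5: [4], 6: [3, 5]}


def ancestors(parents, rev):
    """Return rev and every revision reachable through its parent links."""
    seen, stack = set(), [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def revs_covered(parents, base, tip):
    """Revisions whose changes are in `tip` but not in `base`."""
    return sorted(ancestors(parents, tip) - ancestors(parents, base))


print(revs_covered(PARENTS, 1, 3))  # [2, 3]
print(revs_covered(PARENTS, 1, 6))  # [2, 3, 4, 5, 6]
```

Diffing rev 1 against rev 3 covers only revs 2 and 3, which are mine. Diffing against rev 6 would also pull in revs 4 and 5 from the other authors.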
Why not simply have two branches? (Branching/merging is much easier and safer in a DVCS like Hg or Git than in a centralised VCS like TFS or SVN!) It would be much more secure and reliable.
This will become apparent e.g. when you will want to merge the two branches/clones back together. Also, editing one branch from two different physical locations can easily lead to confusion and errors. Hg is designed to avoid exactly these kinds of situations.
Thomas
As some answers already pointed out, branches (named or anonymous) are usually more convenient than two clones because you don't have to pull/push.
But two clones have the distinct advantage of total physical separation, so you can literally work on two things at the same time, and you don't ever need to rebuild when you switch project.
Earlier I asked a question about concurrent development with hg, with option 1 being two clones and option 2 being two branches.
I look at Mercurial repositories of some known products, like TortoiseHg and Python, and even though I can see multiple people committing changes, the timeline always looks pretty clean, with just one branch moving forward.
However, let's say you have 14 people working on the same product. Won't this quickly turn into a branch nightmare, with 14 parallel branches at any given time?
For instance, with just two people and the product at changeset X, both developers start working on separate features on Monday morning, so both start with the same parent changeset.
When they commit, we now have two branches, and with 14 people we would quickly have 10+ (might not be 14...) branches that need to be merged back into the default.
Or... What am I not seeing here? Perhaps it's not really a problem?
Edit: I see there's some confusion as to what I'm really asking about here, so let me clarify.
I know full well that Mercurial easily handles multiple branches and merging, and as one answer states, even when people work on the same files, they don't often work on the same lines, and even then, a conflict is easily handled. I also know that if two people end up creating a merge hell because they changed a lot of the same code in the same files, there's some overall planning failure here, since we've assigned two features in the exact same place to two developers, instead of perhaps getting them to work together, or just giving both to one developer in the first place.
So that's not it.
What I'm curious about is how these open source projects manage such a clean history. It's not important to me (as one comment wondered) that the history is clean. We do work in parallel, and if the repository is able to reflect that, so much the better (in my opinion). However, the repositories I've looked at don't show that. They seem to be working along the Subversion model, where you can't commit before you've updated and merged, in which case the history is just one straight line.
So how do they do it?
Are they "rebasing" the changes so that they appear to follow the latest tip of the branch even though they were originally committed a bit further back in the branch history? Transplanting changesets to make them appear to have been committed in the main branch to begin with?
Or are the projects I've looked at so slow at adding new things (at the moment; I didn't look far back in the history) that in reality only one person has been working at a time?
Or are they pushing changes to one central maintainer who reviews and then integrates? It doesn't look like that since many of the projects I looked at had different names on the changesets.
Or... What am I not seeing here?
Perhaps it's not really a problem?
It's not really a problem. In a large project even when people work on the same feature, they don't usually work on the same file. When they work on the same file, they don't usually modify the same lines. And when they modify the same lines, then a merge should be done manually (for the affected lines).
This means in practice that 80+% of the merges can be done automagically by Mercurial itself.
Let's take an example:
you have:
[branch 1]   [branch 2]
      \        /
       \      /
        [base]
Edit: for clarity, by branch I refer here to unnamed branches.
If a file is changed in branch 1 but is identical in branch 2 and base, the version from branch 1 is chosen. If the file is modified in both branch 1 and branch 2, the files are merged line by line using the same rule: if line 1 of file1 differs between branch 1 and base, but branch 2 and base agree on it, then line 1 from branch 1 is chosen (and so on for every line).
For the lines that are modified in both branches, Mercurial interrupts the automated merging process and prompts the user to choose which lines to use, or edit the lines manually.
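The line-by-line rule described above can be sketched in a few lines of Python. This is only a toy model, not Mercurial's actual merge code: it assumes all three versions have the same number of lines and that lines stay aligned by position (a real merge tool first aligns the files with a diff). The function name merge_lines and the use of None as a conflict marker are made up for this illustration.

```python
from typing import List, Optional, Tuple


def merge_lines(
    base: List[str], branch1: List[str], branch2: List[str]
) -> Tuple[List[Optional[str]], List[int]]:
    """Toy three-way merge; assumes lines are aligned by index (hypothetical helper)."""
    if not (len(base) == len(branch1) == len(branch2)):
        raise ValueError("this sketch only handles files with equal line counts")

    merged: List[Optional[str]] = []
    conflicts: List[int] = []
    for i, (b, one, two) in enumerate(zip(base, branch1, branch2)):
        if one == two:
            merged.append(one)   # both sides agree (unchanged, or changed identically)
        elif two == b:
            merged.append(one)   # only branch 1 touched this line
        elif one == b:
            merged.append(two)   # only branch 2 touched this line
        else:
            merged.append(None)  # both sides changed it differently: a human decides
            conflicts.append(i)
    return merged, conflicts


base = ["a", "b", "c"]
print(merge_lines(base, ["A", "b", "c"], ["a", "b", "C"]))  # (['A', 'b', 'C'], [])
print(merge_lines(base, ["A", "b", "c"], ["X", "b", "c"]))  # ([None, 'b', 'c'], [0])
```

The second call is the case where the automated merge stops and asks the user: line 0 was changed in both branches, so the sketch records its index instead of guessing.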
Since deciding which lines to use is best done by the person(s) who modified those lines, a good practice is to have the person who implemented a feature perform the merge. That means that if you and I work on the same project, I implement my feature, then pull from a central/common repository (to get the latest version that everyone uses), merge my new version with the pulled changes, and publish the result to the common repository (at this point, the common repository has one main branch that includes my merged changes). Then you pull that from the server and do the same with your changes.
This implies that everyone is capable of doing whatever they want in their local repository, and the common/official repository has one branch. It also means that you need to decide on a time frame when people should merge their changes in.
I used to have three or four repositories on my machine, already compiled on different product versions (different branches of the repository), and a few different branches in my main repository (one for refactoring, one for development and so on). Whenever I brought one branch to a stable state (say, finished a refactoring), I would pull from the server, merge that branch into the pulled changes, then push it back to the server and let everyone know that if they had made any changes to the affected files, they should pull from the server first.
We used to synchronize implemented features every Monday morning, and it took us about an hour to merge everything and then make a weekly build on the server to give to QA. On bad days it would take two members of the team two hours or so; then everyone would pull the week's changes onto their machine and use them as a new base for the week. This was for a team of eight developers.
In your updated question it seems that you are more interested in ways of tidying up the history. When you have a history and want to make it into a single, neat, straight line, you want to use rebase, transplant and/or Mercurial Queues. Check out the docs for those three and you should get a feel for the workflow.
Edit: Since I'm waiting for a compile, here follows a specific example of what I mean:
> hg init
> echo test > a.txt
> hg addremove && hg commit -m "added a.txt"
> echo test > b.txt
> hg addremove && hg commit -m "added b.txt"
> hg update 0 # go back to initial revision
> echo test > c.txt
> hg addremove && hg commit -m "added c.txt"
Running hg glog (hg log -G in newer Mercurial versions, where graph output is built in) now shows this diverging history with two branches:
# changeset: 2:c79893255a0f
| tag: tip
| parent: 0:7e1679006144
| user: mizipzor
| date: Mon Jul 05 12:20:37 2010 +0200
| summary: added c.txt
|
| o changeset: 1:74f6483b38f4
|/ user: mizipzor
| date: Mon Jul 05 12:20:07 2010 +0200
| summary: added b.txt
|
o changeset: 0:7e1679006144
user: mizipzor
date: Mon Jul 05 12:19:41 2010 +0200
summary: added a.txt
Do a rebase, making changeset 1 into a child of 2 rather than 0:
> hg rebase -s 1 -d 2
Now let's check the history again:
# changeset: 2:ea0c9a705a70
| tag: tip
| user: mizipzor
| date: Mon Jul 05 12:20:07 2010 +0200
| summary: added b.txt
|
o changeset: 1:c79893255a0f
| user: mizipzor
| date: Mon Jul 05 12:20:37 2010 +0200
| summary: added c.txt
|
o changeset: 0:7e1679006144
user: mizipzor
date: Mon Jul 05 12:19:41 2010 +0200
summary: added a.txt
Presto! Single line. :)
Also note that I didn't do a merge. When you rebase like this, you will have to deal with merge conflicts and everything else just as if you had done a merge, because that's pretty much what happens under the hood. Experiment with this in a small test repo. For example, try changing the file added in revision 0 rather than just adding more files.
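To make the re-parenting in the rebase example concrete, here is a toy Python model of the three-changeset history. It is a sketch under heavy assumptions: each changeset is just a label with a single parent, and the helper names heads and rebase are invented for this illustration. Real Mercurial also rewrites the moved changesets (note the new hash on "added b.txt" in the output above) and has to merge file contents, neither of which is modelled here.

```python
from typing import Dict, List, Optional

Graph = Dict[str, Optional[str]]  # changeset label -> parent label (None for the root)


def heads(graph: Graph) -> List[str]:
    """Changesets that no other changeset names as its parent."""
    used_as_parent = {p for p in graph.values() if p is not None}
    return sorted(c for c in graph if c not in used_as_parent)


def rebase(graph: Graph, source: str, dest: str) -> Graph:
    """Return a copy of the graph in which `source` hangs off `dest`."""
    node: Optional[str] = dest
    while node is not None:          # walk from dest back to the root
        if node == source:
            raise ValueError("cannot rebase a changeset onto its own descendant")
        node = graph[node]
    moved = dict(graph)
    moved[source] = dest             # descendants of source follow automatically
    return moved


history: Graph = {"a.txt": None, "b.txt": "a.txt", "c.txt": "a.txt"}
print(heads(history))                             # ['b.txt', 'c.txt']
print(heads(rebase(history, "b.txt", "c.txt")))   # ['b.txt']
```

Before the rebase the model has two heads, which is the diverging history in the first graph; afterwards there is one head, matching the single line in the second graph.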
I'm a Mercurial developer, so let me explain how we/I do it.
In the Mercurial project we accept contributions in the form of patches sent to the mailing list. When we apply those with hg import, we do an implicit rebase to the tip of the branch we are working on. This helps a lot with keeping the history clean.
As for my own changes, I use rebase or mq to linearize things before I push them, again to keep the history tidy. It's basically a matter of doing
hg push # abort: creates new remote head
hg pull
hg rebase
hg push
You can combine the pull and rebase if you like (hg pull --rebase) but I've always liked to take one step at a time.
By the way, there are some disagreements about this practice of linearizing the history -- some believe that the history should show how things really happened, with all the branches and merges and whatnot. I find that as long as you don't mess with public changesets, then it's okay and useful to linearize history.
The Linux kernel is stored in thousands of repositories and probably millions of branches, and this doesn't seem to pose a problem. For large projects you need a repository strategy (e.g., the dictator–lieutenants strategy), but having many branches is the main strength of the modern DVCSes and not a problem at all.
Yes, we'll have to merge, and to avoid extra heads on the main repository, merging should be done in the child repositories by the developer.
So before you push your code to the parent repository, you first pull the latest changes, merge on your side and (try to) push. This should avoid unwanted heads in the master repo.
I don't know how the TortoiseHg team does things, but you can use Mercurial's rebase extension to "detach" a branch and drop it on the top of the tip, creating a single branch.
In practice, though, I don't get concerned about multiple branches, as long as I don't see more heads than there should be. Merging is not really a big deal.