HG: Alternative to merging of repositories - mercurial

What we have now:
(1) A single product with a few components developed in separate Mercurial repositories (Desktop Client, Mobile Client, Server, etc.).
(2) A revision number in the version, like 1.0.0.REV.
What we want to have:
(3) Shared libraries between components (without committing from one repository to another).
(4) A common REV number in the version of all components.
Question: is it possible to have (3) and (4) without merging all repositories into one?

This looks like it can be solved with subrepositories.
I suggest setting up all your existing repositories as subrepositories of a main repo. The main repo can also hold your shared libraries (either as another subrepository or directly in the main repo) and the revision file containing your current revision number. Your current repos remain intact with this method.
Main repo
|-.hg
|
|-Shared libraries
|
|-Desktop Client
|--.hg
|
|-Server
|--.hg
|
...
|
|-.hgsub
|-.hgsubstate
|-revision.xml
|
Whenever you commit to the default branch of any subrepo, you also have to commit in the main repo so that its .hgsubstate points to the subrepo's new head.
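A minimal Python sketch of this setup, under assumptions that are not part of the answer: the subrepo directory names are hypothetical, and instead of a hand-maintained revision.xml it takes REV from the main repo's local revision number (as printed by `hg id -n`). Because every subrepo update needs a commit in the main repo, that number grows whenever any component changes, which gives all components one common REV. Local revision numbers only mean something within a single clone, so this assumes builds come from one canonical clone (for example, the build server's).

```python
"""Sketch: wire components into a main repo and derive one shared version.

Assumptions (not from the answer): the subrepo names are hypothetical, and
REV is the main repo's local revision number as reported by `hg id -n`.
"""
import subprocess
from pathlib import Path

# Hypothetical layout: path inside the main repo -> source to clone from.
SUBREPOS = {
    "desktop-client": "desktop-client",
    "mobile-client": "mobile-client",
    "server": "server",
    "shared-libs": "shared-libs",
}


def make_hgsub(subrepos: dict[str, str]) -> str:
    """Render .hgsub: one 'path = source' line per subrepository."""
    return "".join(f"{path} = {src}\n" for path, src in sorted(subrepos.items()))


def parse_hgsubstate(text: str) -> dict[str, str]:
    """Parse .hgsubstate ('<changeset id> <path>' per line) into {path: id}."""
    state = {}
    for line in text.splitlines():
        if line.strip():
            node, path = line.split(" ", 1)
            state[path] = node
    return state


def version_string(base: str, rev: str) -> str:
    """Build e.g. '1.0.0.42'; a trailing '+' (uncommitted changes) is dropped."""
    return f"{base}.{rev.strip().rstrip('+')}"


def main_repo_rev(main_repo: Path) -> str:
    """Local revision number of the main repo's working-copy parent."""
    result = subprocess.run(["hg", "-R", str(main_repo), "id", "-n"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, `Path("main/.hgsub").write_text(make_hgsub(SUBREPOS))` creates the subrepo definitions once, and a build script can stamp `version_string("1.0.0", main_repo_rev(Path("main")))` into every component. This assumes the build runs on a committed, non-merge working copy.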

Related

Mercurial repository cleanup preserving Kiln/Fogbugz history

TL;DR version: Is it possible to reorganize a Mercurial repo without breaking Kiln/FogBugz history? Or do I have to start fresh?
I have a repository that is a real mess and in need of some serious cleanup, and I am trying to figure out how best to do it. The goal is to remove a few files entirely (they should not appear in any commit, ever), move a few directories, and split one directory out into an entirely separate repository. I know, I know: you're not supposed to be able to change history. In this case, however, it's either change history or start from scratch with new repositories.
The repository in question is managed in Mercurial, with the remote repository hosted in Kiln. Issues are tracked in FogBugz. Thanks to some commit link-processing rules, any reference in a commit message to an issue (case) number like Case 123 is converted to a link to that FogBugz case. In turn, the case that was mentioned gets a note appended to it with the commit message.
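To make the dependency concrete: the links come from the commit message text itself, so any history rewrite that edits messages has to keep the "Case N" tokens intact or the rewritten repository will not relink. The Python sketch below is a hypothetical illustration of such a link-processing rule, not Kiln's actual implementation; the pattern and the FogBugz URL are both assumptions.

```python
import re

# Hypothetical rule: "Case <number>", case-insensitive, as a whole word.
CASE_REF = re.compile(r"\bcase\s+(\d+)\b", re.IGNORECASE)
# Hypothetical FogBugz URL; the real one depends on the account.
CASE_URL = "https://example.fogbugz.com/default.asp?{}"


def case_numbers(message: str) -> list[int]:
    """Case numbers this rule would pick up from a commit message."""
    return [int(n) for n in CASE_REF.findall(message)]


def linkify(message: str) -> str:
    """Turn each 'Case 123' reference into an HTML link."""
    return CASE_REF.sub(
        lambda m: f'<a href="{CASE_URL.format(m.group(1))}">{m.group(0)}</a>',
        message,
    )
```

Comparing `case_numbers()` on each old and rewritten message is one cheap way to check that a cleanup did not drop any case reference.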
Current Structure
The project file structure is currently something like this:
- /
+- includes/
| +- functions-related-to-abc.php
| +- functions-related-to-xyz.php
| +- class-something.php
| +- classes-several-things.php
| +- random-file.php
| ...
|
+- development/
| +- a-plugin-folder/
| | +- some-file.php
| | +- file-with-sensitive-and-non-sensitive-info.php
| | ...
| |
| +- some-backend-functions-related-to-coding.php
| ...
|
+- index.php
+- test-config-file.php
...
Target Structure
The structure I want is something like this:
- /
+- build/
+- doc/
+- src/
| +- functions/
| | +- abc.php // renamed from includes/functions-related-to-abc.php
| | +- xyz.php // renamed from includes/functions-related-to-xyz.php
| | ...
| |
| +- classes/
| | +- something.php // renamed from includes/class-something.php
| | +- several-things.php // renamed from includes/classes-several-things.php
| | ...
| |
| +- view/
| | +- random-file.php // formerly includes/random-file.php
| ...
|
| +- development/
| | +- some-backend-functions-related-to-coding.php
| | ...
| +- index.php
| ...
|
+- test/
...
a-plugin-folder would move to its own, separate repository. test-config-file.php would no longer be tracked in the repository at all. Ideally, I will also do some minor pruning and renaming of branches while I'm at it.
In my dream world, file-with-sensitive-and-non-sensitive-info.php would somehow be tracked consistently, but with the sensitive info (a couple of passwords) yanked out into a config file that is not under version control. I realize that's probably wishful thinking.
My Current Thinking
My current thinking is that my wish list is basically impossible: I can create new, properly structured repositories from this point forward, but cannot preserve my change history and also make the radical structural changes I need to make. In this view, I should take the current code base, reorganize it all the way I want it, and commit it as changeset 1 for two new repositories (the root repository and the plugin repository). I would then just keep a copy of the old repository backed up somewhere for reference. Major downsides: (1) I lose all my history and (2) the Kiln and FogBugz cross-references for historical commits are all toast.
My Question
So, here's the question: is there any way to do what I want -- restructure, pull a few files out, and get everything looking pretty -- without losing all of my history?
I have considered using the hg convert extension, making heavy use of the filemap, splicemap, and branchmap options. The problems I see with that approach include: (1) breaking all prior builds, (2) not having file-with-sensitive-and-non-sensitive-info.php in prior builds at all (or leaving it in, which defeats the point), and (3) rendering many of the commit messages wildly incorrect to the extent they refer to file names or repo structure. In other words, I'm not sure this option gains me much as opposed to just starting clean, properly structured repositories.
I have also considered the extreme option: writing a custom script of some sort to build a new repository by going through each existing commit, stripping sensitive information out of file-with-sensitive-and-non-sensitive-info.php, rewriting commit messages to the extent necessary, and committing the revised version of everything. This, theoretically, could solve all of my problems, but at the cost of reinventing the wheel and probably taking a ridiculous amount of time. I'm looking for something that isn't the equivalent of writing an entire hg extension.
EDIT: I am considering creating an empty repository, then writing a script that uses hg export and hg import to bring changesets over one at a time, making edits where necessary to strip sensitive information like passwords out of files. Is there a reason this wouldn't work?
Edit: I ended up taking a different approach from the one described below; my other answer explains what I ended up doing. That said, I am still very interested in a plugin like the one described below, so I am leaving this post up for reference in case I find time to build it or anyone else wants to take on the project.
I have determined that this is possible using import, export, and some patching at appropriate points in the repository's history.
The Algorithm
The short version of the algorithm looks like this:
1. Create a new repository.
2. Loop through the existing repository's changesets, doing the following:
   a. Export a changeset from the old repository.
   b. Import the changeset into the new repository without committing it.
   c. Make any necessary edits to the commit message and/or sensitive files.
   d. Commit the changeset in the new repository, preserving the (possibly modified) commit message and other metadata.
3. Swap out the old and new repositories.
Caveats:
Obviously, as with all history edits, this only works for non-public repositories which haven't been pulled by third parties.
Step 2 can and should be heavily automated to batch-process changesets with no editing required.
It will be necessary to halt execution whenever changes are required.
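A minimal Python sketch of step 2, assuming a linear history (no merges), UTF-8 text, a new repository that starts empty, and that the only edit needed is replacing known secret strings; SECRETS and its values are hypothetical. Instead of halting for manual edits, it scrubs each exported patch before importing it, so the context lines of later patches still match the already-scrubbed files in the new repository. Author, date and message are read from the old changeset and passed to `hg commit`, because `hg import --no-commit` does not commit them.

```python
import subprocess
import tempfile
from pathlib import Path

# Hypothetical secrets -> placeholders; replace with the real values to strip.
SECRETS = {"s3cr3t-db-pass": "<DB_PASSWORD>"}


def scrub(text: str, secrets: dict[str, str]) -> str:
    """Replace every literal secret with its placeholder (line count unchanged)."""
    for secret, placeholder in secrets.items():
        text = text.replace(secret, placeholder)
    return text


def hg(repo: Path, *args: str) -> str:
    """Run `hg -R repo ARGS...` and return stdout (raises on failure)."""
    result = subprocess.run(["hg", "-R", str(repo), *args],
                            capture_output=True, text=True, check=True)
    return result.stdout


def replay(old: Path, new: Path, secrets: dict[str, str]) -> None:
    """Copy each changeset of a linear `old` history into `new`, scrubbed."""
    if not (new / ".hg").exists():
        subprocess.run(["hg", "init", str(new)], check=True)
    revs = hg(old, "log", "-r", "0:tip", "--template", "{rev}\n").split()
    with tempfile.TemporaryDirectory() as tmp:
        for rev in revs:
            patch = Path(tmp) / f"{rev}.patch"
            # --git keeps binary changes, renames and copies in the patch.
            hg(old, "export", "--git", "-r", rev, "-o", str(patch))
            raw = patch.read_text(encoding="utf-8", errors="surrogateescape")
            patch.write_text(scrub(raw, secrets),
                             encoding="utf-8", errors="surrogateescape")
            hg(new, "import", "--no-commit", str(patch))
            if not hg(new, "status").strip():
                continue  # only secrets changed; nothing left to commit
            user = hg(old, "log", "-r", rev, "--template", "{author}")
            date = hg(old, "log", "-r", rev, "--template", "{date|hgdate}")
            desc = hg(old, "log", "-r", rev, "--template", "{desc}")
            hg(new, "commit", "-u", user, "-d", date, "-m", scrub(desc, secrets))
```

Scrubbing the patch text rather than the files keeps every hunk consistent, since a literal replacement never adds or removes lines. Changesets that only touched a secret are skipped rather than committed empty.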
Making it Work
I have a very basic proof-of-concept batch file that shows this can work.
I am working on a Mercurial plugin to make this as easy as possible. That said, I am still open to better suggestions if anyone has any.
I was able to accomplish my goals. Here's what I ended up doing:
First, I "flattened out" (straightened) the repository by eliminating all branches and merges and turning the repo into a single line of commits. I had to do this because hg histedit -- the key to the whole cleanup -- doesn't work on history containing merges. This was okay with me, because there were no really meaningful branches or merges in this particular repository and there is only one author in the relevant history. I probably could have retained the branches and merged again as necessary later, but this was easier for my purposes. To do this I used hg rebase and the MQ extension. (Special thanks to #tghw for this extremely helpful answer, which helped me understand for the first time how MQ really works.)
Next, I used hg convert to create several repositories from the original repository -- one for each library/plugin that I needed to put into its own repository and one main repository for the rest of the code. In the process, I used --filemap and --branchmap to reorganize everything as necessary.
Third, I used hg histedit on each new repository to (1) clean up irrelevant commit messages as needed and (2) remove sensitive information.
Fourth, I pushed all of the new repositories to Kiln, which automatically linked them to FogBugz cases using the same rules I had in place for the original repository (e.g., Case 123 in the commit message creates a link to FogBugz case # 123).
Finally, I "deleted" the original repository in Kiln. Kiln doesn't truly and permanently delete repositories as of right now, though I have proposed a use case for making that possible. Instead, it delinks FogBugz cases and puts the "deleted" repository into cold storage; an account administrator can restore it, but it is otherwise invisible.
All told, it took about 10 hours to split the original repository into 6 pieces and clean each part thereof. Some of that was learning curve; I could probably do the whole thing in more like 6 hours if I had to do it again. A long day, but worth it for the dramatically improved repository structure and cleaned-up code.
Everything is now as it should be. Hopefully, this will help other users. Please feel free to post a comment if you have a similar issue and would like additional insight from my experience.
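The answer does not show its --filemap and --branchmap files, so the Python sketch below only illustrates what they could look like for the target structure in the question: the rename and exclude rules are derived from that structure, while the branch names are hypothetical. Filemap lines use the `include`, `exclude` and `rename` directives, `rename <dir> .` moves a directory to the new root, and a branchmap line is `<old name> <new name>`. The convert extension has to be enabled first (`convert =` under `[extensions]` in your hgrc).

```python
# Renames derived from the target structure in the question.
MAIN_RENAMES = [
    ("includes/functions-related-to-abc.php", "src/functions/abc.php"),
    ("includes/functions-related-to-xyz.php", "src/functions/xyz.php"),
    ("includes/class-something.php", "src/classes/something.php"),
    ("includes/classes-several-things.php", "src/classes/several-things.php"),
    ("includes/random-file.php", "src/view/random-file.php"),
    ("development", "src/development"),
    ("index.php", "src/index.php"),
]
# The plugin moves to its own repo; the test config stops being tracked.
MAIN_EXCLUDES = ["development/a-plugin-folder", "test-config-file.php"]
PLUGIN_DIR = "development/a-plugin-folder"

# Hypothetical branch renames (old name -> new name).
BRANCH_RENAMES = {"experimental-stuff": "feature/experiments"}


def filemap_text(excludes, renames, includes=()):
    """Render filemap directives, one per line."""
    lines = [f"include {p}" for p in includes]
    lines += [f"exclude {p}" for p in excludes]
    lines += [f"rename {src} {dst}" for src, dst in renames]
    return "\n".join(lines) + "\n"


def branchmap_text(renames):
    """Render branchmap lines: '<old name> <new name>'."""
    return "".join(f"{old} {new}\n" for old, new in renames.items())


def convert_cmd(source, dest, filemap, branchmap=None):
    """Build an `hg convert` command with a filemap and optional branchmap."""
    cmd = ["hg", "convert", "--filemap", str(filemap)]
    if branchmap is not None:
        cmd += ["--branchmap", str(branchmap)]
    return cmd + [str(source), str(dest)]
```

The main repository would use `filemap_text(MAIN_EXCLUDES, MAIN_RENAMES)`, and the plugin repository `filemap_text([], [(PLUGIN_DIR, ".")], includes=[PLUGIN_DIR])`. Because the exclude rule for the plugin folder is more specific than the rename of development, the plugin's files stay out of the main repository.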

Ignore a directory when pull/push in mercurial

I've got a situation where I have a dev and QA version of a repo.
I have a directory foo, which contains code that I always want to be pushed/pulled.
I have a directory bar, which typically contains items specific to that region (what they are is irrelevant), but sometimes contains items that I do want to push/pull.
I want to be able to do any of the following:
push to QA in such a way that it takes only foo and ignores the changes in bar;
push to QA in such a way that it takes both foo and bar;
push to QA in such a way that it takes foo and some of bar.
From what I can tell, I would need to merge, revert certain files, and then commit. This is OK, but seems a bit backwards. Is there a better way to get Mercurial to support this workflow (or a similar one)?
As Chris Morgan says, that's not really the way that Mercurial works. When you push to another repository, you push a number of changesets - anything that's in those changesets gets pushed.
One possible solution, although in my opinion a very horrid workaround, would be to use the Convert extension to create a new repository, filtering out what you don't want with the --filemap option.
However, if this is going to QA, do they need a Mercurial repository at all? You could simply archive a (tagged) version. By default, the archive includes a .hg_archival.txt file recording the changeset you archived. For example:
hg archive -X "bar/**" archive.tgz
... will create a file archive.tgz containing your repo at the working directory's parent revision, without the bar directory. The -X option can be given multiple times to exclude several files or patterns.
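If QA builds are produced regularly, a small Python wrapper can build the same command with a chosen tag and any number of exclude patterns. This is a sketch: the tag and file names in the usage below are hypothetical, and it runs hg from inside the repository so that relative patterns such as bar/** and the destination path resolve there.

```python
import subprocess
from typing import Optional, Sequence


def archive_cmd(dest: str, rev: Optional[str] = None,
                excludes: Sequence[str] = ()) -> list[str]:
    """Build an `hg archive` command; each exclude pattern gets its own -X."""
    cmd = ["hg", "archive"]
    if rev is not None:
        cmd += ["-r", rev]  # e.g. a tag; default is the working directory's parent
    for pattern in excludes:
        cmd += ["-X", pattern]
    cmd.append(dest)  # archive type is inferred from the extension (.tgz, .zip, ...)
    return cmd


def make_qa_archive(repo: str, dest: str, rev: Optional[str] = None,
                    excludes: Sequence[str] = ("bar/**",)) -> None:
    """Run the archive from inside `repo` so relative patterns resolve there."""
    subprocess.run(archive_cmd(dest, rev, excludes), cwd=repo, check=True)
```

For instance, `make_qa_archive(".", "qa-1.2.tgz", rev="qa-1.2", excludes=["bar/**"])` archives a hypothetical qa-1.2 tag without bar. Passing an empty excludes list corresponds to the "take both foo and bar" case.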

Creating Mercurial subrepositories, while maintaining the history

I am about to make some major changes to my Mercurial repositories. As I am going to be using a Feature of Last Resort, I am looking for some advice and reassurance that I am not doing something stupid.
Where I Am:
I have a Mercurial repository with a complete history of all of these files:
/source
  /secret_subsystem
  /unclassified_subsystem
  /common_files
Source is the Mercurial repository.
The secret subsystem folder contains code which is intellectual property we want to keep in-house.
The unclassified subsystem folder contains code which we want to outsource to a third-party to maintain.
The common files folder contains code that both subsystems depend on. We will be keeping ownership, but we want to share it with the third-party.
Obviously, I can't just push out my whole repository to the third-party company. The third-party would see too much.
Where I Want To Be:
Having read up on subrepositories, this is where I think I need to be:
Have THREE subrepositories: secret_subsystem, unclassified_subsystem, common_files.
Ensure there are no other files at the /source level, due to this recommendation.
Have the outsourcers create a brand new repository at the source level on their machines, with two corresponding subrepositories.
Push unclassified_subsystem and common_files to the outsourcer, pulling back unclassified_subsystem changes as required and pushing out new common_files changes as required.
Maintaining History:
I would like to maintain the commit history, as much as practical, for all of the subsystems.
To do this, I will run the hg convert extension command three times, once for each subrepository. I will filter down to only the files that belong in each subrepository. I may also need to map filenames to move the files from ./common_files/foo.py to ./foo.py (for example).
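A Python sketch of that plan: one filemap per subrepository that keeps only that directory and moves it to the root (so ./common_files/foo.py becomes ./foo.py), plus the three convert commands. The destination names are hypothetical, and the convert extension must be enabled in your hgrc.

```python
from pathlib import Path

SUBSYSTEMS = ["secret_subsystem", "unclassified_subsystem", "common_files"]


def subrepo_filemap(subdir: str) -> str:
    """Keep only `subdir` and move its contents to the new repository's root."""
    return f"include {subdir}\nrename {subdir} .\n"


def write_filemaps(out_dir: Path, subdirs: list[str]) -> dict[str, Path]:
    """Write one <subdir>.filemap per subsystem and return their paths."""
    paths = {}
    for subdir in subdirs:
        path = out_dir / f"{subdir}.filemap"
        path.write_text(subrepo_filemap(subdir))
        paths[subdir] = path
    return paths


def convert_commands(source: str, filemaps: dict) -> list[list[str]]:
    """One `hg convert` per subsystem; destination names are hypothetical."""
    return [
        ["hg", "convert", "--filemap", str(fm), source, f"{subdir}-converted"]
        for subdir, fm in filemaps.items()
    ]
```

Running each command (for example with `subprocess.run(cmd, check=True)`) yields three standalone repositories, which can then be listed in the new top-level repository's .hgsub.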
My Questions:
1) Is dividing a repository into subrepositories a reasonable way of implementing security, i.e. ensuring that a third party can only see and edit some of our files?
2) Is using hg convert a reasonable way to create a subrepository from an existing repository while still maintaining the history?
3) Will hg convert's filter strip out (a) all commit messages about files NOT in the filtered repository, and (b) all diffs for files NOT in the filtered repository?
There is another implied question: am I heading into a world of hurt? If so, I will simply give up on retaining file histories, or even make them separate repositories and forget about cross-repository commits.
I've not used subrepos so far, but I can answer 2) and 3):
2) Yes, sounds reasonable.
3) Yes.
There was a similar question just 2 days ago: Convert mercurial repository to subrepositories with full history (like hg log -f)

Mercurial and Clearcase, moving existing repo to another view

We've been running out of a Mercurial repo from within a CC snapshot view successfully for some time now. We have the source repo on the view, and the team's base repo is a clone from that one. That keeps a layer of separation to make checkout-checkins in CC easier to manage.
Now, for reasons internal to where I work, we need to switch to a new view. How can we do this? There are other teams within the company checking in files directly to CC (hopefully we'll convince them away soon), so that should be a consideration.
How can I overlay our existing repo into a new view (and then I can rebase the team's base repo no problem)?
The problem is the delta that might exist between your current Mercurial repo and the new snapshot view (especially with a different config spec).
Since the OP mentions in the comments that the config spec of the new view won't change, he suggests a simpler method than the one below:
Load the new snapshot view content
remove all its files from the disk (not from ClearCase)
copy the .hg directory of the original Mercurial repo in the new (empty) view
update the working tree of said Mercurial repo (all the files are back in their original place, but detected by ClearCase as hijacked)
cleartool update -overwrite, to force ClearCase to replace those hijacked files with the versions from ClearCase (see man cleartool update)
Mercurial would then detect any change between the files restored by ClearCase and the ones managed in the repo.
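The simpler method can be sketched in Python as follows, assuming the new snapshot view is already loaded and that view.dat (ClearCase's marker file at a snapshot view root) and .hg must survive the wipe. The paths are hypothetical; test on a scratch view first, because the deletion step is destructive.

```python
import shutil
import stat
import subprocess
from pathlib import Path

# Assumption: entries at the view root that must survive the wipe.
KEEP_AT_ROOT = {".hg", "view.dat"}


def remove_loaded_files(view: Path) -> list[Path]:
    """Delete loaded files from disk only (ClearCase is untouched); keep directories."""
    removed = []
    for path in sorted(view.rglob("*")):
        rel = path.relative_to(view)
        if rel.parts[0] in KEEP_AT_ROOT or not path.is_file():
            continue
        path.chmod(path.stat().st_mode | stat.S_IWRITE)  # loaded files are often read-only
        path.unlink()
        removed.append(rel)
    return removed


def overlay_commands(view: Path) -> list[list[str]]:
    """Commands to run once the old repo's .hg has been copied into `view`."""
    return [
        # Restore the tracked files at the same revision; ClearCase sees them as hijacked.
        ["hg", "-R", str(view), "update", "--clean", "-r", "."],
        # Replace the hijacked files with the versions selected by the config spec.
        ["cleartool", "update", "-overwrite", str(view)],
        # Anything reported now is the delta between ClearCase and the repo.
        ["hg", "-R", str(view), "status"],
    ]


def overlay(old_repo: Path, view: Path) -> None:
    """Empty the view, copy in the old .hg, then restore and compare."""
    remove_loaded_files(view)
    shutil.copytree(old_repo / ".hg", view / ".hg")
    for cmd in overlay_commands(view):
        subprocess.run(cmd, check=True)
```

Updating with `-r .` keeps the working copy on the revision recorded in the copied .hg instead of moving it to a newer head.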
(Original answer)
I would:
create a branch dedicated for that migration in Mercurial,
compare its content with the snapshot view (without putting any Mercurial repo in it yet)
import and resolve the differences from the ClearCase view to the Mercurial repo
and then, once the content is identical, clone the Mercurial repo directly within the snapshot view.
The rest would be about:
fetching that dedicated migration branch to the team's base repo
merging that branch to the main development branch within team's base repo

Mercurial - merging same changeset to a repository twice?

We have these Mercurial repositories:
Trunk
|
|
|---------myapp_1_0_23 (created off release 1.0.23)
|
|---------myapp-newstuff (created off rel 2.0.4)
Release schedule (nothing yet released):
v1.0 from myapp_1_0_23; any additional changes in this repo will get merged to the trunk
v2.0 from the trunk
v3.0 or v4.0 released based on a merge of myapp-newstuff and the trunk. At the time of the merge the trunk may have v2.0 code or some new features that we'll release from the trunk as v3.0
After making changes in myapp_1_0_23, we merge them to the trunk, but let's say we also need them in myapp-newstuff, so we merge them there too. What then happens when we eventually merge the myapp-newstuff code to the trunk?
The trunk already has the changes made in myapp_1_0_23, so what happens when we merge those same changesets from myapp-newstuff back to the trunk? Will Mercurial be smart enough to know those changesets are already in the trunk?
Mercurial will handle this situation just fine, because you're using merge. When you use export/import (or transplant), which is called cherry-picking, the same change ends up in the history multiple times with different node IDs (due to different parents), so Mercurial can't know "oh, this one's already here". However, as long as you're merging, Mercurial will do a great job of saying "oh, this repo already has that changeset, so I don't need to re-apply it".
The general rule of thumb is: "Make every change with as early a parent as you can and then merge down." If I have a bug that's in versions one, two, and three, I fix it in one, merge into two, and then merge into three. If you instead fix it first in three, you have to get it into two without bringing along all the other changes in version three, which is hard and often requires the very cherry-picking we're trying to avoid.
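A toy Python model of why merging is safe here. Changesets are nodes with parent links; the names are illustrative, not real IDs. Mercurial's real merge is a three-way merge of file contents against a common ancestor, but the set arithmetic shows the key point: the fix is an ancestor of both heads, so it is already in the merge base and is not brought in again. A cherry-picked copy gets a new identity and shows up as new.

```python
# Toy history: node -> parents. Names are illustrative, not real changeset IDs.
DAG = {
    "base": [],                 # shared history
    "fix": ["base"],            # bug fix committed in myapp_1_0_23
    "t1": ["base"],             # trunk work
    "trunk": ["t1", "fix"],     # trunk merged myapp_1_0_23
    "n1": ["base"],             # myapp-newstuff work
    "newstuff": ["n1", "fix"],  # myapp-newstuff merged myapp_1_0_23 too
}


def ancestors(dag: dict[str, list[str]], node: str) -> set[str]:
    """Every changeset reachable from `node` via parent links, itself included."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dag[current])
    return seen


def incoming(dag: dict[str, list[str]], local: str, other: str) -> set[str]:
    """Changesets that merging `other` into `local` would bring in."""
    return ancestors(dag, other) - ancestors(dag, local)


def common(dag: dict[str, list[str]], a: str, b: str) -> set[str]:
    """Changesets both heads already share."""
    return ancestors(dag, a) & ancestors(dag, b)
```

Here `incoming(DAG, "trunk", "newstuff")` is `{"n1", "newstuff"}` and "fix" is in `common(DAG, "trunk", "newstuff")`. If the fix had instead been transplanted onto n1 as a new node `fix_copy`, that copy would appear in the incoming set even though its content matches "fix".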