How to work with overlapping repositories in Mercurial - mercurial

Often I want to have a main repository of source, shared by several "similar" projects. Each sub-project contains most of the same files, but is a specific configurable instance. That means there are usually a bunch of files and directories that need to be different for each instance.
In CVS I used to create the main repository and the secondary ones, then use the modules file to bind the two together for a specific name. In SVN I used svn:externals to tie back the secondary directories into the main one.
What works in Mercurial?

It depends on the nature of the specific files that need to be different.
If you can transform them as template files, then you can:
have a main repo shared as a SubRepo (as said in the documentation: SubRepos are the "closest to what you can achieve with Subversion directories marked with the svn:externals property")
have "similar" projects which will:
include that main repo as a subrepo (referencing a specific revision)
run a versioned script which will take those template files and build the actual files with the right values per environment.
That way you keep separate the templates (in the main repo) and the values (which each similar projects know about depending on their specific environment).
That being said, not every variation of files can be processed as "templates to be instantiated".

Related

Creating Mercurial subrepositories, while maintaining the history

I am about to make some major changes to my Mercurial repositories. As I am going to be using a Feature of Last Resort, I am looking for some advice and reassurance that I am not doing something stupid.
Where I Am:
I have a Mercurial repository with a complete history of all of these files:
/source
/secret_subsystem
/unclassified_subsystem
/common_files
Source is the Mercurial repository.
The secret subsystem folder contains code which is intellectual property we want to keep in-house.
The unclassified subsystem folder contains code which we want to outsource to a third-party to maintain.
The common files folder contains code that both subsystems depend on. We will be keeping ownership, but we want to share it with the third-party.
Obviously, I can't just push out my whole repository to the third-party company. The third-party would see too much.
Where I Want To Be:
Having read up on subrepositories, this is where I think I need to be:
Have THREE subrepositories: secret_subsystem, unclassified_subsystem, common_files.
Ensure there are no other files at the /source level, due to this recommendation.
Have the outsourcers create a brand new respository at the source level on their machines, and two corresponding subrepositories.
Push the unclassified_subsystem and common_files to the out-sourcer, pulling back unclassified_subsystem as required, pushing out new common_files repositories as required.
Maintaining History:
I would like to maintain the commit history, as much as practical, for all of the subsystems.
To do this, I will run the hg convert extension command three times, once for each subrepository. I will filter down to only the files that belong in each subrepository. I may also need to map filenames to move the files from ./common_files/foo.py to ./foo.py (for example).
My Questions:
1) Is dividing up a repository into subrepository a reasonable way of implementing security - viz. that a third-party can only see and edit some of our files?
2) Is using hg convert a reasonable way to create a subrepository from an existing repository, while still maintaining the history?
3) Will hg convert's filter strip out (a) all commits messages about files NOT in the filtered respository? Will it filter out all diffs for files NOT in the filtered repository?
There is another implied question: Am I heading into a world of hurt? If so, I will simply give up on retaining file histories, or even make them seperate repositories and forget about cross-repository commits.
I've not used subrepos so far, but I can answer 2) and 3):
2) Yes, sounds reasonable.
3) Yes.
There was a similar question just 2 days ago: Convert mercurial repository to subrepositories with full history (like hg log -f)

"factorizing" a mercurial repository on kiln

Summarized questions:
What is the simplest (and best) way to shift a group of files from an existing repository to a new sub repository, so those files can be integrated with other parent repositories, some of which may not yet exist?
Do files in subrepositories need to be in discrete folders, or can they exist alongside other files?
Detailed Questions:
I have begun the process of creating multiple repositories representing several projects that have shared components, and that is going well, thanks to SO and some helpful answers to my question here
As I move on to adding a second project I notice there are a few files in my projects that are duplicated, and are essentially the same thing, with enough similarity to warrant taking them out of a main project repository and creating a new subrepository so they can be
used by any new projects I begin, and
removed from other existing repositories, since they are identical.
I am assuming the best way is to simply create a new repository, move the files across on the local file system, push both repositories, and then create a .hgsub file and proceeed as in the answer to my earlier question. This would obviously then shift the files concerned to a subfolder in the local file system under each main project, which i can live with, but it does raise the hypothetical question - is it possible to have a list of files in a repository that are effectively part of a sub repository but reside alongside other files (i.e. not in a sub folder).
If I wanted to (for example) have a "acme.h" file in each project that is part of another repository could I do this? as it happens, I don't need to do this at this point in time, and in my current situation it would be better from a design point of view to have the files I need to "refactor" into another repository in their own subfolder, however that might not always be the case. I use refactor in quotes here, as strictly speaking it's more about refactoring duplicated files that is refactoring code - however the same principle applies.
hopefully my questions are succinct enough to be answered without too much more explanation.
Thanks for summary, makes it much easier to answer!
What is the simplest (and best) way to shift a group of files from an existing repository to a new sub repository, so those files can be integrated with other parent repositories, some of which may not yet exist?
You can use the convert extension to extract a directory from an existing Mercurial repository. You'll want to use the --filemap flag and in the filemap you include the directory you want and rename it to the root. See hg help convert for more info.
After you get a smaller repository with the
Do files in subrepositories need to be in discrete folders, or can they exist alongside other files?
They must be in their own folders. This is simply because that's how a repository looks like in Mercurial, Git, Subversion, ... When you're dealing with subrepositories, then Mercurial is not tracking the files inside the subrepo: it's just asking some (other) system to make a checkout of repository foo at some location.
So when your .hgsub file has
foo = foo
bar = [git]bar
baz = [svn]baz
then Mercurial will notice this on hg update and run
hg clone default-path-of-this-repo/foo foo
git clone default-path-of-this-repo/bar bar
svn checkout default-path-of-this-repo/baz baz
for your. This explains why subrepostories are directories in the outer repository: that's simply what a clone/checkout looks like these days.
As you can see, subrepositories can be of different types. It's conceivable that someone could add a RCS subrepository type for tracking individual files. They would then not have to live in a directory.

Mercurial (Hg) and Binary Files

I am writing a set of django apps and would like to use Hg for version control. I would like each app to be independent of the others so in each app there may be a directory for static media that contains images that I would not want under version control. In other words, the binary files would not all be in one central location
I would like to find a way to clone the repository that would include copies of the image files. It also would be great if when I did a merge, if there were an image file in one repo and not another, that there would be some sort of warning.
Currently I use a python script to find images and other binary files that are in one repo, but not the other. But a lot of people must face this problem, so there must be a more robust and elegant solution.
One one other thing...for reasons I do not want to go into, usually one of my repos is on a windows machine, and the other is on Linux. So a crossplatform solution would be nice.
Since Mercurial 2.0 the extension largefiles is now included in the main distribution. That extension keeps and manages large files outside of the "normal" repository in a way that you get the benefit of DCVS but without the benefit of exponential size and processing time growth.
Other extension that work along similar lines are SnapExtension and BigFilesExtension. However, those two are not distributed with Mercurial (you have to get them manually).
Mercurial can track any kind of file, for binary files if something changes then the whole file gets replaced not just the changes.
On the getting a warning if one repo doesn't contain a file, that's kind of the point of a DVCS is that the repos are related but are autonomous. You could always check and see what files were added during a synch or merge operation.
The current Mercurial book (by Bryan O'Sullivan) says, that Mercurial stores diffs also for binary files. How efficient this is, obviously depends on the nature of changes to binary files.

How can I retrieve only a subdirectory from a Mercurial Repository?

I'm trying to sell our group on using Mercurial as a source repository rather than VSS. In the process of updating our build scripts, I'm running into an issue trying to retrieve files from the Hg repository.
Our builds are automated with NAnt and currently work for local builds or builds from VSS (ie, pull the source as needed from VSS). I'm trying to update them to work with Mercurial as well.
Basically, when I'm working with single files, I don't have any issues since I can just use NAnt's 'get' task (after getting the appropriate revision hash) to retrieve the individual file.
The problem that I'm having is when I need to work with a directory (and subdirectories) of files that aren't at the root of the repository. I can't seem to figure out the proper commands to retrieve/copy a subdirectory from the repository to my 'working' directory for the builds. I've spent basically the whole afternoon trying to figure out how to do this with the mercurial executables (so I can use a NAnt 'exec' task), and have basically hit a wall so I figured I'd try posting here.
Can someone confirm whether this is possible, and provide some suggestions as to how I might be able to do this? I realize that Mercurial tracks changes by files and not directories, but it seems odd to me that this isn't available out of the box (from what I can tell).
If it's just not possible, the only workarounds I see are either maintaining NAnt fileset lists of expected files to work with (ugh!), or cloning the entire repository to a temporary directory and then just copying the files from that source as needed (this feels like a cludge to me).
I realize that I could simply create another repository for the directory that I want to work with, but I'd prefer to not go that route since I think that would increase the complexity of what I'm trying to do by a significant amount (I would have to apply this a large number of times for all of the different libraries that we build..).
Mercurial doesn't let you get only part of a repository. You have to get the whole tree. It's much more whole-repo focused than svn is.
You could try and segment your repository into multiple repos and manage them using the subrepos feature. Then you can pull the subdirectories independently.

Can one Mercurial repository live inside another Mercurial repository?

Can one hg repo live inside another hg repo on my local file system?
I am pulling down the bitbucket wiki for 'sandbox', and I want to know if this should be placed in repos/sandbox/wiki or repos/sandbox-wiki.
Is the former okay to do?
Edit: See Subrepository.
The short answer is yes, but I can't imagine why you would want to.
In your example, I think you should go with:
repos/sandbox-wiki
[edit] Additionaly:
Yo Dowg, I herd you like repositories.
So we put a repo in your repo so you can version while you version
:-)
Yes and no. Depends on what you want to do. You can create repo 'sandbox/wiki' but files in this inner repos won't be commited in the outer 'sandbox' repo (#Jason is right). If you don't want to, no problem.
Try explicitly adding files from wiki repos in sandox and you'll get the message below. If you just add path to some directory containing an inner repo the files will just be ignored.
From sandox root directoy:
hg add wiki/myfile
abort: path 'wiki/myfile' is inside repo 'wiki'
Mercurial does not allow nested repositories, but there is at least one reason for them:
Imagine that you are working in a project: /MyProject. In this folder you put everything: code, documentation, tests, etc.
You want to backup your work because it is very important, so you create a repository for /MyProject. Then, overtime you use bundles to save the evolution of /MyProject and back up them in a USB flash memory so that you can recover everything just in case your hard drive breaks.
Remember that /MyProject contains everything. And among all those things, there are the main code and some auxiliary projects. You also want to track the progress of an auxiliary project that is in /MyProject/AuxiliaryProject, so you use Mercurial to track its evolution.
Also, you want to have a separate repository for the main code: /MyProject/Main
In this situation you want nested repositories: one big one for being able to back-up everything using bundles and child repositories for managing each subproject.
I think Mercurial should give the user several options when initializing a repository. For example:
- ignore nested repositories
- include nested repositories but ignoring .Hg folders (i.e. act as if there were no nested repositories but do not ignore the information contained in the nested respositories).
- include nested repositories and also include .Hg folders (makes sense for back-up purposes)
--------- Edit:
Subrepositories is a feature that is work in progress:
https://www.mercurial-scm.org/wiki/subrepos
Also, there is an extension named "forest" that might become obsolete in the future:
https://www.mercurial-scm.org/ForestExtension
You'd need to set up an .hgignore file in sandbox to exclude wiki because mercurial assumes that it is responsible for all descendants. This would probably generate more user confusion than it is worth.