Common files in Mercurial

We have a Mercurial repository with many projects, each residing in its own directory:
- Main Repo
- Project A
- Project B
- ...
Each project is self-contained and must reside in its own directory, but there are some common files that should stay the same across projects.
For example, some projects are websites, and they share a common JavaScript library we develop. When changing the library in one project, we would like it to change in the other projects too, but the file must reside in each of the projects.
I read about sub-repos, but they don't seem like a good solution for this.
Is there a way to accomplish this in Mercurial?

You are looking for a feature that keeps the same file version in multiple places, also known as file cloning or file sharing in other source control systems such as SourceSafe or Vault. There is no mechanism like this in Mercurial. Every file is a single entity with a single location.
The first solution is to keep the common libraries in a separate place. You need a single copy that can be accessed by all your projects. It does not matter whether you use sub-repos: everything can live in the same repo as long as your folder structure includes it all, but sub-repos can be easier to manage if your projects are not related.
The other solutions would be to set an internal policy to always sync and commit the common libraries manually (which I do not suggest, as it is error-prone and requires effort), or to create a script, run as a hook or by hand, that syncs your files before a commit or after an update (which is more tedious to establish and maintain).
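For what it's worth, the script option could look roughly like the sketch below. It is only an illustration: the function name `sync_common_file` and the idea of treating one copy as the master are assumptions, not anything Mercurial provides.

```python
import filecmp
import shutil
from pathlib import Path


def sync_common_file(master, copies):
    """Copy `master` over every path in `copies` whose content differs.

    Returns the list of copies that were actually (re)written.
    """
    master = Path(master)
    updated = []
    for copy in map(Path, copies):
        # shallow=False compares file contents instead of trusting
        # size and modification time alone.
        if copy.exists() and filecmp.cmp(master, copy, shallow=False):
            continue
        copy.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(master, copy)
        updated.append(copy)
    return updated
```

A script like this could be run by hand or wired into a Mercurial hook (for example a `precommit` entry in the `[hooks]` section of the repository's hgrc), but someone still has to decide which copy is the master, which is part of why this route takes more upkeep.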
In conclusion, go for separating out your common libraries. You'll be glad you spent the extra time to set everything up correctly from the start.

Under Unix you could use a soft link (ln -s) for shared files, and Mercurial will detect, save and create them. Just don't use an absolute or empty path.
On Windows, symbolic links won't work; see:
- Tracking hard or symbolic links with mercurial on Windows
- Bug 1825 - junction/parse point for windows directory symlinks
In my experience (local Linux repository), using symlinks to handle shared files works, but it's usually better to create a library that contains the common files.
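As a rough sketch of the relative-link advice above (Unix only; the helper name `link_shared_file` and the example layout are made up for illustration), the link target can be computed relative to the directory that holds the link:

```python
import os
from pathlib import Path


def link_shared_file(shared, link):
    """Create `link` as a relative symlink pointing at `shared`.

    Returns the relative target string stored in the link.
    """
    shared, link = Path(shared), Path(link)
    # Compute the target relative to the directory holding the link, so
    # the link stays valid wherever the repository is cloned.
    target = os.path.relpath(shared, start=link.parent)
    link.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(target, link)
    return target
```

For a layout like common/lib.js and ProjectA/js/lib.js, the stored target would be ../../common/lib.js, which is the kind of path that survives a clone to a different location.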

Even if you have one repository for all your projects, it is advisable to keep the common code in one or more separate library or tool repositories.
How you then "use" this code inside your projects depends heavily on your technology and infrastructure: the Java/Maven/Ant world, Linux distribution packages, Ruby gems, etc. You will generally have some kind of "dependency specification" language in which you declare that you need such and such a library, for example a Gemfile for Rails or autoconf for C/C++. Most of the time you can also require a specific version (or a minimum version, etc.), which helps you deal with API changes.
Basically, it is not advisable to solve this issue at the SCM level; instead, use the right framework to decouple your common code from the project repositories.

Related

Storing PCB files *and* software files in the same Mercurial repo

I have my Banana Pi set up as my Mercurial server. It works well for my software since, generally speaking, I have firmware and that's about it in my repositories. I can access it via OpenVPN from anywhere in the world. However, I have now started to use version control for my PCB files as well, due to a new CAD system which complicates my old, crude but effective way of doing my PCB archiving and backup. (Also, everything in my new CAD system, all the PCBs and schematics, are text files, which makes version control work nicely.)
So, with Mercurial I started doing as I did with software and creating a new repo for my PCB for one of the boards I'm updating for a customer, and immediately came across an issue that svn seems to cope with easily and I was wondering whether Mercurial can do the same.
I have my BH0001 project repository which has all the embedded C in it and I have started creating a new issue of the PCB for which the C code is used. I had to create a new Mercurial repo called BH0001_pcb to differentiate between code and PCB. With svn you can have a project repo and then Hardware and Software directories within the project number, but still be able to check out the two different types of files to different places independently.
I could, of course, clone the BH0001 software repository to a local machine, add the PCB info in a new folder in the local Mercurial repo, and send it all back to the server, and it would be perfectly happy. The problem then comes when checking out, because I would be cloning both firmware and PCB onto a machine when I might only want one or the other.
Also, this goes against how I store stuff locally. In my /username/home directory I have a Software directory and a CAD directory and within those I have projects. So I would have:
home/CAD/CustomerName/BH0001
and
home/Software/CustomerName/BH0001.
If I'm to carry on using my current method, do I have to:
- Change my local directory structures to be something like home/Projects/CustomerName/BH0001/CAD and home/Projects/CustomerName/BH0001/Software?
- Suck it up and use things like ProjectName_pcb for separate repos?
- Find some other way I can't think of, can't find, or am unaware of? E.g. is there a way of checking out part of a Mercurial repository to one directory and a different part of the repo to a different directory?
Or should I just use svn if I really want to carry on as I have?
With stock Mercurial you currently cannot do partial repository clones as you can with SVN. So your approach of using separate repositories is a good choice.
However, there are ways to achieve a similar result: sub-repositories. In your case I'd create a parent repository which contains your two current repositories as sub-repositories. Mind, though, that sub-repositories have some rough edges, so read the linked page carefully. I'd especially like to stress that it's good practice to have a parent repo which basically only contains the 'real' repos but not much of its own.
There exist ideas like a thin or narrow clone (which is somewhat identical to what SVN does), but I haven't seen them in production.

Advice on the structure of my repository

I am an applied mathematician and I have recently joined a project that involves the development of production code for our scientific application. The code base is not small and it's deployed as part of a web application.
When I joined, the code was miraculously maintained without a revision control system. There was a central folder in a server and researchers would copy from it when they needed to work with the code. Inside this root directory there was a set of directories with different versions of the code, so people would start working on the latest version they found and create a new one with their modifications.
I created a Mercurial repository, added all code versions to it and convinced everyone to use it. However, since moving to Mercurial, we have felt little if any need to upgrade version numbers, even though using hg copy allows us to keep revision history.
Here's where I need your advice on best practices for maintaining this code base. Does it make sense under an RCS to keep folders with different versions in a repo? If we keep a single copy of our code in the repo, what's the most common way to track versions? The README files? Should we keep snapshots of the code outside the repo specifying versions? Does it make sense to keep things as they are? What strategies do you use?
Our team is a bunch of scientists and no one has experience on how to maintain such a repo, so I'm interested in what is commonly done.
If you are going to use a version control system, forget about those version folders. Completely. Mercurial will do that for you: the repository is a complete history of all files of the project.
A common way to track version numbers is with tags. You assign a tag with the version number to a changeset.
To help you, as a "getting started guide" in version control, I suggest this book: Version Control By Example. It's free, and it starts from the beginning, it talks about CVCS, DVCS, fundamentals, what a repository is, basic commands, etc. It has also some interesting analogies, like the 3D file system: Directories x Files x Time. The book is fun and easy to understand, I highly recommend it.
I also recommend some GUI software like TortoiseHg. In daily usage I spend most of my time in the console, but the GUI is very handy, especially in the beginning when you still don't know all the commands. And the best part is the graph: you get visual feedback on what is going on.
This is a good and quick introduction to Mercurial, it even starts out by talking about how using folders to keep different versions is not so great.
I think you're probably on the wrong track if you are using the hg copy command, I've never needed it ;)
The tutorial teaches the command line version of hg, which I personally prefer. When you need a better overview of your repository, you can run "hg serve" and open localhost:8000 in your web browser. I prefer that over TortoiseHg, but I realize that many people want a pure GUI tool.

How to cleanly handle source code and data in a repository

I'm working on a collaborative scientific project that is made up of a handful of Python scripts (1 MB max) and a relatively large dataset (1.5 GB). The datasets are tightly linked to the Python scripts, since the datasets themselves are the science and the scripts are a simple interface to them.
I'm using Mercurial as my source control tool, but I am not clear on a good mechanism to define the repository. Logistically it makes sense to bundle these together so that by cloning the repository you'd get the entire package. On the other hand, I'm concerned about the source control tool dealing with large amounts of data.
Is there a clean mechanism to handle this?
If the data files change rarely and you normally need all of them anyway, then just add them to Mercurial and be done with it. All your clones will be 1.5 GB, but that is just the way it has to be with that amount of data.
If the data is binary and changes often, then you might try to avoid downloading all the old data. One way to do this is to use a Subversion subrepository. You will have a .hgsub file with
data = [svn]http://svn.some.edu/me/ourdata
which tells Mercurial to make a svn checkout from the right-hand side URL and put the Subversion working copy into your Mercurial clone as data. Mercurial will maintain an additional file for you called .hgsubstate, in which it records the SVN revision number to checkout for any given Mercurial changeset. By using Subversion like this, you only end up with the latest version of the data on your machine, but Mercurial will know how to get older versions of the data when needed. Please see this guide to subrepositories if you go down this route.
There is an article on the official wiki about large binary files. But the proposal from @MartinGeisler is a really nice new alternative.
My first inclination is to separate the Python scripts out into their own repository, but I really need more domain information to make the "right" call.
On the one hand, if new datasets will be created, then you would want a core set of tools able to handle all of them, right? But I can also see how new datasets may introduce cases that the scripts have not previously handled... although it seems that in an ideal world you would want scripts written in a general way so they can handle both future and existing datasets?

Perforce like client specs mappings with Mercurial

We recently moved from Perforce to Mercurial and love it!
One little problem: after much research we can't figure out how to map a special directory in the repository to some special place on the client. Here is an example of our hg repo:
/foo/source files
/bar/source files
/build
    /macosx/mac make files
    /win/windows make files
With Perforce, we were using client spec mappings to map //depot/build/macosx/... to just /build/... on the Mac client, and //depot/build/win/... to /build/... on the Windows dev box. Directories foo and bar are synced as is. Makefiles in /foo and /bar assume that our build makefiles are located in /build and we would like to keep them as is. The final client set of files should look like this:
/foo/source files
/bar/source files
/build/client specific make files
I've read about subrepos, but this solution does not seem to be client specific.
Any idea how to solve this problem will be very much appreciated!
You can't check out only portions of a repository with Mercurial.
You always get a clone containing everything, and the working directory will also contain everything.
With Mercurial you should strive to have 1 repository for 1 project, so that everything you get logically belongs together, and then you shouldn't have much need for just a portion of it.
This also means that whatever directory structure you have in your Mercurial repository will always match exactly the structure you have on disk.
You can't do this with Mercurial as it doesn't have the concept of a client separate from a depot.
However, you can use a symlink on Mac OS X (ln -s) and a junction on Windows (mklink on Vista and up, or the junction tool on XP: http://technet.microsoft.com/en-us/sysinternals/bb896768.aspx) to solve this problem at the file system level.
Alternatively, you can use a variable in the Makefiles to refer to the build directory (e.g. $(BUILD)/something.ext instead of build/something.ext).
This sort of mapping cannot be done in Mercurial. There is an outstanding TODO item for 'narrow' clones so you can check out just a subdirectory. And I could see an implementation of that supporting that sort of functionality. But then again, I know that something like this would be considered a little too 'clever' (read complex) and there would be a lot of push-back on the idea.
In the meantime, I would suggest one of these two solutions.
- Symbolic links. Put the symbolic link to your build directory in your .hgignore file. Then each person can make their own symbolic link to the appropriate directory of build files. This has the disadvantage of not working on a platform without symbolic links.
- An environment variable that's used in a top-level makefile to construct the path to the platform-specific makefile it should be calling.
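The environment-variable idea boils down to a small lookup. Here it is sketched in Python rather than make syntax, purely to show the logic; the variable name `BUILD`, the `PLATFORM_DIRS` mapping and the function `build_dir` are all assumptions for illustration:

```python
import os
import sys
from pathlib import Path

# Hypothetical mapping from Python's sys.platform values to the
# platform directories named in the question.
PLATFORM_DIRS = {"darwin": "macosx", "win32": "win"}


def build_dir(repo_root, platform=sys.platform, env=os.environ):
    """Return the directory holding the makefiles for this client."""
    # An explicit BUILD variable (assumed name) wins over auto-detection.
    override = env.get("BUILD")
    if override:
        return Path(override)
    if platform not in PLATFORM_DIRS:
        raise ValueError(f"no build directory known for {platform!r}")
    return Path(repo_root) / "build" / PLATFORM_DIRS[platform]
```

The makefiles in foo and bar would then refer to whatever this resolves to instead of a hard-coded /build, so the repository layout can stay the same on every client.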

Mercurial (Hg) and Binary Files

I am writing a set of Django apps and would like to use Hg for version control. I would like each app to be independent of the others, so in each app there may be a directory for static media that contains images that I would not want under version control. In other words, the binary files would not all be in one central location.
I would like to find a way to clone the repository that would include copies of the image files. It would also be great if, when I did a merge, there were some sort of warning whenever an image file exists in one repo but not the other.
Currently I use a python script to find images and other binary files that are in one repo, but not the other. But a lot of people must face this problem, so there must be a more robust and elegant solution.
One other thing: for reasons I do not want to go into, usually one of my repos is on a Windows machine and the other is on Linux. So a cross-platform solution would be nice.
Since Mercurial 2.0, the largefiles extension has been included in the main distribution. That extension keeps and manages large files outside of the "normal" repository, so you get the benefits of a DVCS without the cost of exponential growth in size and processing time.
Other extensions that work along similar lines are SnapExtension and BigFilesExtension. However, those two are not distributed with Mercurial (you have to get them manually).
Mercurial can track any kind of file. For binary files, if something changes then the whole file gets replaced, not just the changes.
As for getting a warning when one repo doesn't contain a file: the point of a DVCS is that the repos are related but autonomous. You could always check which files were added during a sync or merge operation.
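If you stay with a script for that check, a minimal cross-platform sketch might look like the following. The extension list and the names `media_files` and `missing_media` are assumptions for illustration; it compares two working copies by relative path and ignores Mercurial's own .hg directory:

```python
from pathlib import Path

# Assumed set of extensions to treat as binary media; adjust as needed.
MEDIA_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".ico"}


def media_files(root):
    """Return media file paths under `root`, relative to it, as POSIX strings."""
    root = Path(root)
    found = set()
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        if ".hg" in relative.parts:
            continue  # skip Mercurial's own metadata directory
        if path.is_file() and path.suffix.lower() in MEDIA_SUFFIXES:
            # as_posix() gives the same string on Windows and Linux.
            found.add(relative.as_posix())
    return found


def missing_media(repo_a, repo_b):
    """Return (only_in_a, only_in_b) as sorted lists of relative paths."""
    a, b = media_files(repo_a), media_files(repo_b)
    return sorted(a - b), sorted(b - a)
```

Normalising to forward-slash paths is what lets a Windows working copy and a Linux one be compared directly.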
The current Mercurial book (by Bryan O'Sullivan) says that Mercurial also stores diffs for binary files. How efficient this is obviously depends on the nature of the changes to the binary files.