I am an applied mathematician and I have recently joined a project that involves the development of production code for our scientific application. The code base is not small and it's deployed as part of a web application.
When I joined, the code was miraculously maintained without a revision control system. There was a central folder on a server and researchers would copy from it when they needed to work with the code. Inside this root directory there was a set of directories with different versions of the code, so people would start working on the latest version they found and create a new one with their modifications.
I created a Mercurial repository, added all code versions to it and convinced everyone to use it. However, since moving to Mercurial, we have felt little if any need to upgrade version numbers, even though using hg copy allows us to keep revision history.
Here's where I need your advice on best practices for maintaining this code base. Does it make sense under an RCS to keep folders with different versions in a repo? If we keep a single copy of our code in the repo, what's the most common way to track versions? The README files? Should we keep snapshots of the code outside the repo specifying versions? Does it make sense to keep things as they are? What strategies do you use?
Our team is a bunch of scientists and no one has experience maintaining such a repo, so I'm interested in what is commonly done.
If you are going to use a version control system, forget about those version folders. Completely. Mercurial will do that for you: the repository is a complete history of all files of the project.
A common way to track version numbers is with tags. You assign a tag with the version number to a changeset.
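As a rough sketch of what tagging looks like in practice: the helper below only builds the `hg tag` and `hg update` command lines, so it can be read or printed without a repository at hand. The version string `1.2.0` and the function names are made up for illustration; `-r` and `-m` are the usual `hg tag` options for picking a changeset and setting the commit message.

```python
import shlex


def tag_command(version, rev=None, message=None):
    """Build the `hg tag` command line that labels a changeset with a version."""
    cmd = ["hg", "tag"]
    if rev is not None:
        cmd += ["-r", rev]       # tag this changeset instead of the working directory's parent
    if message is not None:
        cmd += ["-m", message]   # commit message for the changeset that records the tag
    cmd.append(version)
    return cmd


def checkout_command(version):
    """Build the command that updates the working directory to a tagged version."""
    return ["hg", "update", "-r", version]


if __name__ == "__main__":
    # Hypothetical release number; the printed commands are meant to be run inside the repository.
    print(shlex.join(tag_command("1.2.0", message="Release 1.2.0")))
    print(shlex.join(checkout_command("1.2.0")))
```

Running `hg tags` afterwards lists the tags that exist, so the version history lives in the repository rather than in folder names.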
To help you, as a "getting started guide" to version control, I suggest this book: Version Control By Example. It's free and it starts from the beginning: it talks about CVCS, DVCS, fundamentals, what a repository is, basic commands, etc. It also has some interesting analogies, like the 3D file system: Directories x Files x Time. The book is fun and easy to understand; I highly recommend it.
I also recommend some GUI software like TortoiseHg. In daily usage, I spend most of the time in the console, but the GUI is very handy, especially in the beginning when you still don't know all the commands. And the best part is the graph: you get visual feedback on what is going on.
This is a good and quick introduction to Mercurial, it even starts out by talking about how using folders to keep different versions is not so great.
I think you're probably on the wrong track if you are using the hg copy command, I've never needed it ;)
The tutorial teaches the command line version of hg, which I personally prefer. When you need a better overview of your repository, you can run "hg serve" and open localhost:8000 in your web browser. I prefer that over TortoiseHg, but I realize that many people want a pure GUI tool.
I have my Banana Pi set up as my Mercurial server. It works well for my software as, generally speaking, I have firmware and that's about it in my repositories. I can access it via OpenVPN from anywhere in the world. However, I have started to use version control for my PCB files as well now, due to a new CAD system which complicates my old, crude but effective way of doing my PCB archiving and backup. (Also, everything in my new CAD system, all the PCBs and schematics, are text files, which makes version control work nicely.)
So, with Mercurial I started doing as I did with software and creating a new repo for my PCB for one of the boards I'm updating for a customer, and immediately came across an issue that svn seems to cope with easily and I was wondering whether Mercurial can do the same.
I have my BH0001 project repository which has all the embedded C in it and I have started creating a new issue of the PCB for which the C code is used. I had to create a new Mercurial repo called BH0001_pcb to differentiate between code and PCB. With svn you can have a project repo and then Hardware and Software directories within the project number, but still be able to check out the two different types of files to different places independently.
I could, of course, clone the BH0001 software repository to a local machine, add the PCB info in a new folder in the local Mercurial repo, send it all back to the server, and it would be perfectly happy. The problem then comes when checking out, because I would be cloning both firmware and PCB onto a machine when I might only want one or the other.
Also, this goes against how I store stuff locally. In my /username/home directory I have a Software directory and a CAD directory and within those I have projects. So I would have:
home/CAD/CustomerName/BH0001
and
home/Software/CustomerName/BH0001.
If I'm to carry on using my current method, do I have to:
1. Change my local directory structures to be something like:
home/Projects/CustomerName/BH0001/CAD
and
home/Projects/CustomerName/BH0001/Software
2. Suck it up and use things like ProjectName_pcb for separate repos.
3. Find some other way I can't think of/can't find/am unaware of? e.g. a way of checking out part of a Mercurial repository to one directory and a different part of the repo to a different directory.
Or should I just use svn if I really want to carry on as I have?
With stock Mercurial you currently cannot do partial repository clones as you can with SVN. So your approach of using separate repositories is a good choice.
However, there are ways to achieve a similar result: sub-repositories. In your case I'd create a parent repository which contains your two current repositories as sub-repositories. Mind though, sub-repositories have some rough edges, so read the linked page carefully. I'd especially like to stress that it's good practice to have a parent repo which basically only contains the 'real' repos but not much on its own.
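For illustration, a parent repository declares its sub-repositories in a `.hgsub` file, one `path = source` line each. The sketch below just renders that file's content; the server URL is a placeholder, and the `Software`/`CAD` folder names are borrowed from the question rather than anything Mercurial requires.

```python
def render_hgsub(subrepos):
    """Render .hgsub content: one 'local/path = source' line per sub-repository."""
    return "".join(f"{path} = {source}\n" for path, source in sorted(subrepos.items()))


if __name__ == "__main__":
    # Placeholder URL; substitute wherever the two existing repositories really live.
    base = "https://hg.example.invalid"
    print(render_hgsub({
        "Software": f"{base}/BH0001",
        "CAD": f"{base}/BH0001_pcb",
    }), end="")
```

Each sub-repository is cloned into its path inside the parent first, then `.hgsub` is added and committed in the parent. Both pieces can still be cloned on their own from their original locations.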
There are ideas like a thin or narrow clone (which is roughly what SVN does), but I haven't seen them in production.
At work we're moving from no SCM to Mercurial. It's a bit of a learning curve, but after messing with it for two days I definitely feel more comfortable with it.
I still have one big, unresolved question though in my mind: Once code is finished, how do we handle the actual deployment?
Should we be running a copy of Mercurial on the production (live) server? Or should we set rsync or something up to sync from the repo to the web directory? What's the best practice here?
If we do go w/ just pointing apache to the repo, I assume this is okay as long as we're careful not to hg update to a different, non-stable branch? That still seems a little dangerous to me though. Is there some way to force it to only switch to certain builds?
Or is pointing apache to the repo just a terrible idea and I should be doing something else instead?
On a related topic, I've also heard some talk about putting any upgrade scripts (such as schema changes for MySQL) under version control so they can be run when the version is deployed. But how would that even work as part of the workflow? I wouldn't want to keep it w/ everything else, because it's a temporary one-time use script...
Thanks for any advice you guys can give.
I recently discovered the hg archive command, so I think we'll go w/ this instead. I've written a bash script that changes to the head of the 'production' branch then archives it to a predetermined destination. Seems to work.
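A minimal sketch of that kind of script, written in Python rather than bash and assuming a named branch called `production`: `hg archive -r` accepts a branch name and exports that branch's tip, so the working directory does not have to be updated first. The destination path and function names are placeholders.

```python
import shutil
import subprocess


def archive_command(dest, rev="production"):
    """Build the `hg archive` command that exports the files of one revision to dest."""
    return ["hg", "archive", "-r", rev, dest]


def deploy_snapshot(repo_dir, dest, rev="production"):
    """Run `hg archive` inside repo_dir; raises if hg is missing or the command fails."""
    if shutil.which("hg") is None:
        raise RuntimeError("the hg executable was not found on PATH")
    subprocess.run(archive_command(dest, rev), cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Placeholder destination for illustration only; nothing is executed here.
    print(archive_command("/var/www/myapp"))
```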
I'd still appreciate any feedback you guys have as to whether this is a good idea or not.
I think pointing apache to the repo is definitely a bad idea; hg archive is OK if all you want is to take a snapshot of the dev files.
I find my development source files and a deployed application (even for a web app that doesn't need compiling) are usually very different, the latter being derived from a subset of the former.
I tend to use a shell script or even a Makefile to "build" a deployed application in a subdirectory of the development directory; this could just be creating a directory tree and copying necessary files, or could include compressing scripts etc.
This way you have to make a conscious decision whether or not to include a file in the deployed version, thus helping prevent accidentally leaving development utility files in an online application that could cause a security risk.
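A rough sketch of such a build step, assuming an explicit list of files to ship. The file names and the output directory are invented for the example; the point is only that nothing reaches the deployed tree unless it is listed.

```python
import shutil
import tempfile
from pathlib import Path


def build_deploy_tree(source_dir, build_dir, manifest):
    """Copy only the files named in manifest from source_dir into a fresh build_dir."""
    source_dir, build_dir = Path(source_dir), Path(build_dir)
    if build_dir.exists():
        shutil.rmtree(build_dir)    # start clean so stale files never ship
    copied = []
    for relative in manifest:
        src = source_dir / relative
        if not src.is_file():
            raise FileNotFoundError(f"listed in manifest but missing: {relative}")
        dst = build_dir / relative
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(relative)
    return copied


if __name__ == "__main__":
    # Throwaway development tree with hypothetical files, one of them dev-only.
    with tempfile.TemporaryDirectory() as tmp:
        dev = Path(tmp) / "dev"
        (dev / "lib").mkdir(parents=True)
        (dev / "index.php").write_text("<?php // app entry point\n")
        (dev / "lib" / "db.php").write_text("<?php // database helpers\n")
        (dev / "debug_dump.php").write_text("<?php // dev-only tool\n")
        # debug_dump.php is not in the manifest, so it stays out of the build.
        print(build_deploy_tree(dev, dev / "build", ["index.php", "lib/db.php"]))
```

Keeping the manifest itself under version control means each change to what gets deployed shows up in the history like any other change.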
The only part Mercurial plays is this: for a major release I create a new named branch (eg: 1.5), while development continues on the default branch. Subsequent bug fixes or patches can be transplanted to the release branch if necessary, and if a bug fix release is made I tag the release branch with the new version (eg: 1.5.1).
I've been an hg user for a couple of years and I'm happy with it!
I'm about to start a kind of project I've never done before.
The idea is to develop software with a batch mode and a GUI.
So there will be sources common to both the batch and GUI modes, but each one will also contain specific sources.
And, basically, I would like my coworkers to be able to clone the GUI version, work on it and commit changes.
Then, I'd like to be able to merge their changes on the common files with the batch version.
How can I deal with that?
Since I've been reading a bit on this topic, I would really appreciate any help!!
Thank you.
binoua
As the creator of subrepos, I strongly recommend against using subrepos for this.
While subrepos can be used for breaking up a larger project into smaller pieces, the benefits of this are often outweighed by the additional complexity and fragility that subrepos involve. Unless your project is going to be really large, you should just stick to one project repo for simplicity.
So what are subrepos for, then? Subrepos are best for managing collections of otherwise independent projects. For instance, let's say you're building a large GUI tool that wraps around an existing SCM. I'd recommend you structure it something like this:
scm-gui-build/ <- master build repo with subrepos:
  scm-gui/ <- independent repo for all the code in your GUI tool
  scm/ <- repo for the third-party SCM itself
  gui-toolkit/ <- a third-party GUI toolkit you depend on
  extensions/ <- some third-party extension to bundle
    extension-foo/
Here you do all your work in a plain old repo (scm-gui), but use a master repo at a higher level to manage building/packaging/versioning/tagging/releasing the whole collection. The master scm-gui-build repo is just a thin wrapper around other normal repos, which means that if something breaks (like one of the repos' URLs going offline) you can keep working in your project without problems.
(see also: https://www.mercurial-scm.org/wiki/Subrepository#Recommendations)
I'm using Mercurial with TortoiseHg. Each developer has their own repositories, and there's one central repository on the server for synchronizing our changes. (This will sound lame, but we're using it to manage the source for a legacy VB6 project. Nothing we can do about that...)
As has been pointed out elsewhere, there is a big problem in VB6 with merging the .frx (form resources) files. So code changes seem to merge fine, but if two developers both make changes at the same time in the form design view, we can't merge.
I'm ok with disallowing concurrent edits, but of course the whole point of Mercurial is that it's distributed so there is no option to force a file to be locked before editing. I don't believe there's a Mercurial solution for this, so I'm wondering: other developers who are using Mercurial for version control, do you have some 3rd party tool that assists with locking files for editing in the cases where it's necessary? Did we make a mistake using Mercurial instead of something like SVN?
Heard of some people using a standalone lock-server (this one in particular).
This is from Bryan O'Sullivan's book on Mercurial:
"There is no single revision control tool that is best in all situations. As an example, Subversion is a good choice for working with frequently edited binary files, due to its centralised nature and support for file locking."
I'm in the process of trying to get my head round a DVCS such as Mercurial. I'm getting quite confused with certain points though. Firstly, a bit of context:
At the minute I mostly use Subversion, and it works fine for my workflow:
Mostly the repository is for my own use. I'm the only web developer, and I only ever submit raw code to my manager; he never has to see the repository.
I use the repo to create major versions, and as a backup so I can revert to it when something doesn't work out.
The repo also acts as a file share, enabling me to work from the same codebase at work and at home.
My main reason for wanting to switch to Mercurial is the offline commits and easier branching / merging.
Firstly, can anyone tell me how I would get Mercurial to fit this workflow?
How do I go about sharing multiple repositories (i.e. one for each project) between computers?
Any help would be hugely appreciated,
Thanks
http://hginit.com/
There is a fantastic pre-chapter there specifically for SVN users. The rest of the tutorial will get you on your feet fairly quickly.
I'll answer just one part of your question, that of how to manage access to your repository from both home and work, because this is one of the situations where distributed version control is really useful.
The answer is that your two repositories are clones of one another (to be correct, one is the clone of the other). You do some work during the day, check it in, then pull that work to your home repository (or push, but that requires more work). The next morning, you do the same thing in reverse. Mercurial comes with a built-in read-only HTTP server that makes it really easy, provided that you can expose a port.
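As a sketch of that daily routine: `hg serve` runs on the machine that has the newer work, and the other machine pulls from it. The host name `work-pc` is a placeholder; port 8000 is the default for `hg serve`. The helpers only build the command lines and do not run anything.

```python
import shlex


def serve_command(port=8000):
    """Command for the machine holding the newer changesets: start the built-in server."""
    return ["hg", "serve", "-p", str(port)]


def pull_command(host, port=8000):
    """Command for the other machine: pull new changesets and update the working directory."""
    return ["hg", "pull", "-u", f"http://{host}:{port}/"]


if __name__ == "__main__":
    # "work-pc" is a hypothetical host name; both commands are run inside the repository.
    print(shlex.join(serve_command()))
    print(shlex.join(pull_command("work-pc")))
```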
The end result is that you have two repositories (ie, automatic backup of the entire history). At any given point in time, one is "better" than the other, but since you're the sole committer to both, they won't diverge.