Setting up versioning repositories - one big or many small? - mercurial

I have worked with mercurial for some time now and it feels like a big asset to what I do. It's great to never mess up code again by accident.
I love the workflow and wants to setup versioning on all my projects. What is the best option of the two below illustrated alternatives?
Alternative A:
/repo/
/repo/ownWork/
/project1/
/repo/clients/
/client1/
Alternative B:
/repo/project1/
/repo/client1/

I don't think there is a "right" answer. As with many things, it depends.
Personally, I have a separate repository for each project and, possibly, one or more repositories for shared code. With distributed source control you have to check out/clone the whole repository, not just sub-folders like you can with, say, SVN. Therefore I like to keep each project/client as self contained as possible but, if necessary, clone shared repos too.
However, I still maintain a single 'central' web server to host them all. I like 'distributed' and I like 'centralised' too :-)
The good thing about hg is that it seems (to my newbie eyes, anyway) to be very easy to chop and change your layout/structure as time progresses.

In mercurial it's very easy to combine repos later, but not possible to separate them without invalidating existing clones. Start separate and merge later if it has become a hassle. Consider subrepos for code shared among projects.

Related

Mercurial managing subtle variations/configurations of a same project

I am currently using Mercurial, along with the Guestrepo extension, to manage and version the different components of a project. I have come to a quite stable workflow to manage the different versions of the components.
However, I can't come up with an effective solution when it comes to versioning subtle variations of a component. This is, for example, a slightly different embedded device driver (different serial port speed for example), or a GUI which is written in English instead of German.
I don't think stacking them in the Release/Stable branch is a good workflow, as the proliferation of different configurations (English,Spanish,Chinese,...) could lead to a serious and nonsense bloat of the Release branch.
On the other hand, creating a separate Release branch for each, would lead me to many, many branches, which is not the best solution IMHO.
Creating separate repositories for each of the configurations would suppose a quite tedious task whenever a structural change had to be made, as all of the repos would have to be updated with that change.
Any idea on this?
Thank you.
As #EldadAK suggests, creating a repo for each configuration and importing core functionalities from other repos seems a nice idea.
However, I still can't figure out how to arrange "same but slightly different" components, which differ in some subtle features but share their core.
Is it a code architecture issue? Should the components be refactored so that the main core and the differing features lay in different components, which are related using custom builds for each configuration?
IMHO, you should keep all changes in your main branch. Managing all branches or even multiple repositories is not scalable and will eventually get out of hand.
I don't think you should care about the size of your release branch. By keeping it all together, you will always know where you are, what is included in a release and when you really need to branch, have all the changes accumulated to that point.
It's my personal opinion that you should try and keep it simple to manage looking many revisions and years ahead...
Note - Project managers tend to think about next week. You need to think about next year...
I hope this helps.
effective solution when it comes to versioning subtle variations of a component
While I can't see any serious drawbacks from using named branches in one repo, you can use another ("default" de-facto) solution for configuration management inside Mercurial: MQ
You have only slightly adopt your workflow (same amount of branches, same amount of repos) to MQ

Monolithic vs multiple Mercurial repos for modules within a suite of related applications

At my organization, we'd like to switch from using CVS to Mercurial for a variety of reasons. We've done a lot of investigation when trying to determine how we should organize our Hg repositories based on what we have in our codebase and how we tend to work. We've come up with satisfactory answers to most of our questions, but there's one point that's stumping us a little, and it's driving me mad because the most convenient way to organize the repo for our workflow just seems like the wrong way to go from a conceptual standpoint. I'm trying to figure out whether our perception of how this is "supposed" to work is wrong, or whether we're just bumping up against a legitimate feature gap in the available tooling.
Primarily we maintain a medium sized codebase consisting of a suite of applications that get all get released in the same package. Conceptually you can divide our code into three categories:
Shared code
Application code for our primary suite (uses the shared code)
Miscellaneous small utilities that are infrequently maintained (uses the shared code)
This doesn't seem unusual to me, but I want to stress the point that we maintain the application code and the shared code at the same time and always want them to be bleeding edge with respect to each other. That is, we want all our application builds to always use the latest version and the same version of the shared code. We frequently add to or modify application code and shared code at the same time. Currently, the shared code is in one CVS module, and the applications are all in their own separate modules. The shared code and application modules are checked out such that the shared code gets built once and then linked into each application. We frequently do cvs commits that include changes across shared and application modules at once. We would really like to keep that ability.
I understand that commits in Hg are atomic within repositories -- that's fine but I'd like to be able to diff and commit to an application and a shared library at the "same time" (i.e. I don't care if it's really atomic but I don't want to have to manually do two separate diffs and two separate commit actions).
Conceptually, it seems like it would be correct to have one or a few repos for the shared code, and a separate repo for each application and each little utility program. This means you'd need to check out multiple repos for each build but that isn't a problem for us. The problem is there doesn't seem to be any tooling that lets you view updates or changes on multiple repos at once, or diff multiple repos at once and then sequentially commit them for you. This would be easy to script, but that wouldn't help developers who want to use various GUI frontends to complement the command line.
It seems like in order to be able to commit across multiple codebases at once (from a user's perspective) and keep everything on the bleeding edge together, the only two solutions are:
Use a monolithic repo with EVERYTHING in it.
Create some subrepos but access/commit everything through a big monolithic "main" repo that contains all the subrepos to keep everything on the latest revisions (which doesn't seem any better than (1) to me).
It can't be that unusual to want to work with multiple "peer" repositories at the same time, even if the actions aren't truly atomic across all of them -- and yet I'm not finding tons of articles or posts clamoring for this ability.
In summary:
We would like to organize our code such that we can diff and commit application code and shared code at the same time from the user's perspective (they need not truly be atomic).
It seems like we should be putting application code and shared code in separate repositories.
Subrepositories tie parent repositories to specific revisions, which we do not want.
What am I missing here?
In my shop, we have many projects that are simply in separate repos, but the main application's repo has 2 other projects in it. One is a module that shares a significant amount of code with the main application, and the other is for database migrations for the application (it's even in a different language). I wanted related changes in both the application and the migrator to be committed together, inseparably. Altogether, all source files in this repo are between 10 and 11 MB.
So if putting everything in one repository is really what makes sense because you don't want to deal with subrepositories, then there's nothing wrong with putting everything in one repository. The one of mine is on the small side of medium, in my opinion. TortoiseHg's source is around 20 MB, OGRE is over 100 MB.
Without knowing more about your projects and their relationships, the impression I get is that a single repository would work just fine, and that you're not looking at this incorrectly.
If you change your mind, hg convert can help you extract projects into their own repository, maintaining the history of those files.
If the one-repository approach is not for you, then I think subrepos should be given a chance, as that is the only other method I know of for treating multiple repos cohesively that is supported in TortoiseHg (see the Recommendations section).
However, I'm not sure how you would deal with the inter-department access, given that it doesn't seem there is an established subset already shared with others.

Mercurial: Concrete examples of issues using hg pull --rebase

I'm struggling to find the mercurial workflow that fits the way that we work.
I'm currently favouring a clone per feature but that is quite a change in mindset moving from Subversion. We'll also have issues with the current expense we have in setting up environments.
Using hg pull --rebase seems to give us more of a Subversion-like workflow but from reading around I'm wary of using it.
I think I understand the concepts and I can see that rewriting the history is not ideal but I can't seem to come up with any scenarios which I personally would consider unacceptable.
I'd like to know what are the 'worst' scenarios that hg pull --rebase could create either theoretical or from experience. I'd like concrete examples rather than views on whether you 'should' rewrite history. Not that I'm against people having opinions, just that there already seem to be a lot of them expressed on the internet without many examples to back them up ;)
The first thing new Mercurial converts need to learn is to get comfortable committing incomplete code. Subversion taught us that you shouldn't commit broken code. Now it's time to unlearn that habit. Committing frequently gives you a lot more flexibility in your workflow.
The main problem I see with hg pull --rebase is the ability to break a merge without any way to undo. The DVCS model is based on the idea of tracking history explicitly, and rebasing subverts that idea by saying that all of my changes came after all of your changes, even though we were really working on them at the same time. And because I don't know what your changes are (because I was basing my code off of earlier changesets) it's harder for me to know that my code, on top of yours, won't break something. You also lose the branching capabilities by rebasing, which is really the whole idea behind DVCSs.
Our workflow (which we've built an entire Mercurial hosting system around) is based on keeping multiple clones, or branch repositories, as we call them. Each dev or small team has their own branch repository, which is just a clone of the "central" repository. All of my new features and large bug fixes go into my personal branch repo. I can get that code peer reviewed, and once it's deemed ready, I can merge it into the central repo.
This gives me a few nice benefits. First, I won't be breaking the build, as all of my changes are in their own repo until they're "ready". Second, I can make another branch repo if I need to do a separate feature, or if I have something longer-running, like for the next major version. And third, I can easily get a change into the central repo if there's a bug that needs to be fixed quickly.
That said, there are a couple different ways you can use this workflow. The most simple, and the one I started with, is just keeping separate clones. So I'll have website-central, website-tghw, etc. It works well, especially since you can push and pull between them locally. More recently, I've started keeping multiple heads in the same repo, using the remotebranches extension to help manage them and hg nudge to keep from pushing everything at once.
Of course, some people don't like this workflow as much, usually because their Mercurial server makes it hard to make server-side clones. In that case, you can also look at using named branches to help keep your features straight. Unfortunately, they're not quite as flexible as Git branches (which is why we prefer branch repos) but they work well once you understand how to close branches, and why you can't really get rid of them once you start one.
This is getting a bit long, so I'll wrap it up by encouraging you to embrace the superior branching and merging that Mercurial provides (over SVN). There is definitely a learning curve, but once you get the hang of it, it really does make things easier.
From the question comments, your root issue is that you have developers working on several features/bug fixes/issues at one time and having uncommitted work in their working directory along with some completed work that is ready to be pushed back to the central repository.
There's a really nice exchange that covers the issue well and leads on to a number of ways forward.
http://thread.gmane.org/gmane.comp.version-control.mercurial.general/19704
There are ways you can get around keeping your uncommitted changes, e.g. by having a separate clone to handle merges, but my advice would be to embrace the distributed way of working and commit as often as you like - if you really feel the need you can combine the last few local commits into a single changeset (using MQ, for example) before pushing.

Mercurial setup: One central repo or several?

My company is switching from Subversion to Mercurial. We're using .NET for our product. We have a solution with about a dozen projects that are separate modules with no dependencies on each other. We're using a central repo on a server with push/pull for our integration build.
I'm trying to figure out if I should create one central repo with all the projects in it, or if I should create a separate repo for each project. One argument for separate repos is that branching the individual modules would be easier, but an argument for a single repo is easier management and workflow.
I'm very new to hg and DVCS, so some guidance is greatly appreciated.
ETA: At hginit.com, Joel says:
[I]f you’re used to having one big
gigantic repository for the whole
company, where some people only check
out and work on subdirectories that
they care about, this isn’t a very
good way to work with Mercurial—you’re
better off having lots of smaller
repositories for each project.
It'd be great if someone could expand on this or point me to more documentation.
One thing you should take into consideration here is the fact that Mercurial does not support checking out directories like subversion does. One typical subversion setup is to have one giant repo with multiple separate projects in it, and when somebody needs code they will just checkout a subdirectory containing that project. You can't do this in mercurial. You either take the whole repo, or nothing. If everybody working on these projects does not need all the code, all the time, you might want to split it up into separate repositories.
EDIT: This link might be helpful in setting things up, in particular the "Publishing Multiple Repositories" section.
if completely separate repos don't work for you maybe have each project as a subrepo of some umbrella repo. I have to say that seperate repos sounds like what you need though given that each project sounds totally independent.
I'm fairly new to Mercurial myself (my company is making the leap from SourceSafe) so I don't know what more experience would say.
For me it makes sense to have one repository per Visual Studio Solution. If your modules are truly not dependent on each other, why are they all in the same solution? If you have a good reason for them all being in one solution, then that's probably the reason to keep them in one repository. If there's not a good reason for them to be in one solution, then a repository and a solution for each makes more sense to me.
Edit: So, since all the modules are built together and need to integrate, that would push me towards a single solution and a single repository.
Mercurial does a great job of merging, but the one thing I've had issues with is the solution file when merging the addition of more than one project at a time. It gets confused with multiple End Project lines. So, as long as you aren't adding new projects very often, your merges should be smooth.
From my experience, and not based upon studies etc, I would say that each logical blob is a repository. If you share code between subprojects, they need to be in the same repo. There will be full subrepo functionality, but currently (apr 2010) it's not fully implemented.

Mercurial Novice - Very Confused

Im in the process of trying to get my head round a dvcs such as mercurial. Im getting quite confused with certain points though. Firstly, a bit of context:
At the minute i mostly use subversion, and it works fine for my workflow,
Mostly the repository is for my own use, im the only web developer,and i only ever submit raw code to my manager, he never has to see the repository.
I use the repo to create major versions, and as backup so i can revert to it when something doesnt work out.
The repo also acts a file share, enabling me to work from the same codebase at work and at home.
My main reason for wanting to switch to mercurial, is the offline commits and easier branching / merging.
Firstly can anyone tell me how i would get mercurial to fit this workflow?
How do i go about sharing multiple repositories (i.e. one for each project) between computers?
Any help would be hugely appreciated,
Thanks
http://hginit.com/
There is a fantastic pre-chapter there specifically for SVN users. The rest of the tutorial will get you on your feet fairly quickly.
I'll answer just one part of your question, that of how to manage access to your repository from both home and work, because this is one of the situations where distributed version control is really useful.
The answer is that your two repositories are clones of one-another (to be correct, one is the clone of the other). You do some work during the day, check it in, then pull that work to your home repository (or push, but that requires more work). The next morning, you do the same thing in reverse. Mercurial comes with a built-in read-only HTTP server that makes it really easy, provided that you can expose a port.
The end result is that you have two repositories (ie, automatic backup of the entire history). At any given point in time, one is "better" than the other, but since you're the sole committer to both, they won't diverge.