Managing large numbers of Mercurial repositories on a server

I've read about the single central repository vs. multiple repositories approach in Mercurial (e.g. these SO questions), and it's pretty clear that small repositories (one per self-contained project) is the right approach.
But this implies a large number of repositories, where by "large number" I mean enough that it's hard to keep track of which contains what. I'm in a small team (< 7 developers) and with the number of projects we work on, different branches/versions, etc., I can see us having 50 or 60 repositories.
Are there any tools out there to help catalog/manage a Mercurial server with dozens or hundreds of repositories?
edit: I'm using both SCM-Manager (within our firewall) and bitbucket (outside our firewall). Oh, and of course each developer is going to have his/her own local clones of a bunch of them.
The problem is not how to put large numbers of mercurial repositories on a server, but how to keep track of them all.

How are you serving these repositories? If you use hgweb then the simplest way is to add descriptions to your hgrc files on the server. You do this in the [web] section like this:
[web]
description=Description goes here
This adds the descriptions to the web-based interface, which is fine for us (we have about 30 repositories).
Other than that, I think most of the options you have involve choosing a web-based publishing method that includes this functionality, such as RhodeCode. Other publishing methods are available here.

As @Steve suggested, the web approach is very good. I manage 50+ repositories this way.
I have a process running hg serve from a batch file:
hg serve --prefix mercurial --address my-hg-server --port 8000 --web-conf hg-web.conf --accesslog hg-access.log --errorlog hg-error.log
The hg-web.conf looks like this:
[web]
style = default
allow_push = *
push_ssl = false
[paths]
repos1 = \\hg-repos-server\repos1
repos2 = \\hg-repos-server\repos2
....
Then, by accessing
http://my-hg-server:8000/mercurial
I get a page with links to all my repositories. The user can copy a link and clone from it.
This is a very simple setup and works great for us.
You can wrap the hg serve batch file or shell script to run as a service, and you can put it all behind an Apache web server for more security.
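For example, a minimal Apache fragment for that reverse-proxy setup might look like the following (a sketch only; hostnames, paths, and the password file are placeholders, and mod_proxy plus basic auth are assumed to be enabled):
<Location /mercurial>
    # forward requests to the hg serve process started above
    ProxyPass http://my-hg-server:8000/mercurial
    ProxyPassReverse http://my-hg-server:8000/mercurial
    # require a login before anyone reaches the repositories
    AuthType Basic
    AuthName "Mercurial repositories"
    AuthUserFile /etc/apache2/hg.passwd
    Require valid-user
</Location>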
I hope this helps

Related

Storing PCB files *and* software files in the same Mercurial repo

I have my Banana Pi set up as my Mercurial server. It works well for me for my software, as generally speaking I have firmware and that's about it in my repositories. I can access it via OpenVPN from anywhere in the world. However, I have started to use version control for my PCB files as well now, due to a new CAD system which complicates my old, crude but effective way of doing my PCB archiving and backup. (Also, everything in my new CAD system, all the PCBs and schematics, are text files, which makes version control work nicely.)
So, with Mercurial I started doing as I did with software and creating a new repo for my PCB for one of the boards I'm updating for a customer, and immediately came across an issue that svn seems to cope with easily and I was wondering whether Mercurial can do the same.
I have my BH0001 project repository which has all the embedded C in it, and I have started creating a new issue of the PCB for which the C code is used. I had to create a new Mercurial repo called BH0001_pcb to differentiate between code and PCB. With SVN you can have a project repo with Hardware and Software directories under the project number, but still be able to check out the two different types of files to different places independently.
I could, of course, clone the BH0001 software repository to a local machine, add the PCB info in a new folder in the local Mercurial repo, push it all back to the server, and it would be perfectly happy. The problem then comes when checking out, because I would be cloning both firmware and PCB onto a machine when I might only want one or the other.
Also, this goes against how I store stuff locally. In my /username/home directory I have a Software directory and a CAD directory and within those I have projects. So I would have:
home/CAD/CustomerName/BH0001
and
home/Software/CustomerName/BH0001.
If I'm to carry on using my current method, do I have to:
1. Change my local directory structures to be something like
home/Projects/CustomerName/BH0001/CAD
and
home/Projects/CustomerName/BH0001/Software
2. Suck it up and use names like ProjectName_pcb for separate repos?
3. Use some other way I can't think of/can't find/am unaware of? e.g. a way of checking out part of a Mercurial repository to one directory and a different part of the repo to a different directory.
Or should I just use svn if I really want to carry on as I have?
With stock Mercurial you currently cannot do partial clones of a repository the way you can with SVN, so your approach of using separate repositories is a good choice.
However, there are ways to achieve a similar result: sub-repositories. In your case I'd create a parent repository which contains your two current repositories as sub-repositories. Mind though, sub-repositories have some rough edges, so read the linked page carefully. I'd especially like to stress that it's good practice to have a parent repo which basically contains only the 'real' repos and not much on its own.
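A minimal sketch of that layout, assuming the two existing repositories are BH0001 (firmware) and BH0001_pcb, with placeholder server paths:
hg init BH0001-parent
cd BH0001-parent
# bring the existing repos in as subdirectories
hg clone ssh://server/hg/BH0001 firmware
hg clone ssh://server/hg/BH0001_pcb pcb
# .hgsub maps each subdirectory to its source repository
echo "firmware = ssh://server/hg/BH0001" > .hgsub
echo "pcb = ssh://server/hg/BH0001_pcb" >> .hgsub
hg add .hgsub
hg commit -m "parent repo tying firmware and PCB together"
Cloning BH0001-parent then pulls in both subrepos, while cloning BH0001 or BH0001_pcb directly still gives you just one or the other.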
There exist ideas like a thin or narrow clone (which is somewhat similar to what SVN does), but I haven't seen them in production.

How do large companies deal with Mercurial?

I am investigating how to migrate our source control from SVN to Mercurial. One thing I am not sure how to deal with is usernames in commits. From what I've seen, there is no way to force an HG user to use a specific username; even if one is specified in Mercurial.ini, the user can override it with the -u flag to hg commit.
How do companies deal with this? There is nothing to prevent developer A from committing something in his repository as developer B and then pushing it to someone else.
Thanks.
I wouldn't say our company is large (4 developers), but it's never been an issue for us so far. I haven't seen any way to prevent that behavior either in my searching. I guess it comes down to an issue of trust amongst your developers.
Unrelated, we did successfully migrate from SVN to Mercurial about two years ago so I may be able to answer other questions you have.
EDIT: An idea:
I'm not sure how you were planning on setting up your topology, but we have a server that functions as the central repository for all our repos. It is possible to push changes between developers (bypassing the central server), but we never do that. We always commit locally and then push/pull from/to the central server. Additionally, we use https and windows authentication to authenticate with this central server.
If you're planning on having something like this, you could create a hook on the server (see repository events) (maybe the precommit event) that would verify that the user name in each commit being pushed is the same as the authenticated user from the web server.
Not sure if this would work, but it sounds plausible.
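For what it's worth, here is a rough sketch of that idea as a server-side shell hook (untested; it assumes the web server authenticates users and exposes REMOTE_USER in the hook's environment, and the file paths are made up). In the served repository's .hg/hgrc:
[hooks]
# runs once per push, before the transaction is accepted
pretxnchangegroup.checkuser = /srv/hg/hooks/checkuser.sh
And /srv/hg/hooks/checkuser.sh:
#!/bin/sh
# HG_NODE is the first incoming changeset; walk it and everything after it
for author in $(hg log -r "$HG_NODE:tip" --template '{author|user}\n'); do
    if [ "$author" != "$REMOTE_USER" ]; then
        echo "commit author '$author' does not match authenticated user '$REMOTE_USER'" >&2
        exit 1   # a non-zero exit aborts the push and rolls back the transaction
    fi
done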
Another approach (or two)
Path-based ACLs in pseudo-CVCS workflow
If you use a "controlled anarchy" workflow (p2p communication isn't controlled or restricted, but is trusted, and a single authoritative source is the common push target), you can use the "Branch Per Developer" paradigm. I.e., with the ACL extension on the central repo the following restrictions apply (sketched below):
Nobody can push to the default branch
Each developer can push only to his personal branch (under any name; the name itself means nothing, the branch name is the auth data used for tracking)
Only trusted mergers can work with repo-Central (merging dev branches to default; NO rebase, NO history rewriting in dev branches)
Each mergeset in the default branch then contains its piece of authentication: the source branch
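A hedged sketch of what those rules might look like in the central repo's hgrc (user and branch names are invented; the acl extension ships with Mercurial):
[hooks]
pretxnchangegroup.acl = python:hgext.acl.hook
[acl]
# only check changesets arriving via push or serve
sources = push serve
[acl.deny.branches]
# nobody may push directly to default
default = *
[acl.allow.branches]
# each developer may push only to his personal branch
alice-work = alice
bob-work = bob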
Signing branches
If you can't trust (and you must not trust) the username in commits, you can trust strong crypto. Mercurial has at least two extensions which allow digitally signing commits, thus providing accurate (well, so-so; see the notes below) information about authorship, each with its own advantages and disadvantages:
The Commitsigs Extension wiki page and the "Signing Mercurial Changesets on Windows" mini-HowTo are complete enough to understand and demonstrate all aspects of getting started. Pro: no additional commits are needed for signing, and you can't (by design) sign old historic commits. Contra: not-so-nice output from the needed commands (see the screenshots in Damian's post for log and verifysigs) because it's GnuPG (no PKI), and theoretically it's possible to create and use a key-pair for any name and email, so only an "extra" comparison will show two different keys for one user.
The GPG extension, with Approval Reports from the wiki as a quick-start. Pro: it can use PGP keys or OpenSSL certs (TBT!!!) (where OpenSSL means one corporate source of issued certs), and the output of the sigcheck command is more readable and informative. Contra:
committing changes to a .hgsigs file in the root of the working copy
and so it requires extra changesets to be made. This makes it
infeasible to sign all changesets. The .hgsigs file must also be
merged like any other file when branches are merged.
and, finally, the .hgsigs file can be modified by hand by a malicious user like any other file in the working copy.
Edit and bugfix: OpenSSL can be used with the Commitsigs extension, not the GPG extension.
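For reference, the GPG extension route looks roughly like this (a sketch; the extension is bundled with Mercurial, and key generation/distribution is not shown):
# enable the bundled extension in ~/.hgrc or the repo's hgrc
[extensions]
gpg =
# then, inside the repository:
hg sign tip        # signs the revision and commits the updated .hgsigs
hg sigcheck tip    # verifies the signature on that revision
hg sigs            # lists all signed changesets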

Mercurial distributed repositories

Having found myself in the role of a build engineer and a systems guy, I had to learn and figure out a few things, namely how to set up our infrastructure. Before I came on board they didn't have any. With this in mind, please excuse me if I ask anything that should have been obvious.
We currently have a 3-level distributed Mercurial setup: level one on each developer's machine, level two on a central (trunk) server that is only accessible from the local network, and the third level on BitBucket. The workflow is as follows:
Local development: the developer pulls changesets from the local network server, commits locally, and pushes to our local server once merge conflicts are resolved. A scheduled script backs everything up to BitBucket overnight.
Working from home: the developer pulls changesets from BitBucket, commits to their local repo, and pushes to BitBucket.
TeamCity picks up repo changes from the local network server for each project and runs a build / automated deploy to the test environment.
The issue I'm hitting is scenario 2: at the moment, if someone pushes something to BitBucket, it's their responsibility to merge it back when they're back in the office, and that's a bit of a time waster when it could be automated.
In case you're wondering, the reason we have a central repo on the local network is that it would be slow to run TeamCity builds against BitBucket repositories. We haven't tested this, so it's just an educated guess.
Anyhow, the scheduled script that pushes all changes from the central repository on the local network just runs an "hg push" for each of the repositories. It would have to do a pull / merge beforehand. How do I do this right?
This is what the pull would need switches for:
- update after pull
- in case of merge conflicts, always take newer file
- in case of error, send an email to system administrator(s)
- anything extra?
Please feel free to share your own setup as long as it's not vastly different to what's described.
UPDATE: In light of recent answers I feel an important aspect of the intended approach needs to be clarified. The idea is not to force merges on our local network central repo. Instead it should resolve merge conflicts the same way as using HgWorkbench on developer machines with post-pull: update + merge. All developers have this on by default, so it should be OK.
So the script / batch file on the server would do the following:
1. Pull from BitBucket.
2. Update + auto-merge.
3. Any auto-merge conflicts?
3.1 Yes -> send an email to administrators to manually merge -> break.
3.2 No -> carry on.
4. Get outgoing changesets. Will the push create multiple heads? (This might be redundant because of the pull / update.)
4.1 Yes -> prompt administrators. Break.
4.2 No -> push changes.
Hope this clears things up a bit. Now, can this be done using hg commands alone in a batch file, or do I have to script it? Specifically, can it send emails?
Thanks.
So all your work is available at BitBucket, right? Why not make BitBucket (being available from anywhere) your primary repo source and drop your local servers? You could pull changes from BitBucket with TeamCity for your nightly builds, and developers would always work with the current repo at BitBucket and resolve all merging problems themselves, so there wouldn't be any subsequent merges for you.
I would not try to automatically merge the changes if they conflict; this will only lead to broken and inconsistent versions and "lost" changes, causing confusion and chaos. Don't merge automatically if it isn't clear what the merge should look like.
A better alternative would be to just keep the two heads around and push/pull them without merging. This way everybody still can get that version of the data he was working on from work/home. A manual merge will have to be done, but this can also be done at work or from home, enabling developers to resolve the issue from wherever they are. You can also send emails around in this scenario to make sure everybody is aware of the problem.
I guess that you could automate this using a script; I would try PowerShell if I were you. However, this will sometimes require manual merges when there are conflicts (because when developers commit changes to both BitBucket and the local repos, those changes might conflict).
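To illustrate the keep-both-heads approach, a rough shell sketch (untested; the repo directory, Bitbucket team name, and admin address are placeholders, and a PowerShell version would follow the same shape). Emails go out through the system mail command rather than hg itself:
#!/bin/sh
ADMIN=admins@example.com
for repo in /srv/hg/*; do
    name=$(basename "$repo")
    # pull whatever arrived on BitBucket overnight
    if ! hg -R "$repo" pull "https://bitbucket.org/myteam/$name"; then
        echo "pull failed for $name" | mail -s "hg sync error: $name" "$ADMIN"
        continue
    fi
    # more than one head on default means a manual merge is needed
    heads=$(hg -R "$repo" heads default --template '{node}\n' | wc -l)
    if [ "$heads" -gt 1 ]; then
        echo "multiple heads in $name, manual merge needed" | mail -s "hg sync: merge $name" "$ADMIN"
        continue
    fi
    hg -R "$repo" update
    # push local changes back out; exit code 1 just means nothing to push
    hg -R "$repo" push "https://bitbucket.org/myteam/$name" || true
done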

Can I work in the repository in a single user Mercurial workflow?

I use Mercurial in a single-user workflow to have the option to roll back changes if my coding or writing goes horribly wrong (I primarily use the Stata and R statistics packages and LaTeX). While working only locally, this has been easy since all I have is the main repo.
Recently I have started ssh-ing into a Linux server for more computational power. So far I have been manually copying files back and forth and using Mercurial only locally, but I would like to use Mercurial to take care of this and keep these two workflows synchronized. Also, I like the ability to code both locally (on my laptop or desktop) and on the server.
Do I need to work on a clone of the main repo on the server and keep the main repo untouched? Or can I work directly in the main repo when I am on the server? In this question @gizmo points to this workflow guide; the "single developer" discussion is helpful, but it's still not clear to me that I can work in the main repo while I'm on the server without causing some major problem that I don't yet understand.
Thanks!
Edit: I should add that I have worked through Joel Spolsky's HgInit.com tutorial and I'm comfortable pushing/pulling/cloning/etc over ssh, but I am still not sure if I can work in the main repo without causing heartache later. Or maybe this is more a philosophical question? Thanks!
Mercurial is a DVCS, which means that in each location you have both a local working copy and a local repository.
Mercurial is a DVCS, which also means that you can freely exchange (pull|push) data between repos (if they provide remote-access methods).
If you are
comfortable pushing/pulling/cloning/etc over ssh
and don't forget to perform a pull|push cycle around your work at home (so that you don't have to run hg serve on the home host and sync from it as a source), you won't get any headache at all, and you'll have a perfectly linear aggregated history in each place. Even if you forget to sync the repo sometimes, in the worst case you'll get two heads later, which you'll be able to merge easily (I don't know the formats of Stata and R data files, but LaTeX, being text, is mergeable).
There is no problem with working directly in the repository on your server. From Mercurial's point of view, the "main" repository is just another random repository — Mercurial doesn't consider it to be special.
You don't say this directly, but one thing that people ask is "What happens when I push to the server?" The answer is that hg push only sends data into the repository (the .hg/ folder). The working copy is not touched on the server when you push to it. Since you push new changesets to the server, you might need to run hg update the next time you work on the server. This is just like if you had run hg pull on the server — there you'll also merge or update afterwards.
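As a small illustration (the paths are placeholders):
# on the laptop: send new changesets to the server's repository
hg push ssh://user@server/path/to/main
# later, logged in on the server: bring its working copy up to date
cd /path/to/main
hg update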
I have this situation all the time: I create a repository at home and clone it to my computer at work. I change files in either location and push/pull between the two repositories. If I need to share my work with others, then I make a repository at Bitbucket and push the code there. That way Bitbucket serves as a nice canonical repository for the code and I typically change the default path to Bitbucket in the repositories at home and at work. So at home I would have:
[paths]
default = https://bitbucket.org/mg/<repo>/
work = ssh://mg@work/<repo>
so that I can do hg push to send things to Bitbucket and hg pull work to grab things directly from work (in case I forgot to push to Bitbucket before leaving).

What's the best way to track private files in a public Mercurial repository?

"If it’s not in source control, it doesn’t exist."
This question was addressed for Git here: Techniques to handle a private and public repository?. What about for Mercurial?
I have several public Bitbucket repos (with multiple committers) where I'd like the source to be public, but which load API, SSH keys and other sensitive info from untracked files. However this results in someone emailing around the new config file if we add a new Mailchimp or Hunch or Twilio API key. Is there a way to shield these files from public view somehow and still track them? Everyone is syncing their repo through Bitbucket.
There are two good ways to handle this (besides zerkms's solution, which doesn't offer the ease of synchronization you want, but is what I'd do anyway):
Use Mercurial Queues. When you create a Mercurial queue with hg qinit --create-repo, it creates an overlay system that can be qpushed atop the existing repo. So you keep your secrets in queues, qpush them when you need them, and qpop them when you don't. With --create-repo, the set of overlays (patches) is handled in a repository of its own, so people in the know can push/pull the secret overlay repo while people without access to it can use the base repo. The patch repo can be a private repo on Bitbucket or hosted elsewhere (see the sketch after this list).
or
Use a subrepo exactly as described in the git solution.
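A rough sketch of the queue approach from option 1 (command flow only; the patch name and remote location are made up):
# create a versioned patch queue; it lives in .hg/patches as a repo of its own
hg qinit --create-repo
# start a patch that will hold the sensitive config
hg qnew secrets.patch
# ...edit the config files, adding the real API and SSH keys...
hg qrefresh                  # capture those edits into the patch
# share the queue itself through a private repository
hg -R .hg/patches push ssh://private-host/secret-queue
# drop the secrets before working against the public repo
hg qpop -a
# and bring them back when needed
hg qpush -a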
Create filename.ext.sample files with templates inside (probably filled with dummy data), which need to be copied and filled with actual data in the particular working directory.
That is what I usually do ;-)
Zerkms' solution is fast, easy, and clean, and likely your best bet for preventing secure content from being tracked / published; however as you say, "If it’s not in source control, it doesn’t exist." I find that far more often what I'm trying to keep out of source control is not a security concern, but simply a configuration setting. I believe these should be tracked, and my current employer has a rather clever setup for dealing with this, which I'll attempt to simplify / generalize / summarize here.
REPOSITORY
    code/
        ...
    scripts/
        configparse.sh
        ...
    config/
        common.conf
        env/
            development.conf
            testing.conf
            production.conf
        users/
            dimo414.conf
            mycoworker.conf
            ...
        hosts/
            dimo414-laptop.conf
            dimo414-server.conf
            mycoworker-laptop.conf
            ...
        local.conf*
    makefile
    .conf*
* untracked file
Hopefully the idea here is pretty clear: we define settings at each appropriate level, enabling highly granular control of the codebase's behavior in a logical and consistent fashion.
The scripts/configparse.sh script reads all the necessary configuration files in turn and builds .conf out of all the settings it finds.
config/common.conf is the starting point, and contains logical default values for every setting. Many will likely get overwritten, but something is specified here. It's an error for a setting to be found in another file that isn't first set in common.conf.
config/env/ controls the behavior in different environments, doing things like pointing to the correct database servers.
config/users/ looks for a $USER.conf file, useful for setting things I care about, such as increasing the logging level for aspects my team works on, or customizing behavior I prefer to use across all my machines.
config/hosts does the same for machines, looking for $HOSTNAME.conf. Useful for machine-specific settings like application paths or data directories.
config/local.conf is an untracked file, and lets you set checkout-specific values and/or content you don't want in version control.
The aggregate of all these settings is output to .conf, which is what the rest of the codebase looks for when loading settings.
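One plausible reading of scripts/configparse.sh as a sketch (the real script belongs to my employer; the file layout follows the tree above, and the environment-selection variable APP_ENV is invented). It assumes last-value-wins semantics, so later layers override earlier ones:
#!/bin/sh
# concatenate the config layers in order of increasing specificity
out=.conf
: > "$out"
for f in config/common.conf \
         "config/env/${APP_ENV:-development}.conf" \
         "config/users/$USER.conf" \
         "config/hosts/$(hostname).conf" \
         config/local.conf; do
    # missing user/host/local files are simply skipped
    [ -f "$f" ] && cat "$f" >> "$out"
done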