Mercurial cloned data integrity - mercurial

I'm cloning the OpenJDK source code to my local repo, and I'd like to be sure that file integrity has been maintained in transit.
hg clone http://hg.openjdk.java.net/jdk8/jdk8/
The Mercurial FAQ says that revlogs are checked against their hashes, "But this alone is not enough to ensure that someone hasn't tampered with a repository. For that, you need cryptographic signing." Does that mean I need to use SSH like this?
hg clone ssh://hg.openjdk.java.net/jdk8/jdk8/
And to get SSH access, do I have to sign up as a contributor to the project? If so, is there a way to get a verified clone without becoming a contributor?

SSH provides transport encryption. It ensures the data cannot be altered in-transit from the OpenJDK repository to your computer.
The Mercurial FAQ is talking about signing the commits, which lets you verify later that individual commits have not been tampered with. It means that an attacker who breaks into the OpenJDK servers, or who gets a falsified commit accepted upstream, cannot add revisions the project never intended without being detected. You would be able to recognise such revisions because they lack the right signature or are not signed at all.
SSH wouldn't protect you against such issues, because SSH doesn't care about the data it transfers. Malicious (altered or added) commits are transferred just as securely as valid revisions.
Signed commits are not something you, as a consumer of the repository, can add later. The OpenJDK project would have to build signing into its commit procedure from the start.
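To illustrate why the hash check alone does not prove who wrote a revision, here is a minimal Python sketch. It assumes the classic SHA-1 revlog scheme, in which a node ID is the hash of the two parent IDs (sorted) followed by the revision text. The function name `node_id` and the sample data are made up for illustration.

```python
import hashlib

NULL_ID = b"\0" * 20  # parent ID used when a revision has no parent


def node_id(text, p1=NULL_ID, p2=NULL_ID):
    """Hash a revision the way a SHA-1 revlog does: sorted parents, then text."""
    digest = hashlib.sha1(min(p1, p2))
    digest.update(max(p1, p2))
    digest.update(text)
    return digest.digest()


original = b"int main() { return 0; }\n"
tampered = b"int main() { return 1; }\n"

# Accidental corruption is caught: the stored hash no longer matches the data.
stored = node_id(original)
assert node_id(tampered) != stored

# A deliberate attacker simply recomputes the hash for the altered text,
# so the forged revision is internally consistent and passes the same check.
forged = node_id(tampered)
assert node_id(tampered) == forged
```

Nothing in the hash ties the content to a person or a key, and that is the gap a cryptographic signature is meant to fill.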

Related

How to check a Mercurial repository for consistency (checksums)?

Assume I recover a Mercurial repository from a broken file system (e.g. bad hard drive), and I want to be sure that this one was not affected.
How can I force a self-check in Mercurial? That is, Mercurial walks through the whole history and checks that all checksums fit their respective dataset, and that the repository as a whole is consistent.
Is it sufficient to perform a local "hg clone" to enforce that check?
Is there something like "git fsck" for Mercurial?
The command for a pure check is:
hg verify
In case the repository is corrupt, the Mercurial wiki provides recovery instructions:
https://www.mercurial-scm.org/wiki/RepositoryCorruption
Of course, this only checks the commits, not the working directory. That is, it checks neither local changes that have not yet been committed nor ignored files such as build results. Mercurial cannot verify any of those. They would either have to be verified by different means, or simply be reset using a fresh Mercurial checkout and a fresh build.
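If the check needs to run from a script (for example over several recovered repositories), a small wrapper is enough. This is only a sketch: it assumes `hg` is on the PATH and relies on `hg verify` exiting with status 0 on success. The function name `repo_is_consistent` and the injectable `run` parameter are my own additions.

```python
import subprocess


def repo_is_consistent(repo_path, run=subprocess.run):
    """Return True when `hg verify` exits with status 0 for repo_path."""
    result = run(["hg", "-R", repo_path, "verify"])
    return result.returncode == 0


# Example use (the path is hypothetical):
# if not repo_is_consistent("/recovered/myproject"):
#     print("repository is damaged, see the RepositoryCorruption wiki page")
```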

Is it possible to convert a googlecode hg repository to a largefile repository?

I have a remote hg repository hosted on Google Code. Thus I don't have admin access to run e.g. lfconvert on it (as far as I know), and of course lfconvert can only be used on local repositories.
So, is there any way to convert a Google Code hg repository to a largefile repository?
(One idea is to convert a local clone of the repo to a largefile repo and then push the changes to the "central" Google Code repo, but I'm wary of trying that without knowing whether it is a valid approach.)
Using your idea to do a local conversion and push, you can take advantage of the 'reset' feature for your repositories:
1. Do a local clone.
2. Convert to largefiles: `hg lfconvert normal_repo largefiles_repo`. Do NOT delete the original clone until you are sure everything works.
3. Reset the hosted repository (See https://code.google.com/p/support/wiki/MercurialFAQ#Mercurial_FAQ).
4. Push the largefiles repository.
Pushing the largefiles repository without resetting seems problematic because the largefiles repository is essentially a fork of the original one starting at the point the first largefile was committed.
If the push fails*, you can push the original clone and you'll be back where you started without any data loss. (One of the many advantages of DVCS. :-))
The big downside, of course, is that everybody who has ever cloned your project will now be working from a different fork of the repository. This is always a danger when you do anything that changes history, and it is the motivation for Mercurial phases. If you want to be 'kinder', you can start a second project for the largefiles version and place a link on the original project site describing the move.
[*] I can't figure out from Google Code's documentation whether the largefiles extension is supported. There is a reviewed feature request, but I couldn't find any mention of the request actually being implemented. The push failing would probably be a good indication that largefiles isn't supported though...
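The command-line part of the procedure above can be written down as a small Python sketch that only prints the commands, since the hosted repository has to be reset by hand between the conversion and the push. The URL and the function name `conversion_commands` are placeholders of mine. Enabling the extension per invocation with `--config extensions.largefiles=` is an assumption about how you want to switch it on (it can equally live in your hgrc).

```python
# Enable the bundled largefiles extension for a single invocation.
LARGEFILES = ["--config", "extensions.largefiles="]


def conversion_commands(remote_url, normal_repo="normal_repo",
                        largefiles_repo="largefiles_repo"):
    """Return the hg commands for the clone, convert and push steps, in order."""
    return [
        ["hg", "clone", remote_url, normal_repo],
        ["hg"] + LARGEFILES + ["lfconvert", normal_repo, largefiles_repo],
        # Run this last one only after the hosted repository has been reset.
        ["hg"] + LARGEFILES + ["-R", largefiles_repo, "push", remote_url],
    ]


# Hypothetical URL, shown only to print the plan.
for command in conversion_commands("https://example.invalid/hg/myproject"):
    print(" ".join(command))
```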

Using one Mercurial repository as local for two Mercurial installations

We have a dedicated issue tracking (Redmine) machine, which has a Mercurial repository (call it "Redmine repository"). Redmine is set up to use that repository, and as far as I understand, Redmine never makes any changes to that repository. All developers (eventually) push their changes to that repository.
We also have a dedicated production machine, which can execute the code, but is not used to make any changes to the code.
We have two choices:
1. Set up another Mercurial repository on the production machine (call it "production repository"). When a new production release is approved, pull the changes from the Redmine repository to the production repository, and then update the local working directory to the appropriate revision from the production repository.
2. Reuse the existing Redmine repository on the production machine, designating it a local repository for the Mercurial installation there (the Redmine repository is on a shared drive that can easily be mounted on the production machine). Whenever a new production release is approved, update the local working directory to the appropriate revision from the Redmine repository.
With option #2, we get rid of an extra "pull" step (from Redmine repository to production repository), which slightly simplifies the process. But I'm not sure if it's ok that a single repository is used by two Mercurial installations as if it's local.
Any comments on this choice (or any other aspect of this setup) are appreciated!
It sounds like a bad idea. Mercurial does a really good job of keeping reads and writes to its repository atomic, but it has a harder time doing that when the repository is on a shared drive -- even if it's only one local repository using it -- because network shares (especially on Windows) don't always make things atomic that they say they do.
Ideally your repositories (both the working dir and the repository) are local when possible, and you use push/pull to get changesets to/from a network share. If that's not possible then having a single local application using the repo on the remote file system is the best idea.
If you positively want to try having two clones use the same underlying repository, check out the ShareExtension, which ships with Mercurial but is for advanced users only.
Instead of trying to piggy-back, why not just put a hook like this in your Redmine repository:
[hooks]
changegroup = hg push //production/clone
That will automatically push changesets that arrive in the Redmine repository to production.
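For comparison, option 1 from the question (pull into a separate production repository, then update to the approved revision) is also easy to script. The following is a rough sketch, assuming `hg` is on the PATH of the production machine. The function names, paths and revision are hypothetical.

```python
import subprocess


def release_commands(production_repo, redmine_repo, revision):
    """Return the hg commands that pull from Redmine and update to a revision."""
    return [
        ["hg", "-R", production_repo, "pull", redmine_repo],
        ["hg", "-R", production_repo, "update", "--rev", revision],
    ]


def release(production_repo, redmine_repo, revision, run=subprocess.run):
    """Run the release steps, stopping at the first command that fails."""
    for command in release_commands(production_repo, redmine_repo, revision):
        run(command, check=True)


# Example use (all three arguments are placeholders):
# release("/srv/production", "//redmine/repo", "release-1.4")
```

Because the working directory lives in a repository on the production machine's own disk, this keeps the shared drive out of the picture except as a pull source.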

How to validate and enforce commit message in Mercurial?

What are all the steps required to validate a commit message against a set of regular expressions?
We want to work in a semi-centralized setup, so I need a solution for the developer clone (local repository) and for our central clone (global repository). I read about Mercurial hooks, but I am a little lost as to how to put it all together.
For the local repository I need a way to distribute the validation script to my developers. I know that hooks do not propagate when cloning, so I need a way to "enable" them in each fresh clone. It would be done as part of our PrepareEnvironement.bat script, which we run anyway on each clean clone.
To be doubly safe I need similar validation on my global repository. It should not be possible to push commits that fail validation into the global repository. I can configure it manually, since it is a one-time job.
I am on Windows, so installing anything except TortoiseHg should not be required. It was already a fight to get Mercurial deployed. Any other dependencies are not welcome.
You can use the Spellcheck example as a starting point. In each developer's configuration, you need to use the following hooks:
pretxnchangegroup - Runs after a group of changesets has been brought into local from another repository, but before it becomes permanent.
pretxncommit - Runs after a new changeset has been created in local, but before it becomes permanent.
For the centralized repo, I think you only need the pretxnchangegroup hook unless commits can happen on the server, too. However, you will need the Histedit extension for each of the developers if the remote repo is the one rejecting one or more of the changesets being pushed. This extension allows them to "edit" already committed changesets. I would think in most cases, the local hooks will catch the issue, but like you said, "just in case."
More details about handling events with hooks can be found in the Hg Book.

hg access control to central repository

We come from a subversion background where we have a QA manager who gives commit rights to the central repository once he has verified that all QC activities have been done.
A couple of colleagues and I are starting to use Mercurial, and we want to have a shared repository that would contain our QC-ed changes. Each developer clones the repository with hg clone and pushes their changes back to the shared repository. I've read the Hg Init tutorial and skimmed through the red bean book, but could not find how to control who is allowed to push changes to the shared repository.
How would our existing model of QA-manager controlled commits translate to a mercurial 'central' repository?
HenriW's comment asking how you are serving up the repositories is exactly the right question. How you set up authentication depends entirely on how you're serving your repo (HTTP via Apache, HTTP via hg serve, ssh, etc.). The transport mechanism provides the authentication, and then Mercurial uses that with the commands from Mr. Cat's link (useless in and of themselves) to handle access control.
Since you didn't mention how you're serving the repo, it was probably something easy to set up (you'd have remembered to mention the hassle of an Apache or ssh setup :). So I'll guess those two:
If you're using hg serve then you don't have authentication set up. You need to use Apache, lighttpd, or nginx in front of hgweb or hgwebdir to provide authentication. Until you do, the allow_* and deny_* options are strictly everyone or no one.
If you're using ssh then you're already getting your authentication from ssh (and probably your OS), so you can use the allow_* and deny_* directives (and file system access controls if you'd like).
serverfault.com has a relevant question and links to the Publishing Repositories Mercurial Wiki page. The first shows how to configure per-repository access when using hgweb on the server. I get a feeling that you're using ssh, which the wiki page labels as "private", and am therefore inclined to believe you would have to fall back to file-system access control, i.e. make all the files in the repository belong to the group "commiters", give group members write access, and give everyone else read-only access.
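To make the "everyone or no one" point concrete, here is a simplified Python model of how, as far as I understand it, hgweb decides whether a push is authorized from the `allow_push` and `deny_push` settings in the `[web]` section. The function names are made up and this is not Mercurial's API. `user` stands for whatever name the web server authenticated, or `None` when there is no authentication.

```python
def is_member(user, userlist):
    """A list of exactly ['*'] matches everyone; otherwise the name must be listed."""
    return userlist == ["*"] or user in userlist


def may_push(user, allow_push, deny_push):
    """Model the push decision: deny_push is checked first, then allow_push."""
    if deny_push and (not user or is_member(user, deny_push)):
        return False
    return bool(allow_push) and is_member(user, allow_push)


# Only the QA manager may push to the shared repository.
print(may_push("qa_manager", ["qa_manager"], []))
print(may_push("developer", ["qa_manager"], []))

# Without authentication there is no user name, so the only way to let
# anybody push is allow_push = *, which lets everybody push.
print(may_push(None, ["qa_manager"], []))
print(may_push(None, ["*"], []))
```

In this model the Subversion-style setup from the question corresponds to listing only the QA manager in `allow_push`, with developers handing their changes to that person (or to a separate staging repository) for review.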