Mercurial: exposing a subset of history to the public

I'd like to publish a subset of an existing private repository to the public. Given two repositories, private and public, I want to do the following:
private contains the project's entire history, including confidential information.
public should contain a subset of private's history, minus the confidential information.
I can generate a new branch in private that takes the latest changeset and strips away all confidential information, but I don't want to share ancestors of this branch with public.
Question: How do I strip history from public while keeping the repositories related? Meaning, I need to be able to hg pull from public into private.
Update:
What makes this question different from https://stackoverflow.com/a/5516141/14731 is that I need to hide existing ancestors from public (versus hiding new heads).
https://stackoverflow.com/a/4034084/14731 might work, but I'm wondering if there is a better approach than merging against a disjoint head.

Upon further reflection, I think it makes sense that https://stackoverflow.com/a/4034084/14731 produces a disjoint head, since the remote changeset really does represent a head without ancestors. On the upside, this approach has a minimal disk-space cost: the files are not duplicated on disk, and you only pay a little (85 KB on my end) for the extra metadata.
Here is how to implement this approach:
Use hg archive to extract the latest changeset from private's sanitized branch.
Use hg init to create a new repository (public) from the extracted files, and commit them.
In private, run hg pull --force [public] to pull public (an unrelated repository) into private as a new disjoint branch.
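The three steps above can be sketched as a small Python helper that builds the hg command lines. This is a minimal sketch, not the author's exact procedure: the branch name `sanitized`, the directory paths, and the commit message are assumptions, and the explicit `hg commit -A` step is included because `hg init` on its own leaves the archived files uncommitted. Paths are expected to be absolute, since each command runs in a different working directory.

```python
import subprocess


def publish_commands(private_dir, public_dir, branch="sanitized"):
    """Return (cwd, argv) pairs for the archive / init / pull workflow.

    `branch` is a hypothetical name for the sanitized branch in private.
    """
    return [
        # 1. Export the tip of the sanitized branch as plain files (no history).
        (private_dir, ["hg", "archive", "-r", branch, public_dir]),
        # 2. Turn the exported files into a fresh, unrelated repository.
        (public_dir, ["hg", "init"]),
        (public_dir, ["hg", "commit", "-A", "-m", "Initial public snapshot"]),
        # 3. Pull the unrelated repository into private as a disjoint head.
        (private_dir, ["hg", "pull", "--force", public_dir]),
    ]


def run_all(steps, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cwd, argv in steps:
        print("(in %s) %s" % (cwd, " ".join(argv)))
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


# Dry run: only prints the commands that would be executed.
run_all(publish_commands("/path/to/private", "/path/to/public"))
```

Running with `dry_run=False` executes the commands for real, so it is worth checking the printed output first.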
At this point you have two options: Merging the disjoint head into private's sanitized branch, or not.
Option 1: Merged head
Advantages
The private repository can see a historical link between private and public.
Disadvantages
You cannot push changes from private to public, because doing so will push the ancestors you worked so hard to exclude. Why? hg push cannot exclude ancestors of a merge.
You need to interact with private directly in order to modify the sanitized branch.
Contributing patches from private to public becomes more difficult (since you cannot make use of the history metadata directly).
Option 2: Unmerged Head
Advantages
Ability to push changes from private to public without revealing private changesets. You can do this using hg push -b disjointBranch.
Disadvantages
You lose the historical link between public and its ancestors in private.
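With the unmerged head, it may be worth previewing what would leave private before actually pushing. The following is a minimal sketch, assuming the public snapshot sits on its own named branch `disjointBranch` (the name used above) and that `public` is a path alias or URL configured for the public repository; both are placeholders. `hg outgoing` only lists changesets and sends nothing.

```python
import subprocess


def preview_and_push_commands(branch="disjointBranch", dest="public"):
    """Return the dry-run command followed by the real push command."""
    return [
        # Lists the changesets that a push of this branch would send.
        ["hg", "outgoing", "-b", branch, dest],
        # Sends only the named branch (and its ancestors) to dest.
        ["hg", "push", "-b", branch, dest],
    ]


def preview(branch="disjointBranch", dest="public", cwd="."):
    """Run only the preview step inside the private repository at cwd."""
    argv = preview_and_push_commands(branch, dest)[0]
    return subprocess.call(argv, cwd=cwd)
```

If the preview lists anything other than the sanitized changesets, the push should not be run.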
I'm still looking for a more elegant solution. If you have a better answer, please post it.

If your real task is "hide private data" rather than "show only a small subset of history" (note the difference), you can:
Activate and use the MQ extension
Convert all changesets that touch non-public data into MQ patches
Remove from those patches all edits unrelated to private-data handling
Replace all occurrences of private data in the code with keywords
Edit the related patches in the queue (which must now replace the keywords with the real values)
Push the "polished" private repo to public (with all MQ patches unapplied first)
To get a "safe push" in the future, add an alias to the private repo that unapplies any applied patches before pushing, something like:
[alias]
spush = !$HG qpop -a && $HG push
or, in a more modern way, for Mercurial versions that support phases, always keep MQ patches in the secret phase (i.e. unpublishable) and stop worrying about the applied/unapplied state before a push:
[mq]
secret = True
in the private repo's .hg/hgrc

Related

Mercurial - repos with multiple ACL rules

I'm trying to use two sets of rules for the ACL of my repos. I thought the action name from the hooks section might be usable for this, but that is not the case.
The rules are :
1) Only be able to commit on a specific branch (ex: acceptance)
2) Only be able to pull a specific branch. (Ex: bigNewFeature)
The second rule may look strange for a repo. We are in the middle of releasing a big project, and all the branches related to it have been merged into bigNewFeature over the past months. We are pushing it to our acceptance server and want to freeze the repo against any new branches except bigNewFeature, mainly to avoid a mistake by one of the devs.
[extensions]
hgext.acl=
[hooks]
pretxncommit.acl = python:hgext.acl.hook
# Was expecting to be able to use any action name
pretxncommit.aclpull = python:hgext.acl.hook
# OR
pretxnchangegroup.aclpull = python:hgext.acl.hook
[acl]
sources = commit
[acl.allow.branches]
acceptance = *
[aclpull]
sources = pull
[aclpull.allow.branches]
bigNewFeature = *
My first rule works as usual, but the second one doesn't appear to run at all. I get no error or anything else.

Error "Repository is unrelated" when pushing repository with sub-repository to bitbucket

There is a repository (GameFramework) that I want to use as a subrepository in other repositories.
I created a new main repository, cloned GameFramework into it, and made GameFramework a subrepository. But when I try to push the main repository to Bitbucket I get the error: repository is unrelated or repository is unrelated (in subrepo [path])
Strangely, if I clear the Bitbucket repository after the error and push again, it works!
I recorded a video: https://youtu.be/WI86_3I2ZH0
Why is this happening?
Mercurial considers two repositories unrelated if they do not share the same origin, i.e. they were created independently.
Without the --force option, Mercurial does not allow pushing to an unrelated repository.
In your case, you (or someone else) likely created a repository for the sub-project in each of the projects separately and independently, rather than referencing the same repository as the sub-repository.
Fixing the issue is a bit tricky. Probably the easiest approach is to go into one of the sub-repos, pull from the other sub-repo, and merge as needed; then do the same the other way around.
You need to change the .hgsub file.
By default this file has the following format:
[folder to sub repo] = [folder to sub repo]
You need to change it to:
[folder to sub repo] = [sub repo url]

Private add-on to a public project using Mercurial

I have a small open-source C++ project, hosted on Bitbucket using Mercurial.
Now, I am developing a new feature which adds a couple of new files and new build targets; otherwise it does not change the existing files.
I opened a new branch, but after I pushed it to the main repo I was told that I cannot make the new feature open, so I closed the repo and started looking for a solution. I have two questions:
What would have been a good approach for this situation? I need something that allows me to synchronize fixes made to the common code between the public and private repo. I do not mind having the private code only locally. I found two things:
using the secret phase for the new branch; but I don't know how to get fixes I make to the common files in the secret branch over to the open repo
using subrepos; this would need some code restructuring, but might be cleaner. It just bothers me that this is marked as "feature of last resort" in the documentation.
How do I fix the situation where I have already pushed the closed code to the repo? Would it help to strip the branch and push, or do I need to delete the repo from Bitbucket and create a new one?
Since I am happy with having the private branch only locally, I have done the following:
I stripped the private branch from the Bitbucket repo as well as from my local copy of that repo.
In my copy of the private repo, I merged the changes from the public branch into the private branch, and then copied any common files changed in the private branch into the public one.
Then I marked the whole private branch as 'secret', to keep it local.
For future work, I plan to use hg merge to move changes from the public to the private branch and hg graft for the reverse direction.
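The plan described above (secret phase, merge in one direction, graft in the other) could be sketched as follows. This is only an illustration under assumptions: the branch names `private` and `default` and the commit message are placeholders, and the functions just build hg command lines so they can be reviewed before being run.

```python
def mark_branch_secret(private_branch="private"):
    """Command to move every changeset on the private branch to the secret phase.

    --force is needed because draft -> secret moves the phase boundary backward.
    """
    revset = 'branch("%s")' % private_branch
    return ["hg", "phase", "--force", "--secret", "-r", revset]


def merge_public_into_private(private_branch="private", public_branch="default"):
    """Commands to bring fixes from the public branch into the private one."""
    message = "Merge %s into %s" % (public_branch, private_branch)
    return [
        ["hg", "update", private_branch],
        ["hg", "merge", public_branch],
        ["hg", "commit", "-m", message],
    ]


def graft_fix_to_public(rev, public_branch="default"):
    """Commands to copy one changeset from the private branch onto the public one."""
    return [
        ["hg", "update", public_branch],
        ["hg", "graft", "-r", rev],
    ]


# Example: print the commands for grafting a (hypothetical) revision 42.
for argv in graft_fix_to_public("42"):
    print(" ".join(argv))
```

Grafting copies the whole changeset, so this works best when fixes to the common files are committed separately from changes to the private files.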
As far as I can see, this should work...

Push to two repositories that can't reach each other

The setup:
a laptop L
an office server hosting various repositories SOffice
a customer's database server SCustomer
I'm writing code on L for a customer, and regularly want to push it both to SOffice as well as SCustomer.
I know I could use a changegroup hook to push to a third repository from the second (as described in this answer), but this requires that the second can reach the third network-wise.
In my case, each is behind a firewall, and only my laptop typically accesses both through a VPN (or by being physically there). I could set up the VPN on SOffice to get to SCustomer, but I'd rather not.
Is there a way I can, say, set default to two repositories?
You can't set default to two repositories, but you can define more than one repository in your hgrc file:
[paths]
default= /path/to/first/repo
scustomer = /path/to/second/repo
You can then push to the scustomer repository explicitly:
hg push scustomer
If you want to automate the process of pushing to both repositories at once, I'm not aware of a Mercurial method to do it, but it is really easy to create a shell script, alias or something else to run both commands one after the other.
You can even use a hook on the repository to automatically push to the other one, but you will have to discriminate between a "manual" push and the automatic push in the hook, and I suspect this will be really messy.
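A script that runs both pushes one after the other could look like the sketch below. It assumes the path aliases `default` and `scustomer` from the [paths] example above; the `run` parameter is a hypothetical hook for swapping in a fake runner when testing, and defaults to `subprocess.call`.

```python
import subprocess


def push_to_all(aliases, run=subprocess.call):
    """Push to each path alias in turn and return {alias: exit_code}.

    A failure on one remote does not stop the push to the next one.
    """
    results = {}
    for alias in aliases:
        results[alias] = run(["hg", "push", alias])
    return results


def failed(results):
    """Return the aliases whose push exited with a code other than 0 or 1.

    hg push exits with 1 when there was nothing to push, which is not
    treated as a failure here.
    """
    return sorted(a for a, code in results.items() if code not in (0, 1))
```

From inside the working clone, `push_to_all(["default", "scustomer"])` would then push to the office server and the customer server in sequence.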
Could you create a second clone of the repository with a hook that automatically pushes to both of the external repositories? Then push from your working clone to the second clone.
There's a MultirepoExtension that adds commands for doing any operation on multiple repositories.
Or you could create an alias to push to both like:
[alias]
pushboth = !$HG push http://first ; $HG push http://second
or you could create a pre-push hook that pushes to the other one. Something like:
[hooks]
pre-push = hg push http://second
But I like (and upvoted) krtek's answer the most. Just give each a path alias and run push twice with the short names instead of the URLs.

How do I set up a hook in HG / Mercurial that gets dictated by the repository?

I have a need for a hook to run after update (this will build the solution they have updated) and I don't want to have to add that hook manually for each person that clones my central repository.
When someone first clones my central repository, is it possible to include hooks in that clone? It seems that the repository's .hg/hgrc file doesn't get cloned automatically.
I did read about site-wide hooks, but as far as I understand it, they work on each created repository, where I only want to have the hooks on some repos.
As Rudi already said, this is (thankfully) not possible for security reasons.
However, you can reduce the per-clone workload to set up hooks manually: Ship the hook scripts as part of your repository, e.g. in a directory .hghooks, and additionally include a script in your repo which sets up these hooks in a clone's hgrc. Each coworker now only needs to call the setup script once per clone.
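A setup script of the kind described above might look like the following sketch. The file layout is an assumption: the hook name `update.build`, the script path `.hghooks/build.sh`, and the marker comment are all hypothetical, and the sketch relies on shell hooks being run from the repository root. It appends a [hooks] section to the clone's .hg/hgrc once and does nothing on later runs.

```python
import os

# Marker used to detect that the hooks were already installed (hypothetical).
HOOK_MARKER = "# hooks installed by .hghooks/setup.py"

# Configuration appended to the clone's hgrc; names and paths are placeholders.
HOOK_CONFIG = """
%s
[hooks]
update.build = ./.hghooks/build.sh
""" % HOOK_MARKER


def install_hooks(repo_root):
    """Append the hook configuration to <repo_root>/.hg/hgrc.

    Returns True if the configuration was written, False if it was
    already present.
    """
    hgrc = os.path.join(repo_root, ".hg", "hgrc")
    existing = ""
    if os.path.exists(hgrc):
        with open(hgrc) as f:
            existing = f.read()
    if HOOK_MARKER in existing:
        return False
    with open(hgrc, "a") as f:
        f.write(HOOK_CONFIG)
    return True
```

Each coworker would call `install_hooks(".")` once from the root of a fresh clone; because the hook scripts themselves are versioned in the repository, later changes to them arrive with a normal pull.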
This is not possible: hooks deliberately do not propagate to clones, as a security measure. If they did, one could set up a rogue repository that runs arbitrary commands on any machine where the repo is cloned.
See http://hgbook.red-bean.com/read/handling-repository-events-with-hooks.html#id402330 for more details.
This will allow for centralised per-repo hooks, with a single setup step per user. It will however cause problems for users who are disconnected from the network. An alternative if you tend to have disconnected developers (or ones over high-latency/low bandwidth links) would be to have a repo containing the hooks, and set up each user's global hgrc to point into that repo (and require regular pulls from a central hook repo).
Note that I treat the ID of the first commit as the "repo ID" - this assumes that the first commit in each repository is unique in some way - contents or commit message. If this is not the case you could do the same thing but applying it over the first N commits - but you would then have to account for repos that have fewer than N commits - can't just take repo[:5] for example as newer commits would then change the repo ID. I'd personally suggest that the first commit should probably be a standard .ignore file with a commit message unique to that repo.
Have a central shared_hgrc file, accessible from a network share (or in a hook repo).
Each user's global hgrc has:
%include /path/to/shared_hgrc
Create a shared repository of python hook modules. The hooks must be written in python.
Create your hook functions. In each function, check which repo the hook has been called on by checking the ID of the first commit:
    # hooktest.py
    import mercurial.util

    FOOBAR_REPO = 'b88c69276866d73310be679b6a4b40d875e26d84'

    ALLOW_PRECOMMIT_REPOS = set((
        FOOBAR_REPO,
    ))

    def precommit_deny_if_wrong_repo(ui, repo, **kwargs):
        """Aborts if the repo is not allowed to do this.
        The repo ID is the ID of the first commit to the repo."""
        repo_id = repo[0].hex().lower()
        if repo_id not in ALLOW_PRECOMMIT_REPOS:
            raise mercurial.util.Abort('Repository denied: %s' % (repo_id,))
        ui.status('Repository allowed: %s\n' % (repo_id,))

    def precommit_skip_if_wrong_repo(ui, repo, **kwargs):
        """Skips the hook if the repo is not allowed to do this.
        The repo ID is the ID of the first commit to the repo."""
        repo_id = repo[0].hex().lower()
        if repo_id not in ALLOW_PRECOMMIT_REPOS:
            ui.debug('Repository hook skipped: %s\n' % (repo_id,))
            return
        ui.status('Repository hook allowed: %s\n' % (repo_id,))
In the shared_hgrc file, set up the hooks you need (make sure you qualify the hook names to prevent conflicts):
[hooks]
pre-commit.00_skip = python:/path/to/hooktest.py:precommit_skip_if_wrong_repo
pre-commit.01_deny = python:/path/to/hooktest.py:precommit_deny_if_wrong_repo
As @Rudi said first, it can't be done for security reasons.
With some prior setup you can make it so that hooks are run on clone, by putting a hook with a repo-relative path in /etc/mercurial or in each user's ~/.hgrc; in a corporate setting this can be done via your system management tools or by building a custom Mercurial installer. In a non-corporate setting, follow @Oben's advice and provide the scripts and a readme.