Cleaning up a Mercurial repository for an external contractor

I have an active project with some sensitive files and directories. I want to hire an external contractor to do some simple UI work. However, I don't want the contractor to have access to some directories and files. My project is in Mercurial on Bitbucket.
What is the best way to clean up the project and give him access to commit his changes? I thought about forking into a new repository, but I am worried about removing directories I don't want him to have access to.
How do I remove them so they don't appear in the original changesets? How do I merge his repo back without it removing those directories in my main repository? Is a fork the way to go?

Naturally a repository needs access to its whole history in order to self-check its integrity. I don't know of a way to selectively hide parts of the repository (there's the ACL extension, but it is for write access only).
In your case, I would:
1. Create a new repository where all sensitive information has been stripped out (use the convert extension for that task).
2. Let the external developer work with that repository.
3. Once his work is finished, pull his repository into a clone of the original one (using -f to force pulling from an unrelated repository).
4. Rebase his first changeset and all its children onto a head of your original repository.
5. Finally, push the rebased head to the original repository.
For steps 3 to 5 you don't necessarily have to wait until the external developer is done. Rebasing intermediate states of his repository is also possible.
That said, this is a theoretical idea; one has to see how it performs in practice.
Alternative: In case you frequently have external contractors who shouldn't see some parts of your code, I would second @Anton's comment about setting up multiple repositories with separate permissions.
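The convert, pull, rebase and push steps above could be sketched roughly as follows. This is a minimal sketch rather than a tested recipe: every repository path and revision passed to these helpers is hypothetical, the filemap is assumed to be a text file with one `exclude PATH` line per sensitive path, and the `HG` variable exists only so the commands can be dry-run with `HG=echo`.

```shell
#!/bin/sh
# Sketch of the convert / pull -f / rebase / push workflow.
# All paths and revisions are hypothetical placeholders.
HG="${HG:-hg}"   # assumption: set HG=echo to print commands instead of running them

# Build a stripped copy of the history with the convert extension.
strip_repo() {          # usage: strip_repo SOURCE DEST FILEMAP
    $HG --config extensions.convert= convert --filemap "$3" "$1" "$2"
}

# Pull the (unrelated) contractor repository into a clone of the original.
pull_contractor() {     # usage: pull_contractor CLONE CONTRACTOR_REPO
    $HG -R "$1" pull -f "$2"
}

# Rebase the contractor's first changeset and its descendants
# onto a head of the original history.
rebase_contractor() {   # usage: rebase_contractor CLONE FIRST_REV DEST_REV
    $HG -R "$1" --config extensions.rebase= rebase -s "$2" -d "$3"
}

# Push the rebased head back to the original repository.
push_rebased() {        # usage: push_rebased CLONE REV ORIGINAL_REPO
    $HG -R "$1" push -r "$2" "$3"
}
```

Running the helpers with `HG=echo` first shows exactly which hg commands would be issued before anything touches a real repository.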

There are multiple ways to do this:
Using sub-repositories
Using multiple repositories
???
Regardless, you need to restructure and split your existing repository. This will create havoc if you have lots of people working on this project: they will all need to stop working, synchronize their work, destroy their local clones and clone fresh copies after the restructuring.
One way using multiple repositories would be the following:
1. Make 2 extra clones of the repository (keep one copy around as a fallback; if everything fails, you can always go back).
2. Run the hg convert command on the first clone to get rid of all the bits and pieces your contractor should not access.
3. Fix that repository so that it works by itself. You might have to change code to provide hooks and events for anything that is not present but which you intend to inject into the project before you build.
4. Run hg convert on the other clone to get rid of everything now present in the first.
5. Pull from the first (contractor) repository into the second (private) repository, merge, and do the necessary fix-ups so that the code still works as intended.
What you have now is two repositories:
Contractor-repository, with only the bits you want to expose
A private repository, that has pulled and merged from the contractor-repository, and contains all the other bits and pieces
From now on, whenever the contractor has pushed work to his repository, you need to pull from it into the private repository and then merge.
Your repositories would look like this:
Contractor: ---97---98---99---100---102---103---104
                                              M                 M
Private:    ---91---92---93---94---95---96---101---------------105---106---107
                                             /                 /
                                            /                 /
                       ---97---98---99---100---102---103---104
The two changesets with M above are merge-changesets that merge contractor-supplied code into your private repository.
Note that you too would have to commit code to the contractor-repository, to work on and fix bugs in the code there, but all the private bits you can keep private.
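The two hg convert runs and the recurring pull-and-merge step could look roughly like this. It is only a sketch under assumptions: the directory names passed to the filemap helpers are hypothetical, the `HG` variable is just a convenience so the commands can be dry-run with `HG=echo`, and the very first pull may need `-f`, since the two converted repositories are probably unrelated as far as Mercurial is concerned.

```shell
#!/bin/sh
HG="${HG:-hg}"   # assumption: set HG=echo to print commands instead of running them

# Filemap for the contractor repository: drop every private path.
contractor_filemap() { printf 'exclude %s\n' "$@"; }

# Filemap for the private repository: keep only the private paths,
# i.e. drop everything that now lives in the contractor repository.
private_filemap() { printf 'include %s\n' "$@"; }

# Recurring step: pull the contractor's new changesets and merge them.
merge_contractor_work() {   # usage: merge_contractor_work PRIVATE_REPO CONTRACTOR_REPO
    $HG -R "$1" pull "$2" &&
    $HG -R "$1" merge &&
    $HG -R "$1" commit -m "Merge contractor work"
}
```

For example, `private_filemap billing secrets > private.map` would produce a filemap that hg convert can take through its `--filemap` option (with the convert extension enabled); `billing` and `secrets` are made-up names.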

Related

Project version control organization with multiple private entities contributing

I am looking for suggestions on how to best organize the environment described below. We are currently on Mercurial and I would prefer to stay there; however, if a different version control system will help us achieve our goals, we will switch.
Short summary - Our company has been contracting another company to do work. I have been merging from our shared repository to our private repository where we have other modifications, and building a release product from there. Now we want to add 2 more contractors (who are entire companies, not just a guy) that can also contribute. However, each contractor has private information that I can see but that needs to stay private from the other contractors.
More details:
ContractorA started repository
I forked ContractorA to MyCompany
MyCompany now contains additions to ContractorA
The way I have been handling this so far is this:
ContractorA pushes to ContractorA
My Local machine has a clone of MyCompany
I have an alias to ContractorA on my local machine
I Pull from ContractorA, handle any merge conflicts, and then push to MyCompany
So the MyCompany repo has all changes from ContractorA and MyCompany
This has been working perfectly for my needs; however, ContractorB is now going to enter the picture.
ContractorA has proprietary stuff that ContractorB can't have access to, and ContractorB has proprietary stuff that ContractorA can't have access to.
There is a common section that ContractorA will be contributing to and MyCompany and ContractorB will be using.
Everything needs to end up in MyCompany as we do 1 build that will include everything -
Common code that ContractorA modifies
Proprietary code from ContractorA
Proprietary code from ContractorB
Common Code from MyCompany
Any thoughts on how to handle this?
Since MyCompany is a clone of ContractorA, I cannot just give access to ContractorB to MyCompany Repo, correct? Or is there a way to restrict access on a directory basis to each user?
Is there a way to fork MyCompany into a new repository ContractorB and remove all ContractorA proprietary code such that ContractorB can never see any ContractorA proprietary code? If that is the case, could I put another alias on my local machine, pull from ContractorA, merge and push to MyCompany, then pull from ContractorB, merge and push to MyCompany?
Does this make any sense at all?
Thank you!
If you have a Mercurial repo, you will need to have full and complete access to the entire history of the working directory. However, you can hg clone a repo and only include certain branches in the clone and as long as you are careful about how those branches interact, you can achieve your goal. You do have a couple options and I've listed them in order of how I would rank them:
1. Each does their work on separate branches. You will need four branches: core, branch-a, branch-b, composed-branch. All merges between the branches must go only one way: core -> branch-a -> composed-branch (and likewise core -> branch-b -> composed-branch).
   - core: the common code base that is shared by all. All updates to this code need to be done at the tip of core since you will never be merging any other branches into it. If you make updates to the shared codebase, you then merge it into each contractor branch. If a contractor does anything on their own branch that should be put into core, manually make the updates to core via copy/paste or via a patch.
   - branch-a and branch-b: these hold the independent work done by the separate contractors. The public repo to which ContractorA has access (for pushes and pulls) will only have the changes from core and branch-a present, since those are the only ancestors of the tip of their branch. The same goes for ContractorB's repo.
   - composed-branch: this is where the magic happens. You pull from branch-a and branch-b and merge them into this branch as you receive updates from the contractors. You handle any merge conflicts here and can also make any additional changes to integrate the work done by the contractors. This is the branch that your build system pulls.
2. Break your current repo into several smaller repos. If the risk of an incorrect push with one repo is too great, or if the work of each contractor is more like a DLL or a library that can be built independently, it may make more sense to treat them as independent repos. This is basically the previous suggestion, but with independent repos instead of independent branches. You would again separate things into "used by all", "created by contractor" and "combined total". Also note that you may find hg's sub-repo feature useful, though it is considered a "feature of last resort".
3. Do it all via patches. Contractors can send you updates as exported patches and you can apply and clean them up. This is a bad idea so I won't go into details unless pressed, but it is possible.
In all cases, you will likely need to start a new repo based on your current codebase that cleans the boundaries about what goes into core and can be seen by all.
Note: even if you hg delete files in the repo, they are still fully accessible in the history!
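As a rough illustration of the branch-based option, the sketch below shows how a contractor-facing repository could be produced and how a one-way merge might be scripted. The repository paths are hypothetical, the branch names are the ones used in the list above, and `HG` is only an assumption that lets the commands be dry-run with `HG=echo`.

```shell
#!/bin/sh
HG="${HG:-hg}"   # assumption: set HG=echo to print commands instead of running them

# Clone only the given branch head and its ancestors, so the resulting
# repository contains nothing from the other contractor's branch.
publish_for_contractor() {   # usage: publish_for_contractor MAIN_REPO BRANCH DEST
    $HG clone -r "$2" "$1" "$3"
}

# One-way merge of SOURCE_BRANCH into TARGET_BRANCH,
# e.g. core into branch-a, or branch-a into composed-branch.
merge_one_way() {            # usage: merge_one_way REPO SOURCE_BRANCH TARGET_BRANCH
    $HG -R "$1" update "$3" &&
    $HG -R "$1" merge "$2" &&
    $HG -R "$1" commit -m "Merge $2 into $3"
}
```

Keeping the merge direction in one helper makes it harder to merge the wrong way by accident, although nothing here enforces the rule; a server-side hook would be needed for that.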
If every contractor must get the other contractors' work except for the "secret part", you can do the following:
- Enable the MQ extension on the MyCompany repo.
- Move all secret parts of every contractor into a separate MQ patch (one patch per contractor).
- Clone or push to each contractor's repo with only his own patch applied (a hook can help to check this condition).
- Perform your integrator's work with both patches applied to the MyCompany repo.

Mercurial: devs work on separate folders, why do they have to merge all the time

I have four devs working in four separate source folders in a Mercurial repo. Why do they have to merge all the time and pollute the repo with merge changesets? It annoys them and it annoys me.
Is there a better way to do this?
Assuming the changes really don't conflict, you can use the rebase extension in lieu of merging.
First, put this in your .hgrc file:
[extensions]
rebase =
Now, instead of merging, just do hg rebase. It will "detach" your local changesets and move them to be descendants of the public tip. You can also pass various arguments to modify what gets rebased.
Again, this is not a good idea if your developers are going to encounter physical merge conflicts, or logical conflicts (e.g. Alice changed a feature in file A at the same time as Bob altered related functionality in file B). In those cases, you should probably use a real merge in order to properly represent the relevant history. hg rebase can be easily aborted if physical conflicts are encountered, but it's a good idea to check for logical conflicts by hand, since the extension cannot detect those automatically.
Your development team are committing little and often; this is just what you want so you don't want to change that habit for the sake of a clean line of commits.
@Kevin has described using the rebase extension and I agree that can work fine. However, you'll also see the work sequences of all developers squished together in a single line of commits. If you're working on a stable code base and just submitting quick single-commit fixes then that may be fine. If you have ongoing lines of development, then you might not want to lose the continuity of a developer's commits.
Another option is to split your repository into smaller self-contained repositories.
If your developers are always working in 4 separate folders, perhaps the contents of these folders can be modularised and stored as separate Mercurial repositories. You could then have a separate master repository that brought all these smaller repositories together within the sub-repository framework.
Mercurial is distributed, which means that if you have a central repository, every developer also has a private repository on his/her workstation, and of course a working copy as well.
So now let's suppose that they make a change and commit it, i.e., to their private repository. When they want to hg push two things can happen:
either they are the first one to push a new changeset to the central server, in which case no merge will be required, or
somebody else, starting from the same version, has committed and pushed before them. There is a fork here: from the same starting point Mercurial has two different directions, thus a merge is required, even if there is no conflict, because we do not want four divergent contexts on the central server. (This is possible with Mercurial, by the way: they are called heads, and you can force the push without a merge. But you still have the divergence, no magic, and this is probably not what you want, because you want to be able to check out the sum of all the contributions.)
Now, avoiding merges is quite simple: you need to tell your developers to integrate others' changes before committing their own changes:
$ hg pull
$ hg update
$ hg commit -m"..."
$ hg push
When the commit is made against the latest central version, no merge should be required.
If they were working on the same code, some tests should also be run after the pull and update to ensure that what was working in isolation still works once the other developers' work has been integrated. Taking others' contributions frequently and pushing our own changes frequently is called continuous integration, and it ensures that integration issues are discovered quickly.
Hope it'll help.

Creating Mercurial subrepositories, while maintaining the history

I am about to make some major changes to my Mercurial repositories. As I am going to be using a Feature of Last Resort, I am looking for some advice and reassurance that I am not doing something stupid.
Where I Am:
I have a Mercurial repository with a complete history of all of these files:
/source
/secret_subsystem
/unclassified_subsystem
/common_files
Source is the Mercurial repository.
The secret subsystem folder contains code which is intellectual property we want to keep in-house.
The unclassified subsystem folder contains code which we want to outsource to a third-party to maintain.
The common files folder contains code that both subsystems depend on. We will be keeping ownership, but we want to share it with the third-party.
Obviously, I can't just push out my whole repository to the third-party company. The third-party would see too much.
Where I Want To Be:
Having read up on subrepositories, this is where I think I need to be:
Have THREE subrepositories: secret_subsystem, unclassified_subsystem, common_files.
Ensure there are no other files at the /source level, due to this recommendation.
Have the outsourcers create a brand new repository at the source level on their machines, and two corresponding subrepositories.
Push the unclassified_subsystem and common_files to the out-sourcer, pulling back unclassified_subsystem as required, pushing out new common_files repositories as required.
Maintaining History:
I would like to maintain the commit history, as much as practical, for all of the subsystems.
To do this, I will run the hg convert extension command three times, once for each subrepository. I will filter down to only the files that belong in each subrepository. I may also need to map filenames to move the files from ./common_files/foo.py to ./foo.py (for example).
My Questions:
1) Is dividing up a repository into subrepository a reasonable way of implementing security - viz. that a third-party can only see and edit some of our files?
2) Is using hg convert a reasonable way to create a subrepository from an existing repository, while still maintaining the history?
3) Will hg convert's filter strip out (a) all commit messages about files NOT in the filtered repository, and (b) all diffs for files NOT in the filtered repository?
There is another implied question: Am I heading into a world of hurt? If so, I will simply give up on retaining file histories, or even make them separate repositories and forget about cross-repository commits.
I've not used subrepos so far, but I can answer 2) and 3):
2) Yes, sounds reasonable.
3) Yes.
There was a similar question just 2 days ago: Convert mercurial repository to subrepositories with full history (like hg log -f)
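For what it's worth, the filemap for each of the three hg convert runs could be generated along these lines. This is a minimal sketch, assuming the directory names from the question; the helper names are made up, and `rename DIR .` is the filemap idiom for moving a subdirectory to the root of the new repository.

```shell
#!/bin/sh
# Print a filemap that keeps only one top-level directory and moves its
# contents to the root of the converted repository.
subrepo_filemap() {   # usage: subrepo_filemap DIR
    printf 'include %s\nrename %s .\n' "$1" "$1"
}

# Print .hgsub entries ("path = source") for the parent repository.
hgsub_entries() {     # usage: hgsub_entries DIR...
    for dir in "$@"; do
        printf '%s = %s\n' "$dir" "$dir"
    done
}

subrepo_filemap common_files
# prints:
#   include common_files
#   rename common_files .
```

Each filemap would then be saved to a file and passed to hg convert through `--filemap`, with the convert extension enabled, once per subrepository.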

Simplest workflow for non-developers using mercurial, working on different files, without having to think about merging?

I currently use SVN for a number of things that aren't exactly code, for instance xml files, report templates, miscellaneous files, etc. I have several non-developers who are comfortable using TortoiseSVN for this. They typically work as follows:
Person A - does an SVN Update on the folder of interest to them. Or perhaps just on a single file.
Person A - edits whichever file(s) they're working on. Perhaps add or remove files.
Person B - someone else is probably working on different files at this point
Person A - does an SVN Commit to save their changes to the repository.
Very occasionally they'll hit conflicts where more than one person has edited a file. Almost always this is just because they forgot step #1. Because they're always working on separate files, there are (almost) never real conflicts. As long as they do step #1 first everything works fine.
I'd like to move to Mercurial, however something holding me back is the prospect of having to do a 'merge' all the time, because Mercurial looks at the state of the entire repository, not just the files of interest at a particular time. For example, the workflow would be like this:
Person A - does a pull and update on the repository. (let's assume there are no local changes so this is straightforward).
Person A - edits whichever file(s) they're working on. Perhaps add or remove files.
Person B - someone else edits, commits, and pushes a different file at this point
Person A - commits changes. Tries to push. Gets an error about multiple heads.
Person A - does a pull and update. update doesn't work: merge required.
Person A - does a merge. If using TortoiseHg it's a bit confusing working out what to click on to do the merge. I guess this is simpler on the command line, provided there are no complications.
Person A - commits the merge.
Person A - pushes the changes.
My resistance is that there are more steps, and the merge step is somewhat hard to get your head around if you're not a developer. Is there a way I can put these steps together to make the process nice and simple?
"Very occasionally they'll hit conflicts where more than one person has edited a file. Almost always this is just because they forgot step #1. Because they're always working on separate files, there are (almost) never real conflicts. As long as they do step #1 first everything works fine."
If this is the case why do you want to use a DVCS? Mercurial is great, but the benefits of a DVCS come from the ability to merge and fork and the ease of doing either, if your workflow requires neither why would you want to switch toolset?
Sounds like the rebase extension might work for you. The workflow becomes:
hg clone
make changes
hg commit
hg pull --rebase
hg push
The local revisions get "rebased" onto the latest tip on pull, which avoids the merge.
One possible approach is to have a point person who does all the real work of merging. I'm not a big fan of letting everyone push to one shared repos, especially if they don't know what they are doing. An alternative approach is that A has local repos A, B has local repos B, and there is repos S, which combines A and B. Then, don't let A or B push to S. Instead, let an expert pull from A and B and do the merging in S. Then A and B never have to push to S. If they coordinate with the expert, then he/she will already have merged their changes into S by the time they pull updates from S, and so A and B will not have to merge when pulling either. This is actually the default mode in which a DVCS works, since by default all repositories are read-only except to their owner.

Can one Mercurial repository live inside another Mercurial repository?

Can one hg repo live inside another hg repo on my local file system?
I am pulling down the bitbucket wiki for 'sandbox', and I want to know if this should be placed in repos/sandbox/wiki or repos/sandbox-wiki.
Is the former okay to do?
Edit: See Subrepository.
The short answer is yes, but I can't imagine why you would want to.
In your example, I think you should go with:
repos/sandbox-wiki
[edit] Additionally:
Yo Dowg, I herd you like repositories.
So we put a repo in your repo so you can version while you version
:-)
Yes and no. It depends on what you want to do. You can create the repo 'sandbox/wiki', but files in this inner repo won't be committed in the outer 'sandbox' repo (@Jason is right). If you don't want to, no problem.
Try explicitly adding files from the wiki repo in sandbox and you'll get the message below. If you just add the path to some directory containing an inner repo, the files will just be ignored.
From the sandbox root directory:
hg add wiki/myfile
abort: path 'wiki/myfile' is inside repo 'wiki'
Mercurial does not allow nested repositories, but there is at least one reason for them:
Imagine that you are working in a project: /MyProject. In this folder you put everything: code, documentation, tests, etc.
You want to back up your work because it is very important, so you create a repository for /MyProject. Then, over time, you use bundles to save the evolution of /MyProject and back them up on a USB flash memory so that you can recover everything in case your hard drive breaks.
Remember that /MyProject contains everything. And among all those things, there are the main code and some auxiliary projects. You also want to track the progress of an auxiliary project that is in /MyProject/AuxiliaryProject, so you use Mercurial to track its evolution.
Also, you want to have a separate repository for the main code: /MyProject/Main
In this situation you want nested repositories: one big one for being able to back-up everything using bundles and child repositories for managing each subproject.
I think Mercurial should give the user several options when initializing a repository. For example:
- ignore nested repositories
- include nested repositories but ignore their .hg folders (i.e. act as if there were no nested repositories, but do not ignore the information contained in the nested repositories).
- include nested repositories and also include their .hg folders (makes sense for back-up purposes)
--------- Edit:
Subrepositories is a feature that is work in progress:
https://www.mercurial-scm.org/wiki/subrepos
Also, there is an extension named "forest" that might become obsolete in the future:
https://www.mercurial-scm.org/ForestExtension
You'd need to set up an .hgignore file in sandbox to exclude wiki, because Mercurial assumes that it is responsible for all descendants. This would probably generate more user confusion than it is worth.