Project version control organization with multiple private entities contributing

Project version control organization with multiple private entities contributing - mercurial

I am looking for suggestions on how to best organize this environment descried below. We are currently on Mercurial and I would prefer to stay there, however if a different version control system will help us achieve our goals, we will switch.
Short summary - Our company has been contracting another company to do work. I have been merging from our shared repository to our private repository where we have other modifications, and building a release product from there. Now we want to add 2 more contractors (who are entire companies, not just a guy) that can also contribute, however all contractors have private information that I can see, but needs to be private from the other contractors.
More details:
ContractorA started repository
I forked ContractorA to MyCompany
MyCompany now contains additions to ContractorA
The way I have been handling this so far is this:
ContractorA pushes to ContractorA
My Local machine has a clone of MyCompany
I have an alias to ContractorA on my local machine
I Pull from ContractorA, handle any merge conflicts, and then push to MyCompany
So the MyCompany repo has all changes from ContractorA and MyCompany
This has been working perfect for my needs, however now ContractorB is going to enter the picture
ContractorA has proprietary stuff that ContractorB cant have access to and ContractorB has proprietary stuff that Contractor A can’t have access to.
There is a common section that ContractorA will be contributing to and MyCompany and ContractorB will be using.
Everything needs to end up in MyCompany as we do 1 build that will include everything -
Common code that ContractorA modifies
Proprietary code from ContractorA
Proprietary code from ContractorB
Common Code from MyCompany
Any thoughts on how to handle this?
Since MyCompany is a clone of ContractorA, I cannot just give access to ContractorB to MyCompany Repo, correct? Or is there a way to restrict access on a directory basis to each user?
Is there a way to fork MyCompany into a new repository ContractorB and remove all ContractorA proprietary code such that ContractorB can never see any ContractorA proprietary code. If that is the case, could I put another alias in my local machine, pull from ContractorA, merge and push to MyCompany, then pull from ContractorB, merge and push to MyCompany?
Does this make any sense at all?
Thank you!

If you have a Mercurial repo, you will need to have full and complete access to the entire history of the working directory. However, you can hg clone a repo and only include certain branches in the clone and as long as you are careful about how those branches interact, you can achieve your goal. You do have a couple options and I've listed them in order of how I would rank them:
Each does their work on separate branches You will need four branches: core, branch-a, branch-b, composed-branch. All merges between the branches must go only one way: core -> branch-a -> composed-branch.
core: the common code base that is shared by all. All updates to this code need to be done at the tip of core since you will never be merging any other branches into it. If you make updates to the shared codebase, you then merge it into each contractor branch. If a contractor does anything on their own branch that should be put into core, manually make the updates to core via copy/paste or via a patch.
branch-a and branch-b: these hold the independent work done by the separate contractors. The public repo to which ContractorA has access (for pushes and pulls) will only have changes core and branch-a present since those are the only ancestors of the tip of their branch. Similarly for the ContractorB's repo.
composed-branch: this is where the magic happens. You pull from branch-a and branch-b and merge them into this branch as you receive updates from the contractors. You handle any merge conflicts here and can also do any additional changes to integrate the work done by the contractors. This is the branch that your build system pulls.
Break your current repo into several smaller repos If the risk of an incorrect push with one repo is too great or if the work each contractor is more like a dll or a library that can be built independently, it may make more sense to treat them as independent repos. Basically the previous suggestion, but independent repos instead of independant branches. You would again separate things into "used by all", "created by contractor" and "combined total". Also note that you may find hg's sub-repo useful, though it is considered a "feature of last resort".
Do it all via patches Contractors can send you updates as exported patches and you can apply and clean them up. This is a bad idea so I won't go into details unless pressed, but it is possible.
In all cases, you will likely need to start a new repo based on your current codebase that cleans the boundaries about what goes into core and can be seen by all.
Note: even if you hg delete files in the repo, they are still fully accessible in the history!

If every contractor must get another contractor work except "secret part", you have
Have MQ extension enabled on MyCompany repo
Move all secret parts of every contractor into separate MQ-patch (one patch per contractor)
(Clone|push to contractor's repo) with only his patch applied (hook can help with such condition-check)
Your integrator's work must be performed with both patches applied to MyCompany repo

Related

How to setup a DVCS system that keeps some of the stuff we like about our centralized VCS?

Right now, we have a small team of developers using TFS for our version control. I'm evaluating the possibility of us moving to a DVCS, and am wondering if we'd need to give up some of the stuff we like about our current system if we moved to DVCS, or if we can find a way to support it.
Right now we a Stable branch, and 1 branch for each developer (you can think of each dev's branch as a feature branch that is reused from feature to feature). The stuff we like is:
1) Each time a dev checks in changes to his branch, we do a build and test of everything on the servers, followed by automatic deployment of all projects to that developers test environment.
2) Merging from any dev's branch to Stable is done by me, so that I can have 1 last check on what is happening to our stable branch before committing the changes.
3) If I want to help a dev with something, I can just grab latest from their branch and look at it on my machine.
I'm trying to understand how this could work with a DVCS (specifically we are testing with Mercurial).
I'm hoping to be able to pull off something like this:
1) We setup a central repository, and we create Main and Release branches in addition to 1 branch for each developer.
2) All devs clone the repository to their local machine.
3) All work is done in their personal branch against the local repository.
4) When they are done, the would pull from the central repository, and perform a local forward integration merge from Main to to their branch, to integrate any changes that have happened in the past to Main.
5) They would then push their changes to the central repository.
6) Some CI service would pickup this change, causing a build/test/deploy-to-dev of all our projects in that branch.
7) If everything was ok the dev would shoot me an email saying their branch was ready for merging to Main.
8) I could then merge in their changes, either by somehow connecting directly to the remove repository, or by doing a pull->merge->push.
So to deal with our requests:
1) I'm assuming there is some CI tools that can watch a branch in Mercurial, and kick off a build/test/deploy process (like CC.Net).
2) I can still manage the final merge process from DevX to Main either by connecting to the remote repo, or by pulling, merging and pushing through my local repo.
3) I believe I could either pull changes directly from another dev's repo, or I could just pull from the central repo, and then update my working directory to work on their code.
So do I have this mostly right?

Yep, you have all that right. Regarding you final assumptions:
1) There's a Mercurial Source Control Block for CruiseControl.NET. Another CI server I've heard of in use with Mercurial is Jenkins.
2) Correct. For integration with Main, I would prefer pulling (from either) and merging on my own machine before pushing, rather than merging on the server.
3) Exactly so.
It sounds like your developers are fairly disciplined, but just in case you need better control certain aspects of your operations:
You can use hooks to issue warnings when someone tries to merge their branch to Main. In-process hooks have to be written in Python, but they have access to the Mercurial API that way. You could also place hooks on the server that reject pushes containing a merge to Main not done by certain users.
One way some organizations control integration is a pull-only scenario. Only a few developers can push to the official repository and other developers send them pull requests. The Mercurial book's Chapter 6 covers this a bit, too.
A branch per developer is good. A branch per feature is also useful, allowing each developer to work on multiple things in parallel, then merging each to their branch when done. They just have to remember to close that feature branch before doing so, so the branch name doesn't keep popping up. This can be done with with clones as well, but I find myself preferring named branches since I have to keep work/backup/laptop development clones all synced so I can work on whateve, whenever. I still do expendable work as a clone first.

Cleaning up a mercurial repository for an external contractor

I have an active project with some sensitive files and directories. I want to hire an external contractor to do some simple UI work. However, I don't want the contractor to have access to some directories and files. My project is in mercurial on Bitbucket.
What is the best way to clean up the project and give him access to commit his changes? I thought about forking into a new repository, but I am worried about removing directories I don't want him to have access to.
How to I remove them so they don't appear the original changesets? How to I merge his repo back without it removing those directories in my main repository? Is a fork the way to go?

Naturally a repository needs access to its whole history in order to self-check its integrity. I don't know of a way to selectively hide parts of the repository (there's the ACL extension, but it is for write access only).
In your case, I would
create a new repository where all sensitive information has been stripped off (use the convert extension for that task).
Then I would let the external guy work with that repository.
Once his work is finsihed, pull his repository into a clone of the original one (using -f to force pulling of an unrelated repository), and
rebase his first changeset and all its children onto a head of your original repository.
Finally, push the rebased head to the original repository.
For steps 3 to 5 you don't necessarily have to wait until the external developer is done. Rebasing intermediate states of his repository is also possible.
Yet, it's an theoretical idea .. one has to see how it performs in practice.
Alternative: In case you frequently have external contractors who shouldn't see some parts of your code, I would second #Anton's comment to setup permission related multiple repositories.

There are multiple ways to do this:
Using sub-repositories
Using multiple repositories
???
Regardless, you need to restructure and split your existing repository, so this will create havoc if you have lots of people working on this project, they will all need to stop working, synchronize their work, destroy their local clones and clone down fresh copies after the restructuring.
One way using multiple repositories would be that you do the following:
Make 2 extra clones of the repository (keep one around for fallback if everything fails, you can always go back)
The first clone you need to run the hg convert command on to get rid of all the bits and pieces your contractor should not access
Then you fix that repository so that it works by itself. You might have to change code to provide hooks and events for anything not present but which you intend to inject into the project before you build
Then you need to run hg convert on the other clone to get rid of everything now present in the first.
Then you pull from the first (contractor) repository into the second (private) repository, merge, and do necessary fix-ups so that the code still works as intended
What you have now is two repositories:
Contractor-repository, with only the bits you want to expose
A private repository, that has pulled and merged from the contractor-repository, and contains all the other bits and pieces
From now on, whenever the contractor has pushed work to his repository, you need to pull from it and into the private repository and then merge.
Your repositories would look like this:
Contractor: ---97---98---99---100---102---103---104
M M
Private: ---91---92---93---94---95---96---101---105---106---107
/ /
/ /
---97---98---99---100---102---103---104
The two changesets with M above are merge-changesets that merge contractor-supplied code into your private repository.
Note that you too would have to commit code to the contractor-repository, to work on and fix bugs in the code there, but all the private bits you can keep private.

Mercurial and code reviews; good workflow?

I'm in a small distributed team using Mercurial for a central repository. We each clone it via ssh onto our own linux boxes. Our intent is to review each others' work before pushing changes up to the central repository, to help keep central's tip clean. What is a good way to share code between developers on different linux boxes? I'm new to Mercurial. The options I can think of (through reading, not experience) are:
1: Author commits all local changes and updates working clone with central's tip. Author uses hg bundle, if there's a way to specify which local revs to include in the bundle. (an experiment showed me "bundle" only grabs uncommited changes, even if there are previous local commits that central doesn't know about) Author gets bundle file to reviewer. Reviewer creates a new clean clone from central's tip, and imports the bundle into that clone.
or,
2: After both author and reviewer fetch from central's tip, author uses patch and reviewer imports the patch.
or,
3: Author pushes to reviewer or reviewer pulls from author (but how, exactly? What I read is only about pushing and pulling to/from the original served repository, and/or on the same box instead of between different linux boxes.)
4: Forget reviewing the code prior to pushing to central; go ahead and push, using tags to identify what's been reviewed or not, and use Hudson (already working) to tag the latest safe build so team members can know which one to pull from.
If your team uses Mercurial and does code reviews, how do you do get the reviewer to see your changes?

Most of these are possible, some are more tedious than others.
You can use bundle by specifying the tip of the central repo as the --base:
hg bundle --base 4a3b2c1d review.bundle
Might as well just use bundle. That way, the changeset data is also included.
You can push (and pull) to (from) any repository that has a common ancestor(s). If you want to pull from one of your colleagues, they just need to run hg serve on their machine, and you will be able to pull.
This also works, but you will have to maintain multiple heads and be careful about merging. If you don't, it can become easy to base a stable change on top of an unreviewed changeset, which will make it hard to undo if you need to fix that unreviewed changeset later.
Of the options you presented, #1 and #3 are probably easiest, just depending on whether or not you can reach each other's boxes.
On a related note: This is the question that got my colleague and I started on developing Kiln, our (Fog Creek's) Mercurial hosting and code review tool. Our plan, and the initial prototype, would keep multiple repositories around, one "central" repository, and a bunch of "review" repositories. The review process would be started by cloning the central repo into a review repo on the server, and then running a full repo diff between the two, with a simple web interface for getting and viewing the diffs.
We've evolved that workflow quite a bit, but the general idea, having a branch repo to push unreviewed changes to and an interface to review them before you push them into the central repo, is still the same. I don't want to advertise here, but I do recommend giving it a try.

Half answer to this question is using ReviewBoard with Mercurial extention. It allows to push certain revisions for review by issuing the following command
hg postreview tip

I'll add a 5th option - do all development work on named branches, preferably one per task. Allow anything to be committed to a "development" named branch, whether it's in working state or not.
Push to the central repository, have reviewer pull the branch. Perform the review on the branch.
When the review has passed, merge the development work into the appropriate feature branch.
This workflow, which is (to me) surprisingly unpopular, has many advantages:
All work gets committed - you do not have to wait for a review to be done before committing.
You will not build off the wrong version. You only ever build from the feature branch.
In-progress work does not interfere with other developers.
From the development branch, you can either look at the latest changes (e.g. the changesets addressing review comments), compare with the branch point, or compare with the latest feature branch - all of which can give useful information.

Moving from Subversion to Mercurial - how to adapt the workflow and staging/integration systems?

We got all psyched about from from svn to hg and as the development workflow is more or less flushed out, here remains the most difficult part - staging and integration system.
Hopefully this question goes a bit further then your common 'how do I move from xxx to Mercurial'. Please forgive long and probably poorly written question :)
We are web shop that does a lot of projects(mainly PHP and Zend), so we have one huge svn repo, with like 100+ folders, each representing a project with it's own tags,branches and trunk of course. On our integration and testing server(where QA and clients look at work results and test stuff) everything is pretty much automated - Apache is set to pick up new projects automatically creating vhost for each project/trunk; mysql migration scripts right there in trunk too and developers can apply them through simple web-interface. Long story short our workflow is this now:
Checkout code, do work, commit
Run update on the server via web interface(this basically does svn up on server on a particular project and also run db-migration script if needed)
QA changes on the server
This approach is certainly suboptimal for large projects when we have 2+ developers working on the same code. Branching in svn was only causing more headaches, well, hence moving to Mercurial. And here is where the question lies - how does one organize efficient staging/integration/testing server for this type of work(where you have many projects, say single developer could be working on 3 different projects in 1 day).
We decided to have 'default' branch tracking production essentially and then make all changes in individual branches. In this case though how can we automate staging updates for each branch? If earlier for one project we almost always were working on trunk, so we needed one DB, one vhost, etc. now we potentially talking about N-databases per project, N-vhost configs and etc. Then what about CI stuff(such as running phpDocumentor and/or unit tests)? Should it only be done on the 'default'? On branches?
I wonder how other teams solve this issue, perhaps some best practices that we're not using or overlooking?
Additional notes:
Probably worth mentioning that we've picked Kiln as a repo hosting service(mostly since we're using FogBugz anyway)

This is by no means the complete answer you'll eventually pick, but here are some tools that will likely factor into it:
repositories without working directories -- if you clone -U or hg update null you get a repository with no working directory (only the .hg). They're better on the server because they take up less room and no one is tempted to edit there
changegroup hooks
For that last one the changegroup hook runs whenever one or more changesets arrive via push or pull and you can have it do some interesting things such as:
push the changesets on to another repo depending on what has arrived
update the receiving repo's working directory
For example one could automate something like this using only the tools described above:
developer pushes five changesets to central-repo/project1/main
last changeset is on branch 'my-experiment' so csets are automatually re-pushed to optionally created repo central-repo/project1/my-experiment
central-repo/project1/my-experiment automatically does hg update tip which is certain to be on the my-expiriment branch
central-repo/project1/my-experiment automatically runs tests in its working dir and if they pass does a 'make dist' that deploys, which might set up database and vhost too
The biggie, and chapter 10 in the mercurial book covers this, is to not have the user waiting on that process. You want the user to push to a repo that contains possibly-okay-code and the automated processed do the CI and deploy work, which if it passes ends up being a likely-okay repo.
In the largest mercurial setup in which I've worked (20 or so developers) we got to the point where our CI system (Hudson) was pulling from the maybe-ok repos for each periodically then building and testing, and handling each branch separately.
Bottom line: all the tools you need to setup whatever you'd like probably already exist, but gluing them together will be one-off sort of work.

What you need to remember is that DVCS (vs. CVCS) introduces another dimension to versioning:
You don't have to rely anymore only on branching (and get a staging workspace from the right branch)
You now have with DVCS the publication workflow (push/pull between repo)
Meaning your staging environment is now a repo (with the full history of the project), checked out at a certain branch:
Many developers can push many different branches to that staging repo: the reconciliation process can be done in isolation within that repo, in a "main" branch of your choice.
Or they can pull that staging branch in their repo and test things out before pushing back.
From Joel's tutorial on Mercurial HgInit
A developer don't necessary have to commit for other to see: the publication process in a DVCS allows for him/her to pull the staging branch first, reconcile any conflict locally, and then push to the staging repo.

Mercurial Branch Repositories with SUBREPOS

I'm trying to determine how people use "branch repositories" while also using subrepos.
Let's say I have repo Main containing a solution file (.NET), and populated with subrepos A, B, C:
/Main
- A
- B
- C
MainSolution.sln
A, B, and C, while being shared between other "Main" repos, are very tightly integrated into Main project. Thus, a major feature to the Main repo will require modifications to the subrepos (i.e., they are shared libraries, but are very actively developed).
Now it is time to add a feature. This feature is too big for one person to handle, and thus the code will need to be pushed to the central repo so others can help. We'd also need to be able to go back to the last "stable" code before the feature development began in case a bugfix is needed. I believe I have two options at this point: (1) create a named branch in the Main repo, or (2) create a new clone of Main. Since there are subrepos, both of these options have repercussions not present typically.
Option 1) Creating a named branch will, I presume, allow modifications to the subrepos to be committed/pushed, but only other people who have also updated to that branch in their clone of Main will be affected, since the .hgsubstate file is tracked. However, the subrepos will get a new head, and thus the (possibly) experimental feature would end up getting pushed to the central repo. Am I understanding this correctly?
Option 2) There are numerous advocates for the "don't use named branches, use 'branch repositories'", which are literally clones of the main repo, but named differently and existing on the central server. This is a little appealing to me, as it seems to keep things separated (and thus detached from disaster as co-workers --and myself!-- are still learning Mercurial). But this workflow seems completely broken when subrepositories are involved, since creating a clone of the Main repo does not create new, separated clones of the subrepos. It's a new clone, but it's still pointing at the same subrepos, and thus changes made to them will find their way back into the subrepos! I realize this is by design, and it's one of the really cool things (to me) about Mercurial. But how on earth do people use this branch repository workflow with subrepositories? It is completely inconceivable that, for each feature/experiment/version/whatever, I'm going to create a new clone (on the central server) of the Main repo, AND create clones (on the central server) of the subrepos, AND modify all the .hgrc/.hgsub paths to point to the proper central repos.
At this point, I'm just trying to understand HOW people work on a complicated project and use subrepos with branch repositories. Any thoughts?

You have other options as well. You could use bookmarks, for example. Since version 1.9, bookmarks can be pushed and pulled, they're not just local anymore. Since you often don't want a development "branch" to stick around as a named branch after that new feature is completed, bookmarks are often a better choice for that kind of thing. I tend to use bookmarks for new development and save real branches for released versions.
You should also be aware that subrepositories don't have to be shared between multiple main repositories in the way you describe. You can actually have the subrepositories stored inside a main repository (as opposed to having them at the same level as the main repos, or stored in some other location entirely), which would make them unique to each main repository, except you can push and pull from the subrepos in other main repos when you want to share those changes. This is the way I usually do it.
Unfortunately much of this is difficult to explain without a whiteboard, so please let me know if this isn't clear.

I prefer named branches for features that will most likely eventually get merged into the default branch. It is much easier to switch branches than switch repos.
With named branches you never need to worry about accidentally pushing your unstable branch of development into the stable repo. The named branch is already there, but won't be retrieved via an update unless a developer asks for it.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008