To keep my own versioned app or not - mercurial

I need some opinions here.
I'm working on a Django project using buildout to get the dependencies, etc...
I use mercurial as DVCS.
Now I need to customize one of the dependencies, so I can do one of the following:
(* The changes may not be useful for everyone else.)
1- Fork the project (on GitHub, Bitbucket, etc.), maintain my own version, and fetch the dependency with a mercurial or git recipe.
2- Clone the project, put it on the PYTHONPATH, erase the DVCS directories and add it to my project's repository, so every change stays private. Here I would need to erase all the info from their DVCS or something.
Any other option you can think of.
Am I missing something? Am I way off?
Thanks!

Esteban, take these steps (I'll talk in Mercurial-speak, but this is all doable in git too):
clone their project
make your clone of their project a subrepo in your project
That gives you the best of all worlds. You can edit code in your project and their project without paying attention to which is which, and when you commit the changes to your code go into your repo along with a pointer to a new changeset in your clone of their project. Then when you want to update your clone of their project you can do so in place and merge simply.
So this is pretty much what you said in '1', but there's no need to do a fork or host that repo publicly. Just edit their clone as a subrepo of your project and never push (which wouldn't work anyway since you don't have write access to their repo).
The primary drawback of your option two is that, as they modify and improve the project you depend on, you'll have a hard time pulling their improvements in and merging them with yours.
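The subrepo steps above can be sketched as a small Python dry run that only prints the hg commands instead of running them. The path vendor/dep, the example.com URL and the helper names are made up for illustration; the `path = source` line format of `.hgsub` and the `-R` option are standard Mercurial.

```python
import shlex


def hgsub_entry(subrepo_path, source):
    """Return one .hgsub line: working-directory path on the left, source on the right."""
    return f"{subrepo_path} = {source}\n"


def setup_commands(upstream_url, subrepo_path):
    """Commands to run from the root of your own project, in order."""
    return [
        # 1. Clone their project into a directory inside your project.
        ["hg", "clone", upstream_url, subrepo_path],
        # 2. (Write the .hgsub line by hand or with hgsub_entry) then track it.
        ["hg", "add", ".hgsub"],
        # 3. Committing records which changeset of the subrepo your project uses.
        ["hg", "commit", "-m", f"Track {subrepo_path} as a subrepo"],
    ]


def update_commands(subrepo_path):
    """Commands to bring in upstream changes later and merge them with yours."""
    return [
        ["hg", "-R", subrepo_path, "pull"],
        # The merge only applies if you have local commits in the subrepo.
        ["hg", "-R", subrepo_path, "merge"],
        ["hg", "-R", subrepo_path, "commit", "-m", "Merge upstream"],
        ["hg", "commit", "-m", f"Update {subrepo_path} to merged upstream"],
    ]


# Hypothetical values, for illustration only.
UPSTREAM = "https://example.com/upstream/dep"
PATH = "vendor/dep"

print(hgsub_entry(PATH, UPSTREAM), end="")
for cmd in setup_commands(UPSTREAM, PATH) + update_commands(PATH):
    print(shlex.join(cmd))
```

Each list can be handed to subprocess.run once you are happy with what it prints.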

Well, if you're using a DVCS then all your commits are kept as changesets, and people can choose to apply your changeset or not. So as long as you comment that change, people can choose to apply it or not as they see fit. What's more, if they don't want that change but want your other changes, they can pick and choose. So the truth is the DVCS takes care of the problem for you (provided the people pulling from you are using the DVCS properly).
Personally, I recommend forking, but like I said, it doesn't really matter.

You ask this question in a rather confusing way, and I don't know if you really understand the point of a DVCS.
The whole point of a DVCS is to allow you to have your own private repository. You do not need to publish your repository on github or bitbucket or any of those places unless you want to, but I certainly would not erase the DVCS information.
If the upstream project makes changes you do want in addition to your own private changes, you will have a devil of a time merging them unless you keep the DVCS information around.
Using Mercurial, you can include a project in yours by using the Mercurial subrepo feature.

Related

Mercurial: Incomplete central repository possible?

I want to realize the following setup:
AtWork:MercurialRepo <-> Internet:MercurialRepo <-> AtHome:MercurialRepo
Problem is the repository is several gigs. I already have the entire repo at home (through bundling->cdrom->unbundling). The thing is, I do not want to store the whole repository on the internet. Is there a way to temporarily exclude folders from versioning in order to push/pull only a subset of the repo I am working on through the internet? How do I best accomplish my goal? From time to time I would need to do the tedious bundling -> cdrom -> unbundling route, just to update everything else, but in general I do want to use the internet route and do not want to store the whole repo there.
So, as you've found out by now you can't selectively clone some files from a repository. The best you can do is clone a subset of all branches; but you will get the entire past history of these branches, for all files in the repository. So, unless a lot of the big files are only known in some branches and not others, this won't help you.
Since your problem is the large size of files (rather than a long and bulky history), you probably need to break it down into several "subrepositories" of manageable size. Note that the subset you are interested in cloning must be a subrepository; cloning the main repo necessarily includes the subrepositories. The mercurial subrepository documentation recommends that you make a trivial ("thin shell") main repo, and put all your project code in subrepositories.
Subrepositories are a complex solution, and are considered a "feature of last resort" by the mercurial team. It's a complex setup, there are various limitations (see the docs), and you'll have the extra complication of trying to convert your repo in a way that will preserve file history. So, it's worth considering ways to avoid this:
a) It would be best if you can avoid the middle copy of your repo; is there no way you can set up ssh access or a proxy so that your home repo can talk to your work repo directly? (Or vice versa; it's enough if one of the locations is able to contact the other).
b) You could carry the repo on a USB stick, as @vaclav's answer suggests.
c) Or maybe you should just bite the bullet and clone the entire repo on the internet.
Is there a way to temporarily exclude folders from versioning in order to push/pull only a subset of the repo I am working on through the internet?
Not folders, but some parts of the repo, yes.
You can push -b (only some branches) or push -r (a revision with its ancestors; for the latest work that is -r tip), but the final transfer size depends heavily on the shape of your DAG: with a lot of cross-branch merges you will probably skip only a small part of the changesets.
I have a small idea, a bit different from what you asked, but...
If I had the same issue, I would think about keeping the whole repository on a USB flash drive (at 10 or 20 GB it should be cheap). At work you can copy or clone the whole repo to the drive, pull new changes from it at home, and, when your work at home is done, push to the repo on the drive and then pull from it into the repo at work. (I even use temporary commits for unfinished work, which I revert into the working directory and strip, so I can continue where I left off.)
But the easiest way is definitely to try to get a connection to the work servers or to your machine at work, or to get more space for the repo on the internet. So, just another idea. HTH
This is not really possible. The closest thing would be to use sub-repositories, which will effectively allow you to have only part of your big repo on the net.

How to prevent Mercurial repos from being able to pull from specific repos

We've worked out a repository structure we'd like to maintain, and that structure requires preventing RepoA from being able to pull from RepoB. How can I set up a repo so that it can only push to a certain repo but not pull from it?
You can technically push/pull from any location, and you should probably avoid mucking with that flexibility unless you're good at writing hooks. And if anyone has write access to UAT, you cannot prevent changes from being pushed to UAT, since you will need to do exactly that when new bits have to enter the UAT branch for testing.
What it sounds like you are trying to do is preserve a "stable" branch while allowing work to continue on an "anonymous" branch (your alpha) that was cloned from UAT. Eventually, you have to merge that back into UAT, so I would really just give a few senior-level developers write access to UAT and trust that they follow the proper procedure when working with the branches.
I suggest reviewing the Guide to Branching and the Managing Releases section of the Hg Book before trying to invent a new way to seemingly protect your branches.
Edit: I did find a similar question for preventing the default push, but allowing pulls. It shows you the basics on implementing a preoutgoing hook, which is not what you want, but similar in nature.
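For the pulling side, a hook in the same spirit could be sketched as an in-process Python hook on prechangegroup, which Mercurial runs before incoming changesets are added and which receives the source URL. This is only a sketch under assumptions: the module path, the function name and the BLOCKED_SOURCES value are hypothetical, and it assumes a Mercurial running on Python 3, where hook arguments and ui messages are bytes.

```python
# Hypothetical hook module, e.g. saved as /path/to/repohooks.py and enabled
# in RepoA's .hg/hgrc with:
#
#   [hooks]
#   prechangegroup.no_repob = python:/path/to/repohooks.py:reject_blocked_source
#
# BLOCKED_SOURCES is an assumption for illustration; put RepoB's URL or path here.
BLOCKED_SOURCES = (b"https://hg.example.com/RepoB",)


def _as_bytes(value):
    """Accept bytes (Python 3 Mercurial) or str and normalise to bytes."""
    if isinstance(value, bytes):
        return value
    return str(value).encode("utf-8")


def reject_blocked_source(ui, repo, url=None, **kwargs):
    """Return True (which Mercurial treats as hook failure) for a blocked URL."""
    incoming = _as_bytes(url or b"").rstrip(b"/")
    for blocked in BLOCKED_SOURCES:
        if incoming == blocked.rstrip(b"/"):
            ui.warn(b"pulling from %s is not allowed here\n" % blocked)
            return True
    # Any other source is allowed to proceed.
    return False
```

Keep in mind that a hook like this lives in the repository's own hgrc, so anyone with write access to that file can remove it; it guards against accidents rather than enforcing policy.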

Doing without partial commits the "Mercurial way"

Subversion shop considering switching to Mercurial, trying to figure out in advance what all the complaints from developers are going to be. There's one fairly common use case here that I can't see how to handle.
I'm working on some largish feature, and I have a significant part of the code -- or possibly several significant parts of the code -- in pieces all over the garage floor, totally unsuitable for checkin, maybe not even compiling.
An urgent bugfix request comes in. The fix is nice and local and doesn't touch any of the code I've been working on.
I make the fix in my working copy.
Now what?
I've looked at "Mercurial cherry picking changes for commit" and "best practices in mercurial: branch vs. clone, and partial merges?" and all the suggestions seem to be extensions of varying complexity, from Record and Shelve to Queues.
The fact that there apparently isn't any core functionality for this makes me suspect that in some sense this working style is Doing It Wrong. What would a Mercurial-like solution to this use case look like?
Edited to add: git, by contrast, seems designed for this workflow: git add the bugfix files, don't git add anything else (or git reset HEAD anything you might have already added), git commit.
Here's how I would handle the case:
have a dev branch
have feature branches
have a personal branch
have a stable branch.
In your scenario, I would be committing frequently to my branch off the feature branch.
When the request came in, I would hg up -r XYZ, where XYZ is the rev number that they are running, then start a new feature branch off of that (or up branchname, whatever).
Perform work, then merge into the stable branch after the work is tested.
Switch back to my work and merge up from the top feature branch commit node, thus integrating the two streams of effort.
Lots of useful functionality for Mercurial is provided in the form of extensions -- don't be afraid to use them.
As for your question, record provides what you call partial commits (it lets you select which hunks of changes you want to commit). On the other hand, shelve lets you temporarily make your working copy clean while keeping the changes locally. Once you commit the bug fix, you can unshelve the changes and continue working.
The canonical way to go around this (i.e. using only core) would probably be to make a clone (note that local clones are cheap as hardlinks are created instead of copies).
You would clone the repository (i.e. create a bug-fix branch in SVN terms) and do the fix from there.
Alternatively if it really is a quick fix you can use the -I option on commit to explicitly check-in individual files.
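The two routes described above, shelving the unfinished work or committing only named files with -I, can be sketched as a Python dry run that prints the hg commands rather than running them. The helper names, file name and commit message are invented for illustration, and the first route assumes the shelve extension is available in your Mercurial.

```python
import shlex


def shelve_fix_unshelve(fix_message):
    """Park unfinished work, commit the fix on a clean working copy, restore the work."""
    return [
        ["hg", "shelve"],
        # ... make and test the bug fix here ...
        ["hg", "commit", "-m", fix_message],
        ["hg", "unshelve"],
    ]


def commit_only(files, fix_message):
    """Commit just the named files; other modified files stay uncommitted."""
    cmd = ["hg", "commit", "-m", fix_message]
    for name in files:
        cmd += ["-I", name]
    return cmd


# Hypothetical file name and message, for illustration only.
for cmd in shelve_fix_unshelve("Fix urgent bug"):
    print(shlex.join(cmd))
print(shlex.join(commit_only(["src/billing.py"], "Fix urgent bug")))
```

The -I route fits the case in the question, where the fix has already been made in the same working copy as the half-finished feature.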
As with any DVCS, branching is your friend. Branching a repository multiple ways is the bread and butter of these systems. Here's a git model you might consider adopting that works quite well with Mercurial, too.
In addition to what Santa said about branching being your friend...
Small-granularity commits are your friend. Rather than making lots of code changes in a single commit, make each logically self-contained code change in its own commit. Then it will be a lot easier to cherry-pick changes to merge between branches.
Don't use Mercurial without using the Mq Extension (it comes pre-packaged in the default installation). In addition to solving your specific problem, it solves a lot of other general problems and really should be the default way that you work (especially if you're using an IDE that doesn't integrate directly with Hg, making switching branches on the fly a difficult way to work).

Small, temporary branch in Mercurial

I've read a lot about Mercurial and branching in it, however, I am still very much a version control newbie.
I'm currently working on a project, where I have been tasked to work on a new module.
I have a "main" repository, which contains the latest code from the rest of the project, and a cloned repository (call it "task") where I am doing my work now.
I am a bunch of commits into my task, and find that I would like to do a little "experiment" with the way my program reads/stores/handles configuration data.
Now, if I understand VC best-practices correctly, this would be a great time to branch.
If I start into this experiment, and I like where it's going, I will want to merge it back into my "task" repository on the "default" branch pretty quickly.
On the other hand, if I don't like how it's going, I'll probably just scrap the branch.
The way I am most comfortable branching is through cloning, however I don't think this would be the best approach in this situation, as I'll only be changing a few files, but apparently using named branches is permanent, which doesn't seem appropriate here either.
What is your advice / best practice for this kind of situation?
I'm relatively new to Mercurial, but I know exactly the situation you are describing. I did some research on this before, and my conclusion was that the easiest way was to clone my repository.
See this answer for some more insight.
Also, this is a great guide to branching in Mercurial :)
Go with a clone, no doubt about it. A named branch in Mercurial is something that even the Mercurial folks say you don't need all that often. One of the beautiful things about DVCS is the fact that you can easily clone the repo and try some new and different things, and if they work, great, merge it back in to the main repo, otherwise, delete it all.
I personally use a "Branch By Feature" approach with Mercurial, which means that I will make a clone of my primary repo for each feature I'm working on. This includes spikes and experiments.
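The clone-per-experiment approach recommended above might look like the following Python dry run, which prints the hg commands instead of executing them. The directory names task and task-experiment and the helper names are assumptions for illustration.

```python
import shlex


def start_experiment(repo, experiment):
    """Create a throwaway clone to experiment in."""
    return [["hg", "clone", repo, experiment]]


def keep_experiment(repo, experiment):
    """Bring the experiment's commits back into the original repository."""
    return [
        ["hg", "-R", repo, "pull", experiment],
        # If the original repo gained commits in the meantime, the pull adds
        # a second head and an 'hg merge' plus commit is needed instead.
        ["hg", "-R", repo, "update"],
    ]


# Hypothetical directory names, for illustration only.
REPO, EXPERIMENT = "task", "task-experiment"

for cmd in start_experiment(REPO, EXPERIMENT) + keep_experiment(REPO, EXPERIMENT):
    print(shlex.join(cmd))

# Discarding a failed experiment is just deleting the clone's directory,
# for example with shutil.rmtree(EXPERIMENT); nothing in REPO is touched.
```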

Mercurial setup: One central repo or several?

My company is switching from Subversion to Mercurial. We're using .NET for our product. We have a solution with about a dozen projects that are separate modules with no dependencies on each other. We're using a central repo on a server with push/pull for our integration build.
I'm trying to figure out if I should create one central repo with all the projects in it, or if I should create a separate repo for each project. One argument for separate repos is that branching the individual modules would be easier, but an argument for a single repo is easier management and workflow.
I'm very new to hg and DVCS, so some guidance is greatly appreciated.
ETA: At hginit.com, Joel says:
[I]f you’re used to having one big gigantic repository for the whole company, where some people only check out and work on subdirectories that they care about, this isn’t a very good way to work with Mercurial—you’re better off having lots of smaller repositories for each project.
It'd be great if someone could expand on this or point me to more documentation.
One thing you should take into consideration here is the fact that Mercurial does not support checking out directories like subversion does. One typical subversion setup is to have one giant repo with multiple separate projects in it, and when somebody needs code they will just checkout a subdirectory containing that project. You can't do this in mercurial. You either take the whole repo, or nothing. If everybody working on these projects does not need all the code, all the time, you might want to split it up into separate repositories.
EDIT: This link might be helpful in setting things up, in particular the "Publishing Multiple Repositories" section.
If completely separate repos don't work for you, maybe have each project as a subrepo of some umbrella repo. I have to say that separate repos sound like what you need, though, given that each project sounds totally independent.
I'm fairly new to Mercurial myself (my company is making the leap from SourceSafe), so I don't know what someone with more experience would say.
For me it makes sense to have one repository per Visual Studio Solution. If your modules are truly not dependent on each other, why are they all in the same solution? If you have a good reason for them all being in one solution, then that's probably the reason to keep them in one repository. If there's not a good reason for them to be in one solution, then a repository and a solution for each makes more sense to me.
Edit: So, since all the modules are built together and need to integrate, that would push me towards a single solution and a single repository.
Mercurial does a great job of merging, but the one thing I've had issues with is the solution file when merging the addition of more than one project at a time. It gets confused with multiple End Project lines. So, as long as you aren't adding new projects very often, your merges should be smooth.
From my experience, and not based upon studies etc., I would say that each logical blob is a repository. If you share code between subprojects, they need to be in the same repo. There will be full subrepo functionality, but currently (April 2010) it's not fully implemented.