How to properly use the hg share extension?

Say I have cloned a repo to a directory called ~/trunk and I want to share a branch named my-new-branch to the directory ~/my-new-branch. How would I do that with the hg share extension?
This is what I've been doing:
cd ~
hg share trunk my-new-branch
But then when I cd into the new directory I have to hg up to the branch?
Confused.

IMO share is a very useful command which has some great advantages over clone in some cases. But I think it is unfortunately overlooked in many instances.
What share does is to re-use the 'store' of Mercurial version control information between more than one local repository. (It has nothing directly to do with branching.)
The 'store' is a bunch of files which represent all the history Mercurial saves for you. You don't interact with it directly. It's a black box 99.99% of the time.
share differs from the more commonly-used clone command in that clone would copy the information store, taking longer to run and using potentially a lot more disk space.
The "side effect" of using share rather than clone is that you will instantly see all the same commits in every shared repository. It is as if push/pull were to happen automatically among all the shared repos. This would not be true with clone; you'd have to explicitly push/pull first. This is quite useful, but something to be mindful of in your workflow, because it may surprise you the first time you use it if you are only used to clone.
If you want to work in multiple branches (named or unnamed) of your project simultaneously, either clone or share will work fine. Once you have created the second repository, yes, you need to update it to whatever changeset you want to begin working on.
Concrete example using share:
hg clone path\to\source\repo working1 # Create local repo working1 cloned from somewhere
cd working1
hg up branchname1
cd ..
hg share working1 working2 # shares the 'store' already used for working1 with working2
cd working2
hg up branchname2 # some other branch or point to start working from
As soon as you commit something in working1 that commit will be visible in the history of working2. But since they are not on the same branch this has no real immediate effect on working2.
working2 will retain path\to\source\repo as its default push/pull location, just like working1.
My own practice has been to create numerous locally shared repositories (quick, easy, saves space) and work in various branches. Often I'll even have a few of them on the same named branch but set to different points in history, for various reasons. I no longer find much need to actually clone locally (on the same PC).
A caveat -- I would avoid using share across a network connection - like to a repo on a mapped network drive. I think that could suffer some performance or even reliability issues. In fact, I wouldn't work off a network drive with a Mercurial repo (if avoidable) in any circumstance. Cloning locally would be safer.
Secondly -- I would read the docs, there are a few weird scenarios you might encounter; but I think these are not likely just based on my own experience.
Final note: although share is implemented as an "extension" to Mercurial, it has been effectively a part of it since forever. So there is nothing new or experimental about it, don't let the "extension" deal put you off.

Related

Mercurial: devs work on separate folders, why do they have to merge all the time

I have four devs working in four separate source folders in a mercurial repo. Why do they have to merge all the time and pollute the repo with merge changesets? It annoys them and it annoys me.
Is there a better way to do this?
Assuming the changes really don't conflict, you can use the rebase extension in lieu of merging.
First, put this in your .hgrc file:
[extensions]
rebase =
Now, instead of merging, just do hg rebase. It will "detach" your local changesets and move them to be descendants of the public tip. You can also pass various arguments to modify what gets rebased.
Again, this is not a good idea if your developers are going to encounter physical merge conflicts, or logical conflicts (e.g. Alice changed a feature in file A at the same time as Bob altered related functionality in file B). In those cases, you should probably use a real merge in order to properly represent the relevant history. hg rebase can be easily aborted if physical conflicts are encountered, but it's a good idea to check for logical conflicts by hand, since the extension cannot detect those automatically.
Your development team are committing little and often; this is just what you want so you don't want to change that habit for the sake of a clean line of commits.
@Kevin has described using the rebase extension and I agree that can work fine. However, you'll also see the work sequence of each developer squished together in a single line of commits. If you're working on a stable code base and just submitting quick single-commit fixes then that may be fine. If you have ongoing lines of development then you might not want to lose the continuity of a developer's commits.
Another option is to split your repository into smaller self-contained repositories.
If your developers are always working in 4 separate folders, perhaps the contents of these folders can be modularised and stored as separate Mercurial repositories. You could then have a separate master repository that brought all these smaller repositories together within the sub-repository framework.
Mercurial is distributed: if you have a central repository, every developer also has a private repository on his or her workstation, and of course a working copy as well.
So now let's suppose that they make a change and commit it, i.e., to their private repository. When they want to hg push two things can happen:
either they are the first to push a new changeset to the central server, in which case no merge is required, or
somebody else, starting from the same version, has committed and pushed before them. There is a fork here: from the same starting point the history goes in two different directions, so a merge is required even if there is no conflict, because we do not want four divergent lines of development on the central server. (This is possible with Mercurial: they are called heads, and you can force the push without merging. But the divergence remains, no magic, and it is probably not what you want, because you want to be able to check out the sum of all the contributions.)
How to avoid performing merges is quite simple: tell your developers to integrate others' changes before committing their own:
$ hg pull
$ hg update
$ hg commit -m"..."
$ hg push
When the commit is made against the latest central version, no merge should be required.
If they were working on the same code, some tests should also be run after pull and update, to ensure that what worked in isolation still works once the other developers' work has been integrated. Taking others' contributions frequently and pushing our own changes just as frequently is called continuous integration, and it ensures that integration issues are discovered quickly.
Hope it'll help.

Mercurial: Incomplete central repository possible?

I want to realize the following setup:
AtWork:MercurialRepo <-> Internet:MercurialRepo <-> AtHome:MercurialRepo
Problem is the repository is several gigs. I already have the entire repo at home (through bundling->cdrom->unbundling). The thing is, I do not want to store the whole repository on the internet. Is there a way to temporarily exclude folders from versioning in order to push/pull only a subset of the repo I am working on through the internet? How do I best accomplish my goal? From time to time I would need to do the tedious bundling -> cdrom -> unbundling route, just to update everything else, but in general I do want to use the internet route and do not want to store the whole repo there.
So, as you've found out by now you can't selectively clone some files from a repository. The best you can do is clone a subset of all branches; but you will get the entire past history of these branches, for all files in the repository. So, unless a lot of the big files are only known in some branches and not others, this won't help you.
Since your problem is the large size of files (rather than a long and bulky history), you probably need to break it down into several "subrepositories" of manageable size. Note that the subset you are interested in cloning must be a subrepository; cloning the main repo necessarily includes the subrepositories. The mercurial subrepository documentation recommends that you make a trivial ("thin shell") main repo, and put all your project code in subrepositories.
Subrepositories are a complex solution, and are considered a "feature of last resort" by the mercurial team. It's a complex setup, there are various limitations (see the docs), and you'll have the extra complication of trying to convert your repo in a way that will preserve file history. So, it's worth considering ways to avoid this:
a) It would be best if you can avoid the middle copy of your repo; is there no way you can set up ssh access or a proxy so that your home repo can talk to your work repo directly? (Or vice versa; it's enough if one of the locations is able to contact the other).
b) You could carry the repo on a USB stick, as @vaclav's answer suggests.
c) Or maybe you should just bite the bullet and clone the entire repo on the internet.
Is there a way to temporarily exclude folders from versioning in order to push/pull only a subset of the repo I am working on through the internet?
Not folders, but some parts of the repo: yes.
You can use push -b (only certain branches) or push -r (a revision with its ancestors; for the latest work that would be -r tip), but the final transfer size depends heavily on the shape of your DAG: with a lot of cross-branch merges you will probably skip only a small part of the changesets.
I have a small idea, a bit different from what you asked, but...
If I had the same issue, I would think about using a USB flash drive to hold the whole repository (if it is about 10 or 20 GB, that should be cheap). At work you can copy or clone the whole repo to the USB drive, pull new changes from it at home, and when your work at home is done, push to the repo on the flash drive and then pull from it into the repo at work. (I even use temporary commits for unfinished work, which I later revert into the working directory and strip, so I can continue where I left off.)
But definitely the easiest way is to try to get some connection to the work servers or to your machine at work, or to get more space for the repo on the internet. So, just another idea. HTH
It is not really possible. The closest thing would be to use sub-repositories, which will effectively allow you to have only part of your big repo on the net.

Git workflow dev and production - where should I create the repos?

I've read somewhere* a setup like this would be nice:
Two main branches, one for each server.
Pushing to master sends changes on to live;
Pushing to dev/stage (or whatever you call it) sends changes to staging;
Workflow:
Create branch from dev;
work locally until you're ready to test;
merge back to dev;
push to Hub, which sends changes to dev/staging server.
Once you're ready with those to go live:
merge from dev to master,
then push master to Hub, which sends those changes on to the live server.
Two main branches, one for each server.
So I have one branch "production" on "webroot/myliveapp/"
and another branch "development" on "webroot/devapp/"
Where should the repository be ?
UPDATE:
I mean:
We will have, according to this flow:
Prime repo;
Bare repo hub;
Clones;
The development and production branches should belong to one repository, right ?
If this is correct, then where should we issue the FIRST git init command ?
On our Prime repo ?
So we will have:
"webroot/myliveapp/" - production branch;
"webroot/devapp/" - development branch;
"webroot/.git" - Prime repository;
Does this make sense ?
Or should the Prime repository correspond to our production branch location ?
*Note: if you need context about the workflow I'm trying to implement, it is this one:
http://joemaller.com/990/a-web-focused-git-workflow/
Thanks for the update on your question, it is more clear now.
I believe the problem you're having is based on a misunderstanding of Git workflow; Git doesn't equate directories to branches, it equates a view of your filesystem to branches. This is powerful - but easy to shoot yourself in the foot. Let me explain.
Git acts more like a database-backed, differentially-versioned, history tracking filesystem in itself. It is "above" your filesystem, not "part of" it. It doesn't use your filesystem to represent branches, rather, when you check out a different branch, all the files in your filesystem will change to be the files in that branch. You are asking Git to make your filesystem represent the alternate reality of that branch.
If you are on branch master, and it has a file root/foo.txt committed, and you check out branch experiment, which does not have root/foo.txt committed, you will find that file gone when you look for it. It is a part of master, not experiment, and so it is not present in your filesystem. This is why Git is really picky about your current branch being committed before it lets you switch branches - if you have unstaged changes on your filesystem that Git doesn't know about, it refuses to blow them away by overwriting them with a different reality. You have to intervene to make things right first.
So, to answer the question, don't create subdirectories for "myliveapp" and "devapp" - create different branches. Just have your one codebase under "webroot". Then, hack away on, say, the "unstable" branch, committing your changes as usual. You can then switch all of the files in your repository to be at the version of your dev server's files by switching to the "devapp" branch, and you can similarly switch back to "unstable" at any time.
When you want to update a branch, e.g. doing an update of your dev server, you can merge "unstable" into "devapp". This will make all of the files of "devapp" look like those of "unstable", bringing it up to date.
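Here is a minimal sketch of that branch switching, run in a throwaway repository inside a temporary directory. The branch names follow the ones used above; the file `index.html`, its contents and the commit identity are made up for the example.

```shell
# Throwaway demo; file name, contents and identity are hypothetical.
demo_branches() (
    set -e
    tmp=$(mktemp -d); cd "$tmp"
    git init -q webroot
    cd webroot
    git config user.name demo
    git config user.email demo@example.com
    echo v1 > index.html
    git add index.html
    git commit -q -m "initial site"
    git branch devapp                  # what the dev server runs
    git checkout -q -b unstable        # where day-to-day hacking happens
    echo v2 > index.html
    git commit -q -am "work in progress"
    git checkout -q devapp             # same directory, files switch back
    cat index.html                     # shows the devapp version
    git merge -q unstable              # bring devapp up to date
    cat index.html                     # shows the unstable version
)
```

There is only ever one directory, `webroot`; which branch it shows depends purely on what is checked out.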
One other thing to note: the difference between a prime repo, a bare repo, and clones is almost nil. There is virtually no difference in the software; rather, it's a human convention to say "Linus' kernel is the canonical Linux kernel". With that understanding:
A prime repo is just one repository that everyone agrees holds the "canonical" version of the software. That is, whenever a developer has made a change they want everyone to see, rather than saying, "Pull my version of devapp", they can say, "I've published my changes to our prime repo." It's simply an easy convention for people to rally around.
A clone is a copy of some other repo. I could clone the prime repo, make changes, and then you can clone my repo. If you make changes, you can push them either onto the prime repo or onto mine, as long as the merge is valid and you have permissions on the computer.
A bare repo simply has no "working copy" - there is no "webroot" directory on that computer. It's empty with only the .git directory - which is fine for servers where nobody needs to alter the files.
Finally, the .git dir doesn't hold the files of your repo, it holds the git configuration and database. It's your entire repository history in database form, which is used to populate the rest of the repo with a particular version of your software. That's why I made the comment: you can locally check out any version of any alternate reality of the repository, with no network communication, at any time - because it's all there in the .git dir. The only network communication necessary is for when you want to sync your local repository to some other repository, using push or pull.
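To make the bare repo versus clone distinction concrete, here is a small sketch with a bare hub and two clones in a temporary directory. The names `hub.git`, `alice` and `bob` and the file `index.html` are placeholders, not part of the workflow article.

```shell
# Throwaway demo; all names below are hypothetical.
demo_hub() (
    set -e
    tmp=$(mktemp -d); cd "$tmp"
    git init -q --bare hub.git                # bare: history only, no working files
    git clone -q hub.git alice 2>/dev/null    # git warns that the clone is empty
    cd alice
    git config user.name alice
    git config user.email alice@example.com
    echo hello > index.html
    git add index.html
    git commit -q -m "first page"
    git push -q origin HEAD                   # publish to the hub
    cd ..
    git clone -q hub.git bob                  # a second clone gets the files
    ls bob                                    # working copy contains index.html
    ls hub.git                                # only git's own files, no index.html
)
```

The hub holds the same history as both clones; the only thing it lacks is a checked-out working copy.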

Merging changes to a workspace with uncommitted changes

We've just recently switched over from SVN to Mercurial, but now we are running into problems with our workflow. Example:
I have my local clone of the repository which I work on. I'm making some highly experimental changes to our code base, something that I don't want to commit before I'm sure it works the way it is supposed to, I don't want to commit it even locally. Now, simultaneously, my co-worker has made some significant improvements/bug fixes which I need. He pushes his commits to our main repository. The question is, how can I merge his changes to my workspace without the requirement that I have to commit all my changes, since I need his changes to test my own code?
A more day-to-day problem we have with the exact same workflow is that we have a couple of configuration files in the repository. Each developer makes a few small environment-specific changes to the configuration files, but does not commit them. These few uncommitted files hinder us from merging into our workspace, just as in the example above. Ideally the configuration files shouldn't be in the repository at all; unfortunately, that's how it has to be, for reasons I won't go into here.
If you don't want to clone, you can do it the following way.
hg diff > mylocalchanges.txt
hg revert -a
# Do your merge here, once you are done, import back your local mods
hg import --no-commit mylocalchanges.txt
There are two operations, as you've discovered, that make changes from one person available to someone else (or to many people, on either side).
There's pulling, which takes changes from some other clone of the repository and puts them into your clone.
There's pushing, which takes changes from your repository and puts them into another clone.
In your case, your coworker has pushed his changes into what I assume is your central master of the repository.
After he has done this, you can pull the latest changes down into your repository, and merge them into your branch. This will incorporate any bugfixes or changes your coworker did into your experimental code.
This gives you the freedom of staying current on other coworkers development in your project, and not having to release your experimental code until it is ready (or even at all.)
So, as long as you stay away from the Push command, you're safe.
Of course, this also assumes nobody is pulling directly from your clone of the repository, if they do that, then of course they will get your experimental changes, but it doesn't sound like you've set it up this way (and it is highly unlikely as well.)
As for the configuration files, the typical way to do this is to commit only a master template file to the repository, under a different name (i.e. with an extra extension such as .template), and then put the name of the real configuration file into the ignore filter.
Each developer then has to make his or her own copy of the template, rename it, and change it in any way they want, without the risk of committing database connection strings, passwords, or local paths, to the repository.
If necessary, provide a script that will help the developer make the real configuration file if it is long and complex.
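Such a helper script can be very small. The sketch below assumes the template is committed as `app.conf.template` and the real file is `app.conf`; both names, and the function name, are hypothetical.

```shell
# Hypothetical helper: create a private config file from the committed template.
# Usage: make_local_config app.conf.template   (creates app.conf next to it)
make_local_config() {
    template=$1
    target=${template%.template}     # strip the .template suffix
    if [ -e "$target" ]; then
        # never overwrite a developer's local settings
        echo "$target already exists, leaving it alone"
    else
        cp "$template" "$target"
        echo "created $target, now edit it for your machine"
    fi
}

# The real file is listed in .hgignore so it cannot be committed by accident:
#   syntax: glob
#   app.conf
```

Because the helper refuses to overwrite an existing file, it is safe to run again after every pull.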
Regarding your experimental changes, you should commit them. Often.
Simply commit them in a clone you don't push. You only pull to merge whatever updates you need from other repos.
As for config files, don't commit them.
Commit template files, and a script able to generate complete config files from the template.
That way, developers will only modify "private" (i.e. not committed) config files with their own private values.
If you know your uncommitted changes will not collide with the merge commit that you are creating - then you can do the following...
1) Shelve the uncommitted changes
2) Do the pull and merge
3) Unshelve the uncommitted changes
Shelve effectively stores your uncommitted changes away as a diff (relative to your last commit), then rolls those files back in your local workspace. Unshelving then applies that diff, bringing back your uncommitted changes.
Tools such as TortoiseHg have shelf built in.

Ponderings of a Subversion User: What is a "branch" in Mercurial terms?

I'm a Subversion user, and I think I've got my head mostly around it all now. So of course now we're thinking of switching to Mercurial, and I need to start again.
In our single repository, we have the typical branches, tags, trunk layout. When I want to create a feature branch I:
Use the repo browser to copy trunk to branches/Features/[FeatureName].
Checkout a new working copy from branches/Features/[FeatureName].
Start working on it.
Occasionally commit, merge trunk in, resolve conflicts and commit.
When complete, one more merge of trunk, then "Reintegrate" the feature branch into trunk.
(Please note this process is simplified as it doesn't take into account release candidate branches etc).
So I have questions about how I'd fulfil the same requirements (i.e. feature branches rather than working on trunk) in Mercurial:
In Mercurial, is a branch still within the repository, or is it a whole new local repository?
If we each have a copy of the whole repository, does that mean we all have copies of each other's various feature branches (that's a lot of data transfer)?
I know Mercurial is a DVCS, but does that mean we push/pull changes from each other directly, rather than via a peer repository on a server?
I recommend reading this guide
http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial//
In Mercurial, is a branch still within
the repository, or is it a whole new
local repository?
The equivalent of the subversion way of working would be a repository with multiple heads in mercurial. However, this is not the idiomatic way of doing things. Typically you will have only one head in a given repository, so separate repositories for each branch.
If we each have a copy of the whole
repository, does that mean we all have
copies of each other's various feature
branches (that's a lot of data
transfer)?
Yes, if you look at the history of the head of your local repository, then you'll be able to see all the feature branches that were merged in. But mercurial repositories are remarkably space efficient. For example, I have done a hg clone https://www.mercurial-scm.org/repo/hg to get the source for mercurial itself, and it is only 34.3 MB on an NTFS file system (compared to the source code download, which is 1.8 MB). Mercurial will also make use of hardlinks if your file system supports it, so there is little overhead if you clone a repository to another location on the same disk.
I know Mercurial is a DVCS, but does
that mean we push/pull changes from
each other directly, rather than via a
peer repository on a server?
One way of working is indeed to have each developer expose a public repository in which he pushes his own changes. All other developers can then pull what they want.
However, typically you'll have one or more "blessed" repositories where all the changes are integrated. All developers then only need to pull from the blessed repository. Even if you didn't explicitly have such a blessed repository I imagine people would automatically organize themselves like that, e.g. by all pulling from a lead developer.
Steve Losh's article on branching in mercurial linked above is fantastic. I also got into some explaining of branching and how the DAG works in a presentation I gave a couple of months ago on mercurial that's out on slideshare. The pertinent slides start at slide #43.
I think that understanding that all commits to the same repository are stored in a DAG (Directed Acyclic Graph) with some simple rules really helps demystify what's going on.
a node with no child nodes is a "head"
the root node has no parents
regular nodes have a single parent
nodes that are the result of a merge have two parents
if a merge node's parents are from different branches, the child node's branch is inherited from the first parent
Named branches are really just metadata labels on commits, and aren't any different from the anonymous branches that appear when you merge someone else's work into your repository, or when you go back to an earlier version and make a commit there to create a new head (which you can later merge).