How do we manage granular changesets in the central/shared repository?

How do we manage granular changesets in the central/shared repository? - mercurial

I've used Mercurial for years locally, but now we are doing a pilot of switching over from Subversion at my company.
We're embracing the fact that developers will now be making more granular changesets--some of which may not even build. When developers push their changes to the central repository, all of these changesets will show up in the history - this is natural and expected.
My question is: how do we deal with the fact that, because changesets are more granular, this makes it possible for developers to update to a revision that doesn't build? We are coming from a world where you can checkout anywhere in the repository and reasonably expect to be able to make a release from that point. With DVCS's, how do you tell where a "safe" revision is?
This issue is somewhat addressed in this question, but I'm more interested in finding out how to deal with this using branch repos (and not named branches).
I understand there are ways to modify history (e.g. the collapse extension) so that the changes get collapsed in the history of the repository, but we'd like to preserve the history.
Looking at the mercurial and mozilla trees, I don't see a clear way to tell where safe revisions are to sync. Is this not as important as we think it is?

Do you really need to check out ANY old revision and expect it to build?
I agree with you that you should be able to expect that the tip will always build.
This can be easily achieved, like Lazy Badger already said, by some kind of "don't push unfinished work to the main repo" policy / mutual agreement.
Concerning older revisions:
if you want to build older official releases of your software again (like, you're at version 1.6 now and want to make an v1.2 executable), you can expect the 1.2 tag to build as well if everybody always sticked to the "don't push unfinished work" policy.
if you want to take ANY revision and build it...well, do you really need this? Any bugs in previous versions will probably refer to official releases (see the "1.2 tag" stuff above), and not to something in between.
if you really need a buildable version from in between the official releases (for whatever reason), you'll have to look at the commit messages and find the commit that says feature 'foo' finished (and not the one that says started implementation of feature 'foo').
Yes, this requires a bit of thinking / common sense, but I can't imagine that you will need this really often.

Political solution may be "don't push crap into main repo", isn't it?
Technical solution will be not tag, but one bookmark ("KnownAsGood"), applied to the last working HEAD of default branch and agreement between devs "Update not to tip, but to bookmark"
Who'll test commits and move bookmark is another question from "project management" tag

If you're using feature branches and merging them without fast forward merges, you still have only stable builds on the mainline, but can see the unstable stuff on the feature branches if desired.

You can always tag your commits. These could be major/minor releases, successful builds or however you like. You can see the mercurial and mozilla tags in their repos.

Related

Mercurial sparse checkout

In this F8 conference video(starting 8:40) from 2015 they speak about the advantages of using Mercurial and a single repository across facebook.
How does this work in practice? Using Mercurial, can i checkout a subdirectory (live in SVN)? If so, how? Do i need a facebook-mercurial-extension for this
P.S.: I only found answers like this or this from 2010 on SO where i am not sure if the answers still apply with all the efforts FB put into it.

From your question it is not clear if you are looking for a workflow (the monorepo vs multiple repos debate) or for performance and scaling for a huge code base.
For the workflow, I suggest googling for monorepo. It has its pros and cons, you need to understand your situation and current workflow to decide. For the performance and scaling, keep reading.
The idea of remotefilelog is not to checkout a subdirectory (as you mention), the idea is to checkout everything. In order to do that in an efficient way, you need two extensions actively developed by Facebook:
remotefilelog. This gives you something conceptually similar to a shallow clone. This reduces hg clone and hg pull time.
fsmonitor (previously called hgwatchman, it is now part of mercurial core). This dramatically reduces time of local operations such as hg status. Note that fsmonitor is independent from remotefilelog. You can start experimenting with this, since it doesn't require any setup on the server side.
With a recent mercurial (which I strongly suggest) you can shave off the additional startup time of the Python interpreter using CommandServer + CHg.
Some additional notes:
I tested extensively fsmonitor. It works very well, on huge repos the time of hg status is reduced from 10 secs to less than 1 sec (and the majority of this 1 sec is Python startup time, see above for CHg). If your repository is really huge, you might need to fine tune some inotify kernel parameters (or the equivalent on MacOSX). The fsmonitor documentation has all the information you need.
I didn't test remotefilelog, although I read everything I found about it and I am sure it works. Depending on how development is done (everybody has always Internet connectivity or not, the organization has its own master repo or not) there can be a caveat: it partially transforms the decentralized hg into a centralized VCS like svn: some operations that normally can be done offline (for example: hg log and the first hg update to a changeset in the past) will now require connectivity to the master repository.
Before considering remotefilelog, I used extensively the largefiles extension on a huge repo. It has the same drawbacks than remotefilelog and some confusing corner cases for users that want to use hg just to get things done without taking the time to understand how it works. If I were to manage another huge repo, I would use remotefilelog instead than largefiles, although their use case is not really the same.
Mercurial has also support for subrepositories (doc1, doc2). The problem is that it changes the behavior of hg depending on where you are in the source tree. Again, if the developers don't care about really understanding how hg works, it will be just too confusing.
Additional information:
Facebook Engineering blog post
scaling mercurial wiki, although not completely up to date
just by googling mercurial facebook.

i am not sure if the answers still apply with all the efforts FB put
into it
(Early 2017) The answers in the questions linked still apply (because they occasionally get updated) but note that you will have to read all the comments and answers.
remotefilelog essentially allows on demand shallow clones (so you don't fetch the history for everything for all time) but you still fetch the essential metadata for, and checkout across, all the directories of the repo at the desired revision.
Using Mercurial, can i checkout a subdirectory (li[k]e in SVN)? If so, how?
https://stackoverflow.com/a/40355673/7836056 discusses how you might use third party extensions to allow narrow/sparse checkouts (Facebook's sparse.py) or narrow clones (Google's NarrowHG) with Mercurial thus only "creating" a single directory from within the main repository (albeit with radically different tradeoffs).
(Note phrasing matters: "sparse checkout" means a very specific action when referring to distributed version control in a manner that doesn't exist when using it to refer to centralised version control)

mercurial workflows when collaborating with people who make errors

I need to collaborate on a Mercurial repository (let's call it "foo") with some people who are novices at version control in general, Mercurial in particular.
I am trying to come up with a workflow that will enable us to use Mercurial without a lot of extra effort on either their end (confusion) or my end (cleanup).
My concern is that as novices I need to expect them to make errors, and I need to allow them to do so in a controlled way, otherwise they won't use the tool at all because they're too scared. But I don't want a bad change to pollute the repository unnecessarily.
I do not expect them to be able to merge properly or to use the mq extension. This is not
a matter of underestimating them, instead it is a realistic assessment given past experience with SVN and my own experience with Hg.
Which of the following approaches would make the most sense? Or if there's a better approach, what is it?
We have a repository foo-submit, read/writable by all, and a repository foo-trunk, readable by all but writable by admins. Users pull from foo-trunk, and push changes to foo-submit. Cleanup: If I find a good change, I let it through as is; if I find a bad change, I "bypass" it by merging with the previous version.
We have a repository foo-trunk readable by all, writable by admins. Each user is responsible for maintaining their own clone which is read-accessible to the rest of the team. When someone wants to push a change, they let me know and I pull it from their repository, with proper cleanup as necessary (same as in #1)
We have a repository foo-dev, read/writable by all, and a repository foo-trunk, readable by all but writable by admins. Users pull/push to foo-dev, and work in named branches if they need to do extensive development. I am responsible for performing merges and cleanup. The foo-trunk repository is merely for having a "clean" copy that has branches where the tip is always in a good state.

Good question, and one that I've never seen a great answer to.
That said, I like option 2. This is the "Pull Request" model used by the Linux Kernel and made popular by GitHub. It allows the admins to act as gatekeepers / reviewers, only allowing good change-sets to get past them when they're happy. If they decide a developer hasn't delivered something worthy, then the pull request is rejected (with reasons). Then the developer can go away, fix up their code / repo, and submit another pull request.
Running a server with something like RhodeCode on it can help keep on top of pull requests. As things grow you can have lower level gatekeepers that deal with subsystems, and higher level gatekeepers that deal with the whole project.
The bit I've never quite got my head around is what should happen to change-sets that are rejected, and that the developer decides to abandon rather than fix up and try again. They could be closed, but then could possibly appear by mistake as part of a future pull request. They'd be harmless, but possibly confusing. The alternative is stripping them, but that sounds like giving people tools they'll cut themselves on.
The other 2 options you give deserve a little comment.
1 is similar to 2. You're still doing a "Pull Request" type flow, but now you have server side branches which mirror the developer's clones. There's little difference and this is how a RhodeCode, GitHub, BitBucket server would let you work, except you don't have to go searching for changes. The server would tell you they're waiting for you to look at.
3 has the problem that everyone's changes are all merged together on foo-dev before you get to them. They would start becoming inter-dependent, and cherry-picking is going to be messy. You'd probably end up grafting change-sets on to foo-trunk which means you're creating new change-sets with new hashes. When the developers pull those they'll now have the change in two places; their original foo-dev version and your grafted foo-trunk version. This doesn't sound sustainable to me.

Best way i can think of if you don't want to use mq (understand with the least hassle for you) is to have your dev
create their own branch for the current feature being developped
merge it back to the main dev branch (or graft/transplant) when it's completed and validated
and then close the branch.
In the long term see for them to learn mq, it's not too hard to grasp.

3a - foo-dev has protected default branch (only some admins can push/merge-to this branch), users use named branches

What's the purpose of bookmarks in mercurial?

I read about bookmarks in mercurial, every source states that it's like git branching, that the bookmark gets updated uppon every commit, but I really don't understand what's its purpose. Any guidance would be greatly appreciated. Thanks.

From the Mercurial wiki:
Bookmarks are references to commits that are automatically updated
when new commits are made. If you do hg bookmark feature the feature
bookmark refers to the current changeset. As you work and commit
changes the bookmark will move forward with every commit you do. The
bookmark will always point to the latest revision in your line of
work. Since bookmarks are automatically updated when committing to the
changeset they are pointing to, they are especially useful to keep
track of different heads. They can therefore be used for trying out
new features or pulling changes that have yet to be reviewed.
I also suggest reading this article for an overview of the various ways to branch in Mercurial.
I suggest not taking bookmarks and trying to find a use for them, but rather finding a process you find natural to use in your team and then finding how Mercurial best handles branches in the context of that process, be it with bookmarks or cloning or what have you.

Adding just a few minor details, to the excellent points of Mihai Danila:
Bookmarks are, in a few words, a lot like: "movable tags without a permanent movement log".
There are various features built on top of the 'movable tag' property, like merging of bookmarks from other clones, automatic movement of the bookmark as you commit on top of it, etc. But the basic idea is pretty much just that. A symbolic name that you can attach to a changeset, much like a 'tag', but one that can move arbitrarily forward or backward in history as required.
You can definitely built a branching scheme on top of bookmarks. In fact, Mercurial itself has adopted the names 'crew' and 'crew-stable' for two bookmarks that the development team uses to coordinate Mercurial's own development. But you don't have to.
I agree 100% with what Mihai wrote: "Don't take bookmarks and try to find a use for them, but rather find a process you find natural to use in your team and then find out how Mercurial best handles branches in the context of that process, be it with bookmarks or cloning or what have you".

Doing without partial commits the "Mercurial way"

Subversion shop considering switching to Mercurial, trying to figure out in advance what all the complaints from developers are going to be. There's one fairly common use case here that I can't see how to handle.
I'm working on some largish feature, and I have a significant part of the code -- or possibly several significant parts of the code -- in pieces all over the garage floor, totally unsuitable for checkin, maybe not even compiling.
An urgent bugfix request comes in. The fix is nice and local and doesn't touch any of the code I've been working on.
I make the fix in my working copy.
Now what?
I've looked at "Mercurial cherry picking changes for commit" and "best practices in mercurial: branch vs. clone, and partial merges?" and all the suggestions seem to be extensions of varying complexity, from Record and Shelve to Queues.
The fact that there apparently isn't any core functionality for this makes me suspect that in some sense this working style is Doing It Wrong. What would a Mercurial-like solution to this use case look like?
Edited to add: git, by contrast, seems designed for this workflow: git add the bugfix files, don't git add anything else (or git reset HEAD anything you might have already added), git commit.

Here's how I would handle the case:
have a dev branch
have feature branches
have a personal branch
have a stable branch.
In your scenario, I would be committing frequently to my branch off the feature branch.
When the request came in, I would hg up -r XYZ where XYZ is the rev number that they are running, then branch a new feature branch off of that(or up branchname, whatever).
Perform work, then merge into the stable branch after the work is tested.
Switch back to my work and merge up from the top feature branch commit node, thus integrating the two streams of effort.

Lots of useful functionality for Mercurial is provided in the form of extensions -- don't be afraid to use them.
As for your question, record provides what you call partial commits (it allows you to select which hunks of changes you want to commit). On the other hand, shelve allows to temporarily make your working copy clean, while keeping the changes locally. Once you commit the bug fix, you can unshelve the changes and continue working.
The canonical way to go around this (i.e. using only core) would probably be to make a clone (note that local clones are cheap as hardlinks are created instead of copies).

You would clone the repository (i.e. create a bug-fix branch in SVN terms) and do the fix from there.
Alternatively if it really is a quick fix you can use the -I option on commit to explicitly check-in individual files.

Like any DVCS, branching is your friend. Branching a repository multiple ways is the bread and butter of these system. Here's a git model you might consider adopting that works quite well with Mercurial, also.

In addition to what Santa said about branching being your friend...
Small-granularity commits are your friend. Rather than making lots of code changes in a single commit, make each logically self-contained code change in its own commit. Then it will be a lot easier to cherry-pick changes to merge between branches.

Don't use Mercurial without using the Mq Extension (it comes pre-packaged in the default installation). In addition to solving your specific problem, it solves a lot of other general problems and really should be the default way that you work (especially if you're using an IDE that doesn't integrate directly with Hg, making switching branches on the fly a difficult way to work).

Is the subprepos feature in Mercurial 1.4.x ready for production use?

I'd like to evaluate Mercurial for my working projects. But most of my projects very heavily rely on the presence of svn:externals-like support. I've searched over StackOverflow and googled for corresponding support in Mercurial. All I found is subrepo feature added in Mercurial 1.3, but the page for this feature said:
subrepos are an experimental feature for Mercurial 1.3. So don't do this on mission critical repositories!
I don't want to use something unstable.
Can anybody shed some light on the real status of this feature, and the plans of polishing/finishing it and when it will be called "stable" and ready for mission critical repositories?

The word in the #mercurial IRC channel is that subrepos will continue to work as they do, and support will grow. For example currently the 'hg status' command isn't subrepo aware -- it works, it just doesn't recurse, but that in the future it will be. However, the current behaviors, fileformats (.hgsub and .hgsubstate) will only be changed in backward compatible ways.
So, go ahead and count on it now, and look forward to it getting better.
P.S. As of mercurial 1.4.2 the subrepos can now be subversion repos, so you can use a mercurial parent and a svn kid.

I've had good luck with the feature in my (light) usage of it so far. It's come in handy in two places:
Backing up a tree of unrelated repositories with a single hg pull command.
Tying a project together with specific versions of its dependencies, so that a single hg clone gets buildable source code. This is closer to the typical svn:externals usage.
Here are a couple of the limitations I've seen with it so far:
In case #1 above, you have to commit all subrepos at once. This is only occasionally annoying, as Mercurial (like any DVCS) encourages frequent commits—so most repos aren't left sitting around in an incomplete state to begin with.
Only the most basic Mercurial commands are subrepo-aware: clone, push / pull, update / commit, and perhaps a couple of others.
Extension authors are going to need time to test their extensions against repositories with subrepos.
When the Mercurial team describes the feature as "experimental," they don't mean that it's suddenly going to decide to erase all your data. They just mean that they haven't coded around all the edge cases like name conflicts (e.g., one developer adds a subrepo called README, while another developer adds a text file called README).

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008