I’m currently migrating from CVSNT to Mercurial but am running into problems with what I used to be able to achieve with CVS modules. I have two projects, say A and B, which both depend on common code in a directory C. If I make a change to code in directory C, I want the changes to be reflected in both projects A and B. (It seems to be the exact reverse of the problem here.)
I thought this could be achieved using subrepos, but the .hgsubstate in projects A and B keeps a note of the changeset in the subrepo that I last committed against. I.e. I commit a change to C in project A and I then have to manually open B to update and commit there too. (In actual fact there are many more projects than just A and B, and yes, I know common code should be in a shared library, but the PHB insists!)
Is there a way to achieve this? Ideally I’d like it to be transparent to the user how the repos are structured, i.e. they can commit a change to C and not have to realise it is a subrepo of their project. (At the moment TortoiseHg uses an ‘S’ to indicate a subrepo is dirty.) I guess what I need is a parent repo and a partial checkout, but surely there must be a better way? I'm on Windows, so symlinks are out.
The fact that you have to manually update (and commit) the subrepository is very much intentional: the automatic behaviour found in CVS and SVN was considered a misfeature and deliberately left out.
If Mercurial automatically updated subrepositories to the latest commit, there would be no guarantee that the code keeps working. For example, if you changed the API of C, neither A nor B would work until you changed them accordingly. And because you can never push multiple repositories atomically, there would unavoidably be a window during which A and B do not work. In practice this window can grow rather large, especially once one of the depending projects receives less development attention. And if project B were put on hold for a while, the changes made to C for project A would break it in no time.
Worse, updating to an earlier revision would not restore the subrepository to the state it was in at the time. So as soon as you made a backwards-incompatible change to the subrepository, older revisions would no longer work! That greatly reduces the usefulness of version control: it would cause problems with branching, and bisecting, for example, could not work at all.
This is why you are required to manually update to the latest version of C, test whether everything still works, and check in that subrepository update, thereby tightly locking the repository version to the subrepository version. You sacrifice a little bit of convenience, but you gain code stability, and I hope I was able to make clear why this is much better :).
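For reference, a minimal sketch of that manual workflow, assuming C is set up as a subrepository of A via .hgsub (the layout follows the question; adapt paths and the message as needed):

$ cd A                      # parent repository containing the subrepo C
$ hg pull -R C              # fetch the new changesets of the shared code
$ hg update -R C            # move C's working copy to the revision you want
# build and test A against the updated C here
$ hg commit -m "Update subrepo C to latest tested revision"   # records the new hash in .hgsubstate

The same dance is then repeated in B, which is exactly the inconvenience the question describes.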
As for transparently committing to a subrepository from within a parent repository: as far as I know this is planned, but so far no one has actually implemented it.
Related
In this F8 conference video (starting at 8:40) from 2015 they speak about the advantages of using Mercurial and a single repository across Facebook.
How does this work in practice? Using Mercurial, can I check out a subdirectory (like in SVN)? If so, how? Do I need a Facebook Mercurial extension for this?
P.S.: I only found answers like this or this from 2010 on SO, where I am not sure if the answers still apply with all the effort FB has put into it.
From your question it is not clear if you are looking for a workflow (the monorepo vs multiple repos debate) or for performance and scaling for a huge code base.
For the workflow, I suggest googling for monorepo. It has its pros and cons, you need to understand your situation and current workflow to decide. For the performance and scaling, keep reading.
The idea of remotefilelog is not to check out a subdirectory (as you mention); the idea is to check out everything. In order to do that in an efficient way, you need two extensions actively developed by Facebook:
remotefilelog. This gives you something conceptually similar to a shallow clone. This reduces hg clone and hg pull time.
fsmonitor (previously called hgwatchman; it is now part of Mercurial core). This dramatically reduces the time of local operations such as hg status. Note that fsmonitor is independent of remotefilelog. You can start experimenting with this, since it doesn't require any setup on the server side.
With a recent Mercurial (which I strongly suggest) you can also shave off the startup time of the Python interpreter by using the command server with chg.
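As an illustration (not an official recipe), enabling fsmonitor is just an extensions entry in your .hgrc; remotefilelog is a third-party extension, so it has to be installed first and, as noted above, also needs server-side setup. The path below is a placeholder:

[extensions]
fsmonitor =
# remotefilelog must be installed separately; this path is hypothetical
remotefilelog = /path/to/remotefilelog

chg is then used essentially as a drop-in replacement for the hg command on the client, so no further configuration is needed for it.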
Some additional notes:
I tested fsmonitor extensively. It works very well: on huge repos the time of hg status drops from 10 seconds to less than 1 second (and the majority of that second is Python startup time; see above for chg). If your repository is really huge, you might need to fine-tune some inotify kernel parameters (or the equivalent on macOS); a short sketch of those knobs follows after these notes. The fsmonitor documentation has all the information you need.
I didn't test remotefilelog, although I read everything I could find about it and I am sure it works. Depending on how development is done (whether everybody always has Internet connectivity, whether the organization has its own master repo) there can be a caveat: it partially turns decentralized hg into a centralized VCS like SVN; some operations that can normally be done offline (for example hg log, and the first hg update to a changeset in the past) will now require connectivity to the master repository.
Before considering remotefilelog, I used the largefiles extension extensively on a huge repo. It has the same drawbacks as remotefilelog, plus some confusing corner cases for users who want to use hg just to get things done without taking the time to understand how it works. If I were to manage another huge repo, I would use remotefilelog instead of largefiles, although their use cases are not really the same.
Mercurial also has support for subrepositories (doc1, doc2). The problem is that they change the behavior of hg depending on where you are in the source tree. Again, if the developers don't care to really understand how hg works, it will just be too confusing.
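Regarding the inotify tuning mentioned above: the knobs in question are the usual Linux sysctls; the values below are placeholders only, and the fsmonitor/Watchman documentation gives concrete guidance:

$ sudo sysctl fs.inotify.max_user_watches=524288     # how many files one user may watch
$ sudo sysctl fs.inotify.max_queued_events=32768     # event queue size before events are dropped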
Additional information:
Facebook Engineering blog post
the scaling Mercurial wiki, although not completely up to date
and more, just by googling "mercurial facebook".
I am not sure if the answers still apply with all the effort FB has put into it
(Early 2017) The answers in the linked questions still apply (they occasionally get updated), but note that you will have to read all the comments as well as the answers.
remotefilelog essentially allows on-demand shallow clones (so you don't fetch the full history of everything for all time), but you still fetch the essential metadata for, and check out across, all the directories of the repo at the desired revision.
Using Mercurial, can I check out a subdirectory (like in SVN)? If so, how?
https://stackoverflow.com/a/40355673/7836056 discusses how you might use third-party extensions to allow narrow/sparse checkouts (Facebook's sparse.py) or narrow clones (Google's NarrowHG) with Mercurial, thus only "creating" a single directory from within the main repository (albeit with radically different tradeoffs).
(Note that phrasing matters: "sparse checkout" means something very specific when referring to distributed version control, in a way that doesn't exist when the term is used for centralised version control.)
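To make that concrete, a sparse checkout with Facebook's sparse extension looks roughly like the sketch below. Command names and flags have varied between the Facebook extension and the experimental in-core sparse extension, and the URL and mydir/ are placeholders, so treat this purely as an illustration:

$ hg clone https://hg.example.org/bigrepo      # history/metadata for the whole repo is still fetched
$ cd bigrepo
$ hg sparse --include 'mydir/**'               # only mydir/ is materialised in the working directory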
I have four devs working in four separate source folders in a Mercurial repo. Why do they have to merge all the time and pollute the repo with merge changesets? It annoys them and it annoys me.
Is there a better way to do this?
Assuming the changes really don't conflict, you can use the rebase extension in lieu of merging.
First, put this in your .hgrc file:
[extensions]
rebase =
Now, instead of merging, just do hg rebase. It will "detach" your local changesets and move them to be descendants of the public tip. You can also pass various arguments to modify what gets rebased.
Again, this is not a good idea if your developers are going to encounter physical merge conflicts, or logical conflicts (e.g. Alice changed a feature in file A at the same time as Bob altered related functionality in file B). In those cases, you should probably use a real merge in order to properly represent the relevant history. hg rebase can be easily aborted if physical conflicts are encountered, but it's a good idea to check for logical conflicts by hand, since the extension cannot detect those automatically.
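With the extension enabled, a typical round-trip looks something like the sketch below (assuming your local changesets are still unpushed drafts and no conflicts turn up):

$ hg pull                 # fetch the new upstream changesets
$ hg rebase               # replant your local changesets on top of them
$ hg push                 # history stays linear, no merge changeset
# if a physical conflict does appear, hg rebase --abort backs out cleanly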
Your development team are committing little and often; this is just what you want, so you don't want to change that habit for the sake of a clean line of commits.
@Kevin has described using the rebase extension, and I agree that can work fine. However, you'll also see the whole work sequence of each developer squished together into a single line of commits. If you're working on a stable code base and just submitting quick single-commit fixes then that may be fine; if you have ongoing lines of development then you might not want to lose the continuity of each developer's commits.
Another option is to split your repository into smaller self-contained repositories.
If your developers are always working in 4 separate folders, perhaps the contents of these folders can be modularised and stored as separate Mercurial repositories. You could then have a separate master repository that brought all these smaller repositories together within the sub-repository framework.
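For illustration, the master repository would tie the pieces together with an .hgsub file along these lines (module names and URLs are made up):

folder1 = https://hg.example.org/module1
folder2 = https://hg.example.org/module2
folder3 = https://hg.example.org/module3
folder4 = https://hg.example.org/module4

Each commit in the master repository then records, via .hgsubstate, exactly which revision of every module it was made against.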
Mercurial is distributed, which means that if you have a central repository, every developer also has a private repository on his/her workstation, plus a working copy of course.
So now let's suppose that they make a change and commit it, i.e. to their private repository. When they want to hg push, two things can happen:
either they are the first one to push a new changeset to the central server, in which case no merge will be required, or
somebody else, starting from the same version, has committed and pushed before them. We can see that there is a fork here: from the same starting point the history has gone in two different directions, so a merge is required, even if there is no conflict, because we do not want four divergent lines of development on the central server. (This is, by the way, possible with Mercurial: they are called heads, and you can force the push without merging, but the divergence is still there, no magic, and it is probably not what you want, because you want to be able to check out the sum of all the contributions.)
Now, how to avoid performing merges is quite simple: you need to tell your developers to integrate others' changes before committing their own:
$ hg pull
$ hg update
$ hg commit -m"..."
$ hg push
When the commit is made against the latest central version, no merge should be required.
If they were working on the same code, then after the pull and update some tests should be run as well, to ensure that what was working in isolation still works now that the other developers' work has been integrated. Taking others' contributions frequently and pushing your own changes frequently too is called continuous integration, and it ensures that integration issues are discovered quickly.
Hope it'll help.
Changes were made to our .vcproj to fix an issue on the build machine (changeset 1700). Later, a developer merged his changes (changesets 1710 through 1715) into the trunk, but the Mercurial auto-merge overwrote the changes from 1700. I assume this happened because he chose the wrong branch as the "parent" of the merge (see part 2 of the question).
1) What is the "correct" Mercurial way to fix this issue, considering that, of all the merged files, only one file was merged incorrectly, and
2) what should the developer have done differently in order to make sure this didn't occur? Are there ways we can enforce the "correct" way?
Edit: I probably wasn't clear enough on what happened. Developer A modified a line in our .vcproj file that removed an option for the compiler. His check-in became changeset 1700. Developer B, working from a previous parent (let's say changeset 1690), made some changes to completely different parts of the project, but he did touch the .vcproj file (just not anywhere near the changes made by Developer A). When Developer B merged his changes (becoming changes 1710 through 1715), the merge process overwrote the changes from 1700.
To fix this, I just re-modified the .vcproj file to include the change again, and checked it in. I just wanted to know why Mercurial thought that it shouldn't keep the changes in 1700, and whether or not there was an "official" way to fix this.
Edit the second: Developer B swears up and down that Mercurial merged the .vcproj file without prompting him for conflict resolution, but it is of course possible that he's just misremembering, in which case this whole exercise is academic.
I will address the 2nd part of your question first...
If there is a conflict, the automated merge tools should force the programmer to decide how the merge happens. But the general assumption is that a conflict will involve two edits to the same set of lines. If somehow a conflict arises because of edits to lines that are not close to each other, the automated merge will blithely take both of the edits and a bug will appear.
The general case of a merge tool always merging properly is very hard to solve, and really can't be solved with current technology. Here is an example of what I mean, in C:
int i;  // Someone replaces this with 'short i' in one changeset, stating
        // that a short is more efficient.

// ... lots of code ...

// Someone else replaces all the 65000s with 100000s in another changeset,
// saying that more precision is needed.
for (i = 0; i < 65000; ++i) {
    integral_approximation_piece(start + i / 65000.0, end + (i + 1) / 65000.0);
}
No merge tool is going to catch this kind of conflict. The tool would have to actually compile the code to see that those two parts of the code have anything to do with each other, and while that would likely be enough in this case, I can construct an example that would require the code to be run and the results examined to catch the conflict.
This means that what you really ought to do is rigorously test your code after a merge, just like you should after any other change. The vast majority of merges will result in obvious conflicts that a developer will have to resolve (even though that resolution is often fairly obvious), or will merge cleanly. But the very few merges that don't fit either category can't easily be handled in an automated fashion.
This can also be mitigated by development practices that encourage locality, for example a coding standard that states "Variables should be declared near where they're used."
I'm guessing that .vcproj files are particularly prone to this problem since they are not well understood by developers and so if conflicts do appear they will not be sure what to do with them. My guess is that this happened and your developer simply did a revert back to the revision (s)he checked in.
As for part 1...
What to do in this case depends a lot on your development process. You can strip the merge changeset out and redo it, though that won't work very well if lots of people have already pulled it, and it will work especially poorly if lots of changesets based on that merge have already been checked in.
You can also check in a new change that fixes the problem with the merge.
Those are basically your two options.
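A hedged sketch of those two options (the revision number is a placeholder for the bad merge changeset; hg strip is provided by the mq/strip extension, so it has to be enabled first):

# Option 1: remove the bad merge and everything built on it, then redo the merge
$ hg strip -r <rev-of-bad-merge>

# Option 2: leave history alone; re-edit the .vcproj by hand and commit the repair
$ hg commit -m "Restore compiler option lost in the bad merge"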
The tone of your post seems to me to indicate that you may have some politics surrounding this issue in your organization, and people are blaming this error on the frequent merges of Mercurial. So I will point out that any change control system can have this problem. In the case of Subversion, for example, every time a developer does an update while they have outstanding changes in their working directory, they are doing a merge, and this kind of problem can arise with any merge.
In Mercurial a merge doesn't have a single parent; it by definition has two and only two parents. When someone is merging, they're making two choices:
What two changesets will constitute the two changes
Which of those changesets will be the left-parent and which will be the right-parent
Of those two questions the first is very important, and the second barely matters at all, though it took me a while to come to understand that.
You select the left-parent by using hg update X. That changes the output of hg parents (or in newer versions hg summary) and essentially determines what's in your working directory before the merge.
You select the right-parent by using hg merge Y. That says merge X (the working directory's parent) with changeset Y. As a special case, if there are only two heads in your repository and your parent is already one of them, then Y will default to the other.
I'd have to see your resulting graph to know just what the developer did, but it's possible he didn't update to one head or another before invoking merge, which would have him merging one head with some point back in history.
If your developer picked the right parents for the merge then left vs. right doesn't much matter -- the only real difference is that when one uses hg diff or hg log -p or some other command that shows the patch for a merge changeset, it's displayed relative to the left-parent. That, however, is mostly a matter of display; functionally they're pretty much identical.
Assuming your developer picked the right changesets then what he should have done was test the result of the merge before committing it. Merging is software development, not an annoying VCS side effect, and not testing before committing is the error.
Fixing
To fix this, just redo the merge correctly. Use hg update to set one parent and hg merge to pick the other, make sure the resulting working directory is correct, and then commit. You can get rid of the bad merge using something like hg strip, or better, just close down its branch with hg commit --close-branch after updating to it.
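A sketch of that fix sequence, where X and Y stand for the two heads that actually should have been merged:

$ hg update -r X            # left parent: this is what fills the working directory
$ hg merge -r Y             # right parent: the head being merged in
# build and test the merged result here
$ hg commit -m "Redo the merge of X and Y correctly"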
Avoiding
You say "mercurial auto-merge", but mercurial doesn't really auto-merge. It does a premerge which is an extremely cautious combination of obvious changes, but it's so careful it won't even merge for you if each merge parent adds code in the same region because it can't know which block of code you'd rather have first.
You can disable this premerge entirely or on a file-by-file basis using the merge tool configuration options:
https://www.mercurial-scm.org/wiki/MergeToolConfiguration?highlight=premerge
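For example, a hedged sketch of an .hgrc that turns premerge off for .vcproj files and hands them to an interactive tool (kdiff3 is just a stand-in for whatever merge tool you actually use):

[merge-patterns]
**.vcproj = kdiff3

[merge-tools]
kdiff3.premerge = False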
Subversion shop considering switching to Mercurial, trying to figure out in advance what all the complaints from developers are going to be. There's one fairly common use case here that I can't see how to handle.
I'm working on some largish feature, and I have a significant part of the code -- or possibly several significant parts of the code -- in pieces all over the garage floor, totally unsuitable for checkin, maybe not even compiling.
An urgent bugfix request comes in. The fix is nice and local and doesn't touch any of the code I've been working on.
I make the fix in my working copy.
Now what?
I've looked at "Mercurial cherry picking changes for commit" and "best practices in mercurial: branch vs. clone, and partial merges?" and all the suggestions seem to be extensions of varying complexity, from Record and Shelve to Queues.
The fact that there apparently isn't any core functionality for this makes me suspect that in some sense this working style is Doing It Wrong. What would a Mercurial-like solution to this use case look like?
Edited to add: git, by contrast, seems designed for this workflow: git add the bugfix files, don't git add anything else (or git reset HEAD anything you might have already added), git commit.
Here's how I would handle the case:
have a dev branch
have feature branches
have a personal branch
have a stable branch.
In your scenario, I would be committing frequently to my branch off the feature branch.
When the request came in, I would hg up -r XYZ, where XYZ is the rev number that they are running, then branch a new feature branch off of that (or hg up branchname, whatever).
Perform work, then merge into the stable branch after the work is tested.
Switch back to my work and merge up from the top feature branch commit node, thus integrating the two streams of effort.
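A sketch of that sequence with named branches (XYZ, the branch names and the commit messages are placeholders):

$ hg up -r XYZ                        # start from the revision that is actually deployed
$ hg branch hotfix-urgent-bug
# ...make the fix and test it...
$ hg commit -m "Fix urgent bug"
$ hg up stable
$ hg merge hotfix-urgent-bug          # bring the tested fix into stable
$ hg commit -m "Merge hotfix into stable"
$ hg up my-feature-branch             # back to the interrupted feature work
$ hg merge hotfix-urgent-bug          # integrate the fix there as well
$ hg commit -m "Merge hotfix into feature branch"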
Lots of useful functionality for Mercurial is provided in the form of extensions -- don't be afraid to use them.
As for your question, record provides what you call partial commits (it allows you to select which hunks of the changes you want to commit). Shelve, on the other hand, lets you temporarily make your working copy clean while keeping the changes locally. Once you commit the bug fix, you can unshelve the changes and continue working.
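In command form this is roughly the following (both extensions ship with Mercurial but may need to be enabled in the [extensions] section of your .hgrc; in recent versions record's behaviour is also available as hg commit --interactive):

$ hg shelve                      # park the unfinished feature work; the working copy is clean again
# ...make the urgent bug fix...
$ hg commit -m "Fix urgent bug"
$ hg unshelve                    # bring the unfinished work back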
The canonical way to go about this using only core functionality would probably be to make a clone (note that local clones are cheap, as hardlinks are created instead of copies).
You would clone the repository (i.e. create a bug-fix branch in SVN terms) and do the fix from there.
Alternatively, if it really is a quick fix, you can use the -I option on commit to explicitly check in individual files.
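A sketch of both core-only approaches (paths, file names and messages are placeholders):

# Option 1: a cheap local clone acting as a bug-fix branch
$ hg clone myproject myproject-bugfix
$ cd myproject-bugfix
# ...fix, test, commit, push from here...

# Option 2: commit only the named files out of a dirty working copy
$ hg commit -I src/foo.c -I src/foo.h -m "Fix urgent bug"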
As with any DVCS, branching is your friend. Branching a repository multiple ways is the bread and butter of these systems. Here's a git model you might consider adopting; it works quite well with Mercurial, also.
In addition to what Santa said about branching being your friend...
Small-granularity commits are your friend. Rather than making lots of code changes in a single commit, make each logically self-contained code change in its own commit. Then it will be a lot easier to cherry-pick changes to merge between branches.
Don't use Mercurial without using the MQ extension (it comes pre-packaged in the default installation). In addition to solving your specific problem, it solves a lot of other general problems and really should be the default way that you work (especially if you're using an IDE that doesn't integrate directly with Hg, which makes switching branches on the fly a difficult way to work).
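A sketch of the MQ way of parking the unfinished work (the patch name is made up; older Mercurial versions need qnew -f to fold uncommitted changes into the patch):

$ hg qnew feature-wip            # capture the unfinished feature as a patch
$ hg qpop                        # unapply it; the working copy is back at the last good revision
# ...make and commit the urgent bug fix...
$ hg qpush                       # re-apply the unfinished feature on top of the fix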
I have several projects with a very large overlapping code base. We've just recently started using SVN, so I'm trying to figure out how I should be using it.
The problem is that as I'm finishing a task on one project, I'm starting a task on another, with some overlap. Often there's a lot of interrupt-driven development as well. So, my code is never really in a completely stable state that I feel comfortable checking in.
The result is that we're not really using the VC system, which is a VERY bad thing, we all know... so, suggestions?
Check out a personal branch of the code and merge in changes. At least you will have some version control for your own changes, in case you need to roll back. Once you are comfortable with the state that your branch is in, merge that branch back into the trunk.
You can also check out a branch for each task, instead of one for each individual. You can also merge changes to your branch from the trunk if someone changes the trunk, and you want your branch to reflect the changes.
This is a common way to use SVN, although there are other workflows. I have worked on projects where I was afraid to commit (I might break the build) because we did not effectively use branching.
Branching is really powerful in helping your workflow, use it until you're comfortable with the idea of merging.
Edit: 'Checking out a branch' refers to creating a branch in your branches folder and then checking out that branch. The standard SVN repository structure consists of the folders trunk, tags, and branches at the root.
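A sketch of that in SVN commands (the repository URL and branch name are placeholders):

$ svn copy https://svn.example.org/repo/trunk https://svn.example.org/repo/branches/my-task -m "Create task branch"
$ svn checkout https://svn.example.org/repo/branches/my-task my-task
$ cd my-task
# ...work and commit here as often as you like; trunk stays untouched...
$ svn merge https://svn.example.org/repo/trunk       # pull trunk changes into the branch when needed
# when the branch is stable, merge it back from within a trunk working copy:
$ svn merge https://svn.example.org/repo/branches/my-task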
So, my code is never really in a completely stable state that I feel comfortable checking in.
Why is that?
If your branch is appropriate for your work (with a good naming convention for instance), everyone will know its HEAD is not always stable.
On this kind of "working" branch, just put some tags along the way to indicate "stable code points" (which can then be picked up by any tester to be deployed).
Any other version on that working branch is just made to record changes, even though the current state is not stable.
Then, later on, you merge it all onto a branch that is supposed to represent a stable state.
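Marking such a stable point in SVN is just a cheap server-side copy into tags (URLs and names are again placeholders):

$ svn copy https://svn.example.org/repo/branches/my-task https://svn.example.org/repo/tags/my-task-stable-1 -m "Stable point, ready for testing"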
In TFS, you are able to create 'shelvesets' (I'm not sure what they'd be called in other source control providers). When you shelve some code, you are saving it to the repository, but not checking it in.
The reason this is important is that if you are working on Bug XXXX and have fixed half of the code, but it's not stable and not 'check-in-able', and you get assigned to NewFeature YYYY, you SHOULD NOT continue working with the same code base. You should shelve your Bug XXXX code, then return your local codebase to the latest checked-in code, and implement NewFeature YYYY.
This way you are keeping your check-ins atomic. You don't have to worry about losing your work, because it is still held by the repository (so if your computer bursts into flames, you don't have to burst into tears), and you aren't mixing your fixes for XXXX with your new code for YYYY.
Then, once you are asked to go back to XXXX (assuming you've checked in YYYY), you can just unshelve your shelveset and jump right back in where you left off.
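With the TFS command-line client this is roughly the following (the shelveset name is made up, and most people do the same thing from the Visual Studio UI):

> tf shelve /comment:"Half-finished fix for bug XXXX" BugXXXX-wip
...work on and check in NewFeature YYYY...
> tf unshelve BugXXXX-wip

By default tf shelve keeps your pending changes in the workspace; adding /move also reverts them, which matches the "return your local codebase to the latest checked-in code" step above.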
Either accept that the code in SVN is not in a completely stable state and check it in anyway (and reserve time for stabilization and refactoring every X days/weeks so the code doesn't degrade too much).
Or force your team to work in a more structured way, with minimal interrupt-driven development, so you can check in good code.
The first option is not ideal (but better than no source control); the second is probably impossible. There is no third option.
If you don't have time to get the code to a stable state, you definitely don't have the time to branch and merge all the time.
In distributed source control systems like Git, you commit to your local repository. Only when you push your code is it 'committed' to the remote repository.
In this way, it's much easier to 'save' your work in between.
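A sketch of what that looks like day to day (branch and remote names are the usual defaults, adjust as needed):

$ git add src/foo.c                              # stage only what belongs to this change
$ git commit -m "WIP: half-finished refactor"    # recorded locally only
# ...later, once the work is fit for others...
$ git push origin master                         # only now does it reach the shared repository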