How to filter history from another repo? - mercurial

I am using TortoiseHg as my VCS. Here is my question:
I have two repos, Base and Feature.
Base's history looks like this:
1 Prod 01
0 Create repo
Suppose I clone Base to Feature now and add some features in the Feature repo. Its history would look like this:
3 Prod 02
2 Add xxxxx
1 Prod 01
0 Create repo
Here is the question: how can I pull only Feature's rev 3 (Prod 02) into the Base repo?
I want my Base repo to stay clean.
I saw that TortoiseHg's repo looks like this; its history is something like:
161 bump to rev 30
160 bump to rev 29
How did they do this?
Best Regards
Sheng Yun

You could use the transplant extension for this. In the example you gave, assuming Base and Feature are stored in the same folder you'd execute this command:
cd /path/to/Base
hg transplant -s ../Feature 3

Related

Patch corruption or loss after rebase

I just lost all of the changes in a Mercurial patch (fortunately, I had a backup), and I would like to figure out what went wrong.
The Setup
I had a pair of patches, call them patch1.diff and patch2.diff. They were both based on revision 123, but affected completely different files, with no overlap. So, my repository looked something like this in TortoiseHg (where p is a patch and r is a regular revision):
Graph  Rev  Branch   Tags         Message
p      125  develop  patch2.diff  Change to existing file baz.php
p      124  develop  patch1.diff  Add new files foo.php and bar.php
r      123  develop               Last committed changeset
|
r      122  develop               Old changes
...
What I Did
I wanted to switch the order of the patches, because my work on patch2.diff was complete and I wanted to commit those changes. So I tried rebasing that patch onto revision 123. That didn't work, and I ended up with something like this:
Graph  Rev  Branch   Tags         Message
r                                 Working directory - not a head revision!
r      126  develop               Change to existing file baz.php
|
p |    125  develop  patch2.diff  Change to existing file baz.php
|
p |    124  develop  patch1.diff  Add new files foo.php and bar.php
|
r-+    123  develop               Last committed changeset
|
r      122  develop               Old changes
...
That was clearly wrong. I now had a revision 126 with the same changes as those in patch2.diff, but I also still had a patch2.diff, which wasn't rebased as I expected. On top of that, I was getting the "not a head revision" message, even though there weren't actually any changes in my working directory.
So I stripped revision 126. At that point, things went completely off the rails, leaving me with this:
Graph  Rev  Branch   Tags         Message
p      125  develop  patch2.diff  Change to existing file baz.php
p      124  develop  patch1.diff
r      123  develop               Last committed changeset
|
r      122  develop               Old changes
...
patch1.diff still appeared in TortoiseHg, but the changes and commit message were gone. I tried hg qpush --all, and got these messages:
applying patch1.diff
unable to read patch1.diff
I couldn't even find patch1.diff on my file system anymore. Ultimately, I had to run hg qdelete --keep patch1.diff and then restore my lost changes from offsite backups.
I ended up where I wanted to be, but nearly lost hours of work on a new feature. I was able to recover only because I had an offsite backup of the new files. That was terrifying.
The Question
What in the world happened? Why did I lose patch1.diff? I could understand if I lost the changes in patch2.diff given the way I used hg strip, but I have no idea why patch1.diff got nuked.
You stumbled over the issues that are the reason why mq might very soon not be recommended anymore. mq wants to retain control over the changesets it manages, and it loses that control when you modify history under it. Thus mq does not work well with rebase, strip, histedit...
The better way is to simply stop using mq at all. Make your default phase for new commits secret (or draft). Commit your patches as normal changesets; then mq cannot interfere with the proper working of rebase, and what you tried to do would simply have worked:
hg rebase -s125 -d123
hg rebase -s124 -d126
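(given the state of your repo as in the first quote, just assuming r124 and r125 are normal changesets, not under mq control. If the rebased changeset ends up with a different revision number after the first command, use that number, or tip, as the destination of the second one.)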
And if you're a little daring, you take a look at the evolve extension which is very useful for people who maintain patch queues with respect to upstream repos or juggle draft changesets with collaborators.
See http://www.logilab.org/blogentry/88203 for an introduction to mercurial phases

Hg: delete latest commits [duplicate]

I've used Git in the past and I'm a Hg noob:
I have the following repository structure:
o [> default] commit A
|
o commit B
.
.
.
o <a-tag]
|
I've updated to the commit with the a-tag and committed a few other commits. Now I have
o [> default] commit C
|
o commit D
|
| o [default] commit A
| |
| o commit B
| .
| .
| .
| /
o <a-tag]
|
Now (before pushing) I realize that I based my commits C and D on the wrong commit. How can I go back to the initial state (without having to re-clone the repository), dropping commit C and commit D (the equivalent of git reset --hard a-tag)?
You can use 'strip' to permanently delete a commit and all its descendants. In your case you need to specify the id of the "D" revision:
hg strip -r D
Note: the mq extension must be turned on (newer Mercurial versions also ship a standalone strip extension):
[extensions]
mq=
Mercurial saves backup bundles of the stripped changesets in .hg/strip-backup, so this operation is rather safe.
You say without cloning, and I'll get to that but first let me point out that doing this:
cd ..
hg clone -r -3 yourrepo yournewrepo
is a quick, local operation that gets you a new clone without the last two commits in your old repo (-1 is the tip, so -3 names the newest revision before them, assuming C and D are your two most recent commits). A plain local clone uses hardlinks (even on Windows), so it takes up almost no additional disk space; with -r, Mercurial copies the selected history instead of hardlinking it.
That is the classic Mercurial solution. Mercurial was built with the idea of an immutable history. If there's something in the history you regret you commit its inverse (which is what backout does), so the history shows the error and its correction -- like a scientist's log book. If you can't/couldn't abide having it in history you'd do a clone that excludes it like I showed above.
Not everyone can be quite so... hardcore... about their history, so lots of extensions have shown up that will modify history but they have to be specifically enabled. In your case the rebase extension, which you already have installed (it comes with Mercurial now), will do exactly what you want. You just need to enable it in your ~/.hgrc and then reparent D onto A. If you really just want C gone the strip command from the mq extension will do that. It also ships with Mercurial.
I would just clone from the appropriate point and delete the old repo.
Without cloning, consider hg backout to undo first C and then D or, if this is your ultimate goal, hg rebase to move both C and D onto commit A.
Backing out C and then D will undo these 2 commits, but add 3 additional ones (backout + backout + merge). A "clean" removal from history may be done with:
Histedit extension ("drop" command)
MQ extension (convert the changesets to mq patches /first steps/ and drop the mq queue)
If you only want to change the wrong parent to the correct one, you have to use rebase, as @chill mentioned.

For Mercurial, can having 2 clones work the same as having 2 branches?

I want to diff all the changes I have made over the last 7 or 10 days without seeing the changes of other team members, so I keep a clone, say
c:\dev\proj1
and then I keep another clone that is
c:\dev\proj2
so I can change code in proj1, and then, in another shell, pull that code into proj2, merge it with other team members' changes, and run tests. Then 10 days later, I can still diff all the code written by me and nobody else by going to the shell of proj1 and doing an hg diff or hg vdiff.
I think this can be done by using branch as well. Does having 2 clones like this work exactly the same as having 2 branches? Any advantage of one over the other method?
The short answer is: Yes.
Mercurial doesn't care where the changesets come from, when you merge. In that sense, branches and clones work equally well when it comes time to merge changes.
Even better: The workflow you described is exactly the strategy in Chapter 3 of the Mercurial book.
The only advantage of branches is that they have a name, so you have less incentive to merge right off. If you want to keep those proj2 changes separate, while still pushing and pulling them from proj1, give them a real branch. Again, functionally, they're the same.
And yes, this is characteristic of DVCS, not uniquely Mercurial.
Note : I'm more familiar with git than hg but the ideas should be the same.
The difference will become apparent if you update both the clones (which are both editing the same branch) e.g. for a quick bug fix on the integration sandbox.
The right way would be for you to have a topic branch (your first clone) which is where you do your development and another one for integration (your second clone). Then you can merge changes from one to another as you please. If you do make a change on the integration branch, you'll know that it was made there.
hg diff -r <startrev> -r <endrev> can be used to compare any two points in Mercurial's history.
Example history:
     rev  author  description
     ---  ------  ----------------------
@    6    me      Merge
|\
| o  5    others  More other changes.
| |
| o  4    others  Other changes.
| |
o |  3    me      More of my changes.
| |
o |  2    me      My changes.
|/
o    1    others  More Common Changes
|
o    0    others  Common Changes
If revision 1 was the original clone:
Revs 2 and 3 represent your changes.
Revs 4 and 5 are other changes made during your branch development. They are pulled and merged into your changes at rev 6.
At this point, to see only changes by me before the merge, run hg diff -r 1 -r 3 to display those changes only.
Why not simply have two branches? (Branching/merging is much easier and safer in a DVCS like Hg or Git than in a centralised VCS like TFS or SVN!) It would be much more secure and reliable.
This will become apparent e.g. when you will want to merge the two branches/clones back together. Also, editing one branch from two different physical locations can easily lead to confusion and errors. Hg is designed to avoid exactly these kinds of situations.
Thomas
As some answers already pointed out, branches (named or anonymous) are usually more convenient than two clones because you don't have to pull/push.
But two clones have the distinct advantage of total physical separation, so you can literally work on two things at the same time, and you don't ever need to rebuild when you switch project.
Earlier I asked a question about concurrent development with hg, with option 1 being two clones and option 2 being two branches.

How does Mercurial work with many developers?

I look at Mercurial repositories of some known products, like TortoiseHg and Python, and even though I can see multiple people committing changes, the timeline always looks pretty clean, with just one branch moving forward.
However, let's say you have 14 people working on the same product, won't this quickly get into a branch nightmare with 14 parallel branches at any given time?
For instance, with just two people, and the product at changeset X, now both developers start working on separate features on monday morning, so both start with the same parent changeset.
When they commit, we now have two branches, and then with 14 people, we would quickly have 10+ (might not be 14...) branches that need to be merged back into the default.
Or... What am I not seeing here? Perhaps it's not really a problem?
Edit: I see there's some confusion as to what I'm really asking about here, so let me clarify.
I know full well that Mercurial easily handles multiple branches and merging, and as one answer states, even when people work on the same files, they don't often work on the same lines, and even then, a conflict is easily handled. I also know that if two people end up creating a merge hell because they changed a lot of the same code in the same files, there's some overall planning failure here, since we've given two features in the exact same place to two developers, instead of perhaps having them work together, or just giving both to one developer in the first place.
So that's not it.
What I'm curious about is how these open source projects manage such a clean history. It's not important to me (as one comment wondered) that the history is clean; we do work in parallel, and if the repository is able to reflect that, so much the better (in my opinion). However, the repositories I've looked at don't show that. They seem to be working along the Subversion model where you can't commit before you've updated and merged, in which case the history is just one straight line.
So how do they do it?
Are they "rebasing" the changes so that they appear to be following the latest tip of the branch even though they were originally committed a bit back in the branch history? Transplanting changesets to make them appear to have been committed in the main branch to begin with?
Or are the projects I've looked at either so slow (at the moment, I didn't look far back in the history) at adding new things that in reality they've only been working one person at a time?
Or are they pushing changes to one central maintainer who reviews and then integrates? It doesn't look like that since many of the projects I looked at had different names on the changesets.
Or... What am I not seeing here?
Perhaps it's not really a problem?
It's not really a problem. In a large project even when people work on the same feature, they don't usually work on the same file. When they work on the same file, they don't usually modify the same lines. And when they modify the same lines, then a merge should be done manually (for the affected lines).
This means in practice that 80+% of the merges can be done automagically by Mercurial itself.
Let's take an example:
you have:
[branch 1] [branch2]
\ /
\ /
[base]
Edit: for clarity, by branch I refer here to unnamed branches.
If you have a file changed in branch 1 but the same file in branch 2 is the same as in base, then the version in branch 1 is chosen. If the file is modified in both branch 1 and branch 2 the files are merged line by line using the same algorithm: if line 1 in file1 in branch 1 is different than line 1 in file1 in base but branch 2 and base have the line 1 equal, line 1 in branch 1 is chosen (and so on and so forth).
For the lines that are modified in both branches, Mercurial interrupts the automated merging process and prompts the user to choose which lines to use, or edit the lines manually.
Since deciding which lines to use is best done by the person(s) who modified those lines, a good practice is to have the person that implemented a feature perform the merge. That means that if you and I work on the same project, I implement my feature, then pull from a central/common repository (to get the latest version that everyone uses), then merge my new version with the pulled changes, then publish it to the common repository (at this point, the common repository has one main branch, with my changes merged into it). Then, you pull that from the server and do the same with your changes.
This implies that everyone is capable of doing whatever they want in their local repository, and the common/official repository has one branch. It also means that you need to decide on a time frame when people should merge their changes in.
I used to have three or four repositories on my machine already compiled on different product versions (different branches of the repository) and a few different branches in my main repository (one for refactoring, one for development and so on). Whenever I would bring one branch to a stable state (say - finish a refactoring) I would pull from the server, merge that branch into the pulled changes, then push it back to the server and let anyone know that if they made any changes to the affected files, they should pull first from the server.
We used to synchronize implemented features every Monday morning and it took us about an hour to merge everything, then make a weekly build on the server to give to QA (on bad days it would take two members of the team two hours or so); then everyone would pull the week's changes onto their machine and use them as a new base for the week. This was for an eight-developer team.
In your updated question it seems that you are more interested in ways of tidying up the history. When you have a history and want to make it into a single, neat, straight line you want to use rebase, transplant and/or Mercurial queues. Check the docs out for those three and you should realise the workflow for how it's done.
Edit: Since I'm waiting for a compile, here follows a specific example of what I mean:
> hg init
> echo test > a.txt
> hg addremove && hg commit -m "added a.txt"
> echo test > b.txt
> hg addremove && hg commit -m "added b.txt"
> hg update 0 # go back to initial revision
> echo test > c.txt
> hg addremove && hg commit -m "added c.txt"
Running hg glog (hg log -G in current Mercurial versions) now shows this (diverging) history with two branches:
@ changeset: 2:c79893255a0f
| tag: tip
| parent: 0:7e1679006144
| user: mizipzor
| date: Mon Jul 05 12:20:37 2010 +0200
| summary: added c.txt
|
| o changeset: 1:74f6483b38f4
|/ user: mizipzor
| date: Mon Jul 05 12:20:07 2010 +0200
| summary: added b.txt
|
o changeset: 0:7e1679006144
user: mizipzor
date: Mon Jul 05 12:19:41 2010 +0200
summary: added a.txt
Do a rebase, making changeset 1 into a child of 2 rather than 0:
> hg rebase -s 1 -d 2
Now let's check history again:
@ changeset: 2:ea0c9a705a70
| tag: tip
| user: mizipzor
| date: Mon Jul 05 12:20:07 2010 +0200
| summary: added b.txt
|
o changeset: 1:c79893255a0f
| user: mizipzor
| date: Mon Jul 05 12:20:37 2010 +0200
| summary: added c.txt
|
o changeset: 0:7e1679006144
user: mizipzor
date: Mon Jul 05 12:19:41 2010 +0200
summary: added a.txt
Presto! Single line. :)
Also note that I didn't do a merge. When you rebase like this, you will have to deal with merge conflicts and everything just as if you did a merge, because that's pretty much what happens under the hood. Experiment with this in a small test repo. For example, try changing the file added in revision 0 rather than just adding more files.
I'm a Mercurial developer, so let me explain how we/I do it.
In the Mercurial project we accept contributions in the form of patches sent to the mailing list. When we apply those with hg import, we do an implicit rebase to the tip of the branch we are working on. This helps a lot with keeping the history clean.
As for my own changes, I use rebase or mq to linearize things before I push them, again to keep the history tidy. It's basically a matter of doing
hg push # abort: creates new remote head
hg pull
hg rebase
hg push
You can combine the pull and rebase if you like (hg pull --rebase) but I've always liked to take one step at a time.
By the way, there are some disagreements about this practice of linearizing the history -- some believe that the history should show how things really happened, with all the branches and merges and whatnot. I find that as long as you don't mess with public changesets, then it's okay and useful to linearize history.
The Linux kernel is stored in thousands of repositories and probably millions of branches, and this doesn't seem to pose a problem. For large projects you need a repository strategy (e.g., the dictator–lieutenants strategy), but having many branches is the main strength of the modern DVCSes and not a problem at all.
Yes, we'll have to merge, and to avoid extra heads on the main repository, merging should be done in the child repositories by the developer.
So before you push your code to the parent repository you first pull the latest changes, merge on your side and (try to) push. This should avoid unwanted heads in the master repo.
I don't know how the TortoiseHg team does things, but you can use Mercurial's rebase extension to "detach" a branch and drop it on the top of the tip, creating a single branch.
In practice, though, I don't get concerned about multiple branches, as long as I don't see more heads than there should be. Merging is not really a big deal.

Can I clone part of a Mercurial repository?

Is it possible to clone part of a Mercurial repository? Let's say the repository is quite large, or contains multiple projects, or multiple branches. Can I clone only part of the repository?
E.g. in Subversion, you might have trunk and branches. If I only want to get trunk (or one of the branches) I can just request [project]/trunk. If I clone the hg repo I'll get trunk and all of the branches. This might be a lot of information I don't want. Can I avoid getting this?
Alternatively, if I want to have multiple projects in one hg repo, how should I do this? I.e. so that I might just get one of the projects and ignore the others.
Yes you can. I'm sure you've moved on, but for the sake of those who will wander here later, I followed the docs at ConvertExtension, and wrote a simple batch script:
@echo off
echo Converting %1
REM Create the file map
echo include %1 > ~myfilemap
echo rename %1 . >> ~myfilemap
REM Run the convert process
hg convert --filemap ~myfilemap .\ ..\%1
REM Delete the file map
del ~myfilemap
cd ..\%1
REM update the new repo--to create the files
hg update
Name it something like split.cmd, and put it in the directory for the repo you want to split. Say for example you have C:\repos\ReallyBigProject, and a subfolder is C:\repos\ReallyBigProject\small-project. At the command prompt, run:
cd\repos\ReallyBigProject
split.cmd small-project
This will create C:\repos\small-project with a slice of the relevant history of revisions from the larger project.
The convert extension is not enabled by default. You'll need to make sure the following lines exist in your .hg\hgrc file (c:\repos\ReallyBigProject\.hg\hgrc in my example):
[extensions]
hgext.convert=
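For readers on a POSIX shell, the same idea as the batch script above can be sketched as follows. Assumptions: hg is on PATH, write_filemap is a hypothetical helper name, and the repository and folder names are made up. The paths are quoted inside the filemap so a subfolder name with spaces still works, and convert is enabled on the command line instead of in hgrc.

```shell
# write_filemap SUBDIR FILE (hypothetical helper): keep only SUBDIR and
# move its contents to the root of the new repository.
write_filemap() {
  printf 'include "%s"\nrename "%s" .\n' "$1" "$1" > "$2"
}

# Sketch only: scratch repos in a temp directory.
work=$(mktemp -d)
write_filemap "small project" "$work/filemap"

split_demo() (
  set -e
  export HGRCPATH= HGUSER="demo <demo@example.com>"   # hypothetical identity
  cd "$work"
  hg init big
  mkdir "big/small project" big/other
  echo readme > "big/small project/readme.txt"
  echo noise > big/other/noise.txt
  cd big
  hg add -q
  hg commit -m "add both projects"
  cd ..

  # Convert only the selected subfolder into a new repository named "small".
  hg --config extensions.convert= convert -q --filemap filemap big small
  hg -R small update -q      # populate the new working directory
  ls small
)

if command -v hg >/dev/null 2>&1; then split_demo; fi
```

The original repository is left untouched; the new one gets fresh changeset hashes, so it cannot exchange changesets with the old one afterwards.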
@Nick
"E.g. in Subversion, you might have trunk and branches. If I only want to get trunk (or one of the branches) I can just request [project]/trunk. If I clone the hg repo I'll get trunk and all of the branches. This might be a lot of information I don't want. Can I avoid getting this?"
Absolutely. Just use hg clone -r <branch> and get only the branch you want. If you have lots of branches, you need a -r <branch> for each one. <branch> doesn't have to be a named branch: you can simply have multiple unnamed heads (or named heads using bookmark, though those still aren't perfect, because currently they don't show up with push/pull/clone).
Keep in mind that in DVCSes, Mercurial among them, branches are often short-lived and merged back into each other frequently. If you pull a branch you will still get the common history it has with any other branches.
To my knowledge, that's not possible. But compared to Subversion, cloning the whole repo may not be slower than checking out just a branch from SVN.
Quoting from UnderstandingMercurial:
Many SVN/CVS users expect to host related projects together in one repository. This is really not what hg was made for, so you should try a different way of working. This especially means, that you cannot check out only one directory of a repository.
If you absolutely need to host multiple projects in a kind of meta-repository though, you could try the Subrepositories feature that was introduced with Mercurial 1.3 or the older ForestExtension.
@Nick said:
"This is a pretty big omission since a lot hosting sites only offer one repo. With svn I can effectively have as many repos as I want by only taking one branch from the main one. The subrepos sound like a hack."
Subrepos (aka submodules) are not as ideal as "narrow clones", it's true. But at least for having many distinct projects in one hosting site's repository, you can have multiple code-bases in one repository. This won't allow you to slice up different sections of one repository or sub-directories of a project, but it will let you manage multiple projects. What you do is have lots of named branches, each rooted at the empty (or null) changeset (i.e. they have no common root revision). It can get a little messy to track the branches but it does work.
For example:
hg init
hg branch project-1
# Changes, commits, repeated as needed
hg update null
hg branch project-2
# Changes, commits, repeated as needed
You now can see all your projects:
> hg branches
project-2 5:42c2beffe780
project-1 2:43fd60024328
The projects are unrelated (though you can merge them):
> hg debugancestor 2 5
-1:000000000000
Most usefully: you can clone only the project you want, and the others won't mix in:
> hg clone <repository> -r project-1
The graph for this would look something like this (hg log -qG):
@ 5 | project-2 | {tip}
|
o 4 | project-2
|
o 3 | project-2
o 2 | project-1
|
o 1 | project-1
|
o 0 | project-1
You can do this for as many projects as you need, listing each with hg branches, and jumping between them with hg update. This takes some care, because named branch support isn't perfect. It isn't always intuitive for one thing (read about hg clone -u in Mercurial 1.4 -- the pre-1.4 behavior is surprising when cloning). But it does work.
Mercurial and Git only permit cloning on the entire repository. Thus it is recommended that each project gets its own repository.
Mercurial has a forest extension to ease having a "forest" for project repositories. The extension keeps each project in a separate repository, but provides options to update/push/pull all the forest repositories together.
It's possible to ask Mercurial to clone just a branch using hg clone -r branchname (see Mercurial clone from a branch).
With Google's NarrowHG extension it's possible to perform a narrow clone (see How do I clone a sub-folder of a repository in Mercurial? for a similar question).
I know that it is nearly 10 years after this question was asked, but I stumbled across this question by accident.
There is a new Mercurial extension called sparse that allows you to do this.
Here's a possible improvement to Vadim Kotov's solution that supports spaces in the small-project name/subfolder:
@echo off
echo Converting "%~1"
REM Create the file map
echo include "%~1" > ~myfilemap
echo rename "%~1" . >> ~myfilemap
REM Run the convert process
hg convert --filemap ~myfilemap .\ "..\%~1"
REM Delete the file map
del ~myfilemap
cd "..\%~1"
REM update the new repo--to create the files
hg update