Backing up a filesystem containing hg repos - mercurial

Is it possible to backup a filesystem with many Mercurial repositories (e.g., with rsync on the filesystem) and have the backup in an inconsistent state?
The repositories are served by ssh and serves this set of requests: {push, pull, in, out, clone}. It does not have 'hg commit' applied to it directly (which has a known race condition).

Mark Drago is correct that Mercurial writes its own files in a careful order to maintain integrity. However, this is only integrity with regard to other Mercurial clients. The locking design in Mercurial allows one Mercurial process to create a new commit by writing files in this order:
filelogs (holds compressed deltas for all revisions of a given file)
manifest (has pointers back to the filelogs associated with a given changeset)
changelog (has metadata and a pointer back to the manifest for the changeset)
while other Mercurial processes will read the files in this order
changelog
manifest
filelogs
The reader will thus not see a reference to the new filelog data since the changelog is updated last in an atomic operation (a rename, which POSIX requires to be atomic).
A backup program will not know the correct order to read the Mercurial files and so it might read a filelog before it was updated by Mercurial and then read a manifest after it was updated:
rsync reads .hg/store/data/foo.i
hg writes .hg/store/data/foo.i
hg writes .hg/store/00manifest.i
hg writes .hg/store/00changelog.i
rsync reads .hg/store/00manifest.i
rsync reads .hg/store/00changelog.i
The result is a backup with a changelog that points to a manifest that points to a filelog revision that does not exist --- a corrupt repository. Running hg verify on such a repository will detect this situation:
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
foo#1: f57bae649f6e in manifests not found
1 files, 2 changesets, 1 total revisions
1 integrity errors encountered!
(first damaged changeset appears to be 1)
This tells you that the manifest of revision 1 refers to revision f57bae649f6e of the file foo, which cannot be found. It is possible to repair this situation by making a clone that excludes the bad revision 1:
$ hg clone -r 0 . ../repo-fixed
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files
updating to branch default
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ cd ../repo-fixed
$ hg verify
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
1 files, 1 changesets, 1 total revisions
So, all in all, it is not that bad if you use a general backup program to backup your Mercurial repositories. Just be aware that you might have to repair a broken repository after you restore it from backup. The changeset you lose will most likely still be on the developer's machine and he can push it again after you repair the restored repository. The Mercurial wiki has more information on repairing repository corruption.
The completely safe way to backup a repository is of course to use hg clone, but it might not be practical to integrate this with a general backup strategy.

Why don't "backup" it with just hg clone? ;-)

The short answer is: You can copy (cp, rsync, etc.) a mercurial repository without problems.
The longer answer is: https://www.mercurial-scm.org/wiki/Presentations?action=AttachFile&do=get&target=ols-mercurial-paper.pdf (in particular section 5, sub-heading "Committing Changes").
Mercurial writes out changes in an order that makes it safe for any other process to read a mercurial repository at any time. If you copy a repository to some other location while a change is being made to the repository, you'll get some of the new data, but mercurial is smart enough to ignore partially written commits. When you use the copy you made as a mercurial repository you will either see the new commit or not, there will not be any corruption.

Related

Automatically trigger Hg repository verification before pushing/pulling, without programming

We discovered that our Hg repository got corrupted several weeks ago. This corruption seems to have propagated: all clones (the central repo and the user repos) are corrupted, pretty badly, in the same way. I think we could have prevented it if we did verification at that time.
Is there some Hg setting that would cause verification on each push, and prevent push in case of verification failure? I know I could implement it as a hook in Python, but is there maybe a simpler solution?
Is it also possible to do the opposite: make sure the remote repository is verified before pulling?
FWIW, I am on Windows 10 and we are using TortoiseHg.
Update: I've tried creating hooks as suggested by Jordi. Hg now hangs waiting for locks. Here is what I see:
c:\Users\username\test-hook>hg init
c:\Users\username\test-hook>cd ..
c:\Users\username>hg clone test-hook test-hook-clone
updating to branch default
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
# At this point I edited clone repository settings to include
# [hooks]
# preoutgoing = hg verify
#
# Then I created a test.txt file and "added" it via TortoiseHg context menu.
c:\Users\username\test-hook-clone>hg commit
c:\Users\username\test-hook-clone>hg status
c:\Users\username\test-hook-clone>hg outgoing
comparing with c:\Users\username\test-hook
searching for changes
changeset: 0:a61d33af6cdb
tag: tip
user: username
date: Mon May 06 20:32:54 2019 +0200
summary: test file added
c:\Users\username\test-hook-clone>hg push -verbose
pushing to c:\Users\username\test-hook
searching for changes
running hook preoutgoing: hg verify
waiting for lock on repository c:\Users\username\test-hook-clone held by process '16840' on host 'LT407233'
To answer your question, the hook does not have to be written in Python. In the appropriate server's hgrc (either at the repository level or the system level), simply set
[hooks]
preoutgoing = hg verify
preincoming = hg verify
This may significantly slow down all pull and push operations, but perhaps you are willing to sacrifice speed for correctness.
This will result in output like this when a client tries to pull from a corrupt repo:
$ hg clone http://localhost:9000 sample-repo
requesting all changes
remote: abort: preoutgoing hook exited with status 1
and in your server logs, you should see output similar to
127.0.0.1 - - [18/Apr/2019 12:41:09] "GET /?cmd=batch HTTP/1.1" 200 - x-hgarg-1:cmds=lheads+%3Bknown+nodes%3D x-hgproto-1:0.1 0.2 comp=zstd,zlib,none,bzip2 partial-pull
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
a#0: broken revlog! (index data/a.i is corrupted)
warning: orphan data file 'data/a.i'
checked 2 changesets with 1 changes to 2 files
1 warnings encountered!
1 integrity errors encountered!
(first damaged changeset appears to be 0)
You can enable a server side option to perform more validation all incoming content, just set the server.validate=yes option on your server.
The simplest way is to enable it in your server global hgrc or in the repository .hg/hgrc file by adding the following two lines:
[server]
validate = yes
This is a server option but you can also use it on the client. It should validate pull too.
(By the way, what kind of corruption are you seeing?)

Undo an accidental hg strip?

I have accidentally run hg strip, and deleted a stack of commits. I have not done anything in the repo since. Is there a way for me to bring back this stack of commits, to undo the hg strip I just ran?
As long as you didn't run the strip with the --no-backup option, the stripped changesets can be found in the repository under .hg\strip-backup. If you sort the directory content by date the latest one is likely the one you need to restore. Restore it with hg unbundle <filename>.
It is possible to hg pull from a strip backup file as an alternative to using hg unbundle.
As noted in a comment on another answer to this question, hg unbundle has fewer options and only works with bundles, but can unbundle more than one bundle at a time. Whereas hg pull can pull from a single source (share/web/bundle) and has other options.
Here's an example of using hg pull based on an external post by Isaac Jurado:
Usually the backup is placed in REPO/.hg/strip-backup/. See the
example below:
$ hg glog
# changeset: 2:d9f98bd00d5b tip
| three
o changeset: 1:e1634a4bde50
| two
o changeset: 0:eb14457d75fa
one
$ hg strip 1
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
saved backup bundle to
/Users/hchapman/ttt/.hg/strip-backup/e1634a4bde50-backup.hg
And then, what one would do to recover those changesets would be:
$ hg pull $(hg root)/.hg/strip-backup/e1634a4bde50-backup.hg
Here is a worked example of unbundle from an external post. I've cleaned it up slightly to make it a little more general:
Recovering stripped files when using Mercurial
If you accidentally strip a patch and do not have a backup for it, you
can still recover your files using Mercurial. To recover your files:
Open a Microsoft Windows Command Prompt window.
Navigate to the project folder where you stripped the files.
Run the dir command
Navigate to the .hg folder where Mercurial stores all relevant project
files.
Run the dir command again.
Navigate to the strip-backup folder where Mercurial stores the backup
bundles of stripped patches.
Run the dir command again. Multiple files display in the directory
that use the <hash>-hg format. They are the backup bundles of stripped
patches.
Use Windows Explorer to find the required file. Open the strip-backup
folder in Windows Explorer, and sort by Date modified descending.
Unless the necessary backup bundle is already known, [it is recommended to]
restore the bundles in reverse chronological order starting
from the most recent bundle.
Navigate back to the project folder.
To restore a bundle, run hg unbundle .hg\strip-backup\<bundle_file_name>. ... You may want to add it to the
PATH environment variable to make it accessible globally.
Synchronize the project [using hg pull] to see the restored patch. If
the restored patch is not the one needed, then continue restoring the
patches in reverse chronological order until the required patch is
retrieved.
Note: You may restore the backup bundles in any order, instead of
using reverse chronological order. However, it may not be safe to do
so. You may end up attempting to restore a backup bundle, which has a
dependency on another backup bundle that has not been restored. In
this case, you will get an error.

Mercurial requiring manual merges unexpectedly

I've got a project running under Mercurial and am finding a lot of situations where a file needs manually merging, when I believe it should be able to merge automatically. I am wondering whether there are any options that can be given to Mercurial to help it out in these areas.
The project has an underlying platform with a couple of hundred files that can't be edited on the project. When the platform is updated, the project gets updated versions of these core files outside of Mercurial. The sequence I'm seeing repeatedly is:
On central dev system (linked to the core platform update mechanism):
Get a new version of core platform.
Commit these changes e.g. hg commit -m "New platform release"
Push to central mercurial server
On my Linux box:
Commit local changes
Pull from central mercurial server, and try to merge
Find merge conflicts on core files
The last two core files I've had to merge have no changes between the base and local versions (the access time is updated during a build, but the content is the same). The only changes are on the remote revision I'm merging with.
The only non-standard configuration I'm aware of is that the central mercurial instance is running under Rhodecode, with a commit hook setup to update a Redmine repository.
Is there anything else that can be configured in mercurial to help it figure out merges?
You can redo a merge with --debug to get more information about a merge. That is, take your repository and do
$ cd ..
$ hg clone my-project -r 123 -r 456 merge-test
where 123 and 456 is the two parents of the merge you want to examine closer. Then run
$ hg merge --debug
to see what Mercurial says. It should look like this if the file foo has only been changed in the branch you're merging in:
$ hg merge --debug
searching for copies back to rev 2
resolving manifests
overwrite: False, partial: False
ancestor: 932f5550d0ce, local: b0c286a4a76d+, remote: c491d1593652
foo: remote is newer -> g
updating: foo 1/1 files (100.00%)
getting foo
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
Here I was on revision b0c286a4a76d and merged with c491d1593652.
You can also use
$ hg status --rev "ancestor(b0c286a4a76d, c491d1593652)" --rev "c491d1593652"
M foo
$ hg status --rev "ancestor(b0c286a4a76d, c491d1593652)" --rev "b0c286a4a76d"
M bar
to double-check which files have been changed between the ancestor revision and the two changesets you're merging. Above you see that I changed foo on one branch and bar on the other.
If you see a platform file appear in both status lists, well then something went wrong in your procedures and this can explain the merge conflicts.
If this isn't enough to figure out what went wrong, then I suggest asking this question on the Mercurial mailinglist. That's a great place for discussion and bug-hunting — much better than Stack Overflow.

broken revlog and orphan revlog in Mercurial - How to repair?

This is what i get when i do hg verify :
repository uses revlog format 1
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
includes/base/class/ViewInstanceAdapter.class.php#7: broken revlog! (index data/includes/base/class/ViewInstanceAdapter.class.php.i is corrupted)
warning: orphan revlog 'data/includes/base/class/ViewInstanceAdapter.class.php.i'
158 files, 61 changesets, 270 total revisions
1 warnings encountered!
1 integrity errors encountered!
(first damaged changeset appears to be 7)
I do not use Mercurial for a long time and i don't understand what this means.
(I'm on windows using TortoiseHg, and the project is local only)
As said before (although you already confirmed this doesn’t work), you should start by trying to clone the repository; if the problems are related to the dirstate this can bypass it.
Next, every clone contains a complete repository, so every clone is effectively a back-up. Don’t you have a central server or colleague or another local copy? Try cloning that, then pulling from your corrupted repository. As the first damaged changeset is reported as being no. 7 (out of 270), this should be a pretty old one so likely easy to recover, and hopefully the damage does not prevent Mercurial from pulling changesets beyond that.
A third option you could try is to run a Mercurial-Mercurial conversion on your repository (hg convert repo repo-copy); a verbatim conversion should the keep changeset IDs intact, although it will probably run into the same problem. You could also try to specify a filemap to filter out the ViewInstanceAdapter file.
Because the damaged changeset is so old, and given that Mercurial uses an append-only writing method, the probable cause for this problem is a hardware failure or some kind of random disk corruption.
Note that Mercurial is not a backup system and does not provide redundancy. Making frequent back-ups (which in Mercurial’s case is as easy as a ‘hg push’) is the only way to make sure you don’t lose your precious code.
An alternate cause that I feel I should warn you about are virus scanners or the Windows indexing service. These lock files in a certain way that prevents them from being deleted during short time windows. Although Mercurial does its best to be robust, it is hard to defend against all cases. It is recommended to white-list your repositories, see this note.
I found a solution (Thanks to Laurens Holst) ONLY if you have a clean bakcup (with no error) including the issue revision.
In my problem rev issue is 7 and i have a backup until rev 18.
Steps :
Clone the backup repository at the last common rev (here it is 18)
Pull broken repository revs into cloned one (you have now two heads but no modifications on the working directory of course)
Update cloned repository to the most recent revision (tip)
You have now a working .hg dir :)

Fixing a failed integrity check in Mercurial?

I just did hg pull on a repository and brought in some changesets. It said to run hg update, so I did. Unfortunately, when I did that, it failed with the following error message:
abort: integrity check failed on 00manifest.i:173!
When I run hg verify, it tells me there are a number of issues with things not in the manifest (with some slight path obscuring):
>hg verify
checking changesets
checking manifests
crosschecking files in changesets and manifests
somewhere1/file1.aspx#172: in changeset but not in manifest
somewhere2/file1.pdf#170: in changeset but not in manifest checking files
file3.csproj#172: ee005cae8058 not in manifests
somewhere2/file1.pdf#171: 00371c8b9d95 not in manifests
somewhere3/file1.ascx#170: 5c921d9bf620 not in manifests
somewhere4/file1.ascx#172: 23acbd0efd3a not in manifests
somewhere5/file1.aspx#170: ce48ed795067 not in manifests
somewhere5/file2.aspx#171: 15d13df4206f not in manifests
1328 files, 174 changesets, 3182 total revisions
8 integrity errors encountered!
(first damaged changeset appears to be 170)
The source repository passes hg verify just fine.
Is there any way to recover from an integrity check failure or do I need to re-clone the repository completely from the source (not a huge issue in this case)? What could I have done to cause this, so I don't do it again?
Well, since the first damaged changeset is 170, you could clone your local repository to 169 and then pull from the source. That means only pulling 5 changesets.
hg clone -r 169 damagedrepo fixedrepo
cd fixedreop
hg verify
And then:
hg pull originalsource
As for manual recovery of repository corruption, this page expounds on that better than I can. See section 4:
I have found corruption once in a while before, and although the above
documentation says it is usually from user error, my instances were on
removable USB drives with empty working directories. Sometimes things
just don't get written correctly or are interfered with somehow: it's
not always user error. But I always have multiple copies I can reclone
from so I've been able to get away with basic fixing.
If the simple fix of a partial local clone and pulling from the server doesn't fix it, you're down to 2 options after backing up your changes (if any) to a bundle or patches:
Manually hacking at Mercurial's files.
Doing a new full clone from the server. Usually the easier and faster of the two.
Beware: This method will change all hashes.
Actually there is another way to recover the repository when it is corrupted like this -
You can do a complete rebuild of the repository by using the convert extension. See Section 4.5 on https://www.mercurial-scm.org/wiki/RepositoryCorruption#Recovery_using_convert_extension
First enable the convert extension by adding the following to your ~/.hgrc file
[extensions]
convert=
Then convert the bad repo to create a fixed repo:
$ hg convert --config convert.hg.ignoreerrors=True REPO REPOFIX
This worked for me when I had the experience of suddenly finding that there were missing files in the manifests - "error 255".
Try remove your file 00manifest.i from repo and next use hg remove 00manifest.i and hg commit commands. Worked for me.
What we ended up doing was making a new copy of our 'central' repository, deleting the .hg folder in this copy, creating a new repository there (hg init), and then working with this as the central repository.
Be aware however this is only an appropriate solution if you don't need your changeset history other than as a reference (which we don't). You can still use your old central repository for this purpose.