Does anybody know why hg status is slow (3-10 seconds) the first time it's called from the command line on a Windows client? (I'm assuming the result is cached after that.)
hg status is a local operation and it should not take that long, especially with an empty repo.
This is the case on both an active repository with several changes and a brand-new repo with no files, so the size of the repo does not seem to be a factor in the performance.
Thanks!
When you run the hg status command, Mercurial has to scan almost every directory and file in your repository so that it can display file status. Hg has to perform at least one expensive system call for each managed file to determine whether it's changed since the last time Mercurial checked; there's no avoiding that.
I believe subsequent calls to hg st are faster because of the information the OS caches about recently accessed files, which lets it avoid disk access when a file has not been modified. Sometimes the files themselves may even remain memory-mapped by the OS or held in the disk's own buffer.
Edit: also, if you haven't invoked hg in a while, the OS will need to read the hg executable and its dependencies from disk, since they might not be cached in RAM anymore.
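On the mitigation side, one thing you could try is Mercurial's fsmonitor extension, which uses the Watchman service to track filesystem changes so that hg status no longer has to stat every managed file. A minimal sketch of the configuration, assuming your Mercurial version ships the extension and Watchman is installed and on the PATH:
[extensions]
fsmonitor =
With that in place, once Watchman has a warm view of the tree, hg status should in principle only need to examine the files Watchman reports as changed.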
Summary
I've got a build in TeamCity using Mercurial as the VCS and it's repeatedly failing for one of these two reasons:
hg init - repository already exists, even though I deleted the whole directory beforehand, so it definitely didn't exist.
hg pull - timed out waiting for lock, but the lock it's waiting for seems to be its own lock.
I'm really hoping that someone has come across this before, or might be able to give me some ideas for how to troubleshoot it anyway.
Setup
I'm using TortoiseHg as the Mercurial client, and I've updated it (and hence Mercurial) to version 4.6.1 on both the build server and the agent.
The agent is running on a Windows 7 VM.
I have a Windows 10 VM with the same TeamCity/Mercurial setup that's working fine.
The repo being pulled from is located on a network share.
The folder being pulled to is on a secondary drive on the VM.
The two problems I'm seeing are as follows:
1. Hg init failure
Steps:
Manually delete the whole working directory from the build agent, so that's the .hg folder and its parent folder.
The working folder doesn't even exist now, so TeamCity will have to completely recreate the folder.
Run the build on TeamCity, with "clean all files" selected.
Build starts, creates the directory, and calls hg init (a sketch of the agent-side sequence follows this list).
Error message that hg init failed because the "repository already exists".
When I look at the directory I can see a .hg folder, and some files inside it including a wlock file.
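For context, the agent-side checkout is, as far as I can tell, roughly equivalent to the following sequence (the revision placeholder and share path are illustrative; E:\blah is the working directory from my logs below):
hg init E:\blah
cd E:\blah
hg pull -r <revision> \\server\share\repo
hg update -r <revision>
so any leftover or half-created .hg folder at the time of the init step is enough to trigger the "repository already exists" error.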
2. Pull failure
Steps:
Leave the working directory from problem 1 in place, including the .hg directory.
Ensure any lock files are deleted and hg recover has been run, just in case (the exact commands are shown after this list).
Run build on TeamCity, without cleaning the directory.
The logs show hg pull starting and bundling files, but also say "waiting for lock on working directory of E:\blah held by process '3408' on host 'BUILDAGENT'".
3408 here is an example; the number changes every time and corresponds to the hg.exe process that seems to be doing the pull.
Eventually, after a lot of bundling and files messages, I'll get a message saying it timed out waiting for the lock.
But of course the lock it's waiting for seems to be the lock it's holding itself!
If I delete the wlock file during this time, I'll see a message saying "got lock after X seconds" and immediately after it "waiting for lock on repository E:\blah held by process '3408' on host 'BUILDAGENT'". Then eventually it'll fail with a message about an abandoned transaction.
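For reference, this is the cleanup I perform between attempts; a sketch assuming the working directory from the log message, E:\blah (the two lock files live under .hg):
del E:\blah\.hg\wlock
del E:\blah\.hg\store\lock
hg recover -R E:\blah
hg recover rolls back any interrupted transaction the killed pull may have left behind.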
Does anyone have any ideas?
I have a mercurial repository at c:\Dropbox\code. I've created a clone of this repo locally using:
hg clone -U c:\Dropbox\code c:\GoogleDrive\codeBackup
This bare repo serves the purpose of backup only. I regularly push changes to codeBackup. Furthermore, both directories are backed up in the cloud (Dropbox and Google Drive respectively).
If my repo in code becomes corrupt, would the codeBackup repo automatically be corrupt too, since the clone operation used hard links to the original repo? Would my double-cloud-backup strategy thus be useless?
P.S.: I understand that the fallback option is to use the cloud service to restore a previous known-good state.
UPDATE: After digging around, I'll add these for reference:
Discussion on repo corruption in Mercurial
The problem is, if a 'hg clone' was done (without the --pull option), then the destination and the source repo share files inside .hg/store by using hardlinks, if the filesystem provides the hardlinking feature (NTFS does).
Mercurial is designed to break such hardlinks inside .hg if a commit or push is done to one of the clones. The prerequisite for this is that the Windows API Mercurial is using gives a correct answer when Mercurial asks "how many hardlinks are on this file?".
We found out that this answer is almost always wrong (always reporting 1, even if it is in fact >1) iff the hg process is running on one Windows computer and the repository files are on a network share on a different Windows computer.
To avoid hardlinks (use --pull):
hg clone -U --pull c:\Dropbox\code c:\GoogleDrive\codeBackup
To check for hardlinks:
fsutil hardlink list <file> : Shows all hardlinks for <file>
find . -links +1 : Shows all files with hardlinks > 1
ls -l : shows hardlinks count next to each file
The biggest problem here, regarding repository corruption, is that you're using Dropbox and Google Drive to synchronize repositories across machines.
Don't do that!
This will surely lead to repository corruption unless you can guarantee that:
Your machines will never lose internet connection
You will never have new changes unsynchronized on more than one machine at a time (including times where you have had internet problems)
Dropbox will always run (a variant of never losing the internet connection)
You're not just plain unlucky regarding timing
To verify that Dropbox can easily lead to repository corruption, do the following:
Navigate to a folder inside your Dropbox or Google Drive folder and create a Mercurial repository there. Do this on one machine; let's call this machine A.
Add 3 text files to it, with some content (not empty), and commit those 3 text files.
Wait for Dropbox/Google Drive to synchronize all those files onto your second computer; let's call this machine B.
Either disconnect the internet on one of the machines, or stop Dropbox/Google Drive on it (doesn't matter which one)
On machine A, change files 1 and 2 by adding or modifying content in them. On machine B, change files 2 and 3, making sure to add/modify content that differs from what you did on machine A. Commit all the changes on both machines.
Reconnect to the internet or restart Dropbox/Google Drive, depending on what you did in step 4
Wait for synchronization to complete (Dropbox will show a green checkmark in its tray icon; I'm unsure what Google Drive will display)
Run hg verify in the repositories on both machine A and B
Notice that they are now both corrupt:
D:\Dropbox\Temp\repotest>hg verify
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
3.txt#?: rev 1 points to unexpected changeset 1
(expected 0)
3.txt#?: 89ab3388d4d1 not in manifests
3 files, 2 changesets, 6 total revisions
1 warnings encountered!
2 integrity errors encountered!
Instead, get a free Bitbucket or Kiln account and use that to push and pull between your computers to keep them synchronized.
The only way your code repository can become corrupt (assuming it was not corrupt when you initially cloned it over to codeBackup) is when you write something to it, be it committing, rewriting history, etc. Whenever something gets written to a hard-linked file, Mercurial first breaks the hard link, creates an independent copy of the file, and only then modifies that newly created copy.
So to answer your questions: under normal usage scenarios repository corruption will not propagate to your codeBackup repository.
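If you want to see the copy-on-write behaviour yourself, you can watch the hardlink count of one of the revlog files; 00changelog.i is used here simply because it exists in every repository:
fsutil hardlink list c:\GoogleDrive\codeBackup\.hg\store\00changelog.i
Immediately after the clone this should list both the codeBackup path and the original c:\Dropbox\code path; after the next push to codeBackup, Mercurial breaks the link before writing, and only one path remains.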
So, for example, if there's a Mercurial repository at https://code.google.com/p/potentiallyLarge, is there a command which would allow me to find out its size before cloning it? Something like
hg size https://code.google.com/p/potentiallyLarge
Also, is there a command for doing this for subversion repositories?
The size used on disk is different from the bandwidth used to make a clone. Some hosting sites (such as Bitbucket) display the size on disk so that you know upfront how much space you'll need on your system before cloning. But I can see that Google Code doesn't, so that won't help you here.
The Mercurial wire protocol doesn't expose any commands that can tell you how big a repository is. When you make a normal clone, the client doesn't know upfront how much data it will receive; it just receives a stream of data. After receiving the changelog, the client knows how many manifests and filelogs to expect, but it doesn't know their sizes.
In fact, it's difficult for the server to compute how much data a clone will use: the network bandwidth used is less than the disk space since the compression used is different (bzip2 vs gzip). However, if you use --uncompressed with your clone (which Google Code doesn't support) then there is a trick, see below.
The only way to know how much bandwidth a clone uses is to make one. If you have a clone already, you can use hg bundle to simulate a clone:
$ hg bundle --all my-bundle.hg
The size of the bundle will tell you how much data there is in the repository.
A trick: If Google Code had supported hg clone --uncompressed, then you could use that to learn the size of a remote repository! When you use --uncompressed, the client asks the server to send the content of the .hg/ directory as-is — without re-compressing it with bzip2. Conveniently, the server starts the stream by telling the client the size of the repository. So you can start such a clone and then abort it (with Control-C) when your client has printed the line telling you the size of the repo.
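To illustrate, the start of such a streaming clone looks roughly like this (the URL and numbers here are made up):
$ hg clone --uncompressed https://example.com/bigrepo
streaming all changes
1234 files to transfer, 567 MB of data
^C
Once the "files to transfer" line has been printed, you have your answer and can abort.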
Update: My answer below is wrong, but I'm leaving it here since MG provided some good info in response. It looks like the right answer is "no".
Not a great way, but a work-around sort of way: hg clone URL is really just hg init; hg pull URL. And the command hg incoming tells you what you'd get if you did a pull, so you could do:
hg init theproject
cd theproject
hg incoming --stat URL_TO_THE_PROJECT
and get a pretty decent guess of how much data you'll be pulling down if you follow up with:
hg pull URL_TO_THE_PROJECT
I'm not sure about the network efficiency of hg incoming, but I don't think it downloads everything from all the changesets, though I could be wrong about that. It offers a --bundle option that saves whatever incoming pulls down to a file, from which you can later pull to avoid downloading everything twice.
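A sketch of that flow, reusing the placeholder URL from above:
hg incoming --bundle incoming.hg URL_TO_THE_PROJECT
hg pull incoming.hg
The first command stores everything it downloaded in incoming.hg; the second applies those changesets locally without touching the network again.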
This is what I get when I do hg verify:
repository uses revlog format 1
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
includes/base/class/ViewInstanceAdapter.class.php#7: broken revlog! (index data/includes/base/class/ViewInstanceAdapter.class.php.i is corrupted)
warning: orphan revlog 'data/includes/base/class/ViewInstanceAdapter.class.php.i'
158 files, 61 changesets, 270 total revisions
1 warnings encountered!
1 integrity errors encountered!
(first damaged changeset appears to be 7)
I have not used Mercurial in a long time and I don't understand what this means.
(I'm on Windows using TortoiseHg, and the project is local only.)
As said before (although you already confirmed this doesn’t work), you should start by trying to clone the repository; if the problems are related to the dirstate this can bypass it.
Next, every clone contains a complete repository, so every clone is effectively a back-up. Don’t you have a central server or colleague or another local copy? Try cloning that, then pulling from your corrupted repository. As the first damaged changeset is reported as being no. 7 (out of 270), this should be a pretty old one so likely easy to recover, and hopefully the damage does not prevent Mercurial from pulling changesets beyond that.
A third option you could try is to run a Mercurial-to-Mercurial conversion on your repository (hg convert repo repo-copy); a verbatim conversion should keep the changeset IDs intact, although it will probably run into the same problem. You could also try specifying a filemap to filter out the ViewInstanceAdapter file.
Because the damaged changeset is so old, and given that Mercurial uses an append-only writing method, the probable cause for this problem is a hardware failure or some kind of random disk corruption.
Note that Mercurial is not a backup system and does not provide redundancy. Making frequent back-ups (which in Mercurial’s case is as easy as a ‘hg push’) is the only way to make sure you don’t lose your precious code.
An alternate cause that I feel I should warn you about is virus scanners or the Windows indexing service. These lock files in a way that prevents them from being deleted during short time windows. Although Mercurial does its best to be robust, it is hard to defend against all cases. It is recommended to white-list your repositories; see this note.
I found a solution (thanks to Laurens Holst), but ONLY if you have a clean backup (with no errors) that includes the damaged revision.
In my case the damaged revision is 7 and I have a backup up to rev 18.
Steps:
Clone the backup repository at the last common rev (here it is 18)
Pull broken repository revs into cloned one (you have now two heads but no modifications on the working directory of course)
Update cloned repository to the most recent revision (tip)
You now have a working .hg dir :)
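As a concrete sketch of those steps, with hypothetical local paths (18 being the last revision present in my clean backup):
hg clone -r 18 c:\backup\repo c:\repo-fixed
cd c:\repo-fixed
hg pull c:\broken\repo
hg update tip
Since the clone already contains everything up to rev 18, the pull only has to transfer the newer changesets, which will hopefully steer clear of the damaged revlog entry.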
I have created a repository on https://bitbucket.org/ and used TortoiseHg to clone it to a folder on my local machine. I am able to add and commit files, but what I find is that they never get updated on the server at Bitbucket. By some fiddling, I found that there is this Synch option. What I don't get is why I have to press Synch; if I meant to commit, then it should commit.
Where is it being stored if it's not synched immediately with the remote server?
Note: I am trying out TortoiseHg and Mercurial while having ample experience with Subversion.
When you commit, you commit to your local repository, i.e. to the .hg directory in the root of your project. To synch with the remote repository you need to explicitly push your changes. This is how DVCSs work; it's not the same model as SVN.
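A minimal sketch of the round trip, assuming the clone was made from your Bitbucket URL so that a default push target is already configured:
$ hg commit -m "describe the change"
$ hg push
The commit is recorded only in the local .hg directory; the push is what makes the changeset appear on Bitbucket. TortoiseHg's Synch window is essentially a front end for push and pull.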
A key feature of a distributed version control system is that you can make local commits. This means that the new commit does not leave your machine when you press "Commit", it is just stored locally.
This has some direct consequences:
you can work while you are offline, e.g., in a train or in a plane
commits are very fast: in Mercurial, creating a new commit involving n files (normally called a changeset) means appending a few bytes to n + 2 files.
you can change your mind: since you have not shared the new changeset with anybody, you can delete it from your local machine without problem
It also has some indirect consequences:
because commits are fast, people tend to make many more commits. The commits are typically more fine-grained than what you see in a centralized system and this makes it easier to review the changes
because commits are local, it often happens that people do concurrent work. This happens when both you and I make one or more commits based on the same initial version:
              [a] --- [b] --- [c]   <-- you
             /
... [x] --- [y]
             \
              [r] --- [s]           <-- me
The history has then effectively forked since we both started work based on changeset y. For this to work, we must be able to merge the two forks. Because this happens all the time, you'll find that Mercurial has very robust support for merging.
So, by decoupling the creation of a commit from the publishing of a commit, you gain some significant advantages.
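As a sketch, resolving the forked history drawn above takes only a few commands on either side (assuming a default remote path is configured):
$ hg pull
$ hg merge
$ hg commit -m "Merge"
$ hg push
The pull brings the other line of work in as a second head, the merge combines [c] and [s] in the working copy, and the commit plus push publish the merged result.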