We have a 200 MB file. We currently use rsync to transfer it between developers when it changes. If we include it in our Mercurial repository, will Mercurial transfer only the diff, like rsync, or the full file whenever it changes?
Mercurial only transfers the deltas when you hg pull; otherwise distributed version control would be impractical.
When you hg pull, you get all the changesets missing from your local clone. Each changeset only contains a delta. The delta can be small or large, but if you're happy with the deltas rsync finds today, you should also be happy with Mercurial.
About binary files: Mercurial does not distinguish between "text" and "binary" files when making a commit. They are all treated the same, and delta compression is used in all cases. What can make this confusing is that delta compression is useless if a file changes radically on every edit: the delta will be just as big as the file itself. In that case Mercurial actually stores a compressed snapshot of the file instead.
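Here is a toy Python model of the delta-versus-snapshot point. It is only a sketch, and it makes two assumptions. First, the delta reuses just the common prefix and suffix, whereas Mercurial's real binary diff is far smarter. Second, the rule "store a snapshot when the delta is at least half the file" is an invented threshold, not Mercurial's actual revlog heuristic.

```python
import random
import zlib


def prefix_suffix_delta(old: bytes, new: bytes) -> tuple[int, int, bytes]:
    """Toy delta: replacing old[start:end] with `replacement` yields `new`.

    Only the common prefix and suffix are reused; this is NOT Mercurial's
    bdiff algorithm, just an illustration of the idea.
    """
    limit = min(len(old), len(new))
    start = 0
    while start < limit and old[start] == new[start]:
        start += 1
    end_old, end_new = len(old), len(new)
    # Walk backwards over the common suffix without crossing the prefix.
    while end_old > start and end_new > start and old[end_old - 1] == new[end_new - 1]:
        end_old -= 1
        end_new -= 1
    return start, end_old, new[start:end_new]


def apply_delta(old: bytes, delta: tuple[int, int, bytes]) -> bytes:
    start, end, replacement = delta
    return old[:start] + replacement + old[end:]


def store_revision(old: bytes, new: bytes):
    """Keep a delta when it is small, otherwise a compressed full snapshot.

    The "half the file" threshold is an assumption made for this sketch.
    """
    delta = prefix_suffix_delta(old, new)
    if len(delta[2]) < len(new) // 2:
        return "delta", delta
    return "snapshot", zlib.compress(new)


if __name__ == "__main__":
    text = b"line\n" * 2000
    edited = text[:5000] + b"changed\n" + text[5000:]
    print(store_revision(text, edited)[0])  # small edit -> delta

    rng = random.Random(0)
    blob_v1, blob_v2 = rng.randbytes(10_000), rng.randbytes(10_000)
    print(store_revision(blob_v1, blob_v2)[0])  # regenerated binary -> snapshot
```

A small edit in the middle of a large file gives a tiny delta. A binary regenerated from scratch gives a delta as large as the file, so a compressed snapshot costs no more.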
Several Mercurial extensions have been written for handling large files. They work by versioning the checksum, rather than by versioning the file itself.
If you are using Mercurial 2.0 or later, the largefiles extension ships with Mercurial (it still has to be enabled in your configuration). The docs explain how the extension works:
The largefiles extension allows for tracking large, incompressible
binary files in Mercurial without requiring excessive bandwidth for
clones and pulls. Files added as largefiles are not tracked directly
by Mercurial; rather, their revisions are identified by a checksum,
and Mercurial tracks these checksums. This way, when you clone a
repository or pull in changesets, the large files in older revisions
of the repository are not needed, and only the ones needed to update
to the current version are downloaded. This saves both disk space and
bandwidth.
There are also other extensions you could use. There is more information here: Handling Large Files
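The checksum-based approach in the quoted docs fits in a few lines of Python. This is a hypothetical sketch, not the extension's code. The history versions only SHA-1 checksums (the real extension keeps them in small "standin" files). The contents live in a separate store, and only the revision you update to is downloaded. The names BlobStore, commit_large_file and update_to are made up for illustration.

```python
import hashlib


class BlobStore:
    """Toy content-addressed store: file contents keyed by SHA-1 hex digest."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha1(data).hexdigest()
        self.blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self.blobs[digest]


def commit_large_file(history: list[str], store: BlobStore, data: bytes) -> None:
    """Version only the checksum; the content itself goes into the blob store."""
    history.append(store.put(data))


def update_to(history: list[str], remote: BlobStore, local: BlobStore, rev: int = -1) -> bytes:
    """Fetch just the one blob needed for `rev` (default: the latest revision)."""
    digest = history[rev]
    if digest not in local.blobs:
        local.blobs[digest] = remote.get(digest)
    return local.get(digest)
```

Cloning copies the list of checksums, which is tiny. Updating to the tip then downloads one blob, not every historical version of the large file.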
Assume I recover a Mercurial repository from a broken file system (e.g. bad hard drive), and I want to be sure that this one was not affected.
How can I force a self-check in Mercurial? That is, Mercurial walks through the whole history and checks that all checksums fit their respective dataset, and that the repository as a whole is consistent.
Is it sufficient to perform a local "hg clone" to enforce that check?
Is there something like "git fsck" for Mercurial?
The command for a pure check is:
hg verify
In case the repository is corrupt, the Mercurial wiki provides recovery instructions:
https://www.mercurial-scm.org/wiki/RepositoryCorruption
Of course, this only checks the commits, not the working directory. That is, it checks neither local changes that have not been committed yet nor ignored files such as build results. Mercurial cannot verify those. They either have to be verified by other means or simply be reset with a fresh Mercurial checkout and a fresh build.
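The sketch below gives a rough idea of the checksum part of hg verify. It recomputes revision IDs the way Mercurial's revlog derives them: SHA-1 over the two parent IDs (sorted), followed by the revision text. It then reports entries whose stored ID no longer matches. It is a simplified model of one file with linear history, and the dictionary layout is invented. The real hg verify also cross-checks changesets, manifests and file revlogs.

```python
import hashlib

NULL_ID = b"\0" * 20  # ID used for a missing parent


def node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """Revlog-style revision ID: SHA-1 of the sorted parent IDs plus the text."""
    a, b = sorted((p1, p2))
    return hashlib.sha1(a + b + text).digest()


def build_linear_log(texts: list[bytes]) -> list[dict]:
    """Record each revision with its parent and computed ID (toy layout)."""
    log, parent = [], NULL_ID
    for text in texts:
        node = node_id(text, parent)
        log.append({"text": text, "p1": parent, "node": node})
        parent = node
    return log


def verify(log: list[dict]) -> list[int]:
    """Return the revision numbers whose stored ID no longer matches their data."""
    return [
        rev
        for rev, entry in enumerate(log)
        if node_id(entry["text"], entry["p1"]) != entry["node"]
    ]
```

Flipping a single byte in a stored revision changes its recomputed hash, so verify reports that revision number.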
I have a Mercurial repository at c:\Dropbox\code. I've created a clone of this repo locally using:
hg clone -U c:\Dropbox\code c:\GoogleDrive\codeBackup
This bare repo serves only as a backup. I regularly push changes to codeBackup. Furthermore, both directories are backed up in the cloud (Dropbox and Google Drive, respectively).
If my repo in code becomes corrupt, would the codeBackup repo automatically be corrupt too, since the clone operation used hard links to the original repo? Would my double-cloud-backup strategy then be useless?
P.S.: I understand that the fallback option is to use the cloud service to restore a previous known good state.
UPDATE: After digging around, I'm adding these for reference:
Discussion on repo corruption in mercurial
The problem is, if a 'hg clone' was done (without --pull option), then
the destination and the source repo share files inside .hg/store by
using hardlinks, if the filesystem provides the hardlinking
feature (NTFS does).
Mercurial is designed to break such hardlinks inside .hg if a commit
or push is done to one of the clones. The prerequisite for this is,
that the Windows API mercurial is using should give a correct answer,
if mercurial is asking "how many hardlinks are on this file?".
We found out that this answer is almost always wrong (always reporting
1, even if it is in fact >1) iff the hg process is running on one
Windows computer and the repository files are on a network share on a
different Windows computer.
To avoid hardlinks (use --pull):
hg clone -U --pull c:\Dropbox\code c:\GoogleDrive\codeBackup
To check for hardlinks:
fsutil hardlink list <file> : shows all hard links for <file> (Windows)
find . -links +1 : shows all files with more than one hard link (Unix)
ls -l : shows the hard-link count next to each file (Unix)
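The same check can be scripted portably in Python, which may help on Windows, where find is not available. The sketch relies on st_nlink from os.lstat. As the quoted discussion points out, that value may be reported incorrectly for files on a network share. The function name files_with_hardlinks is hypothetical.

```python
import os
import sys


def files_with_hardlinks(root: str) -> list[tuple[str, int]]:
    """Python counterpart of `find . -links +1`: files whose link count is > 1."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            nlink = os.lstat(path).st_nlink
            if nlink > 1:
                found.append((os.path.relpath(path, root), nlink))
    return sorted(found)


if __name__ == "__main__":
    # Defaults to the store of the repository in the current directory.
    store = sys.argv[1] if len(sys.argv) > 1 else os.path.join(".hg", "store")
    for path, nlink in files_with_hardlinks(store):
        print(f"{nlink}\t{path}")
```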
The biggest problem here, regarding repository corruption, is that you're using Dropbox and Google Drive to synchronize repositories across machines.
Don't do that!
This will surely lead to repository corruption unless you can guarantee that:
Your machines will never lose internet connection
You will never have new changes unsynchronized on more than one machine at a time (including times where you have had internet problems)
Dropbox will always be running (a variant of never losing the internet connection)
You're not just plain unlucky regarding timing
To verify that Dropbox can easily lead to repository corruption, do the following:
Navigate to a folder inside your Dropbox or Google Drive folder and create a Mercurial repository there. Do this on one machine; let's call it machine A.
Add 3 text files to it, with some content (not empty), and commit those 3 text files.
Wait for Dropbox/Google Drive to synchronize all those files onto your second computer, let's call this machine B
Either disconnect the internet on one of the machines, or stop Dropbox/Google Drive on it (doesn't matter which one)
On machine A, change files 1 and 2 by adding or modifying content in them. On machine B, change files 2 and 3, making sure the content you add or modify differs from what you did on machine A. Commit all the changes on both machines.
Reconnect to the internet or restart Dropbox/Google Drive, depending on which of the two you did earlier
Wait for synchronization to complete (Dropbox will show a green checkmark in its tray icon, unsure what Google Drive will display)
Run hg verify in the repositories on both machine A and B
Notice that they are now both corrupt:
D:\Dropbox\Temp\repotest>hg verify
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
3.txt#?: rev 1 points to unexpected changeset 1
(expected 0)
3.txt#?: 89ab3388d4d1 not in manifests
3 files, 2 changesets, 6 total revisions
1 warnings encountered!
2 integrity errors encountered!
Instead, get a free Bitbucket or Kiln account and use it to push and pull between your computers to keep them synchronized.
The only way your code repository can become corrupt (assuming it was not corrupt when you initially cloned it to codeBackup) is when you write something to it, be it committing, rewriting history, etc. Whenever something is written to a hard-linked file, Mercurial first breaks the hard link by creating an independent copy of the file. Only then does it modify that newly created copy.
So, to answer your question: under normal usage, repository corruption will not propagate to your codeBackup repository.
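Here is a sketch of the copy-on-write behaviour described in this answer. It illustrates the idea and is not Mercurial's implementation. The function name append_breaking_hardlinks is hypothetical. Before appending, a file with more than one link gets a private copy, which is renamed over the original name. The other links (for example the file in codeBackup) keep the old data.

```python
import os
import shutil
import tempfile


def append_breaking_hardlinks(path: str, data: bytes) -> None:
    """Append to `path`, first giving it its own copy if it is hard-linked."""
    if os.stat(path).st_nlink > 1:
        # Copy the content to a temp file in the same directory, then rename it
        # over the original name: this name now refers to a fresh file, while
        # the other hard links still point at the old, untouched data.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
        os.close(fd)
        shutil.copyfile(path, tmp)
        shutil.copymode(path, tmp)  # keep the original permissions
        os.replace(tmp, path)
    with open(path, "ab") as f:
        f.write(data)
```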
This is what I get when I run hg verify:
repository uses revlog format 1
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
includes/base/class/ViewInstanceAdapter.class.php#7: broken revlog! (index data/includes/base/class/ViewInstanceAdapter.class.php.i is corrupted)
warning: orphan revlog 'data/includes/base/class/ViewInstanceAdapter.class.php.i'
158 files, 61 changesets, 270 total revisions
1 warnings encountered!
1 integrity errors encountered!
(first damaged changeset appears to be 7)
I haven't been using Mercurial for long, and I don't understand what this means.
(I'm on Windows using TortoiseHg, and the project is local only)
As said before (although you already confirmed this doesn’t work), you should start by trying to clone the repository; if the problems are related to the dirstate this can bypass it.
Next, every clone contains a complete repository, so every clone is effectively a backup. Don’t you have a central server, a colleague, or another local copy? Try cloning that, then pulling from your corrupted repository. The first damaged changeset is reported as no. 7 (out of 61 changesets), so it is a fairly old one and likely easy to recover. Hopefully the damage does not prevent Mercurial from pulling the changesets after it.
A third option you could try is to run a Mercurial-to-Mercurial conversion on your repository (hg convert repo repo-copy). A verbatim conversion should keep the changeset IDs intact, although it will probably run into the same problem. You could also try to specify a filemap to filter out the ViewInstanceAdapter file. Note that filtering out a file changes the IDs of every changeset from the first one that touched it onward.
Because the damaged changeset is so old, and given that Mercurial uses an append-only writing method, the probable cause for this problem is a hardware failure or some kind of random disk corruption.
Note that Mercurial is not a backup system and does not provide redundancy. Making frequent back-ups (which in Mercurial’s case is as easy as a ‘hg push’) is the only way to make sure you don’t lose your precious code.
Another possible cause I should warn you about is virus scanners or the Windows indexing service. These can lock files in a way that prevents them from being deleted for short windows of time. Although Mercurial does its best to be robust, it is hard to defend against every case. It is recommended to whitelist your repositories; see this note.
I found a solution (thanks to Laurens Holst), but it works ONLY if you have a clean backup (with no errors) that includes the damaged revision.
In my case the damaged revision is 7, and I have a backup up to revision 18.
Steps :
Clone the backup repository at the last common rev (here it is 18)
Pull the broken repository's revisions into the clone (you now have two heads, but of course no modifications in the working directory)
Update cloned repository to the most recent revision (tip)
You now have a working .hg directory :)
In short:
How can I use Hg to synchronize repositories between two computers using a flash drive as intermediary?
With more detail:
I often develop code on computers that aren't networked in any way, and I transfer files between these machines using a USB flash drive. Now I would like to develop some software across these machines using Hg repositories on each machine that I can frequently sync-up using the flash drive transfer mechanism.
I'm slightly familiar with Hg, as I use it in the most simple way possible for versioning only my own work on independent machines, but am uncertain as to exactly what I should do to use it to synchronize repositories between two computers using a flash drive as intermediary. Maybe, for example, I need to create a temporary repository on the flash drive (using “clone”) from which I then sync to (using “push” and “pull”), and do this by A→flash, flash→B, B→flash, flash→A? The more specificity in your answer regarding the sequence of actions and commands, the more useful to me.
Finally, how do I get this process started? Do I need to do something so Hg knows these are all part of one code base? For example, each of my current repositories on the different computers was created independently from a time before I started using Hg, and although all the code is similar, independent changes have been made to each, and the repositories know nothing about each other. If what I need to do with this is different than what I need to do for the ongoing case once I have everything unified, spelling this process out for me as well would also help.
In case it's important, these machines can be running any of Windows, Mac, or Linux, and my versions of Mercurial are slightly different on each machine (though the Mercurial versions could be unified if needed).
What you have described above in terms of using the flash drive as an intermediate storage location should work. My process would be:
initial setup
create repo on computer A (using hg init)
clone the repo from computer A to flash drive
hg clone C:/path/to/repo/A X:/path/to/flash/drive/repo
clone the repo from flash drive to computer B
hg clone X:/path/to/flash/drive/repo C:/path/to/repo/B
working process
edit/commit to repo on computer A
push from computer A to flash drive
hg push X:/path/to/flash/drive/repo
pull from flash drive to computer B (and update its working copy)
hg pull -u X:/path/to/flash/drive/repo
edit/commit to repo on computer B
push from computer B to flash drive (same commands as above)
pull from flash drive to computer A (same commands as above)
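If you do this round trip often, a small Python wrapper can save typing. This is a hypothetical helper; the script and function names are made up. It assumes hg is on the PATH and that the flash drive repository already exists. It pulls first (with -u to update the working copy) and then pushes. Exit code 1 from hg push just means "nothing to push", so the script treats it as success.

```python
import subprocess
import sys


def run_hg(repo: str, *args: str, ok_codes=(0,), dry_run: bool = False) -> list[str]:
    """Build (and unless dry_run, execute) `hg -R <repo> <args...>`."""
    cmd = ["hg", "-R", repo, *args]
    if not dry_run:
        code = subprocess.run(cmd).returncode
        if code not in ok_codes:
            raise subprocess.CalledProcessError(code, cmd)
    return cmd


def sync_with_flash(local_repo: str, flash_repo: str, dry_run: bool = False) -> list[list[str]]:
    """One round trip between a machine's repo and the repo on the flash drive."""
    return [
        # Bring in what the other machine left on the drive, updating the working copy.
        run_hg(local_repo, "pull", "-u", flash_repo, dry_run=dry_run),
        # Send local commits; `hg push` exits with 1 when there is nothing to push.
        run_hg(local_repo, "push", flash_repo, ok_codes=(0, 1), dry_run=dry_run),
    ]


if __name__ == "__main__":
    # Usage (hypothetical script name): python hgsync.py <local repo> <flash drive repo>
    sync_with_flash(sys.argv[1], sys.argv[2])
```

Run it on each machine while the drive is plugged in, e.g. python hgsync.py C:/path/to/repo/A X:/path/to/flash/drive/repo. If both machines committed independently, hg refuses the push because it would create a new head on the drive. The script then stops with an error so you can run hg merge and commit before syncing again.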
Finally, how do I get this process started? Do I need to do something so Hg knows these are all part of one code base?
Mercurial knows whether two arbitrary repositories share a common ancestor by looking at the SHA-1 hashes of the commits in each repo. In other words, as long as both repos have at least one common hash in their histories, Mercurial can pull between them and merge. In your specific case, where both code bases are initially unversioned, Mercurial will need some help. Get both code bases to an identical state, run hg init and make the first commit on one machine only, and then clone that repository to the other machines as in the initial setup above. Running hg init separately on each machine would create unrelated histories, even if the files are identical. Mercurial should handle sharing from that point on.
When working offline on different machines, it is better to use the bundle command that comes with Mercurial. This echoes what dls wrote, but with a slightly different process.
Initial setup as mentioned by dls.
or
Go to the top-level directory of your Mercurial repository
Create bundle: hg bundle --base null ../project.hg
Copy the project.hg file to your other computer
Create a directory there
Make it a Mercurial repository: hg init
Incorporate the bundle: hg pull <path/project.hg>
hg update
Check hg log: both repositories will show the same base revision and tip
Workflow using bundle
I use a slightly different workflow. I keep these repositories as distinct repositories.
I'll refer to them as repo1 and repo2.
Suppose that the current tip of repo1 is 4f45839f613c.
You make changes and commit them in repo1
Create a bundle of the changes :
Command (this bundle contains all changes since the specified base revision):
hg bundle --base 4f45839f613c changes.bundle
Take it to repo2 by copying the bundle.
You can simply pull the bundle into repo2:
Command:
hg pull changes.bundle
If the bundle contains changes that are already present in repo2, they are ignored when pulling. As long as the bundle doesn't grow too large, you can use the bundle command with the same --base revision again and again to create bundles that include further changes.
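Reusing the same --base is safe because pulling skips changesets the receiving repository already has. The toy model below shows this with a dictionary that maps each changeset ID to its parent. It assumes a single parent per changeset, and every ID other than 4f45839f613c is made up. make_bundle mimics hg bundle --base: it takes every changeset that is neither the base nor one of its ancestors. pull_bundle mimics hg pull from a bundle file.

```python
NULL = "null"  # stands in for the null revision (no parent)


def ancestors(parents: dict[str, str], node: str) -> set[str]:
    """`node` plus everything reachable through its parent links."""
    seen = set()
    while node != NULL and node not in seen:
        seen.add(node)
        node = parents[node]
    return seen


def make_bundle(parents: dict[str, str], base: str) -> dict[str, str]:
    """Model of `hg bundle --base BASE`: changesets not already known via BASE."""
    known = ancestors(parents, base)
    return {node: parent for node, parent in parents.items() if node not in known}


def pull_bundle(parents: dict[str, str], bundle: dict[str, str]) -> list[str]:
    """Model of `hg pull BUNDLE`: add missing changesets, ignore the rest."""
    added = [node for node in bundle if node not in parents]
    for node in added:
        parents[node] = bundle[node]
    return added
```

With base NULL the bundle contains the whole repository, which matches the hg bundle --base null backup described next.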
About bundles: these are (very well) compressed.
Create a (compressed) backup of the whole repository:
hg bundle --base null backup.bundle
[Edit: adding some links on this topic]
http://blog.experimentalworks.net/2010/09/review-remote-changes-offline-in-mercurial/
https://www.mercurial-scm.org/wiki/Bundle
[Edit: what I think the advantages of using bundles are]
Bundles can be created offline and copied or sent by mail. Pushing to a repository on a flash drive requires the drive to be connected. Bundles are easier because the repository you push from and the one you pull into do not have to be available at the same time.
Apart from that, bundles come in two kinds. Complete bundles contain all changesets (created with --base null); incremental bundles contain only the changes since a given --base. Complete bundles are standalone, so you can also use one as a single-file backup.
I want to do the equivalent of svn export REMOTE_URL with a Mercurial repository. What I want at the end is an unversioned snapshot of the repository at the remote URL, but without cloning all of the changesets over to my local machine.
Also, I want to be able to specify a tag in the remote repository to pick this from. If it's not obvious, I'm building a release management tool that pulls from a canonical mercurial repository to build a release file, and it's slow right now because some projects have large, multiple-version binary files committed.
Is this possible? How would one go about it?
It's usually easier (if the remote HG is using the hgweb interface) to just visit the repo in your browser and download a .tgz / .zip / .bz2 of the tip revision. You'll see the links if the remote HG supports this.
If you want the repository, you need all of the revisions that went into the current tip for it to be at all functional.
There are options to hg clone that allow you to fetch a repository up to a certain revision, but none (that I could find) that allow you to get just the tip revision. What you are essentially asking for is a snapshot of the repo.
Edit: To Get A Snapshot
hg clone http[s]://url.to.repo repo.hg
cd repo.hg
hg archive ../repo-snapshot
cd ..
rm -rf repo.hg
The snapshot is now in repo-snapshot.
Yes, this does entail cloning the repo first, which is why I suggested seeing if the remote hgweb supports on the fly downloads of any particular revision. If it does, your problem is solved with something like curl or wget instead of HG.
If not, it's good to let the original repo 'live', since you can update it again later via hg pull and then create another snapshot for a future release. This saves having to start over from scratch when cloning, especially for large repositories with lots of changes.
Also, this is Linux-centric, but you get the gist. Of course, replace http[s] with the desired protocol as needed.
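For the hgweb route, the Python standard library can do the download instead of curl or wget. This sketch assumes the server enables archive downloads (the allow-archive setting in hgweb). It also assumes archive URLs of the form <repo>/archive/<rev>.tar.gz, where <rev> can be a tag. The repository URL is a placeholder.

```python
import io
import tarfile
import urllib.parse
import urllib.request


def archive_url(repo_url: str, rev: str = "tip", fmt: str = "tar.gz") -> str:
    """URL of an hgweb on-the-fly archive (assumed URL layout, see above)."""
    if fmt not in ("tar.gz", "tar.bz2", "zip"):
        raise ValueError(f"unsupported archive format: {fmt}")
    return f"{repo_url.rstrip('/')}/archive/{urllib.parse.quote(rev, safe='')}.{fmt}"


def download_snapshot(repo_url: str, rev: str, dest: str) -> None:
    """Fetch an unversioned snapshot of `rev` (e.g. a release tag) and unpack it."""
    with urllib.request.urlopen(archive_url(repo_url, rev)) as response:
        data = response.read()
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        # Use the safer "data" extraction filter where this Python provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
```

For example, download_snapshot("https://hg.example.org/project", "1.0", "release-1.0") would fetch the tree of tag 1.0 without any history, provided the server allows it.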
Is there any reason you can't maintain a mirror (updated in the background however often you want) of the remote repository on your local machine, then have the release management tool on your local machine run hg archive out of the local clone as necessary? If your concern is user-responsiveness, and not total bandwidth/storage consumed, this offsets the "slow" part to where you won't see it.
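A minimal Python sketch of the mirror approach follows. The paths are hypothetical, and it assumes hg is on the PATH. It also assumes the mirror was created once with hg clone -U. No working copy is needed, because hg archive -r exports the requested revision directly. Each run pulls only the new changesets and then exports the requested tag.

```python
import subprocess


def release_snapshot(mirror: str, tag: str, dest: str, dry_run: bool = False) -> list[list[str]]:
    """Refresh a local mirror, then export `tag` from it with `hg archive`."""
    commands = [
        ["hg", "-R", mirror, "pull"],                      # fetch only new changesets
        ["hg", "-R", mirror, "archive", "-r", tag, dest],  # unversioned copy of the tag
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

The pull can also run on its own schedule in the background, so that a release build only pays for the archive step.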
Tim Post noted that if you do have the hgweb CGI interface available, you can configure it to pull compressed archives down and unpack them (and the interface is consistent enough that you could script that via wget), but if you don't, core Mercurial doesn't have a lot of tools to help you, and the developers have expressed an opposition to trying to turn Mercurial into a general rsync-type client.
If you aren't afraid of playing with unofficial add-ons, you could have a look at the FTP Extension. That will force you to push from the server, however.