Data corruption of a mercurial repository

Data corruption of a mercurial repository - mercurial

I have a mercurial repository at c:\Dropbox\code. I've created a clone of this repo locally using:
hg clone -U c:\Dropbox\code c:\GoogleDrive\codeBackup
This bare repo serves the purpose of backup only. I regularly push changes to codeBackup. Furthermore, both the directories are backed-up in the cloud (Dropbox & Google Drive respectively).
If my repo in code becomes corrupt would the codeBackup repo automatically be corrupt since the clone operation used hard links to the original repo? Thus my double-cloud-backup strategy would be useless?
P.S. : I understand that the fall back option is to use the cloud service to restore a previous known good state.
UPDATE : After digging around, I'll add these for reference
Discussion on repo corruption in mercurial
The problem is, if a 'hg clone' was done (without --pull option), then
the destination and the source repo share files inside .hg/store by
using hardlinks 1, if the filesystem provides the hardlinking
feature (NTFS does).
Mercurial is designed to break such hardlinks inside .hg if a commit
or push is done to one of the clones. The prerequisite for this is,
that the Windows API mercurial is using should give a correct answer,
if mercurial is asking "how many hardlinks are on this file?".
We found out that this answer is almost always wrong (always reporting
1, even if it is in fact >1) iff the hg process is running on one
Windows computer and the repository files are on a network share on a
different Windows computer.
To avoid hardlinks (use --pull):
hg clone -U --pull c:\Dropbox\code c:\GoogleDrive\codeBackup
To check for hardlinks:
fsutil hardlink list <file> : Shows all hardlinks for <file>
find . -links +1 : Shows all files with hardlinks > 1
ls -l : shows hardlinks count next to each file

The biggest problem here, regarding repository corruption, is that you're using Dropbox and Google Drive to synchronize repositories across machines.
Don't do that!
This will surely lead to repository corruption unless you can guarantee that:
Your machines will never lose internet connection
You will never have new changes unsynchronized on more than one machine at a time (including times where you have had internet problems)
That Dropbox will always run (variant of never lose internet connection)
You're not just plain unlucky regarding timing
To verify that Dropbox can easily lead to repository corruption, do the following:
Navigate to a folder inside your Dropbox or Google Drive folder and create a Mercurial repository here. Do this on one machine, let's call this machine A.
Add 3 text files to it, with some content (not empty), and commit those 3 text files.
Wait for Dropbox/Google Drive to synchronize all those files onto your second computer, let's call this machine B
Either disconnect the internet on one of the machines, or stop Dropbox/Google Drive on it (doesn't matter which one)
On Machine A, change file 1 and 2, by adding or modifying content in them. On Machine B, change file 2 and 3, making sure to add/modify in some different content from what you did on machine A. Commit all the changes on both machines.
Reconnect to the internet or restart Dropbox/Google Drive, depending on what you did in step 4
Wait for synchronization to complete (Dropbox will show a green checkmark in its tray icon, unsure what Google Drive will display)
Run hg verify in the repositories on both machine A and B
Notice that they are now both corrupt:
D:\Dropbox\Temp\repotest>hg verify
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
3.txt#?: rev 1 points to unexpected changeset 1
(expected 0)
3.txt#?: 89ab3388d4d1 not in manifests
3 files, 2 changesets, 6 total revisions
1 warnings encountered!
2 integrity errors encountered!
Instead get a free bitbucket or kiln account and use that to push and pull between to synchronize across multiple computers.

The only way you code repository can become corrupt (assuming it was not corrupt when you initially cloned it over to codeBackup) is when you write something to it, be it committing, rewriting history, etc. Whenever something gets written to a hard-linked file, Mercurial first breaks the hard link, creates an independent copy of the file and then only modifies that newly created copy.
So to answer your questions: under normal usage scenarios repository corruption will not propagate to your codeBackup repository.

Related

Mercurial - how to make a local central repository

I would like to have a central repository in a directory on my local computer without setting up a server.
Context: I am working with my boss on a local server inside our LAN. We both are using a vnc connection onto the server, and are doing our work there for simplicity reasons. I would like to set it up so that I can have a copy of my scripts for development, and then when I get to a release, I will push it to a different directory that my boss can then run them from (or even better pull from to his own set and run from there).
I read that you can create a hg server by running 'hg serve', but I do not want to open it up to the LAN, because I don't want it to be accessible.
I tried running 'hg push /home/source' and it gave me an error.
I then ran 'hg init' while in that directory and tried again. It looked like it worked, and then didn't show any files in the directory. I ran status and it showed nothing, and then ran log and it showed the commits.

... without setting up a server
One way I've used to share a "central" Mercurial repository without having to deal with any "server" issues is to have the "central" repository in a folder on Dropbox.
For example, suppose:
your repository is named "repo" and that your "private" copy is in ~/repo
your Dropbox directory on your computer is ~/Dropbox/
Then:
cd ~/Dropbox
hg clone ~/repo
Now suppose you make some changes in ~/repo. You can then "push" them from ~/repo to ~/Dropbox/repo, or (more easily, as explained below) "pull" them into ~/Dropbox/repo when you're ready.
To make updating the "central" repository convenient, you might like to create a script such as:
#!/bin/bash
cd ~/Dropbox/repo
hg tip
hg pull -u
hg tip
Notice that in the script, there is no need to specify the source from which to pull; the hgrc file that's created when you created the clone keeps track of that. (Thank you, hg.)
If your colleague has direct access to a folder on your computer, then you could still adopt the strategy described above, without using Dropbox.
Needless to say, there are many variations.
Needless to say also, if more than one person attempts to commit changes to the shared folder, chaos can easily ensue.

mercurial: how to update production files without a server (EDIT: workaround + non-MS Windows solution)

I need to control the version of a few files accessible via an SMB share. These files will be modified by several people. The files themselves are directly used by a web server.
Since these are production files I wanted to force the users to pull a local copy, edit them, commit and push them back. Unfortunately there is no Mercurial server on that machine.
What would be the appropriate way to configure Mercurial on my side so that:
the versioning (.hg directory) is kept on the share
and that the files on the share are at the latest version?
I do not have access to this server (other than via the share). If I could have a mercurial server on that machine I would have used a hook to update the files in the production directory (I am saying this just to highlight what I want to achieve - this approach is not possible as I do not control that server)
Thanks!
UPDATE: I ended up using an intermediate server (which I have control over). A hook on changegroup triggers a script which i) hg update to have fresh local files ii) copies them to the SMB share
EDIT 1 Following discussions in comments with alex I have looked at the verbose version of the command line output. The \\srv\hg\test1 repo has a [hooks] section with changegroup = hg update. The output from a hg push -v gives some insights:
pushing to \\srv\hg\test1
query 1; heads
(...)
updating the branch cache
running hook changegroup: hg update
'\\srv\hg\test1'
CMD.EXE was started with the above path as the current directory.
UNC paths are not supported. Defaulting to Windows directory.
abort: no repository found in 'C:\Windows' (.hg not found)!
warning: changegroup hook exited with status 255
checking for updated bookmarks
listing keys for "bookmarks"
If I understand correctly the output above:
a cmd.exe was triggered on the client, even though the [hook] was on the receiving server
it tried to update the remote repo
... but failed because UNC are not supported
So alex's answer was correct - it just does not work (yet?) on MS Windows. (Alex please correct me in the comments if I am wrong)

If I understood correctly, you are looking for two things:
A repository hook that will automatically update the production repo to the latest version whenever someone pushes to it. This is simple: You're looking for the answer to this question.
If you can rely on your co-workers to always go through the pull-commit-push process, you're done. If that's not the case, you need a way to prevent people from modifying the production files in place and never committing them.
Unfortunately, I don't think you can selectively withhold write permissions to the checked-out files (but not to the repo) on an SMB share. But you could discourage direct modification by making the location of the files less obvious. Perhaps you could direct people to a second repository, configured so that everything pushed to it is immediately pushed on to the production repository. This repo need not have a checked-out version of the files at all (create it with hg clone -U, or do an hg update -r 0 afterwards), eliminating the temptation to bypass mercurial.

What prevents you from mount your Samba share and run hg init there? You don't need mercurial server (hg serve or more sophisticated things) to perform push/pull operations.

Mercurial and online sharing - how to proceed

A noob question... i think
I use Mercurial for my project on my laptop. How do i submit the project to an online server like codeplex?
I'm using tortoisehg and i cant find the upload interface for submit the project online...

From the command line, the command is:
hg push <url>
to push changes a remote repository.
In TortoiseHg, this is accessed through the "Synchronize" function, which seems to show up if you right-click in a Windows Explorer window but not on any file. It's also available in the workbench; the icon is 2 arrows pointing in a circle.

For these things, I find the best way to go is to use the command line interface - TortoiseHG is OK if you need to perform some common operations from the file browser, and it's a nice tool to visualize some aspects of your repository, but it doesn't implement all of mercurial's features in full detail, and it renames and bundles some operations for no apparent reason.
I don't know how things work at codeplex, but I assume it is similar to bitbucket or github, in which case here's what you'd do:
Create an empty repository on the remote end (codeplex / bitbucket / ...).
Find the remote repository's URL - for bitbucket, it is https://bitbucket.org/yourname/project, or ssh://hg#bitbucket.org/yourname/project.
From your local repository, commit all pending changes, then issue the command: hg push {remote_url}, where {remote_url} is the URL of the remote repository. This will push all committed changes from your local repository to the remote repository.
Since the remote's head revision (an empty project) is the same as the first revision in your local copy (because all hg repositories start out empty), mercurial should consider the two repositories related and accept the push.
For an introductory guide to command-line mercurial, I recommend http://hginit.com/

Using Mercurial via a USB flash drive

In short:
How can I use Hg to synchronize repositories between two computers using a flash drive as intermediary?
With more detail:
I often develop code on computers that aren't networked in any way, and I transfer files between these machines using a USB flash drive. Now I would like to develop some software across these machines using Hg repositories on each machine that I can frequently sync-up using the flash drive transfer mechanism.
I'm slightly familiar with Hg, as I use it in the most simple way possible for versioning only my own work on independent machines, but am uncertain as to exactly what I should do to use it to synchronize repositories between two computers using a flash drive as intermediary. Maybe, for example, I need to create a temporary repository on the flash drive (using “clone”) from which I then sync to (using “push” and “pull”), and do this by A→flash, flash→B, B→flash, flash→A? The more specificity in your answer regarding the sequence of actions and commands, the more useful to me.
Finally, how do I get this process started? Do I need to do something so Hg knows these are all part of one code base? For example, each of my current repositories on the different computers was created independently from a time before I started using Hg, and although all the code is similar, independent changes have been made to each, and the repositories know nothing about each other. If what I need to do with this is different than what I need to do for the ongoing case once I have everything unified, spelling this process out for me as well would also help.
In case it's important, these machines can be running any of Windows, Mac, or Linux, and my versions of Mercurial are slightly different on each machine (though the Mercurial versions could be unified if needed).

What you have described above in terms of using the flash drive as an intermediate storage location should work. My process would be:
initial setup
create repo on computer A (using hg init)
clone the repo from computer A to flash drive
hg clone C:/path/to/repo/A X:/path/to/flash/drive/repo
clone the repo from flash drive to computer B
hg clone X:/path/to/flash/drive/repo C:/path/to/repo/B
working process
edit/commit to repo on computer A
push from computer A to flash drive
hg push X:/path/to/flash/drive/repo
pull from flash drive to computer B
hg pull X:/path/to/flash/drive/repo
edit/commit repo on computer B
push from computer B to flash drive (same commands as above)
pull from flash drive to computer A (same commands as above)
Finally, how do I get this process
started? Do I need to do something so
Hg knows these are all part of one
code base?
Mercurial knows if two arbitrary repositories have a common ancestor by looking at the SHA1 hash keys of the commits in each repo. In other words, assuming both repos have at least one common hash key in their histories, Mercurial will attempt to merge them. In your specific case, where both repos are initially un-versioned, Mercurial will need some help. The best thing to do would be to get to a place where both repos are identical and then perform your hg init. Mercurial should handle sharing from this point on.

When working offline on different machines. It is better to use the bundle command that comes with Mercurial. So echoing what dls wrote but a slight change process.
Initial setup as mentioned by dls.
or
Go to your Mercurial repository top directory
Create bundle: hg bundle --base null ../project.hg
Copy the project.hg file to your other computer
Create a directory there
Make it an Mercurial repository : hg init
Incorporate the bundle: hg pull <path/project.hg>
hg update
Check hg log, both the repository will show same base revisions and tip
Workflow using bundle
I use a slightly different workflow. I keep these repositories as distinct repositories.
I mention them as repo1 and repo2.
Suppose that the current tip of repo1 is 4f45839f613c.
You make changes and commit them in repo1
Create a bundle of the changes :
Command : This bundle contains all changes since the specified base version.
hg bundle --base 4f45839f613c changes.bundle
Take it to repo2 by copying the bundle.
You can simply pull the bundle to repo2 :
Command :
hg pull changes.bundle
If the bundle contains changes that are already present in repo2, then these will be ignored when pulling. As long as the bundle doesn't grow to large, this allows to use the bundle command with the same --base revision again and again to create bundles including further changes.
About bundles: these are (very well) compressed.
creates a (compressed) backup of the repository
hg bundle --base null backup.bundle
[Edit : Adding some links on this topic]
http://blog.experimentalworks.net/2010/09/review-remote-changes-offline-in-mercurial/
https://www.mercurial-scm.org/wiki/Bundle
[Edit: What I think is advantage of using bundle]
Bundles can be created offline, copied or sent via mail. Using push to repo on flash drive, requires it to be connected. Bundles are easier since it does not maintain that the two repo from which you push and pull have to be available at the same time.
Apart from that, bundles can also be of two types : Changesets and Incremental. Changeset bundles are complete standalone bundles. You can also use bundles for backup as a single file.

Mercurial central server file discrepancy (using 'diff to local')

Newbie alert!
OK, I have a working central Mercurial repository that I've been working with for several weeks.
Everything has been great until I hit a really bizarre problem: my central server doesn't seem to be synced to itself? I only have one file that seems to be out-of-sync right now, but I really need to know how this happened to prevent it from happening in the future.
Scenario:
1) created Mercurial repository on server using an existing project directory. The directory contained the file 'mypage.aspx'.
2) On my workstation, I cloned the central repository
3) I made an edit to mypage.aspx
4) hg commit, then hg push from my workstation to the central server
5) now if I look at mypage.aspx on the server's repository using TortoiseHg's repository explorer, I see the change history for mypage.aspx -- an initial check-in and one edit. However, when I select 'Diff to local', it shows the current version on the server's disk is the original version, not the edited version!
I have not experimented with branching at all yet, so I'm sure I'm not getting a branch problem.
'hg status' on the server or client returns no pending changes.
If I create a clone of the server's repository to a new location, I see the same change history as I would expect, but the file on disk doesn't contain my edit.
So, to recap:
Central repository = original file, but shows change in revision history (bad)
Local repository 'A' = updated file, shows change in revision history (good)
Local repository 'B' = original file, but shows change in revision history (bad)
Help please!
Thanks,
David

Sounds like you're looking at the working copy on the central repo. Just like your local repo, there is a working copy. Running hg update (or "Update to branch tip" in TortoiseHg) should sync the central repo's working copy to the latest.

This is normal, as the repository on the server has two components: the actual repository of changesets (in the .hg subdirectory), and a working copy. When you push changes from the local repository on your workstation to the server repository, it updates the repository files on the server (in the .hg subdir), but it does not change the working copy files (outside the .hg subdir): this would require an explicit update operation on the server to change the working copy.
If the server repository is only being used as a repository, and you do all your actual work in clones, then you're probably better off using a "bare repository" on the server (just delete the working copy files and just keep only .hg subdirectory itself).

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008