Which damage could be caused when killing a Mercurial process? Might the working directory be left in an undefined state? Might the .hg/-admin area get corrupted in some way?
Are files written in some kind of "atomic" operation by Mercurial? (Working tree files, .hg/-internal files, configurations files, and so on ...)
It is safe. All writes are done such that the files on disk are always consistent. Either the transaction writes out entirely or it automatically rolls back when next found. Not just consistent when closed, but always consistent -- whether it stops via SIGKILL or power failure.
You can sort of work out how that's accomplished here: https://www.mercurial-scm.org/wiki/FileFormats . This is possible in part because all file writes are append only. Before anything is written file lengths are noted, and if things are found to be in an inconsistent state they automatically truncate the files back to the checkpointed, known good lengths.
Using something like dropbox (which syncs files in whatever order it likes) can throw this out the window, which is why it's best to let hg be the only process writing to repositories.
Related
I have a utility script that will configure the local developer instance to mimic a specific production one - copy over the correct connection strings, client files, etc - thereby making it easy to investigate certain issues.
Because this is all done locally this changes files in the working directory. The first thing the script does therefore is print out a big fat warning to caution developers not to commit and revert changes when they are done. However it is possible to forget or miss this warning as cruft from my build system scrolls by and I wonder if its possible to go further.
Is it possible to do something to the hg repo so that a commit will be rejected and the developer would have to revert first?
I realize there are some variables to the question as far as revert what and commit what but given that I'm not even sure that this is possible, I am willing to take close-with-caveats for an answer.
This is exactly what hooks are for. You would need a precommit hook on every repo or possibly the pretxnchangegroup.
By creating a precommit hook (script) that checks for a specific file or a specific change, you can fail the commit and print out whatever warning you need. The return value of the script indicates to mercurial if the transaction is valid or not.
Use the following generic sample to check for the presence of certain files that would be committed, before accepting or rejecting the changesets.
I use the hunk-by-hunk or hunk selection approach to committing: instead of commuting all changes I made to a file, I commit related parts. E.g. I wrote a function and a test, compiled to ensure it works and then commit the function and the test separately. For this I use built-in functionality in tortoiseHg and RecordExtention when in the console.
Now I have two edits separated by only one unchanged line, thus falling in hg's tolerance of one hunk. I want to commit only the former for now. How?
The record extension doesn't let you split hunks further, but the less-standard CRecord extension does.
Just to put it out there, but what you're doing is usually considered bad practice because it guarantees that you haven't run the unit tests on the files as they're being committed. That, of course, doesn't apply in all environments.
If the reason you're leaving some parts uncommitted is because they're local-only changes you always in in place (passwords, paths, etc.) they're a good candidate for a Mercurial Queues "patch". Then you'd be able to 'pop' them off, commit the whole file, and then 'push' them back on.
I understand that in mercurial you can never remove a history for a file unless you do something like this. Is there any way to disable history for certain files from ever being created?. If any other repository system is capable of doing that, please put that down as well.
Why would I want that? Well, in our build system, new binaries are constantly being committed which the non-programmers can use to run the program without compiling every time (the compilation is done by the build system). Each time new binaries are committed, the old ones are useless as far as we are concerned. It is unnecessarily taking up space. If the new binary messes up by any chance, we can always revert back to older source and rebuild (assuming there is a way to disable history for specific files).
As you found out, you cannot do what you want directly in Mercurial.
I suggest you put the binaries somewhere else -- a Subversion subrepo would be a good choice. That way you will only download the latest version of each file on the client, but you will have all versions on your server (where it should be easy to add more disk space).
From Mercurial wiki - GSoC Ideas 2010:
Project Ideas
Lightweight copies/renames
(very difficult - a successful student
will become an expert in Mercurial's
storage format and transmission
protocol)
Copies and renames currently are not
too efficient. Mercurial copies the
copied/renamed source file to the new
initial revision of the target file in
its internal history store. For
renames, this is especially
counter-intuitive, as renaming a large
file grows the store by the file's
size. It would be better if Mercurial
had some way of referring to the
existing revision from the new file,
while preserving backwards
compatbility and bounded I/O
guarantees for retrieving revisions.
See issue883 for discussion.
There's an mq from an old attempt at
this located here.
Sorry if this is an obvious question (I'm not good at English and programming). I'm wondering, what does the "Lightweight copies" mean?
Is it mean: when this feature is implemented, multiple files with same content (same hash value different file names) will be stored only once in repository (just like Git)?
Update:
Thanks everyone for your answers. One of Mercurial's developers - tonfa also answered this question in a comment of this answer:
caveman: When light-weight copies are
implemented, will two files with same
content (same hash value different
names) store only once in repository
(just like Git)?
tonfa: no, this feature isn't planned
(it would break other optimizations to
minimize disk access)
Right now, when you copy a file, a new file is created in the repository that contains a compressed snapshot of the file you just copied. The idea would be to set it up so the copy references the old file somehow and then has revlog entries based on that instead of having to have its own snapshot to base the revlog entries off of.
This will not be like how git works. Changing Mercurial to work that way would be really interesting, and not the easiest proposition.
I'd better say that copied/renamed file wouldn't store as twice space more as it now, but will just point to the same revision.
Not sure that this will be true for the files added separately with the same content. According to the description they will be treated as completely independent files and will occupy 2x space.
I am writing a set of django apps and would like to use Hg for version control. I would like each app to be independent of the others so in each app there may be a directory for static media that contains images that I would not want under version control. In other words, the binary files would not all be in one central location
I would like to find a way to clone the repository that would include copies of the image files. It also would be great if when I did a merge, if there were an image file in one repo and not another, that there would be some sort of warning.
Currently I use a python script to find images and other binary files that are in one repo, but not the other. But a lot of people must face this problem, so there must be a more robust and elegant solution.
One one other thing...for reasons I do not want to go into, usually one of my repos is on a windows machine, and the other is on Linux. So a crossplatform solution would be nice.
Since Mercurial 2.0 the extension largefiles is now included in the main distribution. That extension keeps and manages large files outside of the "normal" repository in a way that you get the benefit of DCVS but without the benefit of exponential size and processing time growth.
Other extension that work along similar lines are SnapExtension and BigFilesExtension. However, those two are not distributed with Mercurial (you have to get them manually).
Mercurial can track any kind of file, for binary files if something changes then the whole file gets replaced not just the changes.
On the getting a warning if one repo doesn't contain a file, that's kind of the point of a DVCS is that the repos are related but are autonomous. You could always check and see what files were added during a synch or merge operation.
The current Mercurial book (by Bryan O'Sullivan) says, that Mercurial stores diffs also for binary files. How efficient this is, obviously depends on the nature of changes to binary files.