How to use Mercurial's LargeFiles extension?

I use Mercurial for game development, and I'm trying to use the LargeFiles extension included in Mercurial 2.0 to keep track of large binary assets. Unfortunately there isn't a whole lot of documentation on the extension, so I'm not sure how people are expected to use it.
For example, is there any way to safely clean out the .hg/largefiles directory? If I'm on the tip revision, and expect to always have internet access, then I don't need the old versions of largefiles cluttering up the repository, since that's the whole point of using the LargeFiles extension.
Also, how do I have more fine-grained control over where the largefile store is? I can only assume that it's created somewhere on the computer that ran hg init, but I have no idea about the details.
Thanks!

I don't have any guidance on how to safely clean out the .hg/largefiles directory.
Largefiles Store
By default, the largefiles store appears to be in one of the following locations:
Windows: C:\Users\Username\AppData\Local\largefiles
OSX: /Users/username/Library/Caches/largefiles
Linux: (This is my best guess)
/home/username/largefiles
or /home/username/.cache/largefiles
User Configured:
This, however, can be changed in the global settings file using the usercache setting as follows:
[largefiles]
usercache = c:\path\to\largefiles\cache\
Note: This is not documented yet. This makes me wonder if it is subject to change.
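As a rough illustration of the lookup order described above (a configured usercache wins, otherwise a per-platform default applies), here is a minimal Python sketch. It is not Mercurial's actual code: the function names are hypothetical, the Windows default is assumed to come from the LOCALAPPDATA environment variable, and the Linux default assumes the ~/.cache variant, which the answer itself only lists as a best guess.

```python
def default_usercache(platform, env):
    """Return the default user cache path for a platform, or None if unknown.

    `platform` is one of "windows", "osx", "linux" (labels made up for this
    sketch); `env` is a dict of environment variables.
    """
    if platform == "windows":
        # Assumption: LOCALAPPDATA points at C:\Users\<name>\AppData\Local
        base = env.get("LOCALAPPDATA")
        return base + "\\largefiles" if base else None
    home = env.get("HOME")
    if not home:
        return None
    if platform == "osx":
        return home + "/Library/Caches/largefiles"
    if platform == "linux":
        # Assumption: the ~/.cache variant from the answer's best guess
        return home + "/.cache/largefiles"
    return None


def resolve_usercache(configured, platform, env):
    """Prefer the [largefiles] usercache setting over the platform default."""
    return configured or default_usercache(platform, env)
```

For example, resolve_usercache(None, "osx", {"HOME": "/Users/username"}) gives the OSX path listed above, while passing any configured path returns that path unchanged.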
Sources:
Largefiles Extension Documentation
User cache paths - https://www.mercurial-scm.org/repo/hg/file/41453d55b481/hgext/largefiles/lfutil.py (lines 84-103)
Undocumented largefiles.usercache setting - https://bz.mercurial-scm.org/show_bug.cgi?id=3088

I'm just posting this for anyone else coming into the thread from a search.
There's currently an issue using the largefiles extension in the mercurial python module when hosted via IIS. See this post if you're encountering issues pushing large changesets (or large files) to IIS via TortoiseHg.
The problem ultimately turns out to be a bug in SSL processing introduced in Python 2.7.3 (probably explaining why there are so many unresolved posts from people looking for problems with Mercurial). Rolling back to Python 2.7.2 let me get a little further (blocked at 30 MB pushes instead of 15 MB), but to properly solve the problem I had to install the IISCrypto utility to completely disable transfers over SSLv2.

Related

Mercurial & Web Development: How to handle publishing/deployment?

At work we're moving from no SCM to Mercurial. It's a bit of a learning curve, but after messing with it for two days I definitely feel more comfortable with it.
I still have one big, unresolved question though in my mind: Once code is finished, how do we handle the actual deployment?
Should we be running a copy of Mercurial on the production (live) server? Or should we set rsync or something up to sync from the repo to the web directory? What's the best practice here?
If we do go w/ just pointing apache to the repo, I assume this is okay as long as we're careful not to hg update to a different, non-stable branch? That still seems a little dangerous to me though. Is there some way to force it to only switch to certain builds?
Or is pointing apache to the repo just a terrible idea and I should be doing something else instead?
On a related topic, I've also heard some talk about putting any upgrade scripts (such as schema changes for MySQL) under version control so they can be run when the version is deployed. But how would that even work as part of the workflow? I wouldn't want to keep it w/ everything else, because it's a temporary one-time use script...
Thanks for any advice you guys can give.
I recently discovered the hg archive command, so I think we'll go w/ this instead. I've written a bash script that changes to the head of the 'production' branch then archives it to a predetermined destination. Seems to work.
I'd still appreciate any feedback you guys have as to whether this is a good idea or not.
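For readers who want a starting point, the archive step might look roughly like the Python sketch below. It is not the poster's bash script: instead of updating the working copy to the head of the branch first, it passes the branch name to hg archive via --rev, which exports that branch's tip directly. The function names and the example destination are made up for illustration; only the 'production' branch name comes from the post.

```python
import subprocess


def archive_command(dest, branch="production"):
    """Build the hg invocation that exports the tip of `branch` to `dest`."""
    return ["hg", "archive", "--rev", branch, dest]


def deploy(repo_dir, dest, branch="production"):
    """Run hg archive inside `repo_dir`; raises CalledProcessError on failure."""
    subprocess.run(archive_command(dest, branch), cwd=repo_dir, check=True)
```

Calling deploy("/path/to/repo", "/var/www/site") would then write a clean snapshot with no .hg directory, which is the main advantage over pointing the web server at a working copy.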
I think pointing apache to the repo is definitely a bad idea. hg archive is OK if all you want is to take a snapshot of the dev files.
I find my development source files and a deployed application (even for a web app that doesn't need compiling) are usually very different, the latter being derived from a subset of the former.
I tend to use a shell script or even a Makefile to "build" a deployed application in a subdirectory of the development directory. This could just be creating a directory tree and copying the necessary files, or it could include compressing scripts, etc.
This way you have to make a conscious decision whether or not to include a file in the deployed version, thus helping prevent accidentally leaving development utility files in an online application that could cause a security risk.
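A minimal sketch of that kind of "build" step, written in Python rather than as a shell script or Makefile, could look like this. The function name and the idea of an explicit include list are illustrative assumptions; the point is only that nothing reaches the deployed tree unless it is named.

```python
import shutil
from pathlib import Path


def build_deploy_tree(src_root, dest_root, include):
    """Copy only the explicitly listed relative paths into the deploy tree.

    Anything not named in `include` (dev utilities, test scripts, ...) is
    left behind. Returns the list of relative paths that were copied.
    """
    src_root, dest_root = Path(src_root), Path(dest_root)
    copied = []
    for rel in include:
        src = src_root / rel
        if not src.is_file():
            # Fail loudly so a stale include list is noticed at build time
            raise FileNotFoundError(f"listed file is missing: {rel}")
        dest = dest_root / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(rel)
    return copied
```

Raising on a missing file is a design choice for this sketch: it keeps the include list honest instead of silently deploying an incomplete application.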
The only part Mercurial plays is that, for a major release, I create a new named branch (e.g. 1.5) while development continues on the default branch. Subsequent bug fixes or patches can be transplanted to the release branch if necessary, and if a bug-fix release is made I tag the release branch with the new version (e.g. 1.5.1).

Disable file history for a particular set of files in Mercurial

I understand that in Mercurial you can never remove the history for a file unless you do something like this. Is there any way to prevent history for certain files from ever being created? If any other repository system is capable of doing that, please mention it as well.
Why would I want that? Well, in our build system, new binaries are constantly being committed which the non-programmers can use to run the program without compiling every time (the compilation is done by the build system). Each time new binaries are committed, the old ones are useless as far as we are concerned. It is unnecessarily taking up space. If the new binary messes up by any chance, we can always revert back to older source and rebuild (assuming there is a way to disable history for specific files).
As you found out, you cannot do what you want directly in Mercurial.
I suggest you put the binaries somewhere else -- a Subversion subrepo would be a good choice. That way you will only download the latest version of each file on the client, but you will have all versions on your server (where it should be easy to add more disk space).

Mercurial (Hg) and Binary Files

I am writing a set of django apps and would like to use Hg for version control. I would like each app to be independent of the others, so in each app there may be a directory for static media that contains images that I would not want under version control. In other words, the binary files would not all be in one central location.
I would like to find a way to clone the repository that would include copies of the image files. It also would be great if when I did a merge, if there were an image file in one repo and not another, that there would be some sort of warning.
Currently I use a python script to find images and other binary files that are in one repo, but not the other. But a lot of people must face this problem, so there must be a more robust and elegant solution.
One other thing: for reasons I do not want to go into, usually one of my repos is on a Windows machine and the other is on Linux. So a cross-platform solution would be nice.
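The poster's script is not shown, but a comparison of this kind might be sketched as follows. Everything here is an assumption for illustration: the function names, the list of extensions treated as binary, and the choice to skip .hg directories. Relative paths are normalised to forward slashes so that a tree on Windows and a tree on Linux compare equal.

```python
import os

# Assumption: which extensions count as "images and other binary files"
BINARY_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".ico"}


def binary_files(root, extensions=BINARY_EXTENSIONS):
    """Return the set of binary file paths under `root`, relative to it."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        if ".hg" in dirnames:
            dirnames.remove(".hg")  # do not descend into repository metadata
        for name in filenames:
            if os.path.splitext(name)[1].lower() in extensions:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                found.add(rel.replace(os.sep, "/"))
    return found


def missing_binaries(repo_a, repo_b):
    """Return (only in repo_a, only in repo_b) as two sorted lists."""
    a, b = binary_files(repo_a), binary_files(repo_b)
    return sorted(a - b), sorted(b - a)
```

Running missing_binaries on two working copies gives the two lists a warning could be built from, though it only compares file names, not contents.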
Since Mercurial 2.0, the largefiles extension has been included in the main distribution. The extension keeps and manages large files outside of the "normal" repository, so that you get the benefits of a DVCS without the cost of exponential growth in size and processing time.
Other extensions that work along similar lines are SnapExtension and BigFilesExtension. However, those two are not distributed with Mercurial (you have to get them manually).
Mercurial can track any kind of file. For binary files, if something changes then the whole file gets replaced, not just the changes.
As for getting a warning if one repo doesn't contain a file: the point of a DVCS is that the repos are related but autonomous. You could always check which files were added during a sync or merge operation.
The current Mercurial book (by Bryan O'Sullivan) says that Mercurial also stores diffs for binary files. How efficient this is obviously depends on the nature of the changes to the binary files.

How can I add complete binaries to a Mercurial patch?

I want to use Mercurial to capture changes made to the vanilla installation of a piece of software we use. Every time we upgrade the software, we need to manually edit the various configuration files and add the 3rd party libraries that we use in the current version of the software. Creating patches for the configuration file changes is fine, but how do I add 3rd party libraries (binaries) to a Mercurial patch? Is it even possible?
If you were to try to get the patch for the 7th revision...
hg export --git -r 7 -o 7.patch
Yes, the mq extension can handle binary data just as well as textual data. It will use Git's extended patch format to save the binary data. This is transparently handled for you when you refresh a patch with modified binary files.
Whether or not this is a good idea is another question — VonC is correct when he writes that this is not the normal use case for a version control system.
Even if it may be possible, it is not advisable! (for Mercurial or any other VCS)
A Version Control System is not made to record binaries (mainly because the repository quickly grows out of proportion, takes a whole lot of disk space, and has no efficient way to store binaries as deltas).
You should record the configuration needed for each version you tag.
That can be a text file, or a maven pom for instance. Anything that allows an external mechanism (like maven) to download and locally store the right dependencies for you.
That means your patch will include changes to that text file (pom for instance), as well as the rest of the code modifications.

Mercurial Pull Error

I am new to the DVCS world. My company uses Perforce and I'm not a fan, so I thought I'd try to use Mercurial as a front end. I set it up on a Windows machine with TortoiseHg, enabled the Perfarce extension, did a small checkout (limiting the target revision) and pulled for the rest. This seemed to be more robust than clone alone.
This seems to be working fairly well as I've been able to get up to change 8700 or so.
My problem is with an error in the perforce repo. During the hg pull command it hits an error abort: file path/to/file.pl missing in p4 workspace and rolls back the transaction.
Is there any way to bypass or skip that file and force it to continue, since this is not a file I care about?
Update:
According to the admin, the file in question was a symlink. Would that cause this kind of problem? If so, how do I/admin fix or bypass it?
Is it possible to check out just a part of a perforce repo rather than the whole thing?
The issue is with symlinks, which are not supported on Windows.
This is fixed in the current version of Perfarce, which should appear in TortoiseHG soon.
I suggest that you have someone check that the Perforce repository is actually in a sane state. There might be something broken which you triggered and the data of your company might be at stake, so someone should definitely look what is causing the problem.