This might be a noob question. but I'm really torned between adding documents to my repository, in this case Mercurial.
by documents i meant, files that doesn't really go into your program. like PSD, doc, xls.
what's the best way to handle those files, or how do you handle your documents.
Take a look at the Largefiles extension that shipped with Mercurial 2.0 (with bugfixes since). It's designed to treat files that are binary and update rarely in a different, more efficient way.
Basically it stores those files without trying to compute diffs between versions, and anybody cloning the repo just gets the versions they need, and not all the history. This leads to faster cloning / pulling, but updates may need a connection to the remote repository to read versions of files into the local cache.
I toss them in my repository. It's nice to track changes of them and see old revisions anyway. I can see old revisions of a design document or see what the previous art was for an asset (maybe a graphic designer removed the alpha channel and he/she wasn't supposed to). Throw it in there. If it doesn't change, it's not taking up any more space with a good source control system than storing it outside of source control.
Related
My requirement is alike with Dropbox. I was told to research Mercurial, but it stores all history at every repository, and diff file at local, transfer reversion only.
But I must use remote-diff (rsync). I think replacing Mercurial's diff algorithm with rsync is not easy,it must change most of the code of Mercurial.
And if I implement this only based on librsync, too many things left me to write a reliable “dropbox”. whether syncML will be helpful? but because I only sync files, whether it's also too complex?
Because my synchronization is not distributed, maybe SVN will suit my requirement more? But SVN is also not using rsync.
I understand that in mercurial you can never remove a history for a file unless you do something like this. Is there any way to disable history for certain files from ever being created?. If any other repository system is capable of doing that, please put that down as well.
Why would I want that? Well, in our build system, new binaries are constantly being committed which the non-programmers can use to run the program without compiling every time (the compilation is done by the build system). Each time new binaries are committed, the old ones are useless as far as we are concerned. It is unnecessarily taking up space. If the new binary messes up by any chance, we can always revert back to older source and rebuild (assuming there is a way to disable history for specific files).
As you found out, you cannot do what you want directly in Mercurial.
I suggest you put the binaries somewhere else -- a Subversion subrepo would be a good choice. That way you will only download the latest version of each file on the client, but you will have all versions on your server (where it should be easy to add more disk space).
I'm working on a collaborative scientific project that is made up by a handful of Python scripts (1M max) and a relatively large dataset (1.5 GB). The datasets are tightly linked to the python scripts since the datasets themselves are the science and the scripts are a simple interface to them.
I'm using Mercurial as my source control tool, but I am not clear on a good mechanism to define the repository. Logistically it makes sense to bundle these together so that by cloning the repository you'd get the entire package. On the other hand, I'm concerned about the source control tool dealing with large amounts of data.
Is there a clean mechanism to handle this?
If the data files change rarely and you normally need all of them anyway, then just add them to Mercurial and be done with it. All your clones will be 1.5 GB, but that is just the way it has to be with that amount of data.
if the data is binary data and changed often, then you might try to avoid downloading all the old data. One way to do this is to use a Subversion subrepository. You will have a .hgsub file with
data = [svn]http://svn.some.edu/me/ourdata
which tells Mercurial to make a svn checkout from the right-hand side URL and put the Subversion working copy into your Mercurial clone as data. Mercurial will maintain an additional file for you called .hgsubstate, in which it records the SVN revision number to checkout for any given Mercurial changeset. By using Subversion like this, you only end up with the latest version of the data on your machine, but Mercurial will know how to get older versions of the data when needed. Please see this guide to subrepositories if you go down this route.
There is an article on the official wiki about large binary files. But the proposition of #MartinGeisler is a really nice new alternative.
My first inclination is to separate the python scripts out into their own repository, but I really need more domain information to make the "right" call.
On the one hand, if new datasets will be created then you would want a core set of tools to be able to handle all of them, right? But I can also see how new datasets may introduce cases that the scripts may not have previously handled... although it seems like in an ideal world you would want scripts that are written in a general way so they can handle future data and existing datasets??
I am writing a set of django apps and would like to use Hg for version control. I would like each app to be independent of the others so in each app there may be a directory for static media that contains images that I would not want under version control. In other words, the binary files would not all be in one central location
I would like to find a way to clone the repository that would include copies of the image files. It also would be great if when I did a merge, if there were an image file in one repo and not another, that there would be some sort of warning.
Currently I use a python script to find images and other binary files that are in one repo, but not the other. But a lot of people must face this problem, so there must be a more robust and elegant solution.
One one other thing...for reasons I do not want to go into, usually one of my repos is on a windows machine, and the other is on Linux. So a crossplatform solution would be nice.
Since Mercurial 2.0 the extension largefiles is now included in the main distribution. That extension keeps and manages large files outside of the "normal" repository in a way that you get the benefit of DCVS but without the benefit of exponential size and processing time growth.
Other extension that work along similar lines are SnapExtension and BigFilesExtension. However, those two are not distributed with Mercurial (you have to get them manually).
Mercurial can track any kind of file, for binary files if something changes then the whole file gets replaced not just the changes.
On the getting a warning if one repo doesn't contain a file, that's kind of the point of a DVCS is that the repos are related but are autonomous. You could always check and see what files were added during a synch or merge operation.
The current Mercurial book (by Bryan O'Sullivan) says, that Mercurial stores diffs also for binary files. How efficient this is, obviously depends on the nature of changes to binary files.
I'm using Mercurial with TortoiseHg. Each developer has their own repositories, and there's one central repository on the server for synchronizing our changes. (This will sound lame, but we're using it to manage the source for a legacy VB6 project. Nothing we can do about that...)
As has been pointed out elsewhere, there is a big problem in VB6 with merging the .frx (form resources) files. So code changes seem to merge fine, but if two developers both make changes at the same time in the form design view, we can't merge.
I'm ok with disallowing concurrent edits, but of course the whole point of Mercurial is that it's distributed so there is no option to force a file to be locked before editing. I don't believe there's a Mercurial solution for this, so I'm wondering: other developers who are using Mercurial for version control, do you have some 3rd party tool that assists with locking files for editing in the cases where it's necessary? Did we make a mistake using Mercurial instead of something like SVN?
Heard of some people using a standalone lock-server (this one in particular).
This is from Bryan O'Sullivan's book on Mercurial:
There is no single revision control
tool that is best in all situations.
As an example, Subversion is a good
choice for working with frequently
edited binary files, due to its
centralised nature and support for
file locking.