I just started using Mercurial yesterday (I don't have much programming experience). I noticed that when I rename a 200MB file, the repository grows by 200MB, although ideally its size should not change.
Is this a bug or weakness of Mercurial? Is there any chance it could be fixed or improved in the future?
Update:
I have just tried TortoiseGit 64bit version on Windows 7 64bit. It didn't create duplicate contents when renaming files. But it seems once I renamed a file, its history was lost.
Update 2:
See tonfa's comment below. From Mercurial wiki - GSoC Ideas 2010:
Project Ideas
Lightweight copies/renames
(very difficult - a successful student
will become an expert in Mercurial's
storage format and transmission
protocol)
Copies and renames currently are not
too efficient. Mercurial copies the
copied/renamed source file to the new
initial revision of the target file in
its internal history store. For
renames, this is especially
counter-intuitive, as renaming a large
file grows the store by the file's
size. It would be better if Mercurial
had some way of referring to the
existing revision from the new file,
while preserving backwards
compatibility and bounded I/O
guarantees for retrieving revisions.
See issue883 for discussion.
There's an mq from an old attempt at
this located here.
Contact: mpm, tonfa, cyanite
No, it is not a bug. Renaming in Mercurial removes the file at the old location and creates it at the new one (while keeping a reference to the original, for merge and logging purposes).
So, at least for now, there is nothing you can do about it.
Related
This might be a noob question, but I'm really torn about whether to add documents to my repository, in this case Mercurial.
By documents I mean files that don't really go into your program, like PSD, doc, and xls files.
What's the best way to handle those files, or how do you handle your documents?
Take a look at the Largefiles extension that shipped with Mercurial 2.0 (with bugfixes since). It's designed to treat files that are binary and update rarely in a different, more efficient way.
Basically it stores those files without trying to compute diffs between versions, and anybody cloning the repo just gets the versions they need, and not all the history. This leads to faster cloning / pulling, but updates may need a connection to the remote repository to read versions of files into the local cache.
I toss them in my repository. It's nice to track changes of them and see old revisions anyway. I can see old revisions of a design document or see what the previous art was for an asset (maybe a graphic designer removed the alpha channel and he/she wasn't supposed to). Throw it in there. If it doesn't change, it's not taking up any more space with a good source control system than storing it outside of source control.
I understand that in Mercurial you can never remove the history of a file unless you do something like this. Is there any way to prevent history from ever being created for certain files? If any other version control system is capable of doing that, please mention it as well.
Why would I want that? Well, in our build system, new binaries are constantly being committed which the non-programmers can use to run the program without compiling every time (the compilation is done by the build system). Each time new binaries are committed, the old ones are useless as far as we are concerned. It is unnecessarily taking up space. If the new binary messes up by any chance, we can always revert back to older source and rebuild (assuming there is a way to disable history for specific files).
As you found out, you cannot do what you want directly in Mercurial.
I suggest you put the binaries somewhere else -- a Subversion subrepo would be a good choice. That way you will only download the latest version of each file on the client, but you will have all versions on your server (where it should be easy to add more disk space).
From Mercurial wiki - GSoC Ideas 2010:
Project Ideas
Lightweight copies/renames
(very difficult - a successful student
will become an expert in Mercurial's
storage format and transmission
protocol)
Copies and renames currently are not
too efficient. Mercurial copies the
copied/renamed source file to the new
initial revision of the target file in
its internal history store. For
renames, this is especially
counter-intuitive, as renaming a large
file grows the store by the file's
size. It would be better if Mercurial
had some way of referring to the
existing revision from the new file,
while preserving backwards
compatibility and bounded I/O
guarantees for retrieving revisions.
See issue883 for discussion.
There's an mq from an old attempt at
this located here.
Sorry if this is an obvious question (I'm not good at English or programming). I'm wondering, what does "Lightweight copies" mean?
Does it mean that, once this feature is implemented, multiple files with the same content (same hash value, different file names) will be stored only once in the repository (just like Git)?
Update:
Thanks everyone for your answers. One of Mercurial's developers, tonfa, also answered this question in a comment on this answer:
caveman: When light-weight copies are
implemented, will two files with same
content (same hash value different
names) store only once in repository
(just like Git)?
tonfa: no, this feature isn't planned
(it would break other optimizations to
minimize disk access)
Right now, when you copy a file, a new file is created in the repository that contains a compressed snapshot of the file you just copied. The idea would be to set it up so the copy references the old file somehow and then has revlog entries based on that instead of having to have its own snapshot to base the revlog entries off of.
This will not be like how git works. Changing Mercurial to work that way would be really interesting, and not the easiest proposition.
I'd rather say that a copied/renamed file would no longer take up twice the space, as it does now, but would just point to the same revision.
I'm not sure this would hold for files that are added separately with the same content. According to the description, they would be treated as completely independent files and would occupy twice the space.
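To make the difference between the answers above concrete, here is a toy sketch in Python. It is not Mercurial's or Git's actual storage format; the class names `PerPathStore` and `ContentAddressedStore` are made up for illustration. The first keeps one history per file name and writes a full snapshot on rename (roughly the behaviour described for Mercurial), the second keys data by content hash so a rename only moves a pointer (roughly the Git-like behaviour asked about).

```python
import hashlib


class PerPathStore:
    """Toy model: one history per file name; a rename copies the data."""

    def __init__(self):
        self.histories = {}  # path -> list of (data, copied_from)

    def add(self, path, data, copied_from=None):
        self.histories.setdefault(path, []).append((data, copied_from))

    def rename(self, old, new):
        data, _ = self.histories[old][-1]
        # The new name starts its own history with a full snapshot, plus a
        # reference back to the source for log/merge purposes. The old
        # history stays in the store.
        self.add(new, data, copied_from=old)

    def size(self):
        return sum(len(d) for revs in self.histories.values() for d, _ in revs)


class ContentAddressedStore:
    """Toy model: data keyed by content hash; names are just pointers."""

    def __init__(self):
        self.blobs = {}  # sha1 hex digest -> data
        self.names = {}  # path -> digest

    def add(self, path, data):
        digest = hashlib.sha1(data).hexdigest()
        self.blobs.setdefault(digest, data)  # identical content stored once
        self.names[path] = digest

    def rename(self, old, new):
        self.names[new] = self.names.pop(old)  # no data is copied

    def size(self):
        return sum(len(d) for d in self.blobs.values())


payload = b"x" * 1000

per_path = PerPathStore()
per_path.add("big.bin", payload)
per_path.rename("big.bin", "renamed.bin")

by_content = ContentAddressedStore()
by_content.add("big.bin", payload)
by_content.rename("big.bin", "renamed.bin")

print(per_path.size(), by_content.size())  # 2000 1000
```

In the per-path model the rename doubles the stored bytes, which matches what the original question observed; the content-addressed model does not grow, but, as tonfa notes, that design is not what is planned for Mercurial.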
My team is switching to Mercurial. Our projects all have a config file (web.config or app.config, and a few bat files as well - we are a C# shop). These files need to be part of the repository. When a developer clones the repository, local changes are needed to their config files to get them working. For example, a project's config file may need a connection string to the developer's database, or other environment-specific info. We don't want these changes ending up in the repository. And from time to time we do make changes to these configs that do need to get into the repository and distributed to the team and eventually the customer.
What is the easiest way for us to configure or use Mercurial so that these files are not getting committed by accident? I would like to be forced to make an explicit commit of such files, yet merges from the repo would automatically come down in updates.
This has to be a problem someone else has faced, but as Mercurial newbies we are all at a loss for the best solution.
Edit:
A similar question that may share some common solutions, but is not the same as this question, can be found at: Conditional Mercurial Ignore File
I am including this in case that other question might provide the answer you are looking for.
The typical way to handle this is to store templates for the configuration files in your repository, and add the actual configuration files to the ignore list in Mercurial.
This way, you have pristine, unmodified copies of each configuration file available at all times, even for new developers who clone from scratch. To make a configuration file usable, you copy the template to the actual configuration file name and modify that copy. You could also use a compare/merge program, such as Beyond Compare, to compare a new version of the template file with your local copy of an older version, to see what changed, and add in the missing bits.
If you need to strictly prevent committing the actual configuration files, you need a pre-commit or pre-push hook that does this.
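A hook along those lines might look like the following minimal sketch of an in-process Python hook. Several things here are assumptions and not from the answer above: the protected path list, the file name `protect.py`, and the environment variable `ALLOW_CONFIG_COMMIT` used as an explicit override are all hypothetical. It would be registered in `.hg/hgrc` under `[hooks]` with something like `pretxncommit.protect = python:/path/to/protect.py:hook`; check the hook documentation of your Mercurial version before relying on it.

```python
import os

# Hypothetical list of protected, repo-relative paths (forward slashes).
# Mercurial on Python 3 passes file names as bytes.
PROTECTED = {b"Projectname/web.config"}


def find_protected(changed_files, protected=PROTECTED):
    """Return the changed files that are on the protected list, sorted."""
    lowered = {p.lower() for p in protected}
    return sorted(f for f in changed_files if f.lower() in lowered)


def hook(ui, repo, node=None, **kwargs):
    """pretxncommit hook: returning True makes Mercurial roll the commit back."""
    # Hypothetical override so a deliberate config change can still be committed.
    if os.environ.get("ALLOW_CONFIG_COMMIT") == "1":
        return False
    blocked = find_protected(repo[node].files())
    if blocked:
        ui.warn(b"refusing to commit protected files: %s\n" % b", ".join(blocked))
        return True
    return False
```

With this in place, an accidental commit that touches `Projectname/web.config` is rejected, while setting `ALLOW_CONFIG_COMMIT=1` for a single command lets an intentional change through. The comparison is case-insensitive because the paths in question live on Windows.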
In your .hg/hgrc file do this:
[defaults]
commit = -X Projectname/web.config
(assuming "Projectname" is the project subdirectory)
Edit:
Also, if you're using TortoiseHg, add this as well:
[tortoisehg]
ciexclude = Projectname/Web.config,Projectname/App_Data/DBFile.mdf
(By the way, mind the FORWARD slash in the folder path, even on Windows!)
I am writing a set of Django apps and would like to use Hg for version control. I would like each app to be independent of the others, so each app may have its own directory for static media, containing images that I would not want under version control. In other words, the binary files would not all be in one central location.
I would like to find a way to clone the repository that would include copies of the image files. It also would be great if when I did a merge, if there were an image file in one repo and not another, that there would be some sort of warning.
Currently I use a python script to find images and other binary files that are in one repo, but not the other. But a lot of people must face this problem, so there must be a more robust and elegant solution.
One other thing: for reasons I do not want to go into, usually one of my repos is on a Windows machine and the other is on Linux. So a cross-platform solution would be nice.
Since Mercurial 2.0, the largefiles extension is included in the main distribution. That extension keeps and manages large files outside of the "normal" repository, so that you get the benefits of a DVCS without the drawback of exponential growth in size and processing time.
Other extensions that work along similar lines are SnapExtension and BigFilesExtension. However, those two are not distributed with Mercurial (you have to get them manually).
Mercurial can track any kind of file. For binary files, if something changes, the whole file gets replaced, not just the changes.
As for getting a warning if one repo doesn't contain a file: the point of a DVCS is that the repos are related but autonomous. You could always check which files were added during a sync or merge operation.
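For the kind of check the question describes (finding binary files that exist in one working copy but not the other), a small cross-platform sketch could look like this. It is only an illustration: the extension list `MEDIA_SUFFIXES` and the function names are made up, and it compares working directories on disk, not repository history.

```python
from pathlib import Path

# Hypothetical set of extensions treated as "binary media" in this sketch.
MEDIA_SUFFIXES = {".png", ".jpg", ".gif", ".psd"}


def media_files(root, suffixes=MEDIA_SUFFIXES):
    """Return root-relative POSIX paths of media files under root, skipping .hg."""
    root = Path(root)
    found = set()
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if ".hg" in rel.parts:
            continue  # ignore Mercurial's own metadata directory
        if path.is_file() and path.suffix.lower() in suffixes:
            # as_posix() gives forward slashes on both Windows and Linux.
            found.add(rel.as_posix())
    return found


def compare_trees(left, right):
    """Return (only_in_left, only_in_right) as sorted lists of relative paths."""
    a, b = media_files(left), media_files(right)
    return sorted(a - b), sorted(b - a)
```

Because paths are normalized to forward slashes, the two file lists can be produced separately on a Windows machine and a Linux machine and compared afterwards; `compare_trees` itself assumes both trees are reachable from the same machine.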
The current Mercurial book (by Bryan O'Sullivan) says that Mercurial also stores diffs for binary files. How efficient this is obviously depends on the nature of the changes to the binary files.