How to merge old Mercurial commits into one - mercurial

I'm interested in reducing the size of the .hg folder in my project. The .hg folder is about 180 MB, but the actual source code of the project is only about 6 MB, so I need a way to shrink the .hg folder.
The Mercurial history contains a lot of binary and image files.
I have searched for solutions, but none are exactly what I need. I don't need to remove a specific file; I need to remove old commits. For example:
Revisions 0...1360 take up more than 150 MB, with 6 MB of real sources. I would like something like keeping only a small number of commits (1~50), at about 10 MB in total. This is a cloned repository, and I want to keep the full history on my server, so the clone with ~10 commits needs to stay related to the server repository with 1360 commits. Is there any way to do this?

It sounds like you have both source code and your binaries committed to the repository. One option is to create a second repository for only "published/compiled" code and remove anything compiled from the first repository. Use the new repository for deploying updated code to your server. This should come with a smaller .hg folder on the server.
Either way, with a DVCS you cannot "remove" history without actually losing it. You do have the option of completely deleting the .hg folder, but if you want to use Mercurial to update that folder and keep it in sync with the latest changes in the repository, you'll need to keep the .hg folder around.
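Before deciding how to split things up, it can help to confirm which files are actually taking the space. The sketch below is a minimal POSIX-shell example that lists the largest files under .hg/store/data, where Mercurial keeps the per-file history (revlogs) of tracked files; large .i/.d files there usually correspond to committed binaries. The function name largest_revlogs is hypothetical, store file names are encoded (so they may look slightly mangled), and the sizes come from du, so they are KiB of disk usage rather than exact byte counts.

```sh
#!/bin/sh
# Hypothetical helper: list the largest files in a repository's store.
# Per-file history (revlogs such as foo.png.i / foo.png.d) lives under
# .hg/store/data, so committed binaries tend to show up at the bottom.
largest_revlogs() {
  local repo="${1:-.}"    # repository root (default: current directory)
  local count="${2:-10}"  # how many entries to print
  # du -k prints "<KiB on disk><TAB><path>"; sort by size, keep the largest.
  find "$repo/.hg/store/data" -type f -exec du -k {} + |
    sort -n |
    tail -n "$count"
}
```

Run from the repository root, `largest_revlogs . 20` next to `du -sh .hg` should show whether a handful of binaries account for most of the .hg size.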

Related

Is there any side effect if I clone a repo and rename it?

I have a central Mercurial repository server. I cloned repoA to my local system,
initialized a new repoB on the central server, cloned repoB locally, copied everything from repoA to repoB, committed, and pushed to repoB (on the central server).
Now I have all the changeset history from repoA in the new repoB.
I needed to do this because repoA contained the code of two applications; to separate them, I did the above experiment, and it works.
My question is: are there any side effects of doing this, or is there a better (recommended) way to do it? Please advise, thank you!
When you clone a repository to your local PC, the repository lives in a folder. The name of that folder is typically how people refer to the "name of the clone" or the "name of the repository".
Other than that, the folder name itself has very little significance and is not even properly part of the Mercurial repository.
It sounds like you did several other steps, but basically if you renamed repository A to B it won't make much difference (but see notes below).
You do not need to use hg clone to clone a repository. You can literally just copy the entire repository folder and the copy will work just fine. The one difference that I am aware of between using clone and an operating-system file copy is that the clone will point back to the repo you cloned from (for use in push/pull operations), whereas the copy keeps the original's configuration and so points back to the original's source. (See the notes below about some related effects.)
One situation where renaming the repository folder might cause problems is if you have cloned FROM it. Example: you have local repo A, and you clone A to B. B's configuration data now contains a reference to the folder path, including A. If you rename A to A1, that path is obviously broken.
In such a situation you can easily edit the B/.hg/hgrc file and modify the line starting with default= to correct the path.
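A minimal sketch of that edit as a POSIX shell function. The helper name repoint_default is hypothetical; it rewrites only a line of the form `default = ...` (which in a typical clone appears only in the [paths] section), and it assumes the new path contains no |, & or \ characters, because the path is substituted into a sed expression.

```sh
#!/bin/sh
# Hypothetical helper: point a clone's "default" path at a new location,
# e.g. after the repository it was cloned from has been renamed.
# Only lines of the form "default = ..." are rewritten ("default-push" is
# left alone). NEW_PATH must not contain '|', '&' or '\', because it is
# substituted into a sed expression.
repoint_default() {
  local repo="$1" new_path="$2"
  local hgrc="$repo/.hg/hgrc"
  if [ ! -f "$hgrc" ]; then
    echo "no such file: $hgrc" >&2
    return 1
  fi
  # Write to a temporary file, then move it into place; this avoids
  # sed -i, whose syntax differs between GNU and BSD sed.
  sed "s|^default[[:space:]]*=.*|default = $new_path|" "$hgrc" > "$hgrc.tmp" &&
    mv "$hgrc.tmp" "$hgrc"
}
```

For example, after renaming A to A1, `repoint_default B "$PWD/A1"` followed by `hg paths -R B` should list the corrected default path.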
Based on your question, it sounds like you copied a bunch of stuff from one repo to another. Presumably this also included the .hg folder. Generally speaking, I recommend staying out of that folder and always approaching it with caution.
Although technically some of it is human-readable it is simpler & safer to treat it as a black box, or you risk corrupting your repository. There are occasional exceptions (like hgrc) but they are few & far between.
Of course if you are just trying to learn how it works then by all means try things & see what happens! One of the great Mercurial features is the ability to copy a repo, mess around with it, and throw it away when done.

"factorizing" a mercurial repository on kiln

Summarized questions:
What is the simplest (and best) way to shift a group of files from an existing repository to a new sub repository, so those files can be integrated with other parent repositories, some of which may not yet exist?
Do files in subrepositories need to be in discrete folders, or can they exist alongside other files?
Detailed Questions:
I have begun the process of creating multiple repositories representing several projects that have shared components, and that is going well, thanks to SO and some helpful answers to my question here
As I move on to adding a second project I notice there are a few files in my projects that are duplicated, and are essentially the same thing, with enough similarity to warrant taking them out of a main project repository and creating a new subrepository so they can be
used by any new projects I begin, and
removed from other existing repositories, since they are identical.
I am assuming the best way is to simply create a new repository, move the files across on the local file system, push both repositories, and then create a .hgsub file and proceed as in the answer to my earlier question. This would obviously then shift the files concerned to a subfolder in the local file system under each main project, which I can live with, but it does raise the hypothetical question: is it possible to have a list of files in a repository that are effectively part of a subrepository but reside alongside other files (i.e. not in a subfolder)?
If I wanted to (for example) have an "acme.h" file in each project that is part of another repository, could I do this? As it happens, I don't need to do this at this point in time, and in my current situation it would be better from a design point of view to have the files I need to "refactor" into another repository in their own subfolder; however, that might not always be the case. I use "refactor" in quotes here because, strictly speaking, it's more about refactoring duplicated files than refactoring code; however, the same principle applies.
Hopefully my questions are succinct enough to be answered without too much more explanation.
Thanks for summary, makes it much easier to answer!
What is the simplest (and best) way to shift a group of files from an existing repository to a new sub repository, so those files can be integrated with other parent repositories, some of which may not yet exist?
You can use the convert extension to extract a directory from an existing Mercurial repository. You'll want to use the --filemap flag and in the filemap you include the directory you want and rename it to the root. See hg help convert for more info.
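As a sketch of the filemap described above, the hypothetical helper below writes a two-line filemap: `include` keeps only the chosen directory, and `rename ... .` moves it to the root of the converted repository (`hg help convert` documents '.' as the way to rename into the root). It assumes the directory name contains no spaces, and the names in the example comment are placeholders. The result is a new repository with new changeset IDs.

```sh
#!/bin/sh
# Hypothetical helper: write a filemap for `hg convert` that keeps only one
# directory and moves it to the root of the converted repository.
# Assumes the directory name contains no spaces.
write_subdir_filemap() {
  local dir="$1" out="$2"
  {
    printf 'include %s\n' "$dir"    # keep only files under this directory
    printf 'rename %s .\n' "$dir"   # '.' renames it to the repository root
  } > "$out"
}

# Example (placeholder names; needs the bundled convert extension):
#   write_subdir_filemap shared/common filemap.txt
#   hg --config extensions.convert= convert --filemap filemap.txt project-repo common-repo
#   hg -R common-repo update        # check out the converted files
```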
After you get a smaller repository with the
Do files in subrepositories need to be in discrete folders, or can they exist alongside other files?
They must be in their own folders. This is simply because that's what a repository looks like in Mercurial, Git, Subversion, ... When you're dealing with subrepositories, Mercurial is not tracking the files inside the subrepo: it's just asking some (other) system to make a checkout of repository foo at some location.
So when your .hgsub file has
foo = foo
bar = [git]bar
baz = [svn]baz
then Mercurial will notice this on hg update and run
hg clone default-path-of-this-repo/foo foo
git clone default-path-of-this-repo/bar bar
svn checkout default-path-of-this-repo/baz baz
for you. This explains why subrepositories are directories in the outer repository: that's simply what a clone/checkout looks like these days.
As you can see, subrepositories can be of different types. It's conceivable that someone could add an RCS subrepository type for tracking individual files. They would then not have to live in a directory.
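To make the .hgsub-to-checkout mapping concrete, here is a toy POSIX-shell function that reads a .hgsub file and prints the command corresponding to each entry, in the same form as the list above. It is an illustration only: it runs nothing, BASE_URL is a stand-in for the parent repository's default path, and it ignores .hgsubstate, which is where Mercurial records the revision each subrepository should be updated to.

```sh
#!/bin/sh
# Illustration only: print, for each .hgsub entry, the checkout command that
# corresponds to it. Mercurial does the real work itself on `hg update`;
# this function runs nothing and ignores .hgsubstate.
# Usage: subrepo_commands HGSUB_FILE BASE_URL
subrepo_commands() {
  local hgsub="$1" base="$2" path src kind
  # IFS=' =' splits "path = source" at the '=' and drops the spaces around it.
  while IFS=' =' read -r path src || [ -n "$path" ]; do
    case $path in ''|'#'*) continue ;; esac   # skip blank lines and comments
    case $src in
      \[*\]*)                    # "[kind]source", e.g. "[git]bar"
        kind=${src#?}            # drop the leading '['
        kind=${kind%%]*}         # keep the text before ']'
        src=${src#*]} ;;         # keep the text after ']'
      *) kind=hg ;;              # no prefix: a Mercurial subrepository
    esac
    case $kind in
      hg)  echo "hg clone $base/$src $path" ;;
      git) echo "git clone $base/$src $path" ;;
      svn) echo "svn checkout $base/$src $path" ;;
      *)   echo "unsupported subrepository type: $kind" >&2 ;;
    esac
  done < "$hgsub"
}
```

With the three-line .hgsub above and a base of https://example.com/main, the first line printed is `hg clone https://example.com/main/foo foo`.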

Are deleted files still downloaded with an hg clone?

In our Mercurial repo we added a really big file (and did an hg push), then deleted the big file (and did another push).
Now if someone does an hg clone will they still pull down that big file? I know it won't appear in their working directory as it was deleted, but will the file still be pulled down and stored in Mercurial internal storage?
I'd like to ensure people don't have to pull down the file. I've learned that really big files should be stored outside of Mercurial, so I deleted the file. But I was wondering if people will still be pulling down the big file - in which case I guess I will recreate the repository from scratch.
Of course it will still be in the repository.
You can always update back to older revisions, and if you update back to the revision you got when you committed the file, it'll be there in all its glory.
There are two ways to mitigate this (when you're committing, not now):
Use one of the big-files extensions. These essentially store big files in a secondary repository and link the two, so a big file is only downloaded when you update to a revision that contains it and you don't already have it; i.e. it's more of an "on-demand" style of pulling.
If the file never changes, keep it available on the network and just create some kind of link to it instead of a full copy
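For the first option, largefiles is the extension of this kind that ships with Mercurial. A minimal sketch, assuming you want to enable it for a single repository by appending to its .hg/hgrc; the helper name enable_largefiles is hypothetical. Generally both the clients and the server need the extension enabled, and it only affects files added with `--large` from then on.

```sh
#!/bin/sh
# Hypothetical helper: enable the largefiles extension (bundled with
# Mercurial) for a single repository by appending to its .hg/hgrc.
enable_largefiles() {
  local repo="${1:-.}"
  local hgrc="$repo/.hg/hgrc"
  if [ ! -d "$repo/.hg" ]; then
    echo "$repo is not a Mercurial repository" >&2
    return 1
  fi
  # Nothing to do if largefiles is already listed.
  if [ -f "$hgrc" ] && grep -q '^largefiles[[:space:]]*=' "$hgrc"; then
    return 0
  fi
  # A section header may appear more than once in an hgrc, so appending a
  # new [extensions] block is fine even if one already exists.
  printf '\n[extensions]\nlargefiles =\n' >> "$hgrc"
}

# Example (placeholder names):
#   enable_largefiles myrepo
#   cd myrepo && hg add --large video.mp4 && hg commit -m "Add video as a largefile"
```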
Right now, you have four options:
Strip away the changeset that added the file, and all the changesets that came after it. You can do that using the Mercurial Queues extension. Note that you need to do this stripping in all clones. If just one of your users pushes a repository that still has that file in its history back to the central clone, you have the changesets back.
Rebuild the repository from scratch manually
Use the hg convert command with some filtering; the --filemap option can be used for this
Leave it as is. How big is it? Will it really be much of a problem?
Note that rebuilding the repository, either manually or through hg convert will invalidate all clones. Anyone trying to push to your new central clone from an old clone will get a message about unrelated repositories. If any of your users are stupi^H^H^H^H^Hnot smart enough to realize that forcing the push is a bad idea, then you will have problems with this approach.
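For the hg convert route, the filter is a filemap with one `exclude` line per unwanted path. The helper below is a hypothetical sketch that writes such a file; the paths in the example comment are placeholders. As noted above, the converted repository is unrelated to the original, so every existing clone has to be replaced.

```sh
#!/bin/sh
# Hypothetical helper: write a filemap that drops the given paths from
# every revision when used with `hg convert --filemap`.
# Usage: write_exclude_filemap OUT PATH...
write_exclude_filemap() {
  local out="$1" path
  shift
  : > "$out"                               # start with an empty filemap
  for path in "$@"; do
    printf 'exclude %s\n' "$path" >> "$out"
  done
}

# Example (placeholder names; produces an UNRELATED repository):
#   write_exclude_filemap filemap.txt assets/huge-video.mov
#   hg --config extensions.convert= convert --filemap filemap.txt old-repo new-repo
```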
Yes, the file is still in the history. If you want to delete it completely, you need to use Mercurial Queues; see Editing History on the Mercurial wiki.
Just keep in mind this breaks clones as revision IDs change.

subrepo, hg clone and symlinks

I'm quite new to mercurial, I've read a lot on this topic but I've been unable to find a clear answer.
The mercurial guide says: "For efficiency, hardlinks are used for cloning whenever the source and destination are on the same filesystem (note this applies only to the repository data, not to the working directory)."
The Repository wiki page says: "All of the files and directories that coexist with the .hg directory in the repository root are said to live in the working directory".
Now, to "link" a subrepo in a main repo I do:
hg init main
cd main
echo subrepo = ../subrepo > .hgsub
hg clone ../subrepo subrepo # (1)
hg add
hg ci -m "initial rev of the main repo"
Does the definition above mean that I'm actually creating a copy of subrepo when I perform (1)? Or am I just creating a symlink to ../subrepo? According to the output of ls, it is an actual copy, but that seems so strange to me... If someone could shed some light on this subject, I'd appreciate it.
First of all, I'm not an expert on that part of Mercurial, but here's what I've understood.
No, you didn't create a link to the whole directory. Instead, files were hardlinked inside it.
This means that space on disk is reserved to keep your directory structure separate, but the files are all identical, because they were just cloned, so they are constructed as links back to the original.
When you start manipulating the repository, through your add or commit (ci) commands, then the hardlinks are broken by Mercurial and separate files are constructed for each, on demand.
Now, this is purely a technical thing, you don't need to know or care about this. If it makes it easier, just think of a clone as a complete copy of the original repository, separate files and all that. The hardlink part is just to save diskspace for the things that are the same.
Since a typical project has many files, and a typical changeset only changes a few files, and a typical reason to clone is that you're going to do a fixed set of changes, hardlinks make sense, since many of the files in the repository directories will be 100% identical to their original for the lifetime of the repository.
For those that aren't, all of that is silently handled by Mercurial for you.
Let us start by looking at what happens when you clone, leaving subrepositories aside for now. When you do
$ hg clone A B
then Mercurial will make hard links for the files inside A/.hg/store/data. So if a file called x is tracked, then after the clone you will see that
A/.hg/store/data/x.i
and
B/.hg/store/data/x.i
are hard linked -- this means that the two filenames really refer to the same file. As Lasse points out, this is smart since you might never commit a change to x in the clone, and so there is no reason to make two different x.i files for the A and B clones. Another advantage is that it is much faster to make a hard link than to copy a file, especially if x.i is very large: the hard link is a constant-time operation.
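To check this yourself, compare the two store files: `ls -li` shows inode numbers and link counts, and the shell's `-ef` test is true when two paths refer to the same device and inode. The helper below is a hypothetical sketch built on `-ef`; note that `-ef` also follows symbolic links, so it confirms that the two names share one file but not that the link is specifically a hard link.

```sh
#!/bin/sh
# Hypothetical helper: report whether two paths refer to the same file
# (same device and inode), which is what a hard link gives you.
same_file() {
  if [ "$1" -ef "$2" ]; then
    echo "same file:  $1 <-> $2"
  else
    echo "separate:   $1 / $2"
    return 1
  fi
}

# Example, after `hg clone A B` on a single filesystem:
#   same_file A/.hg/store/data/x.i B/.hg/store/data/x.i
```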
In your example above you are adding a subrepository subrepo to the main repository. A subrepository consists of two things:
the subrepository itself. This is what you create when you do
$ hg clone ../subrepo
the subrepository meta data. This is what you store in the .hgsub file. You must tell Mercurial where you want the subrepository and where Mercurial can clone it from.
You ask if you copy or symlink the repository, and you certainly copied (cloned) it, as you have also confirmed with ls. Afterwards you added some meta data to Mercurial that tells it where it can expect to find the subrepository. This has nothing to do with a symbolic link in the normal filesystem sense, it is just some meta data for Mercurial.

Can one Mercurial repository live inside another Mercurial repository?

Can one hg repo live inside another hg repo on my local file system?
I am pulling down the bitbucket wiki for 'sandbox', and I want to know if this should be placed in repos/sandbox/wiki or repos/sandbox-wiki.
Is the former okay to do?
Edit: See Subrepository.
The short answer is yes, but I can't imagine why you would want to.
In your example, I think you should go with:
repos/sandbox-wiki
[edit] Additionally:
Yo Dowg, I herd you like repositories.
So we put a repo in your repo so you can version while you version
:-)
Yes and no. It depends on what you want to do. You can create the repo 'sandbox/wiki', but files in this inner repo won't be committed in the outer 'sandbox' repo (#Jason is right). If you don't want them to be, no problem.
Try explicitly adding files from the wiki repo in sandbox and you'll get the message below. If you just add the path of a directory containing an inner repo, its files will simply be ignored.
From the sandbox root directory:
hg add wiki/myfile
abort: path 'wiki/myfile' is inside repo 'wiki'
Mercurial does not allow nested repositories, but there is at least one reason to want them:
Imagine that you are working in a project: /MyProject. In this folder you put everything: code, documentation, tests, etc.
You want to back up your work because it is very important, so you create a repository for /MyProject. Then, over time, you use bundles to save the evolution of /MyProject and back them up to a USB flash drive so that you can recover everything in case your hard drive breaks.
Remember that /MyProject contains everything. And among all those things, there are the main code and some auxiliary projects. You also want to track the progress of an auxiliary project that is in /MyProject/AuxiliaryProject, so you use Mercurial to track its evolution.
Also, you want to have a separate repository for the main code: /MyProject/Main
In this situation you want nested repositories: one big one for being able to back-up everything using bundles and child repositories for managing each subproject.
I think Mercurial should give the user several options when initializing a repository. For example:
- ignore nested repositories
- include nested repositories but ignore .hg folders (i.e. act as if there were no nested repositories, but do not ignore the information contained in the nested repositories).
- include nested repositories and also include .hg folders (makes sense for backup purposes)
--------- Edit:
Subrepositories are a feature that is a work in progress:
https://www.mercurial-scm.org/wiki/subrepos
Also, there is an extension named "forest" that might become obsolete in the future:
https://www.mercurial-scm.org/ForestExtension
You'd need to set up an .hgignore file in sandbox to exclude wiki, because Mercurial assumes that it is responsible for all descendants. This would probably generate more user confusion than it is worth.
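If you do keep wiki inside sandbox, here is a minimal sketch of the .hgignore entry this answer suggests, written as a hypothetical shell helper. It appends an anchored regular expression for the nested directory (skipping it if already present). It assumes the directory name contains no regular-expression metacharacters. Remember to commit the .hgignore file in the outer repository.

```sh
#!/bin/sh
# Hypothetical helper: append an ignore rule for a nested repository
# directory (e.g. "wiki") to the outer repository's .hgignore.
# Assumes DIR contains no regular-expression metacharacters.
ignore_nested_repo() {
  local outer="$1" dir="$2"
  local ignore="$outer/.hgignore"
  local rule="^$dir/"          # anchored, so only the top-level directory
  if [ -f "$ignore" ] && grep -qxF "$rule" "$ignore"; then
    return 0                   # rule already present
  fi
  printf 'syntax: regexp\n%s\n' "$rule" >> "$ignore"
}

# Example (paths from the question above):
#   ignore_nested_repo repos/sandbox wiki
#   cd repos/sandbox && hg add .hgignore && hg commit -m "Ignore nested wiki repo"
```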