ignore certain files on HG pull - mercurial

Thanks for reading about my situation.
My situation is this:
RepoA (7GB) - I have read access
RepoB (7GB - Forked from RepoA) - I have read/write access, but no admin access, and cannot fork.
I did not set up any of these, but am now the admin of the source code.
Currently a company is doing work in RepoA, and we pull it locally and push it into RepoB, where we are also doing work. This works great, but the repositories are so large that it is very annoying.
RepoA has some of the content; RepoB has everything.
I created RepoC, which is a clone of B minus all of the gigs of assets.
RepoC - cloned from RepoB with hg convert --filemap map.txt RepoB small_clone/RepoC
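For reference, the map.txt passed to hg convert is a plain-text filemap: one include, exclude or rename directive per line, with paths relative to the repository root. A minimal sketch, assuming the assets live in two hypothetical top-level directories called assets and media (substitute the real paths):

```shell
# Write a filemap for hg convert that drops the bulky asset directories.
# "assets" and "media" are hypothetical placeholders for the real paths.
cat > map.txt <<'EOF'
exclude assets
exclude media
EOF
```

The conversion is then run exactly as above: hg convert --filemap map.txt RepoB small_clone/RepoC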
I now have a very nice manageable 300 MB repository in RepoC (Good). Now RepoB is useless to me.
However, when someone makes a change in RepoA, I want that change in RepoC. In the past I had RepoA in my hgrc and simply did an hg pull RepoA. But when I do that now from the small repo (RepoC), it brings in all the assets I specified I did not want in RepoC.
Is there a way I can continue to take updates from RepoA while ignoring certain directories? Or, after I hg pull from RepoA, is there a way to prune out any new files I do not want before pushing to RepoC? I have not run hg update, but the gigs of files are already showing up in .hg/store/data. I am wondering if I can somehow yank them out of there, then hg update, then commit, then push.
Any advice?
Thank you!

There is no direct way: Mercurial needs to know the complete history, and that includes all the big assets (there is no shallow clone yet).
However, you might consider using one of the extensions that make it easier to deal with huge files or many subprojects:
largefiles: LargefilesExtension
subrepository: https://www.mercurial-scm.org/repo/hg/help/subrepos
instead of subrepositories, guest repositories (third-party): GuestrepoExtension
If these changes to the workflow don't seem like the right approach (they all have their rough edges, so check them carefully), your only option is to export the patches and import them into the other repository, either manually or with a script. Within repoC, something like
hg export -R path/to/repoB -r XXX | patch -p1 might do the trick, but only one revision at a time (hg export writes a/ and b/ path prefixes, hence -p1).
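A scripted version of that idea might look like the sketch below. It is a variation on the command above rather than the answer's exact recipe: it uses hg diff -c with -X to leave out one excluded directory (so the patches never touch files that were filtered out of repoC), applies each diff with hg import --no-commit, and then commits with the original description, author and date. The function name port_revs and its arguments are hypothetical, and merge changesets would still need manual handling.

```shell
# Sketch: replay changesets from the full repository into the slim clone,
# leaving out one excluded directory. Run it from the root of the slim clone.
# Arguments (assumptions): full repo path, excluded dir, revisions to port.
port_revs() {
  src=$1; skip=$2; shift 2
  for rev in "$@"; do
    # Git-style diff of this changeset against its first parent, minus assets.
    hg diff -R "$src" --git -c "$rev" -X "path:$skip" > port.diff || return 1
    # Changesets that only touched the excluded directory leave nothing to port.
    [ -s port.diff ] || continue
    hg import --no-commit port.diff || return 1
    # Commit with the original description, author and date.
    hg commit -m "$(hg log -R "$src" -r "$rev" --template '{desc}')" \
      -u "$(hg log -R "$src" -r "$rev" --template '{author}')" \
      -d "$(hg log -R "$src" -r "$rev" --template '{date|hgdate}')" || return 1
  done
  rm -f port.diff
}
```

Usage would be something like port_revs path/to/repoB assets 1234 1235 1236, run from inside repoC.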

Related

Mercurial Repo Living Archive

We have an Hg repo that is over 6GB and 150,000 changesets. It has 8 years of history on a large application. We have used a branching strategy over the last 8 years. In this approach, we create a new branch for a feature and when finished, close the branch and merge it to default/trunk. We don't prune branches after changes are pushed into default.
As our repo grows, it is getting more painful to work with. We love having the full history on each file and don't want to lose that, but we want to make our repo size much smaller.
One approach I've been looking into would be to have two separate repos, a 'Working' repo and an 'Archive' repo. The Working repo would contain the last 1 to 2 years of history and would be the repo developers cloned and pushed/pulled from on a daily basis. The Archive repo would contain the full history, including the new changesets pushed into the working repo.
I cannot find the right Hg commands to enable this. I was able to create a Working repo using hg convert <src> <dest> --config convert.hg.startrev=<rev>. However, Mercurial sees this as a completely different repo, breaking any association between our Working and Archive repos. I'm unable to find a way to merge/splice changesets pushed to the Working repo into the Archive repo while maintaining a unified file history. I tried hg transplant -s <src>, but that resulted in several 'skipping emptied changeset' messages. It's not clear to me why the hg transplant command considered those changesets empty. Also, if I were to get this working, does anyone know whether it maintains a file's history, or will my repo see the transplanted portion as separate, perhaps showing up as a delete/create or something?
Anyone have a solution to either enable this Working/Archive approach or have a different approach that may work for us? It is critical that we maintain full file history, to make historical research simple.
Thanks
You might be hitting a known bug in the underlying storage compression. 6GB for 150,000 revisions is a lot.
This storage issue is usually encountered on very branchy repositories, in an internal data structure that stores the content of each revision. The current fix for this bug can reduce repository size by up to a factor of ten.
Possible Quick Fix
You can blindly try to apply the current fix for the issue and see if it shrinks your repository.
upgrade to Mercurial 4.7,
add the following to your repository configuration:
[format]
sparse-revlog = yes
run hg debugupgraderepo --optimize redeltaall --run (this will take a while)
Some other improvements are also turned on by default in 4.7, so upgrading to 4.7 and running debugupgraderepo should help in all cases.
Finer Diagnostic
Can you tell us the size of the .hg/store/00manifest.d file compared to the full size of .hg/store?
In addition, can you provide us with the output of hg debugrevlog -m?
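To answer the first question without doing the arithmetic by hand, a small helper can compare the two sizes. This is a sketch assuming a POSIX shell with du and awk; the function name manifest_share is hypothetical, and on small repositories the manifest data may be stored inline in 00manifest.i, in which case there is no .d file to measure.

```shell
# Report how much of .hg/store the manifest revlog data file takes up.
# Takes the repository root as its argument (defaults to the current dir).
# Assumes 00manifest.d exists; small revlogs may be inline in the .i file.
manifest_share() {
  store="${1:-.}/.hg/store"
  manifest_kb=$(du -sk "$store/00manifest.d" | cut -f1)
  store_kb=$(du -sk "$store" | cut -f1)
  awk -v m="$manifest_kb" -v s="$store_kb" \
    'BEGIN { printf "00manifest.d: %d KiB of %d KiB (%.1f%%)\n", m, s, 100 * m / s }'
}
```

For example, manifest_share path/to/repo prints one line with both sizes and the percentage.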
Other reasons?
Another reason for repository size to grow is large files (usually binary) being committed to it. Do you have any of those?
The problem is that the hash id for each revision is calculated based on a number of items including the parent id. So when you change the parent you change the id.
As far as I'm aware there is no nice way to do this, but I have done something similar with several of my repos. The bad news is that it required a chain of repos, batch files and splice maps to get it done.
The bulk of the work I'm describing is ideally done one time only and then you just run the same scripts against the same existing repos every time you want to update it to pull in the latest commits.
The way I would do it is to have three repos:
Working
Merge
Archive
The first commit of Working is a squash of all the original commits in Archive, so you'll be throwing that commit away when you pull your Working code into the Archive, and reparenting the second Working commit onto the old tip of Archive.
STOP: If you're going to do this, back up your existing repos, especially the Archive repo, before trying it; it might get trashed if you run this over the top of it. It might also be fine, but I don't want that on my conscience!
Pull both Working and Archive into the Merge repo.
You now have a Merge repo with two completely independent trees in it.
Create a splicemap. This is just a text file giving the hash of a child node and the hash of its proposed parent node, separated by a space.
So your splicemap would just be something like:
hash-of-working-commit-2 hash-of-archive-old-tip
Then run hg convert with the splicemap option to do the reparenting of the second commit of Working onto the old tip of the Archive. E.g.
hg convert --splicemap splicemapPath.txt --config convert.hg.saverev=true Merge Archive
You might want to try writing it to a different named repo rather than Archive the first time, or you could try writing it over a copy of the existing Archive, I'm not sure if it'll work but if it does it would probably be quicker.
Once you've run this setup once, you can just run the same scripts over the existing repos again and again to update with the latest Working revisions. Just pull from Working to Merge and then run the hg convert to put it into Archive.
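Since a typo in a 40-character hash is easy to make, a helper that validates both IDs before writing the splicemap can save a failed conversion. A minimal sketch; write_splicemap is a hypothetical name, and the full hashes can be looked up with hg log -R Merge -r <rev> --template '{node}\n':

```shell
# Write a one-line splicemap of the form "<child> <new-parent>".
# Both arguments must be full 40-digit hex changeset hashes; the output
# file name defaults to splicemap.txt (a hypothetical choice).
write_splicemap() {
  child=$1; parent=$2; out=${3:-splicemap.txt}
  for h in "$child" "$parent"; do
    case $h in
      ''|*[!0-9a-f]*) echo "not a lowercase hex hash: '$h'" >&2; return 1 ;;
    esac
    if [ ${#h} -ne 40 ]; then
      echo "expected 40 hex digits, got ${#h}: $h" >&2; return 1
    fi
  done
  printf '%s %s\n' "$child" "$parent" > "$out"
}
```

For example, write_splicemap <working-commit-2> <archive-old-tip> splicemapPath.txt, followed by the hg convert --splicemap command above.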

Teaching a mercurial repository about a bad rename after it is pushed

We have moves and renames of files in our published mercurial history that were not properly recorded, so that they appear in the history as unrelated deletions and adds.
Is there any way to tell the repository about the connections so that --follow commands can work again?
(For non-pushed changes, here is a question discussing how to get mercurial to properly record moves/renames before you commit, as well as a useful tip here.)
One solution that works but is a bit brutal: You can login remotely on your central Mercurial server, fix the renames there as a local change and then ask everyone to clone the repo again.
This works since all Mercurial repos are equal. You just think of one as "central" but in fact it's a repo like any other. So if you have access to that, you can rewrite history there. The drawback is of course that every developer will notice, so they'll have to export any non-pushed changes they made, clone the repo again and then import the patch.
[EDIT] A possible workaround would be to create a new branch just before you did the bad rename, rename the files properly and then cherry pick all the changes after that into the new branch.
I'm not sure if it's a good idea to merge this branch into your original branch. If you did, then Mercurial would see two kinds of file renames in the past and I'm not sure which one it would follow.
So before you merge, I suggest that you create a small test/demo repo where you reproduce the situation and then try it out.
You can start a new head from before the rename, do the rename properly, then merge it into the head from the upstream server. It will give you a conflict. When this happens, revert the offending files to the revision from your new head (with the good renames; hg revert -r <rev> <files...>).

Are deleted files still downloaded with an hg clone?

In our Mercurial repo we added a really big file (and did an hg push), then deleted the big file (and did another push).
Now if someone does an hg clone will they still pull down that big file? I know it won't appear in their working directory as it was deleted, but will the file still be pulled down and stored in Mercurial internal storage?
I'd like to ensure people don't have to pull down the file. I've learned that really big files should be stored outside of Mercurial, so I deleted the file. But I was wondering if people will still be pulling down the big file - in which case I guess I will recreate the repository from scratch.
Of course it will still be in the repository.
You can always update back to older revisions, and if you update back to the revision you got when you committed the file, it'll be there in all its glory.
There are two ways to mitigate this (when you're committing, not now):
One of the big-files extensions: these essentially store big files in a secondary location and link it to the repository, so that a big file is only downloaded when you update to a revision that actually contains it and you don't already have it locally, i.e. it's more of an "on-demand" style of pulling
If the file never changes, keep it available on the network and just create some kind of link to it instead of a full copy
Right now, you have four options:
Strip away the changeset that added the file, and all the changesets that came after it. You can do that using the Mercurial Queues extension. Note that you need to do this stripping in all clones. If just one of your users push back the repository that has that file in its history to the central clone, you have the changesets back.
Rebuild the repository from scratch manually
Using the hg convert command and some filtering, the --filemap option can be used for this
Leave it as is. How big is it; will it really be much of a problem?
Note that rebuilding the repository, either manually or through hg convert will invalidate all clones. Anyone trying to push to your new central clone from an old clone will get a message about unrelated repositories. If any of your users are stupi^H^H^H^H^Hnot smart enough to realize that forcing the push is a bad idea, then you will have problems with this approach.
Yes, the file is still in the history. If you want to delete it completely, you need to use Mercurial Queues (see Editing History on the Mercurial wiki).
Just keep in mind this breaks clones as revision IDs change.

Mercurial Remove History

Is there a way in mercurial to remove old changesets from a database? I have a repository that is 60GB and that makes it pretty painful to do a clone. I would like to trim off everything before a certain date and put the huge database away to collect dust.
There is no simple / recommended way of doing this directly to an existing repository.
You can however "convert" your mercurial repo to a new mercurial repo and choose a revision from where to include the history onwards via the convert.hg.startrev option
hg convert --config convert.hg.startrev=1234 <source-repository> <new-repository-name>
The new repo will contain everything from the original repo minus the history previous to the starting revision.
Caveat: The new repo will have completely new changeset IDs, i.e. it is in no way related to the original repo. After creating the new repo every developer has to clone the new repo and delete their clones from the original repo.
I used this to cleanup old repos used internally within our company - combined with the --filemap option to remove unwanted files too.
You can do it, but in doing so you invalidate all the clones out there, so it's generally not wise to do unless you're working entirely alone.
Every changeset in mercurial is uniquely identified by a hashcode, which is a combination of (among other things) the source code changes, metadata, and the hashes of its one or two parents. Those parents need to exist in the repo all the way back to the start of the project. (Not having that restriction would be having shallow-clones, which aren't available (yet)).
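The dependency on parents can be illustrated with a simplified model. Mercurial computes a revision's node ID as a SHA-1 over its two parent IDs (in sorted order) followed by the revision's text. The sketch below hashes the hex form of the parents rather than the raw 20 bytes, so its output will not match real changeset IDs; it only shows that identical content under a different parent yields a different ID. It assumes GNU coreutils' sha1sum (macOS would need shasum instead).

```shell
# Simplified model of a Mercurial node ID: SHA-1 over the two parent IDs
# (smaller one first) followed by the revision text. Real Mercurial hashes
# the raw 20-byte parents; hex is used here, so IDs will NOT match real ones.
nullid=$(printf '%040d' 0)
node_id() {
  # Sort the parents so that swapping p1 and p2 gives the same ID.
  lo=$(printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | head -n 1)
  hi=$(printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | tail -n 1)
  printf '%s%s%s' "$lo" "$hi" "$3" | sha1sum | cut -d ' ' -f 1
}
```

Calling node_id twice with the same text but a different first parent gives two unrelated IDs, which is exactly why re-parenting or trimming history rewrites every descendant's hash.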
If you're okay with changing the hashes of the newer changesets (which again breaks all the clones out there in the wild), you can do so with these commands:
hg export -o 'changeset-%r.patch' 400:tip # changesets 400 through the end for example
cd /elsewhere
hg init newrepo
cd newrepo
hg import /path/to/the/patches/*.patch
You'll probably have to do a little work to handle merge changesets, but that's the general idea.
One could also do it using hg convert with type hg as both the source and the destination types, and using a splicemap, but that's probably more involved yet.
The larger question is, how do you type up 60GB of source code, or were you adding generated files against all advice. :)

Merging changes to a workspace with uncommitted changes

We've just recently switched over from SVN to Mercurial, but now we are running into problems with our workflow. Example:
I have my local clone of the repository, which I work on. I'm making some highly experimental changes to our code base that I don't want to commit until I'm sure they work the way they are supposed to; I don't even want to commit them locally. Meanwhile, my co-worker has made some significant improvements/bug fixes which I need, and he has pushed his commits to our main repository. The question is: how can I merge his changes into my workspace without having to commit all my changes first, since I need his changes to test my own code?
A more day-to-day problem we have with the exact same workflow involves a couple of configuration files that are in the repository. Each developer makes a few small environment-specific changes to them but does not commit those changes. These uncommitted files prevent us from merging into our workspace, just like in the example above. Ideally, the configuration files probably shouldn't be in the repository; unfortunately, that's just how it has to be, for reasons I won't go into here.
If you don't want to clone, you can do it the following way.
hg diff --git > mylocalchanges.txt
hg revert --all --no-backup
# Do your pull/merge here (and commit it); once you are done, re-apply your local changes
hg import --no-commit mylocalchanges.txt
As you've discovered, there are two operations that make changes from one person available to someone else (or to many, on either side).
There's pulling, which takes changes from some other clone of the repository and puts them into your clone.
There's pushing, which takes changes from your repository and puts them into another clone.
In your case, your coworker has pushed his changes into what I assume is your central master of the repository.
After he has done this, you can pull the latest changes down into your repository, and merge them into your branch. This will incorporate any bugfixes or changes your coworker did into your experimental code.
This gives you the freedom of staying current on other coworkers development in your project, and not having to release your experimental code until it is ready (or even at all.)
So, as long as you stay away from the Push command, you're safe.
Of course, this also assumes nobody is pulling directly from your clone of the repository, if they do that, then of course they will get your experimental changes, but it doesn't sound like you've set it up this way (and it is highly unlikely as well.)
As for the configuration files, the typical approach is to commit only a master template file to the repository, under a different name (i.e. with an extra extension such as .template), and then add the name of the real configuration file to the ignore filter.
Each developer then has to make his or her own copy of the template, rename it, and change it in any way they want, without the risk of committing database connection strings, passwords, or local paths, to the repository.
If necessary, provide a script that will help the developer make the real configuration file if it is long and complex.
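A minimal sketch of that setup, assuming hypothetical file names config.ini.template (committed) and config.ini (private): the helper adds the real file to .hgignore once and creates a developer's copy only if none exists yet.

```shell
# Sketch of the template approach. File names are hypothetical placeholders.
setup_config() {
  template=config.ini.template
  real=config.ini
  # Add the real file to .hgignore (glob syntax) unless it is already listed.
  if ! grep -qxF "$real" .hgignore 2>/dev/null; then
    printf 'syntax: glob\n%s\n' "$real" >> .hgignore
  fi
  # Copy the template only if this developer has no config yet.
  [ -e "$real" ] || cp "$template" "$real"
}
```

Note that .hgignore only affects untracked files; if config.ini is already tracked, it also has to be removed from version control with hg forget config.ini and that change committed.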
Regarding your experimental changes, you should commit them. Often.
Simply you commit them in a clone you don't push. You only pull to merge whatever updates you need from other repos.
As for config files, don't commit them.
Commit template files, and script able to generate complete config files from the template.
That way, developers will only modify "private" (i.e. not committed) config files with their own private values.
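The generation script can be as simple as placeholder substitution. This sketch assumes a hypothetical convention of @KEY@ placeholders in the committed template and an uncommitted key=value file holding each developer's private values; keys are assumed to be plain identifiers, and values to contain no backslashes or newlines.

```shell
# Build a private config from a committed template by replacing @KEY@
# placeholders with values from an uncommitted key=value file.
render_config() {
  template=$1; values=$2; out=$3
  cp "$template" "$out.tmp" || return 1
  while IFS='=' read -r key value || [ -n "$key" ]; do
    [ -n "$key" ] || continue
    # Escape / and &, which are special in a sed replacement.
    esc=$(printf '%s' "$value" | sed 's/[\/&]/\\&/g')
    sed "s/@$key@/$esc/g" "$out.tmp" > "$out.tmp2" && mv "$out.tmp2" "$out.tmp"
  done < "$values"
  mv "$out.tmp" "$out"
}
```

Both the values file and the generated file would go in .hgignore; only the template and the script are committed.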
If you know your uncommitted changes will not collide with the merge commit that you are creating, then you can do the following:
1) Shelve the uncommitted changes
2) Do the pull and merge
3) Unshelve the uncommitted changes
Shelving effectively stores your uncommitted changes away as a diff (relative to your last commit), then reverts those files in your working directory. Unshelving then applies that diff, bringing back your uncommitted changes.
Tools such as TortoiseHg have shelf built in.
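From the command line, the same cycle might look like the sketch below. It assumes the shelve extension is available (it ships with Mercurial but may need enabling with shelve = under [extensions] in your hgrc), and that the current branch had a single head before the pull, so a second head afterwards means local commits need merging. The function name is hypothetical.

```shell
# Sketch of the shelve / pull / merge / unshelve cycle.
sync_with_shelve() {
  hg shelve || return 1                 # park uncommitted changes
  hg pull || return 1                   # fetch the coworker's changesets
  heads=$(hg heads . --template '{node}\n' | grep -c .)
  if [ "$heads" -gt 1 ]; then
    # Local commits and the pulled ones diverged: merge and commit.
    hg merge && hg commit -m "Merge upstream changes" || return 1
  else
    hg update || return 1               # nothing to merge: just move forward
  fi
  hg unshelve                           # re-apply the parked changes
}
```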