Delete files from history to save space in Mercurial

OK, when I was young, I put several big files (resource files, DLLs, etc.) in my Mercurial repos. Now the repository is so big that I cannot easily push it to Bitbucket.
Is there any way to delete these files' history EASILY?
I put all those files in the /res and /dll paths.
Edit:
This is a solution, but it will delete part of the history, so maybe there is a better one:
Mercurial Remove History

Your best bet is to use the convert extension, but be warned: you'll end up with a totally different repo. Every hash will be different, and every person who cloned the old repo will need to delete their clone and re-clone.
That said, here's what you do:
Create a filemap file named filemap.txt containing:
exclude res
exclude dll
and then run this command:
hg convert --filemap filemap.txt your-source-repository your-destination-repository
For example:
hg convert --filemap filemap.txt /home/you/repos/bloatedrepo /home/you/repos/slenderrepo
That gets you a whole new repo that has all of your history except the history of any files in /res and /dll, but again, it will be a new, unrelated repo as far as Mercurial (and Bitbucket) are concerned.
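If the conversion has to be repeated or scripted, the filemap and the command line above can be generated. This is only a minimal sketch: the helper names filemap_text and convert_command are made up for illustration, and it assumes the convert extension is already enabled in your Mercurial configuration.

```python
def filemap_text(excluded_dirs):
    """Return filemap content with one 'exclude <dir>' line per directory."""
    return "".join(f"exclude {directory}\n" for directory in excluded_dirs)


def convert_command(filemap, source, destination):
    """Build the argument list for the 'hg convert --filemap ...' call shown above."""
    return ["hg", "convert", "--filemap", filemap, source, destination]


# The two directories named in the question.
print(filemap_text(["res", "dll"]), end="")
print(" ".join(convert_command(
    "filemap.txt", "/home/you/repos/bloatedrepo", "/home/you/repos/slenderrepo")))
```

Save the generated text as filemap.txt, then run the printed command yourself, or pass the list to subprocess.run once you have a backup of the source repository.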

Related

How to search for filename throughout Mercurial repository?

I want to search for a filename across all my commits/branches and find out which commits/branches contain that filename. I don't know which subdirectory/subdirectories of the repo the file would be in.
I've tried hg grep <filename>, but that only seems to show files containing "filename".
I've also looked at "Mercurial - determine where file was removed?", but that doesn't really help me if a file was created on a different branch. The person asking that question suggested hg log myfile -v, which seems like it could work, but doesn't. I know that somewhere in my repo the file exists because I get something back when I do find .hg | grep <filename>, but that doesn't tell me (at least not clearly) which commits/branches.
You have to read hg help patterns and maybe hg help filesets in order to write a correct pattern for the file (most probably you'll be happy with just a pattern).
If the file exists now in the working directory (i.e. it was added and not removed later), you'll find it with hg file <PATTERN> and can determine the full path from the output, see below (with the pattern used):
>hg file **/test-extra.t
tests\test-extra.t
and then call hg log with the full filename.
In either case (hg file found nothing because the file was removed, or the file is still in the working directory) you can call hg log <FILESET> and get the history too. The log for an existing file would be too long to show here, so this sample uses a deleted file with a unique name:
>hg log set:**/dulwich/tests/__init__.py -Tcompact
223 0b6c08800d16 2009-07-23 08:48 +0100 a
delete the dulwich fork we have
2 c43c02cc803a 2009-04-22 16:59 -0700 schacon
added dulwich library and got the script to call it for clone
If your set is too wide and may include files with the same name in different folders, you have to verify the filenames by calling log with more details about the files, e.g. with -Tstatus.
hg log 'glob:**/<filename>' -Tstatus
seems to do the job. It doesn't give me the commits containing <filename>, but it does give commits (and their branches) involving filename.
Credit to Lazy Badger's answer for pointing me to this.
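When this search is run from a script, passing the pattern as a single argument avoids shell quoting problems such as a mismatched quote. A small sketch follows; log_command is a hypothetical helper, not part of Mercurial, and it only builds the argument list for the hg log call above.

```python
def log_command(filename, template="status"):
    """Build the 'hg log' argument list that matches filename in any directory."""
    # 'glob:**/<name>' matches the name at any depth; -T selects the output style.
    return ["hg", "log", f"glob:**/{filename}", "-T", template]


print(log_command("test-extra.t"))
```

The resulting list can be handed to subprocess.run(..., cwd=<repository path>) without any extra quoting.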

How can I make Mercurial treat file names with a wildcard part as the same file?

We are working on a project where an AngularJS web project is compiled and the build output ("binaries") is stored in an hg repo. The problem is that the AngularJS .js files are usually compiled with a hash in every file name. For example, each output file gets a unique hash inserted before its extension:
main.1cc794c25c00388d81bb.js,
polyfills.eda7b2736c9951cdce19.js,
runtime.a2aefc53e5f0bce023ee.js,
common.7d1522841bf85b01f2e6.js,
1.620807da7415abaeeb47.js,
2.93e8bd3b179a0199a6a3.....etc.
The problem is that every time a new build is checked in to the hg repo, each file is detected as a new file and kept alongside the old file of the same base name. So I need a way to fool hg into keeping the file name but still treating the file as the old one, replacing the previous version.
main.1cc794c25c00388d81bb.js ==> overwrite old main.js
polyfills.eda7b2736c9951cdce19.js ==> overwrite old polyfills.js
runtime.a2aefc53e5f0bce023ee.js ==> overwrite old runtime.js
common.7d1522841bf85b01f2e6.js ==> overwrite old common.js
1.620807da7415abaeeb47.js ==> overwrite old 1.js
2.93e8bd3b179a0199a6a3 ==> overwrite old 2.js
Could anyone point out a way to fool hg into considering these files modifications of the previous files rather than new files?
Can hgignore or some extension be used?
A VCS shall track the state of files. And those are indeed new files. One can argue that those are the old files renamed - which can be recorded by the VCS.
So there are two solutions I see:
1. Record moving the old filenames to the new filenames. hg addremove --similarity XX might be of big help here. All the files will get new names each time, but if the similarity is good enough the renames will be detected nicely. You might need to adjust XX (a similarity percentage, 0 ... 100) to find the value that works best for you. Adding --dry-run makes testing easy. You WILL need to delete the old files before you run hg addremove, though.
2. Have a pre-commit hook which iterates over the *.js files and, via an appropriate regex, renames *.<hash>.js to *.js, omitting the hash and effectively overwriting the generic filenames with the newly generated hashed files.
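The renaming step of the second option could look roughly like the sketch below. It is an illustration under stated assumptions, not a finished hook: it assumes the hash is a run of 16 or more lowercase hex characters between two dots, and the names stable_name and overwrite_with_stable_names are invented here. Wiring it into a Mercurial pre-commit hook is left out.

```python
import re
from pathlib import Path

# Assumption: "<stem>.<16+ lowercase hex chars>.js" marks a hashed build file.
HASHED_JS = re.compile(r"^(?P<stem>.+)\.[0-9a-f]{16,}\.js$")


def stable_name(filename):
    """Return the file name without its build hash, or unchanged if none is found."""
    match = HASHED_JS.match(filename)
    return f"{match.group('stem')}.js" if match else filename


def overwrite_with_stable_names(directory):
    """Rename every hashed *.js file in directory, replacing any old stable copy."""
    # Snapshot the listing first so renaming does not disturb the iteration.
    for path in sorted(Path(directory).glob("*.js")):
        target = path.with_name(stable_name(path.name))
        if target != path:
            path.replace(target)  # overwrites an existing main.js, 1.js, ...


print(stable_name("main.1cc794c25c00388d81bb.js"))
```

Note that any HTML that references the hashed names would need the same rewrite, which this sketch does not attempt.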

Splitting subfolders of an SVN repo into individual Mercurial repos results in error "abort: expected trunk to be at 'trunk/module1', but not found"

I have an SVN repo with structure like this:
/branch
/tags
/trunk
/trunk/module1
/trunk/module2
/trunk/module3
I am trying to separate this into individual Mercurial repos, where each new Mercurial repo retains the revision history of that module's files. The end result would be
/module1-hg
/module2-hg
/module3-hg
Based on this guide (http://wiki.colar.net/selectively_converting_subversion_repository_to_mercurial),
I have tried using
hg --config convert.svn.trunk=trunk/module1 convert https://repo.url/ module1-hg
but that results in the following error:
abort: expected trunk to be at 'trunk/module1', but not found
I am able to convert the whole SVN repository, but I'd really like to separate the modules at this point. I feel like I just can't find a good example of the syntax to split these apart. Can anyone help?
I recently did this but I did it in several steps.
Firstly, I converted the whole repo to be a mirror of the SVN repo. I used the hgsubversion extension for this but if you've done it using the convert extension then that's fine.
The second step was where I split the repos up. I used the convert extension with a filemap to exclude some folders and rename others.
For example:
hg convert bigrepo module1-hg --filemap module1.txt
And module1.txt would contain the following:
exclude module2
exclude module3
rename module1 .
That would create a repo called module1-hg excluding modules 2 and 3. It would also move the source of module1 into the root of the repo instead of a subdirectory.
You could then repeat the action for modules 2 and 3 with similar filemap files.
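Writing one filemap per module by hand gets tedious as the number of modules grows, so the files can be generated. The sketch below follows the module1.txt layout shown above; module_filemap is a hypothetical helper name, and it assumes the modules sit at the top level of the converted repository.

```python
def module_filemap(module, all_modules):
    """Return filemap text that keeps only module and moves it to the repo root."""
    lines = [f"exclude {other}" for other in all_modules if other != module]
    lines.append(f"rename {module} .")
    return "\n".join(lines) + "\n"


modules = ["module1", "module2", "module3"]
for name in modules:
    # One filemap per module, e.g. to be saved as module1.txt, module2.txt, ...
    print(f"# {name}.txt")
    print(module_filemap(name, modules), end="")
```

Each generated file is then passed to hg convert with --filemap, exactly as in the module1 example.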

How to convert an existing Mercurial repository to use subrepositories and keep the history intact?

I've been reading about subrepositories and how to extract an existing folder from a Mercurial repository to a subrepository using the convert extension and a filemap. I can successfully do this. If I have the following folder structure:
C:\Project
---Project\root.txt
---Project\SubFolder
---Project\SubFolder\fileinsubfolder.txt
I can make a subrepository of SubFolder. In much the same way I can extract everything else into a separate repository (in this example the second repository would just have the root.txt file). Afterwards I can add the SubFolder repository as a subrepository to the second repository. But although both repositories have the complete history, these histories aren't linked: updating the root repository to an earlier state won't put the subrepository in the state it should be in at that point. Updating to a consistent older revision (both root and subrepo updated automatically) will only work when updating to a revision that already knows about the subrepository and has a .hgsubstate file.
An alternative I thought about was just forgetting the files in SubFolder in the current repository, initing a new repository in SubFolder and, at the same time, adding a .hgsub file. What I hope to achieve here is to work with a subrepository from this point on but still have a way to update to an older revision (before separating the subrepo), because the files of SubFolder are still in the history of the current repository.
This doesn't work though. When I have forgotten the files in Mercurial, inited a new repo, linked it as a subrepo in the current repo, and then update to an older revision from before the subrepo existed, I get this error:
C:\Project>hg update 1
abort: path 'SubFolder\fileinsubfolder.txt' is inside repo 'SubFolder'
The problem here is that when updating to an older revision which wasn't aware of the subrepo, this update wants to put files in the SubFolder. But this SubFolder is still another repo (has a .hg directory) and although the main repo has no recollection about it, the update doesn't want to put files in the SubFolder as it is a repo.
Is there any way to get around this error, or is there a better method to switch to using a subrepo for a certain folder in an existing Mercurial repository and keep the history intact (and both histories linked)?
Nope, I'm afraid there are no tools that will allow you to split a repository in the way you ask for.
Just to clarify your question, let us imagine you have a repository where the contents of root.txt and sub/file.txt evolve like this
   root.txt   sub/file.txt
0: root 0     file 0
1: root 1     file 1
2: root 2     file 2
for the first three changesets. What you ask for is an option for hg convert that would turn this into two repositories (easy, we can do that today) and where the convert extension injects .hgsub and .hgsubstate files so that the three changesets contain
   root.txt   .hgsub      .hgsubstate | file.txt
0: root 0     sub = sub   <X> sub     | <X>: file 0
1: root 1     sub = sub   <Y> sub     | <Y>: file 1
2: root 2     sub = sub   <Z> sub     | <Z>: file 2
where the <X>, <Y>, and <Z> hashes are the ones corresponding to the first three commits in the subrepository.
There is no such option for hg convert today, but based on the above it sounds feasible to write one. That would give you a great way to convert back and forth between a combined and a split repository.
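To make the table concrete, here is a rough sketch of the file contents such a converter would have to inject into each parent changeset. It is not an existing hg convert feature: injected_files is a made-up name, and the hashes are the same placeholders used above.

```python
def injected_files(sub_path, sub_hashes):
    """Return, per parent changeset, the .hgsub and .hgsubstate text to inject."""
    return [
        {
            ".hgsub": f"{sub_path} = {sub_path}\n",          # "path = source"
            ".hgsubstate": f"{sub_hash} {sub_path}\n",       # "hash path"
        }
        for sub_hash in sub_hashes
    ]


# Placeholder hashes for the first three subrepository commits.
for rev, files in enumerate(injected_files("sub", ["<X>", "<Y>", "<Z>"])):
    print(rev, repr(files[".hgsub"]), repr(files[".hgsubstate"]))
```

The hard part, which this sketch skips, is that the subrepository hashes are only known after the subrepository itself has been converted.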

Mercurial: Remove file from all changesets

I understand how to remove an entire changeset from history but it's not clear how to remove a subset instead.
For example, how do I remove all DLL files from an existing changeset while leaving the source-code alone?
Because the revision ids (e.g. a8d7641f...) are based on a hash of the changeset, it's not really possible to remove a subset of a changeset from history.
However, it is possible to create a new repo with a parallel history, except for a certain set of files, by using the Convert extension. You'll be converting a Mercurial repo to a Mercurial repo, using the filemap to exclude the files you don't want by adding excludes. This will create a new, unrelated repository, which means that any clones people have won't be able to pull from it any more, and will have to re-clone from this new repo.
1. Make sure all your teammates have pushed their local changes to the central repo (if any).
2. Back up your repository.
3. Create a "map.txt" file with the following content:
# this filemap is used to exclude specific files
exclude "subdir/filename1.ext"
exclude "subdir/filename2.ext"
exclude "subdir2"
4. Run this command:
hg convert --filemap map.txt c:/oldrepo c:/newrepo
NOTE: You have to use forward slashes in paths, even on Windows.
5. Wait and be patient.
6. Now you have a new repo at c:\newrepo, but without the excluded files.
PS. In the "upper" (central) repo you have to remove all changesets and then re-push your new repo.
PPS. I actually wrote a blog post about this that has more details (including stripping the changesets in Bitbucket, etc.).