Reduce mercurial .hg folder size - mercurial

I have a mercurial repo with several GB under .hg/store/data.
I identified several huge folders under .hg/store/data whose names start with an underscore (e.g. .hg/store/data/_some_path_example), and they are not present in the working directory.
I tried to use hg convert extension with a filemap with exclude statements, but the directories are still there in the converted repo.
What should be the paths in the exclude statements that would remove the .hg/store/data/_some_path_example paths with underscores?...
Thanks!

For the sake of others who have the same question/issue:
As I wrote in a comment, a leading underscore is Mercurial's way of marking that the letter after it is upper-case.
I inserted exclude lines with correct case letters, and conversion worked as planned.
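The rule can be applied mechanically. Below is a minimal Python sketch; decode_store_name and filemap_exclude are hypothetical helpers, not Mercurial APIs. It assumes the basic store encoding, where an upper-case letter is stored as _ plus its lower-case form, a literal underscore as __, and some reserved characters as ~ plus two hex digits. It does not handle the hashed names used for very long paths, nor directory names ending in .i, .d or .hg, which get an extra .hg suffix. It only targets directory entries, as in the question (file entries also carry a .i or .d suffix).

```python
def decode_store_name(encoded: str) -> str:
    """Reverse Mercurial's basic store filename encoding (sketch)."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch == "_" and i + 1 < len(encoded):
            nxt = encoded[i + 1]
            # "__" is a literal underscore; "_x" stands for the upper-case letter X.
            out.append("_" if nxt == "_" else nxt.upper())
            i += 2
        elif ch == "~" and i + 2 < len(encoded):
            # "~xx" is a reserved character written as two hex digits.
            out.append(chr(int(encoded[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def filemap_exclude(store_dir: str) -> str:
    """Turn a .hg/store/data/... directory into an 'hg convert' filemap exclude line."""
    prefix = ".hg/store/data/"
    rel = store_dir[len(prefix):] if store_dir.startswith(prefix) else store_dir
    # '/' is never escaped, so the whole relative path can be decoded at once.
    return "exclude " + decode_store_name(rel.rstrip("/"))
```

So the store directory _some_path_example corresponds to the working-copy path SomePathExample, and the filemap line becomes exclude SomePathExample.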

Related

Mercurial -- Ignore certain files based on the existence of other files

I use mercurial to keep track of a repository which contains both PDF files (generated by others, which I need to keep track of), and latex files, written by me.
For instance, assume a directory structure like this:
root
- Requirements.pdf
- MyReport.tex
- MyReport.pdf
In this case, MyReport.pdf changes every time MyReport.tex does, and can be wholly determined by the contents of the tex file, so it should not be under version control.
I am looking for a way to tell mercurial to ignore such files. Obviously I can add a rule to .hgignore like this (http://www.selenic.com/mercurial/hgignore.5.html)
syntax: glob
*.pdf
But that will ignore the PDFs that I do need to keep version controlled.
There's also this link: https://www.mercurial-scm.org/wiki/TipsAndTricks#Avoid_merging_autogenerated_.28binary.29_files_.28PDF.29 but that doesn't really solve my problem either, because while it handles building the PDFs, it does not handle telling hg which files are important.
Or I could just do this manually, but I would like a way to script it, to make it more general, since these repositories can have several dozen tex and pdf files and manually managing this has become cumbersome.
It seems like quite a simple rule: if there is a file named "blah.pdf", check whether there is also a file named "blah.tex"; if so, ignore the PDF, otherwise pay attention to it. But I can't find anything about that.
There is no such feature in Mercurial, nor in Git, nor will there likely ever be such a feature, because it's extremely niche. However, you might consider simply putting your "generated" files into a separate output subdirectory, and then ignoring all such directories. For example, if you have an input like foo/bar.tex, the output could be foo/gen/bar.pdf, and you could ignore gen/.
Obviously I can add a rule to .hgignore like this (http://www.selenic.com/mercurial/hgignore.5.html) ... But that will ignore the PDFs that I do need to keep version controlled.
.hgignore only ignores files that match a pattern and are not already versioned, so the quoted passage points to at least two usable solutions:
Write a regexp that means "all PDFs except some filename(s)" (most probably with the filenames added manually)
Use a wide pattern, but add the needed files to the repository explicitly (hg add FILENAME)
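The "ignore X.pdf only when X.tex exists" rule from the question can also be scripted so that the regexp lines do not have to be maintained by hand. A minimal sketch, assuming rooted regexp patterns (one per generated PDF) are acceptable in .hgignore; generated_pdf_patterns and hgignore_block are hypothetical helper names.

```python
import os
import re
from typing import List


def generated_pdf_patterns(root: str) -> List[str]:
    """Rooted regexp lines for every X.pdf that sits next to an X.tex."""
    patterns = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Mercurial's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".hg"]
        names = set(filenames)
        for name in filenames:
            stem, ext = os.path.splitext(name)
            if ext.lower() == ".pdf" and stem + ".tex" in names:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                rel = rel.replace(os.sep, "/")  # .hgignore matches '/'-separated paths
                patterns.append("^" + re.escape(rel) + "$")
    return sorted(patterns)


def hgignore_block(root: str) -> str:
    """Text to append to .hgignore; the syntax line applies to the lines after it."""
    return "\n".join(["syntax: regexp"] + generated_pdf_patterns(root)) + "\n"
```

Run from the repository root, the block for the example layout would ignore MyReport.pdf but leave Requirements.pdf visible. It has to be regenerated when new .tex files appear, and, as noted above, it has no effect on PDFs that are already tracked.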

Remove not-matching lines from hgignore

Is there a Mercurial extension that removes lines from .hgignore that don't match any files in the local repository?
There exists no extension or built-in function that does this. You could jerry-rig a script to find lines that are ignoring nothing without too much work, but consider that this is probably a bad idea.
Just because an .hgignore line isn't matching any files in your local repository doesn't mean it isn't matching them in anyone else's. Within .hgignore files you'll often find patterns like .swp and .bak. You might not use vi (which creates .swp files) and you might not use an editor that creates .bak files, but others do. Or perhaps your editor creates .swp files but you don't currently have any because you're not actively editing a file. Removing that line means you'd no longer be ignoring a .swp file the next time you had one, and hg addremove would cause it to become tracked.
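Given that caveat, a script that only reports candidate lines for review is safer than one that deletes them. A minimal sketch under stated assumptions: regexp_patterns and unused_patterns are hypothetical names; only regexp-syntax lines are checked (glob and rootglob lines are skipped, since their matching rules differ from Python's); and the list of paths is supplied by you, for example from hg status --all --no-status.

```python
import re
from typing import Iterable, List, Tuple

# .hgignore syntax names and the per-line prefixes Mercurial accepts.
REGEXP_KINDS = ("re", "regexp")
KNOWN_KINDS = ("re", "regexp", "glob", "rootglob")


def regexp_patterns(hgignore_text: str) -> List[Tuple[int, str]]:
    """(line number, pattern) for every regexp-syntax pattern in an .hgignore."""
    syntax = "regexp"  # the default when no "syntax:" line has been seen
    found = []
    for lineno, raw in enumerate(hgignore_text.splitlines(), start=1):
        # Strip comments (an unescaped '#'), then unescape '\#'.
        line = re.sub(r"(?<!\\)#.*", "", raw).replace("\\#", "#").strip()
        if not line:
            continue
        if line.startswith("syntax:"):
            syntax = line.split(":", 1)[1].strip()
            continue
        kind = syntax
        for prefix in KNOWN_KINDS:
            if line.startswith(prefix + ":"):
                kind, line = prefix, line[len(prefix) + 1:]
                break
        if kind in REGEXP_KINDS:
            found.append((lineno, line))
    return found


def unused_patterns(hgignore_text: str, paths: Iterable[str]) -> List[int]:
    """Line numbers of regexp patterns that match none of the given paths."""
    candidates = set()
    for path in paths:
        parts = path.split("/")
        # Also test every parent directory: a pattern that matches a
        # directory ignores everything below it.
        for i in range(1, len(parts) + 1):
            candidates.add("/".join(parts[:i]))
    unused = []
    for lineno, pattern in regexp_patterns(hgignore_text):
        regex = re.compile(pattern)
        # Unrooted regexps can match anywhere in the path, hence search().
        if not any(regex.search(c) for c in candidates):
            unused.append(lineno)
    return unused
```

A reported line may still matter on someone else's machine, so the output is a list to review, not lines to delete.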

prevent merge of ascii file in Mercurial

Is there a way to tell Mercurial that a specified ascii file should be completely overwritten rather than merged during future updates, similar to the treatment of a binary file?
Git handles this using .gitattributes, as described here: Git mark file as binary to avoid line separator conversion. Is there a Mercurial equivalent?
Have a look at merge-patterns (in the hgrc). This allows you to specify internal:other as the merge action.
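A minimal hgrc fragment, assuming a placeholder path (config/settings.txt is hypothetical): the [merge-patterns] section maps a file pattern to a merge tool, and internal:other resolves the merge by taking the other side's version wholesale. Because hgrc is not cloned, each clone needs this setting in its own .hg/hgrc or in a user-level config.

```ini
# .hg/hgrc of each clone (hgrc is not copied by hg clone)
[merge-patterns]
# Hypothetical path: always take the other ("incoming") version of this file
config/settings.txt = internal:other
```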

Mercurial repository identification

I need to be able to uniquely identify a Mercurial repository and have that identifier placed in a file that is included when cloned. If I can put the identifier in a file in the .hg folder that is preferable to simply adding a normal file to the repo.
I understand that I can get a near certain identifier from the first changes that are committed. I know that the hgrc file cannot be used to store the identifier, because it is not cloned.
So, my question is: Is there another file in the .hg folder that is cloned that I can use to put the identifier? Thanks.
From first read, it sounds like you want to be able to make sure that a clone of the repository is a clone of the correct repository and not some stand-in impostor. However, if the identification information you're thinking of using is cloned with everything else, then an impostor would still pass this test. You'd need to keep that identifier separate so that it can be compared against information in the clone.
Whether that is your purpose or not, you probably don't want to edit any of the files in .hg that get cloned. You'd have to add a tracked file elsewhere in the repository, outside of .hg. However, you don't really need an extra file at all: the changeset hash is not just near certain but very certain, so the information for identifying a repository is built into the repository itself.
On the commandline, you can get either the short or full versions of the very first changeset's hash identifier:
> hg id -i -r0
89abf5502e3c
> hg log -r0 --template "{node}"
89abf5502e3c5c65e532db04d8d87141f0ac8b73
If I am correct about your desire to compare 2 identifiers so that you or someone else knows a clone of the repository is a true clone and not a false clone, you would have the same changeset id available separately so that someone can use one of the above commands to see the id of their clone and compare it to what you say it should be. This is much like how many websites with downloadable executable files show a hash identifier next to the download link so that you can hash the file yourself and compare the result to the hash on the website.
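The comparison itself is small. A minimal Python sketch, assuming hg is on the PATH and that the expected id is published through a separate channel; root_changeset and is_expected_clone are hypothetical names. The check accepts either the 12-digit short form printed by hg id -i or the full 40-digit hash.

```python
import subprocess


def root_changeset(repo_path: str) -> str:
    """Full hash of revision 0, using the same hg log command shown above."""
    result = subprocess.run(
        ["hg", "log", "-R", repo_path, "-r", "0", "--template", "{node}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def is_expected_clone(actual: str, published: str) -> bool:
    """True if the clone's root hash matches the separately published id."""
    actual, published = actual.strip().lower(), published.strip().lower()
    # Require at least the 12-digit short form; shorter prefixes prove little.
    return len(published) >= 12 and actual.startswith(published)
```

For example, is_expected_clone(root_changeset("some/clone"), "89abf5502e3c") would check a clone against the short id from the example above.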
Edit regarding your comment that sheds light on the purpose of this:
Since you need to be able to read it from a file, there are a couple options:
Tracked file in repository root
There is one file you might consider, other than creating your own: .hgtags.
hg tag -r0 ident
...would tag the very first revision, allowing you to use ident as a reference to that changeset rather than -r0. Mercurial always uses tag information from the latest version of .hgtags, no matter what changeset the working directory is updated to, but that may not matter to your app. hg tag appends a line such as this to the .hgtags file, creating the file if it doesn't exist:
a247494248c4b96a571bbd12e90eade3bf559281 ident
This is most handy if you don't have a tags file yet in your repos, because it will be the first line in the file for easy finding. You might think you could simply write this file yourself, but then you'd still have to call hg to get the changeset id, and again later to add the file to tracking and commit it: hg tag does all that for you.
If there is already a tags file to consider, that's OK, too, because such files tend to be relatively short and you just need to look for the one line that ends with your chosen tag name. Mercurial is designed for append-only operations to .hgtags, but everything would still work fine if you inserted the line for this tag as the very first line when .hgtags already exists, because: 1. The tag will never be moved or removed. 2. You'll be using a tag name not already used in the file.
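Reading the tag back takes only a few lines. A minimal sketch, assuming the one-entry-per-line "hash name" format shown above; tag_node is a hypothetical helper. It keeps the last entry for a tag, since later lines in .hgtags take precedence, and treats an all-zero hash (what hg tag --remove writes) as a removed tag.

```python
from typing import Optional

NULL_NODE = "0" * 40  # written by "hg tag --remove" to delete a tag


def tag_node(hgtags_text: str, tag: str) -> Optional[str]:
    """Changeset hash recorded for `tag` in .hgtags, or None if absent/removed."""
    node = None
    for line in hgtags_text.splitlines():
        # Each entry is "<40-hex-digit changeset hash> <tag name>".
        parts = line.strip().split(" ", 1)
        if len(parts) == 2 and parts[1] == tag:
            node = parts[0]  # keep going: later entries override earlier ones
    return None if node == NULL_NODE else node
```

Because the whole file is scanned, this also works if the ident line was inserted at the top of an existing .hgtags.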
Reading hg's guts
There are files that normally only Mercurial itself touches deeper in .hg that can be read to get the first changeset's hash. I looked into Mercurial's File Formats, Revlog, and RevlogNG, and at least for 2 of my own repos, .hg\store\00changelog.i contains the first changeset's hash at offset 0x20 (20 byte length). Probably, at least since Mercurial 0.9, it will be the same in all repos. RevlogNG also notes the first 4 bytes of that file will indicate Revlog version number and flags. While the changeset id is only 20 bytes long currently, the actual field for it is 32 bytes long, probably for future expansion to a longer hash.
Since this option requires no alteration of existing repositories and only involves reading the first 52-64 bytes of the main index, it's the one I'd probably go with. If I were catching this requirement in the early stages of the product, before any repos it manages were out in the wild, I would lean toward the custom file approach, because I would probably have my own metadata file created and added from the beginning of the repo.
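A minimal Python sketch of that read, assuming the store layout (.hg/store/00changelog.i) and the RevlogNG index format described on the RevlogNG page (format number 1 in the low 16 bits of the first 4 bytes, 64-byte entries whose node field starts at 0x20); first_node_from_index and first_node are hypothetical names. It refuses anything that does not report format 1 rather than guessing.

```python
import os
import struct

REVLOGNG = 1  # revlog format number stored in the low 16 bits of the header


def first_node_from_index(index: bytes) -> str:
    """Revision 0's changeset hash from the start of a RevlogNG index."""
    if len(index) < 52:
        raise ValueError("index too short (empty repository?)")
    # The first 4 bytes of entry 0 double as the big-endian version/flags word.
    (header,) = struct.unpack(">I", index[:4])
    if header & 0xFFFF != REVLOGNG:
        raise ValueError("not a RevlogNG index: format %d" % (header & 0xFFFF))
    # The 32-byte node field starts at 0x20; a SHA-1 hash uses its first 20 bytes.
    return index[0x20:0x20 + 20].hex()


def first_node(repo_root: str) -> str:
    path = os.path.join(repo_root, ".hg", "store", "00changelog.i")
    with open(path, "rb") as f:
        return first_node_from_index(f.read(64))
```

The result should equal the output of hg log -r0 --template "{node}" for the same repository.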
The "repository is unrelated" error message comes from mercurial/treediscovery.py:
    base = list(base)
    if base == [nullid]:
        if force:
            repo.ui.warn(_("warning: repository is unrelated\n"))
        else:
            raise util.Abort(_("repository is unrelated"))
The base variable holds the last common changesets of the two repositories; [nullid] means they share nothing. Following the same idea as the push/pull check, we can assume that repositories are related if they have common roots, so compare the hashes from this command:
$ hg log -r "roots(all())"
hg log -r 0 always shows just one root, but a repository can have more than one: FIRST_REPO may hold SECOND_REPO's history, in which case revision 0 of SECOND_REPO obviously differs from revision 0 of FIRST_REPO, yet Mercurial's check still passes.
You cannot trick the roots check by carefully crafting repositories, because building two repositories like these (with common parts but different roots):
0 <--- SHA-1-XXX <--- SHA-1-YYY <--- SHA-1-ZZZ
0 <--- SHA-1-YYY <--- SHA-1-ZZZ
is impossible: it would mean reversing SHA-1, since each changeset hash depends on the hashes of its parents.
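The root comparison can be scripted around the hg log command above. A minimal sketch, assuming hg is on the PATH; parse_nodes, repo_roots and share_a_root are hypothetical names. This implements the "common root means related" heuristic from this answer, not Mercurial's own discovery code.

```python
import subprocess
from typing import Set


def parse_nodes(log_output: str) -> Set[str]:
    """Split one-hash-per-line hg log output into a set of hashes."""
    return {line.strip() for line in log_output.splitlines() if line.strip()}


def repo_roots(repo_path: str) -> Set[str]:
    """Hashes of every root changeset, via hg log -r "roots(all())"."""
    out = subprocess.run(
        ["hg", "log", "-R", repo_path, "-r", "roots(all())", "--template", "{node}\n"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_nodes(out)


def share_a_root(roots_a: Set[str], roots_b: Set[str]) -> bool:
    """Heuristic: repositories with at least one common root are related."""
    return bool(roots_a & roots_b)
```

Comparing whole root sets, rather than only revision 0, also covers the case where FIRST_REPO holds SECOND_REPO's history.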

How can I ignore all directories except one using .hgignore?

I'm managing $HOME using Mercurial, to keep my dotfiles nice and tracked, or at least the ones that matter to me.
However, there's a profusion of files and directories in ~ that do not need to be tracked, and that set is ever-changing and ever-growing.
Historically, I've dealt with this by having this .hgignore:
syntax: glob
*
This keeps my status clean, as far as it goes, making only previously tracked files visible. However, I have some directories (in my case, scripts, .emacs.d) that I would like to see untracked files in; I almost always want to track new additions to those directories.
I know that I can run hg st -u scripts to identify untracked files, but I want a means whereby I can achieve the same function using plain ole hg status.
Is there a way to do this?
Try this in .hgignore instead:
syntax: regexp
^(?!(scripts|foo|bar)/)[^/]+/
^ matches start of path
(?!(scripts|foo|bar) uses negative lookahead to ignore all files except those in directories scripts, foo or bar
/) ensures that directories whose names merely start with a tracked directory's name (e.g. scriptsold/) are still ignored
[^/]+/ then actually matches any directory (excluding those ruled out by the lookahead), so that files in ~ aren't ignored
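The pattern's behaviour can be checked outside Mercurial. A small sketch, assuming Python's re.match on repository-relative, '/'-separated paths approximates how Mercurial applies a "^"-anchored regexp; is_ignored is a hypothetical helper.

```python
import re

# The pattern from the answer above.
IGNORE = re.compile(r"^(?!(scripts|foo|bar)/)[^/]+/")


def is_ignored(path: str) -> bool:
    return IGNORE.match(path) is not None


for p in ["scripts/backup.sh", "scriptsold/a.sh", "Downloads/x.zip", ".bashrc"]:
    print(p, "ignored" if is_ignored(p) else "shown")
# scripts/backup.sh shown
# scriptsold/a.sh ignored
# Downloads/x.zip ignored
# .bashrc shown
```

To keep .emacs.d visible as well, its dots have to be escaped inside the alternation, e.g. (scripts|\.emacs\.d)/, otherwise . matches any character.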
Credit for the central idea in this solution (the negative lookahead) goes to Michael La Voie's answer to this question
This question has been asked here on SO quite a few times, and you'll get a lot of convoluted answers using zero-width negative look ahead assertions, an oft abused regex trick, but the better solutions are to either (a) just make the repo in that directory alone or (b) just add the files in that directory. For option (b) you'd just put .* in your .hgignore file to ignore everything, and then manually hg add the files you want tracked. In mercurial, unlike svn and cvs, you can override an ignore with an add.