mercurial temporarily ignore versioned files - mercurial

My question is essentially the same as here but applies to mercurial. I have a set of files that are under version control, and one save operation changes quite a lot of files. Some of the resulting changes are important for revision control, and some of the changes are just junk. I can "partition" off the junk into separate files. These junk files need to be part of a basic checkout in order for it to work, but their contents (and changes over time) aren't that important for revision control. Right now I just tell all our developers not to commit these files, but we all forget and it creates a lot of extra baggage in the repository. I don't really like the svn solution proposed because there are quite a lot of files and I want a simple clone to just work without all this extra manual work, so I was wondering if mercurial has a better alternative. It's kind of like hg shelve but not quite, and kind of like ignore, but not quite. Is there some hg extension that allows for this? Can git do it?

Mercurial doesn't support this. The correct way to do it is to commit thefile.sample and then have your developers (or better you deploy script) do a copy from thefile.sample to thefile if thefile doesn't exist. That way anyone can update the example file, but there's no risk of them committing their local changes (say their personal database connect string).

Aha! So TortoiseHG's repository and global settings have an Auto Exclude List where you can define a list of files that will be unchecked by default when the status, commit, and shelve dialogs open. So they still show up, but the user has to check them in order to actually do a commit. The setting is stored in hgrc, but it's under the [tortoisehg] heading so it's not supported by mercurial per se. Nevertheless, it fits my needs.

One solution to this is to use nested tree support (submodule in git), where the "junk" would be put in a different repository (to avoid cluttering the main repo), while enabling checking out the whole thing out in a consistent manner (right version of both repos in sync).
https://www.mercurial-scm.org/wiki/Subrepository?action=show&redirect=subrepos
In git, submodules are one solution to this issue - but they are not that great UI-wise. What I do instead is to keep two completely independent repositories, and using the subtree merge strategy when I need to update the main repo with the junk repo: http://progit.org/book/ch6-7.html

Related

Dealing with customized local source files in Mercurial

Our team of 4 currently uses svn, but I am testing out Mercurial. We code mostly in C++ and C# on .net. I have installed TortoiseHg, and use hgsubversion to push/pull against the existing svn repo. Other developers on the team will remain on svn for now.
I tend to have a few source files that I have customized (butchered would be a better term) in some specific way that I don't want to push to other developers. But I do want these custom changes compiled into my version of the project. The ratio of normal files to customized files is about 100:1
What is the best way to deal with these butchered files in mercurial? I could simply uncheck butchered files during commits, but eventually, I will forget this step. I have taken a brief look at shelving and ignoring.
Ignoring doesn't seem right, because these are tracked files. I do want to pull changes to butchered files from others. Shelving was close, but it removes the butchered code from my working copy, so that isn't correct either.
I can't be the only code butcher out there. Let me know how you deal with this. Git users, we are happy to hear from you too, we haven't entirely committed to using hg.
If they're files most everyone will be modifying (per developer database account info, etc.) the right way to do it is by tracking a xxxxx.sample file. For example have the build script use local-database.config which is in the .hgignore and provide a local-database.config.sample file in the repository. You can either include "copy the sample to the actual" in your instructions or have the build script do it automatically if the actual doesn't already exist. We have our config do a include of settings.local if and only if it exists, and allow it to override main settings. Then no one needs a local config, but it can be optionally used to alter anything w/o a risk of committing it.
If they're files that only you will be modifying the best way is to learn Mercurial queues. Then you can pop that changeset with the you-only changes, push to central repo, and then apply it again for your builds. There are various extensions that do similar things (shelve for example), but mq is preferable because you can version the overlay you're popping and applying.

Why does Mercurial make you pull/update/merge for unrelated files?

For larger teams, having to pull/update/merge then commit each time makes no sense to me, specifically when the files that were changed by other developers have nothing to do with my changeset files.
i.e. I change file1.txt, and someone else changes file10.txt. Why must I merge on my computer before being allowed to push?
It makes pushing a big pain, as you have to constantly pull/update/merge if many developers are commiting.
Also, it makes your changeset look much larger than it was since it shows your merges as seperate commits.
Mercurial makes you do this since its atomic unit isn't a file but a changeset. That is a node containing a group of changes. Each changeset is an individual node in history and represents what that person did. This does result in you having to merge even if no common files where changes (which would be a simple automatic merge). These merge nodes are important since they are part of your repositories history and gives Mercurial more information for future merges with ancestral information.
That said there is an extension you can use that would clean up your history a bit (but won't resolve your issue with needing to pull before you push). It is called the rebase extension, it is shipped with Mercurial but disabled by default. It adds a new arumument to pull that looks like:
hg pull --rebase
This will pull new changes and moves your local changeset linearly above them without having a merge changset. However, I would urge against using this since you do lose information about your repository since you are re-writing its history. Read this post for information about some issues that this may cause.
Well, you could try using rebase, which will avoid the merge commits, but it is not without its own perils. You can also collapse to one step by doing "hg pull --update", rather than separate hg pull; hg update commands.
As for why you must merge on your computer: this is a direct consequence of mercurial being a distributed version control system. There is no central server which can be considered canonical (unless you create one by convention), so there is no other "place" where the merge could occur. You are the only one who can decide how the information in your repo should be combined with the information in the remote repo. The results of these decisions must be recorded, and that is the origin of the merge commit.
Also, in your example the merge would happen without user interaction since there are no conflicts (the same would be true with rebase), so I don't see why that is a problem.
Because having changes in disjunct files does not guarantee that they are independent.
When you pull in changes, even if they are in files that are untouched by your local changes, it can cause your local changes to stop working. E.g. an interface that you access from newly written code could have been changed.
This is why there is always a merge step inbetween, so that a human can review the changes, test for issues, and address them before integrating the changes back into the main repository. This step is very important, because skipping it risks blocking all those 50-100 colleagues (which is very expensive).
I would take Lasse’s advice and push less often. Merging isn’t a big deal if you only need to do it twice or thrice a day. Also maybe create smaller team repositories (or branches) that are merged with the main repository daily by a designated person.

What's the best way to track private files in a public Mercurial repository?

"If it’s not in source control, it doesn’t exist."
This question was addressed for Git here: Techniques to handle a private and public repository?. What about for Mercurial?
I have several public Bitbucket repos (with multiple committers) where I'd like the source to be public, but which load API, SSH keys and other sensitive info from untracked files. However this results in someone emailing around the new config file if we add a new Mailchimp or Hunch or Twilio API key. Is there a way to shield these files from public view somehow and still track them? Everyone is syncing their repo through Bitbucket.
There are two good ways to handle this (besides zerkms's solution, which doesn't offer the easy of synchronization you want, but is what I'd do anyway):
Use Mercurial Queues. When you create a mercurial queue with hg qinit --create-repo it creates an overlay system that can be qpushed atop the existing repo. So you keep your secrets in queues and qpush them when you need them, and qpop them when you don't. With --create-repo the set of overlays (patches) is handled in a repository of its own. So people in the know can push/pull the secret overlay repo and people w/o access to it can use the base repo. The patch repo can be a private repo on bitbucket or hosted elsewhere.
or
Use a subrepo exactly as described in the git solution.
Create filename.ext.sample files with templates inside (probably filled with dummy data), which need to be copied and filled with actual data in the particular working directory.
That is what I usually do ;-)
Zerkms' solution is fast, easy, and clean, and likely your best bet for preventing secure content from being tracked / published; however as you say, "If it’s not in source control, it doesn’t exist." I find that far more often what I'm trying to keep out of source control is not a security concern, but simply a configuration setting. I believe these should be tracked, and my current employer has a rather clever setup for dealing with this, which I'll attempt to simplify / generalize / summarize here.
REPOSITORY
code/
...
scripts/
configparse.sh
...
config/
common.conf
env/
development.conf
testing.conf
production.conf
users/
dimo414.conf
mycoworker.conf
...
hosts/
dimo414-laptop.conf
dimo414-server.conf
mycoworker-laptop.conf
...
local.conf*
makefile
.conf*
* untracked file
Hopefully the idea here is pretty clear, we define settings at each appropriate level, enabling highly granular control of the codebase's behavior in a logical and consistent fashion.
The scripts/configparse.sh script reads all the necessary configuration files in turn and builds .conf out of all the settings it finds.
config/common.conf is the starting point, and contains logical default values for every setting. Many will likely get overwritten, but something is specified here. It's an error for a setting to be found in another file that isn't first set in common.conf.
config/env/ controls the behavior in different environments, doing things like pointing to the correct database servers.
config/users/ looks for a $USER.conf file, useful for setting things I care about, such as increasing the logging level for aspects my team works on, or customizing behavior I prefer to use across all my machines.
config/hosts does the same for machines, looking for $HOSTNAME.conf. Useful for machine-specific settings like application paths or data directories.
config/local.conf is an untracked file, and lets you set checkout-specific values and/or content you don't want in version control.
The aggregate of all these settings is output to .conf, which is what the rest of the codebase looks for when loading settings.

Mercurial (Hg) and Binary Files

I am writing a set of django apps and would like to use Hg for version control. I would like each app to be independent of the others so in each app there may be a directory for static media that contains images that I would not want under version control. In other words, the binary files would not all be in one central location
I would like to find a way to clone the repository that would include copies of the image files. It also would be great if when I did a merge, if there were an image file in one repo and not another, that there would be some sort of warning.
Currently I use a python script to find images and other binary files that are in one repo, but not the other. But a lot of people must face this problem, so there must be a more robust and elegant solution.
One one other thing...for reasons I do not want to go into, usually one of my repos is on a windows machine, and the other is on Linux. So a crossplatform solution would be nice.
Since Mercurial 2.0 the extension largefiles is now included in the main distribution. That extension keeps and manages large files outside of the "normal" repository in a way that you get the benefit of DCVS but without the benefit of exponential size and processing time growth.
Other extension that work along similar lines are SnapExtension and BigFilesExtension. However, those two are not distributed with Mercurial (you have to get them manually).
Mercurial can track any kind of file, for binary files if something changes then the whole file gets replaced not just the changes.
On the getting a warning if one repo doesn't contain a file, that's kind of the point of a DVCS is that the repos are related but are autonomous. You could always check and see what files were added during a synch or merge operation.
The current Mercurial book (by Bryan O'Sullivan) says, that Mercurial stores diffs also for binary files. How efficient this is, obviously depends on the nature of changes to binary files.

How good is my method of embedding version numbers into my application using Mercurial hooks?

This is not quite a specifc question, and more me like for a criticism of my current approach.
I would like to include the program version number in the program I am developing. This is not a commercial product, but a research application so it is important to know which version generated the results.
My method works as follows:
There is a "pre-commit" hook in my .hg/hgrc file link to version_gen.sh
version_gen.sh consists solely of:
hg parent --template "r{rev}_{date|shortdate}" > version.num
In the makefile, the line version="%__VERSION__% in the main script is replaced with the content of the version.num file.
Are there better ways of doing this? The only real short coming I can see is that if you only commit a specfic file, version.num will be updated, but it won't be commited, and if I tried to add always committing that file, that would result in an infite loop (unless I created some temp file to indicate I was already in a commit, but that seems ugly...).
The problem
As you've identified, you've really created a Catch-22 situation here.
You can't really put meaningful information in the version.num file until the changes are committed and because you are storing version.num in the repository, you can't commit changes to the repository until you have populated the version.num file.
My solution
What I would suggest is:
Get rid of the "pre-commit" hook and hg forget the version.num file.
Add version.num to your .hgignore file.
Adjust version_gen.sh to consist of:
hg parent --template "r{node|short}_{date|shortdate}" > version.num
In the makefile, make sure version_gen.sh is run before version.num is used to set the version parameter.
My reasons
As #Ry4an suggests, getting the build system to insert revision information into the software at build time, using information from the Version Control System is a much better option. The only problem with this is if you try to compile the code from an hg archive of the repository, where the build system cannot extract the relevant information.
I would be inclined to discourage this however - in my own build system, the build failed if revision information couldn't be extracted.
Also, as #Kai Inkinen suggests, using the revision number is not portable. Rev 21 on one machine might be rev 22 on another. While this may not be a problem right now, it could be in the future, if you start colaborating with other people.
Finally, I explain my reasons for not liking the Keyword extension in a question of mine, which touches on similar issues to your own question:
I looked at Mercurials Keyword extension, since it seemed like the obvious solution. However the more I looked at it and read peoples opinions, the more that I came to the conclusion that it wasn't the right thing to do.
I also remember the problems that keyword substitution has caused me in projects at previous companies. ...
Also, I don't particularly want to have to enable Mercurial extensions to get the build to complete. I want the solution to be self contained, so that it isn't easy for the application to be accidentally compiled without the embedded version information just because an extension isn't enabled or the right helper software hasn't been installed.
Then in comments to an answer which suggested using the keyword extension anyway:
... I rejected using the keyword extension as it would be too easy to end up with the string "$Id$" being compiled into the executable. If keyword expansion was built into mercurial rather than an extension, and on by default, I might consider it, but as it stands it just wouldn't be reliable. – Mark Booth
A don't think that there can be a more reliable solution. What if someone accidentally damages .hg or builds not from a clone but from an archive? – Mr.Cat
#Mr.Cat - I don't think there can be a less reliable solution than the keywords extension. Anywhere you haven't explicitly enabled the extension (or someone has disabled it) then you get the literal string "$ID$" compiled into the object file without complaint. If mercurial or the repo is damaged (not sure which you meant) you need to fix that first anyway. As for hg archive, my original solution fails to compile if you try to build it from an archive! That is precisely what I want. I don't want any source to be compiled into our apps without it source being under revision control! – Mark Booth
What you are trying to do is called Keyword Expansion, which is not supported in Mercurial core.
You can integrate that expansion in make file, or (simpler) with the Keyword extension.
This extension allows the expansion of RCS/CVS-like and user defined keys in text files tracked by Mercurial.
Expansion takes place in the working directory or/and when creating a distribution using "hg archive"
That you use a pre-commit hook is what's concerning. You shouldn't be putting the rest of version_gen.sh into the source files thesemves, just into the build/release artifacts which you can do more accurately with an 'update' hook.
You don't want the Makefile to actually change in the repo with each commit, that just makes merges hell. You want to insert the version after checking out the files in advance of a build, which is is what an update hook does.
In distributed systems like Mercurial, the actual "version number" does not necessarily mean the same thing in every environment. Even if this is a single person project, and you are really careful with having only your central repo, you would still probably want to use the sha1-sum instead, since that is truly unique for the given repository state. The sha1 can be fetched through the template {node}
As a suggestion, I think that a better workflow would be to use tags instead, which btw are also local to your repository until you push them upstream. Don't write your number into a file, but instead tag your release code with a meaningful tag like
RELEASE_2
or
RELEASE_2010-04-01
or maybe script this and use the template to create the tag?
You can then add the tag to your non-versioned (in .hgignore) version.num file to be added into the build. This way you can give meaningful names to the releases and you tie the release to the unique identifier.