Repo cloned twice in multi-config job - mercurial

We have Jenkins set up to build on a number of slave nodes. On each node, it seems to clone the repository twice (using Mercurial). First it clones it directly into the top-level workspace/[jobname] directory (this seems to happen before the master hands off to the slave) and then the slave starts and creates another clone in workspace/[jobname]/label/[nodename].
Our repo is quite large (over 250 MB), so cloning it twice is a major drain on time and disk space. Is there a reason for this, considering that the first clone doesn't seem to be used by the build? Can this behavior be changed?

Strip a Mercurial branch on the server side

We have a SOLUTION folder (a Mercurial repository) in which we have a PROJECT folder that is also a Mercurial repository.
So two repositories: one in the root (solution) folder and the other in a subfolder of the root folder (the project). (Yes, it's strange, but that's how it is.)
Everything worked, but one day someone somehow included the SOLUTION branch in the PROJECT repository, so all the history from the Solution branch was imported in parallel with the Project branch into the PROJECT repository.
Now there is a bit of a mess in the PROJECT repository, and it needs to be cleaned up.
Locally this worked by running hg strip XXS (where XXS was the revision number of the very first node of the freshly added Solution branch in the Project repository).
But it seems there is no strip equivalent on the server?!
Every time we pull incoming changes into the Project repository, the "Solution" branch will be re-imported.
Is there a way to manage it on the server side?
Of course the same solution would also work on the server, but you need login access to the server itself to run the same history-editing operation there. With the default setup (a publishing server), a push will never remove changesets that are already present on the remote: when you history-edit your local repository, not all of the changes propagate; only additions to the graph do, never deletions.
If such changes are expected to be pushed to the remote server regularly, you might want to look into phases and how to set up a non-publishing server, i.e. a server with mutable history: Phases#Publishing_Repository.
Mind that such a workflow also means that every single one of the people with push privileges has to change their default phase to 'draft' instead of 'public' - at least for that project.
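For reference, a minimal sketch of the server-side setting (this mirrors the Phases wiki page): in the served repository's .hg/hgrc on the server, set
[phases]
# serve changesets as draft so clients may still edit or strip them (non-publishing server)
publish = False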
Kill the server repo and start a fresh one; then, from your local clone:
hg push --rev XXR
where XXR is the last revision you want to keep.
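A sketch of that sequence, assuming the server repository lives at /srv/hg/project (a hypothetical path) and you have shell access:
# on the server: move the damaged repository aside and create a fresh, empty one
mv /srv/hg/project /srv/hg/project.old
hg init /srv/hg/project
# from the stripped local clone: push only the history up to the last good revision
hg push --rev XXR ssh://yourserver//srv/hg/project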

Data corruption of a mercurial repository

I have a mercurial repository at c:\Dropbox\code. I've created a clone of this repo locally using:
hg clone -U c:\Dropbox\code c:\GoogleDrive\codeBackup
This bare repo serves the purpose of backup only. I regularly push changes to codeBackup. Furthermore, both directories are backed up in the cloud (Dropbox and Google Drive respectively).
If my repo in code becomes corrupt, would the codeBackup repo automatically be corrupt too, since the clone operation used hard links to the original repo? That would make my double-cloud-backup strategy useless.
P.S.: I understand that the fallback option is to use the cloud service to restore a previously known good state.
UPDATE: After digging around, I'll add these for reference.
Discussion on repo corruption in mercurial
The problem is: if an hg clone was done (without the --pull option), then the destination and the source repo share files inside .hg/store by using hardlinks, provided the filesystem offers the hardlinking feature (NTFS does).
Mercurial is designed to break such hardlinks inside .hg if a commit or push is done to one of the clones. The prerequisite for this is that the Windows API Mercurial uses gives a correct answer when Mercurial asks "how many hardlinks are on this file?".
We found out that this answer is almost always wrong (always reporting 1, even if it is in fact >1) if and only if the hg process is running on one Windows computer and the repository files are on a network share on a different Windows computer.
To avoid hardlinks (use --pull):
hg clone -U --pull c:\Dropbox\code c:\GoogleDrive\codeBackup
To check for hardlinks:
fsutil hardlink list <file> - shows all hardlinks for <file> (Windows)
find . -links +1 - shows all files with more than one hardlink (Unix)
ls -l - shows the hardlink count next to each file (Unix)
The biggest problem here, regarding repository corruption, is that you're using Dropbox and Google Drive to synchronize repositories across machines.
Don't do that!
This will surely lead to repository corruption unless you can guarantee that:
Your machines will never lose their internet connection
You will never have new, unsynchronized changes on more than one machine at a time (including periods when you have had internet problems)
Dropbox will always be running (a variant of never losing the internet connection)
You're not just plain unlucky with timing
To verify that Dropbox can easily lead to repository corruption, do the following:
1. Navigate to a folder inside your Dropbox or Google Drive folder and create a Mercurial repository there. Do this on one machine; let's call it machine A.
2. Add 3 text files to it, with some content (not empty), and commit those 3 text files.
3. Wait for Dropbox/Google Drive to synchronize all those files onto your second computer; let's call it machine B.
4. Either disconnect the internet on one of the machines, or stop Dropbox/Google Drive on it (it doesn't matter which one).
5. On machine A, change files 1 and 2 by adding or modifying content in them. On machine B, change files 2 and 3, making sure to add/modify content different from what you did on machine A. Commit all the changes on both machines.
6. Reconnect to the internet or restart Dropbox/Google Drive, depending on what you did in step 4.
7. Wait for synchronization to complete (Dropbox will show a green checkmark in its tray icon; I'm unsure what Google Drive displays).
8. Run hg verify in the repositories on both machine A and machine B.
Notice that they are now both corrupt:
D:\Dropbox\Temp\repotest>hg verify
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
3.txt@?: rev 1 points to unexpected changeset 1
(expected 0)
3.txt@?: 89ab3388d4d1 not in manifests
3 files, 2 changesets, 6 total revisions
1 warnings encountered!
2 integrity errors encountered!
Instead, get a free Bitbucket or Kiln account and use that to push and pull between your computers to synchronize them.
The only way your code repository can become corrupt (assuming it was not corrupt when you initially cloned it over to codeBackup) is when you write something to it, be it committing, rewriting history, etc. Whenever something gets written to a hard-linked file, Mercurial first breaks the hard link, creates an independent copy of the file, and then modifies only that newly created copy.
So to answer your questions: under normal usage scenarios repository corruption will not propagate to your codeBackup repository.

How can I keep some modifications from propagating in mercurial?

I am developing a web database that is already in use for about a dozen separate installations, most of which I also manage. Each installation has a fair bit of local configuration and customization. Having just switched to mercurial from svn, I would like to take advantage of its distributed nature to keep track of local modifications. I have set up each installed server as its own repo (and configured apache not to serve the .hg directories).
My difficulty is that the development tree also contains local configuration, and I want to avoid placing every bit of it in an unversioned config file. So, how do I set things up to avoid propagating local configuration to the master repo and to the installed copies?
Example: I have a long config.ini file that should be versioned and distributed. The "clean" version contains placeholders for the database connection parameters, and I don't want the development server's passwords to end up in the repositories for the installed copies. But now and then I'll make changes (e.g., new defaults) that I do need to propagate. There are several files in a similar situation.
The best I could work out so far involves installing mq and turning the local modifications into a patch (two patches, actually, with logically separate changesets). Every time I want to commit a regular changeset to the local repo, I need to pop all patches, commit the modifications, and re-apply the patches. When I'm ready to push to the master repo, I must again pop the patches, push, and re-apply them. This is all convoluted and error-prone.
The only other alternative I can see is to forget about push and only propagate changesets as patches, which seems like an even worse solution. Can someone suggest a better set-up? I can't imagine that this is such an unusual configuration, but I haven't found anything about it.
Edit: After following up on the suggestions here, I'm coming to the conclusion that named branches plus rebase provide a simple and workable solution. I've added a description in the form of my own answer. Please take a look.
From your comments, it looks like you are already familiar with the best practice for dealing with this: version a configuration template, and keep the actual configuration unversioned.
But since you aren't happy with that solution, here is another one you can try:
Mercurial 2.1 introduced the concept of phases. A phase is changeset metadata marking it as "secret", "draft" or "public". Normally this metadata is used and manipulated automatically by Mercurial and its extensions without the user needing to be aware of it.
However, if you made a changeset 1234 which you never want to push to other repositories, you can enforce this by manually marking it as secret like this:
hg phase --force --secret -r 1234
If you then try to push to another repository, it will be ignored with this warning:
pushing to http://example.com/some/other/repository
searching for changes
no changes found (ignored 1 secret changesets)
This solution allows you to
version the local configuration changes
prevent those changes from being pushed accidentally
merge your local changes with other changes which you pull in
The big downside is of course that you cannot push changes which you made on top of this secret changeset (because that would push the secret changeset along). You'll have to rebase any such changes before you can push them.
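For example (the revision numbers are hypothetical, and the rebase extension must be enabled in your hgrc): if changeset 1235 was committed on top of the secret changeset 1234, move it onto the public head before pushing:
hg rebase --source 1235 --dest default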
If the problem with a versioned template and an unversioned local copy is that changes to the template don't make it into the local copies, how about modifying your app to use an unversioned localconfig.ini and fall back to the versioned config.ini for missing parameters? This way new default parameters can be added to config.ini and propagate into your app.
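As an illustration (the file names and keys are hypothetical), the versioned template carries safe defaults and placeholders, while the unversioned file only overrides what differs locally:
; config.ini (versioned template, safe to distribute)
[database]
host = localhost
user = PLACEHOLDER
password = PLACEHOLDER
; localconfig.ini (unversioned, per-installation overrides)
[database]
user = produser
password = the-real-secret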
Having followed up on the suggestions here, I came to the conclusion that named branches plus rebase provide a simple and reliable solution. I've been using the following method for some time now and it works very well. Basically, the history around the local changes is separated into named branches which can be easily rearranged with rebase.
I use a branch local for configuration information. When all my repos support Phases, I'll mark the local branch secret; but the method works without it. local depends on default, but default does not depend on local so it can be pushed independently (with hg push -r default). Here's how it works:
Suppose the main line of development is in the default branch. (You could have more branches; this is for concreteness). There is a master (stable) repo that does not contain passwords etc.:
---o--o--o (default)
In each deployed (non-development) clone, I create a branch local and commit all local state to it.
...o--o--o  (default)
          \
           L--L  (local)
Updates from upstream will always be in default. Whenever I pull updates, I merge them into local (n is a sequence of new updates):
...o--o--o--n--n  (default)
          \     \
           L--L--N  (local)
The local branch tracks the evolution of default, and I can still return to old configurations if something goes wrong.
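That pull-and-merge step is the usual sequence; a sketch (the commit message is my own):
hg pull
hg update local
hg merge default
hg commit -m "merge default into local"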
On the development server, I start with the same set-up: a local branch with config settings as above. This will never be pushed. But at the tip of local I create a third branch, dev. This is where new development happens.
...o--o  (default)
       \
        L--L  (local)
            \
             d--d--d  (dev)
When I am ready to publish some features to the main repository, I first rebase the entire dev branch onto the tip of default:
hg rebase --source "min(branch('dev'))" --dest default --detach
The previous tree becomes:
...o--o--d--d--d  (default)
       \
        L--L  (local)
The rebased changesets now belong to branch default. (With feature branches, add --keepbranches to the rebase command to retain the branch name). The new features no longer have any ancestors in local, and I can publish them with push -r default without dragging along the local revisions. (Never merge from local into default; only the other way around). If you forget to say -r default when pushing, no problem: Your push gets rejected since it would add a new head.
On the development server, I merge the rebased revs into local as if I'd just pulled them:
...o--o--d--d--d  (default)
       \        \
        L--L-----N  (local)
I can now create a new dev branch on top of local, and continue development.
This has the benefits that I can develop on a version-controlled, configured setup; that I don't need to mess with patches; that previous configuration stages remain in the history (if my webserver stops working after an update, I can update back to a configured version); and that I only rebase once, when I'm ready to publish changes. The rebasing and subsequent merge might lead to conflicts if a revision conflicts with local configuration changes; but if that's going to happen, it's better if they occur when merge facilities can help resolve them.
1. Mercurial has (following up on the comments) selective, hunk-based commit - see the Record extension.
2. Local changes inside versioned public files can easily be handled with the MQ extension (I do it for site configs all the time). Your headache with MQ
Every time I want to commit a regular changeset to the local repo, I
need to pop all patches, commit the modifications, and re-apply the
patches. When I'm ready to push to the master repo, I must again pop
the patches, push, and re-apply them.
is the result of an unpolished workflow and (some) misinterpretation. If you want to commit without the MQ patches applied, don't do it by hand: add an alias for commit which does qpop --all + commit, and use only this new command. And when you push, you don't need to worry about the MQ state - you push changesets from the repo, not the working-copy state. The local repo can also be protected without an alias by a pre-commit hook that checks content.
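A sketch of such an alias in your hgrc (shell-alias syntax; the name safecommit is made up, and the MQ extension must be enabled):
[alias]
# pop all applied patches, commit the remaining changes, then re-apply the patches
safecommit = !$HG qpop --all && $HG commit "$@" && $HG qpush --all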
3. You can try the LocalBranches extension, where your local changes are stored inside local branches (merging branches on changes) - but I found this way more troublesome than MQ.

Doesn't TortoiseHg Auto Synch with Server

I have created a repository on https://bitbucket.org/ and used TortoiseHg to clone it to a folder on my local machine. I am able to add and commit files, but I find that they never get updated on the server at Bitbucket. After some fiddling, I found that there is this Synch option. What I don't get is why I have to press Synch; if I meant to commit, then it should commit.
Where is it being stored if it's not synched immediately with the remote server?
Note: I am trying out TortoiseHg and Mercurial, coming from ample experience with Subversion.
When you commit, you commit to your local repository - i.e. to the .hg directory in the root of your project. To synch with the remote repository you need to explicitly push your changes. This is how DVCSs work - it's not the same model as SVN.
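In practice the two steps look like this (the commit message is just an example):
hg commit -m "describe your change"   # records the changeset locally, in .hg
hg push                               # publishes it to the remote repository (e.g. Bitbucket)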
A key feature of a distributed version control system is that you can make local commits. This means that the new commit does not leave your machine when you press "Commit", it is just stored locally.
This has some direct consequences:
you can work while you are offline, e.g., in a train or in a plane
commits are very fast: in Mercurial, creating a new commit involving n files (normally called a changeset) means appending a few bytes to n + 2 files.
you can change your mind: since you have not shared the new changeset with anybody, you can delete it from your local machine without problem
It also has some indirect consequences:
because commits are fast, people tend to make many more commits. The commits are typically more fine-grained than what you see in a centralized system and this makes it easier to review the changes
because commits are local, it often happens that people do concurrent work. This happens when both you and I make one or more commits based on the same initial version:
              [a] --- [b] --- [c]  <-- you
             /
... [x] --- [y]
             \
              [r] --- [s]  <-- me
The history has then effectively forked since we both started work based on changeset y. For this to work, we must be able to merge the two forks. Because this happens all the time, you'll find that Mercurial has very robust support for merging.
So, by decoupling the creation of a commit from the publishing of a commit, you gain some significant advantages.

Setting up a mercurial mirror

Can anybody tell me how to set up a mirror of a Mercurial repository? I have a Mercurial repo on my laptop but want to automatically mirror the repo on a NAS drive as a form of backup. Ideally, the solution would check a known location for a repo, create one if it doesn't exist, and from then on mirror any changes.
Another thing to bear in mind is that the NAS may not always be available, so I would need to accommodate this in some way.
I did something similar with git, but all the functionality should be in Mercurial too.
I manually created a clone on some server (in my case a VPS somewhere on the net, in case my house burns down with the NAS and laptops in it).
With git you can create a 'bare' repository, i.e. one without a working copy checked out.
Then I regularly push to it.
This can be automated using 'hooks'; more info here.
The trick is to get the handling off the commit hook (pun intended) so the syncing is not part of your workflow: run your push script using the 'at' command a couple of minutes in the future, so it runs asynchronously in the background. I wouldn't get fancy here; just try to handle failures gracefully.
You now have a setup which will keep the backup synched within a couple of minutes.
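A sketch of that idea in the repository's .hg/hgrc (the mirror path is an assumption; 'at' runs the queued job from the directory where it was submitted, i.e. the repo root):
[hooks]
# after every commit, queue a background push; '|| true' keeps an unreachable NAS from reporting errors
commit = echo "hg push /mnt/nas/hg-mirror || true" | at now + 2 minutes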
Mercurial gives you the freedom to do that however you would like. If you wanted, you could just set up a process to copy the repo from your local machine to the NAS at a regular interval. Everything about the repo is stored in its directory, and everything in the directory is just files.
However, it sounds to me like you want to set up something more akin to a centralized version control system like Subversion. I do something like this with one of my projects (actually, I moved it from SVN to Mercurial, but that's a different answer).
I have a repository on xp-dev.com and my local repository on my computer. I do all the work I want to do on my local repository, issuing hg com very frequently. When I am done for the day/night, I do hg push ssh://hg2.xp-dev.com/myrepo to send all of my local changes to the remote server.
So, really all you want to do is an hg push to put your local repo on your NAS and then remember to do it again on a regular basis.
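A minimal sketch of such a backup script (the NAS mount point and repo path are assumptions); it creates the mirror on the first run and tolerates an absent NAS:
#!/bin/sh
MIRROR=/mnt/nas/backup/myrepo
if [ ! -d "$MIRROR" ]; then
    hg init "$MIRROR" || exit 1    # NAS not mounted or not reachable: give up for now
fi
hg push "$MIRROR" || true          # hg push exits with 1 when there is nothing new to push
Run it from inside the working repository, e.g. by hand or from cron.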