Tuning a Mercurial server and identifying the bottleneck

I have a very large Mercurial repo that takes hours for an initial clone. The clone is being done over HTTPS, via scmmanager. I would like to get this down to minutes, if possible.
My Mercurial repo is hosted on a server with 24 cores; the load is around 2 while doing a clone from my workstation. I'm wondering how I can tune Mercurial on the server, perhaps to use more cores. iowait is at 0. Network traffic on the server is low: iftop shows 5 Mb/s TX, and I have gigabit Ethernet. 4 GB of RAM are in use out of 64 GB total, with 24 GB used for disk cache and 1.5 GB for buffers.
On my workstation, I have tried renicing hg to -5. Load is around 0.5. My workstation has 8 cores and gigabit Ethernet as well, and it also sees minimal traffic, around 2 Mb/s RX. iowait is zero on my workstation too.
The OS for the Mercurial server is CentOS 6; the OS for the workstation is Debian jessie. Any ideas would be greatly appreciated.
Edit: I tried doing a clone over ssh as well as a local, non-hardlinked, uncompressed clone. Both take a very long time (hours). The repo size is 8 GB. Unsure how to proceed.

The clone itself is a very low-complexity operation for Mercurial, so suspect scmmanager. Try cloning directly from the filesystem using an ssh:// URL and the load should be just about zilch.
Also try disabling compression on the clone with --uncompressed. If it helps, hgweb makes that settable on the server side.
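For example, something along these lines (the host, user, and repository paths are placeholders):

    # clone straight from the filesystem over ssh, bypassing scmmanager
    hg clone ssh://user@hgserver//srv/hg/bigrepo

    # skip wire compression, which mostly wastes CPU on a fast LAN
    hg clone --uncompressed https://hgserver/hg/bigrepo

To allow uncompressed clones from hgweb, the relevant server-side option should be server.uncompressed, e.g. in the repository's .hg/hgrc:

    [server]
    uncompressed = True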

What is the total size of your repositories?
The suggestions Ry4an makes should indeed resolve your issue. If they don't, you could also create a bundle and transfer that for the initial clone. See hg help bundle for details.
Basically, this creates a single file containing revisions (in your case, you'll want to use '--all' to make sure it contains the entire repository). You can then use wget or any other tool to download the file (which should be much faster than 5 Mb/s).
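Roughly, the workflow would look like this (file names and paths are placeholders):

    # on the server: pack the whole repository into one file
    hg bundle --all /tmp/bigrepo.hg

    # on the workstation: fetch the file, then build a repo from it
    wget https://hgserver/bigrepo.hg
    hg init bigrepo
    cd bigrepo
    hg unbundle ../bigrepo.hg
    hg update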


What is the maximum file size limit in Mercurial

Currently, I can't clone the Mercurial directory due to the following error:
Abort: stream ended unexpectedly.
We have a few files that are larger than 10 MB. These files have already been uploaded to the Mercurial repository, but we get this error while cloning that directory. We have also checked our internet connection, which is not the issue. Please advise what the maximum size is that Mercurial can transfer.
Kind regards
We've had this issue when hosting the "master" repository on a shared host.
The hosting company had routines in place which would kill any processes using too much memory, and it seems hgweb loads most of the repository in memory during cloning. Thus if the timing was right, hgweb would get killed in the middle of the cloning operation, producing the error message you posted on the client.
We've moved our "master" repository to Bitbucket for now.
If it's an abruptly aborted stream, it's not a limitation that Mercurial is imposing -- that would come with a clear error message. What server do you have hosting hgweb? Are you using Apache or another HTTP server? Are you going over ssh? This is more likely trouble at the network level than a Mercurial configuration issue -- and it's certainly not a fundamental Mercurial limitation.
Generally the limits are in the GB range and are defined by the operating system rather than by Mercurial (https://www.mercurial-scm.org/wiki/HandlingLargeFiles).
However, your repository might have a hook configured that limits binary file size. See for example https://www.mercurial-scm.org/pipermail/mercurial/2009-January/023322.html
So you need to check the configuration of your repository in .hg/hgrc
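If such a hook is in place, it would typically show up in .hg/hgrc as something like the following (the hook name and script path here are made-up examples):

    [hooks]
    # reject incoming changegroups that contain files over a size limit
    pretxnchangegroup.checksize = python:/srv/hg/hooks/checksize.py:hook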

TortoiseHg seems to be very slow on Win7

I have recently set up a Mercurial clone with TortoiseHg on our network, and it seems to take forever to add files, do commits, etc. It usually hangs for 3-5 minutes at a time, and for some reason it really doesn't like any kind of right-clicking in TortoiseHg.
I am fairly new to Mercurial, so there could be some settings to speed this all up, but I am not sure how best to approach this. My PC specs are below:
Intel Core 2 Quad CPU Q8300 @ 2.50GHz
4GB RAM (3GB usable)
The actual clone is pretty big - just around 200MB in total. I'm not sure if this size is causing the slowdown, or the fact that the clone itself isn't on my machine but on our local network.
Any ideas of how best to optimise everything?
First, I would try to do the same Mercurial operations from the command line, to rule out a slow GUI.
We have the same setup here at my work. The "main" repo is on a mapped network drive. Accessing it is slow, so I've made a local clone for fast access and only synchronize when necessary.
Now that I think about it, why don't you have a clone on your local machine? Isn't that the entire point of a DVCS?
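As a rough sketch (the network path and local folder are placeholders), the idea is to clone once from the network share onto your local disk and then work locally:

    rem one-time: clone from the network share onto the local disk
    hg clone \\fileserver\hg\project C:\work\project

    rem day to day: work in the local clone, sync when needed
    cd C:\work\project
    hg pull -u
    hg commit -m "my change"
    hg push

Everyday operations (commit, status, diff, log) then run against the local disk and should be fast; only pull and push touch the network.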

Is a system-wide Mercurial installation enough in a shared environment?

I am learning how to install Mercurial for our team, but I am not experienced enough to make some of these decisions.
For our team, we have a server machine used as a repository. Every team member also has his or her own machine with Red Hat Linux installed. However, we do not do anything on our local terminals; we do everything on the server. Every member has a user directory on the server, such as /home/Cassie, /home/john, ..., and we save all our code and work there. When we turn on the local terminals, the GNOME system shows our personal files on the server, not on the local machine. Whenever anyone clicks the terminal application on the desktop, it connects to his or her own home directory on the server, so we do not need to use an SSH command to connect. It is like a school multi-user system: everyone has a user account and logs into that account to do their own work. I would like to set up a shared repository on that server so that everyone can push, pull, and so on against it.
1) Since we use a shared environment, does it mean that I need to install Mercurial only on the server, and that is enough for everyone to do "commit", "push", "pull", etc.?
2) By installing only a system-wide Mercurial, does that eliminate the ability to do local commits? If I would like everyone to still have the "local commit" ability, how should I do it?
3) I have searched online. Some people mentioned that on a shared network server, it is impossible to have locks for two users who try to access the same file at the same time. Does that apply to my situation?
In sum, we do all our work on the server. I hope to find a plan that puts a shared repository under Mercurial control, where everyone still has the ability to commit locally and the repository still has some lock protection if two users access a file at the same time. If this scenario is feasible, can I just install Mercurial on the server, or do I need to install it on both the server and the users' machines? If the scenario is impossible, could someone please suggest a plan for version control on our system?
1) Since we use a shared environment, does it mean that I need to install Mercurial only on the server, and that is enough for everyone to do "commit", "push", "pull", etc.?
If your users are logging into a shell on the server in order to do their work, then yes it is sufficient to have Mercurial installed only on the server.
2) By installing only a system-wide Mercurial, does that eliminate the ability to do local commits? If I would like everyone to still have the "local commit" ability, how should I do it?
Your users will presumably clone from a shared "root" repository into their own home directories in order to work on the code. They will each have a "local" copy of the repo in their home directory and will push into the shared root repository.
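As a sketch (repository paths are placeholders), the setup might look like this:

    # one-time, on the server: create the shared "root" repository
    hg init /srv/hg/project

    # each user clones it into their own home directory
    hg clone /srv/hg/project ~/project

    # daily work happens in the local copy
    cd ~/project
    hg commit -m "local work"        # local commits, no contention
    hg pull -u /srv/hg/project       # pick up others' changes, merge if needed
    hg push /srv/hg/project          # publish back to the shared root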
3) I have searched online. Some people mentioned that on a shared network server, it is impossible to have locks for two users who try to access the same file at the same time. Does that apply to my situation?
As long as your users are working within their own local copies of the repo, they will not interfere with one another. The only time a conflict may arise is when pushing back to the shared root repository -- in which case the user will need to pull, merge their changes, and resolve any conflicts.
I would recommend reading carefully through Joel Spolsky's excellent Hg Init tutorial for a better understanding of how Mercurial handles "central" and "local" copies.

Synchronizing many binary files

I have about 100,000 files on an office server (images, PDFs, etc.).
Each day the file count grows by about 100-500 items, and about 20-50 old files change.
What is the best way to synchronize the web server with these files?
Can a system like Mercurial or Git help?
(On the office server I'll commit changes, and the web server will periodically do updates.)
The second problem is that on the web server I have user-generated content (binary files, separate from the ones above).
Each day users upload about 1000-2000 new files. Old files don't change.
I need to back up these files to a local machine.
Can a system like Mercurial or Git help in this situation?
(On the web server I'll commit these files via cron, and on the local machine I'll do updates.)
Thanks
Update:
The office server is Windows Server 2008 R2.
The web server is Debian 5 (lenny).
The simplest and most reliable mechanism (in my experience) is rsync.
On Windows, however, rsync over ssh is badly broken due to issues with how Cygwin interacts with named pipes. Rsync over its own protocol works (as long as you don't care about encryption), but I've had lots of problems getting rsync to stay up as a Windows service for more than a few days at a time. DeltaCopy is a Windows app that uses the rsync tools behind the scenes; it seems to work very well, though I haven't tried the ssh option.
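As a sketch (host names and paths are placeholders), the two syncs could look like this:

    # office server -> web server: push new and changed files, drop deleted ones
    rsync -avz --delete /data/files/ deploy@webserver:/var/www/files/

    # backup machine <- web server: pull user uploads (e.g. from a cron job)
    rsync -avz deploy@webserver:/var/www/uploads/ /backup/uploads/

rsync only transfers files that have changed, so daily runs stay reasonably cheap even with 100,000 files.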
A DVCS is not a good solution in this case: it will keep the whole history, which you don't always need, and it will make any clone a massive operation.
An artifact repository like Nexus is much better suited if you need some kind of versioning with integrity checks for your binaries.
Otherwise (no versioning), a simple rsync like Marcelo proposes is enough.

Hudson slaves, how to access workspace

How do I configure the system to have one master and multiple slaves for building normal C code with gmake? How can the slaves access the workspace on the master? I guess an NFS share is the way to go, but if that's not possible, are there other options?
http://wiki.hudson-ci.org/display/HUDSON/Distributed+builds is there, but I can't understand how workspace sharing is handled.
Rsync? From the master: SCM job -> done -> rsync to all slaves -> build job, and if it was done on a slave -> rsync the workspace back to the master?
Any proof of concept or real-life solutions?
When Hudson runs a build on a slave node, it does a checkout from source control on that node. If you want to copy other files over from the master node, or copy other items back to the master node after a build, you can use the Copy to Slave plugin.
It's surely a late answer, but it may help others.
I'm currently using the "Copy Artifact plug-in" with great results.
http://wiki.hudson-ci.org/display/HUDSON/Copy+Artifact+Plugin
(https://stackoverflow.com/a/4135171/2040743)
Just one way of doing things; others exist.
Workspaces are actually not shared when builds are distributed to multiple machines; they exist as separate directories on each machine. To coordinate, any item that needs to be moved from one workspace to another is copied into a central repository via SCP.
This means that sometimes I have a task which needs to wait for items to land in the central repository. To handle this, I have the task run a shell script which polls the repository via SCP for the presence of the needed items and errors out if they aren't available after five minutes.
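A minimal sketch of such a polling script (the host, artifact path, and the BUILD_NUMBER parameter passed in by the job are all placeholders):

    #!/bin/sh
    # poll the central repository for a build artifact, give up after 5 minutes
    HOST=repo.example.com
    ARTIFACT="/artifacts/$BUILD_NUMBER/output.tar.gz"

    for i in $(seq 1 30); do
        if scp "$HOST:$ARTIFACT" . ; then
            exit 0              # artifact retrieved, the build can continue
        fi
        sleep 10                # retry every 10 s; 30 tries = 5 minutes
    done

    echo "artifact $ARTIFACT not available after 5 minutes" >&2
    exit 1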
The only downside is that you need to pass around a parameter (the build number) to keep the builds on the same page, preventing one build from picking up an artifact from a previous build. That, and you have to set up a lot of SSH keys to avoid having to enter a password when running the SSH scripts.
Like I said, it's not the ideal solution, but I find it more stable than the SSH artifact-grabbing code in my particular release of Hudson (and with my set of SSH servers).
One downside: the SSH servers on most Linux machines seem to really lack performance. A solution like mine tends to swamp your SSH server with a lot of connections coming in at about the same time. If you find the same happens to you, you can add timer delays (an easy but imperfect solution) or rebuild the SSH server with high-performance patches. One day I hope those patches make their way into the SSH server code base, provided they don't negatively impact SSH server security.