How to speed up clones from Mercurial servers in a distributed environment?

I would like to know which approach would work better for using Mercurial.
Currently I have 4-5 geographic sites with build farms, and we synchronize Mercurial repositories to local read-only Mercurial servers in order to minimize traffic between sites.
Still, we have a lot of trouble with failed syncs.
I am wondering whether other solutions would be better and easier to maintain, such as using HTTP proxies and possibly making the proxies transparent by using geographic DNS.
At the moment the build machines use hg clone, and the repositories change often.
What would be the best approach to deal with this?
Note: if you have had bad experiences, please add them as comments.
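For context, the per-site sync described above can be as simple as a scheduled incremental pull into each read-only mirror. A minimal sketch, assuming hg is on the PATH; the repository URL and mirror path are hypothetical:

    # Minimal per-site mirror sync: pull incrementally, clone only on first run.
    import os
    import subprocess

    MASTER = "https://hg.example.com/central/repo"   # hypothetical central repo
    MIRROR = "/srv/hg/mirror/repo"                   # hypothetical local mirror path

    def sync_mirror():
        if os.path.isdir(os.path.join(MIRROR, ".hg")):
            # Incremental pull keeps cross-site traffic down to new changesets only.
            subprocess.run(["hg", "pull", "-R", MIRROR, MASTER], check=True)
        else:
            # First run at this site: full clone; --noupdate skips the working copy,
            # which a read-only serving mirror does not need.
            subprocess.run(["hg", "clone", "--noupdate", MASTER, MIRROR], check=True)

    if __name__ == "__main__":
        sync_mirror()

Run from cron (or a Windows scheduled task) per repository, the failure mode is then a slightly stale mirror rather than a broken sync.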

Is a Central Mercurial repo necessary?

Trying to find a workable workflow for multiple developers in our ColdFusion shop before we implement.
Currently, most of us (still) work directly in production. I want to change that.
If each developer has their own repo and there are repos on the test and prod web servers, what is the value in a 'central' repository? What value does something like Bitbucket add in this scenario?
If you use a central repository, you can put the development into a dev branch and leave the production branch for bugfixes only. Also, I think it is a bad idea to run a production environment directly off a Mercurial repo. Think about a regular deploy strategy to decouple the production server from the repository.
But I must admit that I have no experience with ColdFusion; maybe there it is totally okay to run directly from a repository.
The chief advantages of Bitbucket are the lack of server setup/maintenance/backup and the fact that you can get to it anywhere you have internet access.
@luksch was right to question running prod directly from a cloned repo. At a minimum you want to make sure you're not serving up the .hg directory. I'd encourage you to use some kind of deployment script that grabs the source at a tag from Mercurial, packages it, places it on the server, and also restarts or does whatever else the CF server needs.
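As a rough illustration of that kind of deploy step (not a recommendation of any particular tool), grabbing the source at a tag and packaging it can be done with hg archive; the repo path, tag and package name below are made up:

    # Export exactly the tagged revision, without the .hg directory, then hand
    # the package to whatever copies it to the web server and restarts CF.
    import subprocess

    REPO = "/srv/hg/mysite"          # hypothetical clone on the build box
    TAG = "release-1.4"              # hypothetical release tag
    PACKAGE = "/tmp/mysite-1.4.zip"  # package handed to the deploy step

    subprocess.run(["hg", "archive", "-R", REPO, "-r", TAG, "-t", "zip", PACKAGE],
                   check=True)
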
The best experience I had with ColdFusion (much like having a 'best' root canal) was when we ditched Adobe's server and used Railo. This freed us from paying Adobe for licenses on all our servers and made it easy to package the app and its runtime in a WAR, which made deployment super easy.
A central repo hosted on Bitbucket or your company's servers (assuming they are physically protected and backed up) gives you the advantage of reliably having it available, and provides business continuity in case something really bad happens.
Multiple copies of the repository are good if a hard drive holding one of them crashes. But if fire or theft wipes out all of the hard drives with the repos (and I have read of one case of that happening), you are left with nothing.
One of the best things about distributed version control is that it allows you to design your workflow around your own development processes. You do not need a central repository but many projects end up using one.
A central repository is a great way of keeping track of the latest version of the code base. It means that when developers want a copy of the latest code, they always know where to pull/clone from rather than having to ask around the team.
Having a central repository doesn't limit you in any way; you can still use other workflows alongside it. For example, if a few members of the team are working on a feature, they can push and pull between their development repositories without pushing to the central repository.

Adding centralized configuration to our servers

As our systems grow, there are more and more servers and services (different types, and multiple instances of the same type that require minor config changes). We are looking for a "centralized configuration" solution, preferably an existing one and nothing we need to develop from scratch.
The idea is something like this: a service comes up knowing a single piece of data (its type+location+version+serviceID, or something like that) and contacts some central service that gives it its proper config (file, object or whatever).
If the service that comes online can't find the config service, it will either use a cached config or refuse to initialize (the behavior should probably be specified in the startup parameters it gets from whoever or whatever is bringing it online).
The config service should be highly available, i.e. a cluster of servers (ZooKeeper keeps sounding like a perfect candidate).
The service should preferably support the concept of inheritance, allowing a global configuration file for the type of service and then specific overrides or extensions for each instance of the service by its ID. It should also support something like config versioning, allowing us to keep different configurations of the same service type for different versions, since we want to rely more and more on side-by-side rollout of services.
The other side of the equation is a config admin tool that connects to the same centralized config service and can review and update all the configurations based on the requirements above.
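To make the inheritance requirement concrete, here is a rough sketch of just the lookup semantics; the storage backend (ZooKeeper or anything else) is deliberately left out, and all keys and values are hypothetical:

    # Sketch of 'global defaults per service type, overridden per instance/version'.
    GLOBAL_DEFAULTS = {            # type-wide defaults for, say, a billing service
        "cache.size": 256,
        "log.level": "INFO",
    }
    INSTANCE_OVERRIDES = {         # overrides keyed by (service_id, version)
        ("billing-eu-1", "2.1"): {"log.level": "DEBUG"},
    }

    def effective_config(service_id, version):
        # Instance/version-specific values win over the type-wide defaults.
        config = dict(GLOBAL_DEFAULTS)
        config.update(INSTANCE_OVERRIDES.get((service_id, version), {}))
        return config

    print(effective_config("billing-eu-1", "2.1"))
    # {'cache.size': 256, 'log.level': 'DEBUG'}
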
I know that if I change the core requirement from the service pulling config data to having the data pushed to it, I could use something like Puppet or Chef to manage everything. I have to be honest, I have little experience with these two systems (our IT team has more), but from my investigation they do not seem to be the right tools for this job.
Are there any systems similar to the one I describe above that anyone has integrated with?
I've only had experience with home-grown solutions, so my answer may not solve your issue but may help someone else. We've used web servers and SVN robots quite successfully for configuration management. This solution doesn't mean you have to "develop from scratch", but it isn't a turn-key solution either.
We had multiple web servers, each refreshing its configurations from an SVN repository on a synchronized once-a-minute schedule. The clients would make requests of the servers with /type=...&location=...&version=... style HTTP arguments. Those values could then be used in the views when necessary to customize the configurations. We did this both with Spring XML files that were being reloaded live and with standard field=value property files.
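For illustration, a client pull in that scheme could look roughly like this; the server URL and parameter values are hypothetical:

    # Fetch this node's configuration from one of the config web servers.
    import urllib.parse
    import urllib.request

    params = urllib.parse.urlencode({
        "type": "web-frontend",
        "location": "eu-west",
        "version": "1.2",
    })
    url = "http://config.example.com/config?" + params  # hypothetical endpoint
    with urllib.request.urlopen(url) as response:
        config_text = response.read().decode("utf-8")    # e.g. field=value lines
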
Our system was pull-only, although we could trigger a pull via JMX if necessary.
Hope this helps somewhat.
Config4* (of which I am the maintainer) can provide you with most of the capabilities you are looking for out-of-the-box, and I suspect you could easily build the remaining capabilities on top of it.
Read Chapters 2 and 3 of the "Getting Started" manual to get a feel for Config4*'s capabilities (don't worry, they are very short chapters). Doing that should help you decide how well Config4* meets your needs.
You can find links to PDF and HTML versions of the manuals near the end of the main page of the Config4* website.

Is it bad to have your test and production environments on the same machine?

Would it be bad to have things set up so that MySite.com is production and test.MySite.com is test? Both running off the same machine. The site doesn't get a lot of traffic.
UPDATE
I am talking about an ASP.NET web application running on a Windows server.
Yes, it is a bad idea.
Suppose your test code has a bug that consumes all memory/CPU/disk space? Then your production site goes down.
Have separate machines for production and test and use DNS to point the URLs to each.
Edit (more points):
If the sites share a machine, they share an IP address, so when using an IP address to access a site, you will not know whether you are on production or test.
When sharing the same machine, deployment can be tricky; you have to be extra careful not to deploy untested code to production (an easier mistake to make, since both live on the same machine).
The security considerations for production and test should be separate - this kind of setup makes it more difficult.
It'd be really hard to test environment updates (a new version of PHP/Perl/Python/Apache/the kernel/whatever) with test and production on the same machine.
It is a bad idea. When you have a new untested feature, it may kill the production site.
Is compliance with any kind of standard an issue? Generally you want developers to have lots of access to test environments so they can resolve issues. However, it's not always a good idea (or even allowed) for developers to have the same level of access to production systems.
In theory, yes. When developing, there are a lot of things that could go awry, as @Oded mentioned. By having a dedicated web server run your main site, you avoid the complexity of having duplicated databases, virtual hosts, etc. You could certainly make test.mysite.com publicly available, though.
As a customer, often the first thing I do is visit a company's website. If the site is inaccessible, even briefly, it looks unprofessional and I quickly lose interest. You do not want to lose business because you were too cheap to buy one extra computer!
Edit: I see from your comments above that this is indeed a business server. Answer updated.
"Good, bad, I'm the guy with the gun." - Ash
Bad is really a range. It can be anywhere from replacing motherboards with the power plugged in and wet hands to using excessively short variable names. What you really want to know is what the tradeoffs are. You obviously know some of the benefits or you wouldn't be thinking about using the production server for testing.
The big con is that the test code is running in a shared environment with production. If there is no sandbox (process limits, memory limits, disk limits, chroot file system, etc.) you risk impacting the production server if something goes awry in the testing. You may accidentally DOS yourself by consuming all of a particular resource. You may accidentally remove the production site. Someone may think it's okay to do a load test. If you are fine with taking those risks, then you can go ahead and run your test app on the production server.
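If you do accept those risks, even crude limits help. For instance, on a Unix host you could cap what the test process may consume before launching it; this is only a sketch with arbitrary limits and a hypothetical entry point (on Windows the equivalent idea would be job objects or IIS application pool limits):

    # Cap address space and CPU time for the test app only, not for production.
    import resource
    import subprocess

    def limit_resources():
        resource.setrlimit(resource.RLIMIT_AS, (512 * 1024 * 1024, 512 * 1024 * 1024))  # 512 MB
        resource.setrlimit(resource.RLIMIT_CPU, (60, 60))                               # 60 s of CPU

    # preexec_fn runs in the child before exec, so only the test app is limited.
    subprocess.Popen(["python", "run_test_app.py"], preexec_fn=limit_resources)
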
BTW: It is bad.
As your question is not platform specific, I'll try to answer in a general form. I'll also refer only to the "same machine" part of your question, since the "domain name" should be very easy to change ... if all common precautions have been taken.
What you really need is to isolate environments. Depending on the technology used, that may mean "separate machines" or not.
As an example, a lot of small to medium banks in the world run their critical systems on one mainframe. It's not unusual for one of those beasts to cost (peripherals and all) six figures. Some of them opted to have separate, smaller machines for development and testing, while others run hundreds of environments (sometimes as VMs) on the same machine. The tricky detail is that the mainframe hardware and OS do provide real and consistent isolation between those environments, assigning disks, CPUs, comm channels, credentials, libraries, OS modules, DBs, etc., based on a strict policy that can be as granular as you want.
The problem with many other platforms is that finding a way to isolate the environments is up to you, while on a dinosaur platform it is provided by the grace of HAL.
HTH!

How do SaaS companies verify and track the code they release to the customers?

I'm working at a SaaS company who releases new features and bug fixes to our customers every six weeks. When we write code changes they pass through different steps (like a state machine) before reaching the production servers. The steps are different depending on if the change is done in the regular development cycle or as an emergency fix. We're currently using Harvest to manage the steps and track what code (features and bug fixes through packages) is being released to the customers and in that sense it's working well.
Unfortunately Harvest is both expensive and a pain to use from a programmer's point of view. Branching and merging is a nightmare. So we're looking into switching to Mercurial. Mercurial seems to excel in those areas. However, Mercurial doesn't seem to be made for tracking changes or manage the above mentioned process, it only does SCM.
Q: What options do we have when it comes to the release process? Surely there are other SaaS companies (e.g. Google, Flickr, Facebook, LinkedIn) out there who want quality control before releasing code to production servers.
Q: Is it a bad idea to try and build the process in Mercurial or are there other tools that we need to use together with Mercurial?
[Edit]
To clarify, this is our (suggested) branch structure.
Here's the process flow we currently have in Harvest:
Hotfix <--> Test Level 1 <--> Test Level 2 <--> Master (Production)
Feature <--> Test <--> Release Test <--> Master (Production)
I'm not looking for a bug tracker, but rather a deployment tool that helps us track and deploy code that has been verified by our testers (code in the release branch). If there is more than one hotfix being worked on at the same time, we need to be able to test them together, and if one breaks the code we need to be able to "demote" the code-breaking changes back one step in the process flow. Today it's enough for the two developers to "promote" their changes to Test Level 1 and the system can be tested with both changes together. If one developer's changes break anything only when combined with the other developer's code, they can easily be demoted back from Test Level 1.
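For what it's worth, if that flow were modelled with Mercurial named branches, "promoting" would amount to merging a change into the next branch in the chain. A rough sketch; the repo path, branch names and helper are hypothetical:

    # Promote a tested changeset to the next stage of the flow by merging it in,
    # e.g. promote("abc123", "test-level-2") or promote("abc123", "default").
    import subprocess

    REPO = "/srv/hg/product"   # hypothetical integration clone

    def hg(*args):
        subprocess.run(["hg", "-R", REPO] + list(args), check=True)

    def promote(changeset, target_branch):
        hg("update", target_branch)
        hg("merge", "-r", changeset)
        hg("commit", "-m", "Promote {0} to {1}".format(changeset, target_branch))

Demoting is the awkward part with plain merges; in practice you end up backing the offending changeset out (hg backout) or rebuilding the test branch, which is one reason a deployment/pipeline tool on top of Mercurial is attractive.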
However, Mercurial doesn't seem to be made for tracking changes or manage the above mentioned process, it only does SCM.
It's best to use a separate tool for issue tracking. That way you can use the best of breed for each task. Just make sure that you select one which integrates well with your version control system.
To give some examples: Jira (commercial) and Trac (free) both have Mercurial integration plugins. They also have customizable workflow states, allowing you to model your process.
I've come to realize that what we're looking for is not an issue tracker, but rather a deployment tool to replace that aspect of Harvest. Go and Anthill Pro are two candidates.

Does anyone use Virtualization to create a quicker disaster recovery of a development environment?

I'm getting pretty tired of my development box dying and then having to reinstall a laundry list of tools that I use in development.
This time I think I'm going to set the development environment up in a VirtualBox VM and save it to an external HDD, so that I can bring the development environment back up quickly after I fix the real computer.
It seems to be like a good way to make a "hardware agnostic backup" and be able to get back up to speed quickly after a disaster.
Has anybody tried this? How well did it work? Did it save you time?
I used to virtualize all my development environments using VirtualBox.
Basically, I have a Debian VirtualBox image file burned to a DVD. When I have a new project, I copy it to one of my external HDDs and customize it for the project.
Once the project is delivered, I copy the image from my external HDD to a blank DVD and file it away.
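(As an aside, if you're on a reasonably recent VirtualBox, that copy step can also be scripted with VBoxManage instead of shuffling image files around by hand; the VM names below are made up:)

    # Clone a base VM for a new project and register the clone with VirtualBox.
    import subprocess

    subprocess.run(["VBoxManage", "clonevm", "debian-base",
                    "--name", "project-foo-dev", "--register"], check=True)
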
I've done this with good success; we had this in our QA environment even, and we'd also make use of undo disks, so that if we wanted to test, for example, Microsoft patches, we could roll the box back to its previous state.
The only case where we had issues was with SQL Servers, particularly if you do a lot of disk activity. We had two VMs replicating gigs of data between each other, hosted on the same physical box. The disks just couldn't keep up; however, for all the other tiers it worked like a breeze.
One cool idea I just saw a presentation on is using VirtualBox and having your host run OpenSolaris with ZFS. That makes it easy to take a snapshot of your image(s) and roll back to the snapshot when things go wrong, or when you want to restore to a known state for QA purposes.
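The snapshot/rollback cycle mentioned there is just two ZFS commands under the hood; a sketch, with a hypothetical dataset name (and the VM should be shut down before rolling back):

    # Snapshot the dataset holding the VM images, then roll back to it later.
    import subprocess

    DATASET = "tank/vms/devbox"   # hypothetical ZFS dataset

    def snapshot(name):
        subprocess.run(["zfs", "snapshot", "{0}@{1}".format(DATASET, name)], check=True)

    def rollback(name):
        # Discards everything written since the snapshot; add -r if newer
        # snapshots exist and should be destroyed as part of the rollback.
        subprocess.run(["zfs", "rollback", "{0}@{1}".format(DATASET, name)], check=True)

    snapshot("clean-install")
    # ... break things while testing ...
    rollback("clean-install")
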
I keep all development on virtual machines. In a multi-developer shop this allows for rapid deployment of a new development environment if someone fries their VM (via service pack or whatever) and allows a new developer to join the project almost immediately.
I'm reading the question much differently than the rest of you guys. I read it as the OP asking about keeping an image of a fresh install as a VM, then, when a server needs to be redeployed, you can restore from a backup of the VM.
In this case, the VM is nothing more than a different way of maintaining an image of an OS install, and if it works, it's not a half bad idea, IMO.
In the companies I work with, I encourage the use of network installable operating systems. With the right up-front work you can configure a boot server on your office network which will install your base operating system, all the drivers you need for your hardware, and all the software you'll use. Not only will this bail you out in a disaster scenario where you lose a machine, but it makes deploying hardware for new employees trivial.
This is easier with Linux than it is with Windows or Mac, but the latter two can work in this manner too.
I use the same network install methods for deploying servers in a live environment too.
The Virtualisation approach isn't a bad answer to the same problem, but to me it doesn't seem quite as clean.
That's not the way to go.
When you are developing you want to have many tools, some of which require a lot of computing power. Keep in mind that VirtualBox (IIRC; I couldn't find it on the VBox website) only emulates a PIV.
At the moment only one VM product simulates a dual-core CPU, and that's very new. This is important because there are race conditions that can only be seen on multiple-CPU machines, so you want to test your code under multiple CPUs/cores.
I think a simpler and better thing to do is make a disk image of your system and configuration partitions, restore it once a month to keep a clean system, and restore it whenever your system gets messed up.
Now a quick word about Windows, since the other systems where I have done this are no problem. The partitions that you image should not be changed in between. That's not a problem for other OSes, but some brilliant person decided to put Profiles on Windows smack dab in the system files. I simply make it a point not to put anything in my Profile (or on my Desktop, which is in my Profile) that I'm not willing to lose.