We're moving a social media service to be on separate data centers as our other hosting provider's entire data center went down. Twice.
This means that both websites need to be synchronized in some sense -- I'm less worried about the code of the pages, that's easy enough to sync, but they need to have the same database data.
From my research on SO, it seems MySQL Replication is a good option, but the MySQL manual, for scaling out, says that its best when there are far more reads then there are writes/updates:
http://dev.mysql.com/doc/refman/5.0/en/replication-solutions-scaleout.html
In our case, it's about equal. We're getting around 200-300 thousand requests a day right now, and we can grow rapidly. Every request is both a read and write request.
What would be the best method or tool to handle this?
Replication isn't instantaneous, and all writes have to be sent over the wire to the remote servers, so it takes bandwidth too. As long as this works for you and you understand the consequences, then don't worry about the read/write ratio.
However, are you sure that you need global replication? We handle millions of requests and have one location, with multiple web servers connected to two databases. One database is the live database, and the other is a replicated read only database.
We do have global fail over locations, and some people connect to these on any day, even if our main node is up because they have Internet issues. The data just trickles in though.
If the main node went down, then every body would be using the global fail over locations, in order. So, if our main node died, all customers would connect to Denver. If Denver went down, they'd all connect to Columbus.
Also, our main node is on two different Internet providers, so one ISP going down doesn't take us down.
Is the connection speed between two datacenters good enough? You can copy files to a new server and move database there. And then setup old server so that it will connect to new server's MySQL database in another DC? This will be slower of course, but depending on the nature of your queries it can be acceptable. As soon as DNS or whatever moves/finishes, you just power off the old server when there is no more requests for it.
To help you to assess your options you need to consider what your requirements are in a disaster recovery scenario (i.e. total loss of the system in one data-centre).
In particular for this scenario, how much data can you afford to lose (recovery point objective - RPO), and how quickly do you need to have the standby data-centre version of the site up and running (recovery time objective - RTO).
For example if your RPO is no transactions lost and recovery in 5 minutes, then the solution would be different than if you can afford to lose 5 mins of transactions and an hour to recover.
Another question I'd ask is if you're using SAN storage at all? This gives you options for replication at the storage level (SAN array to SAN array), rather than at the database level (e.g. MySQL replication).
Also to consider is the distance between the data-centres (e.g. timewise can you afford to perform a synchronous write to both databases, or would an asynchronous replication approach be more appropriate)
Related
I've seen pictures like this where multiple rails engines write to a single mySQL server.
1) Is this possible? Or does Rails want each application server to write to one database server?
2) If this is possible, how is it accomplished? Are there queues and a scheduler between the application servers and the write database server?
Scaling a mysql db is a pretty difficult thing to do, but its certainly been done plenty of times and there are a lot of best practices out there for you to take advantage of. The first thing you should know is that before you worry about scaling writes for a while yet, you probably need to scale your reads first.
Scaling reads can be done fairly easily using replication. There are several tools out there that make managing replication a lot easier such as Amazon RDS. Generally speaking many web severs can connect to many databases (as suggested by others), however you quickly run into scale issues once you have a lot of traffic, connections or whatever other action you are performing that generates load on the server.
As replicated severs are read only, you need to manage which sever you connect to depending on the action you're performing. I.e. if you had a users table, when creating, updating or deleting users you need to use the "write" database (the primary "source" sever) but when reading the user table, you can use one of the read replicas. This reduces the load on the primary write sever (allowing it to deal with even more writes) and as you can have multiple read databases behind a load balancer, you can get away with this structure for a very long time and scale reads across tens of database severs before you'll hit any significant issues (however most apps get away with 1-3).
There are situations where you will need to use your write database for read actions (although you should avoid it as much as possible) as the read replicas can be slightly behind the write dbs due to latency in replicating the write db queries, however most of the time you should be able to code knowing that there is the possibility that the read db is delayed (i.e. queue actions a reasonable period of time such that the updates will propagate across all the read severs) and simply use one of your read dbs rather than the write db.
Beyond this the key items to work on are ensuring you have efficient indexes and applying other best practices around maintaining a sensible data structure. You might also want to consider having 3 distinct "groups" of database servers. I generally like to have write, read and "stats" db groups. The write group for create, update and delete operations (as well as select for update), the read for general read items that must return their results quickly, and stats for anything that is going to be under high load and that you do not rely on for a prompt response (this keeps heavy queries that are not time sensitive away from your read db that you need quick responses from for general reads)
Once you get into a situation where you can no longer buy larger hardware and you're near maxing out your write capacity, you'll need to look into sharding, however that will take a lot of traffic / data (so dont worry about it unless you've done all of the above already).
I have a project where we have one central system that exposes an API on top of MySQL. We now need to replicate that same service locally on several different boxes (which could be 50+). We wanted to have a local cache of the DB on each of those boxes to ensure quick responses and failover if the "central" system goes down.
Any idea what's the best design for this? I was thinking some sort of master/slave set up, but I'm not sure if that works with 50+ servers. I'm not sure what's the best approach.
What about MySQL's own replication solution? If you've already ruled that out, you should say why.
With the replication that I've seen, you have a master and slave(s). If the master goes down, one of the slaves takes over. With 50+ slaves, you'd have a long (and confusing) chain of masters.
Not knowing anything about the type of data you have or the read/write percentages, I would suggest one of the following:
Cache static data locally (memcache, etc). Reads would be local, with writes going back to the mysql master. This works for mostly-static configuration information. I have 6 servers in that setup now.
Shard your data. With 50 servers, set them up in 25 master/slave pairs and put 1/25th of the data on each shard. Get one more server for N+1 redundancy.
Hope that helps.
Q:
I've inherited a system that consists (for simplicity) of 2 application servers that write to a single master database. One application server performs quite a few operations {small amount of time, like milli seconds. } per unit of time. The other application server acts like an API Server, through which clients interact. This "API" server operates on half the tables in the database most of which are not needed by the other application server. However the "API" server does cause the other application server, through its interaction with SQL Server, to lose time and performance.
I wanted to know what would be a good approach in resolving this.
idea's so far
[1] create a second database which will be master-master slaved with current database. Getting http://mysql-mmm.org/ scripts and running then. (concurrency?)
[2] slowly begin moving tables from "master" database into a new "API" database. (lots of legacy code..)
[3] some kind of a SQL priority queue.. (how fault tolerant can this be?)
Step 1 - work out where your bottleneck is
Step 2 - decide where your best return on effort is
If you simply want to make it perform better, then you have to work out where the slow point is. Ideally you would use 3 hosts, one for each application server and one for the database. In this configuration, you should quickly be able to work out if it is the database working the disks hard, or if it's CPU loading, lock contention etc.
Once you know where the bottleneck is, you'll have a much more focussed problem to fix. The options you have suggested may or may not help depending on what the real bottleneck is.
Consider a reasonably large website (2M+ pageviews / m, lots of users) with 2 frontend servers: one front server in the US, and one in Europe. Two dedicated URL bring the visitors on one of the server, one in the french language, the other one in english. Both sites share exactly the same data.
What would be the most cost effective solution? (DB used at my company: MySQL)
1/ A single Master server on Amazon EC2 (US), and slaves on the frontend servers?
Advantages: no master-master rep, meaning no risk of data conflict with autoincrement and duplicates on unique columns, etc..
Drawbacks: The lag! Won't there be too much lagging for writing in the US when you are in Europe?
Another drawback could be the lack of quick n dirty solution in case the master dies. And what about having slaves on same server as front?
2/ Two Amazon EC2 instances, one in the US, one in Europe, acting as master-master replication servers. Plus two slaves on each of the frontends?
Adv: Speed, and security of data. Of course there is no load balancer, but making a hack to switch the master to the other one seems pretty trivial.
Drwbcks: Price. And the risk of corruption on the DB
3/ Any other solution ?
As it is my first time working with servers in 2 continents, I would really appreciate learning from you experience in that area, including MySQL or not, including EC2 or not.
Thanks
Marshall
As usual, what I'm about to say depends on your app, how it uses the database, etc. You need to ask yourself:
If you're using off the shelf software, what have other people done in this situation?
Does the app need to work on the entire dataset, or can you partition?
Is your app built to handle multi-master replication (usually means uses autoincrement pk's)
What are the chances of update/delete collisions? What are the costs?
What's the read:write ratio? What's the nature of the writes? Are they typically update or append operations?
I'm assuming the french server is in Europe, while the English server is in the US? If you can partition your data so that the french site uses one DB and the english site uses the other, you're better off. Even if both sites access both DB's, since you don't have to worry about collisions. You can even run two mysql instances on each master server and do multimaster replication for both.
If you can't partition, I'd probably go with #2, but I'd designate one of the machines as the 'true' master and send all the writes to it to help avoid data clobber. This way it's easy to switch in a pinch.
If you're cost sensitive and you're going to run replicas on your front end servers anyway, just run the master databases on the front end servers. You can always pull it off later. Replicas can often have higher CPU/IO costs than masters taking the same read load: they have to execute their writes in serial, which can really screw things up.
Also, don't use m1.small instances for your DB. Or at least keep an eye on your performance. m1.smalls are significantly under powered, and if you watch top, you'll notice a significant percentage of your CPU time being stolen by the hypervisor. I recommend c1.medium's.
Don't use master-master replication, ever. There is no mechanism for resolving conflicts. If you try to write to both masters at the same time (or write to one master before it has caught up with changes you previously wrote to the other one), then you will end up with a broken replication scenario. The service won't stop, they'll just drift further and further apart making reconciliation impossible.
Don't use MySQL replication without some well-designed monitoring to check that it's working ok. Don't assume that becuase you've configured it correctly initially it'll either keep working, OR stay in sync.
DO have a well-documented, well-tested procedure for recovering slaves from being out of sync or stopped. Have a similarly documented procedure for installing a new slave from scratch.
Your application may need sufficient intelligence to know that a slave is out of sync or stopped, and that it should not be used, if you care about correct or up-to-date data. You'll need some kind of feedback from your monitoring to do this.
If you have a slave in, say the US when your master is in Europe, that would normally give you the amount of latency you expect, i.e. something in the order of 150ms more than if they were co-located.
In MySQL, the slave does not start a query until the master finishes it, so it will always be behind by the length of time an update takes.
Also, the slave is single-threaded, so a single "hard" update query will delay all subsequent ones.
If you're pushing your master hard on multithreaded write-load, assuming your slaves have identical hardware, it is very unlikely that they'll be able to keep up.
We are looking at a similar scenario - after Amazon Eastcoast has completely been cut off the net twice this week - meaning not even being replicated in multiple regions and using RDB instances in kept us available.
But DRB does not allow crossing from East to West or even into Europe.
We are now reviewing the approach of Master Master in East and West or even Europe with one master acting as a failover only, and failover via dnsmadeeasy which responds extremely fast.
Advantage: quick and reliable failover, short downtime, no complex management of the failover function.
Disadvantage: One extra system running without using it - but compared to using RDB that's not more expensive
DRB is nicely managed by Amazon including point in time recovery and so on - all that is lost by switching away from it. But the fact that it is limited to replications within only one area and that area can be completely cut off make it problematic. As an alternative to RDB backup we are looking at Zmanda open source tools to take care of backup management. NOt yet tested, but based on all our stuffing around with failover and databases and hardware and so this looks like the simplest and therefore most promising approach for high availiability.
This question is old, but the solution exists now: Galera. It does MySQL (InnoDB) replication, and works well with WANs, too. http://codership.com/
For my current project we are thinking of setting up a dual master replication topology for a geographically separated setup; one db on the us east coast and the other db in japan.
I am curious if anyone has tried this and what there experience has been.
Also, I am curious what my other options are for solving this problem; we are considering message queues.
Thanks!
Just a note on the technical aspects of your plan: You have to know that MySQL does not officially support multi-master replication (only MySQL Cluster provides support for synchronous replication).
But there is at least one "hack" that makes multi-master-replication possible even with a normal MySQL replication setup. Please see Patrick Galbraith's "MySQL Multi-Master Replication" for a possible solution. I don't have any experience with this setup, so I don't dare to judge on how feasible this approach would be.
There are several things to consider when replicating databases geographically. If you are doing this for performance reasons, be sure your replication model supports your data being "eventually consistent" as it can take time to bring the replication current in both, or many, locations. If your throughput or response times between locations is not good, active replication may not be the best option.
Setting up mysql as dual master does actually work fine in the right scenario done correctly. But I am not sure it fits very well in your scenario.
First of all, dual master setup in mysql is really a ring-setup. Server A is defined as master of B, while B is at the same time defined as the master of A, so both servers act as both master and slave. The replication works by shipping a binary log containing the sql statements which the slave inserts when it sees fit, which is usually right away. But if you're hammering it with local insertions, it will take a while to catch up. The slave insertions are sequential by the way, so you won't get any benefit of multiple cores etc.
The primary use of dual master mysql is to have redundancy on the server level with automatic fail-over (often using hearbeat on linux). Excluding mysql-cluster (for various reasons), this is the only usable automatic failover for mysql. The setup for basic dual master is easily found on google. The heartbeat stuff is a bit more work. But this is not really what you were asking about, since this really behaves as a single database server.
If you want the dual master setup because you always want to write to a local database (write to both of them at the same time), you'll need to write your application with this in mind. You can never have auto-incrementing values in the database, and when you have unique values, you must make sure that the two locations never write the same value. For example location A could write odd unique numbers and location B could write even unique numbers. The reason is that you're not guaranteed that the servers are in sync at any given time, so if you've inserted a unique row in A, and then an overlapping unique row in B before the second server catches up, you'll have a broken system. And if something first breaks, the entire system stops.
To sum it up: it's possible, but you'll need to tip-toe very carefully if you're building business software on top of this.
Because of the one-to-many architecture of MySQL replication, you have to have a replication ring with multiple masters: that is, each replicates from the next in a loop. For two, they replicate off each other. This has been supported from as far back as v3.23.
In a previous place I worked, we did it with v3.23 with quite a number of customers as a way of providing exactly what you're asking. We used SSH tunnels over the Internet to do the replication. It took us some time to get it reliable and several times we had to do a binary copy of one database to another (fortunately, none of them were over 2Gb nor needed 24-hour access). Also the replication in v3 was not nearly as stable as in v4 but even in v5, it will just stop if it detects any sort of error.
To accomodate the inevitable replication lag, we re-structured the application so that it didn't rely on AUTOINCREMENT fields (and removed that attribute from the tables). This was reasonably straightforward due to the data-access layer we had developed; instead of it using mysql_insert_id() for new objects, it created the new ID first and inserted it along with the rest of the row. We also implemented site IDs that we stored in the top half of the ID, because they were BIGINTs. This also meant we didn't have to change the application when we had a client who wanted the database in three locations. :-)
It wasn't 100% robust. InnoDB was just gaining some visibility so we couldn't easily use transactions, although we considered it. So there were race conditions occasionally when two objects tried to be created with the same ID. This meant one failed and we tried to report that in the app. But it was still a significant part of someone's job to watch over the replication and fix things when it broke. Importantly, to fix it before we got too far out of sync, because in a few cases the databases were being used in both sites and would quickly become difficult to re-integrate if we had to rebuild one.
It was a good exercise to be a part of, but I wouldn't do it again. Not in MySQL.