Homemade cheap and cheerful clustering with MySQL+EC2?

I've got a Java web service backed by MySQL + EC2 + EBS. For data integrity I've looked into DRBD, MySQL Cluster, etc., but wonder if there isn't a simpler solution. I don't need high availability (I can handle downtime).
There are only a few operations whose data I need to preserve -- creating an account, changing password, purchase receipt. The majority of the data I can afford to recover from a stale backup.
What I am thinking is that I could pipe selected INSERT/UPDATE commands to storage (S3 or SimpleDB, for instance) and, when required (when the db blows up), replay these commands from the point of the last backup. And wouldn't it be neat if this functionality were implemented in the JDBC driver itself?
Is this too silly to work, or am I missing another obvious and robust solution?

Have you looked into moving your MySQL into Amazon Web Services as well? You can use Amazon Relational Database Service (RDS). Also see MySQL Enterprise Support.

You always have a window where total loss of a server and associated file storage will result in some amount of lost data.
When I ran a modestly busy SaaS solution in AWS, I had a MySQL Master running on a large instance and a MySQL Slave running on a small instance in a different availability zone. The replication lag was typically no more than 2 seconds, though a surge in traffic could take that up to a minute or two.
If you can't afford losing 5 minutes of data, I would suggest running a Master/Slave setup over rolling your own recovery mechanism. If you do roll your own, ensure the "stale" backups and the logged/journaled critical data are in a different availability zone. AWS has lost entire zones before.
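For the roll-your-own route, the journal-and-replay idea from the question could be sketched roughly as follows. This is only an illustration under stated assumptions: `CriticalJournal` and `execute_critical` are hypothetical names, a local JSON-lines file stands in for durable storage such as S3, and `sqlite3` stands in for MySQL so the sketch runs anywhere.

```python
import json
import sqlite3
import time


class CriticalJournal:
    """Append-only journal of critical statements (hypothetical helper).

    A local file stands in for off-box storage such as S3.
    """

    def __init__(self, path):
        self.path = path

    def record(self, sql, params, ts=None):
        # One JSON object per line: timestamp, statement, bound parameters.
        entry = {"ts": time.time() if ts is None else ts,
                 "sql": sql, "params": list(params)}
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")

    def replay(self, conn, since_ts):
        """Re-run journaled statements newer than the restored backup."""
        replayed = 0
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:
                entry = json.loads(line)
                if entry["ts"] > since_ts:
                    conn.execute(entry["sql"], entry["params"])
                    replayed += 1
        conn.commit()
        return replayed


def execute_critical(conn, journal, sql, params, ts=None):
    # Journal before executing: a crash in between leaves an extra journal
    # entry rather than a missing one, so statements should be idempotent.
    journal.record(sql, params, ts)
    conn.execute(sql, params)
    conn.commit()
```

After restoring a stale backup, calling `journal.replay(conn, since_ts=backup_time)` re-applies only the critical writes made since that backup. As the answer notes, the journal itself has to live in a different availability zone from the database for this to help.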

Related

Downgrade AWS RDS Instance

I am using a t2.large RDS instance and want to downgrade to a t2.micro to fit my current business. I have a question:
- How can I downgrade the RDS instance without losing data and without downtime?
Thanks,
You can't really do it without downtime, but you could minimize the downtime.
The easiest option is to Modify the DB instance. This will result in downtime because a new database will be provisioned, the data will be relocated and the DNS name will be changed to point to the new instance.
Seeing that you believe a t2.micro will be sufficient for your database, it would be fair to assume that there would be times when your database is not in use so that you can perform the Modify operation. It should only take a few minutes.
Officially, the best way to modify a database with minimal downtime is to use Multi-AZ, which can update the standby node while traffic is still being served by the primary, leaving only a brief failover. However, your goal seems to be to reduce cost, rather than spending more to ensure uptime.
By the way, a t2.micro is quite limited in terms of CPU and network bandwidth. You are trying to save 21c per day, at the potential cost of having a poorly-responding database.
You can consider creating a read replica (t2.micro) of the master instance (t2.large). Once the read replica is in sync with the master instance, you can promote the read replica and then point the application towards the new master instance (which is the promoted read replica).
For reference, see:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_MySQL.Replication.ReadReplicas.html
https://aws.amazon.com/blogs/aws/amazon-rds-for-mysql-promote-read-replica/
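The replica-then-promote approach could be scripted along these lines. This is a sketch, not a tested migration procedure: `downsize_via_replica` is a hypothetical helper, the instance identifiers are made up, and `rds` is assumed to behave like `boto3.client("rds")`.

```python
def downsize_via_replica(rds, source_id, replica_id, target_class="db.t2.micro"):
    """Create a smaller read replica, wait for it, then promote it.

    `rds` is expected to behave like boto3.client("rds").
    """
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier=replica_id,
        SourceDBInstanceIdentifier=source_id,
        DBInstanceClass=target_class,
    )
    waiter = rds.get_waiter("db_instance_available")
    waiter.wait(DBInstanceIdentifier=replica_id)

    # "Available" does not mean "caught up". Before promoting, stop writes
    # to the source and confirm the replica lag has dropped to zero;
    # that check is left out of this sketch.
    rds.promote_read_replica(DBInstanceIdentifier=replica_id)
    waiter.wait(DBInstanceIdentifier=replica_id)
    return replica_id


# Example call (requires boto3 and AWS credentials; identifiers are made up):
#   import boto3
#   downsize_via_replica(boto3.client("rds"), "prod-large", "prod-micro")
```

The promoted instance gets its own endpoint, so the application still needs to be repointed afterwards, which is where the short downtime comes from.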

MySQL Load Balancing

We have 6 servers (4 application servers and 2 DB servers).
We are using HAProxy to load balance between the application and API servers (2/2).
Now the issue I'm having is that the system administrator set up Master/Slave replication on MySQL, but it keeps failing. So far we cannot use the slave, since its data is frequently corrupted; we always need to fix it, and each time we get different errors.
We tried to split reads and writes (write on the master, read on the slave), but we could not use that because the slave data is not always correct.
What I'm wondering is how the big guys proceed when dealing with high load servers where you always need the data to be accurate and cannot take any risk?
Can someone share their own experience and what they used?
What I found: Percona XtraDB Cluster, but before going in this direction I need input.
Thanks!
You can choose MySQL/MariaDB + Percona + HAProxy. This combination supports master-master synchronization, and the data sync works really well. Most real-time data synchronization setups have issues with primary and foreign keys; you can avoid those issues too by using Percona. Go ahead, and good luck.
The "table is full" error means your slave doesn't have enough space to perform the ALTER TABLE. You need to get larger disks to resolve that error.
But the subtext is that no one is monitoring your database servers, and that's a bigger problem. You need to get a database administrator, or else get a professional service to do it.
What I'm wondering is how the big guys proceed when dealing with high load servers where you always need the data to be accurate and cannot take any risk?
First, get it out of your head that any system has no risk. That's impossible, if you plan to use the system at all. You can't eliminate the possibility of errors, but you can be prepared to recover from them seamlessly.
The big guys do the following:
- Hire operations staff, including system administrators, network administrators, and database administrators, to take care of the servers.
- Monitor everything. Use software to track system load, disk space, errors, and many other things continuously. One option is New Relic. For MySQL slave integrity, use a tool like pt-table-checksum.
- Redundancy. Create standby systems and data to take over when (not if) the primary system fails.
You probably want to learn about the field of high availability architecture. Check out this talk: Scalable Internet Architectures
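As a small illustration of "monitor everything" applied to replication, a check like the one below could run periodically against each slave. It is a sketch only: `replica_problems` and the 60-second threshold are hypothetical, and `status` is assumed to be one row of `SHOW SLAVE STATUS` fetched as a dict by whatever MySQL client you use.

```python
def replica_problems(status, max_lag_seconds=60):
    """Return a list of human-readable problems for one replica.

    `status` is one row of SHOW SLAVE STATUS as a dict.
    """
    problems = []
    if status.get("Slave_IO_Running") != "Yes":
        problems.append("I/O thread is not running")
    if status.get("Slave_SQL_Running") != "Yes":
        problems.append("SQL thread is not running")

    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # MySQL reports NULL here when replication is stopped or broken.
        problems.append("lag unknown (Seconds_Behind_Master is NULL)")
    elif lag > max_lag_seconds:
        problems.append(f"lag {lag}s exceeds {max_lag_seconds}s")

    if status.get("Last_SQL_Error"):
        problems.append("last SQL error: " + status["Last_SQL_Error"])
    return problems
```

An empty list means the replica looks healthy by these measures; anything else should page someone. This only shows that replication is running, not that the data matches, which is what pt-table-checksum is for.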
Get on Amazon EC2. You can launch 4 app servers along with 2 DB servers on the fly and set up load balancing using AWS features.
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-getting-started.html#define-load-balancer
https://aws.amazon.com/articles/1639

Should I be using MySQL Cluster, Master to Master replication, or something else?

I started just using a solo MySQL server and moved on to using Master to Slave replication. This is all inside our network. I have a cloud application where customers post orders to our system. When our ISP goes down it's a nightmare.
I'm looking to have one server on-site and one server off-site that can be in sync and if the one goes down the other one can take over and not miss a step. I have my DNS failover in place and 2 web servers, but I can't decide what I need to do for the MySQL servers.
I don't mind putting in the work to learn MySQL cluster, but I'm not sure if that is the correct solution or master to master or something else?
Scale: I have an orders table currently sitting at 150,000 rows and that could grow to 500,000 this year and possibly start getting into the millions over the next couple years.
Any advice would be greatly appreciated as I never had any formal schooling on the issue.
Thanks in advance.
The simplest way is to place your server in a reliable datacenter. That reduces the failure rate enough that outages can be handled semi-manually with a master-slave configuration.
If you need to host on-site, look into improving your site's connectivity the way datacenters do: have backup ISP links with your own AS (autonomous system) and BGP routing, so that when one ISP fails your IPs stay the same and traffic simply shifts to the others.
Standard MySQL replication is asynchronous. You can configure master-master (circular) replication, but it has no conflict detection, so it is not a safe basis for automatic failover; MySQL Cluster (NDB) is the product designed for synchronous multi-master. So if you need fast failover, the question is whether to host it yourself or use DBaaS (database-as-a-service, e.g. ClearDB).
DBaaS with an SLA and failover is quite expensive, and it adds network latency, because your own app and DB servers would most likely sit in the same datacenter while the DBaaS may not. On the plus side, it is easier and faster to set up.

Reliability of MySQL master-slave replication

I have an application that requires a master catalogue of about 30 tables, which needs to be copied out to many (100+) slave copies of the application. Slaves may be in their own DB instance, or there may be multiple slaves in a single DB instance. Any changes to the master catalogue need to be copied out to the slaves within a reasonable time, around 5 minutes. Our infrastructure is all AWS EC2 and we use MySQL. Master and slaves will all reside within a single AWS region.
I had planned to use Master-Slave replication but I see reports of MySQL replication being sometimes unreliable and I am not sure if this is due to failings inherent in the particular implementations or failings in MySQL itself. We need a highly automated and reliable system and it may be that we have to develop monitoring scripts that allow a slave to continuously monitor its catalogue relative to the master.
Any observations?
When I was taking dance lessons before my wedding, the instructor said, "You don't have to do every step perfectly, you just have to learn to recover gracefully when missteps happen. If you can do that quickly, with a smile on your face, no one will notice."
If you have 100+ replicas, expect that you will be reinitializing replicas frequently, probably at least one or two every day. This is normal.
All software has bugs. Expecting anything different is, frankly, naive. Don't expect software to be flawless and continue operating 24/7 indefinitely without errors, because you will be disappointed. You should not seek a perfect solution, you should think like a dancer and recover gracefully.
MySQL replication is reasonably stable, and no less so than other solutions. But there are a variety of failures that can happen, without it being MySQL's fault.
- Binlogs can develop corrupted packets in transit due to network glitches. MySQL 5.6 introduced binlog checksums to detect this.
- The master instance can crash and fail to write an event to the binlog. sync_binlog can help to ensure all transactions are written to the binlog on commit (though with overhead for transactions).
- Replica data can fall out of sync due to non-deterministic SQL statements, packet corruption, log corruption on disk, or a user changing data directly on a replica. Percona's pt-table-checksum can detect this, and pt-table-sync can correct errors. Using binlog_format=ROW reduces the chance of non-deterministic changes. Setting the replicas read-only can help, and don't let users have SUPER privilege.
- Resources can run out. For example, you could fill up the disk on the master or the replica.
- Replicas can fall behind if they can't keep up with the changes on the master. Make sure your replica instances are not under-powered. Use binlog_format=ROW. Write fewer changes to an individual MySQL master. MySQL 5.6 introduces multi-threaded replicas, but so far I've seen some cases where this is still a bit buggy, so test carefully.
- Replicas can be offline for an extended time, and when they come back online, some of the master's binlogs have expired, so the replica can't replay a continuous stream of events from where it left off. In that case, you should trash the replica and reinitialize it.
- Bugs happen in any software project, and MySQL's replication has had its share. You should keep reading the MySQL release notes and be prepared to upgrade to take advantage of bug fixes.
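To make the drift-detection point concrete, here is a toy version of chunked checksum comparison. It is not pt-table-checksum's actual algorithm, only an illustration of the idea: the function names are hypothetical, rows are plain tuples sorted by primary key, and everything is held in memory.

```python
import hashlib


def _digest(chunk):
    """Return (first_key, last_key, sha256 hex digest) for a list of rows."""
    h = hashlib.sha256()
    for row in chunk:
        h.update(repr(row).encode("utf-8"))
    return chunk[0][0], chunk[-1][0], h.hexdigest()


def chunk_checksums(rows, chunk_size=1000):
    """Yield one digest triple per chunk of rows sorted by primary key.

    Each row is a tuple whose first element is the primary key.
    """
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield _digest(chunk)
            chunk = []
    if chunk:
        yield _digest(chunk)


def differing_ranges(master_rows, replica_rows, chunk_size=1000):
    """Return (first_key, last_key) ranges whose contents differ.

    Chunk boundaries come from the master, and the replica is checksummed
    over the same key ranges. Rows that exist only on the replica and fall
    outside every master range are not detected by this sketch.
    """
    replica_rows = list(replica_rows)
    bad = []
    for first, last, digest in chunk_checksums(master_rows, chunk_size):
        same_range = [r for r in replica_rows if first <= r[0] <= last]
        if not same_range or _digest(same_range)[2] != digest:
            bad.append((first, last))
    return bad
```

Comparing per-chunk digests instead of whole tables narrows a mismatch to a small key range, so only that range needs to be re-synced rather than reinitializing the entire replica.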
Managing a big collection of database servers in continuous operation takes a significant amount of full-time work, no matter what brand of database you use. But data has become the lifeblood of most businesses, so it's necessary to manage this resource. MySQL is no better and no worse than any other brand of database, and if anyone tells you something different, they're selling something.
P.S.: I'd like to hear why you think you need 100+ replicas in a single AWS region, because that is probably overkill by an order of magnitude for any goal of high availability or scaling.

What would be my best MySQL Synchronization method?

We're moving a social media service to be on separate data centers as our other hosting provider's entire data center went down. Twice.
This means that both websites need to be synchronized in some sense -- I'm less worried about the code of the pages, that's easy enough to sync, but they need to have the same database data.
From my research on SO, it seems MySQL Replication is a good option, but the MySQL manual, for scaling out, says that it's best when there are far more reads than there are writes/updates:
http://dev.mysql.com/doc/refman/5.0/en/replication-solutions-scaleout.html
In our case, it's about equal. We're getting around 200-300 thousand requests a day right now, and we can grow rapidly. Every request is both a read and write request.
What would be the best method or tool to handle this?
Replication isn't instantaneous, and all writes have to be sent over the wire to the remote servers, so it takes bandwidth too. As long as this works for you and you understand the consequences, then don't worry about the read/write ratio.
However, are you sure that you need global replication? We handle millions of requests and have one location, with multiple web servers connected to two databases. One database is the live database, and the other is a replicated read only database.
We do have global fail over locations, and some people connect to these on any day, even if our main node is up because they have Internet issues. The data just trickles in though.
If the main node went down, then everybody would be using the global failover locations, in order. So, if our main node died, all customers would connect to Denver. If Denver went down, they'd all connect to Columbus.
Also, our main node is on two different Internet providers, so one ISP going down doesn't take us down.
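The ordered failover described above (main node first, then Denver, then Columbus) comes down to picking the first healthy node from a priority list. A minimal sketch, assuming a hypothetical `is_healthy` probe that you would supply yourself:

```python
def pick_node(nodes_in_priority_order, is_healthy):
    """Return the first node that passes the health probe, or None."""
    for node in nodes_in_priority_order:
        if is_healthy(node):
            return node
    return None


# Illustrative node names; the probe here is a stand-in for a real check.
NODES = ["main", "denver", "columbus"]
down = {"main"}
current = pick_node(NODES, lambda node: node not in down)
```

With the main node marked down, `current` is `"denver"`; if Denver were down as well, it would be `"columbus"`.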
Is the connection speed between the two datacenters good enough? You could copy the files to a new server and move the database there, then set up the old server so that it connects to the new server's MySQL database in the other DC. This will be slower, of course, but depending on the nature of your queries it can be acceptable. Once DNS (or whatever else you use) has switched over, you just power off the old server when there are no more requests for it.
To help you to assess your options you need to consider what your requirements are in a disaster recovery scenario (i.e. total loss of the system in one data-centre).
In particular for this scenario, how much data can you afford to lose (recovery point objective - RPO), and how quickly do you need to have the standby data-centre version of the site up and running (recovery time objective - RTO).
For example, if your RPO is zero transactions lost and your RTO is 5 minutes, the solution will be different than if you can afford to lose 5 minutes of transactions and take an hour to recover.
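One way to make this comparison explicit is to write the objectives down and test each candidate setup against them. The sketch below is illustrative only: the `Objectives` class, the function name, and the lag and failover figures are all assumptions, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Objectives:
    rpo_seconds: float  # maximum tolerable data loss, as a time window
    rto_seconds: float  # maximum tolerable time to restore service


def meets_objectives(obj, worst_case_data_loss_seconds, failover_seconds):
    """True if a setup's worst-case loss and recovery time fit the objectives."""
    return (worst_case_data_loss_seconds <= obj.rpo_seconds
            and failover_seconds <= obj.rto_seconds)


# The two scenarios from the text above.
strict = Objectives(rpo_seconds=0, rto_seconds=5 * 60)
relaxed = Objectives(rpo_seconds=5 * 60, rto_seconds=60 * 60)

# Hypothetical asynchronous replica: up to 120 s behind, 10 min to fail over.
async_ok_for_strict = meets_objectives(strict, 120, 600)
async_ok_for_relaxed = meets_objectives(relaxed, 120, 600)
```

Under these assumed numbers the asynchronous replica satisfies the relaxed objectives but not the strict ones, which is why a zero RPO pushes you toward synchronous writes.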
Another question I'd ask is if you're using SAN storage at all? This gives you options for replication at the storage level (SAN array to SAN array), rather than at the database level (e.g. MySQL replication).
Also consider the distance between the data-centres: latency-wise, can you afford to perform a synchronous write to both databases, or would an asynchronous replication approach be more appropriate?