Scalable web application architecture - mysql

I have a really simple bookshop webapplication written in Spring framework, just to test its scalability.
I deployed this bookshop on one EC2 instance (t1.micro), and database on Amazon RDS (t1.micro) with master/slave replication of one master instance and 3 slave instances (There's really a lot more reads than writes). One t1.micro RDS instance can have at maximum of 32 concurrent connections
Then I did stress testing with JMeter, figured out that the bottleneck is in the database, since you can have at maximum 32 concurrent connections to t1.micro RDS instance.
Should I auto scale RDS database instances, since creating new replica modifies master and it really takes long time to make it available?
Instead of using RDS should I create EC2 instances with MySQL master/replica and then auto scale these instances?
Should I shard my database instead of replication?
Application also uses com.mysql.jdbc.ReplicationDriver to load balance between master and slave instances. Should I use something different like HAProxy?

Have you ever consider Caching and Partitioning ? The web application we have worked have used Memcache. It really helps in performance issues. On the other hand If you have tables that have so much records, you should consider partitioning, accessing these tables on partitions can have remarkable affect.

Related

MySql localhost vs Amazon RDS instance Performance

I am running Django Rest API on an AWS ec2-server. Right now the Api's are using MySql localhost database. Should I shift my database from MySql localhost to Amazon RDS instance?
As per what I Know for remote servers would take a little extra time to transmit the request and shared resources. Would this little extra time be worth migrating my database from MySql localhost to Amazon RDS instance?
I read this answer but it didn't helped me much.
MySql localhost vs Amazon RDS instance
An answer with all possible Pros and Cons will really be appreciated.
Pros for local MySQL
Slightly faster, because of proximity to the application
Cons for local MySQL
Not Easily Scalable
If you want to use autoscaling for your application load and traffic then you might have nightmares, because as you scale you will have even the MySQL servers running on each new node.
Pros for RDS
You don't have to worry about installing and maintaining MySQL server
You don't have to worry about scaling
You don't have to worry about load balancing
You don't have to worry about EC2 upgrades and patching
You don't have to worry about failure recovery because when you provision a Multi-AZ DB Instance, Amazon RDS synchronously replicates the data to a standby instance in a different Availability Zone (AZ)List item
Cons for RDS
Slightly slower due to network latency
It depends how database intensive your application is.
See this benchmark The local database blew RDS out of the water on query latency with low load.
The answer is probably use both? Use both a local Redis/MySQL for quick queries and an off server RDS for long queries over large data sets where paying the additional network latency makes sense.
Also think about using SQLite on S3. If you can easily shard your data, and most queries are read intensive it could be a lot cheaper, especially with something like Redis on the server to cache frequent queries.
If you want to really eek out performance per $$, you can use a lot of Pang's tricks by having a hierarchy of SQLite files.

MySQL Master-Master replication performance

I have the following situation:
I have to set up a high-performance server-cluster with maximum availability with nginx and MySQL. The cluster consists of four web servers which are load ballanced with nginx+gluster which works just fine.
In addition there's another server with 2 SSDs in RAID1. On that server I intend to install 2 VMs each with 12GB of RAM where I set up the MySQL cluster with Master-Master replication.
But that only prevents the system to break down if the MySQL service breaks down on one of the VMs, not if the host system is offline.
To counter that I thought of adding 2 more nodes on other machines to the MySQL cluster as failover. Unfortunately I don't have more machines with SSDs.
Now my question: Would I have to expect performance issues because of the much slower hard drives on the failover machines? And if so, would these issues occur only when inserting data or also when calling pure select queries?
Of course I'd set the loadballancer to prioritize the faster nodes.

Architecture of MySQL on EBS for scaling (Amazon Web Services)

I'm trying to understand how to architect an Amazon Web Services application.
I have an instance running off of EBS. As far as I understand, I need to mount the EBS drive so that I can store my MySQL database on it.
When I later want to scale up, how do I do so? I understand that I can add more server instances, but how will they be accessing the database? Since from what I understand, the EBS volume can only be attached to one server instance.
I can't speak to this particular setup as I do not have experience using EBS with a MySQL instance but how this type of scaling is typically accomplished is by dedicating a particular instance as the master database server. Any time you spin up additional web servers those are still using the master DB IP to connect. At the time in which your database is the bottleneck you then spin up a slave DB instance on one of the boxes (or its own dedicated box). You can then configure replication in either a master to slave direction or a circular replication so that you can write to the slave instance as well.
If you choose the classic master to slave replication then you will have to make sure your writes are only performed on the master DB instance.
You can setup something like Zeus or any other connection load balancer so that you only ever have to connect to a single Database IP which will then round-robin route your read connections to your pool of servers. Otherwise you'd have to manage the connections yourself which is definitely not trivial. Good luck.
Growing Amazon EBS Volume sizes
You can give a try to MySQL clustering on your EBS backed instances. I have similar query, with more requirements, posted here.
EBS Volumes capacity can be scaled up using Snapshot->launch new volume technique, alternatively storage capacity can be scaled out using EBS Striping (RAID 0).
In AWS you cannot mount same EBS Volume to 2 EC2 instances simultaneously, so when you are scaling your application you need to scale out / up your MySQL DB either thru Replication or clustering. AWS RDS is a very good option for MySQL , if your application is read intensive then you can scale out using RDS Read replica's as well. If you need write scaling then functional partition or MySQL Shards can be explored.
AWS has an entire product dedicated to this: RDS.
In all but the rarest and most specialized of circumstances you're going to be better off using RDS than trying to create and tune your own EBS/EC2/MySQL infrastructure.
RDS also directly answers your question - they directly enable the creation of readonly databases to use as query slaves. RDS also performs backups, upgrades, and all sorts of fail-over infrastructure for you.
With EBS there's no way to attach a disk to multiple EC2 instances, so you're not going to be able to build out a failure cluster using that approach. Instead you're going to need replication or backup tools of some type.

What are the respective advantages/limitations of Amazon RDS vs. EC2 with MySQL? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I realize a couple of basic differences between the two, i.e.
EC2 is going to be cheaper
RDS I wouldn't have to do maintenance
Other than those two, are there any advantages to running my database from RDS as opposed to a separate EC2 server acting as a MySQL server. Assuming similar instance sizes, are both going to run into the same limitations in terms of being able to handle a load?
To give you a little bit more info about my use, I've got a database, nothing too big or anything (biggest table 1 million rows), just high SELECT volume.
This is a simple question with a very complicated answer!
In short: EC2 will provide maximum performance if you go with a RAID0 EBS. Doing RAID0 EBS requires a pretty significant amount of maintenance overhead, for example:
http://alestic.com/2009/06/ec2-ebs-raid
http://alestic.com/2009/09/ec2-consistent-snapshot
EC2 without RAID0 EBS will provide crappy I/O performance, thus it's not even really an option.
RDS will provide very good (though not maximum) performance out of the box. The management console is fantastic and it's easy to upgrade instances. High availability and read only slaves are a click away. It's REALLY awesome.
Short answer: Go with RDS. Still on the fence? Go with RDS!!! if you enjoy headaches and tuning every last little bit for maximum performance, then you can consider EC2 + EBS RAID 0. Vanilla EC2 is a terrible option for MySQL hosting.
In this post there is an excellent benchmark between:
Running MySql on a Small EC2 + EBS
Running MySql on a Small EC2 + EBS + adjusted MySql parameters
A Small RDS
The benchmark is very good since it is not focused only in ideal conditions (only one thread) but also in more realistic scenarios, with 50 threads hitting the database.
RDS is not really a high availability system. Read the fine print in the RDS faq. During a failover event it can take up to 3 minutes to failover. Additional amazon will decide it needs to "upgrade" your rds instance and do a failover at that point which will take your database down for "up to 3 minutes" (our experience is that it can take a longer than that).
RDS high availability is very different than master - master or master - slave replication and is much slower. They don't use mysql replication but uses some kind of ebs replication. So in a failover situation it will mount the ebs on the backup machine, start mysql, wait for mysql to do failure recover (hopefully nothing got corrupted too bad), then do a dns switch.
I hope this helps you with you evaluation.
We chose to use EC2 MySQL instances because we have a high read volume and need master-slave replication. Of course, you can spin up multiple RDS instances and setup MySQL replication between them yourself, but we use Scalr.net, which manages that for you using EC2 instances.
Basically, we just tell Scalr how many MySQL instances we want at it keeps them up, automates the setup of replication, handles automatic failover of slave promotion to master if the master gets terminated etc. It does both SQL dump backups and EBS volume snapshots of the master. So, when it needs to create a new slave, it automatically temporarily mounts an EBS volume of the last master snapshot to initialize the slave DB, then starts replication from the appropriate point. All point and click :)
(and no, I don't work for Scalr or anything. Scalr is available as Open Source if you don't want to use their service)
Regarding the maintenance window question. If you use Multi-AZ then RDS will create a standby replica in another availability zone so that there's no down time for maintenance and you protect yourself against a zone failure.
That's what I'm planning to do in the next week or so. Of course it's going to cost you more but I haven't worked that bit out yet.
MySQL on EC2 vs RDS MySQL
Advantages of MySQL on EC2
Amazon EC2 Inter Region Replication
Copy Snapshots across Amazon EC2 regions
RAID 0 with EBS Striping in MySQL EC2
More than 3TB of Disk space ( You will not need this for your size) can be attached on MySQL on EC2.
Disadvantages of MySQL on EC2
Configuration, Monitoring and Maintenance compared to RDS
Point in time backups available in RDS
IOPS lesser than RDS MySQL ( even after RAID 0) currently, 10800 with 6 disks for MySQL on EC2 whereas 12500 IOPS 16KB on RDS MySQL
I have been trying out RDS for a few months and here are some issues I have:
Using SQL profiler is tricky. Since you cannot connect profiler directly to the server, you have to run some stored procedures to create a log file that you can analyze. While they offer some suggestions about how that is done, it is far from user friendly. I would only recommend that you have a certified SQL professional do this kind of work.
while Amazon backs up your instance, you cannot restore an individual database. I have a web app with several separate customer-specific databases and my solution was to launch an EC2 instance with SQL running on it to attach to the production RDB database and import the data and then back it up on the EC2 instance. The other solution was to use a 3rd party tool that creates a massive SQL script (on the app server) that will recreate the schema and populate the data back to a restore point.
I had the same question this weekend. There is a 4 hour downtime window per week for RDS where they do maintenance. RDS seemed more expensive if you can get away with a micro instance of EC2. (This is true of test instances which has minimum traffic) I also wasn't able to change the timezone of the RDS instance because I dont have permission.
I am now actually looking at http://xeround.com/ which is mysql on EC2 by another company. They do not use InnoDB, instead they have their own engine called IDG. I am just starting to investigate that but they are in BETA and will give 500MB of space.

Amazon RDS: can databases be setup in replicaton mode?

I am studying the new Amazon RDS product and it seems it can be scaled only vertically (i.e. put a stronger server).
Did anyone see a possibility to configure multiple instances so that one is master and the other/s is/are replication slaves?
Same question asked (and answered) here http://developer.amazonwebservices.com/connect/thread.jspa?threadID=37823
Looks like there are plans for Master-Master HA or similar but that's not the same a replicated scale-out offering.
According to the FAQ it is possible now, see http://aws.amazon.com/rds/faqs/#86 :
Q: What types of replication does
Amazon RDS support and when should I
use each?
Amazon RDS provides two distinct replication options to serve different
purposes.
If you are looking to use replication to increase database
availability while protecting your
latest database updates against
unplanned outages, consider running
your DB Instance as a Multi-AZ
deployment. When you create or modify
your DB Instance to run as a Multi-AZ
deployment, Amazon RDS will
automatically provision and manage a
“standby” replica in a different
Availability Zone (independent
infrastructure in a physically
separate location). In the event of
planned database maintenance, DB
Instance failure, or an Availability
Zone failure, Amazon RDS will
automatically failover to the standby
so that database operations can resume
quickly without administrative
intervention. Multi-AZ deployments
utilize synchronous replication,
making database writes concurrently on
both the primary and standby so that
the standby will be up-to-date in the
event a failover occurs. While our
technological implementation for
Multi-AZ DB Instances maximizes data
durability in failure scenarios, it
precludes the standby from being
accessed directly or used for read
operations. The fault tolerance
offered by Multi-AZ deployments make
them a natural fit for production
environments; to learn more about
Multi-AZ deployments, please visit
this FAQ section.
If you are looking to take advantage of MySQL 5.1’s built-in
replication to scale beyond the
capacity constraints of a single DB
Instance for read-heavy database
workloads, Amazon RDS makes it easier
with Read Replicas. You can create a
Read Replica of a given “source” DB
Instance using the AWS Management
Console or CreateDBInstanceReadReplica
API. Once the Read Replica is created,
database updates on the source DB
Instance will be propagated to the
Read Replica. You can create multiple
Read Replicas for a given source DB
Instance and distribute your
application’s read traffic amongst
them. Unlike Multi-AZ deployments,
Read Replicas use MySQL 5.1’s built-in
replication and are subject to its
strengths and limitations. In
particular, updates are applied to
your Read Replica(s) after they occur
on the source DB Instance
(“asynchronous” replication), and
replication lag can vary
significantly. This means recent
database updates made to a standard
(non Multi-AZ) source DB Instance may
not be present on associated Read
Replicas in the event of an unplanned
outage on the source DB Instance. As
such, Read Replicas do not offer the
same data durability benefits as
Multi-AZ deployments. While Read
Replicas can provide some read
availability benefits, they and are
not designed to improve write
availability.
With Amazon RDS, you can use Multi-AZ deployments and Read Replicas
in conjunction to enjoy the
complementary benefits of each. You
can simply specify that a given
Multi-AZ deployment is the source DB
Instance for your Read Replica(s).
That way you gain both the data
durability and availability benefits
of Multi-AZ deployments and the read
scaling benefits of Read Replicas.