MySQL Master-Master replication performance - mysql

I have the following situation:
I have to set up a high-performance server cluster with maximum availability using nginx and MySQL. The cluster consists of four web servers that are load balanced with nginx + gluster, which works just fine.
In addition there's another server with 2 SSDs in RAID1. On that server I intend to install 2 VMs each with 12GB of RAM where I set up the MySQL cluster with Master-Master replication.
But that only keeps the system running if the MySQL service fails on one of the VMs, not if the host system itself goes offline.
To counter that I thought of adding 2 more nodes on other machines to the MySQL cluster as failover. Unfortunately I don't have more machines with SSDs.
Now my question: should I expect performance issues because of the much slower hard drives on the failover machines? And if so, would these issues occur only when inserting data, or also when running pure SELECT queries?
Of course I'd set the load balancer to prioritize the faster nodes.

Related

ProxySQL active-standby setup

My setup:
Two MySQL servers running Master-Master replication using the third-party Tungsten Replicator (for legacy reasons; I can't change that now).
Typically this cluster is used as Active-Standby. In normal operation all queries should hit the first server. Only if the first DB server fails should queries hit the secondary server. Master-Master is used for the convenience of not needing any master failover scripting. Once the primary server is back online, all queries should be sent to it again.
I'm now using Galera Load Balancer configured in active-standby mode with simple health check (no mysql ping for x times = skip this server) and it works OK.
Problem:
I'd like to migrate from glbd to ProxySQL and replicate my setup. I started with two hosts with different weights, i.e. 100000 vs 1.
But apparently ProxySQL uses the weights to distribute traffic, so roughly 100000 queries go to the primary, then one goes to the secondary, and so on. This causes problems when replication lag is high: about 1 in every 100000 queries hits the secondary server, which may have stale data.
How can I configure ProxySQL to send all queries to my primary server while the health check says it's OK, and to the secondary server only if the primary is unhealthy? When the primary comes back up, all queries should be routed to it again.
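To make the difference concrete, here is a minimal sketch in plain Python of the two routing behaviours described above. It is not ProxySQL configuration and does not use any ProxySQL API; the function names and the dictionary layout for a backend are hypothetical and only illustrate why a tiny weight still leaks traffic while a strict priority order does not.

```python
import random

def pick_weighted(servers, rng):
    """Weighted routing: every healthy server with a non-zero weight gets a share."""
    healthy = [s for s in servers if s["healthy"]]
    weights = [s["weight"] for s in healthy]
    return rng.choices(healthy, weights=weights, k=1)[0]["name"]

def pick_active_standby(servers):
    """Strict priority: the first healthy server in list order gets all traffic."""
    for s in servers:
        if s["healthy"]:
            return s["name"]
    raise RuntimeError("no healthy backend")

# Hypothetical backends mirroring the 100000 vs 1 weights from the question.
backends = [
    {"name": "primary", "weight": 100000, "healthy": True},
    {"name": "secondary", "weight": 1, "healthy": True},
]

print(pick_active_standby(backends))   # primary
backends[0]["healthy"] = False         # health check fails on the primary
print(pick_active_standby(backends))   # secondary
backends[0]["healthy"] = True          # primary recovers, traffic moves back
print(pick_active_standby(backends))   # primary
```

With `pick_weighted`, the secondary is still chosen occasionally while both servers are healthy, which is exactly the stale-read problem described in the question.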

MySQL Group Replication or a Single Server is enough?

I'm planning to create a system that records visitors' clicks in a database. I'm expecting around 1M inserts/day into the database.
On the backend, I'll have an analytics system which will analyze all the data that's been collected over the days/weeks/months/years.
My question is: is it practical to have 2 different MySQL servers + 1 web server? MySQL Server A would insert the clicks into its DB and would be connected to MySQL Server B by group replication, so that whenever I create reports etc. on MySQL Server B, it doesn't put heavy load on Server A.
These 2 Database servers would then be connected to the Web Server which would handle all the click requests and displaying the backend reports also.
Is it a practical solution, or is it better to have one bigger server to handle all the MySQL data? Or have multiple MySQL servers that are load balancing each other? Anything else perhaps?
1M inserts/day is not a high load by modern standards. That's less than 12 per second on average.
On sufficiently powerful servers with fast storage and proper tuning of MySQL options, you can expect to support at least 100x that load with a single MySQL server.
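As a quick check of the arithmetic above, here is a short Python sketch. The helper name `average_per_second` is just for illustration, and the result is an average that assumes traffic is spread evenly over the day; real click traffic will have peaks well above it.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def average_per_second(events_per_day):
    """Average event rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY

rate = average_per_second(1_000_000)
print(f"{rate:.2f} inserts/second on average")        # 11.57 inserts/second on average
print(f"{100 * rate:.0f} inserts/second at 100x load")  # 1157 inserts/second at 100x load
```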
A better reason to use multiple MySQL servers is redundancy. Inevitably, any MySQL server needs to be upgraded, or you might have hardware failures and need to replace a disk, or other components. To avoid downtime, you should have a standby database server, which stays in sync with the primary server, either using MySQL replication or by disk-level replication like DRBD.

Benchmarking Mysql cluster using sysbench

When benchmarking a MySQL cluster using sysbench, do you have to install sysbench on every machine in the cluster to benchmark the cluster's performance? Is there a way to install sysbench on one machine and use it to benchmark MySQL servers on other machines?
If, for example, I have HAProxy as the load balancer for the cluster, configured on its own machine separate from the cluster nodes, can I use only the HAProxy machine to benchmark the entire cluster, since it does the load balancing and acts as the window to all the other cluster nodes?
I am new to MySQL benchmarking, and new to using sysbench.
Thanks.
Yes, you will need to install sysbench on every SQL node you intend to use and benchmark (not on the NDB data nodes).
HAProxy & ProxySQL are two different things but you can get the best of both worlds if you really want to.
HAProxy is a very fast and reliable solution offering high availability, load balancing, and proxying for TCP and HTTP-based applications. It is particularly suited for very high-traffic websites and powers quite a number of the world's most visited ones. HAProxy (High Availability Proxy) is an open-source load balancer which can load balance any TCP service. It is particularly suited for HTTP load balancing as it supports session persistence and layer 7 processing.
ProxySQL is an open-source MySQL proxy server, meaning it serves as an intermediary between a MySQL server and the applications that access its databases. ProxySQL can improve performance by distributing traffic among a pool of multiple database servers and also improve availability by automatically failing over to a standby if one or more of the database servers fail.
To run the sysbench benchmarks, follow this guide:
https://wiki.gentoo.org/wiki/Sysbench
To set up ProxySQL:
https://www.digitalocean.com/community/tutorials/how-to-use-proxysql-as-a-load-balancer-for-mysql-on-ubuntu-16-04
Set Up Highly Available HAProxy Servers with Keepalived and Floating IPs: https://www.digitalocean.com/community/tutorials/how-to-set-up-highly-available-haproxy-servers-with-keepalived-and-floating-ips-on-ubuntu-14-04
To create multi-node SQL Cluster: https://www.digitalocean.com/community/tutorials/how-to-create-a-multi-node-mysql-cluster-on-ubuntu-16-04
Set the storage engine for the sysbench tables to NDB (ENGINE=NDBCLUSTER) from the mysql client.
You will need to create the database and then run the sysbench prepare step before running the benchmark. Good luck!

Scalable web application architecture

I have a really simple bookshop web application written with the Spring framework, just to test its scalability.
I deployed this bookshop on one EC2 instance (t1.micro), with the database on Amazon RDS (t1.micro) using master/slave replication with one master instance and 3 slave instances (there are far more reads than writes). One t1.micro RDS instance can have at most 32 concurrent connections.
Then I did stress testing with JMeter and found that the bottleneck is the database, since a t1.micro RDS instance allows at most 32 concurrent connections.
Should I auto scale RDS database instances, since creating new replica modifies master and it really takes long time to make it available?
Instead of using RDS should I create EC2 instances with MySQL master/replica and then auto scale these instances?
Should I shard my database instead of replication?
Application also uses com.mysql.jdbc.ReplicationDriver to load balance between master and slave instances. Should I use something different like HAProxy?
Have you considered caching and partitioning? The web application we worked on used Memcache, and it really helped with performance issues. Also, if you have tables with a very large number of records, you should consider partitioning; accessing these tables by partition can have a remarkable effect.
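As a rough illustration of the caching suggestion, here is a minimal cache-aside sketch in Python. A plain dictionary stands in for Memcache, and the `CacheAside` class and its loader callback are hypothetical names; a real setup would use a Memcache client and add expiry times.

```python
class CacheAside:
    """Read-through helper: check the cache first, fall back to the database."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # callable that performs the real database read
        self._cache = {}           # stand-in for Memcache (assumption for this sketch)
        self.db_reads = 0          # counts how often the database was actually hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]   # cache hit: no database round trip
        self.db_reads += 1
        value = self._load(key)       # cache miss: read from the database
        self._cache[key] = value      # remember the value for later reads
        return value

    def invalidate(self, key):
        """Call after a write so the next read fetches fresh data."""
        self._cache.pop(key, None)

# Usage with a dictionary pretending to be the books table.
books = {1: "Dune", 2: "Neuromancer"}
cache = CacheAside(books.get)
cache.get(1)
cache.get(1)
print(cache.db_reads)  # 1
```

For a read-heavy workload like the bookshop, most requests would then be served without using one of the limited database connections.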

What are the respective advantages/limitations of Amazon RDS vs. EC2 with MySQL? [closed]

I realize a couple of basic differences between the two, i.e.
EC2 is going to be cheaper
With RDS I wouldn't have to do maintenance
Other than those two, are there any advantages to running my database from RDS as opposed to a separate EC2 server acting as a MySQL server? Assuming similar instance sizes, are both going to run into the same limitations in terms of being able to handle load?
To give you a little bit more info about my use, I've got a database, nothing too big or anything (biggest table 1 million rows), just high SELECT volume.
This is a simple question with a very complicated answer!
In short: EC2 will provide maximum performance if you go with a RAID0 EBS. Doing RAID0 EBS requires a pretty significant amount of maintenance overhead, for example:
http://alestic.com/2009/06/ec2-ebs-raid
http://alestic.com/2009/09/ec2-consistent-snapshot
EC2 without RAID0 EBS will provide crappy I/O performance, thus it's not even really an option.
RDS will provide very good (though not maximum) performance out of the box. The management console is fantastic and it's easy to upgrade instances. High availability and read only slaves are a click away. It's REALLY awesome.
Short answer: Go with RDS. Still on the fence? Go with RDS!!! If you enjoy headaches and tuning every last little bit for maximum performance, then you can consider EC2 + EBS RAID 0. Vanilla EC2 is a terrible option for MySQL hosting.
In this post there is an excellent benchmark between:
Running MySql on a Small EC2 + EBS
Running MySql on a Small EC2 + EBS + adjusted MySql parameters
A Small RDS
The benchmark is very good since it does not focus only on ideal conditions (a single thread) but also covers more realistic scenarios, with 50 threads hitting the database.
RDS is not really a high availability system. Read the fine print in the RDS FAQ. A failover event can take up to 3 minutes. Additionally, Amazon will sometimes decide it needs to "upgrade" your RDS instance and do a failover at that point, which will take your database down for "up to 3 minutes" (our experience is that it can take longer than that).
RDS high availability is very different from master-master or master-slave replication and is much slower. It doesn't use MySQL replication but some kind of EBS replication. So in a failover situation it will mount the EBS volume on the backup machine, start MySQL, wait for MySQL to do failure recovery (hopefully nothing got corrupted too badly), then do a DNS switch.
I hope this helps you with your evaluation.
We chose to use EC2 MySQL instances because we have a high read volume and need master-slave replication. Of course, you can spin up multiple RDS instances and set up MySQL replication between them yourself, but we use Scalr.net, which manages that for you using EC2 instances.
Basically, we just tell Scalr how many MySQL instances we want and it keeps them up, automates the setup of replication, handles automatic failover by promoting a slave to master if the master gets terminated, etc. It does both SQL dump backups and EBS volume snapshots of the master. So, when it needs to create a new slave, it automatically and temporarily mounts an EBS volume of the last master snapshot to initialize the slave DB, then starts replication from the appropriate point. All point and click :)
(and no, I don't work for Scalr or anything. Scalr is available as Open Source if you don't want to use their service)
Regarding the maintenance window question: if you use Multi-AZ, then RDS will create a standby replica in another availability zone so that there's no downtime for maintenance and you protect yourself against a zone failure.
That's what I'm planning to do in the next week or so. Of course it's going to cost you more but I haven't worked that bit out yet.
MySQL on EC2 vs RDS MySQL
Advantages of MySQL on EC2
Amazon EC2 Inter Region Replication
Copy Snapshots across Amazon EC2 regions
RAID 0 with EBS Striping in MySQL EC2
More than 3 TB of disk space (you will not need this for your size) can be attached to MySQL on EC2.
Disadvantages of MySQL on EC2
Configuration, Monitoring and Maintenance compared to RDS
Point in time backups available in RDS
Currently lower IOPS than RDS MySQL (even after RAID 0): 10,800 with 6 disks for MySQL on EC2, whereas RDS MySQL offers 12,500 IOPS (16 KB)
I have been trying out RDS for a few months and here are some issues I have:
Using SQL profiler is tricky. Since you cannot connect profiler directly to the server, you have to run some stored procedures to create a log file that you can analyze. While they offer some suggestions about how that is done, it is far from user friendly. I would only recommend that you have a certified SQL professional do this kind of work.
While Amazon backs up your instance, you cannot restore an individual database. I have a web app with several separate customer-specific databases, and my solution was to launch an EC2 instance with SQL running on it, attach to the production RDS database, import the data, and then back it up on the EC2 instance. The other solution was to use a 3rd party tool that creates a massive SQL script (on the app server) that will recreate the schema and populate the data back to a restore point.
I had the same question this weekend. There is a 4 hour downtime window per week for RDS where they do maintenance. RDS seemed more expensive if you can get away with a micro instance of EC2. (This is true of test instances which have minimal traffic.) I also wasn't able to change the timezone of the RDS instance because I don't have permission.
I am now actually looking at http://xeround.com/ which is mysql on EC2 by another company. They do not use InnoDB, instead they have their own engine called IDG. I am just starting to investigate that but they are in BETA and will give 500MB of space.