We have an RDS t3.small instance which we perform write actions on. We have 2 read replicas for this instance, which we route to using a weighted routing policy via Route 53. The reads are fine for now, but we are seeing a massive number of operations on the write (master) database instance. CPU utilization is nearing the 20% baseline and connections keep increasing (we are using connection pools, but the traffic is too much).
Are there any options for managing the write load? Is it possible to launch another instance for the same RDS MySQL database?
You can't distribute writes across multiple instances; they all have to go to the master instance. It sounds like you will soon need to increase the size of the writer instance. If downtime is a concern, you could add a new, larger instance as a read replica and then promote it to be the new writer instance.
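For reference, a minimal boto3 sketch of that replica-then-promote path. The instance identifiers and target class here are placeholders, not values from the question, and you would also want replication lag to reach zero before promoting:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# 1. Add a larger instance as a read replica of the current writer.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="mydb-new-writer",      # placeholder name
    SourceDBInstanceIdentifier="mydb-master",    # placeholder name
    DBInstanceClass="db.m5.large",               # example larger class
)

# 2. Wait until the replica reports "available" (check replica lag separately).
waiter = rds.get_waiter("db_instance_available")
waiter.wait(DBInstanceIdentifier="mydb-new-writer")

# 3. Promote it to a standalone writer, then repoint the application's
#    write endpoint (e.g. via Route 53) at the promoted instance.
rds.promote_read_replica(DBInstanceIdentifier="mydb-new-writer")
```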
All RDS instance types suck for write performance. They all use remotely-attached EBS storage, and remote storage incurs a heavy performance penalty for I/O.
At my last job, I benchmarked RDS versus Aurora for MySQL, and also benchmarked our physical (non-cloud) servers as well as MySQL installed manually on EC2 i3 instances. RDS had consistently poor performance, by a wide margin.
The t3 instances will be even worse, because they use burstable performance. Their baseline performance assumes a very light load, and they can add a burst of extra performance but only for short periods.
If you have write performance issues, especially for an application that requires consistent high performance, then you should upgrade to a more powerful instance type such as M5 or R5.
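If a straight scale-up is what you need, the instance class change itself is a single API call. A sketch with a placeholder identifier; note the change causes a short outage unless you defer it to the maintenance window:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.modify_db_instance(
    DBInstanceIdentifier="mydb-master",   # placeholder name
    DBInstanceClass="db.m5.large",        # db.r5.* if you need more memory
    ApplyImmediately=True,                # False defers to the maintenance window
)
```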
I would move away from RDS. I think it's useful only for very light load, or for temporary use during testing or development. I would recommend using Aurora instead of RDS, but my first preference would be to operate MySQL myself on EC2 i3 instances.
Related
We are using an RDS MySQL 5.6 instance (db.m3.2xlarge) in the sa-east-1 region, and during write-intensive operations we are seeing (on CloudWatch) that both our Write Throughput and our Network Transmit Throughput are capped at 60 MB/s.
We suspected that Multi-AZ could be responsible for this behaviour and turned it off for testing purposes. We ran the same operation and noticed that the Write Throughput wasn't capped anymore and the Network Transmit Throughput was actually zero. This reinforced the idea that this network traffic is between the primary instance and the failover instance in the Multi-AZ setup.
Here is the CloudWatch chart showing the operation without Multi-AZ and, right after, the same one with Multi-AZ enabled:
We tried upgrading to an instance with the highest network performance and also provisioned IOPS, but there was no change; when Multi-AZ is on we are always capped at 60 MB/s for writes.
It's our understanding that Multi-AZ uses synchronous data replication, but we can't find any information on the bandwidth limits of the link through which this replication occurs. Does anyone know anything about it and how to avoid these limits? Or should we just live with it?
I don't think you're seeing a limitation of the replication service per se, but it appears that your replication bandwidth shares the same transport as the EBS volume on your instance, thus it's a limitation of the Ethernet bandwidth available to your instance itself (remembering that EBS is network-attached storage).
The network connection on an m3.2xlarge is 1,000 Mbit/s, which is equivalent to 125 MB/s.
Divide that number by two and you get ~62.5 MB/s for writing to the local instance's EBS volume and another ~62.5 MB/s for writing to the synchronous replica, which lines up closely with the ~60 MB/s cap you are seeing.
Unfortunately, the implementation details of Multi-AZ replication are not something AWS has publicly explained in enough detail to say conclusively that this is indeed the explanation, but the numbers are suspiciously close to what would be predicted if it is correct.
The m3 family and m4 family of instances have similar specs but also (apparently) some fundamental design differences, so it might be informative to see if the same behavior is true of the m4.2xlarge.
I have experienced the same issue: after activating Multi-AZ, the Write Latency increased dramatically:
(The instance type is m4.4xlarge)
The reason looks to be the synchronous replication process: each write has to wait until both DBs have acknowledged the modification.
It looks like there is no solution and it is expected behaviour:
"DB instances using Multi-AZ deployments may have increased write and commit latency compared to a Single-AZ deployment, due to the synchronous data replication that occurs" – from the AWS documentation
Here is an interesting Reddit thread regarding this:
https://www.reddit.com/r/aws/comments/61ewvp/rds_multiaz_slow_insert/
The only recommendation I see is moving to Aurora :/
Well, I never got an ACTUAL explanation from anywhere, but after tons of tests it seems that the m3.2xlarge is actually "bugged". I wrote a detailed explanation in my blog.
I've got an AWS Aurora DB cluster running that is 99.9% focused on writes. At its peak, it will be running 2-3k writes/sec.
I know Aurora is somewhat optimized by default for writes, but I wanted to ask as a relative newcomer to AWS - what are some best practices/tips for write performance with Aurora?
From my experience, Amazon Aurora is unsuited to running a database with heavy write traffic. At least in its implementation circa 2017. Maybe it'll improve over time.
I worked on some benchmarks for a write-heavy application earlier in 2017, and we found that RDS (non-Aurora) was far superior to Aurora on write performance, given our application and database. Basically, Aurora was two orders of magnitude slower than RDS. Amazon's claims of high performance for Aurora are apparently completely marketing-driven bullshit.
In November 2016, I attended the Amazon re:Invent conference in Las Vegas. I tried to find a knowledgeable Aurora engineer to answer my questions about performance. All I could find were junior engineers who had been ordered to repeat the claim that Aurora is magically 5-10x faster than MySQL.
In April 2017, I attended the Percona Live conference and saw a presentation about how to develop an Aurora-like distributed storage architecture using standard MySQL with CEPH for an open-source distributed storage layer. There's a webinar on the same topic here: https://www.percona.com/resources/webinars/mysql-and-ceph, co-presented by Yves Trudeau, the engineer I saw speak at the conference.
What became clear about using MySQL with CEPH is that the engineers had to disable the MySQL change buffer, because there's no way to cache changes to secondary indexes while also having the storage distributed. This caused huge performance problems for writes to tables that have secondary (non-unique) indexes.
This was consistent with the performance problems we saw in benchmarking our application with Aurora. Our database had a lot of secondary indexes.
So if you absolutely have to use Aurora for a database that has high write traffic, the first thing I recommend is dropping all your secondary indexes.
Obviously, this is a problem if the indexes are needed to optimize some of your queries. SELECT queries, of course, but also some UPDATE and DELETE statements may use secondary indexes.
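If you go that route, here's a rough sketch of how you might enumerate the non-unique secondary indexes before deciding what to drop. It assumes a PyMySQL connection and uses placeholder connection details and schema name; verify each index really is expendable before issuing any DROP:

```python
import pymysql

# Placeholder connection details.
conn = pymysql.connect(host="aurora-writer.example.com", user="admin",
                       password="secret", database="mydb")

with conn.cursor() as cur:
    # Non-unique secondary indexes = anything that is not PRIMARY and not UNIQUE.
    cur.execute("""
        SELECT DISTINCT table_name, index_name
        FROM information_schema.statistics
        WHERE table_schema = %s
          AND index_name <> 'PRIMARY'
          AND non_unique = 1
    """, ("mydb",))
    for table_name, index_name in cur.fetchall():
        print(f"-- candidate: ALTER TABLE `{table_name}` DROP INDEX `{index_name}`;")
```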
One strategy might be to make a non-Aurora read replica of your Aurora cluster, and create the secondary indexes only in the read replica to support your SELECT queries. I've never done this, but apparently it's possible, according to https://aws.amazon.com/premiumsupport/knowledge-center/enable-binary-logging-aurora/
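For what it's worth, the prerequisite in that article is turning on binary logging through the cluster parameter group. A rough boto3 sketch, assuming a custom cluster parameter group (the name here is a placeholder) is already attached to the cluster; the writer instance needs a reboot afterwards because binlog_format is a static parameter:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.modify_db_cluster_parameter_group(
    DBClusterParameterGroupName="my-aurora-params",  # placeholder name
    Parameters=[{
        "ParameterName": "binlog_format",
        "ParameterValue": "ROW",
        "ApplyMethod": "pending-reboot",
    }],
)
```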
But this still doesn't help cases where your UPDATE/DELETE statements need secondary indexes. I don't have any suggestion for that scenario. You might be out of luck.
My conclusion is that I wouldn't choose to use Aurora for a write-heavy application. Maybe that will change in the future.
Update April 2021:
Since writing the above, I have run sysbench benchmarks against Aurora version 2. I can't share the specific numbers, but I conclude that the current Aurora is significantly improved for write-heavy workloads. I did run tests with lots of secondary indexes to make sure. I still encourage anyone serious about adopting Aurora to run their own benchmarks.
At the very least, Aurora is much better than conventional Amazon RDS for MySQL using EBS storage. That's probably the comparison behind the claim that Aurora is 5x faster than MySQL. But Aurora is no faster than some of the other alternatives I tested, and in fact cannot match:
MySQL Server installed myself on EC2 instances using local storage, especially i3 instances with locally-attached NVMe. I understand instance storage is not dependable, so one would need to run redundant nodes.
MySQL Server installed myself on physical hosts in our data center, using direct-attached SSD storage.
The value of using Aurora as a managed cloud database is not just about performance. It also has automated monitoring, backups, failover, upgrades, etc.
I had a relatively positive experience w/ Aurora for my use case. I believe (time has passed) we were pushing somewhere close to 20k DML statements per second on the largest instance type (I think db.r3.8xlarge?). Apologies for the vagueness; I no longer have the ability to get the metrics for that particular system.
What we did:
This system did not require an "immediate" response to a given insert, so writes were enqueued to a separate process. That process would collect N queries and split them into M batches, where each batch correlated with a target table. Those batches would be applied inside a single transaction.
We did this to get the write efficiency of bulk writes and to avoid cross-table locking. There were 4 separate (I believe?) processes doing this dequeue-and-write behavior.
Due to this high write load, we absolutely had to push all reads to a read replica, as the primary generally sat at 50-60% CPU. We vetted this architecture in advance by creating random-data writer processes and modeling the general system behavior before we committed the actual application to it.
The writes were almost all INSERT ... ON DUPLICATE KEY UPDATE statements, and the tables had a number of secondary indexes.
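A rough sketch of that dequeue-and-batch pattern. The in-memory queue, table names, and column list below are hypothetical stand-ins (the real system's details are long gone), and it assumes PyMySQL; a production version would use a durable queue and proper error handling:

```python
import pymysql
from collections import defaultdict

def flush(conn, pending, max_rows=1000):
    """Drain up to max_rows queued writes, group them by target table, and
    apply each group as one multi-row INSERT ... ON DUPLICATE KEY UPDATE
    inside a single transaction."""
    batches = defaultdict(list)  # table name -> list of (id, value) rows
    for _ in range(min(max_rows, len(pending))):
        table, row = pending.pop(0)
        batches[table].append(row)

    with conn.cursor() as cur:
        for table, rows in batches.items():
            placeholders = ", ".join(["(%s, %s)"] * len(rows))
            sql = (f"INSERT INTO `{table}` (id, value) VALUES {placeholders} "
                   "ON DUPLICATE KEY UPDATE value = VALUES(value)")
            cur.execute(sql, [v for row in rows for v in row])
    conn.commit()

# Placeholder connection and queue contents.
conn = pymysql.connect(host="aurora-writer.example.com", user="app",
                       password="secret", database="mydb", autocommit=False)
pending = [("events", (1, "a")), ("events", (2, "b")), ("metrics", (1, "x"))]
flush(conn, pending)
```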
I suspect this approach worked for us simply because we were able to tolerate delay between when information appeared in the system, and when readers would actually need it, thus allowing us to batch at much higher amounts. YMMV.
For Googlers:
Aurora needs to write to multiple replicas in real time, so there must be a queue with locking, waiting, and checking mechanisms.
This behavior inevitably causes ultra-high CPU utilization and lag when there are continuous write requests, which only succeed once multiple replicas are synced.
This has been the case since Aurora's inception, up until 2020 at least, and it is logically difficult if not impossible to solve while keeping the service's low storage cost and fair compute cost.
High-volume write performance of Aurora MySQL can be more than 10x worse than that of RDS MySQL (from personal experience, and confirmed by the answers above).
To solve the problem (more like a work-around):
BE CAREFUL with Aurora if more than 5% of your workload is writes
BE CAREFUL with Aurora if you need near-real-time results for large-volume writes
Drop secondary indexes, as Bill Karwin points out above, to improve writes
Batching inserts and updates may improve writes
I said "BE CAREFUL" but not "DO NOT USE", as many scenarios can be solved by clever architecture design; raw database write performance can hardly be depended on.
Last night I asked a similar question on the Database Administrators Stack Exchange, but no one replied, so I'm putting my question here.
I have a huge DB (almost 7-9 GiB) on Amazon RDS, and CPU utilization & I/O are also high. I'm not good at optimizing database queries, enhancing data performance, etc. I need help with this DB in terms of optimization and improving server performance so that users get a rich experience.
About my research:
As also mentioned there at DBA: should I upgrade the RDS t2.micro to a premium DB instance, or create another DB replica on the premium server and balance the load with the help of ELB (load balancers), so that the server runs smoothly, at least for now? Likewise,
RDS Load Balancing Tricks
Then, should I optimize the SQL queries, enhance API performance, and update the database on RDS, removing the load balancer & premium server?
Some general optimizations you could do:
Instead of using t2.micro, migrate to an instance class with higher I/O capacity (as you have mentioned). t2.micro is a general-purpose burstable class and not recommended for production databases.
Convert from standard storage to Provisioned IOPS storage if you have high disk utilization; otherwise a General Purpose SSD would also be fine (see the sketch below).
Depending on the read traffic, you might want to consider using read replicas.
Other than that, it comes down to general DB optimization techniques such as proper indexing, etc.
Some of the AWS RDS specific best practices are listed in this doc.
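As a rough illustration of the first two points, a boto3 sketch with a placeholder instance identifier and example numbers; size the allocated storage and IOPS to your actual workload:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Move off the burstable t2.micro and onto Provisioned IOPS (io1) storage.
rds.modify_db_instance(
    DBInstanceIdentifier="mydb",      # placeholder name
    DBInstanceClass="db.m5.large",    # example of a higher-I/O class
    StorageType="io1",
    AllocatedStorage=100,             # io1 requires a minimum allocation (GiB)
    Iops=3000,                        # example figure
    ApplyImmediately=True,
)
```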
We have an EC2 instance running both Apache and MySQL at the moment. I am wondering if moving MySQL to another EC2 instance will increase or decrease the performance of the site. I am mostly worried about network speed issues between the two instances.
EC2 instances in the same availability zone are connected via a 10,000 Mbps (10 Gbps) network - that's faster than a good solid-state drive on a SATA-3 interface (6 Gb/s).
You won't see a performance drop by moving the database to another server; in fact, you'll probably see a performance increase because the two servers will each have their own memory and CPU cores.
If your worry is network latency then forget about it - not a problem on AWS in the same availability zone.
Another consideration is that you're probably storing your website & DB files on an EBS-mounted volume. That EBS volume is stored off-instance, so your data is already sitting on a storage array reached over the same super-fast 10 Gbps network.
So what I'm saying is... with EBS your website and database are already talking across the network to get their data, so putting them on separate instances won't really change anything in that respect - besides giving more resources to both servers. More resources means more data cached in memory and more performance.
The answer depends largely on what resources Apache and MySQL are using. They can happily cohabit if demands on your website are low and each is configured with enough memory that it doesn't spill over into swap. In that case, they are best kept together.
As traffic or your application grows, you will benefit from splitting them out, because they can then both run with dedicated memory. Provided the instances are in the same region, you should see fast performance between them. I have even run a web application in Europe with the DB in the USA and the performance wasn't noticeably bad! I wouldn't recommend that, though!
Because AWS is easy and cheap, your best bet is to set it up and benchmark it!
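As a trivial starting point for such a benchmark, here is a hedged sketch that times simple query round-trips from the web host to a remote MySQL host. The hostname and credentials are placeholders, it assumes PyMySQL, and a serious test would use something like sysbench against your real schema:

```python
import time
import pymysql

# Placeholder host: point this at the candidate DB instance.
conn = pymysql.connect(host="db.internal.example.com", user="bench",
                       password="secret", database="test")

samples = []
with conn.cursor() as cur:
    for _ in range(1000):
        start = time.perf_counter()
        cur.execute("SELECT 1")   # measures network + protocol overhead only
        cur.fetchall()
        samples.append(time.perf_counter() - start)

samples.sort()
print(f"median round-trip: {samples[len(samples) // 2] * 1000:.2f} ms")
print(f"p99 round-trip:    {samples[int(len(samples) * 0.99)] * 1000:.2f} ms")
```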
Did you try Amazon RDS? How is it, performance-wise?
I think this is a hard question to answer as it is highly specific to the problem you are trying to solve, but I will try to give you a picture of what we have seen.
We have been benchmarking RDS using CloudWatch metric gathering tools (provided here: http://aws.amazon.com/articles/2934) and have found it does perform nearly as well as our production servers for our data set. We tested both with a single RDS instance and with a Multi-AZ setup (what we plan to use in production) with no back-up retention.
With the load we have been able to throw at it so far, we are able to get up into the 1000-1100 write IOPS range (their metric) even on a small database instance (db.m1.small). At least for our load, increasing the instance class did not affect our throughput in IOPS or bytes. We saw about a 10% reduction in performance when running the Multi-AZ setup.
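For anyone wanting to reproduce that kind of measurement, here is a small sketch that pulls the WriteIOPS metric from CloudWatch for an RDS instance (the instance identifier is a placeholder):

```python
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="WriteIOPS",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "mydb"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=60,                     # one datapoint per minute
    Statistics=["Average", "Maximum"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1), round(point["Maximum"], 1))
```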
Amazon freely admitted up front that the solution to really scale out is to subdivide your problem such that you can scale/store it across multiple database servers. We in fact have this in our application (very similar to sharding) and will therefore be able to take advantage of it and very easily move past this IOPS figure.
We've found RDS to be pretty comparable performance-wise to having our own production servers (whether dedicated, virtual, or EC2). Note that you will always suffer some I/O and performance degradation with a virtualization solution, which is what RDS appears to use, and this will show up under heavy load (but under heavy load you should probably have a dedicated MySQL/DB box anyway).
Take note: the biggest performance hit you will likely see is network latency - if you are reading/writing from an EC2 box to an RDS box and vice versa, the network latency will probably be the bottleneck, particularly for a large number of queries. This is likely to be worse if you are connecting from a non-Amazon/non-EC2 box to RDS.
You will probably get more performance from an equivalently specced physical box than from a virtual box, but this is true of dedicated vs EC2/RDS in general, and is not an RDS-specific problem.
Regarding RDS vs EC2, the defaults that Amazon has set up RDS with seem to be pretty good, so if you are simply looking to have database server(s) up and running and connect to them, RDS is more than suitable. Do make sure you have the cost correctly analyzed though - it's not the same pricing model as, say, an EC2 instance.