Resize Amazon RDS storage - mysql

We are currently working with a 200 GB database and we are running out of space, so we would like to increase the allocated storage.
We are using General Purpose (SSD) and a MySQL 5.5.53 database (without Multi-AZ deployment).
If I go to the Amazon RDS menu and change the Allocated storage to a bit more (from 200 to 500) I get the following "warnings":
Deplete the initial General Purpose (SSD) I/O credits, leading to longer conversion times: What does this mean?
Impact instance performance until operation completes: And this is the most important question for me. Can I resize the instance with zero downtime? I mean, I don't care if the queries are a bit slower as long as they work while it's resizing, but what I don't want to do is stop all my production websites, resize the instance, and open them again (i.e. have downtime).
Thanks in advance.

You can expect degraded performance but you should really test the impact in a dev environment before running this on production so you're not caught off guard. If you perform this operation during off-peak hours you should be fine though.
To answer your questions:
General Purpose (SSD) storage on RDS can burst using something called I/O credits. Burst means its performance can go above the baseline to meet spikes in demand. It shouldn't be a big deal if you burn through the credits unless your instance relies on them (you can determine this from the RDS instance metrics). Have a read through I/O Credits and Burst Performance.
Changing the disk size will not result in a complete RDS instance outage, just performance degradation, so it's better to do it during off-peak hours to minimise the impact as much as possible.

First, according to the RDS FAQs, there should be no downtime at all as long as you are only increasing the storage size and not upgrading the instance tier.
Q: Will my DB instance remain available during scaling?
The storage capacity allocated to your DB Instance can be increased
while maintaining DB Instance availability.
Second, according to RDS documentation:
Baseline I/O performance for General Purpose SSD storage is 3 IOPS for
each GiB, which means that larger volumes have better performance....
Volumes below 1 TiB in size also have ability to burst to 3,000 IOPS
for extended periods of time (burst is not relevant for volumes above
1 TiB). Instance I/O credit balance determines burst performance.
I cannot say for certain why, but I guess that when RDS increases the disk size it may defragment the data or rearrange data blocks, which causes heavy I/O. If your server is under heavy usage during the resizing, it may fully consume the I/O credits, resulting in lower I/O throughput and longer conversion times. However, given that you started with 200 GB, I suppose it should be fine.
Finally, I would suggest using a Multi-AZ deployment if you are that worried about downtime or performance impact. During maintenance windows or snapshots there will be a brief I/O suspension for a few seconds, which can be avoided with a standby or read replicas.
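To put numbers on the quoted documentation, here is a rough sketch of the gp2 arithmetic for the 200 and 500 GiB sizes from the question. The 3 IOPS per GiB rate and the 3,000 IOPS burst ceiling come from the quote above; the initial credit balance of 5.4 million I/O credits is an assumption taken from the AWS gp2 documentation, and the function names are made up for illustration.

```python
GP2_IOPS_PER_GIB = 3         # baseline rate from the quoted documentation
GP2_BURST_IOPS = 3000        # burst ceiling from the quoted documentation
INITIAL_CREDITS = 5_400_000  # assumption: initial gp2 I/O credit balance

def baseline_iops(size_gib):
    """Baseline IOPS for a gp2 volume (ignores the documented min/max caps)."""
    return GP2_IOPS_PER_GIB * size_gib

def burst_seconds(size_gib, credits=INITIAL_CREDITS):
    """Seconds a credit bucket lasts while the volume is driven at the burst ceiling."""
    base = baseline_iops(size_gib)
    if base >= GP2_BURST_IOPS:
        return float("inf")  # baseline already meets the ceiling, burst is irrelevant
    # Credits drain at (burst rate - baseline refill rate) per second.
    return credits / (GP2_BURST_IOPS - base)

for size in (200, 500):
    minutes = burst_seconds(size) / 60
    print(f"{size} GiB: baseline {baseline_iops(size)} IOPS, "
          f"full bucket lasts {minutes:.1f} min at 3000 IOPS")
```

Under these assumptions a 200 GiB volume has a 600 IOPS baseline, so a resize that keeps the disk saturated could drain a full credit bucket in well under an hour, which is what the "deplete the initial I/O credits" warning is about.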

The technical answer is that AWS supports no downtime when scaling storage.
However, in the real world you need to factor in how busy your current database is and how the "slowdown" will affect users. Consider the possibility that connections might time out or the site may appear slower than usual for the duration of the scaling event.
In my experience, RDS storage resizing has been smooth and without problems. However, we pick the best time of day (least busy) to implement this. We also go through a backup procedure: we take a snapshot and bring up a standby server to switch over to manually, just in case.
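The snapshot-first routine described above could be scripted roughly like this. It is only a sketch: it assumes `rds` behaves like `boto3.client("rds")`, the identifiers in the usage comment are hypothetical, and it does not bring up the standby server mentioned above.

```python
def snapshot_then_resize(rds, instance_id, snapshot_id, new_size_gib):
    """Take a manual snapshot, wait for it, then request the larger storage size.

    `rds` is expected to behave like boto3.client("rds").
    """
    rds.create_db_snapshot(
        DBSnapshotIdentifier=snapshot_id,
        DBInstanceIdentifier=instance_id,
    )
    # Block until the snapshot is usable as a fallback.
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=snapshot_id)
    # ApplyImmediately=True starts the change now rather than at the
    # next maintenance window.
    return rds.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        AllocatedStorage=new_size_gib,
        ApplyImmediately=True,
    )

# Hypothetical usage (needs boto3 and AWS credentials, so it is not run here):
#   import boto3
#   snapshot_then_resize(boto3.client("rds"), "prod-db", "prod-db-pre-resize", 500)
```

Passing the client in as an argument keeps the function easy to dry-run against a stub before pointing it at production.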

Related

AWS RDS Optimized Memory vs Burstable Instance

I am currently using Memory optimized DB class in AWS (8 CPUs) because some push notifications in our app cause the CPU utilization to skyrocket, but 99% of the time CPU utilization is at about 10% so 8 CPUs aren't really needed most of the time.
Would I be able to deploy fewer CPUs on a Burstable Instance and have CPUs adjusted when there are those heavy traffic push notifications?
How do Burstable Instances work?
I wouldn't choose a burstable instance for any production traffic.
The burstable instance accumulates a number of "performance credits" per hour. These credits can be consumed when traffic increases and you need a significant amount of resources. When the credits are spent, the instance still runs, but has only "baseline performance" which is, frankly, insufficient to handle production traffic.
I've seen many users try to economize by using the T family of instance types. They are usually quite disappointed, because they underestimate their need for resources. They end up consuming their burst credits too quickly, and then operate at the baseline performance level too often.
I'd use a burstable instance only for CI testing servers, or development. These instances typically run idle most of the time, and accumulate a good level of performance credits. They use these credits for brief periods, and then return to an idle level of activity.
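To get a feel for how quickly credits run out, here is a toy simulation of the credit mechanism described above. It assumes the usual AWS definition that one CPU credit equals one vCPU at 100% for one minute. The instance numbers (2 vCPUs, 24 credits earned per hour, a cap of 576 credits) and the workload are hypothetical, chosen only for illustration.

```python
def simulate_credits(utilization_per_minute, vcpus, credits_per_hour,
                     start_balance, max_balance):
    """Return (balance after each minute, number of minutes spent throttled)."""
    balance = start_balance
    history = []
    throttled_minutes = 0
    earn = credits_per_hour / 60.0
    for util in utilization_per_minute:
        spend = vcpus * util  # 1 credit = 1 vCPU at 100% for 1 minute
        balance = balance + earn - spend
        if balance <= 0:
            # Out of credits: the instance is held at baseline performance.
            balance = 0.0
            throttled_minutes += 1
        balance = min(balance, max_balance)
        history.append(balance)
    return history, throttled_minutes

# Hypothetical workload: one quiet hour at 10% CPU, then a 30-minute spike at 100%.
workload = [0.10] * 60 + [1.00] * 30
history, throttled = simulate_credits(
    workload, vcpus=2, credits_per_hour=24, start_balance=30, max_balance=576)
print(f"balance after quiet hour: {history[59]:.1f}, throttled minutes: {throttled}")
```

With these made-up numbers the quiet hour only banks a dozen credits, and the spike burns through the whole balance before it ends. Plugging in your own CloudWatch CPU figures would show whether your push-notification bursts fit inside the credits you can accumulate.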
You can also look into Aurora Serverless. This is supposed to auto-scale more replica instances in response to traffic increases, which should give you more CPU capacity. You only pay for the instances you use. That's the theory, but I can't speak from experience because I haven't used or tested Aurora Serverless. How well it works and how economical it is for you depends on your application's workload. All I can suggest is to give it a try.
I believe most RDBMS, particularly MySQL, can only be scaled "vertically", in the sense that you can't dynamically add/remove CPU resources to handle bursts of reads/writes.
Perhaps you can create a "notification/fanout" service which is more easily scaled up and down dynamically, using perhaps DynamoDB or AWS SNS. This way your primary database can avoid all of that traffic, and in turn you can use a much less expensive instance class for your RDS database.

RDS Multi-AZ bottlenecking write performance

We are using an RDS MySQL 5.6 instance (db.m3.2xlarge) on sa-east-1 region and during write intensive operations we are seeing (on CloudWatch) that both our Write Throughput and the Network Transmit Throughput are capped at 60MB/s.
We suspected that Multi-AZ could be responsible for this behaviour and turned it off for testing purposes. We ran the same operation and noticed that the Write Throughput wasn't capped anymore and the Network Transmit Throughput was actually zero. This reinforced the idea that this network traffic is between the primary instance and the failover instance in the Multi-AZ setup.
Here is the CloudWatch chart showing the operation without Multi-AZ and, right after, the same one with Multi-AZ enabled:
We tried upgrading the instance to one with the highest network performance and also tried Provisioned IOPS, but there was no change: when Multi-AZ is on, we are always capped at 60MB/s for writes.
It's our understanding that Multi-AZ uses synchronous data replication, but we can't find any information on the bandwidth limits of the link through which this replication occurs. Does anyone know anything about it and how to avoid these limits? Or should we live with it?
I don't think you're seeing a limitation of the replication service per se, but it appears that your replication bandwidth shares the same transport as the EBS volume on your instance, thus it's a limitation of the Ethernet bandwidth available to your instance itself (remembering that EBS is network-attached storage).
The network connection on an m3.2xlarge is 1000 Mbit/s, which is equivalent to 125 MB/s.
Divide that number by two and you get ~60 MB/s for writing to the local instance's EBS volume and another ~60 MB/s for writing to the synchronous replica.
Unfortunately, the implementation details of Multi-AZ replication are not something AWS has publicly explained in enough detail to say conclusively that this is indeed the explanation, but the numbers are suspiciously close to what would be predicted if it is correct.
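The arithmetic behind that guess is short enough to write out. The even split between EBS writes and replication traffic is the hypothesis from this answer, not something AWS documents, and the function name is made up.

```python
def mbit_to_mbyte_per_s(mbit_per_s):
    """Convert a link speed in Mbit/s to MB/s (8 bits per byte)."""
    return mbit_per_s / 8

link_mb_per_s = mbit_to_mbyte_per_s(1000)  # 1000 Mbit/s link on the m3.2xlarge
# Hypothesis: each write crosses the same link twice, once to the local
# EBS volume and once to the synchronous replica, so each gets half.
per_stream_mb_per_s = link_mb_per_s / 2

print(link_mb_per_s, per_stream_mb_per_s)  # 125.0 62.5
```

62.5 MB/s before protocol overhead is close to the 60 MB/s cap seen on CloudWatch.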
The m3 family and m4 family of instances have similar specs but also (apparently) some fundamental design differences, so it might be informative to see if the same behavior is true of the m4.2xlarge.
I have experienced the same issue, after activating Multi AZ the Write Latency increased dramatically:
(The instance type is m4.4xlarge)
The reason looks to be the synchronous replication process: each write has to wait until both DBs have acknowledged the modification.
It looks like there is no solution and this is expected behaviour:
DB instances using Multi-AZ deployments may have increased write and
commit latency compared to a Single-AZ deployment, due to the
synchronous data replication that occurs
– from AWS documentation
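A very simplified way to picture why synchronous replication adds commit latency is below. The model (local and standby writes run in parallel, and the commit waits for the slower of the two) and the millisecond figures are assumptions for illustration, not measurements or a description of how RDS is implemented.

```python
def commit_latency_ms(local_write_ms, standby_write_ms=None, network_rtt_ms=0.0):
    """Estimated commit latency; pass standby_write_ms=None for Single-AZ."""
    if standby_write_ms is None:
        return local_write_ms
    # The commit is acknowledged only after both copies are written, and the
    # standby's copy also pays the round trip between availability zones.
    return max(local_write_ms, network_rtt_ms + standby_write_ms)

# Hypothetical numbers: 2 ms per storage write, 1.5 ms between zones.
print(commit_latency_ms(2.0))            # Single-AZ
print(commit_latency_ms(2.0, 2.0, 1.5))  # Multi-AZ
```

Even in this optimistic model, every commit pays at least the inter-zone round trip on top of the storage write.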
Here is an interesting Reddit thread about this:
https://www.reddit.com/r/aws/comments/61ewvp/rds_multiaz_slow_insert/
the only recommendation I see is moving to Aurora :/
Well, I never got an ACTUAL explanation from anywhere, but after tons of tests it seems that the m3.2xlarge is actually "bugged". I wrote a detailed explanation in my blog.

RDS Migration - MySQL Queries have slowed down

We recently moved to Amazon Web Services from colo hosting. We have two EC2 servers and an RDS instance. Initially everything was running quite smoothly, but recently queries that used to take seconds to run are now taking minutes.
We tried upgrading to a larger instance but that does not seem to have helped. Also, I've reached the limit of my knowledge and we are still in the process of trying to find a new DBA after the last one left.
Our RDS is an m3.xlarge and we are using SSD storage. Below is a screenshot of max Read and Write ops as well as CPU usage
Any suggestions or guidance on parameters that I should check or change would be much appreciated.
It seems you are having a latency problem, i.e. low availability.
Amazon EBS drives, like almost everything on the cloud, are shared.
And, like everything on the cloud, you can pay extra for maximum peak or extra for minimum availability (or extra for both, of course).
(sorry for being obvious)
Now the tips:
See those low valleys in between the huge peaks on your IOPS graph? That probably doesn't mean your RDBMS isn't requesting them, but that it is not getting them instead, because Amazon is giving those IOPS to other, less IOPS greedy users. But many of them.
If you haven't done so already, read about Provisioned IOPS for low latency SSD disk access,
and how to improve EBS performance.
Also, is EBS optimization active for your instance? The Amazon docs say it is enabled by default for c4 instances and supported by m3.xlarge instances, but don't mention anything about the optimization being enabled by default for the latter.
I am by no means an expert, but there is no harm and almost no cost in trying those simple solutions. That should probably be enough. Otherwise, don't wait until you manage to hire a new competent DBA: get some consulting from a reputable firm ASAP (or even buy AWS premium support for a month). At least they will be able to tell you where the bottleneck is and what has to be done to fix it.

How Couchbase Server supports high concurrency and high throughput

I am curious to know how Couchbase Server supports high concurrency and high throughput.
It's a very broad question to answer but I'll try to cover some of the key reasons for why Couchbase is fast and scalable.
Writes in Couchbase are asynchronous by default: replication and persistence happen in the background, and the smart clients (SDKs) are notified of success or failure. So basically any new documents or mutations to documents are written to RAM and then asynchronously flushed to disk in the background and replicated to other nodes. This means that there is no waiting time or contention on I/O or disk speed. (It also means it is possible to write to RAM and then have the node fall over before the request has been persisted to disk or replicated to a second or third node.) It is possible to make writes synchronous, but that will slow down throughput considerably.
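The write path described above (acknowledge from RAM, persist later) is the general write-behind pattern. Here is a minimal sketch of that pattern. It is not Couchbase's implementation; the class and method names are invented, and a plain dict stands in for the disk.

```python
import queue
import threading

class WriteBehindStore:
    """Toy key-value store: writes land in RAM and are flushed in the background."""

    def __init__(self):
        self.memory = {}  # RAM copy, visible to readers immediately
        self.disk = {}    # stand-in for persistent storage
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._flush_loop, daemon=True)
        worker.start()

    def set(self, key, value):
        self.memory[key] = value
        self._pending.put((key, value))  # persistence happens later
        return True                      # acknowledged before it is durable

    def get(self, key):
        return self.memory.get(key)

    def _flush_loop(self):
        while True:
            key, value = self._pending.get()
            self.disk[key] = value       # the slow part, off the caller's path
            self._pending.task_done()

    def wait_until_persisted(self):
        """Block until every queued write is flushed (the 'synchronous' option)."""
        self._pending.join()

store = WriteBehindStore()
store.set("user::42", {"name": "Ada"})
print(store.get("user::42"))  # served from RAM straight away
store.wait_until_persisted()  # optional durability barrier
```

The gap between `set` returning and the flush completing is exactly the window in which a node failure loses the write, and calling `wait_until_persisted` on every write is what costs the throughput.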
When dealing with RAM, writes and reads are VERY fast (we've only pushed our cluster to 20k operations a second), but large companies easily hit upwards of 400k operations a second. LinkedIn sustains this ops rate with only 4 nodes ---> http://www.couchbase.com/customer-stories
In traditional database architectures the usual setup would be a master DB (MySQL/Postgres/Oracle) coupled with a slave DB for data redundancy; writes and reads can also be split between the two as load gets higher. Couchbase is meant to be used as a distributed system (Couchbase recommends at least 3 nodes in production). Data is automatically sharded between the nodes in a cluster, thus spreading the writes and reads across multiple machines. If you need higher throughput, adding a node in Couchbase is as simple as clicking add node and then rebalance cluster; the data will be automatically partitioned across the new cluster map.
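As a rough picture of what "automatically sharded" means, here is a sketch of hash partitioning with a cluster map. The count of 1024 partitions follows what Couchbase documents for vBuckets, but the exact hash, the round-robin assignment and all names here are simplifying assumptions rather than Couchbase's actual algorithm.

```python
import zlib

NUM_PARTITIONS = 1024  # assumption: matches Couchbase's documented vBucket count

def partition_for(key):
    """Map a document key to a fixed partition, independent of cluster size."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

def build_cluster_map(nodes):
    """Assign every partition to a node (simple round-robin for illustration)."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}

def node_for(key, cluster_map):
    """Clients hold the map, so they can talk to the owning node directly."""
    return cluster_map[partition_for(key)]

three_nodes = build_cluster_map(["node1", "node2", "node3"])
# "Add node" followed by "rebalance" amounts to publishing a new map.
four_nodes = build_cluster_map(["node1", "node2", "node3", "node4"])

key = "user::42"
print(partition_for(key), node_for(key, three_nodes), node_for(key, four_nodes))
```

The key always hashes to the same partition; only the partition's owner changes when the map is rebuilt. A real rebalance would also try to move as few partitions as possible, which this round-robin assignment does not attempt.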
So essentially writing/reading from RAM with async disk persistence + distributed reads and writes == high throughput
Hope that helps!
#scalabilitysolved already gave a great overview, but if you want a longer (and more detailed) description take a look at the Couchbase_Server_Architecture_Review on couchbase.com

What are your experiences regarding performance with amazon-rds

Did you try amazon-rds? How is it, performance-wise?
I think this is a hard question to answer as it is highly specific to the problem you are trying to solve, but I will try to give you a picture of what we have seen.
We have been benchmarking RDS using CloudWatch metric gathering tools (provided here: http://aws.amazon.com/articles/2934) and have found it does perform nearly as well as our production servers for our data set. We tested both with a single RDS instance and with a Multi-AZ setup (what we plan to use in production) with no back-up retention.
With the load we have been able to throw at it so far, we get up into the 1000-1100 Write IOPS range (their metric) even on a small database instance (db.m1.small). At least for our load, increasing the instance class did not affect our throughput in IOPS or bytes. We saw about a 10% reduction in performance when
Amazon freely admitted up front that the solution to really scale out is to subdivide your problem such that you can scale/store it across multiple database servers. We in fact have this in our application (very similar to sharding) and therefore will be able to take advantage and very easily move past this IOPS measurement.
We've found RDS to be pretty comparable performance-wise to having our own production servers (either dedicated or virtual or EC2). Note that you will always suffer some IO/performance degradation using a virtualization solution, which is what RDS seems to be using, and this will show up under heavy load (but with heavy load, you should be having a dedicated MySQL/DB box anyway.)
Take note: the biggest performance hit you will likely see is network latency. If you are reading/writing from an EC2 box to an RDS box and vice versa, the network latency will probably be the bottleneck, particularly for a large number of queries. This is likely to be worse if you are connecting from a non-Amazon/non-EC2 box to RDS.
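A back-of-the-envelope calculation shows why round trips dominate when there are many small queries. All figures below are hypothetical; measure your own round-trip and execution times before drawing conclusions.

```python
def total_time_ms(num_queries, rtt_ms, exec_ms):
    """Wall-clock time for queries issued one after another on one connection."""
    return num_queries * (rtt_ms + exec_ms)

queries = 10_000
exec_ms = 0.5  # hypothetical server-side time per query

nearby = total_time_ms(queries, rtt_ms=0.5, exec_ms=exec_ms)   # DB close to the app
remote = total_time_ms(queries, rtt_ms=2.0, exec_ms=exec_ms)   # DB a few hops away

print(f"nearby: {nearby / 1000:.1f} s, remote: {remote / 1000:.1f} s")
```

The server does the same work in both cases; the extra time is pure waiting on the network, which is why batching queries or reducing round trips tends to matter more than instance size for chatty workloads.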
You will probably get more performance from an equivalent spec physical box than a virtual box, but this is true of dedicated vs EC2/RDS, and is not a RDS-specific problem.
Regarding RDS vs EC2, the defaults that Amazon has set up RDS with seem to be pretty good, so if you are simply looking to have database server(s) up and running and connect to them, RDS is more than suitable. Do make sure you have the cost correctly analyzed though: it's not the same pricing model as, say, an EC2 instance.