I have an AWS RDS MySql Cluster. I'm trying to Auto Scale on Mass Write operations, but unable to do so. But, when I'm Running Read Queries it Scales properly. I'm getting "Too Many Connections" error on write. Can anyone let me know what I'm doing wrong? Thanks in advance.
[Edit: 1]
Screenshot of AWS RDS Cluster Config
I've kept the connection limit to 2 because I was testing.
When I'm sending Multiple read requests to AWS RDS I can see new Instances being launched in my RDS Instances Section:
I've also set Scale In Cool Time to 0 so that it will launch a new Instance Instantly. When I'm reading from the database using read endpoint, Auto Scaling is working properly. But when I'm trying to insert data using write endpoint, Auto-Scaling is not working.
Your question is short on specifics so I will list some possible ways to figure this out and possible solve it.
RDS scaling takes time, so you cannot expect that your DB will increase in capacity instantly when a sudden spike of traffic exceeds its current capacity.
The maximum number of connects to a MySQL instance is set by max_connections in your parameter group. How many connections are happening and what is the max_connections value? This value affects memory usage, so be review any changes. Note: Increasing this value does not always help if there is a bug in your client code that erroneously creates too many connections. If the number of peak connections is exceeding the max_connections value, sometimes you just need to scaled up to a larger instance. Details determine the correct solution.
Use MySQL's Gobal Status History and look into what happens and when. This is useful for detecting locking or memory issues.
Related
I have an application that is hosted in AWS ECS and having the database in AWS RDS. I'm using a microservice-based container architecture for my application. The frontend of the application is in Angular and Backends are in Java and Python. Right now, the database size is ~1GB. The database size will increase day by day as the scraped data will be inserted daily.
Right now, some queries are taking 4-6 seconds to execute. We need to host this application to the public and there are a lot of users will be using the application. So when we load tested the application with 50 users, I found that the CPU of RDS reached 100% and some queries had taken more than 60 seconds to execute and then timed-out. Also, the CPU and memory of other microservices (frontend and backend) are normal. I have tried vertically scaling the application up to 64GB RAM and 4 vCPUs but still, this condition remains.
Is this an issue with the query or can I do anything with the database server configuration?
The RDS storage I'm using is 100GB with a general-purpose SSD. So, I guess there will be only 300 IOPS, right? I'm planning to use RDS read replicas but before that, I need to know is there anything that I need to do for improving the performance? Any database configurations etc?
I also not have a good idea about the MySQL connection count. Right now, it is using a total of 24 connections. Do I need to change the connection count also?
Query Optimisation
As Tim pointed out, try and optimise the queries. Since you have more data getting inserted, consider indexing the table and make the queries to use indexed columns if possible. also consider archiving unused old data.
number of connections
If you have control over the code, you can make use of database pools to control the number of connections that your applications can use.
CPU usage
the CPU usage is highly related to the performance of the queries, once you optimise the queries, the CPU usage should come down.
disk usage
Use the cloudwatch metrics to monitor the disk usage, based on that, you can decide on a provisioned IOPS disk.
hope this helps.
I've purchased a single VPC on AWS and initiated there 6 MySql databases, and foreach one I've created a reading replica, so that I can always run queries on the reading replicas quickly.
Most of the day, my writing instances (original instances) are fully loaded and their CPUs percentage is mostly 99%. However, the reading replicas shows something ~7-10% CPU usage, but sometimes I get an error when I run a service connecting to the reading replica "TOO MANY CONNECTIONS".
I'm not that expert with AWS, but is this happening because the writing replicas are fully loaded and they're on the same VPC?
this happening because the writing replicas are fully loaded and they're on the same VPC?
No, it isn't. This is unrelated to replication. In replication, the replica counts as exactly 1 connection on the master, but replication does not consume any connections on the replica itself. There is no impact on connections related to the intensity of the total workload from replication.
This issue simply means you have more clients connecting to the replica than are allowed by the parameter group based on your RDS instance type. Use the query SELECT ##MAX_CONNECTIONS; to see what this limit is. Use SHOW STATUS LIKE 'THREADS_CONNECTED'; to see how many connections exist currently, and use SHOW PROCESSLIST; (as the administrative user, or any user holding the PROCESS privilege) in order to see what all of these connections are doing.
If many of them show Sleep and have long values in Time (seconds spent in the current state) then the problem is that your application is somehow abandoning connections, rather than properly closing them after use or when they are otherwise no longer needed.
QUESTION OUTLINE
Our AWS RDS instance starts slowing down after about 7-14 days, by a quite large factor (~400% load times for a specific set of queries). RDS monitoring shows no signs of resource shortage. (see below the question update for detailed problem description)
Question Update
So after more than one month of investigating and some developer support by AWS, I am not exactly closer to a solution.
Here are a couple of steps which I checked off the list, more or less without any further hint of the problem:
Index / Fragmentation (all tables have correct indexes/keys and have no fragmentation)
MySQL Stats Update (manually updating stats source)
Thread Concurrency (changing innodb_thread_concurrency to various different parameters)
Query Cache Hit Ratio doesn't show problems
EXPLAIN to see if any SELECTs are actually slow or not using indexes/keys
SLOW QUERY LOG (returns no results, because see paragraph below, it's a number of prepared SELECTs)
RDS and EC2 are within one VPC
For explanation, the used PlayFramework (2.3.8) has BoneCP and we are using eBeans to select our data. So basically I am running through a nested object and all those child objects, this produces a couple of hundred prepared SELECTs for the API call in question. This should basically also be fine for the used hardware, neither CPU nor RAM are extensively used by these operations.
I also included NewRelic for more insights on this issue and did some JVM profiling. Obviously, most of the time is consumed by NETTY/eBeans?
Is anyone able to make sense of this?
ORIGINAL QUESTION: Problem Outline
Our AWS RDS instance starts slowing down after about 7-14 days, by a quite large factor (~400% load times for a specific set of queries). RDS monitoring shows no signs of resource shortage.
Infrastructure
We run a PlayFramework backend for a mobile app on AWS EC2 instances, connected to AWS RDS MySQL instances, one PROD environment, one DEV environment. Usually the PROD EC2 instance is pointing to the PROD RDS instance, and the DEV EC2 points to the DEV RDS (hi from captain obvious!); however sometimes we also let the DEV EC2 point to the PROD DB for some testing purposes. The PlayFramework in use is working with BoneCP.
Detailed Problem Description
In a quite essential sync process, our app is making a certain API call many times a day per user. I discussed the backgrounds of the functionality in this SO question, where, thanks to comments, I could nail the problem down to be a MySQL issue of some kind.
In short, the API call is loading a set of data, the maximum is about 1MB of json data, which currently takes about 18s to load. When things are running perfectly fine, this takes about 4s to load.
Curious enough, what "solved" the problem last time was upgrading the RDS instance to another instance type (from db.m3.large to db.m4.large, which is just a very marginal step). Now, after about 2-3 weeks, the RDS instance is once again performing slow as before. Rebooting the RDS instance showed no effect. Also re-launching the EC2 instance shows no effect.
I also checked if the indices of the affected mySQL tables are set properly, which is the case. The API call itself is not eager-loading any BLOB fields or similar, I double-checked this. The CPU-usage of the RDS instances is below 1% most of the time, when I stress tested it with 100 simultaneous API calls, it went to ~5%, so this is not the bottleneck. Memory is fine too, so I guess the RDS instance doesn't start swapping which could slow down the whole process.
Giving hard evidence, a (smaller) public API call on the DEV environment currently takes 2.30s load, on the PROD environment it takes 4.86s. Which is interesting, because the DEV environment has both in EC2 and RDS a much smaller instance type. So basically the turtle wins the race here. (If you are interested in this API call I am happy to share it with you via PN, but I don't really want to post links to API calls, even if they are basically public.)
Conclusion
Concluding, it feels (I wittingly say 'feels') like the DB is clogged after x days of usage / after a certain amount of API calls. Not sure if this a RDS-specific issue, once I 'largely' reset the DB instance by changing the instance type, things run fast and smooth. But re-creating my DB instance from a snapshot every 2 weeks is not an option, especially if I don't understand why this is happening.
Do you have any ideas what further steps I could take to investigate this matter?
(Too long for just a comment) I know you have checked a lot of things, but I would like to look at them with a different set of eyes...
Please provide
SHOW VARIABLES; (probably need post.it or something, due to size)
SHOW GLOBAL STATUS;
how much RAM? Sounds like 7.5G
The query. -- Unclear what query/queries you are using
SHOW CREATE TABLE for the table(s) in the query -- indexes, datatypes, etc
(Some of the above may help with "clogging over time" question.)
Meanwhile, here are some guesses/questions/etc...
Some other customer sharing the hardware is busy.
It could be a network problem?
Shrink long_query_time to 1 so you can catch slow queries.
When are backups done on your instance?
4s-18s to load a megabyte -- what percentage of that is SQL statements?
Do you "batch" the inserts? Is it a single transaction? Are lengthy queries going on at the same time?
What, if any, MySQL tunables did you change from the AWS defaults?
6GB buffer_pool on a 7.5GB partition? That sounds dangerously tight. Can you see if there was any swapping?
Any PARTITIONing involved? (Of course the CREATE will answer that.)
There is one very important bit of information missing from your description: The total allocated space for the database. I/O for RDS is around 3x the allocated space, so for a 100GB allocation, you should get around 300 IOPS. That allocated space also includes logs.
Since you don't really know what's going on, the first step should be to turn on detailed monitoring, which will give you more idea of what is happening on the instance.
Until you have additional stats gathered during a slowdown, you can try increasing the allocated space, which will increase the IOPS available.
Also, check the events for the db - are logs getting purged on a regular basis? That might indicate that there's not enough space.
Finally, you can try going with PIOPS (provisioned IOPS) if you have an idea of what the application needs, though at this point it sounds like that would be a guess.
maybe your burst credit balance is (slowly) being depleted? finally, you end up with baseline performance, which may appear "too slow".
this would also explain why the upgrade to another instance type did help, as you then start with a full burst balance again.
i would suggest to increase the size of the volume, even if you don't need the extra space, as the baseline performance grows linearly with volume size.
Today I have notice that my Amazon RDS instance memory monitor threshold shows me with red line. Here I have attached screen shot for the same.
So, My question is what is that Memory threshold, and why it is crossing limit? Anything wrong with my instance? What is the solution to decrease/control this hike?
The Red line you see is a threshold set by AWS if the RDS is causing that threshold many times then there might be a performance issue that you need to take a look.
MySQL try to use all available memory as needed. However, the limits are defined by RDS' server parameters which you can modify and you may not need to scale up your server.
RDS instances are created with default values for those parameters (the most relevant of them being innodb_buffer_pool_size) to optimize memory usage. In order to see which server variables are applied to your instance, connect to it and execute the "show global variables" command.
It is normal for that number to go up and down as matter of course.
If you are seeing performance issues and you have no more freeable memory, then you should be looking at causes or upgrading to a larger instance.
Those values may not be right for all workloads, but you can adjust them as needed using parameter groups. This document explains how you can use parameter groups:
http://docs.amazonwebservices.com/AmazonRDS/latest/UserGuide/USER_WorkingWithParamGroups.html
I have published my website on Amazon EC2 (Singapore region) and I have used MySQL RDS instance for the data storage. Everything is working very fine except performance.
I seems that, my all queries, especially the select statement, is performing very slowly. If I check this issue on my local PC, there it is working very well. But when I am trying to get data from RDS instance, it is very slow. Some of the select statements takes 2-3 seconds to fetch data.
I have properly tuned up all table indexes, and normalized/de-normalized as required. I have made all necessary settings on RDS custom parameter group (eg. max_connection, buffer etc). I don't know if I am missing something, but it didn't work for me - performance didn't increase.
So, can someone please help me with this issue?
It is worth noting that, for whatever reason, MySQL query cache is OFF by default in RDS. We learned that the hard way ourselves this week.
This won't help performance of your initial query, but it may speed things up in general.
To re-enable query cache:
Log in to the RDS Console
Click on your RDS instance to view it's details
Edit the Database Parameter Group
Be sure to set both query_cache_size and query_cache_type
(Disclaimer: I am not a DBA so there may be additional things I'm missing here)
For me, it was nothing to do with MySQL but rather the instance type I was on t2.medium. The problem is I ran out of CPU credits because the load on the DB was too high and the balance kept going down until finally, I was getting far fewer credits hourly than were needed.
Here is what I saw in RDS CloudWatch under CPU Credit Usage:
If you have the same problem it may be time to switch to a different instance. Here is the list of instance types:
https://aws.amazon.com/rds/instance-types/
Hope this helps.
It is important to have your RDS and EC2 instances not just in the same region but also in the same availability zone to minimize the latency.
I had an API hosted in Ireland on EC2 and moved the Database to a MySQL cluster in Virginia USA that we had set up for another project and the round trip on every SQL query made the API unusable.
RDS MySQL performance can be increased in following ways assuming the system has more read ratio:
Use Larger instance types, they come with better NW bandwidth. Example AWS Quadruple EXL comes with 1,000 Mbps bandwidth.
Use PIOPS storage you can extract 12,500 IOPS of 16KB from MySQL DB
If lots of read is performed, add one or more Read Replica's to increase read performance
Apply standard practices like: Tune the queries, apply the indexes etc
First i highly recommend to look over these queries using
SHOW FULL PROCESSLIST
You can read more about it on SHOW FULL PROCESSLIST
This will show you the time each query take.
Then you can use
EXPLAIN
You can read more about it on EXPLAIN
This will show you if you need some enhancement on your queries
You can check where the query is taking time by making use of profiling. Use the below query:
set profiling=1
execute your select query
show profile
This will tell you about the status of the query and where the query is spending its time. If the sum of all the time returned by the profiling is less than the actual execution time of the query, then maybe other factors like Network bandwidth may be the cause of it.
Always should deploy source and rds in the same AWS availability zone for lower network latency and Should create a private endpoint link in VPC for RDS to connect RDS endpoint through the internal network instead of routing through the internet.
Reference: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/vpc-interface-endpoints.html
I found that after migrating to RDS all my database Indexes are gone! They weren't migrated along with the schema and data. Make sure you're indexes are there.