rds mysql performance slow under connections - mysql

I have a db.r4.4xlarge (should be able to handle 10000 connections per aws's metrics) mariadb rds instance on aws. Today we saw a big spike in database connections (around 1000 simultaneous and sustained connections). We have so many connections because our clients use has use has grown significantly after an acquisition they were a part of. This instance is supposed to support 10k connections. CPU and Memory were ok, ie., not pinned an no swapping to my knowledge. All queries, even simple ones started to take forever. Even stuff like
CREATE TEMPORARY TABLE if not exists TTT0dfa3c18b1036a73dd5d5581bddac484_2 LIKE TTT0dfa3c18b1036a73dd5d5581bddac484;
was taking > 5 seconds... and that's not even loading any table data, just the structure. I'm trying to get to the root of the issue and find a solution. Can anyone point me in a direction that may lead to a better understanding of what was going on. The crux of my confusion is that the cpu and memory looked ok while all this was happening, just the connections stood out as unusually high. Should I try using ec2 with ssd, etc, any ideas/thoughts would be helpful as I am not experienced with this type of issue.
below I've added a couple pics of the only metrics that stood out at all. connections and binlogusage and freeable memory. IO/cpu was good.

Related

How to identify AWS RDS Scaling conditions

I have an application that is hosted in AWS ECS and having the database in AWS RDS. I'm using a microservice-based container architecture for my application. The frontend of the application is in Angular and Backends are in Java and Python. Right now, the database size is ~1GB. The database size will increase day by day as the scraped data will be inserted daily.
Right now, some queries are taking 4-6 seconds to execute. We need to host this application to the public and there are a lot of users will be using the application. So when we load tested the application with 50 users, I found that the CPU of RDS reached 100% and some queries had taken more than 60 seconds to execute and then timed-out. Also, the CPU and memory of other microservices (frontend and backend) are normal. I have tried vertically scaling the application up to 64GB RAM and 4 vCPUs but still, this condition remains.
Is this an issue with the query or can I do anything with the database server configuration?
The RDS storage I'm using is 100GB with a general-purpose SSD. So, I guess there will be only 300 IOPS, right? I'm planning to use RDS read replicas but before that, I need to know is there anything that I need to do for improving the performance? Any database configurations etc?
I also not have a good idea about the MySQL connection count. Right now, it is using a total of 24 connections. Do I need to change the connection count also?
Query Optimisation
As Tim pointed out, try and optimise the queries. Since you have more data getting inserted, consider indexing the table and make the queries to use indexed columns if possible. also consider archiving unused old data.
number of connections
If you have control over the code, you can make use of database pools to control the number of connections that your applications can use.
CPU usage
the CPU usage is highly related to the performance of the queries, once you optimise the queries, the CPU usage should come down.
disk usage
Use the cloudwatch metrics to monitor the disk usage, based on that, you can decide on a provisioned IOPS disk.
hope this helps.

AWS RDS MySQL performance drop after random timespan

QUESTION OUTLINE
Our AWS RDS instance starts slowing down after about 7-14 days, by a quite large factor (~400% load times for a specific set of queries). RDS monitoring shows no signs of resource shortage. (see below the question update for detailed problem description)
Question Update
So after more than one month of investigating and some developer support by AWS, I am not exactly closer to a solution.
Here are a couple of steps which I checked off the list, more or less without any further hint of the problem:
Index / Fragmentation (all tables have correct indexes/keys and have no fragmentation)
MySQL Stats Update (manually updating stats source)
Thread Concurrency (changing innodb_thread_concurrency to various different parameters)
Query Cache Hit Ratio doesn't show problems
EXPLAIN to see if any SELECTs are actually slow or not using indexes/keys
SLOW QUERY LOG (returns no results, because see paragraph below, it's a number of prepared SELECTs)
RDS and EC2 are within one VPC
For explanation, the used PlayFramework (2.3.8) has BoneCP and we are using eBeans to select our data. So basically I am running through a nested object and all those child objects, this produces a couple of hundred prepared SELECTs for the API call in question. This should basically also be fine for the used hardware, neither CPU nor RAM are extensively used by these operations.
I also included NewRelic for more insights on this issue and did some JVM profiling. Obviously, most of the time is consumed by NETTY/eBeans?
Is anyone able to make sense of this?
ORIGINAL QUESTION: Problem Outline
Our AWS RDS instance starts slowing down after about 7-14 days, by a quite large factor (~400% load times for a specific set of queries). RDS monitoring shows no signs of resource shortage.
Infrastructure
We run a PlayFramework backend for a mobile app on AWS EC2 instances, connected to AWS RDS MySQL instances, one PROD environment, one DEV environment. Usually the PROD EC2 instance is pointing to the PROD RDS instance, and the DEV EC2 points to the DEV RDS (hi from captain obvious!); however sometimes we also let the DEV EC2 point to the PROD DB for some testing purposes. The PlayFramework in use is working with BoneCP.
Detailed Problem Description
In a quite essential sync process, our app is making a certain API call many times a day per user. I discussed the backgrounds of the functionality in this SO question, where, thanks to comments, I could nail the problem down to be a MySQL issue of some kind.
In short, the API call is loading a set of data, the maximum is about 1MB of json data, which currently takes about 18s to load. When things are running perfectly fine, this takes about 4s to load.
Curious enough, what "solved" the problem last time was upgrading the RDS instance to another instance type (from db.m3.large to db.m4.large, which is just a very marginal step). Now, after about 2-3 weeks, the RDS instance is once again performing slow as before. Rebooting the RDS instance showed no effect. Also re-launching the EC2 instance shows no effect.
I also checked if the indices of the affected mySQL tables are set properly, which is the case. The API call itself is not eager-loading any BLOB fields or similar, I double-checked this. The CPU-usage of the RDS instances is below 1% most of the time, when I stress tested it with 100 simultaneous API calls, it went to ~5%, so this is not the bottleneck. Memory is fine too, so I guess the RDS instance doesn't start swapping which could slow down the whole process.
Giving hard evidence, a (smaller) public API call on the DEV environment currently takes 2.30s load, on the PROD environment it takes 4.86s. Which is interesting, because the DEV environment has both in EC2 and RDS a much smaller instance type. So basically the turtle wins the race here. (If you are interested in this API call I am happy to share it with you via PN, but I don't really want to post links to API calls, even if they are basically public.)
Conclusion
Concluding, it feels (I wittingly say 'feels') like the DB is clogged after x days of usage / after a certain amount of API calls. Not sure if this a RDS-specific issue, once I 'largely' reset the DB instance by changing the instance type, things run fast and smooth. But re-creating my DB instance from a snapshot every 2 weeks is not an option, especially if I don't understand why this is happening.
Do you have any ideas what further steps I could take to investigate this matter?
(Too long for just a comment) I know you have checked a lot of things, but I would like to look at them with a different set of eyes...
Please provide
SHOW VARIABLES; (probably need post.it or something, due to size)
SHOW GLOBAL STATUS;
how much RAM? Sounds like 7.5G
The query. -- Unclear what query/queries you are using
SHOW CREATE TABLE for the table(s) in the query -- indexes, datatypes, etc
(Some of the above may help with "clogging over time" question.)
Meanwhile, here are some guesses/questions/etc...
Some other customer sharing the hardware is busy.
It could be a network problem?
Shrink long_query_time to 1 so you can catch slow queries.
When are backups done on your instance?
4s-18s to load a megabyte -- what percentage of that is SQL statements?
Do you "batch" the inserts? Is it a single transaction? Are lengthy queries going on at the same time?
What, if any, MySQL tunables did you change from the AWS defaults?
6GB buffer_pool on a 7.5GB partition? That sounds dangerously tight. Can you see if there was any swapping?
Any PARTITIONing involved? (Of course the CREATE will answer that.)
There is one very important bit of information missing from your description: The total allocated space for the database. I/O for RDS is around 3x the allocated space, so for a 100GB allocation, you should get around 300 IOPS. That allocated space also includes logs.
Since you don't really know what's going on, the first step should be to turn on detailed monitoring, which will give you more idea of what is happening on the instance.
Until you have additional stats gathered during a slowdown, you can try increasing the allocated space, which will increase the IOPS available.
Also, check the events for the db - are logs getting purged on a regular basis? That might indicate that there's not enough space.
Finally, you can try going with PIOPS (provisioned IOPS) if you have an idea of what the application needs, though at this point it sounds like that would be a guess.
maybe your burst credit balance is (slowly) being depleted? finally, you end up with baseline performance, which may appear "too slow".
this would also explain why the upgrade to another instance type did help, as you then start with a full burst balance again.
i would suggest to increase the size of the volume, even if you don't need the extra space, as the baseline performance grows linearly with volume size.

RDS Migration - MySQL Queries have slowed down

We recently moved to Amazon Web Services from colo hosting. We have two EC2 servers and a RDS instance. Initally everything was running quite smoothly but recently queries that used to take seconds to run are now taking minutes.
We tried upgrading to a larger instance but that does not seemed to have helped. Also, Ive reached the limit of my knowledge and we are stil in the process of trying to find a new DBA after the last one left.
Our RDS is a m3.xlarge and we are using SSD storage. Below is a screenshot of max Read and Write ops as well as CPU usage
Any suggestions or guidance on paramaters that I should check or should change would be much appreciated.
It seems you are having a latency problem, i. e. low availability.
Amazon EBS drives, like almost everything on the cloud, are shared.
And, like everything on the cloud, you can pay extra for maximum peak or extra for minimum availability (or extra for both, of course).
(sorry for being obvious)
Now the tips:
See those low valleys in between the huge peaks on your IOPS graph? That probably doesn't mean your RDBMS isn't requesting them, but that it is not getting them instead, because Amazon is giving those IOPS to other, less IOPS greedy users. But many of them.
If you haven't done so already, read about Provisioned IOPS for low latency SSD disk access,
and how to improve EBS performance.
Also, is EBS optimization active for your instance? Amazon docs say it is enabled by default for c4 instances and supported by m3.xlarge instances, but doesn't mention anything about the optimization being enabled by default for the latter.
I am not in any case an expert, but there is no harm and almost no cost on trying those simple solutions. That should probably be enough. Otherwise, don't wait till you manage to hire a new competent DBA and get some consulting from a reputable firm ASAP (Or even buying AWS premium support for a month). At least they will be able to tell where the bottleneck is and what has to be done to fix it.

Amazon Linux EC2 Webserver / MYSQL Upgrade – Traffic causing error establishing a database connection

To give you a little background, I currently have a website that allows users to upload photos. The website was initially on a GoDaddy shared server, but recent surges in traffic have forced me to explore other options. During peak hours, the site contains 400+ active visitors, which when combined with user uploads, forces the shared server to shut down.
I have a small amount of experience with setting up servers through AWS and attempted to place the website on a c1.medium instance, Amazon Linux. The website along with the MYSQL Database is on the same instance. While I have read that this is in general frowned upon, I have similarly read that moving the database to another instance would not significantly increase speeds. Unfortunately, the c1.medium instance also was unable to support the traffic and I soon received an error Establishing a Database connection. The site does load on occasion, so the problem stems from the traffic load and not an actual problem with the database.
My question is whether the problem revolves solely around MySQL? The database itself when backed up is around 250MB. Is the issue caused by input / output requests to the database? I read posts with people with similar problems in which they stated that installing MySQL 5.6 solved the problem, but also have read that MySQL 5.6 is slower than MySQL 5.5, which is my current version.
After conducting some preliminary research I started to believe that I could resolve the problem by increasing the IPOS of the EBS. Originally I had it set the IPOS as standard, but changed it to Provisioned IOPS and 30x the size of the EBS (i.e., 60GB – 1800 IOPS). This once again appeared to have little impact. Do I need to upgrade my instance? What measures should I be focused on when deciding on the instance? It appears that the cheapest instance with high network performance and EBS optimized would be c3.xlarge. Suggestions?
Several things to consider:
1)Separate the database server from the web server
Your database should not share resources with your web server. They will both perform poorly as the result.
It is easier to find what the bottle-neck is.
2) Upgrade to MySQL 5.6
In all the benchmarks that I have seen and done 5.6 performs better than 5.5
3) Configure your database to take advantage of your resources
Depending on the storage engine and the memory allocated in your machine configure MySQL for example set innodb_buffer_pool_size to 70% of the (DEDICATED) RAM
4) Monitor MySQL and check slow query log
Slow query log shows the queries that are slow and inefficient
5) Learn to use EXPLAIN
EXPLAIN shows query plan in MySQL run EXPLAIN on slow queries to tune them
6) Use Key-Value Stores to Cache queries
Using Memcached or Redis cache queries so they don't hit your database and return repeated queries from the memory
7) Increasing IOPS and Scaling Out
Increasing IOPS and getting better hardware helps but using efficient queries is much more effective. Queries and application most of the time are a greater contributing factor to performance problems
8) Replication
To help with concurrency consider moving to a MySQL Master/Slave replication , if you still had issues.
Final Note: use EBS because the storage on EC2 is ephemeral and will not persistent.
We recently did extensive research on the performance bottlenecks associated with massive end-user peaks across our global customer base, and the analysis actually indicates the database as - by far - the most frequent cause of slowdowns or even crashes. The report (https://queue-it.com/trend-report) includes best practice advice from our customers on how to improve the situation, which you may find helpful.

MySQL active connections at once, Windows Server

I have read every possible answer to this question and searched via Google in order to find the correct answer to the following question, but I am rather a novice and don't seem to get a clear understanding.
A lot I've read has to do with web servers, but I don't have a web server, but an intranet database.
I have a MySQL dsatabase in a Windows server at work.
I will have many users accessing this database constantly to perform simple queries and writting back to it new records.
The read/write will not be that heavy (chances are 50-100 users will do so exactly at the same time, even if 1000's could be connected).
The GUI will be either via Excel forms and/or Access.
What I need to know is the maximum number of active connections I can have at any given time to the database.
I know I can change the number on Mysql Admin however I really need to know what will really work...
I don't want to put 1000 users if the system will really handle 100 correctly (after that, although connected, the performance will be too slow, for example)
Any ideas or own experiences will be appreciated
This depends mainly on your server hardware (RAM, cpu, networking) and server load for other processes if not dedicated to the database. I think you won't have an absolute answer and the best way is testing.
I think something like 1000 should work ok, as long as you use 64 bit MySQL server. With 32 bit, too many connections may create virtual memory pressure - a connection has an own thread, and every thread needs a stack, so the stack memory will reduce possible size of the buffer pool and other buffers.
MySQL generally does not slow down if you have many idle connections, however special commands e.g "show processlist" or "kill", that enumerate every connection will be somewhat slower.
If idle connection stays idle for too long (idle time exceeds wait_timeout parameter), it is dropped by the server. If this is the case in your possible scenario, you might want to increase wait_timeout (its default value is 8 hours)