I'm using a MySQL server on AWS RDS. I would like to inspect the queries made by the app in order to optimize them. My problem is that almost every query is longer than 1024 chars (which is the max size, as stated here).
So I cannot identify a query by its first 1024 chars, as that's only the SELECT part; the interesting parts (WHERE, ORDER BY, and so on) are truncated. Since the app uses an ORM, I cannot change the queries to shorten them.
I already tried increasing the parameter performance_schema_max_digest_length in the parameter group to 4096, but that had no effect (no change can be seen in the variables directly on the server).
What can I do?
That is a static parameter, so rebooting your instance may resolve the issue. Please see the following explanation from the AWS docs:
When you change a static parameter and save the DB parameter group, the parameter change takes effect after you manually reboot the DB instance. You can reboot a DB instance using the RDS console or by explicitly calling the RebootDbInstance API action (without failover, if the DB instance is in a Multi-AZ deployment). The requirement to reboot the associated DB instance after a static parameter change helps mitigate the risk of a parameter misconfiguration affecting an API call, such as calling ModifyDBInstance to change DB instance class or scale storage.
Source: Working with Parameter Groups
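After the reboot, it should be possible to verify on the server that the new value is actually active; a minimal check, assuming you can connect with any MySQL client:

    -- Both digest-length parameters should now reflect the value from the parameter group
    SHOW GLOBAL VARIABLES LIKE '%digest_length%';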
Now, around three years later, I discovered a page in the documentation about that topic which answers my question:
Accessing more SQL text in the Performance Insights dashboard
So for me it now works as follows:
the parameters max_digest_length and performance_schema_max_digest_length are high enough (8192 does it in my case)
in Performance Insights, the "Top SQL" tab shows only the first 500 chars of the normalized query. Clicking the plus on the left expands the row and shows a query with parameters. Click the dot in the first column to see the full query in the area below.
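Outside the dashboard, the longer digests can also be checked directly in performance_schema. A minimal sketch, assuming statement-digest instrumentation is enabled (the default):

    -- Longest normalized statements captured so far; with the parameters above,
    -- DIGEST_TEXT may now be up to 8192 characters long
    SELECT SCHEMA_NAME,
           CHAR_LENGTH(DIGEST_TEXT) AS digest_len,
           DIGEST_TEXT
    FROM performance_schema.events_statements_summary_by_digest
    ORDER BY digest_len DESC
    LIMIT 10;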
I have an AWS RDS MySQL cluster. I'm trying to auto scale on mass write operations, but I'm unable to do so; when I run read queries, it scales properly. On writes I get a "Too many connections" error. Can anyone tell me what I'm doing wrong? Thanks in advance.
[Edit: 1]
Screenshot of AWS RDS Cluster Config
I've set the connection limit to 2 because I was testing.
When I send multiple read requests to AWS RDS, I can see new instances being launched in my RDS instances section:
I've also set the scale-in cooldown to 0 so that it launches a new instance immediately. When I read from the database using the reader endpoint, auto scaling works properly, but when I try to insert data using the writer endpoint, auto scaling does not kick in.
Your question is short on specifics, so I will list some possible ways to figure this out and possibly solve it.
RDS scaling takes time, so you cannot expect that your DB will increase in capacity instantly when a sudden spike of traffic exceeds its current capacity.
The maximum number of connections to a MySQL instance is set by max_connections in your parameter group. How many connections are happening, and what is the max_connections value? This value affects memory usage, so review any changes carefully. Note: increasing this value does not always help if there is a bug in your client code that erroneously creates too many connections. If the number of peak connections exceeds the max_connections value, sometimes you just need to scale up to a larger instance. The details determine the correct solution.
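A quick way to see where you stand on connections; a sketch, assuming you can run these against the writer instance:

    -- Configured limit (on RDS the default is derived from DBInstanceClassMemory)
    SHOW VARIABLES LIKE 'max_connections';
    -- Current and peak connection counts since the last restart
    SHOW STATUS LIKE 'Threads_connected';
    SHOW STATUS LIKE 'Max_used_connections';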
Use MySQL's Global Status History and look into what happens and when. This is useful for detecting locking or memory issues.
I have a database running on Amazon's RDS platform and it does not seem to be using the full amount of memory available.
The instance type is db.m4.xlarge, which should give me 16 GiB of memory, but when I look at the monitoring page it shows I am reaching the threshold with a current value of 2460 MB.
When I look at the DB parameter group it shows that innodb_buffer_pool_size should be 3/4 of DBInstanceClassMemory; however, when I check the actual value (by logging into the DB and running SHOW GLOBAL VARIABLES) it is set to 12465471488 (I assume this is bytes?).
Does anyone know why this is and what options I should set to make the RDS instance take full advantage of the memory that is available?
The number shown in the console is free memory -- not used memory. It's arguably counter-intuitive, but that's what's being shown here. Note that the small bar graph adjacent to the number is mostly full, not mostly empty.
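As a rough sanity check of the number in the question (the interpretation of DBInstanceClassMemory below is an assumption based on how RDS documents it):

    -- Buffer pool size reported by the server, converted to GiB
    SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gib;
    -- 12465471488 bytes is roughly 11.6 GiB, i.e. about 3/4 of DBInstanceClassMemory,
    -- which RDS reports as somewhat less than the nominal 16 GiB of a db.m4.xlarge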
QUESTION OUTLINE
Our AWS RDS instance starts slowing down after about 7-14 days, by a quite large factor (~400% load times for a specific set of queries). RDS monitoring shows no signs of resource shortage. (see below the question update for detailed problem description)
Question Update
So after more than one month of investigating and some developer support by AWS, I am not exactly closer to a solution.
Here are a couple of steps which I checked off the list, more or less without any further hint of the problem:
Index / Fragmentation (all tables have correct indexes/keys and have no fragmentation)
MySQL Stats Update (manually updating stats source)
Thread Concurrency (changing innodb_thread_concurrency to various different parameters)
Query Cache Hit Ratio doesn't show problems
EXPLAIN to see if any SELECTs are actually slow or not using indexes/keys (see the example after this list)
SLOW QUERY LOG (returns no results because, as explained in the paragraph below, it's a number of prepared SELECTs)
RDS and EC2 are within one VPC
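To illustrate the EXPLAIN check mentioned above: the table and column names below are made up for the example, not taken from the actual schema.

    -- A full scan would show up as type = ALL and key = NULL in the output
    EXPLAIN
    SELECT c.id, c.name
    FROM child_item c
    WHERE c.parent_id = 42
    ORDER BY c.updated_at DESC;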
For context, the PlayFramework in use (2.3.8) comes with BoneCP and we are using eBeans to select our data. So basically I am running through a nested object and all of its child objects, which produces a couple of hundred prepared SELECTs for the API call in question. This should basically be fine for the hardware in use; neither CPU nor RAM are used extensively by these operations.
I also included NewRelic for more insights on this issue and did some JVM profiling. Obviously, most of the time is consumed by NETTY/eBeans?
Is anyone able to make sense of this?
ORIGINAL QUESTION: Problem Outline
Our AWS RDS instance starts slowing down after about 7-14 days, by a quite large factor (~400% load times for a specific set of queries). RDS monitoring shows no signs of resource shortage.
Infrastructure
We run a PlayFramework backend for a mobile app on AWS EC2 instances, connected to AWS RDS MySQL instances: one PROD environment and one DEV environment. Usually the PROD EC2 instance points to the PROD RDS instance and the DEV EC2 points to the DEV RDS (hi from captain obvious!); however, sometimes we also let the DEV EC2 point to the PROD DB for testing purposes. The PlayFramework in use works with BoneCP.
Detailed Problem Description
In a quite essential sync process, our app makes a certain API call many times a day per user. I discussed the background of this functionality in this SO question, where, thanks to comments, I could narrow the problem down to being a MySQL issue of some kind.
In short, the API call loads a set of data, at most about 1 MB of JSON, which currently takes about 18 s. When things are running perfectly fine, it takes about 4 s to load.
Curiously enough, what "solved" the problem last time was upgrading the RDS instance to another instance type (from db.m3.large to db.m4.large, which is just a very marginal step). Now, after about 2-3 weeks, the RDS instance is once again performing as slowly as before. Rebooting the RDS instance showed no effect, and re-launching the EC2 instance shows no effect either.
I also checked whether the indices of the affected MySQL tables are set properly, which is the case. The API call itself is not eager-loading any BLOB fields or similar; I double-checked this. The CPU usage of the RDS instance is below 1% most of the time; when I stress tested it with 100 simultaneous API calls, it went to ~5%, so this is not the bottleneck. Memory is fine too, so I guess the RDS instance doesn't start swapping, which could slow down the whole process.
As hard evidence: a (smaller) public API call currently takes 2.30 s to load on the DEV environment, while on the PROD environment it takes 4.86 s. This is interesting because the DEV environment has a much smaller instance type for both EC2 and RDS, so basically the turtle wins the race here. (If you are interested in this API call, I am happy to share it with you via PM, but I don't really want to post links to API calls, even if they are basically public.)
Conclusion
In conclusion, it feels (I wittingly say 'feels') like the DB gets clogged after x days of usage / after a certain number of API calls. I am not sure if this is an RDS-specific issue; once I 'largely' reset the DB instance by changing the instance type, things run fast and smooth again. But re-creating my DB instance from a snapshot every 2 weeks is not an option, especially if I don't understand why this is happening.
Do you have any ideas what further steps I could take to investigate this matter?
(Too long for just a comment) I know you have checked a lot of things, but I would like to look at them with a different set of eyes...
Please provide
SHOW VARIABLES; (probably need post.it or something, due to size)
SHOW GLOBAL STATUS;
how much RAM? Sounds like 7.5G
The query. -- Unclear what query/queries you are using
SHOW CREATE TABLE for the table(s) in the query -- indexes, datatypes, etc
(Some of the above may help with the "clogging over time" question; a sketch of how to gather the status data over time follows below.)
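One way to capture GLOBAL STATUS in a comparable form is to snapshot it into a scratch table before and during a slowdown, then diff the counters. This is only a sketch: the table name is made up, and on MySQL 5.6 the source table is information_schema.GLOBAL_STATUS rather than performance_schema.global_status.

    -- Create an empty scratch table with the right columns (hypothetical name)
    CREATE TABLE IF NOT EXISTS status_snapshot AS
        SELECT NOW() AS captured_at, VARIABLE_NAME, VARIABLE_VALUE
        FROM performance_schema.global_status
        WHERE 1 = 0;
    -- Take one snapshot now...
    INSERT INTO status_snapshot
        SELECT NOW(), VARIABLE_NAME, VARIABLE_VALUE
        FROM performance_schema.global_status;
    -- ...and another one later (e.g. while the API call is slow), then compare
    -- the counter deltas per VARIABLE_NAME between the two captured_at values.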
Meanwhile, here are some guesses/questions/etc...
Some other customer sharing the hardware is busy.
It could be a network problem?
Shrink long_query_time to 1 so you can catch slow queries (see the sketch after this list).
When are backups done on your instance?
4s-18s to load a megabyte -- what percentage of that is SQL statements?
Do you "batch" the inserts? Is it a single transaction? Are lengthy queries going on at the same time?
What, if any, MySQL tunables did you change from the AWS defaults?
6GB buffer_pool on a 7.5GB partition? That sounds dangerously tight. Can you see if there was any swapping?
Any PARTITIONing involved? (Of course the CREATE will answer that.)
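A sketch of the slow-query-log check suggested above. On RDS, slow_query_log and long_query_time are changed via the parameter group rather than with SET GLOBAL, and if log_output is set to TABLE the captured statements end up in mysql.slow_log:

    -- Confirm the slow log is on and the threshold is where you expect it
    SHOW VARIABLES LIKE 'slow_query_log';
    SHOW VARIABLES LIKE 'long_query_time';
    -- With log_output = TABLE, inspect the slowest captured statements
    SELECT start_time, query_time, rows_examined, sql_text
    FROM mysql.slow_log
    ORDER BY query_time DESC
    LIMIT 20;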
There is one very important bit of information missing from your description: the total allocated storage for the database. I/O capacity for RDS is around 3 IOPS per GB of allocated storage, so for a 100 GB allocation you should get around 300 IOPS. That allocated space also includes logs.
Since you don't really know what's going on, the first step should be to turn on detailed monitoring, which will give you more idea of what is happening on the instance.
Until you have additional stats gathered during a slowdown, you can try increasing the allocated space, which will increase the IOPS available.
Also, check the events for the db - are logs getting purged on a regular basis? That might indicate that there's not enough space.
Finally, you can try going with PIOPS (provisioned IOPS) if you have an idea of what the application needs, though at this point it sounds like that would be a guess.
Maybe your burst credit balance is (slowly) being depleted? Eventually you end up with baseline performance, which may appear "too slow".
This would also explain why the upgrade to another instance type helped, as you then start with a full burst balance again.
I would suggest increasing the size of the volume, even if you don't need the extra space, as the baseline performance grows linearly with volume size.
RDS uses the same set of parameters for all instance sizes, and they don't seem to be that far from the defaults (max_packet_size being too small, etc.).
Has anyone looked at them and reviewed them for how sensible they are, especially on relatively bigger instances?
The default buffer size for MyISAM is too small in the RDS parameter group; I changed it and saw a significant improvement in read/write performance. However, be aware that not all MySQL parameters are honored by RDS, only the ones shown in the parameter group.
You can always change the parameter group to fit your application.
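Presumably the MyISAM key buffer (key_buffer_size) is the parameter meant above; a quick, hedged way to check its current size and how well it is being used:

    -- Current key buffer size in bytes
    SHOW GLOBAL VARIABLES LIKE 'key_buffer_size';
    -- Key_reads (from disk) vs. Key_read_requests (from the buffer); a high ratio
    -- of reads to read requests suggests the buffer is too small
    SHOW GLOBAL STATUS LIKE 'Key_read%';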