Every couple of days we have been getting a small number of MySql timeout errors that correspond with a large spike in CPU and DB connections on our MySQL RDS instance. These are queries that are typically very fast (<5ms) that suddenly timeout.
At this point, database operations are very slow for a minute or so (likely because new connections are being allocated). The number of new connections often doubles and seem to correspond to the entire Connection Pool being recycled.
The timeouts do not seem to correspond with heavy database load. The CPU is often under 7% when this happens spiking up to around 12%.
Once these connections are created, the old connections seem to stay around for several hours.
We have some theories:
An occasional network hiccup between EC2 and RDS
A connection pool recycle (is there such a thing?)
Resource contention on the server that backs up all queries (no deadlocks present)
Any help on debugging this would be very much appreciated.
System Details:
Windows 2012 EC2 instances
.NET 4.5
MySql Connector 6.8.3
Entity Framework 6.0.2
MySql.Data.Entities 6.8.3
MySql 5.6.12 (Hosted in Amazon's RDS)
I wanted to put this as a comment not an answer but "...must have 50 reputation to comment..."
Are you maxing out on connections? show variables like 'max_connections'; show process_list; (as root user)
How's your disk I/O: iostat -x 5 via command line and pay special attention to queue sizes & service/wait times. If its an issue you can purchase AWS reserved IOPS for better reliability & performance.
You can profile it - i like Jet Profiler, simple & low load.
Related
I have an RDS instance hosting a mySQL database. Instance size is db.t2.micro
I also have an ExpressJS backend connecting to the mySQL RDS instance via a connection pool:
Additionally i have a mobile app, the client, feeding off the ExpressJS API.
The issue i'm facing is, either via the mobile app or via Postman, there are times where i get a 'Too many connections' error and therefore several requests fail:
On the RDS instance. On current activity i sometimes get 65 connections, showing it's reaching the limit. What i need clarity on is:
When 200 mobile app instances connect to the API, to the RDS instance, does it register as 200 connections or 1 connection from ExpressJS?
Is it normal to be reaching the RDS instance 65 connection limit?
Is this just a matter of me using db.t2.micro instance size which is not recommended for prod? Will upgrading the instance size resolve this issue?
Is there something i'm doing wrong with my requests?
Thank you and your feedback is appreciated.
If your app creates a connection pool of 100, that's the number of database connections it will try to open. It must be lower than your MySQL connection limit.
Typically connection pools open all the connections for the pool, so they are ready when a client calls the http API. The connections might normally be running no SQL queries, if there are not many clients using the API at a given moment. The database connections are nevertheless connected.
Sort of like when you ssh to a remote linux server but you just sit there at a shell prompt for a while before running any command. You're still connected.
You asked if a db.t2.micro instance was not recommended for production. Yes, I would agree with that. It's tempting to use the smallest instance possible to save money, but a db.t2.micro is too small for anything but light testing, in my opinion.
In fact, I would not use any t2 instance for production, regardless of size. The t2 type uses "burstable" performance. This means it can provide only brief periods of good performance. Once the instance depletes its performance credits, they recharge slowly, and while they recharge, the performance of that instance is very low. This is okay for testing, but not for production, if you expect to provide consistent performance at any time.
I’m working on a relatively small application, serving about 1,500 users and running on a Mysql database that is about 300 megs. The entire system runs on AWS with a single dedicated EC2 node running the Grails application on Tomcat 8 and a single dedicated Mysql RDS instance. The system has been running live in production for about three years with no database issues. The two largest tables contain about 40k records. The application is built using Grails and Java 1.7.
Yesterday our application began throwing the following exception, with the underlying error message of:
"Got error 28 from storage engine"
The logs available from the RDS admin web console are empty.
Googling has not revealed any promising leads that have helped us resolve the issue, other than most messages point out that the disk is out of space. Given that most search results refer to disk space. Being software developers rather than DBA's with significant Mysql expertise, we boosted the storage space of the Mysql RDS instance. Unfortunately, today our application is still sporadically throwing the same exception. Having created our Mysql DRS instance with 15 gigs of space -- which is several orders of magnitude of additional space than our application makes use of -- we are at a lost as to what is the root cause of this issue. Our guess is that there is possibly some out of the box Mysql limitation that we are hitting up against but have no idea what it may be or how to solve it. Indeed, the whole reason we host onRDS was to avoid issues of this type.
Doing some Googling, this seems like a somewhat common Mysql error but that does not have any concrete trail for us to follow. Most suggestions talk about checking the filesystem or "inode" space. Given that this is a hosted Mysql RDS instance on AWS, I am unsure if or how to check such things. Looking at the CloudWatch for the RDS instance, I can see that the CPU is idle and that the instance is dramatically under the 15 gig storage limit.
Does anyone have any suggestions for us to investigate?
Given that we are new to RDS, can you please point us to any documentation or -- even better -- suggest what settings we can tweak in the RDS console -- to help prevent this error from occuring? Ideally, we moved to RDS thinking that if this was a mysql sizing or scaling issue that moving to RDS would solve the problem. As a last resort, this morning we then deleted about 20k rows of unessential data. Unfortunately, the issue persists and we continue to experience the issue.
A few questions:
Are there any RDS settings we can adjust to avoid this issue?
Can this be solved by moving to a larger RDS instance, perhaps with more memory?
Would we experience this issue if we moved to Aurora?
Well from your comment this is definitely a low on storage issue. Because 13 Gb is very less storage. you can check the free available storage in the dashboard. Check this screenshot below in the "Storage" metric under monitoring if it goes ahead of the red line you will start getting Error 28. You will have to increase the storage of your RDS instance or free up some space. I will suggest increase the storage to avoid this issue in future.
I have a rails activerecord project that has been scaled out to serve approximately 60-100k requests per minute. We use AWS and it takes about 5 xlarge c4 ec2 instances to serve this many requests. We have optimized the system to serve 99.99% of those requests off of a redis cache rendering our mysql DB barely used.
This is great and all but we keep running into connection limits for mysql. Amazon RDS apparently limits the number of connections we can have and it seems silly for upping our RDS instance size just so that we can have a larger number of sleeping connections. We literally profiled the RDS server and it maybe gets 10-50 queries a day depending on how many times we update the system.
Is there any way to keep the activerecord connection pool from reserving connections?
We tried simply lowering the connection pool and it helped in lowering connections, but then we started getting:
(ActiveRecord::ConnectionTimeoutError) "could not obtain a database connection within 5 seconds
What I would like to achieve is for the rails project to stop trying to pre-allocate connections and only open then up when they are necessary. I'm not well-versed enough in the rails framework and activerecord to understand how the system is reserving connections and why we are getting ConnectionTimeoutErrors even though the application isn't even making any DB calls.
When the system runs out of memory, ubuntu 12.04 kills the mysql process:
Out of memory: Kill process 17074 (mysqld) score 146 or sacrifice child
So the process ends up killed.
This happens at peaks of server load and mainly because of apache getting wild and eating the remaining available memory. Possible approaches could be:
Change somewhere somehow the priority of mysql, so it's not killed (probably a bad fix as something else will be killed)
Monitor the status of mysql and restart automatically whenever it's killed (the one I'm thinking about, but don't know how to do it).
How do you see it?
Abrupt termination of a database server is a very serious crash. You need to avoid this in a production system, because it may not restart cleanly.
The database server is a shared resource, and should almost never terminate in an unplanned fashion in production. The only thing that should cause unplanned termination is a catastrophic hardware or power failure. Most properly configured production data base servers have an unplanned termination once every ten years or less frequently. Seriously.
What to do?
Fix your apache configuration. Limit the number of worker threads and processes it can use, so it can't run wild. Learn how to do this. It's vital. See here: http://httpd.apache.org/docs/current/mod/mpm_common.html#maxrequestworkers
Fix the defects in your web app that are causing your apache to run wild.
If you can, move your mysqld server to a different server machine from apache, so the two don't contend for the same hardware resources.
Configure your mysqld to limit the number of connections it will accept from apache worker threads or other clients. Your web app probably handles the situation where a worker thread needs to wait for a connection. See here. http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_max_connections
Are you on an EC2 micro instance? You need to do some serious tuning. See here: http://ubuntuforums.org/showthread.php?t=1979049
You can check mysql status every minute (with cron) and restart if it is crashed:
* * * * * service mysql status | grep running || service mysql restart
Recently we changed app server of our rails website from mongrel to passenger [with REE and Rails 2.3.8]. The production setup has 6 machines pointing to a single mysql server and a memcache server. Before each machine had 5 mongrel instance. Now we have 45 passenger instance as the RAM in each machine is 16GB with 2, 4 core cpu. Once we deployed this passenger set up in production. the Website became so slow. and all the request starting to queue up. And eventually we had to roll back.
Now we suspect that the cause should be the increased load to the Mysql server. As before there where only 30 mysql connection and now we have 275 connection. The mysql server has the similar set up as our website machine. bUt all the configs were left to the defaul limit. The buffer_pool_size is only 8 mb though we have 16GB ram. and number of Concurrent threads is 8.
Will this increased simultaneous connection to mysql would have caused mysql to respond slowly than when we had only 30 connections? If so, how can we make mysql perform better with 275 simultaneous connection in place.
Any advice greatly appreciated.
UPDATE:
More information on the mysql server:
RAM : 16GB CPU: two processors each having 4 cores
Tables are innoDB. with only default innodb config values.
Thanks
An idle MySQL connection uses up a stack and a network buffer on the server. That is worth about 200 KB of memory and zero CPU.
In a database using InnoDB only, you should edit /etc/sysctl.conf to include vm.swappiness = 0 to delay swapping out processes as long as possible. You should then increase innodb_buffer_pool_size to about 80% of the systems memory assuming a dedicated database server machine. Make sure the box does not swap, that is, VSIZE should not exceed system RAM.
innodb_thread_concurrency can be set to 0 (unlimited) or 32 to 64, if you are a bit paranoid, assuming MySQL 5.5. The limit is lower in 5.1, and around 4-8 in MySQL 5.0. It is not recommended to use such outdated versions of MySQL in a machine with 8 or 16 cores, there are huge improvements wrt to concurrency in MySQL 5.5 with InnoDB 1.1.
The variable thread_concurrency has no meaning inside a current Linux. It is used to call pthread_setconcurrency() in Linux, which does nothing. It used to have a function in older Solaris/SunOS.
Without further information, the cause for your performance problems cannot be determined with any security, but the above general advice may help. More general advice geared at my limited experience with Ruby can be found in http://mysqldump.azundris.com/archives/72-Rubyisms.html That article is the summary of a consulting job I once did for an early version of a very popular Facebook application.
UPDATE:
According to http://pastebin.com/pT3r6A9q , you are running 5.0.45-community-log, which is awfully old and does not perform well under concurrent load. Use a current 5.5 build, it should perform way better than what you have there.
Also, fix the innodb_buffer_pool_size. You are going nowhere with only 8M of pool here.
While you are at it, innodb_file_per_table should be ON.
Do not switch on innodb_flush_log_at_trx_commit = 2 without understanding what that means, but it may help you temporarily, depending on your persistence requirements. It is not a permanent solution to your problems in any way, though.
If you have any substantial kind of writes going on, you need to review the innodb_log_file_size and innodb_log_buffer_size as well.
If that installation is earning money, you dearly need professional help. I am no longer doing this as a profession, but I can recommend people. Contact me outside of Stack Overflow if you want.
UPDATE:
According to your processlist, you have very many queries in state Sending data. MySQL is in this state when a query is being executed, that is, the main interior Join Loop/Query Execution loop is busy. SHOW ENGINE INNODB STATUS\G will show you something like
...
--------------
ROW OPERATIONS
--------------
3 queries inside InnoDB, 0 queries in queue
...
If that number is larger than say 4-8 (inside InnoDB), 5.0.x is going to have trouble. 5.5.x will perform a lot better here.
Regarding the my.cnf: See my previous comments on your InnoDB. See also my comments on thread_concurrency (without innodb_ prefix):
# On Linux, this does exactly nothing.
thread_concurrency = 8
You are missing all innodb configuration at all. Assuming that you ARE using innodb tables, you are not performing well, no matter what you do.
As far as I know, it's unlikely that merely maintaining/opening the connections would be the problem. Are you seeing this issue even when the site is idle?
I'd try http://www.quest.com/spotlight-on-mysql/ or similar to see if it's really your database that's the bottleneck here.
In the past, I've seen basic networking craziness lead to behaviour similar to what you describe - someone had set up the new machines with an incorrect submask.
Have you looked at any of the machine statistics on the database server? Memory/CPU/disk IO stats? Is the database server struggling?