I'm seeing a few of these errors during high load times:
mysql_connect() [function.mysql-connect]: [2002] Resource temporarily unavailable (trying to connect via unix:///var/lib/mysql/mysql.sock)
From what I can tell the MySQL server isn't hitting its max_connections limit, but there's something else stopping it from serving the query. What other limits would MySQL be hitting?
I'm running RHEL 6.2 64-bit with MySQL 5.5.21.
Let's assume your system is Unix-based (as given in your problem statement). If so, here's the set of issues you may be running into:
You've run out of memory available to MySQL.
This is the most likely problem you're facing. Each connection in MySQL's connection pool requires memory to function, and if this resource is exhausted, no further connections can be made. The memory footprints and maximum packet sizes of various operations can, of course, be tuned in your equivalent of my.cnf if you discover this to be an issue.
Here's an additional thread that can help there, but you may also consider using simpler profiling tools like top to get a good ballpark estimate of what's going on.
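If you want a rough MySQL-side ballpark before reaching for top, a sketch like the following inspects the standard per-connection buffers and the connection high-water mark. These are all stock server variables; sensible values depend entirely on your workload.
-- Per-connection buffers that multiply up as the connection count grows
SHOW GLOBAL VARIABLES WHERE Variable_name IN
  ('max_connections', 'sort_buffer_size', 'join_buffer_size',
   'read_buffer_size', 'read_rnd_buffer_size', 'tmp_table_size',
   'max_allowed_packet', 'thread_stack');
-- Highest number of connections actually in use since startup
SHOW GLOBAL STATUS LIKE 'Max_used_connections';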
You've run out of file descriptors available to your MySQL user account.
Another common issue: if you're trying to service requests that require file I/O beyond the default limit of 1,024 open descriptors, you will run into cases where the operation simply fails. This is because most systems specify a soft and a hard limit on the number of open file descriptors each user can have at one time, and walking over this threshold can cause problems.
This will usually have a series of glaringly obvious signs expressed in your log files. Check /var/log/messages and comparable locations (for example, /var/log/mysql) to see if you can find anything interesting.
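MySQL also reports its own view of the descriptor budget. A quick sketch using standard variables and status counters (nothing here is specific to your setup):
-- The descriptor ceiling MySQL negotiated at startup, and current usage
SHOW GLOBAL VARIABLES LIKE 'open_files_limit';
SHOW GLOBAL VARIABLES LIKE 'table_open_cache';
SHOW GLOBAL STATUS LIKE 'Open_files';
SHOW GLOBAL STATUS LIKE 'Opened_tables';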
You've run into a livelock or deadlock scenario where your thread is unsatisfiable.
As a corollary to memory and file-descriptor exhaustion, threads can time out if you've overstepped the computational load your system is capable of handling. It won't throw this error message, but it is something to watch out for in the future.
Your system is running out of PIDs available to fork.
Another common scenario: fork only has so many PIDs available for its use at any given time. If your system is simply overforked, it will cease to be able to service requests.
The easiest check for this is to see if any other services can connect through to the machine. For example, trying to SSH into the box and discovering that you cannot is a big clue.
An upstream proxy or connection manager has run out of resources and ceased servicing requests.
If you have any service layer between your client and MySQL, it bears inspecting to see if it has crashed, hung, or otherwise become unstable. The advice above applies.
Your port mapper has exhausted itself after 65,536 connections.
Unlikely, but again, a possible exhaustion case. Checking the trivial service connection as above is, ehm, also the best port of call here.
In short: this is a resource exhaustion scenario, inclusive of the server simply being "down". You're going to have to profile your system further to see what you're blocking on. All the error message gives us in this case is the fact the resource is unavailable to the client -- we'd need to see more information about the server to determine a more adequate remedy.
I still haven't found which limits it was hitting, but I did manage to work around the problem. There was a problem with our session table (in vBulletin), which uses the MEMORY engine. The indexes for this table were HASH, so when vBulletin purged the table once an hour it would lock it just long enough to hold up other queries and push MySQL to the limit of its resources.
Changing the indexes to BTREE allowed MySQL to delete the rows from the session table much more quickly and avoid whatever limits were being reached previously. The errors only started when we upgraded our master DB server to MySQL 5.5, so I'm guessing MEMORY tables are handled differently in the latest release.
See http://www.mysqlperformanceblog.com/2008/02/01/performance-gotcha-of-mysql-memory-tables/ for information on the speed increase from using BTREE indexes over HASH for MEMORY tables.
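For anyone wanting to apply the same change, here is a minimal sketch of switching a MEMORY table's index from HASH to BTREE. The table, index, and column names are made up and will differ in a real vBulletin schema.
-- Rebuild the index as BTREE (MEMORY tables default to HASH)
ALTER TABLE session
  DROP INDEX idx_lastactivity,
  ADD INDEX idx_lastactivity (lastactivity) USING BTREE;
-- Confirm the Index_type column now reports BTREE
SHOW INDEX FROM session;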
Geez, this could be so many things. It could be that the socket buffer space is exhausted. It could be that MySQL is not accepting connections as fast as they are coming in and the backlog limit is being reached (though I'd expect that to give you a "Connection refused" error; I don't know for sure what you'll get for a Unix domain socket). It could be any of the things @MrGomez pointed out.
Since you are running Apache and MySQL on the same server and this is a problem under high load, it could well be that Apache is starving the system of some resource and you're just not seeing (noticing?) the dropped/failed incoming connections/requests in your logs.
Are you using connection pooling? If not, I'd start there.
I'd also look for errors in the Apache logs and syslog around the same time as the mysql_connect error and see what else turns up. I'd especially recommend getting MySQL moved over to its own separate dedicated server.
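To sanity-check the backlog point above from the MySQL side, these stock variables and counters are a reasonable first look; the OS-level socket backlog (net.core.somaxconn) still has to be checked separately, and the right thresholds depend on your setup.
-- The listen backlog MySQL requests from the OS
SHOW GLOBAL VARIABLES LIKE 'back_log';
-- Counters that tick up when clients fail to connect or get dropped
SHOW GLOBAL STATUS LIKE 'Aborted_connects';
SHOW GLOBAL STATUS LIKE 'Aborted_clients';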
In my case, I was working with JSON data types with PDO (PHP Driver).
I was using fetch to retrieve one item but forgot to add LIMIT 1 to the query. Adding it solved the problem.
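For illustration, a hypothetical version of such a query; the table and column names are made up. LIMIT 1 caps the result set at the single row a lone fetch() will ever consume.
SELECT payload
FROM documents
WHERE doc_id = 42
LIMIT 1;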
I have a database that thousands of users need to connect to (via ODBC) for very brief periods; it's a subscription licensing database for a win32 desktop app. They connect, get their approval to run, and disconnect.
max_connections is set to 1000, but I am not seeing the re-use I would expect server-side. I.e. the server currently has about 800 processes/connections sleeping (and another 200 connected to real data in other databases on the same server), yet a new attempt by a client app was rejected with 'too many connections'.
What am I missing?
I have increased max_connections to 1500 for now, but if that just means another 500 sleeping connections, it's not a long-term solution. I'm pretty sure the clients are disconnecting properly, but I am adding some diagnostics to the win32 app just in case.
MariaDB 10.3.11
with MySQL ODBC 5.3 ANSI Driver
It's normal to see a lot of sessions "Sleeping". That means the client is connected, but not executing a query at this moment. The client is likely doing other tasks, before or after running an SQL query. Just like if you are logged into a server with ssh, most of the time you're just sitting at the shell prompt not running any program.
It's up to you to design your clients to wait to connect until they need data, then disconnect promptly after getting their data. It's pretty common in apps that they connect to the database at startup, and remain connected. It's also pretty common in some frameworks to make multiple connections at startup, and treat them as a pool that can be used by multiple threads of the client app. It's your app, so you should configure this as needed.
Another thing to try is to enable the thread pool in the MariaDB server. See https://mariadb.com/kb/en/thread-pool-in-mariadb/
This is different from a client-side connection pool. The thread pool allows many thousands of clients to think they're connected, without allocating a full-blown thread in the MariaDB server for every single connection. When a client has something to query, at that time it is given one of the threads. When that client is done, it may continue to maintain a connection, but the thread in the MariaDB server is reallocated to a different client's request.
This is good for "bursty" workloads by many clients, and it sounds like your case might be a good candidate.
We have a 2GB Digital Ocean server, and it is a dedicated MySQL server for two other PHP servers. We are using Percona MySQL Server 5.6 on this server. We configured MySQL replication, and this configuration is working fine.
Our issue is that sometimes our site monitoring tools report that some of the URLs hosted on this server are down (maybe once every week or two). When I check, I can see that the MySQL master server load is far too high (maybe 35-40), so the MySQL server is not responding. When that happens I usually do a MySQL service restart; the restart brings the server load back to normal and the sites start working again.
This is the back-end MySQL database server for 20-25 PHP applications (WordPress, Drupal, and some custom applications).
Here are my questions,
Why does the server load automatically go down after a spike happens?
Is there any way to tell which database is causing the issues, so that I can identify the application as well?
How can I identify the root cause of these issues?
Depending upon your working dataset, a 2GB server providing access for 20-25 PHP applications (WordPress, Drupal, and some custom applications) could itself be the issue.
For example, if you have a 1.4GB buffer pool (assuming all tables are InnoDB) and 10GB of data, then your various applications could end up competing for resources such as I/O, buffer pool pages, the Adaptive Hash Index, and the query cache. They could also, assuming caching is used, be invalidating their caches within a similar timeframe, thus sending expensive queries to the database.
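A quick sketch of how to check whether the data outstrips the buffer pool; both queries use standard variables and information_schema, though the result is only an approximation of the hot working set.
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
-- Total InnoDB data + index size, to compare against the pool
SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024) AS innodb_mb
FROM information_schema.tables
WHERE engine = 'InnoDB';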
Whilst a load that high is something that you would normally want to avoid, the load average is not something that you should concern yourself with when viewed in isolation:
The use of the uninterruptible state has since grown in the Linux kernel, and nowadays includes uninterruptible lock primitives. If the load average is a measure of demand in terms of running and waiting threads (and not strictly threads wanting hardware resources), then they are still working the way we want them to.
http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html
If the issue is happening once per week then it is starting to sound like a batch process, or cache expiration issue - too much happening at once for the resources available.
The best thing to do is to monitor and look for the cause. Since you are already using Percona Server, PMM (Percona Monitoring and Management) should give you the insight needed to find the cause; it also works with Oracle MySQL, MariaDB, Aurora, etc. You can try a demo to see the insights you can gain: https://pmmdemo.percona.com. The software is open source and free to use.
You can look in QAN to find the most expensive queries, whilst looking at Prometheus data to give an insight into the host itself. There are some recommendations to get the most from PMM, depending upon your flavour of MySQL.
FINAL UPDATE: We fixed this problem by finding a way to accomplish our goals without forking. But forking was the cause of the problem.
---Original Post---
I'm running a Ruby on Rails stack; our MySQL server is separate, but housed at the same site as our app servers. (We've tried swapping it out for a different MySQL server with double the specs, but no improvement was seen.)
During business hours we get a handful of these, from no particular query:
ActiveRecord::StatementInvalid: Mysql2::Error: Lost connection to MySQL server during query
Most of the queries that fail are really simple, and there seems to be no pattern between one query and another. This all started when I upgraded from Rails 4.1 to 4.2.
I'm at a loss as to what to try. Our database server is at less than 5% CPU throughout the day. I do get bug reports from users who have random interactions fail due to this, so it's not queries that have been running for hours or anything like that; of course, when they retry the exact same thing, it works.
Our servers are configured by cloud66.
So in short: our MySQL server is going away for some reason, but it's not for lack of resources; it's also a brand-new server, as we migrated from another server when this problem started.
This also happens to me on localhost while developing features sometimes, so I don't believe it's a load issue.
We're running the following:
ruby 2.2.5
rails 4.2.6
mysql2 0.4.8
UPDATE: per the first answer below I increased our max_connections variable to 500 last night, and confirmed the increase via
show global variables like 'max_connections';
I'm still getting dropped connections; the first one today was dropped only a few minutes ago:
ActiveRecord::StatementInvalid: Mysql2::Error: Lost connection to MySQL server during query
I ran select * from information_schema.processlist; and I got 36 rows back. Does this mean my app servers were running 36 connections at that moment? or can a process be multiple connections?
UPDATE: I just set net_read_timeout = 60 (it was 30 before); I'll see if that helps.
UPDATE: It didn't help, I'm still looking for a solution...
Here's my database.yml with credentials removed:
production:
  adapter: mysql2
  encoding: utf8
  host: localhost
  database:
  username:
  password:
  port: 3306
  reconnect: true
The connection to MySQL can be disrupted by a number of means, but I would recommend revisiting Mario Carrion's answer since it's a very wise answer.
It seems likely that the connection is disrupted because it's being shared with other processes, causing communication protocol errors...
...this could easily happen if the connection pool is process bound, which I believe it is, in ActiveRecord, meaning that the same connection could be "checked-out" a number of times simultaneously in different processes.
The solution is that database connections must be established only AFTER the fork statement in the application server.
I'm not sure which server you're using, but if you're using a warmup feature - don't.
If you're running any database calls before the first network request - don't.
Either of these actions could potentially initialize the connection pool before forking occurs, causing the MySQL connection pool to be shared between processes while the locking system isn't.
I'm not saying this is the only possible reason for the issue; as stated by @sloth-jr, there are other options... but most of them seem less likely according to your description.
Sidenote:
I ran select * from information_schema.processlist; and I got 36 rows back. Does this mean my app servers were running 36 connections at that moment? or can a process be multiple connections?
Each process could hold a number of connections. In your case, you might have up to 500X36 connections. (see edit)
In general, the number of connections in the pool can often be the same as the number of threads in each process (it shouldn't be less than the number of threads, or contention will slow you down). Sometimes it's good to add a few more, depending on your application.
EDIT:
I apologize for ignoring the fact that the process count was referencing the MySQL data and not the application data.
The process count you showed is the MySQL server data, which seems to use a thread per connection IO scheme. The "Process" data actually counts active connections and not actual processes or threads (although it should translate to the number of threads as well).
This means that out of a possible 500 connections per application process (i.e., if you're using 8 processes for your application, that would be 8 x 500 = 4,000 allowed connections), your application has only opened 36 connections so far.
This indicates a timeout error. It's usually a general resource or connection error.
I would check your MySQL config for max connections on MySQL console:
show global variables like 'max_connections';
And ensure the number of pooled connections used by Rails database.yml is less than that:
pool: 10
Note that database.yml reflects number of connections that will be pooled by a single Rails process. If you have multiple processes or other servers like Sidekiq, you'll need to add them together.
Increase max_connections if necessary in your MySQL server config (my.cnf), assuming your kit can handle it.
[mysqld]
max_connections = 100
Note other things might be blocking too, e.g. open files, but looking at connections is a good starting point.
You can also monitor active queries:
select * from information_schema.processlist;
as well as monitoring the MySQL slow log.
One issue may be a long-running update command. If you have a slow-running command that affects a lot of records (e.g. a whole table), it might be blocking even the simplest queries. This means you could see random queries timeout, but if you check MySQL status, the real cause is another long-running query.
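A quick way to test this theory when the errors strike is to list everything that is not idle, longest-running first; this uses the standard information_schema.processlist view, so nothing here is specific to your setup.
-- Non-idle sessions, longest-running first; a long ALTER or runaway UPDATE
-- near the top is a likely culprit for "random" timeouts elsewhere
SELECT id, user, db, command, time, state, LEFT(info, 100) AS query
FROM information_schema.processlist
WHERE command <> 'Sleep'
ORDER BY time DESC;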
Things you did not mention but should take a look at:
Are you using Unicorn? If so, are you reconnecting and disconnecting in your after_fork and before_fork hooks?
Is reconnect: true set in your database.yml configuration?
Well, at first glance this sounds like your web server is keeping the MySQL sessions open and sometimes a user runs into a timeout. Try disabling keep-alive for the MySQL sessions.
It will be a hog, but you only use 5% ...
Other tips:
Enable the MySQL slow query log and take a look.
Write a short script which pulls and logs the MySQL processlist every minute, and cross-check the log against the timeouts (a sketch follows below).
Look at the pool size in your DB connection, or set one!
http://guides.rubyonrails.org/configuring.html#database-pooling
It should not exceed the max_connections MySQL is configured for!
Good luck!
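As a sketch of the processlist-logging tip above, done entirely inside MySQL with the event scheduler rather than an external script; the table and event names are made up, and a cron job would work just as well.
-- One-off: create an empty log table with the processlist's columns
CREATE TABLE IF NOT EXISTS processlist_log AS
SELECT NOW() AS logged_at, p.*
FROM information_schema.processlist p
WHERE 1 = 0;
SET GLOBAL event_scheduler = ON;
-- Snapshot the processlist once a minute
CREATE EVENT IF NOT EXISTS log_processlist
ON SCHEDULE EVERY 1 MINUTE
DO
  INSERT INTO processlist_log
  SELECT NOW(), p.*
  FROM information_schema.processlist p;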
Find out whether your database is limited in the number of concurrent connections it allows, because normally an SQL database is supposed to support more than one active connection.
(Contact your network provider.)
Would you mind posting some of your queries? The MySQL documentation has this to say about it:
https://dev.mysql.com/doc/refman/5.7/en/error-lost-connection.html
TL;DR:
Network problems: are any of your boxes renewing DHCP leases periodically, or experiencing other network connection errors (netstat / ss), firewall timeouts, etc.? Not sure how managed your hosts are by cloud66...
Query timed out: this can happen if you've got commands backed up behind blocking statements (e.g. ALTERs or locking backups on MyISAM tables). How simple are your queries? No Cartesian products in play? EXPLAIN on the query could help.
Exceeding max_allowed_packet: are you storing pictures, video content, etc.?
There are lots of possibilities here, and without more information it will be difficult to pinpoint this.
I would look first at mysql_error.log, then work your way from the DB server back to your application.
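If it helps, here is a minimal check of the server-side settings most often involved in this error; these are standard MySQL variables, and the right values depend on your workload.
-- Packet ceiling and the network/idle timeouts behind most
-- "Lost connection to MySQL server during query" reports
SHOW GLOBAL VARIABLES LIKE 'max_allowed_packet';
SHOW GLOBAL VARIABLES LIKE 'net_read_timeout';
SHOW GLOBAL VARIABLES LIKE 'net_write_timeout';
SHOW GLOBAL VARIABLES LIKE 'wait_timeout';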
UPDATE: this didn't work.
Here's the solution. Special thanks to @Myst for pointing out that forking can cause issues; I had no idea to look at this particular code. The errors seemed random because we fork in this fashion in several places.
It turns out that when I was forking processes, Rails was using the same database connection for all forked processes. This created a situation where, when one of the processes (the parent process?) terminated the database connection, the remaining processes would have their connection interrupted.
The solution was to change this code:
def recalculate_completion
  Process.fork do
    if self.course
      self.course.user_groups.includes(user: [:events]).each do |ug|
        ug.recalculate_completion
      end
    end
  end
end
into this code:
def recalculate_completion
  # Release the parent's connection so the child doesn't share it
  ActiveRecord::Base.remove_connection
  Process.fork do
    # The forked child opens its own, separate connection
    ActiveRecord::Base.establish_connection
    if self.course
      self.course.user_groups.includes(user: [:events]).each do |ug|
        ug.recalculate_completion
      end
    end
    ActiveRecord::Base.remove_connection
  end
  # The parent reconnects for its own subsequent work
  ActiveRecord::Base.establish_connection
end
Making this change stopped the errors from our servers and everything appears to be working well now. If anyone has any more info as to why this worked I would be happy to hear it, as I would like to have a deeper understanding of this.
Edit: it turns out this didn't work either.... we still got dropped connections but not as often.
If you have query cache enabled, please reset it and it should work.
RESET QUERY CACHE;
I'm working on a relatively small application, serving about 1,500 users and running on a MySQL database that is about 300 MB. The entire system runs on AWS, with a single dedicated EC2 node running the Grails application on Tomcat 8 and a single dedicated MySQL RDS instance. The system has been running live in production for about three years with no database issues. The two largest tables contain about 40k records. The application is built using Grails and Java 1.7.
Yesterday our application began throwing the following exception, with the underlying error message of:
"Got error 28 from storage engine"
The logs available from the RDS admin web console are empty.
Googling has not revealed any promising leads, other than that most messages point out the disk is out of space. Since most search results refer to disk space, and being software developers rather than DBAs with significant MySQL expertise, we boosted the storage space of the MySQL RDS instance. Unfortunately, today our application is still sporadically throwing the same exception. Having created our MySQL RDS instance with 15 GB of space, far more than our application actually uses, we are at a loss as to the root cause of this issue. Our guess is that we are hitting some out-of-the-box MySQL limitation, but we have no idea what it may be or how to solve it. Indeed, the whole reason we host on RDS was to avoid issues of this type.
Doing some Googling, this seems like a somewhat common MySQL error, but one without a concrete trail for us to follow. Most suggestions talk about checking the filesystem or inode space. Given that this is a hosted MySQL RDS instance on AWS, I am unsure if or how to check such things. Looking at CloudWatch for the RDS instance, I can see that the CPU is idle and that the instance is dramatically under the 15 GB storage limit.
Does anyone have any suggestions for us to investigate?
Given that we are new to RDS, can you please point us to any documentation or, even better, suggest what settings we can tweak in the RDS console to help prevent this error from occurring? We moved to RDS thinking that if this was a MySQL sizing or scaling issue, RDS would solve the problem. As a last resort, this morning we deleted about 20k rows of unessential data. Unfortunately, the issue persists and we continue to experience it.
A few questions:
Are there any RDS settings we can adjust to avoid this issue?
Can this be solved by moving to a larger RDS instance, perhaps with more memory?
Would we experience this issue if we moved to Aurora?
Well, from your comment this is definitely a low-storage issue, because 13 GB is very little storage. You can check the free available storage in the RDS dashboard: look at the "Storage" metric under Monitoring, and once the free storage drops past the red line you will start getting Error 28. You will have to increase the storage of your RDS instance or free up some space. I would suggest increasing the storage to avoid this issue in future.
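If you want to see where the allocated storage is going from inside MySQL, a sketch like this reports the per-schema footprint. Note it does not count binary logs, temporary files, or InnoDB undo space, which also eat into the RDS allocation, so the RDS FreeStorageSpace CloudWatch metric remains the authoritative number.
-- Rough on-disk footprint per schema (data + indexes)
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb
FROM information_schema.tables
GROUP BY table_schema
ORDER BY size_gb DESC;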
I have a Rails 4.x application running on server A and MySQL on server B.
Using ab to do a load test of my API calls, I notice that the MySQL server is showing CPU activity. So I go back to the code and check, but no SQL statements are triggered; to be sure, I also deactivated all before filters, but the MySQL server still shows CPU load.
I went to MySQL and run
show processlist;
but that also shows no active SQL statements
Why would there be load on the DB server?
A Rails application initializes connection pools to the configured database on app load and also loads basic schema data for each ActiveModel defined to populate runtime mappings from the DB to instances of that model.
These connections/queries will happen as soon as you have loaded the application and running traffic.
If this is not what is responsible for the activity on your database server, you will need to use other tools to see what is responsible. For example, NewRelic's system monitoring tools are great for snapshotting CPU/memory usage over time correlated to what processes were running. This will help you rule out MySQL itself using resources vs. other things running on the DB server.
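One low-tech way to see exactly what is reaching MySQL during the ab run is to switch on the general query log briefly. This is a sketch using standard server settings; it assumes you are free to toggle the log on this box, and you should turn it off again promptly since it records every statement.
SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = 'ON';
-- ... run the load test, then inspect what arrived:
SELECT event_time, user_host, command_type, LEFT(argument, 120) AS stmt
FROM mysql.general_log
ORDER BY event_time DESC
LIMIT 50;
SET GLOBAL general_log = 'OFF';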
According to this article, storage engines like InnoDB may have their own per-thread and/or global memory allocations, which is probably accounting for the CPU overhead. If this is a stock (non-tuned) MySQL install, you're probably just seeing baseline CPU activity. The article mentions a number of places to look that might indicate areas that can be tuned to reduce this footprint.