How to debug why Rails app in Puma server goes down - mysql

I have a webapp with a Rails API running in a Puma server with a MySQL database.
I´m running a batch process that is writing a lot of logs to a log file. In a specific moment after some time running the Puma server goes down. In the logs everythink looks fine until aparently with no reason it gets down. But I don´t see any error. Neither in the puma.err. Not sure if it´s related to the server or maybe the database (something related with the pool?). I´m blind. I don´t know where to look at to debug the problem.
UPDATE: I think I have narrowed down the problem. I have realised that the Puma server gets down always when trying to load the same element. I tried to make a test in development and it works there. So, I think I know what is killing it. I´m not sure but strongly suspect the element I´m trying to load in my batch process run a lot of queries within a single transaction in the MySQL database. I think somehow the transaction is not able to process all the queries and for some reason the Puma server gets down. I don´t know if this makes sense, but this is my main suspect. I have read something about transaction sizes and log files How do I determine maximum transaction size in MySQL?
My new question is: if this is really happening, can I see an error related to this in any log file (Puma or Mysql)?
UPDATE 2: I attach enviroment information:
DEVELOPMENT: MacOS, Processor: 2,4GHz, RAM: 8GB
PRODUCTION: Ubuntu14, Processor: 2,5GHz, RAM: 3,7GB (AWS instance)
Regarding the Puma configuration, I´m not very skilled but I´m starting the server with a config file with just two lines in both development and configuration environments:
stdout_redirect 'path_to_log_file.log', 'path_to_error_file.log', true
bind 'unix:///tmp/puma.sock'
When starting up I neither see and workers configuration, so I assume I have only one worker and the default number of threads 0 to 16.

Related

MySQL 5.7 default install running much slower than 5.6

I have a java / spring / hibernate app running with connections to a MySQL db.
We recently upgraded from 5.6 to 5.7 (on a Windows server) and the app has gone from taking 3 hours to 3 days to complete. It essentially uses hibernate connections to retrieve read only data from the db before processing it and dumping the result elsewhere.
However as a first step, partly to check it was the upgraded version that is causing the problem, I installed 5.7.21 on my dev machine. I then noticed that even doing a db restore took several hours rather than what used to take about 10 minutes on 5.6. This has lead me to believe it may be more of a config issue than 'drivers' being out of date (I did think my first step was going to be upgrading app dependencies). I didn't install the server but I installed my dev machine with a default 'developer' install. Both the server and the dev machine are 64 bit Windows.
I've had a scoot around for obvious gotchas and not found anything yet. I just wondered if anyone could point me in the right direction before I start seriously thrashing about ? I have a good basic understanding of out of the box MySQL but I haven't done much config so even pointers to likely suspects in my.ini and best ways to investigate would be helpful.
When upgrading, pay attention to innodb_buffer_pool_size variable value.
It controls how much memory MySQL uses to make I/O operations faster. Usually, this is the one that makes it fly or that makes it crawl like a snail. There's a lot to be written about this particular variable, there's a plethora of excellent blog posts about it so I'll avoid explaining it into detail.
To see the current value, type in MySQL terminal:
SHOW VARIABLES LIKE '%innodb_buffer_pool%';
Change the value in config file and restart MySQL.
For the value, don't go overboard, don't exceed your entire RAM. You want it as high as possible, especially for servers with a lot of data.

Lost connection to MySQL server during query on random simple queries

FINAL UPDATE: We fixed this problem by finding a way to accomplish our goals without forking. But forking was the cause of the problem.
---Original Post---
I'm running a ruby on rails stack, our mysql server is separate, but housed at the same site as our app servers. (we've tried swapping it out for a different mysql server with double the specs, but no improvement was seen.
during business hours we get a handful of these from no particular query.
ActiveRecord::StatementInvalid: Mysql2::Error: Lost connection to MySQL server during query
most of the queries that fail are really simple, and there seems to be no pattern between one query and another. This all started when I upgraded from Rails 4.1 to 4.2.
I'm at a loss as to what to try. Our database server is less than 5% CPU throughout the day. I do get bug reports from users who have random interactions fail due to this, so it's not queries that have been running for hours or anything like that, of course when they retry the exact same thing it works.
Our servers are configured by cloud66.
So in short: our mysql server is going away for some reason, but it's not because of lack of resources, it's also a brand new server as we migrated from another server when this problem started.
this also happens to me on localhost while developing features sometimes, so I don't believe it's a load issue.
We're running the following:
ruby 2.2.5
rails 4.2.6
mysql2 0.4.8
UPDATE: per the first answer below I increased our max_connections variable to 500 last night, and confirmed the increase via
show global variables like 'max_connections';
I'm still getting dropped connection, the first one today was dropped only a few minutes ago....
ActiveRecord::StatementInvalid: Mysql2::Error: Lost connection to MySQL server during query
I ran select * from information_schema.processlist; and I got 36 rows back. Does this mean my app servers were running 36 connections at that moment? or can a process be multiple connections?
UPDATE: I just set net_read_timeout = 60 (it was 30 before) I'll see if that helps
UPDATE: It didn't help, I'm still looking for a solution...
Heres my Database.yml with credentials removed.
production:
adapter: mysql2
encoding: utf8
host: localhost
database:
username:
password:
port: 3306
reconnect: true
The connection to MySQL can be disrupted by a number of means, but I would recommend revisiting Mario Carrion's answer since it's a very wise answer.
It seems likely that connection is disrupted because it's being shared with the other processes, causing communication protocol errors...
...this could easily happen if the connection pool is process bound, which I believe it is, in ActiveRecord, meaning that the same connection could be "checked-out" a number of times simultaneously in different processes.
The solution is that database connections must be established only AFTER the fork statement in the application server.
I'm not sure which server you're using, but if you're using a warmup feature - don't.
If you're running any database calls before the first network request - don't.
Either of these actions could potentially initialize the connection pool before forking occurs, causing the MySQL connection pool to be shared between processes while the locking system isn't.
I'm not saying this is the only possible reason for the issue, as stated by #sloth-jr, there are other options... but most of them seem less likely according to your description.
Sidenote:
I ran select * from information_schema.processlist; and I got 36 rows back. Does this mean my app servers were running 36 connections at that moment? or can a process be multiple connections?
Each process could hold a number of connections. In your case, you might have up to 500X36 connections. (see edit)
In general, the number of connections in the pool can often be the same as the number of threads in each process (it shouldn't be less than the number of thread, or contention will slow you down). Sometimes it's good to add a few more depending on your application.
EDIT:
I apologize for ignoring the fact that the process count was referencing the MySQL data and not the application data.
The process count you showed is the MySQL server data, which seems to use a thread per connection IO scheme. The "Process" data actually counts active connections and not actual processes or threads (although it should translate to the number of threads as well).
This means that out of possible 500 connections per application processes (i.e., if you're using 8 processes for your application, that would be 8X500=4,000 allowed connections) your application only opened 36 connections so far.
This indicates a timeout error. It's usually a general resource or connection error.
I would check your MySQL config for max connections on MySQL console:
show global variables like 'max_connections';
And ensure the number of pooled connections used by Rails database.yml is less than that:
pool: 10
Note that database.yml reflects number of connections that will be pooled by a single Rails process. If you have multiple processes or other servers like Sidekiq, you'll need to add them together.
Increase max_connections if necessary in your MySQL server config (my.cnf), assuming your kit can handle it.
[mysqld]
max_connections = 100
Note other things might be blocking too, e.g. open files, but looking at connections is a good starting point.
You can also monitor active queries:
select * from information_schema.processlist;
as well as monitoring the MySQL slow log.
One issue may be a long-running update command. If you have a slow-running command that affects a lot of records (e.g. a whole table), it might be blocking even the simplest queries. This means you could see random queries timeout, but if you check MySQL status, the real cause is another long-running query.
Things you did not mention but you should take a look:
Are you using unicorn? If so, are your reconnecting and disconnecting in your after_fork and before_fork?
Is reconnect: true set in your database.yml configuration?
Well,at first glance this sounds like your webserver is keeping the mysql sessions open and sometimes a user runs into a timeout. Try disabling the keep mysql sessions alive.
It will be a hog but you only use 5% ...
other tipps:
Enable the mysql "Slow Query Log" and take a look.
write a short script which pulls and logs the mysql processlist every minute and cross check the log with timeouts
look at the pool size in your db connection or set one!
http://guides.rubyonrails.org/configuring.html#database-pooling
should be equal to the max-connections mysql likes to have!
Good luck!
Find out if your database is limited in terms of multiple connections. Because normally a SQL database is supposed to have more than one active connection.
(Contact your network provider)
Would you mind posting some of your queries? The MySQL documentation has this to say about it:
https://dev.mysql.com/doc/refman/5.7/en/error-lost-connection.html
TL;DR:
Network problems; are any of your boxes renewing leases
periodically, or experiencing other network connection errors
(netstat / ss), firewall timeouts, etc. Not sure how managed your
hosts are by cloud66....
Query timed out. This can happen if you've got commands backed up
behind blocking statements (eg, alters/locking backups on MyISAM
tables). How simple are your queries? No cartesian products in-play?
EXPLAIN query could help.
Exceeding MAX_PACKET_SIZE. Are you storing pictures, video content, etc.?
There are lots of possibilities here, and without more information, will be difficult to pinpoint this.
Would look first at mysql_error.log, then work your way from the DB server back to your application.
UPDATE: this didn't work.
Heres the solution, special thanks to #Myst for pointing out that forking can cause issues, I had no idea to look at this particular code. As the errors seemed random because we forked in this fashion in several places.
It turns out that when I was forking processes, rails was using the same database connection for all forked processes, This created a situation where when one of the processes (the parent process?) terminated the database connection, the remaining process would have its connection interrupted.
The solution was to change this code:
def recalculate_completion
Process.fork do
if self.course
self.course.user_groups.includes(user:[:events]).each do |ug|
ug.recalculate_completion
end
end
end
end
into this code:
def recalculate_completion
ActiveRecord::Base.remove_connection
Process.fork do
ActiveRecord::Base.establish_connection
if self.course
self.course.user_groups.includes(user:[:events]).each do |ug|
ug.recalculate_completion
end
end
ActiveRecord::Base.remove_connection
end
ActiveRecord::Base.establish_connection
end
Making this change stopped the errors from our servers and everything appears to be working well now. If anyone has any more info as to why this worked I would be happy to hear it, as I would like to have a deeper understanding of this.
Edit: it turns out this didn't work either.... we still got dropped connections but not as often.
If you have query cache enabled, please reset it and it should work.
RESET QUERY CACHE;

Rails Resque workers fail with mysql "Too many connections"

We recently switched our (ruby) job queueing system from DelayedJob to Resque.
While our latency has gone down, and we've eliminated the database bottleneck, we're now seeing a new problem; one or more of our workers seems to leave a database connection open when it exits. When we look at the process list, there are hundreds of connections in a 'sleep' state. They eventually time out after 90 seconds. We've been throttling back our workers to keep from running out of client connections, but what we really need to find
out is which one (or more) of our jobs is not being polite when it disconnects using the mysql2 ruby client.
Any ideas how we could (1) find the culprits or (2) instrument our code so we can make sure that we are actually disconnecting before the job terminates?
Rails 4.0.x
Resque 1.25.2
mysql2 gem 0.3.16
Make sure your Resque process is disconnecting from the database before forking and re-establishing the connection afterwards. Create an initializer file config/initializers/resque.rb
which contains:
Resque.before_fork do
defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
end
Resque.after_fork do
defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
end
I gave up on Resque and moved to Sidekiq. I'm much happier now.

Tomcat Freezes just one application

I have an Ubuntu LAMJ server running Tomcat6.
One of my JSP applications freezes every couple of days and I am having trouble figuring out why. I have to reboot tomcat to get that one app going again, as it won't cone back on its own. I am getting nothing in my own log4j logs for that app, and can't see anything in Catalina.out either.
This applications shares a javax.sql.DataSource resource with another, via a context element in the server.xml file. I don't think this is the cause of the problem, but I may as well mention it.
Could anyone point me in the right direction to find the cause of this intermittent issue?
thanks in advance,
Christy
Get a Thread dump of the running server
There are two options
Use VisualVM
in your %java_home%/bin folder there will be a file called jvisualvm. Run this and connect to your tomcat server. Click the Threads tab and then "Thread Dump"
Manually from the Command Line
open up a command line and find the process id for your tomcat
ps -ef | grep java
Once you identify the process ID for the running tomcat instance,
kill -3 <pid>
replace the process Id here. This will send your thread dump to the stdout for your tomcat. Most likely catalina.out file.
edit - As per Mark's comments below:
It is normal to take 3 thread dumps ~10s apart and compare them. It
makes it much easier to see which threads are 'stuck' and which ones
are moving
Once you have the thread dump you can analyse it for stuck threads. It may not be stuck threads as the problem, but at least you can see what is going on inside the server to analyze the problem further.

PHP App (ExpressionEngine) slow to load on IIS 7, MySQL, FastCGI

I'm working on debugging a slowness issue I've got with running ExpressionEngine (a PHP application) on IIS 7.
I don't think this is actually an issue with ExpressionEngine, but rather an issue with my PHP/MySQL setup.
The problem shows itself thusly:
Go to webite address
IE "spins" for 10-15 seconds, waiting to load. During this time:
processor usage is minimal on the server, and PHP's process is inactive
I see a connection for the site user in MySQL, but the thread is in "sleeping" more.
There is plenty of free memory on the server
pretty much, the server is doing nothing
After 10-15 seconds, I see the connection MySQL run some really quick queries (very fast) and the site loads in under a second.
This is a fairly complex site, but it doesn't make any sense that the whole system is just sitting there waiting for 10 seconds - not processing anything. I'm using FastCGI on IIS7, which seems to be working fine, and to me this seems like some sort of a timeout issue where FastCGI, PHP, or maybe even MySQL is waiting for something, not getting it, and after the timeout occurs, continuing to process.
Anyone had similar experiences?
Thanks!
P.S. - I should also add that the database (MySQL) and PHP are running on the same server.
mysql might be trying to do a reverse DNS on the connection from the web server. if you don't need to filter mysql connections by DNS name, then add skip-name-resolve to your mysql configuration file
Twitter plugins cause lots of issues in CMS systems. usually there is a function that checks if twitter is there/alive. When twitter gets busy, this slows down the system (thats why it can appear as intermittent) Find the twitter plugin, and the routine that checks if twitter is there. Comment out this code and return true (ie. don't ask twitter if it is there, just assume it is)
I have no idea why, but the solution to this was to install PHP 5.3. I had PHP 5.2.10 running, and I guess 5.3 added some extensive optimizations for Windows. Or fixed some other weird problem - who knows.
Actually, after some further digging, it appears the problem was with the Twitter plugin. It waits 25 seconds to come back from Twitter sometime with an error. Maybe this is related to the DNS?