I have an Ubuntu LAMJ (Linux, Apache, MySQL, Java) server running Tomcat 6.
One of my JSP applications freezes every couple of days and I am having trouble figuring out why. I have to restart Tomcat to get that one app going again, as it won't come back on its own. I am getting nothing in my own log4j logs for that app, and can't see anything in catalina.out either.
This application shares a javax.sql.DataSource resource with another, via a Context element in the server.xml file. I don't think this is the cause of the problem, but I may as well mention it.
Could anyone point me in the right direction to find the cause of this intermittent issue?
thanks in advance,
Christy
Get a thread dump of the running server
There are two options:
Use VisualVM
In your $JAVA_HOME/bin folder there is an executable called jvisualvm. Run it and connect to your Tomcat server, then click the Threads tab and press "Thread Dump".
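For example (assuming $JAVA_HOME points at a JDK, since VisualVM ships with the JDK, not the JRE):
$JAVA_HOME/bin/jvisualvm &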
Manually from the Command Line
Open a terminal and find the process ID for your Tomcat:
ps -ef | grep java
Once you have identified the process ID of the running Tomcat instance, send it signal 3 (SIGQUIT):
kill -3 <pid>
Replace <pid> with that process ID. This writes the thread dump to Tomcat's stdout, which most likely means the catalina.out file.
edit - As per Mark's comments below:
It is normal to take 3 thread dumps ~10s apart and compare them. It
makes it much easier to see which threads are 'stuck' and which ones
are moving
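A minimal sketch of that, assuming a single Tomcat process on the box (adjust the pgrep pattern if your setup differs):
# take three thread dumps ~10 seconds apart; output lands in Tomcat's stdout
PID=$(pgrep -f org.apache.catalina.startup.Bootstrap)
for i in 1 2 3; do
    kill -3 "$PID"
    sleep 10
done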
Once you have the thread dumps you can analyse them for stuck threads. Stuck threads may not turn out to be the problem, but at least you can see what is going on inside the server and take the analysis further.
I'm unsure if my MySQL is actually being monitored by monit. See screenshot.
As you can see, under Processes the mysqld process is not being monitored (it failed a few times first), but under Files there are mysql_bin and mysql_rc, both of which are OK.
Is it safe to remove the mysql monitoring symbolic link, or do I need it anyway?
thanks
The short answer is no. Some more info:
The fact that both entries in the Files section are OK does not tell you anything about a working or running application. monit's Files section simply checks file state information, such as, but not limited to, last modification time, size, and file hashes.
So basically the two OKs in the Files section just tell you that the MySQL files are present and have not been changed.
The entry in the Processes section is what you want to focus on. It checks for the presence of a running mysqld process on your system. You need to verify that the configuration of that entry inside monitrc (or included files) is looking for the right parameters: it should watch a process via its pidfile and could additionally check whether a connection can be established.
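As a rough sketch, such an entry usually looks something like this (paths and init scripts are assumptions for a Debian/Ubuntu-style install; check your own system):
check process mysqld with pidfile /var/run/mysqld/mysqld.pid
  start program = "/etc/init.d/mysql start"
  stop program = "/etc/init.d/mysql stop"
  if failed unixsocket /var/run/mysqld/mysqld.sock protocol mysql then restart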
See monit docs on MySQL and/or paste your monit config for any in-depth help regarding monit.
PS: @Shadow: the question (or rather, the resulting problem) has nothing to do with the DB itself; it is a question about monitoring.
I have a small instance running in GCE. I had some trouble with MongoDB, so after a few tries I decided to reset the instance. But... it didn't seem to come back online, so I stopped the instance and restarted it.
It is a Bitnami MEAN stack which starts Apache and the rest at boot.
But... I can't reach the instance! No SCP, no SSH, no web service running. When I try to connect via SSH (in GCE) it times out; I can't make a connection on port 22. The console says 'The instance is booting up and sshd is not running yet', which is possible of course... but I can't reach the instance in any manner, not even after an hour's wait :) I'm not sure what to do if I can't connect to it at all :(
There is some activity in the console... some CPU usage, mostly 0%, some incoming traffic but no outgoing...
I hope someone can give me a hint here!
Update 1
After the helpful tip from Serhii... I found this in the logs...
Booting from Hard Disk 0...
[ 0.872447] piix4_smbus 0000:00:01.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
/dev/sda1 contains a file system with errors, check forced.
/dev/sda1: Inodes that were part of a corrupted orphan linked list found.
/dev/sda1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
fsck exited with status code 4
The root filesystem on /dev/sda1 requires a manual fsck
Update 2...
So, I need to fsck the drive...
I created a snapshot, made a new disk from that snapshot, and added the new disk as an extra disk to another instance. Now that instance won't boot, failing with the same error... removing the extra disk fixed it again. So attaching the disk makes the boot fail even though it isn't the boot disk?
First, have a look at Compute Engine -> VM instances -> NAME_OF_YOUR_VM -> Logs -> Serial port 1 (console) and try to find errors and warnings that could be connected to a lack of free space or to SSH. It would be helpful if you updated your post with this information. If your instance has run out of free space, follow these instructions.
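If you prefer the command line, the same log can be pulled with gcloud (instance name and zone are placeholders):
gcloud compute instances get-serial-port-output NAME_OF_YOUR_VM --zone=YOUR_ZONE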
You can try to connect to your VM via the serial console by following this guide, but keep in mind that:
The interactive serial console does not support IP-based access
restrictions such as IP whitelists. If you enable the interactive
serial console on an instance, clients can attempt to connect to that
instance from any IP address.
More details can be found in the documentation.
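Assuming the gcloud CLI is set up, enabling and opening the interactive serial console looks roughly like this:
# enable the interactive serial console on the instance
gcloud compute instances add-metadata NAME_OF_YOUR_VM --zone=YOUR_ZONE --metadata serial-port-enable=TRUE
# connect to it
gcloud compute connect-to-serial-port NAME_OF_YOUR_VM --zone=YOUR_ZONE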
Have a look at the Troubleshooting SSH guide and Known issues for SSH in browser. In addition, Google provides a troubleshooting script for Compute Engine to identify issues with SSH login/accessibility of your Linux-based instance.
If you still have a problem, try to use your disk on a new instance.
EDIT: It looks like your test VM is trying to boot from the disk that you created from the snapshot. Try following this guide.
If you still have a problem, you can try to recreate the boot disk from a snapshot to resize it.
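Once the snapshot disk is attached to a rescue VM as a secondary, non-boot disk, the manual fsck itself is roughly the following (the device name is an assumption; check lsblk first):
lsblk                    # identify the attached disk, e.g. /dev/sdb1
sudo fsck -y /dev/sdb1   # -y answers yes to every repair prompt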
I have a webapp with a Rails API running in a Puma server with a MySQL database.
I'm running a batch process that writes a lot of logs to a log file. At a specific moment, after running for some time, the Puma server goes down. In the logs everything looks fine until, apparently for no reason, it goes down. But I don't see any error, not even in puma.err. I'm not sure whether it's related to the server or maybe the database (something related to the pool?). I'm blind; I don't know where to look to debug the problem.
UPDATE: I think I have narrowed down the problem. I have realised that the Puma server always goes down when trying to load the same element. I tried the same test in development and it works there. So, I think I know what is killing it. I'm not sure, but I strongly suspect that the element I'm trying to load in my batch process runs a lot of queries within a single transaction in the MySQL database. I think somehow the transaction is not able to process all the queries and for some reason the Puma server goes down. I don't know if this makes sense, but this is my main suspect. I have read something about transaction sizes and log files: How do I determine maximum transaction size in MySQL?
My new question is: if this is really happening, can I see an error related to it in any log file (Puma or MySQL)?
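For reference, these are the places I know to check so far (paths assume a stock Ubuntu install plus my Puma config below):
sudo tail -n 200 /var/log/mysql/error.log   # MySQL server error log
dmesg | grep -i -E 'oom|killed process'     # kernel OOM killer messages
tail -n 200 path_to_error_file.log          # Puma stderr from stdout_redirect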
UPDATE 2: I attach environment information:
DEVELOPMENT: macOS, Processor: 2.4 GHz, RAM: 8 GB
PRODUCTION: Ubuntu 14, Processor: 2.5 GHz, RAM: 3.7 GB (AWS instance)
Regarding the Puma configuration, I'm not very skilled, but I'm starting the server with a config file containing just two lines in both the development and production environments:
stdout_redirect 'path_to_log_file.log', 'path_to_error_file.log', true
bind 'unix:///tmp/puma.sock'
When starting up I don't see any workers configuration either, so I assume I have only one worker and the default thread count of 0 to 16.
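For comparison, setting those values explicitly in the config file would look like this (the numbers are only illustrative, not a recommendation):
# config/puma.rb
workers 2        # number of forked worker processes
threads 0, 16    # min, max threads per worker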
My server got stuck last night because of a database connection error.
I found it was caused by too many database connections. After researching on Google and Stack Overflow, I didn't find any useful information. While I am investigating all the plugins one by one to see if any of them has a bug or something that did this, I would like to ask for your help.
First of all, when I logged in to MySQL I could see a lot of Sleep connections with NULL info. I used the command line to kill all the sleeping connections, but more requests filled up all the connections right away.
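For reference, this is roughly how I killed them (connection options omitted; the 300-second threshold is arbitrary, and on AWS RDS you may need CALL mysql.rds_kill(<id>) instead of KILL):
# list sleeping connections
mysql -e "SELECT id, user, host, time FROM information_schema.processlist WHERE command = 'Sleep' ORDER BY time DESC;"
# kill the ones idle for more than 5 minutes
mysql -N -e "SELECT id FROM information_schema.processlist WHERE command = 'Sleep' AND time > 300;" | while read id; do mysql -e "KILL $id;"; done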
The weird thing is, the Apache server is not actually getting a high volume of requests. I am using AWS RDS as my database server, so Apache and MySQL are not on the same machine. The RDS server doesn't have public access, so I am sure all connections come only from my Apache server. The CPU usage on the Apache server is not high. Also, I searched Apache's access_log and there were not a lot of requests at that time, and I cannot find anything wrong with those requests; in particular, none of them is performing an injection attack. I thought it was possible something was triggered in the code, so I searched for 'SLEEP' in all my code but could only find some occurrences in the W3 Total Cache plugin, and those code blocks are not easily reached. I turned off XML-RPC at the Apache level, so it shouldn't be an XML-RPC attack.
I know there are lots of possibilities since I am using about twenty plugins on my site, but it is really weird that I cannot find any requests that could have caused this at the Apache level. Is it possible for requests to hit the server without being recorded in access_log?
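To see where the connections are actually coming from, I have been running something like this against the RDS endpoint:
mysql -e "SELECT user, host, COUNT(*) AS conns FROM information_schema.processlist GROUP BY user, host ORDER BY conns DESC;"
mysql -e "SHOW VARIABLES LIKE 'max_connections'; SHOW STATUS LIKE 'Threads_connected';"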
I am pretty new to configuring Apache and MySQL on my own and still learning these features. Thanks in advance for helping me!
I've desperately tried to figure out what's happened here, but haven't seen this particular problem anywhere. I've 'inherited' (as in, not built any of it myself) management of a database server (remote, in a data warehouse, accessed by SSH) where some PHP daemons run on a Linux server acting as data crawlers, inserting and processing a relatively steady stream of information into MySQL.
A couple of days ago, the server crashed and came back up again. I logged in and restarted the MySQL server and the crawlers, thinking no more of it. A day and a half later, the MySQL server stopped working, and I couldn't diagnose it since I couldn't log into it, nor did it respond to "/etc/init.d/mysql stop" or varieties thereof. According to the log file, it kept throwing errors very regularly (once every four minutes and 16 seconds) saying that it had too many open file handles. When I shut down the crawlers, however, I could log in again, but MySQL kept throwing the errors. I checked lsof and it showed a lot of open sockets with a "can't identify protocol" error.
mysqld 28843 mysql 1990u sock 0,4 2856488 can't identify protocol
mysqld 28843 mysql 1989u sock 0,4 2857220 can't identify protocol
^Thousands of these rows
I thought it was something the crawlers had done, so I restarted MySQL and the failed sockets disappeared. But I was surprised to see that MySQL kept opening new ones, even when the crawlers weren't running. It did this very regularly, about two new failed sockets a minute, regardless of whether the crawlers were active or not. I increased the maximum number of file handles allowed for MySQL to buy some time, but I'm obviously looking for a diagnosis and a permanent solution.
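To keep an eye on the leak rate I count the broken sockets like this (a sketch; it assumes a single mysqld process):
sudo lsof -p "$(pgrep -x mysqld)" | grep -c "can't identify protocol"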
All descriptions of such errors (socket leaks) that I've found on forums seem to be about your own software leaking, i.e. not closing its sockets. But here it seems to be MySQL itself doing it, and there has been no change to any of the code since it worked fine, just a server crash and restart.
Any ideas?