Detecting hanging processes in Perl/MySQL (FreeBSD) - mysql

I have a Perl script running on a FreeBSD/Apache system, which makes some simple queries to a MySQL database via DBI. The server is fairly active (150k pages a day) and every once in a while (as much as once a minute) something is causing a process to hang. I've suspected a file lock might be holding up a read, or maybe it's a SQL call, but I have not been able to figure out how to get information on the hanging process.
Per Practical mod_perl, it sounds like the way to identify the operation giving me the headache is a system trace, a Perl trace, or the interactive debugger. I gather the system trace is ktrace on FreeBSD, but when I attach to one of the hanging processes from top, the only output after the process is killed is:
50904 perl5.8.9 PSIG SIGTERM SIG_DFL
That isn't very helpful to me. Can anyone suggest a more meaningful approach on this? I am not terribly advanced in Unix admin, so your patience if I sound stupid is greatly appreciated.... :o)

If I understood correctly, your Perl process hangs while querying MySQL, which is itself still operational. The MySQL server has a built-in troubleshooting feature for exactly that: the slow query log, enabled by the log_slow_queries option. Putting the following lines in your my.cnf enables the trick:
[mysqld]
log_slow_queries = /var/log/mysql/mysql-slow.log
long_query_time = 10
After that, restart or reload the MySQL daemon. Let the server run for a while to collect the stats and analyse what's going on:
mysqldumpslow -s at /var/log/mysql/mysql-slow.log | less
On one server of mine, the top record (-s at orders by average query time, BTW) is:
Count: 286 Time=101.26s (28960s) Lock=14.74s (4214s) Rows=0.0 (0), iwatcher[iwatcher]#localhost
INSERT INTO `wp_posts` (`post_author`,`post_date`,`post_date_gmt`,`post_content`,`post_content_filtered`,`post_title`,`post_excerpt`,`post_status`,`post_type`,`comment_status`,`ping_status`,`post_password`,`post_name`,`to_ping`,`pinged`,`post_modified`,`post_modified_gmt`,`post_parent`,`menu_order`,`guid`) VALUES ('S','S','S','S','S','S','S','S','S','S','S','S','S','S','S','S','S','S','S','S')
FWIW, it is a WordPress with over 30K posts.
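If restarting the server is inconvenient, note that newer MySQL releases (5.1 and later, where the option is named slow_query_log) also let you toggle the slow log at runtime. A rough sketch, with the threshold and file path only as placeholders and requiring the SUPER privilege:
-- enable the slow query log on a running server (MySQL 5.1+)
SET GLOBAL slow_query_log = 1;
SET GLOBAL slow_query_log_file = '/var/log/mysql/mysql-slow.log';
SET GLOBAL long_query_time = 10;  -- seconds
-- quick sanity checks
SHOW GLOBAL VARIABLES LIKE 'slow_query%';
SHOW GLOBAL STATUS LIKE 'Slow_queries';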

Ktracing only gives you system calls, signals, I/O and namei processing, and it generates a lot of data very quickly, so it might not be ideal for fishing out trouble spots.
If you can see the standard output of your script, put some strategically placed print statements in your code around suspected trouble spots. Then running the program should show you where the hang occurs:
print "Before query X\n";
$dbh->do($statement);
print "After query X\n";
If you cannot see the standard output, either use e.g. the Sys::Syslog Perl module, or call FreeBSD's logger(1) program to write the debugging info to a logfile. It is probably easiest to encapsulate that in a debug() function and use that instead of print statements.
Edit: If you don't want a lot of logging on disk, write the logging info to a socket (Sys::Syslog supports that with setlogsock()), and write another script to read from that socket and dump the debug text to a terminal, prefixed with the time the data was received. Once the program hangs, you can see what it was doing.
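You can also watch from the MySQL side while the Perl process is hung: SHOW FULL PROCESSLIST shows what each connection is executing, how long it has been in its current state, and whether it is waiting on a lock. A minimal example, run from the mysql client as a user with the PROCESS privilege (the 30-second threshold is just an arbitrary cutoff):
-- list every connection and the statement it is currently running
SHOW FULL PROCESSLIST;
-- or, on MySQL 5.1+, filter for statements running longer than 30 seconds
SELECT id, user, host, db, time, state, info
FROM information_schema.processlist
WHERE command <> 'Sleep' AND time > 30
ORDER BY time DESC;
If the hung Perl process corresponds to a connection stuck in a state like "Locked" or "Waiting for table", that points at a lock rather than at your Perl code.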

Related

Kill Long Running Processes in MySQL

Scenario - you have hundreds of reports running on a slave machine. These reports are either scheduled by MySQL's event scheduler or are called via a Python/R or shell script. Apart from that, there are fifty-odd users connecting to the MySQL slave and running random queries. These people don't really know how to write good queries, and that's fair. They are not supposed to. So, every now and then (read: every day), you see some queries which are stuck because of read/write locks. How do you fix that?
What you do is that you don't kill whatever is being written. Instead, you kill all the read queries. Now, that is also tricky, because if you kill all the read queries indiscriminately, you will also kill SELECT ... INTO OUTFILE queries, which are actually write queries (they just don't write to MySQL, they write to disk).
Why killing is necessary (I'm only speaking for MySQL, do not take this out of context)
I have got two words for you - slave lag. We don't want that to happen, because if it does, all users, reports, and consumers suffer.
I have written the following to kill processes in MySQL based on three questions:
how long has the query been running?
who is running the query?
do you want to kill write/modify queries too?
What I have intentionally not done yet is maintain a history of the processes that have been killed. One should do that in order to analyse and find out who is running all the bad queries, but there are other ways to find that out.
I have created a stored procedure for this. I haven't spent much time on it, so please suggest whether or not this is a good way to do it.
GitHub Gist
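Since the gist itself isn't reproduced here, the following is only a rough sketch of the idea, not the actual procedure: generate KILL statements from information_schema.processlist, filtering by runtime and user, and skipping anything that modifies data (including SELECT ... INTO OUTFILE). The 600-second threshold and the excluded users are placeholders:
-- build KILL statements for read queries that have run longer than 600 seconds
SELECT CONCAT('KILL ', id, ';') AS kill_stmt
FROM information_schema.processlist
WHERE command = 'Query'
  AND time > 600
  AND user NOT IN ('system user', 'repl', 'root')  -- users never to kill
  AND info LIKE 'SELECT%'                          -- read queries only
  AND info NOT LIKE '%INTO OUTFILE%'               -- OUTFILE is really a write
  AND info NOT LIKE '%FOR UPDATE%';                -- leave locking reads alone
Review the generated statements before executing them, or wrap the logic in a procedure or scheduled event once you trust the filters.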
Switch to MariaDB. Versions 10.0 and 10.1 implement several limits and timeouts: https://mariadb.com/kb/en/library/query-limits-and-timeouts/
Then write an API between what the users write and actually hitting the database. In this layer, add the appropriate limitations.
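For example, MariaDB 10.1 adds max_statement_time (in seconds), which aborts any statement that exceeds the limit. A sketch of capping sessions at 30 seconds; the value and the table name some_big_table are only placeholders:
-- MariaDB 10.1+: abort any statement in this session after 30 seconds
SET SESSION max_statement_time = 30;
-- or enforce it server-wide (0 means no limit)
SET GLOBAL max_statement_time = 30;
-- or cap just one ad-hoc query
SET STATEMENT max_statement_time = 30 FOR
  SELECT COUNT(*) FROM some_big_table;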

How to do non-obtrusive number-crunching on mysql db?

Not sure how to state this question.
I have a very busy DB in production with close to 1 million hits daily.
Now I would like to do some research on the real-time data (edit: "real-time" can be a few minutes old).
What is the best way to do this without interrupting production?
Ideas:
In the Unix shell there is the nice concept. It lets me give a low priority to a specific process so it only uses CPU when the other processes are idle. I am basically looking for the same thing in a MySQL context.
Get a DB dump and do the research offline:
Doesn't that take down my site for the several minutes it takes to get the dump?
Is there a way to configure the dump command so it does the extraction in a nice way (see above)?
Do the SQL commands directly on the live DB:
Is there a way, again, to configure the commands so they are executed in a nice way?
Update: What are the arguments against Idea 2?
From the comments on StackOverflow and in-person discussions, here's an answer for whoever gets here with the same question:
In MySQL, there does not seem to be any nice-style control over the prioritization of processes (I hear there is in Oracle, for example).
Since any "number-crunching" query is at most treated like one more visitor to my website, it won't take down the site performance-wise, so it can safely be run in production (read-only, of course...).
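One related safeguard if you do run the research queries directly on the live server (Idea 3): MySQL 5.7.8+ has max_execution_time (milliseconds, SELECT statements only), so a runaway analysis query aborts itself instead of camping on the box. A sketch, where the 60-second cap and the visits/created_at names are just placeholders:
-- cap every SELECT in this research session at 60 seconds (MySQL 5.7.8+)
SET SESSION max_execution_time = 60000;  -- milliseconds
-- or cap a single query with an optimizer hint
SELECT /*+ MAX_EXECUTION_TIME(60000) */ COUNT(*)
FROM visits
WHERE created_at >= NOW() - INTERVAL 1 DAY;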

Handling doctrine 2 connections in long running background scripts

I'm running PHP command-line scripts as RabbitMQ consumers which need to connect to a MySQL database. Those scripts run as Symfony2 commands using the Doctrine2 ORM, meaning that opening and closing the database connection is handled behind the scenes.
The connection is normally closed automatically when the cli command exits - which is by definition not happening for a long time in a background consumer.
This is a problem when the consumer is idle (no incoming messages) for longer than the wait_timeout setting in the MySQL server configuration. If no message is consumed for longer than that period, the database server will close the connection and the next message will fail with a "MySQL server has gone away" exception.
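For reference, the timeout in question can be inspected, and raised for a single session, directly in SQL; the 28800 below is just MySQL's usual default of eight hours, not a recommendation:
-- what the server currently allows an idle connection before dropping it
SHOW VARIABLES LIKE 'wait_timeout';
-- raise it for this connection only (seconds)
SET SESSION wait_timeout = 28800;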
I've thought about 2 solutions for the problem:
Open the connection before each message and close the connection manually after handling the message.
Implementing a ping message which runs a dummy SQL query like SELECT 1 FROM table every n minutes, triggered by a cronjob.
The problem with the first approach is: if the traffic on that queue is high, there might be a significant overhead for the consumer in opening/closing connections. The second approach just sounds like an ugly hack to deal with the issue, but at least I can use a single connection during high-load times.
Are there any better solutions for handling doctrine connections in background scripts?
Here is another solution: try to avoid long-running Symfony 2 workers, as they will always cause problems due to their long execution time. The kernel isn't made for that.
The solution here is to build a proxy in front of the real Symfony command, so every message triggers a fresh Symfony kernel. Sounds like a good solution to me.
http://blog.vandenbrand.org/2015/01/09/symfony2-and-rabbitmq-lessons-learned/
My approach is a little bit different. My workers only process one message, then die. I have supervisor configured to create a new worker every time. So, a worker will:
Ask for a new message.
If there are no messages, sleep for 20 seconds before exiting. Otherwise, supervisor will think there's something wrong and stop re-creating the worker.
If there is a message, process it.
Maybe, if processing a message is super fast, sleep as well, for the same reason as in step 2.
After processing the message, just finish.
This has worked very well using AWS SQS.
Comments are welcome.
Running PHP scripts for too long is a big problem. For me, the best solution is to restart the script from time to time. You can see how to do this in this topic: How to restart PHP script every 1 hour?
You should also run multiple instances of your consumer. Add a counter to each one and terminate it after some number of runs. Then you need a tool to ensure a consistent number of worker processes, something like this: http://kamisama.me/2012/10/12/background-jobs-with-php-and-resque-part-4-managing-worker/

What are the limitations of running a SQL script from a file in Powershell?

I have a rather complex SQL script that does several things:
Drops an existing database
Reloads the database from a backup
Sets permissions
A few other miscellaneous db-specific tasks
I've saved this as a .sql file for mobility and ease of use, but I'd like to incorporate this into a Powershell script to make it even simpler to run (and to encapsulate a few other tasks that need to be done around the same process). Here are the bits of code I have in Powershell for the script:
Add-PSSnapin SqlServerCmdletSnapin100
Invoke-Sqlcmd -InputFile "D:\Scripts\loaddatabase31_Loadtest.sql" -ServerInstance "PerfSQL02" -HostName "PerfSQL02"
Assume that the serverinstance, inputfile, and hostname exist and are put in correctly.
The database I am restoring is a couple hundred gigabytes, so the actual drop and restore process takes around 20 minutes. Right now, I'm getting the following error when I try to run that script (it's being run from a workstation within the same network but on a different domain):
invoke-sqlcmd : Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
At line:2 char:1
I've tried using the -ConnectionTimeout switch on Invoke-Sqlcmd, but that doesn't seem to resolve this issue (so it's probably not timing out on connecting). Most of the examples I've seen of running SQL via PowerShell don't use a file, but instead define the SQL within the script. I'd prefer not to do this, due to the complexity of the SQL needed to run (and since it's dropping and restoring a database, I can't really wrap all of this up in a stored procedure unless I use one of the system DBs).
Anyone have some ideas as to what could be causing that timeout? Does invoke-sqlcmd not work gracefully with files?
You don't want -ConnectionTimeout; you likely want -QueryTimeout, which governs how long the query itself is allowed to run rather than how long to wait when establishing the connection:
http://technet.microsoft.com/en-us/library/cc281720.aspx

Potential issues with a very long PHP script

I have a PHP script that runs once a day, and it takes a good 30 minutes to run (I think). Everything in it is a safe and secure operation. I keep getting a 500 error after about 10-15 minutes, but I can't see anything in the logs, so I'm a bit confused.
So far the things I set up as "unlimited" are:
max_execution_time
max_input_time
default_socket_timeout
I also set these to obscenely high numbers, just for this section (the folder in which the script runs):
memory_limit
post_max_size
The script is a SOAP-type API client that imports thousands of rows of data from a third-party URL, puts them into a local MySQL table, and then downloads the images attached to each and every row, so the amount of data is significant.
I'm trying to figure out what other PHP variables etc. I'm missing in order to get this to run all the way through. Other PHP vars I have set:
display_errors = On
log_errors = On
error_reporting = E_ALL & ~E_NOTICE & ~E_WARNING
error_log = "error_log"
There are three timeouts:
PHP level: set_time_limit
Apache level: Timeout
MySQL level: MySQL options
In your case it seems like Apache reached its timeout. In such a situation it is better to use the PHP CLI. But if you really need to do this operation in real time, then you can make use of Gearman, through which you can achieve true parallelism in PHP.
If you need a simple solution that triggers your script from a normal HTTP request (browser -> Apache), you can run your back-end script (a CLI script) as a shell command from PHP, but 'asynchronously'. More info can be found in Asynchronous shell exec in PHP.
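For completeness on the MySQL level mentioned above: the options that usually matter for a long import are wait_timeout, net_read_timeout and max_allowed_packet. A rough sketch of checking and raising them; the specific values are only illustrations:
-- see what the server currently allows
SHOW VARIABLES WHERE Variable_name IN
  ('wait_timeout', 'net_read_timeout', 'net_write_timeout', 'max_allowed_packet');
-- raise the idle/read timeouts for the importing connection only (seconds)
SET SESSION wait_timeout = 3600;
SET SESSION net_read_timeout = 600;
-- max_allowed_packet can only be changed globally (bytes); new connections pick it up
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;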
Try using the PHP command-line interface (php-cli) for the lengthy task. Execution time is unlimited on the command line unless you set a limit or terminate the script yourself. You can also schedule it with a cron job.
Run it from the command line with PHP (e.g. php yourscript.php) and this error shouldn't occur. Also, it's not a good idea to use set_time_limit(0); you should at most use set_time_limit(86400). You can set a cron job to do this once per day. Just make sure that all file paths in the script are absolute rather than relative so it doesn't get confused.
Compiling the script might also help. HipHop is a great PHP compiler; your script will run faster, use less memory, and can use as many resources as it likes. HipHop is just very difficult to install.
If the execution time is the problem, then maybe you should set the maximum execution time using the set_time_limit() function inside the script:
set_time_limit(0);
I would also invoke the script on the command line using php directly, instead of through Apache. In addition, print out some status messages and pipe them into a log.
I suspect that your actual problem is that the script chokes on bad data somewhere along the line.