Need handler for worker timeout - gunicorn

I need to be able to do follow-up processing if gunicorn reports a worker timeout and "murders" the worker. It looks like worker_exit is only called if the worker process has crashed, but not if it is unresponsive. How do I figure out whether one or more workers are timing out?
Thanks.

OK, I figured out the solution: http://docs.python.org/2/library/signal.html
I just need to set an alarm signal around the code where I suspect there could be a timeout.
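A minimal sketch of that idea, assuming the follow-up work can run inside the worker itself and that nothing else in the worker uses SIGALRM (handle_request, do_work, and the 25-second value are illustrative placeholders, not gunicorn API):

import signal

class WorkerTimeout(Exception):
    pass

def handle_timeout(signum, frame):
    # Hypothetical follow-up hook: log, clean up, notify, etc., before
    # gunicorn's own timeout eventually kills the worker.
    raise WorkerTimeout("request took longer than expected")

signal.signal(signal.SIGALRM, handle_timeout)

def handle_request(request):
    signal.alarm(25)        # arm the alarm a bit below gunicorn's --timeout
    try:
        do_work(request)    # placeholder for the code that might hang
    finally:
        signal.alarm(0)     # disarm the alarm if the work finished in time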


Will my server be able to handle only one client at a time if it is a single-threaded process? If yes, why?

I have googled enough to understand threads and processes, but one thing I am still confused about is single-threaded processes.
The scenario is a server-client application where each client is handled as a separate process on a server that is single-threaded.
Some blogs say that a single-threaded program will not be able to handle multiple clients at a time, i.e., it can start one process and only after finishing it can another process be started. Other blogs say it is possible.
According to my understanding, in a single-core processor system programmed for multi-tasking, more than one process can be handled concurrently, depending on the time slice allocated to each process. In a multi-processor system, more than one client process can also be handled in parallel.
Is it just web servers that do not handle more than one process at a time, because they are iterative servers? For any other, concurrent server, will it handle more than one process without waiting for each process to complete before taking the next one?
As I am confused by the many different explanations in different blogs, I am hoping for an accurate answer just for the scenario above, in either a single-processor multi-tasking environment or a multi-processor environment. I would also like to know whether there are any benefits of using a single-threaded process over a process without any threads.
Sorry if my understanding is wrong. Kindly respond.
Thanks in advance.
If your server does this:
while (true)
    listen()
    accept()
    read()
    process()
    write()
    close()
If each of the calls to accept, read, process, write, and close runs on that single thread, then the server cannot listen for new requests again until those calls complete. The requests might queue up or simply get dropped if the server is not listening for them. If accept/read/process/write/close take a long time compared to listen(), then your server will not be available to service requests very often. Moreover, if some requests take 100 seconds and some take 1 second, and a lot of that time is spent blocking on I/O to e.g. a database or a web service, then you're spending most of your time in the server doing nothing. Here's the alternative:
while (true)
    listen()
    start new thread()
        accept()
        read()
        process()
        write()
        close()
Now all you're doing synchronously is kicking off the new thread to handle the incoming request, and then immediately beginning to listen again. The OS will handle scheduling the threads and if some of them end up waiting a long time for some I/O to occur, your server can still process the fast requests in the meantime.
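For concreteness, here is a minimal Python version of the second pattern (the port, the buffer size, and the uppercasing "process" step are arbitrary placeholders). Note that with Python's socket API the accept() stays in the main loop and the connection is then handed to the thread, but the effect is the same: the loop goes back to accepting immediately.

import socket
import threading

def handle_client(conn):
    # read / process / write / close all happen inside the worker thread
    with conn:
        data = conn.recv(4096)      # read()
        response = data.upper()     # process() - placeholder work
        conn.sendall(response)      # write(); close() happens when the with-block exits

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("0.0.0.0", 8000))
server.listen()                     # listen()

while True:
    conn, addr = server.accept()    # accept(), then hand off and loop again
    threading.Thread(target=handle_client, args=(conn,)).start()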

c3p0 getNumBusyConnectionsDefaultUser()... What does busy mean?

I know this is a very basic question, but I would like a deeper understanding of what "busy" means. I have double-checked that I close all my connections. I know that in c3p0 "the pool will intercept the call to close() and check the underlying Connection back into the pool." I would expect the number of busy connections to trend toward zero, but this does not happen. Any ideas why? How long does a connection stay in the "busy" state? Shouldn't the connection become unbusy when I close it? Thanks in advance.
OK... I finally figured this out by watching the DEBUG log statements from mchange. There is a "check for expired resources" every 5 seconds. If you get and use a connection just before that window and then call getNumBusyConnectionsDefaultUser() within that window, you may not get an accurate count, because the connection could be marked as busy and then become unbusy before the window closes.
Essentially (I think), c3p0 does not maintain its own counters based on status changes; it cycles through the collection of connections every five seconds, checking the current status.

How to increase the time limit of an App Engine request handler, which aborts each request after 60 seconds?

I have an application with endpoints deployed on GAE. Each endpoint makes a connection to the database, gets data, closes the connection, and returns the data. Normally everything works fine, but when there is a spike in requests, handling starts taking more than 60 seconds and requests get aborted. When that happens the database connection is never closed, MySQL ends up with 1000+ open connections, and then every request starts aborting with a deadline exceeded error. Is there any solution for this?
You could wrap the "get data" portion in a try... finally... statement and move the "close connection" portion into the finally section. Then start an "about to exceed deadline" timer before "get data" (something like, say, 45 seconds) and raise an exception if the timer expires, allowing you to close the connection in the finally portion. That should take care of the orphaned open connections (but would not prevent errors in those requests).
If your application tolerates it, you could also look into using task queues, which have a 10-minute deadline; that could help reduce or eliminate the errors in the requests as well.
You can also find some general advice for addressing deadline exceeded errors here: https://cloud.google.com/appengine/articles/deadlineexceedederrors, though I don't know if it applies to your app.
EDIT: actually the suggestion in the first paragraph above doesn't work on GAE as the Python sandbox doesn't allow installing a custom signal handler:
signal.signal(signal.SIGALRM, timer_expired)
AttributeError: 'module' object has no attribute 'signal'
After seeing your code, a somewhat equivalent solution would be to replace your cursor.fetchall() with a loop of cursor.fetchone() or cursor.fetchmany() calls to split the operation into smaller pieces:
http://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-fetchone.html. You'd take a start timestamp (with time.time(), for example) when entering your request handler. Then inside the loop you'd take another timestamp to measure the time elapsed since the start, and you'd break out of the loop and close the DB connection when deadline expiration nears. Again, this won't help with actually replying successfully to the requests if it takes that much time to prepare the replies.
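A rough sketch of that loop (the 50-second budget, the connection parameters, and the batch size are placeholders; mysql.connector is the Connector/Python package from the linked docs):

import time
import mysql.connector

DEADLINE_BUDGET = 50.0   # stop a bit before App Engine's 60-second request deadline

def fetch_with_budget(query):
    start = time.time()
    conn = mysql.connector.connect(user="app", database="mydb")  # placeholder credentials
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        rows = []
        while True:
            batch = cursor.fetchmany(size=100)   # read the result in small chunks
            if not batch:
                break
            rows.extend(batch)
            if time.time() - start > DEADLINE_BUDGET:
                break                            # bail out before the deadline hits
        return rows
    finally:
        conn.close()                             # the connection is always released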
You can use this solution to close connections when deadlines are exceeded:
Dealing with DeadlineExceededErrors
This way you won't have any open connections hanging there forever.
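For the Python runtime, a minimal sketch of that pattern might look like this (get_connection and run_query are placeholders; DeadlineExceededError is the exception from the App Engine runtime package):

from google.appengine.runtime import DeadlineExceededError

def handle_request():
    conn = get_connection()        # placeholder for your MySQL connection setup
    try:
        return run_query(conn)     # placeholder for the actual work
    except DeadlineExceededError:
        conn.close()               # release the connection even though this request is lost
        raise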
Think about the design of your application:
1. Relying on deadline exception handling is a design smell. There will be situations where a DB operation takes more than 60 seconds; if it is a simple query, fine, but otherwise reconsider the design of the application, since user experience is going to be hurt.
2. Consider changing the design to use endpoints: https://cloud.google.com/appengine/docs/java/endpoints/ This is the way to go and is future-proof.
3. Use backends or task queues, as described in this post: Max Time for computation on Google App Engine
You can also set the MySQL timeouts interactive_timeout and/or wait_timeout; depending on the connection type, one or the other of them is used.

Handling Doctrine 2 connections in long-running background scripts

I'm running PHP command-line scripts as RabbitMQ consumers which need to connect to a MySQL database. Those scripts run as Symfony2 commands using the Doctrine2 ORM, meaning that opening and closing the database connection is handled behind the scenes.
The connection is normally closed automatically when the CLI command exits - which, by definition, does not happen for a long time in a background consumer.
This becomes a problem when the consumer is idle (no incoming messages) for longer than the wait_timeout setting in the MySQL server configuration. If no message is consumed for longer than that period, the database server closes the connection and the next message fails with a "MySQL server has gone away" exception.
I've thought about two solutions to the problem:
1. Open the connection before each message and close it manually after handling the message.
2. Implement a ping that runs a dummy SQL query like SELECT 1 FROM table every n minutes, triggered by a cron job.
The problem with the first approach is that if the traffic on that queue is high, there might be significant overhead for the consumer in opening and closing connections. The second approach just sounds like an ugly hack to deal with the issue, but at least I can use a single connection during high-load times.
Are there any better solutions for handling Doctrine connections in background scripts?
Here is another solution: try to avoid long-running Symfony 2 workers; they will always cause problems due to their long execution time, and the kernel isn't made for that.
The solution here is to build a proxy in front of the real Symfony command, so that every message triggers a fresh Symfony kernel. Sounds like a good solution to me.
http://blog.vandenbrand.org/2015/01/09/symfony2-and-rabbitmq-lessons-learned/
My approach is a little bit different. My workers only process one message, then die. I have supervisor configured to create a new worker every time. So a worker will:
1. Ask for a new message.
2. If there are no messages, sleep for 20 seconds before exiting; without the sleep, supervisor would think something is wrong and stop re-creating the worker.
3. If there is a message, process it.
4. If processing a message is super fast, maybe sleep anyway, for the same reason as in 2.
5. After processing the message, just finish.
This has worked very well using AWS SQS.
Comments are welcome.
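A sketch of that one-message-per-process lifecycle in Python (queue.receive, process, and queue.acknowledge are hypothetical placeholders for whatever client library the worker actually uses):

import sys
import time

def run_worker(queue):
    message = queue.receive()       # 1. ask for a new message
    if message is None:
        time.sleep(20)              # 2. nothing to do: sleep so supervisor doesn't
        sys.exit(0)                 #    treat the quick exit as a crash loop
    process(message)                # 3. handle the single message (placeholder)
    queue.acknowledge(message)
    sys.exit(0)                     # 5. exit; supervisor spawns a fresh worker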
This is a big problem when running PHP scripts for too long. For me, the best solution is to restart the script from time to time. You can see how to do this in this topic: How to restart PHP script every 1 hour?
You should also run multiple instances of your consumer. Add a counter to each one and terminate it after a number of runs. You then need a tool to maintain a consistent number of worker processes, something like this: http://kamisama.me/2012/10/12/background-jobs-with-php-and-resque-part-4-managing-worker/

What causes mysterious hanging threads in ColdFusion -> MySQL communication

One of the more interesting "features" in ColdFusion is how it handles external requests. The basic gist is that when a query is made to an external source through <cfquery> or any other external request like that, CF passes the request on to a specific driver, and at that point CF itself is unable to suspend it. Even if a timeout is specified on the query or via cfsetting, it is flatly ignored for all external requests.
http://www.coldfusionmuse.com/index.cfm/2009/6/9/killing.threads
So, with that in mind, the issue we've run into is that somehow the communication between our CF server and our mySQL server sometimes goes awry and leaves behind hung threads. They have the following characteristics:
The hung thread shows up in CF and cannot be killed from FusionReactor.
There is no hung thread visible in mySQL, and no active running query (just the usual sleeps).
The database is responding to other calls and appears to be operating correctly.
Max connections have not been reached for the DB nor the user.
It seems to me the only likely candidate is that CF makes a request, mySQL responds to that request, but CF somehow ignores the answer and keeps the thread open waiting for a response from mySQL. That would explain why the database shows no signs of problems while CF keeps a thread open waiting for the mysterious answer.
Usually these hung threads appear randomly on otherwise working scripts (such as posting a comment on a news article). Even while one thread is hung for that script, other requests for the same script will go through, which would imply that the script isn't necessarily at fault, but rather the conditions it faced when it was executed.
We ran a test to determine whether it was a mySQL-generated max_connections error: we created a user, gave it 1 max connection, tied up that connection with a sleep(1000) query, and executed another query. Unfortunately, it correctly errored out without generating a hung thread.
So, I'm left at this point with absolutely no clue what is going wrong. Is there some other connection limit or timeout which could be causing the communication between the servers to go awry?
One of the things you should start to look at is the hardware between the two servers. It is possible that you have a router or bridge or NIC that is dropping occasional packets. This can result in the mySQL box thinking it has completed the task while the CF server sits there and waits for a complete response indefinitely, creating a hung thread.
3com has some details on testing for packet loss here: http://support.3com.com/infodeli/tools/netmgt/tncsunix/product/091500/c11ploss.htm#22128
We had a similar problem with a MS SQL server. There, the root cause was a known issue in which, for some reason, the server thinks it's shutting down, and the thread hangs (even though the server is, obviously, not shutting down).
We weren't able to eliminate the problem, but were able to reduce it by turning off pooled DB connections and fiddling with the connection refresh rate. (I think I got that label right -- no access to administrator at my new employment.) Both are in the connection properties in Administrator.
Just a note: The problem isn't entirely with CF. The problem, apparently, affects all Java apps. Which does not, in any way, reduce how annoyed I get by this.
Long story short, I believe the cause was ColdFusion 8's image processing. It was just buggy, and since moving to CF9 I have never seen that problem again.