What causes mysterious hanging threads in Colfusion -> mysql communication - mysql

One of the more interesting "features" in Coldfusion is how it handles external requests. The basic gist of it is that when a query is made to an external source through <cfquery> or or any other external request like that it passes the external request on to a specific driver and at that point CF itself is unable to suspend it. Even if a timeout is specified on the query or in the cfsetting it is flatly ignored for all external requests.
http://www.coldfusionmuse.com/index.cfm/2009/6/9/killing.threads
So with that in mind the issue we've run into is that somehow the communication between our CF server and our mySQL server sometimes goes awry and leaves behind hung threads. They have the following characteristics.
The hung thread shows up in CF and cannot be killed from FusionReactor.
There is no hung thread visible in mySQL, and no active running query (just the usual sleeps).
The database is responding to other calls and appears to be operating correctly.
Max connections have not been reached for the DB nor the user.
It seems to me the only likely candidate is that somehow CF is making a request, mySQL is responding to that request but with an answer which CF ignores and continues to keep the thread open waiting for a response from mySQL. That would explain why the database seems to show no signs of problems, but CF keeps a thread open waiting for the mysterious answer.
Usually these hung threads appear randomly on otherwise working scripts (such as posting a comment on a news article). Even while one thread is hung for that script, other requests for that script will go through, which would imply that the script isn't neccessarily at fault, but rather the condition faced when the script was executed.
We ran some test to determine that it was not a mysql generated max_connections error... we created a user, gave it 1 max connections, tied that connection with a sleep(1000) query and executed another query. Unfortunately, it correctly errored out without generating a hung thread.
So, I'm left at this point with absolutely no clue what is going wrong. Is there some other connection limit or timeout which could be causing the communication between the servers to go awry?

One of the things you should start to look at is the hardware between the two servers. It is possible that you have a router or bridge or NIC that is dropping occasional packets. This can result in the mySQL box thinking it has completed the task while the CF server sits there and waits for a complete response indefinitely, creating a hung thread.
3com has some details on testing for packet loss here: http://support.3com.com/infodeli/tools/netmgt/tncsunix/product/091500/c11ploss.htm#22128

We had a similar problem with a MS SQL server. There, the root cause was a known issue in which, for some reason, the server thinks it's shutting down, and the thread hangs (even though the server is, obviously, not shutting down).
We weren't able to eliminate the problem, but were able to reduce it by turning off pooled DB connections and fiddling with the connection refresh rate. (I think I got that label right -- no access to administrator at my new employment.) Both are in the connection properties in Administrator.
Just a note: The problem isn't entirely with CF. The problem, apparently, affects all Java apps. Which does not, in any way, reduce how annoyed I get by this.

Long story short, but I believe the caused was due to Coldfusion's CF8 image processing. It was just buggy and now in CF9 I have never seen that problem again.

Related

Connections Option in RDS Mysql and best way to handle many connections

In the below image it shows current activity as 99 Connections.
How exactly it is counted.
RDS is accessed through node.js webservices, php website. Every time I do some operations I close the connection. So once after closing it doesn't decrease rather it keeps increasing. Later I got the too many connections error message once the connections became 608. I restarted then it works. I never seen it decreasing.
So what is the best way I can handle it.
Below is the image which is showing when I run SHOW FULL PROCESSLIST;
PHP-based web pages that use a MySQL connection generally exit as soon as they're done rendering page content, so the connection gets closed whether you explicitly call a mysqli or PDO close method or not.
The same is not true of Node services, which run for a long time and can therefore easily leak resources. It's probable that you're opening connections, but not closing them, in your Node service, which would produce the sort of behavior you're seeing here. (This is an easy mistake to make, especially for those of us whose background is largely in more ephemeral PHP scripts.)
One good way to identify the problem is to connect to the MySQL instance via Workbench or the console monitor and issue SHOW FULL PROCESSLIST; to get a list of currently active connections, their originating hosts, and the queries (if any) they are executing. This may help you narrow down the source of the leaking connections, so that you can identify the code at fault and repair it.

Handling doctrine 2 connections in long running background scripts

I'm running PHP commandline scripts as rabbitmq consumers which need to connect to a MySQL database. Those scripts run as Symfony2 commands using Doctrine2 ORM, meaning opening and closing the database connection is handled behind the scenes.
The connection is normally closed automatically when the cli command exits - which is by definition not happening for a long time in a background consumer.
This is a problem when the consumer is idle (no incoming messages) longer then the wait_timeout setting in the MySQL server configuration. If no message is consumed longer than that period, the database server will close the connection and the next message will fail with a MySQL server has gone away exception.
I've thought about 2 solutions for the problem:
Open the connection before each message and close the connection manually after handling the message.
Implementing a ping message which runs a dummy SQL query like SELECT 1 FROM table each n minutes and call it using a cronjob.
The problem with the first approach is: If the traffic on that queue is high, there might be a significant overhead for the consumer in opening/closing connections. The second approach just sounds like an ugly hack to deal with the issue, but at least i can use a single connection during high load times.
Are there any better solutions for handling doctrine connections in background scripts?
Here is another Solution. Try to avoid long running Symfony 2 Workers. They will always cause problems due to their long execution time. The kernel isn't made for that.
The solution here is to build a proxy in front of the real Symfony command. So every message will trigger a fresh Symfony kernel. Sound's like a good solution for me.
http://blog.vandenbrand.org/2015/01/09/symfony2-and-rabbitmq-lessons-learned/
My approach is a little bit different. My workers only process one message, then die. I have supervisor configured to create a new worker every time. So, a worker will:
Ask for a new message.
If there are no messages, sleep for 20 seconds. If not, supervisor will think there's something wrong and stop creating the worker.
If there is a message, process it.
Maybe, if processing a message is super fast, sleep for the same reason than 2.
After processing the message, just finish.
This has worked very well using AWS SQS.
Comments are welcomed.
This is a big problem when running PHP-Scripts for too long. For me, the best solution is to restart the script some times. You can see how to do this in this Topic: How to restart PHP script every 1 hour?
You should also run multiple instances of your consumer. Add a counter to any one and terminate them after some runs. Now you need a tool to ensure a consistent amount of worker processes. Something like this: http://kamisama.me/2012/10/12/background-jobs-with-php-and-resque-part-4-managing-worker/

mysql connections. Should I keep it alive or start a new connection before each transaction?

I'm doing my first foray with mysql and I have a doubt about how to handle the connection(s) my applications has.
What I am doing now is opening a connection and keeping it alive until I terminate my program. I do a mysql_ping() every now and then and the connection is started with MYSQL_OPT_RECONNECT.
The other option (I can think of), would be to start a new connection before doing anything that requires my connection to the database and closing it after I'm done with it.
What are the pros and cons of these two approaches?
what are the "side effects" of a long connection?
What is the most used method of handling this?
Cheers ;)
Some extra details
At this point I am keeping the connection alive and I ping it every now and again to now it's status and reconnect if needed.
In spite of this, when there is some consistent concurrency with queries happening in quick succession, I get a "Server has gone away" message and after a while the connection is re-established.
I'm left wondering if this is a side effect of a prolonged connection or if this is just a case of bad mysql server configuration.
Any ideas?
In general there is quite some amount of overhead incurred when opening a connection. Depending on how often you expect this to happen it might be ok, but if you are writing any kind of application that executes more than just a very few commands per program run, I would recommend a connection pool (for server type apps) or at least a single or very few connections from your standalone app to be kept open for some time and reused for multiple transactions.
That way you have better control over how many connections get opened at the application level, even before the database server gets involved. This is a service an application server offers you, but it can also be rolled up rather easily if you want to keep it smaller.
Apart from performance reasons a pool is also a good idea to be prepared for peaks in demand. When a lot of requests come in and each of them tries to open a separate connection to the database - or as you suggested even more (per transaction) - you are quickly going to run out of resources. Keep in mind that every connection consumes memory inside MySQL!
Also you want to make sure to use a non-root user to connect, because if you don't (I think it is tied to the MySQL SUPER privilege), you might find yourself locked out. MySQL reserves at least one connection for an administrator for problem fixing, but if your app connects with that privilege, all connections would already be used up when you try to put out the fire manually.
Unless you are worried about having too many connections open (i.e. over 1,000), you she leave the connection open. There is overhead in connecting/reconnecting that will only slow things down. If you know you are going to need the connection to stay open for a while, run this query instead of pinging periodically:
SET SESSION wait_timeout=#
Where # is the number of seconds to leave an idle connection open.
What kind of application are you writing? If it's a webscript: keep it open. If it's an executable, pool your connections (if necessary, most of the times a singleton will do).

If a secondary database crashes, will it crash the primary coldfusion server?

Right now we are dealing with a bit of a conundrum in my corporate environment where we are being blamed for a server crash, but I'm not 100% sure we are the culprit. Here's the server environment: We have a primary Coldfusion and its MSSQL database. We then also have a secondary database (MySQL) hosted on a different cloud which is used for miscellaneous tasks. The main reason the system is structured this way is because the primary server is operated by our Content Management System thus we are not allowed to modify it, add tables, or any operations like that, so we use the alternate database for that. By design there are no mission critical items on it, and pages are built in a way such that if the alternate DB returns no rows, the pages will continue to render properly.
Basically, I am being told that when the alternate MySQL server goes down, or stops accepting connections, that it is taking the entire primary cloud with it, including 5 other sites hosted on it. I do not have access to the primary Coldfusion or database logs, because the CMS provider will not give them to me. Thus I can only judge based on the validity of the explanation they are giving me.
The explanation for this behavior coming from our CMS provider is that when Coldfusion queries a database it creates a thread, and that if the DB doesn't respond the threads continue to stack. Eventually the processor is capped, and the server goes down. Is that an accurate explanation of how Coldfusion operates? If so, is there anyway to prevent it, possibly with shorter DB timeouts and the like? Or is the entire explanation posed by our CMS a red herring and something else is really causing the crashes.
Any guidance would be greatly appreciated.
Question answered - Documents found
http://kb2.adobe.com/cps/180/tn_18061.html
http://www.adobe.com/devnet/server_archive/articles/cf_timeouts_and_unresponsive_requests.html
Setting timeout requests globally does not timeout internal processes waiting on external resources (cfquery/cfhttp etc). The only way to time those out is by manually setting the timeout attribute. Not setting this could result in thread overload and a crashed server as was occurring with us.
http://kb2.adobe.com/cps/180/tn_18061.html
From reading bullet point 3 and depending on your traffic, your CMS guy might be right.
Also from the link above:
If the database is down and unresponsive, how many times will ColdFusion Server try to reconnect to the database? Will it eventually restart the ColdFusion Server?
If the database is down or the network link to the database goes down when a query request occurs, the connection will timeout (you can customize the timeout period with the timeout attribute in the cfquery tag) and return an error to the user. Please note that the ability to set the timeout for the connection depends on which driver you are using. You can trap this error and handle it programmatically with thecftry/cfcatch tags.
The catch here is that the timeout variable on the cfquery tags are not compatable with the MySQL ODBC driver. Could not find what the default timeout is. Let's say 5 minutes. If you get more than one request in those 5 minutes, it does appear that the connections will start to 'pile up'.

Preventing Mongrel/Mysql Errno::EPIPE exceptions

I have a rails app that I have serving up XML on an infrequent basis.
This is being run with mongrel and mysql.
I've found that if I don't exercise the app for longer than a few hours it goes dead and starts throwing Errno::EPIPE errors. It seems that the mysql connection get timed out for inactivity or something like that.
It can be restarted with 'mongrel_rails restart -P /path/to/the/mongrel.pid' ... but that's not really a solution.
My collaborator expects the app to be there when he is working on his part (and I am most likely not around).
My question is:
What can I do to prevent this problem from occurring in the 1st place? (e.g. don't time me out!!).
Failing that, is there some code I can insert somewhere to automatically remake the Db connection?
Here's a solution:
https://boxpanel.blueboxgrp.com/public/the_vault/index.php/Mongrel_/_MySQL_Timeout
The timeouts on the above solution seem a little high to me. You don't want your DB timeouts to be too low, because of the amount of memory a connection can use. If a connection is orphaned, you want it to time out reasonably (like not in one week.)
In other places, I also got the following suggestions:
Try setting
config.active_record.verification_timeout to something lower than whatever
your mysql connection timeout setting is.
There's a gem to work around this problem: mysql_retry_lost_connection
http://rubyforge.org/projects/zventstools/
"Reconnect to the MySQL server when you hit a lost connection error".