How to find source of Too Many Connections exception - mysql

I am getting intermittent "Too Many Connections" exceptions in my Django web-app. Having looked at other Stackoverflow questions regarding "Too many connections", it generally seems like it is an error in coding (ex. spawning a bunch of threads, etc.) that causes many sleeping connections.
I have used select * from information_schema.processlist order by host; to check for such connections and I can see maybe 1 or 2 at most but most sleeping connections lifespan are under 10s.
So, I am wondering 2 things:
Is there a (relatively easy) method for tracking down what in Django may be causing large number of connections?
If it is really a matter too many people accessing the site at a particular time, what is the standard method to scale the number of connections up?

I found the source of the problem: we have a RESTful API and someone was running a periodic script that spawned 700 or so threads to make "reading through the API faster".
However, more important than the cause (which is very specific) is the method for finding it. So I will provide the details in hopes it helps someone else if they encounter a similar situation.
Some further details about my setup:
In a cloud environment
Multiple Django web servers behind a loadbalancer
Steps for troubleshooting:
Use a monitoring system to alert to alert you if you go over the max connections. Failing that rig a script that polls the MySQL database and uses the following select statement to get info:
select substring_index(host, ':', 1), count(*) from information_schema.processlist group by substring_index(host, ':', 1);
From the polling/monitoring check to see if there are any patterns of the system going over at regular intervals. (ex. in my case I saw it go over the max every 5 minutes or so).
Check webserver (apache/nginx/etc.) or Django logs to see which URLs/views where being accessed at the time of an overage. (This was harder to determine in my case due to the load balancer causing only a few of the offending URL accesses to happen on each server but based on the time pattern I was eventually able to figure it out).
Have a friendly chat with the person causing this grief :)
As for the 2nd part of my original question, because I am in a hosted cloud environment the operators control the max number of connections and often scale it based on the DB instance size. I attempted to upsize once but due to the many threads, the more connections I had the more the process used up.

Related

Does Symfony3 / Doctrine open one MySQL connection per visitor?

So I have developped this website with Symfony3 and Doctrine. I have one major concern about performance with MySQL and more specifically the number of simultaneous open connexions.
For the moment, one to five users are online on the website. What happens if, let's say, 1,500 users connect within one minute? Does Symfony3 or Doctrine handle this kind of situations? How can I be sure the website doesn't go down providing me with the Too many connections MySQL error?
And if I go up to 5,000? And 10,000? The server has 4GB of RAM and a 2.40Ghz mono-core processor but I wouldn't worry about the hardware as I'm more concerned about MySQL.
These situations already happened in the past but I was running the website with Wordpress and W3 Total Cache plugin. Should I consider using a cache manager such as memcached or else?
In short, I'm concerned about the website becoming unavailable in case of sudden high trafic (and thought of the MysQL Too many connections error in first but I might be missing something even more important).
Thanks for lightening me out on this one as I'm not fully aware about performance issues with Symfony.
I believe it does open one connection per visitor. Regardless of whether it does or not however neither Symfony or Doctrine has a magic bullet to handle every load/connection scenario.
Why don't you use a load testing tool (there are many) and see how it actually pans out? In my experience predicting a bottleneck is useless, as they will always crop up where you least expect it.
For example, the MySQL connection limit is only one part of the optimisation puzzle. It's no good just worrying about connection limits, you need to respond to web requests as quickly and efficiently as possible to free up MySQL connection resources (and other resources your app is using). So if your server is slow you will run out of connections (or some other resource) almost immediately under significant load, regardless of MySQL connection limits.
That said, those server specifications seem a little low for 5-10k users per minute. I wouldn't expect a machine like that to handle that kind of load without some serious optimisation/caching/etc.
The symfony performance page is a good starter, and there is also a good article on caching - there's a ton of available material on the subject. Good luck! :)
If you use php-fpm it depends on pm.max_children in fpm/pool.d/www.conf.
pm.max_children refers to the maximum number of concurrent PHP-FPM processes allowed to exist in such a pool. If the volume of incoming requests requires the creation of more PHP-FPM processes than the number allowed by the max_children limit, those additional requests are backlogged in a queue to await service.
So when pm.max_children > max_connections (my.cnf) and active users > max_connections you will get "Too many connections".

Prevent 'too many connections'(ConnectionPool is not the answer, looking for mysql server side solution)

A few weeks ago, I post a question about queuing database access request to prevent 'too many connection' error when massive concurrent db requests happen. People told me ConnectionPool is the right way to go which I agreed at that time. However, I finally realized this is not the solution especially when there are a lot of different clients accessing mysql server through network, because connection pool is at client side it can not prevent the sum of connections of all clients from exceeding the max connection number of mysql server.
I think there should be some middleware on the mysql server working as a queue or pool, is anybody familiar with this? Thank you.
I know this question is widely asked, I am also surprised as if there is no total solution for it.
HAProxy should perform TCP-level queueing for you purpose. Though, would it be better to build an application server in the middle, to handle incoming flow at more conscious level than TCP. This could require rewriting of both server and clients, but could give you more control over what's happening.
What you ask is actually a pretty complicated problem.
First of all you need to decide whether mis-alignments in data are acceptable, for example: if you store in the database the number of Likes received, and you ask this number at 12:00:00, and the number in the DB is 500, and someone posts a LIKE at 12:00:01, and you query it again at 12:00:02; is it OK to receive "500" again, even if the correct number should be 501, provided that in a little time the answer "501" does come out?
If this is acceptable (the infamous "301 bug" in YouTube), then you might start caching some SELECT responses.
You might even cache them in middleware, i.e. have a special process running continuously and hogging ONE connection to MySQL, and answering requests in a queue. You might run it internally in the server as a Web server on port 8001 and have an Apache ReverseProxy, HAproxy, pound, or NginX location to proxy it outside.
You can do the same for special UPDATE/DELETE queries even if it's trickier.
It would be best to cache queries running asynchronously through AJAX first, if any, because serializing queries with a proxy is liable to perceptibly slow down the application.
You have a threefold target:
run queries on MySQL as fast as possible (look into indexing and MySQL caching) in order to free the ConnectionPool and keep it as lightly loaded as possible.
refactor the application in order to extract all information from queries (e.g., the number of rows with a certain property AND those rows as data are often retrieved using TWO queries, but with proper management you need only one and a SQLNumRows() call. Also, quite often similar queries with different informations are run, when a single query might have returned all information at one go: typically, one query to check user/password, another to fetch the complete user profile).
divert the most calls possible to something not at all (NginX, middleware) or lightly (queuing process) bound to MySQL; in the latter case, using a known number of connections in order to run predictably.
Unfortunately there's no easy "magic bullet" to solve this problem (except of course increasing the number of connections, maybe replicating the DB on several hosts running as master-slave. While not really a magic bullet, it is easier to design and implement).

How does mysql handle massive connections in real world?

I have been researching this for a while but got no convinced answer.
From mysql tutorial, the default connections number is less than two hundred, and it says max_connection_num can be set to 2000 in Linux box as long as you have enough resource. I think this number is far from enough in real world deployment as there might be millions people visit your website at the same time.
There are couple of articles talking about how to optimize to reduce time cost by each query. But none of them tells me how this issue is root caused. I think there must be some mechanism like queue to prevent massive connections from happening simultaneously. otherwise you will finally get "too connection" exception.
anyone has some expertise in this area? thank you.
There are several options.
Connection pooling
As you mentionned: queuing. If too many clients connect at the same time, then the application layer should handle this exception, put the request to sleep for a short period of time and try again. Requests lasting more than a couple of seconds should usually be banned in such a high traffic environment.
Load balancing through replication and/or clustering
Normally, your application is supposed to reuse connections already established. However, the language you chose to implement your application introduces limitations. If you use Java or .Net you can have pool of connections. For PHP it is not the case, you can check this discussion
If you exceed the max_connection_num, you do get a too many connections error. But if you really have 1 million users at your web server at the exact same time, you can't handle that with one server anyway, 1 million concurrent connections really requires a very big farm to handle.
However, the clients to your database is a webapp, that webapp usually connects to the database through abstractions called a connection pool, which does limit the number of connections to the database on the client side as long as all the database connections goes through that same pool.

How are "simultaneous client connections" quantified in mysql

Sorry for the newb factor, but I was reading about "Too many connections" to mysql.
http://dev.mysql.com/doc/refman/5.5/en/too-many-connections.html
How are "simultaneous client connections" quantified in mysql?
For example if 20 million people are on gmail (let's say they use mysql with only 1 table to store everything just for sake of example) and all those people simultaneously all click on an email to open up, does that mean there are 20 million simultaneous connections or just one connection since all the users are connecting to the same table?
EDIT: I'm trying to understand what the term 'client' means. Is a 'client' someone who is using the application, or is a 'client' the part of the application (ex. php script) that is connecting to the database?
When a visitor goes to your website and the server-side script connects to the database it is 1 connection - you can make as many queries as necessary during that connection to any number of tables/databases - and on termination of the script the connection ends. If 31 people request a page (and hence a db connection) and your limit is 30, then the 31st person will get an error.
You can upgrade server hardware so MySQL can efficiently handle loads of connections or spread the load across multiple database servers. It is possible to have your server-side scripting environment maintain a persistent connection to MySQL in which case all scripts make queries through that single connection. This will probably have adverse effects on the correct queuing of queries and their order to maintain usable speeds under high load, and ultimately doesn't solve the CPU/memory/disk bottlenecks with handling large numbers of queries.
In the case of a webmail application, the query to check for new messages runs so fast (in the milliseconds) that hitting server limits isn't likely unless it's on a large scale.
Google's applications scale on a level previously unheard of. Check out the docs on MapReduce, GoogleFS, etc. It's awesome.
In answer to your edit - anything that connects directly to MySQL is considered a client in this case. Each PHP script that connects to MySQL is a client, as is the MySQL console on the command line, or anything else.
Hope that helps
The connections mentioned are server connection. Every client has one or more. For example if your php script connects mysql, there may be more web requests at a time and thus more connections to db.
Sometimes you can ran out of them, because they are not closed properly after they become useless.
And I thing Gmail is stored different way than in one large mysql db :]

mySQL "Too many connections" error influenced by number of mongrel instances?

Recently I have started getting mySQL "too many connection" errors at times of high traffic. My rails app runs on a mongrel cluster with 2 instances on a shared host. Some recent changes that might be driving it:
Traffic to my site has increased. I
am now averaging about 4K pages a
day.
Database size has increased. My largest table has ~ 100K rows.
Some associations could return
several hundred instances in the
worst case, though most are far less.
I have added some features that
increased the number and size of
database calls in some actions.
I have done a code review to reduce database calls, optimize SQL queries, add missing indexes, and use :include for eager loading. However, many of my methods still make 5-10 separate SQL calls. Most of my actions have a response time of around 100ms, but one of my most common actions averages 300-400ms, and some actions randomly peak at over 1000ms.
The logs are of little help, as the errors seem to occur randomly, or at least the pattern does not appear related to the actions being called or data being accessed.
Could I alleviate the error by adding additional mongrel instances? Or are the mySQL connections limited by the server, and thus unrelated to the number of processes I divide my traffic across?
Is this most likely a problem with my coding, or should I be pressing my host for more capacity/less load on the shared server?
ActiveRecord has pooled database connections since Rails 2.2, and it's likely that that's what's causing your excess connections here. Try turning down the value of pool in your database.yml for that environment (it defaults to 5).
Docs can be found here.
Are you caching anything? It's an important part of alleviating application and database load. The Rails Guides have a section on caching.
Something is wrong. A Mongrel instance processes 1 request at a time so if you have 2 Mongrel instances then you should not be seeing more than 2 active MySQL connections (from the mongrels at least)
You could log or graph the output of SHOW STATUS LIKE 'Threads_connected' over time.
PS: this is not very many Mongrels. if you want to be able to service more than 2 simultaneous requests then you'll want more. ...if memory is tight, you can switch to Phusion Passenger and REE.