I made a program that receives user input and stores it on a MySQL database. I want to implement this program on several computers so users can upload information to the same database simoultaneously. The database is very simple, it has just seven columns and the user will only enter four of them.
There would be around two-three hundred computers uploading information (not always at the same time but it can happen). How reliable is this? Is that even possible?
It's my first script ever so I appreciate if you could point me in the right direction. Thanks in advance.
Having simultaneous connections from the same script depends on how you're processing the requests. The typical choices are by forking a new Python process (usually handled by a webserver), or by handling all the requests with a single process.
If you're forking processes (new process each request):
A single MySQL connection should be perfectly fine (since the total number of active connections will be equal to the number of requests you're handling).
You typically shouldn't worry about multiple connections since a single MySQL connection (and the server), can handle loads much higher than that (completely dependent upon the hardware of course). In which case, as #GeorgeDaniel said, it's more important that you focus on controlling how many active processes you have and making sure they don't strain your computer.
If you're running a single process:
Yet again, a single MySQL connection should be fast enough for all of those requests. If you want, you can look into grouping the inserts together, as well as multiple connections.
MySQL is fast and should be able to easily handle 200+ simultaneous connections that are writing/reading, regardless of how many active connections you have open. And yet again, the performance you get from MySQL is completely dependent upon your hardware.
Yes, it is possible to have up to that many number of mySQL connectins. It depends on a few variables. The maximum number of connections MySQL can support depends on the quality of the thread library on a given platform, the amount of RAM available, how much RAM is used for each connection, the workload from each connection, and the desired response time.
The number of connections permitted is controlled by the max_connections system variable. The default value is 151 to improve performance when MySQL is used with the Apache Web server.
The important part is to properly handle the connections and closing them appropriately. You do not want redundant connections occurring, as it can cause slow-down issues in the long run. Make sure when coding that you properly close connections.
Related
So I have developped this website with Symfony3 and Doctrine. I have one major concern about performance with MySQL and more specifically the number of simultaneous open connexions.
For the moment, one to five users are online on the website. What happens if, let's say, 1,500 users connect within one minute? Does Symfony3 or Doctrine handle this kind of situations? How can I be sure the website doesn't go down providing me with the Too many connections MySQL error?
And if I go up to 5,000? And 10,000? The server has 4GB of RAM and a 2.40Ghz mono-core processor but I wouldn't worry about the hardware as I'm more concerned about MySQL.
These situations already happened in the past but I was running the website with Wordpress and W3 Total Cache plugin. Should I consider using a cache manager such as memcached or else?
In short, I'm concerned about the website becoming unavailable in case of sudden high trafic (and thought of the MysQL Too many connections error in first but I might be missing something even more important).
Thanks for lightening me out on this one as I'm not fully aware about performance issues with Symfony.
I believe it does open one connection per visitor. Regardless of whether it does or not however neither Symfony or Doctrine has a magic bullet to handle every load/connection scenario.
Why don't you use a load testing tool (there are many) and see how it actually pans out? In my experience predicting a bottleneck is useless, as they will always crop up where you least expect it.
For example, the MySQL connection limit is only one part of the optimisation puzzle. It's no good just worrying about connection limits, you need to respond to web requests as quickly and efficiently as possible to free up MySQL connection resources (and other resources your app is using). So if your server is slow you will run out of connections (or some other resource) almost immediately under significant load, regardless of MySQL connection limits.
That said, those server specifications seem a little low for 5-10k users per minute. I wouldn't expect a machine like that to handle that kind of load without some serious optimisation/caching/etc.
The symfony performance page is a good starter, and there is also a good article on caching - there's a ton of available material on the subject. Good luck! :)
If you use php-fpm it depends on pm.max_children in fpm/pool.d/www.conf.
pm.max_children refers to the maximum number of concurrent PHP-FPM processes allowed to exist in such a pool. If the volume of incoming requests requires the creation of more PHP-FPM processes than the number allowed by the max_children limit, those additional requests are backlogged in a queue to await service.
So when pm.max_children > max_connections (my.cnf) and active users > max_connections you will get "Too many connections".
A few weeks ago, I post a question about queuing database access request to prevent 'too many connection' error when massive concurrent db requests happen. People told me ConnectionPool is the right way to go which I agreed at that time. However, I finally realized this is not the solution especially when there are a lot of different clients accessing mysql server through network, because connection pool is at client side it can not prevent the sum of connections of all clients from exceeding the max connection number of mysql server.
I think there should be some middleware on the mysql server working as a queue or pool, is anybody familiar with this? Thank you.
I know this question is widely asked, I am also surprised as if there is no total solution for it.
HAProxy should perform TCP-level queueing for you purpose. Though, would it be better to build an application server in the middle, to handle incoming flow at more conscious level than TCP. This could require rewriting of both server and clients, but could give you more control over what's happening.
What you ask is actually a pretty complicated problem.
First of all you need to decide whether mis-alignments in data are acceptable, for example: if you store in the database the number of Likes received, and you ask this number at 12:00:00, and the number in the DB is 500, and someone posts a LIKE at 12:00:01, and you query it again at 12:00:02; is it OK to receive "500" again, even if the correct number should be 501, provided that in a little time the answer "501" does come out?
If this is acceptable (the infamous "301 bug" in YouTube), then you might start caching some SELECT responses.
You might even cache them in middleware, i.e. have a special process running continuously and hogging ONE connection to MySQL, and answering requests in a queue. You might run it internally in the server as a Web server on port 8001 and have an Apache ReverseProxy, HAproxy, pound, or NginX location to proxy it outside.
You can do the same for special UPDATE/DELETE queries even if it's trickier.
It would be best to cache queries running asynchronously through AJAX first, if any, because serializing queries with a proxy is liable to perceptibly slow down the application.
You have a threefold target:
run queries on MySQL as fast as possible (look into indexing and MySQL caching) in order to free the ConnectionPool and keep it as lightly loaded as possible.
refactor the application in order to extract all information from queries (e.g., the number of rows with a certain property AND those rows as data are often retrieved using TWO queries, but with proper management you need only one and a SQLNumRows() call. Also, quite often similar queries with different informations are run, when a single query might have returned all information at one go: typically, one query to check user/password, another to fetch the complete user profile).
divert the most calls possible to something not at all (NginX, middleware) or lightly (queuing process) bound to MySQL; in the latter case, using a known number of connections in order to run predictably.
Unfortunately there's no easy "magic bullet" to solve this problem (except of course increasing the number of connections, maybe replicating the DB on several hosts running as master-slave. While not really a magic bullet, it is easier to design and implement).
I have been researching this for a while but got no convinced answer.
From mysql tutorial, the default connections number is less than two hundred, and it says max_connection_num can be set to 2000 in Linux box as long as you have enough resource. I think this number is far from enough in real world deployment as there might be millions people visit your website at the same time.
There are couple of articles talking about how to optimize to reduce time cost by each query. But none of them tells me how this issue is root caused. I think there must be some mechanism like queue to prevent massive connections from happening simultaneously. otherwise you will finally get "too connection" exception.
anyone has some expertise in this area? thank you.
There are several options.
Connection pooling
As you mentionned: queuing. If too many clients connect at the same time, then the application layer should handle this exception, put the request to sleep for a short period of time and try again. Requests lasting more than a couple of seconds should usually be banned in such a high traffic environment.
Load balancing through replication and/or clustering
Normally, your application is supposed to reuse connections already established. However, the language you chose to implement your application introduces limitations. If you use Java or .Net you can have pool of connections. For PHP it is not the case, you can check this discussion
If you exceed the max_connection_num, you do get a too many connections error. But if you really have 1 million users at your web server at the exact same time, you can't handle that with one server anyway, 1 million concurrent connections really requires a very big farm to handle.
However, the clients to your database is a webapp, that webapp usually connects to the database through abstractions called a connection pool, which does limit the number of connections to the database on the client side as long as all the database connections goes through that same pool.
My site has always used persistent connections, based on my understanding of them there's no reason not to. Why close the connection when it can be reused? I have a site that in total accesses about 7 databases. It's not a huge traffic site, but it's big enough. What's your take on persistent, should I use them?
With persistent connections:
You cannot build transaction processing effectively
impossible user sessions on the same connection
app are not scalable. With time you may need to extend it and it will require management/tracking of persistent connections
if the script, for whatever reason, could not release the lock on the table, then any following scripts will block indefinitely and one should restart the db server. Using transactions, transaction block will also pass to the next script (using the same connection) if script execution ends before the transaction block completes, etc.
Persistent connections do not bring anything you can do with non-persistent connections.
Then, why to use them, at all?
The only possible reason is performance, to use them when overhead of creating a link to your SQL Server is high. And this depends on many factors like:
database type
whether MySQl server is on the same machine and, if not, how far? might be out of your local network /domain?
how much overloaded by other processes the machine on which MySQL sits
One always can replace persistent connections with non-persistent connections. It might change the performance of the script, but not its behavior!
Commercial RDMS might be licensed by the number of concurrent opened connections and here the persistent connections can misserve
My knowledge on the area is kinda limited so I can't give you many details on the subject but, as far as I know, the process of creating connections and handing them to a thread really costs resources, so I would avoid it if I were you. Anyhow I think that most of this decisions can't be generalized and depend on the business.
If, for instance, your application communicates continuously with the Database and will only stop when the application is closed, then perhaps persistent connections are the way to go, for you avoid the process mentioned before.
However, if your application only communicates with the Database sporadically to get minor information then closing the connection might be more sane, for you won't waste resources on opened connections that are not being used.
Also there is a technique called "Connection Pooling", in which you create a series of connections a priori and keep them there for other applications to consume. In this case connections are persistent to the database but non-persistent to the applications.
Note: Connections in MSSQL are always persistent to the database because connection pooling is the default behavior.
I have read every possible answer to this question and searched via Google in order to find the correct answer to the following question, but I am rather a novice and don't seem to get a clear understanding.
A lot I've read has to do with web servers, but I don't have a web server, but an intranet database.
I have a MySQL dsatabase in a Windows server at work.
I will have many users accessing this database constantly to perform simple queries and writting back to it new records.
The read/write will not be that heavy (chances are 50-100 users will do so exactly at the same time, even if 1000's could be connected).
The GUI will be either via Excel forms and/or Access.
What I need to know is the maximum number of active connections I can have at any given time to the database.
I know I can change the number on Mysql Admin however I really need to know what will really work...
I don't want to put 1000 users if the system will really handle 100 correctly (after that, although connected, the performance will be too slow, for example)
Any ideas or own experiences will be appreciated
This depends mainly on your server hardware (RAM, cpu, networking) and server load for other processes if not dedicated to the database. I think you won't have an absolute answer and the best way is testing.
I think something like 1000 should work ok, as long as you use 64 bit MySQL server. With 32 bit, too many connections may create virtual memory pressure - a connection has an own thread, and every thread needs a stack, so the stack memory will reduce possible size of the buffer pool and other buffers.
MySQL generally does not slow down if you have many idle connections, however special commands e.g "show processlist" or "kill", that enumerate every connection will be somewhat slower.
If idle connection stays idle for too long (idle time exceeds wait_timeout parameter), it is dropped by the server. If this is the case in your possible scenario, you might want to increase wait_timeout (its default value is 8 hours)