MySQL frequent queries and persistent connections - mysql

I am going to be hosting a gaming website for a bunch of card games that are pretty popular in my country. Since there is really no other convenient way to do this apart from web sockets I decided that's the path I'm going to take. Anyhow there are a bunch of concerns that I have.
I plan on having multiple servers for each type of game and each server is going to host for about 100-200 people. With that being said it is necessary for players to see information about the server before joining, such as how many players are connected, what is the average wait time and such. To do this I could either use files or a database. I would very much like to go with the database but I'd like to ask a few questions about MySQL
I know that MySQL was not built for real-time applications but what is an acceptable interval for each server to update its status in the database?
Are there any problems I may run into when having persistent connections from each server to the MySQL server?
Are there any benefits to preparing the statement that is going to update the database and then execute it every N seconds or I should prepare it each time? I am asking this because I don't know what happens when a statement is prepared so having a prepared statement in a persistent connection may not be a good idea.
Using InnoDB, is there any need to create a separate MySQL server solely for this purpose or I could use the server that is used for the site. Not really sure if those updates every N seconds would affect anything.

Related

Amazon RDS MySQL/Aurora query sometimes hangs forever. Any 2 cents on the metrics and approaches we can triage it and prevent it from happening?

Just some contexts: In our old data pipeline system, we are running MySQL 5.6. or Aurora on Amazon rds. Bad thing about our old data pipeline is running a lot of heavy computations on the database servers because we are handcuffed by what was designed: treating transactional databases as data warehouse and our backend API directly “fishing” the databases heavily in our old system. We are currently patching this old data pipeline, while re-design the new data warehouse in SnowFlake.
In our old data pipeline system, the data pipeline calculation is a series of sequential MySQL queries. As our data grows bigger and bigger in the old data pipeline, what the problem now is the calculation might just hang forever at, for example, the step 3 MySQL query, while all metrics in Amazon CloudWatch/ grafana we are monitoring (CPU, database connections, freeable memory, network throughput, swap usages, read latency, available storage, write latency, etc. ) looks normal. The MySQL slow query log is not really helpful here because each of our query in the data pipeline is essentially slow anyway (can takes hours to run a query because the old data pipeline is running a lot of heavy computations on the database servers). The way we usually solve these problems is to “blindly” upgrade the MySQL/Aurora Amazon rds service and hoping it will solve the issue. I am wondering
(1) What are the recommended database metrics in MySQL 5.6. or Aurora on Amazon rds we should monitor real-time to help us identify why a query freezes forever? Like innodb_buffer_pool_size?
(2) Is there any existing tool and/or in-house approach where we can predict how many hardware resources we need before we can confidently execute a query and know it will succeed? Could someone share some 2 cents?
One thought: Since Amazon rds sometimes is a bit blackbox, one possible way is to host our own MySQL server on an Amazon EC2 instance in parallel to our Amazon MySQL 5.6/Aurora rds production server, so we can ssh into MySQL server and run a lot of command tools like mytop (https://www.tecmint.com/mysql-performance-monitoring/) to gather a lot more real time MySQL metrics which can help us triage the issue. Open to any 2 cents from gurus. Thank you!
None of the tools mentioned at that link should need to run on the database server itself, and to the extent that this is true, there should be no difference in their behavior if they aren't. Run them on any Linux server, giving the appropriate --host and --user and --password arguments (in whatever form they may expect). Even mysqladmin works remotely. Most of the MySQL command line tools do (such as the mysql cli, mysqldump, mysqlbinlog, and even mysqlcheck).
There is no magic coupling that most administrative utilities can gain by running on the same server as MySQL Server itself -- this is a common misconception but, in fact, even when running on the same machine, they still have to make a connection to the server, just like any other client. They may connect to the unix socket locally rather than using TCP, but it's still an ordinary client connection, and provides no extra capabilities.
It is also possible to run an external replica of an RDS/MySQL or Aurora/MySQL server on your own EC2 instance (or in your own data center, even). But this isn't likely to tell you a whole lot that you can't learn from the RDS metrics, particularly in light of the above. (Note also, that even replica servers acquire their replication streams using an ordinary client connection back to the master server.)
Avoid the temptation to tweak server parameters. On RDS, most of the defaults are quite sane, and unless you know specifically and precisely why you want to adjust a parameter... don't do it.
The most likely explanation for slow queries... is poorly written queries and/or poorly designed indexes.
If you are not familiar with EXPLAIN SELECT, then you need to learn it, live it, an love it. SQL is declarative, not procedural. That is, SQL tells the server what you want -- not specifically how to obtain it internall. For example: SELECT ... FROM x JOIN y tells the server to match up the rows from table x and y ON a certain criteria, but does not tell the server whether to read from x then find the matching rows in y... or read from y and find the matching rows in x. The net result is the same either way -- it doesn't matter which table the server examines first, internally -- but if the query or the indexes don't allow the server to correctly deduce the optimum path to the results you've requested, it can spend countless hours churning through unnecessary effort.
Take for an extreme and overly-simplified example, a table with millions of rows and table with 1 row. It would make sense to read the small table first, so you know what 1 value you're trying to join in the large table. It would make no sense to read throuh each row in the large table, then go over and check the small table for a match for each of the millions of rows. The order in which you join tables can be different than the order in which the actual joining is done.
And that's where EXPLAIN comes in. This allows you to inspect the query plan -- the strategy the internal query optimizer has concluded will get it to the answer you need with the least amount of effort. This is the core of the magic of relational database systems -- finding the correct solution in the optimal time, based on what it knows about the data. EXPLAIN shows you the order in which the tables are being accessed, how they're being joined, which indexes are being used, and an estimate of the number of rows from each table are involved -- and these numbers multiply together to give you an estimate of the number of permutations involved in resolving your query. Two small tables, each with 50,000 rows, joined without a proper index, means an entirely unreasonable 2,500,000,000 unique combinations between the two tables that must be evaluated; every row must be compared to every other row. In short, if this turns out to be the kind of thing that you are (unknowingly) asking the server to do, then you are definitely doing something wrong. Inspecting your query plan should be second nature any time you write a complex query, to ensure that the server is using a sensible strategy to resolve it.
The output is cryptic, but secret decoder rings are available.
https://dev.mysql.com/doc/refman/5.7/en/explain.html#explain-execution-plan

SQLite faster than MySQL?

I want to set up a teamspeak 3 server. I can choose between SQLite and MySQL as database. Well I usually tend to "do not use SQLite in production". But on the other hand, it's a teamspeak server. Well okay, just let me google this... I found this:
Speed
SQLite3 is much faster than MySQL database. It's because file database is always faster than unix socket. When I requested edit of channel it took about 0.5-1 sec on MySQL database (127.0.0.1) and almost instantly (0.1 sec) on SQLite 3. [...]
http://forum.teamspeak.com/showthread.php/77126-SQLite-vs-MySQL-Answer-is-here
I don't want to start a SQLite vs MySQL debate. I just want to ask: Is his argument even valid? I can't imagine it's true what he says. But unfortunately I'm not expert enough to answer this question myself.
Maybe TeamSpeak dev's have some major differences in their db architecture between SQLite and MySQL which explains a huge difference in speed (I can't imagine this).
At First Access Time will Appear Faster in SQLite
The access time for SQLite will appear faster at first instance, but this is with a small number of users online. SQLite uses a very simplistic access algorithm, its fast but does not handle concurrency.
As the database starts to grow, and the amount of simultaneous access it will start to suffer. The way servers handle multiple requests is completely different and way more complex and optimized for high concurrency. For example, SQLite will lock the whole table if an update is going on, and queue the orders.
RDBMS's Makes a lot of extra work that make them more Scalable
MySQL for example, even with a single user will create an access QUEUE, lock tables partially instead of allowing only single user-per time executions, and other pretty complex tasks in order to make sure the database is still accessible for any other simultaneous access.
This will make a single user connection slower, but pays off in the future, when 100's of users are online, and in this case, the simple
"LOCK THE WHOLE TABLE AND EXECUTE A SINGLE QUERY EACH TIME"
procedure of SQLite will hog the server.
SQLite is made for simplicity and Self Contained Database Applications.
If you are expecting to have 10 simultaneous access writing at the database at a time SQLite may perform well, but you won't want an 100 user application that constant writes and reads data to the database using SQLite. It wasn't designed for such scenario, and it will trash resources.
Considering your TeamSpeak scenario you are likely to be ok with SQLite, even for some business it is OK, some websites need databases that will be read only unless when adding new content.
For this kind of uses SQLite is a cheap, easy to implement, self contained, perfect solution that will get the job done.
The relevant difference is that SQLite uses a much simpler locking algorithm (a simple global database lock).
Using fine-grained locking (as MySQL and most other DB servers do) is much more complex, and slower if there is only a single database user, but required if you want to allow more concurrency.
I have not personally tested SQLite vs MySQL, but it is easy to find examples on the web that say the opposite (for instance). You do ask a question that is not quite so religious: is that argument valid?
First, the essence of the argument is somewhat specious. A Unix socket would be used to communicate to a database server. A "file database" seems to refer to the fact that communication is through a compiled-in interface. In the terminology of SQLite, it is server-less. Most databases store data in files, so the terminology "file database" is a little misleading.
Performance of a database involves multiple factors, such as:
Communication of query to the database.
Speed of compilation (ability to store pre-compiled queries is a plus here).
Speed of processing.
Ability to handle complex processing.
Compiler optimizations and execution engine algorithms.
Communication of results back to the application.
Having the interface be compiled-in affects the first and last of these. There is nothing that prevents a server-less database from excelling at the rest. However, database servers are typically millions of lines of code -- much larger than SQLite. A lot of this supports extra functionality. Some of it supports improved optimizations and better algorithms.
As with most performance questions, the answer is to test the systems yourself on your data in your environment. Being server-less is not an automatic performance gain. Having a server doesn't make a database "better". They are different applications designed for different optimization points.
In short:
For Local application databses, single user applications, and little simple projects keeping small data SQLite is winner.
For Network database applications, multiuser and concurrency, load balancing and growing data managements, security and roll based authentications, big projects and widely used services you should choose MySql.
In your question I do not know much about teamspeak servers and what kind of data it actually needs to keep in its database but if it just needs a local DBMS and not needs to proccess lots of concurrency and managements SQLite will be my choice.

Is it better to hold database connections?

I'm currently developing plugins for bukkit and a lot of them need a database connection. Now I'm thinking about if could be better to have just one plugin that handles the connection for all plugins.
The question behind that is if it is good or not to keep a connection up even if there are no queries for some minutes (that may happen). Otherwise I would need to establish a new connection for each query?
It is a good idea to have one class/plugin for handling database, but the connection state should not be open all the time,make sure the connection is opened only for the time taken by the query.
Many applications use connection pools to have a number of connections readily available to run queries over. It reduces the number of protocol re-negotiations that the database driver has to do. This is especially useful for applications that need fast access times to the underlying data, yet have larger downtimes between requests. E-Commerce applications like webshops are a good example.

Prevent 'too many connections'(ConnectionPool is not the answer, looking for mysql server side solution)

A few weeks ago, I post a question about queuing database access request to prevent 'too many connection' error when massive concurrent db requests happen. People told me ConnectionPool is the right way to go which I agreed at that time. However, I finally realized this is not the solution especially when there are a lot of different clients accessing mysql server through network, because connection pool is at client side it can not prevent the sum of connections of all clients from exceeding the max connection number of mysql server.
I think there should be some middleware on the mysql server working as a queue or pool, is anybody familiar with this? Thank you.
I know this question is widely asked, I am also surprised as if there is no total solution for it.
HAProxy should perform TCP-level queueing for you purpose. Though, would it be better to build an application server in the middle, to handle incoming flow at more conscious level than TCP. This could require rewriting of both server and clients, but could give you more control over what's happening.
What you ask is actually a pretty complicated problem.
First of all you need to decide whether mis-alignments in data are acceptable, for example: if you store in the database the number of Likes received, and you ask this number at 12:00:00, and the number in the DB is 500, and someone posts a LIKE at 12:00:01, and you query it again at 12:00:02; is it OK to receive "500" again, even if the correct number should be 501, provided that in a little time the answer "501" does come out?
If this is acceptable (the infamous "301 bug" in YouTube), then you might start caching some SELECT responses.
You might even cache them in middleware, i.e. have a special process running continuously and hogging ONE connection to MySQL, and answering requests in a queue. You might run it internally in the server as a Web server on port 8001 and have an Apache ReverseProxy, HAproxy, pound, or NginX location to proxy it outside.
You can do the same for special UPDATE/DELETE queries even if it's trickier.
It would be best to cache queries running asynchronously through AJAX first, if any, because serializing queries with a proxy is liable to perceptibly slow down the application.
You have a threefold target:
run queries on MySQL as fast as possible (look into indexing and MySQL caching) in order to free the ConnectionPool and keep it as lightly loaded as possible.
refactor the application in order to extract all information from queries (e.g., the number of rows with a certain property AND those rows as data are often retrieved using TWO queries, but with proper management you need only one and a SQLNumRows() call. Also, quite often similar queries with different informations are run, when a single query might have returned all information at one go: typically, one query to check user/password, another to fetch the complete user profile).
divert the most calls possible to something not at all (NginX, middleware) or lightly (queuing process) bound to MySQL; in the latter case, using a known number of connections in order to run predictably.
Unfortunately there's no easy "magic bullet" to solve this problem (except of course increasing the number of connections, maybe replicating the DB on several hosts running as master-slave. While not really a magic bullet, it is easier to design and implement).

How are "simultaneous client connections" quantified in mysql

Sorry for the newb factor, but I was reading about "Too many connections" to mysql.
http://dev.mysql.com/doc/refman/5.5/en/too-many-connections.html
How are "simultaneous client connections" quantified in mysql?
For example if 20 million people are on gmail (let's say they use mysql with only 1 table to store everything just for sake of example) and all those people simultaneously all click on an email to open up, does that mean there are 20 million simultaneous connections or just one connection since all the users are connecting to the same table?
EDIT: I'm trying to understand what the term 'client' means. Is a 'client' someone who is using the application, or is a 'client' the part of the application (ex. php script) that is connecting to the database?
When a visitor goes to your website and the server-side script connects to the database it is 1 connection - you can make as many queries as necessary during that connection to any number of tables/databases - and on termination of the script the connection ends. If 31 people request a page (and hence a db connection) and your limit is 30, then the 31st person will get an error.
You can upgrade server hardware so MySQL can efficiently handle loads of connections or spread the load across multiple database servers. It is possible to have your server-side scripting environment maintain a persistent connection to MySQL in which case all scripts make queries through that single connection. This will probably have adverse effects on the correct queuing of queries and their order to maintain usable speeds under high load, and ultimately doesn't solve the CPU/memory/disk bottlenecks with handling large numbers of queries.
In the case of a webmail application, the query to check for new messages runs so fast (in the milliseconds) that hitting server limits isn't likely unless it's on a large scale.
Google's applications scale on a level previously unheard of. Check out the docs on MapReduce, GoogleFS, etc. It's awesome.
In answer to your edit - anything that connects directly to MySQL is considered a client in this case. Each PHP script that connects to MySQL is a client, as is the MySQL console on the command line, or anything else.
Hope that helps
The connections mentioned are server connection. Every client has one or more. For example if your php script connects mysql, there may be more web requests at a time and thus more connections to db.
Sometimes you can ran out of them, because they are not closed properly after they become useless.
And I thing Gmail is stored different way than in one large mysql db :]