MySQL Scaling on GCP

I created an instance (8 cores) of MySQL on GCP, with a simple database in it. When I run a load of 40,000+ concurrent users (1,500 req/sec), the response times come out very high (10+ seconds). However, I can see the hardware CPU utilization is only at 15% or so.
What can I do to get the response time in msec?
Cheers!
Deepak

Imagine cramming 40,000 shoppers into a grocery store. How many hours would it take for a shopper to buy just one carton of milk?
Seriously, there is a limit to how many connections can be made to any device. Database servers will top out at a few hundred. After that, latency will suffer severely as all the connections wait their turn at various shared resources.
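If you want to see where your own instance stands, a quick check from the MySQL prompt (assuming you have an account with enough privileges) is:
mysql> SHOW VARIABLES LIKE 'max_connections';
mysql> SHOW GLOBAL STATUS LIKE 'Max_used_connections';
mysql> SHOW GLOBAL STATUS LIKE 'Threads_running';
max_connections defaults to 151 on recent versions; raising it into the thousands does not make the hardware any faster - it just lets more connections queue up behind the same CPUs and disks.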
Another approach
Let's say these are achievable:
10ms to connect, fetch info for a page, and disconnect.
1500 pages built per second. (By the way, make sure the web server can achieve this.)
15 concurrent connections, each running for 10 ms. That equals 1500 pages per second.
1500 pages per second = 90000 pages per minute.
So, let's specify "40,000 pages delivered to different (or the same) users in one minute". I suggest that will be easy, and it won't require much more than 15 concurrent connections. (Traffic is never smooth [except in a benchmark], so 50 concurrent connections may happen.)
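Put as a formula (this is just Little's Law restated with the numbers above):
busy connections ≈ throughput × time per request = 1500 req/sec × 0.010 sec ≈ 15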

[thousands of database servers is] where I would like to go eventually... however, I need to solve a basic problem of mine, which I have posted above!
Right, you have to expand the number of database servers now if you are serving 40,000 concurrent queries. Not eventually.
But let's be clear about what comprises concurrent users. Here's an example:
mysql> show global status like 'threads%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_connected | 1266  |
| Threads_running   | 9     |
+-------------------+-------+
I've analyzed high-scale production sites for dozens of internet companies. It's typical to see hundreds or thousands of concurrent connections, but few of these are executing an SQL query at any given moment. When a given thread is between executing queries, you can still see it in SHOW PROCESSLIST, but it is only doing "Sleep".
This is fine, and it's normal.
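To illustrate the breakdown, here is one way to count sessions by state (the numbers below are made up to match the example above, not real output):
mysql> SELECT command, COUNT(*) FROM information_schema.processlist GROUP BY command;
+---------+----------+
| command | COUNT(*) |
+---------+----------+
| Sleep   |     1257 |
| Query   |        9 |
+---------+----------+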
I'd give the analogy of an ssh session: you may be connected to a shell on a Linux server, but if you're doing nothing, just sitting at a shell prompt, you aren't taxing the server's resources much. You could have hundreds of users connected with ssh to the same server at once. But if they all begin running applications at the same time, you're in trouble. The server might not handle that load well; at the least, all of the users will experience slow performance.
It's the same with a MySQL database. If you need a server that can support 40,000 Threads_running, then you need to spread that load over many MySQL servers. There isn't any single server that exists today that can handle that.
But you might mean something different when you say 40,000 concurrent users. It might be that you have 40,000 users who are looking at some page on your website at the same time. But that's not resulting in continuous SQL queries in 40,000 database sessions all at the same time. Each person spends some time reading the web page they just loaded, scrolling up and down, and perhaps typing into a form. While they are doing that, the website is waiting for their next request, and the web server and database server are not doing any work for that user while they're waiting. They can do some work for other users.
In this way, a single database server can support 40,000 (or more) users who are by some definition using the site, even though only a handful are invoking any code to run SQL queries at any given moment.
This is normal and most websites can handle that traffic with no problems.
If that's the reality of your application, and you still have problems scaling it, then you might have inefficient application code or unoptimized SQL queries. That is, the website could serve the requests easily if you wrote the code to be more efficient.
Inefficient code cannot be fixed by changing your server. The cost of inefficient code scales up faster than you can hope to handle it by upgrading the server. So you must solve performance problems by writing better code.
The subject of scalable internet architecture is very complex. You need to do a lot of study and a lot of testing to grow a website and make it scalable.
You can start by reading. My favorite is Theo Schlossnagle's book Scalable Internet Architectures. Here is a video of Theo speaking about the same subject: https://www.youtube.com/watch?v=2WuT2rdLK5A
The book is from quite a few years ago. Perhaps the scale websites need to support is greater than it was back then, but the methods of achieving scalability are the same today.
Test
Identify bottlenecks
Rearchitect your web app code to relieve those bottlenecks
Test again

Related

How much data can MySQL hold before having issues?

This is a bit of a general question, but I recently started paying for a web host for my website. Once I noticed that the MySQL servers are shared (checking phpMyAdmin, the server has sent 9GB of data over 15k requests in the past hour alone, and has been running for 39 days without a restart) and that I only have admin control over my own databases, I started wondering:
How big can a MySQL database become before having issues (say, long delays, errors, crashes, etc.)? Does anyone have experience with that?
Mind you, mine runs on MySQL 5.7.
The load you describe should be easy for MySQL on a reasonably powerful server. But it depends on what the queries are, and how well you have optimized.
At my last DBA job, we "gently encouraged" the developers (i.e. alerted them with PagerDuty) if the database size grew over 1TB, or if a single table grew over 512GB, or the queries per second rose over 50k. That's when things started to go south for a typical application workload, if the database and queries were not designed by developers who were especially mindful about optimization. Also the servers were pretty powerful hardware, circa 2020 (48 core Xeon, 256GB RAM, 3TB NVMe storage typically with 2 or 3 physical devices in a RAID).
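If you want to keep an eye on sizes like those yourself, a rough check against information_schema looks something like this ('your_db' is a placeholder; the figures are InnoDB estimates, not exact):
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 1) AS total_gb
FROM information_schema.tables
GROUP BY table_schema;

SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 1) AS table_gb
FROM information_schema.tables
WHERE table_schema = 'your_db'
ORDER BY table_gb DESC
LIMIT 10;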
With more care, it's possible to increase the scale of MySQL a lot higher. With variants like PlanetScale, it can support even more. Cf. https://planetscale.com/blog/one-million-queries-per-second-with-mysql
On the other hand, I saw a photo of one of our NVMe drives that had literally melted from serving 6k writes per second on a database with half a TB of data. It depends on the types of queries (or perhaps it was simply a faulty NVMe drive).
The only answer that could be useful to you is:
You must load-test your queries on your server.
Also, besides query performance (the only thing most developers care about), you also have to consider database operations. I supported some databases that grew without bound, despite my team's urging the developers to split them up. The result was it would take more than 24 hours to make a backup, which is a problem if there's a policy requirement that backups must be made daily. Also if they wanted to alter a table (e.g. add a column), it could take up to 4 weeks to run.

SQL Server back-end with MS Access 2007 front end. Number of users

I have an MS Access 2007 .mdb which is connected to a SQL Server back end. The tables are linked using a system DSN.
I need to do a stress test on the system and I would like to know the maximum number of users who can use the system at the same time.
The Access .mdb file is run through WTS.
Thanks for the help
The max number of users is going to depend on how well the application is written, and on how good the network is between each user and the server.
You might have a great application, but if users are connecting to SQL Server over a slow network, then the application will be slow with 2 users, and it will be slow with 250 users.
If the network is good, and the application is well written to respect bandwidth requirements, then the application will likely run the SAME speed with 2, 10, 20 or 100 users.
And depending on how large and powerful the SQL Server box is, you can easily scale out to 500 users at the same time.
So this question is very difficult to answer. The network from the Access application to the SQL Server is an important factor.
And some applications perform poorly with 5 users and SQL Server, and thus such applications will perform even WORSE with 100 or 200 users.
So, how well does the application work with 5 users, and then with, say, 25? If it is written well, you will likely not notice the difference. On the other hand, if it is slow with 1 user, then it is downhill all the way from that point on as you add more users.
So it had better run REALLY well with one user if you are planning to scale out to many users.
So it is certainly possible to have 1,000 users at the same time without much effort. As noted, this depends on how well the application was designed with SQL in mind. The quality of the work the developers did will be the LARGEST factor in how many users you can scale out to. As noted, the capacity of the server and SQL Server will also determine the max number of users.
With a typical application that respects SQL Server, running 50 or 100 users should hardly make SQL Server break a sweat; it should be easily obtainable.
In fact for those 50 users, your HUGE resource HOG will be WTS.
Assuming you mean Windows Terminal Services, that setup requires HUGE resources, far more than your SQL Server will. This system will require much more attention and resources than SQL Server. As noted, if the application runs rather well with 1-2 users, then usually such an application will run easily with 25. If the application runs slow with only 1 or 2 users, then you are going to have scaling problems as you add more users.
At the end of the day, there are FAR FAR too many factors to give an answer without case-by-case knowledge of the server involved, the network bandwidth, the capacity of the WTS, and MOST important of all, how well the application was designed (this factor is #1).

Best Approach to Storing Mean Uptime Data

We have 500+ remote locations. Each location has a linux router which checks in to our management system (homemade using RoR3) every 15 minutes.
We need to log and calculate the mean uptime of each box's Internet connectivity.
Each router posts a request every 15 minutes to a script on the server. (Currently this just records the last checkin time and the uptime.)
If we want to plot the historical uptime of each box, what is the most efficient way to do this without clogging up our db?
500 boxes checking in every 15 minutes would (according to my calculations) result in 17,520,000 inserts per year. Quite a hefty amount of data that I don't think we need.
Could anyone help solve this riddle for us?
Why not take a look at RRDTool (see the Wikipedia entry)? It's just the tool for this kind of situation.
It works as a sort of round-robin, self-averaging database, and it's used in many logging applications for purposes similar to yours.
As an example take a look at Cacti which is a data-logging / network monitoring and graphing front-end app built around RRDTool (implemented in PHP).

How To Interpret Siege and or Apache Bench Results

We have a MySQL driven site that will occasionally get 100K users in the space of 48 hours, all logging into the site and making purchases.
We are attempting to simulate this kind of load using tools like Apache Bench and Siege.
While the key metric seems to me to be the number of concurrent users, and we've got our report results, we still feel like we're in the dark.
What I want to ask is: What kinds of things should we be testing to anticipate this kind of traffic?
50 concurrent users 1000 Times? 500 concurrent users 10 times?
We're looking at DB errors, apache timeouts, and response times. What else should we be looking at?
This is a vague question and I know there is no "right" answer, we're just looking for some general thoughts on how to determine what our infrastructure can realistically handle.
Thanks in advance!
Simultaneous users is certainly one of the key factors, especially as it applies to DB connection pools, etc. But you will also want to verify that the page rate (pages/sec) of your tests is in the range you expect. If the think time in your test cases is off by much, you can accidentally simulate a much higher (or lower) page rate than your real-world traffic. Think time is the amount of time the user spends between page requests - reading the page, filling out a form, etc.
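As a rough back-of-the-envelope relation (the numbers here are made up, not yours): simulated users ≈ page rate × (response time + think time). For example, 25 pages/sec with a 1-second response time and 30 seconds of think time implies about 25 × 31 ≈ 775 virtual users, whereas the same page rate with no think time needs only about 25 connections hammering the server.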
Depending on what other information you have on hand, this might help you calculate the number of simultaneous users to simulate:
Virtual User Calculators
The complete page load time seen by the end-user is usually the most important metric to evaluate system performance. You'll also want to look for failure rates on all transactions. You should also be on the lookout for transactions that never complete. Some testing tools do not report these very well, allowing simulated users to hang indefinitely when the server doesn't respond...and not reporting this condition. Look for tools that report the number of users waiting on a given page or transaction and the average amount of time those users are waiting.
As for the server-side metrics to look for, what other technologies is your app built on? You'll want to look at different things for a .NET app vs. a PHP app.
Lastly, we have found it very valuable to look at how the system responds to increasing load, rather than looking at just a single level of load. This article goes into more detail.
Ideally you are going to want to model usage at the level of individual users, but creating simulated concurrent sessions for 100k users is usually not easily accomplished.
The best source would be your logs: check them for the busiest hour and try to figure out a way to model that load level.
The database is usually a critical piece of infrastructure, so I would look at recording the number and length of lock waits as well as the number and duration of db statements.
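On MySQL, two hedged starting points for that (the status counters and performance_schema table are standard; the thresholds you alert on are up to you):
-- Cumulative InnoDB row-lock wait counters since the server started
SHOW GLOBAL STATUS LIKE 'Innodb_row_lock%';

-- Most expensive statement digests (requires performance_schema, on by default since 5.6)
SELECT digest_text,
       count_star,
       ROUND(avg_timer_wait / 1e12, 4) AS avg_seconds
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC
LIMIT 10;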
Another key item to look at is disk queue lengths.
Mostly the process is to look for slow responses, either across the whole site or for specific pages, and then home in on the cause.
The biggest problem with load testing is that it is quite hard to test your network, and if you have (as most public sites do) limited bandwidth through your ISP, that may create a performance issue that is not reflected in the load tests.

MySQL active connections at once, Windows Server

I have read every possible answer to this question and searched via Google in order to find the correct answer to the following question, but I am rather a novice and don't seem to get a clear understanding.
A lot I've read has to do with web servers, but I don't have a web server, but an intranet database.
I have a MySQL database on a Windows server at work.
I will have many users accessing this database constantly to perform simple queries and write new records back to it.
The read/write load will not be that heavy (chances are 50-100 users will do so at exactly the same time, even if 1,000s could be connected).
The GUI will be either via Excel forms and/or Access.
What I need to know is the maximum number of active connections I can have at any given time to the database.
I know I can change the number in MySQL Administrator; however, I really need to know what will actually work...
I don't want to allow 1,000 users if the system will only really handle 100 correctly (beyond that, although connected, the performance would be too slow, for example).
Any ideas or personal experiences will be appreciated.
This depends mainly on your server hardware (RAM, CPU, networking) and on the load from other processes if the server is not dedicated to the database. I don't think you will get an absolute answer, and the best way is to test.
I think something like 1000 should work OK, as long as you use a 64-bit MySQL server. With 32-bit, too many connections may create virtual memory pressure - each connection has its own thread, and every thread needs a stack, so the stack memory will reduce the possible size of the buffer pool and other buffers.
MySQL generally does not slow down if you have many idle connections; however, special commands that enumerate every connection, e.g. SHOW PROCESSLIST or KILL, will be somewhat slower.
If an idle connection stays idle for too long (idle time exceeds the wait_timeout parameter), it is dropped by the server. If this could happen in your scenario, you might want to increase wait_timeout (its default value is 8 hours).
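A minimal sketch of the two settings mentioned (the values are only examples; SET GLOBAL affects new connections and is lost on restart unless you also put the settings under [mysqld] in my.ini):
-- Raise the connection limit (the default is 151)
SET GLOBAL max_connections = 1000;

-- Keep idle connections alive for a day instead of the 8-hour default (value in seconds)
SET GLOBAL wait_timeout = 86400;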