Gatling: live users, "Remotely closed" / "Connection timeout" during stress testing

I'm trying to execute:
scn.inject(constantUsersPerSec(300) during (300 seconds) randomized)
When running, the number of active users grows quickly to around 2000, and then connections either time out or are 'remotely closed'.
A few questions:
Why, even though I inject 300 users per second, do my 'active users' grow to 2k?
I'm testing on a local environment, mostly to test MySQL database performance, but I guess it fails because of system limits and/or configuration.
Is there anything I can do about it?

constantUsersPerSec controls the arrival rate (an open workload model), not the number of concurrent users. If your users don't terminate fast enough (long test journey, long response times), you'll accumulate 2k concurrent users within about 7 seconds (2000 / 300 per second ≈ 7 s).

Related

MySQL 8: why is the initial response to a query slow, then speeds up thereafter?

After several minutes of inactivity (no use of the website), MySQL 8 slows right down. An initial query after inactivity can take a minute, but thereafter only seconds. The same query (like logging in) would take a second or two if there was already activity on the server.
Has anyone encountered this, or does anyone know how to correct this behavior? The machine itself has a significant amount of resources; it's just the first "warm up" call that is slow.

How to diagnose extremely slow AWS RDS MySQL Performance?

My DB has around 15 tables, each with 40 columns and 10,000 rows.
Most of the columns are VARCHAR, with some indexes and foreign keys.
Sometimes I need to reconstruct my database (design flaw, working on it), which takes about 40 seconds locally. Now I'm trying to do the same on an AWS RDS MySQL 5.7 instance, but it takes forever, something like 40-50 minutes. The last time I had to do this same process it took no more than 5 minutes; still way more than the local 40 seconds, but I was happy with it.
My internet speed is about 35 Mbps download / 5 Mbps upload.
I know it's not fast, but it's consistent, and it hasn't changed since my last rebuild.
I enabled General Logs, but all I can see are the INSERT queries and occasionally some "SELECT 1".
I do have some room for improvement in my code, but still, going from about 40 seconds to 50 minutes suggests that something else is going on.
Any ideas on how to diagnose and find the bottleneck?
Thanks
--
Additional relevant information:
It is a micro instance from AWS, and all of the relevant monitoring indicators are basically flat: CPU at 4%, Free Storage Space at 20,000 MB, Freeable Memory at 200 MB, Write IOPS at around 2.5. The server runs MySQL 5.7.25 with 1 vCPU, 1 GB of RAM, and 20 GB of SSD. This is the same as 3 months ago when I last rebuilt the database.
SHOW GLOBAL STATUS: https://pastebin.com/jSrAzYZP
SHOW GLOBAL VARIABLES: https://pastebin.com/YxD7dVhR
SHOW ENGINE INNODB STATUS: https://pastebin.com/r5wffB5t
SHOW PROCESSLIST: https://pastebin.com/kWwiyGwf
SELECT * FROM information_schema...: https://pastebin.com/eXGBmetP
I haven't made any big changes to the server configuration, except enabling logs, maxing out max_allowed_packet, and saving logs to file.
In my backend I have a Flask app running; when it receives the API call, it takes a bunch of pickled objects and adds them all to the database (appending the Flask-SQLAlchemy objects to a list and then running db.session.add_all(entries)), trying to run a bulk operation. The code is the same for both localhost and my remote server.
It does get slower on three specific tables, most of them with VARCHAR columns, but nothing different from my last inserts. It seems odd that the problem would be the data or the way the code is structured; at least it doesn't seem reasonable that this would result in going from 20 seconds (localhost) to 40 minutes (hosted server), especially when the rest of the tables work mostly the same.
Enable the slow log, set long_query_time=0, run your code, then put the resulting log through mysqldumpslow.
Establish which queries contribute most to slowness and take it from there.
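For example (a minimal sketch; on RDS these variables are usually set through the parameter group, and the log path placeholder below is whatever slow_query_log_file points at):

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0;  -- log every query, not just slow ones
SET GLOBAL log_output = 'FILE';  -- mysqldumpslow reads the file format
-- run the workload, then summarize the log, e.g. top 10 queries by total time:
-- mysqldumpslow -s t -t 10 /path/to/slow-query.log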
Compare the config between your old server and your new one.
Also, are they the same version of MySQL? 5.6, 5.7 and 8.0 can produce very different execution plans (with 5.6 usually coming up with the sane one if they differ).
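One quick way to compare plans directly is to run the same statement through EXPLAIN on each server (the query below is a placeholder; substitute one of your slow ones):

EXPLAIN FORMAT=JSON
SELECT * FROM some_table WHERE some_col = 'x';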
Rate Per Second = RPS
Suggestions to consider for your AWS RDS parameter group:
thread_cache_size=24 # from 8 to reduce threads_created count
innodb_io_capacity=1900 # from 200 to enable more use of SSD IOPS capacity
read_rnd_buffer_size=128K # from 512K to reduce handler_read_rnd_next RPS of 21
query_cache_size=0 # from 1M, since you have the QC turned off with query_cache_type=OFF
Determine why Com_flush is running 13 times per hour and get it stopped, to avoid table-open thrashing.
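If you want to see the counters these suggestions are based on before changing anything, they are standard MySQL status variables:

SHOW GLOBAL STATUS LIKE 'Threads_created';        -- high and rising suggests raising thread_cache_size
SHOW GLOBAL STATUS LIKE 'Handler_read_rnd_next';  -- the RPS figure cited above
SHOW GLOBAL STATUS LIKE 'Com_flush';              -- should stay near zero between deliberate FLUSHes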
I found that after migrating to RDS, all my database indexes were gone! They weren't migrated along with the schema and data. Make sure your indexes are there.
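A quick way to check (a sketch; 'your_schema' is a placeholder for your database name):

SELECT table_name, index_name,
       GROUP_CONCAT(column_name ORDER BY seq_in_index) AS index_columns
FROM information_schema.statistics
WHERE table_schema = 'your_schema'
GROUP BY table_name, index_name;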
Also, the MySQL query cache is OFF by default in RDS. This won't help the performance of your initial query, but it may speed things up in general.
You can set query_cache_type to 1 and define a value for query_cache_size. I also changed thread_cache_size from 8 to 24 and innodb_io_capacity from 200 to 1900; I don't know if that helps you.
Also, creating AWS DB parameter groups helped me a lot with configuring and tuning DB variables. You can read more here:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithParamGroups.html

Optimizing a MySQL database

I am running a Flask app with a MySQL DB.
I have 400,000 records in a table, and there is a query (an INSERT with SELECT) that takes around 3 seconds when run alone.
But when I load test it by hitting the API with multiple concurrent requests (20, 50, and 100 hits at a time), the responses for all requests arrive at once. For example, if 100 concurrent requests take around 3 minutes in total, all of those individual requests start immediately but only return after 3 minutes (instead of 3 or 4 seconds each).
I also tried a 1 GB RAM server, a 4 GB RAM server, and a 32 GB RAM server with 16 CPUs. Here are the results:
# 4 GB RAM, 2-CPU server with only MySQL installed on it
Total time: 0:05:29.752275 (all 100 requests got their responses after 5 minutes total, not 3 or 4 seconds)
successful: 89
Failed: 11
Tried: 100
# 32 GB RAM, 16-CPU server with only MySQL installed on it
Total time: 0:05:17.119773 (all 100 requests got their responses after 5 minutes total, not 3 or 4 seconds)
successful: 86
Failed: 14
Tried: 100
As you can see, the 4 GB and 32 GB servers show almost no difference in performance, so it seems like something is wrong with my setup/configuration/query.
More details:
I ran another script where I hit the DB directly, without going through the API. This way, even though the server has 4 GB RAM, the MySQL server dies (segmentation fault) with just 3 concurrent requests.
But when I added a 0.2, 0.3, or 0.5 millisecond delay between each hit/thread, the results were slightly more meaningful: with a 0.5 ms delay between hits, each request completed in less than 10 seconds.
Can I do anything so my server returns responses quickly for at least 100 concurrent requests without any gap between them (and is that even necessary)?
Any thoughts on what to do here?
I think the root cause is Flask. Flask is not good at multi-process/multi-threaded serving at all.
I met this problem before, then changed to Tornado and used supervisord to keep Tornado running as a daemon.
Another solution is Gunicorn: https://intellipaat.com/community/12737/how-to-run-flask-with-gunicorn-in-multithreaded-mode
Simply put, "at least 100 concurrent requests without any gap" is not realistic. Users go to the client, which connects to the database, which takes queries rapidly, but not truly simultaneously. That is, in real life queries rarely start simultaneously.
Also, if you set MySQL's max_connections and/or the corresponding setting in the client too high, you are asking for the "thundering herd" syndrome. It's like being in an overcrowded grocery store where you can't move your cart because all the space is taken.
More specifically, 16 CPUs will stumble over each other vying for resources when you throw 100 queries into the mix "concurrently".
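If you want to see how close you are to that situation, both numbers are standard MySQL variables:

SHOW GLOBAL VARIABLES LIKE 'max_connections';    -- the configured ceiling
SHOW GLOBAL STATUS LIKE 'Max_used_connections';  -- the high-water mark actually reached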
As for inserting a lot of rows, there are several techniques (see the sketch after this list):
LOAD DATA is very fast.
"Batched INSERT" is fast. This is where a single INSERT carries many rows; I often see a 10x speedup with 100 rows at a time versus single-row inserts.
BEGIN...COMMIT around a bunch of single-row inserts avoids some of the per-transaction overhead.
Avoid UNIQUE indexes (other than the PRIMARY KEY) on the table you are loading.
Ping-ponging staging tables: http://mysql.rjweb.org/doc.php/staging_table -- this allows multiple clients to feed data in rapidly.
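A minimal sketch of the batched-INSERT and BEGIN...COMMIT techniques, using a hypothetical table t:

-- one statement, many rows:
INSERT INTO t (a, b) VALUES (1, 'x'), (2, 'y'), (3, 'z');

-- one transaction around several single-row inserts, one log flush at COMMIT:
BEGIN;
INSERT INTO t (a, b) VALUES (4, 'p');
INSERT INTO t (a, b) VALUES (5, 'q');
COMMIT;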

IOPS or Throughput? - Determining Write Bottleneck in Amazon RDS Instance

We have nightly load jobs that write several hundred thousand records to a MySQL reporting database running in Amazon RDS.
The load jobs take several hours to complete, but I am having a hard time figuring out where the bottleneck is.
The instance currently runs with General Purpose (SSD) storage. Looking at the CloudWatch metrics, I am averaging less than 50 IOPS for the last week, while Network Receive Throughput is less than 0.2 MB/sec.
Is there any way to tell from this data whether I am bottlenecked by network latency (we are currently loading the data from a remote server; this will change eventually) or by write IOPS?
If IOPS is the bottleneck, I can easily upgrade to Provisioned IOPS. But if network latency is the issue, I will need to redesign our load jobs to load raw data from EC2 instances instead of our remote servers, which will take some time to implement.
Any advice is appreciated.
UPDATE:
More info about my instance: it is an m3.xlarge, provisioned with 500 GB of storage. The load jobs are done with the Pentaho ETL tool. They pull from multiple (remote) source databases and insert into the RDS instance using multiple threads.
You aren't using much CPU, and your free memory is very low; an instance with more memory should be a good win.
You're only doing 50-150 IOPS. That's low; you should get up to 3,000 in a burst on standard SSD (gp2) storage. However, if your volume is small, that is probably hurting you, since the gp2 baseline is 3 IOPS per GB (e.g., 500 GB gives a 1,500 IOPS baseline, while 50 GB gives only 150); if you are on a 50 GB or smaller volume, consider paying for provisioned IOPS.
You might also try Aurora; it speaks MySQL and supposedly has great performance.
If you can spread out your writes, the spikes will be smaller.
A very quick test is to buy provisioned IOPS, but be careful, as you may get fewer than you currently do during a burst.
Another quick way to find your bottleneck is to profile your job execution application with a profiler that understands your database driver. If you're using Java, JProfiler will show the characteristics of your job and its use of the database.
A third is to configure your database driver to print statistics about the database workload. This might reveal that you are issuing far more queries than you would expect.
When accessing the database remotely, your most likely culprit is actually round-trip latency. The impact is easy to overlook or underestimate.
If the remote database has, for example, a 75-millisecond round-trip time, you can't possibly execute more than 1000 ms / 75 ms per round trip ≈ 13.3 queries per second on a single connection. There's no getting around the laws of physics.
The spikes suggest inefficiency in the loading process: it gathers for a while, then loads for a while, then gathers, then loads.
Separate but related: if you don't have the MySQL client/server compression protocol enabled on the client side, find out how to enable it. (The server always supports compression, but the client has to request it during the initial connection handshake.) This won't fix the core problem, but it should improve the situation somewhat, since less data to physically transfer means less time wasted in transit.
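If you're unsure whether it's on, you can check from the session itself (status variable as of MySQL 5.x; 8.0.18+ reports Compression_algorithm instead):

SHOW SESSION STATUS LIKE 'Compression';  -- 'ON' means this connection negotiated the compressed protocol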
I'm not an RDS expert, and I don't know if my particular case sheds any light, but I hope it gives you some kind of insight.
I have a db.t1.micro with 200 GB provisioned (which gives me 600 IOPS of baseline performance) on General Purpose SSD storage.
The heaviest workload is when I aggregate thousands of records from a pool of around 2.5 million rows drawn from a 10-million-row table and another of 8 million rows. I do this every day. This is what I average (steady performance, unlike yours, where I see a pattern of spikes):
Write/ReadIOPS: +600 IOPS
NetworkTrafficReceived/Transmit throughput: < 3,000 Bytes/sec (my queries are relatively short)
Database connections: 15 (workers aggregating on parallel)
Queue depth: 7.5 counts
Read/Write Throughput: 10MB per second
The whole aggregation task takes around 3 hours.
Also check the "10 tips to improve the performance of your app in AWS" SlideShare deck from AWS Summit 2014.
I don't know what else to say since I'm not an expert! Good luck!
In my case it was the number of records. I was writing only 30 records per minute and had Write IOPS of roughly the same, 20 to 30, but this was eating at the CPU, which drained the CPU credits quite steeply. So I took all the data in that table, moved it to another "historic" table, and cleared the original.
CPU dropped back down to normal levels, but Write IOPS stayed about the same, which was fine. The problem: indexes. I think that because so many records needed to be indexed on insert, the indexing took far more CPU at that row count, even though the only index I had was the primary key.
The moral of my story: the problem is not always where you think it lies. Although I had increased Write IOPS, that was not the root cause; rather, the CPU spent on index maintenance during inserts is what caused the CPU credits to fall.
Not even X-Ray on the Lambda could catch the increased query time. That is when I started to look at the DB directly.
Your queue depth graph shows > 2, which indicates that the IOPS is under-provisioned. (If queue depth stays < 2, then IOPS is over-provisioned.)
I think you are using the default AUTOCOMMIT = 1 (autocommit mode). That performs a log flush to disk for every insert and exhausts the IOPS.
So for bulk inserts in MySQL it is better (for performance) to set AUTOCOMMIT = 0 and batch the work in transactions, so the inserts look like:
set AUTOCOMMIT = 0;
START TRANSACTION;
-- first 10000 recs
INSERT INTO SomeTable (column1, column2) VALUES (vala1,valb1),(vala2,valb2) ... (vala10000,valb10000);
COMMIT;
-- next 10000 recs
START TRANSACTION;
INSERT INTO SomeTable (column1, column2) VALUES (vala10001,valb10001),(vala10002,valb10002) ... (vala20000,valb20000);
COMMIT;
-- next 10000 recs
.
.
.
set AUTOCOMMIT = 1;
Using the above approach on a t2.micro, I inserted 300,000 records in 15 minutes using PHP.

MySQL Connection Limit Advice

I've hit a problem with my MySQL queries and was hoping someone could offer some help/advice.
I'm developing a PHP-based system which combines quite a lot of data in different tabs on one page (tab 1 = profile, tab 2 = address, tab 3 = payments, etc.), and as a result one page can run up to 34-40 MySQL queries pulling from different tables or with different criteria.
The page load became really slow, and I asked my web host if they knew what was wrong. They advised that it's because of slow MySQL queries (some over 2 seconds). They also said that my MySQL user is only allowed 15 connections at a time.
If my page has 40 queries and only 15 connections are allowed at a time, does this mean they effectively queue and wait for one to complete? If that's the case, then I can understand why the page takes a while to load, but I'm not sure of the solution. Are 15 MySQL connections considered a lot, or is this quite a tight restriction by my host (HostMonster)?
Also, if there were 15 users accessing the system at the same time, would those 15 connections be split between them, or is it 15 connections per user logged into the site? I assume they mean per database user, but everyone who accesses the system will be using the same database user, so it seems impossible to build a system that several users can access at once.
The whole connections thing has confused me a little.
Thanks in advance for any help!
Have one connection per page and run the queries in sequence.
Optimize those queries: use EXPLAIN and add indexes.
Perhaps combine queries to reduce the number of round trips (see the sketch below).
BTW, 10+ queries per page is excessive IMHO.
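As a hypothetical sketch of that last point (all table and column names invented): instead of separate SELECTs for the profile, address, and payments tabs, fetch them in one round trip:

SELECT p.name, a.street, pay.last_paid_at
FROM profile p
JOIN address a ON a.user_id = p.user_id
LEFT JOIN payments pay ON pay.user_id = p.user_id
WHERE p.user_id = 42;  -- placeholder user id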
If you're hitting the max connect errors problem, use this command from the command line:
mysqladmin flush-hosts -u user -p'password'
This flushes the hosts that MySQL has recorded and rebuilds the list. In newer versions (MySQL 5.6) you get more information about this, but not in previous versions.
You can set the following to stop the message from appearing again:
max_connect_errors = 10000
15 queries or connections is not a problem at all; on busy databases we have seen thousands of connections and tens of thousands of queries per second.