Optimizing MySQL database - mysql

I am running a Flask app with a MySQL database.
I have 400,000 records in a table, and there is a query (an INSERT with SELECT) that takes around 3 seconds when run on its own.
But when I load test it by hitting the API with multiple requests at a time (20, 50 and 100 hits at a time), the responses for all requests come back at the same time. For example, if 100 concurrent requests take around 3 minutes in total, all of those individual requests start immediately but only return after 3 minutes (instead of 3 or 4 seconds each).
Also, I tried a 1 GB RAM server, a 4 GB RAM server, and a 32 GB RAM server with 16 CPUs. Here are the results:
# 4 GB RAM, 2 CPU server with only MySQL installed on it
Total time: 0:05:29.752275 (all 100 requests get a response after 5 minutes (the total time), not after 3 or 4 seconds)
Successful: 89
Failed: 11
Tried: 100
# 32 GB RAM, 16 CPU server with only MySQL installed on it
Total time: 0:05:17.119773 (all 100 requests get a response after 5 minutes (the total time), not after 3 or 4 seconds)
Successful: 86
Failed: 14
Tried: 100
So, as you can see, the 4 GB and 32 GB servers show almost no difference in performance. It seems like something is totally wrong with my setup/configuration/query.
More details:
I ran another script where I hit the database directly, without going through the API. This way, even though the server has 4 GB of RAM, the MySQL server dies (segmentation fault) with just 3 concurrent requests.
But when I added a delay of 0.2, 0.3 or 0.5 milliseconds between each hit/thread, the results were slightly more meaningful. Without the delay each request took the full total time, but with a 0.5 ms delay between hits each request completed in less than 10 seconds.
Can I do anything so my server returns responses quickly for at least 100 concurrent requests without any gap between requests (and is that even necessary)?
Any thoughts on what to do here?

I think the root cause is Flask. Flask's built-in server is not good at multi-process/multi-threaded serving at all.
I ran into this problem before, then changed to Tornado and used supervisord to keep Tornado running as a daemon.
Another solution is Gunicorn => https://intellipaat.com/community/12737/how-to-run-flask-with-gunicorn-in-multithreaded-mode

Simply put, "at least 100 concurrent requests without any gap" is not realistic. The user goes to the client, which connects to the database, which takes queries rapidly but not truly simultaneously. That is, in real life queries rarely start at exactly the same moment.
Also, if you have MySQL's max_connections (and/or the corresponding setting in the client) set too high, you are asking for the "thundering herd" syndrome. It's like being in an over-crowded grocery store where you can't move your cart because all the space is taken.
More specifically, 16 CPUs will stumble over each other vying for resources when you throw 100 queries at them "concurrently".
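If you want to see how close the server actually gets to that point, a few generic checks (nothing here is specific to your schema):
SHOW GLOBAL VARIABLES LIKE 'max_connections';   -- the ceiling the clients can hit
SHOW GLOBAL STATUS LIKE 'Threads_connected';    -- connections currently open
SHOW GLOBAL STATUS LIKE 'Threads_running';      -- queries actually executing right now
SHOW GLOBAL STATUS LIKE 'Max_used_connections'; -- high-water mark since startup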
As for inserting a lot of rows, there are several techniques (sketched below):
LOAD DATA is very fast.
"Batched INSERT" is fast. This is where a single INSERT carries many rows; I often see a 10x speedup with 100 rows at a time (versus single-row inserts).
BEGIN...COMMIT around a bunch of single-row inserts. This avoids some of the per-transaction overhead.
Avoid UNIQUE indexes (other than the PRIMARY KEY) on the table you are loading.
Ping-ponging staging tables: http://mysql.rjweb.org/doc.php/staging_table -- this allows multiple clients to feed data in rapidly.
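A minimal sketch of the first three techniques, using a made-up table t(col1, col2) and a file path purely for illustration:
-- Batched INSERT: many rows in a single statement
INSERT INTO t (col1, col2)
VALUES (1,'a'), (2,'b'), (3,'c');          -- typically 100-1000 rows per statement

-- BEGIN...COMMIT around several single-row INSERTs
START TRANSACTION;
INSERT INTO t (col1, col2) VALUES (4,'d');
INSERT INTO t (col1, col2) VALUES (5,'e');
COMMIT;

-- LOAD DATA from a CSV file (the file must be readable by the server,
-- or use LOAD DATA LOCAL INFILE from the client)
LOAD DATA INFILE '/tmp/rows.csv'
INTO TABLE t
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
(col1, col2);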

Related

How to diagnose extremely slow AWS RDS MySQL Performance?

My DB has around 15 tables, each with 40 columns and about 10,000 rows.
Most of the columns are VARCHAR; there are some indexes and foreign keys.
Sometimes I need to reconstruct my database (a design flaw, I'm working on it), which takes about 40 seconds locally. Now I'm trying to do the same on an AWS RDS MySQL 5.7 instance, but it takes forever, something like 40-50 minutes. The last time I had to do this same process it took no more than 5 minutes, still way more than the local 40 seconds, but I was happy with it.
My internet speed is at about 35 Mbps Download / 5 Mbps Upload.
I know it's not fast, but it's consistent, and it hasn't changed since my last rebuilt.
I enabled General Logs, but all I can see are the INSERT queries and occasionally some "SELECT 1".
I do have some room for improvement in my code, but still, going from about 40 seconds to 40-50 minutes suggests that something else is going on.
Any ideas on how to diagnose and find the bottleneck?
Thanks
--
Additional relevant information:
It is a Micro instance from AWS; all of the relevant monitoring indicators are basically flat: CPU at 4%, Free Storage Space at 20,000 MB, Freeable Memory at 200 MB, Write IOPS at around 2.5. The server runs MySQL 5.7.25 with 1 vCPU, 1 GB of RAM and 20 GB of SSD. This is the same as 3 months ago when I last rebuilt the database.
SHOW GLOBAL STATUS: https://pastebin.com/jSrAzYZP
SHOW GLOBAL VARIABLES: https://pastebin.com/YxD7dVhR
SHOW ENGINE INNODB STATUS: https://pastebin.com/r5wffB5t
SHOW PROCESS LIST: https://pastebin.com/kWwiyGwf
SELECT * FROM information_schema...: https://pastebin.com/eXGBmetP
I haven't made any big changes to the server configuration, except enabling logs, maxing out max_allowed_packet and saving logs to a file.
In my backend I have a Flask app running; when it receives the API call, it takes a bunch of pickled objects and adds them all to the database (appending the Flask-SQLAlchemy objects to a list and then running db.session.add_all(entries)), trying to run a bulk operation. The code is the same for both localhost and my remote server.
It does get slower in three specific tables, most of them with VARCHAR columns, but nothing different from my last inserts. It seems odd that the problem would be the data, or the way the code is structured; at least it doesn't seem reasonable that this would turn 20 seconds (localhost) into 40 minutes (hosted server), especially when the rest of the tables behave mostly the same.
Enable the slow log, set long_query_time=0, run your code, then put the resulting log through mysqldumpslow.
Establish which queries contribute most to the slowness and take it from there; a quick sketch of enabling this is below.
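For example, on a server where you can change globals directly (on RDS these settings go through the DB parameter group instead), something along these lines captures every query for later analysis:
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0;      -- log every statement, not just slow ones
SET GLOBAL log_output = 'FILE';
SHOW GLOBAL VARIABLES LIKE 'slow_query_log_file';   -- where the log is written
-- then, on the shell: mysqldumpslow -s t /path/to/slow-query.log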
Compare the config between your old server and your new one.
Also, are they the same version of MySQL? 5.6, 5.7 and 8.0 can produce very different execution plans (with 5.6 usually coming up with the sane one if they differ).
Rate Per Second = RPS
Suggestions to consider for your AWS RDS Parameter group (status checks to watch the effect are sketched below):
thread_cache_size=24 # from 8 to reduce threads_created count
innodb_io_capacity=1900 # from 200 to enable more use of SSD IOPS capacity
read_rnd_buffer_size=128K # from 512K to reduce handler_read_rnd_next RPS of 21
query_cache_size=0 # from 1M since you have the QC turned off with query_cache_type=OFF
Determine why com_flush is running 13 times per hour and get it stopped to avoid table open thrashing.
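To judge whether those changes pay off, watch the related counters before and after; these are generic checks, not specific to your server:
SHOW GLOBAL STATUS LIKE 'Threads_created';        -- should grow more slowly with a larger thread cache
SHOW GLOBAL STATUS LIKE 'Handler_read_rnd_next';  -- the RPS figure quoted above comes from this counter
SHOW GLOBAL STATUS LIKE 'Com_flush';              -- find out what is issuing FLUSH
SHOW GLOBAL VARIABLES LIKE 'innodb_io_capacity';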
I found that after migrating to RDS, all of my database indexes were gone! They weren't migrated along with the schema and data. Make sure your indexes are there.
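A quick way to verify is to compare the index lists between your local server and RDS; the table and schema names below are placeholders:
SHOW INDEX FROM your_table;
-- or list every index in a schema at once:
SELECT TABLE_NAME, INDEX_NAME, GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS cols
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'your_database'
GROUP BY TABLE_NAME, INDEX_NAME;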
Also, MySQL query cache is OFF by default in RDS. This won't help the performance of your initial query, but it may speed things up in general.
You can set query_cache_type to 1 and define a value for query_cache_size. I also changed thread_cache_size from 8 to 24 and innodb_io_capacity from 200 to 1900; I don't know if that helps you.
Also creating AWS DB Parameter Groups helped me a lot with configuring and tuning DB variables. Here you can read more:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithParamGroups.html

IOPS or Throughput? - Determining Write Bottleneck in Amazon RDS Instance

We have nightly load jobs that write several hundred thousand records to a MySQL reporting database running in Amazon RDS.
The load jobs are taking several hours to complete, but I am having a hard time figuring out where the bottleneck is.
The instance is currently running with General Purpose (SSD) storage. Looking at the CloudWatch metrics, it appears I am averaging less than 50 IOPS for the last week. However, Network Receive Throughput is less than 0.2 MB/sec.
Is there any way to tell from this data whether I am being bottlenecked by network latency (we are currently loading the data from a remote server... this will change eventually) or by Write IOPS?
If IOPS is the bottleneck, I can easily upgrade to Provisioned IOPS. But if network latency is the issue, I will need to redesign our load jobs to load raw data from EC2 instances instead of our remote servers, which will take some time to implement.
Any advice is appreciated.
UPDATE:
More info about my instance: I am using an m3.xlarge instance, provisioned with 500 GB of storage. The load jobs are done with the ETL tool from Pentaho. They pull from multiple (remote) source databases and insert into the RDS instance using multiple threads.
You aren't using up much CPU. Your memory is very low. An instance with more memory should be a good win.
You're only doing 50-150 IOPS. That's low; you should get 3,000 in a burst on standard SSD-backed storage. However, if your database is small, that is probably hurting you, since you get 3 IOPS per GB of allocated storage. So if you are on a 50 GB or smaller volume, consider paying for Provisioned IOPS.
You might also try Aurora; it speaks MySQL and supposedly has great performance.
If you can spread out your writes, the spikes will be smaller.
A very quick test is to buy Provisioned IOPS, but be careful, as you may get fewer IOPS than you currently do during a burst.
Another quick way to determine your bottleneck is to profile your job execution application with a profiler that understands your database driver. If you're using Java, JProfiler will show the characteristics of your job and its use of the database.
A third is to configure your database driver to print statistics about the database workload. This might reveal that you are issuing far more queries than you would expect.
Your most likely culprit when accessing the database remotely is actually round-trip latency. The impact is easy to overlook or underestimate.
If the remote database has, for example, a 75 millisecond round-trip time, you can't possibly execute more than 1000 (ms/second) / 75 (ms/round trip) = 13.3 queries per second over a single connection. There's no getting around the laws of physics.
The spikes suggest inefficiency in the loading process, where it gathers for a while, then loads for a while, then gathers for a while, then loads for a while.
Separately but relatedly: if you don't have the MySQL client/server compression protocol enabled on the client side, find out how to enable it. (The server always supports compression, but the client has to request it during the initial connection handshake.) This won't fix the core problem, but it should improve the situation somewhat, since less data to physically transfer means less time wasted in transit.
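How you request compression depends on the client library (for example, the mysql command-line client has a --compress flag); once connected, you can check whether it was actually negotiated:
SHOW SESSION STATUS LIKE 'Compression';   -- Value = ON means the compressed protocol is active for this session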
I'm not an RDS expert and I don't know if my own particular case can shed some light, but anyway, I hope this gives you some kind of insight.
I have a db.t1.micro with 200 GB provisioned (which gives me 600 IOPS of baseline performance) on General Purpose SSD storage.
The heaviest workload is when I aggregate thousands of records from a pool of around 2.5 million rows drawn from a 10-million-row table and another table of 8 million rows. I do this every day. This is what I average (it is steady performance, unlike yours, where I see a pattern of spikes):
Write/ReadIOPS: +600 IOPS
NetworkTrafficReceived/Transmit throughput: < 3,000 Bytes/sec (my queries are relatively short)
Database connections: 15 (workers aggregating on parallel)
Queue depth: 7.5 counts
Read/Write Throughput: 10MB per second
The whole aggregation task takes around 3 hours.
Also check the "10 tips to improve the performance of your app in AWS" SlideShare deck from AWS Summit 2014.
I don't know what else to say since I'm not an expert! Good luck!
In my case it was the number of records. I was writing only 30 records per minute and had Write IOPS of around the same, 20 to 30, but this was eating at the CPU, which reduced the CPU credits quite steeply. So I took all the data in that table, moved it to another "historic" table, and cleared all the data from the original table.
CPU dropped back down to normal levels, but Write IOPS stayed about the same, which was fine. The problem: indexes. I think that because so many records needed to be indexed when inserting, it took far more CPU to do that indexing with that number of rows, even though the only index I had was the primary key.
Moral of my story: the problem is not always where you think it lies. Although I had increased Write IOPS, that was not the root cause of the problem; it was the CPU being used to do the index maintenance during inserts, which caused the CPU credits to fall.
Not even X-Ray on the Lambda could catch the increased query time. That is when I started to look at the DB directly.
Your Queue Depth graph shows > 2, which clearly indicates that the IOPS are under-provisioned. (If Queue Depth were < 2, the IOPS would be over-provisioned.)
I think you are using the default AUTOCOMMIT = 1 (autocommit mode). That performs a log flush to disk for every insert and exhausts the IOPS.
So, for performance, it is better to use AUTOCOMMIT = 0 before bulk inserts of data in MySQL, with the inserts batched like this:
SET AUTOCOMMIT = 0;
START TRANSACTION;
-- first 10000 recs
INSERT INTO SomeTable (column1, column2) VALUES (vala1,valb1),(vala2,valb2) ... (vala10000,valb10000);
COMMIT;
-- next 10000 recs
START TRANSACTION;
INSERT INTO SomeTable (column1, column2) VALUES (vala10001,valb10001),(vala10002,valb10002) ... (vala20000,valb20000);
COMMIT;
-- next 10000 recs
.
.
.
SET AUTOCOMMIT = 1;
Using the above approach on a t2.micro, I inserted 300,000 records in 15 minutes using PHP.

Set wait_timeout to 30, still sleeping connections keep increasing at a constant rate

[RESOLVED]
It seems the phpMyAdmin variables did the trick.
I set wait_timeout to 30 and lock_wait_timeout to 50.
It took almost 6 hours to get back to stable, including several restarts; maybe the server needed time to pick up those changes.
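For reference, the same settings can also be applied at runtime; new connections pick up the GLOBAL values (lock_wait_timeout here is MySQL's metadata-lock timeout):
SET GLOBAL wait_timeout = 30;
SET GLOBAL lock_wait_timeout = 50;
SHOW GLOBAL VARIABLES LIKE '%timeout%';   -- confirm the values took effect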
PROBLEM :
Site Address http://topyaps.com
Number of queries on the homepage: 322 in 2 seconds
Nothing in the slow query log file.
I am using a Bitnami LAMP server [Amazon EC2] to run my heavy WordPress-based site.
The problem is, my server crashes every 10 minutes.
The reason, as I guess:
when I check the process list using PuTTY, the number of connections seems to increase constantly at a very high rate, and they all show the Sleep command.
I tried setting variables like :
wait_timeout=30
interactive_timeout=30
connect_timeout=15
max_user_connections=25
max_connections=999
but it doesn't seem to help.
Link to a picture of the phpMyAdmin monitor for a single page load [homepage]: http://postimg.org/image/5qqgb30xb/
Here is what I got from the phpMyAdmin status page:
Questions since startup: 50,617
ø per hour: 138,361
ø per minute: 2,306
ø per second: 38
Statement | # | ø per hour | %
select | 46,128 | 126.1 k | 91.13
set option | 2,452 | 6,702.5 | 4.84
change db | 1,226 | 3,351.3 | 2.42
update | 516 | 1,410.5 | 1.02
insert | 135 | 369 | 0.27
delete | 116 | 317.1 | 0.23
show binlogs | 13 | 35.5 | 0.03
show variables | 5 | 13.7 | 0.01
show processlist | 5 | 13.7 | 0.01
show master status | 4 | 10.9 | 0.01
show slave status | 4 | 10.9 | 0.01
show databases | 4 | 10.9 | 0.01
show tables | 3 | 8.2 | 0.01
show status | 3 | 8.2 | 0.01
show grants | 1 | 2.7 | <0.01
kill | 1 | 2.7 | <0.01
show table status | 1 | 2.7 | <0.01
I checked my page queries; it shows "328 queries in 2 seconds".
How do I actually stop these connections?
I'd look at what these queries are first. Why do you have that many? If they are all repetitive queries called on every page load, I would consider caching. You can install memcached and a plugin like W3 Total Cache to cache your database queries. You can also cache the frontend using W3 Total Cache.
Memcached is a key-value store that keeps the results of queries in memory, so your database does not get hit with the same queries every time.
Also, if you are using InnoDB as the storage engine, change innodb_buffer_pool_size to about 60% of your RAM; see the sketch below.
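As a rough sketch, on a machine with 8 GB of RAM that would be something like the following (innodb_buffer_pool_size can be resized at runtime on MySQL 5.7.5+; on older versions set it in my.cnf and restart):
-- roughly 60% of 8 GB; adjust to your own RAM
SET GLOBAL innodb_buffer_pool_size = 5368709120;
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
-- equivalent my.cnf setting: innodb_buffer_pool_size = 5G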
Although I can barely access your site, the query-to-page ratio seems very high for a common WordPress install. My best guess is that you have too many plugins, which in turn run too many custom queries on each page load.
You need to identify the source of these queries and find a way to offload them to a caching mechanism such as memcached. There are caching plugins such as W3TC that offer to do this for you, but in the case of custom-tailored queries they might not be very effective.
You say you're running a Bitnami LAMP configuration, so MySQL runs on the same machine as your webserver. Have you considered running a separate MySQL server, such as Amazon RDS? This won't be a magical solution, but it will let you diagnose the bottleneck more easily while keeping you from tampering with the MySQL configuration, which is rarely the way to go.
What DNS are you using? I've found that Cloudflare's free DNS service provides a basic caching layer for static files, so users might be served a static copy of your HTML when they hit the front page. You'd have to configure Cloudflare to perform aggressive HTML caching, but as I said, it's free and it helps.

SQL queries break our game! (Back-end server is at capacity)

We have a Facebook game that stores all persistent data in a MySQL database running on a large Amazon RDS instance. One of our tables is 2 GB in size. If I run any query on that table that takes more than a couple of seconds, any SQL actions performed by our game fail with the error:
HTTP/1.1 503 Service Unavailable: Back-end server is at capacity
This obviously brings down our game!
I've monitored CPU usage on the RDS instance during these periods, and though it does spike, it doesn't go much over 50%. Previously we were on a smaller instance size and it did hit 100%, so I'd hoped just throwing more CPU capacity at the problem would solve it. I now think it's an issue with the number of open connections. However, I've only been working with SQL for 8 months or so, so I'm no expert on MySQL configuration.
Is there perhaps some configuration setting I can change to prevent these queries from overloading the server, or should I just not be running them whilst our game is up?
I'm using MySQL Workbench to run the queries.
Any help would be very much appreciated - Thanks!
EDIT:
Here's an example....
SELECT *
FROM BlueBoxEngineDB.Transfer
WHERE Amount = 1000
AND FromUserId = 4
AND Status='Complete';
The table looks like this:
TransferId Started Status Expires FromUserId ToUserId CurrencyId Amount SessionId
1177 2012-06-04 21:43:18 Added 150001 2 4 1 12156
1179 2012-06-04 21:48:50 ISF 150001 2 4 1 12156
1181 2012-06-04 22:08:33 Added 150001 2 4 25 12156
1183 2012-06-04 22:08:41 Complete 150001 2 4 50 12156
1185 2012-06-04 22:08:46 Added 150001 2 4 200 12156
You should REALLY consider running a high-availability RDS setup and creating a read replica off of it. That way you can run complex queries on the replica to your heart's content and not interfere with the production database.
A 2 GB database is really not all that large. If you have the proper indexes on the tables you are trying to query, you should not be locking your DB up.
Above all, don't run queries on a high-capacity production database if you don't know what they are going to do. From the comments above it seems clear that you are not a very experienced DB admin. That's OK; working on a high-volume server will definitely be a learning experience for you, just try not to make your lessons ones where you crash your service. Again, this is why having a replica, or creating a DB snapshot and setting up a test DB before trying queries on large tables, is a very good idea.
An index on (FromUserId, Amount, Status) would probably help this query a lot (see the sketch below).
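A minimal sketch, using the table from the question (the index name is just an example):
CREATE INDEX idx_transfer_from_amount_status
ON BlueBoxEngineDB.Transfer (FromUserId, Amount, Status);

-- then check that the query actually uses it:
EXPLAIN SELECT *
FROM BlueBoxEngineDB.Transfer
WHERE Amount = 1000 AND FromUserId = 4 AND Status = 'Complete';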
You may, though, have many more variations of queries that hit this table. Adding an index for every one of them will leave you with tens of indexes on the table, and that may bring other problems.
Try to analyze the slow query log and then optimize the slowest queries (and the ones that use the most CPU).
You probably need to tweak your schema (adding indexes is the immediate step).
To analyze your situation, you can check the MySQL slow query logs for your database to determine whether there are slow-running SQL queries and, if so, the performance characteristics of each. You can set the slow_query_log DB parameter and query the mysql.slow_log table to review the slow-running SQL queries; see the Amazon RDS User Guide to learn more.
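For example, with slow_query_log enabled and log_output set to TABLE in the parameter group, a query along these lines lists the worst offenders (the columns come from the stock mysql.slow_log table):
SELECT start_time, query_time, rows_examined, sql_text
FROM mysql.slow_log
ORDER BY query_time DESC
LIMIT 20;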
There are probably some tables that you should consider offloading to DynamoDB or Redis. Both of them give single-digit-millisecond latency and are therefore very popular among game developers. You just need to think about your data structure.

Apache not killing threads for more than 20 minutes after reaching max thread number

I am running a Linux server with two quad-core Xeons, RAID 10 and 8 GB of RAM.
I serve some products that don't normally overload the server, but 3 or 4 times a month, due to the nature of the products, I get a huge number of clients connecting to it.
For example, I got 1,000,000 requests in a span of 3 hours.
The server has 2 GB assigned to MySQL, and Apache2 is limited to 1500 threads using the MPM settings.
The problem is that when the server reaches the maximum number of threads ("server reached MaxClients setting, consider raising the MaxClients setting"), the load average gets really high: it goes from around 15 to 400 or so, and even after the clients stop accessing the site, the number of Apache threads stays really high, and it may take more than 20 minutes for it to come down. Doing a 'ps aux' I can see that the processes were created at the same time the server reached the maximum.
If I strace any of them, it is doing just nothing.
MySQL has no pending queries.
<IfModule mpm_prefork_module>
StartServers 5
MinSpareServers 5
MaxSpareServers 10
ServerLimit 2000
#ListenBacklog 2000  # have just commented this out
#MaxClients 150
MaxClients 1500
#MaxRequestsPerChild 0
MaxRequestsPerChild 10000 # have just set this value
</IfModule>
How can I prevent this from happening? Why are those threads not dying or finishing?
I am expecting 1,000,000 hits again today.
Thanks in advance
Apache has the option KeepAliveTimeout, which specifies how many seconds to wait for the next request before closing the connection. It requires the option KeepAlive to be set to On.
There is also an option Timeout, which specifies the maximum number of seconds to wait for a GET, POST or PUT request.
You will probably have to experiment with these options until you find the best balance for your workload and your application. A good solution can be KeepAlive On with low Timeout and KeepAliveTimeout values.