A Most Puzzling MySQL Problem: Queries Sporadically Slow

This is the most puzzling MySQL problem that I've encountered in my career as an administrator. Can anyone with MySQL mastery help me with this?
Right now, I run an application that queries my MySQL/InnoDB tables many times a second. These queries are simple and optimized -- either single row inserts or selects with an index.
Usually, the queries are super fast, running in under 10 ms. However, once every hour or so, all the queries slow down. For example, at 5:04:39 today, a bunch of simple queries each took 1-3 seconds to run, as shown in my slow query log.
Why is this the case, and what do you think the solution is?
I have some ideas of my own: maybe the hard drive is busy during that time? I do run a cloud server (Rackspace), but I have innodb_flush_log_at_trx_commit set to 0 and tons of buffer memory (10x the table size on disk). So the inserts and selects should be done from memory, right?
Has anyone else experienced something like this before? I've searched all over this forum and others, and it really seems like no other MySQL problem I've seen before.

There are many reasons for sudden stalls. For example, even if you are using innodb_flush_log_at_trx_commit=0, InnoDB will need to pause briefly as it extends the size of data files.
My experience with the smaller instance types on Rackspace is that IO is completely awful. I've seen random writes (which should take 10ms) take 500ms.
There is nothing built into stock MySQL that will help you identify the problem more easily. What you might want to do is take a look at Percona Server's slow query log enhancements. There's a specific feature called "profiling_server" which can break down time:
http://www.percona.com/docs/wiki/percona-server:features:slow_extended#changes_to_the_log_format

Related

Aurora database exceeding capacity due to "/io/file/myisam/kfile" Wait

This morning, one of our Aurora clusters suddenly began experiencing high latency and slower running queries, and was being reported as exceeding capacity, with up to 20 sessions for the db.r5.large instance, which has only 2 CPUs.
There were no code changes, no deploy, no background process or any other cause we can identify. The higher latency is intermittent, occurring every 10 minutes and lasting for about as long. The Aurora monitoring isn't helping much, the only change of note being higher latency on all queries (selects, updates and deletes).
Under Performance Metrics, in the cases where usage spikes, we're seeing that the total of 20 sessions is attributed almost solely to the io/file/myisam/kfile Wait. Researching online has yielded very little, so I'm somewhat stumped as to what this means and how to go about getting to the cause of the issue. Looking at the SQL queries run during spikes, their slow run time appears to be caused by the intermittent issue, as opposed to being the cause of it.
So my question is: can anyone explain what the 'myisam/kfile' Wait is, and how I can use this knowledge to help diagnose the cause of the problem here?
My feeling is that it's one of those rare occurrences where an AWS instance inexplicably goes rogue at a level below which we can directly control, and is only solved by spinning up a new instance (even where all else is equal from a configuration and code perspective). All the same, I'd love to better understand the issue here, especially since none of our DB tables are MyISAM, all being InnoDB.

Is there a table called kfile? How big is it? What operations are being performed?
While the problem is occurring, do SHOW FULL PROCESSLIST; to see what is running. That may give a good clue.
If the slowlog is turned on, look at it shortly after the problem has subsided; probably the naughty query will be at the end of the list. Publish the output of pt-query-digest path_to_slowlog. The first one or two queries are very likely to be the villains.
Check SHOW ENGINE INNODB STATUS;. Near the front will be the "LATEST DETECTED DEADLOCK" section. That may be a clue.
In most situations most of the graphs don't provide any useful information. When something does go wrong, it is not obvious which graph to look at. I would look for graphs that look "different" in sync with the problem. The one you show us perhaps indicates that there is a 20-second timeout, and everyone is stuck until they hit that timeout.
Do you run any ALTERs? When do backups occur?
Are you using MyISAM? Don't. That ENGINE uses table-level locking, so a write blocks every other query on the same table, which can lead to a bunch of queries piling up. More on converting to InnoDB: http://mysql.rjweb.org/doc.php/myisam2innodb
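To make the SHOW FULL PROCESSLIST suggestion above a bit more concrete, here is a minimal Python sketch of how a captured snapshot could be summarized. It assumes the rows have already been fetched as dictionaries keyed by the standard processlist column names (Id, Command, Time, State, Info), for example through a dict-style cursor in whatever MySQL client library you use. The function name, the sample rows and the 1-second cutoff are all made up for illustration.

```python
from collections import Counter


def summarize_processlist(rows, min_seconds=1):
    """Return (long-running active threads, number of active threads per State).

    rows: list of dicts using SHOW FULL PROCESSLIST column names.
    Idle connections (Command == "Sleep") are ignored.
    """
    active = [r for r in rows if r.get("Command") != "Sleep"]
    # Longest-running statements first; these are the usual suspects.
    slow = sorted(
        (r for r in active if (r.get("Time") or 0) >= min_seconds),
        key=lambda r: r["Time"],
        reverse=True,
    )
    # Many threads sharing one State often points at a common lock or wait.
    states = Counter(r.get("State") or "" for r in active)
    return slow, states


# Hypothetical snapshot taken while the stall is happening.
sample_rows = [
    {"Id": 11, "Command": "Sleep", "Time": 120, "State": "", "Info": None},
    {"Id": 12, "Command": "Query", "Time": 19, "State": "Sending data",
     "Info": "SELECT * FROM orders WHERE id = 1"},
    {"Id": 13, "Command": "Query", "Time": 20,
     "State": "Waiting for table metadata lock",
     "Info": "SELECT * FROM orders WHERE id = 2"},
    {"Id": 14, "Command": "Query", "Time": 0, "State": "init",
     "Info": "SHOW FULL PROCESSLIST"},
]

slow, states = summarize_processlist(sample_rows)
for row in slow:
    print(row["Id"], row["Time"], row["State"], row["Info"])
```

Taking a snapshot like this every few seconds and keeping the ones captured during a spike tends to be more useful than a single manual check.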

How to troubleshoot intermittently slow queries in MySQL?

We have a query which most of the time takes about 150ms to complete. Thousands go through the system each day without any issue. But every few days, something happens, and one of these queries suddenly takes roughly 30 minutes to complete. Any subsequent queries also run slow. The only way we have been able to recover is to kill every single one of these queries. As soon as we do that, any subsequent queries again run at the usual 150ms speed. For security reasons, I am not allowed to post the query itself. But it is nothing special.
The DB is MySQL 5.5.40 and uses the InnoDB engine. During this period, all the usual system resources look fine: memory, CPU, disk space, disk I/O, network I/O.
Can someone give me some ideas about how I can troubleshoot the nature of this issue? I do not believe it is the query, since it seems to work just great 99% of the time. So I am thinking there is some kind of MySQL bug or a weird race condition going on.

This appears to have been some kind of issue with the query planner in MySQL 5.5. Since our upgrade to 5.6, the problem has gone away. Also, the EXPLAIN output for the query is completely different.

MySQL sudden performance drop

One of the projects I'm working on is suffering from a recent slowdown in the DB (since last week).
Code hasn't changed, and data may have changed a little but not significantly, so at this stage I'm just exploring DB configuration (as we are on a managed hosting platform, and have had some similar issues in the past).
Unfortunately I'm out of my depth a bit... could anyone please take a look at the output from SHOW STATUS below and see if any of it sets alarm bells off? The only thing I've spotted so far is that key_reads vs key_read_requests don't seem quite right.
Our setup is 2 servers replicated, with all reads done from the slave. Queries which run in 0.01 secs on the master are taking up to 7 secs on the slave... and this has only started recently.
All tables are MyISAM and inserts/updates are negligible (updates happen out of hours). Front end is an ASP.NET website (.NET 4) running on IIS8 with a devart component for data access.
Thanks!
SHOW STATUS output is here: http://pastebin.com/w6xDeD48
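As a rough illustration of the key_reads vs key_read_requests check mentioned above: for MyISAM, Key_reads divided by Key_read_requests gives the key cache miss ratio, and a commonly cited rule of thumb is that it should stay well below 0.01. The Python sketch below assumes the SHOW STATUS output has been loaded into a dictionary of name/value strings; the numbers are invented and are not taken from the pastebin.

```python
def key_cache_miss_ratio(status):
    """Fraction of MyISAM index block requests that had to be read from disk."""
    reads = int(status["Key_reads"])
    requests = int(status["Key_read_requests"])
    if requests == 0:
        return 0.0  # nothing requested yet, so nothing was missed
    return reads / requests


# Hypothetical values, as they might appear in SHOW STATUS output.
sample_status = {"Key_reads": "1500", "Key_read_requests": "100000"}

ratio = key_cache_miss_ratio(sample_status)
print(f"key cache miss ratio: {ratio:.4f}")
```

Comparing this ratio on the master and on the slave could show whether the slave's key buffer is too small for the read load it carries.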

Other factors can impact MySQL performance:
Virus scanning software: I had an issue with McAfee bogging down performance because it was scanning temporary table files.
Other services running on the server?
Have you tried an EXPLAIN SELECT on the query? This would give you an indication of the index size. As @Liath indicated, the indexes may be out of date on the slave but fine on the master.

Just an update in case it ever helps anyone else in future - it looks like the culprit might be the query cache for now, as we are seeing better performance with it turned off (still not quite as good as we had before the issue).
So we will try to tune it a little and get back to great performance!

what is causing random spikes in local mysql server query speeds?

So while playing around on my localhost in phpMyAdmin and doing some stuff with SQL, I realized that I would randomly get huge spikes in the time it took to perform a database query. I have a database table with about 3000 entries, and I was running a very simple query to display the first 2500 of them.
On average, running this query was taking around 0.003 to 0.004 seconds. (Of course, loading the phpMyAdmin page took much longer, but we're just looking at the query times.) However, I noticed that occasionally the query times would go up past 0.01. Once it even shot up to 0.04. So, my curiosity getting the better of me, I decided to repeatedly run the same query, and produced a graph of my results:
I'm not running anything else on my computer that may be interacting with MySQL, and because it's my localhost I'm the only one that's doing anything to mess with my database (right?). Slight outliers are understandable, but what's causing the load times to go up anywhere from 3 to 30 times, completely randomly it seems?
Can anyone help me satiate my curiosity?

"I'm not running anything else on my computer that may be interacting with MySQL"
But is there anything else running on your computer that might be interacting with your hard drive /CPU on a regular basis? Because that would explain the spikes. Maybe have a scan of running processes, and compare the cpu/disk activity against the spikes.

Even though your database is running on your local host, it's not running in complete isolation. It is competing for your system's resources with every other process you have running.
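If you want to line the spikes up against other activity on the machine, it helps to record them with a position or timestamp rather than eyeballing a graph. Here is a small Python sketch of that idea: time a callable repeatedly and flag any run that takes more than a few times the median. The run_query callable is a stand-in for whatever executes your query, and the factor of 3 is an arbitrary choice.

```python
import statistics
import time


def time_calls(run_query, repeats=200):
    """Call run_query() repeatedly and return the elapsed seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


def find_spikes(timings, factor=3.0):
    """Return (index, seconds) for runs slower than factor times the median."""
    median = statistics.median(timings)
    return [(i, t) for i, t in enumerate(timings) if t > factor * median]


# Hypothetical timings in seconds, similar in scale to the ones in the question.
sample_timings = [0.003, 0.004, 0.003, 0.04, 0.0035, 0.012]
spikes = find_spikes(sample_timings)
for index, seconds in spikes:
    print(f"run {index} took {seconds:.4f}s")
```

The median is used instead of the mean so that the spikes themselves do not drag the baseline upward.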

How to track down problematic MySQL queries?

I use MySQL (Percona XtraDB 5.1 to be exact) as my database of choice. Overall, I'm very impressed with its performance. The applications that use it are quite large.
We believe that a query is sometimes causing a backup of threads on the database for whatever reason (i.e., memory/buffers). The server has been tweaked countless times to prevent this so it's literally a 1% problem now, but still very annoying. Unless you are monitoring the database server 24/7 you are unlikely to ever see the cause of the backup.
Is there any recommendation (apart from going through the slow query log) which anyone can suggest to track the problematic queries (i.e., reporting via the application)?

Percona Server with XtraDB actually logs both the timestamp and the execution time in microsecond resolution, so you can find the start and the end of the queries precisely. However, log analysis is probably the wrong approach. You probably need to use Aspersa's stalk+collect tools.

As you point out in your question, your best bet will be the slow query log:
http://dev.mysql.com/doc/refman/5.5/en/slow-query-log.html
You might also want to log this at the app level:
At the beginning of your scripts, keep a note of what you're about to do and when it started. At the end of it, log this information if the time spent processing the request is higher than a certain threshold.
That way, you'll be able to identify problematic sequences of queries rather than individual queries. (Which, incidentally, might reveal that no individual query is slow but some requests might fire gazillions of small queries.)
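A minimal Python sketch of that app-level logging, assuming each request can be wrapped in a context manager. The names track_request and slow_requests, the logger name and the default threshold are all made up for illustration; in a real app you would call this from wherever your framework starts and ends a request.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("slow_requests")
slow_requests = []  # in-memory copy of what was logged, handy for inspection


@contextmanager
def track_request(label, threshold_seconds=1.0):
    """Time a block of work and log it only if it exceeds the threshold.

    Yields a list that the caller appends each executed query to, so that a
    slow request can be reported with the whole sequence of queries it ran.
    """
    queries = []
    start = time.perf_counter()
    try:
        yield queries
    finally:
        elapsed = time.perf_counter() - start
        if elapsed >= threshold_seconds:
            slow_requests.append((label, elapsed, list(queries)))
            logger.warning(
                "%s took %.3fs and ran %d queries", label, elapsed, len(queries)
            )


# Usage: threshold of 0 here only so that the example always logs.
with track_request("GET /report", threshold_seconds=0.0) as queries:
    queries.append("SELECT * FROM orders WHERE id = 1")  # hypothetical query
```

Because the whole list of queries is kept per request, a request that fires a large number of individually fast queries still shows up once its total time crosses the threshold.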

Have a look at this script, which allows you to extract a more abstracted representation of the queries causing the problems.
I usually sort the list by the product of frequency and runtime to get the queries causing the most problems.
NB: recording the actual start and end of the queries does not help you measure which queries are actually causing locks. From the manual: "The time to acquire the initial table locks is not counted as execution time."
You just need to fix the slow stuff.
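For anyone who wants to see what "abstracted representation" and "product of frequency and runtime" look like in practice, here is a rough Python sketch. It is not the script referred to above; the fingerprinting is deliberately crude (it only replaces quoted strings and numbers with a placeholder), and the input is assumed to be (sql, seconds) pairs that you have already pulled out of the slow query log.

```python
import re
from collections import defaultdict


def fingerprint(sql):
    """Replace literals with '?' so queries differing only in values group together."""
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)  # single-quoted strings
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)  # standalone numbers
    return re.sub(r"\s+", " ", sql).strip().lower()


def rank_queries(entries):
    """entries: iterable of (sql, seconds).

    Returns (fingerprint, count, mean_seconds, total_seconds) tuples sorted by
    total_seconds, which equals frequency times mean runtime.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for sql, seconds in entries:
        bucket = totals[fingerprint(sql)]
        bucket[0] += 1
        bucket[1] += seconds
    ranked = [
        (fp, count, total / count, total)
        for fp, (count, total) in totals.items()
    ]
    return sorted(ranked, key=lambda row: row[3], reverse=True)


# Hypothetical slow-log entries.
entries = [
    ("SELECT * FROM users WHERE id = 1", 0.2),
    ("SELECT * FROM users WHERE id = 42", 0.3),
    ("SELECT * FROM users WHERE id = 7", 0.25),
    ("SELECT name FROM orders WHERE code = 'abc'", 0.6),
]

ranked = rank_queries(entries)
for fp, count, mean, total in ranked:
    print(f"{total:.2f}s total  {count}x  {mean:.3f}s avg  {fp}")
```

In this made-up sample the users query comes out on top even though each individual run is faster than the orders query, which is the point of sorting by the product rather than by runtime alone.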
