I'm running a website on a cloud server. The website is functioning completely around a (rather large) database. Over the last two weeks I've noticed steady rise in the CPU load of the MYSQL and I'm not sure why. It has been 15-16% for a while and then it started climbing by 1-2% a day. Currently we are at 27% and thought there has been a rise in traffic, it wasn't that big.... What could be causing this?
Thanks!
Check you MySQL slow log. Don't forget to add the queries not using indexes in the log.
Fix any queries you find in there.
The problem actually was the build-up of several caches. If someone else encounters this problem, I suggest to look at the red values under 'Status'-> 'All statusvariables' in your PHPmyAdmin. Enlarging the tmp_table_size and flushing the query cache did wonders for me.
Good luck!
Related
This morning, one of our Aurora clusters suddenly began experiencing high latency, slower running queries and was being reported as exceeding exceeding capacity - with up to 20 sessions for the db.r5.large instance which has only 2 CPUs.
There we no code changes, no deploy, no background process or any other cause we can identify. The higher latency is intermittent, occurring every 10 minutes and lasting for about as long. The Aurora monitoring isn't helping much, the only change of note being higher latency on the all queries (selects, updates and deletes).
Under Performance Metrics, the cases where usage spikes - we're seeing that of the total 20 sessions, these are attributed almost solely to the io/file/myisam/kfile Wait. Researching online has yielded very little and so I'm somewhat stumped as to what this means, and how to go about getting to the cause of the issue. Looking at the SQL queries ran during spikes, their slow run time appears more caused by the intermittent issue - as opposed to the being the cause of it.
So my question is: can anyone explain what the 'myisam/kfile' Wait is, and how I can use this knowledge to help diagnose the cause of the problem here?
My feeling is that it's one of those rare occurrences where an AWS instance unexplainably goes rogue at a level below which we can directly control and is only solved by spinning up a new instance (even where all else is equal from a configuration and code perspective). All the same, I'd love to better understand the issue here, especially when none of our DB table are MyISAM, all being innoDB.
Is there a table called kfile? How big is it? What operations are being performed?
While the problem is occurring, do SHOW FULL PROCESSLIST; to see what is running. That may give a good clue.
If the slowlog is turned on, look at it shortly after the problem has subsided; probably the naughty query will be at the end of the list. Publish pt-query-digest path_to_slowlog. The first one or two queries are very likely to be the villains.
Check SHOW ENGINE INNODB STATUS;. Near the front will be the "latest deadlock". That may be a clue.
In most situations most of the graphs don't provide any useful information. When something does go wrong, it is not obvious which graph to look at. I would look for graphs that look "different" in sync with the problem. The one you show us perhaps indicates that there is a 20-second timeout, and everyone is stuck until they hit that timeout.
Do you run any ALTERs? When do backups occur?
Are you using MyISAM? Don't. That ENGINE does not allow concurrency, hence could lead to a bunch of queries piling up. More on converting to InnoDB: http://mysql.rjweb.org/doc.php/myisam2innodb
As you can see from the screenshot attached, we have experienced sudden cpu spike and storage loss. We have nearly lost all storage and had to increase it manually.
When we check database size, it still has the size before this occured, so it seems its not database related. We have checked a lot of stuff (slow logs etc.) but couldn't find the problem.
Is it possible there has been an attack, or any other ideas why this happened and how to recover our free storage?
Thank you.
It's hard to say what the exact issue is but looking at the graph, it looks like you have some huge query running against your database that is filling any temp space up. When it runs itself out of room it is killing the query and that is then flushing a bunch of writes to the disk which could either be related to the query/statement or simply be unrelated queued inserts/updates.
You need to look at the slow query log if it's enabled to see if there's anything unusual there and also check your application(s) to see if they were trying to execute a ridiculous query/statement that was hammering the database.
One of the projects I'm working on is suffering from a recent slowdown in the DB (since last week).
Code hasn't changed, data may have changed a little but not significantly so at this stage I'm just exploring DB configuration (as we are on a managed hosting platform, end have had some similar issues in the past).
Unfortunately I'm out of my depth a bit... could anyone please take a look at the output from SHOW STATUS below and see if any of it sets alarm bells off? The only thing I've spotted so far is that key_reads vs key_read_requests don't seem quite right.
Our setup is 2 servers replicated, with all reads done from the slave. Queries which run in 0.01 secs on the master are taking up to 7 secs on the slave... and this has only started recently.
All tables are MyIsam and inserts/updates are negligible (updates happen out of hours). Front end is an ASP .NET website (.NET 4) running on IIS8 with a devart component for data access.
Thanks!
SHOW STATUS output is here: http://pastebin.com/w6xDeD48
Other factors can impact MySQL performance:
virus scanning software -> I had a issue with McAfee bogging out peformance due to it scanning temporary table files
Other services running on server?
Have you tried a EXPLAIN SELECT on the query? This would given you an indication of the index size. As #Liath indicated the indexes may be out of date on the slave but find on the master.
Just an update in case it ever helps anyone else in future - it looks like the culprit might be the query cache for now, as we are seeing better performance with it turned off (still not quite as good as we had before the issue).
So we will try to tune it a little and get back to great performance!
Im working with SQL Server 2008 R2. In development environment time to time i can see that CPU is under load for couple of minutes (around 55%-80% while normal is 1-2%. yes really- normal load in my development environment is almost none!). During this CPU pressure time automated tests sometimes gets timeout errors.
Just experienced timeout in activity monitor. looks like this:
Tipicaly during these pressure moments its looks like this:
Problem is that i cant understand why it is happening! There is continuously executed automated tests, but they are not making heavy workload. During performance tests system works good and if it slows down there is always good explanation for that.
Im trying to resolve issue by
Running trace, but during those CPU spikes there is "nothing special" going on. No expensive queries.
Using SQL Activity Monitor- everything seems normal, except CPU (just like 1-2 waiting tasks, low I/O, ~5 requests/sec). Recent expensive queries are not that expensive.
Querying data. Using famous sp_WhoIsActive and sys.dm_exec_requests. As far i understand- nothing unusual again..
About my server
There is small number of databases and i do know them good.
Using Service Broker.
Trace is running most of the time.
I do suspect that it is some background process that is making problems. But i dont really get it.. Can you please give some hints/ideas how to resolve this?
It could be some internal SQL Server job. Like a big index rebuild.
Wait for the spike and run sp_who2 'active'. Check the column CPU Time.
Actually, how are you 100% sure that SQL is the responsible? Couldn't it be a SO issue?
I have faced the same issues, and have raised case with Microsoft.
Microsoft guy told there is no issue from SQL DB side , if cpu is spiked. Finally issue is resolved by Microsoft , actually issue was that on IIS, not SQL Server.
Every 29 days IIS need to be restart, So that you will get better performance on Application.
This is the most puzzling MySQL problem that I've encountered in my career as an administrator. Can anyone with MySQL mastery help me a bit on this?:
Right now, I run an application that queries my MySQL/InnoDB tables many times a second. These queries are simple and optimized -- either single row inserts or selects with an index.
Usually, the queries are super fast, running under 10 ms. However, once every hour or so, all the queries slow down. For example, at 5:04:39 today, a bunch of simple queries all took more than 1-3 seconds to run, as shown in my slow query log.
Why is this the case, and what do you think the solution is?
I have some ideas of my own: maybe the hard drive is busy during that time? I do run a cloud server (rackspace) But I have flush_log_at_trx_commit set to 0 and tons of buffer memory (10x the table size on disk). So the inserts and selects should be done from memory right?
Has anyone else experience something like this before? I've searched all over this forum and others, and it really seems like no other MySQL problem I've seen before.
There are many reasons for sudden stalls. For example - even if you are using flush_log_at_trx_commit=0, InnoDB will need to pause briefly as it extends the size of data files.
My experience with the smaller instance types on Rackspace is that IO is completely awful. I've seen random writes (which should take 10ms) take 500ms.
There is nothing in built-in MySQL that will help you identify the problem easier. What you might want to do is take a look at Percona Server's slow query log enhancements. There's a specific feature called "profiling_server" which can break down time:
http://www.percona.com/docs/wiki/percona-server:features:slow_extended#changes_to_the_log_format