"Waiting for table flush" Against Analyze Command - mysql

I was trying to run the ANALYZE TABLE command on one table out of 900 in MySQL 5.7.30. It stalled my entire DB process list: connections spiked immediately, many commands were stuck in the state "Waiting for table flush", and we even hit our max_connections limit of 2500. We have been running ANALYZE TABLE for the last 3 years, but in the last month we have hit this issue 4 times. If we don't analyze our tables, we see severe performance problems and a lot of queries sitting in the "statistics" state. What are your thoughts on this?

You most definitely shouldn't be running ANALYZE regularly or automatically. It sounds like you were dodging the bullet of queries stuck in the "Waiting for table flush" state purely because the load on your servers was low enough that you didn't notice it before. You should only ever run it on a table sparingly, when you have clear, definitive evidence that the index statistics on that table are sufficiently detached from reality to make the query optimiser regularly come up with egregiously poor execution plans.
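For example, a quick check along these lines can help you decide whether the statistics really are off before reaching for ANALYZE (the table and column names here are just placeholders):

SHOW INDEX FROM orders;                          -- the Cardinality column is the optimiser's estimate
SELECT COUNT(DISTINCT customer_id) FROM orders;  -- the actual number of distinct values
-- Only if the estimate is wildly off:
ANALYZE TABLE orders;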

Related

MySQL queries very slow - occasionally

I'm running MariaDB 10.2.31 on Ubuntu 18.04.4 LTS.
On a regular basis I encounter the following conundrum - especially when starting out in the morning, that is when my DEV environment has been idle for the night - but also during the day from time to time.
I have a table (this applies to other tables as well) with approx. 15,000 rows and (amongst others) an index on a VARCHAR column containing on average 5 to 10 characters.
Notably, most columns including this one are GENERATED ALWAYS AS (JSON_EXTRACT(....)) STORED since 99% of my data comes from a REST API as JSON-encoded strings (and conveniently I simply store those in one column and extract everything else).
When running a query on that column with WHERE colname LIKE 'text%' I see query durations of e.g. 0.006 seconds. Nice. When I have the query EXPLAINed, I can see that the index is being used.
However, as I have mentioned, when I start out in the morning, this takes way longer (14 seconds this morning). I know about the query cache and I tried this with query cache turned off (both via SET GLOBAL query_cache_type=OFF and RESET QUERY CACHE). In this case I get consistent times of approx. 0.3 seconds - as expected.
So, what would you recommend I should look into? Is my DB sleeping? Is there such a thing?
There are a few things that could be going on:
1) Cold caches (overnight backup, mysqld restart, or large processing job results in this particular index and table data being evicted from memory).
2) Statistics on the table go stale and the query planner gets confused until you run some queries against the table and the statistics get refreshed. You can force an update using ANALYZE TABLE table_name.
3) Query planner heisenbug. Very common in MySQL 5.7 and later, never seen it before on MariaDB so this is rather unlikely.
You can get to the bottom of this by enabling the following in the config:
log_output='FILE'
slow_query_log=1
log_slow_verbosity='query_plan,explain'
long_query_time=1
Then review what is in the slow log just after you see a slow occurrence. If the logged EXPLAIN plan looks the same for both slow and fast cases, you have a cold-cache issue. If they are different, you have a table stats issue and you need to cron ANALYZE TABLE at the end of the overnight task that reads/writes a lot to that table. If that doesn't help, as a last resort, hard-code an index hint into your query with FORCE INDEX (index_name).
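As a rough illustration of that last resort (table, column and index names here are made up):

SELECT *
  FROM my_table FORCE INDEX (idx_colname)   -- bypass the optimiser's own index choice
 WHERE colname LIKE 'text%';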
Enable your slow query log with log_slow_verbosity=query_plan,explain and a long_query_time low enough to catch the occurrences. See if occasionally it's using a different (or no) index.
Before you start your next day, look at SHOW GLOBAL STATUS LIKE "innodb_buffer_pool%", and after your query look at the values again. Compare buffer pool reads vs read requests in this status output to see whether everything is coming off disk.
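Concretely, the two counters to compare look something like this:

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
-- Innodb_buffer_pool_reads         = pages that had to be fetched from disk (cache misses)
-- Innodb_buffer_pool_read_requests = logical read requests (mostly served from memory)
-- A jump in the reads/read_requests ratio after the overnight window points to cold caches.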
As #Solarflare mentioned, backups and nightly activity might be purging the InnoDB buffer pool of cached data, reverting back to disk reads and making it slow again. As part of your nightly activities you could set innodb_buffer_pool_dump_now=1 to save the list of hot pages before the scripted activity and innodb_buffer_pool_load_now=1 to restore it afterwards.
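Wrapped around the nightly job, that would look roughly like this:

SET GLOBAL innodb_buffer_pool_dump_now = 1;  -- snapshot the buffer pool page list before the job
-- ... run the overnight backup / batch processing here ...
SET GLOBAL innodb_buffer_pool_load_now = 1;  -- warm the buffer pool back up from the snapshot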
Shout-out and Thank you to everyone giving valuable insight!
From all the tips you guys gave I think I am starting to understand the problem better and beginning to narrow it down:
First thing I found was my default innodb_buffer_pool_size of 134 MB. With the sort and amount of data I'm processing this is ridiculously low - so I was able to increase it.
Very helpful post: https://dba.stackexchange.com/a/27341
And from the docs: https://dev.mysql.com/doc/refman/8.0/en/innodb-buffer-pool-resize.html
Now that I have increased it to close to 2GB and am able to monitor its usage and RAM usage in general (cli: cat /proc/meminfo), I realize that my 4GB of RAM is in fact on the low side. I am nowhere near having any unused headroom (buffer usage still at 99% and free RAM around 100MB).
I will start to optimize RAM usage of my daemon next and see where this leads - but this will not free enough RAM altogether.
#danblack mentioned innodb_buffer_pool_dump_now and innodb_buffer_pool_load_now. This is an interesting approach that I might use whenever the daemon accesses the DB, as I would love to separate my daemon's buffer usage from the front end's (apparently this is not possible!). I will look into this further, but as my daemon is running all the time (not only at night) this might not be feasible.
#Gordan Bobic mentioned "refreshing" DB tables by using ANALYZE TABLE tableName. I found this to be quite fast and incorporated it into the daemon after each time it does an extensive read/write. This increases daemon run times by a few seconds, but that is no issue at all. And I figure I can't go wrong with it :)
So, in the end I believe my issue to be a combination of things: too small a buffer size, too little RAM, and too many read/write operations for that environment (evicting buffered indexes etc.).
Also, I will have to learn more about memory allocation etc. and optimize this better (large-pages=1 etc.).
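For reference, a minimal my.cnf sketch of the changes discussed here (values are examples only; the at_shutdown/at_startup settings are the persistent counterparts of the _now switches mentioned above, and large-pages additionally requires huge pages to be configured in the OS):

[mysqld]
innodb_buffer_pool_size = 2G              # raised from the ~134M default
innodb_buffer_pool_dump_at_shutdown = 1   # persist the hot page list across restarts
innodb_buffer_pool_load_at_startup = 1
# large-pages = 1                         # only once the OS has huge pages set up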

Perfectly good queries on MySQL taking more than 5 seconds

Lately we are seeing some queries in the MySQL (master) logs but have no idea why they show up there:
Queries are select/update table where id = <some integer>.
There is an index on id
table size is below 100 000
Rows scanned are in hundreds (sometimes < 100)
Server is running on extremely good hardware
there are no joins involved
We do not see any heavy activity running on database at that time
tables are innodb
The same queries generally don't even take 50ms, but sometimes these queries take about 4-8 seconds to execute.
One observation was that all the similar "non-slow-but-weirdly-taking-a-long-time" queries take almost the same amount of time for a while. I.e. queries like the one stated at the top will all take about 4.35 seconds with a variation of 0.05 seconds.
Does the network latency/packets-drop affect mysql query timing?
show processlist;
show global status like '%onnect%';
show global status like '%open%';
Is anything backed up? Is it waiting in a queue? Waiting for file handles? What are your max_connections, open-files-limit, thread_concurrency?
One side question: does the network latency/packets-drop affect mysql query timing?
Yes, the timeout must occur before the query is resent by the client
Do you see these problems locally or over the network? If the latter, then obviously packet drops can affect your performance if you are measuring from the client.
Is it running in a virtual machine that can affect the performance?
Disk problems?
How is serialization set up? Could it be a contention problem caused by many processes accessing the same row?
You may want to enable the query/slow query logs to see if there is any sort of pattern that causes this.
The MySQL slow log is not a representative source for learning about your slow queries. If something makes the server slow, all queries usually end up in the slow log.
E.g. if you have a slow blocking SELECT on MyISAM, a lot of PK updates will go to the slow log.
You need to look for other slow queries or server problems. What about the load average on this particular machine? Is MySQL being pushed into swap? Other applications? Queries per second?

MySQL query slowing down until restart

I have a service that sits on top of a MySQL 5.5 database (InnoDB). The service has a background job that is supposed to run every week or so. At a high level the background job does the following:
1) Do some initial DB read and write in one transaction.
2) Execute UMQ (described below) with a set of parameters in one transaction.
3) If no records are returned, we are done!
4) Process the result from UMQ (this is a bit heavy, so it is done outside of any DB transaction).
5) Write the outcome of the previous step to the DB in one transaction (this writes to tables queried by UMQ and ensures that the same records are not found again by UMQ).
6) Go to step 2.
UMQ - Ugly Monster Query: this is a nasty database query that joins a bunch of tables, has conditions on columns in several of these tables, and includes a NOT EXISTS subquery with some more joins and conditions. UMQ includes an ORDER BY and also has a LIMIT 1000. Even though the query is bad, I have done what I can here - there are indexes on all columns filtered on, and the joins are all over foreign key relations.
I do expect UMQ to be heavy and take some time, which is why it's executed in a background job. However, what I'm seeing is rapidly degrading performance until it eventually causes a timeout in my service (maybe 50 times slower after 10 iterations).
At first I thought it was because the data queried by UMQ changes (see step 4 above), but that wasn't it, because if I took the last query (the one that caused the timeout) from the slow query log and executed it myself directly, I got the same behaviour until I restarted the MySQL service. After the restart, the exact same query on the exact same data that took >30 seconds before now took <0.5 seconds. I can reproduce this behaviour every time by restoring the database to its initial state and restarting the process.
Also, using the trick described in this question I could see that the query scans around 60K rows after restart as opposed to 18M rows before. EXPLAIN tells me that around 10K rows should be scanned and the result of EXPLAIN is always the same. No other processes are accessing the database at the same time and the lock_time in the slow query log is always 0. SHOW ENGINE INNODB STATUS before and after restart gives me no hints.
So finally the question: Does anybody have any clue of why I'm seeing this behavior? And how can I analyze this further?
I have the feeling that I need to configure MySQL differently in some way but I have searched and tested like crazy without coming up with anything that makes a difference.
It turns out that the behaviour I saw was the result of how the MySQL optimizer uses InnoDB statistics to decide on an execution plan. This article put me on the right track (even though it does not discuss exactly my problem). The most important thing I learned from it is that MySQL calculates statistics at startup and then once in a while after that. These statistics are then used to optimize queries.
The way I had set up the test data, the table T where most writes are done in step 4 started out empty. After each iteration T would contain more and more records, but the InnoDB statistics had not yet been updated to reflect this. Because of this, the MySQL optimizer always chose an execution plan for UMQ (which includes a JOIN with T) that worked well when T was empty but got worse and worse the more records T contained.
To verify this I added an ANALYZE TABLE T; before every execution of UMQ, and the rapid degradation disappeared. Not lightning performance, but acceptable. I also saw that leaving the database alone for half an hour or so (maybe a bit shorter, but at least more than a couple of minutes) would allow the InnoDB statistics to refresh automatically.
In a real scenario the relative difference in index cardinality for the tables involved in UMQ will look quite different and will not change as rapidly so I have decided that I don't really need to do anything about it.
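For anyone wanting to watch this happen, the cardinality estimates the optimizer relies on can be inspected before and after the refresh (T stands in for the real table name):

SHOW INDEX FROM T;   -- Cardinality here is the estimate the optimizer uses for join ordering
ANALYZE TABLE T;     -- recompute the estimate after the step-4 writes
SHOW INDEX FROM T;   -- Cardinality should now reflect the rows added in step 4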
Thank you very much for the analysis and answer. I had been chasing this issue for several days during CI on MariaDB 10.1 and Bacula server 9.4 (Debian Buster).
The situation was that after a fresh server installation during a CI cycle, the first two tests (backup and restore) ran smoothly on the unrestarted MariaDB server, and only the third test showed that one particular UMQ took about 20 minutes (building the directory tree during the restore process from a table with about 30k rows).
Unless the MariaDB server was restarted or the table was analyzed, the problem would not go away. ANALYZE TABLE or the restart changed the cardinality of the fields and the internal query processing exactly as stated in the linked article.

How much time does MySQL need to build an index

How much time does MySQL need to build an index of a table with 30,000,000 entries that are strings of length 256?
At the moment it seems to take hours, and I don't know how long I should wait before I conclude that MySQL has simply failed at building the index.
You may run SHOW PROCESSLIST\G in the mysql console to watch its state. I had a similar problem just a couple of hours ago, but my table was much smaller.
Here is a list of thread states you will definitely need. After an hour of waiting I realized that the ALTER TABLE / CREATE INDEX statement was in the Locked state, so I had to restart mysqld and run the statement again. That time the index was built in 15 minutes.
By the way, I recommend running index creation from the mysql console; GUI tools may add some spice of their own to the process.
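A rough way to check on a long-running index build (the second query is only available on newer MySQL versions and needs the performance_schema stage instruments enabled, so treat it as optional):

SHOW FULL PROCESSLIST;   -- watch the State column of the ALTER TABLE / CREATE INDEX thread
-- On MySQL 5.6+ with performance_schema, the current stage gives another hint:
SELECT thread_id, event_name, timer_wait
  FROM performance_schema.events_stages_current;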
It could easily take hours; it all depends on the machine specs, load, etc. To see whether it has failed, check something like top or watch your hard drives - if they're going mad, it's still indexing.
Depending on your OS, you may check for disk activity (i.e. whether it is reading/writing the DB files) to find out whether it has failed or not.

Log killed queries in MySQL

I have some strange bug in an application (or is it the MySQL build?) that causes queries to remain in the "Locked" state forever, filling up the maximum number of threads.
I read about setting the wait_timeout variable to kill the "bogus" threads after a period of time. This works OK, but I would like to log the killed queries for further inspection and to make sure backup scripts are not being killed.
Is there any possibility to do that?
Thanks.
You might be able to use the slow log, though I'm not sure it will help if the problem is that they never complete. Worth a shot.
Also, you may be able to see what's going on by running SHOW FULL PROCESSLIST while you've got dead threads. It should show you what the problem is and what the query was.
If you can simulate this in a development environment, you could also turn on general query logging (which records every statement) and then just tail the log after it crashes.
In the past, I have tagged queries with a unique comment (per query type):
/* Query_12345 */ SELECT ... FROM ... WHERE ... LIMIT ...
A background process would poll SHOW FULL PROCESSLIST and look for any queries that were more than X seconds long, and tagged with Query_NNNNN.
Finally, it would kill them if they went on too long. This allowed the server to breathe while we figured out how to optimize the 80,000,000-record table that was slowing things down.
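A minimal sketch of that watchdog in SQL, assuming the Query_NNNNN tag convention above and a 30-second threshold (both are just examples):

-- find tagged queries that have been running too long
SELECT id, time, info
  FROM information_schema.processlist
 WHERE info LIKE '/* Query%'    -- the per-query-type comment tag
   AND command = 'Query'
   AND time > 30;               -- seconds the statement has been running
-- log each offending row somewhere, then kill the statement by id:
-- KILL QUERY 12345;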