How to clear/flush the MySQL InnoDB buffer pool?

I'm perf tuning a large query, and want to run it from the same baseline before and after, for comparison.
I know about the MySQL query cache, but it's not relevant to me, since the 2 queries would not be cached anyway.
What is being cached are the InnoDB pages, in the buffer pool.
Is there a way to clear the entire buffer pool so I can compare the two queries from the same starting point?
Whilst restarting the MySQL server after running each query would no doubt work, I'd like to avoid this if possible.

WARNING : The following only works for MySQL 5.5 and MySQL 5.1.41+ (InnoDB Plugin)
Tweak the duration of entries in the InnoDB Buffer Pool with these settings:
-- This is 0.25 seconds (the value is in milliseconds)
SET GLOBAL innodb_old_blocks_time=250;
SET GLOBAL innodb_old_blocks_pct=5;
SET GLOBAL innodb_max_dirty_pages_pct=0;
When you are done testing, set them back to the defaults:
SET GLOBAL innodb_old_blocks_time=0;
SET GLOBAL innodb_old_blocks_pct=37;
SET GLOBAL innodb_max_dirty_pages_pct=90; -- 75 for MySQL 5.5/MySQL 5.1 InnoDB Plugin
Check out the definitions of these settings in the reference manual:
MySQL 5.5: innodb_old_blocks_time, innodb_old_blocks_pct, innodb_max_dirty_pages_pct
MySQL 5.1.41+: innodb_old_blocks_time, innodb_old_blocks_pct, innodb_max_dirty_pages_pct

Much simpler... Run this twice
SELECT SQL_NO_CACHE ...;
And look at the second timing.
The first one warms up the buffer_pool; the second one avoids the QC by having SQL_NO_CACHE. (In MySQL 8.0, leave off SQL_NO_CACHE; it is gone.)
So the second timing is a good indication of how long it takes in a production system with a warm cache.
Further, look at the Handler counts:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
gives a reasonably clear picture of how many rows are touched. That, in turn, gives you a good feel for how much effort the query takes. Note that this can be run quite successfully (and quickly) on small datasets. Then you can (often) extrapolate to larger datasets.
A "Handler_read" might be reading an index row or a data row. It might be the 'next' row (hence probably cached in the block that was read for the previous row), or it might be random (hence possibly subject to another disk hit). That is, the technique fails to help much with "how many blocks are needed".
This Handler technique is impervious to what else is going on; it gives consistent results.
"Handler_write" indicates that a tmp table was needed.
Numbers that approximate the number of rows in the table (or a multiple of it) probably indicate one or more table scans. A number that is the same as the LIMIT might mean that you built such a good index that it consumed the LIMIT into itself.
If you do flush the buffer_pool, you could watch for changes in Innodb_buffer_pool_reads to give a precise(?) count of the number of pages read in a cold system. This would include non-leaf index pages, which are almost always cached. If anything else is going on in the system, this STATUS value should not be trusted because it is 'global', not 'session'.
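The before/after comparison of status counters can be sketched in Python. This is only a minimal illustration, assuming the two snapshots have already been fetched (for example from SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%') and converted into plain dicts. The function name status_delta and the sample numbers are made up for the example.

```python
def status_delta(before, after):
    """Return the counters that changed between two status snapshots.

    Both arguments map a status variable name to its integer value,
    e.g. the rows of SHOW GLOBAL STATUS converted into a dict.
    """
    delta = {}
    for name, end_value in after.items():
        change = end_value - before.get(name, 0)
        if change != 0:
            delta[name] = change
    return delta


# Hypothetical sample values, not taken from a real server.
before = {
    "Innodb_buffer_pool_reads": 1200,
    "Innodb_buffer_pool_read_requests": 90000,
}
after = {
    "Innodb_buffer_pool_reads": 1850,
    "Innodb_buffer_pool_read_requests": 97500,
}

changes = status_delta(before, after)
print(changes)
```

As noted above, the GLOBAL counters are only meaningful if nothing else is running on the server between the two snapshots.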

Related

Slow Update, Delete and Insert Queries under MariaDB

Our server was updated from Ubuntu 16 to Ubuntu 20 with MariaDB. Unfortunately, the loading time of the website has become slower. Normally MariaDB should be faster than MySQL. I've found that, quite simply, update commands on the website sometimes take about 7 seconds. However, if I enter these update commands directly into the database via phpMyAdmin, they only take 0.0005ms.
It seems to me that MariaDB has a problem with update commands when they occur frequently. This was never a problem with MySQL. Here's an example query:
UPDATE LOW_PRIORITY users
SET user_video_count = user_video_count + 1
WHERE user_id = 12345
The database format is MyISAM.
I have no idea what could be the reason. Do you?
Thank you very much.
It may be something as simple as a SELECT searching for something in users. Note, InnoDB would not suffer this problem.
MyISAM necessarily does a table lock when doing UPDATE, INSERT, or DELETE. (Also ALTER and other DDL statements.) If there are a lot of connections doing any mixture of writes and even SELECTs, the locks can cascade for a surprisingly long time.
The real solution, whether in MariaDB or [especially] in MySQL, is to switch to InnoDB.
If this is a case of high volume counting of "likes" or "views", then a partial solution (in either Engine) is to put such counters in a separate, parallel, table. This avoids those simple and fast updates fighting with other actions on the main table. In an extremely high traffic area, gathering such increments and applying them in batches is warranted. I don't think your volume needs that radical solution.
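The "gather increments and apply them in batches" idea could look roughly like this in application code. It is a sketch under assumptions: the class name CounterBatcher and the separate table user_counters are hypothetical, and the %s placeholders assume a MySQL driver that uses that parameter style.

```python
from collections import Counter


class CounterBatcher:
    """Collect per-user increments in memory and flush them as one batch."""

    def __init__(self):
        self.pending = Counter()

    def increment(self, user_id, amount=1):
        # Many hits on the same user collapse into a single pending total.
        self.pending[user_id] += amount

    def flush(self):
        """Return (sql, params) pairs, one per user, and reset the batch."""
        statements = [
            (
                # user_counters is a hypothetical side table for the counters.
                "UPDATE user_counters "
                "SET user_video_count = user_video_count + %s "
                "WHERE user_id = %s",
                (amount, user_id),
            )
            for user_id, amount in sorted(self.pending.items())
        ]
        self.pending.clear()
        return statements


batcher = CounterBatcher()
for uid in [12345, 12345, 777, 12345]:
    batcher.increment(uid)

statements = batcher.flush()
for sql, params in statements:
    print(sql, params)
```

Four increments turn into two UPDATE statements here. The caller would execute the returned statements inside one transaction, on a timer or after a fixed number of increments.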
MySQL has all-but-eliminated MyISAM. MariaDB may follow suit in a few years.
To address this:
the same query in myphpadmin its really fast
The problem is not with how you run it, but what else happens to be going on at the same time.
(LOW_PRIORITY is a MyISAM-specific kludge that sometimes works.)
MyISAM does "table locking"; InnoDB does "row locking". Hence, InnoDB can do a lot of "simultaneous" actions on a table, whereas MyISAM becomes serialized as soon as a write occurs.
More (Now focusing on InnoDB.)
Some other things that may be involved.
If two UPDATEs are trying to modify the same row at the same time, one will have to wait (due to the row locking).
If there is a really large number of things going on, delays can cascade. If 20 connections are actively running at one instance, they are each slowing down each other. Each connection is given a fair share, but that means that they all are slowed down.
SHOW PROCESSLIST to see what is running -- not "Sleep". The process with the highest "Time" (except for system threads) is likely to be the instigator of the fracas.
The slowlog can help in diving deeper. I turn it on, with a low enough long_query_time, and wait for the 'event' to happen. Then I use pt-query-digest (or mysqldumpslow -s t) to find the slowest queries. With some more effort, one might notice that there were a lot of queries that were "slow" at one instant -- possibly even "point queries" (like UPDATE ... WHERE id=constant) unexpectedly running slower than long_query_time. This indicates too many queries and/or some query that is locking rows unexpectedly. (Note: the "timestamp" of the queries is when the query ended; subtract Query_time to get the start.) SlowLog
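The note about timestamps is easy to get backwards, so here is a small Python sketch of the subtraction. It assumes the end timestamp and Query_time have already been parsed out of a slow log entry. The helper name query_start and the sample values are invented for illustration.

```python
from datetime import datetime, timedelta


def query_start(end_time, query_time_seconds):
    """Slow-log timestamps mark when a query ended; derive when it began."""
    return end_time - timedelta(seconds=query_time_seconds)


# Hypothetical entry: logged at 09:15:30 with Query_time: 7.25
ended = datetime(2021, 3, 1, 9, 15, 30)
started = query_start(ended, 7.25)
print(started)
```

Sorting entries by the computed start time, rather than by the logged end time, makes it easier to see which query began first and may have held up the others.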
More
innodb_flush_log_at_trx_commit = 2, as you found out, is a good fix when rapidly doing lots of single-query transactions. If the frequency becomes too large for that fix, then my comments above may become necessary.
There won't be much performance difference between =2 and =0.
As for innodb_flush_log_at_timeout: please provide the output of SHOW GLOBAL STATUS LIKE 'Binlog%commits';
As for innodb_lock_wait_timeout... I don't think that changing that will help you. If one of your queries aborts due to that timeout, you should record that it happened and retry the transaction.
It sounds like you are running with autocommit = ON and not using explicit transactions? That's fine (for non-money activity). There are cases where using a transaction can help performance -- such as artificially batching several queries together to avoid some I/O. The drawback is an increased chance of conflicts with other connections. Still, if you are always checking for errors and rerunning the 'transaction', all should be well.
innodb_flush_log_at_trx_commit
When that setting is "1", which is probably what you originally had, each Update did an extra write to disk to assure the data integrity. If the disk is HDD (not SSD), that adds about 10ms to each Update, hence leading to a max of somewhere around 100 updates/second. There are several ways around it.
innodb_flush_log_at_trx_commit = 0 or 2, sacrificing some data integrity.
Artificially combining several Updates into a single transaction, thereby spreading out the 10ms over multiple queries.
Explicitly combining several Updates based on what they are doing and/or which rows they touch. (In really busy systems, this could involve other servers and/or other tables.)
Moving the counter to another table (see above) -- this avoids interference from more time-consuming operations on the main table. (I did not hear a clear example of this, but the slowlog might have pointed out such.)
Switch to SSD drives -- perhaps 10x increase in capacity of Updates.
I suspect the social media giants do all of the above.
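The 10ms arithmetic behind the list above can be written out as a back-of-the-envelope calculation. This sketch assumes the log flush is the only cost and that flushes are strictly serialized, which a real server will not match exactly. The function names are made up for the example.

```python
def max_commits_per_second(flush_latency_ms):
    """Upper bound on commits/second if every commit waits for one flush."""
    return 1000.0 / flush_latency_ms


def max_updates_per_second(flush_latency_ms, updates_per_transaction=1):
    """Upper bound on updates/second when several updates share one commit."""
    return max_commits_per_second(flush_latency_ms) * updates_per_transaction


hdd_single = max_updates_per_second(10)        # one UPDATE per commit on HDD
hdd_batched = max_updates_per_second(10, 20)   # 20 UPDATEs per commit on HDD
print(hdd_single, hdd_batched)
```

With a 10ms flush, single-statement transactions cap out at about 100 per second, while batching 20 updates per transaction spreads the same flush cost over 20 updates.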
As you are using MariaDB, you can use tools like EverSQL to find missing indexes or discover redundant indexes (e.g. you have an index on user_video_count that you don't really need)
First of all I would like to thank everyone who helped me. I really appreciate that people try to invest their precious time.
I would like to tell you how I managed to fix the problem with the slow update, insert and delete queries.
I added this value to the my.cnf file:
innodb_flush_log_at_trx_commit = 2
After I restarted the MySQL server, the server load dropped suddenly and the update, insert and delete queries also dropped from about 0.22222 - 0.91922 seconds to 0.000013 under load. Just like it was before with MyISAM and MySQL, and how it should be for such simple updates with an index.
I have to mention that I have set all tables that receive frequent insert or update commands to InnoDB and those with many selects to Aria.
Since we don't handle money transactions, it's not a problem for me if we lose the last few seconds due to
innodb_flush_log_at_trx_commit = 2
I go even further. I can also live with it if we lose the last 30 seconds in a failure.
So I have also set:
innodb_flush_log_at_timeout = 30
I'm currently testing
innodb_flush_log_at_trx_commit = 0
But so far, I do not see a significant improvement with
innodb_flush_log_at_timeout = 30
innodb_flush_log_at_trx_commit = 0
instead of
innodb_flush_log_at_timeout = 1 (default)
innodb_flush_log_at_trx_commit = 2
So the main goal was:
innodb_flush_log_at_trx_commit = 2
or
innodb_flush_log_at_trx_commit = 0
Does anyone know, why:
innodb_flush_log_at_timeout = 30
innodb_flush_log_at_trx_commit = 0
is not faster than just
innodb_flush_log_at_trx_commit = 2
?
I also don't understand why these settings are not more popular, because many websites could see big speed improvements if they don't mind losing a second or more.
Thank you very much.

MySQL queries very slow - occasionally

I'm running MariaDB 10.2.31 on Ubuntu 18.04.4 LTS.
On a regular basis I encounter the following conundrum - especially when starting out in the morning, that is when my DEV environment has been idle for the night - but also during the day from time to time.
I have a table (this applies to other tables as well) with approx. 15,000 rows and (amongst others) an index on a VARCHAR column containing on average 5 to 10 characters.
Notably, most columns including this one are GENERATED ALWAYS AS (JSON_EXTRACT(....)) STORED since 99% of my data comes from a REST API as JSON-encoded strings (and conveniently I simply store those in one column and extract everything else).
When running a query on that column WHERE colname LIKE 'text%' I find query-result durations of e.g. 0.006 seconds. Nice. When I have my query EXPLAINed, I can see that the index is being used.
However, as I have mentioned, when I start out in the morning, this takes way longer (14 seconds this morning). I know about the query cache and I tried this with query cache turned off (both via SET GLOBAL query_cache_type=OFF and RESET QUERY CACHE). In this case I get consistent times of approx. 0.3 seconds - as expected.
So, what would you recommend I should look into? Is my DB sleeping? Is there such a thing?
There are three things that could be going on:
1) Cold caches (overnight backup, mysqld restart, or large processing job results in this particular index and table data being evicted from memory).
2) Statistics on the table go stale and the query planner gets confused until you run some queries against the table and the statistics get refreshed. You can force an update using ANALYZE TABLE table_name.
3) Query planner heisenbug. Very common in MySQL 5.7 and later, never seen it before on MariaDB so this is rather unlikely.
You can get to the bottom of this by enabling the following in the config:
log_output='FILE'
slow_query_log=1
log_slow_verbosity='query_plan,explain'
long_query_time=1
Then review what is in the slow log just after you see a slow occurrence. If the logged explain plan looks the same for both slow and fast cases, you have a cold cache issue. If they are different, you have a table stats issue and you need to cron ANALYZE TABLE at the end of the overnight task that reads/writes a lot to that table. If that doesn't help, as a last resort, hard-code an index hint into your query with FORCE INDEX (index_name).
Enable your slow query log with log_slow_verbosity=query_plan,explain and a long_query_time low enough to catch the results. See if occasionally it's using a different (or no) index.
Before you start your next day, look at SHOW GLOBAL STATUS LIKE "innodb_buffer_pool%" and after your query look at the values again. See how many buffer pool reads vs read requests are in this status output to see if all are coming off disk.
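To turn "buffer pool reads vs read requests" into a single number, the two counters can be compared over the interval around the query. This is a rough sketch, assuming you have noted Innodb_buffer_pool_reads and Innodb_buffer_pool_read_requests before and after. The function name and the sample values are hypothetical.

```python
def disk_read_fraction(reads_before, reads_after, requests_before, requests_after):
    """Fraction of logical read requests that had to be served from disk.

    Returns None when no read requests happened in the interval.
    """
    requests = requests_after - requests_before
    reads = reads_after - reads_before
    if requests <= 0:
        return None
    return reads / requests


# Hypothetical snapshots taken around one run of the slow query.
fraction = disk_read_fraction(
    reads_before=1200, reads_after=1850,
    requests_before=90000, requests_after=97500,
)
print(f"{fraction:.1%} of read requests went to disk")
```

A fraction near zero suggests the pages were already in the buffer pool. A large fraction on the first run of the morning points at a cold cache rather than a bad plan.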
As @Solarflare mentioned, backups and nightly activity might be purging the InnoDB buffer pool of cached data, so reads revert back to disk and become slow again. As part of your nightly activities you could set innodb_buffer_pool_dump_now=1 to save the list of hot pages before scripted activity and innodb_buffer_pool_load_now=1 to restore it afterwards.
Shout-out and Thank you to everyone giving valuable insight!
From all the tips you guys gave I think I am starting to understand the problem better and beginning to narrow it down:
First thing I found was my default innodb_buffer_pool_size of 134 MB. With the sort and amount of data I'm processing this is ridiculously low - so I was able to increase it.
Very helpful post: https://dba.stackexchange.com/a/27341
And from the docs: https://dev.mysql.com/doc/refman/8.0/en/innodb-buffer-pool-resize.html
Now that I have increased it to close to 2GB and am able to monitor its usage and RAM usage in general (cli: cat /proc/meminfo) I realize that my 4GB RAM is in fact on the low side of things. I am nowhere near seeing any unused overhead (buffer usage still at 99% and free RAM around 100MB).
I will start to optimize RAM usage of my daemon next and see where this leads - but this will not free enough RAM altogether.
@danblack mentioned innodb_buffer_pool_dump_now and innodb_buffer_pool_load_now. This is an interesting approach to maybe use whenever the daemon accesses the DB as I would love to separate my daemon's buffer usage from the front end's (apparently this is not possible!). I will look into this further but as my daemon is running all the time (not only at night) this might not be feasible.
@Gordan Bobic mentioned "refreshing" DB tables by using ANALYZE TABLE tableName. I found this to be quite fast and incorporated it into the daemon after each time it does an extensive read/write. This increases daemon run times by a few seconds but this is no issue at all. And I figure I can't go wrong with it :)
So, in the end I believe my issue to be a combination of things: Too small buffer size, too small RAM, too many read/write operations for that environment (evicting buffered indexes etc.).
Also I will have to learn more about memory allocation etc and optimize this better (large-pages=1 etc).

Check if buffer was used in last query

I was wondering whether there is any query/config/trick/etc to know if the innodb_buffer was used in the fetching of result for last query.
PS: This is in context of performance tuning, and I don't want to keep things to best guesses, so is there any way to provide a concrete evidence if buffer_pool was used or a normal db lookup was used.
PPS: I already searched for related terms like
check if buffer was used mysql
Innodb buffer used check
Verify if results loaded from buffer pool or datastore. etc..
Watch for changes to GLOBAL STATUS values of Innodb%. Ditto for Handler%.
I like to do this for figuring out what is going on in a query:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
It tells me exactly how many rows are touched how many times. And whether a temp table is being used and how big it is (in rows). Etc.
The only way the buffer_pool won't be "used" in an InnoDB query is if the Query cache is used instead.
Probably what you are fishing for is not whether the buffer_pool is "used", but whether a block had to be fetched from disk before it could be used?

MySQL cache of subset queries

I am attempting to make a query run on a large database in acceptable time. I'm looking at optimizing the query itself (e.g. Clarification of join order for creation of temporary tables), which took me from not being able to complete the query at all (with a 20 hr cap) to completing it but with time that's still not acceptable.
In experimenting, I found the following strange behavior that I'd like to understand: I want to do the query over a time range of 2 years. If I try to run it like that directly, then it still will not complete within the 10 min I'm allowing for the test. If I reduce it to the first 6 months of the range, it will complete pretty quickly. If I then incrementally re-run the query by adding a couple of months to the range (i.e. run it for 8 months, then 10 months, up to the full 2 yrs), each successive attempt will complete and I can bootstrap my way up to being able to get the full two years that I want.
I suspected that this might be possible due to caching of results by the MySQL server, but that does not seem to match the documentation:
If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again.
http://dev.mysql.com/doc/refman/5.7/en/query-cache.html
The key word there seems to be "identical," and the apparent requirement that the queries be identical was reinforced by other reading that I did. (The docs even indicate that the comparison on the query is literal to the point that logically equivalent queries written with "SELECT" vs. "select" would not match.) In my case, each subsequent query contains the full range of the previous query, but no two of them are identical.
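To make the "identical" requirement concrete, here is a toy model in Python of a cache keyed on the literal statement text. It is only an illustration of exact-text matching and not how the MySQL query cache is implemented. ToyQueryCache and fake_execute are invented names.

```python
class ToyQueryCache:
    """Toy cache keyed on the exact statement text (not MySQL's real code)."""

    def __init__(self):
        self._results = {}
        self.hits = 0

    def run(self, sql, execute):
        # Lookup is by the literal string, so case and spacing matter.
        if sql in self._results:
            self.hits += 1
            return self._results[sql]
        result = execute(sql)
        self._results[sql] = result
        return result


calls = []


def fake_execute(sql):
    """Stand-in for really running the statement; records each execution."""
    calls.append(sql)
    return [(1,)]


cache = ToyQueryCache()
cache.run("SELECT 1", fake_execute)   # miss: executed and stored
cache.run("SELECT 1", fake_execute)   # hit: same text
cache.run("select 1", fake_execute)   # miss: different text, same meaning
print(cache.hits, calls)
```

Under this model, a query over 8 months can never be answered from the cached result of the 6-month query, because the two statement texts differ.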
Additionally, the tables are updated overnight. So at the end of the day yesterday we had the full, 2-yr query running in 19 sec when, presumably, it was cached since we had by that point obtained the full result at least once. Today we cannot make the query run anymore, which would seem to be consistent with the cache having been invalidated when the table was updated last night.
So the questions: Is there some special case that allows the server to cache in this case? If yes, where is that documented? If not, any suggestion on what else would lead to this behavior?
Yes, there is a cache that optimizes (general) access to the hard drive. It is actually a very important part of every storage-based database system, because reading data from (or writing e.g. temporary data to) the hard drive is usually the most relevant bottleneck for most queries.
For InnoDB, this is called the InnoDB Buffer Pool:
InnoDB maintains a storage area called the buffer pool for caching data and indexes in memory. Knowing how the InnoDB buffer pool works, and taking advantage of it to keep frequently accessed data in memory, is an important aspect of MySQL tuning. For information about how the InnoDB buffer pool works, see InnoDB Buffer Pool LRU Algorithm.
You can configure the various aspects of the InnoDB buffer pool to improve performance.
Ideally, you set the size of the buffer pool to as large a value as practical, leaving enough memory for other processes on the server to run without excessive paging. The larger the buffer pool, the more InnoDB acts like an in-memory database, reading data from disk once and then accessing the data from memory during subsequent reads. See Section 15.6.3.2, “Configuring InnoDB Buffer Pool Size”.
Books can be (and have been) written about the buffer pool, how it works and how to optimize it, so I will stop there, leave you with this keyword and refer you to the documentation.
Basically, your subsequent reads add data to the cache that can be reused until it has been replaced by other data (which in your case has happened the next day). Since (for MySQL) this can be any read of the involved tables and doesn't have to be your maybe complicated query, it might make the "prefetching" easier for you.
Although the following comes with a disclaimer because it obviously can have a negative impact on your server if you change your configuration: the default MySQL configuration is very (very) conservative, and e.g. the innodb_buffer_pool_size system setting is way too low for most servers younger than 15 years, so maybe have a look at your configuration (or let your system administrator check it).
We did some experimentation, including checking the effect from the system noted in the answer by @Solarflare. In our case, we concluded that the apparent caching was real, but it had nothing to do with MySQL at all. It was instead caused by the Linux disk cache. We were able to verify this in our case by manually flushing that cache after and before getting a result and comparing times.

Does executing a statement always take in memory for the result set?

I was told by a colleague that executing an SQL statement always puts the data into RAM/swap by the database server. Thus it is not practical to select large result sets.
I thought that such code
my $sth = $dbh->prepare('SELECT million_rows FROM table');
$sth->execute;
while (my @data = $sth->fetchrow_array) {
    # process the row
}
retrieves the result set row by row, without it being loaded to RAM.
But I can't find any reference to this in DBI or MySQL docs. How is the result set really created and retrieved? Does it work the same for simple selects and joins?
Your colleague is right.
By default, the perl module DBD::mysql uses mysql_store_result which does indeed read in all SELECT data and cache it in RAM. Unless you change that default, when you fetch row-by-row in DBI, it's just reading them out of that memory buffer.
This is usually what you want unless you have very very large result sets. Otherwise, until you get the last data back from mysqld, it has to hold that data ready and my understanding is that it causes blocks on writes to the same rows (blocks? tables?).
Keep in mind, modern machines have a lot of RAM. A million-row result set is usually not a big deal. Even if each row is quite large at 1 KB, that's only 1 GB RAM plus overhead.
If you're going to process millions of rows of BLOBs, maybe you do want mysql_use_result -- or you want to SELECT those rows in chunks with progressive uses of LIMIT x,y.
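The "progressive uses of LIMIT x,y" approach amounts to walking the result in fixed-size windows. Here is a small Python sketch of generating those windows. The helper name limit_windows is made up, and the ORDER BY id column is an assumption added so that successive chunks are taken in a stable order.

```python
def limit_windows(total_rows, chunk_size):
    """Yield (offset, row_count) pairs for successive LIMIT offset, row_count."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, total_rows, chunk_size):
        yield offset, min(chunk_size, total_rows - offset)


windows = list(limit_windows(total_rows=2500, chunk_size=1000))

# "id" is a hypothetical ordering column, not part of the original query.
queries = [
    f"SELECT million_rows FROM table ORDER BY id LIMIT {offset}, {count}"
    for offset, count in windows
]
for query in queries:
    print(query)
```

Each chunk is a separate statement, so the client only ever holds one chunk in memory. Large offsets get progressively slower because the server still has to step over the skipped rows.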
See mysql_use_result and mysql_store_result in perldoc DBD::mysql for details.
This is not true (if we are talking about the database server itself, not client layers).
MySQL can buffer the whole resultset, but this is not necessarily done, and if done, not necessarily in RAM.
The resultset is buffered if you are using inline views (SELECT FROM (SELECT …)), the query needs to sort (which is shown as using filesort), or the plan requires creating a temporary table (which is shown as using temporary in the query plan).
Even if using temporary, MySQL only keeps the table in memory when its size does not exceed the limit set in tmp_table_size. When the table grows over this limit, it is converted from memory into MyISAM and stored on disk.
You, though, may explicitly instruct MySQL to buffer the resultset by appending SQL_BUFFER_RESULT instruction to the outermost SELECT.
See the docs for more detail.
No, that is not how it works.
Database will not hold rows in RAM/swap.
However, it will try, and mysql tries hard here, to cache as much as possible (indexes, results, etc...). Your mysql configuration gives values for the available memory buffers for different kinds of caches (for different kinds of storage engines) - you should not allow this cache to swap.
Test it
Bottom line - it should be very easy to test this using client only (I don't know perl's dbi, it might, but I doubt it, be doing something that forces mysql to load everything on prepare). Anyway... test it:
Issue a prepare on SELECT SQL_NO_CACHE million_rows FROM table and then fetch only a few rows out of the millions.
You should then compare performance with SELECT SQL_NO_CACHE only_fetched_rows FROM table and see how that fares.
If the performance is comparable (and fast) then I believe that you can call your colleague's bluff.
Also if you enable log of the statements actually issued to mysql and give us a transcript of that then we (non perl folks) can give more definitive answer on what would mysql do.
I am not super familiar with this, but it looks to me like DBD::mysql can either fetch everything up front or only as needed, based on the mysql_use_result attribute. Consult the DBD::mysql and MySQL documentation.