I am attempting to make a query run on a large database in acceptable time. I'm looking at optimizing the query itself (e.g. Clarification of join order for creation of temporary tables), which took me from not being able to complete the query at all (with a 20 hr cap) to completing it but with time that's still not acceptable.
In experimenting, I found the following strange behavior that I'd like to understand: I want to do the query over a time range of 2 years. If I try to run it like that directly, then it still will not complete within the 10 min I'm allowing for the test. If I reduce it to the first 6 months of the range, it will complete pretty quickly. If I then incrementally re-run the query by adding a couple of months to the range (i.e. run it for 8 months, then 10 months, up to the full 2 yrs), each successive attempt will complete and I can bootstrap my way up to being able to get the full two years that I want.
I suspected that this might be possible due to caching of results by the MySQL server, but that does not seem to match the documentation:
If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again.
http://dev.mysql.com/doc/refman/5.7/en/query-cache.html
The key word there seems to be "identical," and the apparent requirement that the queries be identical was reenforced by other reading that I did. (The docs even indicate that the comparison on the query is literal to the point that logically equivalent queries written with "SELECT" vs. "select" would not match.) In my case, each subsequent query contains the full range of the previous query, but no two of them are identical.
Additionally, the tables are updated overnight. So at the end of the day yesterday we had the full, 2-yr query running in 19 sec when, presumably, it was cached since we had by that point obtained the full result at least once. Today we cannot make the query run anymore, which would seem to be consistent with the cache having been invalidated when the table was updated last night.
So the questions: Is there some special case that allows the server to cache in this case? If yes, where is that documented? If not, any suggestion on what else would lead to this behavior?
Yes, there is a cache that optimizes (general) access to the harddrive. It is actually a very important part of every storage based database system, because reading data from (or writing e.g. temporary data to) the harddrive is usually the most relevant bottleneck for most queries.
For InnoDB, this is called the InnoDB Buffer Pool:
InnoDB maintains a storage area called the buffer pool for caching data and indexes in memory. Knowing how the InnoDB buffer pool works, and taking advantage of it to keep frequently accessed data in memory, is an important aspect of MySQL tuning. For information about how the InnoDB buffer pool works, see InnoDB Buffer Pool LRU Algorithm.
You can configure the various aspects of the InnoDB buffer pool to improve performance.
Ideally, you set the size of the buffer pool to as large a value as practical, leaving enough memory for other processes on the server to run without excessive paging. The larger the buffer pool, the more InnoDB acts like an in-memory database, reading data from disk once and then accessing the data from memory during subsequent reads. See Section 15.6.3.2, “Configuring InnoDB Buffer Pool Size”.
There can be (and have been) written books about the buffer pool, how it works and how to optimize it, so I will stop there and just leave you with this keyword and refer you to the documentation.
Basically, your subsequent reads add data to the cache that can be reused until it has been replaced by other data (which in your case has happened the next day). Since (for MySQL) this can be any read of the involved tables and doesn't have to be your maybe complicated query, it might make the "prefetching" easier for you.
Although the following comes with a disclaimer because it obviously can have a negative impact on your server if you change your configuration: the default MySQL configuration is very (very) conservative, and e.g. the innodb_buffer_pool_size system setting is way too low for most servers younger than 15 years, so maybe have a look at your configuration (or let your system administrator check it).
We did some experimentation, including checking the effect from the system noted in the answer by #Solarflare. In our case, we concluded that the apparent caching was real, but it had nothing to do with MySQL at all. It was instead caused by the Linux disk cache. We were able to verify this in our case by manually flushing that cache after and before getting a result and comparing times.
Related
There a few large tables in one of the databases of a customer (each table is ~50M rows in size and is not too wide). The intent is to infrequently read these tables (completely). As there are no reasonable CDC indices present, the plan is to read the tables by querying them
SELECT * from large_table;
The reads will be performed using a jdbc driver. With the following fetch configuration present, the intent is to read the data approximately one record at a time (it may require a significant amount of time) so that the client code is never overwhelmed.
PreparedStatement stmt = connection.prepareStatement(queryString, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
I was going through the execution path of a query in High Performance MySQL, however some questions seemed unanswered:
Without the temp tables being explicitly created and the query cache being made use of, "how" are the stream reads tracked on the server?
Is any temporary data created (in main memory or files on disk) whatsoever? If so, where is it created and how much?
If temporary data is not created, how are the rows to be returned tracked? Does the query engine keep track of all the page files to be read for this query on this connection? In case there are several such queries running on the server, are the earliest "Tracked" files purged in favor of queries submitted recently?
PS: I want to understand the effect of this approach on the MySql server (not saying that there aren't better ways of reading the tables)
That simple query will not use a temp table. It will simply fetch the rows and transfer them to the client until it finishes. Nor would any possible index be useful. (If the real query is more complex, let's see it.)
The client may wait for all the rows (faster, but memory intensive) before it hands any to the user code, or it may hand them off one at a time (much slower).
I don't know the details in JDBC on specifying it.
You may want to page through the table. If so, don't use OFFSET, but use the PRIMARY KEY and "remember where you left off". More discussion: http://mysql.rjweb.org/doc.php/pagination
Your Question #3 leads to a complex answer...
Every query brings all the relevant data (and index entries) into RAM. The data/index is read in chunks ("blocks") of 16KB from the BTree structure that is persisted on disk. For a simple select like that, it will read the blocks 'sequentially' until finished.
But, be aware of "caching":
If a block is already in RAM, no I/O is needed.
If a block is not in the cache ("buffer_pool"), it will, if necessary, bump some block out and read the desired block in. This is very normal, and very common. Do not fear it.
Because of the simplicity of the query, only a few blocks ever need to be in RAM at any moment. Hence, if your buffer pool were only a few megabytes, it could still handle, say, a 1TB table. There would be a lot of I/O, and that would impact other operations.
As for "tracking", let me use the analogy of reading a long book in a single sitting. There is nothing to track, you are simply turning pages ('blocks'). You don't even need a 'bookmark' for tracking, it is next-next-next...
Another note: InnoDB uses "B+Tree", which includes a link from one block to the "next", thereby making the page turning efficient.
Another interpretation of tracking... "Transactions" and "ACID". When any query (read or write) touches a table, there is some form of lock applied to each row touched. For SELECT the lock is rather light-weight. For writes it can cause delays or even a "deadlock". The locks are unavoidable, but sometimes actions can be taken to minimize their impact.
Logically (but not actually), a "snapshot" of all rows in all tables is taken at the instant you start a transaction. This allows you to see a consistent view of everything, even if other connections are changing rows. The underlying mechanism is very lightweight on reading, but heavier for writes. Writes will make a copy of the row so that each connection sees the snapshot that it 'should' see. Also, the copy allows for ROLLBACK and recovery from a crash (eg power failure).
(Transaction "isolation" mode allows some control over the snapshot.) To get the optimal performance for your case, do nothing special.
Here's a way to conceptualize the handling of transactions: Each row has a timestamp associated with it. Each query saves the start time of the query. The query can "see" only rows that are older than that start time. A subsequent write in another connection will be creating copies of rows with a later timestamp, hence not visible to the SELECT. Hence, the onus is on writes to do extra work; reads are cheap.
I'm running MariaDB 10.2.31 on Ubuntu 18.4.4 LTS.
On a regular basis I encounter the following conundrum - especially when starting out in the morning, that is when my DEV environment has been idle for the night - but also during the day from time to time.
I have a table (this applies to other tables as well) with approx. 15.000 rows and (amongst others) an index on a VARCHAR column containing on average 5 to 10 characters.
Notably, most columns including this one are GENERATED ALWAYS AS (JSON_EXTRACT(....)) STORED since 99% of my data comes from a REST API as JSON-encoded strings (and conveniently I simply store those in one column and extract everything else).
When running a query on that column WHERE colname LIKE 'text%' I find query-result durations of i.e. 0.006 seconds. Nice. When I have my query EXPLAINed, I can see that the index is being used.
However, as I have mentioned, when I start out in the morning, this takes way longer (14 seconds this morning). I know about the query cache and I tried this with query cache turned off (both via SET GLOBAL query_cache_type=OFF and RESET QUERY CACHE). In this case I get consistent times of approx. 0.3 seconds - as expected.
So, what would you recommend I should look into? Is my DB sleeping? Is there such a thing?
There are two things that could be going on:
1) Cold caches (overnight backup, mysqld restart, or large processing job results in this particular index and table data being evicted from memory).
2) Statistics on the table go stale and the query planner gets confused until you run some queries against the table and the statistics get refreshed. You can force an update using ANALYZE TABLE table_name.
3) Query planner heisenbug. Very common in MySQL 5.7 and later, never seen it before on MariaDB so this is rather unlikely.
You can get to the bottom of this by enablign the following in the config:
log_output='FILE'
log_slow_queries=1
log_slow_verbosity='query_plan,explain'
long_query_time=1
Then review what is in the slow log just after you see a slow occurrence. If the logged explain plan looks the same for both slow and fast cases, you have a cold caches issue. If they are different, you have a table stats issue and you need to cron ANALYZE TABLE at the end of the over night task that reads/writes a lot to that table. If that doesn't help, as a last resort, hard code an index hint into your query with FORCE INDEX (index_name).
Enable your slow query log with log_slow_verbosity=query_plan,explain and the long_query_time sufficient to catch the results. See if occasionally its using a different (or no) index.
Before you start your next day, look at SHOW GLOBAL STATUS LIKE "innodb_buffer_pool%" and after your query look at the values again. See how many buffer pool reads vs read requests are in this status output to see if all are coming off disk.
As #Solarflare mentioned, backups and nightly activity might be purging the innodb buffer pool of cached data and reverting bad to disk to make it slow again. As part of your nightly activites you could set innodb_buffer_pool_dump_now=1 to save the pages being hot before scripted activity and innodb_buffer_pool_load_now=1 to restore it.
Shout-out and Thank you to everyone giving valuable insight!
From all the tips you guys gave I think I am starting to understand the problem better and beginning to narrow it down:
First thing I found was my default innodb_buffer_pool_size of 134 MB. With the sort and amount of data I'm processing this is ridiculously low - so I was able to increase it.
Very helpful post: https://dba.stackexchange.com/a/27341
And from the docs: https://dev.mysql.com/doc/refman/8.0/en/innodb-buffer-pool-resize.html
Now that I have increased it to close to 2GB and am able to monitor its usage and RAM usage in general (cli: cat /proc/meminfo) I realize that my 4GB RAM is in fact on the low side of things. I am nowhere near seeing any unused overhead (buffer usage still at 99% and free RAM around 100MB).
I will start to optimize RAM usage of my daemon next and see where this leads - but this will not free enough RAM altogether.
#danblack mentioned innodb_buffer_pool_dump_now and innodb_buffer_pool_load_now. This is an interesting approach to maybe use whenever the daemon accesses the DB as I would love to separate my daemon's buffer usage from the front end's (apparently this is not possible!). I will look into this further but as my daemon is running all the time (not only at night) this might not be feasible.
#Gordan Bobic mentioned "refreshing" DBtables by using ANALYZE TABLE tableName. I found this to be quite fast and incorporated it into the daemon after each time it does an extensive read/write. This increases daemon run times by a few seconds but this is no issue at all. And I figure I can't go wrong with it :)
So, in the end I believe my issue to be a combination of things: Too small buffer size, too small RAM, too many read/write operations for that environment (evicting buffered indexes etc.).
Also I will have to learn more about memory allocation etc and optimize this better (large-pages=1 etc).
İ am trying to understand the mysql architecture and I came acrosa two notions.
The first one is query cache, which I understood that it stores the queries that were run at least once, and if the query processor sees the query cached there, it no longer goes to the parser and takes the results directly to the cache.
But then, there is also the buffer pool, part of the Storage Engine buffer manager, which kinda does the same thing from my understanding.
So my question would be, if there is a cache in the logical layer, why do we need one in the physical layer also? İ am thinking that if a query is found in the query cache it will never be searched in the buffer pool, and if the query is not found in cache, then it will never be also retreived from the buffer pool. Am I missing something?
For query cache, you got it spot on. Its based on the raw text of the query mapping to the exact query results. It has major scaling problems which is why MySQL-8.0 removed it.
innodb buffer pool, is a storage of the low level data and index pages of the database. It ensures that all the recently used data is off disk and able to be queried without resorting to the much slower (by comparison to ram) storage.
So buffer pools serve all queries on the same data, while query caches only serve a particular query (at a large scaleability cost).
Adding some context to #danblack's answer, query cache stores the query and actual data associated with the query. But in buffer pool which we call as innodb_buffer_pool stores the physical (01,10) or low-level data or say pages. Whenever query executes it checks in the buffer pool and if required data is not present then it will proceed towards the disk (i.e. your secondary storage) and puts data in the buffer pool.
With query cache, there is a disadvantage of invalidating query cache if query cache size being set quite high without analyzing the situations. By "invalidating query cache" I mean marking the data or entry in query cache as invalid because the underlying table has been changed by DML statements. I have personally experienced many times for example under "show processlist" when replication is stuck for long at this particular state i.e. invalidation query cache and once it invalidates all the entries, things start catching up.
"Why do we need one in the physical layer?"
It is because having data in query cache can seriously impact the performance IF underlying table changes quite often which can affect the overall database performance. So if your table is not changing frequently query cache is useful. But now the concept of query cache has been removed in MySQL 8 (which is not a part of the discussion).
Bufferpool is only used to store pages coming from the secondary store.
CPU can not fetch data from secondary storage so the Database management system makes a pool in RAM and then CPU keeps access data from this buffer pool from RAM.
and DBMS uses a replacement algorithm to replace pages from this buffer pool.
Cache of data is something else.
There are other data structs and techniques for data cache.
the database size is only 200MB. I tried repairing it with build in tools in phpmyadmin it does help but it goes back to a higher disk usage.
OS: Windows 10
The most likely explanation is that you're running a lot of queries that use temporary space. But this is just a guess, because you have provided no information about what's running in your MySQL Server.
It's common for MySQL queries that use GROUP BY or ORDER BY or DISTINCT to use temporary space. Sometimes a lot of temporary space, or sometimes a little space but you have a lot of queries per second. This could account for the rate of disk I/O reported.
You can reduce this by optimizing your queries carefully so they use indexes instead of sorting or grouping in temporary space on disk. The topic of query optimization is large, and too much to cover in a Stack Overflow post.
If you need help optimizing a specific query, you should ask a question with that query, along with the output of SHOW CREATE TABLE and SHOW TABLE STATUS for each table in the query.
There are other uses of disk I/O that can happen rapidly.
For example, if your buffer pool is too small, and your queries scan your full 200MB of data repeatedly, then the MySQL Server is forced to replace pages in the buffer pool over and over again, reading the same data from disk every time. If you can allocate more RAM to the buffer pool, you can mitigate this rate of page recycling, and probably speed up your overall performance a lot.
Another example is if you are using UPDATE to change data over and over again, you could be writing a lot of I/O to the transaction log and the binary log, even though the total size of your data doesn't grow.
Tuning a database server is a complex process. If you need help tuning your MySQL options, it would be best to ask on dba.stackexchange.com. Include your MySQL version, and your current options in mysql.ini.
I'm perf tuning a large query, and want to run it from the same baseline before and after, for comparison.
I know about the mysql query cache, but its not relevant to me, since the 2 queries would not be cached anyway.
What is being cached, is the innodb pages, in the buffer pool.
Is there a way to clear the entire buffer pool so I can compare the two queries from the same starting point?
Whilst restarting the mysql server after running each query would no doubt work, Id like to avoid this if possible
WARNING : The following only works for MySQL 5.5 and MySQL 5.1.41+ (InnoDB Plugin)
Tweak the duration of entries in the InnoDB Buffer Pool with these settings:
// This is 0.25 seconds
SET GLOBAL innodb_old_blocks_time=250;
SET GLOBAL innodb_old_blocks_pct=5;
SET GLOBAL innodb_max_dirty_pages_pct=0;
When you are done testing, setting them back to the defaults:
SET GLOBAL innodb_old_blocks_time=0;
SET GLOBAL innodb_old_blocks_pct=37;
SET GLOBAL innodb_max_dirty_pages_pct=90;
// 75 for MySQL 5.5/MySQL 5.1 InnoDB Plugin
Check out the definition of these settings
MySQL 5.5
innodb_old_blocks_time
innodb_old_blocks_pct
innodb_max_dirty_pages_pct
MySQL 5.1.41+
innodb_old_blocks_time
innodb_old_blocks_pct
innodb_max_dirty_pages_pct
Much simpler... Run this twice
SELECT SQL_NO_CACHE ...;
And look at the second timing.
The first one warms up the buffer_pool; the second one avoids the QC by having SQL_NO_CACHE. (In MySQL 8.0, leave off SQL_NO_CACHE; it is gone.)
So the second timing is a good indication of how long it takes in a production system with a warm cache.
Further, Look at Handler counts
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handlers%';
gives a reasonably clear picture of how many rows are touched. That, in turn, gives you a good feel for how much effort the query takes. Note that this can be run quite successfully (and quickly) on small datasets. Then you can (often) extrapolate to larger datasets.
A "Handler_read" might be reading an index row or a data row. It might be the 'next' row (hence probably cached in the block that was read for the previous row), or it might be random (hence possibly subject to another disk hit). That is, the technique fails to help much with "how many blocks are needed".
This Handler technique is impervious to what else is going on; it gives consistent results.
"Handler_write" indicates that a tmp table was needed.
Numbers that approximate the number of rows in the table (or a multiple of such), probably indicate a table scan(s). A number that is the same as LIMIT might mean that you build such a good index that it consumed the LIMIT into itself.
If you do flush the buffer_pool, you could watch for changes in Innodb_buffer_pool_reads to give a precise(?) count of the number of pages read in a cold system. This would include non-leaf index pages, which are almost always cached. If anything else is going on in the system, this STATUS value should not be trusted because it is 'global', not 'session'.