MySQL's default storage engine, InnoDB, maintains an internal buffer pool of database pages. In newer versions of MySQL (e.g. 5.7+) the space and page IDs of the pages in the buffer pool are persisted to disk in the "ib_buffer_pool" file.
I'm curious about how this file is constructed, and in particular if the relative young-ness/old-ness of the pages in the buffer pool persists across restarts. In other words, if some page in the pool is younger than some other page, will that relationship hold after the file is written to, and then read from, the disk?
A broader question is the following: how much of the state of the InnoDB buffer pool persists across restarts?
Most of what you ask does not matter.
That file contains pointers, not data blocks. Each "pointer" probably contains the tablespace id (ibdata1 versus individual .ibd files) and the block number. It would be handy, but not absolutely necessary, to include the LRU info.
The goal is to quickly refill the RAM-based "buffer pool" after a restart. The buffer pool is a cache; in the past it was simply not reloaded. During normal activity, the blocks in the buffer pool are ordered (roughly) by "least recently used". This helps prevent bumping out a block "too soon".
If all the block pointers are stored in that file before shutting down, then the buffer pool can be restored to essentially where it was. At restart, this takes some disk activity, but after that, each query should be as fast as if the restart had not occurred.
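If you want to verify that this reload actually happens, recent versions expose it through ordinary variables. A minimal sketch, assuming the stock MySQL 5.7+ / MariaDB 10.x names:

    -- Check that the dump/restore behaviour is enabled (both are ON by default in 5.7+)
    SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_dump_at_shutdown';
    SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_load_at_startup';

    -- Right after a restart, watch the progress of the reload from ib_buffer_pool
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_load_status';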
If, for whatever reason, some block is inappropriately reloaded, it will be a minor performance hit, but nothing will be "wrong". That block will soon be bumped out of the buffer pool.
How much state persists across a restart? Well, the absolute requirement is that the integrity of the data in the database be maintained -- even across a power failure. Anything beyond that is just performance optimization. So, to fully answer the question, one needs to understand ib_logfile* (needed after a crash; not needed after a clean shutdown), the new tmp table file (not needed), the "doublewrite buffer" (used to recover from a 'torn page' after an abrupt crash), etc.
I have a multi-threaded (OpenMP) application using an embedded MariaDB database. Sometimes I use Aria and other times I use InnoDB. When I set the page cache buffer size for Aria (aria_pagecache_buffer_size) or the InnoDB buffer pool size (innodb-buffer-pool-size), will this memory be shared across all of the OpenMP threads for the embedded database? Or is it private to each connection?
InnoDB:
The buffer_pool is shared across all connections. (From MariaDB's point of view, "connections" are significant; it does not care about what is going on in the client.)
How much RAM do you have? Keep in mind that Aria and InnoDB don't share much, so you must split most of the memory between the InnoDB buffer_pool and the Aria pagecache. A lot of the smaller caches are 'common'.
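A quick way to see how the memory is currently split on a mixed Aria/InnoDB server (a sketch; both values are reported in bytes, and both are normally set in my.cnf rather than changed at runtime):

    SHOW GLOBAL VARIABLES
    WHERE Variable_name IN ('innodb_buffer_pool_size', 'aria_pagecache_buffer_size');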
The general rule of thumb I've observed is configuring this property to use 70% of available RAM for dedicated SQL servers with over 4GB of RAM. However, I'm working on what basically amounts to a shared hosting environment that has been experiencing a ton of traffic lately, and I want to optimize this: these are dedicated MySQL servers, but they host databases for 200-1000 different sites. Should I still configure using this rule?
You may have many tables and schemas in your MySQL instance, but a single buffer pool is used for all of them. It makes no difference what they're used for — one website or many websites or some database that is not for a website at all. Basically everything that is stored in a page of an InnoDB tablespace on that MySQL instance must be loaded into the buffer pool before it can be read or updated.
The recommendation of 70% of available RAM is not a magic number.
For example, it assumes you have a lot more data on storage than can fit in RAM. If you had 100GB of RAM and 2GB of data on storage, it would be unnecessary overkill to make a 70GB buffer pool. The pages from storage will only be copied into the buffer pool once, therefore for 2GB of data, your 70GB buffer pool would be mostly empty.
It also assumes that the remaining 30% of RAM is enough to support your operating system and other processes besides MySQL.
70% is just a starting suggestion. You need to understand your memory needs to size it properly.
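One rough way to compare the buffer pool size against the amount of InnoDB data you actually have (a sketch; information_schema sizes are estimates, but close enough for sizing):

    -- Total InnoDB data + indexes, in GB
    SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 1) AS innodb_data_gb
    FROM information_schema.tables
    WHERE engine = 'InnoDB';

    -- Current buffer pool size, in GB
    SELECT ROUND(@@innodb_buffer_pool_size / 1024 / 1024 / 1024, 1) AS buffer_pool_gb;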
By database pages I mean:
https://dev.mysql.com/doc/internals/en/innodb-page-structure.html
Now these pages get loaded into memory when we issue a query against them; they get changed there only and are marked as dirty.
I'm not sure whether this depends on the OS or the database, but my question is: how long do these pages usually stay dirty in memory?
Let's say we have a database for a high-load web server with a lot of traffic, and the buffer size is like 1GB or something (not sure how much database servers usually have). Now how much of this 1GB could be dirty pages?
And if the power is lost with no backup power, then all of the changes to these dirty pages get lost, correct? (Basically I want to know: if a power outage occurs, there is no power backup, and there are a lot of inserts and queries happening, what is the estimated percentage of dirty data in memory that is going to get lost?)
For example, is there a chance that these dirty pages ever stay dirty for more than 12 or 24 hours on busy servers?
EDIT: by dirty pages I mean the page is modified in memory, for example one row inside it is updated or deleted.
how long do these pages usually stay dirty in memory?
It's variable. InnoDB has a background thread that flushes dirty pages to disk. It flushes a modest number of pages, then does it again after 1 second.
So if you do a lot of updates in a short space of time, you would make a lot of pages dirty. Then the flushing thread would gradually flush them to disk. The idea is that this helps to stretch the work out over time, so a sudden spike of updates doesn't overwhelm your disk.
But it means that "how long do these pages stay dirty in memory" can vary quite a bit. I think in typical cases, it would be done in a few minutes.
Different versions of MySQL flush in different ways. Years ago, the main background thread flushed a fixed number of pages every 1 second. Then they came up with adaptive flushing, so it would increase the flush rate automatically if it detected you were making a lot of changes. Then they came up with a dedicated thread called the page cleaner. I think it's even possible to configure MySQL to run multiple page cleaner threads, but that's not necessary for most applications.
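If you want to see which of these mechanisms your server has, the relevant variables can simply be inspected (a sketch; these names exist in MySQL 5.6/5.7+ and MariaDB 10.x, and older versions won't have all of them):

    SHOW GLOBAL VARIABLES LIKE 'innodb_adaptive_flushing';
    SHOW GLOBAL VARIABLES LIKE 'innodb_page_cleaners';   -- number of page cleaner threads (5.7+)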
You might also be interested in my answers to these past questions:
How to calculate amount of work performed by the page cleaner thread each second?
How to solve mysql warning: "InnoDB: page_cleaner: 1000ms intended loop took XXX ms. The settings might not be optimal "?
Let's say ... the buffer size is like 1GB or something (not sure how much database servers usually have)
It really varies and depends on the app. The default InnoDB buffer pool size out of the box is 128MB, but that's too small for most applications unless it's a test instance.
At my company, we try to maintain the buffer pool at least 10% of the size of data on disk. Some apps need more. The most common size we have is 24GB, but the smallest is 1GB and the largest is 200GB. We manage over 4,000 production MySQL instances.
how much of this 1GB could be dirty pages?
All of them, in theory. MySQL has a config variable called innodb_max_dirty_pages_pct, which you might assume blocks any further dirty pages if you have too many. But it doesn't. You can still modify more pages even if the buffer pool is more dirty (percentage-wise) than that variable.
What the variable really does: if more than that percentage of the buffer pool is dirty pages, the rate of flushing dirty pages is increased (IIRC, it doubles the number of pages flushed per cycle) until the number falls below that percentage threshold again.
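You can see how dirty the buffer pool currently is, and compare it with that threshold, using a couple of status queries (standard variable names; the dirty percentage is just pages_dirty / pages_total):

    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_total';
    SHOW GLOBAL VARIABLES LIKE 'innodb_max_dirty_pages_pct';
    -- dirty % = 100 * pages_dirty / pages_total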
if the power is lost with no backup power, then all of the changes to these dirty pages get lost correct?
Yes, the dirty pages in memory are lost, but you won't lose the changes, because they can be reconstructed from the InnoDB redo log -- those two files ib_logfile0 and ib_logfile1 you may have seen in your data dir. Any transaction that created a dirty page must be logged in the redo log during commit.
If you have a power loss (or any other kind of restart of the mysqld process), the first thing InnoDB does is scan the redo log and check that every logged change was flushed before the crash; if it was not, InnoDB loads the original page and reapplies the change from the log to recreate the dirty page. That's what InnoDB calls crash recovery.
You can watch this happening. Tail the error log on a test instance of MySQL Server, while you kill -9 the mysqld process. mysqld_safe will restart the mysqld process, which will spew a bunch of information into the error log as it performs crash recovery.
If there was only a small amount of dirty pages to recover, this will be pretty quick, perhaps only seconds. If the buffer pool was large and had a lot of dirty pages, it'll take longer. The MySQL Server isn't fully started, and cannot take new client connections, until crash recovery is complete. This has caused many MySQL DBAs many minutes of anxiety while watching the progress of the crash recovery. There's no way to predict how long it takes after a crash.
Since the redo log is needed for crash recovery, if the redo log fills up, MySQL must flush some dirty pages. It won't allow a page to remain both unflushed and unrecoverable from the redo log. If this happens, you'll actually see writes paused by InnoDB until it can do a kind of "emergency flush" of the oldest dirty pages. This used to be a problem for MySQL, but with improvements like adaptive flushing and the page cleaner, it can keep up with the pace of changes much better. You'd have to have a really extraordinary number of writes and an undersized redo log to experience a hard stop of InnoDB while it does a sync flush.
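To get a feel for how close you are to that situation, you can compare the redo log capacity with the current checkpoint age (a sketch; the variable names are the stock ones, and the checkpoint age is read from the LOG section of the engine status):

    -- Total redo capacity = innodb_log_file_size * innodb_log_files_in_group
    SHOW GLOBAL VARIABLES LIKE 'innodb_log_file_size';
    SHOW GLOBAL VARIABLES LIKE 'innodb_log_files_in_group';

    -- In the LOG section, "Log sequence number" minus "Last checkpoint at" is the checkpoint age;
    -- it should stay well below the total redo capacity
    SHOW ENGINE INNODB STATUS;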
Here's a good blog about flushing: https://www.percona.com/blog/2011/04/04/innodb-flushing-theory-and-solutions/
P.S.: For an obligatory bash against MyISAM, I'll point out that MyISAM doesn't have a redo log, doesn't have crash recovery, and relies on the host OS file buffer during writes to its data files. If your host has a power failure while there are pending writes in the file buffer and not yet written to disk, you will lose them. MyISAM does not have any real support for the Durability property of ACID.
Re your comment:
A page will probably be flushed by the time the redo log recycles. That is, if you have 2x 48MB redo log files (the default size) and you write enough transactions to cycle completely through them and start over at the beginning, any pages in the buffer pool made dirty during that time will need to be flushed. A page cannot remain dirty in the BP if the respective transaction in the redo log is overwritten with new transactions.
As far as I understand, it would be virtually impossible for a dirty page to stay dirty in the buffer pool without being flushed for 12-24 hours.
The possible exception, and I'm just speculating about this, is that a given page gets updated again and again before it's flushed. Therefore it remains a recent dirty page for a long time. Again, I don't know for sure if this overcomes the need to flush a page when the redo log recycles.
Regardless, I think it's highly unlikely.
Also, I'm not sure what you mean by forensic. There's no direct way to examine page versions from the buffer pool. To get info about recent changes from InnoDB, you'd need to examine the undo segment to find previous versions of pages, and correlate them with redo log entries. The dirty page and its previous versions can both be in the buffer pool, or on disk. There are no commands, APIs, or data structures to do any of that correlation. So you'd be doing manual dumps of both disk images and memory images, and following pointers manually.
A much easier way of tracing data changes is by examining the stream of changes in the binary log. That's independent of InnoDB.
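If the binary log is enabled (log_bin), you can inspect that stream of changes directly from SQL. A sketch; the file name below is hypothetical, so use SHOW BINARY LOGS to find the real names on your server:

    SHOW BINARY LOGS;
    -- 'mysql-bin.000123' is an example name taken from the output above
    SHOW BINLOG EVENTS IN 'mysql-bin.000123' LIMIT 20;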
I've tried reading some of the material out there, but it is a bit over my head. However, from what I understand, if you have a lot of memory allocated to the buffer pool, then the writes to memory happen faster than the disk can keep up with, and therefore there are still "pages to be flushed"? Additionally, if I restart the MySQL server, will that cause any issues?
InnoDB performs certain tasks in the background, including flushing of dirty pages (those pages that have been changed but are not yet written to the database files) from the buffer pool, a task performed by the master thread.
For more information, you can refer to:
http://dev.mysql.com/doc/refman/5.6/en/innodb-performance.html#innodb-performance-adaptive_flushing
Having dirty pages is normal. When you update a row, MySQL updates it in the buffer pool, marking the page as dirty. The change is also written to the redo log, so in case of a crash, MySQL will replay the log and the data won't be lost. Writing to the redo log is an append-only operation, while flushing the actual pages involves random writes, and random writes are slower. MySQL flushes dirty pages to disk when it needs to load new data into the buffer pool. So, having dirty pages in InnoDB is normal - it's how it works, and it's done to improve overall performance. But if you really want to get rid of them, set innodb_max_dirty_pages_pct to 0.
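Since innodb_max_dirty_pages_pct is dynamic, that can be done at runtime; a minimal sketch (keeping it at 0 in production generally just wastes I/O):

    SET GLOBAL innodb_max_dirty_pages_pct = 0;
    -- watch the dirty page count drain
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';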
If you are using MySQL v5.6 or later, you can enable the variable innodb_buffer_pool_dump_at_shutdown, which:
Specifies whether to record the pages cached in the InnoDB buffer pool when the MySQL server is shut down, to shorten the warmup process at the next restart.
You must use this variable in conjunction with innodb_buffer_pool_load_at_startup.
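A minimal sketch of turning this on and testing it without waiting for a real shutdown (variable names per the MySQL 5.6+ manual; innodb_buffer_pool_load_at_startup is not dynamic, so it has to go in my.cnf):

    SET GLOBAL innodb_buffer_pool_dump_at_shutdown = ON;

    -- Trigger a dump right now instead of waiting for shutdown, then check its progress
    SET GLOBAL innodb_buffer_pool_dump_now = ON;
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_dump_status';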
I've noticed that MySQL (5.0.60) often freezes for up to minutes at a time under load, during which time the server is completely non-responsive. I've been able to isolate this to only happening when innotop shows the main thread state as "making checkpoint".
What is the server actually doing at this point?
Checkpointing is when the database server writes all in-memory (dirty) page changes out to the data files on disk, so that the redo log up to that point is no longer needed for recovery.
This one answers your question and also has some solutions to your problems:
As you might know, the InnoDB storage engine uses a fuzzy checkpointing technique as part of its recovery strategy. It is a very nice approach, which means the database never needs to "stall" to perform a total flush of modified pages; instead, flushing of dirty pages happens gradually in small chunks, so the database load is very even.
http://www.mysqlperformanceblog.com/2006/05/10/innodb-fuzzy-checkpointing-woes/
I thought I'd expand on Pradeep's comment:
InnoDB's (default) fuzzy checkpointing can get behind on flushing dirty pages to disk very easily. This can create a problem when the end of a log file is approached and a checkpoint is forced.
This is a well-known InnoDB problem - and there are third-party patches available to help. In XtraDB, innodb_adaptive_checkpoint speeds up the page flushing as the end of the log comes nearer:
http://www.percona.com/docs/wiki/percona-xtradb:patch:innodb_io
The explanation of why is a little more complex. The main thread has some hardcoded constants to determine whether the server is too "busy" to flush pages, such as whether more than 100 I/O operations per second ("IOPS") have already occurred. 100 IOPS is, of course, about the number of operations a single 7200RPM disk can complete, and might not make sense if your server has RAID and multiple disks. XtraDB also has the option of describing your hardware's capability via innodb_io_capacity.
A similar patch has been released by Innobase in the 1.0.4 plugin (not in an official MySQL release yet).
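In the InnoDB plugin 1.0.4+ (and all modern MySQL/MariaDB versions), this became the ordinary innodb_io_capacity variable. A sketch, assuming storage that can sustain well more than 100 IOPS; 2000 is just an illustrative value, so tune it for your hardware:

    SET GLOBAL innodb_io_capacity = 2000;
    SHOW GLOBAL VARIABLES LIKE 'innodb_io_capacity';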