MySQL redo log slower than buffer pool

I have a question about MySQL.
Is the redo log buffer persisted to disk more slowly than the buffer pool?
If the system goes down while a transaction is not yet committed, is it possible that the events in the redo log buffer have not been persisted yet, but the dirty pages in the buffer pool have already been persisted to disk?
I did not find any relevant documentation; please forgive me for being a novice.
Thanks.

InnoDB is designed to favor COMMIT over ROLLBACK. That is, when a transaction finishes, there is very little left to do if you are COMMITting, but a lot to do if you are ROLLBACKing.
Rollback must undo all the inserts/deletes/updates that were optimistically performed and essentially completed.
On COMMIT, the undo log is eventually thrown away. On ROLLBACK, it is read, and the changes it records are reversed.
Also, note that an UPDATE or DELETE of a million rows generates a lot of undo log entries, hence will take a really long time to roll back. Perhaps we should discuss what query you were ROLLBACKing; there may be a more efficient way to design the data flow.
Another thing to note is that all changes to the data and indexes happen in the buffer_pool. If the big query changed so much that it overflowed the buffer_pool, the evicted blocks will need to be reloaded in order to undo them.
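If you are ever stuck waiting on a big rollback, you can watch its progress from another session. A minimal sketch using the standard information_schema.INNODB_TRX table; the trx_rows_modified column counts back down toward zero as the rollback proceeds:

    SELECT trx_id, trx_state, trx_rows_modified
      FROM information_schema.INNODB_TRX
     WHERE trx_state = 'ROLLING BACK';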

Related

Why do we still need the InnoDB redo log when the MySQL binlog has been enabled?

In my understanding, the MySQL binlog can fully function as InnoDB's redo log.
So, once the binlog is enabled, why does InnoDB still have to write a redo log at the same time instead of just switching to the binlog? Doesn't this significantly slow down database write performance?
Other than simplifying design and implementation, is there any benefit to doing this?
AFAIK, when both logs are enabled while ACID compliance is guaranteed, the following problems occur:
Each log record with the same meaning must be written twice, separately.
Both logs must be flushed each time a transaction or transaction group commits (see the sketch below).
To ensure consistency between the two log files, a complex and inefficient mechanism such as XA (2PC) is used.
Therefore, all other products seem to use only one log to do all the relevant work (SQL Server calls it the transaction log, Oracle the redo log, and PostgreSQL the WAL). Is MySQL the only one that must enable two logs at the same time to ensure both ACID compliance and strongly consistent master-slave replication?
Is there a way to implement ACID compliance and strongly consistent semi-synchronous replication while only one of them is enabled?
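To make the double-flush point concrete, this is the configuration I have in mind: the strictest (and standard) durability settings, under which every commit must sync both logs. A sketch, not a recommendation:

    -- fsync the binary log at every commit (group commit may batch this):
    SET GLOBAL sync_binlog = 1;
    -- fsync the InnoDB redo log at every commit:
    SET GLOBAL innodb_flush_log_at_trx_commit = 1;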
This is an interesting topic. For a long time, I have been advocating the idea of merging the InnoDB write-ahead log and the binlog. The biggest motivation for that would be that the need to synchronize two separate logs would go away. But, I am afraid that this might not happen any time soon.
At MariaDB, we are taking some steps to reduce the fsync() overhead. The idea of MDEV-18959 Engine transaction recovery through persistent binlog is to guarantee that the binlog is never behind the InnoDB redo log, and by this, to allow a durable, crash-safe transaction commit with only one fsync() call, on the binlog file.
While the binlog implements logical logging, the InnoDB redo log implements physical logging (covering changes to persistent data pages that implement undo logs and index trees). As I explained in M|18 Deep Dive: InnoDB Transactions and Write Paths, a user transaction is divided into multiple mini-transactions, each of which can atomically modify multiple data pages.
The redo log is the ‘glue’ that makes changes to multiple data pages atomic. I think that the redo log is absolutely essential for implementing atomic changes of update-in-place data structures. Append-only data file structures, such as LSM trees, could be logs by themselves and would not necessarily need a separate log.
For an InnoDB table that contains secondary indexes, every single row operation is actually divided into multiple mini-transactions, operating on each index separately. Thus, the transaction layer requires more ‘glue’ that makes the indexes of a table consistent with each other. That ‘glue’ is provided by the undo log, which is implemented in persistent data pages.
InnoDB performs changes to the index pages upfront, and commit is a quick operation, merely changing the state of the transaction in the undo log header. But rollback is very expensive, because the undo log will have to be replayed backwards (and more redo log will be written to cover those index page changes).
In MariaDB Server, MyRocks is another transactional storage engine, which does the opposite: it buffers changes in memory until the very end and, at commit, applies them to the data files. This makes rollback very cheap, but the size of a transaction is limited by the amount of available memory. I have understood that MyRocks could be made to work in the way that you propose.

How long do dirty database pages usually stay inside memory before getting flushed back to disk in InnoDB MySQL?

By database pages I mean:
https://dev.mysql.com/doc/internals/en/innodb-page-structure.html
Now, these pages get loaded into memory when we issue a query against them, they get changed there only, and they get marked as dirty.
I'm not sure whether this depends on the OS or the database, but my question is: how long do these pages usually stay dirty in memory?
Let's say we have a database for a high-load web server with a lot of traffic, and the buffer size is 1 GB or so (I'm not sure how much database servers usually have). How much of this 1 GB could be dirty pages?
And if power is lost with no backup power, then all of the changes to these dirty pages get lost, correct? (Basically, I want to know: if a power outage occurs with no power backup while a lot of inserts and queries are happening, what is the estimated percentage of dirty data in memory that would get lost?)
For example, is there a chance that these dirty pages ever stay dirty for more than 12 or 24 hours on busy servers?
EDIT: by dirty pages I mean pages that have been modified in memory, for example a row inside them has been updated or deleted.
how long do these pages usually stay dirty in memory?
It's variable. InnoDB has a background thread that flushes dirty pages to disk. It flushes a modest number of pages, then does it again after 1 second.
So if you do a lot of updates in a short space of time, you would make a lot of pages dirty. Then the flushing thread would gradually flush them to disk. The idea is that this helps to stretch the work out over time, so a sudden spike of updates doesn't overwhelm your disk.
But it means that "how long do these pages stay dirty in memory" can vary quite a bit. I think in typical cases, it would be done in a few minutes.
Different versions of MySQL flush in different ways. Years ago, the main background thread flushed a fixed number of pages every 1 second. Then they came up with adaptive flushing, so it would increase the flush rate automatically if it detected you were making a lot of changes. Then they came up with a dedicated thread called the page cleaner. I think it's even possible to configure MySQL to run multiple page cleaner threads, but that's not necessary for most applications.
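You can see which of these mechanisms your server is using, and how dirty the buffer pool currently is, with the standard variable and status names. A quick sketch (innodb_page_cleaners exists in MySQL 5.7+):

    -- Is adaptive flushing enabled, and how many page cleaner threads are there?
    SHOW GLOBAL VARIABLES LIKE 'innodb_adaptive_flushing';
    SHOW GLOBAL VARIABLES LIKE 'innodb_page_cleaners';
    -- How many pages are currently dirty?
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';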
You might also be interested in my answers to these past questions:
How to calculate amount of work performed by the page cleaner thread each second?
How to solve mysql warning: "InnoDB: page_cleaner: 1000ms intended loop took XXX ms. The settings might not be optimal "?
Let's say ... the buffer size is 1 GB or so (I'm not sure how much database servers usually have)
It really varies and depends on the app. The default innodb_buffer_pool_size out of the box is 128MB, but that's too small for most applications unless it's a test instance.
At my company, we try to maintain the buffer pool at least 10% of the size of data on disk. Some apps need more. The most common size we have is 24GB, but the smallest is 1GB and the largest is 200GB. We manage over 4,000 production MySQL instances.
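If you want to check where your server stands against a heuristic like that, compare the InnoDB data-plus-index footprint on disk with the configured pool size. A sketch using the standard data dictionary views:

    -- Total InnoDB data + index size on disk, in GB:
    SELECT ROUND(SUM(data_length + index_length)/1024/1024/1024, 1) AS innodb_gb
      FROM information_schema.tables
     WHERE engine = 'InnoDB';
    -- Configured buffer pool size, in bytes:
    SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';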
How much of this 1 GB could be dirty pages?
All of them, in theory. MySQL has a config variable called innodb_max_dirty_pages_pct, which you might assume blocks any further dirty pages if you have too many. But it doesn't. You can still modify more pages even if the buffer pool is more dirty (percentage-wise) than that variable allows.
What the variable really does: if the buffer pool is more than that percent full of dirty pages, the rate of flushing dirty pages is increased (IIRC, it doubles the number of pages flushed per cycle) until the number falls below that percentage threshold again.
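You can check both the threshold and the current ratio yourself; a sketch with the standard names:

    SHOW GLOBAL VARIABLES LIKE 'innodb_max_dirty_pages_pct';
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_total';
    -- current dirty percentage = pages_dirty / pages_total * 100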
If the power is lost with no backup power, then all of the changes to these dirty pages get lost, correct?
The dirty pages in memory are gone, yes, but you won't lose the changes, because they can be reconstructed from the InnoDB redo log -- those two files ib_logfile0 and ib_logfile1 you may have seen in your data dir. Any transaction that created a dirty page must be logged in the redo log during commit.
If you have a power loss (or any other kind of restart of the mysqld process), the first thing InnoDB does is scan the redo log to check that every logged change was flushed before the crash; if it wasn't, InnoDB loads the original page and reapplies the change from the log, making the page dirty again. That's what InnoDB calls crash recovery.
You can watch this happening. Tail the error log on a test instance of MySQL Server, while you kill -9 the mysqld process. mysqld_safe will restart the mysqld process, which will spew a bunch of information into the error log as it performs crash recovery.
If there were only a few dirty pages to recover, this will be pretty quick, perhaps only seconds. If the buffer pool was large and had a lot of dirty pages, it will take longer. The MySQL Server isn't fully started, and cannot take new client connections, until crash recovery is complete. This has caused many MySQL DBAs many minutes of anxiety while watching the progress of the crash recovery. There's no way to predict how long it will take after a crash.
Since the redo log is needed for crash recovery, if the redo log fills up, MySQL must flush some dirty pages. It won't allow pages to be both unflushed and unrecoverable from the redo log. If this happens, you'll actually see InnoDB pause writes until it can do a kind of "emergency flush" of the oldest dirty pages. This used to be a problem for MySQL, but with improvements like adaptive flushing and the page cleaner, it keeps up with the pace of changes much better. You'd have to have a really extraordinary number of writes and an undersized redo log to experience a hard stop on InnoDB while it does a sync flush.
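The redo log capacity that determines when such a hard stop can happen is simply the file size times the number of files. A sketch for checking it (these are the classic pre-8.0 variable names):

    SHOW GLOBAL VARIABLES LIKE 'innodb_log_file_size';
    SHOW GLOBAL VARIABLES LIKE 'innodb_log_files_in_group';
    -- total redo capacity = innodb_log_file_size * innodb_log_files_in_group;
    -- the current checkpoint age appears in the LOG section of
    -- SHOW ENGINE INNODB STATUS.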
Here's a good blog about flushing: https://www.percona.com/blog/2011/04/04/innodb-flushing-theory-and-solutions/
P.S.: For an obligatory bash against MyISAM, I'll point out that MyISAM doesn't have a redo log, doesn't have crash recovery, and relies on the host OS file buffer during writes to its data files. If your host has a power failure while there are pending writes in the file buffer and not yet written to disk, you will lose them. MyISAM does not have any real support for the Durability property of ACID.
Re your comment:
A page will probably be flushed by the time the redo log recycles. That is, if you have 2x 48MB redo log files (the default size) and you write enough transactions to cycle completely through them and start over at the beginning, any pages in the buffer pool made dirty during that time will need to be flushed. A page cannot remain dirty in the buffer pool if the respective transaction in the redo log has been overwritten by new transactions.
As far as I understand, it would be virtually impossible for a dirty page to stay dirty in the buffer pool without being flushed for 12-24 hours.
The possible exception, and I'm just speculating about this, is that a given page gets updated again and again before it's flushed, and therefore remains a recent dirty page for a long time. Again, I don't know for sure whether this overrides the need to flush a page when the redo log recycles.
Regardless, I think it's highly unlikely.
Also, I'm not sure what you mean by forensic. There's no direct way to examine page versions from the buffer pool. To get information about recent changes from InnoDB, you'd need to examine the undo segment to find previous versions of pages, and correlate them with redo log entries. A dirty page and its previous versions can each be in the buffer pool or on disk. There are no commands, APIs, or data structures to do any of that correlation, so you'd be doing manual dumps of both disk images and memory images, and following pointers by hand.
A much easier way of tracing data changes is by examining the stream of changes in the binary log. That's independent of InnoDB.
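A quick sketch of peeking at that stream from a client session (the binlog file name below is hypothetical; SHOW BINARY LOGS lists the real ones on your server):

    SHOW BINARY LOGS;
    SHOW BINLOG EVENTS IN 'binlog.000042' LIMIT 20;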

Semantics of ib_buffer_pool file in MySQL

MySQL's default storage engine, InnoDB, maintains an internal buffer pool of database pages. In newer versions of MySQL (e.g. 5.7+) the space and page IDs of the pages in the buffer pool are persisted to disk in the "ib_buffer_pool" file.
I'm curious about how this file is constructed, and in particular if the relative young-ness/old-ness of the pages in the buffer pool persists across restarts. In other words, if some page in the pool is younger than some other page, will that relationship hold after the file is written to, and then read from, the disk?
A broader question is the following: how much of the state of the InnoDB buffer pool persists across restarts?
Most of what you ask does not matter.
That file contains pointers, not data blocks. Each "pointer" probably contains the tablespace id (ibdata1 versus individual .ibd files) and block number. It would be handy, but not absolutely necessary, to include the LRU info.
The goal is to quickly refill the RAM-based "buffer pool" after a restart. The buffer pool is a cache; in the past it was simply not reloaded. During normal activity, the blocks in the buffer pool are organized (somewhat) on a "least recently used" basis. This helps prevent a block from being bumped out "too soon".
If all the block pointers are stored in that file before shutting down, then the buffer pool can be restored to essentially where it was. At restart, this takes some disk activity, but after that, each query should be as fast as if the restart had not occurred.
If, for whatever reason, some block is inappropriately reloaded, it will be a minor performance hit, but nothing will be "wrong". That block will soon be bumped out of the buffer pool.
How much state persists across a restart? Well, the absolute requirement is that the integrity of the data in the database be maintained -- even across a power failure. Anything beyond that is just performance optimizations. So, to fully answer the question, one needs to understand iblog* (needed after a crash; not needed after clean shutdown), the new tmp table file (not needed), the "double buffer" (used to recover from 'torn page' after abrupt crash), etc.
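On versions that have it (5.6+), you can also exercise the dump mechanism by hand instead of waiting for a shutdown. A minimal sketch:

    -- Ask InnoDB to write the ib_buffer_pool file now:
    SET GLOBAL innodb_buffer_pool_dump_now = ON;
    -- Check whether/when the dump completed:
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_dump_status';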

How to avoid high CPU usage when there are no active queries

On my Linux server I ran into a problem this morning. I realized that after I killed all the background tasks that had been making ~200k database entries (1 GB in size) throughout the night, my CPU usage was still at 80%, due entirely to MySQL.
Neither a reboot nor restarting MySQL or nginx helped.
"InnoDB saved the data in the rows it was changing, and queries against changes that are still being rolled back should be answered transparently using the data that's still in the undo log."
I am not too familiar with this topic, but it seems like this is the answer to why there is high CPU usage even when there are no queries. When I run SHOW PROCESSLIST, it shows three connections whose "State" column says "Copying to tmp table".
I guess right now I just have to wait until all those rollback processes are finished, but why do they come up in the first place, and how can I prevent this from happening again?
Writes and index-updates are "delayed". This can lead to I/O and CPU activity even after all queries are finished.
"copying to tmp table" in the PROCESSLIST implies that something is still running. Chase down that query. It can possibly be improved with a better index or a rewrite. Killing mysqld will lead to a costly rollback now and/or when mysqld is restarted.
Killing a process in the middle of a transaction leads to an immediate ROLLBACK. Change the application to intercept the 'kill' and gracefully wait until things are in a good position for shutting down.
UPDATEing a million rows in a single statement takes a loooong time. Perhaps you killed that (or something like it)? Consider breaking such statements into chunks using 1000-row ranges on the PRIMARY KEY, as sketched below.
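A minimal sketch of such chunking, assuming a hypothetical table t with an integer PRIMARY KEY id and a hypothetical column flag; adapt the names and the update logic to your schema:

    DELIMITER //
    CREATE PROCEDURE chunked_update()
    BEGIN
      DECLARE max_id BIGINT;
      DECLARE cur BIGINT DEFAULT 0;
      SELECT COALESCE(MAX(id), 0) INTO max_id FROM t;
      WHILE cur < max_id DO
        -- each chunk is its own small transaction, so killing the job
        -- only ever rolls back one chunk, not a million rows
        UPDATE t SET flag = 1 WHERE id > cur AND id <= cur + 1000;
        COMMIT;
        SET cur = cur + 1000;
      END WHILE;
    END//
    DELIMITER ;

    CALL chunked_update();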

What is InnoDB "Pages to be Flushed"? And can I restart mysqld while there are still pages to be flushed?

I've tried reading some of the material out there, but it is a bit over my head. However, from what I understand, if you have a lot of memory allocated to the buffer pool, then the writes to memory happen faster than the disk can keep up with, and therefore there are still "pages to be flushed"? Additionally, if I restart the MySQL server, will that cause any issues?
InnoDB performs certain tasks in the background, including flushing of dirty pages (those pages that have been changed but are not yet written to the database files) from the buffer pool, a task performed by the master thread.
For more information you can refer:
http://dev.mysql.com/doc/refman/5.6/en/innodb-performance.html#innodb-performance-adaptive_flushing
Having dirty pages is normal. When you update a row, MySQL updates it in the buffer pool, marking the page as dirty. The change is written to the redo log as well, so in case of a crash, MySQL will replay the log and the data won't be lost. Writing to the redo log is an append-only operation, while the actual update involves random writes, and random writes are slower.
MySQL flushes dirty pages to disk when it needs to load new data into the buffer pool. So, having dirty pages in InnoDB is normal: it's how it works, and it's done to improve overall performance. But if you really want to get rid of them, set innodb_max_dirty_pages_pct to 0, as sketched below.
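A sketch of forcing that flush-out and watching the backlog drain (innodb_max_dirty_pages_pct is dynamic, so no restart is needed; remember to restore your previous value afterwards):

    SET GLOBAL innodb_max_dirty_pages_pct = 0;
    -- repeat this and watch the count fall toward zero:
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
    -- then restore the old threshold, e.g. the 5.6/5.7 default:
    SET GLOBAL innodb_max_dirty_pages_pct = 75;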
If you are using MySQL 5.6 or later, you can enable the variable innodb_buffer_pool_dump_at_shutdown, which:
"Specifies whether to record the pages cached in the InnoDB buffer pool when the MySQL server is shut down, to shorten the warmup process at the next restart."
You should use this variable in conjunction with innodb_buffer_pool_load_at_startup, as sketched below.
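As a sketch: the dump side is dynamic and can be turned on at runtime, but the load side is read-only at runtime and has to go in the configuration file:

    -- dynamic; takes effect at the next shutdown:
    SET GLOBAL innodb_buffer_pool_dump_at_shutdown = ON;
    -- innodb_buffer_pool_load_at_startup is NOT dynamic; set it in my.cnf
    -- under [mysqld]:
    --   innodb_buffer_pool_load_at_startup = ON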