Why does MySQL InnoDB also acquire gap locks for UPDATE/DELETE operations?

As far as I know, the gap lock is used to prevent phantom reads, and most articles I found via Google search say a gap lock is set by a locking read.
A gap lock is a lock on a gap between index records, or a lock on the gap before the first or after the last index record. For example, SELECT c1 FROM t WHERE c1 BETWEEN 10 and 20 FOR UPDATE; prevents other transactions from inserting a value of 15 into column t.c1, whether or not there was already any such value in the column, because the gaps between all existing values in the range are locked.
https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html#innodb-gap-locks
I would guess that setting a gap lock on locking reads is sufficient. Why do UPDATE and DELETE also set gap locks?
UPDATE ... WHERE ... sets an exclusive next-key lock on every record the search encounters. However, only an index record lock is required for statements that lock rows using a unique index to search for a unique row.
https://dev.mysql.com/doc/refman/8.0/en/innodb-locks-set.html
Another question is what happens if there is no suitable index to which a gap lock can be attached.
Does InnoDB fall back to locking the entire table?
Here we assume the default transaction isolation level, REPEATABLE READ.

It depends on the conditions in your SELECT, UPDATE, or DELETE. They set gap locks to prevent other concurrent sessions from adding rows to the set that would be matched by the conditions.
In InnoDB, locking statements always lock the most recently committed row versions. So they don't really obey the REPEATABLE READ snapshot; they act more like READ COMMITTED.
Therefore, if you do a statement like this:
UPDATE MyTable SET ... WHERE created_at > '2020-03-22';
It must lock the gap following the highest value of created_at, which will prevent other sessions from adding new rows.
This is to simulate REPEATABLE READ, to make sure that if you run the same UPDATE again, it will affect the same rows, and it won't accidentally affect new rows.
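That gap-above-the-maximum behavior can be observed with two sessions. The table and column names below are the hypothetical ones from the example above (the note column is invented for the sketch), and an index on created_at is assumed:

```sql
-- Session 1 (default REPEATABLE READ)
START TRANSACTION;
UPDATE MyTable SET note = 'x' WHERE created_at > '2020-03-22';
-- next-key locks cover every index entry scanned, plus the gap after the
-- highest created_at value (up to the "supremum" pseudo-record)

-- Session 2
INSERT INTO MyTable (created_at, note) VALUES ('2020-03-25', 'y');
-- blocks until Session 1 commits or rolls back
```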

Related

MariaDB Transaction Isolation Levels

I am facing a problem and am trying to wrap my head around isolation levels. To understand them, I've read the documentation found on the MariaDB website.
The base isolation level used by InnoDB tables is stated to be REPEATABLE_READ.
Consider the following problem. I have the following two tables structure:
/** tableA **/
id INT AUTO_INCREMENT PRIMARY_KEY
/** tableB **/
id INT
claimedBy INT NULLABLE
and also have a function, which pseudocode looks like this
/** should create a new row in "tableA", and update the rows from "tableB" whose ids match the ones in the input array and whose "claimedBy" is null, setting "claimedBy" to the id of the new row - in case the number of updated rows does not match the length of the input array, it should roll everything back **/
claim(array what) {
- starts transaction
- inserts a row into "tableA" and retrieves its id, storing it in "variableA"
- updates "claimedBy" to "variableA" on all rows from "tableB" that have "claimedBy" set to null and have "id" that is in "what"
- counts the number of rows from "tableB", where "claimedBy" equals to "variableA"
- if the count does not match the length of the "what" parameter, rolls back the transaction
- if the count matches, commits the transaction
}
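The pseudocode above can be sketched in SQL roughly as follows (the exact column lists are assumptions, and "what" is shown as the literal list (1,2,3)):

```sql
START TRANSACTION;

-- step 1: insert into tableA and remember the generated id
INSERT INTO tableA () VALUES ();
SET @variableA = LAST_INSERT_ID();

-- step 2: claim the unclaimed rows
UPDATE tableB
SET claimedBy = @variableA
WHERE claimedBy IS NULL AND id IN (1, 2, 3);

-- step 3: count what was actually claimed
SELECT COUNT(*) INTO @claimed
FROM tableB
WHERE claimedBy = @variableA;

-- step 4: the application compares @claimed to the length of "what"
-- and issues COMMIT or ROLLBACK accordingly
```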
My questions, which would help me understand isolation levels more concretely, are the following:
In case two separate calls are made concurrently to this function, both with "what" arrays that intersect at any point: if I understand correctly, REPEATABLE_READ would prevent my data from becoming corrupted, because all the rows will be locked in the table as soon as the first update begins, so whichever call executes its update second will be completely rolled back. Am I right about this? Based on the example in the official documentation, it would seem that rows are checked against the WHERE condition and locked one by one. Is this the case? If so, is it possible that on concurrent calls to the function both queries get rolled back? Or worse, could a deadlock occur here?
In this concrete example, I could safely decrease the isolation level for the transaction to READ_COMMITTED, which would also prevent the data corruption, but would not retain locks for the duration of the update on rows that are not affected by it - am I correct in this?
Are the locks taken by manual transactions in MariaDB held for the duration of the query that creates them, or for the duration of the complete transaction (i.e., until the transaction is either rolled back or committed)?
FOLLOWUP QUESTION
Am I mistaken that, using READ_COMMITTED isolation, the following two concurrent calls could execute at the same time (without one waiting for the other's lock to be released), but not if REPEATABLE_READ isolation were used?
/** Session #1 **/
claim(array(1,2,3));
/** Session #2 **/
claim(array(4,5,6));
There's very little difference between REPEATABLE-READ and READ-COMMITTED in the scenario you describe.
The same locks are acquired in both cases. Locks are always held until the end of the transaction.
REPEATABLE-READ queries may also acquire gap locks to prevent new rows inserted, if those rows would change the result of some SELECT query. The MySQL manual explains gap locks better, and it works the same way in MariaDB: https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html#innodb-gap-locks Regardless, I don't think this will be an issue.
I don't think you're at risk of a deadlock in the scenario you describe. Your UPDATE should lock all the rows examined. Rows are not locked one by one; the lock request is atomic. That is, if any of the set of examined rows cannot be locked because another session already has some or all of them locked, then the new lock request waits.
Once your UPDATE succeeds (locks are acquired and then the rows are updated), then your session has them locked and keeps them locked until the end of the transaction. Subsequently doing a count would reference only locked rows, so there's no way another session could slip in and cause a deadlock.
One subtle point about locking that you may not notice in the documentation: locking SQL statements act as if they are run in READ-COMMITTED mode, even if your transaction is REPEATABLE-READ. In other words, locks are acquired on the most recently committed version of a row, even if a non-locking SELECT query would not read the most recent version of that row. This is surprising to some programmers.
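A two-session sketch of that surprise (table t, column val, and the values are hypothetical):

```sql
-- Session 1 (REPEATABLE READ)
START TRANSACTION;
SELECT val FROM t WHERE id = 1;   -- returns 'old'; the read snapshot is taken here

-- Session 2
UPDATE t SET val = 'new' WHERE id = 1;
COMMIT;

-- Session 1, continued
SELECT val FROM t WHERE id = 1;              -- still 'old' (consistent snapshot)
SELECT val FROM t WHERE id = 1 FOR UPDATE;   -- 'new': the locking read acts on the
                                             -- most recently committed version
```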
Re your comments:
I wrote a demo of the locking/nonlocking odd behavior in my answer here: How MVCC works with Lock in MySql?
Regarding releasing locks, yes, that's correct, in READ-COMMITTED mode, a lock is released if your UPDATE doesn't make any net change to the row. That is, if your update sets a column to the value that it already had. But in your case, you are changing values on rows that match your conditions. You specifically select for rows where the claimedBy is NULL, and you are setting that column to a non-NULL value.
Regarding your followup question, you don't have an index on the claimedBy column, so your query will have to at least examine all the rows. In READ-COMMITTED mode, it will be able to release the lock pretty promptly on rows that don't match the search conditions. But it would be better to have an index on claimedBy so it is able to examine only rows that match the condition. I would think it's better (if only by a slight margin) to avoid locking extra rows, instead of locking them and releasing the locks.
I don't think that transaction isolation is such an important factor in performance optimization. Choosing indexes to narrow down the set of examined rows is a much better strategy in most cases.
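Concretely, the index suggested above would be created like this (the name idx_claimedBy is arbitrary):

```sql
ALTER TABLE tableB ADD INDEX idx_claimedBy (claimedBy);
-- With this index, the UPDATE ... WHERE claimedBy IS NULL AND id IN (...)
-- no longer has to examine (and briefly lock) every row in tableB.
```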

Why does InnoDB block more records in case of a secondary index?

I'm using MySQL InnoDB tables and trying to understand the reasons for some row-level locking in the case of an index range scan. I found that an extra index record (out of range) may be locked depending on the uniqueness of the index used. See the example below (verified in version 8.0.18).
CREATE TABLE foo (
a INT NOT NULL,
b INT NOT NULL,
c CHAR(1) NOT NULL,
PRIMARY KEY (a),
UNIQUE KEY (b)
) ENGINE=InnoDB;
INSERT INTO foo VALUES (1,1,'A'), (3,3,'B'), (5,5,'C'), (7,7,'D'), (9,9,'E');
Test case 1
Session 1:
START TRANSACTION;
SELECT * FROM foo WHERE a < 2 FOR UPDATE;
Session 2:
DELETE FROM foo WHERE a = 3; -- Success
Test case 2
For this case, restore the original rows of the table (re-insert the record deleted above).
Session 1:
START TRANSACTION;
SELECT * FROM foo WHERE b < 2 FOR UPDATE;
Session 2:
DELETE FROM foo WHERE b = 3; -- Blocks
Locking the secondary index record with b = 3 in the second test case looks unnecessary.
Why does InnoDB block the next index entry to the right of the scanned range in case of a secondary index? Is there any practical reason for this?
Can someone give an example of a problem that could happen if the record with b = 3 is not blocked in the second test case?
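In MySQL 8.0 (which the question uses), the locks taken in each test case can be inspected from a third session while Session 1's transaction is still open:

```sql
SELECT index_name, lock_type, lock_mode, lock_data
FROM performance_schema.data_locks
WHERE object_name = 'foo';
-- In test case 2 this shows a next-key lock on secondary index entry b = 3,
-- i.e. the first entry to the right of the scanned range b < 2.
```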
Finally I found the answer. In short, there are no significant reasons for such additional locking in the second test case. When a locking read of a secondary index range is performed, sufficient locks are set, but not the necessary minimum. Thus, the extra locks are set only because it was easier for some InnoDB programmers to write the code that way. Who cares about extra locks if everything works for the most part?
I posted a bug report about this issue: https://bugs.mysql.com/bug.php?id=98639
Unfortunately, their employee does not want to register this bug. He does not understand my arguments and comes up with erroneous explanations. He made my last arguments private and stopped responding.
I also asked about this issue in the official forum and received the following answer: https://forums.mysql.com/read.php?22,684356,684482
In short, significant efforts are required to fix this bug. But since this is a very small and insignificant bug (more precisely, a performance issue), they do not want to fix it yet. However, in version 8.0.18, they fixed a similar problem for the clustered index, and it took them more than a month.
I'm very surprised that optimizing such a simple single-level scanning algorithm takes so much time and is so difficult for the MySQL team.
As #Barmar already mentioned, this is because MySQL is setting a gap or next-key lock. I assume you are using the default InnoDB isolation level, REPEATABLE READ.
The MySQL documentation says:
For locking reads (SELECT with FOR UPDATE or LOCK IN SHARE MODE), UPDATE, and DELETE statements, locking depends on whether the statement uses a unique index with a unique search condition, or a range-type search condition.
For a unique index with a unique search condition, InnoDB locks only the index record found, not the gap before it.
For other search conditions, InnoDB locks the index range scanned, using gap locks or next-key locks to block insertions by other sessions into the gaps covered by the range. For information about gap locks and next-key locks, see Section 14.7.1, “InnoDB Locking”.
See: https://dev.mysql.com/doc/refman/5.6/en/innodb-transaction-isolation-levels.html#isolevel_repeatable-read
At the moment I'm looking for info on how the gap lock is set, but I also found the following, which I think is helpful:
Keep in mind that the lock is based on an internal index, because the column you are using in your criteria is not uniquely indexed.
A gap lock is a lock on a gap between index records, or a lock on the gap before the first or after the last index record.
Gap locks in InnoDB are “purely inhibitive”, which means that their only purpose is to prevent other transactions from inserting to the gap. Gap locks can co-exist. A gap lock taken by one transaction does not prevent another transaction from taking a gap lock on the same gap. There is no difference between shared and exclusive gap locks. They do not conflict with each other, and they perform the same function.
Gap locking can be disabled explicitly. This occurs if you change the transaction isolation level to READ COMMITTED or enable the innodb_locks_unsafe_for_binlog system variable (which is now deprecated). Under these circumstances, gap locking is disabled for searches and index scans and is used only for foreign-key constraint checking and duplicate-key checking.
There are also other effects of using the READ COMMITTED isolation level or enabling innodb_locks_unsafe_for_binlog. Record locks for nonmatching rows are released after MySQL has evaluated the WHERE condition. For UPDATE statements, InnoDB does a “semi-consistent” read, such that it returns the latest committed version to MySQL so that MySQL can determine whether the row matches the WHERE condition of the UPDATE.
see https://dev.mysql.com/doc/refman/5.6/en/innodb-locking.html#innodb-gap-locks
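If the gap locks are unwanted for a particular session, the quoted paragraph suggests lowering the isolation level, e.g.:

```sql
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT * FROM foo WHERE b < 2 FOR UPDATE;
-- only matching index records are locked; other sessions can insert
-- into the gaps, so phantom rows become possible
COMMIT;
```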

How to use this sql add lock in the rows?

SQL:
INSERT LOW_PRIORITY IGNORE INTO A_new (??)
SELECT ?? FROM A FORCE INDEX(`PRIMARY`)
WHERE `id` >= 'XX' AND `id` <= 'XX'
LOCK IN SHARE MODE;
Is it possible to add S locks on the rows in the range without any transaction?
If not, does this SQL only work inside a transaction?
I found the answer in the MySQL documentation:
INSERT INTO T SELECT ... FROM S WHERE ... sets an exclusive index record lock (without a gap lock) on each row inserted into T. If the transaction isolation level is READ COMMITTED, InnoDB does the search on S as a consistent read (no locks). Otherwise, InnoDB sets shared next-key locks on rows from S. InnoDB has to set locks in the latter case: during roll-forward recovery using a statement-based binary log, every SQL statement must be executed in exactly the same way it was done originally.
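Applied to the statement from the question: under the default REPEATABLE READ, the scan of A takes shared next-key locks even for a single autocommitted statement (each statement runs in its own transaction). Per the quoted documentation, the search on A becomes lock-free only under READ COMMITTED, and only if the explicit LOCK IN SHARE MODE clause is dropped. The ?? and 'XX' placeholders from the question are left as-is:

```sql
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
INSERT LOW_PRIORITY IGNORE INTO A_new (??)
SELECT ?? FROM A FORCE INDEX(`PRIMARY`)
WHERE `id` >= 'XX' AND `id` <= 'XX';
-- the SELECT on A is now a consistent read (no shared locks);
-- each row inserted into A_new still gets an exclusive record lock
```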

LOCK IN SHARE MODE locks entire table

Documentation:
SELECT ... LOCK IN SHARE MODE sets a shared mode lock on any rows that are read. Other sessions can read the rows, but cannot modify them until your transaction commits. If any of these rows were changed by another transaction that has not yet committed, your query waits until that transaction ends and then uses the latest values.
However, some experimentation suggests that it locks more than the rows that are read.
CREATE TABLE example (a int);
START TRANSACTION;
SELECT a FROM example WHERE a = 0 LOCK IN SHARE MODE;
And then on another connection
INSERT INTO example VALUES (1);
The later connection blocks on the lock.
It would seem that LOCK IN SHARE MODE locks more than "any rows that are read".
What exactly does LOCK IN SHARE MODE lock?
Make sure you have an index on the a column. Otherwise, in order to evaluate WHERE a = 0, it has to read every row in the table, and it will then set a lock on each row as it reads it.
ALTER TABLE example ADD INDEX (a);

Does SQL INSERT involve any read/write lock?

This is an extremely simple question, but I really can't find any useful answer on Google.
I've been reading that there's a read lock on SELECT and a write lock on UPDATE statements. However, I'm trying to find out whether there is any kind of lock when we INSERT into a table.
The assumption is that the table is using InnoDB as the engine.
When inserting, InnoDB sets a so-called "insert intention" gap lock.
The manual describes this quite well:
A type of gap lock called an insertion intention gap lock is set by INSERT operations prior to row insertion. This lock signals the intent to insert in such a way that multiple transactions inserting into the same index gap need not wait for each other if they are not inserting at the same position within the gap. Suppose that there are index records with values of 4 and 7. Separate transactions that attempt to insert values of 5 and 6 each lock the gap between 4 and 7 with insert intention locks prior to obtaining the exclusive lock on the inserted row, but do not block each other because the rows are nonconflicting
In addition to that, any unique index will be locked for the provided values to make sure that two different transactions don't insert the same value (that's slightly different than the gap lock if I'm not mistaken).
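The duplicate-key locking mentioned above can be sketched like this (table u is hypothetical):

```sql
CREATE TABLE u (id INT PRIMARY KEY) ENGINE=InnoDB;

-- Session 1
START TRANSACTION;
INSERT INTO u VALUES (10);   -- exclusive lock on the new row

-- Session 2
INSERT INTO u VALUES (10);   -- blocks on Session 1's lock during the
                             -- duplicate-key check; it fails with a
                             -- duplicate-key error if Session 1 commits,
                             -- or succeeds if Session 1 rolls back
```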
For InnoDB, MySQL uses row-level locking.
When inserting, there is no existing row to lock, because you are creating it.
Adding to the gap lock description: MySQL InnoDB also takes a row lock on the row being inserted.
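The 4-and-7 example from the quoted paragraph can be reproduced directly:

```sql
CREATE TABLE t (id INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t VALUES (4), (7);

-- Session 1
START TRANSACTION;
INSERT INTO t VALUES (5);   -- insert intention lock on the gap (4,7),
                            -- then an exclusive lock on the new row 5

-- Session 2
START TRANSACTION;
INSERT INTO t VALUES (6);   -- same gap, different position: does not block
```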