I am facing a problem, and I am trying to wrap my head around transaction isolation levels. To understand them, I've read the documentation on the MariaDB website.
The default isolation level used by InnoDB tables is stated to be REPEATABLE READ.
Consider the following problem. I have the following two table structures:
/** tableA **/
id INT AUTO_INCREMENT PRIMARY KEY

/** tableB **/
id INT
claimedBy INT NULL
and also a function whose pseudocode looks like this:
/** should create a new row in "tableA", and update those rows in "tableB" whose ids match the ones in the input array and whose "claimedBy" is null, setting it to the id of this new row - in case the number of updated rows does not match the length of the input array, it should roll back everything **/
claim(array what) {
- starts transaction
- inserts a row into "tableA" and retrieves its id, storing it in "variableA"
- updates "claimedBy" to "variableA" on all rows from "tableB" that have "claimedBy" set to null and have "id" that is in "what"
- counts the number of rows from "tableB", where "claimedBy" equals to "variableA"
- if the count does not match the length of the "what" parameter, rolls back the transaction
- if the count matches, commits the transaction
}
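For concreteness, a minimal SQL sketch of that pseudocode might look like this (the length comparison would happen in application code, and the literal id list stands in for the "what" array):

```sql
START TRANSACTION;

-- insert a row into tableA and capture its auto-increment id
INSERT INTO tableA () VALUES ();
SET @variableA = LAST_INSERT_ID();

-- claim the still-unclaimed rows among the requested ids
UPDATE tableB
SET claimedBy = @variableA
WHERE claimedBy IS NULL
  AND id IN (1, 2, 3);    -- the ids from "what"

-- verify that every requested row was claimed
SELECT COUNT(*) FROM tableB WHERE claimedBy = @variableA;

-- application code: COMMIT if the count equals the length of "what", else ROLLBACK
```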
My questions, which would help me understand isolation levels more concretely, are the following:
In case two separate calls are made concurrently to this function with "what" arrays that intersect at any point, if I understand correctly, REPEATABLE READ would prevent my data from becoming corrupted, because all the matching rows will be locked as soon as the first update begins, so whichever call's update executes second will be completely rolled back. Am I right about this? Based on the example in the official documentation, it would seem that rows are checked against the WHERE condition and locked one by one. Is this the case? If yes, is it possible that on concurrent calls to the function both queries get rolled back? Or worse, could a deadlock occur here?
In this concrete example, I could safely decrease the isolation level for the transaction to READ COMMITTED, which would also prevent the data corruption, but would not retain locks for the duration of the transaction on rows that are not affected by the update. Am I correct about this?
Are locks taken in manual TRANSACTIONS in MariaDB held for the duration of the query that creates them, or for the duration of the complete transaction (i.e., until the transaction is either rolled back or committed)?
FOLLOWUP QUESTION
Am I mistaken that, when using READ COMMITTED isolation, the following two concurrent calls could execute at the same time (without one waiting for the other's locks to be released), but not if REPEATABLE READ isolation were used?
/** Session #1 **/
claim(array(1,2,3));
/** Session #2 **/
claim(array(4,5,6));
There's very little difference between REPEATABLE-READ and READ-COMMITTED in the scenario you describe.
The same locks are acquired in both cases. Locks are always held until the end of the transaction.
REPEATABLE-READ queries may also acquire gap locks to prevent new rows inserted, if those rows would change the result of some SELECT query. The MySQL manual explains gap locks better, and it works the same way in MariaDB: https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html#innodb-gap-locks Regardless, I don't think this will be an issue.
I don't think you're at risk of a deadlock in the scenario you describe. Your UPDATE should lock all the rows examined. Rows are not locked one by one; the lock request is atomic. That is, if any of the set of examined rows cannot be locked because another session already has some or all of them locked, then the new lock request waits.
Once your UPDATE succeeds (locks are acquired and then the rows are updated), then your session has them locked and keeps them locked until the end of the transaction. Subsequently doing a count would reference only locked rows, so there's no way another session could slip in and cause a deadlock.
One subtle point about locking that you may not notice in the documentation: locking SQL statements act as if they are run in READ-COMMITTED mode, even if your transaction is REPEATABLE-READ. In other words, locks are acquired on the most recently committed version of a row, even if a non-locking SELECT query would not read the most recent version of that row. This is surprising to some programmers.
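A minimal two-session illustration of that point, assuming a table `t` containing one row `(id = 1, val = 0)`:

```sql
-- Session 1 (REPEATABLE READ)
START TRANSACTION;
SELECT val FROM t WHERE id = 1;            -- returns 0, establishes the snapshot

-- Session 2 (autocommit)
UPDATE t SET val = 5 WHERE id = 1;         -- commits immediately

-- Session 1, continued
SELECT val FROM t WHERE id = 1;            -- still 0: the non-locking read uses the snapshot
SELECT val FROM t WHERE id = 1 FOR UPDATE; -- 5: the locking read sees the latest committed version
```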
Re your comments:
I wrote a demo of the locking/nonlocking odd behavior in my answer here: How MVCC works with Lock in MySql?
Regarding releasing locks, yes, that's correct, in READ-COMMITTED mode, a lock is released if your UPDATE doesn't make any net change to the row. That is, if your update sets a column to the value that it already had. But in your case, you are changing values on rows that match your conditions. You specifically select for rows where the claimedBy is NULL, and you are setting that column to a non-NULL value.
Regarding your followup question, you don't have an index on the claimedBy column, so your query will have to at least examine all the rows. In READ-COMMITTED mode, it will be able to release the lock pretty promptly on rows that don't match the search conditions. But it would be better to have an index on claimedBy so it is able to examine only rows that match the condition. I would think it's better (if only by a slight margin) to avoid locking extra rows, instead of locking them and releasing the locks.
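A sketch of the suggested index (the index name is arbitrary):

```sql
ALTER TABLE tableB ADD INDEX idx_claimedBy (claimedBy);
```

A composite index on `(claimedBy, id)` could cover both conditions of the UPDATE, though with an equality condition on `claimedBy` plus an `IN` list on the primary key, the single-column index is likely sufficient.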
I don't think that transaction isolation is such an important factor in performance optimization. Choosing indexes to narrow down the set of examined rows is a much better strategy in most cases.
Related
I have a scenario where my cluster is in read committed isolation mode and the use case is like below:
A select statement, when executed, takes around 1 minute to run and get the response back.
Committed updates to the data can happen during this one-minute window.
So my question is: will I get the updated records in the response, or the old ones?
I read the documentation, and it's mentioned that phantom reads are allowed.
I am confused here, so I just want some clarity; please help.
Using READ COMMITTED has additional effects (reference: the MySQL docs):
For UPDATE or DELETE statements, InnoDB holds locks only for rows
that it updates or deletes. Record locks for nonmatching rows are
released after MySQL has evaluated the WHERE condition. This greatly
reduces the probability of deadlocks, but they can still happen.
For UPDATE statements, if a row is already locked, InnoDB performs a
“semi-consistent” read, returning the latest committed version to
MySQL so that MySQL can determine whether the row matches the WHERE
condition of the UPDATE. If the row matches (must be updated), MySQL
reads the row again and this time InnoDB either locks it or waits
for a lock on it.
There is no way concurrent updates to data can modify a given query while it is executing. It's as if every query runs in its own REPEATABLE READ snapshot, even if your transaction is READ COMMITTED.
It will return rows that had been committed at the time the statement began executing. It will not include any rows committed after the statement began.
Re your comment:
No, there is no transaction isolation level that can change this. Even if you use READ UNCOMMITTED, a given query reads only rows that were committed at the time the query began executing.
If you want to query recent updates, you can only do it by starting a new query.
If you're concerned that you aren't getting notified about recent updates, then you need to optimize your query so it doesn't take 60 seconds to execute.
This is starting to sound like you're polling the database. Running frequent expensive queries to poll a database is an indication that perhaps you need to use a message queue instead.
Re your second comment:
Locking SQL statements, including UPDATE and DELETE and also locking SELECT statements do function like READ COMMITTED even when your transaction is REPEATABLE READ. Locking statements always read the most recent row that was committed at the time the statement started.
But they still cannot read new rows committed after the statement started. If for no other reason than they can't get the locks on those rows.
Your original question was about SELECT statements, and I assumed you meant non-locking SELECT (that is, without the options of FOR UPDATE or LOCK IN SHARE MODE). Those SELECT statements also cannot view rows added after the SELECT started.
P.S. I have never found a good use of READ UNCOMMITTED for any purpose.
By default, InnoDB locks rows while processing writes, but a plain (non-locking) SELECT does not take locks. In that case, it runs on a versioned snapshot of the table, so any COMMIT during the processing won't alter the result.
For more information:
https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html
In all cases, the ACID properties of the database prevent this kind of instability: https://en.wikipedia.org/wiki/ACID
I have a query such as
Select count(*) from table log where num = ?;
If I set the isolation level to serializable, then the range lock will be acquired for the where clause.
My question is: can other transactions also acquire the range lock in shared mode to read the count as above, or is the range lock exclusive, so that all other transactions have to wait until the current transaction commits before executing the read query?
Background: I am trying to implement a view counter for a heavy-traffic website. To reduce IO to the database, I created a log table, so that every time there is a view, I only write a new row to the log table. Once in a while, I (randomly) decide whether to clear the log table and add the number of its rows into a column of a view-count table. This means I have to be careful with interleaving transactions.
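Under those constraints, the flush step might be sketched as a single transaction like this (the `view_count` table, its `views` column, and the literal `num` value are assumptions, not from the question):

```sql
START TRANSACTION;

-- lock and count the pending log rows for this item
SELECT COUNT(*) INTO @c FROM log WHERE num = 42 FOR UPDATE;

-- fold the count into the aggregate and clear the log
UPDATE view_count SET views = views + @c WHERE num = 42;
DELETE FROM log WHERE num = 42;

COMMIT;
```

With next-key locking on an index over `num`, the FOR UPDATE would also block new log rows for `num = 42` until the commit, so the count and the delete stay consistent.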
The statements below are relevant only to SQL Server and were made before the OP made clear this was really about MySQL, about which I know nothing. I'm leaving it here since it (and the resulting discussion) might be of some use nevertheless, but it is not a complete, relevant answer to the question.
SELECT statements only ever acquire shared locks, on all isolation levels (unless overridden with a table hint). And shared locks are always compatible with each other (see Lock Compatibility), so there's no problem if other transactions want to acquire shared (range) locks as well. So yes, you can have any number of queries performing SELECT COUNT(*) in parallel and they will never block each other.
This doesn't mean other transactions don't have to wait. In particular, a DELETE query must eventually acquire an exclusive lock, and it will have to wait if the SELECT is holding a shared lock. Normally this is not an issue, since the engine releases locks as soon as possible. When it does become an issue, you'll want to look at solutions like snapshot isolation, which uses optimistic concurrency and conflict detection rather than locking. Under that model, a SELECT will never block any other query (save those that want table locks). Of course, this isn't free; the row versioning it uses takes up disk space and I/O.
Can the concurrency effects called "Missing and Double Reads Caused by Row Updates", mentioned here https://msdn.microsoft.com/en-us/en-en/library/ms190805.aspx, be relevant to the InnoDB engine?
ex.:
Transactions that are running at the READ UNCOMMITTED level do not issue shared locks to prevent other transactions from modifying data read by the current transaction. Transactions that are running at the READ COMMITTED level do issue shared locks, but the row or page locks are released after the row is read. In either case, when you are scanning an index, if another user changes the index key column of the row during your read, the row might appear again if the key change moved the row to a position ahead of your scan. Similarly, the row might not appear if the key change moved the row to a position in the index that you had already read. To avoid this, use the SERIALIZABLE or HOLDLOCK hint, or row versioning
And one more update, from the MS SQL Server side:
"Inside Microsoft SQL Server 2008"
In certain circumstances, scans can end up returning multiple occurrences of rows or even skip rows. Allocation order scans are more prone to such behavior than index order scans. I'll first describe how such a phenomenon can happen with allocation order scans and in which circumstances. Then I'll explain how it can happen with index order scans.

Allocation Order Scans: Figure 4-30 demonstrates in three steps how an allocation order scan can return multiple occurrences of rows. Step 1 shows an allocation order scan in progress, reading the leaf pages of some index in file order (not index order). Two pages were already read (keys 50, 60, 70, 80, 10, 20, 30, 40). At this point, before the third page of the index is read, someone inserts a row into the table with key 25. Step 2 shows a split that took place in the page that was the target for the insert, since it was full. As a result of the split, a new page was allocated, in our case later in the file, at a point that the scan had not yet reached. Half the rows from the original page moved to the new page (keys 30, 40), and the new row with key 25 was added to the original page because of its key value. Step 3 shows the continuation of the scan: reading the remaining two pages (keys 90, 100, 110, 120, 30, 40), including the one that was added because of the split. Notice that the rows with keys 30 and 40 were read a second time.
This may be relevant to InnoDB engine in some circumstances.
For InnoDB, SELECT queries issued under the READ COMMITTED and REPEATABLE READ transaction isolation levels use a Consistent Read mode, which is an implementation of MVCC, otherwise known as optimistic concurrency. Under this mode, the reading query doesn't take any locks; instead, the engine maintains a snapshot of the database as it was when the query (or, under REPEATABLE READ, the transaction) began. No changes committed by other sessions after that point are visible to it.
In such a scenario, the situation you describe in your question would not be possible.
Example from the MySQL manual section linked above:
Session A                          Session B

SET autocommit=0;                  SET autocommit=0;
time
|   SELECT * FROM t;
|   empty set
|                                  INSERT INTO t VALUES (1, 2);
|
v   SELECT * FROM t;
    empty set
                                   COMMIT;
    SELECT * FROM t;
    empty set
    COMMIT;
    SELECT * FROM t;
    ---------------------
    | 1    | 2    |
    ---------------------
    1 row in set
Reading queries issued under READ UNCOMMITTED transaction isolation level bypass MVCC and "see" everything that is happening in the database, including any uncommitted transactions. This makes phantom and dirty reads an issue.
Reading queries relying on explicit use of locks (SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE), or queries run under the SERIALIZABLE isolation level, fall back to locking-based concurrency; the latter upgrades any plain SELECT to LOCK IN SHARE MODE. In this particular case, whether you're safe from the data movement described in your question depends on the WHERE predicates of your SELECT query, since these determine whether the engine locks only the data you have just read, or also entire ranges between the data you have read. The following is an excerpt from the relevant manual page:
For locking reads (SELECT with FOR UPDATE or LOCK IN SHARE MODE), UPDATE, and DELETE statements, locking depends on whether the statement uses a unique index with a unique search condition, or a range-type search condition. For a unique index with a unique search condition, InnoDB locks only the index record found, not the gap before it. For other search conditions, InnoDB locks the index range scanned, using gap locks or next-key (gap plus index-record) locks to block insertions by other sessions into the gaps covered by the range.
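In SQL terms (using the `child` table from the manual's own examples), the two cases look like:

```sql
-- unique index + unique search condition: locks only the matching index record
SELECT * FROM child WHERE id = 100 FOR UPDATE;

-- range condition: next-key/gap locks block inserts into the scanned range
SELECT * FROM child WHERE id BETWEEN 90 AND 110 FOR UPDATE;
```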
I have locked one row in one transaction by following query
START TRANSACTION;
SELECT id FROM children WHERE id=100 FOR UPDATE;
And in another transaction i have a query as below
START TRANSACTION;
SELECT id FROM children WHERE id IN (98,99,100) FOR UPDATE;
It gives the error "lock wait timeout exceeded".
Here row 100 is already locked (by the first transaction), but rows 98 and 99 are not. Is there any way to return the records for 98 and 99 when only row 100 is locked in the above query? The result should then be as below:
Id
===
98
99
===
Id 100 should be ignored because 100 is locked by a transaction.
Looks like SKIP LOCKED option mentioned in a previous answer is now available in MySQL. It does not wait to acquire a row lock and allows you to work with rows that are not currently locked.
From MySQL 8.0.0 Release Notes/Changes in MySQL 8.0.1:
InnoDB now supports NOWAIT and SKIP LOCKED options with SELECT ... FOR SHARE and SELECT ... FOR UPDATE locking read statements. NOWAIT causes the statement to return immediately if a requested row is locked by another transaction. SKIP LOCKED removes locked rows from the result set. See Locking Read Concurrency with NOWAIT and SKIP LOCKED.
Sample usage (complete example with outputs can be found in the link above):
START TRANSACTION;
SELECT * FROM tableName FOR UPDATE SKIP LOCKED;
Also, it might be good to include the warning in the Reference Manual here as well:
Queries that skip locked rows return an inconsistent view of the data. SKIP LOCKED is therefore not suitable for general transactional work. However, it may be used to avoid lock contention when multiple sessions access the same queue-like table.
MySQL does not have a way to ignore locked rows in a SELECT. You'll have to find a different way to set a row aside as "already processed".
The simplest way is to lock the row briefly in a first query just to mark it as "already processed", then unlock it and lock it again for the rest of the processing. The second query will wait for the short "marker" query to complete, and you can add an explicit WHERE condition to ignore already-marked rows. If you can't rely on the first operation completing successfully, you may need to add a bit more complexity, with timestamps and such, to clean up after failed operations.
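A sketch of that two-step approach, assuming the question's `children` table gains a `processing` flag column (an assumption, not part of the original schema):

```sql
-- Step 1: short "marker" transaction: lock one unmarked row and flag it
START TRANSACTION;
SELECT id INTO @id FROM children WHERE processing = 0 LIMIT 1 FOR UPDATE;
UPDATE children SET processing = 1 WHERE id = @id;
COMMIT;  -- row lock released; other sessions skip the row via the WHERE condition

-- Step 2: do the real work on @id, then clear or finalize the flag
START TRANSACTION;
-- ... work with the row identified by @id ...
UPDATE children SET processing = 0 WHERE id = @id;
COMMIT;
```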
MySQL does not have this feature. For anyone searching for this topic in general, some RDBMS have better/smarter locking features than others.
For developers constrained to MySQL, the best approach is to add a column (or use an existing one, e.g., a status column) that can be set to "locked" or "in progress" or similar: execute SELECT ID, * ... WHERE IN_PROGRESS != 1 FOR UPDATE; to get the row ID you want to lock, issue UPDATE .. SET IN_PROGRESS = 1 WHERE ID = XX to mark it as in progress, then commit to release the row lock.
Using LOCK IN SHARE MODE is almost never the solution, because while it'll let you read the old value, that value is in the process of being updated, so unless you are performing a non-atomic task there's no point in even looking at that record.
Better* RDBMS recognize this pattern (select one row to work on and lock it, work on it, unlock it) and provide a smarter approach that lets you search only unlocked records. For example, PostgreSQL 9.5+ provides SELECT ... SKIP LOCKED, which selects only from the unlocked subset of rows matching the query. That lets you obtain an exclusive lock on a row, service that record to completion, then update and unlock it, without blocking other threads/consumers from working independently of you.
*Here "better" means from the perspective of atomic updates, multi-consumer architecture, etc. and not necessarily "better designed" or "overall better." Not trying to start a flamewar here.
As per http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html
The solution is to perform the SELECT in a locking mode using LOCK IN SHARE MODE:
SELECT * FROM parent WHERE NAME = 'Jones' LOCK IN SHARE MODE;
I am a developer and have only fair knowledge about databases. I need to understand the transaction level locking mechanism in InnoDB.
I read that InnoDB uses row-level locking. As far as I understand, it locks down a particular row within a transaction. What will happen to a select statement while a table update is going on?
For example, assume there are a transaction and a select statement, triggered from two different processes, and assume Transaction1 starts before the select statement is issued.
Transaction1 : Start
Update table_x set x = y where 1=1
Transaction1 : End
Select Query
Select x from table_x
What will happen to the select statement? Will it return the values from "during" Transaction1, or only "after" it completes? And if it can begin only after Transaction1 ends, where is the row-level locking in this picture?
Am I making sense, or is my fundamental understanding itself wrong? Please advise.
It depends on the Isolation level.
SERIALIZABLE
REPEATABLE READ
READ COMMITTED
READ UNCOMMITTED
Well explained on Wikipedia
and in the MySQL docs
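For reference, the level can be switched per session or just for the next transaction:

```sql
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- all following transactions in this session
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;            -- the next transaction only
```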
It depends not only on the locking involved, but on the isolation level, which uses locking to provide the transaction isolation defined by the ACID standard. InnoDB uses not only locking but also multiversioning of rows to speed up transactions.
In the SERIALIZABLE isolation level, the select would take a read lock, so it will have to wait for the first transaction to complete. At lower isolation levels, only the update takes a (write) lock, and selects won't be blocked. In REPEATABLE READ and READ COMMITTED, the engine will scan the rollback log to get the previous value of the record if it has been updated, and in READ UNCOMMITTED it will return the current value.
The difference between table-level locking and row-level locking shows when you have two transactions that run update queries. With table-level locking, the second will have to wait for the first, as the whole table is locked. With row-level locking, only the rows that match the WHERE clause* (as well as some gaps between them, but that is another topic) are locked, which means that different transactions can update different parts of the table without waiting for each other.
*assuming there is an index covering the WHERE clause
The select will not wait for the transaction to complete; instead, it will return the current committed value of the rows (i.e., as they were before the transaction started).
If you want the select to wait for the transaction to finish you can use "LOCK IN SHARE MODE":
Select x from table_x LOCK IN SHARE MODE;
This will cause the select to wait for any row(s) that are currently locked by a transaction holding an exclusive (update/delete) lock on them.
A read performed with LOCK IN SHARE MODE reads the latest available
data and sets a shared mode lock on the rows read. A shared mode lock
prevents others from updating or deleting the row read. Also, if the
latest data belongs to a yet uncommitted transaction of another
session, we wait until that transaction ends.
http://dev.mysql.com/doc/refman/5.0/en/innodb-lock-modes.html
A SELECT issued from outside a transaction will see the table as it was before the transaction started. It will see the updated values only after the transaction is committed.