Please help me understand the use-case behind SELECT ... FOR UPDATE.
Question 1: Is the following a good example of when SELECT ... FOR UPDATE should be used?
Given:
rooms[id]
tags[id, name]
room_tags[room_id, tag_id]
room_id and tag_id are foreign keys
The application wants to list all rooms and their tags, but needs to differentiate between rooms with no tags versus rooms that have been removed. If SELECT ... FOR UPDATE is not used, what could happen is:
Initially:
rooms contains [id = 1]
tags contains [id = 1, name = 'cats']
room_tags contains [room_id = 1, tag_id = 1]
Thread 1: SELECT id FROM rooms;
returns [id = 1]
Thread 2: DELETE FROM room_tags WHERE room_id = 1;
Thread 2: DELETE FROM rooms WHERE id = 1;
Thread 2: [commits the transaction]
Thread 1: SELECT tags.name FROM room_tags, tags WHERE room_tags.room_id = 1 AND tags.id = room_tags.tag_id;
returns an empty list
Now Thread 1 thinks that room 1 has no tags, but in reality the room has been removed. To solve this problem, Thread 1 should SELECT id FROM rooms FOR UPDATE, thereby preventing Thread 2 from deleting from rooms until Thread 1 is done. Is that correct?
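The interleaving in Question 1 can be replayed step by step. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database, and it replays both "threads" sequentially on a single connection in the order listed above, so it shows the stale result rather than real concurrency. SQLite has no SELECT ... FOR UPDATE, so the fix itself is not shown.

```python
import sqlite3

# Single in-memory database; the two "threads" are replayed one statement
# at a time on the same connection, in the order listed in the question.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE rooms (id INTEGER PRIMARY KEY);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE room_tags (room_id INTEGER, tag_id INTEGER);
    INSERT INTO rooms VALUES (1);
    INSERT INTO tags VALUES (1, 'cats');
    INSERT INTO room_tags VALUES (1, 1);
""")

# Thread 1: list the rooms.
room_ids = [row[0] for row in db.execute("SELECT id FROM rooms")]

# Thread 2: remove room 1 together with its tags, then commit.
db.execute("DELETE FROM room_tags WHERE room_id = 1")
db.execute("DELETE FROM rooms WHERE id = 1")
db.commit()

# Thread 1: fetch the tags for every room it saw earlier.
tags_by_room = {
    room_id: [
        row[0]
        for row in db.execute(
            "SELECT tags.name FROM room_tags, tags "
            "WHERE room_tags.room_id = ? AND tags.id = room_tags.tag_id",
            (room_id,),
        )
    ]
    for room_id in room_ids
}

print(room_ids)      # [1]
print(tags_by_room)  # {1: []}
```

Thread 1 ends up with room 1 and an empty tag list, which is indistinguishable from a room that exists but has no tags.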
Question 2: When should one use SERIALIZABLE transaction isolation versus READ_COMMITTED with SELECT ... FOR UPDATE?
Answers are expected to be portable (not database-specific). If that's not possible, please explain why.
The only portable way to achieve consistency between rooms and tags, and to make sure rooms are never returned after they have been deleted, is to lock them with SELECT FOR UPDATE.
However in some systems locking is a side effect of concurrency control, and you achieve the same results without specifying FOR UPDATE explicitly.
To solve this problem, Thread 1 should SELECT id FROM rooms FOR UPDATE, thereby preventing Thread 2 from deleting from rooms until Thread 1 is done. Is that correct?
This depends on the concurrency control your database system is using.
MyISAM in MySQL (and several other old systems) does lock the whole table for the duration of a query.
In SQL Server, SELECT queries place shared locks on the records / pages / tables they have examined, while DML queries place update locks (which later get promoted to exclusive or demoted to shared locks). Exclusive locks are incompatible with shared locks, so either the SELECT or the DELETE query will block until the other session commits.
In databases which use MVCC (like Oracle, PostgreSQL, MySQL with InnoDB), a DML query creates a copy of the record (in one way or another) and generally readers do not block writers and vice versa. For these databases, a SELECT FOR UPDATE would come in handy: it would block either the SELECT or the DELETE query until the other session commits, just as SQL Server does.
When should one use REPEATABLE_READ transaction isolation versus READ_COMMITTED with SELECT ... FOR UPDATE?
Generally, REPEATABLE READ does not forbid phantom rows (rows that appeared or disappeared in another transaction, rather than being modified).
In Oracle and earlier PostgreSQL versions, REPEATABLE READ is actually a synonym for SERIALIZABLE. Basically, this means that the transaction does not see changes made after it has started. So in this setup, the last Thread 1 query will return the room as if it had never been deleted (which may or may not be what you wanted). If you don't want to show the rooms after they have been deleted, you should lock the rows with SELECT FOR UPDATE.
In InnoDB, REPEATABLE READ and SERIALIZABLE are different things: readers in SERIALIZABLE mode set next-key locks on the records they evaluate, effectively preventing concurrent DML on them. So you don't need a SELECT FOR UPDATE in SERIALIZABLE mode, but you do need it in REPEATABLE READ or READ COMMITTED.
Note that the standard on isolation modes does prescribe that you don't see certain quirks in your queries but does not define how (with locking or with MVCC or otherwise).
When I say "you don't need SELECT FOR UPDATE" I really should have added "because of side effects of certain database engine implementation".
Short answers:
Q1: Yes.
Q2: Doesn't matter which you use.
Long answer:
A select ... for update will (as it implies) select certain rows but also lock them as if they have already been updated by the current transaction (or as if the identity update had been performed). This allows you to update them again in the current transaction and then commit, without another transaction being able to modify these rows in any way.
Another way of looking at it, it is as if the following two statements are executed atomically:
select * from my_table where my_condition;
update my_table set my_column = my_column where my_condition;
Since the rows affected by my_condition are locked, no other transaction can modify them in any way, and hence, transaction isolation level makes no difference here.
Note also that transaction isolation level is independent of locking: setting a different isolation level doesn't allow you to get around locking and update rows in a different transaction that are locked by your transaction.
What transaction isolation levels do guarantee (at different levels) is the consistency of data while transactions are in progress.
What is SELECT FOR UPDATE?
SELECT FOR UPDATE is a SQL command that’s useful in the context of transactional workloads. It allows you to “lock” the rows returned by a SELECT query until the entire transaction that query is part of has been committed. Other transactions attempting to access those rows are placed into a time-based queue to wait, and are executed chronologically after the first transaction is completed.
BEGIN;
SELECT * FROM kv WHERE k = 1 FOR UPDATE;
UPDATE kv SET v = v + 5 WHERE k = 1;
COMMIT;
The MySQL 5.6 documentation on the REPEATABLE READ isolation level states:
This means that if you issue several plain (nonlocking) SELECT statements within the same transaction, these SELECT statements are consistent also with respect to each other.
Does the reverse guarantee (somewhat) also hold? I.e., will any record that has not been read yet be the newest version of itself? What I'm actually trying to understand is whether, in MySQL, a race condition is possible in data fetched between starting a transaction and acquiring a lock (SELECT ... FOR UPDATE).
Example:
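| Transaction 1                           | Transaction 2                             |
|-----------------------------------------+-------------------------------------------|
| begin                                   |                                           |
|                                         | begin                                     |
| select id from shops where id = 1       |                                           |
|                                         | update shops where name = 'something new' |
| select id, name from shops where id = 1 |                                           |
| (waiting for lock)                      |                                           |
|                                         | ...                                       |
|                                         | commit                                    |
| ???                                     |                                           |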
Is there a guarantee that, at the ??? point, transaction 1 will surely see name equal to 'something new'? A source in the documentation would be much appreciated (for now I've just checked manually with two db sessions that it "seems to work").
From the 5.6 documentation:
Consistent reads ignore any locks set on the records that exist in the
read view. (Old versions of a record cannot be locked; they are
reconstructed by applying undo logs on an in-memory copy of the
record.)
This is not the most straightforward way to put it, but that fragment says that old versions cannot be locked, so if anything has changed, the lock is always acquired on the newest version possible (effectively not being a "repeatable read").
The PHP Documentation says:
If you've never encountered transactions before, they offer 4 major
features: Atomicity, Consistency, Isolation and Durability (ACID). In
layman's terms, any work carried out in a transaction, even if it is
carried out in stages, is guaranteed to be applied to the database
safely, and without interference from other connections, when it is
committed.
QUESTION:
Does this mean that I can have two separate php scripts running transactions simultaneously without them interfering with one another?
ELABORATING ON WHAT I MEAN BY "INTERFERING":
Imagine we have the following employees table:
__________________________
| id | name | salary |
|------+--------+----------|
| 1 | ana | 10000 |
|------+--------+----------|
If I have two scripts with similar/same code and they run at the exact same time:
script1.php and script2.php (both have the same code):
$conn->beginTransaction();
$stmt = $conn->prepare("SELECT * FROM employees WHERE name = ?");
$stmt->execute(['ana']);
$row = $stmt->fetch(PDO::FETCH_ASSOC);
$salary = $row['salary'];
$salary = $salary + 1000;//increasing salary
$stmt = $conn->prepare("UPDATE employees SET salary = {$salary} WHERE name = ?");
$stmt->execute(['ana']);
$conn->commit();
and assuming the sequence of events is as follows:
script1.php selects data
script2.php selects data
script1.php updates data
script2.php updates data
script1.php commit() happens
script2.php commit() happens
What would the resulting salary of ana be in this case?
Would it be 11000? And would this then mean that 1 transaction will overlap the other because the information was obtained before either commit happened?
Would it be 12000? And would this then mean that regardless of the order in which data was updated and selected, the commit() function forced these to happen individually?
Please feel free to elaborate as much as you want on how transactions and separate scripts can interfere (or don't interfere) with one another.
You are not going to find the answer in the PHP documentation because this has nothing to do with PHP or PDO.
The InnoDB table engine in MySQL offers 4 so-called isolation levels in line with the SQL standard. The isolation levels, in conjunction with blocking / non-blocking reads, will determine the result of the above example. You need to understand the implications of the various isolation levels and choose the appropriate one for your needs.
To sum up: if you use serialisable isolation level with autocommit turned off, then the result will be 12000. In all other isolation levels and serialisable with autocommit enabled the result will be 11000. If you start using locking reads, then the result could be 12000 under all isolation levels.
Judging by the given conditions (a solitary DML statement), you don't need a transaction here, but a table lock. It's a very common confusion.
You need a transaction if you need to make sure that ALL your DML statements were performed correctly or weren't performed at all.
This means:
you don't need a transaction for any number of SELECT queries
you don't need a transaction if only one DML statement is performed
Although, as it was noted in the excellent answer from Shadow, you may use a transaction here with appropriate isolation level, it would be rather confusing. What you need here is table locking. InnoDB engine lets you lock particular rows instead of locking the entire table and thus should be preferred.
In case you want the salary to be 12000, use table locks.
Or - a simpler way - just run an atomic update query:
UPDATE employees SET salary = salary + 1000 WHERE name = ?
In this case all salaries will be recorded.
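The difference between the read-then-write sequence from the question and the atomic UPDATE above can be replayed in a small sketch. It assumes Python's built-in sqlite3 module and an in-memory employees table, and it replays both scripts sequentially on one connection in the order listed in the question, so it illustrates the arithmetic of the lost update rather than real concurrent sessions. The helper names (fresh_db, read_salary, read_modify_write, atomic_update) are made up for this illustration.

```python
import sqlite3


def fresh_db():
    """Create an in-memory employees table holding the row from the question."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
    )
    db.execute("INSERT INTO employees VALUES (1, 'ana', 10000)")
    db.commit()
    return db


def read_salary(db):
    row = db.execute("SELECT salary FROM employees WHERE name = ?", ("ana",)).fetchone()
    return row[0]


def read_modify_write():
    """Both scripts read first, then both write back what they computed."""
    db = fresh_db()
    seen_by_script1 = read_salary(db)
    seen_by_script2 = read_salary(db)
    update = "UPDATE employees SET salary = ? WHERE name = ?"
    db.execute(update, (seen_by_script1 + 1000, "ana"))
    db.execute(update, (seen_by_script2 + 1000, "ana"))  # overwrites the first raise
    db.commit()
    return read_salary(db)


def atomic_update():
    """Each script lets the database do the increment itself."""
    db = fresh_db()
    for _ in range(2):
        db.execute("UPDATE employees SET salary = salary + 1000 WHERE name = ?", ("ana",))
    db.commit()
    return read_salary(db)


print(read_modify_write())  # 11000
print(atomic_update())      # 12000
```

In the first variant the second write is computed from a value read before the first write, so one raise is lost; in the second variant the increment is evaluated by the database against the current row.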
If your goal is different, better express it explicitly.
But again: you have to understand that transactions in general have nothing to do with the execution of separate scripts. As for race conditions, what you are interested in is not transactions but table/row locking. This is a very common confusion, and you had better get it straight:
a transaction is there to ensure that a set of DML queries within one script was executed successfully.
table/row locking is there to ensure that other script executions won't interfere.
The only topic where transactions and locking interfere is a deadlock, but again - it's only in case when a transaction is using locking.
Alas, the "without interference" needs some help from the programmer. It needs BEGIN and COMMIT to define the extent of the 'transaction'. And...
Your example is inadequate. The first statement needs SELECT ... FOR UPDATE. This tells the transaction processing that there is likely to be an UPDATE coming for the row(s) that the SELECT fetches. That warning is critical to "preventing interference". Now the timeline reads:
script1.php BEGINs
script2.php BEGINs
script1.php selects data (FOR UPDATE)
script2.php tries to select data (FOR UPDATE) but is blocked, so it waits
script1.php updates data
script1.php commit() happens
script2.php selects data (and will get the newly-committed value)
script2.php updates data
script2.php commit() happens
(Note: This is not a 'deadlock', just a 'wait'.)
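The timeline above can be imitated outside a database. The following is a rough sketch, not MySQL code: it assumes a threading.Lock as a stand-in for the row lock that SELECT ... FOR UPDATE takes, a plain dictionary as the "table", and two threads as the two scripts. The name give_raise is hypothetical.

```python
import threading

salary = {"ana": 10000}
row_lock = threading.Lock()  # stand-in for the row lock taken by FOR UPDATE


def give_raise(amount):
    # "SELECT ... FOR UPDATE": wait here until the other script has committed.
    with row_lock:
        current = salary["ana"]           # read the current value
        salary["ana"] = current + amount  # write back the raised value
    # Leaving the with-block plays the role of COMMIT and releases the lock.


scripts = [threading.Thread(target=give_raise, args=(1000,)) for _ in range(2)]
for script in scripts:
    script.start()
for script in scripts:
    script.join()

print(salary["ana"])  # 12000
```

Because the read and the write happen while the lock is held, the second script always reads the value written by the first, whichever of the two gets the lock first.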
I have locked one row in one transaction with the following query:
START TRANSACTION;
SELECT id FROM children WHERE id=100 FOR UPDATE;
And in another transaction I have the query below:
START TRANSACTION;
SELECT id FROM children WHERE id IN (98,99,100) FOR UPDATE;
It fails with the error "lock wait timeout exceeded".
Here 100 is already locked (in the first transaction), but ids 98 and 99 are not locked. Is it possible for the above query to return records 98 and 99 when only row 100 is locked? The result would then be as below:
Id
===
98
99
===
Id 100 should be ignored because 100 is locked by a transaction.
Looks like SKIP LOCKED option mentioned in a previous answer is now available in MySQL. It does not wait to acquire a row lock and allows you to work with rows that are not currently locked.
From MySQL 8.0.0 Release Notes/Changes in MySQL 8.0.1:
InnoDB now supports NOWAIT and SKIP LOCKED options with SELECT ... FOR SHARE and SELECT ... FOR UPDATE locking read statements. NOWAIT causes the statement to return immediately if a requested row is locked by another transaction. SKIP LOCKED removes locked rows from the result set. See Locking Read Concurrency with NOWAIT and SKIP LOCKED.
Sample usage (complete example with outputs can be found in the link above):
START TRANSACTION;
SELECT * FROM tableName FOR UPDATE SKIP LOCKED;
Also, it might be good to include the warning in the Reference Manual here as well:
Queries that skip locked rows return an inconsistent view of the data. SKIP LOCKED is therefore not suitable for general transactional work. However, it may be used to avoid lock contention when multiple sessions access the same queue-like table.
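To make the SKIP LOCKED semantics concrete for the ids 98, 99 and 100 from the question, here is a small simulation. It is only a sketch under assumptions: the rows are modelled as one threading.Lock per id rather than real InnoDB row locks, and select_for_update_skip_locked and release are hypothetical helper names, not MySQL or driver APIs.

```python
import threading

# One lock per row id; hypothetical stand-ins for InnoDB row locks.
row_locks = {98: threading.Lock(), 99: threading.Lock(), 100: threading.Lock()}


def select_for_update_skip_locked(ids):
    """Lock and return the ids that are free right now; skip the locked ones."""
    acquired = []
    for row_id in ids:
        # A non-blocking acquire returns False at once if the lock is held.
        if row_locks[row_id].acquire(blocking=False):
            acquired.append(row_id)
    return acquired


def release(ids):
    """Plays the role of COMMIT: give back every lock taken for these ids."""
    for row_id in ids:
        row_locks[row_id].release()


row_locks[100].acquire()  # the first transaction holds row 100
result = select_for_update_skip_locked([98, 99, 100])
print(result)             # [98, 99]

release(result)           # second transaction commits
row_locks[100].release()  # first transaction commits
```

The second "transaction" neither waits nor fails; it simply does not see row 100, which is why the manual warns that the result is an inconsistent view of the data.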
MySQL does not have a way to ignore locked rows in a SELECT. You'll have to find a different way to set a row aside as "already processed".
The simplest way is to lock the row briefly in the first query just to mark it as "already processed", then unlock it and lock it again for the rest of the processing - the second query will wait for the short "marker" query to complete, and you can add an explicit WHERE condition to ignore already-marked rows. If you don't want to rely on the first operation being able to complete successfully, you may need to add a bit more complexity with timestamps and such to clean up after those failed operations.
MySQL does not have this feature. For anyone searching for this topic in general, some RDBMS have better/smarter locking features than others.
For developers constrained to MySQL, the best approach is to add a column (or use an existing one, e.g. a status column) that can be set to "locked" or "in progress" or similar: execute SELECT ID, * ... WHERE IN_PROGRESS != 1 FOR UPDATE; to get the ID of the row you want to lock, then issue UPDATE .. SET IN_PROGRESS = 1 WHERE ID = XX and commit, which unlocks the records while leaving the row marked.
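A minimal sketch of the status-column approach described above, with several assumptions: it uses Python's built-in sqlite3 module instead of MySQL, the table name jobs and the function claim_next_job are made up for the example, and BEGIN IMMEDIATE stands in for the locking read because SQLite has no SELECT ... FOR UPDATE.

```python
import sqlite3

# isolation_level=None lets the code issue BEGIN/COMMIT explicitly.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, in_progress INTEGER NOT NULL DEFAULT 0)"
)
db.executemany("INSERT INTO jobs (id) VALUES (?)", [(98,), (99,), (100,)])


def claim_next_job(db):
    """Pick one unclaimed row and mark it, inside one short transaction."""
    db.execute("BEGIN IMMEDIATE")  # SQLite stand-in for the locking read
    row = db.execute(
        "SELECT id FROM jobs WHERE in_progress != 1 ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        db.execute("ROLLBACK")
        return None
    db.execute("UPDATE jobs SET in_progress = 1 WHERE id = ?", (row[0],))
    db.execute("COMMIT")  # releases the lock; the flag keeps the row claimed
    return row[0]


claimed = [claim_next_job(db) for _ in range(4)]
print(claimed)  # [98, 99, 100, None]
```

The lock is held only for the short select-and-mark step; the long-running work then happens without any database lock, and the flag is what keeps other consumers away from the row.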
Using LOCK IN SHARE MODE is almost never the solution: it will let you read the old value, but that value is in the process of being updated, so unless you are performing a non-atomic task, there's no point in even looking at that record.
Better* RDBMS recognize this pattern (select one row to work on and lock it, work on it, unlock it) and provide a smarter approach that lets you only search unlocked records. For example, PostgreSQL 9.5+ provide SELECT ... SKIP LOCKED which only selects from within the unlocked subset of rows matching the query. That lets you obtain an exclusive lock on a row, service that record to completion, then update & unlock the record in question without having to block other threads/consumers from being able to work independent of yourself.
*Here "better" means from the perspective of atomic updates, multi-consumer architecture, etc. and not necessarily "better designed" or "overall better." Not trying to start a flamewar here.
As per http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html
The solution is to perform the SELECT in a locking mode using LOCK IN SHARE MODE:
SELECT * FROM parent WHERE NAME = 'Jones' LOCK IN SHARE MODE;
I am running these queries on MySQL 5.6.13.
I am using the repeatable read isolation level. The table looks like below:
In Session A terminal I have issued below statement
UPDATE manufacturer
SET lead_time = 2
WHERE mname = 'Hayleys';
In Session B terminal I tried to update the value lead_time of ACL Cables to 2. But since the previous UPDATE command from Session A is not yet committed (and Session A has an exclusive lock on manufacturer table), this update waits. This I can understand.
But when I try to execute a SELECT statement on Session B as below,
SELECT * FROM manufacturer
WHERE mcode = 'ACL';
it correctly queries the manufacturer table and returns the row. How can this happen? Session A still holds the exclusive lock on the manufacturer table and, as I understand it, when an exclusive lock is held on a table no other transaction can read from or write to it until the previous transaction is committed.
Found below information on this page
http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html#isolevel_repeatable-read
Scope of Transaction Characteristics
You can set transaction characteristics globally, for the current
session, or for the next transaction:
With the GLOBAL keyword, the statement applies globally for all
subsequent sessions. Existing sessions are unaffected.
With the SESSION keyword, the statement applies to all subsequent
transactions performed within the current session.
Without any SESSION or GLOBAL keyword, the statement applies to the
next (not started) transaction performed within the current session.
Has this been taken into consideration?
REPEATABLE READ
This is the default isolation level for InnoDB. For consistent reads,
there is an important difference from the READ COMMITTED isolation
level: All consistent reads within the same transaction read the
snapshot established by the first read. This convention means that if
you issue several plain (nonlocking) SELECT statements within the same
transaction, these SELECT statements are consistent also with respect
to each other.
This article describes it very well:
http://www.mysqlperformanceblog.com/2012/08/28/differences-between-read-committed-and-repeatable-read-transaction-isolation-levels/
It is important to remember that InnoDB actually locks index entries,
not rows. During the execution of a statement InnoDB must lock every
entry in the index that it traverses to find the rows it is modifying.
It must do this to prevent deadlocks and maintain the isolation level.
Are the tables well indexed? Can you run a SHOW ENGINE innodb STATUS to confirm that the lock is held?
There are different kinds of lock in MySQL: row-level locks and table-level locks.
What you need is a row-level lock, which allows reading rows other than the ones being updated.
To get row-level locking, you have to set the engine type of your table to 'InnoDB':
alter table TABLE_NAME engine=innodb;
Suppose I do (note: the syntax below is probably not correct, but don't worry about it...it's just there to make a point)
Start Transaction
INSERT INTO table (id, data) VALUES (100,20), (100,30);
SELECT * FROM table WHERE id = 100;
End Transaction
Hence the goal of the select is to get ALL info from the table that just got inserted by the preceding insert and ONLY by the preceding INSERT...
Now suppose that during the execution, after the INSERT got executed, some other user also performs an INSERT with id = 100...
Will the SELECT statement in the next step of the transaction also get the row inserted by the executed INSERT by the other user or will it just get the two rows inserted by the preceding INSERT within the transaction?
Btw, I'm using MySQL so please tailor your answer to MySQL
This depends entirely on the Transaction Isolation that is used by the DB Connection.
According to MySQL 5.0 Certification Study Guide
Page 420 describes three transactional conditions handled by Isolation Levels
A dirty read is a read by one transaction of uncommitted changes made by another. Suppose the transaction T1 modifies a row. If transaction T2 reads the row and sees the modification even though T1 has not committed it, that is a dirty read. One reason this is a problem is that if T1 rolls back, the change is undone but T2 does not know that.
A non-repeatable read occurs when a transaction performs the same retrieval twice but gets a different result each time. Suppose that T1 reads some rows, and that T2 then changes some of those rows and commits the changes. If T1 sees the changes when it reads the rows again, it gets a different result; the initial read is non-repeatable. This is a problem because T1 does not get a consistent result from the same query.
A phantom is a row that appears where it was not visible before. Suppose that T1 and T2 begin, and T1 reads some rows. If T2 inserts a new row and T1 sees that row when it reads again, the row is a phantom.
Page 421 describes the four(4) Transaction Isolation Levels:
READ-UNCOMMITTED : allows a transaction to see uncommitted changes made by other transactions. This isolation level allows dirty reads, non-repeatable reads, and phantoms to occur.
READ-COMMITTED : allows a transaction to see changes made by other transactions only if they've been committed. Uncommitted changes remain invisible. This isolation level allows non-repeatable reads and phantoms to occur.
REPEATABLE READ (default) : ensures that if a transaction issues the same SELECT twice, it gets the same result both times, regardless of committed or uncommitted changes made by other transactions. In other words, it gets a consistent result from different executions of the same query. In some database systems, the REPEATABLE READ isolation level allows phantoms, such that if another transaction inserts new rows in the interval between the SELECT statements, the second SELECT will see them. This is not true for InnoDB; phantoms do not occur for the REPEATABLE READ level.
SERIALIZABLE : completely isolates the effects of one transaction from others. It is similar to REPEATABLE READ with the additional restriction that rows selected by one transaction cannot be changed by another until the first transaction finishes.
Isolation level can be set for your DB Session globally, within your session, or for a specific transaction:
SET GLOBAL TRANSACTION ISOLATION LEVEL isolation_level;
SET SESSION TRANSACTION ISOLATION LEVEL isolation_level;
SET TRANSACTION ISOLATION LEVEL isolation_level;
where isolation_level is one of the following values:
'READ UNCOMMITTED'
'READ COMMITTED'
'REPEATABLE READ'
'SERIALIZABLE'
In my.cnf you can set the default as well:
[mysqld]
transaction-isolation = READ-COMMITTED
As other user is updating the same row, row level lock will be applied. So he is able to make change only after your transaction ends. So you will be seeing the result set that you inserted. Hope this helps.
Interfere is a fuzzy word when it comes to SQL database transactions. What rows a transaction can see is determined in part by its isolation level.
Hence the goal of the select is to get ALL info from the table that
just got inserted by the preceding insert and ONLY by the preceding
INSERT...
Preceding insert is a little fuzzy, too.
You probably ought to COMMIT the insert in question before you try to read it. Otherwise, under certain conditions not under your control, that transaction could be rolled back, and the row with id=100 might not actually exist.
Of course, after it's committed, other transactions are free to change the value of "id", of "value", or both. (If they have sufficient permissions, that is.)
The transaction will make it seem as if the statements in the transaction run without any interference from other transactions. Most DBMSs (including MySQL) maintain ACID properties for transactions. In your case, you are interested in the A for Atomic, which means that the DBMS will make it seem like all the statements in your transaction run atomically without interruption.
The only users affected are those that require access to the same rows in a table. Otherwise the user will not be affected.
However, it is slightly more complicated, as the row lock can be a read lock or a write lock.
Here is an explanation for the InnoDB storage engine.
For efficiency reasons, developers do not make transactions totally isolated from each other.
Databases support multiple isolation levels, namely Serializable, Repeatable read, Read committed and Read uncommitted. They are listed from the most strict to the least strict.