OK here's the basic idea of what is happening:
begin transaction
some_data=select something from some_table where some_condition;
if some_data does not exists or some_data is outdated
new_data = insert a_new_entry to some_table
commit transaction
return new_data
else
return some_data
end
When multiple processes execute the code above simultaneously(like the client issues a lot of identical requests at a same time), a lot of 'new_data' will be inserted while actually only one is needed.
I think it's a quite typical scenario of concurrency, but still I can't figure out a decent way to avoid it. Things I can think about maybe like having a single worker process to do the select_or_insert job, or maybe set the isolation level to Serializable(unacceptable). But neither is quite satisfactory to me.
PS: The database is mysql, table engine is innodb, and isolation level is repeatable read
In your initial SELECT, use SELECT ... FOR UPDATE.
This ensures that the record is locked against other clients reading it until the transaction has been committed (or rolled-back), so they wait on the SELECT command and do not continue through to the rest of the logic until the first client has completed its UPDATE.
Note that you will need to ROLLBACK the transaction in the else condition, or else the lock will continue blocking until the connection is dropped.
Related
Let's say I am trying to execute the following UPDATE statement in mysql (Innodb):
UPDATE main SET name = "Ralph" WHERE rowid=19283
Is there a way before doing this statement to see if there is a row/table-level lock on rowid=19283 before running this update? Or is the application strategy of dealing with deadlocks to catch the exception and then deal with them after the fact? I find that once a deadlock is reached, it's often impossible to update that row without some very loop-y logic, and so I'm seeing if the deadlock can be detected before the potential UPDATE/INSERT statement
A common pattern is
BEGIN;
SELECT ... FOR UPDATE; -- grab a lock on the row
... do some other processing, then eventually:
UPDATE ... -- change the row (or maybe skip this in some cases)
COMMIT;
This allows multiple connections to gracefully change that row, but without stepping on each other.
No, this does not eliminate deadlocks. It may turn a deadlock into a "lock wait", which is fine.
And it is not quite a "dry run". It moves the lock from the UPDATE back to the SELECT. If there are other things going on in this, and the competing, transaction, there could be a deadlock.
If you have 2 connections doing that transaction at "exactly" the same time, one will wait until the other finishes. (No deadlock.)
Suppose I do (note: the syntax below is probably not correct, but don't worry about it...it's just there to make a point)
Start Transaction
INSERT INTO table (id, data) VALUES (100,20), (100,30);
SELECT * FROM table WHERE id = 100;
End Transaction
Hence the goal of the select is to get ALL info from the table that just got inserted by the preceding insert and ONLY by the preceding INSERT...
Now suppose that during the execution, after the INSERT got executed, some other user also performs an INSERT with id = 100...
Will the SELECT statement in the next step of the transaction also get the row inserted by the executed INSERT by the other user or will it just get the two rows inserted by the preceding INSERT within the transaction?
Btw, I'm using MySQL so please tailor your answer to MySQL
This depends entirely on the Transaction Isolation that is used by the DB Connection.
According to MySQL 5.0 Certification Study Guide
Page 420 describes three transactional conditions handled by Isolation Levels
A dirty read is a read by one transaction of uncommitted changes made by another. Suppose the transaction T1 modifies a row. If transaction T2 reads the row and sees the modification neven though T1 has not committed it, that is a dirty read. One reason this is a problem is that if T1 rollbacks, the change is undone but T2 does not know that.
A non-repeatable read occurs when a transaction performs the same retrieval twice but gets a different result each time. Suppose that T1 reads some rows, and that T2 then changes some of those rows and commits the changes. If T1 sees the changes when it reads the rows again, it gets a different result; the initial read is non-repeatable. This is a problem because T1 does not get a consistent result from the same query.
A phantom is a row that appears where it was not visible before. Suppose that T1 and T2 begin, and T1 reads some rows. If T2 inserts a new and T1 sees that row when it reads again, the row is a phantom.
Page 421 describes the four(4) Transaction Isolation Levels:
READ-UNCOMMITTED : allows a transaction to see uncommitted changes made by other transactions. This isolation level allows dirty reads, non-repeatable reads, and phantoms to occur.
READ-COMMITTED : allows a transaction to see changes made by other transactions only if they've been committed. Uncommitted changes remains invisible. This isolation level allows non-repeatable reads, and phantoms to occur.
REPEATABLE READ (default) : ensure that is a transaction issues the same SELECT twice, it gets the same result both times, regardless of committed or uncommitted changesmade by other transactions. In other words, it gets a consistent result from different executions of the same query. In some database systems, REPEATABLE READ isolation level allows phantoms, such that if another transaction inserts new rows,in the inerbal between the SELECT statements, the second SELECT will see them. This is not true for InnoDB; phantoms do not occur for the REPEATABLE READ level.
SERIALIZABLE : completely isolates the effects of one transaction from others. It is similar to REPEATABLE READ with the additional restriction that rows selected by one transaction cannot be changed by another until the first transaction finishes.
Isolation level can be set for your DB Session globally, within your session, or for a specific transaction:
SET GLOBAL TRANSACTION ISOLATION LEVEL isolation_level;
SET SESSION TRANSACTION ISOLATION LEVEL isolation_level;
SET TRANSACTION ISOLATION LEVEL isolation_level;
where isolation_level is one of the following values:
'READ UNCOMMITTED'
'READ COMMITTED'
'REPEATABLE READ'
'SERIALIZABLE'
In my.cnf you can set the default as well:
[mysqld]
transaction-isolation = READ-COMMITTED
As other user is updating the same row, row level lock will be applied. So he is able to make change only after your transaction ends. So you will be seeing the result set that you inserted. Hope this helps.
Interfere is a fuzzy word when it comes to SQL database transactions. What rows a transaction can see is determined in part by its isolation level.
Hence the goal of the select is to get ALL info from the table that
just got inserted by the preceding insert and ONLY by the preceding
INSERT...
Preceding insert is a little fuzzy, too.
You probably ought to COMMIT the insert in question before you try to read it. Otherwise, under certain conditions not under your control, that transaction could be rolled back, and the row with id=100 might not actually exist.
Of course, after it's committed, other transactions are free to change the value of "id", of "value", or both. (If they have sufficient permissions, that is.)
The transaction will make it seem like that the statements in the transaction run without any interference from other transactions. Most DBMSs (including MySQL) maintain ACID properties for transactions. In your case, you are interested in the A for Atomic, which means that the DBMS will make it seem like all the statements in your transactions run atomically without interruption.
The only users that get effect is those that require access to the same rows in a table. Otherwise the user will not be affected.
However is is slightly more complicated as the row locking can be a read lock or a write lock.
Here is an explanation for the InnoDB storage engine.
For efficiency reasons, developers do not set transactions to totally isolated for each other.
Databases support multiples isolation levels namely Serializable, Repeatable reads, Read committed and Read uncommitted. They are list from the most strict to least strict.
I'm fairly sure this has a simple solution, but I haven't been able to find it so far. Provided an InnoDB MySQL database with the isolation level set to SERIALIZABLE, and given the following operation:
BEGIN WORK;
SELECT * FROM users WHERE userID=1;
UPDATE users SET credits=100 WHERE userID=1;
COMMIT;
I would like to make sure that as soon as the select inside the transaction is issued, the row corresponding to userID=1 is locked for reads until the transaction is done. As it stands now, UPDATEs to this row will wait for the transaction to be finished if it is in process, but SELECTs simply will read the previous value. I understand this is the expected behaviour in this case, but I wonder if there is a way to lock the row in such a way that SELECTs will also wait until the transaction is finished to return the values?
The reason I'm looking for that is that at some point, and with enough concurrent users, it could happen that while the previous transaction is in process someone else reads the "credits" to calculate something else. Ideally the code run by that someone else should wait for the transaction to finish to use the new value, because otherwise it could lead to irreversible desync issues.
Note that I don't want to lock the entire table for reads, just the specific row.
Also, I could add a boolean "locked" field to the tables and set it to 1 every time I'm starting a transaction but I don't really feel this is the most elegant solution here, unless there is absolutely no other way to handle this through mysql directly.
I found a workaround, specifically:
SELECT ... LOCK IN SHARE MODE sets a shared mode lock on the rows
read. A shared mode lock enables other sessions to read the rows but
not to modify them. The rows read are the latest available, so if they
belong to another transaction that has not yet committed, the read
blocks until that transaction ends.
(Source)
It seems that one can just include LOCK IN SHARE MODE in the critical SELECT statements that rely on transactional data and they will indeed wait for current transactions to finish before retrieving the row/s. For this to work the transaction has to use FOR UPDATE explicitly (as opposed to the original example I gave). E.g., given the following:
BEGIN WORK;
SELECT * FROM users WHERE userID=1 FOR UPDATE;
UPDATE users SET credits=100 WHERE userID=1;
COMMIT;
Anywhere else in the code I could use:
SELECT * FROM users WHERE userID=1 LOCK IN SHARE MODE;
Since this statement is not wrapped in a transaction, the lock is released immediately, thus having no impacts in subsequent queries, but if the row involving userID=1 has been selected for update within a transaction this statement would wait until the transaction is done, which is exactly what I was looking for.
You could try the SELECT ... FOR UPDATE locking read.
A SELECT ... FOR UPDATE reads the latest available data, setting exclusive locks on each row it reads. Thus, it sets the same locks a searched SQL UPDATE would set on the rows.
Please go through the following site: http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html
I would like to be able to lock an entire table to prevent any INSERTs or UPDATEs in it between the "beginTransaction" and the ending "commit" or "rollback".
I know that beginning a transaction results in an implicit UNLOCK TABLES and that a LOCK table results in a implicit COMMIT... so is there any way to do what I want?
Why? Perhaps you have missed the point of transactions.
If you use repeatable-read transaction isolation, inserts, updates etc, can happen during your transaction, BUT YOU WILL NOT SEE THEM. So as far as your process is concerned, the table is locked for inserts/updates. Except they are still happening, they're still durable to disc, and other processes can continue.
After you do your first "select", a snapshot is created, and you are effectively reading that snapshot, not the latest version. If this is what you want, repeatable-read works well for you.
select count(*) from table
within a transaction, locks the talbe on msSQL 2000
If you're using PHP, so when there is a transaction going on, you can set a SESSION variable to tell the script not to do anything with the database, i.e. $_SESSION['on_going_transaction'] = true.
When the transaction is completed, just destroy the SESSION variable so that another transaction can occur. This is much easier.
I have a transaction with a SELECT and possible INSERT. For concurrency reasons, I added FOR UPDATE to the SELECT. To prevent phantom rows, I'm using the SERIALIZABLE transaction isolation level. This all works fine when there are any rows in the table, but not if the table is empty. When the table is empty, the SELECT FOR UPDATE does not do any (exclusive) locking and a concurrent thread/process can issue the same SELECT FOR UPDATE without being locked.
CREATE TABLE t (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
display_order INT
) ENGINE = InnoDB;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT COALESCE(MAX(display_order), 0) + 1 from t FOR UPDATE;
..
This concept works as expected with SQL Server, but not with MySQL. Any ideas on what I'm doing wrong?
EDIT
Adding an index on display_order does not change the behavior.
There's something fun with this, both transaction are ready to get the real lock. As soon as one of the transaction will try to perform an insert the lock will be there. If both transactions try it one will get a deadlock and rollback. If only one of them try it it will get a lock wait timeout.
If you detect the lock wait timeout you can rollback and this will allow the next transaction to perform the insert.
So I think you're likely to get a deadlock exception or a timeout exception quite fast and this should save the situation. But talking about perfect 'serializable' situation this is effectively a bad side effect of empty table. The engine cannot be perfect on all cases, at least No double-transaction-inserts can be done..
I've send yesterday an interesting case of true seriability vs engine seriability, on potsgreSQl documentation, check this example it's funny : http://www.postgresql.org/docs/8.4/static/transaction-iso.html#MVCC-SERIALIZABILITY
Update:
Other interesting resource: Does MySQL/InnoDB implement true serializable isolation?
This is probably not a bug.
The way that the different databases implement specific transaction isolation levels is NOT 100% consistent, and there are a lot of edge-cases to consider which behave differently. InnoDB was meant to emulate Oracle, but even there, I believe there are cases where it works differently.
If your application relies on very subtle locking behaviour in specific transaction isolation modes, it is probably broken:
Even if it "works" right now, it might not if somebody changes the database schema
It is unlikely that engineers maintaining your code will understand how it's using the database if it depends upon subtleties of locking
Did you have a look at this document:
http://dev.mysql.com/doc/refman/5.1/en/innodb-locking-reads.html
If you ask me, mysql wasn't built to be used in that way...
My recomendation is:
If you can affort it -> Lock the whole table.