MySQL: Transactional safety when inserting rows - mysql

I want to execute the following two MySQL statements:
1) SELECT * FROM table1 WHERE field1=val1 FOR UPDATE;
2) UPDATE table1 SET field2=val2 WHERE field1=val1;
It is important that the second statement changes exactly the rows returned by the first statement (no more and no fewer). Therefore I execute the statements in a transaction with auto_commit=false and use the "FOR UPDATE" version of the SELECT statement.
"FOR UPDATE" locks all the rows it returns, so they are still in their original state when the second statement is executed. But what about insertions? Is it possible that another thread inserts a new row with field1=val1 in between, which then gets changed by the second statement?
And another question: does it make a difference if the second statement doesn't change the rows itself but does something like the following?
3) INSERT INTO table2 (SELECT * FROM table1 WHERE field1=val1)
If (3) is in the same transaction as (1), is it then guaranteed that both SELECTs return exactly the same rows?
edit:
I'm using InnoDB and I read some stuff about next key locking and gap locking.
As far as I understood it, while executing (1), InnoDB would not only lock the selected rows but also the accessed indices.
So am I right to say that this problem doesn't occur if I have an index over column "field1"? What if there is no index for it? Is it different then?

I can't speak to the MySQL case completely, but I rather doubt it. The only thing that SELECT ... FOR UPDATE appears to do is prevent other transactions from modifying the given set of rows.
It doesn't restrict future statements - if another row is inserted (by another transaction) before that UPDATE statement runs, it'll be updated as well.
Needless to say, the third statement you've given will also fall prey to the same issue.
What are you attempting to actually do here? It's possible there's another way to accomplish this. For example, if there's some sort of 'insertedAt' timestamp, you could probably just add that as an extra condition.

Related

Does DBAL support SELECT FOR UPDATE update/delete lock on other database users?

I'm using DBAL in my project because it is easier to convert the database statements of an already written project, which I'm porting to Symfony v2.8 and MySQL, than going with full-on Doctrine. Now I need to implement "read-only row locks" to prevent data changes by other users while a pair of tightly coupled but separate SELECT statements are executed consecutively, and I'm thinking that I should use transactions and SELECT FOR UPDATE statements. However, I don't see in its documentation that DBAL supports SELECT FOR UPDATE. I do see that transactions are supported, but as I understand it, these won't prevent other users from UPDATE-ing or DELETE-ing the data in the same row that the SELECT statements are using.
Specifically, the first SELECT retrieves one row, and the second SELECT uses data from that row to retrieve multiple rows from the same tables. The two SELECTs are somewhat complex, and I don't know if I could combine them into a single super-sized SELECT, nor do I really want to, as that would make the new SELECT harder to maintain in the future.
The problem is that other users could update the values retrieved by the first SELECT, and if this is done between the two SELECTs, it would break the second SELECT of the pair: it would either return no data or return the wrong data.
I believe that I need to use SELECT FOR UPDATE to lock the row that the first SELECT retrieves, temporarily preventing other users from updating or deleting that single row. But since I'm not actually performing an update, only two SELECTs, how do I release the lock on that row without performing a 'fake' update, say by UPDATE-ing a column with the value it already had?
Thanks
For the transaction you want repeatable results for:
START TRANSACTION READ ONLY;
SELECT ...
{some processing}
SELECT {that covers the same rows} [will return the same result]
COMMIT;
Note: READ ONLY is optional. This relies on InnoDB's default REPEATABLE READ isolation level, where all plain SELECTs in a transaction read from the snapshot established by the first one.
Experiment by running two mysql client connections and observe the results. The other connection can modify or insert rows matching the first SELECT's criteria, and the first transaction won't observe them.

Whether an UPDATE statement locks the rows in the table separately or entirely when using InnoDB

Say, we have a table called person like below
CREATE TABLE person (
id INT,
name VARCHAR(30),
point INT
);
I want to update the entire table, setting each person's point based on another person's, like
UPDATE person SET point=(
SELECT point FROM person WHERE some-condition
);
or, simply just increasing by one, like
UPDATE person SET point=point+1;
When executing the scripts above, which rows will be locked and will other statements wait until the update statement finishes or can be executed between two update operations?
Neither of your update statements has a where clause. (Your first one has a select with a where clause; it's possible you want that where clause to be part of the update, but I am not sure about that.)
That means they'll update all the rows in your person table. The transaction semantics provided by InnoDB say that each row will be locked until the entire update is completed. That is, other updates will be blocked. If you attempt other updates in an order different from the one in this query, you're risking a deadlock.
Other client connection select-queries will see the previous state of the table ... the state at the instant before your update statement began ... until your update statement completes. In many cases InnoDB can do that without delaying its response to the other connections' queries. But sometimes it must delay its response. The biggest delay may come at the end of your update query while InnoDB is committing its results.
Keep this in mind: in order to implement transaction semantics, InnoDB sacrifices the predictability of query performance.
I strongly suggest you avoid doing updates without WHERE clauses wherever it makes sense to avoid them. It doesn't in your second query (give every person another point).

When I SELECT multiple rows FOR UPDATE, can I deadlock?

In MySQL+InnoDB, suppose I have a single table, and two threads which both do "SELECT ... FOR UPDATE". Suppose that both of the SELECT statements end up selecting multiple rows, e.g. both of them end up selecting rows R42 and R99. Is it possible that this will deadlock?
I'm thinking of this situation: the first thread tries to lock R42 then R99, the second thread tries to lock R99 then R42. If I'm unlucky, the two threads will deadlock.
I read in the MySQL Glossary for "deadlock" that
A deadlock can occur when the transactions lock rows in multiple tables (through statements such as UPDATE or SELECT ... FOR UPDATE), but in the opposite order. ...
To reduce the possibility of deadlocks, ... create indexes on the columns used in SELECT ... FOR UPDATE and UPDATE ... WHERE statements.
This hints that in my situation (single table) I won't deadlock, maybe because MySQL automatically tries to lock rows in the order of the primary key, but I want to be certain, and I can't find the proper place in the documentation that tells me exactly what's going on.
From the MySQL documentation:
InnoDB uses automatic row-level locking. You can get deadlocks even in the case of
transactions that just insert or delete a single row. That is because these operations
are not really “atomic”; they automatically set locks on the (possibly several) index
records of the row inserted or deleted.
http://dev.mysql.com/doc/refman/5.1/en/innodb-deadlocks.html
So generally, deadlocking is not fatal, you just need to try again, or add the appropriate indexes so that fewer rows are scanned and thus fewer rows are locked.
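The "just try again" advice can be sketched as a small retry wrapper. This is only an illustration: DeadlockError is a hypothetical stand-in for whatever exception your driver raises for MySQL error 1213 (ER_LOCK_DEADLOCK), and run_with_retry and flaky_transaction are made-up names, not part of any library.

```python
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the driver error raised on MySQL error 1213."""


def run_with_retry(transaction, max_attempts=3, base_delay=0.01):
    """Call transaction() and retry it if it was chosen as the deadlock victim."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # Back off briefly with jitter so the competing transactions
            # are less likely to collide again in lockstep.
            time.sleep(base_delay * attempt * random.random())


# Simulated transaction: it "deadlocks" twice and then succeeds.
calls = {"count": 0}


def flaky_transaction():
    calls["count"] += 1
    if calls["count"] < 3:
        raise DeadlockError("simulated deadlock")
    return "committed"


result = run_with_retry(flaky_transaction)
```

In real code the transaction function would have to redo the whole transaction from the start (including its SELECT ... FOR UPDATE), since InnoDB rolls back the transaction it picks as the victim.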

Will SQL update affect its subquery during the update run?

I'm just composing a complex update query which looks more or less like this:
update table join
(select y, min(x) as MinX
from table
group by y) as t1
using (y)
set x = x - MinX
Which means that the variable x is updated based on the subquery, which also processes variable x - but couldn't this x already be modified by the running update command? Isn't this a problem? I mean, in normal programming you normally have to handle this explicitly, i.e. store the new value somewhere else, and after the job is done, replace the old value with the new one... but how will an SQL database do this?
I'm not interested in a single observation or experiment. I would like a snippet from the docs or the SQL standard that says what the defined behaviour is in this case. I'm using MySQL, but answers that are also valid for other databases (PostgreSQL, Oracle, etc.), and especially for the SQL standard in general, are appreciated. Thanks!
** Edited **
Selecting from the target table
From 13.2.9.8. Subqueries in the FROM Clause:
Subqueries in the FROM clause can return a scalar, column, row, or table. Subqueries in the FROM clause cannot be correlated subqueries, unless used within the ON clause of a JOIN operation.
So, yes, you can perform the above query.
The problem
There are really two problems here. There's concurrency, or ensuring that no one else changes the data out from under our feet. This is handled with locking. Dealing with the actual modification of new versus old values is handled with derived tables.
Locking
In the case of your query above, with InnoDB, MySQL performs the SELECT first, and acquires a read (shared) lock on each row in the table individually. If you had a WHERE clause in the SELECT statement, then only the records you select would be locked, where ranges would cause any gaps to be locked as well.
A read lock prevents any other query from acquiring write locks, so records can't be updated from elsewhere while they're read locked.
Then, MySQL acquires a write (exclusive) lock on each of the records in the table individually. If you had a WHERE clause in your UPDATE statement, then only the specific records would be write locked, and again, if the WHERE clause selected a range, then you would have a range locked.
Any record that had a read lock from the previous SELECT would automatically be escalated to a write lock.
A write lock prevents other queries from obtaining either a read or write lock.
You can use Innotop to see this: run it in Lock mode, start a transaction, and execute the query (but don't commit it); you will then see the locks in Innotop. You can also view the details without Innotop using SHOW ENGINE INNODB STATUS.
Deadlocks
Your query is vulnerable to a deadlock if two instances were run at the same time. If query A got read locks, then query B got read locks, query A would have to wait for query B's read locks to release before it could acquire the write locks. However, query B isn't going to release the read locks until after it finishes, and it won't finish unless it can acquire write locks. Query A and query B are in a stalemate, and hence, a deadlock.
Therefore, you may wish to perform an explicit table lock, both to avoid the massive amount of record locks (which uses memory and affects performance), and to avoid a deadlock.
An alternative approach is to use SELECT ... FOR UPDATE on your inner SELECT. This starts out with write locks on all of the rows instead of starting with read and escalating them.
Derived tables
For the inner SELECT, MySQL creates a derived temporary table. A derived table is an actual non-indexed copy of the data that lives in the temporary table that is automatically created by MySQL (as opposed to a temporary table that you explicitly create and can add indexes to).
Since MySQL uses a derived table, that's the temporary old value that you refer to in your question. In other words, there's no magic here. MySQL does it just like you'd do it anywhere else, with a temporary value.
You can see the derived table by doing an EXPLAIN against your UPDATE statement (supported in MySQL 5.6+).
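As a rough in-memory analogy (plain Python lists, not MySQL internals), the difference between materializing the inner SELECT first and recomputing it while rows are being changed can be sketched like this. The sample rows and the function names are invented for the illustration.

```python
def update_with_derived_table(rows):
    """Compute min(x) per y once, from the old values, then apply the update."""
    min_x = {}
    for row in rows:
        min_x[row["y"]] = min(row["x"], min_x.get(row["y"], row["x"]))
    # The update only reads the materialized copy, never the rows being changed.
    return [{"y": row["y"], "x": row["x"] - min_x[row["y"]]} for row in rows]


def update_in_place_naively(rows):
    """Recompute min(x) per row while modifying: later rows see earlier changes."""
    rows = [dict(row) for row in rows]  # work on a copy of the input
    for row in rows:
        current_min = min(r["x"] for r in rows if r["y"] == row["y"])
        row["x"] -= current_min
    return rows


table = [
    {"y": "a", "x": 5},
    {"y": "a", "x": 8},
    {"y": "b", "x": 3},
    {"y": "b", "x": 10},
]

consistent = [(r["y"], r["x"]) for r in update_with_derived_table(table)]
inconsistent = [(r["y"], r["x"]) for r in update_in_place_naively(table)]
```

The first function corresponds to the behaviour described above: every row is reduced by the minimum that existed before the update started. The second shows the problem the question worries about, where the second row of each group is compared against an already-updated value.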
A proper RDBMS uses statement-level read consistency, which ensures the statement sees (selects) the data as it was at the time the statement began. So the scenario you are afraid of won't occur.
Regards,
Rob.
Oracle has this in the 11.2 documentation:
A consistent
result set is provided for every query, guaranteeing data consistency,
with no action by the user. An implicit query, such as a query implied
by a WHERE clause in an UPDATE statement, is guaranteed a consistent
set of results. However, each statement in an implicit query does not
see the changes made by the DML statement itself, but sees the data as
it existed before changes were made.
Although it's been noted that you SHOULDN'T be able to update a table based on its own data, you should be able to adjust the MySQL syntax to allow for it via
update Table1,
(select T2.y, MIN( T2.x ) as MinX from Table1 T2 group by T2.y ) PreQuery
set Table1.x = Table1.x - PreQuery.MinX
where Table1.y = PreQuery.y
I don't know whether the JOIN syntax goes a different route than the comma-list version, but either way the complete prequery would have to be evaluated first, its result computed ONCE, and then joined (via the WHERE) to actually perform the update.

How to make sure there is no race condition in MySQL database when incrementing a field?

How to prevent a race condition in MySQL database when two connections want to update the same record?
For example, connection 1 wants to increase "tries" counter. And the second connection wants to do the same. Both connections SELECT the "tries" count, increase the value and both UPDATE "tries" with the increased value. Suddenly "tries" is only "tries+1" instead of being "tries+2", because both connections got the same "tries" and incremented it by one.
How to solve this problem?
Here are three different approaches:
Atomic update
update table set tries=tries+1 where condition=value;
and it will be done atomically.
Use transactions
If you do need to first select the value and update it in your application, you likely need to use transactions. That means you'll have to use InnoDB, not MyISAM tables.
Your query would be something like:
BEGIN; -- or any method in the API you use that starts a transaction
select tries from table where condition=value for update;
-- do application logic to add to `tries`
update table set tries=newvalue where condition=value;
COMMIT;
If the transaction fails, you might need to manually retry it.
Version scheme
A common approach is to introduce a version column in your table. Your queries would do something like:
select tries,version from table where condition=value;
-- do application logic, and remember the old version value.
update table set tries=newvalue,version=version + 1 where condition=value and version=oldversion;
If that update fails or returns 0 rows affected, someone else has updated the table in the meantime. You have to start all over - that is, select the new values, do the application logic and try the update again.
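A minimal sketch of the version scheme, using Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The counters table, the increment_tries function and the before_write hook are all assumptions made up for this example; the hook exists only to simulate another connection sneaking in between the read and the write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE counters (id INTEGER PRIMARY KEY, tries INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO counters VALUES (1, 0, 0)")
conn.commit()


def increment_tries(conn, row_id, max_attempts=5, before_write=None):
    """Read, compute, then write only if the version is still the one we read."""
    for _ in range(max_attempts):
        tries, version = conn.execute(
            "SELECT tries, version FROM counters WHERE id = ?", (row_id,)
        ).fetchone()
        if before_write is not None:
            before_write()  # illustration hook: lets a competing write happen here
        cur = conn.execute(
            "UPDATE counters SET tries = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (tries + 1, row_id, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return tries + 1  # our write won
        # 0 rows affected: someone else changed the row, so start over.
    raise RuntimeError("too many conflicting updates")


interference = {"done": False}


def competing_writer():
    """Simulates another client incrementing the same row exactly once."""
    if not interference["done"]:
        interference["done"] = True
        conn.execute(
            "UPDATE counters SET tries = tries + 1, version = version + 1 WHERE id = 1"
        )


result = increment_tries(conn, 1, before_write=competing_writer)
final_row = conn.execute("SELECT tries, version FROM counters WHERE id = 1").fetchone()
```

Here the first attempt is rejected because the version changed underneath it, and the second attempt succeeds, so both increments are preserved.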
Use a single statement instead of two. A single UPDATE statement that performs both the read and the write will be atomic and won't conflict with another simultaneous update.
UPDATE table SET tries = tries + 1 WHERE ...
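Or you can use a transaction to make the two operations atomic. Note that the SELECT needs to be a locking read (FOR UPDATE); otherwise a concurrent transaction can still read the same old value before the UPDATE runs.
BEGIN
SELECT ... FOR UPDATE
UPDATE ...
COMMIT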
Or, more primitively, lock the table while you're reading/writing to it.
LOCK TABLES table WRITE
SELECT ...
UPDATE ...
UNLOCK TABLES
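To make the difference between read-then-write and a single UPDATE concrete, here is a small sketch that replays the interleaving from the question by hand. It uses Python's built-in sqlite3 module in place of MySQL, and the logins table and read_tries helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (id INTEGER PRIMARY KEY, tries INTEGER)")
conn.execute("INSERT INTO logins VALUES (1, 0), (2, 0)")


def read_tries(row_id):
    return conn.execute(
        "SELECT tries FROM logins WHERE id = ?", (row_id,)
    ).fetchone()[0]


# Row 1: two clients both read the counter before either of them writes.
seen_by_a = read_tries(1)
seen_by_b = read_tries(1)
conn.execute("UPDATE logins SET tries = ? WHERE id = 1", (seen_by_a + 1,))
conn.execute("UPDATE logins SET tries = ? WHERE id = 1", (seen_by_b + 1,))

# Row 2: each client issues one UPDATE that reads and writes in the same statement.
conn.execute("UPDATE logins SET tries = tries + 1 WHERE id = 2")
conn.execute("UPDATE logins SET tries = tries + 1 WHERE id = 2")

lost_update_result = read_tries(1)  # one increment was overwritten
atomic_result = read_tries(2)       # both increments were applied
```

The read-then-write path ends at 1 because the second write was computed from a stale value, while the single-statement path ends at 2.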