Will SQL update affect its subquery during the update run? - mysql

I'm just composing a complex update query which looks more or less like this:
update table join
(select y, min(x) as MinX
from table
group by y) as t1
using (y)
set x = x - MinX
This means that column x is updated based on the subquery, which also reads column x - but couldn't this x already have been modified by the running UPDATE command? Isn't this a problem? I mean, in normal programming you normally have to handle this explicitly, i.e. store the new values somewhere separate from the old ones and, after the job is done, replace the old values with the new... but how will an SQL database do this?
I'm not interested in a single observation or experiment. I would like a snippet from the docs or the SQL standard that states the defined behaviour in this case. I'm using MySQL, but answers that also hold for PostgreSQL, Oracle, etc., and especially for the SQL standard in general, are appreciated. Thanks!

Edited
Selecting from the target table
From 13.2.9.8. Subqueries in the FROM Clause:
Subqueries in the FROM clause can return a scalar, column, row, or table. Subqueries in the FROM clause cannot be correlated subqueries, unless used within the ON clause of a JOIN operation.
So, yes, you can perform the above query.
The problem
There are really two problems here. One is concurrency: ensuring that no one else changes the data out from under our feet; this is handled with locking. The other is the actual handling of new versus old values; this is handled with derived tables.
Locking
In the case of your query above, with InnoDB, MySQL performs the SELECT first and acquires a read (shared) lock on each row in the table individually. If you had a WHERE clause in the SELECT statement, then only the records you select would be locked, and WHERE conditions over ranges would cause the gaps to be locked as well.
A read lock prevents any other query from acquiring write locks, so records can't be updated from elsewhere while they're read locked.
Then, MySQL acquires a write (exclusive) lock on each of the records in the table individually. If you had a WHERE clause in your UPDATE statement, then only the specific records would be write locked, and again, if the WHERE clause selected a range, then you would have a range locked.
Any record that had a read lock from the previous SELECT would automatically be escalated to a write lock.
A write lock prevents other queries from obtaining either a read or write lock.
You can use Innotop to see this by running it in Lock mode, start a transaction, execute the query (but don't commit it), and you will see the locks in Innotop. Also, you can view the details without Innotop with SHOW ENGINE INNODB STATUS.
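For example, a minimal sketch of inspecting the locks from two mysql client sessions, using the question's table (back-quoted because table is a reserved word); on older versions the detailed lock list may require the InnoDB lock monitor to be enabled:

-- session 1
START TRANSACTION;
UPDATE `table`
JOIN (SELECT y, MIN(x) AS MinX FROM `table` GROUP BY y) AS t1 USING (y)
SET x = x - MinX;            -- leave the transaction open, do not COMMIT yet

-- session 2
SHOW ENGINE INNODB STATUS\G  -- the TRANSACTIONS section lists the locks held

-- session 1, once done inspecting
ROLLBACK;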
Deadlocks
Your query is vulnerable to a deadlock if two instances were run at the same time. If query A got read locks, then query B got read locks, query A would have to wait for query B's read locks to release before it could acquire the write locks. However, query B isn't going to release the read locks until after it finishes, and it won't finish unless it can acquire write locks. Query A and query B are in a stalemate, and hence, a deadlock.
Therefore, you may wish to perform an explicit table lock, both to avoid the massive number of record locks (which use memory and affect performance) and to avoid a deadlock.
An alternative approach is to use SELECT ... FOR UPDATE on your inner SELECT. This starts out with write locks on all of the rows instead of starting with read and escalating them.
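A rough sketch of one way to do this (a variant, not necessarily the only form): run the aggregate as a locking read inside the same transaction before the UPDATE, assuming a table t with columns x and y:

START TRANSACTION;
SELECT y, MIN(x) FROM t GROUP BY y FOR UPDATE;  -- locking read: takes exclusive locks on the rows it scans
UPDATE t
JOIN (SELECT y, MIN(x) AS MinX FROM t GROUP BY y) AS agg USING (y)
SET x = x - agg.MinX;
COMMIT;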
Derived tables
For the inner SELECT, MySQL creates a derived table: an actual, non-indexed copy of the data, held in a temporary table that MySQL creates automatically (as opposed to a temporary table that you create explicitly and can add indexes to).
Since MySQL uses a derived table, that's the temporary old value that you refer to in your question. In other words, there's no magic here. MySQL does it just like you'd do it anywhere else, with a temporary value.
You can see the derived table by doing an EXPLAIN against your UPDATE statement (supported in MySQL 5.6+).
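For instance, a sketch using the question's query (table back-quoted because it is a reserved word):

EXPLAIN
UPDATE `table`
JOIN (SELECT y, MIN(x) AS MinX FROM `table` GROUP BY y) AS t1 USING (y)
SET x = x - MinX;

In the output you should typically see a row with select_type DERIVED, which is the materialized subquery.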

A proper RDBMS uses statement level read consistency, which ensures the statement sees (selects) the data as it was at the time the statement began. So the scenario you are afraid of, won't occur.

Oracle has this in the 11.2 Documentation
A consistent result set is provided for every query, guaranteeing data consistency, with no action by the user. An implicit query, such as a query implied by a WHERE clause in an UPDATE statement, is guaranteed a consistent set of results. However, each statement in an implicit query does not see the changes made by the DML statement itself, but sees the data as it existed before changes were made.

Although it's been noted you SHOULDN'T be able to update a table based on its own data, you should be able to adjust the MySQL syntax to allow for it via
update Table1,
(select T2.y, MIN( T2.x ) as MinX from Table1 T2 group by T2.y ) PreQuery
set Table1.x = Table1.x - PreQuery.MinX
where Table1.y = PreQuery.y
I don't know if the syntax goes a different route using JOIN vs the comma-list version, but either way the complete prequery would have to be applied first, its result materialized ONCE, and then joined (via the WHERE) to actually perform the update.
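For completeness, a hedged sketch of the same statement written with explicit JOIN syntax; it should behave the same as the comma-list form:

UPDATE Table1
JOIN (SELECT T2.y, MIN( T2.x ) AS MinX FROM Table1 T2 GROUP BY T2.y) PreQuery
  ON Table1.y = PreQuery.y
SET Table1.x = Table1.x - PreQuery.MinX;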

Related

Do Sql Update Statements run at the same time if requested at the same time?

If two independent scripts call a database with update requests to the same field, but with different values, would they execute at the same time and one overwrite the other?
As an example to help ensure clarity, imagine both of these statements being requested to run at the same time, each by a different script, where Status = 2 is called microseconds after Status = 1 by coincidence.
Update My_Table SET Status = 1 WHERE Status= 0;
Update My_Table SET Status = 2 WHERE Status= 0;
What would my results be and why? If other factors play a role, expand on them as much as you please; this is meant to be a general idea.
Side Note:
Because I know people will still ask: my situation uses MySQL with Google App Engine, but I don't want to limit this question to just me, should it be useful to others. I am using Status as an identifier for which script is working on the record; if Status is not 0, no other script is allowed to touch it.
This is what locking is for. All major SQL implementations lock DML statements by default so that one query won't overwrite another before the first is complete.
There are different levels of locking. If you've got row locking then your second update will run in parallel with the first, so at some point you'll have 1s and 2s in your table.
Table locking would force the second query to wait for the first query to completely finish and release its table lock.
You can usually turn off locking right in your SQL, but it's only ever done if you need a performance boost and you know you won't encounter race conditions like in your example.
Edits based on the new MySQL tag
If you're updating a table that uses the InnoDB engine, then you're working with row locking, and your query could yield a table with both 1s and 2s.
If you're working with a table that uses the MyISAM engine, then you're working with table locking, and your update statements would end up with a table that would either have all 1s or all 2s.
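If you're not sure which engine a table uses, one quick way to check (a sketch, using the question's table name):

SHOW TABLE STATUS LIKE 'My_Table';   -- the Engine column reports InnoDB or MyISAM
-- or:
SELECT ENGINE FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'My_Table';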
from https://dev.mysql.com/doc/refman/5.0/en/lock-tables-restrictions.html (MySQL)
Normally, you do not need to lock tables, because all single UPDATE statements are atomic; no other session can interfere with any other currently executing SQL statement. However, there are a few cases when locking tables may provide an advantage:
from https://msdn.microsoft.com/en-us/library/ms177523.aspx (sql server)
An UPDATE statement always acquires an exclusive (X) lock on the table it modifies, and holds that lock until the transaction completes. With an exclusive lock, no other transactions can modify data.
If you had two separate connections executing the two posted update statements, whichever statement started first would be the one that completed. The other statement would not update any data, as there would no longer be records with a status of 0.
The short answer is: it depends on which statement commits first. Just because one process started an update statement before another doesn't mean that it will complete before another. It might not get scheduled first, it might be blocked by another process, etc.
Ultimately, it's a race condition: the operation that completes (and commits) last, wins.
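One hedged way for each script to find out whether it actually claimed the rows is to check the affected-row count right after its UPDATE, for example:

UPDATE My_Table SET Status = 1 WHERE Status = 0;
SELECT ROW_COUNT();   -- greater than 0: this script claimed the rows; 0: the other script got there first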
Since you have TWO scripts doing the same thing but using different values for the UPDATE, they will NOT run at the same time; one of the scripts will run first even if you think you are calling them at the same time. You need to specify WHEN each script should run, otherwise the program will not know what should be 1 and what should be 2.

Making an INSERT ... SELECT statement atomic

I have two tables: one stores data and the other stores locks to indicate when a user is operating on that data. I'd like to select some number of items from the first table, such that they match several conditions and do not have a corresponding lock in the other table, and then add locks for these items to the second table. Since many users may simultaneously attempt to lock items it will be necessary for this to be done atomically.
I've written the SQL statement below to attempt to do this, but I receive the error Deadlock found when trying to get lock;.
INSERT INTO table2 (id, user, date)
SELECT id, ?, NOW()
FROM table1
LEFT JOIN table2 USING (id)
WHERE locked IS NULL AND <several conditions on table1>
ORDER BY date DESC
LIMIT 15;
Is there any way to make this an atomic operation without locking the tables? Currently I'm using a transaction and reattempting if it's unsuccessful, but I'm interested in whether this is avoidable. I'm using MySQL version 5.0.95 with InnoDB.
Thanks
EDIT
Having given this some further thought I've realised that whilst locking table1 is unacceptable, I can lock table2. Since I can't actually lock the table in the statement (since I have to lock all tables if I choose to lock one of them) I can instead use GET_LOCK to create a mutex preventing multiple processes calling this code simultaneously. I've not yet had a chance to test this approach, but it feels like it might be a more lightweight solution than transactions.
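A minimal sketch of that mutex approach, with a made-up lock name; GET_LOCK returns 1 when the lock is acquired (0 on timeout) and RELEASE_LOCK frees it:

SELECT GET_LOCK('table2_insert', 10);    -- wait up to 10 seconds for the mutex
-- run the INSERT ... SELECT from above here
SELECT RELEASE_LOCK('table2_insert');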
No. This is what transactions are all about. They organize a bunch of statements into one atomic operation which either succeeds or fails as a whole.
Here you could find some explanation of optimistic and pessimistic locking, which perhaps you will find useful. Here you could find some details about the locking mechanism used in InnoDB (pessimistic locking). Here you could find guidelines on how to implement optimistic locking in MySQL.
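A rough sketch of the optimistic-locking pattern, assuming a hypothetical version column added to table1; if no row is affected, another session changed the row first, and the caller re-reads and retries:

-- read the row and remember its version
SELECT id, field2, version FROM table1 WHERE id = 42;

-- write back only if nobody else bumped the version in the meantime
UPDATE table1
SET field2 = 'new value', version = version + 1
WHERE id = 42 AND version = 7;   -- 7 = the version read above

SELECT ROW_COUNT();              -- 0 means a concurrent update won; re-read and retry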

Whether UPDATE statement locks the rows in the table separately or entirely when using InnoDb

Say we have a table called person like below
CREATE TABLE person (
id INT,
name VARCHAR(30),
point INT
);
I want to update the entire table, changing the point of a person according to another's, like
UPDATE person SET point=(
SELECT point FROM person WHERE some-condition
);
or, simply just increasing by one, like
UPDATE person SET point=point+1;
When executing the scripts above, which rows will be locked and will other statements wait until the update statement finishes or can be executed between two update operations?
Neither of your update statements has a where clause. (Your first one has a select with a where clause; it's possible you want that where clause to be part of the update, but I am not sure about that.)
That means they'll update all the rows in your person table. Transaction semantics provided by InnoDB says that each row will be locked until the entire update is completed. That is, other updates will be blocked. If you attempt other updates in an order different from the one in this query, you're risking a deadlock.
Other client connection select-queries will see the previous state of the table ... the state at the instant before your update statement began ... until your update statement completes. In many cases InnoDB can do that without delaying its response to the other connections' queries. But sometimes it must delay its response. The biggest delay may come at the end of your update query while InnoDB is committing its results.
Keep this in mind: in order to implement transaction semantics, InnoDB sacrifices the predictability of query performance.
I strongly suggest you add WHERE clauses to your updates wherever it makes sense to do so. It doesn't make sense in your second (give every person another point) query.

MySql: Transactional safety when inserting rows

I want to execute the following two MySQL statements:
1) SELECT * FROM table1 WHERE field1=val1 FOR UPDATE;
2) UPDATE table1 SET field2=val2 WHERE field1=val1;
It is important that the second statement exactly changes the rows returned by the first statement (no additional rows and no one less). Therefore I execute the transactions with auto_commit=false and use the "for update" version of the select statement.
"For Update" locks all rows it returns, so they are in the original state when the second statement is executed. But what about insertions? Is it possible that another thread inserts a new row with field1=val1 inbetween, which then gets changed by the second statement?
And another question: does it make a difference if the second statement doesn't change the rows itself but does something like the following?
3) INSERT INTO table2 (SELECT * FROM table1 WHERE field1=val1)
If (3) is in the same transaction as (1), is then ensured that both selects return exactly the same elements?
edit:
I'm using InnoDB and I read some stuff about next key locking and gap locking.
As far as I understood it, while executing (1), InnoDB would not only lock the selected rows but also the accessed indices.
So am I right to say that this problem doesn't occur if I have an index over column "field1"? What if there is no index for it? Is it different then?
I can't speak to the MySQL case completely, but I rather doubt it. The only thing that SELECT ... FOR UPDATE appears to do is prevent other transactions from modifying the given set of rows.
It doesn't restrict future statements - if another row is inserted (by another transaction) before that UPDATE statement runs, it'll be updated as well.
Needless to say, the third statement you've given will also fall prey to the same issue.
What are you attempting to actually do here? It's possible there's another way to accomplish this. For example, if there's some sort of 'insertedAt' timestamp, you could probably just add that as an extra condition.
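For instance, a hedged sketch of that idea, assuming a hypothetical insertedAt timestamp column and using a session variable to freeze the cutoff for both statements:

SET @cutoff = NOW();

SELECT * FROM table1
WHERE field1 = val1 AND insertedAt <= @cutoff
FOR UPDATE;

UPDATE table1
SET field2 = val2
WHERE field1 = val1 AND insertedAt <= @cutoff;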

Dummies guide to locking in innodb

The typical documentation on locking in innodb is way too confusing. I think it will be of great value to have a "dummies guide to innodb locking"
I will start, and I will gather all responses as a wiki:
The column needs to be indexed before row level locking applies.
EXAMPLE: a DELETE ... WHERE column1 = 10 will lock up the whole table unless column1 is indexed.
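A hedged sketch of the fix, with a hypothetical table name t:

CREATE INDEX idx_column1 ON t (column1);   -- now only the matching index/row records get locked
DELETE FROM t WHERE column1 = 10;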
Here are my notes from working with MySQL support on a recent, strange locking issue (version 5.1.37):
All rows and index entries traversed to get to the rows being changed will be locked. It's covered at:
http://dev.mysql.com/doc/refman/5.1/en/innodb-locks-set.html
"A locking read, an UPDATE, or a DELETE generally set record locks on every index record that is scanned in the processing of the SQL statement. It does not matter whether there are WHERE conditions in the statement that would exclude the row. InnoDB does not remember the exact WHERE condition, but only knows which index ranges were scanned. ... If you have no indexes suitable for your statement and MySQL must scan the entire table to process the statement, every row of the table becomes locked, which in turn blocks all inserts by other users to the table."
That is a MAJOR headache if true.
It is. A workaround that is often helpful is to do:
UPDATE whichevertable
SET whatever = something
WHERE primarykey IN (SELECT primarykey FROM whichevertable WHERE constraints ORDER BY primarykey);
The inner select doesn't need to take locks, and the update then has less work to do. The ORDER BY clause ensures that the update is done in primary key order to match InnoDB's physical order, which is the fastest way to do it.
Where large numbers of rows are involved, as in your case, it can be better to store the select result in a temporary table with a flag column added. Then select from the temporary table where the flag is not set to get each batch. Run updates with a limit of say 1000 or 10000 and set the flag for the batch after the update. The limits will keep the amount of locking to a tolerable level while the select work will only have to be done once. Commit after each batch to release the locks.
You can also speed this work up by doing a select sum of an unindexed column before doing each batch of updates. This will load the data pages into the buffer pool without taking locks. Then the locking will last for a shorter timespan because there won't be any disk reads.
This isn't always practical but when it is it can be very helpful. If you can't do it in batches you can at least try the select first to preload the data, if it's small enough to fit into the buffer pool.
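A rough sketch of the batch-plus-flag approach above, keeping the answer's placeholder names (whichevertable, primarykey, constraints, something); everything else is hypothetical:

-- do the select work once, into a temporary table with a done flag
CREATE TEMPORARY TABLE todo AS
  SELECT primarykey AS pk, 0 AS done
  FROM whichevertable
  WHERE constraints;

-- repeat until no rows are left with done = 0
SELECT SUM(w.unindexed_column)                                            -- optional: preload pages without locking
FROM whichevertable w
JOIN (SELECT pk FROM todo WHERE done = 0 ORDER BY pk LIMIT 1000) batch
  ON w.primarykey = batch.pk;

UPDATE whichevertable w
JOIN (SELECT pk FROM todo WHERE done = 0 ORDER BY pk LIMIT 1000) batch
  ON w.primarykey = batch.pk
SET w.whatever = something;

UPDATE todo SET done = 1 WHERE done = 0 ORDER BY pk LIMIT 1000;           -- mark this batch as finished
COMMIT;                                                                   -- release the locks after each batch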
If possible use the READ COMMITTED transaction isolation mode. See:
http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html
To get that reduced locking requires use of row-based binary logging (rather than the default statement based binary logging).
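A hedged sketch of both settings; the binary log format is normally set in the server configuration and requires the appropriate privileges to change:

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- in my.cnf / my.ini, row-based rather than statement-based binary logging:
-- [mysqld]
-- binlog_format = ROW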
Two known issues:
Subqueries can sometimes be less than ideally optimised. In this case it was an undesirable dependent subquery, and because of that the subquery suggestion I made above turned out to be unhelpful compared to the alternative.
Deletes and updates do not have the same range of query plans as select statements so sometimes it's hard to properly optimise them without measuring the results to work out exactly what they are doing.
Both of these are gradually improving. This bug is one example where we've just improved the optimisations available for an update, though the changes are significant and it's still going through QA to be sure it doesn't have any great adverse effects:
http://bugs.mysql.com/bug.php?id=36569