Making an INSERT ... SELECT statement atomic - mysql

I have two tables: one stores data and the other stores locks to indicate when a user is operating on that data. I'd like to select some number of items from the first table, such that they match several conditions and do not have a corresponding lock in the other table, and then add locks for these items to the second table. Since many users may simultaneously attempt to lock items it will be necessary for this to be done atomically.
I've written the SQL statement below to try to do this, but I get the error "Deadlock found when trying to get lock".
INSERT INTO table2 (id, user, date)
SELECT table1.id, ?, NOW()
FROM table1
LEFT JOIN table2 ON table2.id = table1.id
WHERE table2.id IS NULL AND <several conditions on table1>
ORDER BY table1.date DESC
LIMIT 15;
Is there any way to make this an atomic operation without locking the tables? Currently I'm using a transaction and reattempting if it's unsuccessful, but I'm interested in whether this is avoidable. I'm using MySQL version 5.0.95 with InnoDB.
Thanks
EDIT
Having given this some further thought, I've realised that whilst locking table1 is unacceptable, I can lock table2. I can't simply lock table2 for this statement, though, because once I lock one table with LOCK TABLES I have to lock every table the statement uses. Instead I can use GET_LOCK to create a mutex that prevents multiple processes from running this code simultaneously. I've not yet had a chance to test this approach, but it feels like it might be a more lightweight solution than transactions.
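A minimal sketch of the GET_LOCK idea from the edit above, written in Python rather than SQL so it can be exercised without a server. Assumptions: a DB-API driver that uses %s placeholders (such as PyMySQL or mysqlclient); the lock name table2_lock and the helper names are hypothetical; the fake connection at the bottom only records the statements it is given and does not model real locking.

```python
from contextlib import contextmanager


class LockTimeout(Exception):
    """Raised when GET_LOCK does not return 1 (0 = timed out, NULL = error)."""


@contextmanager
def named_lock(conn, name, timeout_s=10):
    """Hold a MySQL named lock for the duration of a with-block.

    Assumption: `conn` is a DB-API connection whose driver uses %s
    placeholders (PyMySQL and mysqlclient both do).
    """
    cur = conn.cursor()
    cur.execute("SELECT GET_LOCK(%s, %s)", (name, timeout_s))
    (got,) = cur.fetchone()
    if got != 1:
        raise LockTimeout(name)
    try:
        yield cur  # run the INSERT ... SELECT on this same session
    finally:
        cur.execute("SELECT RELEASE_LOCK(%s)", (name,))
        cur.fetchone()


# --- Demo with a fake connection (no MySQL server needed) ---
class FakeCursor:
    """Records each SQL string and answers every query with the row (1,)."""

    def __init__(self, log):
        self.log = log

    def execute(self, sql, params=()):
        self.log.append(sql)

    def fetchone(self):
        return (1,)


class FakeConn:
    def __init__(self):
        self.log = []

    def cursor(self):
        return FakeCursor(self.log)


conn = FakeConn()
with named_lock(conn, "table2_lock") as cur:  # hypothetical lock name
    cur.execute("INSERT INTO table2 (id, user, date) SELECT ...")
print(conn.log)
```

The lock belongs to the MySQL session, so the protected statements must run on the same connection, and it only excludes other code that asks for the same name; it does not stop unrelated statements from touching table2. Also note that before MySQL 5.7.5 a second GET_LOCK() call in the same session silently releases the lock already held.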

No. This is what the transactions are all about. They organize a bunch of statements in one atomic operation which either succeeds or fails as a whole.
Background reading that may help: an explanation of optimistic versus pessimistic locking, the details of the locking mechanism InnoDB uses (pessimistic locking), and guidelines on how to implement optimistic locking in MySQL.
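Optimistic locking, mentioned above, usually means a version column that every writer checks and bumps in the same UPDATE; a writer whose UPDATE matches zero rows knows someone else changed the row first and must re-read and retry. The sketch uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere; the items table, its version column and the helper names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the UPDATE ... WHERE version = ?
# pattern is the same. Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, owner TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO items (id, owner, version) VALUES (1, NULL, 0)")
conn.commit()


def read_version(conn, item_id):
    """Read phase: remember which version of the row we looked at."""
    row = conn.execute("SELECT version FROM items WHERE id = ?", (item_id,)).fetchone()
    return row[0]


def claim_if_unchanged(conn, item_id, user, seen_version):
    """Write phase: succeed only if nobody bumped the version since we read it."""
    cur = conn.execute(
        "UPDATE items SET owner = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (user, item_id, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows: someone else won; re-read and retry


# Two users read the same version before either of them writes.
v_alice = read_version(conn, 1)
v_bob = read_version(conn, 1)
alice_ok = claim_if_unchanged(conn, 1, "alice", v_alice)
bob_ok = claim_if_unchanged(conn, 1, "bob", v_bob)
owner = conn.execute("SELECT owner FROM items WHERE id = 1").fetchone()[0]
print(alice_ok, bob_ok, owner)  # True False alice
```

No row stays locked between the read and the write, which is the trade-off against the pessimistic approach: conflicts are detected after the fact instead of being prevented.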

Related

How to perform check-and-insert in mysql?

In my code I need to do the following:
Check a MySQL table (InnoDB) if a particular row (matching some criteria) exists. If it does, return it. If it doesn't, create it and then return it.
The problem I seem to have is race conditions. Every now and then two processes run so closely together, that they both check the table at the same time, don't see the row, and both insert it - thus duplicate data.
I'm reading MySQL documentation trying to come up with some way to prevent this. What I've come up so far:
Unique indexes seem to be one option, but they're not universal (they only work when the criteria identify something unique across all rows).
Transactions even at SERIALIZABLE level don't protect against INSERT, period.
Neither do SELECT ... LOCK IN SHARE MODE or SELECT ... FOR UPDATE.
A LOCK TABLE ... WRITE would do it, but it's a very drastic measure - other processes won't be able to read from the table, and I need to lock ALL tables that I intend to use until I unlock them.
Basically, I'd like to do either of the following:
Prevent all INSERTs into the table from processes other than mine, while allowing SELECT/UPDATE (this is probably impossible because it makes so little sense most of the time).
Organize some sort of manual locking. The two processes would coordinate among themselves which one gets to do the select/insert dance, while the other waits. This needs some sort of operation that waits until the lock is released. I could probably implement a spin-lock (one process repeatedly checks if the other has released the lock), but I'm afraid that it would be too resource intensive.
I think I found an answer myself. Transactions + SELECT ... FOR UPDATE in an InnoDB table can provide a synchronization lock (aka mutex). Have all processes lock on a specific row in a specific table before they start their work. Then only one will be able to run at a time and the rest will wait until the first one finishes its transaction.
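The mutex-row approach from the answer above can be sketched in Python against MySQL: every process first runs SELECT ... FOR UPDATE on one agreed-upon row, then does the check-and-insert, and the commit releases the row. Everything specific here is an assumption: the mutex and items tables and their columns, the get_or_create name, and a DB-API driver using %s placeholders with autocommit off. The fake connection exists only so the sketch runs without a server; it records statements and does not model locking.

```python
def get_or_create(conn, key):
    """Check-and-insert serialized by a lock on one dedicated "mutex" row.

    Hypothetical schema: mutex(name VARCHAR PRIMARY KEY) pre-populated with
    the row 'items'; items(id INT AUTO_INCREMENT PRIMARY KEY, k VARCHAR).
    Assumes autocommit is off and this call starts a fresh transaction.
    """
    cur = conn.cursor()
    try:
        # Blocks until whoever holds the mutex row commits or rolls back.
        cur.execute("SELECT name FROM mutex WHERE name = %s FOR UPDATE", ("items",))
        cur.fetchone()  # consume the result; the row lock is what matters
        cur.execute("SELECT id FROM items WHERE k = %s", (key,))
        row = cur.fetchone()
        if row is None:
            cur.execute("INSERT INTO items (k) VALUES (%s)", (key,))
            cur.execute("SELECT id FROM items WHERE k = %s", (key,))
            row = cur.fetchone()
        conn.commit()  # releases the mutex row for the next process
        return row[0]
    except Exception:
        conn.rollback()
        raise


# --- Demo with a fake connection (no MySQL server needed) ---
class FakeCursor:
    """Tiny stand-in that understands only the statements used above."""

    def __init__(self, db):
        self.db = db
        self.row = None

    def execute(self, sql, params):
        self.db["log"].append(sql.split()[0])  # SELECT / INSERT
        key = params[0]
        if sql.startswith("SELECT name FROM mutex"):
            self.row = (key,)
        elif sql.startswith("SELECT id FROM items"):
            ids = self.db["items"]
            self.row = (ids[key],) if key in ids else None
        elif sql.startswith("INSERT INTO items"):
            self.db["items"][key] = len(self.db["items"]) + 1

    def fetchone(self):
        return self.row


class FakeConn:
    def __init__(self):
        self.db = {"items": {}, "log": []}

    def cursor(self):
        return FakeCursor(self.db)

    def commit(self):
        self.db["log"].append("COMMIT")

    def rollback(self):
        self.db["log"].append("ROLLBACK")


conn = FakeConn()
first = get_or_create(conn, "abc")
second = get_or_create(conn, "abc")
print(first, second, conn.db["log"])
```

Because the mutex is an ordinary InnoDB row lock, a process that dies mid-transaction releases it when its connection closes and the transaction is rolled back, which avoids the stale-flag problem of a hand-rolled spin-lock.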

When I SELECT multiple rows FOR UPDATE, can I deadlock?

In MySQL+InnoDB, suppose I have a single table, and two threads which both do "SELECT ... FOR UPDATE". Suppose that both of the SELECT statements end up selecting multiple rows, e.g. both of them end up selecting rows R42 and R99. Is it possible that this will deadlock?
I'm thinking of this situation: the first thread tries to lock R42 then R99, the second thread tries to lock R99 then R42. If I'm unlucky, the two threads will deadlock.
I read in the MySQL Glossary for "deadlock" that
A deadlock can occur when the transactions lock rows in multiple tables (through statements such as UPDATE or SELECT ... FOR UPDATE), but in the opposite order. ...
To reduce the possibility of deadlocks, ... create indexes on the columns used in SELECT ... FOR UPDATE and UPDATE ... WHERE statements.
This hints that in my situation (single table) I won't deadlock, maybe because MySQL automatically tries to lock rows in the order of the primary key, but I want to be certain, and I can't find the proper place in the documentation that tells me exactly what's going on.
From MySQL documentation
InnoDB uses automatic row-level locking. You can get deadlocks even in the case of transactions that just insert or delete a single row. That is because these operations are not really “atomic”; they automatically set locks on the (possibly several) index records of the row inserted or deleted.
http://dev.mysql.com/doc/refman/5.1/en/innodb-deadlocks.html
So generally, deadlocking is not fatal, you just need to try again, or add the appropriate indexes so that fewer rows are scanned and thus fewer rows are locked.
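Since the advice is to try again, a small wrapper can rerun the whole transaction when MySQL reports error 1213 ("Deadlock found when trying to get lock"). This is a sketch under one stated assumption: driver exceptions carry the MySQL error code as args[0], which is how PyMySQL and mysqlclient report it. The function name, attempt limit and backoff policy are illustrative, and the fakes at the bottom stand in for a real connection.

```python
import time

ER_LOCK_DEADLOCK = 1213  # "Deadlock found when trying to get lock"


def run_in_transaction_with_retry(conn, work, max_attempts=3, backoff_s=0.05):
    """Run work(conn) and commit; on a deadlock, roll back and rerun it all.

    Assumption: driver exceptions carry the MySQL error code in args[0]
    (PyMySQL and mysqlclient raise e.g. OperationalError(1213, '...')).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = work(conn)
            conn.commit()
            return result
        except Exception as exc:
            conn.rollback()  # InnoDB already rolled back the victim; this is harmless
            code = exc.args[0] if exc.args else None
            if code != ER_LOCK_DEADLOCK or attempt == max_attempts:
                raise
            time.sleep(backoff_s * attempt)  # short, growing pause before retrying


# --- Demo with fakes (no MySQL server needed) ---
class FakeConn:
    def __init__(self):
        self.commits = 0
        self.rollbacks = 0

    def commit(self):
        self.commits += 1

    def rollback(self):
        self.rollbacks += 1


class FakeDeadlock(Exception):
    pass


calls = {"n": 0}


def flaky_insert(conn):
    """Pretends to deadlock twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeDeadlock(ER_LOCK_DEADLOCK, "Deadlock found when trying to get lock")
    return "inserted"


conn = FakeConn()
result = run_in_transaction_with_retry(conn, flaky_insert, backoff_s=0)
print(result, calls["n"], conn.rollbacks, conn.commits)  # inserted 3 2 1
```

The retry has to cover the whole transaction, not just the failing statement, because the deadlock victim's earlier work has been rolled back too.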

Will SQL update affect its subquery during the update run?

I'm just composing a complex update query which looks more or less like this:
update table join
(select y, min(x) as MinX
from table
group by y) as t1
using (y)
set x = x - MinX
This means that x is updated based on the subquery, which also reads x. But couldn't x already have been modified by the running UPDATE? Isn't this a problem? In ordinary programming you usually have to handle this explicitly, i.e. store the new value somewhere separate from the old one and, once the job is done, replace the old value with the new one. How does an SQL database handle this?
I'm not interested in a single observation or experiment. I would like a snippet from the docs or the SQL standard that says what the defined behaviour is in this case. I'm using MySQL, but answers that also hold for PostgreSQL, Oracle, etc., and especially for the SQL standard in general, are appreciated. Thanks!
** Edited **
Selecting from the target table
From 13.2.9.8. Subqueries in the FROM Clause:
Subqueries in the FROM clause can return a scalar, column, row, or table. Subqueries in the FROM clause cannot be correlated subqueries, unless used within the ON clause of a JOIN operation.
So, yes, you can perform the above query.
The problem
There are really two problems here. There's concurrency, or ensuring that no one else changes the data out from under our feet. This is handled with locking. Dealing with the actual modification of new versus old values is handled with derived tables.
Locking
In the case of your query above, with InnoDB, MySQL performs the SELECT first and acquires a read (shared) lock on each row in the table individually. If the SELECT had a WHERE clause, only the records it selects would be locked, and a range condition would also lock the gaps within that range.
A read lock prevents any other query from acquiring write locks, so records can't be updated from elsewhere while they're read locked.
Then, MySQL acquires a write (exclusive) lock on each of the records in the table individually. If you had a WHERE clause in your UPDATE statement, then only the specific records would be write locked, and again, if the WHERE clause selected a range, then you would have a range locked.
Any record that had a read lock from the previous SELECT would automatically be escalated to a write lock.
A write lock prevents other queries from obtaining either a read or write lock.
To see this, run Innotop in Lock mode, start a transaction, and execute the query without committing it; the locks will show up in Innotop. You can also view the details without Innotop by running SHOW ENGINE INNODB STATUS.
Deadlocks
Your query is vulnerable to a deadlock if two instances were run at the same time. If query A got read locks, then query B got read locks, query A would have to wait for query B's read locks to release before it could acquire the write locks. However, query B isn't going to release the read locks until after it finishes, and it won't finish unless it can acquire write locks. Query A and query B are in a stalemate, and hence, a deadlock.
Therefore, you may wish to perform an explicit table lock, both to avoid the massive amount of record locks (which uses memory and affects performance), and to avoid a deadlock.
An alternative approach is to use SELECT ... FOR UPDATE on your inner SELECT. This starts out with write locks on all of the rows instead of starting with read and escalating them.
Derived tables
For the inner SELECT, MySQL creates a derived temporary table. A derived table is an actual non-indexed copy of the data that lives in the temporary table that is automatically created by MySQL (as opposed to a temporary table that you explicitly create and can add indexes to).
Since MySQL uses a derived table, that's the temporary old value that you refer to in your question. In other words, there's no magic here. MySQL does it just like you'd do it anywhere else, with a temporary value.
You can see the derived table by doing an EXPLAIN against your UPDATE statement (supported in MySQL 5.6+).
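The derived-table behaviour described above can be mimicked in plain Python: first build the per-group minimums from the original values (the "derived table"), then apply the update. For contrast, the naive version re-reads the minimum after every change, which is the hazard the question worries about. The sample rows are hypothetical, and this models the semantics only, not how MySQL executes the statement.

```python
def subtract_group_min(rows):
    """Mimic UPDATE t JOIN (SELECT y, MIN(x) AS MinX FROM t GROUP BY y) t1
    USING (y) SET x = x - MinX by materializing the derived table first."""
    # Step 1: the "derived table", computed only from the original values.
    min_x = {}
    for r in rows:
        min_x[r["y"]] = min(r["x"], min_x.get(r["y"], r["x"]))
    # Step 2: the update itself; min_x is never recomputed from new values.
    for r in rows:
        r["x"] -= min_x[r["y"]]
    return rows


def naive_in_place(rows):
    """What could go wrong without a snapshot: MinX re-read after each change."""
    for r in rows:
        r["x"] -= min(o["x"] for o in rows if o["y"] == r["y"])
    return rows


original = [{"y": 1, "x": 5}, {"y": 1, "x": 3}, {"y": 2, "x": 10}, {"y": 2, "x": 4}]
snapshot = [r["x"] for r in subtract_group_min([dict(r) for r in original])]
naive = [r["x"] for r in naive_in_place([dict(r) for r in original])]
print(snapshot)  # [2, 0, 6, 0]
print(naive)     # [2, 1, 6, 0]
```

In the naive run the second row of group y=1 sees the already-reduced first row (2 instead of 5), so its minimum becomes 2 and it ends at 1 instead of 0.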
A proper RDBMS uses statement-level read consistency, which ensures the statement sees (selects) the data as it was at the time the statement began. So the scenario you are afraid of won't occur.
Regards,
Rob.
Oracle has this in the 11.2 documentation:
A consistent result set is provided for every query, guaranteeing data consistency, with no action by the user. An implicit query, such as a query implied by a WHERE clause in an UPDATE statement, is guaranteed a consistent set of results. However, each statement in an implicit query does not see the changes made by the DML statement itself, but sees the data as it existed before changes were made.
Although it's been noted that you SHOULDN'T be able to update a table based on its own data, you should be able to adjust the MySQL syntax to allow for it via
update Table1,
(select T2.y, MIN( T2.x ) as MinX from Table1 T2 group by T2.y ) PreQuery
set Table1.x = Table1.x - PreQuery.MinX
where Table1.y = PreQuery.y
I don't know whether the syntax behaves differently with JOIN versus the comma-list version, but either way the complete prequery has to be evaluated first, once, and its result is then joined (via the WHERE clause) to Table1 to perform the update.

how to avoid deadlock in mysql

I have the following query (all tables are InnoDB):
INSERT INTO busy_machines(machine)
SELECT machine FROM all_machines
WHERE machine NOT IN (SELECT machine FROM busy_machines)
and machine_name!='Main'
LIMIT 1
Which causes a deadlock when I run it in threads, obviously because of the inner select, right?
The error I get is:
(1213, 'Deadlock found when trying to get lock; try restarting transaction')
How can I avoid the deadlock? Is there a way to change to query to make it work, or do I need to do something else?
The error doesn't happen always, of course, only after running this query lots of times and in several threads.
As I understand it, a SELECT does not acquire locks and should not be the cause of the deadlock.
Each time you insert, update, or delete a row, a lock is acquired. To avoid deadlocks, you must make sure that concurrent transactions don't update rows in an order that could result in a deadlock. Generally speaking, you must always acquire locks in the same order, even across different transactions (e.g. always table A first, then table B).
But if a transaction inserts into only one table, this condition is met, and that should usually not lead to a deadlock. Are you doing something else in the transaction?
A deadlock can, however, happen if indexes are missing. When a row is inserted, updated, or deleted, the database needs to check the relational constraints, that is, make sure the relations are consistent. To do so, it checks the foreign keys in the related tables, which can acquire locks on rows other than the one being modified. So always make sure the foreign keys (and, of course, the primary keys) are indexed; otherwise you could end up with a table lock instead of a row lock. If table locks happen, lock contention is higher and deadlocks become more likely.
Not sure what happens exactly in your case, but maybe it helps.
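The "always acquire locks in the same order" rule can be illustrated outside the database. In the sketch below, Python threading locks stand in for row locks on two hypothetical rows, 42 and 99; two workers ask for them in opposite orders, but each sorts the ids first, so neither can end up holding one lock while waiting for the other's. This is an analogy for the principle only, not a model of how InnoDB orders its own lock acquisition.

```python
import threading

# Hypothetical stand-ins for row locks on rows R42 and R99.
row_locks = {42: threading.Lock(), 99: threading.Lock()}
finished = []


def update_rows(name, row_ids):
    # Always acquire in one global order (ascending id), whatever order the
    # caller listed the rows in, so no two workers can each hold a lock
    # the other one is waiting for.
    ordered = sorted(row_ids)
    for rid in ordered:
        row_locks[rid].acquire()
    try:
        finished.append(name)  # the "work" done while holding both locks
    finally:
        for rid in reversed(ordered):
            row_locks[rid].release()


workers = [
    threading.Thread(target=update_rows, args=("A", [42, 99]), daemon=True),
    threading.Thread(target=update_rows, args=("B", [99, 42]), daemon=True),
]
for w in workers:
    w.start()
for w in workers:
    w.join(timeout=5)
print(sorted(finished))  # ['A', 'B']
```

Without the sorted() call, A could take 42 while B takes 99, and each would then wait forever for the other; that is the same cycle InnoDB detects and breaks by rolling one transaction back.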
You will probably get better performance if you replace your "NOT IN" with an outer join.
You can also separate this into two queries to avoid inserting and selecting the same table in a single query.
Something like this:
SELECT a.machine
INTO @machine
FROM all_machines a
LEFT OUTER JOIN busy_machines b ON b.machine = a.machine
WHERE a.machine_name != 'Main'
AND b.machine IS NULL
LIMIT 1;
INSERT INTO busy_machines(machine)
VALUES (@machine);
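Splitting the work into a SELECT and a separate INSERT leaves a window in which two threads can pick the same machine. One way to make that harmless, assuming you can add a UNIQUE index on busy_machines.machine, is to treat a duplicate-key error (MySQL error 1062) as "someone else got it, pick again". The sketch runs against SQLite through Python's sqlite3 module as a stand-in; the sample rows are hypothetical, and ORDER BY was added only to make the demo deterministic.

```python
import sqlite3

# SQLite stand-in for the MySQL tables; the UNIQUE constraint on
# busy_machines.machine is the assumed addition.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE all_machines (machine TEXT PRIMARY KEY, machine_name TEXT);
CREATE TABLE busy_machines (machine TEXT UNIQUE);
INSERT INTO all_machines VALUES ('m1', 'Main'), ('m2', 'worker-a'), ('m3', 'worker-b');
""")

PICK_FREE = """
SELECT a.machine
FROM all_machines a
LEFT OUTER JOIN busy_machines b ON b.machine = a.machine
WHERE a.machine_name != 'Main' AND b.machine IS NULL
ORDER BY a.machine
LIMIT 1
"""


def claim_machine(conn, attempts=5):
    """Pick a free machine, then insert it; on a duplicate, pick again."""
    for _ in range(attempts):
        row = conn.execute(PICK_FREE).fetchone()
        if row is None:
            return None  # nothing free
        try:
            with conn:  # commit on success, roll back on error
                conn.execute("INSERT INTO busy_machines (machine) VALUES (?)", row)
            return row[0]
        except sqlite3.IntegrityError:
            continue  # another worker claimed it between our SELECT and INSERT
    return None


claimed = [claim_machine(conn) for _ in range(3)]
print(claimed)  # ['m2', 'm3', None]
```

The unique index turns the race into an error the caller can handle instead of a silently double-booked machine.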

mysql insert race condition

How do you stop race conditions in MySQL? the problem at hand is caused by a simple algorithm:
1. select a row from the table
2. if it doesn't exist, insert it
and then either you get a duplicate row, or if you prevent it via unique/primary keys, an error.
Normally I'd think transactions help here, but because the row doesn't exist yet, a transaction doesn't actually help (or am I missing something?).
LOCK TABLE sounds like an overkill, especially if the table is updated multiple times per second.
The only other solution I can think of is GET_LOCK() for every different id, but isn't there a better way? Are there no scalability issues here as well? And also, doing it for every table sounds a bit unnatural, as it sounds like a very common problem in high-concurrency databases to me.
What you want is LOCK TABLES,
or, if that seems excessive, how about INSERT IGNORE followed by a check that the row was actually inserted?
If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead.
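With a unique index in place, "insert, then check that the row was actually inserted" comes down to the affected-row count: 1 means this call inserted the row, 0 means it was already there. The sketch uses Python's sqlite3, where INSERT OR IGNORE is the closest equivalent of MySQL's INSERT IGNORE; the names table and helper name are hypothetical. Since MySQL's IGNORE downgrades other errors to warnings too, as the quote says, a 0 count there may hide more than a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint is what makes the ignore meaningful.
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")


def insert_if_absent(conn, name):
    """Return True if this call inserted the row, False if it already existed."""
    with conn:
        # MySQL spelling: INSERT IGNORE INTO names (name) VALUES (%s)
        cur = conn.execute("INSERT OR IGNORE INTO names (name) VALUES (?)", (name,))
    return cur.rowcount == 1


first = insert_if_absent(conn, "Rupert")
second = insert_if_absent(conn, "Rupert")
total = conn.execute("SELECT COUNT(*) FROM names").fetchone()[0]
print(first, second, total)  # True False 1
```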
It seems to me you should have a unique index on your id column, so a repeated insert would trigger an error instead of being blindly accepted again.
That can be done by defining the id as a primary key or using a unique index by itself.
I think the first question you need to ask is why do you have many threads doing the exact SAME work? Why would they have to insert the exact same row?
Once that is answered, I think that simply ignoring the errors will be the most performant solution, but measure both approaches (GET_LOCK vs. ignoring errors) and see for yourself.
There is no other way that I know of. Why do you want to avoid errors? You still have to code for the case when another type of error occurs.
As staticsan says, transactions do help, but since they are usually implicit, if two inserts are run by different threads, each will run inside its own implicit transaction and see a consistent view of the database.
Locking the entire table is indeed overkill. To get the effect that you want, you need something that the literature calls "predicate locks". No one has ever seen those except printed on the paper that academic studies are published on. The next best thing are locks on the "access paths" to the data (in some DBMS's: "page locks").
Some non-SQL systems allow you to do both (1) and (2) in one single statement, which more or less means that the potential race conditions arising from your OS suspending your thread right between (1) and (2) are entirely eliminated.
Nevertheless, in the absence of predicate locks such systems will still need to resort to some kind of locking scheme, and the finer the "granularity" (/"scope") of the locks it takes, the better for concurrency.
(And to conclude : some DBMS's - especially the ones you don't have to pay for - do indeed offer no finer lock granularity than "the entire table".)
On a technical level, a transaction will help here because other threads won't see the new row until you commit the transaction.
But in practice that doesn't solve the problem; it only moves it. Your application now needs to check whether the commit fails and decide what to do. I would normally have it roll back what it did and restart the transaction, because now the row will be visible. This is how transaction-based programming is supposed to work.
I ran into the same problem and searched the Net for a moment :)
Finally I came up with a solution similar to the method used to create filesystem objects in shared (temporary) directories, for example to open temporary files securely:
$exists = false;
do {
    $exists = check();             // SELECT the row from the table
    if (!$exists) {
        $result = create_record(); // INSERT; assumed to return true or an error code
        if ($result === true) {
            $exists = true;
        } else if ($result !== ERROR_DUP_ROW) {
            log_error("failed to create row, and not because of DUP_ROW!");
            break;
        }
        // otherwise another process has probably created the record already,
        // so loop round and check again whether it exists
    }
} while (!$exists);
Don't be afraid of busy-loop - normally it will execute once or twice.
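The same check / create / re-check loop rendered in Python, with sqlite3 as a stand-in database so it runs as-is. It relies on a UNIQUE constraint on the looked-up column: that is what turns the losing INSERT into a duplicate-key error (sqlite3.IntegrityError here, MySQL error 1062). The names table is hypothetical, and the attempt cap is an addition so the loop cannot spin forever; other errors propagate, much like the log_error/break branch.

```python
import sqlite3


def get_or_create(conn, name, max_attempts=5):
    """Check, try to create, and on a duplicate-key error check again."""
    for _ in range(max_attempts):
        row = conn.execute("SELECT id FROM names WHERE name = ?", (name,)).fetchone()
        if row is not None:
            return row[0]
        try:
            with conn:
                cur = conn.execute("INSERT INTO names (name) VALUES (?)", (name,))
            return cur.lastrowid
        except sqlite3.IntegrityError:
            # Most likely another process created the row first: loop and re-check.
            continue
    raise RuntimeError(f"could not get or create {name!r}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
a = get_or_create(conn, "Rupert")
b = get_or_create(conn, "Rupert")
print(a, b)  # 1 1
```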
You prevent duplicate rows very simply by putting unique indexes on your tables. That has nothing to do with LOCKS or TRANSACTIONS.
Do you care if an insert fails because it's a duplicate? Do you need to be notified if it fails? Or is all that matters that the row was inserted, and it doesn't matter by whom or how many duplicates inserts failed?
If you don't care, then all you need is INSERT IGNORE. There is no need to think about transactions or table locks at all.
InnoDB has row level locking automatically, but that applies only to updates and deletes. You are right that it does not apply to inserts. You can't lock what doesn't yet exist!
You can explicitly LOCK the entire table. But if your purpose is to prevent duplicates, then you are doing it wrong. Again, use a unique index.
If there is a set of changes to be made and you want an all-or-nothing result (or even a set of all-or-nothing results within a larger all-or-nothing result), then use transactions and savepoints. Then use ROLLBACK or ROLLBACK TO SAVEPOINT *savepoint_name* to undo changes, including deletes, updates and inserts.
LOCK TABLES is not a replacement for transactions, but it is your only option with MyISAM tables, which do not support transactions. You can also use it with InnoDB tables if row-level locking isn't enough. See the MySQL documentation on using transactions together with LOCK TABLES for more information.
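The all-or-nothing and savepoint behaviour described above can be tried out with Python's sqlite3, which accepts the same SAVEPOINT, ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT statements as MySQL. SQLite is only a stand-in here; the log table and savepoint name are hypothetical.

```python
import sqlite3

# isolation_level=None: the sqlite3 module issues no implicit BEGIN/COMMIT,
# so the transaction statements below are exactly what runs.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (msg TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('outer step 1')")
conn.execute("SAVEPOINT step2")
conn.execute("INSERT INTO log VALUES ('inner step, later undone')")
conn.execute("ROLLBACK TO SAVEPOINT step2")  # undoes only work done after step2
conn.execute("RELEASE SAVEPOINT step2")
conn.execute("INSERT INTO log VALUES ('outer step 3')")
conn.execute("COMMIT")

conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('never kept')")
conn.execute("ROLLBACK")  # all-or-nothing: the whole transaction is discarded

rows = [r[0] for r in conn.execute("SELECT msg FROM log ORDER BY rowid")]
print(rows)  # ['outer step 1', 'outer step 3']
```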
I have a similar issue. I have a table that under most circumstances should have a unique ticket_id value, but there are some cases where I will have duplicates; not the best design, but it is what it is.
User A checks to see if the ticket is reserved, it isn't
User B checks to see if the ticket is reserved, it isn't
User B inserts a 'reserved' record into the table for that ticket
User A inserts a 'reserved' record into the table for that ticket
User B checks for a duplicate: there is one. Is my record older? Yes, leave it
User A checks for a duplicate: there is one. Is my record older? No, delete it
User B has reserved the ticket, User A reports back that the ticket has been taken by someone else.
The key in my instance is that you need a tie-breaker, in my case it's the auto-increment id on the row.
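The tie-breaker steps above can be sketched with Python's sqlite3 standing in for MySQL: everyone inserts, then each user keeps their row only if it has the lowest auto-increment id for that ticket. The reservations table, ticket number and helper names are hypothetical, and the sketch assumes, as in the listed steps, that both inserts are committed before either check runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit each statement
conn.execute(
    "CREATE TABLE reservations ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, ticket_id INTEGER, username TEXT)"
)


def insert_reservation(conn, ticket_id, username):
    cur = conn.execute(
        "INSERT INTO reservations (ticket_id, username) VALUES (?, ?)",
        (ticket_id, username),
    )
    return cur.lastrowid


def keep_if_oldest(conn, ticket_id, my_id):
    """Tie-breaker: the lowest id for the ticket wins; any other row deletes itself."""
    (winner,) = conn.execute(
        "SELECT MIN(id) FROM reservations WHERE ticket_id = ?", (ticket_id,)
    ).fetchone()
    if winner == my_id:
        return True
    conn.execute("DELETE FROM reservations WHERE id = ?", (my_id,))
    return False


# The interleaving from the steps above, for a hypothetical ticket 7.
b_id = insert_reservation(conn, 7, "B")
a_id = insert_reservation(conn, 7, "A")
b_won = keep_if_oldest(conn, 7, b_id)
a_won = keep_if_oldest(conn, 7, a_id)
left = conn.execute("SELECT username FROM reservations").fetchall()
print(b_won, a_won, left)  # True False [('B',)]
```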
In case INSERT IGNORE, as suggested in the accepted answer, doesn't fit your needs, recall the requirements in your question:
1] select a row from table
2] if it doesn't exist, insert it
Another possible approach is to add a condition to the INSERT statement itself, e.g.:
INSERT INTO table_listnames (name, address, tele)
SELECT * FROM (SELECT 'Rupert', 'Somewhere', '022') AS tmp
WHERE NOT EXISTS (
SELECT name FROM table_listnames WHERE name = 'Rupert'
) LIMIT 1;
Reference:
https://stackoverflow.com/a/3164741/179744
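The conditional INSERT above can be run through Python's sqlite3 to see that it is a no-op when the name already exists; the literals are replaced by placeholders and the affected-row count tells you which case happened. SQLite is only a stand-in for MySQL here, the helper name is hypothetical, and the statement by itself does not replace a unique index on name as a guard against duplicates.

```python
import sqlite3

# SQLite stand-in for MySQL; the statement mirrors the answer above, with
# the literal values replaced by placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_listnames (name TEXT, address TEXT, tele TEXT)")

INSERT_IF_MISSING = """
INSERT INTO table_listnames (name, address, tele)
SELECT * FROM (SELECT ? AS name, ? AS address, ? AS tele) AS tmp
WHERE NOT EXISTS (
    SELECT name FROM table_listnames WHERE name = ?
) LIMIT 1
"""


def add_person(conn, name, address, tele):
    """Insert the person unless the name is already present; return rows inserted."""
    with conn:
        cur = conn.execute(INSERT_IF_MISSING, (name, address, tele, name))
    return cur.rowcount


inserted_first = add_person(conn, "Rupert", "Somewhere", "022")
inserted_again = add_person(conn, "Rupert", "Somewhere", "022")
total = conn.execute("SELECT COUNT(*) FROM table_listnames").fetchone()[0]
print(inserted_first, inserted_again, total)  # 1 0 1
```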