SQL table locking race condition - SELECT then INSERT - mysql

Here's a neat locking problem with MariaDB/MySQL.
A server is reassembling multipart SMS messages. Messages arrive in segments. Segments with the same "smsfrom" and "uniqueid" are part of the same message. Segments have a segment number starting from 1 up to "segmenttotal". When all segments of a message have arrived, the message is complete. We have a table of unmatched segments waiting to be reassembled, as follows:
CREATE TABLE frags (
    smsfrom TEXT,
    uniqueid VARCHAR(32) NOT NULL,
    smsbody TEXT,
    segmentnum INTEGER NOT NULL,
    segmenttotal INTEGER NOT NULL);
When a new segment comes in, we do, in a transaction,
SELECT ... FROM frags WHERE smsfrom = % AND uniqueid = %;
This gets us all the segments received so far. If the new one
plus these has all the segment numbers, we have a complete message.
We send the message off for further processing and delete the fragments involved. Fine.
If not all segments have arrived yet, we do an INSERT of the segment we just got. Autocommit is off, so both operations are part of a transaction. InnoDB engine, incidentally.
This has a race condition. Two segments come in at the same time for a two-segment message, and are processed by separate processes. Process A does the SELECT, finds nothing. Process B does the SELECT, finds nothing. Process A inserts segment 1, no problem. Process B inserts segment 2, no problem. Now we're stuck - all segments are in the table but we didn't notice. So the message is stuck there forever. (In practice we do a purge every few minutes to remove old unmatched stuff, but ignore that for now.)
So what's wrong? The SELECTs lock no rows, because they find nothing.
We need a row lock on a row that doesn't exist yet. Adding FOR UPDATE to the SELECT doesn't help; nothing to lock. Nor does LOCK IN SHARE MODE. Even going to a transaction type of SERIALIZABLE doesn't help, because that's just global LOCK IN SHARE MODE.
OK, so suppose we do the INSERT first and then do a SELECT to see if we have all the segments. Process A does the INSERT of 1, no problem. Process B does the insert of 2, no problem. Process A does a SELECT, and sees only 1. Process B does a SELECT, and sees only 2. That's repeatable read semantics. No good.
The brute force approach is a LOCK TABLE before doing any of this. That ought to work, although it's annoying, because I'm in a transaction involving other tables and LOCK TABLE implies a commit.
Doing a commit after each INSERT might work, but I'm not totally sure.
Is there a more elegant solution?

Why not:
1) Process 1: insert the new segment into your frags table, nothing else.
INSERT ...;
COMMIT;
2) Process 2: find the complete multipart SMS messages with
SELECT smsfrom, uniqueid, COUNT(*) FROM frags GROUP BY smsfrom, uniqueid, segmenttotal HAVING COUNT(*) = segmenttotal;
move them to the new table, then
DELETE FROM frags WHERE smsfrom = <> AND uniqueid = <>;
COMMIT;

As I wrote above, I ended up doing this:
INSERT ...; -- Insert the new fragment.
COMMIT;
SELECT ... FROM frags WHERE smsfrom = % AND uniqueid = % FOR UPDATE;
Check whether the SELECT returned a complete set of fragments. If so, reassemble and process the message, then
DELETE FROM frags WHERE smsfrom = % AND uniqueid = %;
Both the COMMIT and the FOR UPDATE are necessary. The COMMIT is needed so that each process sees any INSERT from another process. The FOR UPDATE is needed on the SELECT to row lock all the fragments until the DELETE can be done. Otherwise, two processes might see the complete set of fragments in the SELECT and reassemble and process the message twice.
This is surprisingly complicated for a one-table problem, but seems to work.
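Spelled out end to end, the pattern looks like this. This is only a minimal sketch; the phone number, uniqueid, and fragment values are made up, and autocommit is assumed off:
INSERT INTO frags (smsfrom, uniqueid, smsbody, segmentnum, segmenttotal)
VALUES ('+15551230000', 'abc123', 'first half of message', 1, 2);
COMMIT; -- make this fragment visible to every other process
SELECT segmentnum, smsbody FROM frags
WHERE smsfrom = '+15551230000' AND uniqueid = 'abc123'
FOR UPDATE; -- row-locks every fragment of this message until COMMIT
-- Application code: if segment numbers 1..segmenttotal are all present,
-- reassemble and process the message, then:
DELETE FROM frags WHERE smsfrom = '+15551230000' AND uniqueid = 'abc123';
COMMIT;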

Related

How do MySQL Transactions work under the hood? Where does MySQL store the temporary value of a field?

I understand that a MySQL transaction will let you perform multiple inserts/updates at once where either all or none of them will succeed and I understand how to use them.
What I am not sure about is how MySQL manages to hold all this data over long periods of time and what effect this might have on performance.
Selecting before another transaction has committed
I have a table of 100 people named "John" and I update every name to "Jane" in a loop inside a transaction, each update takes 1 second meaning it takes 100 seconds to finish the transaction.
If I make a select from another process during that 100 seconds the result will be "John" rather than "Jane". If I make a select after the transaction is committed it will return "Jane".
This is all fine, and I'm not really confused about how this works.
Selecting within a transaction
This is the more confusing bit.
I have a table of 100 people named "John" and I start a transaction in which I loop through and select each row one by one. Each select query takes 1 second so this takes 100 seconds.
After 50 seconds another process, not within a transaction, updates every row to "Jane".
In my first process, within the transaction, I will still receive "John" as a result even after the update to "Jane" has completed.
To be clear the timing would be like so:
12:00:00 - All rows say John and a select begins in a transaction that takes 1 second per row
12:00:30 - All rows are updated to Jane
12:00:31 - Row 31 is selected from the first transaction and still returns "John" rather than "Jane".
How does it work under the hood
So now I could execute SELECT name FROM names WHERE id = 31 at the exact same time and have one return "John" and one return "Jane" depending on whether I was in a transaction, or when the transaction started.
MySQL must then be storing the value of this field twice in some way.
Does it take a copy?
I don't think it takes a copy of the database or table, since when you begin a transaction it doesn't know what tables you're going to touch. You may not touch a table until 10 minutes into the transaction, and yet the data is as it was 10 minutes ago, no matter how many modifications other processes made in the meantime.
I've also experimented with databases and tables that are GB's in size and take minutes to dump, there's no way it's making entire copies.
Temporary hold somewhere?
Perhaps it temporarily holds the value of the field somewhere waiting for the transaction to finish?
It would then need to check if there's a pending value when performing a select.
Therefore doing SELECT name FROM names WHERE id = 31 would be the equivalent of something like:
if (pending_value('names', 'name', 31)) {
    // a transaction holds an uncommitted value: "Jane"
    name = get_pending_value('names', 'name', 31);
} else {
    // no pending update: the stored value, "John"
    name = get_db_value('names', 'name', 31);
}
That is obviously very dumb pseudo code, but it's essentially saying "is there a pending update? If yes, use that instead"
This would presumably be held in memory somewhere? Or perhaps a file? Or one of the system databases?
How does it affect performance
If my names table had 1 billion rows and we performed the same queries then MySQL would simultaneously know that 1 billion rows held the value "John" and that 1 billion rows held the value "Jane". This must surely impact performance.
But is it the query within the transaction that is impacted or the query outside the transaction?
e.g.
Process 1 = Begin transaction
Process 2 = UPDATE names SET name = "Jane"
Process 1 = SELECT name FROM names WHERE id = 31 //John
Process 2 = SELECT name FROM names WHERE id = 31 //Jane
Does the query in step (3) or step (4) have a performance impact or both?
Some clues:
Read about "MVCC" -- MultiVersion Concurrency Control
A tentatively-changed row is kept until COMMIT or ROLLBACK. (See "history list" in the documentation.) This is row-by-row, not whole table or database. It will not "escalate the row locks to a table lock".
Each row of each table has a transaction_id. Each new transaction has a new, higher, id.
That xaction id, together with the "transaction isolation mode", determine which copy of each row your transaction can "see". So, yes, there can briefly be multiple "rows" WHERE id = 31.
Rows are locked, not tables. In some of your examples, transactions ran for a while, then stumbled over the 'same' row.
In some cases, the "gap" between rows is locked. (I did not notice that in your examples.)
Whole tables are locked only for DDL (Drop, Alter, etc), not DML (Select, Update, etc)
When a conflict occurs, a "deadlock" might occur. This is when each transaction is waiting for the other one to release a lock. One transaction is automatically rolled back.
When a conflict occurs, a "lock wait" might occur. This is when the transaction with a lock will eventually let go, letting the waiting transaction continue.
When a conflict occurs and "lock wait" occurs, innodb_lock_wait_timeout controls how long before giving up.
Every statement is inside a transaction. When autocommit=ON, each statement is a transaction unto itself. (Your last example is missing a BEGIN, in which case Process 2 has 2 separate transactions.)
In your first example, the READ UNCOMMITTED isolation mode would let you see the other transaction's changes as they happened. That is a rarely used mode. The other modes won't let you see the changes until they are COMMITted, and would never see them if the transaction were ROLLBACK'd. (Yes, there was a copy of each changed row.)
REPEATABLE READ mode (and others) effectively limits you to seeing only the rows with your transaction_id or older. Hence, even at 12:00:31, you still see "John".
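A minimal two-session demonstration of that snapshot behavior (assuming the names table from the question):
-- session 1
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT name FROM names WHERE id = 31; -- 'John'; this first read establishes the snapshot
-- session 2, autocommit=ON
UPDATE names SET name = 'Jane' WHERE id = 31;
-- session 1 again
SELECT name FROM names WHERE id = 31; -- still 'John', reconstructed from the history list
COMMIT;
SELECT name FROM names WHERE id = 31; -- now 'Jane'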
General advice:
Don't write a transaction that runs longer than a few seconds
Remember to use SELECT ... FOR UPDATE where appropriate -- this adds a stronger lock on the rows in the SELECT just in case they will be updated or deleted in the transaction.
Where practical, it is better to have one INSERT adding 100 rows; that will be 10 times as fast as 100 single-row INSERTs. (Similarly for UPDATE and DELETE; see the sketch after this list.)
Use SHOW ENGINE INNODB STATUS; (I find it useful in dealing with deadlocks, but cryptic for other purposes.)
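For the batched-INSERT advice above, the sketch is just a multi-row VALUES list (reusing the names table from the question):
-- one statement, one transaction, one round trip:
INSERT INTO names (id, name) VALUES (1, 'John'), (2, 'John'), (3, 'John');
-- versus three single-row INSERTs, each paying its own parse,
-- round trip, and (with autocommit=ON) its own commit.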

Handling Latency in MySQL Transactions

The Problem
I'm trying to figure out how to correctly set up a transaction in a database, and account for potential latency.
The Setup
In my example I have a users table, a keys table where each user can have multiple keys, and a config table that dictates how many keys each user is allowed to have.
I want to run a stored procedure that:
figures out whether the given user is allowed to request a key,
gets an available, unclaimed key,
attempts to redeem the key for the given user.
The pseudocode for the procedure would be:
START TRANSACTION;
(1) CALL check_permission(..., @result);
IF (@result = 'has_permission') THEN
(2) SET @unclaimed_key_id = (QUERY FOR RETURNING AVAILABLE KEY ID);
(3) CALL claim_key(@unclaimed_key_id);
END IF;
COMMIT;
The problem I am running into is that when I simulate lag after step 1 (by using SELECT SLEEP(<seconds>)), a given user can redeem multiple keys when they only have permission to redeem one, by running the procedure in multiple sessions before the first call has finished its sleep (which, again, simulates lag).
Here is the code for the Tables and the Procedures
(note: for the small example I didn't bother with indexes and foreign keys, but obviously I use those on the actual project).
To see my issue, just set up the tables and procedures in a database, then open two mysql terminals. In the first, run this:
CALL `P_user_request_key`(10, 1, @out);
SELECT @out;
And then quickly (you have 10 seconds) in the second, run this:
CALL `P_user_request_key`(0, 1, @out);
SELECT @out;
Both queries will successfully return key_claimed and User Bob will end up with 4 keys assigned to him, although the max value in config is set to 3 per user.
The Questions
What is the best way of avoiding issues like this? I'm trying to use a transaction, but I feel like it's not going to help with this specific issue, and I may be implementing it wrong.
I realize that one possible way to fix the problem would be to just encapsulate everything in one large update query, but I would prefer to avoid that, since I like being able to set up individual procedures, where each is only meant to do a single task.
The database behind this example is intended to be used by many (thousands) of concurrent users. As such it would be best if one user attempting to redeem a code doesn't block all other users from redeeming one. I'm fine with changing my code to just attempt to redeem again if another user already claimed a key, but it should absolutely not happen that a user can redeem two codes when they only have permission to get one.
You're off the hook for not wanting to encapsulate everything in one large query, because that won't actually solve anything either, it just makes it less likely.
What you need are locks on the rows, or locks on the index where the new row would be inserted.
InnoDB uses an algorithm called next-key locking that combines index-row locking with gap locking. InnoDB performs row-level locking in such a way that when it searches or scans a table index, it sets shared or exclusive locks on the index records it encounters. Thus, the row-level locks are actually index-record locks. In addition, a next-key lock on an index record also affects the “gap” before that index record. That is, a next-key lock is an index-record lock plus a gap lock on the gap preceding the index record. If one session has a shared or exclusive lock on record R in an index, another session cannot insert a new index record in the gap immediately before R in the index order.
http://dev.mysql.com/doc/refman/5.5/en/innodb-next-key-locking.html
So how do we get exclusive locks?
Two connections, mysql1 and mysql2, each of them requesting an exclusive lock using SELECT ... FOR UPDATE. The table 'history' has a column 'user_id' which is indexed. (It's also a foreign key.) There are no rows found, so they both appear to proceed normally as if nothing unusual is going to happen. The user_id 2808 is valid but has nothing in history.
mysql1> start transaction;
Query OK, 0 rows affected (0.00 sec)
mysql2> start transaction;
Query OK, 0 rows affected (0.00 sec)
mysql1> select * from history where user_id = 2808 for update;
Empty set (0.00 sec)
mysql2> select * from history where user_id = 2808 for update;
Empty set (0.00 sec)
mysql1> insert into history(user_id) values (2808);
... and I don't get my prompt back ... no response ... because another session has a lock, too ... but then:
mysql2> insert into history(user_id) values (2808);
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
Then mysql1 immediately returns success on the insert.
Query OK, 1 row affected (3.96 sec)
All that is left is for mysql1 to COMMIT and, magically, we prevented a user with 0 entries from inserting more than 1 entry. The deadlock occurred because both sessions needed incompatible things to happen: mysql1 needed mysql2 to release its lock before mysql1's INSERT could proceed, and mysql2 needed mysql1 to release its lock before its own INSERT could proceed. Somebody has to lose that fight, and generally the thread that has done the least work is the loser.
But what if there had been 1 or more rows already existing when I did the SELECT ... FOR UPDATE? In that case, the lock would have been on the rows, so the second session to try to SELECT would actually block waiting for the SELECT until the first session decided to either COMMIT or ROLLBACK, at which time the second session would have seen an accurate count of the number of rows (including any inserted or deleted by the first session) and could have accurately decided the user already had the maximum allowed.
You can't outrace a race condition, but you can lock them out.
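Applied to the key-claiming procedure, that suggests a shape like the following. This is only a sketch: the user_keys table, its user_id column (NULL while a key is unclaimed), and the limit check are assumptions about the schema, and the caller must be prepared to retry after ERROR 1213, as demonstrated above:
START TRANSACTION;
-- Lock this user's key rows; with no rows yet, this takes a next-key
-- (gap) lock on the user_id index, so two sessions cannot both claim
-- a key for the same user without one of them deadlocking and retrying.
SELECT COUNT(*) FROM user_keys WHERE user_id = 42 FOR UPDATE;
-- Compare the count to the configured maximum; only if it is below it:
UPDATE user_keys SET user_id = 42 WHERE user_id IS NULL LIMIT 1;
COMMIT; -- or ROLLBACK and retry on a deadlock error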

How does getting mysql's last insert ID work with transactions? + transaction questions

A two part question:
In my CodeIgniter script, I'm starting a transaction, then inserting a row, setting the insert_id() to a php variable, inserting more rows into another table using the new ID as a foreign key, and then I commit everything.
So my question is: if everything does not commit before ending the transaction, how is mysql able to return the last insert ID, if nothing was even inserted? My script works (almost) perfectly, with the new ID being used in subsequent queries.
(I say "almost" because, using the PDO mysql driver, sometimes the first insert that is supposed to return the insert_id() is duplicated--it get's inserted twice. Any idea why that would be? Is that related to getting the last ID? It never happens if using the mysqli or mysql driver.)
I first wrote the script without transactions, so I have code that checks for mysql errors along the way, such as:
if(!$this->db->insert($table, $data)) {
//log message here
}
How does this affect the mysql process once I wrap all my mysql code in a transaction? It's not causing any visible errors (hopefully unrelated to the problem stated above), but should it be removed?
Thank you.
To answer your first question...
When using transactions, your queries are executed normally as far as your connection is concerned. You can choose to commit, saving those changes, or rollback, reverting all of the changes. Consider the following pseudo-code:
insert into number(Random_number) values (rand());
select Random_number from number where Number_id=Last_insert_id();
//php
if($num < 1)
$this->db->query('rollback;'); // This number is too depressing.
else
$this->db->query('commit;'); // This number is just right.
The random number that was generated can be read prior to commit to ensure that it is suitable before saving it for everyone to see (e.g. commit and unlock the row).
If the PDO driver is not working, consider using the mysqli driver. If that is not an option, you can always use the query 'select last_insert_id() as id;' rather than the $this->db->insert_id() function.
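On the first question, the key fact is that LAST_INSERT_ID() is tracked per connection and reflects your own uncommitted INSERT. A sketch (orders and order_items are made-up tables):
START TRANSACTION;
INSERT INTO orders (customer_id) VALUES (7);
-- LAST_INSERT_ID() already returns the id of the row above, even
-- though nothing is committed yet; other connections cannot see it.
INSERT INTO order_items (order_id, sku) VALUES (LAST_INSERT_ID(), 'ABC-1');
COMMIT; -- both rows become visible to other connections together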
To answer your second question, if you are inserting or updating data that other models will be updating or reading, be sure to use transactions. For example, if a column 'Number_remaining' is set to 1 the following problem can occur.
Person A reads 1
Person B reads 1
Person A wins $1000!
Person A updates 1 to be 0
Person B wins $1000!
Person B updates 0 to be 0
Using transactions in the same situation would yield this result:
Person A starts transaction
Person A reads '1' from Number_remaining (the row is now locked if SELECT ... FOR UPDATE is used)
Person B attempts to read Number_remaining - forced to wait
Person A wins $1000
Person A updates 1 to be 0
Person A commits
Person B reads 0
Person B does not win $1000
Person B cries
You may want to read up on transaction isolation levels as well.
Be careful of deadlock, which can occur in this case:
Person A reads row 1 (SELECT ... FOR UPDATE)
Person B reads row 2 (SELECT ... FOR UPDATE)
Person A attempts to read row 2, forced to wait
Person B attempts to read row 1, forced to wait
Person A reaches innodb_lock_wait_timeout (default 50 sec) and is disconnected
Person B reads row 1 and continues normally
At the end, since Person B has probably reached PHP's max_execution_time, the current query will finish executing independently of PHP, but no further queries will be received. If this was a transaction with autocommit=0, it will automatically roll back when the connection to your PHP server is severed.

While in a transaction, how can reads to an affected row be prevented until the transaction is done?

I'm fairly sure this has a simple solution, but I haven't been able to find it so far. Provided an InnoDB MySQL database with the isolation level set to SERIALIZABLE, and given the following operation:
BEGIN WORK;
SELECT * FROM users WHERE userID=1;
UPDATE users SET credits=100 WHERE userID=1;
COMMIT;
I would like to make sure that as soon as the select inside the transaction is issued, the row corresponding to userID=1 is locked for reads until the transaction is done. As it stands now, UPDATEs to this row will wait for the transaction to be finished if it is in process, but SELECTs simply will read the previous value. I understand this is the expected behaviour in this case, but I wonder if there is a way to lock the row in such a way that SELECTs will also wait until the transaction is finished to return the values?
The reason I'm looking for that is that at some point, and with enough concurrent users, it could happen that while the previous transaction is in process someone else reads the "credits" to calculate something else. Ideally the code run by that someone else should wait for the transaction to finish to use the new value, because otherwise it could lead to irreversible desync issues.
Note that I don't want to lock the entire table for reads, just the specific row.
Also, I could add a boolean "locked" field to the tables and set it to 1 every time I'm starting a transaction but I don't really feel this is the most elegant solution here, unless there is absolutely no other way to handle this through mysql directly.
I found a workaround, specifically:
SELECT ... LOCK IN SHARE MODE sets a shared mode lock on the rows
read. A shared mode lock enables other sessions to read the rows but
not to modify them. The rows read are the latest available, so if they
belong to another transaction that has not yet committed, the read
blocks until that transaction ends.
(Source)
It seems that one can just include LOCK IN SHARE MODE in the critical SELECT statements that rely on transactional data and they will indeed wait for current transactions to finish before retrieving the row/s. For this to work the transaction has to use FOR UPDATE explicitly (as opposed to the original example I gave). E.g., given the following:
BEGIN WORK;
SELECT * FROM users WHERE userID=1 FOR UPDATE;
UPDATE users SET credits=100 WHERE userID=1;
COMMIT;
Anywhere else in the code I could use:
SELECT * FROM users WHERE userID=1 LOCK IN SHARE MODE;
Since this statement is not wrapped in a transaction, the lock is released immediately, thus having no impacts in subsequent queries, but if the row involving userID=1 has been selected for update within a transaction this statement would wait until the transaction is done, which is exactly what I was looking for.
You could try the SELECT ... FOR UPDATE locking read.
A SELECT ... FOR UPDATE reads the latest available data, setting exclusive locks on each row it reads. Thus, it sets the same locks a searched SQL UPDATE would set on the rows.
Please go through the following site: http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html

mysql insert race condition

How do you stop race conditions in MySQL? The problem at hand is caused by a simple algorithm:
select a row from table
if it doesn't exist, insert it
and then either you get a duplicate row, or if you prevent it via unique/primary keys, an error.
Now normally I'd think transactions help here, but because the row doesn't exist, the transaction doesn't actually help (or am I missing something?).
LOCK TABLE sounds like an overkill, especially if the table is updated multiple times per second.
The only other solution I can think of is GET_LOCK() for every different id, but isn't there a better way? Are there no scalability issues here as well? And also, doing it for every table sounds a bit unnatural, as it sounds like a very common problem in high-concurrency databases to me.
What you want is LOCK TABLES,
or, if that seems excessive, INSERT IGNORE with a check that the row was actually inserted (see the sketch below).
If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead.
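A sketch of that check (items is a made-up table with a unique key on id):
INSERT IGNORE INTO items (id, val) VALUES (42, 'x');
-- ROW_COUNT() is 1 if the row was inserted, 0 if a duplicate-key
-- error was downgraded to a warning and the row was skipped.
SELECT ROW_COUNT();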
It seems to me you should have a unique index on your id column, so a repeated insert would trigger an error instead of being blindly accepted again.
That can be done by defining the id as a primary key or using a unique index by itself.
I think the first question you need to ask is why do you have many threads doing the exact SAME work? Why would they have to insert the exact same row?
After that being answered, I think that just ignoring the errors will be the most performant solution, but measure both approaches (GET_LOCK v/s ignore errors) and see for yourself.
There is no other way that I know of. Why do you want to avoid errors? You still have to code for the case when another type of error occurs.
As staticsan says, transactions do help, but, as they usually are implied, if two inserts are run by different threads, they will both be inside implied transactions and see consistent views of the database.
Locking the entire table is indeed overkill. To get the effect that you want, you need something that the literature calls "predicate locks". No one has ever seen those except printed on the paper that academic studies are published on. The next best thing is locks on the "access paths" to the data (in some DBMSs: "page locks").
Some non-SQL systems allow you to do both (1) and (2) in one single statement, more or less meaning the potential race conditions arising from your OS suspending your execution thread right between (1) and (2), are entirely eliminated.
Nevertheless, in the absence of predicate locks such systems will still need to resort to some kind of locking scheme, and the finer the "granularity" (/"scope") of the locks it takes, the better for concurrency.
(And to conclude : some DBMS's - especially the ones you don't have to pay for - do indeed offer no finer lock granularity than "the entire table".)
On a technical level, a transaction will help here because other threads won't see the new row until you commit the transaction.
But in practice that doesn't solve the problem - it only moves it. Your application now needs to check whether the commit fails and decide what to do. I would normally have it roll back what you did and restart the transaction, because now the row will be visible. This is how transaction-based programming is supposed to work.
I ran into the same problem and searched the Net for a moment :)
Finally I came up with a solution similar to the method of creating filesystem objects in shared (temporary) directories to securely open temporary files:
$exists = false;
$success = false;
do {
    $exists = check(); // select the row from the table
    if (!$exists) {
        $success = create_record(); // insert; true on success, or an error code
        if ($success === true) {
            $exists = true;
        } else if ($success !== ERROR_DUP_ROW) {
            log_error("failed to create row, and not because of DUP_ROW!");
            break;
        }
        // else: another process has already created the record,
        // so loop and check again whether it exists
    }
} while (!$exists);
Don't be afraid of the busy-loop - normally it will execute only once or twice.
You prevent duplicate rows very simply by putting unique indexes on your tables. That has nothing to do with LOCKS or TRANSACTIONS.
Do you care if an insert fails because it's a duplicate? Do you need to be notified if it fails? Or is all that matters that the row was inserted, and it doesn't matter by whom or how many duplicates inserts failed?
If you don't care, then all you need is INSERT IGNORE. There is no need to think about transactions or table locks at all.
InnoDB has row level locking automatically, but that applies only to updates and deletes. You are right that it does not apply to inserts. You can't lock what doesn't yet exist!
You can explicitly LOCK the entire table. But if your purpose is to prevent duplicates, then you are doing it wrong. Again, use a unique index.
If there is a set of changes to be made and you want an all-or-nothing result (or even a set of all-or-nothing results within a larger all-or-nothing result), then use transactions and savepoints. Then use ROLLBACK or ROLLBACK TO SAVEPOINT *savepoint_name* to undo changes, including deletes, updates and inserts.
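A short sketch of those savepoint mechanics (accounts is a made-up table):
START TRANSACTION;
INSERT INTO accounts (id, balance) VALUES (1, 100);
SAVEPOINT after_insert;
UPDATE accounts SET balance = 0 WHERE id = 1;
ROLLBACK TO SAVEPOINT after_insert; -- undoes only the UPDATE
COMMIT; -- the INSERT is kept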
LOCK TABLES is not a replacement for transactions, but it is your only option with MyISAM tables, which do not support transactions. You can also use it with InnoDB tables if row-level locking isn't enough. See this page for more information on using transactions with lock table statements.
I have a similar issue. I have a table that under most circumstances should have a unique ticket_id value, but there are some cases where I will have duplicates; not the best design, but it is what it is.
User A checks to see if the ticket is reserved, it isn't
User B checks to see if the ticket is reserved, it isn't
User B inserts a 'reserved' record into the table for that ticket
User A inserts a 'reserved' record into the table for that ticket
User B checks for a duplicate? Yes. Is my record newer? Yes, leave it
User A checks for a duplicate? Yes. Is my record newer? No, delete it
User B has reserved the ticket, User A reports back that the ticket has been taken by someone else.
The key in my instance is that you need a tie-breaker, in my case it's the auto-increment id on the row.
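One concrete form of that tie-breaker, as a sketch only (reservations and its columns are made up; here the lowest auto-increment id wins, matching the outcome above where the first inserter keeps the ticket):
-- after inserting my 'reserved' row:
SET @my_id = LAST_INSERT_ID();
SELECT MIN(id) INTO @winner FROM reservations WHERE ticket_id = 123;
-- if @winner = @my_id the ticket is mine; otherwise back out:
DELETE FROM reservations WHERE id = @my_id AND @winner <> @my_id;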
In case INSERT IGNORE, as suggested in the accepted answer, doesn't fit for you, then, per the requirements in your question:
1] select a row from table
2] if it doesn't exist, insert it
Another possible approach is to add a condition to the INSERT statement itself, e.g.:
INSERT INTO table_listnames (name, address, tele)
SELECT * FROM (SELECT 'Rupert' AS name, 'Somewhere' AS address, '022' AS tele) AS tmp
WHERE NOT EXISTS (
    SELECT name FROM table_listnames WHERE name = 'Rupert'
) LIMIT 1;
Reference:
https://stackoverflow.com/a/3164741/179744