Insert row if not exists without deadlock - mysql

I have a simple table
CREATE TABLE test (
col INT,
data TEXT,
KEY (col)
);
and a simple transaction
START TRANSACTION;
SELECT * FROM test WHERE col = 4 FOR UPDATE;
-- If no results, generate data and insert
INSERT INTO test SET col = 4, data = 'data';
COMMIT;
I am trying to ensure that two copies of this transaction running concurrently result in no duplicate rows and no deadlocks. I also don't want to incur the cost of generating data for col = 4 more than once.
I have tried:
SELECT .. (without FOR UPDATE or LOCK IN SHARE MODE):
Both transactions see that there are no rows with col = 4 (without acquiring a lock) and both generate data and insert two copies of the row with col = 4.
SELECT .. LOCK IN SHARE MODE
Both transactions acquire a shared lock on col = 4, generate data and attempt to insert a row with col = 4. Each transaction then waits for the other to release its shared lock so that it can INSERT, resulting in ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction.
SELECT .. FOR UPDATE
I would expect that one transaction's SELECT will succeed and acquire an exclusive lock on col = 4 and the other transaction's SELECT will block waiting for the first.
Instead, both SELECT .. FOR UPDATE queries succeed and the transactions proceed to deadlock just like with SELECT .. LOCK IN SHARE MODE. The exclusive lock on col = 4 just doesn't seem to work.
How can I write this transaction without causing duplicate rows and without deadlock?
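The FOR UPDATE case can be reproduced by hand in two client sessions. This is only a sketch of the interleaving described above. It assumes the original table (non-unique KEY on col), no existing row with col = 4, and the default REPEATABLE READ isolation level. The session labels A and B are just for illustration.

```sql
-- Session A
START TRANSACTION;
SELECT * FROM test WHERE col = 4 FOR UPDATE;  -- empty result, returns immediately

-- Session B
START TRANSACTION;
SELECT * FROM test WHERE col = 4 FOR UPDATE;  -- empty result, also returns immediately

-- Session A
INSERT INTO test SET col = 4, data = 'data';  -- blocks, waiting on session B

-- Session B
INSERT INTO test SET col = 4, data = 'data';  -- now each session waits on the other;
                                              -- InnoDB rolls one of them back with
                                              -- ERROR 1213 (40001)
```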

Adjust your schema slightly:
CREATE TABLE test (
col INT NOT NULL PRIMARY KEY,
data TEXT
);
With col being a primary key it cannot be duplicated.
Then use the ON DUPLICATE KEY feature:
INSERT INTO test (col, data) VALUES (4, ...)
ON DUPLICATE KEY UPDATE data=VALUES(data)

Maybe this...
START TRANSACTION;
INSERT IGNORE INTO test (col, data) VALUES (4, NULL); -- or ''
-- if ROW_COUNT() = 1 (this session inserted the placeholder row), generate data and replace `data`;
-- if it is 0, the row already existed, so skip the UPDATE
UPDATE test SET data = 'data' WHERE col = 4;
COMMIT;
Caution: If the PRIMARY KEY is an AUTO_INCREMENT, this may 'burn' an id.
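One way to express the "only fill in the row if this session inserted it" check in plain MySQL is a small stored procedure. This is a sketch, not a tested solution. It assumes col has a PRIMARY KEY or UNIQUE index (otherwise INSERT IGNORE never detects a duplicate), the procedure name claim_and_fill is made up for illustration, and the literal 'data' stands in for whatever the expensive generation step produces.

```sql
DELIMITER //

CREATE PROCEDURE claim_and_fill(IN p_col INT)
BEGIN
  START TRANSACTION;
  -- Try to claim the key with a placeholder row.
  INSERT IGNORE INTO test (col, data) VALUES (p_col, NULL);
  -- ROW_COUNT() is 1 if the placeholder was inserted, 0 if the key already existed.
  IF ROW_COUNT() = 1 THEN
    UPDATE test SET data = 'data' WHERE col = p_col;
  END IF;
  COMMIT;
END //

DELIMITER ;

CALL claim_and_fill(4);
```

A session that loses the race waits on the INSERT IGNORE until the winner commits, then sees 0 affected rows and skips the UPDATE.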

Note that InnoDB uses two different kinds of exclusive lock here: the lock taken by SELECT ... FOR UPDATE (the same kind UPDATE and DELETE take), and the insert intention lock taken by INSERT. When no row with col = 4 exists yet, SELECT ... FOR UPDATE has no record to lock, so it locks the gap in the index where that row would go. Gap locks held by different transactions do not conflict with each other, so both SELECTs succeed, as you observed. When the first transaction then executes its INSERT, it needs an insert intention lock on that gap, and that lock has to wait for the gap lock held by the second transaction. The second transaction's INSERT in turn waits for the first transaction's gap lock. That's why in this situation you get a deadlock error.
The only way to do this correctly is to have a unique index on col and try to INSERT the row with col = 4 first (you can put in dummy data if you don't want to generate it before the INSERT). If you get a duplicate key error, roll back; if the INSERT succeeds, UPDATE the row with the correct data.
Note though that if you don't want to incur the cost of generating data unnecessarily, generating it probably takes a long time. For all that time you'll hold open a transaction that has inserted the row with col = 4, and every other process trying to insert the same row will hang waiting for it. I'm not sure that would be significantly better than generating the data first and then inserting it.

If your goal is to have only one session insert the missing row, and have any other sessions do nothing without even attempting an insert of DATA, then you need to either lock the entire table (which reduces your concurrency) or insert an incomplete row and follow it with an update.
A. create a primary key on column COL
Code:
begin
insert into test values (4,null);
update test set data = ... where col = 4;
commit;
exception
when dup_val_on_index then
null;
end;
The first session that attempts the insert on col 4 will succeed and proceed to the update, where you can do the expensive calculation of DATA. Any other session trying to do this will raise a PK violation (ORA-00001, or DUP_VAL_ON_INDEX) and go to the exception handler, which traps it and does nothing (NULL). It will never reach the update statement, so it won't do whatever expensive thing it is you do to calculate DATA.
Now, this will cause the other session to wait while the first session calculates DATA and does the update. If you don't want that wait, you can use NOWAIT to cause the lagging sessions to throw an exception immediately if the row is locked. If the row doesn't exist, that will also throw an exception, but a different one. Not great to use exception handling for normal code branches, but hey, it should work.
declare
var_junk number;
begin
begin
select col into var_junk from test where col = 4 for update nowait;
exception
when no_data_found then
insert into test values (4,null);
update test set data = ... where col = 4;
commit;
when others then
null;
end;
end;

Related

Why set a shared lock when duplicate-key error occurs during insert

As the INSERT section of "Locks Set by Different SQL Statements in InnoDB" says:
INSERT sets an exclusive lock on the inserted row.
Prior to inserting the row, a type of gap lock called an insert intention gap lock is set.
If a duplicate-key error occurs, a shared lock on the duplicate index record is set.
I wonder why a shared lock is set here, since, as the documentation goes on to say, this might lead to a deadlock. Why not simply request an exclusive lock?
This use of a shared lock can result in deadlock should there be multiple sessions trying to insert the same row if another session already has an exclusive lock.
The example below, copied from the MySQL documentation, demonstrates how to trigger the deadlock.
CREATE TABLE t1 (i INT, PRIMARY KEY (i)) ENGINE = InnoDB;
Now suppose that three sessions perform the following operations in order:
Session 1:
START TRANSACTION;
INSERT INTO t1 VALUES(1);
Session 2:
START TRANSACTION;
INSERT INTO t1 VALUES(1);
Session 3:
START TRANSACTION;
INSERT INTO t1 VALUES(1);
Session 1:
ROLLBACK;
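To look at the locks involved, the lock table can be queried from a separate connection just before session 1 rolls back. This is a sketch that assumes MySQL 8.0 or later, where performance_schema.data_locks is available; older versions expose similar information differently.

```sql
-- Run from a fourth session while sessions 2 and 3 are still blocked on their INSERTs.
SELECT engine_transaction_id,
       object_name,
       index_name,
       lock_type,
       lock_mode,
       lock_status,   -- GRANTED or WAITING
       lock_data      -- the index value the lock refers to
FROM performance_schema.data_locks
WHERE object_name = 't1';
```

Per the documentation quoted above, the waiting requests from sessions 2 and 3 should be shared locks on the duplicate record, while session 1 holds the exclusive one.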

MYSQL - Are inserts with autocommit on, considered a single step or multi step process?

In MYSQL if a conditional insert is performed with autocommit ON,
ie.
set autocommit = 1;
insert into blah (x,y,z) values (1,2,3) where not exists (.....);
Would the above statement be executed atomically and committed at the same time? Or is it possible that there can be a delay between executing the insert and doing the commit?
EDIT:
Updated the insert statement to reflect more accurate query:
set autocommit = 1;
insert into foo (x,y,z) select 1,2,3 from dual where not exists (select 1 from bar where a = 1);
I want to insert only if a row in another table does not exist. What I want to confirm is that in the below scenario there will be a failure:
SESSION1: insert into foo ..... where not exists (select 1 from bar where a = 1);
SESSION2: insert into bar (a) values (1);
SESSION2: commit;
SESSION1: commit; // should fail here.
The way it works is the same as not using autocommit, but you begin a new transaction, immediately do your INSERT, and then immediately COMMIT without delay.
START TRANSACTION;
INSERT ...
COMMIT;
This is atomic, in the sense that no other client will see your INSERT in a partially-finished state. Atomicity isn't about speed, it's about making sure the change is either committed fully or else not at all. No half-committed state is visible to other sessions.
By the way, the syntax you show, INSERT INTO ... VALUES ... WHERE NOT EXISTS ... is not meaningful. INSERT does not have a WHERE clause. You may be thinking of an INSERT that uses rows output by a SELECT statement:
INSERT INTO ...
SELECT ... FROM ... WHERE ...;
If you do this, you would NOT use a VALUES() clause for your INSERT.
Given your updated question, it cannot work the way you show.
SESSION1: insert into foo ..... where not exists (select 1 from bar where a = 1);
If you use the default transaction isolation level of REPEATABLE-READ, this will acquire a gap lock on bar, where the row where a=1 would exist. It does this to ensure that there is no change to the latest committed entries in the table that the query was reading.
SESSION2: insert into bar (a) values (1);
This causes Session 2 to wait, because it cannot insert into the locked gap. It will time out with an error unless Session 1 commits within innodb_lock_wait_timeout seconds (default 50).
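The timeout mentioned above can be inspected and shortened per session, which makes the blocking behaviour easier to see when experimenting. A small sketch; the value 5 is an arbitrary choice for testing.

```sql
-- Current lock wait timeout for this session, in seconds (50 by default).
SELECT @@innodb_lock_wait_timeout;

-- Give up after 5 seconds instead; affects only the current session.
SET SESSION innodb_lock_wait_timeout = 5;

-- A blocked statement then fails with:
-- ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
```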

Table lock during transaction contains max()

If I understand correctly this code
START TRANSACTION;
SELECT field FROM table WHERE ... FOR UPDATE ; -- single row
UPDATE table SET field = ... ;
COMMIT;
will lock the SELECT row until COMMIT.
But if I use MAX()
START TRANSACTION;
SELECT MAX(field) FROM table WHERE ... FOR UPDATE ; -- whole table
UPDATE table SET field = ... ;
COMMIT;
will this code lock the whole table until COMMIT?
EDIT
Sorry, I got my question wrong.
Obviously above code will lock rows affected by WHERE. But it wouldn't lock the table. Meaning
INSERT INTO table() VALUES();
could still take place regardless of COMMIT.
That would mean the return value of
SELECT MAX(field) FROM table WHERE ... FOR UPDATE ;
is now no longer valid.
How can I lock the table during the transaction so that neither INSERT nor UPDATE can take place before COMMIT?
It doesn't matter what you're selecting. FOR UPDATE locks all the rows that have to be examined to evaluate the WHERE clause. Otherwise, another transaction could change the columns that are mentioned there, so the later UPDATE would assign to different rows.
And since inserting a new row can change the value of MAX(field), it actually locks the entire table. When I try your example and try to insert a new row from another transaction, the second transaction blocks until I commit the first transaction.
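If an explicit table lock is wanted rather than relying on what FOR UPDATE happens to lock, MySQL's LOCK TABLES can be combined with an InnoDB transaction. The sketch below uses a hypothetical table counters with a numeric column field, since the names in the question are placeholders. Note that the transaction is opened with SET autocommit = 0 rather than START TRANSACTION, because START TRANSACTION would release the table lock.

```sql
SET autocommit = 0;
LOCK TABLES counters WRITE;          -- other sessions can neither read nor write counters

SELECT MAX(field) INTO @max_field FROM counters;
INSERT INTO counters (field) VALUES (COALESCE(@max_field, 0) + 1);

COMMIT;                              -- commit first ...
UNLOCK TABLES;                       -- ... then release the table lock
SET autocommit = 1;
```

This serializes every writer on the table, so it trades concurrency for simplicity.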

Deadlock error in inserting bulk of records to MySQL database using trigger and Qt multi threading

I have a multi-threaded program (in this case in Qt) in which every thread, at a specific time, writes 500 records to a MySQL table (which I'll call A_tbl). There is also a trigger that inserts some values into other tables (which I'll call B_tbl), and afterwards some SELECT queries in the trigger get the IDs of those inserted records in B_tbl.
But these select queries lead to this error:
Deadlock found when trying to get lock; try restarting transaction QMYSQL3: Unable to execute statement
This is my trigger:
CREATE DEFINER=`root`@`localhost` TRIGGER `IpAn`.`A_tbl_BEFORE_INSERT` BEFORE INSERT ON `A_tbl` FOR EACH ROW
BEGIN
INSERT IGNORE INTO source_names (source_name) VALUES (NEW.source_name);
INSERT IGNORE INTO locations (loc1,loc2) VALUES (NEW.loc1,NEW.loc2);
SET @source_names_id = (select id from source_names USE INDEX (PRIMARY) where source_name=NEW.source_name);
SET @locations_id = (select id from locations USE INDEX (PRIMARY) where loc1=NEW.loc1 and loc2=NEW.loc2);
...
END
If I reduce the number of threads to one, the error does not occur.
How can I solve this?

Row Level Locking in Mysql

I have 5 rows in a table (1 to 5). I want to lock row 2 for an update, and if someone tries to update row 4 in the meantime, they should be able to.
I am trying this with the code below, but I feel it's placing the lock at table level rather than row level.
------ session 1
START TRANSACTION;
SELECT * FROM test WHERE t=1 FOR UPDATE;
UPDATE test SET NAME='irfandd' WHERE t=2;
COMMIT;
----- session 2 (which is being blocked)
START TRANSACTION;
UPDATE test SET NAME='irfandd' WHERE t=4;
COMMIT;
Instead of FOR UPDATE use LOCK IN SHARE MODE. FOR UPDATE also blocks other transactions' locking reads of the row. LOCK IN SHARE MODE lets other transactions read the row, but prevents them from updating it.
Reference: MySQL Manual
------ session 1
START TRANSACTION;
SELECT * FROM test WHERE t=1 LOCK IN SHARE MODE;
UPDATE test SET NAME='irfandd' WHERE t=2;
COMMIT;
----- session 2 (which is not being blocked anymore :) )
START TRANSACTION;
UPDATE test SET NAME='irfandd' WHERE t=4;
COMMIT;
Update:
Realizing that the table has no index on t, I have the following explanation:
First, transaction T1 locks row 1 with SELECT * FROM test WHERE t=1 FOR UPDATE.
Next, transaction T2 tries to execute UPDATE test SET NAME='irfandd' WHERE t=4. To find out which row(s) are affected, it needs to scan all rows, including row 1. But that row is locked, so T2 must wait until T1 finishes.
If there is an index on t, WHERE t=4 can use the index to find the matching rows without touching row 1, so there is no need to wait.
Option 1: add an index on test.t so your update can use it.
Option 2: use LOCK IN SHARE MODE, which is intended for putting a read lock only.
Unfortunately this option creates a deadlock. Interestingly, transaction T2 executes (updating row 4) and T1 fails (updating row 2). It seems that T1 read-locks row 4 as well, and since T2 modifies it, T1 fails because of the transaction isolation level (REPEATABLE READ by default). The final solution would be to change the transaction isolation level to READ UNCOMMITTED or READ COMMITTED.
The simplest is Option 1, IMHO, but it depends on what you are able to change.
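As a rough sketch of the two suggestions above (the index name idx_t is an arbitrary choice, not something from the question):

```sql
-- Option 1: index the column used in the WHERE clause, so that a lookup on t
-- only touches (and locks) the matching rows instead of scanning the table.
ALTER TABLE test ADD INDEX idx_t (t);

-- Alternative: lower the isolation level for the current session.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```

With the index in place, the two sessions from the question lock different index entries and should no longer block each other.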
I found the option below more appropriate. I generated 40000 numbers from concurrent sessions at the same time and did not find any duplicate numbers. Without the commands below, I generated 10000 numbers and found 5 duplicates.
START TRANSACTION;
SELECT * FROM test WHERE t=1 FOR UPDATE;
UPDATE test SET NAME='irfandd' WHERE t=2;
COMMIT;