Does MySQL take a table-level lock when executing a multi-row INSERT like the statement below? This statement is atomic in InnoDB, so it is guaranteed to be all or nothing. How does MySQL handle this underneath?
INSERT INTO MyTable ( Column1, Column2 ) VALUES
( Value1, Value2 ), ( Value1, Value2 )
I have read https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html but cannot find anything specific to this.
InnoDB locks rows, not tables.
Here I created a simple test case with 3 sessions using IntelliJ consoles with this sample data structure.
Engine: InnoDB (note that I'm only talking about InnoDB here; other engines behave differently)
MySQL version: 8.0.25
DROP TABLE IF EXISTS `locktest`;
CREATE TABLE locktest
(
id INT UNIQUE KEY,
val INT
);
DROP TABLE IF EXISTS `dataTest`;
CREATE TABLE `dataTest`
(
id INT UNIQUE KEY,
val INT
);
INSERT INTO dataTest
VALUES (1, 1),
(2, 2),
(3, 3),
(4, 4);
Now in session 1:
START TRANSACTION;
UPDATE dataTest
SET val = 100
WHERE id = 2;
So session 1 now holds a lock on row 2 and hasn't committed yet.
Then in Session 2:
START TRANSACTION;
INSERT INTO locktest
SELECT id, val from dataTest
WHERE id in (1,2);
As expected, this doesn't complete immediately because it's waiting for the lock on row 2 of dataTest, which session 1 still holds.
But it has succeeded in inserting the row with id 1 into locktest. How do we know this? Look at Session 3
Then in Session 3:
START TRANSACTION;
INSERT INTO locktest VALUES (10, 200);
INSERT INTO locktest VALUES (1, 200);
This shows that inserting the row with id 10 completes quickly, but the next insert into locktest has to wait, because session 2 has already inserted a row with id 1 and that session hasn't finished yet.
We conclude that MySQL doesn't lock the whole locktest table; it only locks the row with id 1 (because id is unique) and lets the row with id 10 be inserted.
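To look at the locks behind this test directly, one option on MySQL 8.0 is the performance_schema.data_locks table. The query below is a minimal sketch meant to be run from a fourth session while the other sessions still have their transactions open; it assumes the test tables live in the current schema, and the exact rows returned will depend on timing and version.

```sql
-- Run in a separate session while sessions 1-3 are still open.
-- Assumes MySQL 8.0 with performance_schema enabled (the default).
SELECT OBJECT_NAME,   -- table the lock belongs to
       INDEX_NAME,    -- index the record lock is on (NULL for table-level locks)
       LOCK_TYPE,     -- TABLE or RECORD
       LOCK_MODE,     -- e.g. IX, S, X, possibly with REC_NOT_GAP or GAP
       LOCK_STATUS,   -- GRANTED or WAITING
       LOCK_DATA      -- key value of the locked record, if any
FROM performance_schema.data_locks
WHERE OBJECT_SCHEMA = DATABASE()
  AND LOWER(OBJECT_NAME) IN ('locktest', 'datatest');
```

Rows with LOCK_STATUS = 'WAITING' are the blocked requests. If the conclusion above holds, the TABLE-type rows should only show intention modes such as IX, with the real blocking happening on RECORD rows.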
Also, read this, and as a side note, be careful about gap locks.
I need to import data from an external web service into my MySQL (5.7) database.
The problem is that I need to split the data into two tables. So for example I have the tables
CREATE TABLE a (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100)
);
CREATE TABLE b (
id INT PRIMARY KEY AUTO_INCREMENT,
a_id INT,
name VARCHAR(100)
);
Now I have to insert multiple rows into table b for each row in table a (1:n).
As I do not know the id of a row in table a before inserting it, the only way is to insert one row into table a, get the last id, and then insert all connected entries into table b.
But my database is very slow when I insert row by row. It takes more than 1 hour to insert about 35000 rows into table a and 120000 into table b. If I do batch inserts of about 1000 rows into table a (just for testing, without filling table b), it is incredibly faster (less than 3 minutes).
I guess there must be a way to speed up my import.
Thanks for your help
I presume you are working with a programming language driving your inserts. You need to be able to program this sequence of operations.
First, you need to use this sequence to put a row into a and dependent rows into b. It uses LAST_INSERT_ID() to handle a_id. That's faster and much more robust than querying the table to find the correct id value.
INSERT INTO a (name) VALUES ('Claus');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'von');
INSERT INTO b (a_id, name) VALUES (@a_id, 'Bönnhoff');
The trick is to capture the a.id value in the session variable @a_id, and then reuse it for each dependent INSERT. (I have turned you into an aristocrat to illustrate this, sorry :-)
Second, you should keep this in mind: INSERTs are cheap, but transaction COMMITs are expensive. That's because, with default settings, InnoDB has to flush its log to disk at every COMMIT to make the changes durable. Unless you manage your transactions explicitly, the DBMS uses a feature called "autocommit" in which it immediately commits each INSERT (or UPDATE or DELETE).
Fewer transactions get you better speed. Therefore, to improve bulk-loading performance you want to bundle together 100 or so INSERTs into a single transaction. (The exact number doesn't matter very much.) You can do something like this:
START TRANSACTION; /* start an insertion bundle */
INSERT INTO a (name) VALUES ('Claus');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'von');
INSERT INTO b (a_id, name) VALUES (@a_id, 'Bönnhoff');
INSERT INTO a (name) VALUES ('Oliver');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'Jones');
/* ... more INSERT operations ... */
INSERT INTO a (name) VALUES ('Jeff');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'Atwood');
COMMIT; /* commit the bundle */
START TRANSACTION; /* start the next bundle */
INSERT INTO a (name) VALUES ('Joel');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'Spolsky');
/* ... more INSERT operations ... */
COMMIT; /* finish the bundle */
(All this, except LAST_INSERT_ID(), works on any SQL-based RDBMS. Each make of RDBMS has its own way of handling IDs.)
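A further option, in the spirit of the batch insert mentioned in the question, is to send all dependent rows for one parent in a single multi-row INSERT. This is only a sketch under the same assumptions as above (tables a and b from the question, and the session variable @a_id); it should cut down the number of statements and round trips, though how much it helps will depend on the setup.

```sql
START TRANSACTION;

INSERT INTO a (name) VALUES ('Claus');
-- Capture the parent id right away: the next INSERT into b also has an
-- AUTO_INCREMENT column and would change what LAST_INSERT_ID() returns.
SET @a_id = LAST_INSERT_ID();

-- One statement inserts every child row that belongs to this parent.
INSERT INTO b (a_id, name) VALUES
    (@a_id, 'von'),
    (@a_id, 'Bönnhoff');

COMMIT;
```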
The query I have in mind looks roughly like this:
SELECT *
FROM Notifications AS n
WHERE n.date > (CURRENT_DATE - INTERVAL 10 DAY)
LIMIT 1
FOR UPDATE;
http://dev.mysql.com/doc/refman/5.0/en/select.html states:
If you use FOR UPDATE with a storage engine that uses page or row locks, rows examined by the query are write-locked until the end of the current transaction.
Does MySQL lock only the one record that is returned, or all the records it has to scan to find that single record?
Why don't we just try it?
Set up the database
CREATE DATABASE so1;
USE so1;
CREATE TABLE notification (`id` BIGINT(20), `date` DATE, `text` TEXT) ENGINE=InnoDB;
INSERT INTO notification(id, `date`, `text`) values (1, '2011-05-01', 'Notification 1');
INSERT INTO notification(id, `date`, `text`) values (2, '2011-05-02', 'Notification 2');
INSERT INTO notification(id, `date`, `text`) values (3, '2011-05-03', 'Notification 3');
INSERT INTO notification(id, `date`, `text`) values (4, '2011-05-04', 'Notification 4');
INSERT INTO notification(id, `date`, `text`) values (5, '2011-05-05', 'Notification 5');
Now, start two database connections
Connection 1
BEGIN;
SELECT * FROM notification WHERE `date` >= '2011-05-03' FOR UPDATE;
Connection 2
BEGIN;
If MySQL locks all rows, the following statement would block. If it only locks the rows it returns, it shouldn't block.
SELECT * FROM notification WHERE `date` = '2011-05-02' FOR UPDATE;
And indeed it does block.
Interestingly, we also cannot add records that would be read, i.e.
INSERT INTO notification(id, `date`, `text`) values (6, '2011-05-06', 'Notification 6');
blocks as well!
I can't be sure at this point whether MySQL just goes ahead and locks the entire table when a certain percentage of rows are locked, or whether it's actually really intelligent in making sure the result of the SELECT ... FOR UPDATE query can never be changed by another transaction (with an INSERT, UPDATE, or DELETE) while the lock is being held.
The thread is pretty old, but I'd like to share my two cents regarding the tests above performed by @Frans.
Connection 1
BEGIN;
SELECT * FROM notification WHERE `date` >= '2011-05-03' FOR UPDATE;
Connection 2
BEGIN;
SELECT * FROM notification WHERE `date` = '2011-05-02' FOR UPDATE;
Concurrent transaction 2 will certainly be blocked, but the reason is NOT that transaction 1 holds a lock on the whole table. The following explains what happens behind the scenes:
First of all, the default isolation level of the InnoDB storage engine is REPEATABLE READ. In this case:
1 - When the column used in the WHERE condition is not indexed (as in the case above):
The engine is obliged to perform a full table scan to filter out the records that do not match the criteria. EVERY ROW that is scanned is locked in the first place. At lower isolation levels such as READ COMMITTED, MySQL may later release the locks on records that do not match the WHERE clause. This is a performance optimization, but such behavior violates the 2PL constraint.
When transaction 2 starts, as explained, it needs to acquire an X lock on each row it scans, although only a single record (id = 2) matches the WHERE clause. Transaction 2 therefore ends up waiting for the X lock on the first row (id = 1) until transaction 1 commits or rolls back.
2 - When the column used in the WHERE condition is the primary key:
Only the index entry satisfying the criteria is locked. That's why someone says in the comments that some tests are not blocked.
3 - When the column used in the WHERE condition is indexed but not unique:
This case is more complicated. 1) The index entry is locked. 2) An X lock is attached to the corresponding primary index entry. 3) Two gap locks are attached to the gaps right before and after the record matching the search criteria.
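One way to move the test from case 1 toward case 3 is to index the column used in the WHERE condition. A minimal sketch, assuming the notification table from the test above; the index name idx_notification_date is made up for illustration, and on a table this small the optimizer may still prefer a full scan, so it is worth checking the plan before repeating the blocking test.

```sql
-- Hypothetical index name; table and column come from the test above.
CREATE INDEX idx_notification_date ON notification (`date`);

-- Inspect the access path: type = ALL means a full scan (a locking read then
-- locks every row), while range or ref means the index is being used.
EXPLAIN SELECT * FROM notification WHERE `date` >= '2011-05-03';
```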
I know this question is pretty old, but I wanted to share the results of some relevant testing I've done with indexed columns, which have been pretty strange.
Table structure:
CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`notid` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
12 rows inserted with INSERT INTO t1 (notid) VALUES (1), (2),..., (12). On connection 1:
BEGIN;
SELECT * FROM t1 WHERE id=5 FOR UPDATE;
On connection 2, the following statements are blocked:
SELECT * FROM t1 WHERE id!=5 FOR UPDATE;
SELECT * FROM t1 WHERE id<5 FOR UPDATE;
SELECT * FROM t1 WHERE notid!=5 FOR UPDATE;
SELECT * FROM t1 WHERE notid<5 FOR UPDATE;
SELECT * FROM t1 WHERE id<=4 FOR UPDATE;
The strangest part is that SELECT * FROM t1 WHERE id>5 FOR UPDATE; is not blocked, nor are any of
...
SELECT * FROM t1 WHERE id=3 FOR UPDATE;
SELECT * FROM t1 WHERE id=4 FOR UPDATE;
SELECT * FROM t1 WHERE id=6 FOR UPDATE;
SELECT * FROM t1 WHERE id=7 FOR UPDATE;
...
I'd also like to point out that the entire table seems to be locked when the WHERE condition in the query from connection 1 uses a non-indexed column. For example, when connection 1 executes SELECT * FROM t1 WHERE notid=5 FOR UPDATE, all SELECT ... FOR UPDATE and UPDATE queries from connection 2 are blocked.
-EDIT-
This is a rather specific situation, but it was the only one I could find that exhibits this behaviour:
Connection 1:
BEGIN;
SELECT *, @x:=@x+id AS counter FROM t1 CROSS JOIN (SELECT @x:=0) b HAVING counter>5 LIMIT 1 FOR UPDATE;
+----+-------+-------+---------+
| id | notid | @x:=0 | counter |
+----+-------+-------+---------+
|  3 |     3 |     0 |       9 |
+----+-------+-------+---------+
1 row in set (0.00 sec)
From connection 2:
SELECT * FROM t1 WHERE id=2 FOR UPDATE; is blocked;
SELECT * FROM t1 WHERE id=4 FOR UPDATE; is not blocked.
Following the links from the documentation page you posted gives more information about locking. From one of those pages:
A SELECT ... FOR UPDATE reads the latest available data, setting exclusive locks on each row it reads. Thus, it sets the same locks a searched SQL UPDATE would set on the rows.
This seems pretty clear that it is all rows that it has to scan.
From the official MySQL documentation:
A locking read, an UPDATE, or a DELETE generally set record locks on every index record that is scanned in the processing of the SQL statement. It does not matter whether there are WHERE conditions in the statement that would exclude the row.
For the case discussed in Frans' answer, all rows are locked because there's a table scan during SQL processing:
If you have no indexes suitable for your statement and MySQL must scan the entire table to process the statement, every row of the table becomes locked, which in turn blocks all inserts by other users to the table. It is important to create good indexes so that your queries do not unnecessarily scan many rows.
Check the latest doc here: https://dev.mysql.com/doc/refman/8.0/en/innodb-locks-set.html
As others have mentioned, SELECT ... FOR UPDATE locks all rows encountered in the default isolation level. Try setting the isolation level for the session that runs this query to READ COMMITTED, for example by preceding the query with: SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
It locks all the rows selected by query.
This is my table for a many-to-many relationship:
Related:
-id
-id_postA
-id_postB
I want this:
If for example there is a row with id_postA = 32 and id_postB = 67
then it must ignore the insertion of a row with id_postA = 67 AND id_postB = 32.
One option would be to create a unique index on both columns:
CREATE UNIQUE INDEX uk_related ON related (id_postA, id_postB);
And then prevent "duplicates by order inversion" using a trigger, ordering id_postA and id_postB on INSERT and UPDATE:
CREATE TRIGGER order_uk_related
BEFORE INSERT -- Duplicate this trigger also for UPDATE
ON related -- As MySQL doesn't support INSERT OR UPDATE triggers
FOR EACH ROW
BEGIN
DECLARE low INT;
DECLARE high INT;
SET low = LEAST(NEW.id_postA, NEW.id_postB);
SET high = GREATEST(NEW.id_postA, NEW.id_postB);
SET NEW.id_postA = low;
SET NEW.id_postB = high;
END;
As you can see in this SQLFiddle, the fourth insert will fail, as (2, 1) has already been switched to (1, 2) by the trigger:
INSERT INTO related VALUES (1, null, null);
INSERT INTO related VALUES (2, null, null);
INSERT INTO related VALUES (3, 2, 1);
INSERT INTO related VALUES (4, 1, 2);
Function-based indexes
In some other databases, you might be able to use a function-based index. Unfortunately, this is not possible in MySQL before version 8.0.13 (Is it possible to have function-based index in MySQL?). If this were an Oracle question, you'd write:
CREATE UNIQUE INDEX uk_related ON related (
LEAST(id_postA, id_postB),
GREATEST(id_postA, id_postB)
);
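For readers on newer MySQL versions: as far as I know, MySQL 8.0.13 and later accept functional key parts, which need an extra pair of parentheses around each expression. A sketch, assuming the related table from the question; the index name uk_related_pair is made up so that it doesn't clash with uk_related above.

```sql
-- Requires MySQL 8.0.13 or later.
-- Each expression is a functional key part and sits in its own parentheses.
CREATE UNIQUE INDEX uk_related_pair ON related (
    (LEAST(id_postA, id_postB)),
    (GREATEST(id_postA, id_postB))
);
```

With such an index in place, inserting (67, 32) when (32, 67) already exists should fail with a duplicate-key error, without needing the trigger.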
You can include a WHERE clause like this, for example:
insert into table_name
(id_postA
,id_postB)
select
col1,
col2
from table_1
where concat(col1, '~', col2)
not in (select concat(id_postB, '~', id_postA) from table_name);
If you always insert these with A < B, you won't have to worry about the reverse being inserted. This can be done with a simple sort, or a quick comparison before inserting.
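The "always insert with A < B" idea can also be applied inside the INSERT itself. A minimal sketch, assuming the related table from the question, an auto-generated id column, and a unique index on (id_postA, id_postB) such as uk_related from the earlier answer; the values 67 and 32 are the ones from the question.

```sql
-- Store every pair in canonical order: smaller id first, larger id second.
-- INSERT IGNORE skips the row if the unique index already holds the pair.
INSERT IGNORE INTO related (id_postA, id_postB)
VALUES (LEAST(67, 32), GREATEST(67, 32));   -- stored as (32, 67)
```

Keep in mind that INSERT IGNORE also turns other errors into warnings, so a plain INSERT with explicit handling of the duplicate-key error may be preferable in some applications.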
Join tables like this are by their very nature uni-directional. There is no automatic method for detecting the reverse join and blocking it with a simple UNIQUE index.
Normally what you'd do, though, is insert in pairs:
INSERT INTO related (id_postA, id_postB) VALUES (3,4),(4,3);
If this insert fails, then one or both of those links is already present.
I have a table like: idx (PK), clmn_1.
Both are INTs. idx is not defined as auto-increment, but I am trying to simulate it. To insert into this table, I am using:
"INSERT INTO my_tbl (idx, clmn_1) \
SELECT IFNULL(MAX(idx), 0) + 1, %s \
FROM my_tbl", val_clmn_1
Now, this works. My question is about atomicity. Since I read from and then insert into the same table, can there potentially be a duplicate-key error when multiple inserts happen simultaneously?
And, how can I test it myself?
I am using Percona XtraDB server 5.5.
This is not a good solution, because it creates a shared lock on my_tbl while it's doing the SELECT. Any number of threads can have a shared lock concurrently, but it blocks concurrent write locks. So this causes inserts to become serialized, waiting for the SELECT to finish.
You can observe this lock. Start this query in one session:
INSERT INTO my_tbl (idx, clmn_1)
SELECT IFNULL(MAX(idx), 0) + 1, 1234+SLEEP(60)
FROM my_tbl;
Then go to another session and run innotop and view the locking screen (press key 'L'). You'll see output like this:
___________________________________ InnoDB Locks ___________________________________
ID  Type    Waiting  Wait   Active  Mode  DB    Table   Index    Ins Intent  Special
61  TABLE         0  00:00   00:00  IS    test  my_tbl                    0
61  RECORD        0  00:00   00:00  S     test  my_tbl  PRIMARY           0
This is why the auto-increment mechanism works the way it does. Regardless of transaction isolation, the insert thread locks the table briefly only to increment the auto-inc number. This is extremely quick. Then the lock is released, allowing other threads to proceed immediately. Meanwhile, the first thread attempts to finish its insert.
See http://dev.mysql.com/doc/refman/5.5/en/innodb-auto-increment-handling.html for more details about auto-increment locking.
I'm not sure why you want to simulate auto-increment behavior instead of just defining the column as an auto-increment column. You can change an existing table to be auto-incrementing.
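For example, converting the existing column might look like the sketch below. It assumes the my_tbl definition from the question (idx is an INT primary key); on a large table the ALTER can take a while, so try it on a copy first.

```sql
-- idx is already the primary key, so it only needs the AUTO_INCREMENT attribute.
ALTER TABLE my_tbl MODIFY idx INT NOT NULL AUTO_INCREMENT;

-- From then on, omit idx and let MySQL assign the next value.
INSERT INTO my_tbl (clmn_1) VALUES (1234);
```

New values should continue from the current maximum of idx, so existing rows keep their ids.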
Re your comment:
Even if a PK is declared as auto-increment, you can still specify a value. The auto-incrementation only kicks in if you don't specify the PK column in the INSERT, or you specify NULL or DEFAULT as its value.
CREATE TABLE foo (id INT AUTO_INCREMENT PRIMARY KEY, c CHAR(1));
INSERT INTO foo (id, c) VALUES (123, 'x'); -- inserts value 123
INSERT INTO foo (id, c) VALUES (DEFAULT, 'y'); -- inserts value 124
INSERT INTO foo (id, c) VALUES (42, 'n'); -- inserts specified value 42
INSERT INTO foo (c) VALUES ('Z'); -- inserts value 125
REPLACE INTO foo (id, c) VALUES (125, 'z'); -- changes existing row with id=125
Re your comment:
START TRANSACTION;
SELECT IFNULL(MAX(idx), 0)+1 FROM my_tbl FOR UPDATE;
INSERT INTO my_tbl (idx, clmn_1) VALUES (new_idx_val, some_val);
COMMIT;
This is actually worse than your first idea, because now the SELECT...FOR UPDATE creates an X lock instead of an S lock.
You should really not try to re-invent the behavior of AUTO_INCREMENT, because any SQL solution is limited by ACID properties. Auto-inc necessarily works outside of ACID.
If you need to correct existing rows atomically, use either REPLACE or INSERT...ON DUPLICATE KEY UPDATE.
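A minimal sketch of the second option, assuming the my_tbl table from the question; the values 42 and 1234 are placeholders.

```sql
-- Inserts (42, 1234) if idx 42 is free; otherwise overwrites clmn_1 on the
-- existing row. Either way it is a single atomic statement.
INSERT INTO my_tbl (idx, clmn_1)
VALUES (42, 1234)
ON DUPLICATE KEY UPDATE clmn_1 = VALUES(clmn_1);
```

The VALUES() function in the UPDATE clause refers to the value that would have been inserted. It works on 5.5, but I believe it is deprecated in recent MySQL 8.0 releases in favor of row aliases.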