Can concurrent inserts produce a mixed row? - mysql

I think I am having some problems related to concurrent inserts, so I wanted to know whether two concurrent insert statements such as
INSERT INTO table1 (field1, field2, field3) VALUES (A, B, C);
INSERT INTO table1 (field1, field2, field3) VALUES (1, 2, 3);
could result in a row like
A B 3
with A and B from the first insert statement and the 3 from the second insert statement. My table uses the InnoDB storage engine, by the way.

The answer is no. Each INSERT is an atomic statement: InnoDB locks the row it is writing, so the values of one statement cannot be mixed with the values of another.
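If you want to convince yourself, you can run the two inserts from separate sessions inside explicit transactions and inspect the result (a minimal sketch; the column types of table1 are assumed here, with the first row's values quoted as strings):
-- session 1
START TRANSACTION;
INSERT INTO table1 (field1, field2, field3) VALUES ('A', 'B', 'C');
COMMIT;
-- session 2, running concurrently
START TRANSACTION;
INSERT INTO table1 (field1, field2, field3) VALUES (1, 2, 3);
COMMIT;
-- either row may arrive first, but every row is complete:
SELECT field1, field2, field3 FROM table1;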

Related

How can I increase insert speed?

I need to import data from an external web service into my MySQL (5.7) database.
The problem is that I need to split the data into two tables. So, for example, I have the tables
CREATE TABLE a (
  id INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(100)
);
CREATE TABLE b (
  id INT PRIMARY KEY AUTO_INCREMENT,
  a_id INT,
  name VARCHAR(100)
);
Now I have to insert multiple rows into table b for each row in table a (1:n).
As I do not know the id of the row in table a before inserting it, the only way is to insert one row into table a, get the last id, and then insert all connected entries into table b.
But my database is very slow when I insert row by row: it takes more than an hour to insert about 35,000 rows into table a and 120,000 into table b. If I do a batch insert of about 1,000 rows into table a (just for testing, without filling table b), it is incredibly faster (less than 3 minutes).
I guess there must be a way to speed up my import.
Thanks for your help
I presume you are working with a programming language driving your inserts. You need to be able to program this sequence of operations.
First, you need to use this sequence to put a row into a and dependent rows into b. It uses LAST_INSERT_ID() to handle a_id. That's faster and much more robust than querying the table to find the correct id value.
INSERT INTO a (name) VALUES ('Claus');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'von');
INSERT INTO b (a_id, name) VALUES (@a_id, 'Bönnhoff');
The trick is to capture the a.id value in the session variable @a_id, and then reuse it for each dependent INSERT. (I have turned you into an aristocrat to illustrate this, sorry :-)
Second, you should keep this in mind: INSERTs are cheap, but transaction COMMITs are expensive. That's because MySQL (InnoDB actually) does not actually update tables until COMMIT. Unless you manage your transactions explicitly, the DBMS uses a feature called "autocommit" in which it immediately commits each INSERT (or UPDATE or DELETE).
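You can see and control this per session; if you'd rather not issue START TRANSACTION yourself, the standard autocommit switch has the same effect (a minimal sketch):
SET autocommit = 0;   -- statements now accumulate in one open transaction
-- ... run a batch of INSERTs here ...
COMMIT;               -- make the whole batch durable at once
SET autocommit = 1;   -- restore the default if desired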
Fewer transactions get you better speed. Therefore, to improve bulk-loading performance, you want to bundle together 100 or so INSERTs into a single transaction. (The exact number doesn't matter very much.) You can do something like this:
START TRANSACTION; /* start an insertion bundle */
INSERT INTO a (name) VALUES ('Claus');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'von');
INSERT INTO b (a_id, name) VALUES (@a_id, 'Bönnhoff');
INSERT INTO a (name) VALUES ('Oliver');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'Jones');
... more INSERT operations ...
INSERT INTO a (name) VALUES ('Jeff');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'Atwood');
COMMIT; /* commit the bundle */
START TRANSACTION; /* start the next bundle */
INSERT INTO a (name) VALUES ('Joel');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'Spolsky');
... more INSERT operations ...
COMMIT; /* finish the bundle */
(All this, except LAST_INSERT_ID(), works on any SQL-based RDBMS; each make of RDBMS has its own way of handling IDs.)

MySQL: Can I use one SELECT ... FOR UPDATE to "protect" multiple tables? (LOCKING)

I've been reading the MySQL docs for hours, but I still cannot answer a couple of pretty simple questions... :(
Here is my (simplified) scenario: I have two tables in a database: tablea and tableb, both tables use the InnoDB storage engine. tablea (which is my main table) has a PRIMARY index (id) with autoincrement. Now here is what I want to achieve and please keep in mind that the following business logic can be and will be run concurrently:
I start a transaction:
START TRANSACTION
BEGIN
then I check whether an id exists in tablea; if yes, I SELECT the row FOR UPDATE. Let's call the id I am looking for myid:
SELECT `id` FROM `tablea` WHERE `id`='myid' FOR UPDATE;
if the above SELECT returns no rows, I simply ROLLBACK the transaction and exit from my function. In other words I'm done when myid is not present in tablea.
On the other hand when myid exists then first I need to update some values in tablea:
UPDATE `tablea` SET `somefield`='somevalue' WHERE `id`='myid';
then I need to check if myid also exists in tableb:
SELECT * FROM `tableb` WHERE `id`='myid' FOR UPDATE;
My first question is about the above SELECT statement: is it okay to do another SELECT ... FOR UPDATE here (on tableb)? Or is FOR UPDATE not needed when dealing with tableb, because I already started a transaction and acquired a lock on a row in tablea? Can someone please answer this?
The last SELECT statement above either returns a row from tableb (and locks that row for update) or it turns out that myid does not exist in tableb.
When myid is present in tableb then I just need to update some values in that row, it's simple:
UPDATE `tableb` SET `somefieldintableb`='somevaluefortableb' WHERE `id`='myid';
On the other hand when myid is not in tableb I need to insert it, and here comes my 2nd question: Should I lock tableb before I issue my INSERT INTO statement, like this:
LOCK TABLES `tableb` WRITE;
INSERT INTO `tableb` (`id`,`somefieldintableb`) VALUES ('myid','somevaluefortableb');
UNLOCK TABLES;
and then finally, I do:
COMMIT
My goal is this: Since the above described function (with the MySQL transaction) will run in many instances in parallel, I want to prevent any of those instances updating the same row in either tablea or tableb at the same time. I also want to prevent double-insertion of myid into tableb, hence I thought about using LOCK TABLES when myid was not found in tableb.
So I have two questions. First: should I do a SELECT ... FOR UPDATE on tableb within my already-started transaction, or is that unnecessary because holding the lock on a row in tablea already "protects" tableb from simultaneous UPDATEs in this case, thanks to the way I started my transaction?
Second question: when I need to INSERT a new row into tableb, should I lock the whole table for that insertion, or is that totally unnecessary here? (Do I need LOCK TABLES tableb or not?)
I would appreciate it if an expert could answer these two questions, because reading the various docs and examples online simply hasn't helped me answer them. :(
I would do it this way:
BEGIN;
SELECT a.`id` AS a_id, b.`id` AS b_id
FROM `tablea` AS a
LEFT OUTER JOIN `tableb` AS b ON a.`id` = b.`id`
WHERE a.`id` = 'myid'
FOR UPDATE;
Now you have row locks on both tablea and tableb if rows exist. If the SELECT returns nothing, you know the id is not present in tablea. If the SELECT returns a row with a value for a_id, but a NULL for b_id, then you know it's present in tablea and not in tableb.
If the row is present in both tables, this locks rows in both tables simultaneously. If you do it in two steps, you might risk a race condition and deadlock.
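For contrast, the two-step variant looks like the sketch below; if another session happens to acquire its locks in the opposite order (tableb first, then tablea), each session can end up waiting forever on the other:
-- riskier two-step locking: lock order matters across sessions
SELECT `id` FROM `tablea` WHERE `id` = 'myid' FOR UPDATE;
SELECT `id` FROM `tableb` WHERE `id` = 'myid' FOR UPDATE;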
Try the INSERT and use ON DUPLICATE KEY UPDATE:
INSERT INTO `tableb` (id, somefieldintableb) VALUES ('myid', 'somevaluefortableb')
ON DUPLICATE KEY UPDATE `somefieldintableb`='somevaluefortableb';
If the row with your desired id value is not present, this will insert it. If the row is present, this will update the row. And you're sure to have access to an existing row, because your SELECT FOR UPDATE locked it earlier.
Don't use table locks if you can avoid it. That's a sure way to create a bottleneck in your application.
Re your comments:
Yes, you can use extra join conditions for the date column.
You don't have to update all the columns when you use ON DUPLICATE KEY UPDATE. You can leave most of them alone if the row exists, and just update one, or a few, or whatever.
Also you can reference the value you tried to insert.
INSERT INTO `tableb` (id, date, col1, col2, col3, col4, col5, col6)
VALUES ('myid', $a_date, ?, ?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE col4=VALUES(col4);
For more details, I recommend reading http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html

Multiple UPDATE or DELETE + INSERT in MySQL?

Which of these operations should perform faster in MySQL? Presume I have 900 rows to change (ids 100 through 999):
DELETE FROM example WHERE id IN (100, ..., 999);
INSERT INTO example VALUES (100, 'Peter'), (...), (999, 'Sam');
OR
UPDATE example SET name='Peter' WHERE id=100;
...
UPDATE example SET name='Sam' WHERE id=999;
Since you have only 900 rows to change, the first option will perform faster, because it needs only two statements to run. If you go for the second option, you will have almost 900 queries to run, which will definitely be slower. An even better idea, if you are reloading the entire table, would be
TRUNCATE TABLE example;
INSERT INTO example VALUES (100, 'Peter'), (...), (999, 'Sam');
because TRUNCATE works faster than DELETE. (Note, though, that TRUNCATE removes every row in the table, so it only fits when the whole table is being replaced.)
In the general case, those two operations (DELETE/INSERT vs UPDATE) could have significantly different results, based on the current state of the table... Does row with id=999 exist? Are there foreign key constraints that reference this table? Are there any triggers defined on the table?
If the specification is to update rows in a table, I would issue an UPDATE statement, rather than issuing a DELETE and INSERT.
There are factors beyond speed to consider.
But fastest would likely be:
UPDATE example t
   SET t.name = CASE t.id
                  WHEN 100 THEN 'Peter'
                  WHEN ...
                  WHEN 999 THEN 'Sam'
                  ELSE t.name
                END
 WHERE t.id IN (100, ..., 999);
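If id is the primary key, the multi-row upsert idiom discussed elsewhere on this page is another compact option; note that, unlike a plain UPDATE, it inserts any ids that don't already exist. A sketch:
INSERT INTO example (id, name)
VALUES (100, 'Peter'), (999, 'Sam')
ON DUPLICATE KEY UPDATE name = VALUES(name);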

InnoDB: custom auto-increment using insert select. Can there be duplicate-key error?

I have a table with two columns, idx (the primary key) and clmn_1, both INTs. idx is not defined as auto-increment, but I am trying to simulate it. To insert into this table, I am using:
"INSERT INTO my_tbl (idx, clmn_1) \
 SELECT IFNULL(MAX(idx), 0) + 1, %s \
 FROM my_tbl", val_clmn_1
Now, this works. The question that I have is about atomicity: since I read from and then insert into the same table, when multiple inserts happen simultaneously, can there potentially be a duplicate-key error?
And, how can I test it myself?
I am using Percona XtraDB server 5.5.
This is not a good solution, because it creates a shared lock on my_tbl while it's doing the SELECT. Any number of threads can have a shared lock concurrently, but it blocks concurrent write locks. So this causes inserts to become serialized, waiting for the SELECT to finish.
You can observe this lock. Start this query in one session:
INSERT INTO my_tbl (idx, clmn_1)
SELECT IFNULL(MAX(idx), 0) + 1, 1234+SLEEP(60)
FROM my_tbl;
Then go to another session and run innotop and view the locking screen (press key 'L'). You'll see output like this:
___________________________________ InnoDB Locks ___________________________________
ID Type Waiting Wait Active Mode DB Table Index Ins Intent Special
61 TABLE 0 00:00 00:00 IS test my_tbl 0
61 RECORD 0 00:00 00:00 S test my_tbl PRIMARY 0
This is why the auto-increment mechanism works the way it does. Regardless of transaction isolation, the insert thread locks the table briefly only to increment the auto-inc number. This is extremely quick. Then the lock is released, allowing other threads to proceed immediately. Meanwhile, the first thread attempts to finish its insert.
See http://dev.mysql.com/doc/refman/5.5/en/innodb-auto-increment-handling.html for more details about auto-increment locking.
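You can check which auto-inc locking strategy your server uses; the mode is exposed as a standard InnoDB server variable (1, "consecutive", is the default in 5.5):
SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';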
I'm not sure why you want to simulate auto-increment behavior instead of just defining the column as an auto-increment column. You can change an existing table to be auto-incrementing.
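For example, converting the question's table would look something like this (a sketch using the my_tbl/idx names from the question; it works because idx is already the primary key):
ALTER TABLE my_tbl MODIFY idx INT NOT NULL AUTO_INCREMENT;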
Re your comment:
Even if a PK is declared as auto-increment, you can still specify a value. The auto-incrementation only kicks in if you don't specify the PK column in the INSERT, or you specify NULL or DEFAULT as its value.
CREATE TABLE foo (id INT AUTO_INCREMENT PRIMARY KEY, c CHAR(1));
INSERT INTO foo (id, c) VALUES (123, 'x'); -- inserts value 123
INSERT INTO foo (id, c) VALUES (DEFAULT, 'y'); -- inserts value 124
INSERT INTO foo (id, c) VALUES (42, 'n'); -- inserts specified value 42
INSERT INTO foo (c) VALUES ('Z'); -- inserts value 125
REPLACE INTO foo (id, c) VALUES (125, 'z'); -- changes existing row with id=125
Re your comment:
START TRANSACTION;
SELECT IFNULL(MAX(idx), 0)+1 FROM my_tbl FOR UPDATE;
INSERT INTO my_tbl (idx, clmn_1) VALUES (new_idx_val, some_val);
COMMIT;
This is actually worse than your first idea, because now the SELECT...FOR UPDATE creates an X lock instead of an S lock.
You should really not try to re-invent the behavior of AUTO-INCREMENT, because any SQL solution is limited by ACID properties. Auto-inc necessarily works outside of ACID.
If you need to correct existing rows atomically, use either REPLACE or INSERT...ON DUPLICATE KEY UPDATE.
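Applied to the question's table, the upsert form might look like this (a sketch; the literal values are placeholders):
INSERT INTO my_tbl (idx, clmn_1)
VALUES (42, 7)
ON DUPLICATE KEY UPDATE clmn_1 = VALUES(clmn_1);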

Doing a large number of upserts as fast as possible

My app (which uses MySQL) is doing a large number of successive upserts. Right now my SQL looks like this:
INSERT IGNORE INTO customer (name,customer_number,social_security_number,phone) VALUES ('VICTOR H KINDELL','123','123','123')
INSERT IGNORE INTO customer (name,customer_number,social_security_number,phone) VALUES ('VICTOR H KINDELL','123','123','123')
INSERT IGNORE INTO customer (name,customer_number,social_security_number,phone) VALUES ('VICTOR H KINDELL OR','123','123','123')
INSERT IGNORE INTO customer (name,customer_number,social_security_number,phone) VALUES ('TRACY L WALTER PERSONAL REP FOR','123','123','123')
INSERT IGNORE INTO customer (name,customer_number,social_security_number,phone) VALUES ('TRACY L WALTER PERSONAL REP FOR','123','123','123')
So far I've found INSERT IGNORE to be the fastest way to achieve upserts. Selecting a record to see if it exists and then either updating it or inserting a new one is too slow. Even this is not as fast as I'd like because I need to do a separate statement for each record. Sometimes I'll have around 50,000 of these statements in a row.
Is there a way to take care of all of these in just one statement, without deleting any existing records?
You can put everything in one INSERT:
INSERT IGNORE INTO table_1 (field1, field2)
VALUES ('val1', 'val2'), ('val3', 'val4'), ...;
You may also want to check INSERT ... ON DUPLICATE KEY UPDATE if you need to either update or insert a record.
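Applied to the question's customer table, the batched form might look like this (a sketch; it assumes the table has a unique key, say on customer_number, which the original INSERT IGNORE statements also rely on):
INSERT INTO customer (name, customer_number, social_security_number, phone)
VALUES ('VICTOR H KINDELL', '123', '123', '123'),
       ('TRACY L WALTER PERSONAL REP FOR', '123', '123', '123')
ON DUPLICATE KEY UPDATE
    name  = VALUES(name),
    phone = VALUES(phone);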