I have two tables
spies
---------
id        | PK
weapon_id | FK
name      |

weapons
---------
id   | PK
name |
I'm trying to clarify whether there is a difference between these two SQL updates (when using MySQL InnoDB):
Query 1:
UPDATE spies SET name = 'Bond', weapon_id = 1 WHERE id = 1
OR
Query 2:
UPDATE spies SET name = 'Bond' WHERE id = 1
I have heard that updating a row with a FK creates a read-only lock (not sure if that's the correct term) on the parent.
Would using Query 2 avoid the lock on the parent table?
Consider the following schema:
(Commented-out statements left in for your convenience):
-- drop table if exists spies;
-- drop table if exists weapons;
-- (weapons is created first so the foreign key in spies can resolve)
create table weapons
( id int primary key,
  name varchar(100) not null
) engine=InnoDB;

create table spies
( id int primary key,
  weapon_id int not null,
  name varchar(100) not null,
  key(weapon_id),
  foreign key (weapon_id) references weapons(id)
) engine=InnoDB;

insert weapons(id,name) values (1,'slingshot'),(2,'Ruger');
insert spies(id,weapon_id,name) values (1,2,'Sally');
-- truncate table spies;
-- truncate table spies;
Now, we have 2 processes, P1 and P2. It is best to test this where P1 is, say, MySQL Workbench and P2 is a MySQL command-line window. In other words, you have to set this up as two separate connections, run the statements step by step in the proper order (described in the Narrative below), and watch the impact on the other process's window.
Consider the following queries, keeping in mind that a MySQL query not wrapped in an explicit transaction is itself an implicit transaction. Below, however, I opted for explicit transactions:
Q1:
START TRANSACTION;
-- place1
UPDATE spies SET name = 'Bond', weapon_id = 1 WHERE id = 1;
-- place2
COMMIT;
Q2:
START TRANSACTION;
-- place1
UPDATE spies SET name = 'Bond' WHERE id = 1;
-- place2
COMMIT;
Q3:
START TRANSACTION;
-- place1
SELECT id into @mine_to_use from weapons where id=1 FOR UPDATE; -- place2
-- place3
COMMIT;
Q4:
START TRANSACTION;
-- place1
SELECT id into @mine_to_use from spies where id=1 FOR UPDATE; -- place2
-- place3
COMMIT;
Q5 (a hodgepodge of queries):
SELECT * from weapons;
SELECT * from spies;
Narrative
Q1: When P1 begins Q1 and reaches place2, it has obtained an exclusive row-level lock on the id=1 row of spies and (per Note1 below) a shared lock on the id=1 row of weapons (2 rows total, 1 row in each table). This can be proved by P2 starting to run Q3: it gets to place1, blocks on place2, and is only freed when P1 gets around to calling COMMIT. Everything I just said about P2 running Q3 is ditto for P2 running Q4. In summary, on the P2 screen, place2 freezes until the P1 COMMIT.
A note again about implicit transactions: your real Q1 query will perform this very fast and will issue an implicit commit on its way out. However, the prior paragraph breaks down what happens were you to have more time-costly routines running.
Q2: When P1 begins Q2 and reaches place2, it has obtained an exclusive row-level lock only on the id=1 row of spies; no lock is taken in weapons. So P2 has no issue running Q3 against weapons, but P2 does block running Q4 at place2 against spies.
So, the difference between Q1 and Q2 comes down to MySQL knowing that the FK column is not among the columns changed by the UPDATE, and the manual states that in Note1 below.
When P1 runs Q1, P2 has no problem with read-only, non-lock-acquiring Q5-type queries. The only issue is which rendition of the data P2 sees, based on the ISOLATION LEVEL in place.
Note1: From the MySQL Manual Page entitled Locks Set by Different SQL Statements in InnoDB:
If a FOREIGN KEY constraint is defined on a table, any insert, update,
or delete that requires the constraint condition to be checked sets
shared record-level locks on the records that it looks at to check the
constraint. InnoDB also sets these locks in the case where the
constraint fails.
The above is why, with Q2, P2 is free to perform an UPDATE or acquire a momentary exclusive UPDATE lock on weapons: P1's statement does not touch weapon_id, so the engine holds no row-level lock in that table.
To pull this back to 50,000 feet, one's biggest concern is the duration for which a lock is held, whether in an implicit transaction (one with no START/COMMIT) or in an explicit transaction before a COMMIT. In theory, a peer process can be kept waiting for its UPDATE lock indefinitely; in practice, each attempt at acquiring the lock is governed by the innodb_lock_wait_timeout setting, which means that by default it times out after about 50 seconds. To view your setting, run:
select @@innodb_lock_wait_timeout;
For me, at the moment, it is 50 (seconds).
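Rather than inferring the locks from where P2 freezes, you can also observe them directly. A sketch, assuming MySQL 8.0+, where the lock views live in performance_schema (older 5.7 servers exposed similar data in information_schema.innodb_locks):

```sql
-- While P1 sits at place2 and P2 is blocked, run from a third connection:
SELECT engine_transaction_id, object_name, index_name,
       lock_type, lock_mode, lock_status, lock_data
FROM performance_schema.data_locks;

-- And to see who is waiting on whom:
SELECT * FROM performance_schema.data_lock_waits;
```

With Q1 at place2 you should see record locks on both spies and weapons; with Q2, only on spies.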
Why not run EXPLAIN for this query and check it for yourself?
So, let's run:
EXPLAIN UPDATE spies SET name = 'Bond', weapon_id = 1 WHERE id = 1\G
Check the rows column in the output to see how many rows this query scans. Do the same for the one below as well.
EXPLAIN UPDATE spies SET name = 'Bond' WHERE id = 1\G
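For both statements, the plan should show a single-row lookup on the primary key. A hypothetical rendition of the output (exact columns and values vary by MySQL version):

```
*************************** 1. row ***************************
           id: 1
  select_type: UPDATE
        table: spies
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra: Using where
```

Note that EXPLAIN only reflects the rows scanned in spies; it will not show the shared lock taken on the weapons parent row for the FK check.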
Now, coming to your question: InnoDB locks each row that your UPDATE touches, but remember, this is row-level locking. So, to answer your question: updating a row with or without a foreign key makes no difference if it is the same row in the same table. It does make a difference if a different row or a different table is involved.
Related
I need to select, manipulate, and update a lot of data in less than 3 minutes. It was decided to create some kind of locking mechanism so that separate processes can run in parallel, with each process locking, selecting, and updating its own rows.
To make this possible, it was decided to add a worker_id column to the table.
Table structure:
CREATE TABLE offers
(
  id int(10) unsigned PRIMARY KEY NOT NULL AUTO_INCREMENT,
  offer_id int(11) NOT NULL,
  offer_sid varchar(255) NOT NULL,
  offer_name varchar(255),
  account_name varchar(255),
  worker_id varchar(255)
);
CREATE UNIQUE INDEX offers_offer_id_offer_sid_unique ON offers (offer_id, offer_sid);
CREATE INDEX offers_offer_id_index ON offers (offer_id);
CREATE INDEX offers_offer_sid_index ON offers (offer_sid);
Also, we decided to start with 5 parallel processes. To prevent different processes from selecting the same rows, we use the formula: offer_id % max_amount_of_processes = process_number (process_number starts from 0, so the first is 0 and the last is 4).
Each process follows these steps:
1. Set worker_id to the current process id for the first 1000 rows, using the query: update offers set worker_id = :process_id where worker_id is null and offer_id % 5 = :process_number order by offer_id asc limit 1000
2. Select those rows: select * from offers where worker_id = :process_id order by offer_id asc limit 1000
3. Manipulate the data, store the last offer_id in a variable, and store the prepared data in another variable for a later update
4. Run the same query from step 1 to lock the next 1000 rows
5. Run the same query as in step 2, with an additional where clause offer_id > :last_selected_id, to select the next 1000 rows
6. Repeat these steps in a loop until all rows are locked
7. Remove all locks: update offers set worker_id = null where worker_id = :process_id
8. Run the query to update all collected data
The same steps apply to the other 4 processes.
The issue here is that I'm getting a deadlock when all 5 processes simultaneously run the query from step 1 to lock rows (set worker_id), even though each process locks its own rows according to the formula. I tried setting the transaction isolation level to READ COMMITTED, but the issue persists.
I'm a novice with locking mechanisms, and I need help preventing the deadlocks here, or designing a better mechanism.
The expression offer_id%5 = :process_number cannot use an index, so it can only scan all the rows matched by the first condition, worker_id is null.
You can prove this with two windows:
mysql1> begin;
mysql1> set @p=1;
mysql1> update offers set worker_id = @p where worker_id is null and offer_id%5 = @p;
Don't commit the transaction in window 1 yet.
mysql2> set @p=2;
mysql2> update offers set worker_id = @p where worker_id is null and offer_id%5 = @p;
...waits for about 50 seconds, or value of innodb_lock_wait_timeout, then...
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
This demonstrates that each concurrent session locks overlapping sets of rows, not only the rows that match the modulus expression. So the sessions queue up against each other's locks.
This will get worse if you put all the steps into a transaction as @SloanThrasher suggests. Making the work of each worker take longer will make it hold its locks longer, and further delay the other processes waiting on those locks.
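One way to shrink the set of rows each worker locks is to make the modulus condition indexable. A sketch, assuming MySQL 5.7+ generated columns and the offers table from the question (offer_mod and idx_mod_worker are names invented here):

```sql
-- Persist offer_id % 5 as a stored generated column and index it,
-- so each worker's UPDATE can seek to its own slice of rows:
ALTER TABLE offers
  ADD COLUMN offer_mod TINYINT AS (offer_id % 5) STORED,
  ADD INDEX idx_mod_worker (offer_mod, worker_id);

-- Each worker then touches only rows its index range matches:
UPDATE offers
SET worker_id = :process_id
WHERE offer_mod = :process_number AND worker_id IS NULL
ORDER BY offer_id ASC
LIMIT 1000;
```

This narrows the lock footprint, but under InnoDB's next-key locking it does not eliminate gap locks entirely, so a queue-based design remains the more robust fix.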
I do not understand how the updated_at field can cause the issue, as I'm still updating other fields
I'm not sure because you haven't posted the InnoDB deadlock diagnostics from SHOW ENGINE INNODB STATUS.
I do notice that your table has a secondary UNIQUE KEY, which will also require locks. There are some cases of deadlocks that occur because of non-atomicity of the lock assignment.
Worker 1                                 Worker 2
UPDATE SET worker_id = 1
(acquires locks on PK)
                                         UPDATE SET worker_id = 2
                                         (waits for PK locks held by worker 1)
(waits for locks on UNIQUE KEY)
Both worker 1 and worker 2 can therefore be waiting on each other, and enter into a deadlock.
This is just a guess. Another possibility is that the ORM is doing a second UPDATE for the updated_at column, and this introduces another opportunity for a race condition. I haven't quite worked that out mentally, but I think it's possible.
Below is a recommendation for a different system that would avoid these problems:
There's another problem: you're not really balancing the work over your processes to achieve the best completion time. There might not be an equal number of offers in each group when you split them by modulus, and each offer might not take the same amount of time to process anyway. So some of your workers could finish and have nothing to do while the last worker is still processing its work.
You can solve both problems, the locking and the load-balancing:
Change the table columns in the following way:
ALTER TABLE offers
CHANGE worker_id work_state ENUM('todo', 'in progress', 'done') NOT NULL DEFAULT 'todo',
ADD INDEX (work_state),
ADD COLUMN updated_at DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
ADD INDEX (updated_at);
Create ONE process that reads from the table periodically, and adds the primary key id values of offers in a 'todo' state to a message queue. All the offers, regardless of their offer_id value, get queued in the same way.
SELECT id FROM offers WHERE work_state = 'todo'
/* push each id onto the queue */
Then each of the workers can pull one id at a time from the message queue. The worker does the following steps with each id:
UPDATE offers SET work_state = 'in progress' WHERE id = :id
The worker performs the work for its one offer.
UPDATE offers SET work_state = 'done' WHERE id = :id
These worker queries only reference one offer at a time, and they address the offers by primary key, which will use the PK index and only lock one row at a time.
Once it has finished one offer, then the worker pulls the next offer from the queue.
In this way, the workers will all finish at the same time, and the work will be balanced over the workers better. Also you can start or stop workers at any time, and you don't care about what worker number they are, because your offers don't need to be processed by a worker with the same number as the modulus of the offer_id.
When the workers finish all the offers, the message queue will be empty. Most message queues allow workers to do blocking reads, so while the queue is empty, the worker will just wait for the read to return. When you use a database, the workers have to poll frequently for new work.
There's a chance a worker will fail during its work, and never mark an offer 'done'. You need to check periodically for orphaned offers. Assume they are not going to be completed, and mark their state 'todo'.
UPDATE offers SET work_state = 'todo'
WHERE work_state = 'in progress' AND updated_at < NOW() - INTERVAL 5 MINUTE
Choose the interval length so it's certain that any worker would have finished it by that time unless something had gone wrong. You would probably do this "reset" before the dispatcher queries for current offers todo, so the offers that had been forgotten will be re-queued.
I found the issue. It was because my ORM by default updates timestamp fields while doing an update operation (to simplify the example above, I removed them from the table structure); after I turned that off, the deadlocks disappeared. But still, I do not understand how the updated_at field can cause the issue, as I'm still updating other fields.
Assume I have the following table:
| id | claimed |
-----------------
| 1  | NULL    |
| 2  | NULL    |
| 3  | NULL    |
I can execute this query to update exactly (any) one of the rows without having to execute a select first.
UPDATE mytable SET claimed = [someId] WHERE claimed IS NULL LIMIT 1
However, what happens if two concurrent requests of this query take place. Is it possible for the later request to override the value of the first request? I know the chance of this happening is very slight, but still.
Performing statement UPDATE mytable SET claimed = [someId] WHERE claimed IS NULL LIMIT 1 in a transaction t1 locks the respective record and prevents any other transaction t2 from updating the same record until transaction t1 commits (or aborts). Transaction t2 is blocked in the meanwhile; t2 continues once t1 commits (or aborts), or t2 gets aborted automatically once a timeout is reached.
See the MySQL reference on internal locking methods, row-level locking:
MySQL uses row-level locking for InnoDB tables to support simultaneous
write access by multiple sessions, making them suitable for
multi-user, highly concurrent, and OLTP applications.
and mysql reference on Locks Set by Different SQL Statements in InnoDB:
UPDATE ... WHERE ... sets an exclusive next-key lock on every record
the search encounters. However, only an index record lock is required
for statements that lock rows using a unique index to search for a
unique row.
and finally the behaviour of locking in mysql reference InnoDB Locking for record locks:
If a transaction T1 holds an exclusive (X) lock on row r, a request
from some distinct transaction T2 for a lock of either type on r
cannot be granted immediately. Instead, transaction T2 has to wait for
transaction T1 to release its lock on row r.
So two queries will not grab the same record as long as these two queries run in different transactions.
Note that the complete record is locked, such that other update operations by other transactions are blocked, even if they would update other attributes of the respective record.
I tried it out using SequelPro, and you can try it out with any client you want, as follows:
1. Make sure that mytable contains at least two records with claimed IS NULL.
2. Open two connection windows / terminals; let's call them c1 and c2.
3. In c1, execute the following two commands (no commit so far!):
   start transaction;
   UPDATE mytable SET claimed = 15 WHERE claimed IS NULL LIMIT 1;
4. In c2, execute similar commands, noting the different value for claimed (again, no commit so far):
   start transaction;
   UPDATE mytable SET claimed = 16 WHERE claimed IS NULL LIMIT 1;
5. Window c2 should inform you that it is working (i.e. waiting for the query to finish).
6. Switch to window c1 and execute commit;
7. Switch to window c2, where the (previously started) query should now have finished; execute commit;
8. Looking into mytable, one record should now have claimed=15, and another one should have claimed=16.
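A practical follow-up: since the claimer does not know in advance which row it will get, the application can check how many rows the UPDATE touched and then look the row up by the value it wrote. A sketch (assuming, as in the walkthrough, that each claimer writes a distinct claimed value):

```sql
UPDATE mytable SET claimed = 15 WHERE claimed IS NULL LIMIT 1;
SELECT ROW_COUNT();  -- 1 if a row was claimed, 0 if no unclaimed rows remained
SELECT id FROM mytable WHERE claimed = 15;  -- which row did we get?
```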
Here is my Query:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
DROP TEMPORARY TABLE IF EXISTS taken;
CREATE TEMPORARY Table taken(
id int,
invoice_id int
);
INSERT INTO taken(id, invoice_id)
SELECT id, $invoice_id FROM `licenses` l
WHERE l.`status` = 0 AND `type` = $type
LIMIT $serial_count
FOR UPDATE;
UPDATE `licenses` SET `status` = 1
WHERE id IN (SELECT id FROM taken);
If I'm going to face high concurrency, is the query above thread-safe? I mean, I don't want to assign records that have already been assigned to someone else.
With your FOR UPDATE clause, you are locking all selected licenses until the transaction commits, so you can be sure there will be no concurrency problem on those records.
The only problem I can see is that if your query takes a lot of time to perform (how many licenses do you expect to process per query?) while other transactions want those same licenses at the same time (even locking reads such as FOR UPDATE are blocked), your system will be slowed down.
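If lock waits under high concurrency do become a bottleneck, MySQL 8.0 added SKIP LOCKED, which lets competing transactions claim disjoint rows instead of queuing on each other. A sketch, keeping the placeholders from the question:

```sql
START TRANSACTION;

-- Rows already locked by another transaction are skipped, not waited on:
SELECT id FROM `licenses`
WHERE `status` = 0 AND `type` = $type
LIMIT $serial_count
FOR UPDATE SKIP LOCKED;

UPDATE `licenses` SET `status` = 1
WHERE id IN (/* the ids returned above */);

COMMIT;
```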
I have a table with two columns: item_id (int, auto inc) and item_counter (int, default value 0)
A user on a web page is allotted a few items by entering a form. This runs:
SELECT * FROM itemtable WHERE item_counter<100 ORDER BY RAND() LIMIT 5
This is followed by an UPDATE query increasing the item_counter of each of those items.
UPDATE itemtable SET item_counter=item_counter + 1 WHERE item_id=:item_id
About 20-50 users will be doing this operation at once.
A simple application: 1 SELECT and 5 UPDATE operations run sequentially for each user, and I want item_counter to be accurate (avoiding this scenario: an item_id is selected as having item_counter 99, but gets updated by some other user before this user is able to update it).
Should I use concurrency/locking for this?
I don't know whether InnoDB's row-level locking is inherent to all UPDATE operations or whether any syntax changes are required. I'm wondering what I should use here, or whether to use anything at all.
BEGIN;
SELECT ... FOR UPDATE; -- "FOR UPDATE" is the secret sauce.
UPDATE ...
COMMIT;
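Applied to the table from the question, that recipe might look like this (a sketch; the application feeds the selected ids back into the UPDATE):

```sql
BEGIN;

-- Lock the five candidate rows so no other session can push them
-- past 100 between our SELECT and our UPDATEs:
SELECT item_id FROM itemtable
WHERE item_counter < 100
ORDER BY RAND()
LIMIT 5
FOR UPDATE;

-- Then, for each item_id returned above:
UPDATE itemtable SET item_counter = item_counter + 1
WHERE item_id = :item_id;

COMMIT;
```

One caution: ORDER BY RAND() forces a scan, so InnoDB may lock more rows than the five returned until the COMMIT.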
This is a follow up on my previous question (you can skip it as I explain in this post the issue):
MySQL InnoDB SELECT...LIMIT 1 FOR UPDATE Vs UPDATE ... LIMIT 1
Environment:
JSF 2.1 on Glassfish
JPA 2.0 EclipseLink and JTA
MySQL 5.5 InnoDB engine
I have a table:
CREATE TABLE v_ext (
v_id INT NOT NULL AUTO_INCREMENT,
product_id INT NOT NULL,
code VARCHAR(20),
username VARCHAR(30),
PRIMARY KEY (v_id)
) ENGINE=InnoDB DEFAULT CHARSET=UTF8;
It is populated with 20,000 records like this one (product_id is 54 for all records, code is randomly generated and unique, username is set to NULL):
v_id     product_id   code                  username
----------------------------------------------------
1        54           '20 alphanumerical'   NULL
...
20,000   54           '20 alphanumerical'   NULL
When a user purchases product 54, he gets a code from that table. If the user purchases multiple times, he gets a code each time (there is no unique constraint on username). Because I am preparing for high activity, I want to make sure that:
No concurrency/deadlock can occur
Performance is not impacted by the locking mechanism which will be needed
From the SO question (see link above) I found that doing such a query is faster:
START TRANSACTION;
SELECT v_id FROM v_ext WHERE username IS NULL LIMIT 1 FOR UPDATE;
// Use result for next query
UPDATE v_ext SET username=xxx WHERE v_id=...;
COMMIT;
However, I found a deadlock issue, but ONLY when using an index on the username column. I thought adding an index would help speed things up a bit, but it creates a deadlock after about 19,970 records (actually quite consistently at this number of rows). Is there a reason for this? I don't understand. Thank you.
From a purely theoretical point of view, it looks like you are not locking the right rows (different condition in the first statement than in the update statement; besides you only lock one row because of LIMIT 1, whereas you possibly update more rows later on).
Try this:
START TRANSACTION;
SELECT v_id FROM v_ext WHERE username IS NULL AND v_id=yyy FOR UPDATE;
UPDATE v_ext SET username=xxx WHERE v_id=yyy;
COMMIT;
[edit]
As for the reason for your deadlock, this is the probable answer (from the manual):
If you have no indexes suitable for your statement and MySQL must scan
the entire table to process the statement, every row of the table
becomes locked (...)
Without an index, the SELECT ... FOR UPDATE statement is likely to lock the entire table, whereas with an index, it only locks some rows. Because you didn't lock the right rows in the first statement, an additional lock is acquired during the second statement.
Obviously, a deadlock cannot happen if the whole table is locked (i.e. without an index).
A deadlock can certainly occur in the second setup.
First of all, the definition of the table is wrong: you have no tid column in the table, so I suspect the primary key is v_id.
Second of all, if you select for update, you lock the row. Any other select arriving before the first transaction is done will wait for the row to be released, because it will hit the exact same record. So you will have waits on this row.
However, I very much doubt this can be a serious problem in your case, because first of all, you have the username there, and second of all, you have the product id there. It is extremely unlikely that you will have a lot of hits on that exact same record you hit initially, and even if you do, the transaction should run very fast.
You have to understand that by using transactions, you usually give up a good deal of concurrency in exchange for consistent data. There is no way to fully support both consistency of data and concurrency at the same time.