How can I maintain data integrity for a stock table with InnoDB? Does this require both a table lock and a transaction? I have tried to find some good examples, but even well-known open-source ecommerce software doesn't implement table locks or transactions. Would the code below work? Do I really need to lock the whole table, or is there a way to lock only the rows of the sold products? Or does anyone have a good example?
CREATE TABLE `stock` (
`product_id` VARCHAR(50) NOT NULL COLLATE 'utf8mb4_unicode_ci',
`quantity` INT(10) UNSIGNED NOT NULL DEFAULT '0',
PRIMARY KEY (`product_id`)
)
COLLATE='utf8mb4_unicode_ci'
ENGINE=InnoDB;
LOCK TABLES stock WRITE;
START TRANSACTION;
SET autocommit = 0;
UPDATE stock SET quantity = quantity - 4 WHERE product_id = 'PRODUCT_1' AND quantity >= 4;
UPDATE stock SET quantity = quantity - 2 WHERE product_id = 'PRODUCT_2' AND quantity >= 2;
UPDATE stock SET quantity = quantity - 5 WHERE product_id = 'PRODUCT_3' AND quantity >= 5;
COMMIT or ROLLBACK;
COMMIT if all UPDATE queries have 1 affected row, otherwise ROLLBACK
It should not be necessary to do the table-lock, unless you want to make sure concurrent clients running the same code don't run simultaneous updates. I'm not sure that matters, because each of your updates is working on distinct subsets of rows.
In any case, it's not necessary to set autocommit=0 after you start a transaction. Starting a transaction implicitly means that it won't commit until you use COMMIT (or some statement causes an implicit commit).
You might even wrap all three updates into one update, except that you said you want to make sure each update changes exactly one row.
The transaction itself does not test for "1 affected row". It only reacts to failures such as errors, deadlocks, and timeouts. If you want to fail unless each statement changes exactly one row, fetch the affected-row count after each UPDATE, compare it with 1, and ROLLBACK if it differs.
Also, the
COMMIT or ROLLBACK
needs to be "if ... then ROLLBACK else COMMIT". This is best done in your application language.
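As a minimal sketch of doing that check in the application language: the Python below runs each conditional UPDATE, inspects the affected-row count, and rolls back as soon as one statement does not change exactly one row. It uses the standard sqlite3 module with an in-memory database purely as a runnable stand-in for MySQL/InnoDB, and the function name sell and the sample quantities are assumptions made for illustration. With a MySQL driver the same structure should apply, with the driver's own placeholder syntax.

```python
import sqlite3


def sell(conn, items):
    """Decrement stock for every (product_id, qty) pair, or change nothing.

    Returns True after COMMIT and False after ROLLBACK.
    """
    cur = conn.cursor()
    try:
        for product_id, qty in items:
            cur.execute(
                "UPDATE stock SET quantity = quantity - ? "
                "WHERE product_id = ? AND quantity >= ?",
                (qty, product_id, qty),
            )
            if cur.rowcount != 1:  # unknown product or not enough stock
                conn.rollback()
                return False
        conn.commit()
        return True
    except sqlite3.Error:
        # Any database error also undoes the partial work.
        conn.rollback()
        raise


# In-memory stand-in for the stock table (sample data is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock (product_id TEXT PRIMARY KEY, quantity INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO stock VALUES (?, ?)",
    [("PRODUCT_1", 10), ("PRODUCT_2", 1), ("PRODUCT_3", 8)],
)
conn.commit()

# PRODUCT_2 has only 1 unit, so the whole order is rolled back.
ok = sell(conn, [("PRODUCT_1", 4), ("PRODUCT_2", 2), ("PRODUCT_3", 5)])
remaining = dict(conn.execute("SELECT product_id, quantity FROM stock"))
```

Because the second UPDATE matches no row, the first decrement is undone as well and the stock levels are left unchanged.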
I need to select, process, and update a large amount of data in under 3 minutes. We decided to create a locking mechanism so that separate processes can run in parallel, with each process locking, selecting, and updating its own rows.
To make this possible, we decided to add a worker_id column to the table.
Table structure:
CREATE TABLE offers
(
id int(10) unsigned PRIMARY KEY NOT NULL AUTO_INCREMENT,
offer_id int(11) NOT NULL,
offer_sid varchar(255) NOT NULL,
offer_name varchar(255),
account_name varchar(255),
worker_id varchar(255)
);
CREATE UNIQUE INDEX offers_offer_id_offer_sid_unique ON offers (offer_id, offer_sid);
CREATE INDEX offers_offer_id_index ON offers (offer_id);
CREATE INDEX offers_offer_sid_index ON offers (offer_sid);
Also, we decided to start with 5 parallel processes. To keep different processes from selecting the same row, we use the formula offer_id % max_amount_of_processes = process_number (process_number starts from 0, so the first is 0 and the last is 4).
Each process follows these steps:
1. Set worker_id to the current process id on the first 1000 rows: update offers set worker_id = :process_id where worker_id is null and offer_id%5 = :process_number order by offer_id asc limit 1000
2. Select those rows: select * from offers where worker_id = :process_id order by offer_id asc limit 1000
3. Process the data, storing the last offer_id in one variable and the prepared data in another for the later update.
4. Run the query from step 1 again to lock the next 1000 rows.
5. Run the query from step 2 with the additional where clause and offer_id > :last_selected_id to select the next 1000 rows.
6. Repeat these steps in a loop until all rows are locked.
7. Remove all locks: update offers set worker_id = null where worker_id = :process_id
8. Run the query to update all collected data.
The other 4 processes follow the same steps.
The issue is that I get a deadlock when all 5 processes simultaneously run the query from step 1 to lock rows (set worker_id), even though each process locks only its own rows according to the formula. I tried setting the transaction isolation level to READ COMMITTED, but the issue remains.
I'm a novice with locking mechanisms and I need help either preventing the deadlocks here or designing a better mechanism.
The expression offer_id%5 = :process_number cannot use an index, so it can only scan all the rows matched by the first condition, worker_id is null.
You can prove this with two windows:
mysql1> begin;
mysql1> set @p=1;
mysql1> update offers set worker_id = @p where worker_id is null and offer_id%5 = @p;
Don't commit the transaction in window 1 yet.
mysql2> set @p=2;
mysql2> update offers set worker_id = @p where worker_id is null and offer_id%5 = @p;
...waits for about 50 seconds, or value of innodb_lock_wait_timeout, then...
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
This demonstrates that each concurrent session locks overlapping sets of rows, not only the rows that match the modulus expression. So the sessions queue up against each other's locks.
This will get worse if you put all the steps into a transaction like @SloanThrasher suggests. Making the work of each worker take longer will only make them hold their locks longer, and further delay the other processes waiting on those locks.
I do not understand how updated_at field can cause the issue as I'm still updating other fields
I'm not sure because you haven't posted the InnoDB deadlock diagnostics from SHOW ENGINE INNODB STATUS.
I do notice that your table has a secondary UNIQUE KEY, which will also require locks. There are some cases of deadlocks that occur because of non-atomicity of the lock assignment.
Worker 1                            Worker 2

UPDATE SET worker_id = 1
(acquires locks on PK)
                                    UPDATE SET worker_id = 2
                                    (waits for PK locks held by worker 1)
(waits for locks on UNIQUE KEY)
Both worker 1 and worker 2 can therefore be waiting on each other, and enter into a deadlock.
This is just a guess. Another possibility is that the ORM is doing a second UPDATE for the updated_at column, and this introduces another opportunity for a race condition. I haven't quite worked that out mentally, but I think it's possible.
Below is a recommendation for a different system that would avoid these problems:
There's another problem, that you're not really balancing the work over your processes to achieve the best completion time. There might not be an equal number of offers in each group when you split them by modulus. And each offer might not take the same amount of time to process anyway. So some of your workers could finish and have nothing to do, while the last worker is still processing its work.
You can solve both problems, the locking and the load-balancing:
Change the table columns in the following way:
ALTER TABLE offers
CHANGE worker_id work_state ENUM('todo', 'in progress', 'done') NOT NULL DEFAULT 'todo',
ADD INDEX (work_state),
ADD COLUMN updated_at DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
ADD INDEX (updated_at);
Create ONE process that reads from the table periodically, and adds the primary key id values of offers in a 'todo' state to a message queue. All the offers, regardless of their offer_id value, get queued in the same way.
SELECT id FROM offers WHERE work_state = 'todo'
/* push each id onto the queue */
Then each of the workers can pull one id at a time from the message queue. The worker does the following steps with each id:
UPDATE offers SET work_state = 'in progress' WHERE id = :id
The worker performs the work for its one offer.
UPDATE offers SET work_state = 'done' WHERE id = :id
These worker queries only reference one offer at a time, and they address the offers by primary key, which will use the PK index and only lock one row at a time.
Once it has finished one offer, then the worker pulls the next offer from the queue.
In this way, the workers will all finish at the same time, and the work will be balanced over the workers better. Also you can start or stop workers at any time, and you don't care about what worker number they are, because your offers don't need to be processed by a worker with the same number as the modulus of the offer_id.
When the workers finish all the offers, the message queue will be empty. Most message queues allow workers to do blocking reads, so while the queue is empty, the worker will just wait for the read to return. By contrast, if you use a database table as the queue, the workers have to poll frequently for new work.
There's a chance a worker will fail during its work, and never mark an offer 'done'. You need to check periodically for orphaned offers. Assume they are not going to be completed, and mark their state 'todo'.
UPDATE offers SET work_state = 'todo'
WHERE work_state = 'in progress' AND updated_at < NOW() - INTERVAL 5 MINUTE
Choose the interval length so it's certain that any worker would have finished it by that time unless something had gone wrong. You would probably do this "reset" before the dispatcher queries for current offers todo, so the offers that had been forgotten will be re-queued.
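The dispatcher-and-worker flow described above could look roughly like the following sketch. It makes several assumptions for the sake of a runnable example: Python's in-process queue.Queue stands in for a real message queue, an in-memory sqlite3 table stands in for the MySQL offers table, process is a hypothetical placeholder for the real per-offer work, and the worker uses a non-blocking read so the demo terminates (a real worker would normally block on the queue).

```python
import queue
import sqlite3

# Stand-in for the offers table, reduced to the columns this sketch needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE offers ("
    " id INTEGER PRIMARY KEY,"
    " work_state TEXT NOT NULL DEFAULT 'todo')"
)
conn.executemany("INSERT INTO offers (id) VALUES (?)", [(i,) for i in range(1, 8)])
conn.commit()


def dispatch(conn, q):
    """Single dispatcher: push the id of every 'todo' offer onto the queue."""
    rows = conn.execute("SELECT id FROM offers WHERE work_state = 'todo' ORDER BY id")
    for (offer_id,) in rows:
        q.put(offer_id)


def process(offer_id):
    """Hypothetical placeholder for the real work done on one offer."""
    return offer_id * 2


def work(conn, q):
    """Worker loop: claim one offer by primary key, process it, mark it done."""
    results = {}
    while True:
        try:
            offer_id = q.get_nowait()  # a real worker would block here instead
        except queue.Empty:
            return results
        conn.execute(
            "UPDATE offers SET work_state = 'in progress' WHERE id = ?", (offer_id,)
        )
        conn.commit()
        results[offer_id] = process(offer_id)
        conn.execute("UPDATE offers SET work_state = 'done' WHERE id = ?", (offer_id,))
        conn.commit()


q = queue.Queue()
dispatch(conn, q)
results = work(conn, q)
states = [row[0] for row in conn.execute("SELECT DISTINCT work_state FROM offers")]
```

Each UPDATE addresses a single row by primary key and commits immediately, so no statement has to scan or hold locks on rows that belong to another worker.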
I found the issue. My ORM by default updates timestamp fields during an update operation (I removed them from the table structure to simplify the example above). After I turned that off, the deadlock disappeared. But I still do not understand how the updated_at field can cause the issue, as I'm still updating other fields.
I have two tables
spies
---------
id        | PK
weapon_id | FK
name      |

weapons
---------
id        | PK
name      |
I'm trying to clarify whether there is a difference between these two SQL updates (when using MySQL InnoDB).
Query 1:
UPDATE spies SET name = 'Bond', weapon_id = 1 WHERE id = 1
OR
Query 2:
UPDATE spies SET name = 'Bond' WHERE id = 1
I have heard that updating a row with a FK creates a read-only lock (not sure if that's the correct term) on the parent.
Would using Query 2 avoid the lock on the parent table?
Consider the following schema:
(Commented-out statements are left in for your convenience. weapons is created first because spies references it.)
-- drop table if exists spies;
-- drop table if exists weapons;
create table weapons
( id int primary key,
name varchar(100) not null
)engine=InnoDB;
create table spies
( id int primary key,
weapon_id int not null,
name varchar(100) not null,
key(weapon_id),
foreign key (weapon_id) references weapons(id)
)engine=InnoDB;
insert weapons(id,name) values (1,'slingshot'),(2,'Ruger');
insert spies(id,weapon_id,name) values (1,2,'Sally');
-- truncate table spies;
Now we have two processes, P1 and P2. It is best to test with P1 in, say, MySQL Workbench and P2 in a MySQL command-line window. In other words, you have to set these up as separate connections. You need a meticulous eye to run the statements step by step in the proper order (described in the Narrative below) and watch the effect in the other process's window.
Consider the following queries, keeping in mind that a MySQL query not wrapped in an explicit transaction is itself an implicit transaction. Below, though, I went with explicit transactions:
Q1:
START TRANSACTION;
-- place1
UPDATE spies SET name = 'Bond', weapon_id = 1 WHERE id = 1;
-- place2
COMMIT;
Q2:
START TRANSACTION;
-- place1
UPDATE spies SET name = 'Bond' WHERE id = 1;
-- place2
COMMIT;
Q3:
START TRANSACTION;
-- place1
SELECT id into @mine_to_use from weapons where id=1 FOR UPDATE; -- place2
-- place3
COMMIT;
Q4:
START TRANSACTION;
-- place1
SELECT id into @mine_to_use from spies where id=1 FOR UPDATE; -- place2
-- place3
COMMIT;
Q5 (hodge podge of queries):
SELECT * from weapons;
SELECT * from spies;
Narrative
Q1: When P1 runs Q1 and gets to place2, it holds row-level locks on the id=1 row in both tables (2 rows total, 1 row in each table): an exclusive lock on the spies row it is updating and, because the foreign key has to be checked, a shared lock on the weapons row (see Note1 below). This can be proved by P2 starting to run Q3, getting to place1, but blocking at place2, and only being freed when P1 gets around to calling COMMIT. The same is true for P2 running Q4. In summary, on the P2 screen, place2 freezes until the P1 COMMIT.
A note again about implicit transactions. Your real Q1 query will run very quickly and then commit implicitly. However, the previous paragraph describes what happens if the transaction also contains more time-consuming work.
Q2: When P1 runs Q2 and gets to place2, it has obtained an exclusive row-level lock only on the id=1 row in spies, and no lock in weapons. So P2 can run Q3 against weapons without blocking, but P2 still blocks running Q4 at place2 on spies.
So, the differences between Q1 and Q2 come down to MySQL knowing that the FK index is not relevant to a column in the UPDATE, and the manual states that in Note1 below.
When P1 runs Q1, P2 has no problems running the read-only, non-lock-acquiring Q5 type of queries. The only question is which version of the data P2 sees, based on the ISOLATION LEVEL in place.
Note1: From the MySQL Manual Page entitled Locks Set by Different SQL Statements in InnoDB:
If a FOREIGN KEY constraint is defined on a table, any insert, update,
or delete that requires the constraint condition to be checked sets
shared record-level locks on the records that it looks at to check the
constraint. InnoDB also sets these locks in the case where the
constraint fails.
The above is why the behavior of Q2: is such that P2 is free to perform an UPDATE or acquire an UPDATE exclusive momentary lock on weapons. This is because the engine is not performing an UPDATE with P1 on weapon_id and thus does not have a row-level lock in that table.
To pull this back to 50,000 feet, one's biggest concern is how long a lock is held, either in an implicit transaction (one with no START/COMMIT) or in an explicit transaction before its COMMIT. A peer process can in theory be prevented indefinitely from acquiring the lock it needs for an UPDATE. But each attempt at acquiring that lock is governed by the innodb_lock_wait_timeout setting. That means that, by default, the attempt times out after 50 seconds. To view your setting, run:
select @@innodb_lock_wait_timeout;
For me, at the moment, it is 50 (seconds).
Why not run EXPLAIN for this query and check it for yourself?
So, let's run it:
EXPLAIN UPDATE spies SET name = 'Bond', weapon_id = 1 WHERE id = 1\G
Then check the rows column of the output to see how many rows the query scans.
Do the same for the below one as well.
EXPLAIN UPDATE spies SET name = 'Bond' WHERE id = 1\G
Now, coming to your question: InnoDB locks every row that you update in a table. But remember, this is row-level locking.
So, to answer your question, updating a row with or without the foreign key column will not make a difference if it's the same row in the same table.
It will make a difference if it's a different row or a different table.
Here is my Query:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
DROP TEMPORARY TABLE IF EXISTS taken;
CREATE TEMPORARY Table taken(
id int,
invoice_id int
);
INSERT INTO taken(id, invoice_id)
SELECT id, $invoice_id FROM `licenses` l
WHERE l.`status` = 0 AND `type` = $type
LIMIT $serial_count
FOR UPDATE;
UPDATE `licenses` SET `status` = 1
WHERE id IN (SELECT id FROM taken);
If I'm going to face high concurrency, is the query above thread-safe? I mean, I don't want to assign records that have already been assigned to someone else.
With your FOR UPDATE statement, you lock all the selected licenses until the transaction commits or rolls back, so you can be sure there will be no concurrency problem on those records.
The only problem I can see is this: if your query takes a long time to run (how many licenses do you expect to process per query?) and other queries need the same licenses at the same time, your system will slow down. Other writes and locking reads such as SELECT ... FOR UPDATE will wait on those rows, although plain non-locking SELECTs will not.
We have various tables to represent various types of data. Each table has a corresponding revisions table to track history of this data. Each revision (entry in a revisions table) has a unique number. This number is stored in a change metadata table. Each of these tables references a parent_id. Before we make any changes to the tables we lock the parent row with SELECT … FOR UPDATE.
After making an update/insert we also increment the change number and write that number to the change metadata table. To do so we do a SELECT MAX on the change metadata number and then increment it.
The issue we’re seeing is that somehow a transaction is getting an old change number from the select max statement. To illustrate:
Transaction 1:
START TRANSACTION
lock with FOR UPDATE
do stuff...
Get Latest Change Number (9)
Insert Revision with Number 10
COMMIT
Transaction 2:
START TRANSACTION
lock with FOR UPDATE
do stuff...
Get Latest Change Number (7)
Insert Revision with Number 8
COMMIT
This causes the revision insert for transaction 2 to fail, as the change number is a unique key. I'm leaning towards it being an issue of repeatable reads, but I'm not sure how the old data can persist across transactions in such a way. For each transaction there's a START TRANSACTION statement, and then the parent id is immediately locked with FOR UPDATE. We have a high-traffic site with multiple concurrent transactions. It's possible there are many waiting on the lock at any one time. I'd be happy to clarify any point and would appreciate any insight anyone could offer.
SELECT MAX on the change metadata number
That needs FOR UPDATE, too.
Another approach:
Have a "sequence number generator" table.
CREATE TABLE Sequence (
pk TINYINT NOT NULL,
seq INT UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY(pk), -- For ON DUPLICATE KEY UPDATE
INDEX(seq) -- Sufficient for AUTO_INCREMENT
);
The only action (once initialized) should be
INSERT INTO Sequence (pk, seq) VALUE (1, 0)
ON DUPLICATE KEY UPDATE seq := LAST_INSERT_ID(seq+1);
That will update the one row atomically. Then (in the same connection), do this to get the new seq:
SELECT LAST_INSERT_ID();
That statement is tied to the connection, so there is no chance of someone else getting your number.
This is a follow up on my previous question (you can skip it as I explain in this post the issue):
MySQL InnoDB SELECT...LIMIT 1 FOR UPDATE Vs UPDATE ... LIMIT 1
Environment:
JSF 2.1 on Glassfish
JPA 2.0 EclipseLink and JTA
MySQL 5.5 InnoDB engine
I have a table:
CREATE TABLE v_ext (
v_id INT NOT NULL AUTO_INCREMENT,
product_id INT NOT NULL,
code VARCHAR(20),
username VARCHAR(30),
PRIMARY KEY (v_id)
) ENGINE=InnoDB DEFAULT CHARSET=UTF8;
It is populated with 20,000 records like this one (product_id is 54 for all records, code is randomly generated and unique, username is set to NULL):
v_id     product_id   code                  username
-----------------------------------------------------
1        54           '20 alphanumerical'   NULL
...
20,000   54           '20 alphanumerical'   NULL
When a user purchases product 54, he gets a code from that table. If the user purchases multiple times, he gets a code each time (there is no unique constraint on username). Because I am preparing for high activity, I want to make sure that:
No concurrency/deadlock can occur
Performance is not impacted by the locking mechanism which will be needed
From the SO question (see link above) I found that doing such a query is faster:
START TRANSACTION;
SELECT v_id FROM v_ext WHERE username IS NULL LIMIT 1 FOR UPDATE;
-- Use result for next query
UPDATE v_ext SET username=xxx WHERE v_id=...;
COMMIT;
However, I found a deadlock issue ONLY when using an index on the username column. I thought adding an index would help speed things up a little, but it creates a deadlock after about 19,970 records (quite consistently at this number of rows). Is there a reason for this? I don't understand. Thank you.
From a purely theoretical point of view, it looks like you are not locking the right rows (different condition in the first statement than in the update statement; besides you only lock one row because of LIMIT 1, whereas you possibly update more rows later on).
Try this:
START TRANSACTION;
SELECT v_id FROM v_ext WHERE username IS NULL AND v_id=yyy FOR UPDATE;
UPDATE v_ext SET username=xxx WHERE v_id=yyy;
COMMIT;
[edit]
As for the reason for your deadlock, this is the probable answer (from the manual):
If you have no indexes suitable for your statement and MySQL must scan
the entire table to process the statement, every row of the table
becomes locked (...)
Without an index, the SELECT ... FOR UPDATE statement is likely to lock the entire table, whereas with an index, it only locks some rows. Because you didn't lock the right rows in the first statement, an additional lock is acquired during the second statement.
Obviously, a deadlock cannot happen if the whole table is locked (i.e. without an index).
A deadlock can certainly occur in the second setup.
First of all, the definition of the table is wrong. You have no tid column in the table, so I suspect the primary key is v_id.
Second, if you select for update, you lock the row. Any other locking select that arrives before the first transaction is done will wait for the row to be released, because it will hit the exact same record. So you will have waits on this row.
However, I doubt this will be a serious problem in your case, because you have the username there and you have the product id there. It is extremely unlikely that you will have a lot of hits on the exact same record you hit initially, and even if you do, the transaction should run very fast.
You have to understand that by using transactions, you usually give up a good deal of concurrency in exchange for consistent data. You cannot fully maximize both consistency of data and concurrency at the same time.