MySQL deadlock while updating

I need to select, manipulate, and update a lot of data in less than 3 minutes. We decided to create a locking mechanism so that separate processes can run in parallel, with each process locking, selecting, and updating its own rows.
To make this possible, we added a worker_id column to the table.
Table structure:
CREATE TABLE offers
(
id int(10) unsigned PRIMARY KEY NOT NULL AUTO_INCREMENT,
offer_id int(11) NOT NULL,
offer_sid varchar(255) NOT NULL,
offer_name varchar(255),
account_name varchar(255),
worker_id varchar(255)
);
CREATE UNIQUE INDEX offers_offer_id_offer_sid_unique ON offers (offer_id, offer_sid);
CREATE INDEX offers_offer_id_index ON offers (offer_id);
CREATE INDEX offers_offer_sid_index ON offers (offer_sid);
We decided to start with 5 parallel processes. To prevent different processes from selecting the same row, we use the formula offer_id % max_amount_of_processes = process_number (process_number starts from 0, so the first is 0 and the last is 4).
Each process follows these steps:
1. Set worker_id to the current process id on the first 1000 rows using the query: update offers set worker_id = :process_id where worker_id is null and offer_id % 5 = :process_number order by offer_id asc limit 1000
2. Select those rows: select * from offers where worker_id = :process_id order by offer_id asc limit 1000
3. Manipulate the data, store the last offer_id in a variable, and keep the prepared data in another variable for the later update.
4. Run the same query as in step 1 to lock the next 1000 rows.
5. Run the same query as in step 2 with the additional where condition and offer_id > :last_selected_id to select the next 1000 rows.
6. Repeat these steps in a loop until all rows are locked.
7. Remove all locks: update offers set worker_id = null where worker_id = :process_id
8. Run the query to update all the collected data.
The other 4 processes follow the same steps.
The issue is that I get a deadlock when all 5 processes simultaneously run the query from step 1 to lock rows (set worker_id), even though each process locks only its own rows according to the formula. I tried setting the transaction isolation level to READ COMMITTED, but the issue remains.
I'm a novice with locking mechanisms and need help preventing the deadlocks here, or creating a better mechanism.

The expression offer_id%5 = :process_number cannot use an index, so it can only scan all the rows matched by the first condition, worker_id is null.
You can prove this with two windows:
mysql1> begin;
mysql1> set @p=1;
mysql1> update offers set worker_id = @p where worker_id is null and offer_id%5 = @p;
Don't commit the transaction in window 1 yet.
mysql2> set @p=2;
mysql2> update offers set worker_id = @p where worker_id is null and offer_id%5 = @p;
...waits for about 50 seconds (the default value of innodb_lock_wait_timeout), then...
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
This demonstrates that each concurrent session locks overlapping sets of rows, not only the rows that match the modulus expression. So the sessions queue up against each other's locks.
This will get worse if you put all the steps into a transaction like @SloanThrasher suggests. Making the work of each worker take longer will make them hold their locks longer, and further delay the other processes waiting on those locks.
I do not understand how the updated_at field can cause the issue, as I'm still updating other fields
I'm not sure because you haven't posted the InnoDB deadlock diagnostics from SHOW ENGINE INNODB STATUS.
I do notice that your table has a secondary UNIQUE KEY, which will also require locks. There are some cases of deadlocks that occur because of non-atomicity of the lock assignment.
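| Worker 1                        | Worker 2                              |
|---------------------------------|---------------------------------------|
| UPDATE SET worker_id = 1        |                                       |
| (acquires locks on PK)          |                                       |
|                                 | UPDATE SET worker_id = 2              |
|                                 | (waits for PK locks held by worker 1) |
| (waits for locks on UNIQUE KEY) |                                       |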
Both worker 1 and worker 2 can therefore be waiting on each other, and enter into a deadlock.
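One way to see why this is a deadlock rather than an ordinary lock wait: InnoDB detects deadlocks by finding a cycle of transactions that wait for each other's locks, and then rolls one of them back. The Python sketch below is a simplified, hypothetical model of that check (a depth-first search for a cycle in a wait-for graph), not InnoDB's actual code; the transaction names are made up.

```python
from typing import Dict, Set


def has_deadlock(waits_for: Dict[str, Set[str]]) -> bool:
    """Return True if the wait-for graph contains a cycle.

    waits_for maps each transaction to the transactions holding locks it needs.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color: Dict[str, int] = {}
    for txn, targets in waits_for.items():
        color.setdefault(txn, WHITE)
        for other in targets:
            color.setdefault(other, WHITE)

    def visit(txn: str) -> bool:
        color[txn] = GRAY
        for other in waits_for.get(txn, set()):
            if color[other] == GRAY:  # back edge: the waits form a circle
                return True
            if color[other] == WHITE and visit(other):
                return True
        color[txn] = BLACK
        return False

    return any(color[txn] == WHITE and visit(txn) for txn in list(color))


# The scenario above: worker 2 waits for worker 1's PK locks, while
# worker 1 waits for UNIQUE KEY locks that worker 2 holds.
print(has_deadlock({"worker1": {"worker2"}, "worker2": {"worker1"}}))  # True
# If worker 1 is not waiting for anything, worker 2 merely waits its turn.
print(has_deadlock({"worker1": set(), "worker2": {"worker1"}}))  # False
```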
This is just a guess. Another possibility is that the ORM is doing a second UPDATE for the updated_at column, and this introduces another opportunity for a race condition. I haven't quite worked that out mentally, but I think it's possible.
Below is a recommendation for a different system that would avoid these problems:
There's another problem, that you're not really balancing the work over your processes to achieve the best completion time. There might not be an equal number of offers in each group when you split them by modulus. And each offer might not take the same amount of time to process anyway. So some of your workers could finish and have nothing to do, while the last worker is still processing its work.
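To make the balancing point concrete, here is a small Python sketch that counts how many offers each worker would get under the offer_id % 5 rule. The offer_id values are hypothetical (for example, IDs that mostly arrive in steps of 5); with real data the skew may be smaller or larger.

```python
from collections import Counter
from typing import Dict, Iterable


def partition_sizes(offer_ids: Iterable[int], workers: int = 5) -> Dict[int, int]:
    """Count how many offers each worker gets under offer_id % workers."""
    counts = Counter(offer_id % workers for offer_id in offer_ids)
    # Include workers that received nothing, so gaps are visible.
    return {w: counts.get(w, 0) for w in range(workers)}


# Hypothetical IDs: most are multiples of 5, so worker 0 gets most of the work.
ids = [10, 15, 20, 25, 30, 31, 42, 53, 64]
print(partition_sizes(ids))  # {0: 5, 1: 1, 2: 1, 3: 1, 4: 1}
```

Even when the counts come out equal, the per-offer processing time can differ, which is the second half of the argument.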
You can solve both problems, the locking and the load-balancing:
Change the table columns in the following way:
ALTER TABLE offers
CHANGE worker_id work_state ENUM('todo', 'in progress', 'done') NOT NULL DEFAULT 'todo',
ADD INDEX (work_state),
ADD COLUMN updated_at DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
ADD INDEX (updated_at);
Create ONE process that reads from the table periodically, and adds the primary key id values of offers in a 'todo' state to a message queue. All the offers, regardless of their offer_id value, get queued in the same way.
SELECT id FROM offers WHERE work_state = 'todo'
/* push each id onto the queue */
Then each of the workers can pull one id at a time from the message queue. The worker does the following steps with each id:
UPDATE offers SET work_state = 'in progress' WHERE id = :id
The worker performs the work for its one offer.
UPDATE offers SET work_state = 'done' WHERE id = :id
These worker queries only reference one offer at a time, and they address the offers by primary key, which will use the PK index and only lock one row at a time.
Once it has finished one offer, then the worker pulls the next offer from the queue.
In this way, the workers will all finish at about the same time, and the work will be balanced over the workers better. Also, you can start or stop workers at any time, and you don't care what worker number they have, because your offers don't need to be processed by a worker whose number matches the modulus of the offer_id.
When the workers finish all the offers, the message queue will be empty. Most message queues allow workers to do blocking reads, so while the queue is empty, the worker will just wait for the read to return. When you use a database, the workers have to poll frequently for new work.
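A minimal in-process sketch of the dispatcher/worker pattern, assuming Python's queue.Queue as the message queue and a dict standing in for the offers table (each dict update plays the role of one single-row UPDATE ... WHERE id = :id). The dispatcher makes a single pass here rather than polling periodically, and a None sentinel per worker is a hypothetical shutdown convention; a real deployment would use an external message broker.

```python
import queue
import threading
from typing import Dict, Optional

# Stand-in for the offers table: id -> work_state.
offers: Dict[int, str] = {offer_id: "todo" for offer_id in range(1, 21)}
processed_by: Dict[int, str] = {}
table_lock = threading.Lock()  # guards the dicts; plays the role of a row lock


def set_state(offer_id: int, state: str, worker_name: Optional[str] = None) -> None:
    # Equivalent of: UPDATE offers SET work_state = :state WHERE id = :id
    with table_lock:
        offers[offer_id] = state
        if worker_name is not None:
            processed_by[offer_id] = worker_name


def dispatcher(q: queue.Queue, n_workers: int) -> None:
    # Equivalent of: SELECT id FROM offers WHERE work_state = 'todo'
    with table_lock:
        todo = [offer_id for offer_id, state in offers.items() if state == "todo"]
    for offer_id in todo:
        q.put(offer_id)
    for _ in range(n_workers):
        q.put(None)  # one stop signal per worker, queued after all real work


def worker(name: str, q: queue.Queue) -> None:
    while True:
        offer_id = q.get()  # blocking read: an idle worker just waits here
        if offer_id is None:
            break
        set_state(offer_id, "in progress")
        # ... the real work for this single offer would happen here ...
        set_state(offer_id, "done", worker_name=name)


work_queue: queue.Queue = queue.Queue()
threads = [threading.Thread(target=worker, args=(f"worker{n}", work_queue)) for n in range(5)]
for t in threads:
    t.start()
dispatcher(work_queue, n_workers=len(threads))
for t in threads:
    t.join()
print(all(state == "done" for state in offers.values()))  # True
```

Because the queue hands each id to exactly one worker, no two workers touch the same row, and adding a sixth worker only means starting another thread.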
There's a chance a worker will fail during its work, and never mark an offer 'done'. You need to check periodically for orphaned offers. Assume they are not going to be completed, and mark their state 'todo'.
UPDATE offers SET work_state = 'todo'
WHERE work_state = 'in progress' AND updated_at < NOW() - INTERVAL 5 MINUTE
Choose the interval length so it's certain that any worker would have finished it by that time unless something had gone wrong. You would probably do this "reset" before the dispatcher queries for current offers todo, so the offers that had been forgotten will be re-queued.
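The cutoff logic of that reset query, mirrored in a small Python function over in-memory rows (a hypothetical stand-in for the table, using the same 5-minute interval), may help when choosing the interval: only rows that are still 'in progress' and whose updated_at is older than the cutoff are reset.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def reset_orphans(rows: List[Dict], now: datetime,
                  timeout: timedelta = timedelta(minutes=5)) -> int:
    """Mirror of: UPDATE offers SET work_state = 'todo'
    WHERE work_state = 'in progress' AND updated_at < NOW() - INTERVAL 5 MINUTE.
    Returns the number of rows reset (like the affected-rows count)."""
    cutoff = now - timeout
    reset = 0
    for row in rows:
        if row["work_state"] == "in progress" and row["updated_at"] < cutoff:
            row["work_state"] = "todo"
            row["updated_at"] = now  # what ON UPDATE CURRENT_TIMESTAMP would do
            reset += 1
    return reset


now = datetime(2024, 1, 1, 12, 0)
rows = [
    {"id": 1, "work_state": "in progress", "updated_at": now - timedelta(minutes=10)},
    {"id": 2, "work_state": "in progress", "updated_at": now - timedelta(minutes=2)},
    {"id": 3, "work_state": "done", "updated_at": now - timedelta(hours=1)},
]
print(reset_orphans(rows, now))  # 1
print([row["work_state"] for row in rows])  # ['todo', 'in progress', 'done']
```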

I found the issue. My ORM updates timestamp fields by default during an update operation (I removed them from the table structure above to simplify the example), and after I turned that off, the deadlock disappeared. Still, I do not understand how the updated_at field can cause the issue, since I'm still updating other fields.

Related

Fetch Unique Rows from Database in a Multi deployment Environment

I have a table (codes) in a MySQL relational database that stores a list of unique codes. A sample schema is given below.
Integer id
String codes
String user
Boolean is_available
id | codes | user | is_available
1 | ABC | | true
2 | XYZ | | true
Whenever a code is assigned to a user, the user column is set to the user_id and is_available is set to false.
1 | ABC | user_id | false
Multiple instances of a service fetch codes from the database and give them to users.
Each request must give a unique code.
GET /code -> returns a unique code
As there are multiple instances, what is the best way to handle concurrency?
Is update-then-select the correct way to do it? Does it cause a full table scan?
I read about optimistic locking, where we can retry on a failed attempt, but I don't feel this is a good approach.
Setting the isolation level to serializable has been suggested, but I don't think it should be used in a production environment.
Another option is a centralized blocking queue that pops a unique code each time a request is made, but it would be a single point of failure.
I have read a lot of theory on this, but I am looking for how it should be implemented in an enterprise-scale application handling thousands of concurrent calls.
Trying to answer "which is better" would be an opinion-based answer, because there's hardly ever a single right answer to architecture problems. They're always a tradeoff, so you have to decide which advantages are priorities for your project.
I'll try to address your specific questions, which are not opinion-based.
You can use a locking read to acquire an exclusive lock on a single row.
SELECT * FROM codes WHERE is_available = TRUE ORDER BY id LIMIT 1 FOR UPDATE;
Then update that row to claim it for a given user, and then return the code as the response.
UPDATE codes SET user = ?, is_available = FALSE WHERE id = ?;
Neither of these statements will do a table-scan. The SELECT will terminate when it gets the first row that satisfies the LIMIT, and the UPDATE applies only to one row which it looks up by its primary key.
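Application-side, the two statements belong in one short transaction. The Python sketch below shows that flow; to keep it runnable without a MySQL server it uses the standard-library sqlite3 module, and since SQLite has no FOR UPDATE, BEGIN IMMEDIATE (which takes SQLite's database-wide write lock up front) stands in for the row lock, a coarser substitute. With a MySQL driver you would send the two statements shown above inside the transaction instead. The function name claim_code is hypothetical.

```python
import sqlite3
from typing import Optional


def claim_code(conn: sqlite3.Connection, user: str) -> Optional[str]:
    """Claim one available code for `user`, or return None if none are left."""
    # MySQL: START TRANSACTION, then SELECT ... FOR UPDATE locks the chosen row.
    # SQLite stand-in: BEGIN IMMEDIATE takes the database write lock instead.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, codes FROM codes WHERE is_available = 1 ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        code_id, code = row
        conn.execute(
            'UPDATE codes SET "user" = ?, is_available = 0 WHERE id = ?',
            (user, code_id),
        )
        conn.execute("COMMIT")  # releases the lock; keep the transaction short
        return code
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
conn.execute(
    'CREATE TABLE codes (id INTEGER PRIMARY KEY, codes TEXT, "user" TEXT, '
    "is_available INTEGER NOT NULL DEFAULT 1)"
)
conn.executemany("INSERT INTO codes (codes) VALUES (?)", [("ABC",), ("XYZ",)])
print(claim_code(conn, "alice"), claim_code(conn, "bob"), claim_code(conn, "carol"))
# ABC XYZ None
```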
Optimistic locking has no automatic retries. That's up to your application code. But ideally, you shouldn't need to do retries due to lock timeouts if your code is prompt about finishing transactions.
There's no need to set the isolation level to serializable globally, even if you choose this strategy. You can use SET TRANSACTION immediately before starting a transaction to change the isolation level for that one transaction.
A centralized queue of unique codes would be an effective way to scale further, once your concurrent traffic grows so much that the locking SQL reads are a bottleneck. Any architecture needs to be reevaluated periodically, because what worked at scale N may not be the best for scale 10 * N.
Make user_id NULLable. Then is_available can be eliminated and replaced by the test user_id IS NULL.
Pseudo-code:
BEGIN;
$code <- SELECT code FROM t WHERE user_id IS NULL LIMIT 1 FOR UPDATE;
UPDATE t SET user_id = '$user_id' WHERE code = '$code';
COMMIT;
It's atomic. It assigns some code to the given user_id. It allows a user to have multiple "codes".
If a user can have only one "code", I think the transaction can be simplified.
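One hypothetical reading of that simplification: if each user_id can own at most one code, a single UPDATE can claim a free row and a follow-up SELECT by user_id can return it, with no explicit transaction. In MySQL that would be UPDATE t SET user_id = ? WHERE user_id IS NULL LIMIT 1 followed by SELECT code FROM t WHERE user_id = ?. The runnable sketch uses sqlite3, where UPDATE ... LIMIT is only available in builds compiled with SQLITE_ENABLE_UPDATE_DELETE_LIMIT, so it picks the row with a subquery instead (MySQL rejects that form with error 1093 because the subquery reads the table being updated). The UNIQUE constraint on user_id, which still allows many NULLs, is an assumption added to enforce the one-code-per-user rule.

```python
import sqlite3
from typing import Optional


def assign_code(conn: sqlite3.Connection, user_id: str) -> Optional[str]:
    """Give user_id a free code and return it (None if no code is free).

    Assumes one code per user: the SELECT below relies on user_id matching one row.
    """
    # A single statement is atomic on its own, so no BEGIN/COMMIT is needed.
    conn.execute(
        "UPDATE t SET user_id = ? "
        "WHERE code = (SELECT code FROM t WHERE user_id IS NULL ORDER BY code LIMIT 1)",
        (user_id,),
    )
    row = conn.execute("SELECT code FROM t WHERE user_id = ?", (user_id,)).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit each statement
conn.execute("CREATE TABLE t (code TEXT PRIMARY KEY, user_id TEXT UNIQUE)")
conn.executemany("INSERT INTO t (code) VALUES (?)", [("ABC",), ("XYZ",)])
print(assign_code(conn, "u1"), assign_code(conn, "u2"), assign_code(conn, "u3"))
# ABC XYZ None
```

Calling assign_code twice for the same user while free codes remain would raise sqlite3.IntegrityError from the UNIQUE constraint, which is the point of the constraint.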

Lock tables and/or transaction for stock table?

How can I maintain data integrity for a stock table with InnoDB? Does this require both a table lock and a transaction? I have tried to find good examples, but even well-known open-source e-commerce software doesn't implement table locks or transactions. Would the code below work? Do I really need to lock the whole table, or is there a way to lock only the rows of the sold products? Or does anyone have a good example?
CREATE TABLE `stock` (
`product_id` VARCHAR(50) NOT NULL COLLATE 'utf8mb4_unicode_ci',
`quantity` INT(10) UNSIGNED NOT NULL DEFAULT '0',
PRIMARY KEY (`product_id`)
)
COLLATE='utf8mb4_unicode_ci'
ENGINE=InnoDB;
LOCK TABLES stock WRITE;
START TRANSACTION;
SET autocommit = 0;
UPDATE stock SET quantity = quantity - 4 WHERE product_id = 'PRODUCT_1' AND quantity >= 4;
UPDATE stock SET quantity = quantity - 2 WHERE product_id = 'PRODUCT_2' AND quantity >= 2;
UPDATE stock SET quantity = quantity - 5 WHERE product_id = 'PRODUCT_3' AND quantity >= 5;
COMMIT or ROLLBACK;
COMMIT if all UPDATE queries have 1 affected row, otherwise ROLLBACK
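It should not be necessary to do the table lock, unless you want to make sure concurrent clients running the same code don't run simultaneous updates. I'm not sure that matters, because each of your updates is working on distinct subsets of rows. Also note that START TRANSACTION implicitly releases any table locks acquired with LOCK TABLES, so in the code above the table lock is already gone before the first UPDATE runs.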
In any case, it's not necessary to set autocommit=0 after you start a transaction. Starting a transaction implicitly means that it won't commit until you use COMMIT (or some statement causes an implicit commit).
You might even wrap all three updates into one update, except that you said you want to make sure each update changes at least one row.
"1 affected row" is not what is tested; what gets reported is failure, such as an error, a deadlock, or a timeout. If you want to fail unless each UPDATE affects exactly 1 row, you need to fetch rows_affected after each statement, test it for 1, and then ROLLBACK or not.
Also, the COMMIT or ROLLBACK needs to be "if ... then ROLLBACK else COMMIT". This is best done in your application language.
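A sketch of that check in an application language, here Python with the standard-library sqlite3 module so it runs standalone (with a MySQL DB-API driver the pattern is the same: read cursor.rowcount after each UPDATE). The reserve_stock name and the order dict are hypothetical. Applying the updates in sorted product_id order is an extra precaution, in line with the usual advice to lock rows in a consistent order.

```python
import sqlite3
from typing import Dict


def reserve_stock(conn: sqlite3.Connection, order: Dict[str, int]) -> bool:
    """Decrement stock for every product in the order, all or nothing."""
    conn.execute("BEGIN")
    try:
        for product_id, qty in sorted(order.items()):
            cur = conn.execute(
                "UPDATE stock SET quantity = quantity - ? "
                "WHERE product_id = ? AND quantity >= ?",
                (qty, product_id, qty),
            )
            if cur.rowcount != 1:  # too little stock, or no such product
                conn.execute("ROLLBACK")
                return False
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE stock (product_id TEXT PRIMARY KEY, quantity INTEGER NOT NULL)")
conn.executemany("INSERT INTO stock VALUES (?, ?)", [("PRODUCT_1", 10), ("PRODUCT_2", 3)])
print(reserve_stock(conn, {"PRODUCT_1": 4, "PRODUCT_2": 2}))  # True
print(reserve_stock(conn, {"PRODUCT_1": 1, "PRODUCT_2": 5}))  # False, nothing changed
print(conn.execute("SELECT quantity FROM stock ORDER BY product_id").fetchall())  # [(6,), (1,)]
```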

MySQL FOR UPDATE exclusive access

I have several servers hitting a common MySQL box and I need exclusive access to a table of scheduled jobs.
After some reading here and elsewhere I was led to believe SELECT...FOR UPDATE was what I wanted, but now we are (very rarely) seeing multiple servers pick up the same record.
Here's the PROC (minus the BEGIN/END stuff because it was playing hell with my formatting):
CREATE DEFINER=`root`@`%` PROCEDURE `PopScheduledJob`(OUT `JobId` varchar(36) )
SELECT ScheduledJobId INTO JobId
FROM scheduledjob
WHERE
Status = 0
AND NextRun < UTC_TIMESTAMP()
ORDER BY StartDate
LIMIT 1
FOR UPDATE;
UPDATE scheduledjob
SET Status = 2
WHERE ScheduledJobId = JobId;
So the intent here is that it should only pick up a job with Status=0, and it sets it to 1 immediately.
My hope was that this would prevent any other thread/process from accessing the same record, but now it seems that's not the case.
EDIT: forgot to mention, we have an InnoDB backing store

Skip locked rows in mysql update to avoid lock timeout

My question is similar to:
Ignoring locked row in a MySQL query
except that I have already implemented logic close to what's suggested in the accepted answer. My question is how to set the process id initially. All servers run a query like the following (the code is in Ruby on Rails, but this is the resulting MySQL query):
UPDATE (some_table) SET process_id=(some process_id) WHERE (some condition on row_1) AND process_id is null ORDER BY (row_1) LIMIT 100
What happens now is that all processes try to update the same rows, the rows get locked, and the processes time out waiting for the lock. I would like the servers to ignore the rows that are locked (because after the lock is released, process_id won't be null anymore, so there is no point in waiting for it here).
I could try to randomize the batch of records to update but the problem is I want to prioritize the update based on row_1 as in the query above.
So my question is, is there a way in mysql to check if a record is locked and ignore it if it is?
No, before MySQL 8.0 there is no way to ignore already-locked rows. (MySQL 8.0 added SKIP LOCKED, but only for locking reads such as SELECT ... FOR UPDATE, not for UPDATE statements like yours.) Your best bet will be to ensure that nothing locks any row for any extended period of time. That will ensure that any lock conflicts are very short in duration. That will generally mean "advisory" locking of rows by locking them within a transaction (using FOR UPDATE) and updating the row to mark it as "locked".
For example, first you want to find your candidate row(s) without locking anything:
SELECT id FROM t WHERE lock_expires IS NULL AND lock_holder IS NULL <some other conditions>;
Now lock only the row you want, very quickly:
START TRANSACTION;
SELECT * FROM t WHERE id = <id> AND lock_expires IS NULL AND lock_holder IS NULL FOR UPDATE;
UPDATE t SET lock_expires = <some time>, lock_holder = <me> WHERE id = <id>;
COMMIT;
(Technical note: If you are planning to lock multiple rows, always lock them in a specific order. Ascending order by primary key is a decent choice. Locking out-of-order or in random order will subject your program to deadlocks from competing processes.)
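The same ordering rule applies to any kind of lock, not only MySQL's. The Python sketch below uses a threading.Lock per row as a hypothetical stand-in for InnoDB row locks: two workers ask for the same two rows in opposite orders, but because both sort the primary keys first, neither can hold one row while waiting for the row the other holds.

```python
import threading
from typing import Dict, Iterable, List

# Hypothetical per-row locks standing in for InnoDB row locks, keyed by primary key.
row_locks: Dict[int, threading.Lock] = {pk: threading.Lock() for pk in range(1, 11)}


def lock_rows(pks: Iterable[int]) -> List[int]:
    """Acquire the locks for pks in ascending primary-key order."""
    ordered = sorted(set(pks))
    for pk in ordered:
        row_locks[pk].acquire()
    return ordered


def unlock_rows(held: List[int]) -> None:
    for pk in reversed(held):
        row_locks[pk].release()


def run_worker(pks: List[int], rounds: int, finished: List[str]) -> None:
    for _ in range(rounds):
        held = lock_rows(pks)
        try:
            pass  # ... process the locked rows here ...
        finally:
            unlock_rows(held)
    finished.append(threading.current_thread().name)


# The two workers request rows in opposite orders, but lock_rows always
# takes row 3 before row 7, so a circular wait cannot form.
finished: List[str] = []
threads = [
    threading.Thread(target=run_worker, args=([3, 7], 10_000, finished), name="A"),
    threading.Thread(target=run_worker, args=([7, 3], 10_000, finished), name="B"),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=30)
print(sorted(finished))  # ['A', 'B']
```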
Now you can take as long as you want (less than lock_expires) to process your row(s) without blocking any other process (they won't match the row during the non-locking select, so will always ignore it). Once the row is processed, you can UPDATE or DELETE it by id, also without blocking anything.
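A sketch of the whole claim/process/finish cycle in Python, using sqlite3 so it runs standalone. Note one deliberate swap: instead of a SELECT ... FOR UPDATE followed by an UPDATE, each claim is a single conditional UPDATE (a compare-and-set whose WHERE clause repeats the "still unclaimed" test), and the affected-row count says whether this process won the row. That is a common alternative, not the answer's exact statements. The table t, its columns, and the helper names are hypothetical; lock_expires is stored as a Unix timestamp.

```python
import sqlite3
import time
from typing import Optional


def claim_row(conn: sqlite3.Connection, me: str, lease_seconds: int = 300) -> Optional[int]:
    """Claim one unclaimed row for `me`; return its id, or None if none is free."""
    # 1. Find candidates without locking anything.
    candidates = [
        row_id
        for (row_id,) in conn.execute(
            "SELECT id FROM t WHERE lock_expires IS NULL AND lock_holder IS NULL ORDER BY id"
        )
    ]
    for row_id in candidates:
        # 2. Claim it with one short statement. If someone else got there first,
        #    the WHERE clause no longer matches and rowcount is 0: try the next one.
        cur = conn.execute(
            "UPDATE t SET lock_expires = ?, lock_holder = ? "
            "WHERE id = ? AND lock_expires IS NULL AND lock_holder IS NULL",
            (time.time() + lease_seconds, me, row_id),
        )
        if cur.rowcount == 1:
            return row_id
    return None


def finish_row(conn: sqlite3.Connection, row_id: int, me: str) -> None:
    # 3. After the (possibly slow) processing, delete the row by id.
    conn.execute("DELETE FROM t WHERE id = ? AND lock_holder = ?", (row_id, me))


conn = sqlite3.connect(":memory:", isolation_level=None)  # each statement commits at once
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, lock_expires REAL, lock_holder TEXT)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(1,), (2,), (3,)])
first = claim_row(conn, "server-a")   # 1
second = claim_row(conn, "server-b")  # 2, because row 1 is already claimed
finish_row(conn, first, "server-a")
print(first, second, claim_row(conn, "server-a"), claim_row(conn, "server-b"))  # 1 2 3 None
```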

MySQL Select... for update with index has concurrency issue

This is a follow up on my previous question (you can skip it as I explain in this post the issue):
MySQL InnoDB SELECT...LIMIT 1 FOR UPDATE Vs UPDATE ... LIMIT 1
Environment:
JSF 2.1 on Glassfish
JPA 2.0 EclipseLink and JTA
MySQL 5.5 InnoDB engine
I have a table:
CREATE TABLE v_ext (
v_id INT NOT NULL AUTO_INCREMENT,
product_id INT NOT NULL,
code VARCHAR(20),
username VARCHAR(30),
PRIMARY KEY (v_id)
) ENGINE=InnoDB DEFAULT CHARSET=UTF8;
It is populated with 20,000 records like this one (product_id is 54 for all records, code is randomly generated and unique, username is set to NULL):
v_id product_id code username
-----------------------------------------------------
1 54 '20 alphanumerical' NULL
...
20,000 54 '20 alphanumerical' NULL
When a user purchases product 54, he gets a code from that table. If the user purchases multiple times, he gets a code each time (there is no unique constraint on username). Because I am preparing for high activity, I want to make sure that:
No concurrency/deadlock can occur
Performance is not impacted by the locking mechanism which will be needed
From the SO question (see link above) I found that doing such a query is faster:
START TRANSACTION;
SELECT v_id FROM v_ext WHERE username IS NULL LIMIT 1 FOR UPDATE;
-- Use the result for the next query
UPDATE v_ext SET username=xxx WHERE v_id=...;
COMMIT;
However, I found a deadlock issue ONLY when using an index on the username column. I thought adding an index would help speed things up a little, but it creates a deadlock after about 19,970 records (quite consistently at this number of rows). Is there a reason for this? I don't understand. Thank you.
From a purely theoretical point of view, it looks like you are not locking the right rows (different condition in the first statement than in the update statement; besides you only lock one row because of LIMIT 1, whereas you possibly update more rows later on).
Try this:
START TRANSACTION;
SELECT v_id FROM v_ext WHERE username IS NULL AND v_id=yyy FOR UPDATE;
UPDATE v_ext SET username=xxx WHERE v_id=yyy;
COMMIT;
[edit]
As for the reason for your deadlock, this is the probable answer (from the manual):
If you have no indexes suitable for your statement and MySQL must scan
the entire table to process the statement, every row of the table
becomes locked (...)
Without an index, the SELECT ... FOR UPDATE statement is likely to lock the entire table, whereas with an index, it only locks some rows. Because you didn't lock the right rows in the first statement, an additional lock is acquired during the second statement.
Obviously, a deadlock cannot happen if the whole table is locked (i.e. without an index).
A deadlock can certainly occur in the second setup.
First of all, the definition of the table is wrong. You have no tid column in the table, so I suspect the primary key is v_id.
Second, if you select for update, you lock the row. Any other select arriving before the first transaction is done will wait for the row to be released, because it will hit the exact same record. So you will have waits for this row.
However, I doubt this can be a serious problem in your case, because, first, you have the username there, and second, you have the product id there. It is extremely unlikely that you will have a lot of hits on the exact same record you hit initially, and even if you do, the transaction should run very fast.
You have to understand that by using transactions, you usually give up pretty much on concurrency for consistent data. There is no way to support consistency of data and concurrency at the same time.