I have several servers hitting a common MySQL box and I need exclusive access to a table of scheduled jobs.
After some reading here and elsewhere I was led to believe SELECT...FOR UPDATE was what I wanted, but now we are (very rarely) seeing multiple servers pick up the same record.
Here's the PROC (minus the BEGIN/END stuff because it was playing hell with my formatting):
CREATE DEFINER=`root`@`%` PROCEDURE `PopScheduledJob`(OUT `JobId` varchar(36))
SELECT ScheduledJobId INTO JobId
FROM scheduledjob
WHERE
Status = 0
AND NextRun < UTC_TIMESTAMP()
ORDER BY StartDate
LIMIT 1
FOR UPDATE;
UPDATE scheduledjob
SET Status = 2
WHERE ScheduledJobId = JobId;
So the intent here is that it should only pick up a job with Status=0, and it sets the status to 2 immediately.
My hope was that this would prevent any other thread/process from accessing the same record, but now it seems that's not the case.
EDIT: forgot to mention, we have an InnoDB backing store
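One likely culprit, assuming the procedure is called with autocommit enabled: each statement then runs in its own implicit transaction, so the row lock taken by FOR UPDATE is released as soon as the SELECT commits, leaving a window before the UPDATE in which another server can pick up the same row. A minimal sketch with an explicit transaction held across both statements (same schema as above):

CREATE DEFINER=`root`@`%` PROCEDURE `PopScheduledJob`(OUT `JobId` varchar(36))
BEGIN
    START TRANSACTION;

    -- The row lock taken here is now held until COMMIT, covering the UPDATE
    SELECT ScheduledJobId INTO JobId
    FROM scheduledjob
    WHERE Status = 0
      AND NextRun < UTC_TIMESTAMP()
    ORDER BY StartDate
    LIMIT 1
    FOR UPDATE;

    UPDATE scheduledjob
    SET Status = 2
    WHERE ScheduledJobId = JobId;

    COMMIT;
END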
Suppose I ran this statement:
UPDATE Employees set country='AU'
On an InnoDB table, Employees, with about 10 million rows.
This table is also actively being updated by other users through SQL queries like the one below:
E.g. a User, ID = 20, changes their country to NZ:
UPDATE Employees set country='NZ' where id = 20
In that case, will any further updates to this table block until the general update completes?

If so, is there a way to allow specific updates and the general update to run concurrently, provided they are not updating the same row? (To clarify: suppose the general update has finished updating Employees with ids 1-50 and is now updating Employees 51 to ~10 million; a single update on the Employee with id 20 should go through without waiting for the general update to finish.)
Yes, the first update will place exclusive locks on all records in the table, blocking other queries from updating it. The locks are held until the transaction is committed.

No. The locks are held while the transaction is running and are released when the transaction is committed. You may want to update the table in chunks rather than in one big bang, so the first update never locks the entire table at once. Or execute the update outside of business hours, if possible.
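A minimal sketch of the chunked approach, assuming id is the table's primary key (the 10,000-row batch size is arbitrary):

-- Each batch commits on its own, so a single-row update such as the
-- one for id = 20 waits at most for one small batch, never the whole table.
UPDATE Employees SET country = 'AU' WHERE id BETWEEN 1 AND 10000;
UPDATE Employees SET country = 'AU' WHERE id BETWEEN 10001 AND 20000;
-- ...continue in 10,000-row steps up to the maximum id.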
Let's "think out of the box"...
Have 2 columns. One with the counter; one (dy) with the DATE of the last increment of the counter. Then make the bumping of the counter a little more complex -- namely to reset it to 1 if the date is before today. Also (always) update the date to CURDATE().
Something like
UPDATE t
SET counter = IF (dy = CURDATE(), counter + 1, 1),
dy = CURDATE()
WHERE id = 123
This eliminates the big, nightly, update.
To fetch the counter for the current day,
SELECT IF(dy = CURDATE(), counter, 0) AS counter
FROM t
WHERE id = 123;
This technique also avoids having to run the big update at exactly midnight, and it sidesteps a second-order "bug": if the machine happens to be down at midnight, the update simply never runs for that day.
I need to select, manipulate, and update a large amount of data in under 3 minutes. We decided to build a locking mechanism so that separate processes can run in parallel, with each process locking, selecting, and updating its own rows.

To make this possible, we decided to add the column worker_id to the table.
Table structure:
CREATE TABLE offers
(
id int(10) unsigned PRIMARY KEY NOT NULL AUTO_INCREMENT,
offer_id int(11) NOT NULL,
offer_sid varchar(255) NOT NULL,
offer_name varchar(255),
account_name varchar(255),
worker_id varchar(255)
);
CREATE UNIQUE INDEX offers_offer_id_offer_sid_unique ON offers (offer_id, offer_sid);
CREATE INDEX offers_offer_id_index ON offers (offer_id);
CREATE INDEX offers_offer_sid_index ON offers (offer_sid);
Also, we decided to start with 5 parallel processes. To prevent different processes from selecting the same rows, we use the formula: offer_id % max_amount_of_processes = process_number (process_number starts from 0, so the first process is 0 and the last is 4).
Each process follows these steps:

1. Set worker_id to the current process id on the first 1000 rows using the query: update offers set worker_id = :process_id where worker_id is null and offer_id % 5 = :process_number order by offer_id asc limit 1000
2. Select those rows: select * from offers where worker_id = :process_id order by offer_id asc limit 1000
3. Manipulate the data, storing the last offer_id in a variable and the prepared data in another variable for the later update
4. Run the same query from step 1 to lock the next 1000 rows
5. Run the same query as in step 2, with the additional where clause offer_id > :last_selected_id, to select the next 1000 rows
6. Repeat these steps in a loop until all rows are locked
7. Remove all locks: update offers set worker_id = null where worker_id = :process_id
8. Run the query to update all collected data

The other 4 processes follow the same steps.
The issue here is that I'm getting a deadlock when all 5 processes simultaneously run the query from step 1 to lock rows (set worker_id), even though each process locks its own rows according to the formula. I tried setting the transaction isolation level to READ COMMITTED, but the issue remains.

I'm a novice with locking mechanisms, and I need help to prevent deadlocks here or to design a better mechanism.
The expression offer_id % 5 = :process_number cannot use an index, so the statement can only scan all the rows matched by the first condition, worker_id is null.
You can prove this with two windows:
mysql1> begin;
mysql1> set @p=1;
mysql1> update offers set worker_id = @p where worker_id is null and offer_id%5 = @p;
Don't commit the transaction in window 1 yet.
mysql2> set @p=2;
mysql2> update offers set worker_id = @p where worker_id is null and offer_id%5 = @p;
...waits for about 50 seconds, or the value of innodb_lock_wait_timeout, then...
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
This demonstrates that each concurrent session locks overlapping sets of rows, not only the rows that match the modulus expression. So the sessions queue up against each other's locks.
This will get worse if you put all the steps into a transaction as @SloanThrasher suggests. Making the work of each worker take longer will make it hold its locks longer, further delaying the other processes waiting on those locks.
"I do not understand how the updated_at field can cause the issue as I'm still updating other fields"
I'm not sure because you haven't posted the InnoDB deadlock diagnostics from SHOW ENGINE INNODB STATUS.
I do notice that your table has a secondary UNIQUE KEY, which will also require locks. There are some cases of deadlocks that occur because of non-atomicity of the lock assignment.
Worker 1                                   Worker 2
UPDATE SET worker_id = 1
(acquires locks on PK)
                                           UPDATE SET worker_id = 2
                                           (waits for PK locks held by worker 1)
(waits for locks on UNIQUE KEY)
Both worker 1 and worker 2 can therefore be waiting on each other, and enter into a deadlock.
This is just a guess. Another possibility is that the ORM is doing a second UPDATE for the updated_at column, and this introduces another opportunity for a race condition. I haven't quite worked that out mentally, but I think it's possible.
Below is a recommendation for a different system that would avoid these problems:
There's another problem: you're not really balancing the work over your processes to achieve the best completion time. There might not be an equal number of offers in each group when you split them by modulus, and each offer might not take the same amount of time to process anyway. So some of your workers could finish and have nothing to do while the last worker is still processing its work.
You can solve both problems, the locking and the load-balancing:
Change the table columns in the following way:
ALTER TABLE offers
CHANGE worker_id work_state ENUM('todo', 'in progress', 'done') NOT NULL DEFAULT 'todo',
ADD INDEX (work_state),
ADD COLUMN updated_at DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
ADD INDEX (updated_at);
Create ONE process that reads from the table periodically, and adds the primary key id values of offers in a 'todo' state to a message queue. All the offers, regardless of their offer_id value, get queued in the same way.
SELECT id FROM offers WHERE work_state = 'todo'
/* push each id onto the queue */
Then each of the workers can pull one id at a time from the message queue. The worker does the following steps with each id:
UPDATE offers SET work_state = 'in progress' WHERE id = :id
The worker performs the work for its one offer.
UPDATE offers SET work_state = 'done' WHERE id = :id
These worker queries only reference one offer at a time, and they address the offers by primary key, which will use the PK index and only lock one row at a time.
Once it has finished one offer, then the worker pulls the next offer from the queue.
In this way, the workers will all finish at about the same time, and the work will be balanced over the workers better. Also, you can start or stop workers at any time, and you don't care about which worker number they are, because your offers don't need to be processed by a worker whose number matches the modulus of the offer_id.
When the workers finish all the offers, the message queue will be empty. Most message queues allow workers to do blocking reads, so while the queue is empty, the worker will just wait for the read to return. When you use a database, the workers have to poll frequently for new work.
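If you do stay with the database and polling instead of a message queue, here is a hypothetical single-statement claim (the LAST_INSERT_ID(expr) trick is documented MySQL behavior, but treat the query as a sketch):

-- Atomically claim one 'todo' row; the no-op assignment
-- id = LAST_INSERT_ID(id) stashes the claimed id in the session.
UPDATE offers
SET work_state = 'in progress',
    id = LAST_INSERT_ID(id)
WHERE work_state = 'todo'
ORDER BY id
LIMIT 1;

-- ROW_COUNT() = 0 means nothing was todo; otherwise fetch the claimed id.
SELECT IF(ROW_COUNT() > 0, LAST_INSERT_ID(), NULL) AS claimed_id;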
There's a chance a worker will fail during its work, and never mark an offer 'done'. You need to check periodically for orphaned offers. Assume they are not going to be completed, and mark their state 'todo'.
UPDATE offers SET work_state = 'todo'
WHERE work_state = 'in progress' AND updated_at < NOW() - INTERVAL 5 MINUTE
Choose the interval length so it's certain that any worker would have finished it by that time unless something had gone wrong. You would probably do this "reset" before the dispatcher queries for current offers todo, so the offers that had been forgotten will be re-queued.
I found the issue. It was because my ORM by default updates timestamp fields (to simplify the example above I removed them from the table structure) while doing an update operation; after I turned that off, the deadlock disappeared. But still, I do not understand how the updated_at field can cause the issue, as I'm still updating other fields.
Here is my Query:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
DROP TEMPORARY TABLE IF EXISTS taken;
CREATE TEMPORARY TABLE taken(
id int,
invoice_id int
);
INSERT INTO taken(id, invoice_id)
SELECT id, $invoice_id FROM `licenses` l
WHERE l.`status` = 0 AND `type` = $type
LIMIT $serial_count
FOR UPDATE;
UPDATE `licenses` SET `status` = 1
WHERE id IN (SELECT id FROM taken);
If I'm going to face high concurrency, is the query above thread-safe? I mean, I don't want to assign records which have already been assigned to someone else.
With your FOR UPDATE clause, you are locking all selected licenses until you perform the update, so you can be sure that there will not be a concurrency problem on those records.

The only problem I can see is that if your query takes a long time to run (how many licenses do you expect to process per query?) while other queries need locks on the same licenses at the same time (plain SELECTs use non-locking consistent reads in InnoDB, but locking reads and updates will block), your system will be slowed down.
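If the MySQL version is 8.0 or newer, another option worth considering (a sketch, not a drop-in replacement for the query above): FOR UPDATE SKIP LOCKED lets concurrent claimers pass over rows that another transaction has already locked instead of queuing behind them, so each transaction claims a disjoint set of licenses.

SELECT id FROM `licenses`
WHERE `status` = 0 AND `type` = $type
LIMIT $serial_count
FOR UPDATE SKIP LOCKED;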
(Please answer as generally as possible. But I am working in MS SQL Server and MySql, so if there is no general answer, go ahead and answer for one or both of those.)
Consider a reservation system implemented in a SQL database. I want to make sure that among many multiple users, only one user gets a reservation and no other user "thinks" they got it. It's a classic concurrency problem in DB work, but I'm not sure what the best answer is.
SPECIFICS:
Suppose each user has a UserID. We could imagine a few users are currently trying to make a reservation with UserID values of 1004, 1005, 1009, and 1011.
Suppose the resource and reservations are stored in a table, SEATS.
We could imagine at one point the SEATS table contains:
----- SEATS -----------------------------
SeatID UserID ResvTime
1 1017 2014.07.15 04:17:18.000
2 NULL NULL
3 NULL NULL
4 1012 2014.07.15 04:19:35.000
5 1003 2014.07.15 04:20:46.000
-----------------------------------------
Now suppose that "at the same time", users 1004 and 1005 try to grab SeatID 3.
I'm wanting to know what SQL will properly make sure that only one of them gets the seat and the other gets a refusal. The simplest version of the code I can think of, in T-SQL, would be:
CREATE PROC GRABSEAT @seatid INT, @userid INT, @obtained BIT OUTPUT
AS
BEGIN
    DECLARE @avail INT
    SET @avail = (SELECT UserID FROM SEATS WHERE (SeatID = @seatid))
    IF (@avail IS NULL)
    BEGIN
        UPDATE SEATS SET UserID = @userid, ResvTime = GETDATE() WHERE (SeatID = @seatid)
        SET @obtained = 1
    END
    ELSE
        SET @obtained = 0
END
But the question is how to prevent multiple concurrent users, all executing this PROC, from getting a TRUE return on the same seat (say SeatID = 3).

For example, if users 1004 and 1005 execute this PROC nearly simultaneously, they could both do the SELECT and get @avail = NULL before either of them tries to set the UserID column. Then both of them would run the UPDATE statement. Assuming nothing even worse could result, one would overwrite the other's write; both would think they got the seat, but only the one whose UPDATE ran last would have their data stored in the SEATS table. The other user's data would be overwritten. This is the classic "lost update" problem.

But what is the way in a SQL database to prevent it? I have been assuming that each single SQL statement executes as a TRANSACTION, and a TRANSACTION has the four so-called "ACID" properties. These properties are what I need. So, I think the answer, in a SQL database, is:
BEGIN TRANSACTION
EXECUTE GRABSEAT @seatid = <id1>, @userid = <id2>, @obtained = @obtained OUTPUT
COMMIT
By doing so, the main property I need (isolation), will guarantee that the interleaved execution I'm worried about will not occur.
But I've seen articles that say it's not that simple at all. I think the big problem various articles point to is that not every TRANSACTION really runs in total atomicity and isolation. So, perhaps the above wrapping in a TRANSACTION will not achieve the desired result. If not, then what is needed?
A transaction is atomic by definition. But when a transaction's changes become visible to other users / connections / transactions depends on the isolation level. The default isolation in SQL Server is READ COMMITTED - see this question's answer for more info and links on how to change it.
For this type of scenario, you probably want SERIALIZABLE. The good news is that you can change the isolation level for a transaction with a SET TRANSACTION ISOLATION LEVEL statement in your stored proc. The bad news is that you have to be 100% sure that this is the only place in your code that ever updates the SEATS table.
Fundamentally, the issue you have is a race condition. Just because you are in a transaction does not mean that two transactions can't both call the stored proc at the same time and both run the SELECT; then both transactions think it's OK to do the UPDATE. Under SERIALIZABLE, the SELECT takes key-range locks that are held to the end of the transaction, so the second transaction cannot sneak in between the first one's SELECT and UPDATE. (In practice you would combine this with an UPDLOCK hint on the SELECT; otherwise both transactions can acquire shared locks on the same row and then deadlock when each tries to upgrade for the UPDATE.)
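A sketch of that version (hedged: the UPDLOCK hint is my addition, for the lock-upgrade reason above):

CREATE PROC GRABSEAT @seatid INT, @userid INT, @obtained BIT OUTPUT
AS
BEGIN
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRANSACTION;
    DECLARE @avail INT;
    -- UPDLOCK makes the read take an update lock, so a second caller
    -- blocks here until COMMIT instead of deadlocking on lock upgrade.
    SELECT @avail = UserID FROM SEATS WITH (UPDLOCK) WHERE SeatID = @seatid;
    IF (@avail IS NULL)
    BEGIN
        UPDATE SEATS SET UserID = @userid, ResvTime = GETDATE() WHERE SeatID = @seatid;
        SET @obtained = 1;
    END
    ELSE
        SET @obtained = 0;
    COMMIT;
END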
Instead of the SELECT-then-UPDATE, why not just do the update with an extra filter, UserID IS NULL, so it cannot overwrite a seat that is already taken, and then return whether the query had any effect. That way the operation is atomic, since it's just one statement.
CREATE PROC GRABSEAT @seatid INT, @userid INT, @obtained BIT OUTPUT
AS
BEGIN
    UPDATE SEATS SET UserID = @userid, ResvTime = GETDATE()
    WHERE (SeatID = @seatid) AND UserID IS NULL
    SET @obtained = @@ROWCOUNT
END
Due to row locking, two updates can't happen simultaneously, so one will succeed (@@ROWCOUNT = 1) and the other will fail (@@ROWCOUNT = 0).
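A quick usage sketch with the users from the question (hypothetical calling code):

DECLARE @got BIT;
EXEC GRABSEAT @seatid = 3, @userid = 1004, @obtained = @got OUTPUT;
-- If user 1005 makes the same call concurrently, exactly one of the
-- two sessions gets @got = 1; the other gets @got = 0.
SELECT @got AS obtained;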
I would like a suggestion on improving my setup, which is causing SQL Server to return deadlock messages. I have a multithreaded application that uses the Task Parallel Library, and each task uses a stored procedure to select an id from a table to use in its processing. I immediately delete that id from the table in the same procedure, and I think that is what is causing the deadlocks. The table consists of one column of unique ids with no indexes. I thought of doing a batch delete periodically, but that would mean keeping a tally of used ids across multiple servers.
here is my sql stored procedure:
CREATE PROCEDURE [dbo].[get_Ids]
    @id nvarchar(20) OUTPUT
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;
    SELECT TOP 1 @id = siteid FROM siteids
    DELETE FROM siteids WHERE siteid = @id
END
Is there any better way to do this? My processes work very fast; I used to request this id from a web service, but that took 3 seconds.
Some things to try:
Maybe try hinting to the DB that you will delete the record you just selected; this way it will grab the lock early. For this to work you'll need to wrap the whole procedure in a transaction, then hint the SELECT. It should look something like:
BEGIN TRANSACTION
SELECT TOP 1 @id = siteid FROM siteids WITH (UPDLOCK, HOLDLOCK)
DELETE FROM siteids WHERE siteid = @id
COMMIT TRANSACTION
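A single-statement alternative (a sketch, assuming SQL Server 2005+; the READPAST hint and the @taken table variable are my additions): DELETE with an OUTPUT clause claims and removes a row atomically, so there is no window between the select and the delete for another task to grab the same id.

-- Claim one id atomically; READPAST skips rows locked by concurrent
-- callers instead of waiting on them, which also avoids the deadlocks.
DECLARE @taken TABLE (siteid nvarchar(20));

DELETE TOP (1) FROM siteids WITH (ROWLOCK, READPAST)
OUTPUT deleted.siteid INTO @taken;

SELECT TOP 1 @id = siteid FROM @taken;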
Also make sure the siteid column is indexed (or tag it as the primary key, since you say it is unique); otherwise the delete has to scan the table to find the record, which could make the deadlocking worse since it spends much more time deleting.
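For example (a sketch, assuming siteid is declared NOT NULL):

ALTER TABLE siteids ADD CONSTRAINT PK_siteids PRIMARY KEY (siteid);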
For deadlocks in general, run the SQL Profiler and look at the deadlock graph; something else might be going on.