MySQL row lock and atomic updates

I am building a "poor man's queuing system" using MySQL. It's a single table containing jobs that need to be executed (the table name is queue). I have several processes on multiple machines whose job it is to call the fetch_next2 sproc to get an item off of the queue.
The whole point of this procedure is to make sure that we never let 2 clients get the same job. I thought that using SELECT ... LIMIT 1 FOR UPDATE would let me lock a single row so that I could be sure it was only updated by 1 caller (updated such that it no longer fit the criteria of the SELECT being used to filter jobs that are "READY" to be processed).
Can anyone tell me what I'm doing wrong? I just had some instances where the same job was given to 2 different processes so I know it doesn't work properly. :)
CREATE DEFINER=`masteruser`@`%` PROCEDURE `fetch_next2`()
BEGIN
SET @id = (SELECT q.Id FROM queue q WHERE q.State = 'READY' LIMIT 1 FOR UPDATE);
UPDATE queue
SET State = 'PROCESSING', Attempts = Attempts + 1
WHERE Id = @id;
SELECT Id, Payload
FROM queue
WHERE Id = @id;
END

Code for the answer:
CREATE DEFINER=`masteruser`@`%` PROCEDURE `fetch_next2`()
BEGIN
SET @id := 0;
UPDATE queue SET State='PROCESSING', Attempts=Attempts+1, Id=(SELECT @id := Id) WHERE State='READY' LIMIT 1;
# You can check IF @id != 0 here (0 means no READY job was claimed)
SELECT Id, Payload
FROM queue
WHERE Id = @id;
END
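The problem with what you are doing is that there is no atomic grouping for the operations. You are using the SELECT ... FOR UPDATE syntax. The docs say that it blocks other transactions "from reading the data in certain transaction isolation levels", but that only helps while the lock exists, and your procedure never starts a transaction: with autocommit on, the lock does not outlive the SET statement. Between your first SELECT and your UPDATE, another caller can run the same SELECT and get the same Id. Are you using MyISAM or InnoDB? MyISAM supports neither transactions nor row locks, so FOR UPDATE cannot help there at all.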
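A minimal sketch of fetch_next2 with the row lock actually held until the UPDATE is done, assuming InnoDB and MySQL 8.0 or later (for SKIP LOCKED). The column types in the CREATE TABLE are guesses, since the question doesn't show the schema, and the DEFINER clause is left out so it defaults to the current user. Without SKIP LOCKED, concurrent callers wait for the first caller's row lock instead of skipping ahead to the next READY row.

```sql
-- Assumed schema (types are guesses); InnoDB is required for row locks.
CREATE TABLE IF NOT EXISTS queue (
  Id       INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  State    VARCHAR(20) NOT NULL DEFAULT 'READY',
  Attempts INT NOT NULL DEFAULT 0,
  Payload  TEXT,
  KEY idx_state (State)
) ENGINE = InnoDB;

DROP PROCEDURE IF EXISTS fetch_next2;
DELIMITER $$
CREATE PROCEDURE fetch_next2()
BEGIN
  DECLARE v_id INT DEFAULT NULL;

  -- The explicit transaction keeps the lock from the locking read
  -- until COMMIT, so no other caller can claim the same row in between.
  START TRANSACTION;

  SELECT Id INTO v_id
  FROM queue
  WHERE State = 'READY'
  ORDER BY Id              -- oldest job first
  LIMIT 1
  FOR UPDATE SKIP LOCKED;  -- MySQL 8.0+: ignore rows other callers hold

  IF v_id IS NOT NULL THEN
    UPDATE queue
    SET State = 'PROCESSING', Attempts = Attempts + 1
    WHERE Id = v_id;
  END IF;

  COMMIT;

  -- Returns zero rows when nothing was READY.
  SELECT Id, Payload FROM queue WHERE Id = v_id;
END$$
DELIMITER ;
```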
The easiest way to make sure this works properly is to lock the table.
[Edit] The method I describe right here is slower than using the Id=(SELECT @id := Id) method in the code above.
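A rough sketch of the table-lock approach, assuming the queue table from the question (Id, State, Attempts, Payload). LOCK TABLES is not permitted inside stored programs, so these statements have to be sent by the client rather than living in fetch_next2, and every other session that touches queue waits until UNLOCK TABLES.

```sql
-- Run from the client: LOCK TABLES cannot be used inside a stored procedure.
LOCK TABLES queue WRITE;   -- all other sessions now wait on queue

SET @id = NULL;            -- avoid reusing a stale value if nothing is READY
SELECT Id INTO @id FROM queue WHERE State = 'READY' ORDER BY Id LIMIT 1;

UPDATE queue
SET State = 'PROCESSING', Attempts = Attempts + 1
WHERE Id = @id;

UNLOCK TABLES;             -- release as soon as the row is claimed

SELECT Id, Payload FROM queue WHERE Id = @id;
```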
Another method would be to do the following:
Have a column that is normally set to 0.
Do an "UPDATE ... SET ColName=UNIQ_ID WHERE ColName=0 LIMIT 1". That makes sure only 1 process can update that row; then fetch the row with a SELECT afterwards. (UNIQ_ID is not a MySQL feature, just a placeholder for a value that is unique to each caller.)
If you need a unique ID, you can use a table with auto_increment just for that.
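A sketch of the claim-by-unique-value idea above, assuming the queue table from the question. ClaimedBy and worker_ticket are hypothetical names: worker_ticket is the auto_increment table used only to hand out unique values, and since LAST_INSERT_ID() is tracked per connection, each worker reads back only its own ticket.

```sql
-- One-time setup (hypothetical names; the ALTER fails if run twice).
CREATE TABLE IF NOT EXISTS worker_ticket (
  Id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY
) ENGINE = InnoDB;
ALTER TABLE queue
  ADD COLUMN ClaimedBy BIGINT NOT NULL DEFAULT 0,
  ADD KEY idx_claimed_by (ClaimedBy);

-- Per fetch: draw a ticket no other worker will ever get.
INSERT INTO worker_ticket () VALUES ();
SET @ticket = LAST_INSERT_ID();

-- Claim at most one unclaimed READY row. A single UPDATE is atomic,
-- so two workers cannot both change the same row away from 0.
UPDATE queue
SET ClaimedBy = @ticket, State = 'PROCESSING', Attempts = Attempts + 1
WHERE ClaimedBy = 0 AND State = 'READY'
LIMIT 1;

-- Zero rows here means there was nothing to claim.
SELECT Id, Payload FROM queue WHERE ClaimedBy = @ticket;
```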
You can also kind of do this with transactions. If you start a transaction and run UPDATE foobar SET LockVar=19 WHERE LockVar=0 LIMIT 1; from one thread, and do the exact same thing from another thread, the second thread will wait for the first thread to commit before it gets its row. That may end up blocking the whole table, though.
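A sketch of the transactional variant, using the foobar table and LockVar column named above (their types and the index are assumptions). InnoDB locks every row an UPDATE examines, so without an index on LockVar it scans and locks the whole table; the index lets it lock only the part of the index it actually reads.

```sql
-- Hypothetical schema for the foobar example; the index on LockVar is the point.
CREATE TABLE IF NOT EXISTS foobar (
  Id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  LockVar INT NOT NULL DEFAULT 0,
  KEY idx_lockvar (LockVar)
) ENGINE = InnoDB;

-- Each worker uses its own value instead of 19.
START TRANSACTION;
UPDATE foobar SET LockVar = 19 WHERE LockVar = 0 LIMIT 1;
-- Until this session commits, another session running the same UPDATE blocks.
SELECT Id FROM foobar WHERE LockVar = 19;
COMMIT;
```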

Related

Understanding MySQL concurrency/isolation levels

I am working on the backend of an application that needs to protect an external API from too many requests per user per month. So I need to keep track of the number of requests from each user. I have a lot of experience with concurrent programming but almost no experience with DB management or MySQL.
So, suppose I want to execute the equivalent of the following pseudocode, where I mix SQL statements with application-level logic, and where lookups is a table:
mutex mtx;
set @userid = 'usrid1';
set @date = CURDATE();
set @month = CONCAT_WS('-', YEAR(@date), MONTH(@date));
mtx.lock()
select counter from lookups where userid=@userid and month=@month;
if returned rows == 0:
    insert into lookups set month=@month, userid=@userid, counter=1;
else:
    update lookups set counter=counter+1 where userid=@userid and month=@month;
mtx.unlock()
Except, of course, I don't have access to that mutex. At first I thought it would be enough to just wrap the whole thing inside a transaction, but upon closer inspection of the MySQL reference it seems that may not be enough to avoid possible race conditions, such as two threads/processes reading the same counter value. Is it good enough then, in MySQL with default settings, to do the following:
set @userid = 'usrid1';
set @date = CURDATE();
set @month = CONCAT_WS('-', YEAR(@date), MONTH(@date));
start transaction;
select counter from lookups where userid=@userid and month=@month for update;
if returned rows == 0:
    insert into lookups set month=@month, userid=@userid, counter=1;
else:
    update lookups set counter=counter+1 where userid=@userid and month=@month;
commit;
From what I can glean from the reference, it looks like it should be enough, and it should cause neither race conditions nor deadlocks, but the reference is long-winded and complex, so I wanted to ask here to be sure. Performance isn't important. The reference states that MySQL's default isolation level is REPEATABLE READ.
I suggest this solution:
create table lookups (userid varchar(20), yearmonth date, counter int, primary key (userid, yearmonth));
insert into lookups set userid = 'usrid1',
yearmonth = date_format(curdate(), '%Y-%m-01'),
counter = last_insert_id(1)
on duplicate key update
counter = last_insert_id(counter + 1);
select last_insert_id(); -- returns the new value, whether 1 or the updated value.
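This means you don't have to check whether the row exists first; the statement either inserts it or updates it atomically. (Checking first with SELECT ... FOR UPDATE can still deadlock when the row doesn't exist yet, because two sessions can both take a gap lock and then both try to insert into that gap.)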
The last_insert_id(<expression>) trick is documented at the end of the entry for that function: https://dev.mysql.com/doc/refman/8.0/en/information-functions.html#function_last-insert-id
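To see the trick in isolation, the sketch below runs the same upsert twice against a TEMPORARY copy of the table, so nothing permanent is touched (the lookups_demo name is made up). LAST_INSERT_ID(expr) returns expr and remembers it as the value of the next argument-less LAST_INSERT_ID() call on the same connection, so each caller reads back its own counter value without a second lookup.

```sql
CREATE TEMPORARY TABLE lookups_demo (
  userid    VARCHAR(20),
  yearmonth DATE,
  counter   INT,
  PRIMARY KEY (userid, yearmonth)
);

-- First call: no row yet, so the INSERT branch runs and stores 1.
INSERT INTO lookups_demo SET userid = 'usrid1',
  yearmonth = DATE_FORMAT(CURDATE(), '%Y-%m-01'),
  counter = LAST_INSERT_ID(1)
ON DUPLICATE KEY UPDATE counter = LAST_INSERT_ID(counter + 1);
SELECT LAST_INSERT_ID();  -- 1

-- Second call: the key exists, so the UPDATE branch runs and stores 2.
INSERT INTO lookups_demo SET userid = 'usrid1',
  yearmonth = DATE_FORMAT(CURDATE(), '%Y-%m-01'),
  counter = LAST_INSERT_ID(1)
ON DUPLICATE KEY UPDATE counter = LAST_INSERT_ID(counter + 1);
SELECT LAST_INSERT_ID();  -- 2
```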

MySQL SET user variable locks rows and doesn't obey REPEATABLE READ

I've encountered an undocumented behavior of "SET @my_var = (SELECT ..)" inside a transaction.
First, it locks rows (which rows depends on whether the index used is unique or not).
Example:
START TRANSACTION;
SET @my_var = (SELECT id from table_name where id = 1);
select trx_rows_locked from information_schema.innodb_trx;
ROLLBACK;
The output is 1 row locked, which is strange: a plain read shouldn't take a read lock.
Also, the equivalent statement SELECT id INTO @my_var won't produce a lock.
This can lead to a deadlock when two concurrent requests each run an UPDATE after the SET statement.
In REPEATABLE READ -
The SELECT inside the SET statement gets a new snapshot of the data, instead of using the original SNAPSHOT.
SESSION 1:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START transaction;
SELECT data FROM my_table where id = 2; # Output : 2
SESSION 2:
UPDATE my_table set data = 3 where id = 2 ;
SESSION 1:
SET @data = (SELECT data FROM my_table where id = 2);
SELECT @data; # Output : 3, instead of 2
ROLLBACK;
However, I would expect @data to contain the original value from the first snapshot (2).
If I use SELECT data into @data from my_table where id = 2 then I get the expected value, 2.
Do you have an idea what causes the different behavior of SET = (SELECT ..) compared to SELECT data INTO @var FROM .. ?
Thanks.
Correct — when you SELECT in a context where you're copying the results into a variable or a table, it implicitly works as if you had used a locking read SELECT ... FOR SHARE.
This means it places a shared lock on the rows examined, and it also means that the statement reads only the most recently committed version of rows, as if your transaction were in READ COMMITTED isolation level.
I'm not sure why SELECT ... INTO @var does not do the same kind of implicit locking in MySQL 8.0. My memory is that in older versions of MySQL it did do locking in that query form. I've searched the manual for an explanation but I can't find one yet.
Other cases that implicitly lock the rows examined by SELECT, and therefore read data as if your transaction were READ COMMITTED:
INSERT INTO <table> SELECT ...
Multi-table UPDATE or DELETE: the joined rows become locked even in tables you don't update or delete from.
SELECT inside a trigger
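One way to check which statement form takes locks on a particular server is to query performance_schema.data_locks (MySQL 8.0+) from inside each open transaction. This sketch assumes the my_table(id, data) table from the question (the column types are guesses) and is best run on an otherwise idle test server, since data_locks lists every session's locks. No particular output is claimed here; compare the two result sets on your own version.

```sql
-- Assumed schema and row, matching the question's example.
CREATE TABLE IF NOT EXISTS my_table (id INT PRIMARY KEY, data INT) ENGINE = InnoDB;
INSERT IGNORE INTO my_table (id, data) VALUES (2, 2);

-- Form 1: SET with a scalar subquery.
START TRANSACTION;
SET @data = (SELECT data FROM my_table WHERE id = 2);
SELECT index_name, lock_type, lock_mode, lock_data
FROM performance_schema.data_locks
WHERE object_name = 'my_table';
ROLLBACK;

-- Form 2: SELECT ... INTO, for comparison.
START TRANSACTION;
SELECT data INTO @data FROM my_table WHERE id = 2;
SELECT index_name, lock_type, lock_mode, lock_data
FROM performance_schema.data_locks
WHERE object_name = 'my_table';
ROLLBACK;
```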

Is there a way for MySQL to wait for rows matching a condition to be inserted

Let's say I was writing an application where I'd need to get notifications in real time from a server, and let's say those notifications are stored in a MySQL database.
For me to get them I'd have to keep polling the MySQL server (keep repeating the same select query until I actually get results), but I figure that is a very inefficient way of doing it, since most of the time the select would turn up empty. If I do it often, it's an unreasonable strain on the server; if I do it rarely, the notifications would arrive very late.
So I was wondering if there is a way for, say, a MySQL query to block until a result matching a condition becomes available.
list = query ("SELECT * FROM `notifications` WHERE `unread`=1") ;
Instead of returning an empty list if there are no unread notifications, it would wait until there actually are unread notifications to return.
I recommend using the producer-consumer pattern, implemented with a new table as the "work queue". There is no need for a stored procedure, because the trigger is so simple.
A trigger would populate the work queue
Code would poll the work queue table. Because the table would be very small, the query would be fast and low-load.
Code would do whatever you need and delete rows from the table when finished - keeping it as small as possible
Create a table with the id of the notification to be processed and a "processing status" column, for example:
create table work_queue (
id int not null auto_increment primary key,
notification_id int references notifications (id),
status enum ('ready', 'processing', 'failed')
);
Create a simple trigger that populates the work queue table:
delimiter $
create trigger producer after insert on notifications
for each row begin
insert into work_queue (notification_id, status)
select new.id, 'ready'
from dual
where new.unread;
end; $
delimiter ;
Your polling code would then follow this pseudo-code:
1. select * from work_queue where status = 'ready' order by id limit 1
2. update work_queue set status = 'processing' where id = <row.id>
3. Do what you need to notifications where id = <row.notification_id>
4. Either delete from work_queue where id = <row.id> or update work_queue set status = 'failed' where id = <row.id> (you'll have to figure out what to do with failed items)
5. Sleep 1 second (this pause should be roughly the interval between notifications at the peak arrival rate; you'll need to tune it to balance work_queue size against server load)
6. Go to step 1.
If you have a single process polling, there is no need for locking worries. If you have multiple processes polling, you'll need to handle race conditions.
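For the multiple-poller case, a minimal sketch that does steps 1 and 2 as a single claim, assuming MySQL 8.0+ (for SKIP LOCKED) and the work_queue table above. The row lock from FOR UPDATE is held until COMMIT, so two pollers cannot both mark the same row as processing, and SKIP LOCKED lets a second poller take the next ready row instead of waiting.

```sql
START TRANSACTION;

SET @claimed_id = NULL, @notification_id = NULL;
SELECT id, notification_id INTO @claimed_id, @notification_id
FROM work_queue
WHERE status = 'ready'
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;   -- rows locked by other pollers are skipped

UPDATE work_queue SET status = 'processing' WHERE id = @claimed_id;

COMMIT;

-- @claimed_id IS NULL means there was nothing to do: sleep and poll again.
SELECT @claimed_id, @notification_id;
```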

SQL (mySQL) update some value in all records processed by a select

I am using mySQL from their C API, but that shouldn't be relevant.
My code must process records from a table that match some criteria, and then update the said records to flag them as processed. The lines in the table are modified/inserted/deleted by another process I don't control. I am afraid in the following, the UPDATE might flag some records erroneously since the set of records matching might have changed between step 1 and step 3.
SELECT * FROM myTable WHERE <CONDITION>; # step 1
<iterate over the selected set of lines. This may take some time.> # step 2
UPDATE myTable SET processed=1 WHERE <CONDITION> # step 3
What's the smart way to ensure that the UPDATE updates all the lines processed, and only them? A transaction doesn't seem to fit the bill as it doesn't provide isolation of that sort: a recently modified record not in the originally selected set might still be targeted by the UPDATE statement. For the same reason, SELECT ... FOR UPDATE doesn't seem to help, though it sounds promising :-)
The only way I can see is to use a temporary table to memorize the set of rows to be processed, doing something like:
CREATE TEMPORARY TABLE workOrder (jobId INT(11));
INSERT INTO workOrder SELECT myID as jobId FROM myTable WHERE <CONDITION>;
SELECT * FROM myTable WHERE myID IN (SELECT * FROM workOrder);
<iterate over the selected set of lines. This may take some time.>
UPDATE myTable SET processed=1 WHERE myID IN (SELECT * FROM workOrder);
DROP TABLE workOrder;
But this seems wasteful and not very efficient.
Is there anything smarter?
Many thanks from a SQL newbie.
There are several options:
You could lock the table
You could add an AND foo_id IN (all_the_ids_you_processed) as the update condition.
You could update first and then select only the rows you updated (e.g. by processing date)
I eventually solved this issue by using a column in that table that flags rows according to their status. This column lets me implement a simple state machine. Conceptually, I have two possible values for this status:
kNoProcessingPlanned = 0; #default "idle" value
kProcessingUnderWay = 1;
Now my algorithm does something like this:
UPDATE myTable SET status=kProcessingUnderWay WHERE <CONDITION>; # step 0
SELECT * FROM myTable WHERE status=kProcessingUnderWay; # step 1
<iterate over the selected set of lines. This may take some time.> # step 2
UPDATE myTable SET processed=1, status=kNoProcessingPlanned WHERE status=kProcessingUnderWay # step 3
This idea of having rows in several states can be extended to as many states as needed.
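A runnable version of the steps above, under hypothetical choices: a TEMPORARY table named myTable_demo, literal 0/1 for the two states, and `amount > 10 AND processed = 0` as a stand-in for <CONDITION>. A row inserted by the other process after step 0 keeps status 0, so step 3 cannot flag it.

```sql
CREATE TEMPORARY TABLE myTable_demo (
  myID      INT NOT NULL PRIMARY KEY,
  amount    INT NOT NULL,
  status    TINYINT NOT NULL DEFAULT 0,  -- 0 = kNoProcessingPlanned, 1 = kProcessingUnderWay
  processed TINYINT NOT NULL DEFAULT 0
);
INSERT INTO myTable_demo (myID, amount) VALUES (1, 5), (2, 20), (3, 30);

-- Step 0: tag the rows this run will own.
UPDATE myTable_demo SET status = 1 WHERE amount > 10 AND processed = 0;

-- Step 1: read exactly the tagged rows.
SELECT myID, amount FROM myTable_demo WHERE status = 1;

-- Simulate the other process inserting a matching row mid-run.
INSERT INTO myTable_demo (myID, amount) VALUES (4, 40);

-- Step 3: flag only the tagged rows; row 4 is left for the next run.
UPDATE myTable_demo SET processed = 1, status = 0 WHERE status = 1;

-- Rows 2 and 3 are now processed; rows 1 and 4 are not.
SELECT myID, processed FROM myTable_demo ORDER BY myID;
```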

MySQL UPDATE and SELECT in one pass

I have a MySQL table of tasks to perform, each row having parameters for a single task.
There are many worker apps (possibly on different machines), performing tasks in a loop.
The apps access the database using MySQL's native C APIs.
In order to own a task, an app does something like this:
Generate a globally-unique id (for simplicity, let's say it is a number)
UPDATE tasks
SET guid = %d
WHERE guid = 0 LIMIT 1
SELECT params
FROM tasks
WHERE guid = %d
If the last query returns a row, we own it and have the parameters to run
Is there a way to achieve the same effect (i.e. 'own' a row and get its parameters) in a single call to the server?
Try something like this:
UPDATE `lastid` SET `idnum` = (SELECT `id` FROM `history` ORDER BY `id` DESC LIMIT 1);
The code above worked for me.
You may create a procedure that does it:
DELIMITER $$
CREATE PROCEDURE prc_get_task (in_guid BINARY(16), OUT out_params VARCHAR(200))
BEGIN
DECLARE task_id INT;
-- Lock one unowned task and read its parameters
SELECT id, params
INTO task_id, out_params
FROM tasks
WHERE guid = 0
LIMIT 1
FOR UPDATE;
-- Mark it as owned by this caller
UPDATE tasks
SET guid = in_guid
WHERE id = task_id;
END$$
DELIMITER ;
START TRANSACTION;
CALL prc_get_task(@guid, @params);
COMMIT;
If you are looking for a single query, then it can't happen. An UPDATE statement only returns the number of rows that were updated, and a SELECT statement doesn't alter a table, it only returns values.
Using a procedure will indeed turn it into a single call, and it can be handy if locking is a concern for you. If your biggest concern is network traffic (i.e. sending too many queries), then use the procedure. If your concern is server overload (i.e. the DB is working too hard), then the extra overhead of a procedure could make things worse.
I have the exact same issue. We ended up using PostgreSQL instead, and UPDATE ... RETURNING:
The optional RETURNING clause causes UPDATE to compute and return value(s) based on each row actually updated. Any expression using the table's columns, and/or columns of other tables mentioned in FROM, can be computed. The new (post-update) values of the table's columns are used. The syntax of the RETURNING list is identical to that of the output list of SELECT.
Example (PostgreSQL's UPDATE has no LIMIT clause, so the row is picked in a subquery): UPDATE my_table SET status = 1 WHERE id = (SELECT id FROM my_table WHERE status = 0 LIMIT 1 FOR UPDATE SKIP LOCKED) RETURNING *;
Or, in your case: UPDATE tasks SET guid = %d WHERE id = (SELECT id FROM tasks WHERE guid = 0 LIMIT 1 FOR UPDATE SKIP LOCKED) RETURNING params;
Sorry, I know this doesn't answer the question with MySQL, and it might not be easy to just switch to PostgreSQL, but it's the best way we've found to do it. Even 6 years later, MySQL still doesn't support UPDATE ... RETURNING. It might be added at some point in the future, but for now MariaDB only has it for DELETE statements.
Edit: There is a task (low priority) to add UPDATE ... RETURNING support to MariaDB.
I don't know about the single call part, but what you're describing is a lock. Locks are an essential element of relational databases.
I don't know the specifics of locking a row, reading it, and then updating it in MySQL, but with a bit of reading of the MySQL locking documentation you could do all kinds of lock-based manipulations.
The PostgreSQL documentation on locks has a great example describing exactly what you want to do: lock the table, read the table, modify the table.
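SET @params = NULL;
UPDATE tasks
SET guid = %d, params = @params := params
WHERE guid = 0 LIMIT 1;
The UPDATE reports 1 or 0 affected rows, depending on whether a row was actually claimed. Resetting @params first matters: if no row matches, the variable would otherwise keep its value from an earlier call. (Assigning user variables inside expressions like this is deprecated as of MySQL 8.0.13.)
SELECT @params AS params;
This one just selects the variable from the connection.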