How to manage a single-threaded job using a MySQL database?

I need to run a PHP script, and I want to make sure no more than one instance of it is running at the same time.
I am using MySQL, and I thought about this solution:
I build the table below:
job_id | task_id | last_updated_time (AUTO UPDATE)
"sending_emails" | 77238 | 2017-05-03 12:02:02
Before running the script I create a random task id, then I run a query to update the task_id.
$task_id = generate_random_task_id();
$query = "
    UPDATE jobs
    SET task_id = $task_id
    WHERE task_id = $task_id
       OR NOW() - last_updated_time > 30
    LIMIT 1
";
run($query);
/*
Then I need to check whether there was an update. If yes, I will run the
script; otherwise I will stop, since there is already another script running.
*/
$query = "SELECT job_id FROM jobs WHERE task_id = $task_id";
$result = run($query);
if (!isset($result['job_id'])) {
    die();
}
Is there any chance that two scripts run at the same time?

No, they can't run at the same time. Here's what MySQL's documentation about UPDATE and SELECT says:
UPDATE ... WHERE ... sets an exclusive next-key lock on every record
the search encounters. However, only an index record lock is required
for statements that lock rows using a unique index to search for a
unique row.
Here's more about Shared and Exclusive locks:
A shared (S) lock permits the transaction that holds the lock to read
a row.
An exclusive (X) lock permits the transaction that holds the lock to
update or delete a row.
If a transaction T1 holds an exclusive (X) lock on row r, a request
from some distinct transaction T2 for a lock of either type on r
cannot be granted immediately. Instead, transaction T2 has to wait for
transaction T1 to release its lock on row r.
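For completeness, here is a minimal sketch of the claim-by-UPDATE pattern in PHP. This is a variant of the question's code, not something from the MySQL docs: the affected-row count replaces the follow-up SELECT (the UPDATE itself is atomic), and the staleness test is written with INTERVAL. Connection details and names are illustrative.
// Sketch only: claim the job row atomically and check the affected-row count.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass',
               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$taskId = random_int(1, PHP_INT_MAX); // stand-in for generate_random_task_id()
$stmt = $pdo->prepare(
    "UPDATE jobs
     SET task_id = :tid
     WHERE job_id = 'sending_emails'
       AND last_updated_time < NOW() - INTERVAL 30 SECOND");
$stmt->execute(['tid' => $taskId]);
if ($stmt->rowCount() === 0) {
    die('Another instance appears to be running.');
}
// ... safe to do the work here; last_updated_time auto-updates on write ...
Because only one session's UPDATE can match and change the row, at most one instance proceeds.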

Yes, there's every chance you could run the same task again.
There are two obvious solutions.
One is to open a MySQL connection, then acquire a lock using GET_LOCK() with a short timeout - if you acquire the lock, you're good to go. You need to keep the database connection open for the lifetime of the script.
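A minimal sketch of the GET_LOCK() approach, assuming PDO; the lock name is arbitrary, and the zero timeout means "fail immediately if someone else holds it":
// Sketch only. GET_LOCK returns 1 on success, 0 on timeout, NULL on error;
// the lock is tied to this connection and vanishes if the connection drops.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$got = $pdo->query("SELECT GET_LOCK('sending_emails', 0)")->fetchColumn();
if ((int)$got !== 1) {
    die('Another instance is already running.');
}
// ... run the job; $pdo must stay open for the lock's lifetime ...
$pdo->query("SELECT RELEASE_LOCK('sending_emails')");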
Alternatively, you could create a table with a unique constraint covering finish_time and INSERT a record with a sentinel finish_time to mark the start (the insert fails if an unfinished record already exists), then UPDATE finish_time to NOW() when the job completes. Note that a NULL finish_time won't work for this, because MySQL unique indexes permit multiple NULLs; hence the sentinel.
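A sketch of that variant, assuming a hypothetical job_runs table with a UNIQUE key over (job_name, finish_time); the sentinel value stands in for "not finished yet":
// Sketch only; table, column, and sentinel names are illustrative.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass',
               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$sentinel = '1970-01-01 00:00:00';
try {
    $pdo->prepare("INSERT INTO job_runs (job_name, finish_time) VALUES (?, ?)")
        ->execute(['sending_emails', $sentinel]);
} catch (PDOException $e) {
    die('Already running.'); // duplicate key: an unfinished run already exists
}
// ... do the work ...
$pdo->prepare("UPDATE job_runs SET finish_time = NOW()
               WHERE job_name = ? AND finish_time = ?")
    ->execute(['sending_emails', $sentinel]);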
However, using the database to represent the state of a running task only makes sense when the task is running within a loosely coupled but highly available cluster - implying that the database is also clustered. And the nature of the clustering (NDB, async replication, semi-sync, multi-master) has a lot of impact on how this behaves in practice.
OTOH, if that is not the case, then using the database to represent the state is the wrong way to solve the problem.

Yes, they can run at the same time.
If you want them to run one at a time, the SELECT query should be changed to:
SELECT job_id FROM jobs WHERE task_id = $task_id LOCK IN SHARE MODE
In this case it takes a read lock.
This is the same whether you use NDB or InnoDB.


How to resolve database deadlock issue caused by parallel goroutines using retry transaction? [duplicate]

I have an InnoDB table which records online users. It gets updated on every page refresh by a user to keep track of which pages they are on and their last access date to the site. I then have a cron that runs every 15 minutes to DELETE old records.
I got a 'Deadlock found when trying to get lock; try restarting transaction' error for about 5 minutes last night, and it appears to happen when running INSERTs into this table. Can someone suggest how to avoid this error?
=== EDIT ===
Here are the queries that are running:
First Visit to site:
INSERT INTO onlineusers SET
ip = '123.456.789.123',
datetime = now(),
userid = 321,
page = '/thispage',
area = 'thisarea',
type = 3
On each page refresh:
UPDATE onlineusers SET
ip = '123.456.789.123',
datetime = now(),
userid = 321,
page = '/thispage',
area = 'thisarea',
type = 3
WHERE id = 888
Cron every 15 minutes:
DELETE FROM onlineusers WHERE datetime <= now() - INTERVAL 900 SECOND
It then does some counts to log some stats (i.e. members online, visitors online).
One easy trick that can help with most deadlocks is sorting the operations in a specific order.
You get a deadlock when two transactions try to take the same two locks in opposite orders, i.e.:
connection 1: locks key(1), locks key(2);
connection 2: locks key(2), locks key(1);
If both run at the same time, connection 1 will lock key(1), connection 2 will lock key(2) and each connection will wait for the other to release the key -> deadlock.
Now, if you changed your queries such that the connections would lock the keys at the same order, ie:
connection 1: locks key(1), locks key(2);
connection 2: locks key(1), locks key(2);
it will be impossible to get a deadlock.
So this is what I suggest:
Make sure you have no other queries that lock more than one key at a time, apart from the delete statement. If you do (and I suspect you do), order their WHERE IN (k1, k2, ..., kn) lists in ascending order.
Fix your delete statement to work in ascending order:
Change
DELETE FROM onlineusers
WHERE datetime <= now() - INTERVAL 900 SECOND
To
DELETE FROM onlineusers
WHERE id IN (
    SELECT id FROM (
        SELECT id FROM onlineusers
        WHERE datetime <= now() - INTERVAL 900 SECOND
        ORDER BY id
    ) AS u
);
(The extra derived table u is needed because MySQL does not allow a subquery to select from the same table that the DELETE targets - error 1093.)
Another thing to keep in mind is that the MySQL documentation suggests that in case of a deadlock the client should retry automatically. You can add this logic to your client code (say, 3 retries on this particular error before giving up).
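A sketch of that retry loop in PHP, assuming PDO with exceptions enabled; 1213 is the MySQL driver error code behind 'Deadlock found when trying to get lock':
// Sketch only: retry a transaction a few times on deadlock, otherwise rethrow.
function runWithRetry(PDO $pdo, callable $work, int $maxRetries = 3): void
{
    for ($attempt = 1; ; $attempt++) {
        try {
            $pdo->beginTransaction();
            $work($pdo);
            $pdo->commit();
            return;
        } catch (PDOException $e) {
            if ($pdo->inTransaction()) {
                $pdo->rollBack();
            }
            // errorInfo[1] holds the MySQL-specific error code
            if (($e->errorInfo[1] ?? 0) != 1213 || $attempt >= $maxRetries) {
                throw $e; // not a deadlock, or out of retries
            }
        }
    }
}
// usage: runWithRetry($pdo, function (PDO $p) { /* INSERT / UPDATE here */ });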
Deadlocks happen when two transactions wait on each other to acquire a lock. Example:
Tx 1: lock A, then B
Tx 2: lock B, then A
There are numerous questions and answers about deadlocks. Each time you insert, update, or delete a row, a lock is acquired. To avoid deadlocks, you must then make sure that concurrent transactions don't update rows in an order that could result in a deadlock. Generally speaking, try to acquire locks in the same order even across different transactions (e.g. always table A first, then table B).
Another reason for deadlocks in a database can be missing indexes. When a row is inserted, updated, or deleted, the database needs to check the relational constraints, that is, make sure the relations are consistent. To do so, the database checks the foreign keys in the related tables, which can result in locks being acquired on rows other than the one being modified. Be sure, then, to always have indexes on your foreign keys (and of course your primary keys), otherwise this can result in a table lock instead of a row lock. If table locks happen, lock contention is higher and the likelihood of deadlocks increases.
In case someone is still struggling with this issue:
I faced a similar issue where 2 requests were hitting the server at the same time. There was no situation like the one below:
T1:
BEGIN TRANSACTION
INSERT TABLE A
INSERT TABLE B
END TRANSACTION
T2:
BEGIN TRANSACTION
INSERT TABLE B
INSERT TABLE A
END TRANSACTION
So I was puzzled about why the deadlock was happening.
Then I found that there was a parent-child relationship between the 2 tables because of a foreign key. When I inserted a record into the child table, the transaction acquired a shared lock on the parent table's row. Immediately after that I tried to update the parent row, which required elevating the lock to an exclusive one. As the 2nd concurrent transaction was already holding a shared lock on that row, this caused a deadlock.
Refer to: https://blog.tekenlight.com/2019/02/21/database-deadlock-mysql.html
It is likely that the delete statement will affect a large fraction of the total rows in the table. Eventually this might lead to a table lock being acquired when deleting. Holding on to a lock (in this case row or page locks) and acquiring more locks is always a deadlock risk. However, I can't explain why the insert statement leads to a lock escalation - it might have to do with page splitting/adding, but someone who knows MySQL better will have to fill that in.
For a start it can be worth trying to explicitly acquire a table lock right away for the delete statement. See LOCK TABLES and Table locking issues.
You might try having that delete job operate by first inserting the key of each row to be deleted into a temp table, like this pseudocode:
create temporary table deletetemp (userid int);

insert into deletetemp (userid)
select userid from onlineusers
where datetime <= now() - interval 900 second;

delete from onlineusers
where userid in (select userid from deletetemp);
Breaking it up like this is less efficient but it avoids the need to hold a key-range lock during the delete.
Also, modify your select queries to add a where clause excluding rows older than 900 seconds. This avoids the dependency on the cron job and allows you to reschedule it to run less often.
Theory about the deadlocks: I don't have a lot of background in MySQL, but here goes... The delete is going to hold a key-range lock for datetime, to prevent rows matching its WHERE clause from being added in the middle of the transaction, and as it finds rows to delete it will attempt to acquire a lock on each page it is modifying. The insert is going to acquire a lock on the page it is inserting into, and then attempt to acquire the key lock. Normally the insert will wait patiently for that key lock to open up, but it will deadlock if the delete tries to lock the same page the insert is using, because the delete needs that page lock and the insert needs that key lock. This doesn't seem right for inserts though; the delete and insert are using datetime ranges that don't overlap, so maybe something else is going on.
http://dev.mysql.com/doc/refman/5.1/en/innodb-next-key-locking.html
For Java programmers using Spring, I've avoided this problem using an AOP aspect that automatically retries transactions that run into transient deadlocks.
See the @RetryTransaction Javadoc for more info.
cron is dangerous. If one instance of cron fails to finish before the next is due, they are likely to fight each other.
It would be better to have a continuously running job that would delete some rows, sleep some, then repeat.
Also, INDEX(datetime) is very important for avoiding deadlocks.
But if the datetime test matches more than, say, 20% of the table, the DELETE will do a table scan. Smaller chunks deleted more often are a workaround.
Another reason for going with smaller chunks is to lock fewer rows.
Bottom line:
INDEX(datetime)
Continually running task -- delete, sleep a minute, repeat (see the sketch below).
To make sure that the above task has not died, have a cron job whose sole purpose is to restart it upon failure.
Other deletion techniques: http://mysql.rjweb.org/doc.php/deletebig
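The continually running task might look like this in PHP (a sketch, assuming PDO; the chunk size and sleep interval are arbitrary):
// Sketch only: delete in small, id-ordered chunks so each statement locks
// few rows; sleep once a sweep comes up short (the backlog is gone).
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass',
               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
while (true) {
    $deleted = $pdo->exec(
        "DELETE FROM onlineusers
         WHERE datetime <= NOW() - INTERVAL 900 SECOND
         ORDER BY id
         LIMIT 1000");
    if ($deleted < 1000) {
        sleep(60);
    }
}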
@Omry Yadan's answer (https://stackoverflow.com/a/2423921/1810962) can be simplified by using ORDER BY.
Change
DELETE FROM onlineusers
WHERE datetime <= now() - INTERVAL 900 SECOND
to
DELETE FROM onlineusers
WHERE datetime <= now() - INTERVAL 900 SECOND
ORDER BY id
to keep the order in which you delete items consistent. Also, if you are doing multiple inserts in a single transaction, make sure they are always ordered by id.
According to the MySQL DELETE documentation:
If the ORDER BY clause is specified, the rows are deleted in the order that is specified.
You can find a reference here: https://dev.mysql.com/doc/refman/8.0/en/delete.html
I have a method, the internals of which are wrapped in a MySqlTransaction.
The deadlock issue showed up for me when I ran the same method in parallel with itself.
There was not an issue running a single instance of the method.
When I removed MySqlTransaction, I was able to run the method in parallel with itself with no issues.
Just sharing my experience, I'm not advocating anything.

Make sure the cron job won't do the same job twice

I've got a list of similar tasks in a MySQL database and a PHP script that takes one task at a time and executes it. When it's done, it changes the task's flag from pending to done.
I want to speed up performance by adding more scripts (up to 20) running against the same database. How do I make sure these scripts won't execute the same task twice, i.e. process the same row in the table?
Thanks in advance!
One possible approach is:
Change the datatype of the flag column to an ENUM type (if it is not already), with three possible values: pending, in_process, done.
When selecting a pending task to do, take an explicit LOCK on the table, so that no other session can read or update it.
Code example:
LOCK TABLES tasks_table WRITE; -- locking the table for read/write
-- Selecting a pending task to do
SELECT * FROM tasks_table
WHERE flag = 'pending'
LIMIT 1;
-- In application code (PHP) - get the Primary key value of the selected task.
-- Now update the flag to in_process for the selected task
UPDATE tasks_table
SET flag = 'in_process'
WHERE primary_key_field = $selected_value;
At the end, do not forget to release the explicit lock.
Code:
-- Release the explicit Lock
UNLOCK TABLES;
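From PHP, it is worth wrapping this so that UNLOCK TABLES always runs even if the task-selection code throws (a sketch, assuming a PDO handle $pdo with exceptions enabled; primary_key_field is the answer's placeholder name):
// Sketch only: the finally block guarantees the table lock is released.
$pdo->exec("LOCK TABLES tasks_table WRITE");
try {
    $task = $pdo->query(
        "SELECT * FROM tasks_table WHERE flag = 'pending' LIMIT 1"
    )->fetch(PDO::FETCH_ASSOC);
    if ($task) {
        $pdo->prepare(
            "UPDATE tasks_table SET flag = 'in_process'
             WHERE primary_key_field = ?"
        )->execute([$task['primary_key_field']]);
    }
} finally {
    $pdo->exec("UNLOCK TABLES");
}
// ... run the task outside the lock, then set its flag to 'done' ...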

Select only unlocked rows mysql

I have locked one row in one transaction with the following query:
START TRANSACTION;
SELECT id FROM children WHERE id=100 FOR UPDATE;
And in another transaction I have a query as below:
START TRANSACTION;
SELECT id FROM children WHERE id IN (98,99,100) FOR UPDATE;
It gives the error 'Lock wait timeout exceeded'.
Here 100 is already locked (in the first transaction), but ids 98 and 99 are not locked. Is there any way to return the records for 98 and 99 if only row 100 is locked by the above query? The result should be as below:
Id
===
98
99
===
Id 100 should be ignored because it is locked by another transaction.
It looks like the SKIP LOCKED option mentioned in a previous answer is now available in MySQL. It does not wait to acquire a row lock and allows you to work with rows that are not currently locked.
From the MySQL 8.0 Release Notes (Changes in MySQL 8.0.1):
InnoDB now supports NOWAIT and SKIP LOCKED options with SELECT ... FOR SHARE and SELECT ... FOR UPDATE locking read statements. NOWAIT causes the statement to return immediately if a requested row is locked by another transaction. SKIP LOCKED removes locked rows from the result set. See Locking Read Concurrency with NOWAIT and SKIP LOCKED.
Sample usage (complete example with outputs can be found in the link above):
START TRANSACTION;
SELECT * FROM tableName FOR UPDATE SKIP LOCKED;
Also, it might be good to include the warning from the Reference Manual here as well:
Queries that skip locked rows return an inconsistent view of the data. SKIP LOCKED is therefore not suitable for general transactional work. However, it may be used to avoid lock contention when multiple sessions access the same queue-like table.
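Applied to the question's table, a claim might look like this (a sketch, assuming MySQL 8.0+ and a PDO handle with exceptions enabled):
// Sketch only: grab one unlocked row; rows locked elsewhere (e.g. id 100)
// are silently skipped rather than waited on.
$pdo->beginTransaction();
$row = $pdo->query(
    "SELECT id FROM children
     WHERE id IN (98, 99, 100)
     ORDER BY id
     LIMIT 1
     FOR UPDATE SKIP LOCKED"
)->fetch(PDO::FETCH_ASSOC);
if ($row === false) {
    $pdo->rollBack(); // every candidate row is currently locked
} else {
    // ... work on $row['id'] ...
    $pdo->commit();
}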
MySQL does not have a way to ignore locked rows in a SELECT. You'll have to find a different way to set a row aside as "already processed".
The simplest way is to lock the row briefly in a first query just to mark it as "already processed", then unlock it and lock it again for the rest of the processing. The second query will wait for the short "marker" query to complete, and you can add an explicit WHERE condition to ignore already-marked rows. If you don't want to rely on the first operation always completing successfully, you may need to add a bit more complexity, with timestamps and such, to clean up after failed operations.
MySQL does not have this feature. For anyone searching for this topic in general, some RDBMS have better/smarter locking features than others.
For developers constrained to MySQL, the best approach is to add a column (or use an existing one, e.g. a status column) that can be set to "locked" or "in progress" or similar: execute a SELECT id, ... FROM ... WHERE in_progress != 1 FOR UPDATE to find and lock the row you want, then issue UPDATE ... SET in_progress = 1 WHERE id = XX to mark it as claimed (committing releases the row lock while the flag keeps the record claimed).
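A sketch of that sequence, assuming a PDO handle with exceptions enabled; the records table and in_progress column are illustrative names:
// Sketch only: lock a free row, flag it, and commit; the commit releases
// the row lock while the flag keeps the record claimed.
$pdo->beginTransaction();
$row = $pdo->query(
    "SELECT id FROM records WHERE in_progress != 1 LIMIT 1 FOR UPDATE"
)->fetch(PDO::FETCH_ASSOC);
if ($row) {
    $pdo->prepare("UPDATE records SET in_progress = 1 WHERE id = ?")
        ->execute([$row['id']]);
}
$pdo->commit();
// ... process the record, then clear in_progress (or mark it done) ...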
Using LOCK IN SHARE MODE is almost never the solution, because while it'll let you read the old value, the old value is in the process of being updated, so unless you are performing a non-atomic task there's no point in even looking at that record.
Better* RDBMSs recognize this pattern (select one row to work on and lock it, work on it, unlock it) and provide a smarter approach that lets you search only unlocked records. For example, PostgreSQL 9.5+ provides SELECT ... SKIP LOCKED, which only selects from the unlocked subset of rows matching the query. That lets you obtain an exclusive lock on a row, service that record to completion, then update and unlock the record in question without blocking other threads/consumers from working independently of you.
*Here "better" means from the perspective of atomic updates, multi-consumer architecture, etc. and not necessarily "better designed" or "overall better." Not trying to start a flamewar here.
As per http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html, the solution is to perform the SELECT in a locking mode using LOCK IN SHARE MODE:
SELECT * FROM parent WHERE NAME = 'Jones' LOCK IN SHARE MODE;

Read Lock on a row of a Database table

I am trying to build a job scheduler. I have a list of jobs to be executed on 2-3 different machines on a time basis, so any machine can pick any job and will execute it if its next_execution_time < current_time. I am storing all the jobs in a database table, and I am using a SELECT ... FOR UPDATE query to select a job for execution.
But the problem with this approach is that if machine1 has picked a job, then since there is only a write lock, other machines will also select the same job for execution, but they can't execute it, as they will wait for the lock to be released or a lock timeout will occur. So is there any way for the other machines to skip this job and execute other jobs using SQL locks? No other column should be added to the database.
Flow is something like this :
select a job and lock it -> execute the job -> release the lock
I am using Ruby on Rails for developing this. If there is a NOWAIT or SET lock_timeout = 0 equivalent in Rails, it could probably solve the problem. If it exists ... what is the syntax?
Actually you have a simple way of doing this with your current table in MySQL: you need to temporarily lock the table when selecting the next task. I'm assuming you have a column in the table to flag already started/done tasks; otherwise you can use the same datetime column that schedules the job to flag that it is already done/started:
lock tables jobs write;
select * from jobs where start_time < current_time and status = 'pending' order by start_time;
-- be carefull here to check for SQL errors in your code and run unlock tables if an exception is thrown or something like that
update jobs set status = 'started' where id = the_one_you_selected;
unlock tables;
And that's it: multiple concurrent threads/processes can use the jobs table to execute tasks without two threads running the same task.

Row lock for update status

I have a table of "commands to do" with a status ('toprocess', 'processing', 'done')
I have several instances (Amazon EC2) with a daemon asking for "commands to do".
The daemon asks for rows with status 'toprocess', then it processes them, and at the end of each loop it changes the status to 'done'.
The thing is that, before starting that loop, I need to change all the 'toprocess' rows to status 'processing', so other instances will not take the same rows, avoiding conflicts.
I've read about innodb row locks, but I don't understand them very well ...
SELECT * FROM commands WHERE status = 'toprocess'
Then I need to take the IDs of these results and update their status to 'processing', locking these rows until they are updated.
How can I do it?
Thank you
You'd use a transaction and read the data with FOR UPDATE, which will block other FOR UPDATE selects on the rows that get selected:
begin transaction;
select * from commands where status = 'toprocess' for update;
for each row in the result:
    add the data to an array/list for processing later
    update commands set status = 'processing' where id = row.id;
commit;
process all the data
Read a bit about FOR UPDATE and InnoDB isolation levels.
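In PHP the same flow might look like this (a sketch, assuming a PDO handle with exceptions enabled):
// Sketch only: lock the pending rows, flag them 'processing', commit to
// release the locks, then do the actual work outside the transaction.
$pdo->beginTransaction();
$rows = $pdo->query(
    "SELECT * FROM commands WHERE status = 'toprocess' FOR UPDATE"
)->fetchAll(PDO::FETCH_ASSOC);
$mark = $pdo->prepare("UPDATE commands SET status = 'processing' WHERE id = ?");
foreach ($rows as $row) {
    $mark->execute([$row['id']]);
}
$pdo->commit();
foreach ($rows as $row) {
    // ... process $row, then set its status to 'done' ...
}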
A possible (yet not very elegant) solution may be to first UPDATE the record, then read its data:
Each daemon will have a unique ID, and the table will have a new column named 'owner' for that ID.
The daemon will then run something like UPDATE table SET status='processing', owner='theDaemonId' WHERE status='toprocess' ... LIMIT 1
While the update runs, the row is locked, so no other daemon can read it.
After the update the row is owned by a specific daemon, which can then run a SELECT to fetch all the necessary data from that row (WHERE status='processing' AND owner='theDaemonId').
Finally, the last UPDATE will set the row to 'done', and may (or may not) clear the owner field. Keeping it there also enables some statistics about the daemons' work.
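A sketch of that update-first flow, assuming a PDO handle and the question's commands table; the owner value is just something unique per daemon:
// Sketch only: claim one row by writing this daemon's ID into it, then
// read the claimed row back.
$daemonId = gethostname() . ':' . getmypid();
$claim = $pdo->prepare(
    "UPDATE commands SET status = 'processing', owner = ?
     WHERE status = 'toprocess' LIMIT 1");
$claim->execute([$daemonId]);
if ($claim->rowCount() === 1) {
    $sel = $pdo->prepare(
        "SELECT * FROM commands WHERE status = 'processing' AND owner = ?");
    $sel->execute([$daemonId]);
    $row = $sel->fetch(PDO::FETCH_ASSOC);
    // ... process $row, then set status = 'done' ...
}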
As far as I know, you can't use MySQL to lock a row (using a built-in method). You have two options though:
If your table should not be read by any other process until the locks are released, then you can use table-level locking as described here.
You can implement your own basic row locking by updating a value in each row you're processing, and then have all your other daemons check whether this property is set (a BIT data type would suffice).
InnoDB locks at the row level for reading and updating anyway, but if you want to lock rows for an arbitrary period then you may have to go with the second option.