Make sure the cron job won't do the same job twice - MySQL

I've got a list of similar tasks in a MySQL database and a PHP script that takes one task at a time and executes it. When a task is done, the script changes its flag from pending to done.
I want to speed things up by adding more scripts (up to 20) running against the same database. How do I make sure these scripts won't execute the same task twice, i.e. process the same row in the table?
Thanks in advance!

One possible approach is:
You can change the datatype of the flag column to ENUM (if it is not already), with three possible values: pending, in_process, done.
When selecting a pending task, take an explicit LOCK on the table so that no other session can update it.
Code example:
LOCK TABLES tasks_table WRITE; -- locking the table for read/write
-- Selecting a pending task to do
SELECT * FROM tasks_table
WHERE flag = 'pending'
LIMIT 1;
-- In application code (PHP) - get the Primary key value of the selected task.
-- Now update the flag to in_process for the selected task
UPDATE tasks_table
SET flag = 'in_process'
WHERE primary_key_field = $selected_value;
At the end, do not forget to release the explicit lock.
Code:
-- Release the explicit Lock
UNLOCK TABLES;

Related

How to manage single thread job using mysql database?

I need to run a PHP script, and I want to make sure no more than one instance of it is running at a time.
I am using MySQL, and I thought about this solution:
I build the table below:
job_id | task_id | last_updated_time (AUTO UPDATE)
"sending_emails" | 77238 | 2017-05-03 12:02:02
Before running the script I generate a random task id, then I run a query to update the task_id:
$task_id = generate_random_task_id();
$query = "
    UPDATE jobs
    SET task_id = $task_id
    WHERE task_id = $task_id
       OR NOW() - last_updated_time > 30
    LIMIT 1
";
/*
Then I need to check whether the update matched a row; if it did I run the script,
otherwise I stop, since another script is already running.
*/
$query = "SELECT job_id FROM jobs WHERE task_id = $task_id";
$result = run($query);
if ( ! isset($result['job_id'])) {
    die();
}
Is there any chance that two scripts run at the same time?
No, they can't run at the same time. Here's what MySQL's documentation says about the locks UPDATE takes:
UPDATE ... WHERE ... sets an exclusive next-key lock on every record
the search encounters. However, only an index record lock is required
for statements that lock rows using a unique index to search for a
unique row.
Here's more about Shared and Exclusive locks:
A shared (S) lock permits the transaction that holds the lock to read
a row.
An exclusive (X) lock permits the transaction that holds the lock to
update or delete a row.
If a transaction T1 holds an exclusive (X) lock on row r, a request
from some distinct transaction T2 for a lock of either type on r
cannot be granted immediately. Instead, transaction T2 has to wait for
transaction T1 to release its lock on row r.
Yes, there's every chance you could run the same task again.
There are two obvious solutions.
One is to open a MySQL connection and then acquire a lock using GET_LOCK() with a short timeout - if you acquire the lock then you're good to go. You need to maintain the DB connection for the lifetime of the script.
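A minimal sketch of the GET_LOCK() approach (the lock name and timeout are arbitrary choices here):
-- Returns 1 if the lock was acquired, 0 on timeout, NULL on error.
SELECT GET_LOCK('sending_emails_job', 1);
-- ... if it returned 1, run the job on this same connection ...
-- Release explicitly when done; the lock is also released automatically
-- if the connection terminates.
SELECT RELEASE_LOCK('sending_emails_job');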
Alternatively you could use a dedicated lock table with a unique constraint: INSERT a record to mark the start (the insert fails if a record for the running job already exists), then mark the record finished when the job completes. One caveat: a MySQL UNIQUE index allows multiple NULLs, so keying the constraint on a nullable finish_time will not conflict the way you might expect - key it on the job name instead.
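A sketch of that table-based variant, keyed on the job name to sidestep the NULL caveat (the table and column names here are made up for illustration):
CREATE TABLE job_lock (
    job_name   VARCHAR(64) NOT NULL PRIMARY KEY,
    started_at DATETIME    NOT NULL
) ENGINE=InnoDB;
-- Marks the start; fails with a duplicate-key error if the job is already running.
INSERT INTO job_lock (job_name, started_at) VALUES ('sending_emails', NOW());
-- ... run the job ...
-- Marks the finish by freeing the slot for the next run.
DELETE FROM job_lock WHERE job_name = 'sending_emails';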
However, using the database to represent the state of a running task only makes sense when the task is running within a loosely coupled but highly available cluster - implying that the database is also clustered. And the nature of the clustering (NDB, async, semi-sync, multi-master) has a lot of impact on how this will behave in practice.
OTOH, if that is not the case, then using the database to represent the state is the wrong way to solve the problem.
Yes, they can run at the same time.
If you want them to run one at a time, the SELECT query should be changed to:
SELECT job_id FROM jobs WHERE task_id = $task_id LOCK IN SHARE MODE
In this case it uses a read lock.
This is the same whether you use NDB or InnoDB.

Read Lock on a row of a Database table

I am trying to build a job scheduler. I have a list of jobs to be executed on 2-3 different machines on a time basis, so any machine can pick any job and will execute it if its next_execution_time < current_time. I am storing all the jobs in a database table and using a SELECT ... FOR UPDATE query to select a job for execution.
But the problem with this approach is that if machine1 has picked a job, the other machines will still select the same job for execution but can't execute it - since there is only the write lock, they will wait for it to be released or until a lock timeout occurs. Is there any way for the other machines to skip this job and execute other jobs, using SQL locks? No other column should be added to the table.
Flow is something like this :
select a job and lock it -> execute the job -> release the lock
I am using Ruby on Rails for developing this. If there is a no-wait or set_lock_timeout = 0 equivalent in Rails, it could probably solve the problem. If there is - what is the syntax?
Actually you have a simple way of doing this with your current table in MySQL: temporarily lock the table while selecting the next task. I'm assuming you have a column in the table that flags already started/done tasks; otherwise you can use the job's start datetime column to flag that it is already started/done:
lock tables jobs write;
select * from jobs where start_time < current_timestamp and status = 'pending' order by start_time;
-- be careful here: check for SQL errors in your code, and run UNLOCK TABLES if an exception is thrown
update jobs set status = 'started' where id = the_one_you_selected;
unlock tables;
And that's it: multiple concurrent threads/processes can use the jobs table to execute tasks without two of them ever running the same task.
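For completeness: MySQL 8.0 later added exactly the modifiers the question asks about - NOWAIT and SKIP LOCKED on locking reads - so on a new enough server the table lock can be avoided entirely. A sketch, assuming the same jobs table and status column as above:
start transaction;
-- Each machine grabs one runnable job; rows already locked by another
-- machine are skipped instead of waited on.
select * from jobs
where start_time < current_timestamp and status = 'pending'
order by start_time
limit 1
for update skip locked;
update jobs set status = 'started' where id = the_one_you_selected;
commit;
From Rails, the same clause can be issued with something like Model.lock("FOR UPDATE SKIP LOCKED") on a relation.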

Application setup to avoid timeouts and deadlocks in a SQL Server DB table

My application accesses a local DB where it inserts records into a table (roughly 30-40 million a day). I have processes that run, process data, and do these inserts. Part of the process involves selecting an id from an IDs table, which is unique, and this is done using a simple:
Begin Transaction
Declare @id int
Select top 1 @id = siteid from siteids WITH (UPDLOCK, HOLDLOCK)
delete siteids where siteid = @id
Commit Transaction
I then immediately delete that id with a separate statement from that very table so that no other process grabs it. This is causing tremendous timeout issues, and with only 4 processes accessing it I am surprised. I also get timeout issues when checking my main post table to see whether a record was inserted using the above id. It runs fast, but with all the deadlocks and timeouts I think this indicates poor design and is a recipe for disaster.
Any advice?
EDIT
This is the actual statement that someone else here helped with. I then removed the delete and ran it from my code as a separately executed statement. Will the ORDER BY clause really help here?
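One common way to close the gap between reading and deleting the id is to do both in a single atomic statement with an OUTPUT clause. A sketch (an illustration, not necessarily the statement the edit above refers to):
DECLARE @id int
DECLARE @claimed TABLE (siteid int)
-- Delete one id and capture it in the same statement, so there is no
-- window in which another process can grab the same id.
DELETE TOP (1) FROM siteids
OUTPUT deleted.siteid INTO @claimed
SELECT @id = siteid FROM @claimed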

Simulating the execution of a stored procedure by multiple users

I have this trigger in SQL Server
ALTER TRIGGER [dbo].[myTrigger]
ON [dbo].[Data]
AFTER INSERT
AS
BEGIN
declare @number int
begin transaction
select top 1 @number = NextNumber FROM settings
Update Settings
set NextNumber = NextNumber + 1
UPDATE Data
set number = @number, currentDate = GetDate(), IdUser = user_id(current_user)
FROM Data
INNER JOIN inserted on inserted.IdData = Data.IdData
commit transaction
END
It works as expected, but will it still work as expected when multiple users add new rows to the Data table at the same time?
Let's analyze this code for a minute:
begin transaction
You begin a transaction using the default READ COMMITTED isolation level.
select top 1 #number = NextNumber FROM settings
You're selecting one NextNumber value from the Settings table (by the way: you should by all means add an ORDER BY clause - otherwise no ordering is guaranteed, and you might get unexpected results here).
This operation however isn't blocking - two or more threads can read the same value of e.g. 100 at the same time - the SELECT only takes a shared lock for a very brief period of time, and shared locks are compatible - multiple readers can read the value simultaneously.
Update Settings
set NextNumber = NextNumber + 1
Now here, one thread gets the green light and writes the new value - 101 in our example - back to the table. The row takes an UPDATE lock (later escalated to an exclusive lock), so only one thread can write at a time.
UPDATE Data
set number = @number, currentDate = GetDate(), IdUser = user_id(current_user)
FROM Data
INNER JOIN inserted on inserted.IdData = Data.IdData
Same thing - that one lucky thread gets to update the Data table, setting number to 100, and the row(s) it updates are locked until the end of the transaction.
commit transaction
Now that lucky thread commits its transaction and is done.
HOWEVER: the second (and possibly third, fourth, fifth...) thread that read the same original value of 100 is still in play. Now that thread #1 has completed, a second one of those threads gets to do its thing: it updates the Settings table correctly, to a new value of 102, but then performs its second update to the Data table using the stale "current" value of 100 that it had read into its @number variable....
In the end, you might have multiple threads that all read the same original value (100) from the Settings table, and each of those will stamp its rows in Data with that same "current" value (100) - the numbers handed out are duplicated even though Settings itself keeps incrementing.
This method you're using here is not safe under load.
Possible solutions:
first and foremost - the recommended way to do this: let the database handle it itself, by using an INT IDENTITY column in your table (or, if you're already on SQL Server 2012 or newer, a SEQUENCE object to handle all the synchronization)
if you cannot do this - for whatever reason - then at least make sure your code works even on a busy system. You need to e.g. use SELECT ... WITH (UPDLOCK) to put an UPDATE lock on the Settings row when the first thread reads the current value - that blocks all other threads from even reading the "current" value until the first thread has completed. Or there are alternatives, like updating and assigning the old value in a single UPDATE statement; both are sketched below.
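A minimal sketch of that second option, using the Settings table from the trigger above:
DECLARE @number int
BEGIN TRANSACTION
-- UPDLOCK makes this read take an update lock, so a second thread's
-- identical SELECT blocks here until this transaction completes - no
-- two threads can read the same NextNumber anymore.
SELECT TOP 1 @number = NextNumber
FROM Settings WITH (UPDLOCK, HOLDLOCK)
UPDATE Settings SET NextNumber = NextNumber + 1
COMMIT TRANSACTION
-- Alternative: read and increment in one atomic statement;
-- @number receives the pre-increment value.
UPDATE Settings SET @number = NextNumber, NextNumber = NextNumber + 1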
Simulating the execution of a stored procedure by multiple users
You can use two (or more) edit windows in SQL Server Management Studio and execute something like this simultaneously in each window.
insert into Data(ColName) values ('Value')
go 10000
go 10000 will execute the batch 10000 times. Adjust that to whatever value you think is appropriate.

Row lock for update status

I have a table of "commands to do" with a status ('toprocess', 'processing', 'done').
I have several instances (Amazon EC2) with a daemon asking for "commands to do".
The daemon asks for rows with status 'toprocess', processes them, and at the end of each loop changes the status to 'done'.
The thing is that before starting that loop, I need to change the selected rows from 'toprocess' to 'processing', so other instances will not take the same rows, avoiding conflicts.
I've read about InnoDB row locks, but I don't understand them very well...
SELECT * from commands where status = 'toprocess'
Then I need to take the IDs of these results and update their status to 'processing', locking those rows until they are updated.
How can I do it?
Thank you
You'd use a transaction and read the data with FOR UPDATE, which will block other FOR UPDATE selects on the rows that get selected:
start transaction;
select * from commands where status = 'toprocess' for update;
-- for each row in the result set:
--   remember the row's data for processing later, then:
update commands set status = 'processing' where id = row.id;
commit;
-- now process all the data outside the transaction
Read a bit about FOR UPDATE and InnoDB isolation levels.
A possible (yet not very elegant) solution may be to first UPDATE the record, then read its data:
Each daemon will have a unique ID, and the table gets a new column named 'owner' for that ID.
The daemon then runs something like "UPDATE table SET status='processing', owner='theDaemonId' WHERE status='toprocess' ... LIMIT 1".
While the update runs, the row is locked, so no other daemon can read it.
After the update the row is owned by a specific daemon, which can then run a SELECT to fetch all the necessary data from it (WHERE status='processing' AND owner='theDaemonId').
Finally, the last UPDATE sets the row to 'done' and may (or may not) clear the owner field. Keeping it there will also enable some statistics about the daemons' work.
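A sketch of that flow against the question's commands table (the daemon id is illustrative):
-- 1. Claim one row; the UPDATE takes a row lock, so two daemons can
--    never claim the same command.
UPDATE commands SET status = 'processing', owner = 'daemon-42'
WHERE status = 'toprocess' LIMIT 1;
-- 2. Read back the claimed row's data.
SELECT * FROM commands WHERE status = 'processing' AND owner = 'daemon-42';
-- 3. When the work is done:
UPDATE commands SET status = 'done' WHERE status = 'processing' AND owner = 'daemon-42';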
As far as I know you can't use MySQL to hold a row lock for an arbitrary period (using a built-in method). You have two options though:
If your table should not be read by any other process until the locks are released, then you can use table-level locking as described here.
You can implement your own basic row locking by updating a value in each row you're processing, and then have all your other daemons check whether this flag is set (a BIT column would suffice), as sketched below.
InnoDB locks at row level for reads and updates anyway, but if you want to lock rows for an arbitrary period then you may have to go with the second option.
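A sketch of that second option as an optimistic claim (the locked column and the literal id are assumptions for illustration):
ALTER TABLE commands ADD locked BIT NOT NULL DEFAULT 0;
-- Pick a candidate row.
SELECT id FROM commands WHERE locked = 0 AND status = 'toprocess' LIMIT 1;
-- Try to claim it; if another daemon won the race this affects 0 rows,
-- and the application should pick another candidate.
UPDATE commands SET locked = 1 WHERE id = 42 AND locked = 0;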