I am using Gearman workers to accept jobs from a MySQL database. The procedure for these workers getting a job is something like:
SELECT foo, bar FROM jobs WHERE job_id = 'foobarbaz' AND status = 'WAITING';
Now, if that query returns one row, the worker knows it's been given a valid job and proceeds to work on it. The risk, of course, is that while it's working on the job another worker might pick it up too.
To prevent this, I'm wondering how I can atomically SELECT the data needed to work on the job and update its status if it was valid. I thought perhaps of doing the UPDATE on the row's status with the job ID and then testing the affected-row count, but I wasn't sure if there was a more sensible way to go about it.
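Roughly what I have in mind (the 'RUNNING' status value is just a placeholder; the table and columns follow the query above):

UPDATE jobs SET status = 'RUNNING'
WHERE job_id = 'foobarbaz' AND status = 'WAITING';
-- if exactly one row was affected, this worker owns the job and can SELECT foo, bar for it;
-- if zero rows were affected, another worker already claimed it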
Thanks.
I'm using an Aurora DB (i.e. MySQL version 5.6.10) as a queue, and I'm using a stored procedure to pull records out of a table in batches. The sproc works through the following steps...
Select the next batch of data into a temptable
Write the IDs of the records from the temp table into a log table
Output the records
Once a record has been added to the log, the sproc won't select it again the next time it's called, so multiple servers can call this sproc and each deal with batches of data from the queue without stepping on each other's toes.
The sproc runs in a fraction of a second, but my company is now spinning up servers automatically, and these cloned servers call the sproc at exactly the same time, with the result that the same records are being selected twice.
Is there a way I can limit this sproc to one call at a time? Ideally, any additional calls should wait until the first call has finished and then run.
Unfortunately, I have very little experience working with MySQL, so I'm not really sure where to start. I'd much appreciate it if anyone could point me in the right direction.
This is a job for MySQL table locking. Try something like this. (You didn't show us your queries so there's a lot of guesswork here.)
SET autocommit = 0;
-- every base table read or written below must also appear in the LOCK TABLES list
LOCK TABLES logtable WRITE, whatevertable WRITE;
CREATE TEMPORARY TABLE temptable AS
SELECT whatever FROM whatevertable FOR UPDATE;
INSERT INTO logtable (id)
SELECT id FROM temptable;
COMMIT;
UNLOCK TABLES;
If more than one connection tries to run this sequence concurrently, one will wait until the other reaches UNLOCK TABLES before proceeding. You say your SP is quick, so probably nobody will notice the short wait.
Pro tip: When you have the same timed code running on lots of servers, it's best to put in a short random delay before running the job. That way the shared resources (like your MySQL database) won't get hammered by a whole lot of requests precisely timed to be simultaneous.
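For example, something as small as this in front of the procedure call spreads the load (the 5-second window and the procedure name are placeholders, not anything from the question):

DO SLEEP(RAND() * 5);   -- up to 5 seconds of random jitter before the real work
CALL get_next_batch();  -- placeholder name for the actual stored procedure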
I'm not sure whether this is an issue with phpMyAdmin or whether I'm just not fully understanding how transactions work, but I want to be able to step through a series of queries within a transaction and either ROLLBACK or COMMIT based on the returned results. I'm using the InnoDB storage engine.
Here's a basic example:
START TRANSACTION;
UPDATE students
SET lastname = "jones"
WHERE studentid = 1;
SELECT * FROM students;
ROLLBACK;
As a single query, this works entirely fine, and if I'm happy with the results, I could re-run the entire query with COMMIT.
However, if all these queries can be run separately, why does phpMyAdmin lose the transaction?
For example, if I do this:
START TRANSACTION;
UPDATE students
SET lastname = "jones"
WHERE studentid = 1;
SELECT * FROM students;
Then this:
COMMIT;
SELECT * FROM students;
The update I made in the transaction is lost, and lastname retains its original value, as if the update never took place. I was under the impression that transactions can span multiple queries, and I've seen a couple of examples of this:
1: Entirely possible in Navicat, a different IDE
2: Also possible in PHP via MySQLi
Why then am I losing the transaction in phpMyAdmin, if transactions are able to span multiple individual queries?
Edit 1: After doing a bit of digging, it appears that there are two other ways a transaction can be implicitly ended in MySQL:
Disconnecting a client session will implicitly end the current transaction. Changes will be rolled back.
Killing a client session will implicitly end the current transaction. Changes will be rolled back.
Is it possible that phpMyAdmin is ending the client session after Go is hit and a query is submitted?
Edit 2:
Just to confirm that this is a phpMyAdmin-specific issue, I ran the same statements as multiple separate queries in MySQL Workbench, and it worked exactly as intended, retaining the transaction, so it appears to be a failure on phpMyAdmin's part.
Is it possible that phpMyAdmin is ending the client session after Go is hit and a query is submitted?
That is pretty much how PHP works. You send the request, it gets processed, and once it's done, everything (including MySQL connections) gets thrown away. With the next request, you start afresh.
There is a feature called persistent connections, but those do their own cleanup as well. Otherwise the code would somehow have to hand the same user the same connection again, which could prove very difficult given the way PHP works.
My application accesses a local DB where it inserts records into a table (roughly 30-40 million a day). I have processes that run, process data, and do these inserts. Part of the process involves selecting a unique ID from an IDs table, and this is done using something as simple as:
Declare @id int   -- @id holds the claimed site id

Begin Transaction
Select top 1 @id = siteid from siteids WITH (UPDLOCK, HOLDLOCK)
Delete siteids where siteid = @id
Commit Transaction
I then immediately delete that ID with a separate statement from that very table so that no other process grabs it. This is causing tremendous timeout issues, and with only 4 processes accessing it I am surprised by that. I also get timeout issues when checking my main post table to see whether a record was inserted using the above ID. It runs fast, but with all the deadlocks and timeouts I think this indicates poor design and is a recipe for disaster.
Any advice?
EDIT
This is the actual statement that someone else here helped with. I then removed the delete and included it in my code as a separately executed statement. Will the ORDER BY clause really help here?
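For comparison, the whole claim could also be collapsed into a single atomic statement; this is only a sketch, with @grabbed as a new placeholder and the other names taken from the snippet above:

Declare @grabbed table (siteid int)

Delete top (1) from siteids
    output deleted.siteid into @grabbed   -- delete one id and capture it in the same statement

Select @id = siteid from @grabbed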
I have a MySQL table in which I store jobs to be processed: mainly text fields of raw data that will take around a minute each to process.
I have 2 servers pulling data from that table, processing it, then deleting it.
To manage the job allocation between the 2 servers I am currently using Amazon SQS. I store all the row IDs that need processing in SQS, and the worker servers poll SQS to get new rows to work on.
The system currently works but SQS adds a layer of complexity and costs that I feel are overkill to achieve what I am doing.
I am trying to implement the same thing without SQS and was wondering if there is any way to read lock a row so that if one worker is working on one row, no other worker can select that row. Or if there's any better way to do it.
A simple workaround: add one more column to your jobs table, is_taken_by INT.
Then in your worker you do something like this:
-- worker_pid stands for the worker's own identifier, job_id for the id returned by the select
select id from jobs where is_taken_by is null limit 1 for update;
update jobs set is_taken_by = worker_pid where id = job_id;
SELECT ... FOR UPDATE sets exclusive locks on rows it reads. This way you ensure that no other worker can take the same job.
Note: you have to run those two lines in an explicit transaction.
Locking of rows for update using SELECT FOR UPDATE only applies when autocommit is disabled (either by beginning a transaction with START TRANSACTION or by setting autocommit to 0). If autocommit is enabled, the rows matching the specification are not locked.
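Putting the pieces together, a minimal sketch of the claim step (@worker_pid stands for whatever identifier the worker uses for itself):

start transaction;
select id into @job_id from jobs where is_taken_by is null limit 1 for update;   -- the selected row is now locked
update jobs set is_taken_by = @worker_pid where id = @job_id;
commit;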
I have a table that contains 2.5 million rows; each row has one column of type xml. All records should be deleted and enqueued into a SQL Server Service Broker queue when a message arrives in another queue (triggerqueue). Performance is very important and right now it's too slow. What would be the best way to achieve this?
Currently we use an activated stored procedure on the triggerqueue which, in a WHILE (@message IS NOT NULL) loop, does:
begin transaction
    delete top (1) from [table]                       -- [table] stands for the actual table name
        output deleted.* into #tempTable
    select top (1) @message = message from #tempTable
    send on conversation @dialogHandle (@message)     -- @dialogHandle assumed to be opened elsewhere
commit transaction
Are there faster ways to tackle this problem?
By the way, before someone asks: we need to start from the table because it is filled with the calculated output of an earlier MERGE statement.
So your performance problem is on the send side rather than the receive side, right? (It's a bit unclear from your question.) In that case, you'll want to start by trying the following:
Batch many operations in a single transaction (see the sketch after these suggestions). You're most likely getting hit hardest by synchronous log flushes at commit time.
Try processing the table more efficiently (e.g. select more rows at once into the temp table and then use a cursor to iterate over them and send the messages).
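A rough sketch of the batching idea, reusing the names from the snippets above (the batch size of 100 and the @dialogHandle variable are assumptions, not taken from your procedure):

begin transaction
    delete top (100) from [table]                        -- [table] is a placeholder, as above
        output deleted.message into #tempTable            -- #tempTable assumed created (and emptied) beforehand
    declare msgs cursor local fast_forward for
        select message from #tempTable
    open msgs
    fetch next from msgs into @message
    while @@fetch_status = 0
    begin
        send on conversation @dialogHandle (@message)    -- one send per row, but only one commit
        fetch next from msgs into @message
    end
    close msgs
    deallocate msgs
commit transaction                                       -- a single synchronous log flush per batch of 100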
In case you're experiencing problems on the receive side, take a look at this great article by Remus.