I have a MySQL table of tasks to perform, each row having parameters for a single task.
There are many worker apps (possibly on different machines), performing tasks in a loop.
The apps access the database using MySQL's native C APIs.
To own a task, an app does something like this:
Generate a globally-unique id (for simplicity, let's say it is a number)
UPDATE tasks
SET guid = %d
WHERE guid = 0 LIMIT 1
SELECT params
FROM tasks
WHERE guid = %d
If the last query returns a row, we own it and have the parameters to run.
Is there a way to achieve the same effect (i.e. 'own' a row and get its parameters) in a single call to the server?
Try it like this:
UPDATE `lastid` SET `idnum` = (SELECT `id` FROM `history` ORDER BY `id` DESC LIMIT 1);
The above code worked for me.
You may create a procedure that does it:
CREATE PROCEDURE prc_get_task (IN in_guid BINARY(16), OUT out_params VARCHAR(200))
BEGIN
DECLARE task_id INT;
SELECT id, params
INTO task_id, out_params
FROM tasks
WHERE guid = 0
LIMIT 1
FOR UPDATE;
UPDATE tasks
SET guid = in_guid
WHERE id = task_id;
END;
START TRANSACTION;
CALL prc_get_task(@guid, @params);
COMMIT;
SELECT @params;
If you are looking for a single query, then it can't happen. An UPDATE statement returns only the number of rows that were updated. Similarly, a SELECT statement doesn't alter a table; it only returns values.
Using a procedure does turn it into a single call, and it can be handy if locking is a concern for you. If your biggest concern is network traffic (i.e., sending too many queries), then use the procedure. If your concern is server load (i.e., the DB is working too hard), then the extra overhead of a procedure could make things worse.
I have the exact same issue. We ended up using PostgreSQL instead, and UPDATE ... RETURNING:
The optional RETURNING clause causes UPDATE to compute and return value(s) based on each row actually updated. Any expression using the table's columns, and/or columns of other tables mentioned in FROM, can be computed. The new (post-update) values of the table's columns are used. The syntax of the RETURNING list is identical to that of the output list of SELECT.
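Example: UPDATE my_table SET status = 1 WHERE id = (SELECT id FROM my_table WHERE status = 0 LIMIT 1) RETURNING *;
Or, in your case: UPDATE tasks SET guid = %d WHERE id = (SELECT id FROM tasks WHERE guid = 0 LIMIT 1 FOR UPDATE SKIP LOCKED) RETURNING params;
(PostgreSQL's UPDATE has no LIMIT clause, so the row is picked with a subquery, and identifiers must not be wrapped in single quotes.)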
Sorry, I know this doesn't answer the question with MySQL, and it might not be easy to just switch to PostgreSQL, but it's the best way we've found to do it. Even 6 years later, MySQL still doesn't support UPDATE ... RETURNING. It might be added at some point in the future, but for now MariaDB only has it for DELETE statements.
Edit: There is a task (low priority) to add UPDATE ... RETURNING support to MariaDB.
I don't know about the single-call part, but what you're describing is a lock. Locks are an essential element of relational databases.
I don't know the specifics of locking a row, reading it, and then updating it in MySQL, but with a bit of reading of the MySQL locking documentation you could do all kinds of lock-based manipulations.
The PostgreSQL documentation on locks has a great example describing exactly what you want to do: lock the table, read the table, modify the table.
UPDATE tasks
SET guid = %d, params = @params := params
WHERE guid = 0 LIMIT 1;
This reports 1 or 0 affected rows, depending on whether a row was actually changed.
SELECT @params AS params;
This one just reads the user variable stored in the connection (session).
From: here
Related
I am building a "poor man's queuing system" using MySQL. It's a single table containing jobs that need to be executed (the table name is queue). I have several processes on multiple machines whose job it is to call the fetch_next2 sproc to get an item off of the queue.
The whole point of this procedure is to make sure that we never let 2 clients get the same job. I thought that using SELECT ... LIMIT 1 FOR UPDATE would let me lock a single row so that I could be sure it was only updated by 1 caller (updated such that it no longer fits the criteria of the SELECT used to filter jobs that are "READY" to be processed).
Can anyone tell me what I'm doing wrong? I just had some instances where the same job was given to 2 different processes so I know it doesn't work properly. :)
CREATE DEFINER=`masteruser`@`%` PROCEDURE `fetch_next2`()
BEGIN
SET @id = (SELECT q.Id FROM queue q WHERE q.State = 'READY' LIMIT 1 FOR UPDATE);
UPDATE queue
SET State = 'PROCESSING', Attempts = Attempts + 1
WHERE Id = @id;
SELECT Id, Payload
FROM queue
WHERE Id = @id;
END
Code for the answer:
CREATE DEFINER=`masteruser`@`%` PROCEDURE `fetch_next2`()
BEGIN
SET @id := 0;
UPDATE queue SET State='PROCESSING', Id=(SELECT @id := Id) WHERE State='READY' LIMIT 1;
# You can do an IF @id != 0 check here
SELECT Id, Payload
FROM queue
WHERE Id = @id;
END
The problem with what you are doing is that there is no atomic grouping for the operations. You are using the SELECT ... FOR UPDATE syntax, but the row lock it takes only lasts until the end of the current transaction. With autocommit enabled, the SET statement is its own transaction, so the lock is released before your UPDATE runs, and another SELECT from another thread can read the same row in between. Are you using MyISAM or InnoDB? MyISAM does not support transactions or row-level locks, so FOR UPDATE has no effect there.
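To make the race concrete, here is a deterministic replay in Python. It uses the built-in sqlite3 module as a stand-in for MySQL; SQLite has no FOR UPDATE at all. The table and column names follow the question. The two "workers" are simulated by interleaving statements on one autocommit connection rather than by real threads, so treat this as an illustration of the ordering problem, not a load test.

```python
import sqlite3


def make_queue(n: int) -> sqlite3.Connection:
    """Create an in-memory stand-in for the `queue` table with n READY jobs."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
    conn.execute("CREATE TABLE queue (Id INTEGER PRIMARY KEY, State TEXT)")
    conn.executemany("INSERT INTO queue (State) VALUES (?)", [("READY",)] * n)
    return conn


def racy_interleaving(conn: sqlite3.Connection) -> tuple:
    """Replay: worker A reads, worker B reads, then both update."""
    pick = "SELECT Id FROM queue WHERE State = 'READY' ORDER BY Id LIMIT 1"
    claim = "UPDATE queue SET State = 'PROCESSING' WHERE Id = ?"
    id_a = conn.execute(pick).fetchone()[0]  # A's read is its own transaction
    id_b = conn.execute(pick).fetchone()[0]  # B reads before A has updated
    conn.execute(claim, (id_a,))  # A "owns" the job
    conn.execute(claim, (id_b,))  # B "owns" the same job too
    return id_a, id_b


def atomic_claim(conn: sqlite3.Connection) -> int:
    """Pick and claim in one statement; returns the number of rows claimed."""
    cur = conn.execute(
        "UPDATE queue SET State = 'PROCESSING' WHERE Id = "
        "(SELECT Id FROM queue WHERE State = 'READY' ORDER BY Id LIMIT 1)"
    )
    return cur.rowcount
```

Here `racy_interleaving` hands job 1 to both workers, while repeated `atomic_claim` calls each take a different row until none are left. In MySQL, the single-statement variant is the `UPDATE ... WHERE State='READY' LIMIT 1` from the answer's code.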
The easiest way to make sure this works properly is to lock the table.
[Edit] The method I describe here is more time-consuming than the Id=(SELECT @id := Id) method in the code above.
Another method would be to do the following:
Have a column that is normally set to 0.
Do an UPDATE ... SET ColName=UNIQ_ID WHERE ColName=0 LIMIT 1. That makes sure only 1 process can update that row; then get it via a SELECT afterwards. (UNIQ_ID is not a MySQL feature, just a variable.)
If you need a unique ID, you can use a table with auto_increment just for that.
You can also kind of do this with transactions. If you start a transaction and run UPDATE foobar SET LockVar=19 WHERE LockVar=0 LIMIT 1; from one thread, and do the exact same thing on another thread, the second thread will wait for the first thread to commit before it gets its row. That may end up blocking the whole table, though.
I'm experiencing a race condition because I'm dealing with a lot of concurrency.
I'm trying to combine these two MySQL statements so that they execute as one operation.
I need to select a row and update the same one...
SELECT id_file FROM filenames WHERE pending=1 LIMIT 1;
UPDATE filenames SET pending=2 WHERE id_file=**id of select query**;
Another solution to the race condition I'm experiencing would be to perform an UPDATE query where pending=1 and somehow get the ID of the updated row, but I'm not sure that's even possible.
Thanks
Dealing with concurrency is one of the basic functions of transactions.
Wrap your queries in one transaction and tell the DBMS, with FOR UPDATE, that the row must not change in between:
BEGIN;
SELECT id_file FROM filenames WHERE pending=1 LIMIT 1 FOR UPDATE;
# do whatever you like
UPDATE filenames SET pending=2 WHERE id_file=**id of select query**;
COMMIT;
You can execute these statements with 4 mysqli_query calls and do whatever you like in between, without worrying about the consistency of your database. The selected row stays locked until you COMMIT (this requires a transactional engine such as InnoDB).
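A comparable sketch of the select-then-update transaction, assuming Python's sqlite3 module as a stand-in. SQLite has neither SELECT ... FOR UPDATE nor row locks, so `BEGIN IMMEDIATE`, which takes the database write lock up front, plays that role here. The connection is assumed to be opened with `isolation_level=None` so the explicit BEGIN/COMMIT are not mixed with the module's implicit transactions. The `filenames` table follows the question.

```python
import sqlite3


def claim_pending(conn: sqlite3.Connection):
    """Select one pending file and mark it, inside a single write transaction."""
    # BEGIN IMMEDIATE takes SQLite's write lock now; a rough stand-in for
    # MySQL's SELECT ... FOR UPDATE (SQLite locks the whole database).
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id_file FROM filenames WHERE pending = 1 LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None  # nothing pending
        # row is a 1-tuple (id_file,), usable directly as the parameters
        conn.execute("UPDATE filenames SET pending = 2 WHERE id_file = ?", row)
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise
```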
You can avoid the "race" condition by performing just an UPDATE statement on the table, letting it identify the row to be modified, and then subsequently retrieving values of columns from that row.
There's a "trick" returning values of columns, in your case, the value of the id_file column from the row that was just updated. You can use either the LAST_INSERT_ID() function (only if the column is integer type), or a MySQL user-defined variable.
If the value of the column you want to retrieve is an integer, you can use the LAST_INSERT_ID() function (which supports a 64-bit BIGINT value).
For example:
UPDATE filenames
SET pending = 2
, id_file = LAST_INSERT_ID(id_file)
WHERE pending = 1
LIMIT 1;
Following the successful execution of the UPDATE statement, you'll want to verify that at least one row was affected. (If any rows satisfied the WHERE clause and the statement succeeded, we know that one row will be affected.) Then you can retrieve that value in the same session:
SELECT LAST_INSERT_ID();
to retrieve the value of id_file column of the last row processed by the UPDATE statement. Note that if the UPDATE processes multiple rows, only the value of last row that was processed by the UPDATE will be available. (But that won't be an issue for you, since there's a LIMIT 1 clause.)
Again, you'll want to ensure that a row was actually updated, before you rely on the value returned by the LAST_INSERT_ID() function.
For non-integer columns, you can use a MySQL user-defined variable in a similar way, assigning the value of the column to a user-defined variable, and then immediately retrieve the value stored in the user-defined variable.
-- initialize user-defined variable, to "clear" any previous value
SELECT @id_file := NULL;
-- save value of id_file column into user-defined variable
UPDATE filenames
SET pending = 2
, id_file = (SELECT @id_file := id_file)
WHERE pending = 1
LIMIT 1;
-- retrieve value stored in user-defined variable
SELECT @id_file;
Note that the value of this variable is maintained within the session. If the UPDATE statement doesn't find any rows that satisfy the predicate (WHERE clause), the value of the user-defined variable will be unaffected... so, to make sure you don't inadvertently get an "old" value, you may want to first initialize that variable with a NULL.
Note that it's important that a subsequently fired trigger doesn't modify the value of that user defined variable. (The user-defined variable is "in scope" in the current session.)
It's also possible to do the assignment to the user-defined variable within a trigger, but I'm not going to demonstrate that, and I would not recommend doing it in a trigger.
I have to read 460,000 records from one database and update those records in another database. Currently, I read all of the records in (select * from...) and then loop through them sending an update command to the second database for each record. This process is slower than I hoped and I was wondering if there is a faster way. I match up the records by the one column that is indexed (primary key) in the table.
Thanks.
I would probably tune the fetch size for reads (e.g. setFetchSize(250)) and use JDBC batch processing for writes (e.g. a batch size of 250 records).
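For readers on Python rather than Java, here is a rough DB-API analogue of this advice. It is sketched with sqlite3 and assumes a hypothetical `records(id, value)` table on both sides. `cursor.arraysize` plus `fetchmany()` reads in chunks of 250, and `executemany()` sends the writes in batches of 250. `arraysize` only sets the default chunk size for `fetchmany()`; as with `setFetchSize`, whether it also reduces network round trips depends on the driver.

```python
import sqlite3

BATCH = 250  # the batch size suggested above, for both reads and writes


def copy_updates(src: sqlite3.Connection, dst: sqlite3.Connection) -> int:
    """Stream rows from src and apply them to dst as batched UPDATEs."""
    # Hypothetical schema: both databases have records(id PRIMARY KEY, value).
    read = src.cursor()
    read.arraysize = BATCH  # default size for fetchmany(), like setFetchSize(250)
    read.execute("SELECT id, value FROM records")
    total = 0
    while True:
        rows = read.fetchmany()  # up to read.arraysize rows
        if not rows:
            break
        # One executemany() call per batch instead of one call per row.
        dst.executemany(
            "UPDATE records SET value = ? WHERE id = ?",
            [(value, rid) for rid, value in rows],
        )
        total += len(rows)
    dst.commit()
    return total
```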
I am assuming your "other database" is on a separate server, so can't just be directly joined.
The key is to have fewer update statements. It can often be faster to insert your data into a new table like this:
create table updatevalues ( id int(11), a int(11), b int(11), c int(11) );
insert into updatevalues (id,a,b,c) values (1,1,2,3),(2,4,5,6),(3,7,8,9),...
update updatevalues u inner join targettable t using (id) set t.a=u.a,t.b=u.b,t.c=u.c;
drop table updatevalues;
(Batch the inserts into as few statements as will fit within the server's maximum packet size, max_allowed_packet, which is usually configured in megabytes.)
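The batching in parentheses can be done with a small generator. This is a sketch that assumes integer-only rows, matching the int(11) columns above. `max_bytes` is a hypothetical budget standing in for max_allowed_packet; leave headroom rather than using the exact server setting. A single row larger than the budget is still emitted on its own.

```python
def batched_inserts(rows, max_bytes=1_000_000):
    """Yield multi-row INSERT statements for updatevalues, each <= max_bytes."""
    head = "insert into updatevalues (id,a,b,c) values "
    parts = []
    size = len(head) + 1  # +1 for the trailing ";"
    for row in rows:
        # int() keeps the output numeric-only (no quoting or injection issues)
        tup = "(" + ",".join(str(int(v)) for v in row) + ")"
        extra = len(tup) + (1 if parts else 0)  # +1 for the separating comma
        if parts and size + extra > max_bytes:
            yield head + ",".join(parts) + ";"
            parts, size = [], len(head) + 1
            extra = len(tup)
        parts.append(tup)
        size += extra
    if parts:
        yield head + ",".join(parts) + ";"
```

With the default budget, the three example rows above produce exactly the single statement shown in the answer.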
Alternatively, find unique values and update them together:
update targettable set a=42 where id in (1,3,7);
update targettable set a=97 where id in (2,5);
...
update targettable set b=1 where id in (1,7);
...
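Building those grouped statements from a mapping of id to new value might look like the sketch below. The function name is hypothetical, and values and ids are assumed to be integers. `column` is interpolated directly, so it must come from trusted code rather than user input.

```python
from collections import defaultdict


def grouped_updates(changes, column):
    """Turn {id: new_value} into one UPDATE ... WHERE id IN (...) per value."""
    by_value = defaultdict(list)
    for row_id, value in changes.items():
        by_value[int(value)].append(int(row_id))
    # One statement per distinct value, ids sorted for readable output.
    return [
        f"update targettable set {column}={value} "
        f"where id in ({','.join(map(str, sorted(ids)))});"
        for value, ids in sorted(by_value.items())
    ]
```

For example, `grouped_updates({1: 42, 3: 42, 7: 42, 2: 97, 5: 97}, "a")` yields the first two statements listed above.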
1. USE MULTI-QUERY
Aha, 'another db' means a remote database. In that case you SHOULD reduce the number of round trips to the remote DB. I suggest using multiple-statement queries, e.g. executing 1,000 UPDATEs at once:
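// $link is assumed to be an open mysqli connection
$multi_query = "";
$cnt = 0;
foreach ($rows as $row)
{
    $multi_query .= "UPDATE ..;";
    if (++$cnt % 1000 == 0)
    {
        // mysql_query() runs only one statement; mysqli_multi_query() runs many
        mysqli_multi_query($link, $multi_query);
        // drain all results before sending the next batch
        while (mysqli_more_results($link) && mysqli_next_result($link));
        $multi_query = "";
    }
}
// send the last, partial batch
if ($multi_query !== "")
{
    mysqli_multi_query($link, $multi_query);
    while (mysqli_more_results($link) && mysqli_next_result($link));
}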
The multi-query feature is normally disabled (for security reasons). To use multiple statements, see:
PHP: http://www.php.net/manual/en/mysqli.quickstart.multiple-statement.php
C API: http://dev.mysql.com/doc/refman/5.0/en/c-api-multiple-queries.html
VB: http://www.devart.com/dotconnect/mysql/docs/MultiQuery.html (I'm not a VB user, so I'm not sure this is the multi-query feature for VB.)
2. USE PREPARED STATEMENTS
(If you are already using prepared statements, skip this.)
You are running 460K queries with the same structure, so prepared statements give you two advantages:
Reduced query compile time
Without prepared statements, every query is parsed and compiled; with a prepared statement, this happens just once.
Reduced network cost
Assuming each UPDATE query is 100 bytes long and has 4 parameters (each 4 bytes long):
without prepared statements: 100 bytes * 460K = 46M
with prepared statements: 16 bytes * 460K ≈ 7.4M
The reduction isn't dramatic, though.
Here is how to use a prepared statement in VB.
What I ended up doing was using a loop to concatenate my queries together. So instead of sending one query at a time, I would send a group at a time separated by semicolons:
update sometable set x=1 where y =2; update sometable set x = 5 where y = 6; etc...
This ended up improving my time by about 40%. My update went from 3 min 23 secs to 2 min 1 second.
But there was a threshold, where concatenating too many together started to slow it down again when the string got too long. So I had to tweak it until I found just the right mix. It ended up being 100 strings concatenated together that gave the best performance.
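The poster's approach of sending 100 statements per call can be sketched as a chunking generator. The `per_batch` default of 100 is just the value that worked best in their measurements, so treat it as a starting point to tune. Each resulting string still has to be sent through a multi-statement-capable call, such as `mysqli_multi_query()` in PHP or a C API connection opened with `CLIENT_MULTI_STATEMENTS`.

```python
def concatenated_batches(statements, per_batch=100):
    """Join statements into semicolon-separated strings of per_batch each."""
    batch = []
    for stmt in statements:
        batch.append(stmt.rstrip().rstrip(";"))  # normalize trailing ";"
        if len(batch) == per_batch:
            yield "; ".join(batch) + ";"
            batch = []
    if batch:  # final, partial batch
        yield "; ".join(batch) + ";"
```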
Thanks for the responses.
I have a need for unique identifiers in my application. To that end, I created a table in my database that contains only 1 column, 'unique_id' (BIGINT), and 1 row.
The idea is to use a stored procedure to get the next identifier when I need it. I figured a 1-line operation like this would do the job:
UPDATE identifier_table SET unique_id = unique_id + 1 OUTPUT INSERTED.unique_id
Can someone confirm whether this operation is atomic, or do I need to set up a lock on the table?
Thanks!
It is atomic. It is just a single update statement, and will have no problem at all with concurrency since that will be managed by the engine with update locks. You can use OUTPUT as shown, or you can do something like this:
DECLARE @unique_id bigint;
UPDATE identifier_table
SET
@unique_id = unique_id + 1,
unique_id = unique_id + 1;
SELECT @unique_id uniqueid;
If you make @unique_id an OUTPUT parameter, then you can get the value without a SELECT statement or use it easily in another stored procedure.
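For comparison, the same increment-and-read can be done in one write transaction in SQLite, which has no OUTPUT clause and no T-SQL-style variable assignment in UPDATE. This is a minimal sketch using Python's sqlite3 with the question's `identifier_table`, assuming the connection is opened with `isolation_level=None`. `BEGIN IMMEDIATE` takes the write lock before the UPDATE, so no other writer can change the counter between the increment and the read.

```python
import sqlite3


def next_id(conn: sqlite3.Connection) -> int:
    """Increment the single-row counter and read the new value atomically."""
    conn.execute("BEGIN IMMEDIATE")  # write lock held until COMMIT
    try:
        conn.execute("UPDATE identifier_table SET unique_id = unique_id + 1")
        (value,) = conn.execute("SELECT unique_id FROM identifier_table").fetchone()
        conn.execute("COMMIT")
        return value
    except Exception:
        conn.execute("ROLLBACK")
        raise
```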
Is there any way to select a record and update it in a single query?
I tried this:
UPDATE arrc_Voucher
SET ActivatedDT = now()
WHERE (SELECT VoucherNbr, VoucherID
FROM arrc_Voucher
WHERE ActivatedDT IS NULL
AND BalanceInit IS NULL
AND TypeFlag = 'V'
LIMIT 1 )
which I hoped would run the select query, grab the first record that matches the WHERE clause, and then update the ActivatedDT field in that record, but I got the following error:
1241 - Operand should contain 1 column(s)
Any ideas?
How about:
UPDATE arrc_Voucher
SET ActivatedDT = NOW()
WHERE ActivatedDT IS NULL
AND BalanceInit IS NULL
AND TypeFlag = 'V'
LIMIT 1;
From the MySQL API documentation :
UPDATE returns the number of rows that were actually changed
You cannot select a row and update it at the same time, you will need to perform two queries to achieve it; fetch your record, then update it.
If you are worried about concurrent processes accessing the same row through some kind of race condition (supposing your use case involves high traffic), you may consider other alternatives such as locking the table (note that other processes will need to recover, i.e. retry, if the table is locked when they access it).
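The retry mentioned in the parentheses can be wrapped in a small helper. This is a hypothetical sketch, not part of any MySQL API: `is_retryable` is a caller-supplied predicate. For example, it might match MySQL error 1205 (lock wait timeout) or 1213 (deadlock) in your driver's exception type.

```python
import random
import time


def with_retry(op, is_retryable, attempts=5, base_delay=0.05):
    """Call op(); on a retryable error, back off and try again."""
    for attempt in range(attempts):
        try:
            return op()
        except Exception as exc:
            # Give up on the last attempt or on errors that won't go away.
            if attempt == attempts - 1 or not is_retryable(exc):
                raise
            # Exponential backoff with jitter so competing workers spread out.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```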
Or, if you can create stored procedures, you may want to read this article or the MySQL API documentation.
But about 99% of the time, this is not necessary and the two queries will execute without any problem.