I ran into a problem and can't choose the right solution.
I have a SELECT query that selects records from a table.
These records have a status column, as seen below.
SELECT id, <...>, status FROM table WHERE something
Now, right after this SELECT I have to UPDATE the status column.
How can I do it to avoid a race condition?
What I want to achieve is this: once somebody (a session) has selected something, it cannot be selected by anybody else until I release it manually (for example using a status column).
Thoughts?
There is some MySQL documentation that may help with your task. I'm not sure it fits your needs exactly, but it describes the right way to do a SELECT followed by an UPDATE.
The technique described does not prevent other sessions from reading, but it prevents them from writing to the selected record until the end of the transaction.
It contains an example similar to your problem:
SELECT counter_field FROM child_codes FOR UPDATE;
UPDATE child_codes SET counter_field = counter_field + 1;
This requires that your tables use the InnoDB engine and that your programs use transactions.
If you only need the lock for a short time, that is, one session selects the row with a lock, updates it, and releases the lock within a single transaction, then you do not need a status field at all. Use SELECT ... FOR UPDATE when you intend to modify the row and SELECT ... LOCK IN SHARE MODE when you only intend to read it. As long as every session follows this convention inside transactions, that covers your requirements.
If you need to hold the lock for a long time, for example selecting and locking in one session and then updating and releasing in another, then you are right to keep the lock state in some storage, and every session has to follow the same protocol. To update: SELECT ... FOR UPDATE and set the status and the status owner in one session; later, in another session, SELECT ... FOR UPDATE again, check the status and owner, perform the update, and clear the status. To read: SELECT ... LOCK IN SHARE MODE and check the status.
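The long-lived variant described above might look roughly like this. It is only a sketch: the table `items`, the columns `lock_status` and `lock_owner`, and the literal values are hypothetical names chosen for illustration, not something taken from the question.

```sql
-- Session A: claim the row inside a short transaction.
START TRANSACTION;
SELECT id, lock_status, lock_owner
  FROM items
 WHERE id = 42
   FOR UPDATE;                 -- other locking reads of this row now wait
-- The application checks that lock_status IS NULL (row is free), then:
UPDATE items
   SET lock_status = 'locked', lock_owner = 'session-A'
 WHERE id = 42;
COMMIT;                        -- InnoDB row lock is released; the claim stays in the columns

-- Later, possibly on another connection: verify the claim, update, release.
START TRANSACTION;
SELECT lock_status, lock_owner FROM items WHERE id = 42 FOR UPDATE;
-- The application continues only if lock_owner = 'session-A'.
UPDATE items
   SET lock_status = NULL, lock_owner = NULL   -- plus the real data changes
 WHERE id = 42 AND lock_owner = 'session-A';
COMMIT;

-- Readers that must respect the claim:
START TRANSACTION;
SELECT lock_status, lock_owner FROM items WHERE id = 42 LOCK IN SHARE MODE;
COMMIT;
```

The InnoDB row lock is only held for the duration of each short transaction; the status columns carry the claim in between.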
You can do it with some preparations. Add a column sessionId to your table. It has to be NULL-able and it will contain the unique ID of the session that acquires the row. Also add an index on this new column; we'll use the column to search for rows in the table.
ALTER TABLE `tbl`
ADD COLUMN `sessionId` CHAR(32) DEFAULT NULL,
ADD INDEX `sessionId`(`sessionId`)
When a session needs to acquire some rows (based on some criteria) run:
UPDATE `tbl`
SET `sessionId` = 'aaa'
WHERE `sessionId` IS NULL
AND ...
LIMIT bbb
Replace aaa with the current session ID and ... with the conditions you need to select the correct rows. Replace bbb with the number of rows you need to acquire. Add an ORDER BY clause if you need to process the rows in a certain order (if some of them have higher priority than others). You can also add status = ... to the SET clause to change the status of the acquired rows (to pending, for example) so that other instances of the code know those rows are being processed right now.
The query above acquires some rows. Next, run:
SELECT *
FROM `tbl`
WHERE `sessionId` = 'aaa'
This query gets the acquired rows to be processed in the client code.
After each row is processed, you either DELETE the row or UPDATE it and set sessionId to NULL (release the row) and status to reflect its new status.
Also you should release the rows (using the same procedure as above) when the session is closed.
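The release step described above could be sketched as follows. The primary key column `id`, the value 123, and the status value 'done' are assumptions made for the example; the question does not name them.

```sql
-- Release one processed row and record its new status.
UPDATE `tbl`
   SET `sessionId` = NULL,
       `status`    = 'done'      -- placeholder status value
 WHERE `id` = 123                -- assumes a primary key column `id`
   AND `sessionId` = 'aaa';      -- only release rows this session actually holds

-- When the session closes: release everything it still holds.
UPDATE `tbl`
   SET `sessionId` = NULL
 WHERE `sessionId` = 'aaa';
```

Keeping `sessionId` = 'aaa' in the WHERE clause means a session cannot accidentally release a row that was re-acquired by someone else.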
Related
I have a MySQL table of Users, and a table of Actions performed by the Users (linked to that User by the primary key, userid). The Actions table has an incrementing key indx. Whenever I add a new row to that table, I then update the latest column of the relevant Users row with the indx of the row I just added to the Actions table. So something like:
INSERT INTO actions(indx,actionname,userid) VALUES(default, "myaction", 1);
UPDATE users SET latest=LAST_INSERT_ID() WHERE userid=1;
The idea is that I can check for updates for a User by seeing whether latest is higher than it was the last time I checked.
My issue is that if more than one connection is opened on the database and they try and add an Action for the same User at the same time, connection2 could conceivably run their INSERT and UPDATE between the INSERT and update of connection1, and the latest entry of the user they're both trying to update will no longer have the indx of the most recent action entry.
I've been reading up on transactions, isolation levels, etc., but haven't really found a way around this (though my understanding of how these work exactly is pretty shaky, so maybe I just misunderstood). I think I need a way to lock the Actions table until the Users table is updated. This application only gets used by a few hundred users at most, so I don't think the performance hit from momentarily locking the table will be too bad.
So is that something that can be done in MySQL? Is there a better solution? I imagine this general pattern must be pretty common: one table with many varieties of rows, and a second table with a row that tracks metadata for each variety in table A and needs to be updated atomically each time the first table changes. So I'm hoping there's a solution that isn't too complex.
Use SELECT ... FOR UPDATE to lock the user's row; this serializes access for that user and prevents race conditions:
START TRANSACTION;
SELECT any_column FROM users WHERE userid=1 FOR UPDATE;
INSERT INTO actions(indx,actionname,userid) VALUES(default, "myaction", 1);
UPDATE users SET latest=LAST_INSERT_ID() WHERE userid=1;
COMMIT;
However, this will slow down your INSERT rate, because all of these transactions for the same user, from all sessions, will be serialized.
The better option is not to store the last ID in the users table at all. Just use SELECT MAX(indx) FROM actions WHERE userid = xxxx wherever this number is required. With an index on actions(userid) this query will be very fast (assuming that the indx column is the primary key of this table), and the inserts will not be slowed down.
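A minimal sketch of that approach, using the `indx` and `userid` columns from the question. The index name `idx_actions_userid` is made up for the example.

```sql
-- One-time setup: index the column used to look up a user's actions.
CREATE INDEX idx_actions_userid ON actions (userid);

-- Wherever the "latest" value is needed, compute it on demand.
-- InnoDB secondary indexes also carry the primary key, so with indx as the
-- primary key this lookup should be answerable from the index alone.
SELECT MAX(indx) AS latest
  FROM actions
 WHERE userid = 1;
```

Because nothing is written to the users table, concurrent inserts for the same user no longer have to wait for each other.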
It is unclear to me (from reading the MySQL docs) whether the following query, run on InnoDB tables on MySQL 5.1, would take and release a write lock for each row as the db updates it internally (5000 in total), or hold locks on all the rows in the batch at once. As the database is under really heavy load, this is very important.
UPDATE `records`
INNER JOIN (
SELECT id, name FROM related LIMIT 0, 5000
) AS `j` ON `j`.`id` = `records`.`id`
SET `records`.`name` = `j`.`name`
I'd expect it to be per row, but as I do not know a way to make sure that is so, I decided to ask someone with deeper knowledge. If this is not the case and the db would lock all the rows in the set, I'd be thankful if you could give me an explanation why.
The UPDATE runs in a transaction. It's an atomic operation, which means that if one of the rows fails (because of a unique constraint, for example) it won't update any of the 5000 rows. This is one of the ACID properties of a transactional database.
Because of this, the UPDATE holds a lock on all of the rows it has modified for the entire transaction. Otherwise another transaction could further update the value of a row based on its current value (say, update records set value = value * 2). That statement should produce a different result depending on whether the first transaction commits or rolls back, so it has to wait for the first transaction to complete all 5000 updates.
If you want to release the locks, just do the update in (smaller) batches.
P.S. autocommit controls whether each statement is issued in its own transaction, but it does not affect the execution of a single query.
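The batching suggestion above could be sketched like this. It assumes `id` is a numeric key so the work can be split into ranges; the range boundaries are arbitrary, and splitting by range is a substitute for the LIMIT 0, 5000 subquery in the question rather than an exact equivalent.

```sql
-- Batch 1: with autocommit enabled, this statement is its own transaction,
-- so its row locks are released as soon as it finishes.
UPDATE `records`
INNER JOIN `related` AS `j` ON `j`.`id` = `records`.`id`
SET `records`.`name` = `j`.`name`
WHERE `records`.`id` BETWEEN 1 AND 1000;

-- Batch 2: locks a different, non-overlapping set of rows.
UPDATE `records`
INNER JOIN `related` AS `j` ON `j`.`id` = `records`.`id`
SET `records`.`name` = `j`.`name`
WHERE `records`.`id` BETWEEN 1001 AND 2000;

-- ...and so on for the remaining ranges.
```

The trade-off is that the 5000 rows are no longer updated atomically: a failure in a later batch leaves the earlier batches committed.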
My question is similar to:
Ignoring locked row in a MySQL query
except that I have already implemented a logic close to what's suggested in the accepted answer. My question is how to set the process id initially. All servers run a query like (the code is in ruby on rails but the resulting mysql query is):
UPDATE (some_table) SET process_id=(some process_id) WHERE (some condition on row_1) AND process_id is null ORDER BY (row_1) LIMIT 100
Now what happens is all processes try to update the same rows, they get locked and they timeout waiting for the lock. I would like the servers to ignore the rows that are locked (because after the lock is released the process_id won't be null anymore so there is no point for locking here).
I could try to randomize the batch of records to update but the problem is I want to prioritize the update based on row_1 as in the query above.
So my question is, is there a way in mysql to check if a record is locked and ignore it if it is?
No, there is no way to ignore already-locked rows. (This holds for MySQL versions before 8.0; MySQL 8.0 added SELECT ... FOR UPDATE SKIP LOCKED.) Your best bet is to ensure that nothing locks any row for an extended period of time, so that any lock conflicts are very short. In practice that means "advisory" locking: lock the row briefly within a transaction (using FOR UPDATE) and update it to mark it as "locked".
For example, first you want to find your candidate row(s) without locking anything:
SELECT id FROM t WHERE lock_expires IS NULL AND lock_holder IS NULL <some other conditions>;
Now lock only the row you want, very quickly:
START TRANSACTION;
SELECT * FROM t WHERE id = <id> AND lock_expires IS NULL AND lock_holder IS NULL FOR UPDATE;
UPDATE t SET lock_expires = <some time>, lock_holder = <me> WHERE id = <id>;
COMMIT;
(Technical note: If you are planning to lock multiple rows, always lock them in a specific order. Ascending order by primary key is a decent choice. Locking out-of-order or in random order will subject your program to deadlocks from competing processes.)
Now you can take as long as you want (less than lock_expires) to process your row(s) without blocking any other process (they won't match the row during the non-locking select, so will always ignore it). Once the row is processed, you can UPDATE or DELETE it by id, also without blocking anything.
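One possible variation on the claim step, not taken from the answer itself, is to fold the "is it still free?" check into the UPDATE and then look at the affected-row count. The id 17, the holder name 'worker-1', and the five-minute expiry are placeholders.

```sql
-- Try to claim row 17; the WHERE clause makes the claim conditional.
UPDATE t
   SET lock_expires = NOW() + INTERVAL 5 MINUTE,
       lock_holder  = 'worker-1'
 WHERE id = 17
   AND lock_expires IS NULL
   AND lock_holder IS NULL;

-- 1 means this process now holds the row; 0 means another process got there first.
SELECT ROW_COUNT();

-- When processing is finished, release the row (or DELETE it) by id.
UPDATE t
   SET lock_expires = NULL, lock_holder = NULL
 WHERE id = 17
   AND lock_holder = 'worker-1';
```

A process that sees 0 simply moves on to its next candidate id, so nobody waits on a long-held lock.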
I have an InnoDB table read by a lot of different instances (cloud).
A daemon in each instance takes 100 rows from this table to "do things", but I don't want 2 (or more) instances to take the same rows.
So I have a "status" column ("todo", "doing", "done").
INSTANCE 1: It takes 100 rows where status = "todo" ... Then I need to UPDATE these rows asap to status "doing", so INSTANCE 2,3,..x can't take the same rows.
How can I do it?
I would like a solution that does not lock the WHOLE table, only the rows (that's why I use InnoDB). I have read a lot about this (LOCK IN SHARE MODE, FOR UPDATE, COMMIT ...) but I can't work out the right way.
You should use the LOCK TABLES and UNLOCK TABLES statements to do this:
http://dev.mysql.com/doc/refman/5.1/en/lock-tables.html
Use a transaction and then SELECT ... FOR UPDATE when you read the records.
This way the records you read are locked. When you get all the data update the records to "doing" and COMMIT the transaction.
Maybe what you were missing is the use of a transaction, or the correct order of commands. Here is a basic example:
START TRANSACTION;
SELECT * FROM `table` WHERE STATUS = 'todo' FOR UPDATE;
-- Loop over the results in code, save the necessary data to an array/list...
UPDATE `table` SET STATUS = 'doing' WHERE ...;
COMMIT;
-- Process the data...
UPDATE `table` SET STATUS = 'done' WHERE ...;
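On MySQL 8.0 or later, a variation of the transaction above can keep instances from queueing behind each other's row locks by using SKIP LOCKED. This is a sketch under that version assumption; the table name `jobs`, the `id` column, and the literal ids in the UPDATE are placeholders.

```sql
START TRANSACTION;

-- Take up to 100 'todo' rows, silently skipping rows another instance has locked.
SELECT id
  FROM jobs
 WHERE status = 'todo'
 ORDER BY id
 LIMIT 100
   FOR UPDATE SKIP LOCKED;

-- The application collects the returned ids, then marks exactly those rows.
UPDATE jobs
   SET status = 'doing'
 WHERE id IN (1, 2, 3);        -- ids returned by the SELECT above

COMMIT;

-- Process the rows outside the transaction, then:
UPDATE jobs SET status = 'done' WHERE id IN (1, 2, 3);
```

Without SKIP LOCKED the second instance would wait for the first to commit and then find those rows already set to 'doing'; with it, the second instance moves straight on to the next free rows.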
I'm converting a webapp from mysql to SQL Server. Now I want to convert the following code (this is a simplified version):
LOCK TABLES media WRITE, deleted WRITE;
INSERT INTO deleted (stuff) SELECT stuff FROM media WHERE id=1 OR id=2;
DELETE FROM media WHERE id=1 OR id=2;
UNLOCK TABLES;
Because I'm copying stuff that is about to be deleted, I want to make sure any reads of 'media' or 'deleted' will wait until this whole operation has finished. Otherwise these reads will see stuff that isn't there anymore a second later.
How can I replicate this behavior in SQL Server? I read some pages on transactions and isolation levels but I can't figure out if I can disable any read to table 'media' and 'deleted' (or on row-level).
Thanx!
You could use lock hints in your query. If you specify a table lock and hold it until the end of the transaction, that should be equivalent.
begin transaction;
INSERT INTO deleted (stuff)
SELECT stuff FROM media WITH (TABLOCK, HOLDLOCK)
WHERE id = 1 or id = 2;
DELETE FROM media where id = 1 or id = 2;
commit;
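If readers really must be blocked on both tables for the whole operation, one option might be to request exclusive table locks with TABLOCKX instead of the shared lock that TABLOCK takes for a SELECT. This is a sketch, not a tested drop-in replacement, and it assumes readers run under the default locking READ COMMITTED level; readers using NOLOCK or snapshot-based isolation would not be blocked by it.

```sql
BEGIN TRANSACTION;

-- Exclusive table locks on both tables, held until COMMIT.
INSERT INTO deleted WITH (TABLOCKX) (stuff)
SELECT stuff
  FROM media WITH (TABLOCKX, HOLDLOCK)
 WHERE id = 1 OR id = 2;

DELETE FROM media WHERE id = 1 OR id = 2;

COMMIT;
```

Exclusive table locks serialize all access to both tables, so this is only reasonable if the transaction stays as short as the MySQL original.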
As a DB-agnostic approach, you might consider having a "deleted" or "inactive" column indicating whether or not results should be returned to users. For example, you could use an integer for the column, excluding the record from user's view if the value of the column is not zero. So, instead of the select and insert above, you could do (all examples are in the MySQL SQL dialect):
UPDATE media SET inactive=1 WHERE id=1 OR id=2;
This would exclude the records from user view. You could then copy the inactive records to the "deleted" table and delete them from the media table if desired, based on the time you last updated inactive records:
INSERT INTO deleted (stuff) SELECT stuff FROM media WHERE inactive = 1;
DELETE FROM media WHERE inactive = 1;
The integer could be used to identify the "inactive job" that "deleted" the records.
Based on how you've described the schema, this scenario doesn't quite match what the locking approach provides, because the "media" table could be modified while the UPDATE statement is executing. That could be solved (or at least mitigated) if there were a column, such as a timestamp, that could be used to further narrow down the records to mark inactive.
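Putting the pieces together, the "inactive job" idea might be sketched as below. The job number 42 is an arbitrary example, and the sketch assumes `inactive` is 0 for live rows and that user-facing queries filter on `inactive = 0`.

```sql
-- Step 1: hide the rows from users and tag them with this job's number.
UPDATE media
   SET inactive = 42
 WHERE inactive = 0
   AND (id = 1 OR id = 2);

-- Step 2: archive and remove exactly the rows tagged by this job.
START TRANSACTION;
INSERT INTO deleted (stuff)
SELECT stuff FROM media WHERE inactive = 42;
DELETE FROM media WHERE inactive = 42;
COMMIT;

-- User-facing reads only ever see live rows:
SELECT stuff FROM media WHERE inactive = 0;
```

Tagging rows with a job-specific value means the later INSERT and DELETE act on the same set of rows even if other rows are marked inactive in the meantime.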