I'm converting a webapp from MySQL to SQL Server. Now I want to convert the following code (this is a simplified version):
LOCK TABLES media WRITE, deleted WRITE;
INSERT INTO deleted (stuff) SELECT stuff FROM media WHERE id=1 OR id=2;
DELETE FROM media WHERE id=1 OR id=2;
UNLOCK TABLES;
Because I'm copying data that is about to be deleted, I want to make sure any reads of 'media' or 'deleted' wait until this whole operation has finished. Otherwise those reads would see data that isn't there anymore a second later.
How can I replicate this behavior in SQL Server? I read some pages on transactions and isolation levels, but I can't figure out whether I can block all reads of the tables 'media' and 'deleted' (or do so at row level).
Thanks!
You could use lock hints in your query. If you specify a table lock and hold it until the end of the transaction, that should be equivalent.
BEGIN TRANSACTION;
INSERT INTO deleted (stuff)
SELECT stuff FROM media WITH (TABLOCK, HOLDLOCK)
WHERE id = 1 OR id = 2;
DELETE FROM media WHERE id = 1 OR id = 2;
COMMIT;
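Note that the hints above only serialize access to media; a concurrent reader of deleted could still see the half-finished state. If that matters, you can put a hint on the insert target as well. A sketch (TABLOCKX requests an exclusive table lock, which SQL Server holds until the transaction ends):
BEGIN TRANSACTION;
INSERT INTO deleted WITH (TABLOCKX) (stuff)
SELECT stuff FROM media WITH (TABLOCK, HOLDLOCK)
WHERE id = 1 OR id = 2;
DELETE FROM media WHERE id = 1 OR id = 2;
COMMIT;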
As a DB-agnostic approach, you might consider having a "deleted" or "inactive" column indicating whether or not results should be returned to users. For example, you could use an integer for the column, excluding the record from the users' view if the value of the column is not zero. So, instead of the select and insert above, you could do (all examples are in the MySQL SQL dialect):
UPDATE media SET inactive=1 WHERE id=1 OR id=2;
This would exclude the records from user view. You could then copy the inactive records to the "deleted" table and delete them from the media table if desired, based on the time you last updated inactive records:
INSERT INTO deleted (stuff) SELECT stuff FROM media WHERE inactive = 1;
DELETE FROM media WHERE inactive = 1;
The integer could be used to identify the "inactive job" that "deleted" the records.
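For instance (a sketch, where 42 is just an illustrative job number):
UPDATE media SET inactive = 42 WHERE id = 1 OR id = 2;
INSERT INTO deleted (stuff) SELECT stuff FROM media WHERE inactive = 42;
DELETE FROM media WHERE inactive = 42;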
Based on how you've described the schema, this approach doesn't quite match the guarantees of the locking approach, because the "media" table could be modified during the execution of the UPDATE statement. That could be solved (or at least mitigated) if there were a column, such as a timestamp, that could be used to further restrict which records to mark inactive.
Related
I have a table EMPLOYEE with the following columns in my MySQL (InnoDB) database:
internal_employee_id (auto incrementing PK)
external_employee_id
name
gender
exported (boolean field)
In a distributed system I want to ensure that multiple processes in the cluster each read the top 100 distinct rows from the table for which the exported column is set to false. The rows read by a process should remain locked during calculation, such that if process1 reads rows 1-100, process2 should not be able to see rows 1-100 and should instead pick up the next available 100 rows.
For this, I tried using PESSIMISTIC_WRITE locks, but they don't seem to serve the purpose. They do block multiple processes from updating at the same time, but multiple processes can still read the same locked rows.
I tried using the following java code,
Query query = entityManager.createNativeQuery("select * from employee " +
"where exported = 0 limit 100 for update");
List<Employee> employeeListLocked = query.getResultList();
EDIT: Found the answer finally
What I needed was to use the "Skip Locked" feature. So my updated code has become:
Query query = entityManager.createNativeQuery("select * from employee " +
"where exported = 0 limit 100 for update skip locked");
With the help of 'skip locked' (available since MySQL 8.0), any rows that are currently locked are ignored/skipped by the DB engine when running the SELECT, so each process picks up a different batch. Hope this helps you all.
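In plain SQL, the full claim-and-process cycle would look roughly like this (a sketch, assuming MySQL 8.0 or later):
START TRANSACTION;
SELECT * FROM employee
WHERE exported = 0
ORDER BY internal_employee_id
LIMIT 100
FOR UPDATE SKIP LOCKED;
-- ... do the calculation while the rows stay locked ...
UPDATE employee SET exported = 1
WHERE internal_employee_id IN (/* the ids read above */);
COMMIT;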
You could add a new column to the table,
for example a boolean column named 'processed', and initialize all the records with the false value:
update EMPLOYEE set processed = 0;
When a process starts, it can, in the same transaction, select these 100 rows for update and then set the processed column to 1 on them.
Query query = entityManager.createNativeQuery("select * from employee " +
    "where exported = 0 and processed = 0 " +
    "order by internal_employee_id desc limit 100 for update");
List<Employee> employeeListLocked = query.getResultList();
Then make an update on these 100 rows:
UPDATE EMPLOYEE eUpdate INNER JOIN (SELECT internal_employee_id
    FROM EMPLOYEE WHERE exported = 0 AND processed = 0
    ORDER BY internal_employee_id DESC LIMIT 100) e
ON eUpdate.internal_employee_id = e.internal_employee_id
SET eUpdate.processed = 1;
Then the next process will not pick up the same rows.
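Putting the claim step together in plain SQL (a sketch; the point is that both statements share one transaction, so the row locks taken by the SELECT are held until the COMMIT):
START TRANSACTION;
SELECT internal_employee_id FROM EMPLOYEE
WHERE exported = 0 AND processed = 0
ORDER BY internal_employee_id DESC LIMIT 100
FOR UPDATE;
UPDATE EMPLOYEE eUpdate INNER JOIN (SELECT internal_employee_id
    FROM EMPLOYEE WHERE exported = 0 AND processed = 0
    ORDER BY internal_employee_id DESC LIMIT 100) e
ON eUpdate.internal_employee_id = e.internal_employee_id
SET eUpdate.processed = 1;
COMMIT;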
There are a couple of ways to block reads:
The session that wants to update the tables first does:
LOCK TABLES employee WRITE;
This acquires an exclusive metadata lock on the table. Then other sessions are blocked, even if they only try to read that table. They must wait for a metadata lock. See https://dev.mysql.com/doc/refman/8.0/en/lock-tables.html for more information on this.
The downside of table locks is that they lock the whole table. There's no way to use this to lock individual rows or sets of rows.
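For illustration, the blocking looks like this (a two-session sketch):
-- Session 1:
LOCK TABLES employee WRITE;
-- ... insert/update/delete rows ...
UNLOCK TABLES;
-- Session 2, meanwhile (blocks until session 1 runs UNLOCK TABLES):
SELECT COUNT(*) FROM employee;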
Another solution is to code all reads of the table to request a shared lock:
SELECT ... FROM employee WHERE ... LOCK IN SHARE MODE;
MySQL 8.0 changes the syntax, but it works the same way:
SELECT ... FROM employee WHERE ... FOR SHARE;
These are not metadata locks, they're row locks. So you can lock individual rows or sets of rows.
A request for a shared lock on some rows won't conflict with other shared locks on those rows, but if there's an exclusive lock on the rows, the SELECT ... FOR SHARE waits. The reverse is true too: if an uncommitted transaction holds a SELECT ... FOR SHARE lock on the rows, a request for an exclusive lock waits.
The downside of this method is that it only works if all queries that read that table have the FOR SHARE option.
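A sketch of the two sides interacting:
-- Writer:
START TRANSACTION;
SELECT * FROM employee WHERE internal_employee_id = 1 FOR UPDATE;
UPDATE employee SET exported = 1 WHERE internal_employee_id = 1;
COMMIT;
-- Reader (blocks until the writer commits):
SELECT * FROM employee WHERE internal_employee_id = 1 FOR SHARE;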
All that said, I post this just to answer your question directly. I do think that the system described in the answer from Perkilis is good. I implemented a system like that recently, and it works.
Sometimes the implementation you have in mind is not the best solution, and you need to consider another way to solve the problem.
-- In a transaction by itself:
UPDATE t
SET who_has = $me -- some identifier of the process locking the rows
WHERE who_has IS NULL
LIMIT 100;
-- Grab some or all rows that you have and process them.
-- You should not need to lock them further (at least not for queue management)
SELECT ... WHERE who_has = $me ...
-- Eventually, release them, either one at a time, or all at once.
-- Here's the bulk release:
UPDATE t SET who_has = NULL
WHERE who_has = $me;
-- Again, this UPDATE is in its own transaction.
Note that this general mechanism has no limitations on how long it takes to "process" the items.
Also, the use of that extra who_has column helps you if there is a crash without releasing the items. It should be augmented by a timestamp of when the items were grabbed. A cron job (or equivalent) should look around for any unprocessed items that have been locked for "too long".
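For example, assuming a grabbed_at timestamp column that is set in the same UPDATE that sets who_has, the cron job could reclaim stale items like this (the 10-minute threshold is just illustrative):
UPDATE t SET who_has = NULL, grabbed_at = NULL
WHERE who_has IS NOT NULL
AND grabbed_at < NOW() - INTERVAL 10 MINUTE;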
I have a MySQL table of Users, and a table of Actions performed by the Users (linked to that User by the primary key, userid). The Actions table has an incrementing key indx. Whenever I add a new row to that table, I then update the latest column of the relevant Users row with the indx of the row I just added to the Actions table. So something like:
INSERT INTO actions(indx,actionname,userid) VALUES(default, "myaction", 1);
UPDATE users SET latest=LAST_INSERT_ID() WHERE userid=1;
The idea being that I can check for updates for a User by seeing if the latest is higher then the last time I checked.
My issue is that if more than one connection is opened on the database and they try to add an Action for the same User at the same time, connection2 could conceivably run its INSERT and UPDATE between the INSERT and UPDATE of connection1, and the latest entry of the user they're both trying to update would no longer hold the indx of the most recent action entry.
I've been reading up on transactions, isolation levels, etc., but haven't really found a way around this (though my understanding of how these work is pretty shaky, so maybe I just misunderstood). I think I need a way to lock the Actions table until the User table is updated. This application only gets used by a few hundred users tops, so I don't think the performance hit from momentarily locking the table will be too bad.
So is that something that can be done in MySQL? Is there a better solution? I imagine this general pattern must be pretty common: one table with a bunch of varieties of rows, and a second table with a row that tracks metadata for each variety in table A and needs to be updated atomically each time the first table is changed. So I'm hoping there's a solution that isn't too complex.
Use SELECT ... FOR UPDATE to lock the row in order to serialize the access to the table and prevent from race conditions:
START TRANSACTION;
SELECT any_column FROM users WHERE userid=1 FOR UPDATE;
INSERT INTO actions(indx,actionname,userid) VALUES(default, "myaction", 1);
UPDATE users SET latest=LAST_INSERT_ID() WHERE userid=1;
COMMIT;
However, this will slow down your insert rate, because all these transactions from all sessions will be serialized.
The better option is to not store the last ID in the users table at all. Just use SELECT MAX(indx) FROM actions WHERE userid = xxxx wherever this number is required. With an index on actions(userid) this query will be very fast (given that indx is the primary key of the actions table), and the inserts will not be slowed down.
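A sketch of that setup (the index name is illustrative; since InnoDB secondary indexes carry the primary key, an index on userid alone lets MySQL answer the MAX(indx) from the index):
CREATE INDEX idx_actions_userid ON actions (userid);
SELECT MAX(indx) FROM actions WHERE userid = 1;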
I ran into a problem and can't choose the right solution.
I have a SELECT query that selects records from table.
These records have a status column, as seen below.
SELECT id, <...>, status FROM table WHERE something
Now, right after this SELECT I have to UPDATE the status column.
How can I do it to avoid a race condition?
What I want to achieve is that once somebody (a session) has selected something, this something cannot be selected by anybody else until I release it manually (for example using a status column).
Thoughts?
There is some MySQL documentation that may be interesting for your task. I'm not sure it fits your needs, but it describes the right way to do a SELECT followed by an UPDATE.
The technique described does not prevent other sessions from reading, but it prevents writes to the selected records until the end of the transaction.
It contains an example similar to your problem:
SELECT counter_field FROM child_codes FOR UPDATE;
UPDATE child_codes SET counter_field = counter_field + 1;
It is required that your tables use the InnoDB engine and your programs use transactions.
If you only need the lock for a short time, i.e. one session selects a row with a lock, updates it, and releases the lock, then you do not need a status field at all. Just use SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE in conjunction with transactions: SELECT ... FOR UPDATE followed by UPDATE to modify, and SELECT ... LOCK IN SHARE MODE to just read. This satisfies your requirements.
If you need to lock for a long time, i.e. select and lock in one session and then update and release in another, then you are right that you need some storage to keep the lock status, and all sessions should use it as described below: in one session, SELECT ... FOR UPDATE and set the status and status owner; then in another session, SELECT ... FOR UPDATE, check the status and owner, update, and clear the status (that's the updating scenario); for the read scenario, SELECT ... LOCK IN SHARE MODE and check the status.
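A sketch of the long-lived variant (t, status, and owner are hypothetical names here):
-- Session A: acquire the record
START TRANSACTION;
SELECT status, owner FROM t WHERE id = 1 FOR UPDATE;
-- proceed only if status is NULL
UPDATE t SET status = 'locked', owner = 'session_a' WHERE id = 1;
COMMIT;
-- Session B: later, do the real update and release
START TRANSACTION;
SELECT status, owner FROM t WHERE id = 1 FOR UPDATE;
-- proceed only if the owner matches
UPDATE t SET status = NULL, owner = NULL WHERE id = 1;
COMMIT;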
You can do it with some preparations. Add a column sessionId to your table. It has to be NULL-able and it will contain the unique ID of the session that acquires the row. Also add an index on this new column; we'll use the column to search for rows in the table.
ALTER TABLE `tbl`
ADD COLUMN `sessionId` CHAR(32) DEFAULT NULL,
ADD INDEX `sessionId`(`sessionId`);
When a session needs to acquire some rows (based on some criteria) run:
UPDATE `tbl`
SET `sessionId` = 'aaa'
WHERE `sessionId` IS NULL
AND ...
LIMIT bbb
Replace aaa with the current session ID and ... with the conditions you need to select the correct rows. Replace bbb with the number of rows you need to acquire. Add an ORDER BY clause if you need to process the rows in a certain order (if some of them have higher priority than others). You can also add status = ... in the UPDATE clause to change the status of the acquired rows (e.g. to 'pending') to let other instances of the code know those rows are being processed right now.
The query above acquires some rows. Next, run:
SELECT *
FROM `tbl`
WHERE `sessionId` = 'aaa'
This query gets the acquired rows to be processed in the client code.
After each row is processed, you either DELETE the row, or UPDATE it and set sessionId to NULL (releasing the row) and status to reflect the row's new state.
Also you should release the rows (using the same procedure as above) when the session is closed.
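For example (a sketch; 'done' and the id value are illustrative):
-- Release one processed row
UPDATE `tbl` SET `sessionId` = NULL, `status` = 'done' WHERE `id` = 123;
-- Release everything this session still holds (e.g. on shutdown)
UPDATE `tbl` SET `sessionId` = NULL WHERE `sessionId` = 'aaa';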
My question is similar to:
Ignoring locked row in a MySQL query
except that I have already implemented a logic close to what's suggested in the accepted answer. My question is how to set the process id initially. All servers run a query like the following (the code is in Ruby on Rails, but the resulting MySQL query is):
UPDATE (some_table) SET process_id=(some process_id) WHERE (some condition on row_1) AND process_id is null ORDER BY (row_1) LIMIT 100
Now what happens is that all processes try to update the same rows; they block and time out waiting for the lock. I would like the servers to ignore the rows that are locked (because after the lock is released the process_id won't be NULL anymore, so there is no point in waiting here).
I could try to randomize the batch of records to update but the problem is I want to prioritize the update based on row_1 as in the query above.
So my question is, is there a way in mysql to check if a record is locked and ignore it if it is?
No, before MySQL 8.0 (which added SELECT ... FOR UPDATE SKIP LOCKED) there is no way to ignore already-locked rows. Your best bet will be to ensure that nothing locks any row for any extended period of time. That will ensure that any lock conflicts are very short in duration. That will generally mean "advisory" locking of rows, by locking them within a transaction (using FOR UPDATE) and updating the row to mark it as "locked".
For example, first you want to find your candidate row(s) without locking anything:
SELECT id FROM t WHERE lock_expires IS NULL AND lock_holder IS NULL <some other conditions>;
Now lock only the row you want, very quickly:
START TRANSACTION;
SELECT * FROM t WHERE id = <id> AND lock_expires IS NULL AND lock_holder IS NULL FOR UPDATE;
-- proceed only if the SELECT actually returned the row
UPDATE t SET lock_expires = <some time>, lock_holder = <me> WHERE id = <id>;
COMMIT;
(Technical note: If you are planning to lock multiple rows, always lock them in a specific order. Ascending order by primary key is a decent choice. Locking out-of-order or in random order will subject your program to deadlocks from competing processes.)
Now you can take as long as you want (less than lock_expires) to process your row(s) without blocking any other process (they won't match the row during the non-locking select, so will always ignore it). Once the row is processed, you can UPDATE or DELETE it by id, also without blocking anything.
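To recover from crashed processes, a periodic job can reclaim anything whose advisory lock has expired (a sketch):
UPDATE t SET lock_expires = NULL, lock_holder = NULL
WHERE lock_expires IS NOT NULL AND lock_expires < NOW();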
I have an InnoDB table read by a lot of different instances (cloud).
A daemon in each instance takes 100 rows of this table to "do things" with, but I don't want 2 (or more) instances to take the same rows.
So I have a "status" column ("todo", "doing", "done").
INSTANCE 1: It takes 100 rows where status = "todo" ... Then I need to UPDATE these rows asap to status "doing", so INSTANCE 2,3,..x can't take the same rows.
How can I do it?
Please, I would like a solution without locking the WHOLE table, just the rows (that's why I use InnoDB)... I have read a lot about that (LOCK IN SHARE MODE, FOR UPDATE, COMMITs...) but I can't figure out the right way.
You could use the LOCK TABLES and UNLOCK TABLES statements to do this:
http://dev.mysql.com/doc/refman/5.1/en/lock-tables.html
Use a transaction and then SELECT ... FOR UPDATE when you read the records.
This way the records you read are locked. When you get all the data update the records to "doing" and COMMIT the transaction.
Maybe what you were missing is the use of a transaction, or the correct order of commands. Here is a basic example:
START TRANSACTION;
SELECT * FROM `table` WHERE status = 'todo' LIMIT 100 FOR UPDATE;
-- Loop over the results in code, save the necessary data to an array/list...
UPDATE `table` SET status = 'doing' WHERE ...;
COMMIT;
-- process the data...
UPDATE `table` SET status = 'done' WHERE ...;
While the first transaction holds the row locks, another instance running the same SELECT ... FOR UPDATE will block; once the COMMIT happens, it will see status = 'doing' on those rows and pick up the next batch instead.