mysql select for delete - mysql

Edit:
I found a solution here http://mysql.bigresource.com/Track/mysql-8TvKWIvE/
assuming select takes a long time to execute, will this lock the table for a long time?
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT foo FROM bar WHERE wee = 'yahoo!';
DELETE FROM bar WHERE wee = 'yahoo!';
COMMIT;
I wish to use a criteria to select the rows in mysql, return them to my app as resultset, and then delete these rows. How can this be done? I know I can do the following but it's too inefficient:
select * from MyTable t where _critera_.
//get the resultset and then
delete from MyTable t where t.id in(...result...)
Do I need to use a transaction? Is there a single query solution?

I needed to SELECT some rows by some criteria, do something with the data, and then DELETE those same rows atomically, that is, without deleting any rows that meet the criteria but were inserted after the SELECT.
Contrary to other answers, REPEATABLE READ is not sufficient. Refer to Consistent Nonlocking Reads. In particular note this callout:
The snapshot of the database state applies to SELECT statements within a transaction, not necessarily to DML statements. If you insert or modify some rows and then commit that transaction, a DELETE or UPDATE statement issued from another concurrent REPEATABLE READ transaction could affect those just-committed rows, even though the session could not query them.
You can try it yourself:
First create a table:
CREATE TABLE x (i INT NOT NULL, PRIMARY KEY (i)) ENGINE = InnoDB;
Start a transaction and examine the table (this will be called session 1 now):
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT * FROM x;
Start another session (session 2) and insert a row. Note this session is in auto commit mode.
INSERT INTO x VALUES (1);
SELECT * FROM x;
You will see your newly inserted row. Then back in session 1 again:
SELECT * FROM x;
DELETE FROM x;
COMMIT;
In session 2:
SELECT * FROM x;
You'll see that even though you get nothing from the SELECT in session 1, you delete one row. In session 2 you will see the table is empty at the end. Note the following output from session 1 in particular:
mysql> SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
Query OK, 0 rows affected (0.00 sec)
mysql> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT * FROM x;
Empty set (0.00 sec)
/* --- insert in session 2 happened here --- */
mysql> SELECT * FROM x;
Empty set (0.00 sec)
mysql> DELETE FROM x;
Query OK, 1 row affected (0.00 sec)
mysql> COMMIT;
Query OK, 0 rows affected (0.06 sec)
mysql> SELECT * FROM x;
Empty set (0.00 sec)
This testing was done with MySQL 5.5.12.
For a correct solution
Use SERIALIZABLE transaction isolation level. However note that session 2 will block on the INSERT.
It seems that SELECT...FOR UPDATE will also do the trick. I have not studied the manual 100% in depth to understand this but it worked when I tried it. The advantage is you don't have to change the transaction isolation level. Again, session 2 will block on the INSERT.
Delete the rows individually after the SELECT. Basically you'd have to include a unique column (the primary key would be good) in the SELECT and then use DELETE FROM x WHERE i IN (...), or something similar, where IN contains a list of keys from the SELECT's result set. The advantage is you don't need to use a transaction at all and session 2 will not be blocked at any time. The disadvantage is that you have more data to send back and forth to the SQL server. Also I don't know if deleting the rows individually is as efficient as using the same WHERE clause as the original SELECT, but if the original SELECT's WHERE clause was complicated or slow the individual deletion may well be faster, so that could be another advantage.
To editorialize, this is one of those things that is so dangerous that even though it's documented it could almost be considered a "bug." But hey, the MySQL designers didn't ask me (or anyone else, apparently).

Do I need to use a transaction? Is there a single query solution?
Yes, you need to use a transaction. You cannot delete and select rows in a single query (i.e., there is no way to "return" or "select" the rows you have deleted).
You don't necessarily need to do the REPEATABLE READ option - I believe you could also select the rows FOR UPDATE, although this is a higher level of locking. REPEATABLE READ does seem to be the lowest level of locking you could use to execute this transaction safely. It happens to be the default for InnoDB.
How much this affects your table depends on whether you have an index on the wee column or not. Without it, I believe MySQL would have to lock writes the entire table.
Further reading:
Wikipedia - Isolation (database systems)
http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html
http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html

Do a select statement. While looping through it, create a list string of unique IDs. Then pass this list back to mySQL using IN.

You could select your rows into a temporary table, then delete using the same criteria as your select. Since SELECT FROM WHERE FOR UPDATE also returns a result set, you could alter the SELECT FOR UPDATE to a SELECT INTO tmp_table FOR UPDATE. Then delete your selected rows, either using your original criteria, or by using the data in the temporary table as the criteria.
Something like this (but haven't checked it for syntax)
START TRANSACTION;
SELECT a,b into TMP_TABLE FROM table_a WHERE a=1 FOR UPDATE;
DELETE FROM table_a
USING table_a JOIN TMP_TABLE ON (table_a.a=TMP_TABLE.a, table_a.b=TMP_TABLE.b)
WHERE 1=1;
COMMIT;
Now your records are gone from the original table, but you also have a copy in your temporary table, which you can keep, or delete.

There is no single query solution. Use
select * from MyTable t where _critera_
//get the resultset and then
delete from MyTable where _critera_

Execute the SELECT statement with the WHERE clause and then use the same WHERE clause in the DELETE statement as well. Assuming there was no interim changes to the data, the same rows should be deleted.
EDIT: Yes, you could set this up as a single transaction so there's no modification to the tables while you're doing this.

Related

Transactions - Locks on INSERT INTO...SELECT/UPATE...SELECT type queries

What does the bold text refer to? The "SELECT part acts like READ COMMITTED" part I already understand with this sql
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION; -- snapshot 1 for this transaction is created
SELECT * FROM t1; -- result is 1 row, snapshot 1 is used
-- another transaction (different session) inserts and commits new row into t1 table
SELECT * FROM t1; -- result is still 1 row, because its REPEATABLE READ, still using snapshot 1
INSERT INTO t2 SELECT * FROM t1; -- this SELECT creates new snapshot 2
SELECT * FROM t2; -- result are 2 rows
SELECT * FROM t1; -- result is still 1 row, using snapshot 1
Here: https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html
The type of read varies for selects in clauses like INSERT INTO ...
SELECT, UPDATE ... (SELECT), and CREATE TABLE ... SELECT that do not
specify FOR UPDATE or FOR SHARE:
By default, InnoDB uses stronger locks for those statements and the
SELECT part acts like READ COMMITTED, where each consistent read, even
within the same transaction, sets and reads its own fresh snapshot.
I do not understand THIS, what does a stronger block mean?
InnoDB uses stronger locks for those statements
This question helped me, but I still don't understand that part of the sentence.
Prevent INSERT INTO ... SELECT statement from creating its own fresh snapshot

Table lock during transaction contains max()

If I understand correctly this code
START TRANSACTION;
SELECT field FROM table WHERE ... FOR UPDATE ; // single row
UPDATE table SET field = ... ;
COMMIT;
will lock the SELECT row until COMMIT.
But if I use MAX()
START TRANSACTION;
SELECT MAX(field) FROM table WHERE ... FOR UPDATE ; // whole table
UPDATE table SET field = ... ;
COMMIT;
will this code lock the whole table until COMMIT?
EDIT
Sorry, I have my question wrong.
Obviously above code will lock rows affected by WHERE. But it wouldn't lock the table. Meaning
INSERT INTO table() VALUES();
could still took place regardless of COMMIT.
That would mean the return value of
SELECT MAX(field) FROM table WHERE ... FOR UPDATE ;
is now no longer valid.
How to lock the table during transaction so neither INSERT nor UPDATE could took place before COMMIT?
It doesn't matter what you're selecting. FOR UPDATE locks all the rows that have to be examined to evaluate the WHERE clause. Otherwise, another transaction could change the columns that are mentioned there, so the later UPDATE would assign to different rows.
And since inserting a new row can change the value of MAX(field), it actually locks the entire table. When I try your example, and try to insert a new from another transaction, the second transaction blocks until I commit the first transaction.

How to understand "not necessarily to DML statements" from mysql docs?

http://dev.mysql.com/doc/refman/5.5/en/innodb-consistent-read.html
Note
The snapshot of the database state applies to SELECT statements
within a transaction, not necessarily to DML statements. If you insert
or modify some rows and then commit that transaction, a DELETE or
UPDATE statement issued from another concurrent REPEATABLE READ
transaction could affect those just-committed rows, even though the
session could not query them. If a transaction does update or delete
rows committed by a different transaction, those changes do become
visible to the current transaction. For example, you might encounter a
situation like the following:
SELECT COUNT(c1) FROM t1 WHERE c1 = 'xyz'; -- Returns 0: no rows match.
DELETE FROM t1 WHERE c1 = 'xyz'; -- Deletes several rows recently committed by other transaction.
SELECT COUNT(c2) FROM t1 WHERE c2 = 'abc'; -- Returns 0: no rows match.
UPDATE t1 SET c2 = 'cba' WHERE c2 = 'abc'; -- Affects 10 rows: another txn just committed 10 rows with 'abc' values.
SELECT COUNT(c2) FROM t1 WHERE c2 = 'cba'; -- Returns 10: this txn can now see the rows it just updated.
From the Glossary:
DML
Data manipulation language, a set of SQL statements for performing insert, update, and delete operations.
In other words, they're the SQL statements that modify data in tables, as opposed to just retrieving it as SELECT does.
What that paragraph is saying, as shown in the example below it, is that modification queries can affect rows that were committed in another transaction, even if that took place after you started your current transaction. And when that happens, your transaction's snapshot is updated to include those rows.

How to Test Select for Update in MySQL

I am performing SELECT ... FOR UPDATE or row level locking with InnoDB tables.
My intention is to only one request can read the same row. So if two users make request for the same data as the same time. Only one of them get data, who fires the query first.
But How can i test that locking is placed or not. as I am testing it by retrieving the same data at same time and both users getting the data.
Note: My tables are InnoDB, My query executes in transaction, my query as below:
SELECT * FROM table_name WHERE cond FOR UPDATE;
Any other thing I have to check for this to make work?
open 2 mysql client session.
on session 1:
mysql> start transaction;
mysql> SELECT * FROM table_name WHERE cond FOR UPDATE;
... (result here) ...
1 row in set (0.00 sec)
on session 2:
mysql> start transaction;
mysql> SELECT * FROM table_name WHERE cond FOR UPDATE;
... (no result yet, will wait for the lock to be released) ...
back to session 1, to update selected record (and release the lock):
mysql> UPDATE table_name SET something WHERE cond;
mysql> commit;
back to session 2:
1) either showing lock timeout error
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
2) or showing result
... (result here) ...
1 row in set (0.00 sec)
3) or showing no result (because corresponding record has been modified, so specified condition was not met)
Empty set (0.00 sec)
You can use own lock mechanizm with lock_by column.
UPDATE table_name SET locked_by=#{proccess_id} WHERE cond and locked_by IS NULL
Now in your program you will get count of affected rows:
if(affected_rows==0)
return 'rows locked'
else
//do your staff with locked_by=#{process_id} rows
With this mechanism you can control locked rows and locking processes. You can also add in UPDATE statement locked_at=NOW() to get more info about locked row.
Don't forget to add some index on locked_by column.
Here is MySQL docs about working with locks.
Before update you can put lock, releasing it after. In another transaction you can check lock using it unique name. Strategy for naming you can choose yourself.

SQL (mySQL) update some value in all records processed by a select

I am using mySQL from their C API, but that shouldn't be relevant.
My code must process records from a table that match some criteria, and then update the said records to flag them as processed. The lines in the table are modified/inserted/deleted by another process I don't control. I am afraid in the following, the UPDATE might flag some records erroneously since the set of records matching might have changed between step 1 and step 3.
SELECT * FROM myTable WHERE <CONDITION>; # step 1
<iterate over the selected set of lines. This may take some time.> # step 2
UPDATE myTable SET processed=1 WHERE <CONDITION> # step 3
What's the smart way to ensure that the UPDATE updates all the lines processed, and only them? A transaction doesn't seem to fit the bill as it doesn't provide isolation of that sort: a recently modified record not in the originally selected set might still be targeted by the UPDATE statement. For the same reason, SELECT ... FOR UPDATE doesn't seem to help, though it sounds promising :-)
The only way I can see is to use a temporary table to memorize the set of rows to be processed, doing something like:
CREATE TEMPORARY TABLE workOrder (jobId INT(11));
INSERT INTO workOrder SELECT myID as jobId FROM myTable WHERE <CONDITION>;
SELECT * FROM myTable WHERE myID IN (SELECT * FROM workOrder);
<iterate over the selected set of lines. This may take some time.>
UPDATE myTable SET processed=1 WHERE myID IN (SELECT * FROM workOrder);
DROP TABLE workOrder;
But this seems wasteful and not very efficient.
Is there anything smarter?
Many thanks from a SQL newbie.
There are several options:
You could lock the table
You could add an AND foo_id IN (all_the_ids_you_processed) as the update condition.
you could update before selecting and then only selecting the updated rows (i.e. by processing date)
I eventually solved this issue by using a column in that table that flags lines according to their status. This column let's me implement a simple state machine. Conceptually, I have two possible values for this status:
kNoProcessingPlanned = 0; #default "idle" value
kProcessingUnderWay = 1;
Now my algorithm does something like this:
UPDATE myTable SET status=kProcessingUnderWay WHERE <CONDITION>; # step 0
SELECT * FROM myTable WHERE status=kProcessingUnderWay; # step 1
<iterate over the selected set of lines. This may take some time.> # step 2
UPDATE myTable SET processed=1, status=kNoProcessingPlanned WHERE status=kProcessingUnderWay # step 3
This idea of having rows in several states can be extended to as many states as needed.