How to understand "not necessarily to DML statements" from mysql docs? - mysql

http://dev.mysql.com/doc/refman/5.5/en/innodb-consistent-read.html
Note
The snapshot of the database state applies to SELECT statements
within a transaction, not necessarily to DML statements. If you insert
or modify some rows and then commit that transaction, a DELETE or
UPDATE statement issued from another concurrent REPEATABLE READ
transaction could affect those just-committed rows, even though the
session could not query them. If a transaction does update or delete
rows committed by a different transaction, those changes do become
visible to the current transaction. For example, you might encounter a
situation like the following:
SELECT COUNT(c1) FROM t1 WHERE c1 = 'xyz'; -- Returns 0: no rows match.
DELETE FROM t1 WHERE c1 = 'xyz'; -- Deletes several rows recently committed by other transaction.
SELECT COUNT(c2) FROM t1 WHERE c2 = 'abc'; -- Returns 0: no rows match.
UPDATE t1 SET c2 = 'cba' WHERE c2 = 'abc'; -- Affects 10 rows: another txn just committed 10 rows with 'abc' values.
SELECT COUNT(c2) FROM t1 WHERE c2 = 'cba'; -- Returns 10: this txn can now see the rows it just updated.

From the Glossary:
DML
Data manipulation language, a set of SQL statements for performing insert, update, and delete operations.
In other words, they're the SQL statements that modify data in tables, as opposed to just retrieving it as SELECT does.
What that paragraph is saying, as shown in the example below it, is that modification queries can affect rows that were committed in another transaction, even if that took place after you started your current transaction. And when that happens, your transaction's snapshot is updated to include those rows.
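A minimal two-session sketch of that effect (the table t1 and the column values here are made up to mirror the docs example; REPEATABLE READ is the default):
-- Session A:
START TRANSACTION;
SELECT COUNT(*) FROM t1 WHERE c1 = 'xyz'; -- first read establishes the snapshot; returns 0
-- Session B (autocommit) commits a matching row:
INSERT INTO t1 (c1) VALUES ('xyz');
-- Back in session A:
SELECT COUNT(*) FROM t1 WHERE c1 = 'xyz'; -- still 0: the plain SELECT reads the old snapshot
UPDATE t1 SET c1 = 'zzz' WHERE c1 = 'xyz'; -- affects 1 row: the UPDATE sees B's committed row
SELECT COUNT(*) FROM t1 WHERE c1 = 'zzz'; -- returns 1: the row A just updated is now visible to A
COMMIT;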

Related

MySQL SET user variable locks rows and doesn't obey REPEATABLE READ

I've encountered some undocumented behavior of "SET @my_var = (SELECT ..)" inside a transaction:
The first issue is that it locks rows (depending on whether the lookup hits a unique index or not).
Example -
START TRANSACTION;
SET @my_var = (SELECT id from table_name where id = 1);
select trx_rows_locked from information_schema.innodb_trx;
ROLLBACK;
The output is 1 row locked, which is strange; it shouldn't acquire a read lock.
Also, the equivalent statement SELECT id INTO @my_var won't produce a lock.
It can lead to a deadlock if an UPDATE follows the SET statement (for 2 concurrent requests).
In REPEATABLE READ -
The SELECT inside the SET statement gets a new snapshot of the data, instead of using the original SNAPSHOT.
SESSION 1:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START transaction;
SELECT data FROM my_table where id = 2; # Output : 2
SESSION 2:
UPDATE my_table set data = 3 where id = 2 ;
SESSION 1:
SET @data = (SELECT data FROM my_table where id = 2);
SELECT @data; # Output : 3, instead of 2
ROLLBACK;
However, I would expect @data to contain the original value from the first snapshot (2).
If I use SELECT data INTO @data FROM my_table WHERE id = 2, then I get the expected value: 2.
Do you have an idea what is the source of the different behavior of SET = (SELECT ..) compared to SELECT data INTO @var FROM .. ?
Thanks.
Correct — when you SELECT in a context where you're copying the results into a variable or a table, it implicitly works as if you had used a locking read SELECT ... FOR SHARE.
This means it places a shared lock on the rows examined, and it also means that the statement reads only the most recently committed version of rows, as if your transaction were in READ-COMMITTED isolation level.
I'm not sure why SELECT ... INTO @var does not do the same kind of implicit locking in MySQL 8.0. My memory is that in older versions of MySQL it did do locking in that query form. I've searched the manual for an explanation but I can't find one yet.
Other cases that implicitly lock the rows examined by SELECT, and therefore read data as if your transaction were READ-COMMITTED:
INSERT INTO <table> SELECT ...
Multi-table UPDATE or DELETE: even if you don't update or delete a given table, the rows joined from it become locked.
SELECT inside a trigger
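As an aside on the SET @my_var case from the question: if you want to see the implicit share lock directly, MySQL 8.0's performance_schema.data_locks table shows it while the transaction is still open. A sketch using the my_table example from the question:
-- Session 1:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SET @my_var = (SELECT data FROM my_table WHERE id = 2);
SELECT object_name, lock_type, lock_mode, lock_data
FROM performance_schema.data_locks; -- expect a table-level IS lock plus a record lock in an S mode on id = 2
ROLLBACK;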

Understanding InnoDB Repeatable Read isolation level snapshots

I have the following table:
CREATE TABLE `accounts` (
`name` varchar(50) NOT NULL,
`balance` int NOT NULL,
PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
And it has two accounts in it. "Bob" has a balance of 100. "Jim" has a balance of 200.
I run this query to transfer 50 from Jim to Bob:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
SELECT * FROM accounts;
SELECT SLEEP(10);
SET @bobBalance = (SELECT balance FROM accounts WHERE name = 'bob' FOR UPDATE);
SET @jimBalance = (SELECT balance FROM accounts WHERE name = 'jim' FOR UPDATE);
UPDATE accounts SET balance = @bobBalance + 50 WHERE name = 'bob';
UPDATE accounts SET balance = @jimBalance - 50 WHERE name = 'jim';
COMMIT;
While that query is sleeping, I run the following query in a different session to set Jim's balance to 500:
UPDATE accounts SET balance = 500 WHERE name = 'jim';
What I thought would happen is that this would cause a bug. The transaction would set Jim's balance to 150, because the first read in the transaction (before the SLEEP) would establish a snapshot in which Jim's balance is 200, and that snapshot would be used in the later query to get Jim's balance. So we would subtract 50 from 200 even though Jim's balance has actually been changed to 500 by the other query.
But that's not what happens. Actually, the end result is correct. Bob has 150 and Jim has 450. But I don't understand why this is.
The MySQL documentation says about Repeatable Read:
This is the default isolation level for InnoDB. Consistent reads within the same transaction read the snapshot established by the first read. This means that if you issue several plain (nonlocking) SELECT statements within the same transaction, these SELECT statements are consistent also with respect to each other. See Section 15.7.2.3, “Consistent Nonlocking Reads”.
So what am I missing here? Why does it seem like the SELECT statements in the transaction are not all using a snapshot established by the first SELECT statement?
The repeatable-read behavior only works for non-locking SELECT queries. It reads from the snapshot established by the first query in the transaction.
But any locking SELECT query reads the latest committed version of the row, as if you had started your transaction in READ-COMMITTED isolation level.
A SELECT is implicitly a locking read if it's involved in any kind of SQL statement that modifies data.
For example:
INSERT INTO table2 SELECT * FROM table1 WHERE ...;
The above locks examined rows in table1, even though the statement is just copying them to table2.
SET @myvar = (SELECT ... FROM table1 WHERE ...);
This is also copying a value from table1, into a variable. It locks the examined row in table1.
Likewise SELECT statements that are invoked in a trigger, or as part of a multi-table UPDATE or DELETE, and so on. Anytime the SELECT is part of a larger statement that modifies any data (in a table or in a variable), it locks the rows examined by the SELECT.
And therefore it's a locking read, and behaves like an UPDATE with respect to which row version it reads.
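A two-session sketch of the difference, using the accounts table from the question (Jim starting at 200):
-- Session 1 (REPEATABLE READ, the default):
START TRANSACTION;
SELECT balance FROM accounts WHERE name = 'jim'; -- 200: first read establishes the snapshot
-- Session 2 (autocommit):
UPDATE accounts SET balance = 500 WHERE name = 'jim';
-- Session 1 again:
SELECT balance FROM accounts WHERE name = 'jim'; -- still 200: non-locking read from the snapshot
SELECT balance FROM accounts WHERE name = 'jim' FOR UPDATE; -- 500: locking read returns the latest committed version
ROLLBACK;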

Difference in Repeatable Read Semantics in MySQL and PostgreSQL

I understand that in both MySQL and PostgreSQL, the REPEATABLE READ isolation level will make the reads see the snapshot at the beginning of the transaction. But in the MySQL documentation at https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html
the following Note is mentioned, with an example:
The snapshot of the database state applies to SELECT statements within
a transaction, not necessarily to DML statements. If you insert or
modify some rows and then commit that transaction, a DELETE or UPDATE
statement issued from another concurrent REPEATABLE READ transaction
could affect those just-committed rows, even though the session could
not query them. If a transaction does update or delete rows committed
by a different transaction, those changes do become visible to the
current transaction. For example, you might encounter a situation like
the following:
SELECT COUNT(c1) FROM t1 WHERE c1 = 'xyz';
-- Returns 0: no rows match.
DELETE FROM t1 WHERE c1 = 'xyz';
-- Deletes several rows recently committed by other transaction.
SELECT COUNT(c2) FROM t1 WHERE c2 = 'abc';
-- Returns 0: no rows match.
UPDATE t1 SET c2 = 'cba' WHERE c2 = 'abc';
-- Affects 10 rows: another txn just committed 10 rows with 'abc' values.
SELECT COUNT(c2) FROM t1 WHERE c2 = 'cba';
-- Returns 10: this txn can now see the rows it just updated.
Does the same example hold true for PostgreSQL, or will it not allow such behaviour?
This cannot happen in PostgreSQL.
If a REPEATABLE READ transaction A tries to modify a row that has been modified by a concurrent transaction B after A's snapshot has been taken, A will receive a “serialization error”.
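A sketch of what that looks like in PostgreSQL, reusing a t1-style table (the exact error text may vary by version):
-- Session A:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM t1 WHERE c1 = 'xyz'; -- the first query takes the snapshot
-- Session B (autocommit) modifies that row and commits:
UPDATE t1 SET c1 = 'zzz' WHERE c1 = 'xyz';
-- Session A now tries to modify the row B changed:
UPDATE t1 SET c1 = 'abc' WHERE c1 = 'xyz';
-- ERROR:  could not serialize access due to concurrent update
ROLLBACK;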

Consistent read

This is from the MySQL docs (link provided below):
Note
The snapshot of the database state applies to SELECT statements within a transaction, not necessarily to DML statements. If you insert or modify some rows and then commit that transaction, a DELETE or UPDATE statement issued from another concurrent REPEATABLE READ transaction could affect those just-committed rows, even though the session could not query them. If a transaction does update or delete rows committed by a different transaction, those changes do become visible to the current transaction. For example, you might encounter a situation like the following:
SELECT COUNT(c1) FROM t1 WHERE c1 = 'xyz'; -- Returns 0: no rows match.
DELETE
FROM t1
WHERE c1 = 'xyz'; -- Deletes several rows recently committed by other transaction.
SELECT COUNT(c2) FROM t1 WHERE c2 = 'abc'; -- Returns 0: no rows match.
UPDATE t1
SET c2 = 'cba'
WHERE c2 = 'abc'; -- Affects 10 rows: another txn just
-- committed 10 rows with 'abc' values.
SELECT COUNT(c2)
FROM t1
WHERE c2 = 'cba'; -- Returns 10: this txn can now see the rows it just updated.
Link to docs
Could someone answer this question authoritatively: in the example above we can see that the SELECT after the UPDATE is able to see changes committed by a different concurrent transaction. It looks like the UPDATE statement refreshes the snapshot for the SELECT statement, right? Does it refresh the whole snapshot for subsequent SELECT statements, or just the snapshot of the "t1" table?

mysql select for delete

Edit:
I found a solution here http://mysql.bigresource.com/Track/mysql-8TvKWIvE/
Assuming the SELECT takes a long time to execute, will this lock the table for a long time?
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT foo FROM bar WHERE wee = 'yahoo!';
DELETE FROM bar WHERE wee = 'yahoo!';
COMMIT;
I wish to use criteria to select rows in MySQL, return them to my app as a result set, and then delete those rows. How can this be done? I know I can do the following but it's too inefficient:
select * from MyTable t where _criteria_
//get the resultset and then
delete from MyTable t where t.id in (...result...)
Do I need to use a transaction? Is there a single query solution?
I needed to SELECT some rows by some criteria, do something with the data, and then DELETE those same rows atomically, that is, without deleting any rows that meet the criteria but were inserted after the SELECT.
Contrary to other answers, REPEATABLE READ is not sufficient. Refer to Consistent Nonlocking Reads. In particular note this callout:
The snapshot of the database state applies to SELECT statements within a transaction, not necessarily to DML statements. If you insert or modify some rows and then commit that transaction, a DELETE or UPDATE statement issued from another concurrent REPEATABLE READ transaction could affect those just-committed rows, even though the session could not query them.
You can try it yourself:
First create a table:
CREATE TABLE x (i INT NOT NULL, PRIMARY KEY (i)) ENGINE = InnoDB;
Start a transaction and examine the table (this will be called session 1 now):
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT * FROM x;
Start another session (session 2) and insert a row. Note this session is in auto commit mode.
INSERT INTO x VALUES (1);
SELECT * FROM x;
You will see your newly inserted row. Then back in session 1 again:
SELECT * FROM x;
DELETE FROM x;
COMMIT;
In session 2:
SELECT * FROM x;
You'll see that even though you get nothing from the SELECT in session 1, you delete one row. In session 2 you will see the table is empty at the end. Note the following output from session 1 in particular:
mysql> SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
Query OK, 0 rows affected (0.00 sec)
mysql> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT * FROM x;
Empty set (0.00 sec)
/* --- insert in session 2 happened here --- */
mysql> SELECT * FROM x;
Empty set (0.00 sec)
mysql> DELETE FROM x;
Query OK, 1 row affected (0.00 sec)
mysql> COMMIT;
Query OK, 0 rows affected (0.06 sec)
mysql> SELECT * FROM x;
Empty set (0.00 sec)
This testing was done with MySQL 5.5.12.
For a correct solution
Use SERIALIZABLE transaction isolation level. However note that session 2 will block on the INSERT.
It seems that SELECT...FOR UPDATE will also do the trick. I have not studied the manual 100% in depth to understand this but it worked when I tried it. The advantage is you don't have to change the transaction isolation level. Again, session 2 will block on the INSERT.
Delete the rows individually after the SELECT. Basically you'd have to include a unique column (the primary key would be good) in the SELECT and then use DELETE FROM x WHERE i IN (...), or something similar, where IN contains a list of keys from the SELECT's result set. The advantage is you don't need to use a transaction at all and session 2 will not be blocked at any time. The disadvantage is that you have more data to send back and forth to the SQL server. Also I don't know if deleting the rows individually is as efficient as using the same WHERE clause as the original SELECT, but if the original SELECT's WHERE clause was complicated or slow the individual deletion may well be faster, so that could be another advantage.
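For the second option (SELECT ... FOR UPDATE), a minimal sketch using the table from the question, keeping the default REPEATABLE READ level:
START TRANSACTION;
SELECT foo FROM bar WHERE wee = 'yahoo!' FOR UPDATE; -- locking read: the matching rows (and the scanned range) are locked
DELETE FROM bar WHERE wee = 'yahoo!'; -- deletes exactly the rows returned above; a concurrent INSERT of wee = 'yahoo!' blocks until COMMIT
COMMIT;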
To editorialize, this is one of those things that is so dangerous that even though it's documented it could almost be considered a "bug." But hey, the MySQL designers didn't ask me (or anyone else, apparently).
Do I need to use a transaction? Is there a single query solution?
Yes, you need to use a transaction. You cannot delete and select rows in a single query (i.e., there is no way to "return" or "select" the rows you have deleted).
You don't necessarily need to do the REPEATABLE READ option - I believe you could also select the rows FOR UPDATE, although this is a higher level of locking. REPEATABLE READ does seem to be the lowest level of locking you could use to execute this transaction safely. It happens to be the default for InnoDB.
How much this affects your table depends on whether you have an index on the wee column or not. Without it, I believe MySQL would have to lock the entire table against writes.
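If there is no such index yet, a sketch of adding one so the locking read or DELETE only locks the matching index range instead of every row it scans (the index name here is made up):
ALTER TABLE bar ADD INDEX idx_wee (wee);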
Further reading:
Wikipedia - Isolation (database systems)
http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html
http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html
Do a SELECT statement. While looping through it, create a list string of unique IDs. Then pass this list back to MySQL using IN.
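A sketch of that approach, reusing the question's MyTable (the id values are made up; the list is built by the application while looping over the first result set):
SELECT id FROM MyTable WHERE _criteria_;
DELETE FROM MyTable WHERE id IN (3, 17, 42);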
You could select your rows into a temporary table, then delete using the same criteria as your select. Since SELECT ... FOR UPDATE also returns a result set, you could capture its result in a temporary table (for example with CREATE TEMPORARY TABLE ... SELECT ... FOR UPDATE). Then delete your selected rows, either using your original criteria, or by using the data in the temporary table as the criteria.
Something like this (but haven't checked it for syntax)
START TRANSACTION;
CREATE TEMPORARY TABLE TMP_TABLE SELECT a, b FROM table_a WHERE a = 1 FOR UPDATE;
DELETE FROM table_a
USING table_a JOIN TMP_TABLE ON (table_a.a = TMP_TABLE.a AND table_a.b = TMP_TABLE.b);
COMMIT;
Now your records are gone from the original table, but you also have a copy in your temporary table, which you can keep, or delete.
There is no single query solution. Use
select * from MyTable t where _criteria_
//get the resultset and then
delete from MyTable where _criteria_
Execute the SELECT statement with the WHERE clause and then use the same WHERE clause in the DELETE statement as well. Assuming there were no interim changes to the data, the same rows should be deleted.
EDIT: Yes, you could set this up as a single transaction so there's no modification to the tables while you're doing this.
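A sketch of that, with the SELECT made a locking read because, per the demonstration in the earlier answer, a plain SELECT under REPEATABLE READ would not stop the DELETE from also removing rows committed after it:
START TRANSACTION;
SELECT * FROM MyTable WHERE _criteria_ FOR UPDATE; -- locks the matching rows and the scanned range
DELETE FROM MyTable WHERE _criteria_; -- removes the same rows the SELECT returned
COMMIT;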