Consider a table R(A) containing {(1),(2)}.
Suppose transactions
T1: UPDATE R SET A = 2*A
T2: SELECT AVG(A) FROM R
If transaction T2 executes using "read uncommitted", what are the possible values it returns?
My guess is that it can return 1.5, 2, 2.5, or 3, because of read uncommitted and because the order of the tuples doesn't matter, as I understood from the course. Am I right?
From the MySQL documentation for READ UNCOMMITTED:
SELECT statements are performed in a nonlocking fashion, but a possible earlier version of a row might be used. Thus, using this isolation level, such reads are not consistent. This is also called a dirty read. Otherwise, this isolation level works like READ COMMITTED.
A dirty read might view the table before, during, or after the update takes place. This means that, as you pointed out, the apparent average could be 1.5, 2, 2.5, or 3. Consider:
R1 | R2 | average
1 | 2 | 1.5 <-- update not yet started
2 | 2 | 2 <-- update in progress
1 | 4 | 2.5 <-- update still in progress
2 | 4 | 3 <-- update completed
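The four outcomes can be checked mechanically. Below is a small Python sketch (plain Python rather than SQL; the variable names are mine) that enumerates every combination of "row seen before the update" vs. "row seen after the update", which is exactly the freedom a dirty read has:

```python
from itertools import product

# Committed values of A, and the values after UPDATE R SET A = 2*A.
# Under READ UNCOMMITTED, each row may be observed either pre- or
# post-update, independently of the other row.
old = {1: 1, 2: 2}
new = {k: 2 * v for k, v in old.items()}

averages = sorted({(a + b) / 2
                   for a, b in product((old[1], new[1]), (old[2], new[2]))})
print(averages)  # [1.5, 2.0, 2.5, 3.0] -- the four possible dirty-read averages
```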
I have the following memberships table with some initial data.
CREATE TABLE memberships (
id SERIAL PRIMARY KEY,
user_id INT,
group_id INT
);
INSERT INTO memberships(user_id, group_id)
VALUES (1, 1), (2, 1), (1, 2), (2, 2);
I have two transactions (repeatable read isolation level) deleting all the rows whose group_id is 2 from the memberships table and retrieving the result using a select query, but the result I get is surprising.
time | transaction 1                               | transaction 2
-----|---------------------------------------------|---------------------------------------------
T1   | start transaction                           |
T2   | delete from memberships where group_id = 2  | start transaction
T3   |                                             | select * from memberships
     |                                             | -- to make MySQL believe transaction 2
     |                                             |    starts before transaction 1 finishes
T4   | select * from memberships                   |
     | -- prints only rows whose group_id is 1     |
T5   | commit                                      |
T6   |                                             | delete from memberships where group_id = 2
T7   |                                             | select * from memberships
     |                                             | -- surprisingly, prints all rows, including
     |                                             |    those whose group_id is 2
Below is the result I get from T7.
select * from memberships;
+----+---------+----------+
| id | user_id | group_id |
+----+---------+----------+
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 1 | 2 |
| 4 | 2 | 2 |
+----+---------+----------+
4 rows in set (0.00 sec)
This is really surprising since this select query is immediately preceded by a delete query which should remove all the rows whose group_id is 2.
I tried this on MySQL 5.7 and 8.0, and both of them have this issue.
I also tried this on Postgres 14 (also repeatable read isolation level); fortunately, Postgres doesn't have this issue. At timestamp T6, I get the error "could not serialize access due to concurrent delete".
Can someone explain to me:
Why MySQL has the issue I described above? How does MySQL implement deletion and how does it work with the MySQL MVCC scheme?
Why Postgres doesn't have the issue? How does Postgres implement deletion and how does it work with the Postgres MVCC implementation?
Thanks a lot!
The repeatable read behavior you are seeing is mentioned in the MySQL documentation:
This is the default isolation level for InnoDB. Consistent reads within the same transaction read the snapshot established by the first read.
This means that the repeatable snapshot which the second transaction sees throughout its transaction is established at T3. Keep in mind that repeatable read is the default isolation level for MySQL.
On Postgres, the default isolation level is not repeatable read but rather read committed. It is Postgres's repeatable read implementation that raises the serialization error: when the DELETE in the second transaction targets rows that a concurrently committed transaction has already deleted, Postgres aborts the transaction rather than letting the statement act on the newer row versions the way InnoDB's DML does. To reproduce this behavior, make sure the isolation level is set explicitly in Postgres:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
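To make the snapshot-at-first-read rule concrete, here is a toy Python model of the timeline above (my own simplification, not InnoDB internals): consistent SELECTs reuse the snapshot taken at the transaction's first read, while DELETE operates on the latest committed rows.

```python
# Toy model of InnoDB REPEATABLE READ for the memberships example.
class Txn:
    def __init__(self, db):
        self.db = db          # "latest committed" state, shared between txns
        self.snapshot = None  # established lazily by the first read

    def select(self):
        if self.snapshot is None:         # first read fixes the snapshot
            self.snapshot = dict(self.db)
        return sorted(self.snapshot)      # row ids visible to this txn

    def delete_group(self, group_id):
        # DML sees the latest committed rows, not the snapshot
        for rid in [k for k, g in self.db.items() if g == group_id]:
            del self.db[rid]

# rows: id -> group_id, matching the memberships table
db = {1: 1, 2: 1, 3: 2, 4: 2}

t2 = Txn(db)
t3_rows = t2.select()        # T3: snapshot established (all 4 rows)
Txn(db).delete_group(2)      # T2/T5: transaction 1 deletes group 2 and commits
t2.delete_group(2)           # T6: finds nothing left to delete
t7_rows = t2.select()        # T7: still the old 4-row snapshot
print(t3_rows, t7_rows)      # [1, 2, 3, 4] [1, 2, 3, 4]
```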
I'm working on an application in a concurrent situation where multiple instances of the application concurrently update rows in database.
Each application instance creates an update event in update event table, an update event can have status of either IN_PROGRESS/NEW/CANCELED.
I want to create a query to update an update event if:
no update event on the same itemId with status = IN_PROGRESS
no update event on the same itemId with status = NEW and timestamp > current update event time stamp.
Table:
UpdateId | itemId | status | time_stamp
1 | 1 | IN_PROGRESS | 1.1
2 | 1 | NEW | 1.2
3 | 1 | NEW | 1.3
4 | 1 | NEW | 1.4
With updates 1, 2, 3, 4 as above, I basically want 2 to wait until 1 is done; if 3 or 4 arrives, then 2 -> CANCELED. Same for 3.
Something like:
Update UPDATE_EVENT SET status = IN_PROGRESS IF {
SELECT count (*) FROM UPDATE_EVENT where status=IN_PROGRESS & itemId=item1 = 0
&&
SELECT count (*) FROM UPDATE_EVENT where status=NEW & timestamp > updateId_abc123.timestamp = 0
} WHERE updateId=abc123
The updates are not very frequent, also latency is not an issue.
Any ideas on how I can build the query, and is it thread-safe?
The main question is how frequent these updates are and what performance requirements you have for this process. There is a shortcut and a very long way.
The very long way would be to use an ordered, single-threaded processor that receives the requests and queues them. A stream processor would scale very well if you have a large number of updates in a short time.
For smaller applications, it is enough to choose an appropriate isolation level. The locking it implies ensures that the first instance to start the transaction finishes it before other instances are able to make their changes.
Neither is a quick solution, and both require some reading: how to set the isolation level on your DBMS, in the application code, and so on.
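One concrete option is to fold both checks into a single atomic UPDATE and treat "one row affected" as having won the right to proceed. The sketch below uses the question's table and column names but is otherwise my own; it runs against SQLite for demonstration. (Note that MySQL rejects subqueries on the table being updated, so there the checks would have to go through a derived table.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE update_event (
    update_id  INTEGER PRIMARY KEY,
    item_id    INTEGER,
    status     TEXT,        -- IN_PROGRESS / NEW / CANCELED
    time_stamp REAL
);
INSERT INTO update_event VALUES
    (1, 1, 'IN_PROGRESS', 1.1),
    (2, 1, 'NEW',         1.2),
    (3, 1, 'NEW',         1.3),
    (4, 1, 'NEW',         1.4);
""")

def try_start(update_id):
    """Promote the event to IN_PROGRESS only if both conditions hold."""
    with con:  # both existence checks and the write happen in one statement
        cur = con.execute("""
            UPDATE update_event
            SET status = 'IN_PROGRESS'
            WHERE update_id = :uid
              AND NOT EXISTS (      -- no in-progress event on the same item
                  SELECT 1 FROM update_event x
                  WHERE x.item_id = update_event.item_id
                    AND x.status = 'IN_PROGRESS')
              AND NOT EXISTS (      -- no newer NEW event on the same item
                  SELECT 1 FROM update_event x
                  WHERE x.item_id = update_event.item_id
                    AND x.status = 'NEW'
                    AND x.time_stamp > update_event.time_stamp)
            """, {"uid": update_id})
    return cur.rowcount == 1

print(try_start(2))                     # False: event 1 is still IN_PROGRESS
con.execute("DELETE FROM update_event WHERE update_id = 1")  # event 1 finishes
print(try_start(4))                     # True: nothing blocks the newest event
```

Because the checks and the write are one statement, the database serializes conflicting attempts without application-level locking.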
In my tests I have seen that when using MariaDB, executing the same query twice in REPEATABLE_READ isolation doesn't produce phantom reads, when it should.
For instance:
I have two rows in the bank_account table:
ID | OWNER | MONEY
------------------------
1 | John | 1000
2 | Louis | 2000
The expected flow should be as shown below:
THREAD 1 (REPEATABLE_READ) THREAD 2 (READ_UNCOMMITTED)
| |
findAll()->[1|John|1000,2|Louis|2000] |
| |
| updateAccount(1, +100)
| createAccount("Charles", 3000)
| flush()
| |
| commitTx()
| |_
|
findAll()->[1|John|1000,2|Louis|2000,
| 3|Charles|3000]
|
|
commitTx()
|_
To sum up, after Thread2.createAccount("Charles", 3000); and its flush, Thread1 would search all rows and would get
ID | OWNER | MONEY
------------------------
1 | John | 1000
2 | Louis | 2000
3 | Charles | 3000
Thread1 is protected from uncommitted changes, seeing [1, John, 1000] instead of [1, John, 1100], but it is supposed to see the newly inserted row.
However, what Thread1 retrieves in the second findAll() are the exact same results as the ones from the first findAll():
ID | OWNER | MONEY
------------------------
1 | John | 1000
2 | Louis | 2000
It doesn't have phantom reads. Why?
This is the code executed by Thread1:
@Transactional(readOnly=true, isolation=Isolation.REPEATABLE_READ)
@Override
public Iterable<BankAccount> findAllTwiceRepeteableRead(){
printIsolationLevel();
Iterable<BankAccount> accounts = baDao.findAll();
logger.info("findAllTwiceRepeteableRead() 1 -> {}", accounts);
//PAUSE HERE
...
}
I pause the execution where it says //PAUSE HERE.
Then Thread2 executes:
bankAccountService.addMoneyReadUncommited(ba.getId(), 200);
bankAccountService.createAccount("Carlos", 3000);
And then Thread1 resumes:
//PAUSE HERE
...
Iterable<BankAccount> accounts = baDao.findAll();
logger.info("findAllTwiceRepeteableRead() 2 -> {}", accounts);
UPDATE:
I've updated the thread transaction flows with what I'm really doing (I am committing the second transaction after the new row is inserted).
This matches what, according to Wikipedia, is a phantom read, and I think it is the very same scenario. So I still don't get why I'm not getting the phantom read [3|Charles|3000]:
A phantom read occurs when, in the course of a transaction, two
identical queries are executed, and the collection of rows returned by
the second query is different from the first.
This can occur when range locks are not acquired on performing a
SELECT ... WHERE operation. The phantom reads anomaly is a special
case of Non-repeatable reads when Transaction 1 repeats a ranged
SELECT ... WHERE query and, between both operations, Transaction 2
creates (i.e. INSERT) new rows (in the target table) which fulfill
that WHERE clause.
Transaction 1 Transaction 2
/* Query 1 */
SELECT * FROM users
WHERE age BETWEEN 10 AND 30;
/* Query 2 */
INSERT INTO users(id,name,age) VALUES ( 3, 'Bob', 27 );
COMMIT;
/* Query 1 */
SELECT * FROM users
WHERE age BETWEEN 10 AND 30;
COMMIT;
What you described as the actual behaviour is in fact the correct behaviour for repeatable_read. The behaviour you are expecting can be achieved by using read_committed.
As mariadb documentation on repeatable_read says (bolding is mine):
there is an important difference from the READ COMMITTED isolation
level: All consistent reads within the same transaction read the
snapshot established by the first read.
In thread 1 the 1st FindAll() call returning John and Louis established the snapshot. The 2nd FindAll() simply used the same snapshot.
This is further corroborated by a Percona blog post on Differences between READ-COMMITTED and REPEATABLE-READ transaction isolation levels:
In REPEATBLE READ, a ‘read view’ ( trx_no does not see trx_id >= ABC,
sees < ABB ) is created at the start of the transaction, and this
read view (consistent snapshot in Oracle terms) is held open for the
duration of the transaction. If you execute a SELECT statement at 5AM,
and come back in an open transaction at 5PM, when you run the same
SELECT, then you will see the exact same resultset that you saw at
5AM. This is called MVCC (multiple version concurrency control) and
it is accomplished using row versioning and UNDO information.
UPDATE
Caveat: The following references are from the MySQL documentation. However, since these references relate to the innodb storage engine, I firmly believe that they apply to mariadb's innodb storage engine as well.
So, in innodb storage engine under repeatable read isolation level, the non-locking selects within the same transaction read from the snapshot established by the first read. No matter how many records were inserted / updated / deleted in concurrent committed transactions, the reads will be consistent. Period.
This is the scenario described by the OP in the question. This would imply that a non-locking read in repeatable read isolation level would not be able to produce a phantom read, right? Well, not exactly.
As MySQL documentation on InnoDB Consistent Nonlocking Reads says:
The snapshot of the database state applies to SELECT statements within
a transaction, not necessarily to DML statements. If you insert or
modify some rows and then commit that transaction, a DELETE or UPDATE
statement issued from another concurrent REPEATABLE READ transaction
could affect those just-committed rows, even though the session could
not query them. If a transaction does update or delete rows committed
by a different transaction, those changes do become visible to the
current transaction. For example, you might encounter a situation like
the following:
SELECT COUNT(c1) FROM t1 WHERE c1 = 'xyz';
-- Returns 0: no rows match.
DELETE FROM t1 WHERE c1 = 'xyz';
-- Deletes several rows recently committed by other transaction.

SELECT COUNT(c2) FROM t1 WHERE c2 = 'abc';
-- Returns 0: no rows match.
UPDATE t1 SET c2 = 'cba' WHERE c2 = 'abc';
-- Affects 10 rows: another txn just committed 10 rows with 'abc' values.

SELECT COUNT(c2) FROM t1 WHERE c2 = 'cba';
-- Returns 10: this txn can now see the rows it just updated.
To sum up: if you use innodb with repeatable read isolation mode, then phantom reads may occur if data modification statements in concurrent committed transactions interact with data modification statements within the current transaction.
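The quoted scenario can be mimicked with a toy Python model (my own simplification, not InnoDB internals): consistent reads use the snapshot, but the transaction's own UPDATE acts on the latest committed rows and makes them visible to its later reads.

```python
class Txn:
    def __init__(self, db):
        self.db = db              # latest committed state: row id -> c2 value
        self.snapshot = None
        self.own = set()          # ids of rows this transaction has modified

    def count(self, value):
        if self.snapshot is None:             # first read fixes the snapshot
            self.snapshot = dict(self.db)
        visible = dict(self.snapshot)
        visible.update({k: self.db[k] for k in self.own})  # own changes show
        return sum(1 for v in visible.values() if v == value)

    def update_where(self, old, new):   # UPDATE t1 SET c2 = new WHERE c2 = old
        hits = [k for k, v in self.db.items() if v == old]
        for k in hits:
            self.db[k] = new
            self.own.add(k)
        return len(hits)

db = {}                                   # t1 is empty when the txn starts
t = Txn(db)
print(t.count('abc'))                     # 0: snapshot taken on an empty table
db.update({i: 'abc' for i in range(10)})  # another txn commits 10 'abc' rows
print(t.update_where('abc', 'cba'))       # 10: the UPDATE sees the latest rows
print(t.count('cba'))                     # 10: the just-updated rows are visible
```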
The linked Wikipedia article on isolation levels describes a general theoretical model. You always need to read the actual product manual how a certain feature is implemented because there may be differences.
In the Wikipedia article, only locks are described as a means of preventing phantom reads. However, InnoDB uses the creation of the snapshot to prevent phantom reads in most cases, so there is no need to rely on locks.
What are the rules for which rows an UPDATE with a WHERE clause encounters (and locks) when it is performed on a non-unique indexed column?
I have a test table with col column as non-unique index:
id | col
---------
 1 |  1
 2 |  2
 3 |  2
22 |  3
UPDATE tab SET col=1 WHERE col=1;
// OR
UPDATE tab SET col=3 WHERE col=3;
// OR
UPDATE tab SET col=2 WHERE col=2;
// These updates encounter ONLY rows where col=1, col=3 or col=2
Same table and same updates, but with one more record in the table where col=2:
id | col
---------
 1 |  1
 2 |  2
 3 |  2
 4 |  2
22 |  3
UPDATE tab SET col=1 WHERE col=1;
// OR
UPDATE tab SET col=3 WHERE col=3;
// Both updates encounter ONLY rows where col=1 or col=3.
UPDATE tab SET col=2 WHERE col=2;
// This update encounters ALL the rows in the table even those where col IS NOT 2.
// WHY ?
In short, every row encountered while processing an UPDATE is exclusively row-locked. This means that the locking impact of an UPDATE depends on how the query is executed to read the rows to be updated. If your UPDATE uses no index, or a bad index, it may lock many or even all rows. (Note that the order in which rows are locked also depends on the index used.) In your case, since the table is very small and you are materially changing the distribution of values in the index, the optimizer chooses a full table scan for the query in question.
You can test the performance and behavior of most UPDATE queries by converting them to a SELECT and using EXPLAIN SELECT on them (in newer versions you can even EXPLAIN UPDATE).
In short, though: you should test performance and locking behavior against tables with a realistic distribution of data, not a very small table with a few test rows.
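As a stand-in for the "convert it to a SELECT and EXPLAIN it" advice, here is a runnable sketch against SQLite (the table mirrors the question; in MySQL you would use EXPLAIN SELECT / EXPLAIN UPDATE instead of SQLite's EXPLAIN QUERY PLAN):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tab (id INTEGER PRIMARY KEY, col INTEGER);
CREATE INDEX idx_col ON tab(col);
INSERT INTO tab (id, col) VALUES (1, 1), (2, 2), (3, 2), (4, 2), (22, 3);
""")

# Rewrite "UPDATE tab SET col = 2 WHERE col = 2" as a SELECT and inspect
# the plan: the detail column shows whether idx_col is used or the engine
# falls back to a full scan.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM tab WHERE col = 2").fetchall()
for row in plan:
    print(row[-1])
```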
There is a wonderful article out there. I believe it will answer your questions:
http://www.mysqlperformanceblog.com/2012/11/23/full-table-scan-vs-full-index-scan-performance/
Imagine we have a table as follows,
+----+---------+--------+
| id | Name | Bunnies|
+----+---------+--------+
| 1 | England | 1000 |
| 2 | Russia | 1000 |
+----+---------+--------+
And we have multiple users removing bunnies, for a specified period, such as 2 hours. (So minimum 0 bunnies, max 1000 bunnies, bunnies are returned, not added by users)
I'm using two basic transaction queries like
BEGIN;
UPDATE `BunnyTracker` SET `Bunnies`=`Bunnies`+1 where `id`=1;
COMMIT;
When someone returns a bunny and,
BEGIN;
UPDATE `BunnyTracker` SET `Bunnies`=`Bunnies`-1 where `id`=1 AND `Bunnies` > 0;
COMMIT;
When someone attempts to take a bunny. I'm assuming those queries will implement some sort of atomicity under the hood
It's imperative that users cannot take more bunnies than each country has (i.e. no -23 bunnies if 23 users transact concurrently).
My issue is, how do I maintain ACID safety in this case, while being able to concurrently add/increment/decrement the bunnies field, while staying within the bounds (0-1000)
I could set the isolation level to serialized, but I'm worried that would kill performance.
Any tips?
Thanks in advance
I believe you need to implement some additional logic to prevent concurrent increment and decrement transactions from both reading the same initial value.
As it stands, if Bunnies = 1, you could have simultaneous increment and decrement transactions that both read the initial value of 1. If the increment then completes first, its results will be ignored, since the decrement has already read the initial value of 1 and will decrement the value to 0. Whichever of these operations completes last would effectively cancel the other operation.
To resolve this issue, you need to implement a locking read using SELECT ... FOR UPDATE, as
described here. For example:
BEGIN;
SELECT `Bunnies` FROM `BunnyTracker` where `id`=1 FOR UPDATE;
UPDATE `BunnyTracker` SET `Bunnies`=`Bunnies`+1 where `id`=1;
COMMIT;
Although it looks to the users like multiple transactions occur simultaneously, within the DB they are actually sequential (e.g. entries get written to the redo/transaction logs one at a time).
Would it therefore work for you to put a constraint on the table, "bunnies >= 0", and catch the failure of a transaction that attempts to breach that constraint?
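Here is a sketch of that constraint idea, run against SQLite for brevity (MySQL enforces CHECK constraints only from 8.0.16; an unsigned column under strict mode behaves similarly):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE BunnyTracker (
    id INTEGER PRIMARY KEY,
    Name TEXT,
    Bunnies INTEGER CHECK (Bunnies >= 0))""")
con.execute("INSERT INTO BunnyTracker VALUES (1, 'England', 1)")

def take_bunny(country_id):
    try:
        with con:  # one transaction; rolled back automatically on error
            con.execute(
                "UPDATE BunnyTracker SET Bunnies = Bunnies - 1 WHERE id = ?",
                (country_id,))
        return True
    except sqlite3.IntegrityError:  # CHECK failed: count would drop below 0
        return False

print(take_bunny(1), take_bunny(1))  # True False: the second take is refused
```

The over-draw fails the whole statement, so the application only has to catch the error and deny (or retry) the request; no explicit locking is needed for the lower bound.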