MySQL isolation levels

I'm a bit confused by the documentation here. I have a transaction which:
1. start transaction
2. does some updates
3. does some selects
4. does some more updates
5. commit
I want my selects at step 3 to see the results of the updates at step 2, but I want to be able to roll back the whole thing.
READ COMMITTED seems to imply that selects only show data that has been committed, and REPEATABLE READ seems to imply that all subsequent selects see the same data that existed at the time of the first select, thus ignoring my updates. READ UNCOMMITTED seems to do the right thing, but the documentation warns that "a possible earlier version of a row might be used" -- which is also not acceptable, as my selects MUST see the result of my updates.
Is SERIALIZABLE really my only hope here?
I'm working off the documentation here

Transaction isolation levels describe only the interaction between concurrent transactions. At any isolation level, rows you have updated within a transaction will reflect those updates when you re-select them from the same transaction.
The right isolation level in your case seems to be READ COMMITTED: you can roll back at any point, and your uncommitted data is not visible to other transactions.
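A quick way to convince yourself of this (a minimal sketch using Python's sqlite3 module for portability; MySQL's InnoDB behaves the same way on this point) is to run updates and selects on one connection inside an open transaction:

```python
import sqlite3

# Within a single transaction, your own uncommitted updates are visible to
# your own SELECTs, and the whole transaction can still be rolled back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

# Step 2: update inside the transaction (no commit yet)
conn.execute("UPDATE accounts SET balance = balance - 40 WHERE id = 1")

# Step 3: the select on the same connection sees the uncommitted update
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(balance)  # 60

# ...and the whole transaction can still be rolled back
conn.rollback()
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(balance)  # 100
```

The select at step 3 sees the uncommitted update from step 2, yet the rollback still undoes everything since the last commit.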

Related

Django MySQL REPEATABLE READ "data loss"

I'm looking for information about what is behind this entry in the Django 2.0 release notes:
MySQL’s default isolation level, repeatable read, may cause data loss in typical Django usage. To prevent that and for consistency with other databases, the default isolation level is now read committed. You can use the DATABASES setting to use a different isolation level, if needed.
As I understand it, REPEATABLE READ is "stricter" than READ COMMITTED, so the question of what Django is doing to produce "data loss" has been bugging me for some time now.
Is it something like prefetch_related? Or, in general, can making an UPDATE based on potentially stale data (SELECTed earlier in the thread) be considered data loss? Or, even better, is there something that only MySQL does, or a bug it has, that makes it dangerous under REPEATABLE READ?
Thank you.
There are actually two issues with REPEATABLE READ worth noting here.
One is that Django's internal code is written with the expectation that transactions obey the READ COMMITTED semantics. The fact that REPEATABLE READ is actually a higher isolation level doesn't matter; the point is that it violates the expectations of Django's code, leading to incorrect behavior. (In fact, adding a setting to change the isolation level was initially resisted because "it would imply that Django works correctly under any isolation level, and I don't think that's true".)
A straightforward example (first noted in the issue tracker 9 years ago) is the behavior of get_or_create(). It works by first trying to read the row; then, if that fails, trying to create it. If that creation operation fails it's presumably because some other transaction has created the row in the meantime. Therefore, it tries again to read and return it. That works as expected in READ COMMITTED, but in REPEATABLE READ that final read won't find anything because the read must return the same results (none) that it found the first time.
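The pattern can be sketched like this (a simplified model, not Django's actual code; fetch and create here are hypothetical stand-ins for the ORM calls, and KeyError stands in for a duplicate-key IntegrityError):

```python
# Simplified sketch of the get_or_create() read/create/re-read pattern.
def get_or_create(fetch, create):
    """fetch() returns the row or None; create() raises on a duplicate key."""
    row = fetch()
    if row is not None:
        return row, False
    try:
        return create(), True
    except KeyError:
        # The create failed, presumably because another transaction created
        # the row concurrently, so re-read it. Under READ COMMITTED this
        # re-read finds the new row; under REPEATABLE READ it must repeat
        # the first (empty) result, and get_or_create comes back empty-handed.
        return fetch(), False

# Simulate the race: another transaction's row appears between our first
# read and our create attempt.
store = {}
def fetch():
    return store.get("key")
def create():
    # Another transaction slipped in between our fetch() and create()...
    store["key"] = "their value"
    raise KeyError("duplicate key")  # ...so our insert hits a duplicate key

row, created = get_or_create(fetch, create)
print(row, created)  # their value False -- the READ COMMITTED outcome
```

Under REPEATABLE READ the final fetch() would be forced to repeat the initial empty result, which is exactly the failure described above.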
That's not data loss, though. The second issue is specific to MySQL and the non-standard way it defines the behavior of REPEATABLE READ. Roughly speaking, reads will behave as in REPEATABLE READ but writes will behave as in READ COMMITTED, and that combination can lead to data loss. This is best demonstrated with an example, so let me quote this one, provided by core contributor Shai Berger:
(1) BEGIN TRANSACTION
(2) SELECT ... FROM some_table WHERE some_field=some_value
(1 row returned)
(3) (some other transactions commit)
(4) SELECT ... FROM some_table WHERE some_field=some_value
(1 row returned, same as above)
(5) DELETE FROM some_table WHERE some_field=some_value
(answer: 1 row deleted)
(6) SELECT ... FROM some_table WHERE some_field=some_value
(1 row returned, same as above)
(7) COMMIT
(the row that was returned earlier is no longer in the database)
Take a minute to read this. Up to step (5), everything is as you would expect;
you should find steps (6) and (7) quite surprising.
This happens because the other transactions in (3) deleted the row that is
returned in (2), (4) & (6), and inserted another one where
some_field=some_value; that other row is the row that was deleted in (5). The
row that this transaction selects was not seen by the DELETE, and hence not
changed by it, and hence continues to be visible to the SELECTs in our
transaction. But when we commit, the row (which has been deleted) no longer
exists.
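The anomaly can be modeled with a toy sketch (hypothetical and heavily simplified; InnoDB actually takes its snapshot at the first read rather than at BEGIN) in which reads use the transaction's snapshot while writes operate on the latest committed data:

```python
# Toy model of MySQL's REPEATABLE READ in the example above:
# reads come from the transaction's snapshot, writes see committed data.
class ToyTable:
    def __init__(self, rows):
        self.committed = dict(rows)  # latest committed state: id -> value

class Txn:
    def __init__(self, table):
        self.table = table
        self.snapshot = dict(table.committed)  # frozen for this transaction

    def select(self, value):
        # Reads come from the snapshot (REPEATABLE READ behavior)
        return [k for k, v in self.snapshot.items() if v == value]

    def delete(self, value):
        # Writes see the *latest committed* rows (READ COMMITTED behavior)
        victims = [k for k, v in self.table.committed.items() if v == value]
        for k in victims:
            del self.table.committed[k]
        return len(victims)

table = ToyTable({1: "x"})   # row id=1 has some_field = "x"
txn = Txn(table)             # (1) BEGIN; snapshot taken
print(txn.select("x"))       # (2) [1]
del table.committed[1]       # (3) another transaction deletes row 1...
table.committed[2] = "x"     # ...and inserts row 2 with the same value
print(txn.select("x"))       # (4) still [1], from the snapshot
print(txn.delete("x"))       # (5) 1 row deleted -- but it deleted row 2!
print(txn.select("x"))       # (6) still [1], from the snapshot
print(table.committed)       # (7) after COMMIT: {} -- the row is gone
```

The DELETE at (5) never touched the row the SELECTs were returning, yet after the commit no row with some_field=some_value remains.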
The tracker issue that led to this change in the default isolation level gives additional detail on the problem, and it links to other discussions and issues if you want to read more.

InnoDB MySQL Select Query Locking

I have an isolation level of REPEATABLE READ and I am running a plain SELECT * FROM example query. I read in https://dev.mysql.com/doc/refman/5.7/en/innodb-locks-set.html that SELECT ... FROM queries use consistent reads from a snapshot and therefore set no locks on rows or the table. Does that mean an update, insert, or delete initiated after the select starts but before it finishes would still be able to run, even though the modification won't show up in the select results?
Yes, you can update/insert/delete while an existing transaction holds a repeatable-read snapshot on the data.
This is implemented by Multi-Version Concurrency Control or MVCC.
It's a fancy way of saying that the RDBMS keeps multiple versions of the same row(s), so that repeatable-read snapshots can continue reading the older version as long as they need to (that is, as long as their transaction snapshot exists).
If a row version exists that was created by a transaction that committed after your transaction started, you shouldn't be able to see that row version. Every row version internally keeps some metadata about the transaction that created it, and every transaction knows how to use this to determine if it should see the row version or not.
Eventually, all transactions that may be interested in the old row versions finish, and the MVCC can "clean up" the obsolete row versions.
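A toy model of that visibility rule (hypothetical and heavily simplified; real InnoDB keeps undo-log records and per-transaction metadata rather than a flat version list) might look like:

```python
# Toy MVCC sketch: every row version records which transaction created it,
# and a snapshot sees only versions created up to its snapshot point.
from dataclasses import dataclass

@dataclass
class RowVersion:
    value: str
    created_by_txn: int  # id of the creating transaction

class Table:
    def __init__(self):
        self.versions = []  # old versions kept while any snapshot needs them

    def write(self, value, txn_id):
        self.versions.append(RowVersion(value, txn_id))

    def read(self, snapshot_txn_id):
        # See only versions created by transactions that committed before
        # the snapshot was taken (approximated here by txn id ordering).
        visible = [v for v in self.versions
                   if v.created_by_txn <= snapshot_txn_id]
        return visible[-1].value if visible else None

t = Table()
t.write("old", txn_id=1)
snapshot = 1              # reader takes its snapshot after txn 1 commits
t.write("new", txn_id=2)  # a later transaction updates the row
print(t.read(snapshot))   # old -- the snapshot keeps seeing the old version
print(t.read(2))          # new -- a newer snapshot sees the update
```

The writer was never blocked by the reader: it simply created a new version, and the old snapshot kept reading the old one.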
Basically, yes, this is the case, with some complications.
By default, in repeatable read a select ... from ... does not place any locks on the underlying data and establishes a snapshot.
If another transaction changes the underlying data, then these changes are not reflected if the same records are selected again in the scope of the first transaction. So far so good.
However, if your first transaction modifies records that were affected by other committed transactions after the snapshot was established, then those other transactions' changes will also become visible to the first transaction, so your snapshot may not be that consistent after all.
See the 1st notes section in Consistent Nonlocking Reads chapter of MySQL manual on further details of this feature.

What transactions does commit / rollback affect?

Does it only affect whatever commands were after the relevant BEGIN transaction?
For example:
BEGIN TRAN
UPDATE orders SET orderdate = '01-08-2013' WHERE orderno > '999'
Now, assume someone else performs a data import that inserts 10,000 new records into another table.
If I subsequently issue a ROLLBACK command, do those records get discarded or is it just the command above that gets rolled back?
Sorry if this a stupid question, I'm only just starting to use COMMIT and ROLLBACK.
Any transaction is confined to the connection it was opened on.
One of the four ACID properties of any relational database management system is Isolation. That means your actions are isolated from other connections and vice versa until you commit. Any change you do is invisible to other connections and if you roll it back they will never know it happened. That means in turn that changes that happened somewhere else are invisible to you until they are committed. Particularly that means that you can't ROLLBACK anyone else's changes.
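A minimal sketch of that scope rule (using Python's sqlite3 module for portability; the behavior is the same in any ACID database): a rollback on one connection cannot undo work committed on another.

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file stand in for two sessions.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn1 = sqlite3.connect(path)
conn2 = sqlite3.connect(path)
conn1.execute("CREATE TABLE orders (orderno INTEGER)")
conn1.commit()

# Someone else's data import runs on its own connection and commits.
conn2.execute("INSERT INTO orders VALUES (1)")
conn2.commit()

# Our connection inserts a row but then rolls back.
conn1.execute("INSERT INTO orders VALUES (2)")
conn1.rollback()

rows = [r[0] for r in conn1.execute("SELECT orderno FROM orders")]
print(rows)  # [1] -- only the other connection's committed row survives
```

Only our own uncommitted insert was discarded; the import committed on the other connection is untouched.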
The Isolation is achieved in one of two ways. One way is to "lock" the resource (e.g. the row). If that happens any other connection trying to read from that row has to wait until you finish your transaction.
The other way is to create a copy of the row that contains the old values. In this case all other connections will see the old version until you commit your transaction.
SQL Server can use both isolation methods. Which one is used depends on the isolation level you choose. The two snapshot isolation levels use the "copy method"; the other four use the "lock method". The default isolation level of READ COMMITTED is one of the "lock method" isolation levels.
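The "lock method" can be illustrated with a small sketch (using a plain Python threading.Lock as a stand-in for a row lock; real databases add timeouts, deadlock detection, and finer lock granularity):

```python
import threading
import time

# A row lock held by a writer blocks a reader until the writer "commits".
row_lock = threading.Lock()
events = []

def writer():
    with row_lock:               # "UPDATE ..." takes the row lock
        events.append("write")
        time.sleep(0.1)          # transaction still open, lock still held
        events.append("commit")  # leaving the block = COMMIT releases it

def reader():
    time.sleep(0.02)             # starts while the writer's txn is open
    with row_lock:               # "SELECT ..." must wait for the lock
        events.append("read")

t1 = threading.Thread(target=writer)
t2 = threading.Thread(target=reader)
t1.start(); t2.start()
t1.join(); t2.join()
print(events)  # ['write', 'commit', 'read'] -- the read waited for commit
```

Under the "copy method" the reader would not wait at all; it would simply see the old version of the row until the writer committed.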
Be aware however that the isolation level "READ UNCOMMITTED" basically circumvents these mechanisms and allows you to read changes that others started and have not yet committed. This is a special isolation level that can be helpful when diagnosing a problem but should be avoided in production code.

do database transactions prevent other users from interfering with it

Suppose I do (note: the syntax below is probably not correct, but don't worry about it...it's just there to make a point)
Start Transaction
INSERT INTO table (id, data) VALUES (100,20), (100,30);
SELECT * FROM table WHERE id = 100;
End Transaction
Hence the goal of the select is to get ALL info from the table that just got inserted by the preceding insert and ONLY by the preceding INSERT...
Now suppose that during the execution, after the INSERT got executed, some other user also performs an INSERT with id = 100...
Will the SELECT statement in the next step of the transaction also get the row inserted by the executed INSERT by the other user or will it just get the two rows inserted by the preceding INSERT within the transaction?
Btw, I'm using MySQL so please tailor your answer to MySQL
This depends entirely on the Transaction Isolation that is used by the DB Connection.
According to MySQL 5.0 Certification Study Guide
Page 420 describes three transactional conditions handled by Isolation Levels
A dirty read is a read by one transaction of uncommitted changes made by another. Suppose transaction T1 modifies a row. If transaction T2 reads the row and sees the modification even though T1 has not committed it, that is a dirty read. One reason this is a problem is that if T1 rolls back, the change is undone but T2 does not know that.
A non-repeatable read occurs when a transaction performs the same retrieval twice but gets a different result each time. Suppose that T1 reads some rows, and that T2 then changes some of those rows and commits the changes. If T1 sees the changes when it reads the rows again, it gets a different result; the initial read is non-repeatable. This is a problem because T1 does not get a consistent result from the same query.
A phantom is a row that appears where it was not visible before. Suppose that T1 and T2 begin, and T1 reads some rows. If T2 inserts a new row and T1 sees that row when it reads again, the row is a phantom.
Page 421 describes the four(4) Transaction Isolation Levels:
READ-UNCOMMITTED : allows a transaction to see uncommitted changes made by other transactions. This isolation level allows dirty reads, non-repeatable reads, and phantoms to occur.
READ-COMMITTED : allows a transaction to see changes made by other transactions only if they've been committed. Uncommitted changes remain invisible. This isolation level allows non-repeatable reads and phantoms to occur.
REPEATABLE READ (default) : ensures that if a transaction issues the same SELECT twice, it gets the same result both times, regardless of committed or uncommitted changes made by other transactions. In other words, it gets a consistent result from different executions of the same query. In some database systems, the REPEATABLE READ isolation level allows phantoms, such that if another transaction inserts new rows in the interval between the SELECT statements, the second SELECT will see them. This is not true for InnoDB; phantoms do not occur at the REPEATABLE READ level.
SERIALIZABLE : completely isolates the effects of one transaction from others. It is similar to REPEATABLE READ with the additional restriction that rows selected by one transaction cannot be changed by another until the first transaction finishes.
Isolation level can be set for your DB Session globally, within your session, or for a specific transaction:
SET GLOBAL TRANSACTION ISOLATION LEVEL isolation_level;
SET SESSION TRANSACTION ISOLATION LEVEL isolation_level;
SET TRANSACTION ISOLATION LEVEL isolation_level;
where isolation_level is one of the following values:
'READ UNCOMMITTED'
'READ COMMITTED'
'REPEATABLE READ'
'SERIALIZABLE'
In my.cnf you can set the default as well:
[mysqld]
transaction-isolation = READ-COMMITTED
If another user is updating the same row, a row-level lock will be applied, so they can make their change only after your transaction ends. You will therefore see the result set that you inserted. Hope this helps.
Interfere is a fuzzy word when it comes to SQL database transactions. What rows a transaction can see is determined in part by its isolation level.
Hence the goal of the select is to get ALL info from the table that
just got inserted by the preceding insert and ONLY by the preceding
INSERT...
Preceding insert is a little fuzzy, too.
You probably ought to COMMIT the insert in question before you try to read it. Otherwise, under certain conditions not under your control, that transaction could be rolled back, and the row with id=100 might not actually exist.
Of course, after it's committed, other transactions are free to change the value of "id", of "value", or both. (If they have sufficient permissions, that is.)
The transaction will make it seem like that the statements in the transaction run without any interference from other transactions. Most DBMSs (including MySQL) maintain ACID properties for transactions. In your case, you are interested in the A for Atomic, which means that the DBMS will make it seem like all the statements in your transactions run atomically without interruption.
The only users affected are those that require access to the same rows in a table. Other users will not be affected.
However, it is slightly more complicated, as row locking can be a read lock or a write lock.
Here is an explanation for the InnoDB storage engine.
For efficiency reasons, transactions are not totally isolated from each other by default.
Databases support multiple isolation levels, namely SERIALIZABLE, REPEATABLE READ, READ COMMITTED and READ UNCOMMITTED, listed from most strict to least strict.

MySQL Isolation levels, Measuring their impact on deadlocks

I'm trying to generate a few graphs using the sysbench benchmark (default configuration) trying to show the relationship between deadlocks and isolation level in MySQL.
But I get some strange results: I was under the impression that REPEATABLE READ would have more deadlocks than READ COMMITTED (which is the case), and significantly more than READ UNCOMMITTED. In fact it turns out that READ UNCOMMITTED has more deadlocks than either.
Is this normal? And if so, why?
Deadlock can happen at any isolation level. It's hard to tell without the actual tests, but my guess is that in the case of READ COMMITTED / REPEATABLE READ, if you have to read the value of a row being updated, the value is read from the rollback log; but in the case of READ UNCOMMITTED the rollback log is not used, so if the row is locked for update, the read has to wait for the actual value to be written. But it's a wild guess: having more deadlocks in READ UNCOMMITTED is strange behaviour and most likely implementation-dependent. I would be interested if you could provide the actual tests, and whether the result can be reproduced in different versions of MySQL.