Repeatable Read implementation on the MySQL server side?

I understand what the Repeatable Read transaction isolation level means: any data committed by other transactions after my transaction has started will not be seen by my transaction.
However, I am having a tough time understanding how it is actually implemented on the MySQL server side. Is a snapshot of the database taken at the start of every transaction and set aside for that particular transaction?
If so, wouldn't the memory cost be huge whenever multiple repeatable read transactions are running at the same time?
Also, can someone shed light on the role of shared/exclusive locks in repeatable read?

I have been looking for the same answer for a while, and after some searching I think it is basically implemented using MVCC (snapshot reads) plus gap locks and next-key locks.
I am not completely sure I understand it correctly, but these keywords may help with further searching; a rough sketch follows below.
By the way, if you read Chinese, here are some good explanations written in Chinese:
http://hedengcheng.com/?p=771
https://www.cnblogs.com/kismetv/p/10331633.html
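To make those keywords concrete, here is a minimal sketch of the two mechanisms (the accounts table and its id column are made up for illustration). A plain SELECT inside a REPEATABLE READ transaction is a consistent non-locking read served from the snapshot taken at the first read, while SELECT ... FOR UPDATE is a locking read that reads the latest committed data and takes next-key (record + gap) locks on the scanned range:

-- Session A
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT * FROM accounts WHERE id BETWEEN 10 AND 20;  -- consistent read; establishes the snapshot, takes no locks
-- Session B now inserts id = 15 into that range and commits.
SELECT * FROM accounts WHERE id BETWEEN 10 AND 20;  -- same snapshot: still does not show id = 15
SELECT * FROM accounts WHERE id BETWEEN 10 AND 20 FOR UPDATE;
-- locking read: sees the latest committed rows (including id = 15), and its
-- next-key locks block further inserts into the 10-20 gap until COMMIT.
COMMIT;

So there is no full database copy per transaction: old row versions are reconstructed on demand from the undo logs, and the gap/next-key locks are what prevent phantoms for locking reads and DML.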

Related

Django MySQL REPEATABLE READ "data loss"

I'm looking for information about what is behind this entry in the Django 2.0 release notes:
MySQL’s default isolation level, repeatable read, may cause data loss in typical Django usage. To prevent that and for consistency with other databases, the default isolation level is now read committed. You can use the DATABASES setting to use a different isolation level, if needed.
As I understand it, repeatable read is "stricter" than read committed, so what Django is doing to produce "data loss" is a question that has been bugging me for some time now.
Is it stuff like prefetch_related? Or can making an UPDATE based on potentially stale data (SELECTed earlier in the same thread) be considered data loss in general? Or, even better, is there something that only MySQL does, or a bug it has, that makes it dangerous on repeatable read?
Thank you.
There are actually two issues with REPEATABLE READ worth noting here.
One is that Django's internal code is written with the expectation that transactions obey the READ COMMITTED semantics. The fact that REPEATABLE READ is actually a higher isolation level doesn't matter; the point is that it violates the expectations of Django's code, leading to incorrect behavior. (In fact, adding a setting to change the isolation level was initially resisted because "it would imply that Django works correctly under any isolation level, and I don't think that's true".)
A straightforward example (first noted in the issue tracker 9 years ago) is the behavior of get_or_create(). It works by first trying to read the row; then, if that fails, trying to create it. If that creation operation fails it's presumably because some other transaction has created the row in the meantime. Therefore, it tries again to read and return it. That works as expected in READ COMMITTED, but in REPEATABLE READ that final read won't find anything because the read must return the same results (none) that it found the first time.
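At the SQL level, the sequence looks roughly like this (the book table, its slug column, and the unique index on it are made up for illustration; Django's actual queries differ in detail):

-- Session A: get_or_create() under REPEATABLE READ
START TRANSACTION;
SELECT id FROM book WHERE slug = 'dune';   -- 0 rows; this first read establishes the snapshot
-- Session B meanwhile inserts slug = 'dune' and commits.
INSERT INTO book (slug) VALUES ('dune');   -- fails with a duplicate-key error (unique index on slug)
SELECT id FROM book WHERE slug = 'dune';   -- still 0 rows: the original snapshot is reused
COMMIT;

Under READ COMMITTED the final SELECT would take a fresh snapshot and return the row created by the other transaction.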
That's not data loss, though. The second issue is specific to MySQL and the non-standard way it defines the behavior of REPEATABLE READ. Roughly speaking, reads will behave as in REPEATABLE READ but writes will behave as in READ COMMITTED, and that combination can lead to data loss. This is best demonstrated with an example, so let me quote this one, provided by core contributor Shai Berger:
(1) BEGIN TRANSACTION
(2) SELECT ... FROM some_table WHERE some_field=some_value
(1 row returned)
(3) (some other transactions commit)
(4) SELECT ... FROM some_table WHERE some_field=some_value
(1 row returned, same as above)
(5) DELETE FROM some_table WHERE some_field=some_value
(answer: 1 row deleted)
(6) SELECT ... FROM some_table WHERE some_field=some_value
(1 row returned, same as above)
(7) COMMIT
(the row that was returned earlier is no longer in the database)
Take a minute to read this. Up to step (5), everything is as you would expect; you should find steps (6) and (7) quite surprising.
This happens because the other transactions in (3) deleted the row that is returned in (2), (4) & (6), and inserted another one where some_field=some_value; that other row is the row that was deleted in (5). The row that this transaction selects was not seen by the DELETE, and hence not changed by it, and hence continues to be visible to the SELECTs in our transaction. But when we commit, the row (which has been deleted) no longer exists.
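For concreteness, here is a sketch of what the other transaction in step (3) might have done (the names are kept from the quoted example and purely illustrative):

-- The concurrent transaction committed in step (3):
START TRANSACTION;
DELETE FROM some_table WHERE some_field = some_value;    -- removes the row the first session keeps seeing in its snapshot
INSERT INTO some_table (some_field) VALUES (some_value); -- a new row matching the same predicate
COMMIT;

The first session's consistent reads keep showing the old row from its snapshot, but its DELETE in step (5) operates on the current data and therefore removes the new row instead. After both transactions commit, neither row exists.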
The tracker issue that led to this change in the default isolation level gives additional detail on the problem, and it links to other discussions and issues if you want to read more.

InnoDB Isolation Level for single SELECT query

I know that every single query sent to MySQL (with InnoDB as the engine) runs as a separate transaction. However, my concern is about the default isolation level (Repeatable Read).
My question is: as SELECT queries are sent one by one, what is the need to run the transaction at repeatable read? Doesn't InnoDB add overhead for nothing in this case?
For instance, in my web application I have a lot of single read queries where accuracy doesn't matter: for example, I can retrieve the number of books at a given time even while some modifications are being processed, because I know that number can change after my HTTP request anyway.
In this case READ UNCOMMITTED seems appropriate. Do I need to set every similar single-query transaction to that isolation level, or does InnoDB handle it automatically?
Thanks.
First of all, your question is part of the wider topic of performance tuning, and it is hard to answer knowing only this, but I will try to give you at least an overview.
The fact that Repeatable Read is good enough for most databases does not mean it is also best for you!
By the way, I think MySQL is the only one where this is the default; most databases default to Read Committed (e.g. Oracle), and in my opinion that is enough for most cases.
My question is: as SELECT queries are sent one by one, what is the need to run the transaction at repeatable read?
Basically, there is no need. The repeatable read level ensures you do not get dirty reads, non-repeatable reads, or phantom rows (though phantoms are arguably a slightly different story), and those guarantees only matter when a transaction spans several statements while other sessions run DML. So for pure SELECTs issued one by one, this simply does not apply.
In this case, doesn't InnoDB add overhead for nothing?
Another yes: it does not do it for nothing. In general, InnoDB's ACID guarantees come at the cost of storing data consistently, without any doubts about its reliability, and that is not free of charge. It is simply a trade-off between performance on one side and data consistency and reliability on the other.
In more detail, MySQL uses special rollback (undo) segments to store snapshots and old row values for rollback purposes, and refers to them when necessary. As I said, that costs.
It is also worth mentioning that the performance difference is much more visible for INSERT, UPDATE and DELETE; SELECT does not cost as much. But still.
If you do not need it, avoiding that work is a theoretically obvious benefit. How big? You need to assess it yourself by measuring query performance in your environment.
A lot depends on the individual case, including scale, how many reads/writes there are, how often, the application design, the database, and much, much more, and with the same problem in different environments the answer could simply be different.
Another alternative you could consider is simply changing the engine to MyISAM (if you do not need foreign keys, for example). In my experience it is a very good choice for read-heavy workloads. Again, it all depends, but in many cases it is faster than InnoDB. It is of course less safe, but if you are aware of the possible risks it is a reasonable option.
In this case READ UNCOMMITTED seems appropriate. Do I need to set every similar single-query transaction to that isolation level, or does InnoDB handle it automatically?
You can set the isolation level globally, for the current session, or for the next transaction.
To set the isolation level globally for subsequent sessions:
SET GLOBAL tx_isolation = 'READ-UNCOMMITTED';
http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html
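For the other two scopes, something like the following should work (on MySQL 8.0 the server variable is transaction_isolation rather than tx_isolation; the table name below is just an example):

-- Current session only: applies to all subsequent transactions in this connection.
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- Next transaction only: reverts to the session level afterwards.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM book;   -- this autocommit statement runs at READ UNCOMMITTED

Note that InnoDB will not pick a cheaper level for you automatically; single SELECTs simply run at whatever the session or global level is.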

MySQL Isolation levels, Measuring their impact on deadlocks

I'm trying to generate a few graphs using the sysbench benchmark (default configuration) to show the relationship between deadlocks and isolation level in MySQL.
But I get some strange results: I was under the impression that repeatable read would have more deadlocks than read committed (which is the case), and significantly more than read uncommitted. In fact it turns out that read uncommitted has more deadlocks than either.
Is this normal? And if so, why?
Deadlocks can happen at any isolation level. It's hard to tell without the actual tests, but my guess is that under read committed / repeatable read, if you have to read the value of a row that is being updated, the value is read from the rollback log, whereas under read uncommitted the rollback log is not used, so if the row is locked for update, the read has to wait for the actual value to be written. But that is a wild guess; having more deadlocks under read uncommitted is strange behaviour and most likely implementation dependent. I would be interested if you could share the actual tests, and whether the result can be reproduced on different versions of MySQL.

Avoiding deadlock by using NOLOCK hint

Once in a while I get the following error in the production environment, and it goes away on running the same stored procedure again.
Transaction (Process ID 86) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction
Someone told me that if I use the NOLOCK hint in my stored procedures, it will ensure they are never deadlocked. Is this correct? Are there any better ways of handling this error?
Occasional deadlocks on an RDBMS that locks like SQL Server/Sybase are expected.
You can code the client to retry, as recommended by MSDN's "Handling Deadlocks".
Basically, examine the SQLException and maybe a half second later, try again.
Otherwise, you should review your code so that all tables are accessed in the same order. Or you can use SET DEADLOCK_PRIORITY to control who becomes the victim.
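As a rough illustration of the retry idea (the answer above suggests doing it on the client; this sketch does the equivalent in T-SQL, and dbo.MyProc is a made-up procedure name), together with SET DEADLOCK_PRIORITY:

-- Make this session the preferred deadlock victim:
SET DEADLOCK_PRIORITY LOW;

-- Retry up to three times when chosen as a deadlock victim (error 1205):
DECLARE @retries INT = 3;
WHILE @retries > 0
BEGIN
    BEGIN TRY
        EXEC dbo.MyProc;              -- hypothetical procedure
        BREAK;                        -- success: stop retrying
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() = 1205
        BEGIN
            SET @retries = @retries - 1;
            WAITFOR DELAY '00:00:00.500';  -- back off about half a second
        END
        ELSE
            THROW;                    -- anything else: re-raise
    END CATCH
END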
On MSDN for SQL Server there is "Minimizing Deadlocks" which starts
Although deadlocks cannot be completely avoided
This also mentions "Use a Lower Isolation Level", which I don't like (as do many SQL folks here on SO), and that is essentially your question. The answer is: don't do it... :-)
What can happen as a result of using (nolock) on every SELECT in SQL Server?
https://dba.stackexchange.com/q/2684/630
Note: MVCC type RDBMS (Oracle, Postgres) don't have this problem. See http://en.wikipedia.org/wiki/ACID#Locking_vs_multiversioning but MVCC has other issues.
While adding NOLOCK can prevent readers and writers from blocking each other (never mind all of the negative side effects it has), it is not a magical fix for deadlocks. Many deadlocks have nothing at all to do with reading data, so applying NOLOCK to your read queries might not cause anything to change at all. Have you run a trace and examined the deadlock graph to see exactly what the deadlock is? This should at least let you know which part of the code to look at. For example, is the stored procedure deadlocking because it is being called by multiple users concurrently, or is it deadlocking with a different piece of code?
Here is a good link on learning to troubleshoot deadlocks. I always try to avoid using NOLOCK for the reasons above. You might also want to better understand Lock Compatibility.
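On the earlier point about running a trace and examining the deadlock graph, one simple way to capture that information (trace flag 1222 is a real SQL Server flag; Extended Events based approaches also exist) is:

-- Write deadlock details to the SQL Server error log for all sessions:
DBCC TRACEON (1222, -1);
-- Reproduce the deadlock, then read the deadlock graph output from the error log.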

Read changes from within a transaction

Are changes made to a MySQL database readable within the same transaction that made them, or do I have to commit the transaction before I can read those changes?
I could easily test this, but asking on SO tends to bring up a lot of good suggestions. Thanks for any input.
Assuming you're using InnoDB, the answer to your first question is generally yes, implying the answer to your second is generally no.
By default MySQL's InnoDB uses a technique called consistent non-locking reads:
The query sees the changes made by transactions that committed before that point of time, and no changes made by later or uncommitted transactions. The exception to this rule is that the query sees the changes made by earlier statements within the same transaction.
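A quick sketch of that exception (the orders table and its columns are made up for the example):

START TRANSACTION;
INSERT INTO orders (id, status) VALUES (42, 'new');
SELECT status FROM orders WHERE id = 42;  -- returns 'new': your own uncommitted change is visible
COMMIT;                                   -- only after this can other sessions see the row

Other sessions reading at REPEATABLE READ or READ COMMITTED will not see the new row until the COMMIT.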
That being said, there's a lot of stuff to know about transactions. You can change the isolation level of a transaction in order to control the transaction results more thoroughly.
The chapter on the InnoDB Transaction Model is a great place to start.