InnoDB Isolation Level for single SELECT query - mysql

I know that every single query sent to MySQL (with InnoDB as the engine) runs as its own transaction. However, my concern is about the default isolation level (REPEATABLE READ).
My question is: since these SELECT queries are sent one by one, what is the need to run each transaction in REPEATABLE READ? Doesn't InnoDB add overhead for nothing in this case?
For instance, in my web application I have lots of single read queries, but strict accuracy doesn't matter. As an example, I can retrieve the number of books at a given time even if some modifications are being processed, because I know that number can change after my HTTP request anyway.
In this case READ UNCOMMITTED seems appropriate. Do I need to switch every similar single-query transaction to that isolation level, or does InnoDB handle it automatically?
Thanks.

First of all, your question is part of the wider topic of performance tuning, and it is hard to answer knowing only this, but I will try to give you at least an overview.
The fact that REPEATABLE READ is good enough for most databases does not mean it is also the best choice for you!
By the way, I think MySQL is the only one that uses this level by default. Most databases default to READ COMMITTED (e.g. Oracle), which in my opinion is enough for most cases.
My question is: since these SELECT queries are sent one by one, what is the need to run each transaction in REPEATABLE READ?
Basically, there is no need. REPEATABLE READ ensures you do not get dirty reads, non-repeatable reads, or phantom rows (though phantoms are a slightly different story), and those anomalies only show up when DML is running concurrently. So if you only run pure SELECTs one by one, this simply does not apply to you.
Doesn't InnoDB add overhead for nothing in this case?
Again, yes. But it does not do it for nothing: in general, InnoDB's ACID model gives you consistently stored data with no doubts about its reliability, and that is not free of charge. It is simply a trade-off between performance on one side and data consistency and reliability on the other.
In more detail, InnoDB uses special undo segments to store snapshots and old row values for rollback purposes, and refers to them when necessary. As I said, that costs.
It is also worth mentioning that the performance difference is much more visible for INSERT, UPDATE, and DELETE; SELECT does not cost as much. But still.
If you do not need those guarantees, dropping them is a theoretically obvious benefit. How big? You need to assess that yourself by measuring query performance in your own environment.
A lot depends on individual factors: scale, how many reads and writes there are and how often, the application design, the database itself, and much more.
Facing the same problem in different environments, the answer could simply be different.
Another alternative you could consider is simply changing the engine to MyISAM (if you do not need foreign keys, for example). In my experience it is a very good choice for read-heavy workloads. Again, it all depends, but in many cases it is faster than InnoDB. It is of course less safe, but if you are aware of the possible risks it can be a good solution.
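For instance, switching an existing table's storage engine is a one-line change; a rough sketch (the table name is just a placeholder, and you should first check that nothing relies on InnoDB-only features such as foreign keys or transactions):
-- hypothetical table; verify it has no foreign keys or transactional requirements first
ALTER TABLE books ENGINE = MyISAM;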
In this case READ UNCOMMITTED seems appropriate. Do I need to switch every similar single-query transaction to that isolation level, or does InnoDB handle it automatically?
You can set the isolation level globally, for the current session, or for the next transaction. To set it globally for all subsequent sessions:
SET GLOBAL tx_isolation = 'READ-UNCOMMITTED';
http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html
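If you prefer not to change it globally, the same level can be set for just the current session, or only for the next transaction, roughly like this:
-- current session only
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- only the next transaction started in this session
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;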

Related

How do databases maintain results if another query edits those results

I was just wondering how most relational databases maintain your result set if another query edits the rows you are working on. For instance, if I select around 100k rows and, while I am still fetching them, another query comes in and updates one of the rows that hasn't been read yet, the update is not going to be seen while fetching those rows, and I was wondering how the database engine handles that. If you only have specifics for one type of database, that's fine; I would like to hear it anyway.
Please look up Multi-Version Concurrency Control (MVCC). Different databases have different approaches to managing this. For MySQL/InnoDB, you can try http://dev.mysql.com/doc/refman/5.0/en/innodb-multi-versioning.html; for PostgreSQL, https://wiki.postgresql.org/wiki/MVCC. A great presentation is here: http://momjian.us/main/writings/pgsql/mvcc.pdf. It is also explained on Stack Overflow in the thread "Database: What is Multiversion Concurrency Control (MVCC) and who supports it?"
The general goal you are describing in concurrent programming (Wikipedia: concurrency control) is serializability (Wikipedia: serializability): the implementation manages the database as if the transactions had occurred without overlap, in some order.
The importance of that is that only then does the system act the way the code reads as we normally interpret it; otherwise the results of operations are a combination of all processes acting concurrently. Nevertheless, by allowing limited categories of non-serialized, so-called anomalous behaviours, transaction throughput can be increased, so those implementation techniques (e.g. MVCC) are also apropos. But understand: such non-serialized behaviour is not isolating one transaction from another. (I.e. the so-called "isolation" levels are actually non-isolation levels.)
Isolation is managed by breaking transaction implementations into pieces based on reading and writing shared resources and executing them interleaved with pieces from other transactions, in such a way that the effect is the same as some sequence of non-overlapped execution. Roughly speaking, one can "pessimistically" "lock" other processes out of changed resources and make them wait, or "optimistically" "version" the database and "roll back" (throw away) some processes' work when changes are unreconcilable (unserializable).
Some techniques based on an understanding of serializability by an implementer for a major product are in this answer. For relevant notions and techniques, see the Wikipedia articles or a database textbook. Eg Fundamentals of database systems by Ramez Elmasri & Shamkant B. Navathe. (Many textbooks, slides and courses are free online.)
(Two answers and a comment on your question mention MVCC. As I commented, not only is MVCC just one implementation technique, it doesn't even support transaction serialization by itself, i.e. actually isolating transactions as if each were done all at once: it allows certain kinds of errors (aka anomalies) and must be mixed with other techniques to achieve isolation. The MVCC answers, comments, and upvoting reflect a confusion between a popular and valuable technique for a useful but limited failure to isolate, per your question, and the actual core issues and means.)
As Jayadevan notes, the general principle used by most widely used databases that permit you to modify values while they're being read is multi-version concurrency control, or MVCC. All widely used modern RDBMS implementations that support reading rows that are being updated rely on some concept of row versioning.
The details differ between implementations. (I'll explain PostgreSQL's a little more here, but you should accept Jayadevan's answer not mine).
PostgreSQL uses transaction ID ranges to track row visibility. So there will be multiple copies of a tuple in a table, but any given transaction can only "see" one of them. Each transaction has a unique ID, with newer transactions having higher IDs, and each tuple has hidden xmin and xmax fields that track which transactions can "see" the tuple. Insertion is implemented by setting the tuple's xmin so that transactions with lower xids know to ignore the tuple when reading the table. Deletion is implemented by setting the tuple's xmax so that transactions with higher xids know to ignore the tuple when reading the table. Updates are implemented by effectively deleting the old tuple (setting xmax) and then inserting a new one (setting xmin), so that old transactions still see the old tuple but new transactions see the new one. When no running transaction can still see a deleted tuple, vacuum marks it as free space to be overwritten.
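You can actually inspect those hidden fields yourself; a small PostgreSQL sketch (the table is hypothetical):
-- xmin, xmax and ctid are system columns available on every PostgreSQL table
SELECT xmin, xmax, ctid, * FROM books WHERE id = 42;
-- Update that row from another session: inside an open REPEATABLE READ transaction this
-- query keeps returning the old tuple version, while a new session sees a tuple with a new xmin.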
Oracle uses undo and redo logs, so there's only one copy of the tuple in the main table, and transactions that want to find old versions have to go and find it in the logs. Like PostgreSQL it uses row versioning, though I'm less sure of the details.
Pretty much every other DB uses a similar approach these days. Those that used to rely on locking, like older MS-SQL versions, have moved to MVCC.
MySQL uses MVCC with InnoDB tables, which are now the default. MyISAM tables still rely on table locking (but they'll also eat your data, so don't use them for anything you care about).
A few embedded DBs, like SQLite, still rely only on table locking - which tends to require less wasted disk space and I/O overhead, at the cost of greatly reduced concurrency. Some databases let you bypass MVCC if you take an exclusive lock on a table.
(Marked community wiki, since I also close-voted this question).
You should also read the PostgreSQL docs on transaction isolation and locking, and similar documentation for other DBs you use. See the Wikipedia article on isolation.
Snapshot isolation solves the problem you are describing. If you use locking, you can see the same record twice as you iterate through the database, because unlocked records change underneath your feet while you're doing the scan.
Read committed isolation level with locking suffers from this problem.
Depending on the granularity of the lock, the WHERE predicate may lock the matching pages and tuples so that the running read query doesn't see phantom data appearing (phantom reads).
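In InnoDB terms, a rough sketch of that idea (hypothetical table; under SERIALIZABLE, plain SELECTs become locking reads and the scanned range is protected by next-key/gap locks):
SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT COUNT(*) FROM books WHERE price BETWEEN 10 AND 20;
-- another session's INSERT into that price range now blocks until this transaction commits,
-- so re-running the COUNT cannot return a phantom row
COMMIT;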
I implemented multi-version concurrency control in my Java project. A transaction is given a monotonically increasing timestamp, which starts at 0 and goes up by 1 each time the transaction is aborted, so later transactions have higher timestamps. When a transaction reads, it can only see data whose timestamp is less than or equal to its own and that is committed for that key (or column of that tuple); "less than or equal" so that it can see its own writes.
When a transaction writes, it sets the committed timestamp for that key to the transaction's own timestamp.

Many Individual Queries v. One Big One

I'm in a situation where an entire column in a table (used for user tokens) needs to be wiped, i.e., all user tokens are reset simultaneously. There are two ways of going about it: reset each user's token individually with a separate UPDATE query; or make one big query that affects all rows.
The advantage of one big query is that it will obviously be much faster, but I'm worried about the implications of a large UPDATE query when the database is big. Will requests that occur during the query be affected?
Afraid it's not that simple. Even if you enable dirty reads, running one big update has a lot of drawbacks:
a long-running transaction that updates one column will effectively block other insert, update, and delete transactions;
a long-running transaction causes enormous load on disk, because the server has to write everything that takes place to a log file so that the huge transaction can be rolled back;
if the transaction fails, you have to rerun it entirely; it is not restartable.
So if the "simultaneous" requirement can be interpreted as "in one batch that may take a while to run", I would opt for batching it. A good research write-up on the performance of big DELETEs in MySQL is here: http://mysql.rjweb.org/doc.php/deletebig, and I think most of the findings are applicable to UPDATE.
The trick will be finding the optimal "batch size".
An added benefit of batching is that you can make the process resilient to failures and restart-friendly.
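A minimal sketch of that batching idea, assuming a hypothetical users table with a token column (MySQL allows LIMIT on UPDATE):
-- repeat until the statement reports 0 rows affected
UPDATE users
SET token = NULL
WHERE token IS NOT NULL
LIMIT 1000;
Each batch is then its own short transaction, so locks are held only briefly and a failure just means resuming the loop.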
The answer depends on the transaction and isolation level you've established.
You can set isolation to allow "dirty reads", "phantom reads", or force serialization of reads and writes.
However you do that UPDATE, you'll want it to be a single unit of work.
I'd recommend minimizing network latency and updating all the user tokens in one network roundtrip. This means either writing a single query or batching many into one request.
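For the single-statement option, the sketch is just one query (table and column names are assumptions):
-- one round trip, one transaction, every row's token cleared
UPDATE users SET token = NULL;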

Which isolation level to use in a basic MySQL project?

Well, I got an assignment (a mini-project) in which one of the most important issues is database consistency.
The project is a web application that allows multiple users to access and work with it. I can expect concurrent querying and updating requests on a small set of tables, some of them connected to each other (using FOREIGN KEYs).
In order to keep the database as consistent as possible, we were advised to use isolation levels. After reading a bit (maybe not enough?) about them, I figured the most useful ones for me are READ COMMITTED and SERIALIZABLE.
I can divide the queries into three kinds:
Fetching query
Updating query
Combo
For the first kind, I of course need the data to be consistent: I don't want to present dirty or uncommitted data. Therefore, I thought of using READ COMMITTED for these queries.
For the updating queries, I thought SERIALIZABLE would be the best option, but after reading a bit I found myself lost.
In the combo case, I'll probably have to read from the DB and decide whether I need/can update or not; these 2-3 calls will be in the same transaction.
I wanted to ask for advice on which isolation level to use for each of these query types. Should I even consider a different isolation level for each type, or just stick to one?
I'm using MySQL 5.1.53, along with MySQL JDBC 3.1.14 driver (Requirements... Didn't choose the JDBC version)
Your insights are much appreciated!
Edit:
I've decided I'll be using REPEATABLE READ, which seems to be the default level.
I'm not sure if it's the right way to go, but I guess REPEATABLE READ along with LOCK IN SHARE MODE and FOR UPDATE on the queries should work fine...
What do you guys think?
I would suggest READ COMMITTED. It seems natural to be able to see other sessions' committed data as soon as they're committed.
It's unclear why MySQL defaults to REPEATABLE READ.
I think you worry too much about the isolation level.
If you have multiple tables to update you need to do:
START TRANSACTION;
UPDATE table1 ....;
UPDATE table2 ....;
UPDATE table3 ....;
COMMIT;
This is the important stuff, the isolation level is just gravy.
The default level of repeatable read will do just fine for you.
Note that SELECT ... FOR UPDATE locks the rows it reads; this can result in deadlocks, which may be worse than the problem you are trying to solve.
Only use it if you are deleting rows in your DB.
To be honest, I rarely see rows being deleted from a DB; if you are just doing updates, then just use normal selects.
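For the asker's read-then-decide-then-update ("combo") case, a minimal sketch under the default REPEATABLE READ, using a hypothetical books table with a stock column:
START TRANSACTION;
-- locking read: the row stays locked until COMMIT, so nobody else can change it in between
SELECT stock FROM books WHERE id = 42 FOR UPDATE;
-- the application decides whether the update is allowed, then:
UPDATE books SET stock = stock - 1 WHERE id = 42;
COMMIT;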
Anyway see: http://dev.mysql.com/doc/refman/5.0/en/innodb-transaction-model.html

Locking mySQL tables/rows

Can someone explain the need to lock tables and/or rows in MySQL?
I am assuming it is to prevent multiple writes to the same field; is this the best practice?
First, let's look at a good document. It is not MySQL documentation (it is about PostgreSQL), but it is one of the simplest and clearest explanations of transactions I have read, and you will understand MySQL transactions better after reading it: http://www.postgresql.org/docs/8.4/static/mvcc.html
When you run a transaction, four rules apply (ACID):
Atomicity: all or nothing (rollback)
Consistency: consistent before, consistent after
Isolation: not impacted by other concurrent transactions
Durability: once committed, it is really done
Of these rules, only one is problematic: Isolation. Using a transaction does not ensure a perfect isolation level. The link above explains better than I can what phantom reads and the other isolation problems between concurrent transactions are. To keep it simple: you should really use row-level locks to prevent other transactions, running at the same time as yours (and maybe committing before yours), from altering the same records. But with locks come deadlocks...
Then, when you start using nice transactions with locks, you will need to handle deadlocks, and you will need to handle the fact that a transaction can fail and should be relaunched (simple for or while loops).
Edit:
Recent versions of InnoDB provide greater levels of isolation than previous ones. I have done some tests and I must admit that even the phantom reads that should happen are now difficult to reproduce.
MySQL is by default at level 3 of the 4 isolation levels explained in the PostgreSQL document (where PostgreSQL defaults to level 2). This is REPEATABLE READ. That means you won't have dirty reads and you won't have non-repeatable reads. So someone modifying a row on which you made your SELECT inside your transaction will hit an implicit lock (as if you had performed a SELECT ... FOR UPDATE).
Warning: if you work with an older version of MySQL, such as 5.0, you may be at level 2 and will need to take the row lock explicitly with FOR UPDATE!
You can always find some nice race conditions. When working with aggregate queries it can be safer to be at the 4th isolation level (by using LOCK IN SHARE MODE at the end of your query) if you do not want people adding rows while you are performing your task. I have been able to reproduce one serializable-level problem, but I won't explain the complex example here; it involves really tricky race conditions.
There is a very nice example of race conditions that even the serializable level cannot fix here: http://www.postgresql.org/docs/8.4/static/transaction-iso.html#MVCC-SERIALIZABILITY
When working with transactions, the most important things are:
data used in your transaction must always be read INSIDE the transaction (re-read it if you obtained it before the BEGIN);
understand why a high isolation level sets implicit locks and may block some other queries (and make them time out);
try to avoid deadlocks (lock tables in the same order), but handle them (retry a transaction aborted by MySQL);
freeze important source tables at the serialization isolation level (LOCK IN SHARE MODE) when your application code assumes that no insert or update should modify the dataset it is using (otherwise you will not get errors, but your results will silently ignore the concurrent changes); a short sketch of this follows.
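A rough illustration of that last point, with hypothetical table and column names:
START TRANSACTION;
-- shared locks on the matched rows (plus gap locks on the scanned range): concurrent
-- inserts, updates and deletes touching them wait until we commit
SELECT SUM(amount) FROM orders WHERE customer_id = 42 LOCK IN SHARE MODE;
-- the application uses that sum, then writes a value derived from it (placeholder value here)
UPDATE customers SET total_spent = 940.00 WHERE id = 42;
COMMIT;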
It is not a best practice. Modern versions of MySQL support transactions with well defined semantics. Use transactions, and forget about locking stuff by hand.
The only new thing you'll have to deal with is that transaction commits may fail because of race conditions, but you'd be doing error checking with locks anyway, and it is easier to retry the logic that led to a transaction failure than to recover from errors in a non-transactional setup.
If you do get race conditions and failed commits, then you may want to fine-tune the isolation configuration for your transactions.
For example, suppose you need to generate invoice numbers that are sequential, with no numbers missing; this is a requirement at least in the country I live in.
If you have a few web servers, then a few users might be buying things at literally the same time.
If you do select max(invoice_id)+1 from invoice to get the new invoice number, two web servers might do that at the same time (before the new invoice has been added) and get the same invoice number for the invoices they are trying to create.
If you use a mechanism such as AUTO_INCREMENT, it is only meant to generate unique values and makes no guarantees about not skipping numbers (if one transaction inserts a row and then rolls back, that number is "lost").
So the solution is to (a) lock the table, (b) select max(invoice_id)+1 from invoice, (c) do the insert, and (d) commit and unlock the table.
On another note, in MySQL you are best off using InnoDB and row-level locking: issuing a LOCK TABLES command can implicitly commit the transaction you are working on.
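As a sketch of that row-level-locking route, here is a swapped-in variant of the recipe above that uses a one-row counter table guarded by SELECT ... FOR UPDATE instead of LOCK TABLES (all table and column names are assumptions):
-- assumed schema: invoice_counter(id TINYINT PRIMARY KEY, next_invoice INT NOT NULL) with one row
START TRANSACTION;
SELECT next_invoice INTO @n FROM invoice_counter WHERE id = 1 FOR UPDATE;  -- row lock serializes allocation
INSERT INTO invoice (invoice_id, amount) VALUES (@n, 100.00);
UPDATE invoice_counter SET next_invoice = next_invoice + 1 WHERE id = 1;
COMMIT;
Because the counter increment and the insert sit in one transaction, a rollback releases the number instead of losing it, and concurrent allocators simply queue on the row lock.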
Take a look here for general introduction to what transactions are and how to use them.
Databases are designed to work in concurrent environments, so locking the tables and/or records helps to keep the transactions consistent.
So a record affected by one transaction should not be altered until this transaction commits or rolls back.

Read changes from within a transaction

Are changes made to the MySQL database readable within the same transaction, or should I commit the transaction to read the changes?
I could easily test this. But putting a question in SO brings up a lot of good suggestions. Thanks for any input.
Assuming you're using InnoDB, the answer to your first question is generally yes, implying the answer to your second is generally no.
By default MySQL's InnoDB uses a technique called consistent non-locking reads:
The query sees the changes made by transactions that committed before that point of time, and no changes made by later or uncommitted transactions. The exception to this rule is that the query sees the changes made by earlier statements within the same transaction.
That being said, there's a lot of stuff to know about transactions. You can change the isolation level of a transaction in order to control the transaction results more thoroughly.
The chapter on the InnoDB Transaction Model is a great place to start.
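A tiny sketch of that exception, with a hypothetical books table:
START TRANSACTION;
UPDATE books SET stock = stock - 1 WHERE id = 42;
-- this SELECT already sees the decremented stock, even though nothing is committed yet;
-- other sessions keep seeing the old value until COMMIT
SELECT stock FROM books WHERE id = 42;
COMMIT;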