InnoDB MySQL Select Query Locking

I have an isolation level of REPEATABLE READ and I am running a SELECT * FROM query. I read in https://dev.mysql.com/doc/refman/5.7/en/innodb-locks-set.html that SELECT ... FROM queries use consistent reads from a snapshot and therefore set no locks on rows or the table. Does that mean an UPDATE, INSERT, or DELETE initiated after the SELECT begins but before it ends would still be able to run, even though the modification won't show up in the SELECT results?

Yes, you can update/insert/delete while an existing transaction holds a repeatable-read snapshot on the data.
This is implemented by Multi-Version Concurrency Control (MVCC).
It's a fancy way of saying that the RDBMS keeps multiple versions of the same row(s), so that repeatable-read snapshots can continue reading the older version as long as they need to (that is, as long as their transaction snapshot exists).
If a row version exists that was created by a transaction that committed after your transaction started, you shouldn't be able to see that row version. Every row version internally keeps some metadata about the transaction that created it, and every transaction knows how to use this to determine if it should see the row version or not.
Eventually, all transactions that may be interested in the old row versions finish, and the MVCC can "clean up" the obsolete row versions.
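As a minimal two-session sketch of this behavior (the table t and its values are hypothetical; REPEATABLE READ is InnoDB's default):

-- Session 1
START TRANSACTION;
SELECT val FROM t WHERE id = 1;  -- returns 10 and establishes the snapshot

-- Session 2, while session 1's transaction is still open
UPDATE t SET val = 20 WHERE id = 1;  -- not blocked: session 1 holds no row locks
COMMIT;

-- Session 1, same transaction
SELECT val FROM t WHERE id = 1;  -- still returns 10, read from the old row version
COMMIT;
SELECT val FROM t WHERE id = 1;  -- a new snapshot: now returns 20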

Basically, yes, this is the case, with one complication.
By default, in repeatable read a select ... from ... does not place any locks on the underlying data and establishes a snapshot.
If another transaction changes the underlying data, then these changes are not reflected if the same records are selected again in the scope of the first transaction. So far so good.
However, if your first transaction modifies records that were affected by other transactions committed after the snapshot was established, then the modifications made by those other transactions also become visible to the first transaction, so your snapshot may not be that consistent after all.
See the first Note in the Consistent Nonlocking Reads chapter of the MySQL manual for further details on this behavior.
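To make that concrete, here is a rough sketch (hypothetical table t with an integer column c, loosely following the example in that manual section) of how a locking UPDATE can pull newer committed rows into view:

-- Session 1
START TRANSACTION;
SELECT * FROM t;           -- suppose this returns no rows

-- Session 2
INSERT INTO t (c) VALUES (1);
COMMIT;

-- Session 1, same transaction
SELECT * FROM t;           -- still returns no rows (the snapshot)
UPDATE t SET c = c + 100;  -- locking statement: sees and modifies the new row
SELECT * FROM t;           -- now returns the row with c = 101
COMMIT;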

Related

Which version of record will be returned in read committed isolation of MYSQL

I have a scenario where my cluster is in read committed isolation mode and the use case is like below:
A SELECT statement takes around 1 minute to run and return its response.
Committed updates to the data can happen during this one-minute window.
So my question is: will I get the updated record in the response, or the old record?
I read the documentation, and it mentions that phantom reads are allowed.
I am confused here and just want some clarity, please help.
Using READ COMMITTED has additional effects (quoting the MySQL docs):
For UPDATE or DELETE statements, InnoDB holds locks only for rows that it updates or deletes. Record locks for nonmatching rows are released after MySQL has evaluated the WHERE condition. This greatly reduces the probability of deadlocks, but they can still happen.
For UPDATE statements, if a row is already locked, InnoDB performs a “semi-consistent” read, returning the latest committed version to MySQL so that MySQL can determine whether the row matches the WHERE condition of the UPDATE. If the row matches (must be updated), MySQL reads the row again and this time InnoDB either locks it or waits for a lock on it.
There is no way concurrent updates can change the result of a given query while it is executing. It's as if every query runs in its own REPEATABLE READ snapshot, even if your transaction is READ COMMITTED.
It will return rows that had been committed at the time the statement began executing. It will not include any rows committed after the statement began.
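A rough two-session sketch of READ COMMITTED's per-statement read view (table t and values are hypothetical):

-- Session 1
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT val FROM t WHERE id = 1;  -- say this returns 10

-- Session 2
UPDATE t SET val = 20 WHERE id = 1;
COMMIT;

-- Session 1, same transaction
SELECT val FROM t WHERE id = 1;  -- returns 20: each statement gets a fresh snapshot,
                                 -- but no single statement sees commits that land
                                 -- while it is already executing
COMMIT;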
Re your comment:
No, there is no transaction isolation level that can change this. Even if you use READ UNCOMMITTED, a given query reads only rows that were committed at the time the query began executing.
If you want to query recent updates, you can only do it by starting a new query.
If you're concerned that you aren't getting notified about recent updates, then you need to optimize your query so it doesn't take 60 seconds to execute.
This is starting to sound like you're polling the database. Running frequent expensive queries to poll a database is an indication that perhaps you need to use a message queue instead.
Re your second comment:
Locking SQL statements, including UPDATE, DELETE, and locking SELECTs, do function like READ COMMITTED even when your transaction is REPEATABLE READ: locking statements always read the most recent version of a row committed at the time the statement started.
But they still cannot read new rows committed after the statement started, if for no other reason than that they can't get locks on those rows.
Your original question was about SELECT statements, and I assumed you meant non-locking SELECT (that is, without the options of FOR UPDATE or LOCK IN SHARE MODE). Those SELECT statements also cannot view rows added after the SELECT started.
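A short sketch of the difference (hypothetical table t, REPEATABLE READ in effect):

-- Session 1
START TRANSACTION;
SELECT val FROM t WHERE id = 1;             -- snapshot value, say 10

-- Session 2
UPDATE t SET val = 20 WHERE id = 1;
COMMIT;

-- Session 1, same transaction
SELECT val FROM t WHERE id = 1;             -- still 10 (non-locking, snapshot read)
SELECT val FROM t WHERE id = 1 FOR UPDATE;  -- 20 (locking read: latest committed row)
COMMIT;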
P.S. I have never found a good use of READ UNCOMMITTED for any purpose.
By default, a plain InnoDB SELECT does not lock the table it reads; it runs as a consistent non-locking read against a versioned snapshot of the table, so any COMMIT during the processing won't alter the result.
For more information:
https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html
In all cases, the ACID properties of the database guarantee that a statement sees a stable, consistent view of the data: https://en.wikipedia.org/wiki/ACID

Are changes made to DB only through transactions?

I am not able to get a clear, complete understanding of the role of transactions in databases.
I know that operations clubbed together in a transaction will be executed together and then either committed or rolled back.
But then what about any other query that I write to the database without manually creating a transaction?
Is a transaction created internally for them?
Also what about select statements then? Are transactions created for them too?
I have been using databases and SQL for some time now, but alas I am not clear on these points.
Are changes to DBs made only through transactions? The short answer is yes.
There is always a transaction involved:
It might be automatically started before, and committed after, every single DML statement you issue, if you're relying on the AUTOCOMMIT behaviour of your database session
Or you may explicitly start one with BEGIN, execute your statements, and end it with COMMIT, as sketched below
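As a minimal sketch of both modes (the accounts table is hypothetical):

-- Relying on autocommit: each statement is its own transaction
SET autocommit = 1;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- commits immediately

-- Explicit transaction: both statements commit (or roll back) together
START TRANSACTION;  -- BEGIN is a synonym
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
UPDATE accounts SET balance = balance + 10 WHERE id = 2;
COMMIT;             -- or ROLLBACK to undo both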
I like to think of a transaction as a boundary that imposes clear semantics of ATOMICITY and ISOLATION on the statements contained within.
You describe atomicity (all-or-nothing behaviour), but that is not the only guarantee a transaction can give you: there's also isolation, and this has to do with the reads you perform within transactions (e.g. SELECTs).
In a concurrent application (many clients writing and reading to the same db/table at the same time), transaction ISOLATION is the property that defines "how much of the effects of other operations" can be observed in the current one. For example, assume you need to perform a transaction that involves doing the same SELECT multiple times: do you want this SELECT to return (possibly) different results each time (because some modification happened concurrently) or not?
For single statements:
A single DML statement (UPDATE, INSERT, ...) by itself effectively behaves as if it were in a transaction containing that one statement, committed immediately after execution (either because you're in AUTOCOMMIT mode, or because you wrapped the single statement in BEGIN ... COMMIT).
For a single SELECT it's the same. The transaction in this case (implicit or not) gives you the possibility of specifying different isolation levels. It might sound strange to consider transactions for SELECTs, but requiring a particular isolation level might mean that the db acquires some lock on the data under the hood; committing the transaction then releases such locks.
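For example, a SELECT run under a stricter isolation level might look like this (the orders table is hypothetical); under SERIALIZABLE, InnoDB implicitly converts plain SELECTs inside a transaction into shared-lock reads, and COMMIT is what releases those locks:

SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT * FROM orders WHERE status = 'open';  -- implicitly LOCK IN SHARE MODE
COMMIT;                                      -- shared locks released here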
Since you tagged mysql, here you can read on transaction isolations supported by mysql:
https://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html
A SQL transaction is any statement that contains Data Manipulation Language (DML). That is, any statement that changes values in a table, such as UPDATE, INSERT, MERGE, DELETE, etc.

MySQL InnoDB: Difference Between `FOR UPDATE` and `LOCK IN SHARE MODE`

What is the exact difference between the two locking read clauses:
SELECT ... FOR UPDATE
and
SELECT ... LOCK IN SHARE MODE
And why would you need to use one over the other?
I have been trying to understand the difference between the two. I'll document what I have found in hopes it'll be useful to the next person.
Both LOCK IN SHARE MODE and FOR UPDATE ensure no other transaction can update the rows that are selected. The difference between the two is in how they treat locks while reading data.
LOCK IN SHARE MODE does not prevent another transaction from reading the same row that was locked.
FOR UPDATE prevents other locking reads of the same row (non-locking reads can still read that row; LOCK IN SHARE MODE and FOR UPDATE are locking reads).
This matters in cases like updating counters, where you read the value in one statement and update it in another. Here, using LOCK IN SHARE MODE allows two transactions to read the same initial value. So if the counter is incremented by 1 by both transactions, the ending count might increase only by 1, since both transactions initially read the same value.
Using FOR UPDATE would block the second transaction from reading the value until the first one is done, ensuring the counter is incremented by 2, as the sketch below shows.
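A two-session sketch of the counter scenario (the counters table is hypothetical):

-- Session 1
START TRANSACTION;
SELECT value FROM counters WHERE id = 1 FOR UPDATE;  -- reads 5, takes an exclusive lock

-- Session 2
START TRANSACTION;
SELECT value FROM counters WHERE id = 1 FOR UPDATE;  -- blocks until session 1 commits

-- Session 1
UPDATE counters SET value = 6 WHERE id = 1;
COMMIT;                                              -- session 2 now unblocks and reads 6

-- Session 2
UPDATE counters SET value = 7 WHERE id = 1;
COMMIT;                                              -- the counter correctly ends at 7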
FOR UPDATE: you're informing MySQL that the selected rows may be updated in the next steps (before the end of this transaction), so MySQL doesn't grant any locks, shared or exclusive, on the same set of rows to any other transaction at that moment. Other transactions issuing locking reads or writes must wait until the first transaction is finished.
FOR SHARE: indicates to MySQL that you're selecting the rows from the table only for reading, and will not modify them before the end of the transaction. Any number of transactions can hold a read lock on the rows.
Note: there is a chance of a deadlock if these clauses (FOR UPDATE, FOR SHARE) are not used properly.
Either way the integrity of your data is guaranteed; it's just a question of how the database guarantees it. Does it do so by raising runtime errors when transactions conflict with each other (i.e. FOR SHARE), or does it do so by serializing any transactions that would conflict with each other (i.e. FOR UPDATE)?
FOR SHARE (a.k.a. LOCK IN SHARE MODE): transactions face a higher probability of failure due to deadlock, because they delay blocking until the moment an update statement is issued, at which point they either block until all read locks are released or fail due to deadlock if another write is in progress. Only one client blocks and eventually succeeds; the other clients fail with a deadlock if they try to update, and have to retry their transactions.
FOR UPDATE: transactions won't fail due to deadlock, because they aren't allowed to run concurrently. This may be desirable, for example because it makes multi-threading easier to reason about when all updates are serialized across all clients. However, it limits the concurrency you can achieve, because all other transactions block until the first transaction is finished.
Pro tip: as an exercise, I recommend taking some time to play with a local test database and a couple of mysql clients on the command line to prove this behavior for yourself. That is how I eventually understood the difference, because it can be very abstract until you see it in action.
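For instance, this hypothetical two-client sequence reproduces the shared-lock deadlock described above:

-- Client A
START TRANSACTION;
SELECT * FROM t WHERE id = 1 LOCK IN SHARE MODE;  -- shared lock

-- Client B
START TRANSACTION;
SELECT * FROM t WHERE id = 1 LOCK IN SHARE MODE;  -- shared locks coexist

-- Client A
UPDATE t SET val = 1 WHERE id = 1;  -- blocks, waiting on B's shared lock

-- Client B
UPDATE t SET val = 2 WHERE id = 1;  -- deadlock: InnoDB rolls one client back
                                    -- with ER_LOCK_DEADLOCK, and it must retry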

Do database transactions prevent other users from interfering with them

Suppose I do the following (note: the syntax below is probably not correct, but don't worry about it; it's just there to make a point):
START TRANSACTION;
INSERT INTO example (id, data) VALUES (100, 20), (100, 30);
SELECT * FROM example WHERE id = 100;
COMMIT;
Hence the goal of the SELECT is to get ALL rows from the table that were just inserted by the preceding INSERT, and ONLY the rows from that INSERT...
Now suppose that during execution, after the INSERT has run, some other user also performs an INSERT with id = 100...
Will the SELECT statement in the next step of the transaction also return the row inserted by the other user, or will it return just the two rows inserted by the preceding INSERT within the transaction?
Btw, I'm using MySQL, so please tailor your answer to MySQL.
This depends entirely on the Transaction Isolation that is used by the DB Connection.
According to MySQL 5.0 Certification Study Guide
Page 420 describes three transactional conditions handled by Isolation Levels
A dirty read is a read by one transaction of uncommitted changes made by another. Suppose transaction T1 modifies a row. If transaction T2 reads the row and sees the modification even though T1 has not committed it, that is a dirty read. One reason this is a problem is that if T1 rolls back, the change is undone but T2 does not know that.
A non-repeatable read occurs when a transaction performs the same retrieval twice but gets a different result each time. Suppose that T1 reads some rows, and that T2 then changes some of those rows and commits the changes. If T1 sees the changes when it reads the rows again, it gets a different result; the initial read is non-repeatable. This is a problem because T1 does not get a consistent result from the same query.
A phantom is a row that appears where it was not visible before. Suppose that T1 and T2 begin, and T1 reads some rows. If T2 inserts a new row and T1 sees that row when it reads again, the row is a phantom.
Page 421 describes the four Transaction Isolation Levels:
READ-UNCOMMITTED : allows a transaction to see uncommitted changes made by other transactions. This isolation level allows dirty reads, non-repeatable reads, and phantoms to occur.
READ-COMMITTED : allows a transaction to see changes made by other transactions only if they've been committed. Uncommitted changes remain invisible. This isolation level allows non-repeatable reads and phantoms to occur.
REPEATABLE READ (default) : ensures that if a transaction issues the same SELECT twice, it gets the same result both times, regardless of committed or uncommitted changes made by other transactions. In other words, it gets a consistent result from different executions of the same query. In some database systems, the REPEATABLE READ isolation level allows phantoms, such that if another transaction inserts new rows in the interval between the SELECT statements, the second SELECT will see them. This is not true for InnoDB; phantoms do not occur at the REPEATABLE READ level.
SERIALIZABLE : completely isolates the effects of one transaction from others. It is similar to REPEATABLE READ with the additional restriction that rows selected by one transaction cannot be changed by another until the first transaction finishes.
The isolation level can be set globally, for your session, or for a specific transaction:
SET GLOBAL TRANSACTION ISOLATION LEVEL isolation_level;
SET SESSION TRANSACTION ISOLATION LEVEL isolation_level;
SET TRANSACTION ISOLATION LEVEL isolation_level;
where isolation_level is one of the following values:
READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SERIALIZABLE
In my.cnf you can set the default as well:
[mysqld]
transaction-isolation = READ-COMMITTED
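You can also verify which level is in effect; note the system variable is named transaction_isolation in MySQL 5.7.20+ and 8.0, while older servers call it tx_isolation:

SELECT @@global.transaction_isolation, @@session.transaction_isolation;
-- On older servers:
SELECT @@global.tx_isolation, @@session.tx_isolation;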
If another user is updating the same row, a row-level lock is applied, so they can make their change only after your transaction ends. You will therefore see the result set that you inserted. Hope this helps.
Interfere is a fuzzy word when it comes to SQL database transactions. What rows a transaction can see is determined in part by its isolation level.
Hence the goal of the select is to get ALL info from the table that just got inserted by the preceding insert and ONLY by the preceding INSERT...
Preceding insert is a little fuzzy, too.
You probably ought to COMMIT the insert in question before you try to read it. Otherwise, under certain conditions not under your control, that transaction could be rolled back, and the row with id=100 might not actually exist.
Of course, after it's committed, other transactions are free to change the value of "id", of "value", or both. (If they have sufficient permissions, that is.)
The transaction will make it seem like the statements in the transaction run without any interference from other transactions. Most DBMSs (including MySQL) maintain ACID properties for transactions. In your case, you are interested in the I for Isolation, which means that the DBMS will make it seem like the statements in your transaction run without interruption by other transactions.
The only users that are affected are those that require access to the same rows in a table. Otherwise a user is not affected.
However, it is slightly more complicated, as a row lock can be a read lock or a write lock.
Here is an explanation for the InnoDB storage engine.
For efficiency reasons, developers do not set transactions to be totally isolated from each other.
Databases support multiple isolation levels, namely SERIALIZABLE, REPEATABLE READ, READ COMMITTED, and READ UNCOMMITTED. They are listed from the most strict to the least strict.

Locking mySQL tables/rows

Can someone explain the need to lock tables and/or rows in MySQL?
I am assuming that it is to prevent multiple writes to the same field; is this the best practice?
First, let's look at a good document. This is not MySQL documentation; it's about PostgreSQL, but it's one of the simplest and clearest docs I've read on transactions, and you'll understand MySQL transactions better after reading it: http://www.postgresql.org/docs/8.4/static/mvcc.html
When you're running a transaction, 4 rules are applied (ACID):
Atomicity: all or nothing (rollback)
Consistency: consistent before, consistent after
Isolation: not impacted by other concurrent transactions
Durability: once a commit is done, it's really done
Of these rules, only one is problematic: Isolation. Using a transaction does not ensure a perfect isolation level. The link above will explain better what phantom reads and similar isolation problems between concurrent transactions are. But to make it simple: you should really use row-level locks to prevent other transactions, running at the same time as yours (and maybe committing before yours), from altering the same records. But with locks come deadlocks...
So when you try using transactions with locks, you'll need to handle deadlocks, and you'll need to handle the fact that a transaction can fail and should be re-launched (with simple for or while loops).
Edit:
Recent versions of InnoDB provide greater levels of isolation than previous ones. I've done some tests, and I must admit that even the phantom reads that should happen are now difficult to reproduce.
MySQL is by default on level 3 of the 4 isolation levels explained in the PostgreSQL document (where PostgreSQL is on level 2 by default). This is REPEATABLE READ. That means you won't have dirty reads and you won't have non-repeatable reads: within your transaction, re-selecting the same rows keeps returning the same snapshot, even if someone else modifies them in the meantime.
Warning: if you work with an older version of MySQL, like 5.0, you may be on level 2, and you'll need to take row locks explicitly with FOR UPDATE!
We can always find some nice race conditions. When working with aggregate queries, it can be safer to be on the 4th isolation level (by adding LOCK IN SHARE MODE at the end of your query) if you do not want people adding rows while you're performing some task. I've been able to reproduce a serializable-level problem, but I won't explain the complex example here; the race conditions are really tricky.
There is a very nice example of race conditions that even the serializable level cannot fix here: http://www.postgresql.org/docs/8.4/static/transaction-iso.html#MVCC-SERIALIZABILITY
When working with transactions, the most important things are:
data used in your transaction must always be read INSIDE the transaction (re-read it if you had fetched it before the BEGIN)
understand why a high isolation level sets implicit locks and may block some other queries (and make them time out)
try to avoid deadlocks (lock tables in the same order), but handle them (retry a transaction aborted by MySQL)
try to freeze important source tables with the serializable isolation level (LOCK IN SHARE MODE) when your application code assumes that no insert or update should modify the dataset it's using (otherwise you will not see errors, but your result will have ignored the concurrent changes)
It is not a best practice. Modern versions of MySQL support transactions with well-defined semantics. Use transactions, and forget about locking stuff by hand.
The only new thing you'll have to deal with is that transaction commits may fail because of race conditions, but you'd be doing error checking with locks anyway, and it is easier to retry the logic that led to a transaction failure than to recover from errors in a non-transactional setup.
If you do get race conditions and failed commits, then you may want to fine-tune the isolation configuration for your transactions.
For example, suppose you need to generate invoice numbers that are sequential with no numbers missing; this is a requirement, at least in the country I live in.
If you have a few web servers, then a few users might be buying stuff literally at the same time.
If you do SELECT MAX(invoice_id) + 1 FROM invoice to get the new invoice number, two web servers might do that at the same time (before the new invoice has been added) and get the same invoice number for the invoices they're trying to create.
If you use a mechanism such as AUTO_INCREMENT, it is only meant to generate unique values and makes no guarantees about not missing numbers (if one transaction tries to insert a row and then does a rollback, that number is "lost").
So the solution is to (a) lock the table, (b) SELECT MAX(invoice_id) + 1 FROM invoice, (c) do the insert, and (d) commit + unlock the table.
On another note, in MySQL you're best off using InnoDB with row-level locking, since a LOCK TABLE command can implicitly commit the transaction you're working on; see the sketch below.
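A hedged sketch of the same idea with InnoDB row-level locking instead of LOCK TABLES (table and column names are hypothetical; this relies on InnoDB's next-key locking under REPEATABLE READ, so the FOR UPDATE also blocks concurrent inserts until COMMIT):

START TRANSACTION;
-- concurrent sessions queue here instead of reading the same MAX(invoice_id)
SELECT COALESCE(MAX(invoice_id), 0) + 1 INTO @next FROM invoice FOR UPDATE;
INSERT INTO invoice (invoice_id, amount) VALUES (@next, 100.00);
COMMIT;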
Take a look here for general introduction to what transactions are and how to use them.
Databases are designed to work in concurrent environments, so locking the tables and/or records helps to keep the transactions consistent.
So a record affected by one transaction should not be altered until this transaction commits or rolls back.