MySQL InnoDB - Locking scenario

I am a developer and have only fair knowledge about databases. I need to understand the transaction level locking mechanism in InnoDB.
I read that InnoDB uses row-level locking. As far as I understand, it locks a particular row within a transaction. What will happen to a SELECT statement while a table update is going on?
For example, assume there is a transaction and a SELECT statement, each triggered from a different process, and assume Transaction1 starts before the SELECT statement is issued.
Transaction1 : Start
Update table_x set x = y where 1=1
Transaction1 : End
Select Query
Select x from table_x
What will happen to the select statement? Will it return values "during" Transaction1 or "after" it completes? And if it can begin only after Transaction1 ends, where is row-level locking in this picture?
Am I making sense, or is my fundamental understanding itself wrong? Please advise.

It depends on the Isolation level.
SERIALIZABLE
REPEATABLE READS
READ COMMITTED
READ UNCOMMITTED
These are explained well on Wikipedia
and in the MySQL documentation.

It does not depend only on the locking involved, but on the isolation level, which uses locking to provide the transaction isolation as defined by ACID standards. InnoDB uses not only locking, but also multiversioning of the rows to speed up transactions.
At the SERIALIZABLE isolation level, InnoDB implicitly turns a plain select into a locking read (SELECT ... LOCK IN SHARE MODE, when autocommit is disabled), so the select requests shared locks on the rows and has to wait for the first transaction to complete. At the lower isolation levels only the update takes locks (exclusive row locks), and plain selects are not blocked. In REPEATABLE READ and READ COMMITTED the select uses the undo (rollback) log to reconstruct the previous committed version of a row that has been updated, and in READ UNCOMMITTED it returns the current, possibly uncommitted, value.
The difference between table-level locking and row-level locking shows when two transactions run update queries. With table-level locking, the second has to wait for the first, as the whole table is locked. With row-level locking, only the rows that match the where clause* (as well as some gaps between them, but this is another topic) are locked, which means that different transactions can update different parts of the table without waiting for each other.
*assuming that there is an index covering the where clause; without a usable index, the update has to scan the table and locks every row it examines.
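As a rough sketch of the behaviour described above, the following two-session script reuses the table_x name from the question. The column types, the sample rows, the `x + 10` update, and the session labels are assumptions made for illustration. Each session is a separate client connection, and the comments describe what InnoDB is documented to do rather than captured output.

```sql
-- Assumed schema for illustration (the question does not give one).
CREATE TABLE table_x (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  x  INT NOT NULL,
  PRIMARY KEY (id)
) ENGINE = InnoDB;
INSERT INTO table_x (x) VALUES (1), (2), (3);

-- Session A: open a transaction and update every row without committing.
START TRANSACTION;
UPDATE table_x SET x = x + 10;

-- Session B, REPEATABLE READ (the InnoDB default) or READ COMMITTED:
-- a plain SELECT is a consistent, non-locking read. It does not wait and
-- returns the last committed values, rebuilt from the undo log.
SELECT x FROM table_x;

-- Session B, READ UNCOMMITTED: the same SELECT does not wait either,
-- but it may return Session A's uncommitted values (a dirty read).
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT x FROM table_x;

-- Session B, SERIALIZABLE with autocommit disabled: the SELECT is treated
-- as a locking read, so it blocks until Session A commits or rolls back.
SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET autocommit = 0;
SELECT x FROM table_x;

-- Session A: finish the transaction, releasing its row locks.
COMMIT;
```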

The select will not wait for the transaction to complete; instead it will return the last committed value of the rows (that is, their state before the transaction's changes).
If you want the select to wait for the transaction to finish you can use "LOCK IN SHARE MODE":
Select x from table_x LOCK IN SHARE MODE;
This will cause the select to wait for any row(s) that are currently locked by a transaction holding an exclusive (update/delete) lock on them.
A read performed with LOCK IN SHARE MODE reads the latest available
data and sets a shared mode lock on the rows read. A shared mode lock
prevents others from updating or deleting the row read. Also, if the
latest data belongs to a yet uncommitted transaction of another
session, we wait until that transaction ends.
http://dev.mysql.com/doc/refman/5.0/en/innodb-lock-modes.html
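The difference between a plain read and a locking read can be sketched with two client sessions. This assumes the table_x from the question already exists with some rows; the value written by the update is arbitrary, and the comments describe the documented behaviour, not recorded output.

```sql
-- Session A: hold exclusive row locks by leaving an UPDATE uncommitted.
START TRANSACTION;
UPDATE table_x SET x = 0;   -- the value 0 is only an example

-- Session B: a plain SELECT returns immediately with committed data.
SELECT x FROM table_x;

-- Session B: a locking read requests shared locks on the rows it reads,
-- so it waits for Session A (or fails once innodb_lock_wait_timeout expires).
SELECT x FROM table_x LOCK IN SHARE MODE;

-- Session A: committing releases the exclusive locks, and Session B's
-- locking read then returns the newly committed values.
COMMIT;
```

In MySQL 8.0 the same locking read can also be written as SELECT ... FOR SHARE; LOCK IN SHARE MODE remains available for backward compatibility.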

A SELECT started from outside of a transaction will see the table as it was before the transaction started. It will see the updated values only after the transaction is committed.

Related

How does pessimistic locking work in a database, and does the isolation level have anything to do with it?

I was reading about database locking mechanisms (pessimistic, optimistic) and have the following scenario:
session 1:
t1: open transaction:
t2: sleep(3 sec)
t5: update user set name='x' where id=1
session 2:
t2: update user set name='y' where id=1
my doubts are:
1. What will happen at t5?
2. Does it have anything to do with the isolation level? If yes, what will the behavior be at different isolation levels?
3. Do databases (MySQL, Oracle) only do pessimistic locking?
Let me answer your questions in reverse order, because this way I do not have to repeat certain parts.
Optimistic locking means that the records read in a transaction are not locked, so there is no actual "optimistic lock" for a database to implement. You should not really use the term optimistic lock; use optimistic concurrency control (OCC) instead. The pessimistic locking strategy is the one that involves database-level locks, which are implemented by all RDBMSs that use transactions, including MySQL with InnoDB.
MySQL does not have any database-level support for optimistic concurrency control. This does not mean that other RDBMSs lack OCC support; you need to check their manuals.
Isolation levels do not affect the outcome of the scenario described in the question, since there is no select there, only 2 atomic updates and the field referenced in the where clause is not updated.
Isolation levels mainly influence how data is read by transactions, not how they can update it.
The outcome of the scenario described in the question depends on which session issues the update first and how long that transaction stays open. Whichever session executes the update first will make the change and set an exclusive lock on the index record. The other transaction will not be able to execute its update until the first transaction completes. If the first transaction runs for a long time, the other one may time out while waiting for the lock to be released.
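A minimal two-session sketch of that outcome, assuming the user table from the question exists as an InnoDB table with a row where id = 1. The shortened timeout of 5 seconds is an arbitrary example value, and the comments describe documented behaviour rather than captured output.

```sql
-- Session 1: take an exclusive lock on the row with id = 1 and keep it.
START TRANSACTION;
UPDATE user SET name = 'y' WHERE id = 1;

-- Session 2: shorten the lock wait (in seconds) for this session only.
SET SESSION innodb_lock_wait_timeout = 5;
START TRANSACTION;
UPDATE user SET name = 'x' WHERE id = 1;
-- Blocks until Session 1 ends its transaction. If the wait exceeds the
-- timeout, MySQL reports error 1205 (lock wait timeout exceeded).

-- Session 1: releasing the lock lets a still-waiting Session 2 proceed.
COMMIT;
```

The default value of innodb_lock_wait_timeout is 50 seconds, which is why a long-running first transaction can make the second one fail rather than wait indefinitely.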

Locking selects

I use the InnoDB engine for all my tables. I know that by default INSERT locks the rows that will be inserted, and UPDATE locks the rows that it uses (no matter whether in the SET or WHERE clauses). SELECT doesn't lock anything. And nothing locks whole tables.
But what if I did something like that:
SELECT * FROM table INTO OUTFILE '/tmp/file.txt'
If it would last 5min, anything could happen in some other thread. I've read I could use:
SELECT * FROM table INTO OUTFILE '/tmp/file.txt' LOCK IN SHARE MODE;
But then again I couldn't do any SELECT operations on this table, and it sucks.
What's the best approach to do this? Also, I've read that the last query should be used inside a transaction with a rollback instead of a commit - why is that so?
If you want a consistent view of an InnoDB table for a long running SELECT, the best approach is to just ensure that the transaction isolation level for the session is set to REPEATABLE READ when the SELECT is run.
That won't block other threads that attempt to read the same rows. But it might block some threads from obtaining exclusive locks or write intent locks.
https://dev.mysql.com/doc/refman/5.6/en/set-transaction.html
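One way to sketch this for the long-running export, assuming a hypothetical table name my_table (the question's literal name, table, is a reserved word) and an account that is allowed to write files on the server:

```sql
-- REPEATABLE READ is the InnoDB default; set explicitly here for clarity.
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Take the snapshot at transaction start instead of at the first read.
START TRANSACTION WITH CONSISTENT SNAPSHOT;

-- Non-locking consistent read: other sessions can keep reading and
-- writing the table while the export runs.
SELECT * FROM my_table INTO OUTFILE '/tmp/file.txt';

-- No changes were made, so this only ends the read view.
COMMIT;
```

INTO OUTFILE requires the FILE privilege and is restricted by the server's secure_file_priv setting, so the output path may need adjusting.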
As an addendum, to clarify some of the points OP raises.
"SELECT doesn't lock anything."
It's true that a non-locking SELECT won't obtain row locks. But some special SELECT statements (as pointed out later) can obtain row locks:
SELECT ... FOR UPDATE
SELECT ... LOCK IN SHARE MODE
And there are meta-data locks, which will block DDL operations on the table (e.g. ALTER TABLE) while the SELECT statement is executing.
"And nothing locks whole tables."
That's not strictly true. A LOCK TABLES statement can obtain a lock on the entire table. And a SELECT ... FOR UPDATE (with no predicates) could (potentially) obtain locks on every row in the table.
"SELECT ... LOCK IN SHARE MODE will block other SELECT statements"
This isn't true. Shared locks will block exclusive locks from other threads. But they won't block other threads from obtaining share locks, and won't block non-locking SELECT statements.
What's the best approach to do this?
To reiterate the first part of my answer: just run a non-locking SELECT statement. As long as the transaction isolation level isn't set to READ UNCOMMITTED, the SELECT statement will get a consistent view of the rows in the table, as of the point in time at which the SELECT begins executing.
Also, I've read that the last query should be used inside a transaction with a rollback instead of a commit - why is that so?
This is a curious notion. It has me puzzled. Why would a ROLLBACK be preferred over a COMMIT?
As long as no DML changes have been applied, I think the COMMIT and the ROLLBACK would be equivalent. In both cases, all of the locks obtained by the transaction would be released. In terms of the database, I don't think it makes a difference.
This has me thinking this recommendation comes from a preferred pattern on the client side. Maybe there's a notion of following a rule such as "don't commit unless you've applied DML changes". But that's just a guess.
My personal recommendation would be to follow the normative pattern of using a COMMIT to end the transaction. I don't favor using an implicit ROLLBACK. In my personal opinion, a ROLLBACK should be issued when we want to explicitly discard DML changes that have been applied in a transaction. And that's typically due to an exception or error condition.

Do "SELECT ... LOCK IN SHARE MODE" and "SELECT ... FOR UPDATE" have to be inside of a transaction?

I'm reading the documentation for these commands and am confused. The descriptions for the commands mention transactions:
SELECT ... LOCK IN SHARE MODE sets a shared mode lock on any rows that
are read. Other sessions can read the rows, but cannot modify them
until your transaction commits. If any of these rows were changed by
another transaction that has not yet committed, your query waits until
that transaction ends and then uses the latest values.
For index records the search encounters, SELECT ... FOR UPDATE blocks
other sessions from doing SELECT ... LOCK IN SHARE MODE or from
reading in certain transaction isolation levels. Consistent reads will
ignore any locks set on the records that exist in the read view. (Old
versions of a record cannot be locked; they will be reconstructed by
applying undo logs on an in-memory copy of the record.)
But then the examples don't show transactions being used. Running a test command such as select * from users for update; without a transaction doesn't result in any errors (it works). Does this mean transactions don't have to be used with these commands? If so, is there any advantage to putting these commands inside of a transaction?
In InnoDB each query effectively runs in a transaction. If you don't start a transaction explicitly (with START TRANSACTION or by setting autocommit to off), each statement is committed as soon as it has run. This means that if you are not in an explicit transaction, the lock acquired with SELECT ... LOCK IN SHARE MODE is released as soon as the query completes. Nothing prevents you from doing this; it just doesn't make much sense to use these locks outside of a transaction, because their purpose is to guarantee that the value you select won't change before a later query you are going to execute (for example, when you want to insert or update data in one table based on the values in another).
A transaction ensures that all the commands it contains will either run successfully or rollback.
These types of select statements affect other transactions in other sessions. So basically wrapping these in transactions is only a matter of whether you are selecting the data as part of a larger set of commands.
If you only want to select the data you should either use the shared lock or no lock at all and no need to begin a transaction.
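The contrast between a locking read in autocommit mode and one inside an explicit transaction can be sketched as follows. The accounts table, its columns, and the amounts are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table used only for this sketch.
CREATE TABLE accounts (
  id      INT UNSIGNED NOT NULL,
  balance DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (id)
) ENGINE = InnoDB;
INSERT INTO accounts VALUES (1, 100.00);

-- Autocommit mode: the lock is taken and released within this statement,
-- so it protects nothing that follows.
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;

-- Explicit transaction: the row stays locked until COMMIT, so the value
-- read here cannot be changed by another session before the UPDATE.
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 10.00 WHERE id = 1;
COMMIT;
```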

do database transactions prevent other users from interfering with it

Suppose I do (note: the syntax below is probably not correct, but don't worry about it...it's just there to make a point)
Start Transaction
INSERT INTO table (id, data) VALUES (100,20), (100,30);
SELECT * FROM table WHERE id = 100;
End Transaction
Hence the goal of the select is to get ALL info from the table that just got inserted by the preceding insert and ONLY by the preceding INSERT...
Now suppose that during the execution, after the INSERT got executed, some other user also performs an INSERT with id = 100...
Will the SELECT statement in the next step of the transaction also get the row inserted by the executed INSERT by the other user or will it just get the two rows inserted by the preceding INSERT within the transaction?
Btw, I'm using MySQL so please tailor your answer to MySQL
This depends entirely on the Transaction Isolation that is used by the DB Connection.
According to MySQL 5.0 Certification Study Guide
Page 420 describes three transactional conditions handled by Isolation Levels
A dirty read is a read by one transaction of uncommitted changes made by another. Suppose the transaction T1 modifies a row. If transaction T2 reads the row and sees the modification even though T1 has not committed it, that is a dirty read. One reason this is a problem is that if T1 rolls back, the change is undone but T2 does not know that.
A non-repeatable read occurs when a transaction performs the same retrieval twice but gets a different result each time. Suppose that T1 reads some rows, and that T2 then changes some of those rows and commits the changes. If T1 sees the changes when it reads the rows again, it gets a different result; the initial read is non-repeatable. This is a problem because T1 does not get a consistent result from the same query.
A phantom is a row that appears where it was not visible before. Suppose that T1 and T2 begin, and T1 reads some rows. If T2 inserts a new row and T1 sees that row when it reads again, the row is a phantom.
Page 421 describes the four(4) Transaction Isolation Levels:
READ-UNCOMMITTED : allows a transaction to see uncommitted changes made by other transactions. This isolation level allows dirty reads, non-repeatable reads, and phantoms to occur.
READ-COMMITTED : allows a transaction to see changes made by other transactions only if they've been committed. Uncommitted changes remain invisible. This isolation level allows non-repeatable reads and phantoms to occur.
REPEATABLE READ (default) : ensures that if a transaction issues the same SELECT twice, it gets the same result both times, regardless of committed or uncommitted changes made by other transactions. In other words, it gets a consistent result from different executions of the same query. In some database systems, the REPEATABLE READ isolation level allows phantoms, such that if another transaction inserts new rows in the interval between the SELECT statements, the second SELECT will see them. This is not true for InnoDB; phantoms do not occur at the REPEATABLE READ level.
SERIALIZABLE : completely isolates the effects of one transaction from others. It is similar to REPEATABLE READ with the additional restriction that rows selected by one transaction cannot be changed by another until the first transaction finishes.
Isolation level can be set for your DB Session globally, within your session, or for a specific transaction:
SET GLOBAL TRANSACTION ISOLATION LEVEL isolation_level;
SET SESSION TRANSACTION ISOLATION LEVEL isolation_level;
SET TRANSACTION ISOLATION LEVEL isolation_level;
where isolation_level is one of the following values:
READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SERIALIZABLE
In my.cnf you can set the default as well:
[mysqld]
transaction-isolation = READ-COMMITTED
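To confirm which level is actually in effect after changing it, the server's system variables can be queried. A short sketch; the variable name depends on the server version, as noted in the comments.

```sql
-- Change the level for transactions started later in this session.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Inspect the global and session values.
-- MySQL 5.7.20 and later, including 8.0:
SELECT @@GLOBAL.transaction_isolation, @@SESSION.transaction_isolation;
-- Older releases expose the same setting as tx_isolation:
-- SELECT @@GLOBAL.tx_isolation, @@SESSION.tx_isolation;
```

The level name is written as unquoted keywords in the SET statement, and with hyphens (for example READ-COMMITTED) in my.cnf and in the variable's value.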
If the other user is updating the same row, a row-level lock will be applied, so they can make their change only after your transaction ends. You will therefore see the result set that you inserted. Hope this helps.
Interfere is a fuzzy word when it comes to SQL database transactions. What rows a transaction can see is determined in part by its isolation level.
Hence the goal of the select is to get ALL info from the table that
just got inserted by the preceding insert and ONLY by the preceding
INSERT...
Preceding insert is a little fuzzy, too.
You probably ought to COMMIT the insert in question before you try to read it. Otherwise, under certain conditions not under your control, that transaction could be rolled back, and the row with id=100 might not actually exist.
Of course, after it's committed, other transactions are free to change the value of "id", of "value", or both. (If they have sufficient permissions, that is.)
The transaction will make it seem as though its statements run without any interference from other transactions. Most DBMSs (including MySQL) maintain the ACID properties for transactions. In your case, you are interested in the I for Isolation, which means that the DBMS will make it seem as though the statements in your transaction run without interference from concurrent transactions.
The only users affected are those that require access to the same rows in a table. Other users will not be affected.
However, it is slightly more complicated, as the row lock can be a read lock or a write lock.
Here is an explanation for the InnoDB storage engine.
For efficiency reasons, developers do not make transactions totally isolated from each other.
Databases support multiple isolation levels, namely Serializable, Repeatable read, Read committed and Read uncommitted. They are listed from most strict to least strict.
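For the INSERT-then-SELECT question above, a hedged two-session sketch under InnoDB's default REPEATABLE READ level might look like this. The items table and its columns are hypothetical stand-ins for the question's table, and the comments describe documented behaviour rather than recorded output.

```sql
-- Hypothetical table; id is deliberately not unique, as in the question.
CREATE TABLE items (
  id   INT NOT NULL,
  data INT NOT NULL,
  KEY (id)
) ENGINE = InnoDB;

-- Session A (REPEATABLE READ): fix the snapshot at transaction start.
START TRANSACTION WITH CONSISTENT SNAPSHOT;
INSERT INTO items (id, data) VALUES (100, 20), (100, 30);

-- Session B (autocommit): insert another row with the same id.
INSERT INTO items (id, data) VALUES (100, 40);

-- Session A: sees its own two rows. Session B's row was committed after
-- the snapshot was taken, so this consistent read does not return it.
SELECT * FROM items WHERE id = 100;
COMMIT;

-- Session A, new transaction: now all three rows are visible.
SELECT * FROM items WHERE id = 100;
```

With a plain START TRANSACTION, InnoDB establishes the snapshot at the first consistent read rather than at the start of the transaction, so a row that Session B commits before Session A's first SELECT would be visible to it. Under READ COMMITTED each SELECT sees everything committed so far, so Session B's row would appear as well.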

MySQL table locking: holder reads and writes, other sessions only read?

Is it possible to lock a table such that the holder can read and write, and other sessions can only read?
The documentation seems to suggest that a read lock allows everyone to only read, and a write lock allows only the holder to read and write while other sessions have no access. Having the holder able to read and write and other sessions only able to read seems like a pretty frequently needed behavior, perhaps the most frequently needed one.
Maybe the performance hit in implementing this scenario would be too high?
Take a look at LOCK IN SHARE MODE.
This will let you set non blocking read locks.
But remember, this can lead to deadlocks! Make sure you are okay with processes having out of date information.
There are many correct words in existing answers, but no one seems to have given a clear answer. I will try.
As you have already seen in documentation on LOCK TABLES, it can not be used for the purpose, since for the READ lock:
The session that holds the lock can read the table (but not write it).
and for the WRITE lock:
Only the session that holds the lock can access the table. No other session can access it until the lock is released.
That is, the effect can hardly be achieved with a table of an arbitrary engine, but it can be achieved with a transactional engine, namely InnoDB.
Let's think about what it means, in terms of transactions, for a single session to keep a constant write lock on a table while other sessions can read data from it. It means that we have an open, long-lived transaction (call it the W transaction) which locks the table for modifications, while other transactions (in other sessions) can read data that is already modified but not yet committed. In terms of isolation levels, that means we should set the default isolation level to READ-UNCOMMITTED, so that we do not have to change the isolation level for each new session:
SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
But our transaction W should use a stronger isolation level, otherwise we cannot apply any locking to our table. READ-COMMITTED is not strong enough, but REPEATABLE-READ is exactly what we want. That is, before starting the W transaction we should set the isolation level for the current session:
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
Now, how to lock the whole table. Let's create a table:
CREATE TABLE t (
id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
val VARCHAR(45) NOT NULL,
PRIMARY KEY (id)
) ENGINE = InnoDB;
LOCK IN SHARE MODE is not what we want:
If any of these rows [that are read] were changed by another transaction that has not yet committed, your query waits until that transaction ends and then uses the latest values.
SELECT ... FOR UPDATE seems to do what we need:
SELECT ... FOR UPDATE locks the rows and any associated index entries.
Now all we need is to lock the rows. The simplest thing we can do is to lock the primary key. COUNT(*) does a full index scan for InnoDB (since InnoDB does not keep an exact row count).
SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT COUNT(*) FROM t FOR UPDATE;
INSERT INTO t VALUES (NULL, '');
Now you can open other sessions and try to read the data from the table and try to add or modify the existing data from those sessions.
The problem is though, that you should commit the modifications in W, and as soon as you commit the transaction, the lock is released and all waiting inserts or updates are applied as well, even if you commit it with:
COMMIT AND CHAIN; SELECT COUNT(*) FROM t FOR UPDATE;
The moral of the story is that it is much easier to have two MySQL accounts: a) a writing account which has the INSERT, UPDATE and DELETE privileges, and b) a reading account which does not.
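The two-account approach could be set up roughly as follows. The account names, host, schema name mydb, and passwords are all hypothetical placeholders; only the table name t comes from the example above.

```sql
-- Hypothetical account names, host, schema, and passwords.
CREATE USER 'app_writer'@'localhost' IDENTIFIED BY 'change-me-writer';
CREATE USER 'app_reader'@'localhost' IDENTIFIED BY 'change-me-reader';

-- The writing account may read and modify table t.
GRANT SELECT, INSERT, UPDATE, DELETE ON mydb.t TO 'app_writer'@'localhost';

-- The reading account may only read it.
GRANT SELECT ON mydb.t TO 'app_reader'@'localhost';
```

Sessions connected as the reading account then get an error on any write to t, regardless of locks or isolation levels.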
There is SELECT ... FOR UPDATE, which will lock the rows for other callers that do SELECT ... FOR UPDATE, but will not lock it for anyone doing just SELECT. UPDATEs will wait for the lock, as well.
This is useful when you want to fetch a value and then push an update back without anyone changing the value in between without your noticing. Be careful: adding too many of those can get you into a deadlock.
You may find that the InnoDB engine does what you need by default: writes do not block reads. You need to be careful with the transaction isolation level so that writes are available when you want them.