Building on the query in this answer (please note/assume that the GROUP_CONCATs are now also held in user defined variables), when and what will InnoDB lock on table1?
I'd prefer that it only lock the table1 row that it's currently updating and release it upon starting on the next row.
I'd also prefer that when it locks table2 (or its rows), SELECTs will at least be able to read it.
The column being updated is not PK or even indexed.
How can this be achieved, or is it already doing that?
This is in a TRIGGER.
Many thanks in advance!
The lock is held for the entire transaction (the operation is atomic, which means that either all of the rows are updated or none are), and you can't change that without changing the storage engine. However, it does not block reads (unless you are in the SERIALIZABLE isolation level), so SELECT queries will be executed, but they will read the old values. Only SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE will be blocked by an update.
Related
I am facing a problem, and I am trying to wrap my head around the isolation levels. To understand these isolation levels, I've read the documentation found on the MariaDB website.
The base isolation level used by InnoDB tables is stated to be REPEATABLE_READ.
Consider the following problem. I have the following two tables structure:
/** tableA **/
id INT AUTO_INCREMENT PRIMARY_KEY
/** tableB **/
id INT
claimedBy INT NULLABLE
I also have a function whose pseudocode looks like this:
/** should create a new row in "tableA", and update the rows from "tableB" which ids match the ones in the array from the input parameter, and that are null, to the id of this new row - in case the number of updated rows does not match the length of the input array, it should roll back everything **/
claim(array what) {
- starts transaction
- inserts a row into "tableA" and retrieves its id, storing it inside "variableA"
- updates "claimedBy" to "variableA" on all rows from "tableB" that have "claimedBy" set to null and have "id" that is in "what"
- counts the number of rows from "tableB", where "claimedBy" equals to "variableA"
- if the count does not match the length of the "what" parameter, rolls back the transaction
- if the count matches, commits the transaction
}
My questions, which would help me understand isolation levels more concretely, are the following:
If two separate calls are made concurrently to this function, both with "what" arrays that intersect at any point, then if I understand correctly, REPEATABLE_READ would prevent my data from becoming corrupted, because all the rows will be locked in the table as soon as the first update begins, so whichever call's update is executed second will be completely rolled back. Am I right in this? Based on the example in the official documentation, it would seem that rows are checked against the WHERE condition and locked one by one. Is this the case? If yes, is it possible that on concurrent calls to the function, both queries get rolled back? Or worse, is it possible that a deadlock would occur here?
In this concrete example, I could safely decrease the isolation level for the transaction to READ_COMMITTED, which would also prevent the data corruption but would not retain the locks for the duration of the update on rows that are not affected by the update. Am I correct in this?
Are the locks taken by manual transactions in MariaDB held for the duration of the query that creates them, or for the duration of the complete transaction (i.e., until the transaction is either rolled back or committed)?
FOLLOWUP QUESTION
Am I mistaken that, with READ_COMMITTED isolation, the following two concurrent calls could execute at the same time (without one waiting for the other's lock to be released), but not if REPEATABLE_READ isolation was used?
/** Session #1 **/
claim(array(1,2,3));
/** Session #2 **/
claim(array(4,5,6));
There's very little difference between REPEATABLE-READ and READ-COMMITTED in the scenario you describe.
The same locks are acquired in both cases. Locks are always held until the end of the transaction.
REPEATABLE-READ queries may also acquire gap locks to prevent new rows from being inserted, if those rows would change the result of some SELECT query. The MySQL manual explains gap locks better, and it works the same way in MariaDB: https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html#innodb-gap-locks Regardless, I don't think this will be an issue.
I don't think you're at risk of a deadlock in the scenario you describe. Your UPDATE should lock all the rows examined. Rows are not locked one by one; the lock request is atomic. That is, if any of the set of examined rows cannot be locked because another session already has some or all of them locked, then the new lock request waits.
Once your UPDATE succeeds (locks are acquired and then the rows are updated), then your session has them locked and keeps them locked until the end of the transaction. Subsequently doing a count would reference only locked rows, so there's no way another session could slip in and cause a deadlock.
One subtle point about locking that you may not notice in the documentation: locking SQL statements act as if they are run in READ-COMMITTED mode, even if your transaction is REPEATABLE-READ. In other words, locks are acquired on the most recently committed version of a row, even if a non-locking SELECT query would not read the most recent version of that row. This is surprising to some programmers.
Re your comments:
I wrote a demo of the locking/nonlocking odd behavior in my answer here: How MVCC works with Lock in MySql?
Regarding releasing locks, yes, that's correct, in READ-COMMITTED mode, a lock is released if your UPDATE doesn't make any net change to the row. That is, if your update sets a column to the value that it already had. But in your case, you are changing values on rows that match your conditions. You specifically select for rows where the claimedBy is NULL, and you are setting that column to a non-NULL value.
Regarding your followup question, you don't have an index on the claimedBy column, so your query will have to at least examine all the rows. In READ-COMMITTED mode, it will be able to release the lock pretty promptly on rows that don't match the search conditions. But it would be better to have an index on claimedBy so it is able to examine only rows that match the condition. I would think it's better (if only by a slight margin) to avoid locking extra rows, instead of locking them and releasing the locks.
I don't think that transaction isolation is such an important factor in performance optimization. Choosing indexes to narrow down the set of examined rows is a much better strategy in most cases.
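For reference, the claim() flow from the question can be sketched roughly as follows. This is only an illustration under loud assumptions: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for MariaDB, so it shows the all-or-nothing transaction logic but not InnoDB's row-locking behavior. The helper name make_db and the six sample rows are made up for the example.

```python
import sqlite3

def make_db():
    # Hypothetical setup: in-memory SQLite stands in for MariaDB (assumption).
    # isolation_level=None disables implicit transactions, so BEGIN/COMMIT are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE tableA (id INTEGER PRIMARY KEY AUTOINCREMENT)")
    conn.execute("CREATE TABLE tableB (id INTEGER PRIMARY KEY, claimedBy INTEGER)")
    conn.executemany("INSERT INTO tableB (id) VALUES (?)", [(i,) for i in range(1, 7)])
    return conn

def claim(conn, what):
    """Claim every tableB row whose id is in `what`, or claim nothing at all."""
    cur = conn.cursor()
    cur.execute("BEGIN")
    try:
        # Insert the new tableA row and remember its id ("variableA" in the question).
        cur.execute("INSERT INTO tableA DEFAULT VALUES")
        new_id = cur.lastrowid
        # Claim only rows that are still unclaimed.
        placeholders = ",".join("?" * len(what))
        cur.execute(
            "UPDATE tableB SET claimedBy = ? "
            f"WHERE claimedBy IS NULL AND id IN ({placeholders})",
            [new_id, *what],
        )
        # Count what this transaction actually claimed.
        cur.execute("SELECT COUNT(*) FROM tableB WHERE claimedBy = ?", (new_id,))
        (count,) = cur.fetchone()
        if count != len(what):
            cur.execute("ROLLBACK")  # partial claim: undo the insert and the update
            return None
        cur.execute("COMMIT")
        return new_id
    except Exception:
        cur.execute("ROLLBACK")
        raise
```

A call such as claim(conn, [1, 2, 3]) returns the new tableA id on success and None when any requested row was already claimed. On MariaDB the same statements would run through a client library inside START TRANSACTION ... COMMIT.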
I have a requirement where we need to update rows without holding a lock for the duration of the update.
Here are the details of the requirement: we will run a batch process on a table every 5 minutes, update blogs set is_visible=1 where some conditions. This query has to run on millions of records, so we don't want to block all the rows for writes during the update.
I totally understand the implications of not having write locks, which is fine for us because the is_visible column will be updated only by this batch process; no other thread will update this column. On the other hand, there will be a lot of updates to other columns of the same table, which we don't want to block.
First of all, if you are using MySQL's default InnoDB storage engine, then there is no way you can update data without row locks except setting the transaction isolation level down to READ UNCOMMITTED by running
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
However, I don't think the database behavior is what you expect since the dirty read is allowed in this case. READ UNCOMMITTED is rarely useful in practice.
To complement the answer from @Tim, it is indeed a good idea to have a unique index on the column used in the WHERE clause. However, please note as well that there is no absolute guarantee that the optimizer will choose an execution plan that uses the index you created. It may or may not, depending on the case.
For your case, what you could do is split the long transaction into multiple short transactions. Instead of updating millions of rows in one shot, updating only a few thousand rows each time would be better. The X locks are released when each short transaction commits or rolls back, giving the concurrent updates the opportunity to go ahead.
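A rough sketch of that batching idea follows. The assumptions are mine, not the asker's: blogs has an integer primary key named id, the unspecified "some conditions" is replaced by the placeholder is_visible = 0, and Python's sqlite3 stands in for MySQL so the loop can be run as-is. The point is only the shape of the loop, with one short transaction per key range.

```python
import sqlite3

def update_in_batches(conn, batch_size=1000):
    """Set is_visible = 1 in short transactions; returns the number of rows updated.

    `conn` must be opened with isolation_level=None so BEGIN/COMMIT are explicit.
    """
    total = 0
    last_id = 0
    while True:
        # Find the next slice of candidate ids, walking the primary key in order.
        rows = conn.execute(
            "SELECT id FROM blogs WHERE id > ? AND is_visible = 0 "
            "ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return total
        first_id, last_id = rows[0][0], rows[-1][0]
        # One short transaction per slice: locks are held only until this COMMIT.
        conn.execute("BEGIN")
        cur = conn.execute(
            "UPDATE blogs SET is_visible = 1 "
            "WHERE id BETWEEN ? AND ? AND is_visible = 0",
            (first_id, last_id),
        )
        conn.execute("COMMIT")
        total += cur.rowcount
```

Walking the primary key in ranges also keeps each UPDATE on an index, so it only examines (and locks) the rows in its own slice.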
By the way, I assume that your batch has lower priority than the other online processes, thus it could be scheduled out of peak hours to further minimize the impact.
P.S. The IX lock is not on the record itself, but attached to the higher-granularity table object. And even with REPEATABLE READ transaction isolation level, there is no gap lock when the query uses a unique index.
Best practice is to always acquire a specific lock when there is a chance that an update could happen concurrently with other transactions. If your storage engine is MyISAM, then MySQL will lock the entire table during an update, and there isn't much you can do about that. If the storage engine is InnoDB, then it is possible that MySQL would only put an exclusive IX lock on the records targeted by the update, but there are caveats to this being the case. The first thing you would do to try to achieve this would be a SELECT ... FOR UPDATE:
SELECT * FROM blogs WHERE <some conditions> FOR UPDATE;
In order to ensure that InnoDB only locks the records being updated, there needs to be a unique index on the column which appears in the WHERE clause. In the case of your query, assuming id were the column involved, it would have to be a primary key, or else you would need to create a unique index:
CREATE UNIQUE INDEX idx ON blogs (id);
Even with such an index, InnoDB may still apply gap locks on the records in between index values, to ensure that the REPEATABLE READ contract is enforced.
So, you may add an index on the column(s) involved in your WHERE clause to optimize the update on InnoDB.
I'm studying about MySQL and how it works, and something confuses me and I don't find any clear explanation on the web about this.
What exactly is the difference between row and table locks? One locks the row and the other locks the table. Correct?
So, in which sort of situations would you use a table lock and a row lock? Is it something the programmer or database manager can program in, or is it the engine that does it for you?
If there is any other information you think is good to know, feel free to add that to your answer.
I'm sorry for this possible noobish question, but I'm still learning.
While this is about SQL Server, it applies to MySQL as well: What are row, page and table locks? And when they are acquired?.
The MySQL docs say this:
Generally, table locks are superior to row-level locks in the following cases:
Most statements for the table are reads.
Statements for the table are a mix of reads and writes, where writes are updates or deletes for a single row that can be fetched with one key read:
SELECT combined with concurrent INSERT statements, and very few UPDATE or DELETE statements.
Many scans or GROUP BY operations on the entire table without any writers.
Now when to use: The infamous "It depends" applies here:
Ask yourself what is the use case for this transaction?
Typically row-level locking will be used when highly granular control is needed. In my opinion this should be used as the default. Say an orders or order details table where the order could be updated or deleted. Locking the whole table on a high transaction volume table makes no sense. I want users of individual orders to be able to update each order and not lock someone else out when I know the scope of their change is limited to a specific order.
Now if I needed to restore the orders and details table from backup for some reason, or make many updates to many records based on an external source, I may lock the whole table to ensure all the updates complete successfully and I can verify the load before I let anyone back in. I don't want any changes while I'm making the needed updates. But we have to consider whether locking the whole table will negatively impact user experience, or whether we have no other options available. Locking at the table level will prevent other users from changing any value. Is this really what we want?
Assume a MySQL table called, say, results. results is automatically updated via cron every day, around 11AM. However, results is also updated from a user-facing front-end, and around 11AM, there are a lot of users performing actions that also update the results table. What this means is that the automatic cron and the user updates often fail with 'deadlock' errors.
Our current solution:
We have implemented a try/catch that will repeat the attempt 10 times before moving on the next row. I do not like this solution at all because, well, it isn't a solution, just a workaround, and a faulty one at that. There's still no guarantee that the update will work at all if the deadlock persists through 10 attempts, and the execution time is potentially multiplied by 10 (not as much of an issue on the cron side, but definitely on the user side).
Another change we are about to implement is moving the cron to a different time of day, so that the automatic update does not run at the same time as heavy platform usage. This should alleviate most of the problems for now, but I still don't like it, as it is still just a workaround. If the usage patterns of our users change and the platform sees heavy use during that period, then we'll encounter the same issue again.
Is there a solution, either technical (code) or architectural (database design) that can help me alleviate or eliminate altogether these deadlock errors?
Deadlocks happen when you have one transaction that is acquiring locks on multiple rows in a non-atomic fashion, i.e. updates row A, then a split-second later it updates row B.
But there's a chance another session can slip in between these updates and lock row B first, then try to lock row A. It can't lock row A, because the first session has got it locked. And now the first session won't give up its lock on row A, because it's waiting on row B, which the second session has locked.
Solutions:
All sessions must lock rows in the same order. So either session 1 or 2 will lock row A, the other will wait for row A. Only after locking row A does any session proceed to request a lock for row B. If all sessions are locking rows in ascending order, then they will never deadlock (descending order works just as well, the point is that all sessions must do the same).
Make one atomic lock-acquiring operation per transaction. Then you can't get this kind of interleaving effect.
Use pessimistic locking. That is, lock all resources the session might need to update in one atomic lock request at the beginning of its work. One example of doing this broadly is the LOCK TABLES statement. But this is usually considered a hindrance to concurrent access to the tables.
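The first solution (every session locks rows in the same order) can be illustrated with a small sketch. This is not MySQL code: it uses Python threads and threading.Lock objects as stand-ins for sessions and InnoDB row locks, and the row names "A" and "B" and the function names are made up for the example.

```python
import threading

# One lock per row; a stand-in for InnoDB row locks (hypothetical rows "A" and "B").
row_locks = {"A": threading.Lock(), "B": threading.Lock()}

def update_rows(session, rows, log):
    """Lock the requested rows in ascending order, do the work, then release."""
    ordered = sorted(rows)  # the same global order for every session
    for row in ordered:
        row_locks[row].acquire()
    try:
        log.append((session, ordered))  # the "update" happens while all locks are held
    finally:
        for row in reversed(ordered):
            row_locks[row].release()

def run_sessions():
    """Run two sessions that ask for the same rows in opposite orders."""
    log = []
    t1 = threading.Thread(target=update_rows, args=("session1", ["A", "B"], log))
    t2 = threading.Thread(target=update_rows, args=("session2", ["B", "A"], log))
    for t in (t1, t2):
        t.start()
    for t in (t1, t2):
        t.join(timeout=5)
    finished = not t1.is_alive() and not t2.is_alive()
    return finished, log
```

Because both sessions sort their rows before locking, session2 waits for row A instead of grabbing row B first, so neither ends up holding a lock the other needs. In SQL, the analogous habit is to have every transaction touch rows in a consistent order, for example ascending primary key.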
You might like my presentation InnoDB Locking Explained with Stick Figures. The section on deadlocks starts on slide 68.
I want to add an explicit lock on the row that is currently being updated, and I also want to release that lock explicitly after updating the row in MySQL.
I know MySQL has a built-in locking system, but I want to add the lock explicitly as well.
You could of course issue a
SELECT ... FOR UPDATE
statement before the actual update. To release the lock again, commit the transaction. Read about locking reads here. But according to that documentation, that would do the same as simply issuing the UPDATE statement itself:
A SELECT ... FOR UPDATE reads the latest available data, setting exclusive locks on each row it reads. Thus, it sets the same locks a searched SQL UPDATE would set on the rows.