Here's what I want to do:

    try to lock a few documents
    if any lock fails:
        unlock any docs we've locked so far
        wait until all locks are cleared
        retry from beginning
    do something
However, I don't know how to efficiently wait until all locks are cleared. I can't find anything in the Couchbase docs other than using an "infinite" loop that checks whether the lock attempt returns a temporary failure error. Is there any good way to wait the correct amount of time before retrying (plus some random time to avoid repeated conflicts)?
Waiting 15+ seconds before retrying isn't really that user-friendly.
There is not, directly.
Two possible optimizations could make it a bit more efficient though.
One is that you could, at the app level, keep a record of the waiters on each lock whenever an acquisition fails, so that only the topmost one or two waiters are actually trying to acquire the lock at any given time.
Two is that you should probably use exponential backoff to retry the lock efficiently. If you really want to get sophisticated, you could build and train a model of when the lock is most likely to become available, polling more frequently as that estimated "deadline" approaches; the math for that is described by a negative exponential function.
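A minimal sketch of that retry loop in Python. The try_lock, unlock, and work callables are placeholders for whatever pessimistic lock/unlock calls your Couchbase SDK exposes and for the actual work, and LockFailed stands in for the SDK's "temporary failure" error; only the backoff-with-jitter structure is the point:

    import random
    import time

    class LockFailed(Exception):
        """Stand-in for the SDK's lock-contention / "temporary failure" error."""

    def with_locks(doc_ids, try_lock, unlock, work,
                   base_delay=0.05, max_delay=5.0, max_attempts=20):
        """Lock every id in doc_ids, run work(), then release the locks.
        If any lock attempt fails, release whatever we hold and retry with
        jittered exponential backoff."""
        delay = base_delay
        for _ in range(max_attempts):
            held = []
            try:
                for doc_id in doc_ids:
                    held.append(try_lock(doc_id))      # raises LockFailed on contention
            except LockFailed:
                for handle in held:                    # unlock any partial progress
                    unlock(handle)
                time.sleep(random.uniform(0, delay))   # random jitter avoids lockstep retries
                delay = min(delay * 2, max_delay)      # exponential backoff, capped
                continue
            try:
                return work()                          # the "do something" step
            finally:
                for handle in held:
                    unlock(handle)
        raise RuntimeError("could not acquire all locks after %d attempts" % max_attempts)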
I'm working on an application that sees thousands of basically simultaneous login attempts. These login attempts depend on ADFS metadata that has to be refreshed from time to time for different groups of users. While I am building an automatic refresher that kicks in at T-minus-12 hours before the refresh is required, I also want to handle the situation where the automatic refresh fails to occur. What I want is for the first login attempt that fails due to out-of-date metadata to trigger a refresh, but only the first; otherwise we'll get thousands of unnecessary requests to an ADFS server.
Since the ADFS metadata is stored in a MySQL table anyway, I thought of using the InnoDB locking mechanism to handle this. If a login request fails due to out-of-date metadata, it will request a lock on the row holding the relevant metadata. If the lock is granted, it will check the refresh date on the metadata, and if that is out of date, it will trigger a refresh of the metadata and then write the new metadata to that row.
All subsequent logins that fail due to old metadata will also request their own locks, which will not be granted because the first request was granted a lock. As soon as the first request finishes updating the metadata, it will release the lock, and the next lock will be granted. That request will check the refresh date, see that it does not need to be refreshed, and continue as normal with the new metadata.
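Roughly, the flow I have in mind (a sketch, assuming a DB-API-style connection such as mysql-connector or PyMySQL, a hypothetical adfs_metadata table with group_id, metadata, and refreshed_at columns, and a fetch_fresh_metadata() placeholder for the real ADFS call):

    from datetime import datetime, timedelta

    MAX_AGE = timedelta(hours=24)   # assumed refresh interval

    def ensure_fresh_metadata(conn, group_id, fetch_fresh_metadata):
        cur = conn.cursor()
        try:
            # Blocks here if another request already holds the row lock.
            cur.execute("SELECT metadata, refreshed_at FROM adfs_metadata "
                        "WHERE group_id = %s FOR UPDATE", (group_id,))
            metadata, refreshed_at = cur.fetchone()
            if datetime.utcnow() - refreshed_at > MAX_AGE:
                # We were first in line: refresh while still holding the lock.
                metadata = fetch_fresh_metadata(group_id)
                cur.execute("UPDATE adfs_metadata "
                            "SET metadata = %s, refreshed_at = %s "
                            "WHERE group_id = %s",
                            (metadata, datetime.utcnow(), group_id))
            conn.commit()   # releases the row lock for the next waiter
            return metadata
        except Exception:
            conn.rollback()
            raise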
My question is, can MySQL/InnoDB handle, say, 10,000 transactions waiting for a lock on a single row? If there is a limit, can the limit be changed?
Consider using GET_LOCK() instead.
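For example (a sketch, assuming a DB-API-style connection and the same hypothetical adfs_metadata table as above; the lock name, timeout, and the fetch_fresh_metadata/is_stale callables are made up). GET_LOCK() takes a named, application-level lock, so waiters queue on the name instead of piling row locks into InnoDB:

    def refresh_if_stale(conn, group_id, fetch_fresh_metadata, is_stale):
        cur = conn.cursor()
        lock_name = "adfs_metadata_refresh_%s" % group_id
        # Wait up to 30 seconds for the named lock; GET_LOCK returns 1 on
        # success, 0 on timeout, NULL on error.
        cur.execute("SELECT GET_LOCK(%s, 30)", (lock_name,))
        if not cur.fetchone()[0]:
            return False
        try:
            cur.execute("SELECT metadata, refreshed_at FROM adfs_metadata "
                        "WHERE group_id = %s", (group_id,))
            metadata, refreshed_at = cur.fetchone()
            if is_stale(refreshed_at):
                # Only the first waiter sees stale data; everyone queued behind
                # us on the same lock name will see the new refreshed_at.
                cur.execute("UPDATE adfs_metadata "
                            "SET metadata = %s, refreshed_at = NOW() "
                            "WHERE group_id = %s",
                            (fetch_fresh_metadata(group_id), group_id))
            conn.commit()
            return True
        finally:
            cur.execute("SELECT RELEASE_LOCK(%s)", (lock_name,))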
You can have at most one transaction active per user thread, so at most max_connections transactions can exist at a given moment. Each transaction is single-threaded, so it can run only one SQL query at a time. This places an upper limit on the number of lock waits.
Granted, each SQL query might be waiting for many locks. So there's a bit of memory resource needed to keep track of each lock, and this is a finite resource. InnoDB uses a portion of the buffer pool for a data structure it calls the "lock table." This is not a table you can access using SQL, but it's using the same internal API used for storing tables.
https://www.percona.com/blog/2006/07/13/how-much-memory-innodb-locks-really-take/ says:
So we locked over 100K rows using about 44KB. This is still quite efficient using less than 4 bits per locked row.
https://blog.pythian.com/innodb-error-total-number-of-locks-exceeds-the-lock-table-size/ says:
The locks table size for the 677 million rows was 327M according to the InnoDB status.
Speaking from my own experience, I have never had occasion to get an error about this, but it is possible if you have a small buffer pool size.
I would say if you have that much locking queueing up, you will probably notice other problems before InnoDB runs out of memory for the lock table. For example, all your clients will seem to be waiting for queries which cannot proceed.
Assume a MySQL table called, say, results. results is automatically updated via cron every day, around 11AM. However, results is also updated from a user-facing front-end, and around 11AM, there are a lot of users performing actions that also update the results table. What this means is that the automatic cron and the user updates often fail with 'deadlock' errors.
Our current solution:
We have implemented a try/catch that will repeat the attempt 10 times before moving on to the next row. I do not like this solution at all because, well, it isn't a solution, just a workaround, and a faulty one at that. There's still no guarantee that the update will work at all if the deadlock persists through 10 attempts, and the execution time is potentially multiplied by 10 (not as much of an issue on the cron side, but definitely on the user side).
Another change we are about to implement is moving the cron to a different time of day, so that the automatic update does not run during heavy platform usage. This should alleviate much of the problem for now; however, I still don't like it, as it is still just a workaround. If the usage patterns of our users change and the platform sees heavy use during that period, we'll encounter the same issue again.
Is there a solution, either technical (code) or architectural (database design), that can help me alleviate or altogether eliminate these deadlock errors?
Deadlocks happen when you have one transaction that is acquiring locks on multiple rows in a non-atomic fashion, i.e. updates row A, then a split-second later it updates row B.
But there's a chance other sessions can split in between these updates and lock row B first, then try to lock row A. It can't lock row A, because the first session has got it locked. And now the first session won't give up its lock on row A, because it's waiting on row B, which the second session has locked.
Solutions:
All sessions must lock rows in the same order. So either session 1 or 2 will lock row A, and the other will wait for row A. Only after locking row A does any session proceed to request a lock for row B. If all sessions lock rows in ascending order, they will never deadlock (descending order works just as well; the point is that all sessions must do the same). A sketch of this appears below, after these options.
Make one atomic lock-acquiring operation per transaction. Then you can't get this kind of interleaving effect.
Use pessimistic locking. That is, lock all resources the session might need to update in one atomic lock request at the beginning of its work. One example of doing this broadly is the LOCK TABLES statement. But this is usually considered a hindrance to concurrent access to the tables.
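Back to the first option, since it is usually the practical one: here is a sketch of the ordering rule in Python, assuming a DB-API-style connection and a hypothetical results table keyed by id (the column names are made up). Whatever set of rows a session needs, it locks them in ascending id order before touching any of them:

    def update_results(conn, changes):
        """changes maps row id -> new value. All sessions lock the rows they
        need in ascending id order, so they cannot deadlock on each other."""
        cur = conn.cursor()
        try:
            ids = sorted(changes)                         # the ordering rule
            placeholders = ", ".join(["%s"] * len(ids))   # "%s, %s, ..."
            # Only the placeholder list is interpolated; the values stay parameters.
            sql = ("SELECT id FROM results WHERE id IN (%s) "
                   "ORDER BY id FOR UPDATE") % placeholders
            cur.execute(sql, ids)        # take every row lock up front, in index order
            cur.fetchall()
            for row_id in ids:
                cur.execute("UPDATE results SET value = %s WHERE id = %s",
                            (changes[row_id], row_id))
            conn.commit()
        except Exception:
            conn.rollback()
            raise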
You might like my presentation InnoDB Locking Explained with Stick Figures. The section on deadlocks starts on slide 68.
What is the exact difference between the two locking read clauses:
SELECT ... FOR UPDATE
and
SELECT ... LOCK IN SHARE MODE
And why would you need to use one over the other?
I have been trying to understand the difference between the two. I'll document what I have found in hopes it'll be useful to the next person.
Both LOCK IN SHARE MODE and FOR UPDATE ensure no other transaction can update the rows that are selected. The difference between the two is in how they treat locks while reading data.
LOCK IN SHARE MODE does not prevent another transaction from reading the same row that was locked.
FOR UPDATE prevents other locking reads of the same row (non-locking reads can still read that row; LOCK IN SHARE MODE and FOR UPDATE are locking reads).
This matters in cases like updating counters, where you read the value in one statement and update it in another. Here, using LOCK IN SHARE MODE allows two transactions to read the same initial value. So if the counter is incremented by 1 in both transactions, the ending count might increase by only 1, since both transactions initially read the same value.
Using FOR UPDATE would block the second transaction from reading the value until the first one is done. This ensures the counter is incremented by 2.
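A sketch of the counter case, assuming a DB-API-style connection (mysql-connector, PyMySQL, etc.) and a hypothetical counters table with name and value columns:

    def increment_counter(conn, name):
        cur = conn.cursor()
        # Locking read: a second transaction running this same statement
        # blocks here until we commit, so it cannot read the stale value.
        cur.execute("SELECT value FROM counters WHERE name = %s FOR UPDATE", (name,))
        (value,) = cur.fetchone()
        cur.execute("UPDATE counters SET value = %s WHERE name = %s", (value + 1, name))
        conn.commit()
        return value + 1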
FOR UPDATE --- You're informing MySQL that the selected rows may be updated in the next steps (before the end of this transaction), so MySQL does not grant read locks on the same set of rows to any other transaction in the meantime. Another transaction that wants to lock those rows (whether for read or write) has to wait until the first transaction is finished.
FOR SHARE --- Indicates to MySQL that you're selecting the rows only for reading and will not modify them before the end of the transaction. Any number of transactions can acquire a read lock on the rows.
Note: there is a chance of deadlock if these clauses (FOR UPDATE, FOR SHARE) are not used properly.
Either way the integrity of your data will be guaranteed; it's just a question of how the database guarantees it. Does it do so by raising runtime errors when transactions conflict with each other (i.e. FOR SHARE), or does it do so by serializing any transactions that would conflict with each other (i.e. FOR UPDATE)?
FOR SHARE (a.k.a. LOCK IN SHARE MODE): Transactions face a higher probability of failure due to deadlock, because they delay blocking until the moment an update statement is issued; at that point they either block until all read locks are released, or fail with a deadlock error if another write is in progress. Only one of the conflicting clients will eventually succeed; the others fail with a deadlock when they try to update, and have to retry their transactions.
FOR UPDATE: Transactions won't fail due to deadlock, because they won't be allowed to run concurrently. This may be desirable for example because it makes it easier to reason about multi-threading if all updates are serialized across all clients. However, it limits the concurrency you can achieve because all other transactions block until the first transaction is finished.
Pro-Tip: As an exercise, I recommend taking some time to play with a local test database and a couple of mysql clients on the command line to prove this behavior for yourself. That is how I eventually understood the difference, because it can be very abstract until you see it in action.
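In that spirit, a small sketch of such an experiment driven from one script, assuming mysql-connector-python, local test credentials, and a table t(id INT PRIMARY KEY, n INT) with one row (all of which are assumptions). Session B sets a short lock wait timeout so the block shows up as an error instead of a hang:

    import mysql.connector

    def connect():
        # Assumed local test database and credentials.
        return mysql.connector.connect(user="test", password="test", database="test")

    a, b = connect(), connect()
    ca, cb = a.cursor(), b.cursor()

    # Session A takes the exclusive (FOR UPDATE) lock and holds it.
    ca.execute("SELECT n FROM t WHERE id = 1 FOR UPDATE")
    ca.fetchall()

    # Session B's locking read has to wait for A; a short timeout turns the
    # wait into a visible error instead of hanging the script.
    cb.execute("SET SESSION innodb_lock_wait_timeout = 2")
    try:
        cb.execute("SELECT n FROM t WHERE id = 1 FOR UPDATE")
        cb.fetchall()
    except mysql.connector.Error as err:
        print("session B blocked as expected:", err)   # error 1205, lock wait timeout

    a.commit()   # releases A's lock; B could now acquire it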
I am trying to understand how transactions work with locks in SQL, and I watched this great tutorial on YouTube.
I don't understand what happens though if 2 different transactions get a shared lock on the same object, and then one or both of them then want to upgrade to an exclusive lock on the object to write to it.
What would happen on the example at 1:12:55 in the video?
Would it be classed as a deadlock, and would one of the transactions have to roll back? Surely one of them can't wait at the point of wanting the exclusive lock, remove its shared lock, and then continue when the first transaction finishes, because locks cannot be removed until the end of the transaction, can they?
Cheers.
The transaction wanting the exclusive lock would have to wait until all other shared locks were removed before it can take effect. In this case, it would wait for the other transaction to release its shared lock, after which it would obtain an exclusive lock.
Thus, if the other transaction were held up by the initial transaction for some other reason, this would result in a deadlock. Because of this, a good DB design should try to ensure that the initial transaction would not block anything while waiting for an exclusive lock.
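For the specific case in the question (both transactions hold a shared lock and both then try to write), here is a sketch you can run against a throwaway table, assuming mysql-connector-python, local test credentials, and a table t(id INT PRIMARY KEY, n INT) containing one row; those details are assumptions, not anything from the video. Both sessions take a shared lock, then both try to upgrade, and InnoDB resolves the resulting cycle by rolling one of them back with error 1213:

    import threading
    import time
    import mysql.connector

    def connect():
        # Assumed local test database and credentials.
        return mysql.connector.connect(user="test", password="test", database="test")

    a, b = connect(), connect()
    ca, cb = a.cursor(), b.cursor()

    # Both transactions take a shared lock on the same row.
    ca.execute("SELECT n FROM t WHERE id = 1 LOCK IN SHARE MODE")
    ca.fetchall()
    cb.execute("SELECT n FROM t WHERE id = 1 LOCK IN SHARE MODE")
    cb.fetchall()

    def try_upgrade(name, conn, cur, delay=0.0):
        time.sleep(delay)
        try:
            cur.execute("UPDATE t SET n = n + 1 WHERE id = 1")   # wants an exclusive lock
            conn.commit()
            print(name, "committed")
        except mysql.connector.Error as err:
            print(name, "chosen as deadlock victim:", err)       # error 1213
            conn.rollback()

    # B upgrades in a separate thread, slightly after A has started waiting.
    t_b = threading.Thread(target=try_upgrade, args=("B", b, cb, 1.0))
    t_b.start()

    # A's upgrade waits on B's shared lock; once B also tries to upgrade,
    # neither can proceed, so InnoDB rolls one back and the other commits.
    try_upgrade("A", a, ca)
    t_b.join()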
On certain occasions, when several back-end processes happen to run at the same time (queue management is a separate matter; I could solve it that way, but that is not the question here),
I get General error: 1205 Lock wait timeout exceeded; try restarting transaction ROLLING BACK
The lower-priority process is the one that locks the table, simply because it started a few minutes before the high-priority one.
How do I give priority to a query over an already running process?
Hope it was clear enough.
Once a query has begun execution it cannot be paused or interrupted. The only exception is at the DB administration level, where you could essentially force the query to stop (think of it as killing a running process in Windows, if you will). However, you don't want to do that, so forget it.
Your best option would be to use a LOW_PRIORITY chunked operation. Basically, if the LOW_PRIORITY query is taking too long to execute, think about ways to split it up so each piece runs quickly, without creating orphaned or invalid data in the database.
A very basic use case: imagine an insert of 10,000 new rows. By "chunking" the insert so that it runs multiple times with smaller data sets (i.e. 500 rows at a time), each chunk completes more quickly and therefore allows any non-LOW_PRIORITY operations to be executed in a more timely manner.
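A rough sketch of that chunking, assuming a DB-API-style connection and the same placeholder table and columns as the statements below; rows is the full list of tuples to insert and 500 is the per-chunk batch size:

    def chunked_insert(conn, rows, chunk_size=500):
        cur = conn.cursor()
        sql = "INSERT LOW_PRIORITY INTO xxx (a, b, c) VALUES (%s, %s, %s)"
        for start in range(0, len(rows), chunk_size):
            # Each small batch finishes and commits quickly, giving other
            # (non-LOW_PRIORITY) queries a chance to run in between.
            cur.executemany(sql, rows[start:start + chunk_size])
            conn.commit()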
How To
Setting something as low priority is as simple as adding the LOW_PRIORITY keyword:
INSERT LOW_PRIORITY INTO xxx (a, b, c) VALUES (...)
UPDATE LOW_PRIORITY xxx SET a=b
DELETE LOW_PRIORITY FROM xxx WHERE a="value"