Simple question.
If I'm using DB::transaction() and I do the following:
DB::transaction(function () {
    $result = DB::table('orders')->select('id')->where('id', '>', 17)->lockForUpdate()->get();
});
What happens if two copies of this script execute at exactly the same split second?
Laravel says:
Alternatively, you may use the lockForUpdate method. A "for update"
lock prevents the rows from being modified or from being selected with
another shared lock.
Does lockForUpdate prevent a read from happening at the same time, or does it only come into effect when a subsequent UPDATE is made to the row?
Can I guarantee that if a script is already reading from this row, a concurrent script arriving in the same millisecond will block and WAIT for the transaction to release the lock before trying to run its code?
I haven't found a super clear answer anywhere; all the examples are trying to update or insert. I just want to guard against a concurrent select.
This is an old thread, but as there are no answers, here is my input. I faced a similar situation and, after several rounds of trial and error, finally got the table records that are to be modified locked by only one exclusive transaction at a given time.
One thing worth checking: is your table's storage engine set to InnoDB? After several failed attempts, I discovered that my table's storage engine was MyISAM, which supports neither transactions nor row-level locking.
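To make the documented behavior concrete, here is a minimal two-session sketch you can run in two mysql clients (assuming an InnoDB orders table like the one in the question):

-- Session 1:
START TRANSACTION;
SELECT id FROM orders WHERE id > 17 FOR UPDATE;  -- takes exclusive locks on the matching rows

-- Session 2, arriving at the "same split second":
SELECT id FROM orders WHERE id > 17 FOR UPDATE;  -- blocks and WAITS until Session 1 commits
SELECT id FROM orders WHERE id > 17;             -- plain non-locking read: returns immediately

-- Session 1:
COMMIT;  -- Session 2's locking read now unblocks and proceeds

So, for the question above: a concurrent lockForUpdate read waits for the lock to be released; a plain SELECT is not blocked.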
Software: Django 2.1.0, Python 3.7.1, MariaDB 10.3.8, Linux Ubuntu 18LTS
We recently added some load to a new application and started observing lots of deadlocks. After a lot of digging, I found out that the Django select_for_update query resulted in SQL with several subqueries (3 or 4). In all the deadlocks I've seen so far, at least one of the transactions involves this SQL with multiple subqueries.
My question is: does select_for_update lock records from every table involved? In my case, would records from the main SELECT and from the other tables used by the subqueries get locked, or only records from the main SELECT?
From Django docs:
By default, select_for_update() locks all rows that are selected by the query. For example, rows of related objects specified in select_related() are locked in addition to rows of the queryset’s model.
However, I'm not using select_related(), at least not explicitly.
Summary of my app:
with transaction.atomic():
    ModelName.objects.select_for_update().filter(...)
    ...
    # update the record that is locked
    ...
50+ clients sending queries to the database concurrently
Some of those queries ask for the same record, meaning different transactions will run the same SQL at the same time.
After a lot of reading, I did the following to try to get the deadlocks under control:
1- Try/catch exception error 1213 (deadlock). When this happens, wait 30 seconds and retry the query. Here, I rely on the ROLLBACK performed by the database engine.
Also, print the output of SHOW ENGINE INNODB STATUS and SHOW PROCESSLIST. But SHOW PROCESSLIST doesn't give useful information.
2- Modify the Django select_for_update so that it doesn't build SQL with subqueries. Now the generated SQL contains a single WHERE with literal values and no subqueries.
Anything else that could be done to reduce the deadlocks?
If you have select_for_update inside a transaction, the lock will only be released when the whole transaction commits or rolls back. With nowait set to True, the other concurrent requests will immediately fail with:
(3572, 'Statement aborted because lock(s) could not be acquired immediately and NOWAIT is set.')
So if we can't use optimistic locks and cannot make transactions shorter, we can set nowait=True in our select_for_update, and we will see a lot of failures if our assumptions are correct. Here we can just catch those lock failures and retry them with a backoff strategy. This is based on the assumption that everyone is trying to write to the same thing, like an auction item or a ticket booking within a short window of time. If that is not the case, consider changing the DB design a bit to make deadlocks less common.
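For reference, this is roughly what nowait looks like at the SQL level (a sketch with an illustrative auction_item table; FOR UPDATE NOWAIT is available in MySQL 8.0+ and MariaDB 10.3+, and the error shown is MySQL's):

-- Session A:
START TRANSACTION;
SELECT * FROM auction_item WHERE id = 42 FOR UPDATE;

-- Session B fails immediately instead of queueing behind A's lock:
SELECT * FROM auction_item WHERE id = 42 FOR UPDATE NOWAIT;
-- ERROR 3572 (HY000): Statement aborted because lock(s) could not be
-- acquired immediately and NOWAIT is set.

Catching that error in the application is what makes the retry-with-backoff strategy possible.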
In my code I need to do the following:
Check a MySQL table (InnoDB) if a particular row (matching some criteria) exists. If it does, return it. If it doesn't, create it and then return it.
The problem I seem to have is race conditions. Every now and then two processes run so closely together that they both check the table at the same time, don't see the row, and both insert it, thus duplicating data.
I'm reading the MySQL documentation trying to come up with some way to prevent this. What I've come up with so far:
Unique indexes seem to be one option, but they're not universal (they only work when the criterion is something unique across all rows).
Transactions even at SERIALIZABLE level don't protect against INSERT, period.
Neither do SELECT ... LOCK IN SHARE MODE or SELECT ... FOR UPDATE.
A LOCK TABLE ... WRITE would do it, but it's a very drastic measure - other processes won't be able to read from the table, and I need to lock ALL tables that I intend to use until I unlock them.
Basically, I'd like to do either of the following:
Prevent all INSERTs to the table from processes other than mine, while allowing SELECT/UPDATE (this is probably impossible because it makes so little sense most of the time).
Organize some sort of manual locking. The two processes would coordinate among themselves which one gets to do the select/insert dance, while the other waits. This needs some sort of operation that waits until the lock is released. I could probably implement a spin-lock (one process repeatedly checks if the other has released the lock), but I'm afraid that it would be too resource intensive.
I think I found an answer myself. Transactions + SELECT ... FOR UPDATE in an InnoDB table can provide a synchronization lock (aka mutex). Have all processes lock on a specific row in a specific table before they start their work. Then only one will be able to run at a time and the rest will wait until the first one finishes its transaction.
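A minimal sketch of that mutex pattern (table and row names are illustrative): every process takes the same row lock before doing its check-then-insert work, so only one of them can be inside the critical section at a time.

-- One designated row acts as the mutex:
START TRANSACTION;
SELECT id FROM app_locks WHERE name = 'orders_guard' FOR UPDATE;  -- all other processes block here
-- ... check whether the target row exists; INSERT it if it doesn't; return it ...
COMMIT;  -- releases the mutex; the next waiter proceeds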
What is the exact difference between the two locking read clauses:
SELECT ... FOR UPDATE
and
SELECT ... LOCK IN SHARE MODE
And why would you need to use one over the other?
I have been trying to understand the difference between the two. I'll document what I have found in hopes it'll be useful to the next person.
Both LOCK IN SHARE MODE and FOR UPDATE ensure no other transaction can update the rows that are selected. The difference between the two is in how they treat locks while reading data.
LOCK IN SHARE MODE does not prevent another transaction from reading the same row that was locked.
FOR UPDATE prevents other locking reads of the same row (non-locking reads can still read that row; LOCK IN SHARE MODE and FOR UPDATE are locking reads).
This matters in cases like updating counters, where you read the value in one statement and update it in another. Here, using LOCK IN SHARE MODE will allow two transactions to read the same initial value. So if the counter is incremented by 1 in both transactions, the ending count might increase only by 1, since both transactions initially read the same value.
Using FOR UPDATE would have blocked the second transaction from reading the value until the first one is done. This ensures the counter is incremented by 2.
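A sketch of the counter scenario in two mysql sessions (table and column names are illustrative; assume value starts at 10):

-- Session A:
START TRANSACTION;
SELECT value FROM counters WHERE name = 'hits' FOR UPDATE;  -- reads 10, takes an exclusive lock

-- Session B blocks on the same SELECT ... FOR UPDATE until A commits.

-- Session A:
UPDATE counters SET value = 11 WHERE name = 'hits';
COMMIT;

-- Session B now unblocks, reads 11, and writes 12: no lost update.
-- With LOCK IN SHARE MODE instead, both sessions would read 10 concurrently.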
FOR UPDATE --- You're informing MySQL that the selected rows may be updated in the next steps (before the end of this transaction), so that MySQL doesn't grant any locks on the same set of rows to any other transaction at that moment. Other transactions' locking reads and writes must wait until the first transaction is finished (plain non-locking reads can still see the rows).
FOR SHARE --- Indicates to MySQL that you're selecting the rows from the table only for reading, and will not modify them before the end of the transaction. Any number of transactions can hold a shared (read) lock on the same rows.
Note: there is a chance of deadlock if these statements (FOR UPDATE, FOR SHARE) are not used properly.
Either way the integrity of your data will be guaranteed; it's just a question of how the database guarantees it. Does it do so by raising runtime errors when transactions conflict with each other (i.e. FOR SHARE), or by serializing any transactions that would conflict with each other (i.e. FOR UPDATE)?
FOR SHARE (a.k.a. LOCK IN SHARE MODE): transactions face a higher probability of failure due to deadlock, because they delay blocking until the moment an update statement is issued (at which point they either block until all read locks are released, or fail due to deadlock if another write is in progress). Only one client blocks and eventually succeeds; the others fail with a deadlock when they try to update, and have to retry their transactions.
FOR UPDATE: Transactions won't fail due to deadlock, because they won't be allowed to run concurrently. This may be desirable for example because it makes it easier to reason about multi-threading if all updates are serialized across all clients. However, it limits the concurrency you can achieve because all other transactions block until the first transaction is finished.
Pro-Tip: As an exercise I recommend taking some time to play with a local test database and a couple mysql clients on the command line to prove this behavior for yourself. That is how I eventually understood the difference myself, because it can be very abstract until you see it in action.
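For example, here is one such exercise (a sketch; table and column names are illustrative) that reproduces the FOR SHARE deadlock described above:

-- Session A: START TRANSACTION; SELECT * FROM t WHERE id = 1 LOCK IN SHARE MODE;
-- Session B: START TRANSACTION; SELECT * FROM t WHERE id = 1 LOCK IN SHARE MODE;
-- Session A: UPDATE t SET val = val + 1 WHERE id = 1;
--   -> blocks, waiting for B's shared lock to be released
-- Session B: UPDATE t SET val = val + 1 WHERE id = 1;
--   -> ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
--      InnoDB detects the deadlock and rolls one transaction back; the other's UPDATE then proceeds.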
We have 2 scripts/mysql connections that are grabbing rows from a table. Once a script grabs some rows, the other script must not be able to access those rows.
What I've got so far, that seems to work is this:
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
START TRANSACTION
SELECT * FROM table WHERE result='new' FOR UPDATE
// Loop over update
UPDATE table SET result='old' WHERE id=...
COMMIT
From what I understand the same connection could read the dirty data, but the other connections shouldn't be able to since the rows are locked. Is this correct?
Also is there a better way of guaranteeing that each row can only be SELECT one time with both scripts running?
edit:
Oh... and the engine is Innodb
edit: Also I'd like to try to avoid deadlocks, unless they really have no effect, in which case I could just prepare for them and rerun the query.
SELECT ... FOR UPDATE sets an exclusive lock on the rows, and if that's not possible it waits for the lock to be released. The main aim of the SELECT ... FOR UPDATE statement is to prevent others from locking or modifying certain rows while you are manipulating them (plain non-locking reads are still possible).
If I get your question right, by 'dirty data' you mean those locked rows?
I don't see why you'd call them 'dirty', because they are just locked; and indeed, inside the same transaction you can read the rows you've locked (obviously).
Regarding your second question
Also is there a better way of guaranteeing that each row can only be
SELECT one time with both scripts running?
SELECT ... FOR UPDATE guarantees that, at any given moment, certain rows can be locked and read by only one transaction. I don't see a better way to do it, since this statement was designed for exactly that purpose.
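A two-session sketch of that guarantee, using the flow from the question (placeholder table name jobs, since table itself is a reserved word):

-- Session 1:
START TRANSACTION;
SELECT * FROM jobs WHERE result = 'new' FOR UPDATE;  -- locks the matching rows
UPDATE jobs SET result = 'old' WHERE id = 1;

-- Session 2, running the same script in parallel:
-- SELECT * FROM jobs WHERE result = 'new' FOR UPDATE;  -- blocks here

-- Session 1:
COMMIT;

-- Session 2's locking read now runs against the committed data, sees
-- result = 'old' for row 1, and therefore never grabs the same row twice.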
TL;DR - MySQL doesn't let you lock a table and use a transaction at the same time. Is there any way around this?
I have a MySQL table I am using to cache some data from a (slow) external system. The data is used to display web pages (written in PHP.) Every once in a while, when the cached data is deemed too old, one of the web connections should trigger an update of the cached data.
There are three issues I have to deal with:
Other clients will try to read the cache data while I am updating it
Multiple clients may decide the cache data is too old and try to update it at the same time
The PHP instance doing the work may be terminated unexpectedly at any time, and the data should not be corrupted
I can solve the first and last issues by using a transaction, so clients will be able to read the old data until the transaction is committed, when they will immediately see the new data. Any problems will simply cause the transaction to be rolled back.
I can solve the second problem by locking the tables, so that only one process gets a chance to perform the update. By the time any other processes get the lock they will realise they have been beaten to the punch and don't need to update anything.
This means I need to both lock the table and start a transaction. According to the MySQL manual, this is not possible. Starting a transaction releases the locks, and locking a table commits any active transaction.
Is there a way around this, or is there another way entirely to achieve my goal?
This means I need to both lock the table and start a transaction
This is how you can do it:
SET autocommit=0;                      -- begin this way, NOT with START TRANSACTION,
                                       -- which would implicitly release the table locks
LOCK TABLES t1 WRITE, t2 READ, ...;    -- with autocommit=0, this keeps the transaction open
... do something with tables t1 and t2 here ...
COMMIT;                                -- commit first ...
UNLOCK TABLES;                         -- ... then release the table locks
For more info, see the MySQL docs on LOCK TABLES and transactions.
If it were me, I'd use the advisory locking function within MySQL to implement a mutex for updating the cache, and a transaction for read isolation. e.g.
begin_transaction(); // although reading a single row doesn't really require this
$cached=runquery("SELECT * FROM cache WHERE key=$id");
end_transaction();
if (is_expired($cached)) {
$cached=refresh_data($cached, $id);
}
...
function refresh_data($cached, $id)
{
$lockname=some_deterministic_transform($id);
if (1 == runquery("SELECT GET_LOCK('$lockname',0)")) {
$cached=fetch_source_data($id);
begin_transaction();
write_data($cached, $id);
end_transaction();
runquery("SELECT RELEASE_LOCK('$lockname')");
}
return $cached;
}
(BTW: bad things may happen if you try this with persistent connections, since GET_LOCK locks belong to the connection; if a script dies before calling RELEASE_LOCK, the still-open pooled connection keeps the lock held.)
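For reference, the advisory-lock semantics used above, as seen from the mysql client (lock name illustrative):

SELECT GET_LOCK('cache_refresh_42', 0);   -- 1 = acquired, 0 = another session holds it (0-second timeout)
SELECT RELEASE_LOCK('cache_refresh_42');  -- 1 = released, 0 = held by another session, NULL = no such lock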
I'd suggest solving the issue by removing the contention altogether.
Add a timestamp column to your cached data.
When you need to update the cached data:
Just add new cached data to your table using the current timestamp
Remove cached data older than, let's say, 24 hours.
When you need to serve the cached data
Sort by timestamp (DESC) and return the newest cached data
At any given time your clients will retrieve records which are never deleted by any other process. Moreover, you don't care if a client gets cached data belonging to different writes (i.e. with different timestamps)
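A sketch of what that could look like (schema and queries are illustrative, not from the original post):

CREATE TABLE cache (
  cache_key  VARCHAR(64) NOT NULL,
  payload    TEXT        NOT NULL,
  created_at TIMESTAMP   NOT NULL DEFAULT CURRENT_TIMESTAMP,
  KEY (cache_key, created_at)
);

-- Writer: append a fresh snapshot, then prune stale ones
INSERT INTO cache (cache_key, payload) VALUES ('products', '...');
DELETE FROM cache WHERE created_at < NOW() - INTERVAL 24 HOUR;

-- Reader: always take the newest snapshot
SELECT payload FROM cache
WHERE cache_key = 'products'
ORDER BY created_at DESC
LIMIT 1;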
The second problem may be solved without involving the database at all. Have a lock file for the cache update procedure so that other clients know someone is already on it. This may not catch each and every corner case, but is it that big of a deal if two clients are updating the cache at the same time? After all, they are doing the update in transactions, so the cache will still be consistent.
You may even implement the lock yourself by storing the last cache update time in a table. When a client wants to update the cache, make it lock that table, check the last update time, and then update the field.
I.e., implement your own locking mechanism to prevent multiple clients from updating the cache. Transactions will take care of the rest.
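A minimal sketch of that hand-rolled lock (table and column names are hypothetical):

CREATE TABLE cache_meta (
  name         VARCHAR(32) PRIMARY KEY,
  last_updated DATETIME    NOT NULL
);

START TRANSACTION;
SELECT last_updated FROM cache_meta WHERE name = 'products' FOR UPDATE;  -- serializes updaters
-- if last_updated is recent enough, someone beat us to it: just COMMIT and move on
-- otherwise refresh the cache here, then record the new time:
UPDATE cache_meta SET last_updated = NOW() WHERE name = 'products';
COMMIT;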