I have read that transaction is usually used in movie ticket booking website, to solve concurrent purchase problem. However, I failed to understand why is it necessary.
If at the same time, 2 users book the same seat (ID = 1) on the same show (ID = 99), can't you simply issue the following SQL command?
UPDATE seat_db
SET takenByUserID=someUserId
WHERE showID=99 AND seatID=1 AND takenByUserID IS NOT NULL
As I can see, this SQL is already been executed atomically, there's no concurrency issue. The database will set seat ID=1 to 1st user of which the server receives the request, then let the 2nd user's request fail. So, why is transaction still needed for ticket booking system?
When you batch all of your DML statements into a single transaction typically you are telling the database a couple things:
Make this operation (i.e. book movie ticket) an all-or-nothing operation
Ensure you don't leave any orphan rows and have consistent data
Lock all the associated tables up-front so that no other writes can be done while the operation runs
Prevents other transactions from modifying tables your current operation wants to access
Prevents deadlock and allows processing to continue by aborting one of the locking queries
Whether you need to wrap your UPDATE seat_db request in its own transaction depends on what other processing (DML) is being done before and after it.
You'll have to use transactions if your action involves multiple unrelated rows. For example, if the user has to pay for the ticket, then there will be at least two updates: update the user's credit and mark the seat as occupied. If any of the two updates were performed alone you'll definitely get into trouble.
Related
As part of the persistence process in one of my models an md5 check_sum of the entire record is generated and stored with the record. The md5 check_sum contains a flattened representation of the entire record including all EAV attributes etc. This makes preventing absolute duplicates very easy and efficient.
I am not using a unique index on this check_sum for a specific reason, I want this all to be silent, i.e. if a user submits a duplicate then the app just silently ignores it and returns the already existing record. This ensures backwards compatibility with legacy app's and api's.
I am using Laravel's eloquent. So once a record has been created and before committing the application does the following:
$taxonRecords = TaxonRecord::where('check_sum', $taxonRecord->check_sum)->get();
if ($taxonRecords->count() > 0) {
DB::rollBack();
return $taxonRecords->first();
}
However recently I encountered a 60,000/1 shot incident(odds based on record counts at that time). A single duplicate ended up in the database with the same check_sum. When I reviewed the logs I noticed that the creation time was identical down to the second. Further investigation of Apache logs showed a valid POST but the POST was duplicated. I presume the users browser malfunctioned or something but both POSTS arrived simultaneously resulting in two simultaneous transactions.
My question is how can I ensure that a transaction and its contained SELECT for the previous check_sum is Atomic & Isolated. Based upon my reading the answer lies in https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html and isolation levels.
If transaction A and transaction B arrive at the server at the same time then they should not run side by side but should wait for the first to complete.
You created a classic race condition. Both transactions are calculating the checksum while they're both in progress, not yet committed. Neither can read the other's data, since they're uncommitted. So they calculate that they're the only one with the same checksum, and they both go through and commit.
To solve this, you need to run such transactions serially, to be sure that there aren't other concurrent transactions submitting the same data.
You may have to use use GET_LOCK() before starting your transaction to calculate the checksum. Then RELEASE_LOCK() after you commit. That will make sure other concurrent requests wait for your data to be committed, so they will see it when they try to calculate their checksum.
I develop an online reservation system. To simplify let's say that users can book multiple items and each item can be booked only once. Items are first added to the shopping cart.
App uses MySql / InnoDB database. According to MySql documentation, default isolation level is Repeatable reads.
Here is the checkout procedure I've came up with so far:
Begin transaction
Select items in the shopping cart (with for update lock)
Records from cart-item and items tables are fetched at this step.
Check if items haven't been booked by anybody else
Basically check if quantity > 0. It's more complicated in the real application, thus I put it here as a separate step.
Update items, set quantity = 0
Also perform other essential database manipulations.
Make payment (via external api like PayPal or Stripe)
No user interaction is necessary as payment details can be collected before checkout.
If everything went fine commit transaction or rollback otherwise
Continue with non-essential logic
Send e-mail etc in case of success, redirect for error.
I am unsure if that is sufficient. I'm worried whether:
Other user that tries to book same item at the same time will be handled correcly. Will his transaction T2 wait until T1 is done?
Payment using PayPal or Stripe may take some time. Wouldn't this become a problem in terms of performance?
Items availability will be shown correctly all the time (items should be available until checkout succeeds). Should these read-only selects use shared lock?
Is it possible that MySql rollbacks transaction by itself? Is it generally better to retry automatically or display an error message and let user try again?
I guess its enough if I do SELECT ... FOR UPDATE on items table. This way both request caused by double click and other user will have to wait till transaction finishes. They'll wait because they also use FOR UPDATE. Meanwhile vanilla SELECT will just see a snapshot of db before the transaction, with no delay though, right?
If I use JOIN in SELECT ... FOR UPDATE, will records in both tables be locked?
I'm a bit confused about SELECT ... FOR UPDATE on non-existent rows section of Willem Renzema answer. When may it become important? Could you provide any example?
Here are some resources I've read:
How to deal with concurrent updates in databases?, MySQL: Transactions vs Locking Tables, Do database transactions prevent race conditions?,
Isolation (database systems), InnoDB Locking and Transaction Model, A beginner’s guide to database locking and the lost update phenomena.
Rewrote my original question to make it more general.
Added follow-up questions.
Begin transaction
Select items in shopping cart (with for update lock)
So far so good, this will at least prevent the user from doing checkout in multiple sessions (multiple times trying to checkout the same card - good to deal with double clicks.)
Check if items haven't been booked by other user
How do you check? With a standard SELECT or with a SELECT ... FOR UPDATE? Based on step 5, I'm guessing you are checking a reserved column on the item, or something similar.
The problem here is that the SELECT ... FOR UPDATE in step 2 is NOT going to apply the FOR UPDATE lock to everything else. It is only applying to what is SELECTed: the cart-item table. Based on the name, that is going to be a different record for each cart/user. This means that other transactions will NOT be blocked from proceeding.
Make payment
Update items marking them as reserved
If everything went fine commit transaction, rollback otherwise
Following the above, based on the information you've provided, you may end up with multiple people buying the same item, if you aren't using SELECT ... FOR UPDATE on step 3.
Suggested Solution
Begin transaction
SELECT ... FOR UPDATE the cart-item table.
This will lock a double click out from running. What you select here should be the some kind of "cart ordered" column. If you do this, a second transaction will pause here and wait for the first to finish, and then read the result what the first saved to the database.
Make sure to end the checkout process here if the cart-item table says it has already been ordered.
SELECT ... FOR UPDATE the table where you record if an item has been reserved.
This will lock OTHER carts/users from being able to read those items.
Based on the result, if the items are not reserved, continue:
UPDATE ... the table in step 3, marking the item as reserved. Do any other INSERTs and UPDATEs you need, as well.
Make payment. Issue a rollback if the payment service says the payment didn't work.
Record payment, if success.
Commit transaction
Make sure you don't do anything that might fail between steps 5 and 7 (like sending emails), else you may end up with them making a payment without it being recorded, in the event the transaction gets rolled back.
Step 3 is the important step with regards to making sure two (or more) people don't try to order the same item. If two people do try, the 2nd person will end up having their webpage "hang" while it processes the first. Then when the first finishes, the 2nd will read the "reserved" column, and you can return a message to the user that someone has already purchased that item.
Payment in transaction or not
This is subjective. Generally, you want to close transactions as quickly as possible, to avoid multiple people being locked out from interacting with the database at once.
However, in this case, you actually do want them to wait. It's just a matter of how long.
If you choose to commit the transaction before payment, you'll need to record your progress in some intermediate table, run the payment, and then record the result. Be aware that if the payment fails, you'll then have to manually undo the item reservation records that you updated.
SELECT ... FOR UPDATE on non-existent rows
Just a word of warning, in case your table design involves inserting rows where you need to earlier SELECT ... FOR UPDATE: If a row doesn't exist, that transaction will NOT cause other transactions to wait, if they also SELECT ... FOR UPDATE the same non-existent row.
So, make sure to always serialize your requests by doing a SELECT ... FOR UPDATE on a row that you know exists first. Then you can SELECT ... FOR UPDATE on the row that may or may not exist yet. (Don't try to do just a SELECT on the row that may or may not exist, as you'll be reading the state of the row at the time the transaction started, not at the moment you run the SELECT. So, SELECT ... FOR UPDATE on non-existent rows is still something you need to do in order to get the most up to date information, just be aware it will not cause other transactions to wait.)
1. Other user that tries to book same item at the same time will be handled correcly. Will his transaction T2 wait until T1 is done?
Yes. While active transaction keeps FOR UPDATE lock on a record, statements in other transactions that use any lock (SELECT ... FOR UPDATE, SELECT ... LOCK IN SHARE MODE, UPDATE, DELETE) will be suspended untill either active transaction commits or "Lock wait timeout" is exceeded.
2. Payment using PayPal or Stripe may take some time. Wouldn't this become a problem in terms of performance?
This will not be a problem, as this is exactly what is necessary. Checkout transactions should be executed sequentially, ie. latter checkout should not start before former finish.
3. Items availability will be shown correctly all the time (items should be available until checkout succeeds). Should these read-only selects use shared lock?
Repeatable reads isolation level ensures that changes made by a transaction are not visible until that transaction is commited. Therefore items availability will be displayed correctly. Nothing will be shown unavailable before it is actually paid for. No locks are necessary.
SELECT ... LOCK IN SHARE MODE would cause checkout transaction to wait until it is finished. This could slow down checkouts without giving any payoff.
4. Is it possible that MySql rollbacks transaction by itself? Is it generally better to retry automatically or display an error message and let user try again?
It is possible. Transaction may be rolled back when "Lock wait timeout" is exceeded or when deadlock happens. In that case it would be a good idea to retry it automatically.
By default suspended statements fail after 50s.
5. I guess its enough if I do SELECT ... FOR UPDATE on items table. This way both request caused by double click and other user will have to wait till transaction finishes. They'll wait because they also use FOR UPDATE. Meanwhile vanilla SELECT will just see a snapshot of db before the transaction, with no delay though, right?
Yes, SELECT ... FOR UPDATE on items table should be enough.
Yes, these selects wait, because FOR UPDATE is an exclusive lock.
Yes, simple SELECT will just grab value as it was before transaction started, this will happen immediately.
6. If I use JOIN in SELECT ... FOR UPDATE, will records in both tables be locked?
Yes, SELECT ... FOR UPDATE, SELECT ... LOCK IN SHARE MODE, UPDATE, DELETE lock all read records, so whatever we JOIN is included. See MySql Docs.
What's interesting (at least for me) everything that is scanned in the processing of the SQL statement gets locked, no matter wheter it is selected or not. For example WHERE id < 10 would lock also the record with id = 10!
If you have no indexes suitable for your statement and MySQL must scan the entire table to process the statement, every row of the table becomes locked, which in turn blocks all inserts by other users to the table. It is important to create good indexes so that your queries do not unnecessarily scan many rows.
I have a central database for handling user credit with multiple servers reads and writes to it. The application sits on top of these servers serve user requests by doing the following for each request:
1. check if user has enough credit for the task by reading from db.
2. perform the time consuming request
3. deduct a credit from user account, save the new credit count back to db.
the application uses the database's optimistic locking. So following might happen
1. request a comes in, see that user x has enough credit,
2. request b comes in, see that user x has enough credit,
3. a performs work
4. a saves the new credit count back to db
5. b performs work
6. b tries to save the new credit count back to db, application gets an exception and fails to account for this credit deduction.
With pessimistic locking, the application will need to explicitly get a lock on the user account to guarantee exclusive access, but this KILLs performance since the system have many concurrent requests.
so what would be a good new design for this credit system?
Here are two "locking" mechanisms at avoid using InnoDB's locking mechanism for either of two reasons:
A task that takes longer than you should spend in a BEGIN...COMMIT of InnoDB.
A task that ends in a different program (or different web page) than it started in.
Plan A. (This assumes the race condition is rare, and the time wasted for Step 2 is acceptable in those rare cases.)
(same) check if user has enough credit for the task by reading from db.
(same) perform the time consuming request
(added) START TRANSACTION;
(added) Again Check if the user has enough credit. (ROLLABCK and abort if not.)
(same as old #3) deduct a credit from user account, save the new credit count back to db.
(added) COMMIT;
START..COMMIT is InnoDB transaction stuff. If a race condition caused 'x' to not have credit by step 4, you will ROLLBACK and not perform steps 4 and 5.
Plan B. (This is more complex, but you might prefer it.)
Have a table Locks for locking. It contains user_id and a timestamp.
START TRANSACTION;
If user_id is in Locks, abort (ROLLBACK and exit).
INSERT INTO Locks the user_id and current_timestamp in Locks (thereby "locking" 'x').
COMMIT;
Perform the processing (original Steps 1,2,3)
DELETE FROM Locks WHERE user_id = 'x'; (autocommit=1 suffices here.)
A potential problem: If the processing dies in step 6, not getting around to releasing the lock, that user will be locked out forever. The 'solution' is to periodically check Locks for any timestamps that are 'very' old. If any are found, assume that the processing died, and DELETE the row(s).
You didn't state explicitly what you want to achieve, so I assume you don't want to perform the work just to realise it has been in vain due to low credit.
No-lock
Implement credit hold on step (1) and associate the work (2) and the deduction (3) with the hold. This way low credit user won't pass step (1).
Optimistic locking
As a collision is detected in optimistic locking post factum, I don't think it fits the assumption.
Pessimistic locking
It isn't possible to tell definitely without knowing the schema, but I think it's an exaggeration about killing performance. You can smartly incorporate MySQL InnoDB transaction isolation levels and locking reads at finer granularity than exclusively locking a user account completely. For instance, using SELECT ... LOCK IN SHARE MODE which sets shared locks and allows reads for other transactions.
Rick's caution about the tasking taking longer then MySQL will wait (innodb_lock_wait_timeout) applies here.
You want The Escrow Transactional Method.
You record the credit left after doling out some to each updating process and the credit doled out to (ie held in escrow for) them. A process retries until success a transaction that increases the credit doled out by what it needs and decreases the credit that is left by what it needs; it succeeds only if that would leave the credit left non-negative. Then it does its long calculation. Regardless of the calculation's success it then applies a transaction that decreases the credit doled out. But on success it also increases the assets while on failure it increases the credit left.
Use the timestamp/rowversion approach that you will find in all real database engines except MySQL.
You can emulate them with MySQL in this way. Have a TIMESTAMP column (updated) that gets updated whenever a row is updated. Select that column along with the rest of the data you require. Use the returned timestamp as a condition in your WHERE clause so that the row will only be updated when the timestamp is still the same as when you read the row.
UPDATE table SET col1 = value WHERE id = 1 AND updated = timestamp_value_read
Now when you run the update and the timestamps do not match no update will be performed. You can test for this by using rows affected, if zero rows were updated then you know that the row was modified between read and write. Handle that condition in your code whichever way is best for your application and your users.
Timestamp tutorial
When doing a transaction in a mysql db, they are talking about the ongoing transaction not being able to see any updates made by external sources until it commits. So does this mean that changes CAN be made but the transaction just will not be able to see them, or is it actually impossible to update the db while the transaction is going on.
Because I need it to be impossible for other queries to change anything about certain tables while the transaction is going. Right now I write lock all those tables, start a transaction for the atomicity, commit, and than unlock. Is this the way to do this?
From my testing it seems that setting the isolation level to SERIALIZABLE accomplishes the same as manual table locking and unlocking? Is this correct?
It's going to depend on the transaction isolation level you have set on your database. You can read more about the levels here. For example, for READ UNCOMMITTED, you can actually read rows that are uncommitted by another transaction. This is usually not what you want to happen.
Locking an entire table is a really extreme choice though, and should probably not be done unless there's no other choice. My recommendation would be to consider the rows you need to lock, and then you can lock those specific rows using a select for update statement.
For example, suppose you have a resources table and a schedules table that contains bookings for those resources. When booking a resource, you have to check the schedules table for a given resource to make sure it's available for the desired time. However, you have to do this is a concurrent way, that is, you want to ensure that between the time you check the schedules table for availability for the resource, and the time you actually insert the row into the schedules table, you want to ensure that some other transaction doesn't book the resource for the same time (or an overlapping time).
You can accomplish this by using a select for update command:
select * from resources where resource_name=’a’ for update;
Assuming you're doing this in a stored procedure, if some other code fires the stored procedure for the same resource, it will block on that statement. This will ensure that resources don't get double booked.
We could also accomplish this by locking the entire resources table. However, there's no need to do that since we're only interested in booking a single resource. So it's good enough to just lock the resource row we care about.
Note that for MySQL, you need to index the columns you use in the for update or it will lock the entire table.
The point to all this is to always consider maximum concurrency. In other words, don't lock more than you need to. Otherwise, you make the application much less scalable and you inhibit concurrency.
I have a table that is collecting data input by users.
I want to carry out the following SQL statements:
SELECT statement 1
SELECT statement 2
UPDATE table rows that I've read out in the 2 select statements
I want to guard against the possibility of another user inputting new data in between any of the statements.
I've read the MySQL manual and it seems that I could lock the tables first, but I'm more familiar with transactions and I would like to know if wrapping a transaction around the 3 statements would achieve what I want. I've found it quite hard to be certain this will work from reading the manuals (or maybe it's just me....)
There are two possible problem scopes here; transactions (if you're using an engine that supports transactions, like InnoDB) will solve one of them.
Transactions keep all of your queries operating on a snapshot of database state when the transaction was started, and any modifications are applied all-or-nothing when the transaction is completed. This effectively solves interleaving and race conditions with the queries.
However, you stated that you want to prevent a user inputting new data in between any of the statements. If this is a situation where you want to ensure that a user submitting a request is starting from current data, you'll need to implement your own locking mechanism, or at least a way to trap cases where interleaving between requests is causing an issue.
Basically, transactions will only help with queries running in concurrent requests. If this scenario would be a problem:
User1 requests data
User2 requests data
User1 submits modifications
User2 submits modifications
Where User2 was able to submit their changes without knowing about the changes made by User1, you need your own locking system; transactions aren't going to help. This is coming from a web development background where each step is a separate web request in a separate transaction.