How to properly use transactions and locks to ensure database integrity? - mysql

I develop an online reservation system. To simplify let's say that users can book multiple items and each item can be booked only once. Items are first added to the shopping cart.
App uses MySql / InnoDB database. According to MySql documentation, default isolation level is Repeatable reads.
Here is the checkout procedure I've came up with so far:
Begin transaction
Select items in the shopping cart (with for update lock)
Records from cart-item and items tables are fetched at this step.
Check if items haven't been booked by anybody else
Basically check if quantity > 0. It's more complicated in the real application, thus I put it here as a separate step.
Update items, set quantity = 0
Also perform other essential database manipulations.
Make payment (via external api like PayPal or Stripe)
No user interaction is necessary as payment details can be collected before checkout.
If everything went fine commit transaction or rollback otherwise
Continue with non-essential logic
Send e-mail etc in case of success, redirect for error.
I am unsure if that is sufficient. I'm worried whether:
Other user that tries to book same item at the same time will be handled correcly. Will his transaction T2 wait until T1 is done?
Payment using PayPal or Stripe may take some time. Wouldn't this become a problem in terms of performance?
Items availability will be shown correctly all the time (items should be available until checkout succeeds). Should these read-only selects use shared lock?
Is it possible that MySql rollbacks transaction by itself? Is it generally better to retry automatically or display an error message and let user try again?
I guess its enough if I do SELECT ... FOR UPDATE on items table. This way both request caused by double click and other user will have to wait till transaction finishes. They'll wait because they also use FOR UPDATE. Meanwhile vanilla SELECT will just see a snapshot of db before the transaction, with no delay though, right?
If I use JOIN in SELECT ... FOR UPDATE, will records in both tables be locked?
I'm a bit confused about SELECT ... FOR UPDATE on non-existent rows section of Willem Renzema answer. When may it become important? Could you provide any example?
Here are some resources I've read:
How to deal with concurrent updates in databases?, MySQL: Transactions vs Locking Tables, Do database transactions prevent race conditions?,
Isolation (database systems), InnoDB Locking and Transaction Model, A beginner’s guide to database locking and the lost update phenomena.
Rewrote my original question to make it more general.
Added follow-up questions.

Begin transaction
Select items in shopping cart (with for update lock)
So far so good, this will at least prevent the user from doing checkout in multiple sessions (multiple times trying to checkout the same card - good to deal with double clicks.)
Check if items haven't been booked by other user
How do you check? With a standard SELECT or with a SELECT ... FOR UPDATE? Based on step 5, I'm guessing you are checking a reserved column on the item, or something similar.
The problem here is that the SELECT ... FOR UPDATE in step 2 is NOT going to apply the FOR UPDATE lock to everything else. It is only applying to what is SELECTed: the cart-item table. Based on the name, that is going to be a different record for each cart/user. This means that other transactions will NOT be blocked from proceeding.
Make payment
Update items marking them as reserved
If everything went fine commit transaction, rollback otherwise
Following the above, based on the information you've provided, you may end up with multiple people buying the same item, if you aren't using SELECT ... FOR UPDATE on step 3.
Suggested Solution
Begin transaction
SELECT ... FOR UPDATE the cart-item table.
This will lock a double click out from running. What you select here should be the some kind of "cart ordered" column. If you do this, a second transaction will pause here and wait for the first to finish, and then read the result what the first saved to the database.
Make sure to end the checkout process here if the cart-item table says it has already been ordered.
SELECT ... FOR UPDATE the table where you record if an item has been reserved.
This will lock OTHER carts/users from being able to read those items.
Based on the result, if the items are not reserved, continue:
UPDATE ... the table in step 3, marking the item as reserved. Do any other INSERTs and UPDATEs you need, as well.
Make payment. Issue a rollback if the payment service says the payment didn't work.
Record payment, if success.
Commit transaction
Make sure you don't do anything that might fail between steps 5 and 7 (like sending emails), else you may end up with them making a payment without it being recorded, in the event the transaction gets rolled back.
Step 3 is the important step with regards to making sure two (or more) people don't try to order the same item. If two people do try, the 2nd person will end up having their webpage "hang" while it processes the first. Then when the first finishes, the 2nd will read the "reserved" column, and you can return a message to the user that someone has already purchased that item.
Payment in transaction or not
This is subjective. Generally, you want to close transactions as quickly as possible, to avoid multiple people being locked out from interacting with the database at once.
However, in this case, you actually do want them to wait. It's just a matter of how long.
If you choose to commit the transaction before payment, you'll need to record your progress in some intermediate table, run the payment, and then record the result. Be aware that if the payment fails, you'll then have to manually undo the item reservation records that you updated.
SELECT ... FOR UPDATE on non-existent rows
Just a word of warning, in case your table design involves inserting rows where you need to earlier SELECT ... FOR UPDATE: If a row doesn't exist, that transaction will NOT cause other transactions to wait, if they also SELECT ... FOR UPDATE the same non-existent row.
So, make sure to always serialize your requests by doing a SELECT ... FOR UPDATE on a row that you know exists first. Then you can SELECT ... FOR UPDATE on the row that may or may not exist yet. (Don't try to do just a SELECT on the row that may or may not exist, as you'll be reading the state of the row at the time the transaction started, not at the moment you run the SELECT. So, SELECT ... FOR UPDATE on non-existent rows is still something you need to do in order to get the most up to date information, just be aware it will not cause other transactions to wait.)

1. Other user that tries to book same item at the same time will be handled correcly. Will his transaction T2 wait until T1 is done?
Yes. While active transaction keeps FOR UPDATE lock on a record, statements in other transactions that use any lock (SELECT ... FOR UPDATE, SELECT ... LOCK IN SHARE MODE, UPDATE, DELETE) will be suspended untill either active transaction commits or "Lock wait timeout" is exceeded.
2. Payment using PayPal or Stripe may take some time. Wouldn't this become a problem in terms of performance?
This will not be a problem, as this is exactly what is necessary. Checkout transactions should be executed sequentially, ie. latter checkout should not start before former finish.
3. Items availability will be shown correctly all the time (items should be available until checkout succeeds). Should these read-only selects use shared lock?
Repeatable reads isolation level ensures that changes made by a transaction are not visible until that transaction is commited. Therefore items availability will be displayed correctly. Nothing will be shown unavailable before it is actually paid for. No locks are necessary.
SELECT ... LOCK IN SHARE MODE would cause checkout transaction to wait until it is finished. This could slow down checkouts without giving any payoff.
4. Is it possible that MySql rollbacks transaction by itself? Is it generally better to retry automatically or display an error message and let user try again?
It is possible. Transaction may be rolled back when "Lock wait timeout" is exceeded or when deadlock happens. In that case it would be a good idea to retry it automatically.
By default suspended statements fail after 50s.
5. I guess its enough if I do SELECT ... FOR UPDATE on items table. This way both request caused by double click and other user will have to wait till transaction finishes. They'll wait because they also use FOR UPDATE. Meanwhile vanilla SELECT will just see a snapshot of db before the transaction, with no delay though, right?
Yes, SELECT ... FOR UPDATE on items table should be enough.
Yes, these selects wait, because FOR UPDATE is an exclusive lock.
Yes, simple SELECT will just grab value as it was before transaction started, this will happen immediately.
6. If I use JOIN in SELECT ... FOR UPDATE, will records in both tables be locked?
Yes, SELECT ... FOR UPDATE, SELECT ... LOCK IN SHARE MODE, UPDATE, DELETE lock all read records, so whatever we JOIN is included. See MySql Docs.
What's interesting (at least for me) everything that is scanned in the processing of the SQL statement gets locked, no matter wheter it is selected or not. For example WHERE id < 10 would lock also the record with id = 10!
If you have no indexes suitable for your statement and MySQL must scan the entire table to process the statement, every row of the table becomes locked, which in turn blocks all inserts by other users to the table. It is important to create good indexes so that your queries do not unnecessarily scan many rows.

Related

SQL: why we need transaction for ticket booking?

I have read that transaction is usually used in movie ticket booking website, to solve concurrent purchase problem. However, I failed to understand why is it necessary.
If at the same time, 2 users book the same seat (ID = 1) on the same show (ID = 99), can't you simply issue the following SQL command?
UPDATE seat_db
SET takenByUserID=someUserId
WHERE showID=99 AND seatID=1 AND takenByUserID IS NOT NULL
As I can see, this SQL is already been executed atomically, there's no concurrency issue. The database will set seat ID=1 to 1st user of which the server receives the request, then let the 2nd user's request fail. So, why is transaction still needed for ticket booking system?
When you batch all of your DML statements into a single transaction typically you are telling the database a couple things:
Make this operation (i.e. book movie ticket) an all-or-nothing operation
Ensure you don't leave any orphan rows and have consistent data
Lock all the associated tables up-front so that no other writes can be done while the operation runs
Prevents other transactions from modifying tables your current operation wants to access
Prevents deadlock and allows processing to continue by aborting one of the locking queries
Whether you need to wrap your UPDATE seat_db request in its own transaction depends on what other processing (DML) is being done before and after it.
You'll have to use transactions if your action involves multiple unrelated rows. For example, if the user has to pay for the ticket, then there will be at least two updates: update the user's credit and mark the seat as occupied. If any of the two updates were performed alone you'll definitely get into trouble.

MariaDB. Use Transaction Rollback without locking tables

On a website, when a user posts a comment I do several queries, Inserts and Updates. (On MariaDB 10.1.29)
I use START TRANSACTION so if any query fails at any given point I can easily do a rollback and delete all changes.
Now I noticed that this locks the tables when I do an INSERT from an other INSERT, and I'm not talking while the query is running, that’s obvious, but until the transaction is not closed.
Then DELETE is only locked if they share a common index key (comments for the same page), but luckily UPDATE is no locked.
Can I do any Transaction that does not lock the table from new inserts (while the transaction is ongoing, not the actual query), or any other method that lets me conveniently "undo" any query done after some point?
PD:
I start Transaction with PHPs function mysqli_begin_transaction() without any of the flags, and then mysqli_commit().
I don't think that a simple INSERT would block other inserts for longer than the insert time. AUTO_INC locks are not held for the full transaction time.
But if two transactions try to UPDATE the same row like in the following statement (two replies to the same comment)
UPDATE comment SET replies=replies+1 WHERE com_id = ?
the second one will have to wait until the first one is committed. You need that lock to keep the count (replies) consistent.
I think all you can do is to keep the transaction time as short as possible. For example you can prepare all statements before you start the transaction. But that is a matter of milliseconds. If you transfer files and it can take 40 seconds, then you shouldn't do that while the database transaction is open. Transfer the files before you start the transaction and save them with a name that indicates that the operation is not complete. You can also save them in a different folder but on the same partition. Then when you run the transaction, you just need to rename the files, which should not take much time. From time to time you can clean-up and remove unrenamed files.
All write operations work in similar ways -- They lock the rows that they touch (or might touch) from the time the statement is executed until the transaction is closed via either COMMIT or ROLLBACK. SELECT...FOR UPDATE and SELECT...WITH SHARED LOCK also get involved.
When a write operation occurs, deadlock checking is done.
In some situations, there is "gap" locking. Did com_id happen to be the last id in the table?
Did you leave out any SELECTs that needed FOR UPDATE?

MySQL InnoDB: Difference Between `FOR UPDATE` and `LOCK IN SHARE MODE`

What is the exact difference between the two locking read clauses:
SELECT ... FOR UPDATE
and
SELECT ... LOCK IN SHARE MODE
And why would you need to use one over the other?
I have been trying to understand the difference between the two. I'll document what I have found in hopes it'll be useful to the next person.
Both LOCK IN SHARE MODE and FOR UPDATE ensure no other transaction can update the rows that are selected. The difference between the two is in how they treat locks while reading data.
LOCK IN SHARE MODE does not prevent another transaction from reading the same row that was locked.
FOR UPDATE prevents other locking reads of the same row (non-locking reads can still read that row; LOCK IN SHARE MODE and FOR UPDATE are locking reads).
This matters in cases like updating counters, where you read value in 1 statement and update the value in another. Here using LOCK IN SHARE MODE will allow 2 transactions to read the same initial value. So if the counter was incremented by 1 by both transactions, the ending count might increase only by 1 - since both transactions initially read the same value.
Using FOR UPDATE would have locked the 2nd transaction from reading the value till the first one is done. This will ensure the counter is incremented by 2.
For Update --- You're informing Mysql that the selected rows can be updated in the next steps(before the end of this transaction) ,,so that mysql does'nt grant any read locks on the same set of rows to any other transaction at that moment. The other transaction(whether for read/write )should wait until the first transaction is finished.
For Share- Indicates to Mysql that you're selecting the rows from the table only for reading purpose and not to modify before the end of transaction. Any number of transactions can access read lock on the rows.
Note: There are chances of getting a deadlock if this statement( For update, For share) is not properly used.
Either way the integrity of your data will be guaranteed, it's just a question of how the database guarantees it. Does it do so by raising runtime errors when transactions conflict with each other (i.e. FOR SHARE), or does it do so by serializing any transactions that would conflict with each other (i.e. FOR UPDATE)?
FOR SHARE (a.k.a. LOCK IN SHARE MODE): Transactions face a higher probability of failure due to deadlock, because they delay blocking until the moment an update statement is received (at which point they either block until all readlocks are released, or fail due to deadlock if another write is in progress). However, only one client blocks and eventually succeeds: the other clients will fail with deadlock if they try to update, so only one of them will succeed and the rest will have to retry their transactions.
FOR UPDATE: Transactions won't fail due to deadlock, because they won't be allowed to run concurrently. This may be desirable for example because it makes it easier to reason about multi-threading if all updates are serialized across all clients. However, it limits the concurrency you can achieve because all other transactions block until the first transaction is finished.
Pro-Tip: As an exercise I recommend taking some time to play with a local test database and a couple mysql clients on the command line to prove this behavior for yourself. That is how I eventually understood the difference myself, because it can be very abstract until you see it in action.

optimistic locking user credit management

I have a central database for handling user credit with multiple servers reads and writes to it. The application sits on top of these servers serve user requests by doing the following for each request:
1. check if user has enough credit for the task by reading from db.
2. perform the time consuming request
3. deduct a credit from user account, save the new credit count back to db.
the application uses the database's optimistic locking. So following might happen
1. request a comes in, see that user x has enough credit,
2. request b comes in, see that user x has enough credit,
3. a performs work
4. a saves the new credit count back to db
5. b performs work
6. b tries to save the new credit count back to db, application gets an exception and fails to account for this credit deduction.
With pessimistic locking, the application will need to explicitly get a lock on the user account to guarantee exclusive access, but this KILLs performance since the system have many concurrent requests.
so what would be a good new design for this credit system?
Here are two "locking" mechanisms at avoid using InnoDB's locking mechanism for either of two reasons:
A task that takes longer than you should spend in a BEGIN...COMMIT of InnoDB.
A task that ends in a different program (or different web page) than it started in.
Plan A. (This assumes the race condition is rare, and the time wasted for Step 2 is acceptable in those rare cases.)
(same) check if user has enough credit for the task by reading from db.
(same) perform the time consuming request
(added) START TRANSACTION;
(added) Again Check if the user has enough credit. (ROLLABCK and abort if not.)
(same as old #3) deduct a credit from user account, save the new credit count back to db.
(added) COMMIT;
START..COMMIT is InnoDB transaction stuff. If a race condition caused 'x' to not have credit by step 4, you will ROLLBACK and not perform steps 4 and 5.
Plan B. (This is more complex, but you might prefer it.)
Have a table Locks for locking. It contains user_id and a timestamp.
START TRANSACTION;
If user_id is in Locks, abort (ROLLBACK and exit).
INSERT INTO Locks the user_id and current_timestamp in Locks (thereby "locking" 'x').
COMMIT;
Perform the processing (original Steps 1,2,3)
DELETE FROM Locks WHERE user_id = 'x'; (autocommit=1 suffices here.)
A potential problem: If the processing dies in step 6, not getting around to releasing the lock, that user will be locked out forever. The 'solution' is to periodically check Locks for any timestamps that are 'very' old. If any are found, assume that the processing died, and DELETE the row(s).
You didn't state explicitly what you want to achieve, so I assume you don't want to perform the work just to realise it has been in vain due to low credit.
No-lock
Implement credit hold on step (1) and associate the work (2) and the deduction (3) with the hold. This way low credit user won't pass step (1).
Optimistic locking
As a collision is detected in optimistic locking post factum, I don't think it fits the assumption.
Pessimistic locking
It isn't possible to tell definitely without knowing the schema, but I think it's an exaggeration about killing performance. You can smartly incorporate MySQL InnoDB transaction isolation levels and locking reads at finer granularity than exclusively locking a user account completely. For instance, using SELECT ... LOCK IN SHARE MODE which sets shared locks and allows reads for other transactions.
Rick's caution about the tasking taking longer then MySQL will wait (innodb_lock_wait_timeout) applies here.
You want The Escrow Transactional Method.
You record the credit left after doling out some to each updating process and the credit doled out to (ie held in escrow for) them. A process retries until success a transaction that increases the credit doled out by what it needs and decreases the credit that is left by what it needs; it succeeds only if that would leave the credit left non-negative. Then it does its long calculation. Regardless of the calculation's success it then applies a transaction that decreases the credit doled out. But on success it also increases the assets while on failure it increases the credit left.
Use the timestamp/rowversion approach that you will find in all real database engines except MySQL.
You can emulate them with MySQL in this way. Have a TIMESTAMP column (updated) that gets updated whenever a row is updated. Select that column along with the rest of the data you require. Use the returned timestamp as a condition in your WHERE clause so that the row will only be updated when the timestamp is still the same as when you read the row.
UPDATE table SET col1 = value WHERE id = 1 AND updated = timestamp_value_read
Now when you run the update and the timestamps do not match no update will be performed. You can test for this by using rows affected, if zero rows were updated then you know that the row was modified between read and write. Handle that condition in your code whichever way is best for your application and your users.
Timestamp tutorial

While in a transaction, how can reads to an affected row be prevented until the transaction is done?

I'm fairly sure this has a simple solution, but I haven't been able to find it so far. Provided an InnoDB MySQL database with the isolation level set to SERIALIZABLE, and given the following operation:
BEGIN WORK;
SELECT * FROM users WHERE userID=1;
UPDATE users SET credits=100 WHERE userID=1;
COMMIT;
I would like to make sure that as soon as the select inside the transaction is issued, the row corresponding to userID=1 is locked for reads until the transaction is done. As it stands now, UPDATEs to this row will wait for the transaction to be finished if it is in process, but SELECTs simply will read the previous value. I understand this is the expected behaviour in this case, but I wonder if there is a way to lock the row in such a way that SELECTs will also wait until the transaction is finished to return the values?
The reason I'm looking for that is that at some point, and with enough concurrent users, it could happen that while the previous transaction is in process someone else reads the "credits" to calculate something else. Ideally the code run by that someone else should wait for the transaction to finish to use the new value, because otherwise it could lead to irreversible desync issues.
Note that I don't want to lock the entire table for reads, just the specific row.
Also, I could add a boolean "locked" field to the tables and set it to 1 every time I'm starting a transaction but I don't really feel this is the most elegant solution here, unless there is absolutely no other way to handle this through mysql directly.
I found a workaround, specifically:
SELECT ... LOCK IN SHARE MODE sets a shared mode lock on the rows
read. A shared mode lock enables other sessions to read the rows but
not to modify them. The rows read are the latest available, so if they
belong to another transaction that has not yet committed, the read
blocks until that transaction ends.
(Source)
It seems that one can just include LOCK IN SHARE MODE in the critical SELECT statements that rely on transactional data and they will indeed wait for current transactions to finish before retrieving the row/s. For this to work the transaction has to use FOR UPDATE explicitly (as opposed to the original example I gave). E.g., given the following:
BEGIN WORK;
SELECT * FROM users WHERE userID=1 FOR UPDATE;
UPDATE users SET credits=100 WHERE userID=1;
COMMIT;
Anywhere else in the code I could use:
SELECT * FROM users WHERE userID=1 LOCK IN SHARE MODE;
Since this statement is not wrapped in a transaction, the lock is released immediately, thus having no impacts in subsequent queries, but if the row involving userID=1 has been selected for update within a transaction this statement would wait until the transaction is done, which is exactly what I was looking for.
You could try the SELECT ... FOR UPDATE locking read.
A SELECT ... FOR UPDATE reads the latest available data, setting exclusive locks on each row it reads. Thus, it sets the same locks a searched SQL UPDATE would set on the rows.
Please go through the following site: http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html