Concurrent writes to MySQL and testing solutions

I was practicing some "system design" coding questions and I was interested in how to solve a concurrency problem in MySQL. The problem was "design an inventory checkout system".
Let's say you are trying to check out a specific item from an inventory, a library book for instance.
If two people are on the website, looking to book it, is it possible that they both check it out? Let's assume the query is updating the status of the row to mark a boolean checked_out to True.
Would transactions solve this issue? Would they cause the second query that runs to fail (assuming both users issue the same query)?
Alternatively, we insert rows into a checkouts table. Since both queries read that the item is not checked out currently, they could both insert into the table. I don't think a transaction would solve this, unless the transaction includes reading the table to see if a checkout currently exists for this item that hasn't yet ended.
How would I simulate two writes at the exact same time to test this?

No, transactions alone do not address concurrency issues. Let's quickly revisit MySQL's definition of transactions:
Transactions are atomic units of work that can be committed or rolled back. When a transaction makes multiple changes to the database, either all the changes succeed when the transaction is committed, or all the changes are undone when the transaction is rolled back.
To sum it up: transactions are a way to ensure data integrity.
RDBMSs use various types of locking, isolation levels, and storage-engine-level solutions to address concurrency. People often mistake transactions for a means to control concurrency, because transactions affect how long certain locks are held.
Focusing on InnoDB: when you issue an update statement, MySQL places an exclusive lock on the record being updated. Only the transaction holding the exclusive lock can modify the given record; the others have to wait until that transaction is committed.
How does this help you prevent multiple users from checking out the same book? Let's say you have an id field uniquely identifying the books and a checked_out field indicating the status of the book.
You can use the following atomic update to check out a book:
update books set checked_out=1 where id=xxx and checked_out=0
The checked_out=0 criteria makes sure that the update only succeeds if the book is not checked out yet. So, if the above statement affects a row, then the current user checks out the book. If it does not affect any rows, then someone else has already checked out the book. The exclusive lock makes sure that only one transaction can update the record at any given time, thus serializing the access to that record.
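You can watch this serialization happen by hand, which also answers the "how do I simulate two writes" part of the question: open two mysql client sessions and interleave the statements yourself (a sketch, using the books schema above):
-- Session 1:
START TRANSACTION;
UPDATE books SET checked_out=1 WHERE id=42 AND checked_out=0;
-- Reports 1 row affected; do not COMMIT yet.

-- Session 2 (blocks on the row lock held by session 1):
UPDATE books SET checked_out=1 WHERE id=42 AND checked_out=0;

-- Session 1:
COMMIT;
-- Session 2 now unblocks and reports 0 rows affected:
-- the book was already checked out.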
If you want to use a separate checkouts table for reserving books, then you can use a unique index on book ids to prevent the same book being checked out more than once.
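A minimal sketch of that idea (table and column names are illustrative): the unique index guarantees that at most one checkout row can exist per book, so whichever of two concurrent sessions inserts second gets a duplicate-key error instead of a second checkout.
CREATE TABLE checkouts (
  book_id INT NOT NULL,
  user_id INT NOT NULL,
  checked_out_at DATETIME NOT NULL,
  UNIQUE KEY uniq_book (book_id)
) ENGINE=InnoDB;

INSERT INTO checkouts (book_id, user_id, checked_out_at)
VALUES (42, 7, NOW());
-- A second concurrent INSERT for book_id 42 fails with ER_DUP_ENTRY.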

Transactions don't cause updates to fail; they cause conflicting sequences of queries to be serialized. Only one accessor at a time can run the sequence of queries; the others wait.
Everything in SQL is a transaction, single-statement update operations included. The kind of transaction denoted by START TRANSACTION; ... COMMIT; bundles a series of queries together.
I don't think a transaction would solve this, unless the transaction includes reading the table to see if a checkout currently exists for this item.
That's generally correct. Checkout schemes must always read availability from the database. The purpose of the transaction is to avoid race conditions when multiple users attempt to check out the same item.
SQL doesn't have thread-safe atomic test-and-set instructions like multithreaded processor cores have. So you need to use transactions for this kind of thing.
The simplest form of checkout uses a transaction, something like this.
START TRANSACTION;
SELECT is_item_available, id FROM item WHERE catalog_number = whatever FOR UPDATE;
/* if the item is not available, tell the user and commit the transaction without updating */
UPDATE item SET is_item_available = 0 WHERE id = itemIdPreviouslySelected;
/* tell the user the checkout succeeded. */
COMMIT;
It's clearly possible for two or more users to attempt to check out the same item more-or-less simultaneously. But only one of them actually gets the item.
A more complex checkout scheme, not detailed here, uses a two-step system. First step: a transaction to reserve the item for a user, rejecting the reservation if someone else has it checked out or reserved. Second step: reservation holder has a fixed amount of time to accept the reservation and check out the item, or the reservation expires and some other user may reserve the item.
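A sketch of the reservation step under such a scheme (the reserved_by and reserved_until columns are hypothetical additions to the item table): the atomic UPDATE claims the item only if it is available and either unreserved or the previous reservation has lapsed.
UPDATE item
SET reserved_by = 123,
    reserved_until = NOW() + INTERVAL 15 MINUTE
WHERE catalog_number = 42
  AND is_item_available = 1
  AND (reserved_by IS NULL OR reserved_until < NOW());
-- 1 row affected = reservation obtained; 0 = someone else holds it.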

Related

Using table locking to prevent multiple users from updating at a given time

I am building a simple shopping cart. Currently, to ensure that a customer can never purchase a product that is out of stock, when processing the order I have a loop for each product in their cart:
-- Begin a transaction --
For each product in the cart:
    Select the stock count from the products table
    If it is in stock:
        Reduce the stock count for the product
        Add the product to the order items table
    Otherwise:
        Roll back and return an error
-- If nothing called for a rollback, everything ends with a commit --
However, if the stock count for a product is updated by another session AFTER the loop has checked that particular product, there may be inconsistencies.
Question: would it be a good idea to lock the table from writes whenever I am processing an order, so that when the 'loop' above occurs, I can be assured that no one else is able to alter the product count and it will always be accurate?
The idea is that the product count/availability will always be consistent, and there will never be an instance where the stock count goes to -1 (which would be unfulfillable).
However, I have seen so many posts on locks being inefficient/having bad effects. If so, what is the best way to accomplish this?
I have seen alternatives like handling it in an update + select query, but have seen that it may also not be suitable in some cases.
You have at least three strategies:
1. Pessimistic Locking
If your application will experience low activity then you can lock the tables (or single rows) to make sure no other thread changes the values during the processing of a purchase. It works, but it has performance limitations.
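A sketch of the pessimistic variant for the cart loop (table and column names are illustrative):
START TRANSACTION;

-- Lock the product row; other sessions that try to update it
-- (or read it FOR UPDATE) wait until we commit.
SELECT stock FROM products WHERE id = 42 FOR UPDATE;

-- If the application sees stock > 0:
UPDATE products SET stock = stock - 1 WHERE id = 42;
INSERT INTO order_items (order_id, product_id) VALUES (1001, 42);

COMMIT;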
2. Optimistic Locking
If your application/web site must serve a high load then you can opt for the "optimistic locking" strategy. In this case you add a version number column to your critical tables and then you use it when reading/writing it.
When updating the stock, you require that the version number is still the same one you read. If it is not (another thread modified the row in the meantime), you roll back the transaction and retry, possibly a couple of times, until you succeed.
It requires more development effort, since you need to identify the failure case and implement retry logic (if you want to).
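A sketch of the optimistic pattern in SQL, assuming a version column has been added to products:
-- Earlier, the transaction read: stock = 5, version = 7.

UPDATE products
SET stock = stock - 1,
    version = version + 1
WHERE id = 42
  AND version = 7;   -- the version we read
-- 0 rows affected means another thread got there first: roll back and retry.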
3. Processing Queues
You can implement processing queues. When a thread wants to "purchase an order" it can submit it to a processing queue for purchase orders. This queue can be implemented by one or more threads dedicated to this task; if you choose multiple threads they can be divided by order types, regions, categories, etc. to distribute the load.
This requires more programming effort since you need to manage asynchronous processing, but can sustain much higher levels of load.
You can use this strategy for multiple different tasks: purchasing orders, refilling stock, sending notifications, processing promotions, etc.

SQL row locks and transactions

I’m really new to relational databases. I’m working on a project that involves finances and so I want any actions that affect balance not to occur at the same time and I want to achieve that using locks, however I’m not sure how to use them. Vision I have now:
I want to have a separate table for each action and a balance field in the users table, the value of which would be derived from all the relevant tables. That being said, I'm never actually going to update existing records, only adding them. I want to make sure only one record for each user is being inserted at a time in these tables. For instance: 3 transactions occur at the same time, and so 3 records are about to be added to the relevant tables. Two of the records have the same userid, a foreign key to my users table, and the other one has a different one. I want my records with the same foreign keys to be pipelined, and the other one can be done whenever. How do I achieve this? Are there any better ways to approach this?
I want any actions that affect balance not to occur at the same time
Why?
I want to achieve that using locks
Why?
To give you a counter-example: let's say you want to avoid having negative account balances. When a user withdraws $500, how can you model that without locks?
UPDATE accounts
SET balance = balance - 500
WHERE accountholderid = 42
AND balance >= 500
This works without any explicit locks and is safe for concurrent access. You will have to check the update count; if it is 0, the withdrawal would have overdrawn the account and must be rejected.
(I'm aware MySQL will still acquire a row lock)
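If you want to check the update count in SQL itself rather than through the client API, MySQL exposes it via ROW_COUNT() (a sketch):
UPDATE accounts
SET balance = balance - 500
WHERE accountholderid = 42
  AND balance >= 500;

SELECT ROW_COUNT();
-- 1 = withdrawal succeeded; 0 = it would have overdrawn the account.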
It still makes sense to have a ledger but even there the need for locks is not obvious to me.
Use ENGINE=InnoDB for all your tables.
Use transactions:
BEGIN;
do all the work for a single action
COMMIT;
The classic example of a single action is to remove money from one account and add it to another account. The removing would include a check for overdraft, in which case you would have code to ROLLBACK instead of COMMIT.
The locks you get assure that everything for the single action is either completely done, or nothing at all is done. This even applies if the system crashes between the BEGIN and COMMIT.
Without BEGIN and COMMIT, but with autocommit=ON, each statement is implicitly surrounded by begin and commit. That is, the UPDATE example in the previous answer is 'atomic' on its own. However, if the money deducted from the one account needs to be added to another account, what will happen if a crash occurs just after the UPDATE? The money vanishes. So, you really need
BEGIN;
if not enough funds, ROLLBACK and exit
UPDATE to take money from one account
UPDATE to add that money to another account
INSERT into some log or audit trail to track all transactions
COMMIT;
Check after each step -- ROLLBACK and take evasive action on any unexpected error.
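A concrete sketch of that flow, assuming an accounts table with id and balance columns and a hypothetical transfers audit table:
START TRANSACTION;

-- Take the money only if the funds are there.
UPDATE accounts SET balance = balance - 500
WHERE id = 1 AND balance >= 500;
-- If this affected 0 rows: ROLLBACK and exit.

-- Add that money to the other account.
UPDATE accounts SET balance = balance + 500
WHERE id = 2;

-- Audit trail.
INSERT INTO transfers (from_id, to_id, amount, created_at)
VALUES (1, 2, 500, NOW());

COMMIT;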
What happens if 2 (or more) actions happen at the "same time"? Either one waits for the other, or there is a deadlock and a ROLLBACK is forced on one of them. But in no case will the data be messed up.
A further note... In some cases you need FOR UPDATE:
BEGIN;
SELECT some stuff from a row FOR UPDATE;
test the stuff, such as account balance
UPDATE that same row;
COMMIT;
The FOR UPDATE says to other threads "Keep your hands off this row, I'm likely to change it; please wait until I am finished." Without FOR UPDATE, another thread could sneak in and drain the account of the money you thought was there.
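Concretely, for the account example (names are illustrative):
START TRANSACTION;

-- The row stays locked against other writers (and FOR UPDATE readers)
-- until we commit.
SELECT balance FROM accounts WHERE id = 42 FOR UPDATE;

-- Application code checks the balance, then:
UPDATE accounts SET balance = balance - 500 WHERE id = 42;

COMMIT;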
Comments on some of your thoughts:
One table is usually sufficient for many users and their accounts. It would contain the "current" balance for each account. I mentioned a "log"; that would be a separate table; it would contain the "history" (as opposed to just the "current" info).
FOREIGN KEYs are mostly irrelevant in this discussion. They serve 2 purposes: Verify that another table has a row that should be there; and implicitly create an INDEX to make that check faster.
Pipelining? If you are not doing more than a hundred 'transactions' per second, the BEGIN..COMMIT logic is all you need to worry about.
"Same time" and "simultaneously" are misused terms. It is very unlikely that two users will hit the database at the "same time" -- consider browser delays, network delays, OS delays, etc. Plus the fact that most of those steps force activity to go single-file. The network forces one message to get there before another. Meanwhile, if one of your 'transactions' takes 0.01 second, who cares if the "simultaneous" request has to wait for it to finish. The point is that what I described will force the "wait" if needed to avoid messing up the data.
All that said, there still can be some "at the same time" -- If transactions don't touch the same rows, then the few milliseconds it takes from BEGIN to COMMIT could overlap. Consider this timeline of two transactions that came in almost simultaneously:
BEGIN;                       -- A
pull money from Alice        -- A
BEGIN;                       -- B
pull money from Bobby        -- B
give Alice's money to Alan   -- A
give Bobby's money to Betty  -- B
COMMIT;                      -- A
COMMIT;                      -- B

MySQL transactions happening at the same time

Can two transactions occur at the same time? Let's say you have transactions A and B, each of which will perform a read to get the max value of some column then a write to insert a new row with that max+1. Is it possible that A performs a read to get the max, then B performs a read before A writes, causing both transactions to write the same value to the column?
Doing this with the isolation level set to disallow uncommitted reads seems to prevent duplicates, but I can't wrap my head around why.
Can two transactions occur at the same time?
Yes, that is quite possible; in fact, every RDBMS supports this out of the box to speed things up. Think about an application accessed by thousands of users simultaneously: if everything ran in sequence, users might have to wait days to get a response.
Let's say you have transactions A and B, each of which will perform a read to get the max value of some column then a write to insert a new row with that max+1. Is it possible that A performs a read to get the max, then B performs a read before A writes, causing both transactions to write the same value to the column?
If A and B happen in two different sessions, that is quite a possible case.
Doing this with the isolation level set to disallow uncommitted reads seems to prevent duplicates, but I can't wrap my head around why?
Your requirement, safely getting the next number in a sequence, is quite common, and here you need to instruct the database to make the read mutually exclusive with the write that follows. You can do that by setting the isolation level, or by locking the row you read, as in the solution below.
If getting the next number is your only problem and you have no other constraints, then MySQL's AUTO_INCREMENT would be the best-suited answer for you.
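For reference, the AUTO_INCREMENT route needs no locking code at all (a minimal sketch; table and column names are illustrative):
CREATE TABLE tickets (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  created_at DATETIME NOT NULL
) ENGINE=InnoDB;

-- Concurrent INSERTs each receive a distinct id.
INSERT INTO tickets (created_at) VALUES (NOW());
SELECT LAST_INSERT_ID();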
But since you have asked this question specifically, you may have additional constraints.
Your solution should be something like below.
BEGIN;
SELECT last_number FROM TABLE1 ... FOR UPDATE;
-- read the result in the application
UPDATE TABLE1 SET last_number = last_number + 1 WHERE ...;
COMMIT;

How to prevent multiple workers from racing to process the same task?

I start this worker 10 times to give it a sense of concurrency:
class AnalyzerWorker
  @queue = :analyzer
def self.perform
loop do
# My attempt to lock pictures from other worker instances that may
# try to analyze the same picture (race condition)
pic = Pic.where(locked: false).first
pic.update_attributes locked: true
pic.analyze
end
end
end
This code is actually still vulnerable to a race condition; one reason, I think, is that there's a gap of time between fetching the unlocked picture and actually locking it.
Maybe there are more reasons. Is there any robust approach to prevent this?
Active Record provides optimistic locking and pessimistic locking.
In order to use optimistic locking, the table needs to have a column called lock_version of type integer. Each time the record is updated, Active Record increments the lock_version column. If an update request is made with a lower value in the lock_version field than is currently in the lock_version column in the database, the update request will fail with an ActiveRecord::StaleObjectError.
Pessimistic locking uses a locking mechanism provided by the underlying database. Using lock when building a relation obtains an exclusive lock on the selected rows. Relations using lock are usually wrapped inside a transaction for preventing deadlock conditions.
Code samples are provided in the referenced links...
Either should work, but each needs a different implementation. Given what you are doing, I'd consider pessimistic locking, since the possibility of a conflict is relatively high.
Your current implementation is kind of a mixture of both however, as you indicated, it really doesn't solve the problem. You might be able to make yours work, but using the Active Record implementation makes sense.
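For reference, the pessimistic approach boils down to SQL like the following (a sketch; the table and column names mirror the Ruby code above):
START TRANSACTION;

-- Claim one unlocked picture; the row lock blocks other workers.
SELECT id FROM pics WHERE locked = 0 LIMIT 1 FOR UPDATE;

-- Mark it as ours before releasing the lock (42 stands in for the
-- id returned by the SELECT).
UPDATE pics SET locked = 1 WHERE id = 42;

COMMIT;
On MySQL 8.0+ you can append SKIP LOCKED to the SELECT, so that workers skip rows already claimed by other transactions instead of queueing behind them.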

mySQL - Prevent double booking

I am trying to work out the best way to stop double 'booking' in my application.
I have a table of unique id's each can be sold only once.
My current idea is to use a transaction to check whether the chosen products are available; if they are, set a 'status' column to 'reserved' along with a 'time of update'. Then, if the user goes on to pay, I update the status to 'sold'.
Every 10 mins I have a cron job check for 'status' = 'reserved' that was updated more than 10 mins ago and delete such rows.
Is there a better way? I have never used transactions (I have just heard the word bandied around), so if someone could explain how I would do this, that would be ace.
Despite what others here have suggested, transactions are not the complete solution.
It sounds like you have a web application here, and selecting and purchasing a reservation takes a couple of pages (steps). That means you would have to hold a transaction open across a couple of pages, which is not practical.
Your approach (a status column) is correct; however, I would implement it differently. Instead of a status column, add two columns: reserved_by and reserved_ts.
When reserving a product, set reserved_by to the primary key of the user or the session, and reserved_ts to NOW().
When looking for unreserved products, look for ones where reserved_ts is NULL or more than 10 minutes old. (I would actually look for a couple of minutes older than whatever you tell your user, to avoid possible race conditions.)
A cron job to clear old reservations becomes unnecessary.
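In SQL, reserving a product then becomes a single atomic UPDATE (a sketch; the names follow the columns described above, and the 12-minute window illustrates the "couple minutes older" padding):
UPDATE products
SET reserved_by = 123,
    reserved_ts = NOW()
WHERE id = 42
  AND (reserved_ts IS NULL
       OR reserved_ts < NOW() - INTERVAL 12 MINUTE);
-- 1 row affected = reservation obtained; 0 = someone else holds it.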
What you're attempting to do with your "reserved" status is essentially to emulate transactional behavior. You're much better off letting an expert (mysql) handle it for you.
Have a general read about database transactions and then how to use them in MySQL. They aren't too complicated. Feel free to post questions about them here later, and I'll try to respond.
Edit: Now that I think about your requirements... perhaps only using database transactions isn't the best solution - having tons of transactions open and waiting for user action to commit the transactions is probably not a good design choice. Instead, continue what you were doing with "status"="reserved" design, but use transactions in the database to set the value of "status", to ensure that the row isn't "reserved" by two users at the same time.
You do not need to have any added state to do this.
In order to avoid dirty reads, you should set the database to an isolation level that avoids them: namely, REPEATABLE READ or SERIALIZABLE.
You can set the isolation level globally, or session specific. If all your sessions might need the isolation, you may as well set it globally.
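Setting the level in MySQL looks like this (a sketch; the global form requires the appropriate privilege and affects only sessions opened afterwards):
-- For all subsequent sessions:
SET GLOBAL TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Or for the current session only:
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;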
Once the isolation level is set, you just need to use a transaction that starts before you SELECT, and optionally UPDATEs the status if the SELECT revealed that it wasn't reserved yet.