Updating account balances with MySQL

I have a field on a User table that holds the account balance for the user. Users can perform a lot of actions with my service that will result in rapid changes to their balance.
I'm trying to use mysql's serializable isolation level to make sure that multiple user actions will not update the value incorrectly. (Action A and action B simultaneously want to deduct 1 dollar from the balance.) However, I'm getting a lot of deadlock errors.
How do I do this correctly without getting all these deadlocks, and still keeping the balance field up to date?
Simple schema: a user has an id and a balance.
I'm using Doctrine, so I'm doing something like the following:
$con->beginTransaction();
$tx = $con->transaction;
$tx->setIsolation('SERIALIZABLE');
$user = UserTable::getInstance()->find($userId);
$user->setBalance($user->getBalance() + $change);
$user->save();
$con->commit();

First, trying to use the SERIALIZABLE isolation level on your transaction is a good idea. It means you know at least roughly what a transaction is, and that the isolation level is one of the biggest problems.
Note that SERIALIZABLE is not really true serializability. More on that in this previous answer, when you have some time to read it :-).
But the most important part is that you should accept that automatic rollbacks of your transaction because of failed serializability are normal, and that the right thing to do is to build your application so that transactions can fail and will be replayed.
One simple solution, and for accounting work I like this simple solution because we can predict all the outcomes with no surprises, is to perform table locks. It is not a fine and elegant solution, no row-level locks, just simple big table locks (and always taken in the same order). After that you can do your operation as a single player and then release the locks. There is no multi-user concurrency on the rows of the tables and no magical next-key lock failures (see the previous link). This will certainly slow down your write operations, but if everybody takes the table locks in the same order you will only get lock timeout problems, no deadlocks and no 'unserializable auto-rollback'.
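To illustrate, here is a minimal sketch of that table-lock approach with plain PDO, assuming the User table and balance column from the question and a $con connection in exception mode (the rest is illustrative, not your Doctrine code):
$con->exec("LOCK TABLES User WRITE");   // always take table locks in the same order
try {
    // with the table locked we are a "single player": no interleaving writers
    $stmt = $con->prepare("UPDATE User SET balance = balance + ? WHERE id = ?");
    $stmt->execute([$change, $userId]);
} finally {
    $con->exec("UNLOCK TABLES");        // release even if the update threw
}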
Edit
From your code sample, I'm not sure you can set the transaction isolation level after the begin. You should activate the query log on MySQL and see what is actually executed, then check that other transactions run by the CMS are not still at the SERIALIZABLE level.
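For reference, a minimal PDO-level sketch (not Doctrine-specific, so treat the exact calls as an assumption) of setting the isolation level before the transaction starts, which is what the query log should confirm:
// SET SESSION ... applies to every subsequent transaction on this connection,
// so it must run before beginTransaction(), not after.
$con->exec("SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE");
$con->beginTransaction();
$stmt = $con->prepare("SELECT balance FROM User WHERE id = ?");
$stmt->execute([$userId]);
$balance = $stmt->fetchColumn();
$con->prepare("UPDATE User SET balance = ? WHERE id = ?")
    ->execute([$balance + $change, $userId]);
$con->commit();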

Are MySQL InnoDB transactions serializable/atomic? [duplicate]

The PHP Documentation says:
If you've never encountered transactions before, they offer 4 major features: Atomicity, Consistency, Isolation and Durability (ACID). In layman's terms, any work carried out in a transaction, even if it is carried out in stages, is guaranteed to be applied to the database safely, and without interference from other connections, when it is committed.
QUESTION:
Does this mean that I can have two separate php scripts running transactions simultaneously without them interfering with one another?
ELABORATING ON WHAT I MEAN BY "INTERFERING":
Imagine we have the following employees table:
+------+--------+----------+
|  id  |  name  |  salary  |
+------+--------+----------+
|  1   |  ana   |  10000   |
+------+--------+----------+
If I have two scripts with similar/same code and they run at the exact same time:
script1.php and script2.php (both have the same code):
$conn->beginTransaction();
$stmt = $conn->prepare("SELECT * FROM employees WHERE name = ?");
$stmt->execute(['ana']);
$row = $stmt->fetch(PDO::FETCH_ASSOC);
$salary = $row['salary'];
$salary = $salary + 1000;//increasing salary
$stmt = $conn->prepare("UPDATE employees SET salary = {$salary} WHERE name = ?");
$stmt->execute(['ana']);
$conn->commit();
and assuming the sequence of events is as follows:
script1.php selects data
script2.php selects data
script1.php updates data
script2.php updates data
script1.php commit() happens
script2.php commit() happens
What would the resulting salary of ana be in this case?
Would it be 11000? And would this then mean that 1 transaction will overlap the other because the information was obtained before either commit happened?
Would it be 12000? And would this then mean that regardless of the order in which data was updated and selected, the commit() function forced these to happen individually?
Please feel free to elaborate as much as you want on how transactions and separate scripts can interfere (or don't interfere) with one another.
You are not going to find the answer in the PHP documentation because this has nothing to do with PHP or PDO.
The InnoDB table engine in MySQL offers 4 so-called isolation levels in line with the SQL standard. The isolation levels in conjunction with blocking / non-blocking reads will determine the result of the above example. You need to understand the implications of the various isolation levels and choose the appropriate one for your needs.
To sum up: if you use the serializable isolation level with autocommit turned off, then the result will be 12000. In all other isolation levels, and in serializable with autocommit enabled, the result will be 11000. If you start using locking reads, then the result could be 12000 under all isolation levels.
Judging by the given conditions (a solitary DML statement), you don't need a transaction here, but a table lock. It's a very common confusion.
You need a transaction if you need to make sure that ALL your DML statements were performed correctly or weren't performed at all.
This means:
you don't need a transaction for any number of SELECT queries
you don't need a transaction if only one DML statement is performed
Although, as was noted in the excellent answer from Shadow, you may use a transaction here with an appropriate isolation level, it would be rather confusing. What you need here is locking. The InnoDB engine lets you lock particular rows instead of locking the entire table, and that should be preferred.
In case you want the salary to be 12000 - then use locking.
Or - a simpler way - just run an atomic update query:
UPDATE employees SET salary = salary + 1000 WHERE name = ?
In this case every increment will be recorded and the result will be 12000.
If your goal is different, better express it explicitly.
But again: you have to understand that transactions in general have nothing to do with separate script executions. Regarding your topic of race conditions, you are interested not in transactions but in table/row locking. This is a very common confusion, and you had better get it straight:
a transaction is to ensure that a set of DML queries within one script were executed successfully.
table/row locking is to ensure that other script executions won't interfere.
The only topic where transactions and locking overlap is a deadlock, but again - that only happens when a transaction uses locking.
Alas, the "without interference" needs some help from the programmer. It needs BEGIN and COMMIT to define the extent of the 'transaction'. And...
Your example is inadequate. The first statement needs SELECT ... FOR UPDATE. This tells the transaction processing that there is likely to be an UPDATE coming for the row(s) that the SELECT fetches. That warning is critical to "preventing interference". Now the timeline reads:
script1.php BEGINs
script2.php BEGINs
script1.php selects data (FOR UPDATE)
script2.php tries to select data, but is blocked, so it waits
script1.php updates data
script1.php commit() happens
script2.php selects data (and will get the newly-committed value)
script2.php updates data
script2.php commit() happens
(Note: This is not a 'deadlock', just a 'wait'.)
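As a sketch, here are the scripts above rewritten with the locking read (the FOR UPDATE clause is the only substantive change; everything else mirrors the question's code):
$conn->beginTransaction();
// the locking read: script2 waits here until script1 commits
$stmt = $conn->prepare("SELECT salary FROM employees WHERE name = ? FOR UPDATE");
$stmt->execute(['ana']);
$salary = $stmt->fetchColumn() + 1000;  // increasing salary
$stmt = $conn->prepare("UPDATE employees SET salary = ? WHERE name = ?");
$stmt->execute([$salary, 'ana']);
$conn->commit();                        // after both scripts run, salary is 12000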

MySQL InnoDB: Difference Between `FOR UPDATE` and `LOCK IN SHARE MODE`

What is the exact difference between the two locking read clauses:
SELECT ... FOR UPDATE
and
SELECT ... LOCK IN SHARE MODE
And why would you need to use one over the other?
I have been trying to understand the difference between the two. I'll document what I have found in hopes it'll be useful to the next person.
Both LOCK IN SHARE MODE and FOR UPDATE ensure no other transaction can update the rows that are selected. The difference between the two is in how they treat locks while reading data.
LOCK IN SHARE MODE does not prevent another transaction from reading the same row that was locked.
FOR UPDATE prevents other locking reads of the same row (non-locking reads can still read that row; LOCK IN SHARE MODE and FOR UPDATE are locking reads).
This matters in cases like updating counters, where you read the value in one statement and update it in another. Here, using LOCK IN SHARE MODE will allow two transactions to read the same initial value. So if the counter is incremented by 1 by both transactions, the ending count might increase only by 1, since both transactions initially read the same value.
Using FOR UPDATE would have locked the 2nd transaction from reading the value till the first one is done. This will ensure the counter is incremented by 2.
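A short sketch of that counter case, assuming a hypothetical counters table with name and value columns and a PDO connection $db:
$db->beginTransaction();
// FOR UPDATE: a second transaction's locking read waits here until we commit,
// so it re-reads the incremented value and no increment is lost.
$stmt = $db->prepare("SELECT value FROM counters WHERE name = ? FOR UPDATE");
$stmt->execute(['page_views']);
$value = (int) $stmt->fetchColumn();
$db->prepare("UPDATE counters SET value = ? WHERE name = ?")
   ->execute([$value + 1, 'page_views']);
$db->commit();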
FOR UPDATE --- You're informing MySQL that the selected rows may be updated in the next steps (before the end of this transaction), so MySQL does not grant any read locks on the same set of rows to any other transaction at that moment. The other transactions (whether locking reads or writes) have to wait until the first transaction is finished.
FOR SHARE / LOCK IN SHARE MODE --- Indicates to MySQL that you're selecting the rows from the table only for reading purposes and will not modify them before the end of the transaction. Any number of transactions can acquire a read lock on the rows.
Note: There is a chance of getting a deadlock if these clauses (FOR UPDATE, FOR SHARE) are not used properly.
Either way the integrity of your data will be guaranteed, it's just a question of how the database guarantees it. Does it do so by raising runtime errors when transactions conflict with each other (i.e. FOR SHARE), or does it do so by serializing any transactions that would conflict with each other (i.e. FOR UPDATE)?
FOR SHARE (a.k.a. LOCK IN SHARE MODE): Transactions face a higher probability of failure due to deadlock, because they delay blocking until the moment an update statement is issued (at which point they either block until all read locks are released, or fail due to deadlock if another write is in progress). However, only one client blocks and eventually succeeds: the other clients will fail with a deadlock if they try to update, so only one of them will succeed and the rest will have to retry their transactions.
FOR UPDATE: Transactions won't fail due to deadlock, because they won't be allowed to run concurrently. This may be desirable for example because it makes it easier to reason about multi-threading if all updates are serialized across all clients. However, it limits the concurrency you can achieve because all other transactions block until the first transaction is finished.
Pro-Tip: As an exercise I recommend taking some time to play with a local test database and a couple mysql clients on the command line to prove this behavior for yourself. That is how I eventually understood the difference myself, because it can be very abstract until you see it in action.
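One way to run that experiment from a single PHP script instead of two terminals is to open two connections and watch the second locking read block (a sketch, assuming the employees table above and a local test database; host and credentials are placeholders):
$dsn = 'mysql:host=127.0.0.1;dbname=test';
$a = new PDO($dsn, 'testuser', 'secret', [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$b = new PDO($dsn, 'testuser', 'secret', [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$b->exec("SET SESSION innodb_lock_wait_timeout = 2");  // fail fast instead of waiting

$a->beginTransaction();
$a->query("SELECT salary FROM employees WHERE name = 'ana' FOR UPDATE"); // A holds the row lock

$b->beginTransaction();
try {
    // B's locking read queues behind A's lock and times out after ~2 seconds
    $b->query("SELECT salary FROM employees WHERE name = 'ana' FOR UPDATE");
} catch (PDOException $e) {
    echo "blocked as expected: " . $e->getMessage() . PHP_EOL;
}
$a->commit();
$b->rollBack();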

optimistic locking user credit management

I have a central database for handling user credit, with multiple servers reading and writing to it. The application sitting on top of these servers serves user requests by doing the following for each request:
1. check if user has enough credit for the task by reading from db.
2. perform the time consuming request
3. deduct a credit from user account, save the new credit count back to db.
The application uses the database's optimistic locking, so the following might happen:
1. request a comes in, see that user x has enough credit,
2. request b comes in, see that user x has enough credit,
3. a performs work
4. a saves the new credit count back to db
5. b performs work
6. b tries to save the new credit count back to db, application gets an exception and fails to account for this credit deduction.
With pessimistic locking, the application would need to explicitly get a lock on the user account to guarantee exclusive access, but this KILLs performance since the system has many concurrent requests.
So what would be a good new design for this credit system?
Here are two "locking" mechanisms that avoid using InnoDB's locking mechanism, for either of two reasons:
A task that takes longer than you should spend in a BEGIN...COMMIT of InnoDB.
A task that ends in a different program (or different web page) than it started in.
Plan A. (This assumes the race condition is rare, and the time wasted for Step 2 is acceptable in those rare cases.)
(same) check if user has enough credit for the task by reading from db.
(same) perform the time consuming request
(added) START TRANSACTION;
(added) Again check whether the user has enough credit. (ROLLBACK and abort if not.)
(same as old #3) deduct a credit from user account, save the new credit count back to db.
(added) COMMIT;
START..COMMIT is InnoDB transaction stuff. If a race condition caused 'x' to no longer have credit by step 4, you will ROLLBACK and not perform steps 5 and 6.
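A sketch of Plan A with PDO; the table and column names (User, credit) are assumptions, and the FOR UPDATE on the re-check is an addition to the outline above so the re-check and the deduction cannot be raced:
// Steps 1-2: cheap pre-check, then the time-consuming work, outside any transaction.
if (getCredit($db, $userId) < 1) { exit('insufficient credit'); }  // hypothetical helper
performTimeConsumingRequest();                                     // hypothetical helper

// Steps 3-6: re-check and deduct inside a short transaction.
$db->beginTransaction();
$stmt = $db->prepare("SELECT credit FROM User WHERE id = ? FOR UPDATE");
$stmt->execute([$userId]);
if ((int) $stmt->fetchColumn() < 1) {
    $db->rollBack();            // a race ate the credit in the meantime: abort
} else {
    $db->prepare("UPDATE User SET credit = credit - 1 WHERE id = ?")
       ->execute([$userId]);
    $db->commit();
}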
Plan B. (This is more complex, but you might prefer it.)
Have a table Locks for locking. It contains user_id and a timestamp.
START TRANSACTION;
If user_id is in Locks, abort (ROLLBACK and exit).
INSERT the user_id and the current timestamp INTO Locks (thereby "locking" 'x').
COMMIT;
Perform the processing (original Steps 1,2,3)
DELETE FROM Locks WHERE user_id = 'x'; (autocommit=1 suffices here.)
A potential problem: If the processing dies in step 6, not getting around to releasing the lock, that user will be locked out forever. The 'solution' is to periodically check Locks for any timestamps that are 'very' old. If any are found, assume that the processing died, and DELETE the row(s).
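A sketch of Plan B, assuming a table created roughly as CREATE TABLE Locks (user_id INT PRIMARY KEY, locked_at TIMESTAMP NOT NULL); the primary key makes the check-and-insert race-safe, because a second client's INSERT fails with a duplicate-key error instead of silently succeeding:
function acquireAppLock(PDO $db, int $userId): bool {
    try {
        $db->prepare("INSERT INTO Locks (user_id, locked_at) VALUES (?, NOW())")
           ->execute([$userId]);
        return true;                               // we now "own" this user
    } catch (PDOException $e) {
        if (($e->errorInfo[1] ?? null) == 1062) {  // 1062 = duplicate key: already locked
            return false;
        }
        throw $e;
    }
}

function releaseAppLock(PDO $db, int $userId): void {
    $db->prepare("DELETE FROM Locks WHERE user_id = ?")->execute([$userId]);  // autocommit=1 suffices
}

// usage: if (acquireAppLock($db, $userId)) { /* steps 1,2,3 */ releaseAppLock($db, $userId); }
// plus a periodic job that deletes Locks rows whose locked_at is 'very' old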
You didn't state explicitly what you want to achieve, so I assume you don't want to perform the work just to realise it has been in vain due to low credit.
No-lock
Implement a credit hold at step (1) and associate the work (2) and the deduction (3) with the hold. This way a low-credit user won't pass step (1).
Optimistic locking
As a collision is detected in optimistic locking post factum, I don't think it fits the assumption.
Pessimistic locking
It isn't possible to tell definitively without knowing the schema, but I think "killing performance" is an exaggeration. You can smartly incorporate MySQL InnoDB transaction isolation levels and locking reads at a finer granularity than exclusively locking a user account completely. For instance, use SELECT ... LOCK IN SHARE MODE, which sets shared locks and still allows reads from other transactions.
Rick's caution about the task taking longer than MySQL will wait (innodb_lock_wait_timeout) applies here.
You want The Escrow Transactional Method.
You record the credit left after doling out some to each updating process, and the credit doled out to (i.e. held in escrow for) them. A process retries, until it succeeds, a transaction that increases the credit doled out by what it needs and decreases the credit that is left by the same amount; it succeeds only if that would leave the credit left non-negative. Then it does its long calculation. Regardless of the calculation's success it then applies a transaction that decreases the credit doled out; on success it also increases the assets, while on failure it increases the credit left.
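A sketch of that escrow idea, using two conditional, atomic updates instead of an explicit retry loop; the credit_left / credit_held column names are assumptions:
// 1. Reserve (dole out) the credit up front; the WHERE clause keeps credit_left non-negative.
$reserve = $db->prepare("UPDATE User SET credit_left = credit_left - ?, credit_held = credit_held + ?
                         WHERE id = ? AND credit_left >= ?");
$reserve->execute([$cost, $cost, $userId, $cost]);
if ($reserve->rowCount() === 0) {
    exit('insufficient credit');       // nothing was reserved, no long work wasted
}

// 2. The long calculation runs outside any transaction.
$ok = performLongTask();               // hypothetical helper

// 3. Settle the escrow: the hold always goes away; the credit comes back only on failure.
if ($ok) {
    $db->prepare("UPDATE User SET credit_held = credit_held - ? WHERE id = ?")
       ->execute([$cost, $userId]);
} else {
    $db->prepare("UPDATE User SET credit_held = credit_held - ?, credit_left = credit_left + ? WHERE id = ?")
       ->execute([$cost, $cost, $userId]);
}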
Use the timestamp/rowversion approach that you will find in all real database engines except MySQL.
You can emulate them with MySQL in this way. Have a TIMESTAMP column (updated) that gets updated whenever a row is updated. Select that column along with the rest of the data you require. Use the returned timestamp as a condition in your WHERE clause so that the row will only be updated when the timestamp is still the same as when you read the row.
UPDATE table SET col1 = value WHERE id = 1 AND updated = timestamp_value_read
Now when you run the update and the timestamps do not match, no update will be performed. You can test for this by checking the affected-row count: if zero rows were updated, you know that the row was modified between read and write. Handle that condition in your code whichever way is best for your application and your users.
Timestamp tutorial
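A sketch of that read-then-conditional-update cycle with PDO; the account table and column names are illustrative:
// Read the row together with its last-updated timestamp.
$stmt = $db->prepare("SELECT credit, updated FROM account WHERE id = ?");
$stmt->execute([$accountId]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);

// ... the long work happens here ...

// Write back only if nobody touched the row since we read it.
$stmt = $db->prepare("UPDATE account SET credit = ? WHERE id = ? AND updated = ?");
$stmt->execute([$row['credit'] - 1, $accountId, $row['updated']]);
if ($stmt->rowCount() === 0) {
    // the timestamp changed, so the row was modified between read and write
    throw new RuntimeException('Conflicting update detected; re-read and retry.');
}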

Do transactions prevent other updates for a while, or just hide them?

When reading about transactions in a MySQL DB, the descriptions talk about the ongoing transaction not being able to see any updates made by external sources until it commits. So does this mean that changes CAN be made but the transaction just will not be able to see them, or is it actually impossible to update the DB while the transaction is going on?
Because I need it to be impossible for other queries to change anything about certain tables while the transaction is running. Right now I write-lock all those tables, start a transaction for the atomicity, commit, and then unlock. Is this the way to do this?
From my testing it seems that setting the isolation level to SERIALIZABLE accomplishes the same as manual table locking and unlocking? Is this correct?
It's going to depend on the transaction isolation level you have set on your database. You can read more about the levels here. For example, for READ UNCOMMITTED, you can actually read rows that are uncommitted by another transaction. This is usually not what you want to happen.
Locking an entire table is a really extreme choice though, and should probably not be done unless there's no other choice. My recommendation would be to consider the rows you need to lock, and then you can lock those specific rows using a select for update statement.
For example, suppose you have a resources table and a schedules table that contains bookings for those resources. When booking a resource, you have to check the schedules table for a given resource to make sure it's available for the desired time. However, you have to do this in a concurrency-safe way: that is, you want to ensure that between the time you check the schedules table for availability of the resource and the time you actually insert the row into the schedules table, some other transaction doesn't book the resource for the same time (or an overlapping time).
You can accomplish this by using a select for update command:
select * from resources where resource_name = 'a' for update;
Assuming you're doing this in a stored procedure, if some other code fires the stored procedure for the same resource, it will block on that statement. This will ensure that resources don't get double booked.
We could also accomplish this by locking the entire resources table. However, there's no need to do that since we're only interested in booking a single resource. So it's good enough to just lock the resource row we care about.
Note that for MySQL, you need to index the columns you use in the for update or it will lock the entire table.
The point to all this is to always consider maximum concurrency. In other words, don't lock more than you need to. Otherwise, you make the application much less scalable and you inhibit concurrency.
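A sketch of that booking flow with PDO; the resources / schedules column names and the overlap test are assumptions:
$db->beginTransaction();
// Lock the resource row so concurrent bookings of the same resource serialize here.
$stmt = $db->prepare("SELECT id FROM resources WHERE resource_name = ? FOR UPDATE");
$stmt->execute(['a']);
$resourceId = $stmt->fetchColumn();

// Check for an overlapping booking while we hold the lock.
$stmt = $db->prepare("SELECT COUNT(*) FROM schedules
                      WHERE resource_id = ? AND starts_at < ? AND ends_at > ?");
$stmt->execute([$resourceId, $end, $start]);

if ($stmt->fetchColumn() == 0) {
    $db->prepare("INSERT INTO schedules (resource_id, starts_at, ends_at) VALUES (?, ?, ?)")
       ->execute([$resourceId, $start, $end]);
    $db->commit();                 // booked
} else {
    $db->rollBack();               // already booked for an overlapping time
}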

Locking mySQL tables/rows

Can someone explain the need to lock tables and/or rows in MySQL?
I am assuming that it is to prevent multiple writes to the same field; is this the best practice?
First, let's look at a good document. This is not MySQL-related documentation, it's about PostgreSQL, but it's one of the simpler and clearer docs I've read on transactions. You'll understand MySQL transactions better after reading this link: http://www.postgresql.org/docs/8.4/static/mvcc.html
When you're running a transaction, 4 rules are applied (ACID):
Atomicity: all or nothing (rollback)
Consistency: consistent before, consistent after
Isolation: not impacted by other concurrent transactions
Durability: once committed, it's really done
Of these rules only one is problematic: Isolation. Using a transaction does not ensure a perfect isolation level. The previous link will explain better what phantom reads and other isolation problems between concurrent transactions are. But to make it simple, you should really use row-level locks to prevent other transactions, running at the same time as you (and maybe committing before you), from altering the same records. But with locks come deadlocks...
Then, when you try using nice transactions with locks, you'll need to handle deadlocks, and you'll need to handle the fact that a transaction can fail and should be re-run (simple for or while loops).
Edit:
Recent versions of InnoDB provide greater levels of isolation than previous ones. I've done some tests and I must admit that even the phantom reads that should happen are now difficult to reproduce.
MySQL is at level 3 by default of the 4 isolation levels explained in the PostgreSQL document (whereas PostgreSQL is at level 2 by default). This is REPEATABLE READ. That means you won't have dirty reads and you won't have non-repeatable reads. So someone modifying a row on which you made your SELECT inside your transaction will hit an implicit LOCK (as if you had performed a SELECT ... FOR UPDATE).
Warning: If you work with an older version of MySQL, like 5.0, you may be at level 2, and you'll need to take the row lock yourself using the 'FOR UPDATE' clause!
We can always find some nice race conditions; when working with aggregate queries it can be safer to be at the 4th level of isolation (by adding LOCK IN SHARE MODE at the end of your query) if you do not want people adding rows while you're performing some task. I've been able to reproduce one serializable-level problem, but I won't explain the complex example here; really tricky race conditions.
There is a very nice example of race conditions that even serializable level cannot fix here : http://www.postgresql.org/docs/8.4/static/transaction-iso.html#MVCC-SERIALIZABILITY
When working with transactions the most important things are:
data used in your transaction must always be read INSIDE the transaction (re-read it if you had the data from before the BEGIN)
understand why a high isolation level sets implicit locks and may block some other queries (and make them time out)
try to avoid deadlocks (lock tables in the same order), but handle them (retry a transaction aborted by MySQL); see the retry sketch after this list
try to freeze important source tables at the serializable isolation level (LOCK IN SHARE MODE) when your application code assumes that no insert or update should modify the dataset it is using (otherwise you will not get errors, but your result will have silently ignored the concurrent changes)
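A minimal retry sketch for the deadlock point above, assuming a PDO connection in exception mode; MySQL reports error 1213 for a deadlock and 1205 for a lock wait timeout, and both are safe to retry:
$attempts = 0;
do {
    try {
        $db->beginTransaction();
        $stmt = $db->prepare("SELECT balance FROM User WHERE id = ? FOR UPDATE");
        $stmt->execute([$userId]);
        $balance = $stmt->fetchColumn();
        $db->prepare("UPDATE User SET balance = ? WHERE id = ?")
           ->execute([$balance + $change, $userId]);
        $db->commit();
        break;                                        // success
    } catch (PDOException $e) {
        $db->rollBack();
        $code = $e->errorInfo[1] ?? 0;
        if (!in_array($code, [1213, 1205]) || ++$attempts >= 3) {
            throw $e;                                 // not retryable, or too many attempts
        }
    }
} while (true);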
It is not a best practice. Modern versions of MySQL support transactions with well defined semantics. Use transactions, and forget about locking stuff by hand.
The only new thing you'll have to deal with is that transaction commits may fail because of race conditions, but you'd be doing error checking with locks anyway, and it is easier to retry the logic that led to a transaction failure than to recover from errors in a non-transactional setup.
If you do get race conditions and failed commits, then you may want to fine-tune the isolation configuration for your transactions.
For example if you need to generate invoice numbers which are sequential and have no numbers missing - this is a requirement at least in the country I live in.
If you have a few web servers, then a few users might be buying stuff literally at the same time.
If you do select max(invoice_id)+1 from invoice to get the new invoice number, two web servers might do that at the same time (before the new invoice has been added), and get the same invoice number for the invoices they're trying to create.
If you use a mechanism such as "auto_increment", it is just meant to generate unique values, and makes no guarantees about not missing out numbers (if one transaction tries to insert a row, then does a rollback, the number is "lost").
So the solution is to (a) lock the table (b) select max(invoice_id)+1 from invoice (c) do the insert (d) commit + unlock the table.
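A sketch of steps (a)-(d) with PDO; the invoice table and columns are assumptions, and per the MySQL manual LOCK TABLES is combined with a transaction by disabling autocommit first, committing, and only then unlocking:
$db->exec("SET autocommit = 0");
$db->exec("LOCK TABLES invoice WRITE");                    // (a) lock the table

$next = $db->query("SELECT COALESCE(MAX(invoice_id), 0) + 1 FROM invoice")
           ->fetchColumn();                                // (b) next gap-free number

$db->prepare("INSERT INTO invoice (invoice_id, customer_id, total) VALUES (?, ?, ?)")
   ->execute([$next, $customerId, $total]);                // (c) insert

$db->exec("COMMIT");                                       // (d) commit ...
$db->exec("UNLOCK TABLES");                                //     ... then release the lock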
On another note, in MySQL you're best off using InnoDB and row-level locking. Issuing a LOCK TABLES statement can implicitly commit the transaction you're working on.
Take a look here for general introduction to what transactions are and how to use them.
Databases are designed to work in concurrent environments, so locking the tables and/or records helps to keep the transactions consistent.
So a record affected by one transaction should not be altered until this transaction commits or rolls back.