MySQL/MariaDB InnoDB Simultaneous Transactions & Locking Behaviour

As part of the persistence process in one of my models, an md5 check_sum of the entire record is generated and stored with the record. The check_sum contains a flattened representation of the entire record, including all EAV attributes etc. This makes preventing absolute duplicates very easy and efficient.
I am not using a unique index on this check_sum for a specific reason: I want this all to be silent, i.e. if a user submits a duplicate, the app just silently ignores it and returns the already existing record. This ensures backwards compatibility with legacy apps and APIs.
I am using Laravel's Eloquent. So once a record has been created, and before committing, the application does the following:
$taxonRecords = TaxonRecord::where('check_sum', $taxonRecord->check_sum)->get();
if ($taxonRecords->count() > 0) {
    DB::rollBack();
    return $taxonRecords->first();
}
However, I recently encountered a 60,000-to-1 incident (odds based on record counts at the time). A single duplicate ended up in the database with the same check_sum. When I reviewed the logs I noticed that the creation times were identical down to the second. Further investigation of the Apache logs showed a valid POST, but the POST was duplicated. I presume the user's browser malfunctioned or something, but both POSTs arrived simultaneously, resulting in two simultaneous transactions.
My question is: how can I ensure that a transaction and its contained SELECT for the previous check_sum are atomic and isolated? Based on my reading, the answer lies in https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html and isolation levels.
If transaction A and transaction B arrive at the server at the same time then they should not run side by side but should wait for the first to complete.

You created a classic race condition. Both transactions calculate the checksum while both are in progress, not yet committed. Neither can read the other's data, since it is uncommitted. So each calculates that it is the only one with that checksum, and both go through and commit.
To solve this, you need to run such transactions serially, to be sure that there aren't other concurrent transactions submitting the same data.
You may have to use GET_LOCK() before starting your transaction to calculate the checksum, then RELEASE_LOCK() after you commit. That will make other concurrent requests wait for your data to be committed, so they will see it when they try to calculate their checksum.
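To make that concrete, here is a minimal sketch of the pattern at the SQL level. The lock name, the 10-second timeout, and the taxon_records table name are all illustrative assumptions; in Laravel you would issue these via DB::select() around DB::beginTransaction()/DB::commit().

-- Serialize on the checksum itself so only identical submissions queue up;
-- an md5 hex digest plus a short prefix fits GET_LOCK's 64-character name limit.
SELECT GET_LOCK('taxon_9e107d9d372bb6826bd81d3542a419d6', 10);  -- 1 = acquired, 0 = timed out

START TRANSACTION;
SELECT id FROM taxon_records WHERE check_sum = '9e107d9d372bb6826bd81d3542a419d6';
-- no row: INSERT the new record; row found: skip the insert and return the existing one
COMMIT;

SELECT RELEASE_LOCK('taxon_9e107d9d372bb6826bd81d3542a419d6');

Because a concurrent duplicate request blocks on GET_LOCK() until the first commit, its SELECT then sees the freshly committed row and silently returns it, which is exactly the behaviour the question asks for.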

Related

MySQL InnoDB: Difference Between `FOR UPDATE` and `LOCK IN SHARE MODE`

What is the exact difference between the two locking read clauses:
SELECT ... FOR UPDATE
and
SELECT ... LOCK IN SHARE MODE
And why would you need to use one over the other?
I have been trying to understand the difference between the two. I'll document what I have found in hopes it'll be useful to the next person.
Both LOCK IN SHARE MODE and FOR UPDATE ensure no other transaction can update the rows that are selected. The difference between the two is in how they treat locks while reading data.
LOCK IN SHARE MODE does not prevent another transaction from reading the same row that was locked.
FOR UPDATE prevents other locking reads of the same row (non-locking reads can still read that row; LOCK IN SHARE MODE and FOR UPDATE are locking reads).
This matters in cases like updating counters, where you read the value in one statement and update it in another. Using LOCK IN SHARE MODE allows two transactions to read the same initial value. So if both increment the counter by 1, the ending count might increase by only 1, since both transactions started from the same value.
Using FOR UPDATE would have blocked the second transaction from reading the value until the first one finished. This ensures the counter is incremented by 2.
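Here is a hedged two-session sketch of that counter scenario; the counters table, column n, and the values are invented for illustration:

-- Session A
START TRANSACTION;
SELECT n FROM counters WHERE id = 1 FOR UPDATE;   -- reads 5 and takes an exclusive lock

-- Session B (concurrently)
START TRANSACTION;
SELECT n FROM counters WHERE id = 1 FOR UPDATE;   -- blocks: A holds the lock

-- Session A
UPDATE counters SET n = 6 WHERE id = 1;
COMMIT;                                           -- B's SELECT now unblocks and returns 6

-- Session B
UPDATE counters SET n = 7 WHERE id = 1;
COMMIT;                                           -- final value 7: both increments survive

With LOCK IN SHARE MODE instead, both sessions would have read 5, leading to the lost increment (or the deadlock) described below.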
FOR UPDATE: you're informing MySQL that the selected rows may be updated in the next steps (before the end of this transaction), so MySQL does not grant any read locks on the same set of rows to any other transaction at that moment. Other transactions wanting to lock those rows (whether for reading or writing) must wait until the first transaction is finished.
FOR SHARE: indicates to MySQL that you're selecting the rows from the table only for reading, and will not modify them before the end of the transaction. Any number of transactions can acquire a read lock on the rows.
Note: there is a chance of deadlock if these clauses (FOR UPDATE, FOR SHARE) are not used properly.
Either way the integrity of your data will be guaranteed, it's just a question of how the database guarantees it. Does it do so by raising runtime errors when transactions conflict with each other (i.e. FOR SHARE), or does it do so by serializing any transactions that would conflict with each other (i.e. FOR UPDATE)?
FOR SHARE (a.k.a. LOCK IN SHARE MODE): transactions face a higher probability of failure due to deadlock, because they delay blocking until the moment an update statement arrives (at which point they either block until all read locks are released, or fail with a deadlock error if another write is in progress). Only one client blocks and eventually succeeds; the others fail with a deadlock error when they try to update and have to retry their transactions.
FOR UPDATE: in this scenario, transactions won't fail due to deadlock, because they are not allowed to run concurrently. This may be desirable, for example, because it makes multi-threading easier to reason about when all updates are serialized across all clients. However, it limits the concurrency you can achieve, because every other transaction blocks until the first one finishes.
Pro-Tip: As an exercise I recommend taking some time to play with a local test database and a couple of mysql clients on the command line to prove this behavior for yourself. That is how I eventually understood the difference myself, because it can be very abstract until you see it in action.
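For example, the FOR SHARE deadlock described above can be reproduced with two mysql clients and the same hypothetical counters table from the sketch earlier:

-- Session A
START TRANSACTION;
SELECT n FROM counters WHERE id = 1 LOCK IN SHARE MODE;  -- shared lock, reads 5

-- Session B
START TRANSACTION;
SELECT n FROM counters WHERE id = 1 LOCK IN SHARE MODE;  -- shared lock too, also reads 5

-- Session A
UPDATE counters SET n = 6 WHERE id = 1;   -- needs an exclusive lock, waits on B's shared lock

-- Session B
UPDATE counters SET n = 6 WHERE id = 1;
-- ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
-- InnoDB rolls session B back; session A's UPDATE then proceeds.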

optimistic locking user credit management

I have a central database for handling user credit, with multiple servers reading from and writing to it. The application sitting on top of these servers serves user requests by doing the following for each request:
1. check if the user has enough credit for the task by reading from the db.
2. perform the time-consuming request
3. deduct a credit from the user account and save the new credit count back to the db.
The application uses the database's optimistic locking, so the following might happen:
1. request a comes in, sees that user x has enough credit,
2. request b comes in, sees that user x has enough credit,
3. a performs work
4. a saves the new credit count back to the db
5. b performs work
6. b tries to save the new credit count back to the db; the application gets an exception and fails to account for this credit deduction.
With pessimistic locking, the application needs to explicitly take a lock on the user account to guarantee exclusive access, but this kills performance since the system has many concurrent requests.
So what would be a good new design for this credit system?
Here are two "locking" mechanisms at avoid using InnoDB's locking mechanism for either of two reasons:
A task that takes longer than you should spend in a BEGIN...COMMIT of InnoDB.
A task that ends in a different program (or different web page) than it started in.
Plan A. (This assumes the race condition is rare, and the time wasted on step 2 is acceptable in those rare cases.)
1. (same) Check if the user has enough credit for the task by reading from the db.
2. (same) Perform the time-consuming request.
3. (added) START TRANSACTION;
4. (added) Again check that the user has enough credit. (ROLLBACK and abort if not.)
5. (same as old #3) Deduct a credit from the user account and save the new credit count back to the db.
6. (added) COMMIT;
START...COMMIT is InnoDB transaction stuff. If a race condition caused 'x' to no longer have credit by step 4, you will ROLLBACK and not perform steps 5 and 6.
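A minimal SQL sketch of Plan A, assuming a user table with id and credits columns (names invented). The FOR UPDATE locking read makes the re-check see the latest committed balance rather than a stale snapshot, and blocks racing deductions:

-- steps 1-2 happen before this, outside any transaction

START TRANSACTION;
SELECT credits FROM user WHERE id = 42 FOR UPDATE;    -- step 4: re-check under a lock
-- if credits < 1: ROLLBACK and abort
UPDATE user SET credits = credits - 1 WHERE id = 42;  -- step 5: deduct
COMMIT;                                               -- step 6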
Plan B. (This is more complex, but you might prefer it.)
Have a table Locks for locking. It contains user_id and a timestamp.
1. START TRANSACTION;
2. If the user_id is in Locks, abort (ROLLBACK and exit).
3. INSERT the user_id and the current timestamp INTO Locks (thereby "locking" 'x').
4. COMMIT;
5. Perform the processing (original steps 1, 2, 3).
6. DELETE FROM Locks WHERE user_id = 'x'; (autocommit=1 suffices here.)
A potential problem: if the processing dies in step 5, never reaching step 6 to release the lock, that user will be locked out forever. The 'solution' is to periodically check Locks for any timestamps that are 'very' old; if any are found, assume the processing died and DELETE those rows.
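A sketch of Plan B, assuming a Locks table like the one below. This variant leans on the primary key to detect an existing lock atomically, a slight variation on the separate existence check in step 2:

CREATE TABLE Locks (
    user_id INT NOT NULL PRIMARY KEY,
    ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- steps 1-4: try to take the lock
START TRANSACTION;
INSERT INTO Locks (user_id) VALUES (42);  -- a duplicate-key error here means user 42
                                          -- is already locked: ROLLBACK and exit
COMMIT;

-- step 5: perform the processing (original steps 1, 2, 3)

-- step 6: release the lock (autocommit=1 suffices)
DELETE FROM Locks WHERE user_id = 42;

-- periodic janitor for locks abandoned by dead processes:
DELETE FROM Locks WHERE ts < NOW() - INTERVAL 10 MINUTE;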
You didn't state explicitly what you want to achieve, so I assume you don't want to perform the work just to realise it has been in vain due to low credit.
No-lock
Implement a credit hold at step (1) and associate the work (2) and the deduction (3) with the hold. This way a low-credit user won't pass step (1).
Optimistic locking
As optimistic locking detects a collision only after the fact, I don't think it fits the assumption.
Pessimistic locking
It isn't possible to tell definitively without knowing the schema, but I think "killing performance" is an exaggeration. You can smartly incorporate MySQL InnoDB transaction isolation levels and locking reads at a finer granularity than exclusively locking a whole user account. For instance, SELECT ... LOCK IN SHARE MODE sets shared locks and still allows reads by other transactions.
Rick's caution about the task taking longer than MySQL will wait (innodb_lock_wait_timeout) applies here.
You want The Escrow Transactional Method.
You record both the credit left and the credit doled out to (i.e. held in escrow for) each updating process. A process retries, until it succeeds, a transaction that increases the credit doled out by what it needs and decreases the credit left by the same amount; the transaction succeeds only if that would leave the credit left non-negative. Then the process does its long calculation. Regardless of the calculation's outcome, it then applies a transaction that decreases the credit doled out; on success it also increases the assets, while on failure it instead increases the credit left.
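A hedged sketch of that escrow bookkeeping in SQL; the account table with credit_left, credit_held, and assets columns is invented for illustration, and :n / :id stand for prepared-statement parameters:

-- reserve :n credits in escrow; retry until this affects one row
UPDATE account
   SET credit_held = credit_held + :n,
       credit_left = credit_left - :n
 WHERE id = :id AND credit_left >= :n;   -- succeeds only if enough credit remains

-- ... perform the long calculation ...

-- on success: consume the escrowed credit and book the assets
UPDATE account
   SET credit_held = credit_held - :n,
       assets = assets + :n
 WHERE id = :id;

-- on failure: return the escrowed credit
UPDATE account
   SET credit_held = credit_held - :n,
       credit_left = credit_left + :n
 WHERE id = :id;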
Use the timestamp/rowversion approach that you will find in all real database engines except MySQL.
You can emulate them in MySQL this way: have a TIMESTAMP column (updated) that gets updated whenever a row is updated. Select that column along with the rest of the data you require. Use the returned timestamp as a condition in your WHERE clause so that the row will only be updated when the timestamp is still the same as when you read the row.
UPDATE table SET col1 = value WHERE id = 1 AND updated = timestamp_value_read
Now when you run the update and the timestamps do not match, no update is performed. You can test for this using the rows-affected count: if zero rows were updated, you know the row was modified between your read and your write. Handle that condition in your code whichever way is best for your application and your users.
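Putting the whole cycle together with a hypothetical items table (all names invented); the ON UPDATE CURRENT_TIMESTAMP clause keeps the version column maintained automatically:

CREATE TABLE items (
    id INT PRIMARY KEY,
    col1 VARCHAR(100),
    updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

-- read the row, remembering its version timestamp
SELECT col1, updated FROM items WHERE id = 1;

-- write back only if nobody changed the row in the meantime
UPDATE items SET col1 = 'new value'
 WHERE id = 1 AND updated = '2024-01-01 12:00:00';   -- the timestamp read above

-- rows affected = 1: the update applied
-- rows affected = 0: someone modified the row first; reload and retry

One caveat: the default TIMESTAMP resolution is one second, so two updates within the same second can defeat the check; TIMESTAMP(6) gives microsecond precision, or an integer version column avoids the issue entirely.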

What are the consequences of accessing the same database from different programs at the same time?

I have a mysql database that is accessed using JDBC. If I access the database from two different programs at the same time, what effect will there be on the database?
Please consider the cases where both programs are reading the database, where one is reading while the other is writing, and where both are writing.
I think that when both programs write data, that would definitely lead to loss of data. But what happens in the other scenarios?
MySQL works on an ACID basis: http://en.wikipedia.org/wiki/ACID
This means both clients will read the database as if each were the only client.
For this to happen, each client must start a transaction: a single logical unit of work within which all operations against the database are either committed together or rolled back.
Different RDBMSs have different defaults for their transaction support. For MySQL (InnoDB), the default isolation level is REPEATABLE READ, which means that SELECT statements within the same transaction are consistent with respect to each other.
How you can verify this:
Have program 1 start a transaction and go through every row increasing a value, while the other program starts a transaction and goes through the database calculating the sum of that value for all rows. When they are done, they close their transactions and print out the results. You will notice that both of them read the database as if they were isolated from each other.
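A sketch of that experiment at the SQL level, using a hypothetical accounts table and two mysql clients (the same statements would be issued through JDBC):

-- Session 1: increment every row
START TRANSACTION;
UPDATE accounts SET value = value + 1;

-- Session 2: sum the same column concurrently
START TRANSACTION;
SELECT SUM(value) FROM accounts;  -- consistent read: sees none of session 1's
                                  -- uncommitted changes

-- Session 1
COMMIT;

-- Session 2
SELECT SUM(value) FROM accounts;  -- under REPEATABLE READ, still the same
                                  -- snapshot value as the first SUM
COMMIT;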
There are whole books written about JDBC. Here are some links that can get you started:
JDBC Tutorial: http://docs.oracle.com/javase/tutorial/jdbc/
MySQL: http://dev.mysql.com/doc/refman/5.0/en/innodb-consistent-read.html
MySQL, like PostgreSQL, MariaDB, and other major databases, is designed to be used by many programs, each of which is allowed to hold many connections. And the database will not break even if multiple programs try to update the same row at the same time. But how to coordinate that is the client programs' problem, solved via transactions.
Welcome to the world of ACID transactions! Within a transaction, the database guarantees that the program sees a given level of consistency. There are no problems with Atomicity, Consistency, and Durability, but Isolation is a little more tedious. JDBC defines four levels of isolation, plus no transaction at all (the following is extracted from The Java Tutorials: Using Transactions):
The interface Connection includes five values that represent the transaction isolation levels you can use in JDBC:
Isolation Level               Transactions    Dirty Reads     Non-Repeatable Reads  Phantom Reads
TRANSACTION_NONE              Not supported   Not applicable  Not applicable        Not applicable
TRANSACTION_READ_COMMITTED    Supported       Prevented       Allowed               Allowed
TRANSACTION_READ_UNCOMMITTED  Supported       Allowed         Allowed               Allowed
TRANSACTION_REPEATABLE_READ   Supported       Prevented       Prevented             Allowed
TRANSACTION_SERIALIZABLE      Supported       Prevented       Prevented             Prevented
Accessing an updated value that has not been committed is considered a dirty read because it is possible for that value to be rolled back to its previous value.
A non-repeatable read occurs when transaction A retrieves a row, transaction B subsequently updates the row, and transaction A later retrieves the same row again. Transaction A retrieves the same row twice but sees different data.
A phantom read occurs when transaction A retrieves a set of rows satisfying a given condition, transaction B subsequently inserts or updates a row such that the row now meets the condition in transaction A, and transaction A later repeats the conditional retrieval. Transaction A now sees an additional row. This row is referred to as a phantom.
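Each of these constants maps to a SQL-level isolation level; you can also set the level yourself before starting a transaction, which is roughly what the JDBC driver issues for Connection.setTransactionIsolation():

SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;  -- all following transactions in this session
START TRANSACTION;
-- ... statements here run with serializable isolation ...
COMMIT;

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- the next transaction only
START TRANSACTION;
-- ...
COMMIT;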

MySQL: How to lock tables and start a transaction?

TL;DR - MySQL doesn't let you lock a table and use a transaction at the same time. Is there any way around this?
I have a MySQL table I am using to cache some data from a (slow) external system. The data is used to display web pages (written in PHP.) Every once in a while, when the cached data is deemed too old, one of the web connections should trigger an update of the cached data.
There are three issues I have to deal with:
Other clients will try to read the cache data while I am updating it
Multiple clients may decide the cache data is too old and try to update it at the same time
The PHP instance doing the work may be terminated unexpectedly at any time, and the data should not be corrupted
I can solve the first and last issues by using a transaction, so clients will be able to read the old data until the transaction is committed, when they will immediately see the new data. Any problems will simply cause the transaction to be rolled back.
I can solve the second problem by locking the tables, so that only one process gets a chance to perform the update. By the time any other processes get the lock they will realise they have been beaten to the punch and don't need to update anything.
This means I need to both lock the table and start a transaction. According to the MySQL manual, this is not possible. Starting a transaction releases the locks, and locking a table commits any active transaction.
Is there a way around this, or is there another way entirely to achieve my goal?
This means I need to both lock the table and start a transaction
This is how you can do it:
SET autocommit=0;   -- begin the transaction this way, NOT with START TRANSACTION
LOCK TABLES t1 WRITE, t2 READ, ...;
... do something with tables t1 and t2 here ...
COMMIT;             -- commit the work done while the tables were locked
UNLOCK TABLES;
For more info, see the MySQL docs: https://dev.mysql.com/doc/refman/8.0/en/lock-tables.html
If it were me, I'd use the advisory locking function within MySQL to implement a mutex for updating the cache, and a transaction for read isolation. e.g.
begin_transaction(); // although reading a single row doesn't really require this
$cached = runquery("SELECT * FROM cache WHERE key=$id");
end_transaction();
if (is_expired($cached)) {
    $cached = refresh_data($cached, $id);
}
...
function refresh_data($cached, $id)
{
    $lockname = some_deterministic_transform($id);
    // zero timeout: if another client already holds the lock, skip the refresh
    // and serve the stale data rather than queueing up behind it
    if (1 == runquery("SELECT GET_LOCK('$lockname', 0)")) {
        $cached = fetch_source_data($id);
        begin_transaction();
        write_data($cached, $id);
        end_transaction();
        runquery("SELECT RELEASE_LOCK('$lockname')");
    }
    return $cached;
}
(BTW: bad things may happen if you try this with persistent connections: GET_LOCK() locks belong to the connection, so a lock taken on a persistent connection that is never explicitly released can linger across requests.)
I'd suggest solving the issue by removing the contention altogether.
Add a timestamp column to your cached data.
When you need to update the cached data:
Just add new cached data to your table using the current timestamp
Remove cached data older than, let's say, 24 hours.
When you need to serve the cached data
Sort by timestamp (DESC) and return the newest cached data
At any given time your clients will retrieve records that are never deleted by any other process. Moreover, you don't care if a client gets cached data belonging to different writes (i.e. with different timestamps).
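A sketch of that scheme with a hypothetical cache table (the key column is named k only to avoid the reserved word):

-- writer: append a fresh copy instead of updating in place
INSERT INTO cache (k, payload, created_at) VALUES ('prices', '...new data...', NOW());

-- writer: purge copies older than 24 hours
DELETE FROM cache WHERE created_at < NOW() - INTERVAL 24 HOUR;

-- reader: always serve the newest copy
SELECT payload FROM cache WHERE k = 'prices' ORDER BY created_at DESC LIMIT 1;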
The second problem may be solved without involving the database at all. Have a lock file for the cache update procedure so that other clients know someone is already on it. This may not catch every corner case, but is it that big of a deal if two clients update the cache at the same time? After all, they do the update in transactions, so the cache will still be consistent.
You may even implement the lock yourself by storing the last cache-update time in a table. When a client wants to update the cache, have it lock that table, check the last update time, and then update the field.
I.e., implement your own locking mechanism to prevent multiple clients from updating the cache. Transactions will take care of the rest.
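A sketch of that self-made lock, assuming a one-row cache_meta table (names invented) that stores the last update time:

START TRANSACTION;
SELECT last_update FROM cache_meta WHERE k = 'main' FOR UPDATE;  -- concurrent updaters block here
-- if last_update is recent, another client beat us: ROLLBACK and serve the existing cache
-- otherwise: refresh the cache rows, then bump the timestamp
UPDATE cache_meta SET last_update = NOW() WHERE k = 'main';
COMMIT;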

Updating account balances with mysql

I have a field on a User table that holds the account balance for the user. Users can perform a lot of actions with my service that will result in rapid changes to their balance.
I'm trying to use mysql's serializable isolation level to make sure that multiple user actions will not update the value incorrectly. (Action A and action B simultaneously want to deduct 1 dollar from the balance.) However, I'm getting a lot of deadlock errors.
How do I do this correctly without getting all these deadlocks, and still keeping the balance field up to date?
Simple schema: a user has an id and a balance.
I'm using Doctrine, so I'm doing something like the following:
$con->beginTransaction();
$tx = $con->transaction;
$tx->setIsolation('SERIALIZABLE');
$user = UserTable::getInstance()->find($userId);
$user->setBalance($user->getBalance() + $change);
$user->save();
$con->commit();
First, trying to use the serializable isolation level on your transaction is a good idea. It means you know at least a little about what a transaction is, and that the isolation level is one of the biggest problems.
Note that SERIALIZABLE is not really true serializability. More on that in this previous answer, when you have some time to read it :-).
But the most important part is that you should accept automatic rollbacks of your transactions due to failed serializability as a normal fact, and build your application so that transactions can fail and be replayed.
One simple solution, and for accounting I like this simple solution as we can predict all the facts with no surprises, is to perform table locks. This is not a fine and elegant solution with row-level locks, just simple big table locks (always taken in the same order). After that you can do your operation as a single player and then release the locks. No multi-user concurrency on the rows of the tables, no magical next-row lock failures (see the previous link). This will certainly slow down your write operations, but if everybody takes the table locks in the same order you'll only get lock timeout problems, no deadlocks and no 'unserializable' auto-rollbacks.
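A sketch of that table-lock approach, reusing the LOCK TABLES + autocommit recipe shown in an earlier answer and the id/balance schema from the question:

SET autocommit = 0;
LOCK TABLES user WRITE;   -- every writer takes this lock first, always in the same order

SELECT balance FROM user WHERE id = 42;               -- read
UPDATE user SET balance = balance - 1 WHERE id = 42;  -- modify

COMMIT;
UNLOCK TABLES;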
Edit
From your code sample I'm not sure you can set the transaction isolation level after the BEGIN. You should activate query logging in MySQL and see what is actually sent, then check whether the other transactions run by the CMS are really using the serializable level.