How does transaction suspension work in MySQL?

The Spring Framework manual states that for PROPAGATION_REQUIRES_NEW the current transaction will be suspended.
What does "suspended transaction" mean?
Does the timeout timer stop counting for the current transaction?
What are the actual implications of such a suspension?
Thank you,
Asaf

It doesn't mean anything special. A suspended transaction is just a transaction that is temporarily not used for inserts, updates, commits or rollbacks, because the specified propagation setting requires a new transaction to be created and only one transaction can be active at a time.
Basically there are two transaction models: nested and flat. In the nested model, if you start a transaction and then need another one, the first one remains active; that is, the second one is nested inside its parent, and so on. In the flat model, on the other hand, the first transaction is suspended; that is, we won't use it until the new one has completed.
AFAIK the flat model is used almost exclusively (including Spring and the EJB spec as well), since it's much easier to implement: there is only one active transaction at any given time, so it's easy to decide what to do in case of a rollback, say, because of an exception. More importantly, the underlying database has to support it if you need the nested model, so the flat model is just the common denominator in this case.
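The flat model described above can be sketched as a toy Python model. This is not Spring's implementation; the class and method names are hypothetical, and the "transactions" are just labels. It only illustrates the bookkeeping: starting a new transaction parks the current one, and finishing the new one brings the parked one back.

```python
class FlatTransactionManager:
    """Toy model: at most one active transaction; the others wait on a stack."""

    def __init__(self):
        self.active = None      # name of the transaction currently in use
        self.suspended = []     # transactions parked until the inner one ends
        self.log = []           # record of what happened, for inspection

    def begin_requires_new(self, name):
        # Park the current transaction (if any) instead of nesting inside it.
        if self.active is not None:
            self.suspended.append(self.active)
            self.log.append(f"suspend {self.active}")
        self.active = name
        self.log.append(f"begin {name}")

    def finish(self, outcome):
        # outcome is "commit" or "rollback"; it affects only the active transaction.
        self.log.append(f"{outcome} {self.active}")
        # Resume the most recently suspended transaction, if there is one.
        self.active = self.suspended.pop() if self.suspended else None
        if self.active is not None:
            self.log.append(f"resume {self.active}")


tm = FlatTransactionManager()
tm.begin_requires_new("outer")
tm.begin_requires_new("inner")   # "outer" is suspended here
tm.finish("rollback")            # rolls back "inner" only; "outer" resumes
tm.finish("commit")              # commits "outer"
```

Note that rolling back the inner transaction leaves the outer one untouched, which is the practical consequence of suspension.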

Related

Spring Data JPA - Best Way to Update Concurrently Accessed "Total" Field

(Using Spring Boot 2.3.3 w/ MySQL 8.0.)
Let's say I have an Account entity that contains a total field, and one of those account entities represents some kind of master account. I.e. that master account has its total field updated by almost every transaction, and it's important that any updates to that total field are done on the most recent value.
Which is the better choice within such a transaction:
1. Using a PESSIMISTIC_WRITE lock, fetch the master account, increment the total field, and commit the transaction. Or,
2. Have a dedicated query that essentially does something like UPDATE Account SET total = total + x as part of the transaction? I'm assuming I'd still need the same pessimistic lock in this case for the UPDATE query, e.g. via @Query and @Lock.
Also, is it an anti-pattern to retry a failed transaction a set number of times due to a lock-acquisition timeout (or other lock-based exception)? Or is it better to let it fail, report it to the client, and let the client try to call the transaction/service again?
Apologies for the basic question, but, it's been some time since I've had to worry about doing such a thing in Spring.
Thanks in advance!
After exercising my Google Fu a bit more and digging even deeper, it seems variations of this question have already been asked, at least insofar as the 'locking' portion goes.
That is, while the Spring Data JPA docs mention redeclaring repository methods and adding the @Lock annotation, it seems that it is meant strictly for read-only queries. This is what I'd originally thought, as it wouldn't make much sense to "lock" an UPDATE query unless there was some additional magic happening with the JPQL query.
As for retrying, retrying does seem to be the way to go, but of course using a number of retries that makes sense for the situation.
Hopefully this helps someone else in the future who has a brain cramp like I did.
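For illustration, here is a minimal Python sketch of the two ideas in this thread: a single-statement increment and a bounded retry. It assumes SQLite (via the standard sqlite3 module) as a stand-in for MySQL, and a hypothetical account table; sqlite3.OperationalError stands in for whatever lock-timeout exception your driver raises.

```python
import sqlite3
import time


def add_to_total(conn, account_id, amount):
    # One statement: the database reads and writes total under its own lock,
    # so the increment is applied to the latest value, not to a stale copy.
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "UPDATE account SET total = total + ? WHERE id = ?",
            (amount, account_id),
        )


def with_retries(action, attempts=3, delay_seconds=0.05):
    # Retry a bounded number of times, then let the failure reach the caller.
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except sqlite3.OperationalError:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds * attempt)  # simple linear backoff


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, total INTEGER NOT NULL)")
conn.execute("INSERT INTO account (id, total) VALUES (1, 100)")
conn.commit()

with_retries(lambda: add_to_total(conn, 1, 25))
total = conn.execute("SELECT total FROM account WHERE id = 1").fetchone()[0]
```

The number of attempts and the backoff are arbitrary here; pick values that make sense for how long the client can reasonably wait.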

Concurrent writes to MySQL and testing solutions

I was practicing some "system design" coding questions and I was interested in how to solve a concurrency problem in MySQL. The problem was "design an inventory checkout system".
Let's say you are trying to check out a specific item from an inventory, a library book for instance.
If two people are on the website, looking to book it, is it possible that they both check it out? Let's assume the query is updating the status of the row to mark a boolean checked_out to True.
Would transactions solve this issue? It would cause the second query that runs to fail (assuming they are the same query).
Alternatively, we insert rows into a checkouts table. Since both queries read that the item is not checked out currently, they could both insert into the table. I don't think a transaction would solve this, unless the transaction includes reading the table to see if a checkout currently exists for this item that hasn't yet ended.
One of the suggested methods
How would I simulate two writes at the exact same time to test this?
No, transactions alone do not address concurrency issues. Let's quickly revisit MySQL's definition of transactions:
Transactions are atomic units of work that can be committed or rolled back. When a transaction makes multiple changes to the database, either all the changes succeed when the transaction is committed, or all the changes are undone when the transaction is rolled back.
To sum it up: transactions are a way to ensure data integrity.
RDBMSs use various types of locking, isolation levels, and storage-engine-level solutions to address concurrency. People often mistake transactions for a means of controlling concurrency because transactions affect how long certain locks are held.
Focusing on InnoDB: when you issue an UPDATE statement, MySQL places an exclusive lock on the record being updated. Only the transaction holding the exclusive lock can modify the given record; the others have to wait until that transaction is committed or rolled back.
How does this help you prevent multiple users from checking out the same book? Let's say you have an id field uniquely identifying the books and a checked_out field indicating the status of the book.
You can use the following atomic update to check out a book:
update books set checked_out=1 where id=xxx and checked_out=0
The checked_out=0 condition makes sure that the update only succeeds if the book is not checked out yet. So, if the above statement affects a row, then the current user has checked out the book. If it does not affect any rows, then someone else has already checked it out. The exclusive lock makes sure that only one transaction can update the record at any given time, thus serializing access to that record.
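As a rough sketch of how an application might act on the affected-row count, the following Python snippet runs the same conditional update and checks rowcount. It assumes SQLite (standard sqlite3 module) in place of MySQL; the check_out helper is a hypothetical name.

```python
import sqlite3


def check_out(conn, book_id):
    # The WHERE clause is the guard: the row changes only if it is still free.
    with conn:
        cur = conn.execute(
            "UPDATE books SET checked_out = 1 WHERE id = ? AND checked_out = 0",
            (book_id,),
        )
    # rowcount is 1 if this caller got the book, 0 if it was already taken.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, checked_out INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO books (id) VALUES (42)")
conn.commit()

first = check_out(conn, 42)    # succeeds: the book was free
second = check_out(conn, 42)   # fails: the guard no longer matches
```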
If you want to use a separate checkouts table for reserving books, then you can use a unique index on book ids to prevent the same book being checked out more than once.
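The unique-index variant might look like the sketch below, again assuming SQLite in place of MySQL and a hypothetical checkouts table with one row per currently checked-out book. The second insert for the same book is rejected by the index rather than by application logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checkouts (book_id INTEGER NOT NULL, user_id INTEGER NOT NULL)")
conn.execute("CREATE UNIQUE INDEX one_checkout_per_book ON checkouts (book_id)")


def reserve(conn, book_id, user_id):
    try:
        with conn:
            conn.execute(
                "INSERT INTO checkouts (book_id, user_id) VALUES (?, ?)",
                (book_id, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique index rejected a second row for the same book.
        return False


alice_got_it = reserve(conn, 42, 1)
bob_got_it = reserve(conn, 42, 2)
```

With this layout, returning a book means deleting its row, so the table holds only active checkouts; keeping a history would need a different scheme.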
Transactions don't cause updates to fail. They cause sequences of queries to be serialized. Only one accessor can run the sequence of queries; others wait.
Everything in SQL is a transaction, single-statement update operations included. The kind of transaction denoted by START TRANSACTION; ... COMMIT; (or BEGIN; ... COMMIT; in MySQL) bundles a series of queries together.
I don't think a transaction would solve this, unless the transaction
includes reading the table to see if a checkout currently exists for
this item.
That's generally correct. Checkout schemes must always read availability from the database. The purpose of the transaction is to avoid race conditions when multiple users attempt to check out the same item.
SQL doesn't have thread-safe atomic test-and-set instructions like multithreaded processor cores have. So you need to use transactions for this kind of thing.
The simplest form of checkout uses a transaction, something like this.
START TRANSACTION;
/* lock the matching row; a concurrent checkout of the same item waits here */
SELECT is_item_available, id FROM item WHERE catalog_number = ? FOR UPDATE;
/* if the item is not available, tell the user and commit the transaction without update */
UPDATE item SET is_item_available = 0 WHERE id = ?; /* the id selected above */
/* tell the user the checkout succeeded */
COMMIT;
It's clearly possible for two or more users to attempt to check out the same item more-or-less simultaneously. But only one of them actually gets the item.
A more complex checkout scheme, not detailed here, uses a two-step system. First step: a transaction to reserve the item for a user, rejecting the reservation if someone else has it checked out or reserved. Second step: reservation holder has a fixed amount of time to accept the reservation and check out the item, or the reservation expires and some other user may reserve the item.

How atomic are SQL transactions really in MySQL?

I wonder: if I have an SQL transaction where in one statement I do a select query, and then in a later statement an update query, is it guaranteed that nothing has been changed by outside factors in between the two?
So I select a number of rows ⟵ in the transaction
Another procedure changes one of the rows ⟵ outside the transaction
Then I want to do the update ⟵ in the transaction
So can this happen? I know the total transaction either happens or it doesn't, but are all individual statements in 1 transaction also executed as 1 atomic unit where nothing can happen in between two different statements? Or is the only way to ensure that the database is locked in between the two statements by setting a manual table lock?
By the way, this is the actual problem: I transfer money from one user (the buyer) to another (the seller). However, the buyer already deposited money when placing the buy order. He may cancel this buy order at any moment, in which case I give him back the deposited money. So it can happen that I'm in the process of transferring the deposited money from the buyer to the seller while the buyer cancels his order and I give him back his money. The money is then given to the buyer AND to the seller. This requires a high isolation level, right?
It depends on transaction isolation level.
The default isolation level on every database I have used is Read Committed, which lets a transaction see changes committed by other transactions while it runs. (Note that MySQL's InnoDB is an exception: its default is Repeatable Read.)
In contrast, the Serializable and Snapshot isolation levels isolate the current transaction from others, but they do not scale as well as Read Committed.
You can change the isolation level per transaction or globally on all modern databases, but I would not suggest doing so without a very good reason. Read Committed is a good choice for typical use cases because it needs no locking for reads, whereas Serializable uses heavy locking to make transactions run serially instead of concurrently, and it might not scale for typical use cases.
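A different technique from the one discussed in this answer, and one that does not depend on the isolation level, is to guard the payout with the order's status in the same transaction. The sketch below is only an illustration under assumptions: SQLite stands in for MySQL, and the orders and balances tables, the status values and the close_order helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL, deposit INTEGER NOT NULL)"
)
conn.execute("CREATE TABLE balances (owner TEXT PRIMARY KEY, amount INTEGER NOT NULL)")
conn.execute("INSERT INTO orders VALUES (1, 'open', 50)")
conn.executemany("INSERT INTO balances VALUES (?, 0)", [("buyer",), ("seller",)])
conn.commit()


def close_order(conn, order_id, new_status, recipient):
    # Claim the order and pay out in one transaction. Only the first caller
    # finds status = 'open'; a later caller changes no row and pays nothing.
    with conn:
        claimed = conn.execute(
            "UPDATE orders SET status = ? WHERE id = ? AND status = 'open'",
            (new_status, order_id),
        ).rowcount == 1
        if claimed:
            conn.execute(
                "UPDATE balances SET amount = amount + "
                "(SELECT deposit FROM orders WHERE id = ?) WHERE owner = ?",
                (order_id, recipient),
            )
    return claimed


settled = close_order(conn, 1, "settled", "seller")     # pays the seller
cancelled = close_order(conn, 1, "cancelled", "buyer")  # too late: no refund
```

Whichever of the two operations reaches the order first wins, and the deposit is paid out exactly once.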

How to prevent multiple workers from racing to process the same task?

I start this worker 10 times to give it a sense of concurrency:
class AnalyzerWorker
  @queue = :analyzer

  def self.perform
    loop do
      # My attempt to lock pictures from other worker instances that may
      # try to analyze the same picture (race condition)
      pic = Pic.where(locked: false).first
      pic.update_attributes locked: true
      pic.analyze
    end
  end
end
This code is still vulnerable to a race condition. One of the reasons, I think, is that there's a gap of time between fetching the unlocked picture and actually locking it.
Maybe there are more reasons. Is there a robust approach to prevent this?
Active Record provides optimistic locking and pessimistic locking.
In order to use optimistic locking, the table needs to have a column
called lock_version of type integer. Each time the record is updated,
Active Record increments the lock_version column. If an update request
is made with a lower value in the lock_version field than is currently
in the lock_version column in the database, the update request will
fail with an ActiveRecord::StaleObjectError.
Pessimistic locking uses a locking mechanism provided by the
underlying database. Using lock when building a relation obtains an
exclusive lock on the selected rows. Relations using lock are usually
wrapped inside a transaction for preventing deadlock conditions.
Code samples are provided in the referenced links...
Either should work, but each needs a different implementation. Given what you are doing, I'd consider pessimistic locking, since the possibility of a conflict is relatively high.
Your current implementation is kind of a mixture of both; however, as you indicated, it doesn't really solve the problem. You might be able to make yours work, but using the Active Record implementation makes sense.
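To make the optimistic variant concrete outside of Active Record, here is a minimal Python sketch of the same lock_version idea. It assumes SQLite in place of the real database; the pics table layout, claim_picture and StaleRecordError are hypothetical names, not part of Rails.

```python
import sqlite3


class StaleRecordError(Exception):
    """Raised when the row changed since it was read (hypothetical name)."""


def claim_picture(conn, pic_id, seen_version):
    # Update only if the version still matches what this worker read earlier.
    with conn:
        cur = conn.execute(
            "UPDATE pics SET locked = 1, lock_version = lock_version + 1 "
            "WHERE id = ? AND lock_version = ?",
            (pic_id, seen_version),
        )
    if cur.rowcount == 0:
        raise StaleRecordError(f"picture {pic_id} was modified by another worker")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pics (id INTEGER PRIMARY KEY, locked INTEGER NOT NULL DEFAULT 0, "
    "lock_version INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO pics (id) VALUES (1)")
conn.commit()

# Two workers read the same unlocked picture at version 0.
version_seen_by_a = version_seen_by_b = 0
claim_picture(conn, 1, version_seen_by_a)        # worker A wins
try:
    claim_picture(conn, 1, version_seen_by_b)    # worker B holds a stale version
    b_won = True
except StaleRecordError:
    b_won = False
```

A worker that gets the stale error would simply move on to the next unlocked picture instead of analyzing this one.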

Alternatives to LINQ To SQL on high loaded pages

To begin with, I LOVE LINQ TO SQL. It's so much easier to use than direct querying.
But there's one big problem: it doesn't work well under heavy load. I have some actions in my ASP.NET MVC project that are called hundreds of times every minute.
I used to have LINQ to SQL there, but since the amount of requests is gigantic, LINQ TO SQL almost always returned "Row not found or changed" or "X of X updates failed". And it's understandable. For instance, I have to increase some value by one with every request.
var stat = DB.Stats.First();
stat.Visits++;
// ....
DB.SubmitChanges();
But while ASP.NET was working on those // ... instructions, the stat.Visits value stored in the table got changed.
I found a solution, I created a stored procedure
UPDATE Stats SET Visits=Visits+1
It works well.
Unfortunately, I'm now running into more and more cases like that, and it sucks to create stored procedures for all of them.
So my question is, how to solve this problem? Are there any alternatives that can work here?
I hear that Stack Overflow works with LINQ to SQL, and it's more heavily loaded than my site.
This isn't exactly a problem with LINQ to SQL per se; it's an expected result of optimistic concurrency, which LINQ to SQL uses by default.
Optimistic concurrency means that when you update a record, you check the current version in the database against the copy that was originally retrieved before making any offline updates; if they don't match, report a concurrency violation ("row not found or changed").
There's a more detailed explanation of this here. There's also a fairly sizable guide on handling concurrency errors. Typically the solution involves simply catching ChangeConflictException and picking a resolution, such as:
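try
{
    // Make changes
    db.SubmitChanges();
}
catch (ChangeConflictException)
{
    foreach (var conflict in db.ChangeConflicts)
    {
        // Keep this context's values for every conflicting member
        conflict.Resolve(RefreshMode.KeepCurrentValues);
    }
    // Resubmit now that the conflicts have been resolved
    db.SubmitChanges();
}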
The above version will overwrite whatever is in the database with the current values, regardless of what other changes were made. For other possibilities, see the RefreshMode enumeration.
Your other option is to disable optimistic concurrency entirely for fields that you expect might be updated. You do this by setting the UpdateCheck option to UpdateCheck.Never. This has to be done at the field level; you can't do it at the entity level or globally at the context level.
Maybe I should also mention that you haven't picked a very good design for the specific problem you're trying to solve. Incrementing a "counter" by repeatedly updating a single column of a single row is not a very good/appropriate use of a relational database. What you should be doing is actually maintaining a history table - such as Visits - and if you really need to denormalize the count, implement that with a trigger in the database itself. Trying to implement a site counter at the application level without any data to back it up is just asking for trouble.
Use your application to put actual data in your database, and let the database handle aggregates - that's one of the things databases are good at.
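A rough sketch of that design, using Python and SQLite purely for illustration (the original context is SQL Server; the table, column and trigger names here are hypothetical): the application inserts one row per visit, and a trigger keeps the denormalized counter up to date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (id INTEGER PRIMARY KEY, visited_at TEXT NOT NULL)")
conn.execute("CREATE TABLE stats (id INTEGER PRIMARY KEY, visit_count INTEGER NOT NULL)")
conn.execute("INSERT INTO stats (id, visit_count) VALUES (1, 0)")
# The trigger keeps the denormalized counter in step with the history table.
conn.execute(
    "CREATE TRIGGER count_visit AFTER INSERT ON visits "
    "BEGIN UPDATE stats SET visit_count = visit_count + 1 WHERE id = 1; END"
)
conn.commit()


def record_visit(conn):
    # The application only inserts a fact; it never reads or writes the counter.
    with conn:
        conn.execute("INSERT INTO visits (visited_at) VALUES (datetime('now'))")


for _ in range(3):
    record_visit(conn)

cached_count = conn.execute("SELECT visit_count FROM stats WHERE id = 1").fetchone()[0]
actual_count = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
```

Because the history rows exist, the counter can always be rebuilt from them if it ever drifts.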
Use a producer/consumer or message queue model for updates that don't absolutely have to happen immediately, particularly status updates. Instead of trying to update the database immediately, keep a queue of updates that the ASP.NET threads can push to, and have a writer process/thread that writes the queue to the database. Since only one thread is writing, there will be much less contention on the relevant tables/rows.
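The single-writer idea can be sketched with Python's standard queue and threading modules. This is an in-process illustration only, with SQLite in place of the real database and hypothetical names throughout; a production setup would more likely use a durable message queue.

```python
import queue
import sqlite3
import threading

updates = queue.Queue()
STOP = object()  # sentinel that tells the writer to finish

# check_same_thread=False lets the writer thread use a connection created here;
# only the writer touches it until the thread has been joined.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE stats (id INTEGER PRIMARY KEY, visits INTEGER NOT NULL)")
conn.execute("INSERT INTO stats (id, visits) VALUES (1, 0)")
conn.commit()


def writer():
    # The single consumer: applies queued increments one at a time.
    while True:
        item = updates.get()
        if item is STOP:
            break
        with conn:
            conn.execute("UPDATE stats SET visits = visits + ? WHERE id = 1", (item,))


def handle_request():
    # Request threads only enqueue; they never touch the database.
    updates.put(1)


writer_thread = threading.Thread(target=writer)
writer_thread.start()
producers = [threading.Thread(target=handle_request) for _ in range(20)]
for p in producers:
    p.start()
for p in producers:
    p.join()
updates.put(STOP)        # all requests are queued; ask the writer to drain and stop
writer_thread.join()

visits = conn.execute("SELECT visits FROM stats WHERE id = 1").fetchone()[0]
```

The trade-off is that queued updates are lost if the process dies before the writer has applied them, which is why this suits updates that are not critical.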
For reads, use caching. For high volume sites even caching data for a few seconds can make a difference.
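As a small illustration of short-lived caching, here is a hypothetical time-to-live cache in Python. The loader function stands in for a real database query, and the clock is injectable so that expiry can be demonstrated without sleeping.

```python
import time


class TtlCache:
    """Caches one loader's result for a few seconds (illustrative only)."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()          # go to the database
            self._expires_at = now + self._ttl
        return self._value


loads = []


def load_visit_count():
    # Stand-in for a real database query.
    loads.append(1)
    return 1000 + len(loads)


fake_now = [0.0]
cache = TtlCache(load_visit_count, ttl_seconds=5, clock=lambda: fake_now[0])
first = cache.get()      # loads from the "database"
second = cache.get()     # served from the cache
fake_now[0] = 6.0
third = cache.get()      # TTL expired, loads again
```

This version is not thread-safe; under concurrent requests several threads could reload at once, which is usually harmless for a read-only value.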
Firstly, you could call DB.SubmitChanges() right after stats.Visits++, and that would greatly reduce the problem.
However, that still is not going to save you from the concurrency violation (that is, simultaneously modifying a piece of data by two concurrent processes). To fight that, you may use the standard mechanism of transactions. With LINQ-to-SQL, you use transactions by instantiating a TransactionScope class, thusly:
using (TransactionScope t = new TransactionScope())
{
    var stats = DB.Stats.First();
    stats.Visits++;
    DB.SubmitChanges();
    t.Complete(); // without this call the scope rolls back when disposed
}
Update: as Aaronaught correctly pointed out, TransactionScope is not going to help here, actually. Sorry. But read on.
Be careful, though, not to make the body of a transaction too long, as it will block other concurrent processes, and thus, significantly reduce your overall performance.
And that brings me to the next point: your very design is probably flawed.
The core principle in dealing with highly shared data is to design your application in such a way that the operations on that data are quick, simple, and semantically clear, and that they are performed one after another, not simultaneously.
The one operation that you're describing - counting visits - is pretty clear and simple, so it should be no problem, once you add the transaction. I must add, however, that while this will be clear, type-safe and otherwise "good", the stored-procedure solution is actually much preferred. This is exactly the way database applications were designed in ye olden days. Think about it: why would you need to fetch the counter all the way from the database to your application (potentially over the network!) if there is no business logic involved in processing it? The database server can increment it just as well, without even sending anything back to the application.
Now, as for other operations, that are hidden behind // ..., it seems (by your description) that they're somewhat heavy and long. I can't tell for sure, because I don't see what's there, but if that's the case, you probably want to separate them into smaller and quicker ones, or otherwise rethink your design. I really can't tell anything else with this little information.