We are developing an online schedule application. One schedule can be edited simultaneously by several users. There is one very important business constraint: there must be no more than three events in one day.
Technically speaking (and simplifying), there is a table in the database with the columns | id | event | date |. In a transaction, the application runs a "SELECT COUNT(*) ... WHERE ..." and, if the result is less than 3, inserts a new event.
Which approaches can be used to guarantee that two threads will not create four events in one day? This is a classic check-then-write problem, and we wonder how it can be solved at the database level.
Using transactions alone doesn't guarantee that another thread, in a second transaction, will not do the same thing: check that the number of events is less than 3 and insert. Locking the whole table is not acceptable because it would hurt response time, concurrency, etc.
The application is developed in Java using Spring, Hibernate, and MySQL.
Thanks in advance for any advice.
To block other processes you should use a SELECT ... FOR UPDATE statement. Note that this only works with InnoDB.
Example:
// java logic
try {
    // mysql logic
    START TRANSACTION;
    SELECT * FROM <table> WHERE <some condition> FOR UPDATE;
    INSERT INTO <table> ...;
    COMMIT;
    // java logic
} catch (Exception e) {
    ROLLBACK;
}
See more info about specific row locking http://dev.mysql.com/doc/refman/5.1/en/innodb-locking-reads.html
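Applied to the original question, a minimal JDBC sketch of this might look like the following (the table name "event" is an assumption; the columns follow the question). Note that when no rows exist yet for that day, you are relying on InnoDB's gap/next-key locking (default REPEATABLE READ isolation plus an index on the date column) to keep a concurrent transaction from inserting into the same day.

import java.sql.*;
import java.time.LocalDate;

public class EventDao {
    // Sketch only: table name "event" is assumed; column names follow the question.
    public boolean tryAddEvent(Connection conn, String name, LocalDate day) throws SQLException {
        conn.setAutoCommit(false);                       // start the transaction
        try {
            int count;
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT COUNT(*) FROM event WHERE date = ? FOR UPDATE")) {
                ps.setDate(1, java.sql.Date.valueOf(day));
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    count = rs.getInt(1);
                }
            }
            if (count >= 3) {                            // business rule: at most three per day
                conn.rollback();
                return false;
            }
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO event (event, date) VALUES (?, ?)")) {
                ps.setString(1, name);
                ps.setDate(2, java.sql.Date.valueOf(day));
                ps.executeUpdate();
            }
            conn.commit();
            return true;
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}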
With your data model you could use a check constraint to count the number of rows. AFAIK MySQL doesn't natively support this type of constraint, but it looks like it's possible to emulate one with a trigger.
Alternatively you could consider a different data model with a days table and an events table. You could use optimistic locking of days to ensure that a second transaction didn't have an out of date understanding of the data.
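A hedged JDBC sketch of that second alternative (the day table, its version column, and all other names here are assumptions): each writer bumps the day's version with a conditional UPDATE and only inserts the event if exactly one row was affected; otherwise it aborts and can retry with fresh data.

import java.sql.*;

public class DayOptimisticLock {
    // Assumed schema: day(id, version) plus event(id, event, date); all names are illustrative.
    // Returns true if the event was added; false if the day is full or we lost the race.
    public boolean addEvent(Connection conn, long dayId, java.sql.Date date, String name) throws SQLException {
        conn.setAutoCommit(false);
        try {
            long version;
            int count;
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT d.version, (SELECT COUNT(*) FROM event e WHERE e.date = ?) " +
                    "FROM day d WHERE d.id = ?")) {
                ps.setDate(1, date);
                ps.setLong(2, dayId);
                try (ResultSet rs = ps.executeQuery()) {
                    if (!rs.next()) { conn.rollback(); return false; }
                    version = rs.getLong(1);
                    count = rs.getInt(2);
                }
            }
            if (count >= 3) { conn.rollback(); return false; }

            // Optimistic check: succeeds only if nobody else bumped the version in the meantime.
            int updated;
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE day SET version = version + 1 WHERE id = ? AND version = ?")) {
                ps.setLong(1, dayId);
                ps.setLong(2, version);
                updated = ps.executeUpdate();
            }
            if (updated == 0) { conn.rollback(); return false; }   // stale read: caller may retry

            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO event (event, date) VALUES (?, ?)")) {
                ps.setString(1, name);
                ps.setDate(2, date);
                ps.executeUpdate();
            }
            conn.commit();
            return true;
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}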
Since you are going through Spring, and since there is a concurrency issue, try synchronizing execution at the Java layer rather than at the DB layer. We've had similar issues when trying to rely on the DB to manage concurrency.
Perhaps you could make the critical block in Java synchronized so that only one thread executes it at a time; inside the synchronized method, check that your business rules still hold. If they do, continue with normal execution; if not, abort with an exception.
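For illustration, a minimal sketch of that idea (the class, method, and DAO calls are made up). Keep in mind that this only serializes threads within a single JVM, so it stops helping once the application runs on more than one server.

public class ScheduleService {
    private final Object dayLock = new Object();   // guards the check-then-insert sequence

    public void addEvent(String event, java.time.LocalDate day) {
        synchronized (dayLock) {
            // countEventsFor(...) and insertEvent(...) are hypothetical DAO calls.
            if (countEventsFor(day) >= 3) {
                throw new IllegalStateException("Day already has three events: " + day);
            }
            insertEvent(event, day);
        }
    }

    private int countEventsFor(java.time.LocalDate day) { /* query omitted */ return 0; }

    private void insertEvent(String event, java.time.LocalDate day) { /* insert omitted */ }
}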
Is there any good and performant alternative to FOR UPDATE SKIP LOCKED in MariaDB? Or is there any good practice to achieve job queueing in MariaDB?
Instead of using a lock to indicate a queue record is being processed, use an indexed processing column. Set it to 0 for new records, and, in a separate transaction from any processing, select a single not yet processing record and update it to 1. Possibly also store the time and process or thread id and server that is processing the record. Have a separate monitoring process to detect jobs flagged as processing that did not complete processing within the expected time.
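A rough JDBC sketch of that claim step (the queue table and its processing, worker, and claimed_at columns are assumptions; MySQL/MariaDB allow ORDER BY and LIMIT on a single-table UPDATE):

import java.sql.*;

public class QueueClaimer {
    // Claim one unclaimed job in its own short transaction, then look it up by worker id.
    public Long claimOne(Connection conn, String workerId) throws SQLException {
        try (PreparedStatement claim = conn.prepareStatement(
                "UPDATE queue SET processing = 1, worker = ?, claimed_at = NOW() " +
                "WHERE processing = 0 ORDER BY id LIMIT 1")) {
            claim.setString(1, workerId);
            if (claim.executeUpdate() == 0) {
                return null;                    // nothing to claim right now
            }
        }
        try (PreparedStatement find = conn.prepareStatement(
                "SELECT id FROM queue WHERE processing = 1 AND worker = ? ORDER BY claimed_at DESC LIMIT 1")) {
            find.setString(1, workerId);
            try (ResultSet rs = find.executeQuery()) {
                return rs.next() ? rs.getLong(1) : null;    // id of the claimed job
            }
        }
    }
}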
An alternative that avoids even the temporary lock on a non-primary index needed to select a record is to use a separate, non-database message queue to notify you of new records available in the database queue. (Unless you won't ever care if a unit of work is processed more than once, I would always use a database table in addition to any non-database queue.)
You could also try
DELETE FROM QUEUE_TABLE LIMIT 1 RETURNING *
for dequeue operations. Depending on your needs it might work OK.
Update 2022-06-14:
MariaDB supports SKIP LOCKED now.
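With a MariaDB version that supports it, the dequeue could then look roughly like this (the queue table and processing column are again assumptions): the locking read skips rows that other transactions currently hold, so concurrent workers claim different jobs instead of waiting on each other.

import java.sql.*;

public class SkipLockedConsumer {
    // Claim and process one job; rows locked by other workers are skipped rather than waited on.
    public void consumeOne(Connection conn) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT id FROM queue WHERE processing = 0 ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED");
             ResultSet rs = ps.executeQuery()) {
            if (rs.next()) {
                long id = rs.getLong(1);
                // ... do the actual work for job `id` here ...
                try (PreparedStatement done = conn.prepareStatement("DELETE FROM queue WHERE id = ?")) {
                    done.setLong(1, id);
                    done.executeUpdate();
                }
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}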
In my code I need to do the following:
Check in a MySQL table (InnoDB) whether a particular row (matching some criteria) exists. If it does, return it. If it doesn't, create it and then return it.
The problem I seem to have is race conditions. Every now and then two processes run so closely together that they both check the table at the same time, don't see the row, and both insert it, resulting in duplicate data.
I'm reading the MySQL documentation trying to come up with some way to prevent this. What I've come up with so far:
Unique indexes seem to be one option, but they're not universal (they only work when the criteria are something unique across all rows).
Transactions even at SERIALIZABLE level don't protect against INSERT, period.
Neither do SELECT ... LOCK IN SHARE MODE or SELECT ... FOR UPDATE.
A LOCK TABLES ... WRITE would do it, but it's a very drastic measure: other processes won't be able to read from the table, and I need to lock ALL tables that I intend to use until I unlock them.
Basically, I'd like to do either of the following:
Prevent all INSERTs to the table from processes other than mine, while allowing SELECT/UPDATE (this is probably impossible because it makes so little sense most of the time).
Organize some sort of manual locking. The two processes would coordinate among themselves which one gets to do the select/insert dance, while the other waits. This needs some sort of operation that waits until the lock is released. I could probably implement a spin-lock (one process repeatedly checks if the other has released the lock), but I'm afraid that it would be too resource intensive.
I think I found an answer myself. Transactions + SELECT ... FOR UPDATE in an InnoDB table can provide a synchronization lock (aka mutex). Have all processes lock on a specific row in a specific table before they start their work. Then only one will be able to run at a time and the rest will wait until the first one finishes its transaction.
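A small sketch of that idea in JDBC (the app_lock table and its single row are assumptions; any dedicated row works): whoever locks the row first proceeds, and everyone else blocks until that transaction commits.

import java.sql.*;

public class RowMutexExample {
    // Assumed one-row table: app_lock(name VARCHAR PRIMARY KEY) containing the row 'find-or-create'.
    public Long findOrCreate(Connection conn, String criteria) throws SQLException {
        conn.setAutoCommit(false);
        try {
            // Take the mutex: this blocks until any other holder commits or rolls back.
            try (PreparedStatement lock = conn.prepareStatement(
                    "SELECT name FROM app_lock WHERE name = 'find-or-create' FOR UPDATE");
                 ResultSet rs = lock.executeQuery()) {
                // the lock is now held until commit/rollback
            }
            Long id = selectExisting(conn, criteria);    // hypothetical lookup
            if (id == null) {
                id = insertNew(conn, criteria);          // hypothetical insert
            }
            conn.commit();
            return id;
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }

    private Long selectExisting(Connection conn, String criteria) throws SQLException { return null; }
    private Long insertNew(Connection conn, String criteria) throws SQLException { return 0L; }
}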
If two independent scripts call a database with update requests to the same field, but with different values, would they execute at the same time and one overwrite the other?
As an example, to help ensure clarity, imagine both of these statements being requested to run at the same time, each by a different script, where Status = 2 is called microseconds after Status = 1 by coincidence.
Update My_Table SET Status = 1 WHERE Status= 0;
Update My_Table SET Status = 2 WHERE Status= 0;
What would my results be, and why? If other factors play a role, expand on them as much as you please; this is meant to be a general question.
Side Note:
Because I know people will still ask: my situation is using MySQL with Google App Engine, but I don't want to limit this question to just me should it be useful to others. I am using Status as an identifier for which script is working on the field; if Status is not 0, no other script is allowed to touch it.
This is what locking is for. All major SQL implementations lock DML statements by default so that one query won't overwrite another before the first is complete.
There are different levels of locking. If you've got row locking then your second update will run in parallel with the first, so at some point you'll have 1s and 2s in your table.
Table locking would force the second query to wait for the first query to completely finish and release its table lock.
You can usually turn off locking right in your SQL, but it's only ever done if you need a performance boost and you know you won't encounter race conditions like in your example.
Edits based on the new MySQL tag
If you're updating a table that uses the InnoDB engine, then you're working with row locking, and your query could yield a table with both 1s and 2s.
If you're working with a table that uses the MyISAM engine, then you're working with table locking, and your update statements would end up with a table that would either have all 1s or all 2s.
from https://dev.mysql.com/doc/refman/5.0/en/lock-tables-restrictions.html (MySQL)
Normally, you do not need to lock tables, because all single UPDATE statements are atomic; no other session can interfere with any other currently executing SQL statement. However, there are a few cases when locking tables may provide an advantage:
from https://msdn.microsoft.com/en-us/library/ms177523.aspx (sql server)
An UPDATE statement always acquires an exclusive (X) lock on the table it modifies, and holds that lock until the transaction completes. With an exclusive lock, no other transactions can modify data.
If you had two separate connections executing the two posted update statements, whichever statement started first would be the one that completed. The other statement would not update any data, as there would no longer be records with a status of 0.
The short answer is: it depends on which statement commits first. Just because one process started an update statement before another doesn't mean that it will complete before another. It might not get scheduled first, it might be blocked by another process, etc.
Ultimately, it's a race condition: the operation that completes (and commits) last, wins.
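One way to observe (and exploit) this from code: executeUpdate() returns the number of rows each statement actually changed, so a script that gets back 0 knows the other script claimed the Status = 0 rows first (with row-level locking both may claim some rows, as noted above). A sketch against the table from the question:

import java.sql.*;

public class ClaimExample {
    // Returns the number of rows this script's UPDATE actually claimed.
    public int claim(Connection conn, int myStatus) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE My_Table SET Status = ? WHERE Status = 0")) {
            ps.setInt(1, myStatus);
            return ps.executeUpdate();   // 0 means the other script got there first
        }
    }
}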
Since you have TWO scripts doing the same thing and using different values for the UPDATE, they will NOT run at the same time; one of the scripts will run first, even if you think you are calling them at the same time. You need to specify WHEN each script should run; otherwise the program will not know what should be 1 and what should be 2.
I'm working with Ruby On Rails (but it doesn't really matter) with a SQL backend, either MySQL or Postgres.
The web application will be multi-process, with a cluster of app-server processes running and working on the same DB.
I was wondering: is there any good and common strategy for handling race conditions?
Since it's going to be a DB-intense application, I can easily see how two clients can try to modify the same data at the same time.
Let's simplify the situation:
Two clients/users GET the same data, it doesn't matter if this happens at the same time.
They are served with two web pages representing the same data.
Later both of them try to write some incompatible modifications to the same record.
Is there a simple way to handle this kind of situation?
I was thinking of using id-tokens associated with each record. These tokens would be changed upon updates of the records, thus invalidating any subsequent update attempt based on stale data (an old, expired token).
Is there a better way? Maybe something already built into MySQL?
I'm also interested in coding patterns used in these cases.
Thanks.
Optimistic locking
The standard way to handle this in webapps is to use what's referred to as "optimistic locking".
Each record has a unique ID and an integer (or timestamp, but integer is better) optimistic lock field. This oplock field is initialized to 0 on record creation.
When you get the record you get the oplock field with it.
When you update the record you set the oplock value to the one you retrieved with the SELECT plus one, and you make the UPDATE conditional on the oplock value still being what it was when you last looked:
UPDATE thetable
SET field1 = ...,
field2 = ...,
oplock = 1
WHERE record_id = ...
AND oplock = 0;
If you lost a race with another session this statement will still succeed but it will report zero rows affected. That allows you to tell the user their change collided with changes by another user or to merge their changes and re-send, depending on what makes sense in that part of the app.
Many frameworks provide tooling to help automate this, and most ORMs can do it out of the box. Ruby on Rails supports optimistic locking.
Be careful when combining optimistic locking with pessimistic locking (as described below) for traditional applications. It can work; you just need to add a trigger on all optimistically lockable tables that increments the oplock column on an UPDATE if the UPDATE statement didn't do so itself. I wrote a PostgreSQL trigger for Hibernate oplock support that should be readily adaptable to Rails. You only need this if you're going to update the DB from outside Rails, but in my view it's always a good idea to be safe.
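Since Hibernate came up, this is roughly what the ORM-managed variant looks like with JPA annotations, as a minimal sketch (the entity and its fields are made up): the provider appends "AND oplock = ?" to its UPDATEs and raises an optimistic-lock exception when zero rows are affected.

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Record {
    @Id
    @GeneratedValue
    private Long id;

    private String field1;

    @Version               // the ORM maintains this oplock column automatically
    private int oplock;

    // getters and setters omitted for brevity
}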
Pessimistic locking
The more traditional approach to this is to begin a transaction and do a SELECT ... FOR UPDATE when fetching a record you intend to modify. You then hold the transaction open and idle while the user ponders what they're going to do and issue the UPDATE on the already-locked record before COMMITting.
This doesn't work well and I don't recommend it. It requires an open, often idle transaction for each user. This can cause problems with MVCC row cleanup in PostgreSQL and can cause locking problems in applications. It's also very inefficient for large applications with high user counts.
Insert races
Dealing with races on INSERT requires you to have a suitable application level unique key on the table, so inserts fail when they conflict.
I am experiencing what appears to be the effects of a race condition in an application I am involved with. The situation is as follows: generally, a page responsible for some heavy application logic follows this format:
Select from test and determine if there are rows already matching a clause.
If a matching row already exists, we terminate here; otherwise we proceed with the application logic.
Insert into the test table with values that will match our initial select.
Normally, this works fine and limits the action to a single execution. However, under high load and user-abuse where many requests are intentionally sent simultaneously, MySQL allows many instances of the application logic to run, bypassing the restriction from the select clause.
It seems to actually run something like:
select from test
select from test
select from test
(all of which pass the check)
insert into test
insert into test
insert into test
I believe this is done for efficiency reasons, but it has serious ramifications in the context of my application. I have attempted to use GET_LOCK() and RELEASE_LOCK(), but this does not appear to suffice under high load, as the race condition still appears to be present. Transactions are also not a possibility, as the application logic is very heavy and none of the tables involved are transaction-capable.
To anyone familiar with this behavior, is it possible to turn this type of handling off so that MySQL always processes queries in the order in which they are received? Is there another way to make such queries atomic? Any help with this matter would be appreciated, I can't find much documented about this behavior.
The problem here is that you have, as you surmised, a race condition.
The SELECT and the INSERT need to be one atomic unit.
The way you do this is via transactions. You cannot safely make the SELECT, return to PHP, and assume the SELECT's results will reflect the database state when you make the INSERT.
If well-designed transactions (the correct solution) are, as you say, not possible (and I still strongly recommend them), you're going to have to make the final INSERT atomically check whether its assumptions are still true (such as via an INSERT that checks for existence in the same statement, a stored procedure, or catching the INSERT's error in the application). If they aren't, it will abort back to your PHP code, which must start the logic over.
By the way, MySQL likely is executing requests in the order they were received. It's entirely possible, with multiple simultaneous connections, to receive SELECT A, SELECT B, INSERT A, INSERT B. Thus, the only "solution" would be to allow only one connection at a time, and that would kill your scalability dead.
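One single-statement way to make the INSERT check its own assumption is the INSERT ... SELECT ... WHERE NOT EXISTS idiom, sketched below in JDBC (the test table and its value column are assumptions). Whether it is fully race-proof depends on the storage engine and isolation level, so a unique key remains the most robust guard where one is possible.

import java.sql.*;

public class ConditionalInsert {
    // Insert only if no matching row already exists, in a single statement.
    // Returns true if a row was actually inserted.
    public boolean insertIfAbsent(Connection conn, String value) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO test (value) " +
                "SELECT ? FROM DUAL " +
                "WHERE NOT EXISTS (SELECT 1 FROM test WHERE value = ?)")) {
            ps.setString(1, value);
            ps.setString(2, value);
            return ps.executeUpdate() > 0;
        }
    }
}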
Personally, I would go about the check another way.
Attempt to insert the row. If it fails, then there was already a row there.
In this manner, you check for a duplicate and insert the new row in a single query, eliminating the possibility of races.
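A hedged JDBC sketch of that approach (it assumes a UNIQUE index on the column(s) that define a duplicate; the table and column names are made up). With MySQL, the duplicate-key error surfaces as an SQLIntegrityConstraintViolationException, which you catch instead of pre-checking with a SELECT.

import java.sql.*;

public class InsertFirst {
    // Try the insert; a duplicate-key error means the row was already there.
    // Assumes a UNIQUE index on test.value.
    public boolean insertedNewRow(Connection conn, String value) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO test (value) VALUES (?)")) {
            ps.setString(1, value);
            ps.executeUpdate();
            return true;
        } catch (SQLIntegrityConstraintViolationException duplicate) {
            return false;    // someone else inserted it first
        }
    }
}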