Relational DB race conditions - MySQL

I'm working with Ruby on Rails (though it doesn't really matter here) and a SQL backend, either MySQL or Postgres.
The web application will be multi-process, with a cluster of app-server processes running against the same DB.
I was wondering: is there any good, common strategy for handling race conditions?
Since it's going to be a DB-intense application, I can easily see how two clients can try to modify the same data at the same time.
Let's simplify the situation:
Two clients/users GET the same data, it doesn't matter if this happens at the same time.
They are served with two web pages representing the same data.
Later both of them try to write some incompatible modifications to the same record.
Is there a simple way to handle this kind of situation?
I was thinking of using id-tokens associated with each record. These tokens would be changed upon each update of the record, thus invalidating any subsequent update attempt based on stale data (an old, expired token).
Is there a better way? Maybe something already built in MySQL?
I'm also interested in coding patterns used in these cases.
Thanks

Optimistic locking
The standard way to handle this in webapps is to use what's referred to as "optimistic locking".
Each record has a unique ID and an integer (or timestamp, but an integer is better) optimistic lock field. This oplock field is initialized to 0 on record creation.
When you get the record you get the oplock field with it.
When you write the record back, you set the oplock value to the value you retrieved with the SELECT plus one, and you make the UPDATE conditional on the oplock still being what it was when you last looked:
UPDATE thetable
SET field1 = ...,
    field2 = ...,
    oplock = 1        -- the value we read (0) plus one
WHERE record_id = ...
  AND oplock = 0;     -- matches only if nobody changed the row since we read it
If you lost a race with another session, this statement still executes without error, but it reports zero rows affected. That lets you tell the user their change collided with another user's changes, or merge the changes and re-send, depending on what makes sense in that part of the app.
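For example, if session A committed the update above first, session B (still holding the stale oplock value 0) runs the same statement and can detect the collision. ROW_COUNT() is MySQL's affected-rows function; most client libraries surface the same number directly:
UPDATE thetable
SET field1 = ...,
    oplock = 1
WHERE record_id = ...
  AND oplock = 0;     -- the row's oplock is already 1, so nothing matches

SELECT ROW_COUNT();   -- 0: session B lost the race and must re-read or merge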
Many frameworks provide tooling to help automate this, and most ORMs can do it out of the box. Ruby on Rails supports optimistic locking via a lock_version column.
Be careful when combining optimistic locking with pessimistic locking (as described below, as used by more traditional applications). It can work; you just need to add a trigger on all optimistically lockable tables that increments the oplock column on an UPDATE if the UPDATE statement didn't do so itself. I wrote a PostgreSQL trigger for Hibernate oplock support that should be readily adaptable to Rails. You only need this if you're going to update the DB from outside Rails, but in my view it's always a good idea to be safe.
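For illustration only, a hypothetical MySQL analogue of such a trigger (the one mentioned above was written for PostgreSQL; table and column names match the example above) might look like:
CREATE TRIGGER bump_oplock
BEFORE UPDATE ON thetable
FOR EACH ROW
  -- if the incoming UPDATE didn't bump oplock itself, do it here
  SET NEW.oplock = IF(NEW.oplock = OLD.oplock, OLD.oplock + 1, NEW.oplock);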
Pessimistic locking
The more traditional approach to this is to begin a transaction and do a SELECT ... FOR UPDATE when fetching a record you intend to modify. You then hold the transaction open and idle while the user ponders what they're going to do and issue the UPDATE on the already-locked record before COMMITting.
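In outline (same placeholder style as above):
BEGIN;
SELECT * FROM thetable WHERE record_id = ... FOR UPDATE;  -- row is now locked
-- ... transaction sits open and idle while the user edits ...
UPDATE thetable SET field1 = ... WHERE record_id = ...;
COMMIT;                                                   -- lock released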
This doesn't work well and I don't recommend it. It requires an open, often idle transaction for each user. This can cause problems with MVCC row cleanup in PostgreSQL and can cause locking problems in applications. It's also very inefficient for large applications with high user counts.
Insert races
Dealing with races on INSERT requires a suitable application-level unique key on the table, so that conflicting inserts fail instead of creating duplicates.
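For example (hypothetical table and key names):
ALTER TABLE bookings ADD CONSTRAINT uq_user_slot UNIQUE (user_id, slot);

-- the loser of the race now gets a duplicate-key error (1062 in MySQL)
-- instead of silently creating a second row:
INSERT INTO bookings (user_id, slot) VALUES (42, '2014-07-01 10:00');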

Related

How does a lock work for two inserts in MySQL?

Let's say the isolation level is REPEATABLE READ, as it is by default in MySQL.
I have two inserts (no checking, no unique columns).
a) Let's say these two inserts happen at the same moment. What will happen? Will it run the first insert and then the second, or both of them at once in different MySQL threads?
b) Let's say I have an insert statement and a unique column called vehicle_id, but before inserting I check whether the value already exists; if it doesn't, I go ahead and insert. Now suppose two threads in my code arrive at the same moment, so they both pass the existence check.
Now they both try to insert the same vehicle_id. How does MySQL handle this? If the inserts ran asynchronously, maybe both would happen so quickly that both rows get inserted even though vehicle_id is a unique field. If not, one insert goes first and the second waits; when the first is done, the second tries to insert and fails because of the unique vehicle_id restriction. Which is it?
I am asking because it's not obvious to me what locks INSERT takes under REPEATABLE READ. I know how it works for UPDATE/SELECT.
As I understand it the situation is:
a) Threads are assigned per connection. If both inserts are received on the same connection, they will be executed in the same thread, one after the other in the order in which they are received. If they arrive on different connections, it comes down to whichever thread is scheduled first, and that is likely OS-determined and non-deterministic from your point of view.
b) If a column is defined as UNIQUE at the server, then you cannot insert a second row with the same value, so the second insert must fail.
Trying to use a conflicting index in the way you described is an application logic problem, not a MySQL problem. Whatever entity is responsible for your unique IDs (your application, in this case) needs to ensure that they are unique. One approach is to implement an application lock using MySQL, which allows applications running in isolation from each other to share a lock at the server; check the MySQL docs for how to use this. Its intended use is at the application level, so it is not binding on the MySQL server. Another approach is to use UUIDs for unique keys and rely on their uniqueness when you need to create a new one.
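A sketch of the application-lock approach (lock name and timeout are arbitrary; GET_LOCK() and RELEASE_LOCK() are built-in MySQL functions):
SELECT GET_LOCK('vehicle_insert', 10);    -- wait up to 10 s; returns 1 if acquired

-- check-then-insert is now safe: no other session holding this lock can interleave
INSERT INTO vehicles (vehicle_id, ...) VALUES (...);

SELECT RELEASE_LOCK('vehicle_insert');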

MySQL query synchronization/locking question

I have a quick question that I can't seem to find an answer to online; I'm not sure I'm using the right wording.
Do MySQL databases automatically synchronize queries coming in at around the same time? For example, if I send a query to insert something into a database at the same time another connection sends a query to select something from it, does MySQL automatically lock the database while the insert is happening, and then unlock it when done, allowing the select query to proceed?
Thanks
Do MySQL databases automatically synchronize queries coming in at around the same time?
Yes.
Think of it this way: there's no such thing as simultaneous queries. MySQL always carries out one of them first, then the second one. (This isn't exactly true; the server is far more complex than that. But it robustly provides the illusion of sequential queries to us users.)
If, from one connection you issue a single INSERT query or a single UPDATE query, and from another connection you issue a SELECT, your SELECT will get consistent results. Those results will reflect the state of data either before or after the change, depending on which query went first.
You can even do stuff like this (read-modify-write operations) and maintain consistency.
UPDATE thetable
SET update_count = update_count + 1,   -- read-modify-write in one atomic statement
    update_time = NOW()
WHERE id = something;
If you must do several INSERT or UPDATE operations as if they were one, you'll need to use the InnoDB engine, and you'll need to use transactions. Other connections won't see a transaction's partial changes while it is in progress. Teaching you to use transactions properly is beyond the scope of a Stack Overflow answer.
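Still, a minimal sketch of the shape (hypothetical accounts table):
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- both changes become visible to other connections together, or not at all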
The key to understanding how a modern database engine like InnoDB works is Multi-Version Concurrency Control or MVCC. This is how simultaneous operations can run in parallel and then get reconciled into a consistent "view" of the database when fully committed.
If you've ever used Git you know how you can have several updates to the same base happening in parallel but so long as they can all cleanly merge together there's no conflict. The database works like that as well, where you can begin a transaction, apply a bunch of operations, and commit it. Should those apply without conflict the commit is successful. If there's trouble the transaction is rolled back as if it never happened.
This ability to juggle multiple operations simultaneously is what makes a transaction-capable database engine really powerful. It's an important component necessary to meet the ACID standard.
MyISAM, MySQL's older default engine, doesn't have any of these features and locks the whole table on any write operation to avoid conflict. It works like you thought it did.
When creating a database in MySQL you have your choice of engine, but using InnoDB should be your default. There's really no reason at all to use MyISAM as any of the interesting features of that engine (e.g. full-text indexes) have been ported over to InnoDB.

MySQL Under High Load, Race Condition?

I am experiencing what appears to be the effects of a race condition in an application I am involved with. The situation is as follows: generally, a page responsible for some heavy application logic follows this format:
Select from test and determine if there are rows already matching a clause.
If a matching row already exists, we terminate here, otherwise we proceed with the application logic
Insert into the test table with values that will match our initial select.
Normally, this works fine and limits the action to a single execution. However, under high load and user-abuse where many requests are intentionally sent simultaneously, MySQL allows many instances of the application logic to run, bypassing the restriction from the select clause.
It seems to actually run something like:
select from test
select from test
select from test
(all of which pass the check)
insert into test
insert into test
insert into test
I believe this is done for efficiency reasons, but it has serious ramifications in the context of my application. I have attempted to use Get_Lock() and Release_Lock() but this does not appear to suffice under high load as the race condition still appears to be present. Transactions are also not a possibility as the application logic is very heavy and all tables involved are not transaction-capable.
To anyone familiar with this behavior, is it possible to turn this type of handling off so that MySQL always processes queries in the order in which they are received? Is there another way to make such queries atomic? Any help with this matter would be appreciated, I can't find much documented about this behavior.
The problem here is that you have, as you surmised, a race condition.
The SELECT and the INSERT need to be one atomic unit.
The way you do this is via transactions. You cannot safely make the SELECT, return to PHP, and assume the SELECT's results will reflect the database state when you make the INSERT.
If well-designed transactions (the correct solution) are, as you say, not possible - and I still strongly recommend them - you're going to have to make the final INSERT atomically check whether its assumptions are still true (for example via an INSERT that only succeeds when no matching row exists, a stored procedure, or catching the INSERT's duplicate-key error in the application). If they aren't, it aborts back to your PHP code, which must start the logic over.
By the way, MySQL likely is executing requests in the order they were received. With multiple simultaneous connections it's entirely possible to receive SELECT A, SELECT B, INSERT A, INSERT B. Thus, the only "solution" would be to allow only one connection at a time - and that will kill your scalability dead.
Personally, I would go about the check another way.
Attempt to insert the row. If it fails, then there was already a row there.
In this manner, you check for a duplicate and insert the new row in a single query, eliminating the possibility of a race.
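One sketch of that approach, assuming a unique key on the relevant column (table and column names are hypothetical): INSERT IGNORE turns the duplicate-key error into a warning, and the affected-row count tells you whether you won the race:
-- requires a unique key, e.g.: ALTER TABLE test ADD UNIQUE KEY uq_task (task_key);
INSERT IGNORE INTO test (task_key, started_at) VALUES ('heavy-job', NOW());

SELECT ROW_COUNT();   -- 1 = we inserted the row; 0 = another request got there first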

How to guarantee constraint between SQL rows?

We are developing online schedule application. One schedule can be edited simultaneously by several users. There is one very important business constraint. There must be only three events in one day.
Technically speaking, and simplifying, there is a table in the database with columns: | id | event | date |. In a transaction, the application runs a "SELECT ... COUNT ... WHERE ..." and, if the result is less than 3, inserts the new event.
Which approaches can be used to guarantee that two threads will not create four events in one day? This is a classical check-and-write problem, and we wonder how it can be solved at the database level.
Using transactions alone doesn't guarantee it: a second transaction in another thread can do exactly the same thing - check that the number of events is less than 3 and insert. Locking the whole table is not acceptable because it would hurt response time, concurrency, etc.
Application is developed in Java using Spring, Hibernate, MySQL.
Thanks in advance for any pieces of advice.
To block other sessions you should use a SELECT ... FOR UPDATE statement. Note that this works only with InnoDB.
Example:
// Java + JDBC (java.sql.Connection / Statement) sketch of the flow
try (Connection conn = dataSource.getConnection()) {
    conn.setAutoCommit(false);                        // START TRANSACTION
    try (Statement st = conn.createStatement()) {
        // lock the rows we are checking so no other session can change them
        st.executeQuery("SELECT * FROM events WHERE /* some condition */ FOR UPDATE");
        st.executeUpdate("INSERT INTO events /* ... */");
        conn.commit();
    } catch (SQLException e) {
        conn.rollback();                              // undo everything on failure
        throw e;
    }
}
See more info about locking reads at http://dev.mysql.com/doc/refman/5.1/en/innodb-locking-reads.html
With your data model you could use a check constraint to count the number of rows. AFAIK MySQL doesn't natively support this type of constraint, but it's possible to emulate one with a trigger, as sketched below.
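A hypothetical sketch of such a trigger (MySQL 5.5+ has SIGNAL for raising errors; note the count alone can still race under concurrent inserts unless the read is a locking one, e.g. combined with the FOR UPDATE approach above):
DELIMITER //
CREATE TRIGGER max_three_events BEFORE INSERT ON events
FOR EACH ROW
BEGIN
  -- reject the insert if the day is already full
  IF (SELECT COUNT(*) FROM events WHERE date = NEW.date) >= 3 THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'at most three events per day';
  END IF;
END//
DELIMITER ;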
Alternatively, you could consider a different data model with a days table and an events table. You could use optimistic locking on the days rows to ensure that a second transaction doesn't act on an out-of-date understanding of the data.
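For example (hypothetical days table with a version column):
UPDATE days
SET version = version + 1
WHERE day = '2014-07-01'
  AND version = 5;    -- the version this transaction read earlier; 0 rows affected = retry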
Since you are going through Spring, and since there is a concurrency issue, you could try synchronizing execution at the Java layer rather than at the DB layer. We've had similar issues when trying to use the DB to manage concurrency.
Perhaps you could make the execution block in Java synchronized so that it forces execution to serialize; inside the synchronized method, check that all of your business logic returns true. If true, continue with normal execution; if false, abort with an exception.

MySQL table locking for a multi user JSP/Servlets site

Hi, I am developing a site with JSP/Servlets running on Tomcat for the front-end and a MySQL DB for the backend, accessed through JDBC.
Many users of the site can access and write to the database at the same time. My question is:
Do I need to explicitly take locks before each write/read access to the DB in my code?
Or does Tomcat handle this for me?
Also, do you have any suggestions on how best to implement this? I have written a significant amount of JDBC code already without taking any locks :/
I think you are thinking about transactions when you say "locks". At the lowest level, your database server already ensures that parallel reads and writes won't corrupt your tables.
But if you want to ensure consistency across tables, you need to employ transactions. Simply put, what transactions provide you is an all-or-nothing guarantee. That is, if you want to insert an Order in one table and related OrderItems in another table, what you need is an assurance that if the insertion of OrderItems fails (in step 2), the changes made to the Order table (step 1) will also get rolled back. This way you'll never end up in a situation where a row in the Order table has no associated rows in the OrderItems table.
This, of course, is a very simplified description of what a transaction is. You should read more about it if you are serious about database programming.
In Java, you usually do transactions roughly with the following steps:
Set autocommit to false on your JDBC connection
Do several inserts and/or updates using the same connection
Call conn.commit() when all the inserts/updates that go together are done
If there is a problem somewhere during step 2, call conn.rollback()
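In terms of the raw SQL those JDBC calls send to the server, the flow is roughly (hypothetical Order/OrderItems statements):
SET autocommit = 0;                          -- step 1: conn.setAutoCommit(false)
INSERT INTO Orders (...) VALUES (...);       -- step 2: related writes on one connection
INSERT INTO OrderItems (...) VALUES (...);
COMMIT;                                      -- step 3: conn.commit()
-- on any failure above: ROLLBACK;           -- step 4: conn.rollback()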