Consistency of shared data, invalidate write of stale data object - MySQL

I have a scenario where I want to invalidate a write operation. Briefly, the scenario is as follows:
There are two processes running on entirely different systems, and we have one common database. Let the processes be termed 'P1' and 'P2'. Each process can do read (R) and write (W) operations. R1 and W1 denote operations done by P1, and R2 and W2 those done by P2. Let's take a common DB object (O).
Now the operations execute in the following sequence:
R1 (read 'O' by process 'P1')
R2 (read 'O' by process 'P2')
W1 (write 'O' by process 'P1') -> this makes P2's copy of 'O' stale.
Now what I want is for P2 to fail when it performs its W2 operation, since it holds an old, inconsistent copy of the object.
I have read a few blog posts about checking a timestamp before persisting, but that is not the solution (it is error prone even when the difference is a matter of milliseconds).
I want to know an enterprise-level solution.
I would also like to know how this can be achieved using a third-party solution like Hibernate.

You need optimistic locking. In Hibernate you get optimistic locking by adding a separate numeric field annotated with @Version to each entity. Every update increments this field's value by one, and any attempt to persist stale data (carrying a lower version value) is rejected.
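A minimal sketch of such an entity (class and field names are illustrative, not from your schema):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class SharedObject {

    @Id
    private Long id;

    private String payload;

    // Hibernate appends "AND version = ?" to the UPDATE it issues and bumps
    // the value on every successful update. If another process has already
    // incremented it, zero rows match and Hibernate throws an
    // OptimisticLockException / StaleObjectStateException, which is exactly
    // the "fail W2" behaviour you are asking for.
    @Version
    private int version;

    // getters and setters omitted
}

If you are not using Hibernate, the same idea can be implemented by hand with an UPDATE ... WHERE id = ? AND version = ? statement plus a check that exactly one row was affected.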

Related

How are interdependent DB calls handled in transaction.atomic?

I have two DB calls inside transaction.atomic().
Sample code:
with transaction.atomic():
    result1, created = SomeModel.objects.get_or_create(**somedata)
    if not created:
        flag = SomeOtherModel.objects.filter(somemodel=result1).exists()
        if flag:
            result1.some_attr = value1 if flag else value2
            result1.save()
AFAIK, with transaction.atomic, if my Python code does not raise an exception, all the DB calls are committed to the database. If any exception is raised inside the block, no database operation is committed.
So how is this handled when the result of one DB call is used in the Python logic to drive other DB operations?
I didn't find this covered specifically in the documentation; if there is a good source, please mention it.
Database transactions are a complex topic. I don't have the exact answer with linked documentation, but from experience I can say that you're fine to use values mutated or created within a transaction later in that same transaction. The simple explanation of a transaction is that it ensures a series of commands either succeeds or fails entirely, so your database isn't left in a partial, incomplete state. Within a transaction, your experience with the database, at least from an ORM perspective, should remain the same.
Here's a StackOverflow post I found with some good discussion around it: Database transactions - How do they work?

Consistent read/write on Aurora Serverless MySQL

I have a distributed serverless application based on AWS Aurora Serverless MySQL 5.6 and multiple Lambda functions. Some of the Lambdas act as writing threads, others as reading threads. To focus on the most important details, let's suppose there is only one table with the following structure:
id: bigint primary key autoincrement
key1: varchar(700)
key2: bigint
content: blob
unique(key1, key2)
Writing threads perform INSERTs in the following manner: every writing thread generates one entry with key1+key2+content, where the key1+key2 pair is unique and id is generated automatically by autoincrement. Some writing threads can fail with a DUPLICATE KEY ERROR if key1+key2 repeats an existing value, but that does not matter and is okay.
There are also some reading threads, which poll the table and try to process newly inserted entries. The goal of a reading thread is to retrieve all new entries and process them in some way. The number of reading threads is uncontrolled; they do not communicate with each other and do not write anything to the table above, but they can write some state to a custom table.
At first it seems that polling is very simple: it's enough for the reading process to store the last id that has been processed and to continue polling from it, e.g. SELECT * FROM table WHERE id > ${lastId}. The approach above works well under a small load, but fails under a high load for an obvious reason: some entries being inserted have not yet appeared in the database, because the cluster has not been synchronized at that point.
Let's see what happens from the cluster's point of view, even if it consists of only two servers, A and B.
1) Server A accepts a write transaction with an entry insertion and acquires autoincrement number 100500
2) Server B accepts a write transaction with an entry insertion and acquires autoincrement number 100501
3) Server B commits its write transaction
4) Server B accepts a read transaction and returns entries with id > 100499, which is only entry 100501.
5) Server A commits its write transaction.
6) The reading thread receives only entry 100501 and moves the lastId cursor to 100501. Entry 100500 is lost to the current reading thread forever.
QUESTION: Is there a way to solve the problem above WITHOUT hard-locking tables across the whole cluster, in some lock-free way or something similar?
The issue here is that the local state in each lambda (thread) does not reflect the global state of said table.
As a first port of call, I would try to always consult the table for the latest ID before reading the entry with that ID.
Have a look at built in function LAST_INSERT_ID() in MySQL.
The caveat:
[...] the most recently generated ID is maintained in the server on a per-connection basis
Your Lambda could create its connections outside the handler function/method, which would make them longer-lived (a known trick, but it's not bulletproof here), but I think each new simultaneously executing Lambda instance would be given a new connection, in which case the above solution falls apart.
Luckily, what you have to do then is wrap all WRITES and all READS in transactions so that additional coordination takes place when reading and writing simultaneously to the same table.
In your quest you might come across transaction isolation levels; SERIALIZABLE would be the safest and least performant, but apparently AWS Aurora does not support it (I have not verified that statement).
HTH

Thread safety in Slick

I have a general understanding question about how Slick/the database manages asynchronous operations. When I compose a query, or an action, say
(for {
  users <- UserDAO.findUsersAction(usersInput.map(_.email))
  addToInventoriesResult <- insertOrUpdate(inventoryInput, user)
  deleteInventoryToUsersResult <- inventoresToUsers.filter(_.inventoryUuid === inventoryInput.uuid).delete if addToInventoriesResult == 1
  addToInventoryToUsersResult <- inventoresToUsers ++= users.map(u => DBInventoryToUser(inventoryInput.uuid, u.uuid)) if addToInventoriesResult == 1
} yield (addToInventoriesResult)).transactionally
Is there a possibility that another user can, for example, remove the users just after the first action UserDAO.findUsersAction(usersInput.map(_.email)) is executed, but before the rest, such that the insert will fail (because of a foreign key error)? Or a scenario that can lead to a lost update, like: transaction A reads data, then transaction B updates this data, then transaction A does an update based on what it read; it will not see B's update and will overwrite it.
I think this probably depends on the database implementation or maybe JDBC, as this is sent to the database as a block of SQL, but maybe Slick plays a role in this. I'm using MySQL.
In case there are synchronisation issues here, what is the best way to solve this? I have read about approaches like a background queue that processes the operations sequentially (as semantic units), but wouldn't this partly remove the benefit of being able to access the database asynchronously and hurt performance?
First of all, if the underlying database driver is blocking (the case with JDBC-based drivers) then Slick cannot deliver async performance in the truly non-blocking sense of the word (i.e. a thread will be consumed and blocked for however long it takes for a given query to complete).
There's been talk of implementing non-blocking drivers for Oracle and SQL Server (under a paid Typesafe subscription) but that's not happening any time soon AFAICT. There are a couple of projects that do provide non-blocking drivers for Postgres and MySQL, but YMMV, still early days.
With that out of the way, when you call transactionally Slick takes the batch of queries to execute and wraps them in a try-catch block with the underlying connection's autocommit flag set to false. Once the queries have executed successfully, the transaction is committed by setting autocommit back to the default, true. In the event an Exception is thrown, the connection's rollback method is called. It's just standard JDBC session boilerplate that Slick conveniently abstracts away.
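For reference, the hand-written JDBC equivalent of that boilerplate looks roughly like this (a sketch, not Slick's actual source; the DataSource and the statements passed in are assumptions):

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

class TransactionBoilerplate {
    // Run a batch of SQL statements in one transaction: commit if they all
    // succeed, roll back if any of them throws.
    static void runInTransaction(DataSource dataSource, String... statements) throws SQLException {
        try (Connection conn = dataSource.getConnection()) {
            conn.setAutoCommit(false);          // open the transaction
            try (Statement stmt = conn.createStatement()) {
                for (String sql : statements) {
                    stmt.executeUpdate(sql);    // execute the batch
                }
                conn.commit();                  // everything succeeded
            } catch (SQLException e) {
                conn.rollback();                // undo the whole batch
                throw e;
            } finally {
                conn.setAutoCommit(true);       // restore the default
            }
        }
    }
}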
As for your scenario of a user being deleted mid-transaction and handling that correctly, that's the job of the underlying database/driver.

Is there ever a reason to use a database transaction for read-only SQL statements?

As the question says, is there ever a reason to wrap read-only SQL statements in a transaction? Obviously updates require transactions.
You still need a read-lock on the objects you operate on. You want to have consistent reads, so writing the same records shouldn't be possible while you're reading them...
If you issue several SELECT statements in a single transaction, you will also produce several read-locks.
SQL Server has some good documentation on this (the "read-lock" is called a shared lock there):
http://msdn.microsoft.com/en-us/library/aa213039%28v=sql.80%29.aspx
I'm sure MySQL works in similar ways.
Yes, if it's important that the data is consistent across the SELECT statements run. For instance, if you were getting the balance of several bank accounts for a user, you wouldn't want the balance values read to be inconsistent. E.g. if this happened:
With balance values B1=10 and B2=20:
1. Your code reads B1 = 10.
2. Transaction TA1 starts on another DB client
3. TA1 writes B1 to 20, B2 to 10
4. TA1 commits
5. Your code reads B2 = 10
So you now think that B1 is 10 and B2 is 10, which could be displayed to the user and says that $10 has disappeared!
Transactions for reading will prevent this, since we would read B2 as 20 in step 5 (assuming a multiversion concurrency control DB, which MySQL+InnoDB is).
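A sketch of the same pair of reads wrapped in one transaction over JDBC (table and column names are made up for illustration):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class ConsistentBalanceRead {
    // Both SELECTs run inside the same transaction, so with InnoDB's
    // REPEATABLE READ they see a single snapshot: either both pre-TA1
    // values or both post-TA1 values, never a mix.
    static long[] readBalances(Connection conn, long account1, long account2) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps =
                 conn.prepareStatement("SELECT balance FROM accounts WHERE id = ?")) {
            long[] balances = new long[2];
            ps.setLong(1, account1);
            try (ResultSet rs = ps.executeQuery()) { rs.next(); balances[0] = rs.getLong(1); }
            ps.setLong(1, account2);
            try (ResultSet rs = ps.executeQuery()) { rs.next(); balances[1] = rs.getLong(1); }
            conn.commit();
            return balances;
        } finally {
            conn.setAutoCommit(true);
        }
    }
}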
MySQL 5.1 with the InnoDB engine has a default transaction isolation level of REPEATABLE READ. So if you perform your SELECTs inside a transaction, no dirty reads or non-repeatable reads can happen. That means that even with transactions committing between two of your queries, you'll always see a consistent database. In theory, under REPEATABLE READ you would only have to fear phantom reads, but with InnoDB those cannot even occur. So by simply opening a transaction you can assume database consistency (coherence) and perform as many SELECTs as you want without fearing parallel write transactions that run and commit in the meantime.
Do you have any interest in having such a strong consistency constraint? Well, it depends on what you're doing with your queries. Having inconsistent reads means that if one of your queries is based on a result from a previous one, you may have problems:
if you're performing only one query, you do not care at all
if none of your queries assumes a result from a previous one, do not care
if you never re-read a record in the same session, same thing
if you always read the dependencies of your main record in the same query and do not use lazy loading, no problem
if a small inconsistency between your first and last query will not break your code, then forget about it. But be careful: this can make for a very hard to debug (and hard to reproduce) application bug. So make your application code robust, something that can handle database errors and fail nicely (or not even fail) when this occurs (twice a year?)
if you show critical data (I mean bank accounts, not blogs or chats), then you should probably care about it
if you have a lot of write operations, then you increase the risk of inconsistent reads; you may need to add transactions at least on some key points
you may need to test the impact on performance; having all read requests in transactions, while several write transactions are really altering the data, certainly slows the engine down, since it needs to maintain several versions of the data. So you should check that the impact is not too big for your application

MySQL: transactions don't detect deadlocks?

Consider the following Perl code:
$schema->txn_begin();
my $r = $schema->resultset('test1')->find({id=>20});
my $n = $r->num;
$r->num($n+1);
print("updating for $$\n");
$r->update();
print("$$ val: ".$r->num."\n");
sleep(4);
$schema->txn_commit();
I'm expecting that since the update is protected by a transaction, if two processes try to update the "num" field, the second should fail with some error because it lost the race. Interbase calls this a "deadlock" error. MySQL, however, will pause on the update() call, but will happily continue on after the first one has called commit. The second process then has the "old" value of num, causing the increment to be incorrect. Observe:
$ perl trans.pl & sleep 1 ; perl trans.pl
[1] 5569
updating for 5569
5569 val: 1015
updating for 5571
5571 val: 1015
[1]+ Done perl trans.pl
The result value is "1015" in both cases. How can this be correct?
Assuming you're using InnoDB as your storage engine, this is the behavior I would expect. The default transaction isolation level for InnoDB is REPEATABLE READ. This means that when you perform your SELECT, the transaction gets a snapshot of the database at that particular time. The snapshot will not include updated data from other transactions that have not yet committed. Since the SELECT in each process happens before either commits, they'll each see the database in the same state (with num = 1014).
To get the behavior you seem to be expecting, you should follow the suggestion of Lluis and perform a SELECT ... FOR UPDATE to lock the row you're interested in. To do that, change this line
my $r = $schema->resultset('test1')->find({id=>20});
to this
my $r = $schema->resultset('test1')->find({id=>20}, {for=>'update'});
and rerun your test.
If you're not familiar with the intricacies of transactions in MySQL, I highly suggest you read the section in the docs about the InnoDB Transaction Model and Locking. Also, if you haven't already, read the DBIC Usage Notes regarding transactions and AutoCommit very carefully as well. The way the txn_ methods behave when AutoCommit is on or off is a bit tricky. If you're up for it, I would also suggest reading the source. Personally, I had to read the source to fully understand what DBIC was doing so that I could get the exact behavior I wanted.
Try storing $r->num in a MySQL variable instead of in Perl.
Sorry, I don't know Perl, but basically what you want is
START TRANSACTION;
SELECT num INTO @a FROM test1 WHERE id = 20;
UPDATE test1 SET num = (@a + 1) WHERE id = 20;
COMMIT;
This is not a deadlock; a deadlock is something like this:
Tx1
1- updates R1 => write lock on R1
2- updates R2 => write lock on R2
Tx 2
1- updates R2
2- updates R1
If tx1 and tx2 execute simultaneously, it may happen that tx1 waits for the lock on R2 to be free, and tx2 waits for the lock on R1.
In your case, you need to lock the row with id=20 (using SELECT ... FOR UPDATE). The transaction arriving "late" will wait a certain amount of time (defined by the DB engine) for the lock to be released before proceeding.
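For completeness, here is a sketch of the read-then-increment done with SELECT ... FOR UPDATE over plain JDBC (the table and column names follow the question; everything else is illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class SafeIncrement {
    // SELECT ... FOR UPDATE takes a row lock, so the second process blocks
    // on the SELECT until the first commits, then reads the already
    // incremented value instead of a stale snapshot.
    static void incrementNum(Connection conn, long id) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement select =
                 conn.prepareStatement("SELECT num FROM test1 WHERE id = ? FOR UPDATE");
             PreparedStatement update =
                 conn.prepareStatement("UPDATE test1 SET num = ? WHERE id = ?")) {
            select.setLong(1, id);
            long num;
            try (ResultSet rs = select.executeQuery()) {
                rs.next();
                num = rs.getLong(1);
            }
            update.setLong(1, num + 1);
            update.setLong(2, id);
            update.executeUpdate();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(true);
        }
    }
}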