MySQL: transactions don't detect deadlocks?

Consider the following Perl code:
$schema->txn_begin();
my $r = $schema->resultset('test1')->find({id=>20});
my $n = $r->num;
$r->num($n+1);
print("updating for $$\n");
$r->update();
print("$$ val: ".$r->num."\n");
sleep(4);
$schema->txn_commit();
I'm expecting that, since the update is protected by a transaction, if two processes try to update the "num" field, the second should fail with some error because it lost the race. Interbase calls this a "deadlock" error. MySQL, however, will pause on the update() call, but will happily continue after the first process has called commit. The second process then has the "old" value of num, causing the increment to be incorrect. Observe:
$ perl trans.pl & sleep 1 ; perl trans.pl
[1] 5569
updating for 5569
5569 val: 1015
updating for 5571
5571 val: 1015
[1]+ Done perl trans.pl
The result value is "1015" in both cases. How can this be correct?

Assuming you're using InnoDB as your storage engine, this is the behavior I would expect. The default transaction isolation level for InnoDB is REPEATABLE READ. This means that when you perform your SELECT, the transaction gets a snapshot of the database at that particular time. The snapshot will not include updated data from other transactions that have not yet committed. Since the SELECT in each process happens before either commits, they'll each see the database in the same state (with num = 1014).
To get the behavior you seem to be expecting, you should follow the suggestion of Lluis and perform a SELECT ... FOR UPDATE to lock the row you're interested in. To do that, change this line
my $r = $schema->resultset('test1')->find({id=>20});
to this
my $r = $schema->resultset('test1')->find({id=>20}, {for=>'update'});
and rerun your test.
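At the SQL level, the effect of that change is roughly the following (a sketch using the table and column names from the question; the exact statements DBIC generates may differ):
-- process A:
BEGIN;
SELECT num FROM test1 WHERE id = 20 FOR UPDATE;   -- reads 1014 and locks the row
-- process B:
BEGIN;
SELECT num FROM test1 WHERE id = 20 FOR UPDATE;   -- blocks here until A commits
-- process A:
UPDATE test1 SET num = 1015 WHERE id = 20;
COMMIT;                                           -- releases the lock
-- process B resumes: its SELECT now returns 1015, so its own update writes 1016
UPDATE test1 SET num = 1016 WHERE id = 20;
COMMIT;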
If you're not familiar with the intricacies of transactions in MySQL, I highly suggest you read the section in the docs about the InnoDB Transaction Model and Locking. Also, if you haven't already, read the DBIC Usage Notes regarding transactions and AutoCommit very carefully as well. The way the txn_ methods behave when AutoCommit is on or off is a bit tricky. If you're up for it, I would also suggest reading the source. Personally, I had to read the source to fully understand what DBIC was doing so that I could get the exact behavior I wanted.

Try storing $r->num in a MySQL user variable instead of a Perl variable.
Sorry I don't know Perl, but basically what you want is
START TRANSACTION;
SELECT num INTO @a FROM test1 WHERE id = 20;
UPDATE test1 SET num = (@a + 1) WHERE id = 20;
COMMIT;
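For comparison, the whole read-modify-write can also be collapsed into one atomic statement, which avoids carrying the value through a variable at all (same table and column as the question):
UPDATE test1 SET num = num + 1 WHERE id = 20;   -- InnoDB locks the row for the duration of the statement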

This is not a deadlock; a deadlock is something like this:
Tx1
1- updates R1 => write lock on R1
2- updates R2 => write lock on R2
Tx 2
1- updates R2
2- updates R1
If tx1 and tx2 execute simultaneously, it may happen that tx1 waits for the lock on R2 to be free, and tx2 waits for the lock on R1.
In your case, you need to lock the row with id=20 (using SELECT ... FOR UPDATE). The tx arriving "late" will then wait a certain amount of time (defined by the db engine) for the lock to be released before it can proceed.
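For contrast, here is a minimal sketch of an actual deadlock (made-up table t and two rows): each transaction takes a row lock and then asks for the one the other already holds, so InnoDB aborts one of them:
-- session 1:
BEGIN;
UPDATE t SET val = val + 1 WHERE id = 1;   -- locks row 1
-- session 2:
BEGIN;
UPDATE t SET val = val + 1 WHERE id = 2;   -- locks row 2
-- session 1:
UPDATE t SET val = val + 1 WHERE id = 2;   -- waits for session 2
-- session 2:
UPDATE t SET val = val + 1 WHERE id = 1;   -- would wait for session 1: a cycle
-- InnoDB detects the cycle and rolls one transaction back with
-- "Deadlock found when trying to get lock; try restarting transaction"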

Related

Unwanted delayed UPDATE through operating system scheduling?

We are running into a very strange problem with disjunct concurrent PHP processes accessing the same table (using table locks).
There is no replication involved, we're working on a monolith with the mysqli-interface of PHP 5.6.40 (I know, upgrading is due, we're working on it).
Let's say the initial value of a field named "value" in xyz is 0.
PHP-Process 1: Modifies the table
LOCK TABLE xyz WRITE;
UPDATE xyz SET value = 1;
UNLOCK TABLE xyz;
PHP-Process 2: Depends on a value in that table (e.g. a check for access rights)
SELECT value from xyz;
Now, if we manage to make Process 2 halt and wait for the lock to be released, then on a local dev environment (XAMPP, MariaDB 10.1.x) everything is fine: it will get the value 1.
BUT, on our production server (Debian Linux, MySQL 5.6.x) there is a seemingly necessary wait period before the value materializes in query results:
An immediate SELECT statement delivers 0
sleep(1) then SELECT delivers 1
We always assumed that a) LOCK / UNLOCK will flush tables, or b) a manual FLUSH TABLES xyz WITH READ LOCK will also flush caches, force writing to disk, and generally ensure that every following query from every other process will yield the expected result.
What we tried so far:
FLUSH TABLES as mentioned - no result
Explicitly acquire a LOCK before executing the SELECT statement - no result
Just wait some time - yielded the result we are looking for, but this is a dirty, unreliable solution.
What do you guys think? What might be the cause? I was thinking of: the query cache not updating in time, or paging of the underlying OS not writing data back to disk in time / not invalidating the in-memory page of the table data.
Is there any way you know of to definitively ensure consecutive consistency of the data?
There are different default transaction isolation levels in the different MariaDB versions.
You have to set up the same mode if you expect the same result. It also seems weird to test it on different MySQL/MariaDB versions.
https://mariadb.com/kb/en/mariadb-transactions-and-isolation-levels-for-sql-server-users/
Your second process may have started its transaction long before the commit was actually issued.
If you do not want to dig into transaction isolation, just try doing a rollback before the select (but the correct solution is to determine exactly which isolation level your app requires).
ROLLBACK; -- may give an error, but that is okay.
SELECT value from xyz;
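If the cause is indeed an open REPEATABLE READ snapshot in process 2, the SQL-level options look roughly like this (a sketch; whether it applies depends on how the PHP code manages its transactions):
-- Option 1: close the stale snapshot before reading again
ROLLBACK;                 -- or COMMIT; either ends the current transaction
SELECT value FROM xyz;    -- now sees rows committed by other sessions
-- Option 2: let this session see committed changes without restarting transactions
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;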

Are MySQL InnoDB transactions serializable/atomic? [duplicate]

The PHP Documentation says:
If you've never encountered transactions before, they offer 4 major features: Atomicity, Consistency, Isolation and Durability (ACID). In layman's terms, any work carried out in a transaction, even if it is carried out in stages, is guaranteed to be applied to the database safely, and without interference from other connections, when it is committed.
QUESTION:
Does this mean that I can have two separate php scripts running transactions simultaneously without them interfering with one another?
ELABORATING ON WHAT I MEAN BY "INTERFERING":
Imagine we have the following employees table:
+------+--------+----------+
| id   | name   | salary   |
+------+--------+----------+
| 1    | ana    | 10000    |
+------+--------+----------+
If I have two scripts with similar/same code and they run at the exact same time:
script1.php and script2.php (both have the same code):
$conn->beginTransaction();
$stmt = $conn->prepare("SELECT * FROM employees WHERE name = ?");
$stmt->execute(['ana']);
$row = $stmt->fetch(PDO::FETCH_ASSOC);
$salary = $row['salary'];
$salary = $salary + 1000;//increasing salary
$stmt = $conn->prepare("UPDATE employees SET salary = {$salary} WHERE name = ?");
$stmt->execute(['ana']);
$conn->commit();
and assuming the sequence of events is as follows:
script1.php selects data
script2.php selects data
script1.php updates data
script2.php updates data
script1.php commit() happens
script2.php commit() happens
What would the resulting salary of ana be in this case?
Would it be 11000? And would this then mean that 1 transaction will overlap the other because the information was obtained before either commit happened?
Would it be 12000? And would this then mean that regardless of the order in which data was updated and selected, the commit() function forced these to happen individually?
Please feel free to elaborate as much as you want on how transactions and separate scripts can interfere (or don't interfere) with one another.
You are not going to find the answer in the PHP documentation because this has nothing to do with PHP or PDO.
The InnoDB table engine in MySQL offers 4 so-called isolation levels, in line with the SQL standard. The isolation levels, in conjunction with blocking / non-blocking reads, will determine the result of the above example. You need to understand the implications of the various isolation levels and choose the appropriate one for your needs.
To sum up: if you use the SERIALIZABLE isolation level with autocommit turned off, then the result will be 12000. Under all other isolation levels, and under SERIALIZABLE with autocommit enabled, the result will be 11000. If you start using locking reads, then the result could be 12000 under all isolation levels.
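As an illustration of the locking-read variant, here is a rough SQL sketch of what both scripts would effectively run (the 1000 increment is still computed in PHP; only the FOR UPDATE clause is new):
BEGIN;
SELECT salary FROM employees WHERE name = 'ana' FOR UPDATE;   -- locking read
-- the second script blocks on this SELECT until the first one commits
UPDATE employees SET salary = 11000 WHERE name = 'ana';       -- first script: 10000 + 1000
COMMIT;                                                       -- releases the row lock
-- the second script's SELECT now returns 11000, so its UPDATE writes 12000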
Judging by the given conditions (a solitary DML statement), you don't need a transaction here, but a table lock. It's a very common confusion.
You need a transaction if you need to make sure that ALL your DML statements were performed correctly or weren't performed at all.
Means
you don't need a transaction for any number of SELECT queries
you don't need a transaction if only one DML statement is performed
Although, as was noted in the excellent answer from Shadow, you may use a transaction here with an appropriate isolation level, it would be rather confusing. What you need here is table locking. The InnoDB engine lets you lock particular rows instead of the entire table, and thus should be preferred.
In case you want the salary to be 12000 - then use table locks.
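A table-lock version of the two scripts would look roughly like this (a sketch; the new salary is still computed in application code between the lock and the update):
LOCK TABLES employees WRITE;      -- other sessions' reads and writes on employees wait here
SELECT salary FROM employees WHERE name = 'ana';
UPDATE employees SET salary = 11000 WHERE name = 'ana';   -- 10000 + 1000, computed in PHP
UNLOCK TABLES;
-- the second script then runs the same block, reads 11000, and writes 12000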
Or - a simpler way - just run an atomic update query:
UPDATE employees SET salary = salary + 1000 WHERE name = ?
In this case every salary increase will be applied and none will be lost.
If your goal is different, better express it explicitly.
But again: you have to understand that transactions in general have nothing to do with separate scripts' execution. Regarding your topic of race conditions, you are interested not in transactions but in table/row locking. This is a very common confusion, and you'd better learn it straight:
a transaction is to ensure that a set of DML queries within one script were executed successfully.
table/row locking is to ensure that other script executions won't interfere.
The only topic where transactions and locking intersect is a deadlock, but again - that only arises when a transaction is using locking.
Alas, the "without interference" needs some help from the programmer. It needs BEGIN and COMMIT to define the extent of the 'transaction'. And...
Your example is inadequate. The first statement needs SELECT ... FOR UPDATE. This tells the transaction processing that there is likely to be an UPDATE coming for the row(s) that the SELECT fetches. That warning is critical to "preventing interference". Now the timeline reads:
script1.php BEGINs
script2.php BEGINs
script1.php selects data (FOR UPDATE)
script2.php selects data is blocked, so it waits
script1.php updates data
script1.php commit() happens
script2.php selects data (and will get the newly-committed value)
script2.php updates data
script2.php commit() happens
(Note: This is not a 'deadlock', just a 'wait'.)

Can Mysql Innodb handle heavy parallel processing

I have a Mysql system with a table of 1.7M records. This is a production system. It was previously Myisam & very resilient but as a test I have converted it to Innodb (and the php script) in the hope that it would run faster and row-level locking would make it even more resilient. It is serviced by 30 robots using PHP 7 CLI. Each of them scans the table for records that need to be updated, updates them then continues as part of the team until the job is done. They do this in chunks of 40 rows which means the script is run about 42,500 times.
But during testing I have noticed some features of InnoDB transactions that I had not expected and that seem to be showstoppers. Before I roll it back I thought I'd ask others for their views, whether I've completely got something wrong, or to prove or disprove my findings. The issue centres around one db call (all search fields are indexed); below is pseudo-code:
update table set busy=$token where condition=true order by id $order limit $units
if affected rows != $units
    do function to clear
    return
else
    do stuff.....
endif
BEFORE
Under Myisam the result is that the robots each take a run at getting table level locks and just queue until they get them. This can produce bottlenecks but all are resolved within a minute.
AFTER
Under Innodb the call is ok for one robot but any attempt at multi-user working results in 'Lock wait timeout exceeded; try restarting transaction'.
Changing the wait_timeout / autocommit / tx_isolation makes no difference. Nor does converting this to a transaction and using:
begin
select .... for update
update
test
commit or rollback
It seems to me that:
1 Innodb creates an implicit transaction for all updates even if you don't set up a transaction. If these take too long then parallel processing is not possible.
2 Much more importantly, when InnoDB locks rows it does not 'know' which rows it locked. You can't do:
begin
select 10 rows where condition=this for update
update the rows I locked
commit
You have to do two identical calls like this:
begin
select 10 rows where condition=this for update
update 10 rows where condition=this
commit
This is a recipe for deadlocks, as robot1 may lock 40 rows, robot2 locks 40 others, and so on, but robot1 then updates 40 rows which may be completely different from the ones it just locked. This will continue until all rows are locked and the robots cannot write back to the table.
So where I have 30 robots contending for chunks of rows that need updating it seems to me that Innodb is useless for my purposes. It is clever but not clever enough to handle heavy parallel processing.
Any thoughts...
Ponder this approach:
SET autocommit = 1;
Restart:
    $left_off = 0;
Loop:
    # grab one item:
    BEGIN;
    $id = SELECT id FROM tbl WHERE condition AND id > $left_off
              ORDER BY id LIMIT 1 FOR UPDATE;
    if nothing was returned, you are at the end of the table: COMMIT and GOTO Restart
    UPDATE tbl SET busy = $token WHERE id = $id;
    COMMIT;
    do stuff
    UPDATE tbl SET busy = $free WHERE id = $id;   -- release it
    $left_off = $id;
    GOTO Loop
Notes:
It seems that the only reason to set busy is if "do stuff" hangs onto the row "too long". Am I correct?
I chose to lock only one at a time -- less complexity.
$left_off is to avoid scanning over lots of rows again and again. No, OFFSET is not a viable alternative.
BEGIN overrides autocommit. So that transaction lasts until COMMIT.
The second UPDATE is run with autocommit=1, so it is a transaction unto itself.
Be sure to tailor the number of robots -- too few = too slow; too many = too much contention. It is hard to predict the optimal value.
During my tests of InnoDB v MyISAM I found that, once I did resolve the contention issues, the InnoDB model was 40% slower than MyISAM. But I do believe that with further tweaking this can be reduced so that it runs on a par with MyISAM.
What I did notice is that MyISAM would queue 'waiting for table-level lock' indefinitely, which actually suited me but punished the hard disk, whereas InnoDB is a much more democratic process and the disk access is more even. I have rolled it back for the moment but will pursue the InnoDB version in a few weeks with the adjustments I commented on above.
In answer to my own question: yes, InnoDB can handle heavy parallel processing with a lot of tweaking and rationalization of your database design. Disappointing that no one answered my question about whether InnoDB record locking has an awareness of which records it locks.

Should I commit after a single select

I am working with MySQL 5.0 from python using the MySQLdb module.
Consider a simple function to load and return the contents of an entire database table:
def load_items(connection):
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM MyTable")
    return cursor.fetchall()
This query is intended to be a simple data load and not have any transactional behaviour beyond that single SELECT statement.
After this query is run, it may be some time before the same connection is used again to perform other tasks, though other connections can still be operating on the database in the mean time.
Should I be calling connection.commit() soon after the cursor.execute(...) call to ensure that the operation hasn't left an unfinished transaction on the connection?
There are two things you need to take into account:
the isolation level in effect
what kind of state you want to "see" in your transaction
The default isolation level in MySQL is REPEATABLE READ which means that if you run a SELECT twice inside a transaction you will see exactly the same data even if other transactions have committed changes.
Most of the time people expect to see committed changes when running the second select statement - which is the behaviour of the READ COMMITTED isolation level.
If you did not change the default level in MySQL, and you do expect to see committed changes when you run a SELECT twice, then you can't do it in the "same" transaction and you need to commit after your first SELECT statement.
If, on the other hand, you actually want to see a consistent state of the data throughout your transaction, then obviously you should not commit.
then after several minutes, the first process carries out an operation which is transactional and attempts to commit. Would this commit fail?
That totally depends on your definition of "is transactional". Anything you do in a relational database "is transactional" (That's not entirely true for MySQL actually, but for the sake of argumentation you can assume this if you are only using InnoDB as your storage engine).
If that "first process" only selects data (i.e. a "read only transaction"), then of course the commit will work. If it tried to modify data that another transaction has already committed and you are running with REPEATABLE READ you probably get an error (after waiting until any locks have been released). I'm not 100% about MySQL's behaviour in that case.
You should really try this manually with two different sessions using your favorite SQL client to understand the behaviour. Do change your isolation level as well to see the effects of the different levels too.
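For example, a quick manual experiment with two sessions of any MySQL client (the column and row values here are made up; adjust them to MyTable's actual schema):
-- session 1:
BEGIN;
SELECT * FROM MyTable;                          -- the snapshot is taken at this first read
-- session 2 (autocommit on):
UPDATE MyTable SET some_col = 42 WHERE id = 1;  -- hypothetical column and row
-- session 1, still in the same transaction:
SELECT * FROM MyTable;   -- under REPEATABLE READ this still shows the old data
COMMIT;
SELECT * FROM MyTable;   -- after the commit, session 2's change is visible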

mysql concurrent and identical transactions trouble

OK here's the basic idea of what is happening:
begin transaction
some_data=select something from some_table where some_condition;
if some_data does not exist or some_data is outdated
new_data = insert a_new_entry to some_table
commit transaction
return new_data
else
return some_data
end
When multiple processes execute the code above simultaneously (e.g. the client issues a lot of identical requests at the same time), a lot of 'new_data' rows will be inserted while actually only one is needed.
I think it's a quite typical concurrency scenario, but I still can't figure out a decent way to avoid it. Things I can think of are maybe having a single worker process do the select_or_insert job, or maybe setting the isolation level to SERIALIZABLE (unacceptable). But neither is quite satisfactory to me.
PS: The database is mysql, table engine is innodb, and isolation level is repeatable read
In your initial SELECT, use SELECT ... FOR UPDATE.
This ensures that the record is locked against other clients' locking reads until the transaction has been committed (or rolled back), so a second client running the same code waits on its SELECT and does not continue to the rest of the logic until the first client has completed its UPDATE.
Note that you will need to ROLLBACK the transaction in the else condition, or else the lock will continue blocking until the connection is dropped.
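Put together, the pattern looks roughly like this in SQL (a sketch with hypothetical table and column names; adapt it to your schema):
BEGIN;
SELECT id, payload, updated_at
  FROM cache_entries                 -- hypothetical table
 WHERE cache_key = 'abc'
   FOR UPDATE;                       -- concurrent clients running the same statement wait here
-- application logic:
--   if a row came back and is fresh enough: ROLLBACK and return it (releases the lock)
--   otherwise:
INSERT INTO cache_entries (cache_key, payload, updated_at)
VALUES ('abc', '...', NOW());
COMMIT;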