I have mainly used MyISAM as a storage engine in the past and have only recently been using InnoDB more; now I'm at a point where I'm really starting to work with InnoDB's locking and isolation levels.
I have been reading the documentation and one thing that concerns me is that it states:
InnoDB automatically detects transaction deadlocks and rolls back a transaction or transactions to break the deadlock.
So, in other words, some code that was supposed to run got rolled back because of a deadlock, and all of a sudden your data integrity is compromised because said code didn't run!?
They also state that:
Normally, you must write your applications so that they are always prepared to re-issue a transaction if it gets rolled back because of a deadlock.
Trouble is, it doesn't explain how to re-issue the queries, or how to test whether they failed because of a deadlock.
This appears to me to be a significant issue: some of the code you expect to run (queries you expect to be executed) may be rolled back and not re-issued unless you put in extra code to avoid it. Shouldn't this be automatic?
So, can someone explain to me what the best way to handle this is, or whether I am misunderstanding something?
some code that was supposed to run got rolled back because of a deadlock
True. Hence your next quote about needing to rerun. Rerunning the transaction involves having your code go back to the START TRANSACTION and try again. The re-issuing is not automatic; you do need extra code.
Be sure to check for errors, even on BEGIN and COMMIT.
As for what the code looks like... That depends on the API you are using. Some already have try/catch syntax; some do not.
Be careful not to get into an infinite loop. (For example, if you simply "loop until no error" and the error is something other than "deadlock", such as "connection lost", the loop will never finish.)
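A minimal sketch of what that extra code might look like, assuming Python and the mysql-connector-python driver (the accounts table and the transfer logic are made up for illustration; InnoDB reports deadlocks as error 1213, ER_LOCK_DEADLOCK):

import mysql.connector
from mysql.connector import errorcode

MAX_RETRIES = 3  # bound the loop so a non-deadlock error cannot spin forever

def transfer(conn, from_id, to_id, amount):
    for attempt in range(MAX_RETRIES):
        try:
            conn.start_transaction()
            cur = conn.cursor()
            cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s",
                        (amount, from_id))
            cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s",
                        (amount, to_id))
            conn.commit()              # COMMIT can fail too, so it is inside the try
            return
        except mysql.connector.Error as err:
            conn.rollback()
            if err.errno != errorcode.ER_LOCK_DEADLOCK:
                raise                  # lost connection, bad SQL, etc. -- do not loop
    raise RuntimeError("transaction still deadlocking after %d attempts" % MAX_RETRIES)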
If you never have more than one user connecting at a time, deadlocks are impossible, but other errors, some transient, are possible.
As for the isolation level, I suggest leaving it at the default. Only if you get into high rates of transactions and are doing special things might you need to change the level.
Related
When restarting a transaction that failed at the commit stage, I get a second failure when restarting the transaction. This is running Galera Cluster under MariaDB 10.2.6.
The sequence of events goes like this:
Commit a transaction (say a single insert).
COMMIT fails with error 1213 "Deadlock found when trying to get lock"
Begin a new transaction to replay the SQL statement[s].
BEGIN fails with error 1047 "WSREP has not yet prepared node for application use"
My application bails to avoid a more serious crash (see notes below)
This happens quite regularly and although the cluster recovers, individual threads receive failures. Yesterday this happened 15 times in one second.
I cannot identify any root cause for this. It seems that the deadlock is the initiator of the problem. The situation should be recoverable (and often is), but with multiple clients all trying to resolve their deadlocks at the same time, the whole thing seems to just fail.
Notes:
This is related to an earlier question where retrying failed transactions caused total crash of the cluster. I've managed to prevent crashes by retrying transactions only on deadlocks. i.e. if a different type of error occurs during a restart the application gives up.
I'm aware that 10.2.6 is not the latest version of MariaDB. I'm nervous to upgrade right now as I've had such bad experiences. I would like to understand the current problem before doing an upgrade and I've been unable to reproduce the errors in a test environment.
I'm not sure, but I suspect 3 tries (not 2) is appropriate. Committing involves two steps:
Checking for a Deadlock purely within the node you are connected to. (Eg: another query is touching the same row or gap.)
Checking with the other nodes to see if they will complain. (Eg: The same row has already been inserted into another node.)
Sure, either of those could happen repeatedly, and in any order. But making 3 tries seems reasonable.
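A hedged sketch of what three attempts might look like in application code (Python with mysql-connector-python assumed; 1213 is ER_LOCK_DEADLOCK and 1047 is the "WSREP has not yet prepared node" error quoted above; the statement list is a placeholder):

import time
import mysql.connector

RETRIABLE_ERRNOS = {1213, 1047}   # deadlock / certification failure, WSREP not ready

def run_with_retries(conn, statements, tries=3):
    for attempt in range(1, tries + 1):
        try:
            conn.start_transaction()
            cur = conn.cursor()
            for stmt, params in statements:
                cur.execute(stmt, params)
            conn.commit()                 # Galera certification happens here
            return
        except mysql.connector.Error as err:
            try:
                conn.rollback()
            except mysql.connector.Error:
                pass                      # the connection may already be unusable
            if err.errno not in RETRIABLE_ERRNOS or attempt == tries:
                raise                     # give up and let a human (DBA) look at it
            time.sleep(0.1 * attempt)     # brief back-off before replaying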
Now, once you have failed "too many" times, it is right to abort and get a human (a DBA type) involved. I suspect that you could restructure your code / application logic / etc. in some way to avoid most of the failures. Would you like to provide more details, so we can discuss that possibility?
What kind of table? (Queue, transactions, logging, etc)
SHOW CREATE TABLE. (auto_inc, unique keys, etc; too many UNIQUE keys can aggravate the situation)
What does the INSERT look like?
How often do you run inserts like this one? How often does it fail? (Instrument your code so you count even those that you can recover from.)
How spread out is the Cluster? (ping time)
What other queries are hitting the table? (They may be aggravating the issue.)
When working with database transactions, what are the possible conditions (if any) that would cause the final COMMIT statement in a transaction to fail, presuming that all statements within the transaction already executed without issue?
For example... let's say you have some two-phase or three-phase commit protocol where you do a bunch of statements, then wait for some master process to tell you when it is ok to finally commit the transaction:
-- <initial handshaking stuff>
START TRANSACTION;
-- <Execute a bunch of SQL statements>
-- <Inform master of readiness to commit>
-- <Time passes... background transactions happening while we wait>
-- <Receive approval to commit from master (finally!)>
COMMIT;
If your code gets to that final COMMIT statement and sends it to your DBMS, can you ever get an error (uniqueness issue, database full, etc) at that statement? What errors? Why? How do they appear? Does it vary depending on what DBMS you run?
COMMIT may fail. You might have had sufficient resources to log all the changes you wished to make, but lack the resources to actually implement the changes.
And that's not considering other reasons it might fail:
The change itself might not fit the constraints of the database.
Power loss stops things from completing.
The level of requested selection concurrency might disallow an update (cursors updating a modified table, for example).
The commit might time out or be on a connection which times out due to starvation issues.
The network connection between the client and the database may be lost.
And all the other "simple" reasons that aren't on the top of my head.
It is possible for some database engines to defer UNIQUE index constraint checking until COMMIT. Obviously if the constraint does not hold true at the time of commit then it will fail.
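As a concrete illustration (a sketch assuming PostgreSQL and psycopg2, since InnoDB itself does not offer deferrable constraints; the table and constraint names are made up):

import psycopg2

conn = psycopg2.connect("dbname=test")   # connection details are hypothetical
cur = conn.cursor()
cur.execute("""
    CREATE TABLE t (
        id integer,
        CONSTRAINT t_id_uniq UNIQUE (id) DEFERRABLE INITIALLY DEFERRED
    )
""")
cur.execute("INSERT INTO t (id) VALUES (1)")
cur.execute("INSERT INTO t (id) VALUES (1)")   # no error yet: the check is deferred
try:
    conn.commit()                              # the unique check runs here and fails
except psycopg2.DatabaseError as err:
    print("COMMIT failed:", err)
    conn.rollback()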
Sure.
In a multi-user environment, the COMMIT may fail because of changes by other users (e.g. your COMMIT would violate a referential constraint when applied to the now current database...).
If you're using two-phase commit, then no. Everything that could go wrong is done in the prepare phase.
There could still be a network outage, power loss, cosmic rays, etc., during the commit, but even so, the transactions will have been written to permanent storage, and if a commit has been triggered, recovery processes should carry them through.
Hopefully.
Certainly, there could be a number of issues. The act of committing, in and of itself, must make some final, permanent entry to indicate that the transaction committed. If making that entry fails, then the transaction can't commit.
As Ignacio states, there can be deferred constraint checking (this could be any form of constraint, not just unique constraint, depending on the DBMS engine).
SQL Server Specific: flushing FILESTREAM data can be deferred until commit time. That could fail.
One very simple and often overlooked item: hardware failure. The commit can fail if the underlying server dies. This might be disk, cpu, memory, or even network related.
The transaction could fail if it never receives approval from the master (for any number of reasons).
No matter how wonderfully a system may be designed, there is going to be some possibility that a commit will get into a situation where it's impossible to know whether it succeeded or not. In some cases, it may not matter (e.g. if a hard drive holding the database turns into a pile of slag, it may be impossible to tell whether the commit succeeded before that occurred, but it wouldn't really matter); in other cases, however, this could be a problem. Especially with distributed database systems, if a connection failure occurs at just the right time during a commit, it will be impossible for both sides to be certain of whether the other side is expecting a commit or a rollback.
With MySQL or MariaDB, when used with Galera clustering, COMMIT is when the other nodes in the cluster are checked. So yes, important errors can be discovered at COMMIT, and you must check for these errors.
When receiving a so-called IPN message from PayPal, I need to update a row in my database.
The issue is that I need perfect reliability.
Currently I use InnoDB. I am afraid that the transaction may fail due to a race condition.
Should I use LOCK TABLES? Any other reliable solution?
Should I check for a failure and repeat the transaction several (how many?) times?
You cannot make a distributed process (like adding a row locally and notifying the server remotely) perfectly reliable, no matter the order. This is a lot like the Two Generals' Problem: there is no single event which can denote the successful completion of the transaction on both sides simultaneously, as any message might get lost along the way.
I'm not sure I understand your issue correctly, but perhaps the following would work: Write a line to some table noting the fact that you are going to verify a given message. Then do the verification, and afterwards write a line to the database about the result of that verification. In the unlikely but important scenario that something broke in between, you will have an intent line with no matching result line. You can then detect such situations and recover from them manually.
On your local database, you'd have single-row updates, which you may execute in their own transactions, probably even with autocommit turned on. You have to make sure that the first write is actually committed to disk (and preferably to a binary log on some other disk as well) before you start talking to the PayPal server, but I see no need for locking or similar. You might want to retry failed transactions, I'd say up to three times, but the important thing is that in the end you can have admin intervention to fix anything your code can't handle.
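A rough sketch of that intent/result pattern, assuming Python with mysql-connector-python (the ipn_log table and the verify_with_paypal callable are made up for illustration):

import mysql.connector

def handle_ipn(conn, ipn_id, ipn_message, verify_with_paypal):
    cur = conn.cursor()

    # 1. Durably record the intent before doing anything remote.
    cur.execute("INSERT INTO ipn_log (ipn_id, state) VALUES (%s, 'verifying')",
                (ipn_id,))
    conn.commit()

    # 2. Talk to PayPal outside of any open transaction.
    ok = verify_with_paypal(ipn_message)

    # 3. Record the outcome in its own small transaction.
    cur.execute("UPDATE ipn_log SET state = %s WHERE ipn_id = %s",
                ('verified' if ok else 'rejected', ipn_id))
    conn.commit()

# Any row still in state 'verifying' after a crash marks a message that
# needs manual (or scheduled) reconciliation.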
As per the documentation link given below:
When a lock wait timeout occurs, the current statement is not executed. The current transaction is not rolled back. (Until MySQL 5.0.13, InnoDB rolled back the entire transaction if a lock wait timeout happened. You can restore this behavior by starting the server with the --innodb_rollback_on_timeout option, available as of MySQL 5.0.32.)
http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_lock_wait_timeout
Does it mean that when a lock wait timeout occurs, it compromises the transactional integrity?
"roollback on timeout" was the default behaviour till 5.0.13 and I guess that was the correct way to handle such situations. Does anyone think that this should be the default behaviour and the user should not be asked to add a parameter for a functionality that is taken for granted?
It does not compromise transactional integrity - it just gives you a chance to either retry, or do something else like commit work completed so far, or rollback.
For small transactions, and for simplicity, you might as well switch on the rollback-on-timeout option. However, if you are running transactions over many hours, you might appreciate the chance to react to a timeout.
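For instance, with the default behaviour the statement fails but the transaction stays open, so the application can catch error 1205 (ER_LOCK_WAIT_TIMEOUT) and decide what to do; a rough sketch in Python with mysql-connector-python (the statement list is a placeholder):

import mysql.connector
from mysql.connector import errorcode

def apply_batch(conn, statements):
    conn.start_transaction()
    cur = conn.cursor()
    for stmt in statements:
        try:
            cur.execute(stmt)
        except mysql.connector.Error as err:
            if err.errno == errorcode.ER_LOCK_WAIT_TIMEOUT:
                conn.commit()          # keep the work completed so far
                return stmt            # report the statement that timed out
            conn.rollback()
            raise
    conn.commit()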
I am using TopLink with Struts 2 for a high-usage app; the app constantly accesses a single table with multiple reads and writes per second. This causes a lock_wait_timeout error and the transaction rolls back, causing the data just entered to disappear from the front end. (MySQL's autocommit has been set to 1.) The exception has been caught and sent to an error page in the app, but a rollback still occurs (it has to be a TopLink exception, as MySQL does not have the rollback-on-timeout feature turned on). The raw data file, ibdata01, shows the entry when opened in an editor. As this happens infrequently, I have not been able to replicate it in test conditions.
Can anyone be kind enough to provide some sort of way out of this dilemma? What sort of approach should such a high-access application (constant reads and writes to the same table all the time) take? Any help would be greatly appreciated.
What is the nature of your concurrent reads/updates? Are you updating the same rows constantly from different sessions? What do you expect to happen when two sessions update the same row at the same time?
If it is just reads conflicting with updates, consider reducing your transaction isolation on your database.
If you have multiple writes conflicting, then you may consider using pessimistic locking to ensure each transaction succeeds (see the sketch after the link below). But either way, you will have a lot of contention, so you may want to reconsider your data model or your application's usage of the data.
See,
http://en.wikibooks.org/wiki/Java_Persistence/Locking
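If you do go the pessimistic route, the usual SQL-level pattern is SELECT ... FOR UPDATE inside the transaction, so conflicting writers queue on the row instead of clashing later; a minimal sketch assuming Python with mysql-connector-python (the counters table is made up; with JPA/TopLink the rough equivalent is a pessimistic lock mode, as described at the link above):

import mysql.connector

def increment_counter(conn, row_id):
    conn.start_transaction()
    cur = conn.cursor()
    # The FOR UPDATE lock is held until COMMIT, so concurrent callers
    # serialize on this row rather than failing mid-transaction.
    cur.execute("SELECT value FROM counters WHERE id = %s FOR UPDATE", (row_id,))
    (value,) = cur.fetchone()
    cur.execute("UPDATE counters SET value = %s WHERE id = %s", (value + 1, row_id))
    conn.commit()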
lock_wait_timeouts are a fact of life for transactional databases. The normal response should usually be to trap the error and attempt to re-run the transaction. Not many developers seem to understand this, so it bears repeating: if you get a lock_wait_timeout error and you still want to commit the transaction, then run it again.
Other things to look out for are:
Persistent connections and not explicitly COMMIT'ing your transactions lead to long-running transactions that result in unnecessary locks.
Since you have auto-commit off, if you log in from the mysql CLI (or any other interactive query tool) and start running queries, you stand a significant chance of locking rows and not releasing them in a timely manner.
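A minimal illustration of the first point, assuming Python with mysql-connector-python (the connection details and the jobs table are hypothetical): with auto-commit off, every statement implicitly starts a transaction, so locks are only released once you explicitly COMMIT or ROLLBACK.

import mysql.connector

conn = mysql.connector.connect(user="app", database="test")   # details hypothetical
conn.autocommit = False            # the situation described above

cur = conn.cursor()
try:
    cur.execute("UPDATE jobs SET state = 'done' WHERE id = %s", (42,))
    conn.commit()                  # release the row locks promptly
except mysql.connector.Error:
    conn.rollback()                # never leave the transaction hanging
    raise
finally:
    cur.close()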