When I restart a transaction that failed at the commit stage, the restarted transaction fails a second time. This is running Galera Cluster under MariaDB 10.2.6.
The sequence of events goes like this:
Commit a transaction (say a single insert).
COMMIT fails with error 1213 "Deadlock found when trying to get lock"
Begin a new transaction to replay the SQL statement[s].
BEGIN fails with error 1047 "WSREP has not yet prepared node for application use"
My application bails out to avoid a more serious crash (see notes below).
This happens quite regularly, and although the cluster recovers, individual threads receive failures. Yesterday this happened 15 times in one second.
I cannot identify any root cause for this. The deadlock seems to be the initiator of the problem. The situation should be recoverable (and often is), but with multiple clients all trying to resolve their deadlocks at the same time, the whole thing seems to just fail.
Notes:
This is related to an earlier question, where retrying failed transactions caused a total crash of the cluster. I've managed to prevent crashes by retrying transactions only on deadlocks; i.e., if a different type of error occurs during a retry, the application gives up.
I'm aware that 10.2.6 is not the latest version of MariaDB. I'm nervous about upgrading right now, as I've had such bad experiences. I would like to understand the current problem before doing an upgrade, and I've been unable to reproduce the errors in a test environment.
I'm not sure, but I suspect 3 tries (not 2) is appropriate. Committing involves two steps:
Checking for a deadlock purely within the node you are connected to (e.g., another query is touching the same row or gap).
Checking with the other nodes to see if they will complain (e.g., the same row has already been inserted on another node).
Sure, either of those could happen repeatedly, and in any order. But making 3 tries seems reasonable.
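A minimal sketch of what such a bounded retry could look like, using JDBC (the table and column names are invented; 1213 and 1047 are the error codes from your report, and whether 1047 deserves a retry rather than an immediate bailout is a judgment call):

    import java.sql.*;

    public class GaleraRetry {
        private static final int ER_LOCK_DEADLOCK = 1213;   // "Deadlock found ..."
        private static final int ER_WSREP_NOT_READY = 1047; // "WSREP has not yet prepared ..." (code from the question)
        private static final int MAX_TRIES = 3;

        // Re-run the whole transaction up to MAX_TRIES times, but only for
        // the two transient error codes; anything else is re-thrown at once.
        static void insertWithRetry(Connection conn, int id)
                throws SQLException, InterruptedException {
            conn.setAutoCommit(false);
            for (int attempt = 1; ; attempt++) {
                try (PreparedStatement ps =
                         conn.prepareStatement("INSERT INTO my_table (id) VALUES (?)")) {
                    ps.setInt(1, id);
                    ps.executeUpdate();
                    conn.commit(); // per the question, this is where 1213 surfaces
                    return;
                } catch (SQLException e) {
                    conn.rollback();
                    int code = e.getErrorCode();
                    boolean retryable =
                        code == ER_LOCK_DEADLOCK || code == ER_WSREP_NOT_READY;
                    if (!retryable || attempt >= MAX_TRIES) {
                        throw e; // give up and let a human look at it
                    }
                    Thread.sleep(100L * attempt); // brief backoff before retrying
                }
            }
        }
    }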
Now, once you have failed "too many" times, it is right to abort and get a human (a DBA type) involved. I suspect that you could restructure your code / application logic / etc in some way to avoid most of the failures. Would you like to provide more details, so we can discuss that possibility...
What kind of table? (Queue, transactions, logging, etc)
SHOW CREATE TABLE. (auto_inc, unique keys, etc; too many UNIQUE keys can aggravate the situation)
What does the INSERT look like?
How often do you run inserts like this one? How often does it fail? (Instrument your code so you count even those that you can recover from.)
How spread out is the Cluster? (ping time)
What other queries are hitting the table? (They may be aggravating the issue.)
In the past I mainly used MyISAM as a storage engine, but recently I've been using InnoDB more, and now I'm at the point where I'm really starting to work with InnoDB's locking and isolation levels.
I have been reading the documentation and one thing that concerns me is that it states:
InnoDB automatically detects transaction deadlocks and rolls back a transaction or transactions to break the deadlock.
So, in other words, some code that was supposed to run got rolled back because of a deadlock, and all of a sudden your data integrity is compromised because said code didn't run!?
They also state that:
Normally, you must write your applications so that they are always prepared to re-issue a transaction if it gets rolled back because of a deadlock.
The trouble is, it doesn't explain how to re-issue the queries or how to test whether they failed because of a deadlock.
This seems like a significant issue to me: code that you expect to run (queries you expect to be executed) may be rolled back and never re-issued unless you put in extra code to avoid this. Shouldn't that be automatic?
So can someone explain to me what the best way to handle this is, or whether I am misunderstanding something?
some code that was supposed to run got rolled back because of a deadlock
True. Hence your next quote about needing to rerun. Rerunning the transaction requires your code to go back to the START TRANSACTION and try again. The re-issuing is not automatic; you do need extra code.
Be sure to check for errors, even on BEGIN and COMMIT.
As for what the code looks like... That depends on the API you are using. Some already have try/catch syntax; some do not.
Be careful not to get into an infinite loop. (For example, if you "loop until no error" and the error is something other than "deadlock", such as "connection lost".)
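As a rough illustration of the shape such code can take (JDBC here, but the same pattern applies in any API; the accounts table is invented):

    import java.sql.*;

    public class DeadlockRetry {
        private static final int ER_LOCK_DEADLOCK = 1213; // MySQL deadlock error
        private static final int MAX_ATTEMPTS = 3;

        // Re-issue the whole transaction on deadlock, with a bounded number
        // of attempts so an unrelated error (e.g. lost connection) can never
        // cause an infinite loop.
        static void transfer(Connection conn) throws SQLException {
            conn.setAutoCommit(false); // we manage commit/rollback ourselves
            for (int attempt = 1; ; attempt++) {
                try (Statement st = conn.createStatement()) {
                    st.executeUpdate(
                        "UPDATE accounts SET balance = balance - 10 WHERE id = 1");
                    st.executeUpdate(
                        "UPDATE accounts SET balance = balance + 10 WHERE id = 2");
                    conn.commit(); // COMMIT itself can fail, so it stays inside the try
                    return;
                } catch (SQLException e) {
                    conn.rollback(); // always roll back before deciding what to do
                    if (e.getErrorCode() != ER_LOCK_DEADLOCK || attempt >= MAX_ATTEMPTS) {
                        throw e; // not a deadlock, or out of attempts
                    }
                }
            }
        }
    }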
If you never have more than one user connecting at a time, deadlocks are impossible, but other errors, some transient, are possible.
As for isolation levels, I suggest leaving it at the default. Only if you get into high rates of transactions and are doing special things might you need to change the level.
I have a problem with a Grails-based application connected to MySQL, where a process updates a record as part of a larger transaction. This process also kicks off a second thread via a Quartz job that performs some additional changes. The Quartz job typically starts before the first thread commits the transaction, so the job loops for up to one minute, checking for the record to change to the expected state. Oddly, it works consistently in some environments, fails consistently in one, and fails infrequently in yet another.
My question has to do with how MySQL recognizes transaction commits between two concurrent connections. One would expect that when connection A performs the commit, subsequent queries from connection B would recognize the committed change. In my case, connection B will have made the same query one or more times before connection A has made the commit. It appears that MySQL is caching the query results for the connection. Oddly enough, while connection B is repeatedly querying and getting the old value, I can issue the same query via the mysql client and see the new value. Is anyone aware of a caching or concurrency issue?
For the above observation I have the MySQL log enabled so I can see the individual updates, commits, and queries occurring.
The various environments are using different versions of MySQL as shown below. I'm in the process of upgrading my environments to the latest MySQL to see if that resolves it.
5.0.51a - two environments that have been very stable, with infrequent occurrences; however, one of them started having increased occurrences over the weekend under moderate traffic.
5.1.55 - one environment consistently fails
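One likely explanation, assuming InnoDB at its default REPEATABLE READ isolation level: every SELECT inside a transaction reads from the consistent snapshot taken at the transaction's first read, so a polling loop that never commits will keep seeing the old value even after connection A commits, while a fresh mysql client session sees the new one. A minimal sketch of a poller that sidesteps this by committing between polls (table and column names are invented):

    import java.sql.*;

    public class PollForChange {
        // Hypothetical poller: committing after each SELECT ends the current
        // REPEATABLE READ snapshot, so the next SELECT sees rows committed
        // by other connections in the meantime.
        static boolean waitForState(Connection conn, long id)
                throws SQLException, InterruptedException {
            conn.setAutoCommit(false);
            for (int i = 0; i < 60; i++) {
                try (PreparedStatement ps =
                         conn.prepareStatement("SELECT state FROM job WHERE id = ?")) {
                    ps.setLong(1, id);
                    try (ResultSet rs = ps.executeQuery()) {
                        if (rs.next() && "READY".equals(rs.getString(1))) {
                            conn.commit();
                            return true;
                        }
                    }
                }
                conn.commit(); // release the snapshot before sleeping
                Thread.sleep(1000);
            }
            return false;
        }
    }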
I want to use a MySQL row-level lock; I can't lock the complete table. I want to avoid two processes processing two different messages for the same server at the same time.
What I thought is that I can have a table called server_lock, and if one process starts working on a server it will insert a row into the table.
The problem with this approach is that if the application crashes, we need to remove the lock manually.
Is there a way to take a row-level lock such that the lock gets released if the application crashes?
Edit
I am using C++ as the language.
My application is similar to a message queue, but the difference is that there are two queues, each populated by its own process. If both processors handle actions that belong to the same object at the same time, it may result in wrong data. So I want a locking mechanism between these two queues so that the two processors don't modify the same object at the same time.
I can think of two ways:
Implement some error handler in your program where you remove the lock. Without knowing anything about your program it is hard to say how to do this, but most languages have some way to do work before exiting upon a crash. This is dangerous, because a crash happens when something is not right; if you continue to do any work, it is possible that you corrupt the database or something like that.
Periodically update the lock. Add a thread to your program that periodically refreshes the lock, or refresh the lock in some loop you are already running. Then, when a lock has not been updated in a while, you know that it belonged to a program that crashed.
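A minimal sketch of the second approach (JDBC for brevity; the SQL is the portable part). It assumes a hypothetical object_lock table whose object_id column is the primary key, autocommit turned on, and an arbitrary 30-second lease:

    import java.sql.*;

    public class LeaseLock {
        private static final int ER_DUP_ENTRY = 1062; // duplicate key

        // Lease-style lock: a row in object_lock marks the holder. A holder
        // that stops refreshing for 30 seconds is considered crashed, so its
        // lock may be reclaimed by the next acquirer.
        static boolean acquire(Connection conn, long objectId, String owner)
                throws SQLException {
            try (PreparedStatement del = conn.prepareStatement(
                    "DELETE FROM object_lock WHERE object_id = ? " +
                    "AND locked_at < NOW() - INTERVAL 30 SECOND")) {
                del.setLong(1, objectId);
                del.executeUpdate(); // reap a stale lock, if any
            }
            try (PreparedStatement ins = conn.prepareStatement(
                    "INSERT INTO object_lock (object_id, owner, locked_at) " +
                    "VALUES (?, ?, NOW())")) {
                ins.setLong(1, objectId);
                ins.setString(2, owner);
                ins.executeUpdate();
                return true; // row inserted: we hold the lock
            } catch (SQLException e) {
                if (e.getErrorCode() == ER_DUP_ENTRY) return false; // live owner holds it
                throw e;
            }
        }

        // Must be called periodically (well within the 30-second lease)
        // while the work is in progress.
        static void refresh(Connection conn, long objectId, String owner)
                throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE object_lock SET locked_at = NOW() " +
                    "WHERE object_id = ? AND owner = ?")) {
                ps.setLong(1, objectId);
                ps.setString(2, owner);
                ps.executeUpdate();
            }
        }
    }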
When receiving a so-called IPN message from PayPal, I need to update a row in my database.
The issue is that I need perfect reliability.
Currently I use InnoDB. I am afraid that the transaction may fail due to a race condition.
Should I use LOCK TABLES? Any other reliable solution?
Should I check for a failure and repeat the transaction several (how many?) times?
You cannot make a distributed process (like adding a row locally and notifying the server remotely) perfectly reliable, no matter the order. This is a lot like the Two Generals' Problem: there is no single event that can denote the successful completion of the transaction on both sides simultaneously, as any message might get lost along the way.
I'm not sure I understand your issue correctly, but perhaps the following would work: Write a line to some table noting the fact that you are going to verify a given message. Then do the verification, and afterwards write a line to the database about the result of that verification. In the unlikely but important scenario that something broke in between, you will have an intent line with no matching result line. You can then detect such situations and recover from them manually.
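A minimal sketch of that intent/result pattern, assuming a hypothetical ipn_log table with an auto-increment id and columns (msg_id, status, created_at):

    import java.sql.*;

    public class IpnIntentLog {
        // 1. Record the intent before doing anything else.
        static long recordIntent(Connection conn, String msgId) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO ipn_log (msg_id, status, created_at) " +
                    "VALUES (?, 'PENDING', NOW())",
                    Statement.RETURN_GENERATED_KEYS)) {
                ps.setString(1, msgId);
                ps.executeUpdate();
                try (ResultSet rs = ps.getGeneratedKeys()) {
                    rs.next();
                    return rs.getLong(1); // id of the intent line
                }
            }
        }

        // 2. After verifying with PayPal, record the outcome.
        static void recordResult(Connection conn, long logId, boolean verified)
                throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE ipn_log SET status = ? WHERE id = ?")) {
                ps.setString(1, verified ? "VERIFIED" : "FAILED");
                ps.setLong(2, logId);
                ps.executeUpdate();
            }
        }
        // Any row still 'PENDING' after some grace period indicates a crash
        // between the two steps and can be flagged for manual recovery.
    }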
On your local database, you'd have single-row updates, which you may execute in their own transaction, probably even with autocommit turned on. You have to make sure that the first write is actually committed to disk (and preferably to a binary log on some other disk as well) before you start talking to the PayPal server, but I see no need for locking or similar. You might want to retry failed transactions, I'd say up to three times, but the important thing is that in the end you can have admin intervention to fix anything your code can't handle.
I am using TopLink with Struts 2 for a high-usage app; the app constantly accesses a single table with multiple reads and writes per second. This causes a lock_wait_timeout error and the transaction rolls back, causing the data just entered to disappear from the front end. (MySQL's autocommit has been set to 1.) The exception has been caught and sent to an error page in the app, but a rollback still occurs (it has to be a TopLink exception, as MySQL does not have the rollback feature turned on). The raw data files (ibdata01) show the entry when opened in an editor. As this happens infrequently, I have not been able to replicate it in test conditions.
Can anyone be kind enough to provide some way out of this dilemma? What sort of approach suits such a high-access pattern (constant reads and writes to the same table all the time)? Any help would be greatly appreciated.
What is the nature of your concurrent reads/updates? Are you updating the same rows constantly from different sessions? What do you expect to happen when two sessions update the same row at the same time?
If it is just reads conflicting with updates, consider reducing your transaction isolation on your database.
If you have multiple writes conflicting, then you may consider using pessimistic locking to ensure each transaction succeeds. Either way, you will have a lot of contention, so you may want to reconsider your data model or your application's usage of the data.
See http://en.wikibooks.org/wiki/Java_Persistence/Locking
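For example, with plain JDBC a pessimistic lock looks roughly like this (TopLink/JPA expose the same idea through their own locking APIs; the widget table is invented). SELECT ... FOR UPDATE takes a row lock that is held until COMMIT, so two sessions cannot interleave their read-modify-write:

    import java.sql.*;

    public class PessimisticUpdate {
        // Lock the row first, then modify it; the row lock is released at
        // COMMIT (or ROLLBACK), serializing concurrent updaters of this row.
        static void increment(Connection conn, long id) throws SQLException {
            conn.setAutoCommit(false);
            try (PreparedStatement sel = conn.prepareStatement(
                     "SELECT counter FROM widget WHERE id = ? FOR UPDATE");
                 PreparedStatement upd = conn.prepareStatement(
                     "UPDATE widget SET counter = ? WHERE id = ?")) {
                sel.setLong(1, id);
                long counter;
                try (ResultSet rs = sel.executeQuery()) {
                    rs.next();
                    counter = rs.getLong(1);
                }
                upd.setLong(1, counter + 1);
                upd.setLong(2, id);
                upd.executeUpdate();
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }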
Lock wait timeouts are a fact of life for transactional databases. The normal response should usually be to trap the error and attempt to re-run the transaction. Not many developers seem to understand this, so it bears repeating: if you get a lock_wait_timeout error and you still want to commit the transaction, then run it again.
Other things to look out for:
Persistent connections combined with not explicitly COMMITting your transactions lead to long-running transactions that result in unnecessary locks.
Since you have autocommit off, if you log in from the mysql CLI (or any other interactive query tool) and start running queries, you stand a significant chance of locking rows and not releasing them in a timely manner.