We run a large number of unit tests using the DatabaseTransactions trait and a MySQL database connection.
When executing the full test suite, we get around 15 "General error: 1205 Lock wait timeout exceeded" errors.
When run individually, those tests all succeed.
The issue comes up mostly when executing the sync() method, but not exclusively.
(I tried increasing the wait timeout, with no luck.)
Any suggestion would be much appreciated.
Posted in laracasts as well: https://laracasts.com/discuss/channels/testing/test-suite-general-error-1205-lock-wait-timeout-exceeded
It turned out that a missing parent::tearDown(); call was the culprit: without it, the transaction opened for the test was never closed.
Issue solved.
Related
We're having a weird issue with TypeORM, specifically with Jest (which might or might not be related). A certain test gets completely stuck, and we're having a hard time figuring out what the issue is.
Our stack: TypeScript, Node.js, Apollo GraphQL, Jest, MySQL.
The test in question is actually an integration test using Apollo’s integration test framework.
What happened first is that a specific test got completely stuck, and after several long minutes this error was thrown in the console: QueryFailedError: ER_LOCK_WAIT_TIMEOUT: Lock wait timeout exceeded; try restarting transaction
Trying to pinpoint the problem led me to a function we run in afterEach which "destroys" the database. It initially ran:
await queryRunner.query('DELETE FROM Table1');
await queryRunner.query('DELETE FROM Table2');
...
The error and "deadlock" were initially fixed after I changed the calls from queryRunner to queryBuilder:
await queryBuilder.delete().from('Table1').execute();
...
This was done after fidgeting around with SHOW PROCESSLIST; and SHOW ENGINE InnoDB STATUS; to try to figure out what was happening. I also changed the transaction isolation level to READ-COMMITTED, but to no avail. Nothing worked except switching from queryRunner to queryBuilder.
This worked for a while, but now the test seems to be getting stuck again (the test hasn't changed, but the code it's testing has). Now, after the test hangs, we get this error: Error: Pool is closed. Afterwards the test is "released" and the remaining tests start failing one by one.
We found out that this is the sequence of events that causes the test to get stuck:
1. open a transaction with queryRunner
2. perform a read query
3. then perform a write
4. commit the transaction and release the queryRunner
5. delete the DB
6. perform a write - deadlock
Furthermore, we noticed the following:
If we make sure to use the queryRunner only for writes, and not for reads, the deadlock doesn't happen.
If we restructure the code so that all reads go through the regular connection object (not the queryRunner) first, and only then connect with the queryRunner and perform all the writes, the deadlock doesn't happen either.
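One possible explanation (an assumption on our part, not something the thread or TypeORM's docs confirm) is connection starvation: a queryRunner holds one pooled connection until release(), so a read issued through the regular connection while the pool is exhausted can wait forever, which looks exactly like a hang followed by a lock-wait or pool error. The sketch below simulates that with a toy one-connection pool; no TypeORM is involved, and all names are illustrative:

```typescript
// Minimal simulation of the starvation theory: a one-connection pool,
// a "queryRunner" holding that connection, and a read waiting on the pool.
class Pool {
  private free: number;
  private waiters: Array<() => void> = [];
  constructor(size: number) { this.free = size; }
  acquire(): Promise<void> {
    if (this.free > 0) { this.free--; return Promise.resolve(); }
    return new Promise<void>((resolve) => this.waiters.push(resolve));
  }
  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the connection straight to the next waiter
    else this.free++;
  }
}

const results: boolean[] = [];

const done = (async () => {
  const pool = new Pool(1);
  await pool.acquire(); // the queryRunner checks out the only connection

  let readFinished = false;
  const read = pool.acquire().then(() => { readFinished = true; pool.release(); });

  await new Promise((r) => setTimeout(r, 20));
  results.push(readFinished); // false: the read is starved while the runner holds on

  pool.release(); // releasing the runner unblocks the waiting read
  await read;
  results.push(readFinished); // true
  console.log("starved while held:", results[0], "finished after release:", results[1]);
})();
```

If this is what is happening, the two observations above make sense: keeping reads off the queryRunner (or finishing them before acquiring it) stops the pool from being oversubscribed.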
Does anyone have any insight into what might be going on? Is there some instability in queryRunner, or are there specific things we need to take into account when using it?
Thanks!
I faced the same issue, and my problem was an unhandled async/await.
Check whether some promise is left unhandled.
If you use the async keyword, you must await every async call.
Also, don't forget to call Jest's done callback.
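A minimal sketch (plain Node, no Jest; the function names are illustrative) of how a missing await makes a teardown return before its work is done, leaving a query in flight that can still hold locks when the next test starts:

```typescript
// setTimeout stands in for a real DB query such as a DELETE.
const log: string[] = [];

async function cleanup(): Promise<void> {
  await new Promise((r) => setTimeout(r, 10)); // pretend this is a DELETE query
  log.push("cleaned");
}

async function badTeardown(): Promise<void> {
  cleanup(); // BUG: promise not awaited -- teardown returns immediately
}

async function goodTeardown(): Promise<void> {
  await cleanup(); // teardown really waits for the query to finish
}

const done = (async () => {
  await badTeardown();
  const afterBad = log.length; // 0 -- the cleanup is still pending

  await new Promise((r) => setTimeout(r, 20)); // let the stray promise drain
  log.length = 0;

  await goodTeardown();
  const afterGood = log.length; // 1 -- the cleanup completed before we moved on
  console.log("after bad teardown:", afterBad, "after good teardown:", afterGood);
  return { afterBad, afterGood };
})();
```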
When restarting a failed transaction at the commit stage, I get a second failure on the restart. This is running Galera Cluster under MariaDB 10.2.6.
The sequence of events goes like this:
Commit a transaction (say a single insert).
COMMIT fails with error 1213 "Deadlock found when trying to get lock"
Begin a new transaction to replay the SQL statement[s].
BEGIN fails with error 1047 "WSREP has not yet prepared node for application use"
My application bails to avoid a more serious crash (see notes below)
This happens quite regularly, and although the cluster recovers, individual threads receive failures. Yesterday this happened 15 times in one second.
I cannot identify any root cause. The deadlock seems to be the initiator of the problem. The situation should be recoverable (and often is), but with multiple clients all trying to resolve their deadlocks at the same time, the whole thing seems to just fail.
Notes:
This is related to an earlier question where retrying failed transactions caused a total crash of the cluster. I've managed to prevent crashes by retrying transactions only on deadlocks, i.e., if a different type of error occurs during a retry, the application gives up.
I'm aware that 10.2.6 is not the latest version of MariaDB. I'm nervous to upgrade right now as I've had such bad experiences. I would like to understand the current problem before doing an upgrade and I've been unable to reproduce the errors in a test environment.
I'm not sure, but I suspect 3 tries (not 2) is appropriate. Committing involves two steps:
Checking for a Deadlock purely within the node you are connected to. (Eg: another query is touching the same row or gap.)
Checking with the other nodes to see if they will complain. (Eg: The same row has already been inserted into another node.)
Sure, either of those could happen repeatedly, and in any order. But making 3 tries seems reasonable.
Now, once you have failed "too many" times, it is right to abort and get a human (a DBA type) involved. I suspect that you could restructure your code / application logic / etc. in some way to avoid most of the failures. Would you like to provide more details, so we can discuss that possibility?
What kind of table? (Queue, transactions, logging, etc)
SHOW CREATE TABLE. (auto_inc, unique keys, etc; too many UNIQUE keys can aggravate the situation)
What does the INSERT look like?
How often do you run inserts like this one? How often does it fail? (Instrument your code so you count even those that you can recover from.)
How spread out is the Cluster? (ping time)
What other queries are hitting the table? (They may be aggravating the issue.)
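The retry-then-abort policy described above (replay only on deadlock, give up immediately on anything else, stop after three tries) can be sketched as follows. The error shape and helper names are illustrative, not from the asker's code:

```typescript
// Retry only on deadlock (MySQL/MariaDB errno 1213); anything else, such as
// 1047 "WSREP has not yet prepared node for application use", aborts at once.
interface DbError extends Error { errno: number; }

function dbError(errno: number, message: string): DbError {
  const err = new Error(message) as DbError;
  err.errno = errno;
  return err;
}

async function withDeadlockRetry<T>(txn: () => Promise<T>, maxTries = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await txn();
    } catch (err) {
      const errno = (err as DbError).errno;
      // Only deadlocks are worth replaying; other errors mean the node
      // itself is unhealthy, so bail out and get a human involved.
      if (errno !== 1213 || attempt >= maxTries) throw err;
    }
  }
}

const done = (async () => {
  // Succeeds on the third try after two deadlocks:
  let calls = 0;
  const ok = await withDeadlockRetry(async () => {
    calls++;
    if (calls < 3) throw dbError(1213, "Deadlock found when trying to get lock");
    return "committed";
  });

  // A WSREP error is not retried:
  let wsrepCalls = 0;
  let wsrepFailed = false;
  try {
    await withDeadlockRetry(async () => {
      wsrepCalls++;
      throw dbError(1047, "WSREP has not yet prepared node for application use");
    });
  } catch {
    wsrepFailed = true;
  }
  console.log(ok, calls, wsrepCalls, wsrepFailed);
  return { ok, calls, wsrepCalls, wsrepFailed };
})();
```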
I'm using mysql in a node project.
I would like to unit test a javascript function that makes an sql transaction. If the transaction becomes the victim of a lock monitor, the function has code that handles the failure.
Or does it?
Because I'm unit testing, I'm only making one transaction at a time on a local database, so there's never going to be a deadlock, right? How can I test the deadlock handling if it's never going to happen? Is there a way I can force it to happen?
Example:
thisMustBeDoneBeforeTheQuery();
connection.queryAsync(/* this is an update */).catch(function (err) {
    undoThatStuffIDidBeforeTheQuery();
    // I hope that function worked, because my unit tests can't
    // make a deadlock happen, so I can't know for sure.
});
What is the essential behavior that your tests need to guard or verify? Do you need to test your mysql driver? Or MySQL itself? I think #k0pernikus identified the highest-value test:
Assuming that the database client results in an exception because of a deadlock, how does your application code handle it?
You should be able to pretty easily create a test harness using a mocking library or Dependency Injection and test stubs to simulate the client driver returning a deadlock exception. This shouldn't require any interaction with mysql, beyond the initial investigation to see what the return code/error exception propagation looks like for your mysql client driver.
This isn't a 100% perfect test, and it still leaves you vulnerable in case the mysql client library changes.
Reproducing concurrency issues deterministically is often extremely difficult because of timing. But using SELECT ... FOR UPDATE and multiple transactions should let you deterministically reproduce a deadlock on MySQL, to verify your client library's code.
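A sketch of the stub approach: application code that compensates on a deadlock, tested against a fake driver whose query rejects with the error shape the mysql driver uses (code ER_LOCK_DEADLOCK, errno 1213). The function and table names here are illustrative, not from the asker's code:

```typescript
interface MysqlError extends Error { code: string; errno: number; }

interface Db { queryAsync(sql: string): Promise<unknown>; }

// Code under test: run the update; on deadlock, undo the prior work.
async function updateWithCompensation(db: Db, undo: () => void): Promise<boolean> {
  try {
    await db.queryAsync("UPDATE widgets SET reserved = 1 WHERE id = 42");
    return true;
  } catch (err) {
    if ((err as MysqlError).code === "ER_LOCK_DEADLOCK") {
      undo(); // the cleanup path the unit test wants to exercise
      return false;
    }
    throw err; // anything else is unexpected
  }
}

// Test stub: no MySQL needed -- it always rejects with a deadlock error.
function deadlockStub(): Db {
  return {
    queryAsync: async () => {
      const err = new Error(
        "Deadlock found when trying to get lock; try restarting transaction"
      ) as MysqlError;
      err.code = "ER_LOCK_DEADLOCK";
      err.errno = 1213;
      throw err;
    },
  };
}

const done = (async () => {
  let undone = false;
  const ok = await updateWithCompensation(deadlockStub(), () => { undone = true; });
  console.log("update succeeded:", ok, "undo ran:", undone);
  return { ok, undone };
})();
```

The same stub can be swapped in via a mocking library instead of hand-rolled dependency injection; the point is that the deadlock path runs on every test run without ever touching a real database.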
My app works with a MySQL database, using FireDAC components for the connection. Recently I had a network problem; I tested it, and it looks like it loses 4 ping requests from time to time. My app returns the error: "[FireDAC][Phys][MySQL] Lost connection to MySQL server during query". Now the question: will setting fdconnection.TFDUpdateOptions.LockWait to True (the default is False) resolve my problem, or create new ones?
TFDUpdateOptions.LockWait has no effect on your connection to the database. It determines what happens when a record lock can't be obtained immediately. The documentation says it pretty clearly:
Use the LockWait property to control whether FireDAC should wait while the pessimistic lock is acquired (True), or return the error immediately (False) if the record is already locked. The default value is False.
The LockWait property is used only if LockMode = lmPessimistic.
FireDAC can't wait to acquire a lock if it loses the connection, as clearly there is no way to either request the lock or determine whether it was obtained. Therefore, changing LockWait will not change the lost-connection issue, and it may slow down many other operations against the database.
The only solution to your lost ping requests is to fix your network connection so it stops dropping packets. Simply randomly changing options on TFDConnection isn't going to fix networking issues.
I am running Magento 1.7.0.2 and everything was running great, but out of nowhere I couldn't process any orders. On the final order submission, my site would hang and show a generic "unable to process your order, try again" message to the user, while on the backend it would give me one of two errors:
SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction
or
Gateway error code E00001: An error occurred during processing. Please try again.
I am not sure why all of a sudden the MySQL tables are locking up on Magento and why my orders cannot be processed.
After about 8 hours, I got the site back online, and it turned out there was nothing to fix on my end! None of the suggestions to tweak InnoDB settings worked, as the site processes were being killed due to their long wait times. The whole issue was caused by our payment processor (Authorize.net) being down: the final checkout step was hanging because Authorize.net never returned an answer, which is why these errors were thrown and the tables were locked.
I thought I would post this here as a question/answer because my site was completely down and I didn't see anyone on the web talking about this issue. Hopefully this will help others who see this error on their site. Thanks!