I have 6 scripts/tasks. Each one of them starts a MySQL transaction, then do its job, which means SELECT/UPDATE/INSERT/DELETE from a MySQL database, then rollback.
So if the database is at a given state S, I launch one task, when the task terminates, the database is back to state S.
When I launch the scripts sequentially, everything works fine:
DB at state S
task 1
DB at state S
task 2
DB at state S
...
...
task 6
DB at state S
But I'd like to speed up the process by multiple-threading and launching the scripts in parallel.
DB at state S
6 tasks at the same time
DB at state S
Some tasks randomly fail, I sometimes get this error:
SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction
I don't understand, I thought transactions were meant for that. Is there something I'm missing ? Any experience, advice, clue is welcome.
The MySQL configuration is:
innodb_lock_wait_timeout = 500
transaction-isolation = SERIALIZABLE
and I add AUTOCOMMIT = 0 at the beginning of each session.
PS: The database was built and used under the REPEATABLE READ isolation level which I changed afterwards.
You can prevent deadlocks by ensuring that every transaction/process does a SELECT...FOR UPDATE on all required data/tables with the same ORDER BY in all cases and with the same order of the tables itself (with at least repeateable read isolation level in MySQL).
Apart from that, isolation levels and transactions are not meant to handle deadlocks, it is vice versa, they are the reason why deadlocks exist. If you encounter a deadlock, there are good chances that you would have an inconsistent state of your dataset (which might be much more serious - if not, you might not need transactions at all).
Related
After my airflow upgraded to 2.0.0(and then 2.0.1) and scheduler expanded to 3 nodes, something weird happened:
dagruns were success but the task instances were not scheduled at all
task failed with a null hostname(https://github.com/apache/airflow/issues/13692)
Task is set "upstream_failed" while upstream tasks are success(https://github.com/apache/airflow/issues/13671)
These phenomena never happened when there was only one scheduler node.
And I found that after task instances of a new dagrun were created by a scheduler node, they were not found in another scheduler node's task_instance_scheduling_decisions function.
Then I checked the mysql configurations and found transaction isolation was set to be Repeatable read by default.
After I set transaction isolation to be read commited, everything seems to be good now. But I still wonder are there any side effects?
Yes, READ COMMITTED is different than REPEATABLE READ.
If you use REPEATABLE READ, then in this transaction:
START TRANSACTION;
SELECT * FROM mytable;
SELECT * FROM mytable; -- i.e. the same query
COMMIT;
You are guaranteed that both SELECTs return the same result (as long as they are not locking queries).
But in READ COMMITTED, if some other session has committed a change to data in between the two SELECTs, they can return different results.
In other words, REPEATABLE READ means your SELECT queries always return data that was committed at the moment your START TRANSACTION started. Whereas READ COMMITTED means your SELECT queries return data that was committed after your transaction started, up until the time each respective SELECT starts.
Both levels of transaction isolation have proper uses. But they do behave differently.
Logs showing that from time to time this error is raised.
I'm reading the docs and it's very confusing because we're not locking any tables to do inserts and we have no transactions beyond individual SQL calls.
So - might this be happening because we're running out of the mySQL connection pool in Node? (We've set it to something like 250 simultaneous connections).
I'm trying to figure out how to replicate this but having no luck.
Every query not run within an explicit transaction runs in an implicit transaction that immediately commits when the query finishes or rolls back if an error occurs... so, yes, you're using transactions.
Deadlocks occur when at least two queries are in the process of acquiring locks, and each of them holds row-level locks that they happened to acquire in such an order that they each now need another lock that the other one holds -- so, they're "deadlocked." An infinite wait condition exists between the running queries. The server notices this.
The error is not so much a fault as it is the server saying, "I see what you did, there... and, you're welcome, I cleaned it up for you because otherwise, you would have waited forever."
What you aren't seeing is that there are two guilty parties -- two different queries that caused the problem -- but only one of them is punished. The query that has accomplished the least amount of work (admittedly, this concept is nebulous) will be killed with the deadlock error, and the other query happily proceeds along its path, having no idea that it was the lucky survivor.
This is why the deadlock error message ends with "try restarting transaction" -- which, if you aren't explicitly using transacrions, just means "run your query again."
See https://dev.mysql.com/doc/refman/5.6/en/innodb-deadlocks.html and examine the output of SHOW ENGINE INNODB STATUS;, which will show you the other query -- the one that helped cause the deadlock but that was not killed -- as well as the one that was.
I'm starting out with MySQL trnsactions and I have a doubt:
In the documentation it says:
Beginning a transaction causes any pending transaction to be
committed. See Section 13.3.3, “Statements That Cause an Implicit
Commit”, for more information.
I have more or less 5 users on the same web application ( It is a local application for testing ) and all of them share the same MySQL user to interact with the database.
My question is: If I use transactions in the code and two of them start a transaction ( because of inserting, updating or something ) Could it be that the transactions interfere with each other?
I see in the statements that cause an implicit commit Includes starting a transaction. Being a local application It's fast and hard to tell if there is something wrong going on there, every query turns out as expected but I still have the doubt.
The implicit commit occurs within a session.
So for instance you start a transaction, do some updates and then forget to close the transaction and start a new one. Then the first transaction will implicitely committed.
However, other connections to the database will not be affected by that; they have their own transactions.
You say that 5 users use the same db user. That is okay. But in order to have them perform separate operations they should not use the same connection/session.
With MySQl by default each connection has autocommit turned on. That is, each connection will commit each query immediately. For an InnoDb table each transaction is therefore atomic - it completes entirely and without interference.
For updates that require several operations you can use a transaction by using a START TRANSACTION query. Any outstanding transactions will be committed, but this won't be a problem because mostly they will have been committed anyway.
All the updates performed until a COMMIT query is received are guaranteed to be completed entirely and without interference or, in the case of a ROLLBACK, none are applied.
Other transations from other connections see a consistent view of the database while this is going on.
This property is ACID compliance (Atomicity, Consistency, Isolation, Durability) You should be fine with an InnoDB table.
Other table types may implement different levels of ACID compliance. If you have a need to use one you should check it carefully.
This is a much simplified veiw of transaction handling. There is more detail on the MySQL web site here and you can read about ACID compliance here
I am working with MySQL 5.0 from python using the MySQLdb module.
Consider a simple function to load and return the contents of an entire database table:
def load_items(connection):
cursor = connection.cursor()
cursor.execute("SELECT * FROM MyTable")
return cursor.fetchall()
This query is intended to be a simple data load and not have any transactional behaviour beyond that single SELECT statement.
After this query is run, it may be some time before the same connection is used again to perform other tasks, though other connections can still be operating on the database in the mean time.
Should I be calling connection.commit() soon after the cursor.execute(...) call to ensure that the operation hasn't left an unfinished transaction on the connection?
There are thwo things you need to take into account:
the isolation level in effect
what kind of state you want to "see" in your transaction
The default isolation level in MySQL is REPEATABLE READ which means that if you run a SELECT twice inside a transaction you will see exactly the same data even if other transactions have committed changes.
Most of the time people expect to see committed changes when running the second select statement - which is the behaviour of the READ COMMITTED isolation level.
If you did not change the default level in MySQL and you do expect to see changes in the database if you run a SELECT twice in the same transaction - then you can't do it in the "same" transaction and you need to commit your first SELECT statement.
If you actually want to see a consistent state of the data in your transaction then you should not commit apparently.
then after several minutes, the first process carries out an operation which is transactional and attempts to commit. Would this commit fail?
That totally depends on your definition of "is transactional". Anything you do in a relational database "is transactional" (That's not entirely true for MySQL actually, but for the sake of argumentation you can assume this if you are only using InnoDB as your storage engine).
If that "first process" only selects data (i.e. a "read only transaction"), then of course the commit will work. If it tried to modify data that another transaction has already committed and you are running with REPEATABLE READ you probably get an error (after waiting until any locks have been released). I'm not 100% about MySQL's behaviour in that case.
You should really try this manually with two different sessions using your favorite SQL client to understand the behaviour. Do change your isolation level as well to see the effects of the different levels too.
In our applications we don't use either ADO.NET transaction or SQL Server transactions in procedures and now we are getting the below error in our website when multiple people are using.
Transaction (Process ID 73) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction
Is this error due to the lack of transactions? I thought the consistency will be handled by the DB itself.
And one thing I noticed that SQLCommand.Timeout property has set to 10000. Will this be an issue for the error?
I am trying to solve this issue ASAP. Please help.
EDIT
I saw the Isolationlevel property of ADO.NET transaction, so if I use ADO.NET transaction with proper isolationlevel property like "ReadUncommitted" during reading and "Serializable" during writing?
Every SQL DML (INSERT, UPDATE, DELETE) or DQL (SELECT) statement runs inside a transaction. The default behaviour for SQL Server is for it to open a new transaction (if one doesn't exist), and if the statement completes without errors, to automatically commit the transaction.
The IMPLICIT_TRANSACTIONS behaviour that Sidharth mentions basically gets SQL Server to change it's behaviour somewhat - it leaves the transaction open when the statement completes.
To get better information in the SQL Server error log, you can turn on a trace flag. This will then tell you which connections were involved in the deadlock (not just the one that got killed), and which resources were involved. You may then be able to determine what pattern of behaviour is leading to the deadlocks.
If you're unable to determine the underlying cause, you may have to add some additional code to your application - that catches sql errors due to deadlocks, and retries the command multiple times. This is usually the last resort - it's better to determine which tables/indexes are involved, and work out a strategy that avoids the deadlocks in the first place.
IsolationLevel is your best bet. Default serialization level of transactions is "Serializable" which is the most stringent and if at this level there is a circular reference chances of deadlock are very high. Set it to ReadCommitted while reading and let it be Serializable while writing.
Sql server can use implicit transactions which is what might be happening in your case. Try setting it off:
SET IMPLICIT_TRANSACTIONS OFF;
Read about it here: http://msdn.microsoft.com/en-us/library/ms190230.aspx