After I upgraded Airflow to 2.0.0 (and then 2.0.1) and expanded the scheduler to 3 nodes, some weird things started happening:
DAG runs were marked success but their task instances were never scheduled at all
tasks failed with a null hostname (https://github.com/apache/airflow/issues/13692)
a task was set to "upstream_failed" even though its upstream tasks were successful (https://github.com/apache/airflow/issues/13671)
None of this ever happened when there was only one scheduler node.
I also found that after the task instances of a new DAG run were created by one scheduler node, they were not found in another scheduler node's task_instance_scheduling_decisions function.
Then I checked the MySQL configuration and found that the transaction isolation level was set to REPEATABLE READ by default.
After I set the transaction isolation level to READ COMMITTED, everything seems fine now. But I still wonder: are there any side effects?
Yes, READ COMMITTED is different from REPEATABLE READ.
If you use REPEATABLE READ, then in this transaction:
START TRANSACTION;
SELECT * FROM mytable;
SELECT * FROM mytable; -- i.e. the same query
COMMIT;
You are guaranteed that both SELECTs return the same result (as long as they are not locking queries).
But in READ COMMITTED, if some other session has committed a change to data in between the two SELECTs, they can return different results.
In other words, REPEATABLE READ means your SELECT queries always return data as of the snapshot taken when your transaction started, whereas READ COMMITTED means each SELECT returns whatever data had been committed by the moment that particular SELECT starts.
Both levels of transaction isolation have proper uses. But they do behave differently.
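For example, here is a minimal two-session sketch of the difference, assuming a hypothetical table mytable with a column val and one already-committed row (id = 1, val = 10):

-- Session A
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT val FROM mytable WHERE id = 1;   -- returns 10

-- Session B (a separate connection, autocommit on)
UPDATE mytable SET val = 20 WHERE id = 1;   -- committed immediately

-- Session A again
SELECT val FROM mytable WHERE id = 1;   -- still 10 under REPEATABLE READ;
                                        -- would return 20 under READ COMMITTED
COMMIT;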
Related
Does it only affect the commands issued after the relevant BEGIN TRANSACTION?
For example:
BEGIN TRAN
UPDATE orders SET orderdate = '01-08-2013' WHERE orderno > '999'
Now, assume someone else performs a data import that inserts 10,000 new records into another table.
If I subsequently issue a ROLLBACK command, do those records get discarded or is it just the command above that gets rolled back?
Sorry if this is a stupid question; I'm only just starting to use COMMIT and ROLLBACK.
Any transaction is confined to the connection it was opened on.
One of the four ACID properties of any relational database management system is Isolation. It means your actions are isolated from other connections, and vice versa, until you commit. Any change you make is invisible to other connections, and if you roll it back they will never know it happened. That in turn means that changes made somewhere else are invisible to you until they are committed. In particular, it means you can't ROLLBACK anyone else's changes.
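A small sketch of that point, reusing the orders table from the question plus a hypothetical other_table for the import:

-- Session A
BEGIN TRAN
UPDATE orders SET orderdate = '01-08-2013' WHERE orderno > '999'

-- Session B (a different connection): the data import runs and commits
INSERT INTO other_table (id) VALUES (1)   -- stands in for the 10,000-row import, committed by B

-- Session A
ROLLBACK   -- undoes only the UPDATE on orders; session B's committed rows stay put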
The Isolation is achieved in one of two ways. One way is to "lock" the resource (e.g. the row). If that happens any other connection trying to read from that row has to wait until you finish your transaction.
The other way is to create a copy of the row that contains the old values. In this case all other connections will see the old version until you commit your transaction.
SQL Server can use both isolation methods; which one is used depends on the isolation level you choose. The two snapshot isolation levels use the "copy" method, the other four use the "lock" method. The default isolation level, READ COMMITTED, is one of the "lock method" isolation levels.
Be aware however that the isolation level "READ UNCOMMITTED" basically circumvents these mechanisms and allows you to read changes that others started and have not yet committed. This is a special isolation level that can be helpful when diagnosing a problem but should be avoided in production code.
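A hedged sketch of why that is risky, again using the orders table from the question:

-- Session A
BEGIN TRAN
UPDATE orders SET orderdate = '01-08-2013' WHERE orderno > '999'
-- ...not committed yet...

-- Session B
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT orderdate FROM orders WHERE orderno > '999'   -- dirty read: sees A's uncommitted change

-- Session A
ROLLBACK   -- session B has now read data that never officially existed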
Suppose I do (note: the syntax below is probably not correct, but don't worry about it...it's just there to make a point)
START TRANSACTION;
INSERT INTO mytable (id, data) VALUES (100, 20), (100, 30);
SELECT * FROM mytable WHERE id = 100;
COMMIT;
Hence the goal of the SELECT is to get ALL the info from the table that was just inserted by the preceding INSERT and ONLY by the preceding INSERT...
Now suppose that during execution, after the INSERT has run, some other user also performs an INSERT with id = 100...
Will the SELECT statement in the next step of the transaction also return the row inserted by that other user's INSERT, or will it return just the two rows inserted by the preceding INSERT within the transaction?
Btw, I'm using MySQL so please tailor your answer to MySQL
This depends entirely on the Transaction Isolation that is used by the DB Connection.
According to the MySQL 5.0 Certification Study Guide:
Page 420 describes three transactional conditions handled by isolation levels:
A dirty read is a read by one transaction of uncommitted changes made by another. Suppose transaction T1 modifies a row. If transaction T2 reads the row and sees the modification even though T1 has not committed it, that is a dirty read. One reason this is a problem is that if T1 rolls back, the change is undone but T2 does not know that.
A non-repeatable read occurs when a transaction performs the same retrieval twice but gets a different result each time. Suppose that T1 reads some rows, and that T2 then changes some of those rows and commits the changes. If T1 sees the changes when it reads the rows again, it gets a different result; the initial read is non-repeatable. This is a problem because T1 does not get a consistent result from the same query.
A phantom is a row that appears where it was not visible before. Suppose that T1 and T2 begin, and T1 reads some rows. If T2 then inserts a new row and T1 sees that row when it reads again, the row is a phantom.
Page 421 describes the four (4) transaction isolation levels:
READ-UNCOMMITTED : allows a transaction to see uncommitted changes made by other transactions. This isolation level allows dirty reads, non-repeatable reads, and phantoms to occur.
READ-COMMITTED : allows a transaction to see changes made by other transactions only if they've been committed. Uncommitted changes remain invisible. This isolation level allows non-repeatable reads and phantoms to occur.
REPEATABLE READ (default) : ensures that if a transaction issues the same SELECT twice, it gets the same result both times, regardless of committed or uncommitted changes made by other transactions. In other words, it gets a consistent result from different executions of the same query. In some database systems, the REPEATABLE READ isolation level allows phantoms, such that if another transaction inserts new rows in the interval between the SELECT statements, the second SELECT will see them. This is not true for InnoDB; phantoms do not occur at the REPEATABLE READ level.
SERIALIZABLE : completely isolates the effects of one transaction from others. It is similar to REPEATABLE READ with the additional restriction that rows selected by one transaction cannot be changed by another until the first transaction finishes.
The isolation level can be set globally, for your session, or for a specific transaction:
SET GLOBAL TRANSACTION ISOLATION LEVEL isolation_level;
SET SESSION TRANSACTION ISOLATION LEVEL isolation_level;
SET TRANSACTION ISOLATION LEVEL isolation_level;
where isolation_level is one of the following values:
READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SERIALIZABLE
In my.cnf you can set the default as well:
[mysqld]
transaction-isolation = READ-COMMITTED
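For example, a quick way to switch just the current session and confirm the change took effect (on MySQL versions older than 5.7.20 the variable is named tx_isolation rather than transaction_isolation):

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT @@session.transaction_isolation;   -- READ-COMMITTED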
Since the other user is updating the same row, a row-level lock will be applied, so they can only make their change after your transaction ends. So you will see the result set that you inserted. Hope this helps.
Interfere is a fuzzy word when it comes to SQL database transactions. What rows a transaction can see is determined in part by its isolation level.
Hence the goal of the SELECT is to get ALL the info from the table that was just inserted by the preceding INSERT and ONLY by the preceding INSERT...
Preceding insert is a little fuzzy, too.
You probably ought to COMMIT the insert in question before you try to read it. Otherwise, under certain conditions not under your control, that transaction could be rolled back, and the row with id=100 might not actually exist.
Of course, after it's committed, other transactions are free to change the value of "id", of "value", or both. (If they have sufficient permissions, that is.)
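In other words, something along these lines (a sketch using the table and columns from the question):

START TRANSACTION;
INSERT INTO mytable (id, data) VALUES (100, 20), (100, 30);
COMMIT;                                  -- make the two rows durable first

SELECT * FROM mytable WHERE id = 100;    -- now read them back; other users' committed
                                         -- inserts with id = 100 will show up here too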
The transaction will make it seem as if the statements inside it run without any interference from other transactions. Most DBMSs (including MySQL) maintain the ACID properties for transactions. In your case, you are mainly interested in the I for Isolation, which means the DBMS will make it seem as if all the statements in your transaction run as one uninterrupted unit.
The only users affected are those that need access to the same rows in the table; everyone else is unaffected. It is slightly more complicated than that, however, because a row lock can be a read lock or a write lock.
Here is an explanation for the InnoDB storage engine.
For efficiency reasons, transactions are not made totally isolated from each other. Databases support multiple isolation levels, namely SERIALIZABLE, REPEATABLE READ, READ COMMITTED, and READ UNCOMMITTED, listed from the most strict to the least strict.
I am working with MySQL 5.0 from python using the MySQLdb module.
Consider a simple function to load and return the contents of an entire database table:
def load_items(connection):
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM MyTable")
    return cursor.fetchall()
This query is intended to be a simple data load and not have any transactional behaviour beyond that single SELECT statement.
After this query is run, it may be some time before the same connection is used again to perform other tasks, though other connections can still be operating on the database in the mean time.
Should I be calling connection.commit() soon after the cursor.execute(...) call to ensure that the operation hasn't left an unfinished transaction on the connection?
There are two things you need to take into account:
the isolation level in effect
what kind of state you want to "see" in your transaction
The default isolation level in MySQL is REPEATABLE READ which means that if you run a SELECT twice inside a transaction you will see exactly the same data even if other transactions have committed changes.
Most of the time people expect to see committed changes when running the second select statement - which is the behaviour of the READ COMMITTED isolation level.
If you did not change the default level in MySQL and you do expect to see changes committed by others when you run a SELECT a second time, then you can't do both reads in the "same" transaction: you need to commit after your first SELECT statement. If, on the other hand, you actually want to see a consistent state of the data throughout your transaction, then obviously you should not commit.
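To make that concrete, here is a sketch of what a single connection sees under the default REPEATABLE READ while other sessions commit changes in the background:

START TRANSACTION;
SELECT * FROM MyTable;   -- the snapshot for this transaction is established here
-- ...another session inserts rows and commits...
SELECT * FROM MyTable;   -- same result as before: the old snapshot is reused
COMMIT;                  -- ends the read transaction
SELECT * FROM MyTable;   -- a fresh snapshot: the other session's rows are now visible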
then after several minutes, the first process carries out an operation which is transactional and attempts to commit. Would this commit fail?
That totally depends on your definition of "is transactional". Anything you do in a relational database "is transactional" (That's not entirely true for MySQL actually, but for the sake of argumentation you can assume this if you are only using InnoDB as your storage engine).
If that "first process" only selects data (i.e. a "read only transaction"), then of course the commit will work. If it tried to modify data that another transaction has already committed and you are running with REPEATABLE READ you probably get an error (after waiting until any locks have been released). I'm not 100% about MySQL's behaviour in that case.
You should really try this manually with two different sessions using your favorite SQL client to understand the behaviour. Do change your isolation level as well to see the effects of the different levels too.
I have a few questions about SSIS transaction isolation levels.
Consider a scenario: I have an Execute SQL task which inserts data into a table A. This task points to a Data Flow task, which reads the data previously inserted into A. I have started a distributed transaction. If I set the transaction isolation in SSIS to ReadCommitted, will it commit table A after the first Execute SQL task before moving on to the Data Flow task?
Also, what about the other isolation levels in this scenario?
From what I can understand of your question, you're asking what the appropriate transaction isolation level is if you want to read data from a table in the same transaction in which that data is being written. As far as I know, it shouldn't matter. The isolation levels only address situations where another transaction wants to modify the same rows that the uncommitted transaction is modifying. In other words, just reading the table should cause no problems, and you should see the data from the first Execute SQL task: data written in a transaction is visible to later statements in that transaction before the transaction is committed.
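Roughly, the behaviour the package relies on looks like this in plain SQL (a sketch only; TableA is hypothetical, and it assumes the Execute SQL task and the data flow really do share one transaction):

BEGIN TRAN
INSERT INTO TableA (id) VALUES (1)   -- the Execute SQL task's insert
SELECT id FROM TableA                -- the data flow's read: the row is visible
                                     -- even though nothing has been committed yet
COMMIT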
For further reading, this is from the Oracle docs, but the same definition should apply to SQL and SSIS packages. Notice they address when two transactions want to modify the same data:
SERIALIZABLE: If a serializable transaction tries to execute a SQL data manipulation statement that modifies any table already modified by an uncommitted transaction, the statement fails.
READ COMMITTED: If a transaction includes SQL data manipulation statements that require row locks held by another transaction, the statement waits until the row locks are released.
DO NOT DOWNVOTE THIS ANSWER. I got it from MSDN forums and I am keeping it here for reference.
http://social.msdn.microsoft.com/Forums/en-US/3dcea5f6-32ef-40aa-90d5-0f2fef9e1d38/isolation-level-in-ssis
A few observations...
The IsolationLevel property on SSIS components only applies when distributed transactions are used (the package or another container has TransactionOption=Required). So in that regard, IsolationLevel is a bit misleading in SSIS: even if you set it, it's not going to help unless a transaction is opened by SSIS. I wrote about that limitation here: http://msdn.microsoft.com/en-us/library/microsoft.sqlserver.dts.runtime.dtscontainer.isolationlevel.aspx
If you are customizing the isolation level in TSQL (a stored procedure or just the beginning of a batch) that is called from SSIS, you can override the default SQL Server isolation level (READ COMMITTED); however, if you just point at a table name in a data flow source or destination, you can't set the isolation level.
If you choose to set the isolation level manually in each of your queries, there are a few techniques:
1. Run the SET option in your commands: set transaction isolation level read uncommitted (http://msdn.microsoft.com/en-us/library/ms173763.aspx). Be careful with READ UNCOMMITTED and NOLOCK, since they can read dirty data (changes in flux not yet fully committed by other connections).
2. Use locking hints such as http://technet.microsoft.com/en-us/library/ms187373.aspx, for example:
select * from t1 (nolock)
3. Set the auto-commit isolation level in OLE DB or ODBC, if there is a place to override it in the connection string or driver properties of your driver (http://msdn.microsoft.com/en-us/library/ms175909.aspx). I haven't tested that, but it may be possible.
To see the isolation level being used, if the RDBMS you connect to is SQL Server 2005/2008, you can, while the connection/session is still active, run DBCC USEROPTIONS or select from sys.dm_exec_sessions:
select transaction_isolation_level, * from sys.dm_exec_sessions
(0 = Unspecified, 1 = ReadUncommitted, 2 = ReadCommitted, 3 = Repeatable, 4 = Serializable, 5 = Snapshot)
We also found out that the Snapshot isolation level is incompatible with distributed transactions, so it is not possible to use Snapshot isolation through the SSIS properties. A workaround is to use the TSQL syntax for snapshot isolation within your data sources and Execute SQL Task commands directly.
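For instance, the TSQL workaround would look roughly like this at the top of an Execute SQL Task or data source query (a sketch; MyDb is hypothetical, and ALLOW_SNAPSHOT_ISOLATION has to be enabled on the database beforehand):

ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;   -- one-time database setting

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
SELECT * FROM t1;   -- this read now runs under snapshot isolation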
Best of luck, Jason
His MSDN Profile - http://social.msdn.microsoft.com/profile/jason%20h%20(hdinsight)/?ws=usercard-mini
I have 6 scripts/tasks. Each of them starts a MySQL transaction, then does its job, which means SELECT/UPDATE/INSERT/DELETE against a MySQL database, then rolls back.
So if the database is at a given state S, I launch one task, when the task terminates, the database is back to state S.
When I launch the scripts sequentially, everything works fine:
DB at state S
task 1
DB at state S
task 2
DB at state S
...
...
task 6
DB at state S
But I'd like to speed up the process by multiple-threading and launching the scripts in parallel.
DB at state S
6 tasks at the same time
DB at state S
Some tasks randomly fail, I sometimes get this error:
SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction
I don't understand; I thought transactions were meant for exactly that. Is there something I'm missing? Any experience, advice, or clue is welcome.
The MySQL configuration is:
innodb_lock_wait_timeout = 500
transaction-isolation = SERIALIZABLE
and I run SET autocommit = 0 at the beginning of each session.
PS: The database was built and used under the REPEATABLE READ isolation level which I changed afterwards.
You can prevent deadlocks by ensuring that every transaction/process does a SELECT ... FOR UPDATE on all required data/tables with the same ORDER BY in every case and with the tables themselves in the same order (with at least the REPEATABLE READ isolation level in MySQL).
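A sketch of that pattern, assuming two hypothetical tables table_a and table_b that every task touches:

-- Every task locks the tables in the same order, and the rows in the same order
START TRANSACTION;
SELECT * FROM table_a WHERE id IN (1, 2) ORDER BY id FOR UPDATE;
SELECT * FROM table_b WHERE id IN (1, 2) ORDER BY id FOR UPDATE;
-- ...do the actual SELECT/UPDATE/INSERT/DELETE work here...
ROLLBACK;   -- or COMMIT; either way, the locks are released at this point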
Apart from that, isolation levels and transactions are not meant to handle deadlocks; it is rather the other way around: they are the reason deadlocks exist. If you encounter a deadlock, chances are good that without the transaction you would have ended up with an inconsistent dataset (which might be much more serious; if not, you might not need transactions at all).