Airflow upon starting - MySQL server has gone away - mysql

We added a second airflow database to support our staging airflow instance, and now both staging and production seem to have intermittent connection issues. We recently upgraded to 2.2.4, but looking through past RDS logs, I see aborted connections (Got an error reading communication packets) prior to the upgrade as well (previous version 1.10.11). Both instances have a webserver UI that pull in past DAG runs fine, and the scheduler is running DAGs appropriately and storing dag_run data. When the service is started however, we receive an error "MySQL server has gone away" (see below for stack trace). airflow db check and airflow db shell both return successful connections. Some relevant airflow config values are:
sql_alchemy_pool_enabled = True
sql_alchemy_pool_size = 5
sql_alchemy_max_overflow = 10
sql_alchemy_pool_recycle = 1800
sql_alchemy_pool_pre_ping = True
sql_alchemy_schema =
sql_alchemy_reconnect_timeout = 300
I've also tried upping the pool size to 20, and also confirmed that our db size should be able to handle at least 90 concurrent connections.
[2022-04-28 16:32:03,212] {manager.py:512} WARNING - Refused to delete permission view, assoc with role exists DAG Runs.can_create Admin
Process ForkProcess-19:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
File "/usr/local/lib/python3.6/site-packages/MySQLdb/cursors.py", line 206, in execute
res = self._query(query)
File "/usr/local/lib/python3.6/site-packages/MySQLdb/cursors.py", line 319, in _query
db.query(q)
File "/usr/local/lib/python3.6/site-packages/MySQLdb/connections.py", line 254, in query
_mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (2006, 'MySQL server has gone away')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/local/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/site-packages/airflow/dag_processing/manager.py", line 287, in _run_processor_manager
processor_manager.start()
File "/usr/local/lib/python3.6/site-packages/airflow/dag_processing/manager.py", line 520, in start
return self._run_parsing_loop()
File "/usr/local/lib/python3.6/site-packages/airflow/dag_processing/manager.py", line 585, in _run_parsing_loop
self._find_zombies()
File "/usr/local/lib/python3.6/site-packages/airflow/utils/session.py", line 70, in wrapper
return func(*args, session=session, **kwargs)
File "/usr/local/lib/python3.6/site-packages/airflow/dag_processing/manager.py", line 1079, in _find_zombies
LJ.latest_heartbeat < limit_dttm,
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3373, in all
return list(self)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
return self._execute_and_instances(context)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement
distilled_params,
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
e, statement, parameters, cursor, context
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
File "/usr/local/lib/python3.6/site-packages/MySQLdb/cursors.py", line 206, in execute
res = self._query(query)
File "/usr/local/lib/python3.6/site-packages/MySQLdb/cursors.py", line 319, in _query
db.query(q)
File "/usr/local/lib/python3.6/site-packages/MySQLdb/connections.py", line 254, in query
_mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (2006, 'MySQL server has gone away')
[SQL: SELECT task_instance.try_number AS task_instance_try_number, task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, task_instance.run_id AS task_instance_run_id, task_instance.start_date AS task_instance_start_date, task_instance.end_date AS task_instance_end_date, task_instance.duration AS task_instance_duration, task_instance.state AS task_instance_state, task_instance.max_tries AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, task_instance.unixname AS task_instance_unixname, task_instance.job_id AS task_instance_job_id, task_instance.pool AS task_instance_pool, task_instance.pool_slots AS task_instance_pool_slots, task_instance.queue AS task_instance_queue, task_instance.priority_weight AS task_instance_priority_weight, task_instance.operator AS task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, task_instance.queued_by_job_id AS task_instance_queued_by_job_id, task_instance.pid AS task_instance_pid, task_instance.executor_config AS task_instance_executor_config, task_instance.external_executor_id AS task_instance_external_executor_id, task_instance.trigger_id AS task_instance_trigger_id, task_instance.trigger_timeout AS task_instance_trigger_timeout, task_instance.next_method AS task_instance_next_method, task_instance.next_kwargs AS task_instance_next_kwargs, dag.fileloc AS dag_fileloc, dag_run_1.state AS dag_run_1_state, dag_run_1.id AS dag_run_1_id, dag_run_1.dag_id AS dag_run_1_dag_id, dag_run_1.queued_at AS dag_run_1_queued_at, dag_run_1.execution_date AS dag_run_1_execution_date, dag_run_1.start_date AS dag_run_1_start_date, dag_run_1.end_date AS dag_run_1_end_date, dag_run_1.run_id AS dag_run_1_run_id, dag_run_1.creating_job_id AS dag_run_1_creating_job_id, dag_run_1.external_trigger AS dag_run_1_external_trigger, dag_run_1.run_type AS dag_run_1_run_type, dag_run_1.conf AS dag_run_1_conf, dag_run_1.data_interval_start AS dag_run_1_data_interval_start, dag_run_1.data_interval_end AS dag_run_1_data_interval_end, dag_run_1.last_scheduling_decision AS dag_run_1_last_scheduling_decision, dag_run_1.dag_hash AS dag_run_1_dag_hash
FROM task_instance INNER JOIN job ON task_instance.job_id = job.id AND job.job_type IN (%s) INNER JOIN dag ON task_instance.dag_id = dag.dag_id INNER JOIN dag_run AS dag_run_1 ON dag_run_1.dag_id = task_instance.dag_id AND dag_run_1.run_id = task_instance.run_id
WHERE task_instance.state = %s AND (job.state != %s OR job.latest_heartbeat < %s)]
[parameters: ('LocalTaskJob', <TaskInstanceState.RUNNING: 'running'>, <TaskInstanceState.RUNNING: 'running'>, datetime.datetime(2022, 4, 28, 16, 27, 2, 870459))]
(Background on this error at: http://sqlalche.me/e/13/e3q8)```

Related

Is there a solution in fixing 'MySQL connection not available' for very second request in Flask application?

I am running a Flask application on local and production server. I have no issues with the local, I am facing 'MySQL connection not available' for every second database request on production server. After reloading or after performing rollback operation, it is getting executed, but the issue repeats. I tried major solutions available on the internet by changing, pool_recycle, 'wait_timeout' and other timeouts to same value of 1600, but didn't work. I finally destroyed the application and reinstalled everything on the server but the issue is still on. Please help in this, thank you.
My production MySQL config:
class ProductionConfig(Config):
SECRET_KEY = 'secret_key'
DEBUG=False
SQLALCHEMY_DATABASE_URI = "mysql+pymysql://{username}:
{password}#{hostname}/{databasename}".format(
username="myusername",
password="password",
hostname="myhostname",
databasename="mydatabase",
)
SQLALCHEMY_POOL_RECYCLE = 299
SQLALCHEMY_TRACK_MODIFICATIONS = False
Error Message:
Exception on /add_questions/1/2 [POST]
Traceback (most recent call last):
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/pymysql/connections.py", line 713, in _write_bytes
self._sock.sendall(data)
ConnectionResetError: [Errno 104] Connection reset by peer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
cursor, statement, parameters, context
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
cursor.execute(statement, parameters)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/pymysql/cursors.py", line 170, in execute
result = self._query(query)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/pymysql/cursors.py", line 328, in _query
conn.query(q)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/pymysql/connections.py", line 516, in query
self._execute_command(COMMAND.COM_QUERY, sql)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/pymysql/connections.py", line 771, in _execute_command
self._write_bytes(packet)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/pymysql/connections.py", line 718, in _write_bytes
"MySQL server has gone away (%r)" % (e,))
pymysql.err.OperationalError: (2006, "MySQL server has gone away (ConnectionResetError(104, 'Connection reset by peer'))")
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/flask_login/utils.py", line 270, in decorated_view
elif not current_user.is_authenticated:
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/werkzeug/local.py", line 347, in __getattr__
return getattr(self._get_current_object(), name)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/werkzeug/local.py", line 306, in _get_current_object
return self.__local()
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/flask_login/utils.py", line 26, in <lambda>
current_user = LocalProxy(lambda: _get_user())
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/flask_login/utils.py", line 346, in _get_user
current_app.login_manager._load_user()
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/flask_login/login_manager.py", line 318, in _load_user
user = self._user_callback(user_id)
File "/home/ulznrcvr/jaihindpro/application/auth/views.py", line 26, in load_user
return User.query.get(int(user_id))
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 1004, in get
return self._get_impl(ident, loading.load_on_pk_identity)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 1121, in _get_impl
return db_load_fn(self, primary_key_identity)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/sqlalchemy/orm/loading.py", line 287, in load_on_pk_identity
return q.one()
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3360, in one
ret = self.one_or_none()
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3329, in one_or_none
ret = list(self)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3405, in __iter__
return self._execute_and_instances(context)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3430, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 984, in execute
return meth(self, multiparams, params)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 293, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1103, in _execute_clauseelement
distilled_params,
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1288, in _execute_context
e, statement, parameters, cursor, context
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1482, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
raise exception
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
cursor, statement, parameters, context
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
cursor.execute(statement, parameters)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/pymysql/cursors.py", line 170, in execute
result = self._query(query)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/pymysql/cursors.py", line 328, in _query
conn.query(q)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/pymysql/connections.py", line 516, in query
self._execute_command(COMMAND.COM_QUERY, sql)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/pymysql/connections.py", line 771, in _execute_command
self._write_bytes(packet)
File "/home/ulznrcvr/virtualenv/jaihindpro/3.6/lib/python3.6/site-packages/pymysql/connections.py", line 718, in _write_bytes
"MySQL server has gone away (%r)" % (e,))
sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2006, "MySQL server has gone away (ConnectionResetError(104, 'Connection reset by peer'))")
[SQL: SELECT user.id AS user_id, user.student_id AS user_student_id, user.role_id AS user_role_id, user.course_id AS user_course_id, user.sub_course_id AS user_sub_course_id, user.batch_id AS user_batch_id, user.full_name AS user_full_name, user.email AS user_email, user.phone_number AS user_phone_number, user.password AS user_password, user.active AS user_active, user.confirmed AS user_confirmed, user.confirmed_date AS user_confirmed_date, user.subscribed AS user_subscribed, user.start_date AS user_start_date, user.end_date AS user_end_date, user.re_url AS user_re_url, user.online AS user_online
FROM user
WHERE user.id = %(param_1)s]
[parameters: {'param_1': 2}]
(Background on this error at: http://sqlalche.me/e/e3q8)
I figured it out myself, I flagged "pool_pre_ping" attribute to "True" in engine settings.
SQLALCHEMY_ENGINE_OPTIONS = {
"pool_pre_ping": True,
"pool_recycle": 300,
}
Ref: https://medium.com/#heyjcmc/controlling-the-flask-sqlalchemy-engine-a0f8fae15b47

Airflow Deadlock while running multiple nodes in parallel in a SubDAG

DAG structure: We have an ETL pipeline having a number of phases. Each phase could have child phases inside it, which is true recursively as well (to a known depth, about 3-4). The inner most layer consists of SQLs (largely variable number, avg about 30), either executing in parallel or serial fashion or both.
For each phase, we have used SubDAG.
We are aware that using SubDAGs is a bad practice, but because of our hierarchical structure, they become a natural choice. We are using Celery Executor for SubDAGs.
For the leaf nodes in the innermost layer, we have a custom operator.
Problem: We (very) occasionally hit a deadlock and the SubDAG task which houses the SQL nodes, fails. There are no child node failures.
Error: sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')
StackTrace:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/airflow/models/taskinstance.py", line 930, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.6/dist-packages/airflow/operators/subdag_operator.py", line 102, in execute
executor=self.executor)
File "/usr/local/lib/python3.6/dist-packages/airflow/models/dag.py", line 1284, in run
job.run()
File "/usr/local/lib/python3.6/dist-packages/airflow/jobs/base_job.py", line 222, in run
self._execute()
File "/usr/local/lib/python3.6/dist-packages/airflow/utils/db.py", line 74, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/airflow/jobs/backfill_job.py", line 769, in _execute
session=session)
File "/usr/local/lib/python3.6/dist-packages/airflow/utils/db.py", line 70, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/airflow/jobs/backfill_job.py", line 699, in _execute_for_run_dates
session=session)
File "/usr/local/lib/python3.6/dist-packages/airflow/utils/db.py", line 70, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/airflow/jobs/backfill_job.py", line 586, in _process_backfill_task_instances
_per_task_process(task, key, ti)
File "/usr/local/lib/python3.6/dist-packages/airflow/utils/db.py", line 74, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/airflow/jobs/backfill_job.py", line 508, in _per_task_process
session.commit()
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/orm/session.py", line 1036, in commit
self.transaction.commit()
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/orm/session.py", line 503, in commit
self._prepare_impl()
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/orm/session.py", line 482, in _prepare_impl
self.session.flush()
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/orm/session.py", line 2479, in flush
self._flush(objects)
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/orm/session.py", line 2617, in _flush
transaction.rollback(_capture_exception=True)
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/util/compat.py", line 153, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/orm/session.py", line 2577, in _flush
flush_context.execute()
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
rec.execute(self)
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/orm/unitofwork.py", line 589, in execute
uow,
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/orm/persistence.py", line 236, in save_obj
update,
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/orm/persistence.py", line 996, in _emit_update_statements
statement, multiparams
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py", line 982, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py", line 1101, in _execute_clauseelement
distilled_params,
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py", line 1250, in _execute_context
e, statement, parameters, cursor, context
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py", line 1476, in _handle_dbapi_exception
util.raise_from_cause(sqlalchemy_exception, exc_info)
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/util/compat.py", line 152, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/base.py", line 1246, in _execute_context
cursor, statement, parameters, context
File "/usr/local/lib/python3.6/dist-packages/sqlalchemy/engine/default.py", line 581, in do_execute
cursor.execute(statement, parameters)
File "/usr/local/lib/python3.6/dist-packages/MySQLdb/cursors.py", line 209, in execute
res = self._query(query)
File "/usr/local/lib/python3.6/dist-packages/MySQLdb/cursors.py", line 315, in _query
db.query(q)
File "/usr/local/lib/python3.6/dist-packages/MySQLdb/connections.py", line 239, in query
_mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')
[SQL: UPDATE task_instance SET state=%s, queued_dttm=%s WHERE task_instance.task_id = %s AND task_instance.dag_id = %s AND task_instance.execution_date = %s]
[parameters: ('queued', datetime.datetime(2020, 1, 14, 10, 51, 48, 301880, tzinfo=<Timezone [UTC]>), 'SQL_W', 'RANDOM_DAG_ID.PhaseX.PhaseY.PhaseZ', datetime.datetime(2020, 1, 14, 10, 28, 36, 123853, tzinfo=<Timezone [UTC]>))]
(Background on this error at: http://sqlalche.me/e/e3q8)
Additionally, we also see a warning in the logs as below.
[2020-01-14 10:51:43,217] {logging_mixin.py:112} INFO - [2020-01-14 10:51:43,217] {backfill_job.py:246}
WARNING - ('RANDOM_DAG_ID.PhaseX.PhaseY.PhaseZ', 'SQL_A', datetime.datetime(2020, 1, 14, 10, 28, 36, 123853, tzinfo=<Timezone [UTC]>), 1) state success not in running=dict_values(
[<TaskInstance: RANDOM_DAG_ID.PhaseX.PhaseY.PhaseZ.SQL_B 2020-01-14 10:28:36.123853+00:00 [running]>,
<TaskInstance: RANDOM_DAG_ID.PhaseX.PhaseY.PhaseZ.SQL_C 2020-01-14 10:28:36.123853+00:00 [running]>,
<TaskInstance: RANDOM_DAG_ID.PhaseX.PhaseY.PhaseZ.SQL_D 2020-01-14 10:28:36.123853+00:00 [running]>,
<TaskInstance: RANDOM_DAG_ID.PhaseX.PhaseY.PhaseZ.SQL_E 2020-01-14 10:28:36.123853+00:00 [running]>,
<TaskInstance: RANDOM_DAG_ID.PhaseX.PhaseY.PhaseZ.SQL_F 2020-01-14 10:28:36.123853+00:00 [running]>])
Airflow Version: 1.10.5, 1.10.7
We found that AIRFLOW-2516 is the issue we are facing, this comment pointing to the probable cause. It suggests the deadlock is due to two queries acquiring locks on two indices in different order.
Dropping the second index on task state may not be a good idea as it would affect Airflow Scheduler performance, which is already not great.
They seem to have made fix for deadlock at the main DAG level, not at the SubDAG level.
We have tried auto-retrying the same SubDAG operator after a minute, using BaseOperator parameter retries, but in good number of cases, the deadlock happens in the retry as well.
The same issue could occur even if we remove SubDAGs, which will not be a trivial exercise for us.
Any lateral solutions/workarounds are also welcome.
Please let me know if airflow.cfg file is required.

How do I properly connect Airflow set up on AWS EC2 to RDS?

I have an airflow instance on EC2 that is running the webserver/scheduler. I want to hook up a MySQL RDS instance as the backend metadata database as opposed to the native SQLite. I replaced the one line in Airflow.cfg that connects to the database via sql_alchemy to connect to RDS with a pymysql driver:
#sql_alchemy_conn = sqlite:////home/cloud-user/airflow/airflow.db
sql_alchemy_conn = mysql+pymysql://admin:<PASSWORD>#airflow-db.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com:3306/airflow
The connection seems to work fine and I am able to get into the RDS instance and query tables via a MySQL client set up on my EC2 instance.
When I toggle a DAG on or off, I get this nasty python stack trace in my shell:
[2019-09-30 14:00:51,774] {app.py:1891} ERROR - Exception on /admin/airflow/paused [POST]
Traceback (most recent call last):
File "/home/cloud-user/.local/lib/python2.7/site-packages/flask/app.py", line 2446, in wsgi_app
response = self.full_dispatch_request()
File "/home/cloud-user/.local/lib/python2.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/cloud-user/.local/lib/python2.7/site-packages/flask/app.py", line 1820, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/home/cloud-user/.local/lib/python2.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
rv = self.dispatch_request()
File "/home/cloud-user/.local/lib/python2.7/site-packages/flask/app.py", line 1935, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/cloud-user/.local/lib/python2.7/site-packages/flask_admin/base.py", line 69, in inner
return self._run_view(f, *args, **kwargs)
File "/home/cloud-user/.local/lib/python2.7/site-packages/flask_admin/base.py", line 368, in _run_view
return fn(self, *args, **kwargs)
File "/home/cloud-user/.local/lib/python2.7/site-packages/flask_login/utils.py", line 258, in decorated_view
return func(*args, **kwargs)
File "/home/cloud-user/.local/lib/python2.7/site-packages/airflow/www/utils.py", line 279, in wrapper
session.commit()
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1027, in commit
self.transaction.commit()
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 494, in commit
self._prepare_impl()
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 473, in _prepare_impl
self.session.flush()
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2459, in flush
self._flush(objects)
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2597, in _flush
transaction.rollback(_capture_exception=True)
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2557, in _flush
flush_context.execute()
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
rec.execute(self)
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 589, in execute
uow,
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
insert,
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 1138, in _emit_insert_statements
statement, params
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 988, in execute
return meth(self, multiparams, params)
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1107, in _execute_clauseelement
distilled_params,
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1253, in _execute_context
e, statement, parameters, cursor, context
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1473, in _handle_dbapi_exception
util.raise_from_cause(sqlalchemy_exception, exc_info)
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1249, in _execute_context
cursor, statement, parameters, context
File "/home/cloud-user/.local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
cursor.execute(statement, parameters)
File "/home/cloud-user/.local/lib/python2.7/site-packages/pymysql/cursors.py", line 170, in execute
result = self._query(query)
File "/home/cloud-user/.local/lib/python2.7/site-packages/pymysql/cursors.py", line 328, in _query
conn.query(q)
File "/home/cloud-user/.local/lib/python2.7/site-packages/pymysql/connections.py", line 517, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "/home/cloud-user/.local/lib/python2.7/site-packages/pymysql/connections.py", line 732, in _read_query_result
result.read()
File "/home/cloud-user/.local/lib/python2.7/site-packages/pymysql/connections.py", line 1075, in read
first_packet = self.connection._read_packet()
File "/home/cloud-user/.local/lib/python2.7/site-packages/pymysql/connections.py", line 684, in _read_packet
packet.check_error()
File "/home/cloud-user/.local/lib/python2.7/site-packages/pymysql/protocol.py", line 220, in check_error
err.raise_mysql_exception(self._data)
File "/home/cloud-user/.local/lib/python2.7/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
raise errorclass(errno, errval)
ProgrammingError: (pymysql.err.ProgrammingError) (1064, u"You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '[(\\'is_paused\\', u\\'true\\'), (\\'dag_id\\', u\\'test\\')]'')' at line 1")
[SQL: INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES (%(dttm)s, %(dag_id)s, %(task_id)s, %(event)s, %(execution_date)s, %(owner)s, %(extra)s)]
[parameters: {'task_id': None, 'extra': "[('is_paused', u'true'), ('dag_id', u'test')]", 'execution_date': None, 'event': 'paused', 'owner': 'anonymous', 'dttm': datetime.datetime(2019, 9, 30, 18, 0, 51, 768073, tzinfo=<Timezone [UTC]>), 'dag_id': u'test'}]
(Background on this error at: http://sqlalche.me/e/f405)
From what I saw on the Airflow documentation, I'm making the change correctly, and the 'airflow initdb / resetdb' commands execute without error.
I already spent a fair amount of time googling this error, but there are no clear answers to this problem. I'm really not sure if I'm missing a prerequisite or if I should be using a different connector?
EDIT: I'm using python 2.7, as seen in the stack trace. Airflow claims compatibility in the short-term, but I see another SO user's problems went away after upgrading to python 3.6: Link to other solution. I'll try this and update if it seems to work.
It looks like the solution is indeed to upgrade to python 3.6, leveraging a virtual environment due to the required duality of python 2.x and 3.y with Linux systems and Airflow. Specifically, I followed this guide and my DAGs seem to be executing successfully.

mysql.connector.errors.InterfaceError: Failed parsing EOF packet

I run into such problems when I running my Django project(a week ago, the project work properly, today find this problem):
My Django version is 1.10.2 with python version 3.5.2, MySQL version is 5.5 on ubuntu 14.0.
/Users/deja/Virtualenv/python3.5/bin/python3.5 /Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py --multiproc --qt-support --client 127.0.0.1 --port 60850 --file /Users/mozat/project/crawler_management_system/manage.py runserver 0.0.0.0:6380
pydev debugger: process 81527 is connecting
Connected to pydev debugger (build 145.1504)
pydev debugger: process 81528 is connecting
Performing system checks...
System check identified no issues (0 silenced).
You have 13 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, sessions.
Run 'python manage.py migrate' to apply them.
January 03, 2017 - 07:55:47
Django version 1.10.2, using settings 'crawler_management_system.settings'
Starting development server at http://0.0.0.0:6380/
Quit the server with CONTROL-C.
Internal Server Error: /
Traceback (most recent call last):
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/network.py", line 226, in recv_plain
chunk = self.sock.recv(4 - packet_len)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/mozat/project/crawler_management_system/crawler_management_system/mysql_utility.py", line 32, in open_db
raise err
File "/Users/mozat/project/crawler_management_system/crawler_management_system/mysql_utility.py", line 28, in open_db
yield cursor
File "/Users/mozat/project/crawler_management_system/crawler_management_system/views.py", line 40, in select_batch_records
cursor.execute(sql.format(table=self.table,times=times))
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/cursor.py", line 515, in execute
self._handle_result(self._connection.cmd_query(stmt))
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/connection.py", line 488, in cmd_query
result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/connection.py", line 267, in _send_cmd
return self._socket.recv()
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/network.py", line 255, in recv_plain
errno=2055, values=(self.get_address(), _strioerror(err)))
mysql.connector.errors.OperationalError: 2055: Lost connection to MySQL server at '127.0.0.1:10189', system error: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/connection.py", line 710, in reset_session
self.cmd_reset_connection()
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/connection.py", line 1046, in cmd_reset_connection
raise errors.NotSupportedError("MySQL version 5.7.2 and "
mysql.connector.errors.NotSupportedError: MySQL version 5.7.2 and earlier does not support COM_RESET_CONNECTION.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/protocol.py", line 267, in parse_eof
unpacked = struct_unpack('<xxxBBHH', packet)
struct.error: unpack requires a bytes object of length 9
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/django/core/handlers/exception.py", line 39, in inner
response = get_response(request)
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/django/core/handlers/base.py", line 249, in _legacy_get_response
response = self._get_response(request)
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/django/core/handlers/base.py", line 187, in _get_response
response = self.process_exception_by_middleware(e, request)
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/django/core/handlers/base.py", line 185, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/Users/mozat/project/crawler_management_system/crawler_management_system/views.py", line 99, in mainpage
result = batch_record_repo.select_batch_records(times)
File "/Users/mozat/project/crawler_management_system/crawler_management_system/views.py", line 41, in select_batch_records
return cursor.fetchall()
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/Users/mozat/project/crawler_management_system/crawler_management_system/mysql_utility.py", line 40, in open_db
connection.close()
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/pooling.py", line 117, in close
cnx.reset_session()
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/connection.py", line 713, in reset_session
self._database, self._charset_id)
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/connection.py", line 661, in cmd_change_user
self._post_connection()
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/abstracts.py", line 695, in _post_connection
self.set_charset_collation(self._charset_id)
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/abstracts.py", line 654, in set_charset_collation
charset_name, collation_name))
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/connection.py", line 869, in _execute_query
self.cmd_query(query)
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/connection.py", line 488, in cmd_query
result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/connection.py", line 393, in _handle_result
return self._handle_eof(packet)
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/connection.py", line 344, in _handle_eof
eof = self._protocol.parse_eof(packet)
File "/Users/deja/Virtualenv/python3.5/lib/python3.5/site-packages/mysql/connector/protocol.py", line 269, in parse_eof
raise errors.InterfaceError(err_msg)
mysql.connector.errors.InterfaceError: Failed parsing EOF packet.
[03/Jan/2017 07:56:40] "GET / HTTP/1.1" 500 172448
anyone happen to see same problem?Could you please give me some suggestion.
all ,nothing wrong with mysql, and Django. It's for I use a complex SQL to access mysql:
SELECT *
FROM
(SELECT *,
#batch_rank := IF(#current_batch = spider , #batch_rank + 1, 1) AS batch_rank,
#current_batch := spider
FROM deja_crawler.crawl_batch_record
where spider in (select distinct spider from deja_crawler.crawl_batch_record where spider like '%_new' or spider like '%_update')
ORDER BY spider, create_time DESC)
ranked
WHERE batch_rank <= 3
the tables used to have not too much data, so it works, later, the table became huge, and the access time cost is almost 30s, but the connection time out is set as 20s, so it will raise time out error.

Django 1.6 w/ gunicorn - OperationalError: (2006, 'MySQL server has gone away')

In the process of upgrading to Django 1.6 I've started to get a frequent OperationalError: (2006, 'MySQL server has gone away') message on requests to the gunicorn server I use to run the django app. These errors occur instantly from the moment the server is started, on requests that should only take a second which makes me doubt that it is a timeout issue. This error isn't present on the old 1.4 branch of the project and the 1.6 branch doesn't behave this way if it's served simply through django-admin.py runserver.
I generally run gunicorn through an sv process (though it errors if I run it manually too) with the command django-admin.py run_gunicorn --workers=4 -b localhost:8000 which results in many requests, even ones for static media, returning with:
[ERROR] 2015-07-29 14:30:27,931 - gunicorn.error:260 - Error handling request
Traceback (most recent call last):
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/gunicorn/workers/sync.py", line 125, in handle_request
respiter = self.wsgi(environ, resp.start_response)
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 187, in __call__
self.load_middleware()
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 47, in load_middleware
mw_instance = mw_class()
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/django/middleware/locale.py", line 24, in __init__
for url_pattern in get_resolver(None).url_patterns:
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/django/core/urlresolvers.py", line 365, in url_patterns
patterns = getattr(self.urlconf_module, "urlpatterns", self.urlconf_module)
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/django/core/urlresolvers.py", line 360, in urlconf_module
self._urlconf_module = import_module(self.urlconf_name)
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/django/utils/importlib.py", line 40, in import_module
__import__(name)
File "/opt/apps/maplecroft/versions/current/websites/maplecroft/urls.py", line 2, in <module>
from maplecroft.views import RiskAtlasesLandingView
File "/opt/apps/maplecroft/versions/current/maplecroft/views.py", line 40, in <module>
import maplecroft.search as _search
File "/opt/apps/maplecroft/versions/current/maplecroft/search.py", line 71, in <module>
class MaplecroftSearchForm(SearchForm):
File "/opt/apps/maplecroft/versions/current/maplecroft/search.py", line 111, in MaplecroftSearchForm
choices=model_choices(),
File "/opt/apps/maplecroft/versions/current/maplecroft/search.py", line 57, in model_choices
for category in reversed(Category.objects.filter(parent=None)):
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/django/db/models/query.py", line 77, in __len__
self._fetch_all()
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/django/db/models/query.py", line 857, in _fetch_all
self._result_cache = list(self.iterator())
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/django/db/models/query.py", line 220, in iterator
for row in compiler.results_iter():
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 713, in results_iter
for rows in self.execute_sql(MULTI):
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 786, in execute_sql
cursor.execute(sql, params)
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/django/db/backends/util.py", line 69, in execute
return super(CursorDebugWrapper, self).execute(sql, params)
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/django/db/backends/util.py", line 53, in execute
return self.cursor.execute(sql, params)
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/django/db/utils.py", line 99, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/django/db/backends/util.py", line 53, in execute
return self.cursor.execute(sql, params)
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/django/db/backends/mysql/base.py", line 124, in execute
return self.cursor.execute(query, args)
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/MySQLdb/cursors.py", line 174, in execute
self.errorhandler(self, exc, value)
File "/opt/envs/maplecroft/local/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
raise errorclass, errorvalue
OperationalError: (2006, 'MySQL server has gone away')
However, if I drop to --workers=1 everything seems to run smoothly so my current thoughts are that it is an issue with the threaded workers feature of gunicorn?
Edit: I just tried upgrading gunicorn to the latest version (was on 0.17.2) but that doesn't seem to have made a difference.
I am wondering if this question: https://serverfault.com/questions/407612/error-2006-mysql-server-has-gone-away is relevant, but struggling to overlay it with my current issues
It appears that it was the gunicorn version after all- I tried upgrading gunicorn but was still using the django-admin.py run_gunicorn method of starting the server. Upgrading gunicorn and switching to the non-deprecated gunicorn wsgi:application method solved it.