Airflow 2.2.2 - templating conn not defined - jinja2

I'm using Airflow 2.2.2 with the latest providers installed as appropriate.
I'm trying to use the Azure and MySQL hooks and have created custom operators with template_fields defined for the variables that can be templated.
When I do so, I get an error saying that conn or var cannot be found.
e.g. my passed parameter is
{{ conn.<variable_name> }}
or
{{ var.json.value.<variable_name> }}
I believe this should be possible in versions > 2.0, but it's not working for me. Any ideas why?
EDIT: Below are snippets of code with some sensitive information removed; let me know if anything else is needed.
DAG error -
Broken DAG: [/home/dags/dag.py] Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/dags/dag.py", line 52, in <module>
wasb_conn_id = {{ conn.wasb }},
NameError: name 'conn' is not defined
task in dag.py
t1 = WasbLogBlobsToCSVOperator(
    task_id='task_xyz',
    wasb_conn_id = {{ conn.wasb }},
Custom Operator using an extended version of the Microsoft Azure wasb hook , used by dag.py -
class WasbLogBlobsToCSVOperator(BaseOperator):
    template_fields = (
        'wasb_conn_id',
    )

    def __init__(
        self,
        *,
        wasb_conn_id: str = 'wasb',
        **kwargs,
    ) -> None:
        super().__init__(**kwargs)
        self.wasb_conn_id = wasb_conn_id
        self.hook = ExtendedWasbHook(wasb_conn_id=self.wasb_conn_id)

There are a few things going on here that should help.
Jinja templates are string expressions. Try wrapping your wasb_conn_id arg in quotes.
wasb_conn_id = "{{ conn.wasb }}",
Templated fields are not rendered until the task runs, meaning the Jinja expression won't be evaluated until an operator's execute() method is called. This is why you are seeing the exception from your comment below: because the hook is created in __init__(), the literal string "{{ conn.wasb }}" is being passed to it as the conn_id. If you want to use a template field in the custom operator, you need to move that logic into the scope of the execute() method.
Why do you need to use a Jinja expression here? Since the format of accessing the Connection object via Jinja is {{ conn.<my_conn_id> }}, you could just use the value "wasb" directly.
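As a sketch of the second point (reusing the class and hook names from the question), creating the hook inside execute() means the templated wasb_conn_id has already been rendered by the time the hook sees it:

class WasbLogBlobsToCSVOperator(BaseOperator):
    template_fields = ('wasb_conn_id',)

    def __init__(self, *, wasb_conn_id: str = 'wasb', **kwargs) -> None:
        super().__init__(**kwargs)
        self.wasb_conn_id = wasb_conn_id

    def execute(self, context):
        # By this point template_fields have been rendered, so the hook
        # receives the resolved connection id rather than the raw template.
        hook = ExtendedWasbHook(wasb_conn_id=self.wasb_conn_id)
        ...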

Related

Access XCom in S3ToSnowflakeOperator of Airflow

My use case is: I have an S3 event which triggers a Lambda (upon an S3 create-object event), which in turn invokes an Airflow DAG, passing in a couple of --conf values (bucketname, filekey).
I am then extracting the key value using a Python operator and storing it in an XCom variable. I then want to extract this XCom value within an S3ToSnowflakeOperator and essentially load the file into a Snowflake table.
All parts of the process are working bar the extraction of the XCom value within the S3ToSnowflakeOperator task. I basically get the following in my logs:
query: [COPY INTO "raw".SOURCE_PARAMS_JSON FROM #MYSTAGE_PARAMS_DEMO/ files=('{{ ti.xcom...]
which suggests the Jinja template is not correctly resolving the XCom value.
My code is as follows:
from airflow import DAG
from airflow.utils import timezone
from airflow.operators.python_operator import PythonOperator
from airflow.operators.bash import BashOperator
from airflow.providers.snowflake.transfers.s3_to_snowflake import S3ToSnowflakeOperator

FILEPATH = "demo/tues-29-03-2022-6.json"

args = {
    'start_date': timezone.utcnow(),
    'owner': 'airflow',
}

with DAG(
    dag_id='example_dag_conf',
    default_args=args,
    schedule_interval=None,
    catchup=False,
    tags=['params demo'],
) as dag:

    def run_this_func(**kwargs):
        outkey = '{}'.format(kwargs['dag_run'].conf['key'])
        print(outkey)
        ti = kwargs['ti']
        ti.xcom_push(key='FILE_PATH', value=outkey)

    run_this = PythonOperator(
        task_id='run_this',
        python_callable=run_this_func
    )

    get_param_val = BashOperator(
        task_id='get_param_val',
        bash_command='echo "{{ ti.xcom_pull(key="FILE_PATH") }}"',
        dag=dag)

    copy_into_table = S3ToSnowflakeOperator(
        task_id='copy_into_table',
        s3_keys=["{{ ti.xcom_pull(key='FILE_PATH') }}"],
        snowflake_conn_id=SNOWFLAKE_CONN_ID,
        stage=SNOWFLAKE_STAGE,
        schema="""\"{0}\"""".format(SNOWFLAKE_RAW_SCHEMA),
        table=SNOWFLAKE_RAW_TABLE,
        file_format="(type = 'JSON')",
        dag=dag,
    )

    run_this >> get_param_val >> copy_into_table
If I replace
s3_keys=["{{ ti.xcom_pull(key='FILE_PATH') }}"],
with
s3_keys=[FILEPATH]
my operator works fine and the data is loaded into Snowflake. So the error is centered on resolving s3_keys=["{{ ti.xcom_pull(key='FILE_PATH') }}"], I believe?
Any guidance/help would be appreciated. I am using Airflow 2.2.2.
I removed the S3ToSnowflakeOperator and replaced it with the SnowflakeOperator.
I was then able to reference the XCom value (as above) for the sql param value.
My XCom value was a derived COPY INTO statement, effectively replicating the functionality of the S3ToSnowflakeOperator, with the added advantage of being able to store the file metadata within the table columns too.
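For illustration, a minimal sketch of that replacement. It assumes the upstream task pushed the derived COPY INTO statement to XCom under a key named COPY_SQL (the key name is hypothetical); since sql is a templated field of SnowflakeOperator, the XCom pull is rendered at task runtime:

from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

copy_into_table = SnowflakeOperator(
    task_id='copy_into_table',
    snowflake_conn_id=SNOWFLAKE_CONN_ID,
    # Rendered at runtime, just like s3_keys would have been
    sql="{{ ti.xcom_pull(key='COPY_SQL') }}",
    dag=dag,
)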

dbt invoke post-hook macro with list argument

I'm trying to invoke a macro as a post-hook. The trouble (I think) is that I'd like to pass a list to this macro... any idea what's going on here? My theory is that the list-type argument is the problem.
-- models/table.sql
{{
    config(
        materialized = 'table',
        post-hook = "{{ my_macro(this, 'my_str', ['foo', 'bar']) }}"
    )
}}
SELECT * FROM muh_tayble;

-- macros/my_macro.sql
{% macro my_macro(relation, string, list) %}
BLAH
{% endmacro %}
error message
Encountered an error:
Compilation Error in model table (models/table.sql)
invalid syntax for function call expression
line 2
Rookie mistake, folks. I had post-hook instead of post_hook. Problem solved.
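For reference, the working config looks something like this (the only change from the snippet above is the underscore in post_hook):

-- models/table.sql
{{
    config(
        materialized = 'table',
        post_hook = "{{ my_macro(this, 'my_str', ['foo', 'bar']) }}"
    )
}}
SELECT * FROM muh_tayble;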

Split jinja string in airflow

I trigger my DAG with the API from a Lambda function that fires on a file upload. I get the file path from the Lambda context,
i.e. : ingestion.archive.dev/yolo/PMS_2_DXBTD_RTBD_2021032800000020210328000000SD_20210329052822.XML
I put this variable in the API call to get it back as "{{ dag_run.conf['file_path'] }}"
At some point, I need to extract information from this string by splitting it on / inside the DAG, in order to use the S3CopyObjectOperator.
So here is the first approach I had:
from datetime import datetime
from airflow import DAG
from airflow.providers.amazon.aws.operators.s3_copy_object import S3CopyObjectOperator
from airflow.operators.python_operator import PythonOperator

default_args = {
    'owner': 'me',
}

s3_final_destination = {
    "bucket_name": "ingestion.archive.dev",
    "verification_failed": "validation_failed",
    "processing_failed": "processing_failed",
    "processing_success": "processing_success"
}

def print_var(file_path,
              file_split,
              source_bucket,
              source_path,
              file_name):
    data = {
        "file_path": file_path,
        "file_split": file_split,
        "source_bucket": source_bucket,
        "source_path": source_path,
        "file_name": file_name
    }
    print(data)

with DAG(
    f"test_s3_transfer",
    default_args=default_args,
    description='Test',
    schedule_interval=None,
    start_date=datetime(2021, 4, 24),
    tags=['ingestion', "test", "context"],
) as dag:
    # {"file_path": "ingestion.archive.dev/yolo/PMS_2_DXBTD_RTBD_2021032800000020210328000000SD_20210329052822.XML"}
    file_path = "{{ dag_run.conf['file_path'] }}"
    file_split = file_path.split('/')
    source_bucket = file_split[0]
    source_path = "/".join(file_split[1:])
    file_name = file_split[-1]

    test_var = PythonOperator(
        task_id="test_var",
        python_callable=print_var,
        op_kwargs={
            "file_path": file_path,
            "file_split": file_split,
            "source_bucket": source_bucket,
            "source_path": source_path,
            "file_name": file_name
        }
    )

    file_verification_fail_to_s3 = S3CopyObjectOperator(
        task_id="file_verification_fail_to_s3",
        source_bucket_key=source_bucket,
        source_bucket_name=source_path,
        dest_bucket_key=s3_final_destination["bucket_name"],
        dest_bucket_name=f'{s3_final_destination["verification_failed"]}/{file_name}'
    )

    test_var >> file_verification_fail_to_s3
I use the PythonOperator to check the values I got, for debugging.
I have the right value in file_path, but in file_split I got ['ingestion.archive.dev/yolo/PMS_2_DXBTD_RTBD_2021032800000020210328000000SD_20210329052822.XML'].
It's my string in a single-element list, not each part split out like ["ingestion.archive.dev", "yolo", "PMS_2_DXBTD_RTBD_2021032800000020210328000000SD_20210329052822.XML"].
So what's wrong here?
In Airflow, Jinja rendering is not done until task runtime. However, since the parsing of the file_path value as written is performed as top-level code (i.e. outside of an operator's execute() method), file_split is initialized as ["{{ dag_run.conf['file_path'] }}"] by the Scheduler at parse time. Then, when the task executes, the Jinja rendering happens, which is why you see ["ingestion.archive.dev/yolo/PMS_2_DXBTD_RTBD_2021032800000020210328000000SD_20210329052822.XML"] as the value: there is no "/" in the string that was split, so the split returned it unchanged inside a one-element list.
Even if you explicitly split the string within the Jinja expression like file_split="{{ dag_run.conf.file_path.split('/') }}" the value will then be the string representation of the list and not a list object.
However, as of Airflow 2.1 you can set render_template_as_native_obj=True as a DAG parameter, which will render templated values to native Python objects. With that set, a split done inside the Jinja expression will render as a list as you expect.
As best practice, you should avoid top-level code since it's executed on every Scheduler heartbeat, which could lead to performance issues in your DAG and environment. I would suggest passing the "{{ dag_run.conf['file_path'] }}" expression as an argument to the function which needs it and then executing the parsing logic within the function itself.
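A minimal sketch of that suggestion, reusing the names from the DAG above: the raw Jinja expression is passed through op_kwargs (a templated field of PythonOperator), and the splitting happens inside the callable once the value has been rendered:

def print_var(file_path):
    # file_path is already rendered here, so the split works as expected
    file_split = file_path.split('/')
    source_bucket = file_split[0]
    source_path = "/".join(file_split[1:])
    file_name = file_split[-1]
    print({
        "file_path": file_path,
        "source_bucket": source_bucket,
        "source_path": source_path,
        "file_name": file_name,
    })

test_var = PythonOperator(
    task_id="test_var",
    python_callable=print_var,
    op_kwargs={"file_path": "{{ dag_run.conf['file_path'] }}"},
)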

jinja giving error on pillar variable with absolute folder path

I got a pillar item which looks something like below:
"fileabspath": [
"Z:\\customer\region\chicago"
]
And in my sls file, I am trying to access the "fileabspath" pillar value with the below:
{% set fileabspath = salt['pillar.get']('fileabspath', None) %}
customer:
  region.filepath:
    - name: '{{ fileabspath }}'
As soon as the job hits the last line above, I get the below error.
failed: while parsing a block mapping\n in \"<unicode string>\", line 10, column 1\ndid not find expected key\n in \"<unicode string>\"
Any advice how to fix the issue ? Thank you.
You need to escape the backslashes if you're using that data format:
"fileabspath": [
"Z:\\\\customer\\region\\chicago"
]

raise an exception in jinja if we passed in a variable that is not present in the template

Is there a method for jinja2 to raise an exception when we pass a variable that is not present in the template?
PS: This is different from (or the opposite of) raising an exception when a variable is present in the template but is not passed. For that I use undefined=StrictUndefined.
When you load your jinja2.Environment, set the 'undefined' parameter to 'jinja2.StrictUndefined', e.g.:
env = jinja2.Environment(loader=<someloader>, undefined=jinja2.StrictUndefined)
You can catch and examine the render exception to see what was missing
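For example, a minimal sketch of catching that exception with StrictUndefined:

import jinja2

env = jinja2.Environment(undefined=jinja2.StrictUndefined)
template = env.from_string("Hello {{ name }} from {{ city }}")

try:
    template.render(name="foo")  # 'city' is never passed
except jinja2.exceptions.UndefinedError as exc:
    print(f"Missing template variable: {exc}")  # "'city' is undefined"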
EDIT: It would help if I read your full question. :)
Maybe this could help you:
https://jinja.palletsprojects.com/en/2.11.x/api/#the-meta-api
>>> from jinja2 import Environment, meta
>>> env = Environment()
>>> ast = env.parse('{% set foo = 42 %}{{ bar + foo }}')
>>> meta.find_undeclared_variables(ast)
set(['bar'])
You can also do that:
from jinja2 import Template, StrictUndefined
Template('name: {{ name }} , city: {{ city }}', undefined=StrictUndefined).render(**{"name": "foo", "city": "bar"})
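Building on the Meta API above, here is a minimal sketch of the check the question actually asks for: collect the variables the template references, compare them with what was passed, and raise on extras (the helper name is just for illustration; it assumes simple templates without includes or extends):

from jinja2 import Environment, meta

def render_rejecting_extras(source, **context):
    env = Environment()
    # Variables referenced by the template but not defined inside it
    referenced = meta.find_undeclared_variables(env.parse(source))
    extras = set(context) - referenced
    if extras:
        raise ValueError(f"Variables not present in the template: {sorted(extras)}")
    return env.from_string(source).render(**context)

render_rejecting_extras('name: {{ name }}', name="foo")              # renders fine
render_rejecting_extras('name: {{ name }}', name="foo", city="bar")  # raises ValueError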