datetime.datetime object is not subscriptable - json

I am trying to get the volume create time for all of my EC2 instances. The problem is that the boto3 response returns CreateTime as a datetime object, which is not subscriptable. I tried to use strftime() to convert the object to a str, but I must be using the wrong syntax or something because I am still getting the error. Below are my code and the traceback:
CODE:
import boto3
import json
import os
import csv
from datetime import datetime, date, time
ec2 = boto3.client('ec2')
ec2_response = ec2.describe_instances(Filters=[{'Name': 'instance-state-name', 'Values': ['running']}])

for item in ec2_response['Reservations']:
    instance_id = item['Instances'][0]['InstanceId']
    image_id = item['Instances'][0]['ImageId']
    create_time = item['Instances'][0]['BlockDeviceMappings'][0]['Ebs']['AttachTime'].strftime("%A, %d. %B %Y %I:%M%p")
    print(instance_id, image_id, create_time)
Traceback:
create_time = item['Instances'][0]['BlockDeviceMappings'][0]['Ebs']['AttachTime'][0].strftime("%A, %d. %B %Y %I:%M%p")
TypeError: 'datetime.datetime' object is not subscriptable

Firstly,
item['Instances'][0]['BlockDeviceMappings'][0]['Ebs']['AttachTime']
should not be a list. It's a single item in the documentation here, and it's also a single item in the JSON returned by the following AWS CLI command:
aws --region us-east-1 ec2 describe-instances
I suspect that when you remove the last [0] from
item['Instances'][0]['BlockDeviceMappings'][0]['Ebs']['AttachTime'][0]
the line of code completes successfully and the list index out of range error is raised from a subsequent iteration of the for loop.
It's hard to know why without running the code, but, for example, an instance with no attached volumes (like this case) would cause that line to fail.
You could debug like this to inspect the offending data:
try:
    create_time = item['Instances'][0]['BlockDeviceMappings'][0]['Ebs']['AttachTime'].strftime("%A, %d. %B %Y %I:%M%p")
except Exception as e:
    import pdb; pdb.set_trace()
Or this if not attached to a shell:
try:
    create_time = item['Instances'][0]['BlockDeviceMappings'][0]['Ebs']['AttachTime'].strftime("%A, %d. %B %Y %I:%M%p")
except Exception as e:
    print("Dumping offending item:")
    print(item)
    raise e
Secondly, while the AttachTime might be suitable for your use case, it's not necessarily the time at which the volume was created, since volumes can be created first and attached to an instance later. If you need the actual creation time, you need to make a second call to describe_volumes and use the CreateTime field.
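For example, a minimal sketch of that second call, reusing the ec2 client and the loop item from the question (the VolumeId lookup and the describe_volumes call follow the boto3 EC2 API, but treat this as illustrative rather than drop-in code):
# Resolve the volume behind the first block device mapping, then read its CreateTime.
volume_id = item['Instances'][0]['BlockDeviceMappings'][0]['Ebs']['VolumeId']
volumes = ec2.describe_volumes(VolumeIds=[volume_id])
create_time = volumes['Volumes'][0]['CreateTime'].strftime("%A, %d. %B %Y %I:%M%p")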

Access Xcom in S3ToSnowflakeOperator of Airflow

My use case is: I have an S3 event which triggers a Lambda (upon an S3 CreateObject event), which in turn invokes an Airflow DAG, passing in a couple of --conf values (bucketname, filekey).
I then extract the key value using a PythonOperator and store it in an xcom variable. I then want to extract this xcom value within an S3ToSnowflakeOperator and essentially load the file into a Snowflake table.
All parts of the process are working bar the extraction of the xcom value within the S3ToSnowflakeOperator task. I basically get the following in my logs:
query: [COPY INTO "raw".SOURCE_PARAMS_JSON FROM #MYSTAGE_PARAMS_DEMO/ files=('{{ ti.xcom...]
which looks like the jinja template is not correctly resolving the xcom value.
My code is as follows:
from airflow import DAG
from airflow.utils import timezone
from airflow.operators.python_operator import PythonOperator
from airflow.operators.bash import BashOperator
from airflow.providers.snowflake.transfers.s3_to_snowflake import S3ToSnowflakeOperator

FILEPATH = "demo/tues-29-03-2022-6.json"

args = {
    'start_date': timezone.utcnow(),
    'owner': 'airflow',
}

with DAG(
    dag_id='example_dag_conf',
    default_args=args,
    schedule_interval=None,
    catchup=False,
    tags=['params demo'],
) as dag:

    def run_this_func(**kwargs):
        outkey = '{}'.format(kwargs['dag_run'].conf['key'])
        print(outkey)
        ti = kwargs['ti']
        ti.xcom_push(key='FILE_PATH', value=outkey)

    run_this = PythonOperator(
        task_id='run_this',
        python_callable=run_this_func
    )

    get_param_val = BashOperator(
        task_id='get_param_val',
        bash_command='echo "{{ ti.xcom_pull(key="FILE_PATH") }}"',
        dag=dag)

    copy_into_table = S3ToSnowflakeOperator(
        task_id='copy_into_table',
        s3_keys=["{{ ti.xcom_pull(key='FILE_PATH') }}"],
        snowflake_conn_id=SNOWFLAKE_CONN_ID,
        stage=SNOWFLAKE_STAGE,
        schema="""\"{0}\"""".format(SNOWFLAKE_RAW_SCHEMA),
        table=SNOWFLAKE_RAW_TABLE,
        file_format="(type = 'JSON')",
        dag=dag,
    )

    run_this >> get_param_val >> copy_into_table
If I replace
s3_keys=["{{ ti.xcom_pull(key='FILE_PATH') }}"],
with
s3_keys=[FILEPATH]
my operator works fine and the data is loaded into Snowflake. So the error is centered on resolving s3_keys=["{{ ti.xcom_pull(key='FILE_PATH') }}"], I believe?
Any guidance/help would be appreciated. I am using Airflow 2.2.2.
I removed the S3ToSnowflakeOperator and replaced it with the SnowflakeOperator.
I was then able to reference the xcom value (as above) for the sql parameter value.
My xcom value was a derived COPY INTO statement, effectively replicating the functionality of the S3ToSnowflakeOperator, with the added advantage of being able to store the file metadata in the table columns too.
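A minimal sketch of that replacement (the connection id is the one from the question; the xcom key COPY_SQL and the idea of pushing the full statement from the upstream task are illustrative assumptions):
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

copy_into_table = SnowflakeOperator(
    task_id='copy_into_table',
    snowflake_conn_id=SNOWFLAKE_CONN_ID,
    # sql is a templated field, so the xcom_pull expression is rendered at task runtime.
    sql="{{ ti.xcom_pull(key='COPY_SQL') }}",
)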

Airflow 1.x TriggerDagRunOperator pass {{ds}} as conf

In Airflow 1.x (not 2.x), I want DAG1 to trigger DAG2,
and I want to pass DAG1's templated date {{ds}} as a 'conf' dict parameter like {"day":"{{ds}}"} to DAG2, so that DAG2 can access it via {{dag_run.conf['day']}}.
But DAG1 just ends up passing the literal string '{{ds}}' instead of e.g. '2021-12-03'.
DAG2 uses an SSHOperator, not a PythonOperator (for which a solution seems to exist).
DAG 1:
from airflow.operators import TriggerDagRunOperator

def fn_day(context, dagrun_order):
    dagrun_order.payload = {"day": "{{ds}}"}
    return dagrun_order

trig = TriggerDagRunOperator(
    trigger_dag_id="ssh",
    task_id='trig',
    python_callable=fn_day,
    dag=dag)
DAG 2 :
from airflow.contrib.operators.ssh_operator import SSHOperator

ssh = SSHOperator(
    ssh_conn_id='ssh_vm',
    task_id='echo',
    command="echo {{dag_run.conf['day']}}"
)
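One hedged workaround sketch: in Airflow 1.x the TriggerDagRunOperator calls its python_callable with the triggering task's template context as the first argument, so the already-rendered ds value can be read from there instead of embedding a Jinja string that never gets rendered in that callable:
def fn_day(context, dagrun_order):
    # context is the rendered template context of the triggering task,
    # so context["ds"] is the concrete date string (e.g. '2021-12-03').
    dagrun_order.payload = {"day": context["ds"]}
    return dagrun_order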

Split jinja string in airflow

I trigger my DAG via the API from a Lambda function, with a trigger on a file upload. I get the file path from the Lambda context,
i.e.: ingestion.archive.dev/yolo/PMS_2_DXBTD_RTBD_2021032800000020210328000000SD_20210329052822.XML
I put this variable in the API call and get it back inside the DAG as "{{ dag_run.conf['file_path'] }}".
At some point I need to extract information from this string by splitting it on / inside the DAG, so that I can use the S3CopyObjectOperator.
So here is the first approach I had:
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.s3_copy_object import S3CopyObjectOperator
from airflow.operators.python_operator import PythonOperator

default_args = {
    'owner': 'me',
}

s3_final_destination = {
    "bucket_name": "ingestion.archive.dev",
    "verification_failed": "validation_failed",
    "processing_failed": "processing_failed",
    "processing_success": "processing_success"
}

def print_var(file_path,
              file_split,
              source_bucket,
              source_path,
              file_name):
    data = {
        "file_path": file_path,
        "file_split": file_split,
        "source_bucket": source_bucket,
        "source_path": source_path,
        "file_name": file_name
    }
    print(data)

with DAG(
    f"test_s3_transfer",
    default_args=default_args,
    description='Test',
    schedule_interval=None,
    start_date=datetime(2021, 4, 24),
    tags=['ingestion', "test", "context"],
) as dag:
    # {"file_path": "ingestion.archive.dev/yolo/PMS_2_DXBTD_RTBD_2021032800000020210328000000SD_20210329052822.XML"}
    file_path = "{{ dag_run.conf['file_path'] }}"
    file_split = file_path.split('/')
    source_bucket = file_split[0]
    source_path = "/".join(file_split[1:])
    file_name = file_split[-1]

    test_var = PythonOperator(
        task_id="test_var",
        python_callable=print_var,
        op_kwargs={
            "file_path": file_path,
            "file_split": file_split,
            "source_bucket": source_bucket,
            "source_path": source_path,
            "file_name": file_name
        }
    )

    file_verification_fail_to_s3 = S3CopyObjectOperator(
        task_id="file_verification_fail_to_s3",
        source_bucket_key=source_bucket,
        source_bucket_name=source_path,
        dest_bucket_key=s3_final_destination["bucket_name"],
        dest_bucket_name=f'{s3_final_destination["verification_failed"]}/{file_name}'
    )

    test_var >> file_verification_fail_to_s3
I use the PythonOperator to check the values I get, for debugging.
I have the right value in file_path, but in file_split I get -> ['ingestion.archive.dev/yolo/PMS_2_DXBTD_RTBD_2021032800000020210328000000SD_20210329052822.XML']
That's my whole string in a single-element list, not each part split out like ["ingestion.archive.dev", "yolo", "PMS_2_DXBTD_RTBD_2021032800000020210328000000SD_20210329052822.XML"].
So what's wrong here?
In Airflow, Jinja rendering is not done until task runtime. However, since the parsing of the file_path value as written is performed as top-level code (i.e. outside of an operator's execute() method or the DAG instantiation), file_split is initialized as ["{{ dag_run.conf['file_path'] }}"] by the Scheduler. Then, when the task is executed, the Jinja rendering happens, which is why you see ["ingestion.archive.dev/yolo/PMS_2_DXBTD_RTBD_2021032800000020210328000000SD_20210329052822.XML"] as the value: there is no "/" in the unrendered template string, so the split produces a single-element list.
Even if you explicitly split the string within the Jinja expression like file_split="{{ dag_run.conf.file_path.split('/') }}" the value will then be the string representation of the list and not a list object.
However, as of Airflow 2.1, you can set render_template_as_native_obj=True as a DAG parameter, which renders templated values to native Python objects. The string split will then render as a list, as you expect; see the sketch immediately below.
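A minimal sketch of that approach, reusing the imports, default_args, and print_var from the question (the DAG id and the kwargs shown are illustrative, not a definitive implementation):
with DAG(
    "test_s3_transfer_native",
    default_args=default_args,
    schedule_interval=None,
    start_date=datetime(2021, 4, 24),
    render_template_as_native_obj=True,  # render Jinja results as native Python objects (Airflow >= 2.1)
) as dag:
    test_var = PythonOperator(
        task_id="test_var",
        python_callable=print_var,
        op_kwargs={
            "file_path": "{{ dag_run.conf['file_path'] }}",
            "file_split": "{{ dag_run.conf['file_path'].split('/') }}",  # now rendered as a real list
            "source_bucket": "{{ dag_run.conf['file_path'].split('/')[0] }}",
            "source_path": "{{ '/'.join(dag_run.conf['file_path'].split('/')[1:]) }}",
            "file_name": "{{ dag_run.conf['file_path'].split('/')[-1] }}",
        },
    )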
As a best practice, you should avoid top-level code, since it's executed on every Scheduler heartbeat, which can lead to performance issues in your DAG and environment. I would suggest passing the "{{ dag_run.conf['file_path'] }}" expression as an argument to the function that needs it and executing the parsing logic within the function itself, as in the sketch that follows.
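For example (a sketch; parse_and_print is a hypothetical replacement for the question's print_var with the parsing moved inside the callable):
def parse_and_print(file_path):
    # file_path arrives here already rendered, so ordinary string handling works.
    file_split = file_path.split('/')
    data = {
        "file_path": file_path,
        "file_split": file_split,
        "source_bucket": file_split[0],
        "source_path": "/".join(file_split[1:]),
        "file_name": file_split[-1],
    }
    print(data)

test_var = PythonOperator(
    task_id="test_var",
    python_callable=parse_and_print,
    op_kwargs={"file_path": "{{ dag_run.conf['file_path'] }}"},
)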

"naive datetime is disallowed" when running Bash command using Apache Airflow

It looks like SQLAlchemy might have had a facelift since the time the Airflow tutorial was written: it is not accepting a date in the YYYY-MM-DD format shown in the tutorial at http://pythonhosted.org/airflow/tutorial.html:
$airflow test tutorial print_date 2017-12-30
[2017-12-29 19:10:40,695] {__init__.py:45} INFO - Using executor SequentialExecutor
[2017-12-29 19:10:40,745] {models.py:194} INFO - Filling up the DagBag from /git/airflow/home/dags
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 4, in <module>
__import__('pkg_resources').run_script('apache-airflow==1.10.0.dev0+incubating', 'airflow')
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 748, in run_script
self.require(requires)[0].run_script(script_name, ns)
..
File "build/bdist.macosx-10.12-x86_64/egg/sqlalchemy/engine/default.py", line 623, in _init_compiled
File "build/bdist.macosx-10.12-x86_64/egg/sqlalchemy/sql/type_api.py", line 1074, in process
File "build/bdist.macosx-10.12-x86_64/egg/sqlalchemy_utc.py", line 31, in process_bind_param
sqlalchemy.exc.StatementError: (exceptions.ValueError) naive datetime is disallowed [SQL: u'SELECT task_instance.try_number AS task_instance_try_number, task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, task_instance.execution_date AS task_instance_execution_date, task_instance.start_date AS task_instance_start_date, task_instance.end_date AS task_instance_end_date, task_instance.duration AS task_instance_duration, task_instance.state AS task_instance_state, task_instance.max_tries AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, task_instance.unixname AS task_instance_unixname, task_instance.job_id AS task_instance_job_id, task_instance.pool AS task_instance_pool, task_instance.queue AS task_instance_queue, task_instance.priority_weight AS task_instance_priority_weight, task_instance.operator AS task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, task_instance.pid AS task_instance_pid \nFROM task_instance \nWHERE task_instance.dag_id = ? AND task_instance.task_id = ? AND task_instance.execution_date = ?\n LIMIT ? OFFSET ?'] [parameters: [{}]]
What is the format now required by SQLAlchemy? (It appears to be a matter of a missing timezone, so I'm also looking into that.)
A format like the following is working:
'2017-12-28T12:27:00Z'
where the first portion is the date, then the time after the T, and then the timezone information.
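Putting that together, the tutorial command from the question should work once it is given an ISO 8601, timezone-aware timestamp, e.g.:
$ airflow test tutorial print_date 2017-12-28T12:27:00Z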
As per python's official documentation, there are two types of datetime objects - aware and naive
https://docs.python.org/3/library/datetime.html
Date and time objects may be categorized as “aware” or “naive” depending on whether or not they include timezone information.
See the example below:
from datetime import datetime, timezone

date_aware = datetime.now(timezone.utc)
date_naive = datetime.now()

# date_aware.tzinfo is datetime.timezone.utc
# date_naive.tzinfo is None
https://docs.python.org/3/library/datetime.html#determining-if-an-object-is-aware-or-naive
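Per that section of the docs, a datetime d is aware only if d.tzinfo is not None and d.tzinfo.utcoffset(d) is not None; a short illustration:
from datetime import datetime, timezone

def is_aware(d):
    # Check from "Determining if an Object is Aware or Naive" in the datetime docs.
    return d.tzinfo is not None and d.tzinfo.utcoffset(d) is not None

print(is_aware(datetime.now(timezone.utc)))  # True  (aware)
print(is_aware(datetime.now()))              # False (naive)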

How to get current time in jinja

What is the method to get the current date/time value in Jinja tags?
On my project I need to show the current time in UTC at the top right corner of the site.
I like @Assem's answer. I'm going to demo it in a Jinja2 template.
#!/bin/env python3
import datetime
from jinja2 import Template
template = Template("""
# Generation started on {{ now() }}
... this is the rest of my template...
# Completed generation.
""")
template.globals['now'] = datetime.datetime.utcnow
print(template.render())
Output:
# Generation started on 2017-11-14 15:48:06.507123
... this is the rest of my template...
# Completed generation.
Note:
I rejected an edit from someone who stated "datetime.datetime.utcnow() is a method, not a property". While they are "correct", they misunderstood the intention of NOT including the (). Including () causes the method to be called at the time the object template.globals['now'] is defined. That would cause every use of now in your template to render the same value rather than getting a current result from datetime.datetime.utcnow.
You should use Python's datetime library, get the time, and pass it as a variable to the template:
>>> import datetime
>>> datetime.datetime.utcnow()
datetime.datetime(2015, 5, 15, 5, 22, 17, 953439)
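A minimal sketch of passing it as a variable (the variable name utc_now is illustrative):
import datetime
from jinja2 import Template

template = Template("Current UTC time: {{ utc_now }}")
# The time is computed at render time and handed to the template explicitly.
print(template.render(utc_now=datetime.datetime.utcnow()))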
For anyone who wants to add the now tag back into Jinja2 permanently, there is a great example of how to add it as a Jinja extension here:
https://www.webforefront.com/django/useandcreatejinjaextensions.html
This requires defining a new Jinja extension in a Python script inside your Django web app, and then importing it as an extension in settings.py.
In your_web_app/some_file.py (I put it in the same directory as settings.py):
from jinja2 import lexer, nodes
from jinja2.ext import Extension
from django.utils import timezone
from django.template.defaultfilters import date
from django.conf import settings
from datetime import datetime


class DjangoNow(Extension):
    tags = set(['now'])

    def _now(self, date_format):
        tzinfo = timezone.get_current_timezone() if settings.USE_TZ else None
        formatted = date(datetime.now(tz=tzinfo), date_format)
        return formatted

    def parse(self, parser):
        lineno = next(parser.stream).lineno
        token = parser.stream.expect(lexer.TOKEN_STRING)
        date_format = nodes.Const(token.value)
        call = self.call_method('_now', [date_format], lineno=lineno)
        token = parser.stream.current
        if token.test('name:as'):
            next(parser.stream)
            as_var = parser.stream.expect(lexer.TOKEN_NAME)
            as_var = nodes.Name(as_var.value, 'store', lineno=as_var.lineno)
            return nodes.Assign(as_var, call, lineno=lineno)
        else:
            return nodes.Output([call], lineno=lineno)
Then, in settings.py, add the DjangoNow class to the extensions list under OPTIONS within TEMPLATES:
TEMPLATES = [
    ...
    {
        'BACKEND': 'django.template.backends.jinja2.Jinja2',
        ...
        'OPTIONS': {
            ...,
            'extensions': [
                'your_web_app.some_file.DjangoNow',
            ],
        }
    }
]
Then you can use the now tag just as you would use it with Django, for example:
{% now 'U' %}
I faced the same problem, but finally found jinja2-time:
from jinja2 import Environment
env = Environment(extensions=['jinja2_time.TimeExtension'])
# Timezone 'local', default format -> "2015-12-10"
template = env.from_string("{% now 'local' %}")
# Timezone 'utc', explicit format -> "Thu, 10 Dec 2015 15:49:01"
template = env.from_string("{% now 'utc', '%a, %d %b %Y %H:%M:%S' %}")
# Timezone 'Europe/Berlin', explicit format -> "CET +0100"
template = env.from_string("{% now 'Europe/Berlin', '%Z %z' %}")
# Timezone 'utc', explicit format -> "2015"
template = env.from_string("{% now 'utc', '%Y' %}")
template.render()
More about jinja2-time extension installation:
jinja2-time is available for download from PyPI via pip:
$ pip install jinja2-time
It will automatically install jinja2 along with arrow.
To add the current date stamp to an HTML document generated from an HTML template using Jinja2:
#In the HTML template do:
{{ date.strftime('%B %d, %Y') }} <!-- Output format "Month DD, YYYY" -->
(Minimum) Python code:
## Tested in Python 3.6.8
import os
import datetime

import jinja2

JINJA_ENVIRONMENT = jinja2.Environment(
    loader=jinja2.FileSystemLoader(os.path.dirname(__file__)),
    extensions=['jinja2.ext.autoescape'],
    autoescape=True)

# When building the values to pass to Jinja:
datapool = dict()
datapool['date'] = datetime.datetime.now()

template = JINJA_ENVIRONMENT.get_template('./templates/base-html-template.html')
report = template.render(datapool)