"naive datetime is disallowed" when running Bash command using Apache Airflow - sqlalchemy

It looks like SQLAlchemy may have had a facelift since the Airflow tutorial was written: it is no longer accepting a date in the YYYY-MM-DD format shown in the tutorial at http://pythonhosted.org/airflow/tutorial.html :
$airflow test tutorial print_date 2017-12-30
[2017-12-29 19:10:40,695] {__init__.py:45} INFO - Using executor SequentialExecutor
[2017-12-29 19:10:40,745] {models.py:194} INFO - Filling up the DagBag from /git/airflow/home/dags
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 4, in <module>
__import__('pkg_resources').run_script('apache-airflow==1.10.0.dev0+incubating', 'airflow')
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 748, in run_script
self.require(requires)[0].run_script(script_name, ns)
..
File "build/bdist.macosx-10.12-x86_64/egg/sqlalchemy/engine/default.py", line 623, in _init_compiled
File "build/bdist.macosx-10.12-x86_64/egg/sqlalchemy/sql/type_api.py", line 1074, in process
File "build/bdist.macosx-10.12-x86_64/egg/sqlalchemy_utc.py", line 31, in process_bind_param
sqlalchemy.exc.StatementError: (exceptions.ValueError) naive datetime is disallowed [SQL: u'SELECT task_instance.try_number AS task_instance_try_number, task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, task_instance.execution_date AS task_instance_execution_date, task_instance.start_date AS task_instance_start_date, task_instance.end_date AS task_instance_end_date, task_instance.duration AS task_instance_duration, task_instance.state AS task_instance_state, task_instance.max_tries AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, task_instance.unixname AS task_instance_unixname, task_instance.job_id AS task_instance_job_id, task_instance.pool AS task_instance_pool, task_instance.queue AS task_instance_queue, task_instance.priority_weight AS task_instance_priority_weight, task_instance.operator AS task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, task_instance.pid AS task_instance_pid \nFROM task_instance \nWHERE task_instance.dag_id = ? AND task_instance.task_id = ? AND task_instance.execution_date = ?\n LIMIT ? OFFSET ?'] [parameters: [{}]]
What format does SQLAlchemy require now? (It appears to be a matter of a missing timezone - so I'm also looking into that ..)

A format like the following works:
'2017-12-28T12:27:00Z'
where the first portion is the date, then the time after the T, and finally the timezone information (Z denotes UTC).
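For example, the test command from the question runs once the execution date carries that timezone suffix (same DAG and task as above):
airflow test tutorial print_date 2017-12-28T12:27:00Z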

As per Python's official documentation, there are two types of datetime objects: aware and naive.
https://docs.python.org/3/library/datetime.html
Date and time objects may be categorized as “aware” or “naive” depending on whether or not they include timezone information.
See the example below:
from datetime import datetime, timezone

date_aware = datetime.now(timezone.utc)
date_naive = datetime.now()

date_aware.tzinfo  # datetime.timezone.utc
date_naive.tzinfo  # None
https://docs.python.org/3/library/datetime.html#determining-if-an-object-is-aware-or-naive
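If you already have a naive datetime, one way to make it aware is to attach a timezone explicitly; a minimal standard-library sketch:
from datetime import datetime, timezone

naive = datetime(2017, 12, 30)              # tzinfo is None, so this is "naive"
aware = naive.replace(tzinfo=timezone.utc)  # attach UTC without shifting the wall-clock time
print(aware.isoformat())                    # 2017-12-30T00:00:00+00:00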

Related

ib_insync "contract can't be hashed"

A process I have that's been running for a couple of years suddenly stopped working.
I had avoided updating much in the way of Python and packages to avoid exactly that..
I've now updated ib_insync to the latest version, with no improvement. Debugging a little gives me this:
The code:
import ib_insync as ibis
ib = ibis.IB()
contract = ibis.Contract()
contract.secType = 'STK'
contract.currency = 'USD'
contract.exchange = 'SMART'
contract.localSymbol = 'AAPL'
ib.qualifyContracts(contract)
Result:
File "/Users/macuser/anaconda3/lib/python3.6/site-packages/ib_insync/client.py", line 244, in send
if field in empty:
File "/Users/macuser/anaconda3/lib/python3.6/site-packages/ib_insync/contract.py", line 153, in hash
raise ValueError(f'Contract {self} can't be hashed')
ValueError: Contract Contract(secType='STK', exchange='SMART', currency='USD', localSymbol='AAPL') can't be hashed
Exception ignored in: <bound method IB.del of <IB connected to 127.0.0.1:7497 clientId=6541>>
Traceback (most recent call last):
File "/Users/macuser/anaconda3/lib/python3.6/site-packages/ib_insync/ib.py", line 233, in del
File "/Users/macuser/anaconda3/lib/python3.6/site-packages/ib_insync/ib.py", line 281, in disconnect
File "/Users/macuser/anaconda3/lib/python3.6/logging/init.py", line 1306, in info
File "/Users/macuser/anaconda3/lib/python3.6/logging/init.py", line 1442, in _log
File "/Users/macuser/anaconda3/lib/python3.6/logging/init.py", line 1452, in handle
File "/Users/macuser/anaconda3/lib/python3.6/logging/init.py", line 1514, in callHandlers
File "/Users/macuser/anaconda3/lib/python3.6/logging/init.py", line 863, in handle
File "/Users/macuser/anaconda3/lib/python3.6/logging/init.py", line 1069, in emit
File "/Users/macuser/anaconda3/lib/python3.6/logging/init.py", line 1059, in _open
NameError: name 'open' is not defined
$ python --version
Python 3.6.4 :: Anaconda, Inc.
I am not OP, but have had a similar problem. I am attempting to send an order to IB using ib_insync.
contract = Stock('DKS','SMART','USD')
order = LimitOrder('SELL', 1, 1)
try:
    ib.qualifyContracts(contract)
    trade = ib.placeOrder(contract, order)
except Exception as e:
    print(e)
Here is the exception that is returned:
Contract Stock(symbol='DKS', exchange='SMART', currency='USD') can't be hashed
I understand that lists and other mutable objects can't be hashed, but I don't understand why this wouldn't work. It pretty clearly follows the examples in the ib_insync docs.
FYI - I was able to resolve this issue by updating to the latest ib_insync version. Perhaps this will help you as well.
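For reference, a minimal sketch of the upgrade-then-qualify flow (the host, port, and clientId here are just example values):
import ib_insync as ibis

print(ibis.__version__)                    # confirm the upgraded version is the one being imported
ib = ibis.IB()
ib.connect('127.0.0.1', 7497, clientId=1)  # example connection parameters

contract = ibis.Stock('AAPL', 'SMART', 'USD')
ib.qualifyContracts(contract)              # on success this fills in conId, making the contract hashable
print(contract.conId)
ib.disconnect()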

datetime.datetime object is not subscriptable

I am trying to get the volume create time for all of my EC2 instances. The problem is that the boto3 response returns CreateTime as a datetime object, which is not subscriptable. I tried to use strftime() to convert the object to str, but I must be using the wrong syntax or something because I am still getting the error. Below is my code and the traceback:
CODE:
import boto3
import json
import os
import csv
from datetime import datetime, date, time
ec2 = boto3.client('ec2')
ec2_response = ec2.describe_instances(Filters=[{'Name': 'instance-state-name', 'Values': ['running']}])
for item in ec2_response['Reservations']:
    instance_id = item['Instances'][0]['InstanceId']
    image_id = item['Instances'][0]['ImageId']
    create_time = item['Instances'][0]['BlockDeviceMappings'][0]['Ebs']['AttachTime'][0].strftime("%A, %d. %B %Y %I:%M%p")
    print(instance_id, image_id, create_time)
Traceback:
create_time = item['Instances'][0]['BlockDeviceMappings'][0]['Ebs']['AttachTime'][0].strftime("%A, %d. %B %Y %I:%M%p")
TypeError: 'datetime.datetime' object is not subscriptable
Firstly,
item['Instances'][0]['BlockDeviceMappings'][0]['Ebs']['AttachTime']
should not be a list. It's a single item in the documentation here, and it's also a single item in the JSON returned by the following AWS CLI command:
aws --region us-east-1 ec2 describe-instances
I suspect that when you remove the last [0] from
item['Instances'][0]['BlockDeviceMappings'][0]['Ebs']['AttachTime'][0]
the line of code completes successfully, and list index out of range is raised from a subsequent iteration of the for loop.
It's hard to know why without running the code, but, for example, an instance with no volumes, like this case, would cause that line to fail.
You could debug like this to inspect the offending data:
try:
    create_time = item['Instances'][0]['BlockDeviceMappings'][0]['Ebs']['AttachTime'].strftime("%A, %d. %B %Y %I:%M%p")
except Exception as e:
    import pdb; pdb.set_trace()
Or this if not attached to a shell:
try:
    create_time = item['Instances'][0]['BlockDeviceMappings'][0]['Ebs']['AttachTime'].strftime("%A, %d. %B %Y %I:%M%p")
except Exception as e:
    print("Dumping offending item:")
    print(item)
    raise e
Secondly, while the AttachTime might be suitable for your use case, it's not necessarily the time at which the volume was created, since a volume can be created first and attached to an instance later. If you need the actual creation time, you need to make a second call to describe_volumes and use the CreateTime field.
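A minimal sketch of that second call, assuming ec2 is the same boto3 client as in the question:
volume_id = item['Instances'][0]['BlockDeviceMappings'][0]['Ebs']['VolumeId']
volumes = ec2.describe_volumes(VolumeIds=[volume_id])
# CreateTime is also a datetime object, so format it the same way as AttachTime
create_time = volumes['Volumes'][0]['CreateTime'].strftime("%A, %d. %B %Y %I:%M%p")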

CSV Reader works, Trouble with CSV writer

I am writing a very simple Python script to READ a CSV (no problem) and to write to another CSV (the issue):
System info:
Windows 10
Powershell
Python 3.6.5 :: Anaconda, Inc.
Sample Data: Office Events
The purpose is to filter events based on criteria, and to write to another CSV with desired criteria.
For Example:
I would like to read from this CSV and write out the events where Registrations (column 4) is greater than 0 (i.e. remove rows with registrations = 0)
# SCRIPT TO FILTER EVENTS TO BE PROCESSED
import os
import time
import shutil
import os.path
import fnmatch
import csv
import glob
import pandas
# Location of file containing ALL events
path = r'allEvents.csv'
# Writes to writer
writer = csv.writer(open(r'RegisteredEvents' + time.strftime("%m_%d_%Y-%I_%M_%S") + '.csv', "wb"))
writer.writerow(["Event Name", "Start Date", "End Date", "Registrations", "Total Revenue", "ID", "Status"])
#writer.writerow([r'Event Name', r'Start Date', r'End Date', r'Registrations', r'Total Revenue', r'ID', r'Status'])
#writer.writerow([b'Event Name', b'Start Date', b'End Date', b'Registrations', b'Total Revenue', b'ID', b'Status'])
def checkRegistrations(file):
    reader = csv.reader(file)
    data = list(reader)
    for row in data:
        #if row[3] > str(0):
        if row[3] > int(0):
            writer.writerow(([data]))
The Error I continue to get is:
writer.writerow(["Event Name", "Start Date", "End Date", "Registrations", "Total Revenue", "ID", "Status"])
TypeError: a bytes-like object is required, not 'str'
I have tried the various commented-out statements, for example:
"" vs r"" vs r'' vs b''
if row[3] > int(0) vs if row[3] > str(0)
Every time I execute my script, it creates the file.. so the first csv.writer line works (it creates and opens the file)... the second line (writing the headers) is where the error appears...
Perhaps I am getting mixed up with syntax due to Python versions, or perhaps I am misusing the CSV library, or (more than likely) I have endless amounts to learn about data type IO and conversion... someone please help!!
I am aware of the excess of imported libraries -- the script came from another basic script that moved files from one location to another based on filename and output a row counter for each file being moved.
With that being said, I may be unaware of any missing/needed libraries.
Please let me know if you have any questions, concerns or clarifications
Thanks in advance!
It looks like you are calling:
writer = csv.writer(open('file.csv', 'wb'))
The 'wb' argument is the file mode: the 'b' means that you are opening the file you are writing to in binary mode, but you are then trying to write a string, which isn't what a binary-mode file expects.
Try getting rid of the 'b' in the 'wb':
writer = csv.writer(open('file.csv', 'w'))
Let me know if that works for you.
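In Python 3 the csv module expects text-mode files opened with newline='' (binary mode was the Python 2 idiom). For completeness, here is a minimal sketch of the whole filter (assuming the input file has a header row and the Registrations column parses as an integer):

import csv
import time

out_name = 'RegisteredEvents' + time.strftime("%m_%d_%Y-%I_%M_%S") + '.csv'
# newline='' is the recommended way to open CSV files in Python 3
with open('allEvents.csv', newline='') as infile, open(out_name, 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    writer.writerow(["Event Name", "Start Date", "End Date", "Registrations", "Total Revenue", "ID", "Status"])
    next(reader)  # skip the input header row
    for row in reader:
        if int(row[3]) > 0:  # cell values are strings, so convert before the numeric comparison
            writer.writerow(row)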

How to specify gunicorn log max size

I'm running gunicorn as:
gunicorn --bind=0.0.0.0:5000 --log-file gunicorn.log myapp:app
Seems like gunicorn.log keeps growing. Is there a way to specify a max size for the log file, so that when it reaches the max size, it just rolls over?
Thanks!!
TL;DR
I believe there might be a "Python only" solution using the rotating file handler provided in Python's standard library (tested on at least 3.10).
To test
I created a pet project for you to fiddle with:
Create the following Python file
test_logs.py
import logging
import logging.config
import time
logging.config.fileConfig(fname='log.conf', disable_existing_loggers=False)
while True:
    time.sleep(0.5)
    logging.debug('This is a debug message')
    logging.info('This is an info message')
    logging.warning('This is a warning message')
    logging.error('This is an error message')
    logging.critical('This is a critical message')
Create the following config file
log.conf
[loggers]
keys=root
[handlers]
keys=rotatingHandler
[formatters]
keys=sampleFormatter
[logger_root]
level=DEBUG
handlers=rotatingHandler
[handler_rotatingHandler]
class=logging.handlers.RotatingFileHandler
level=DEBUG
formatter=sampleFormatter
args=('./logs/logs.log', 'a', 1200, 10, 'utf-8')
[formatter_sampleFormatter]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
Create the ./logs directory
Run python test_logs.py
To Understand
As you may have noticed already, the setting that allows for this behaviour is logging.handlers.RotatingFileHandler together with the provided arguments args=('./logs/logs.log', 'a', 1200, 10, 'utf-8')
RotatingFileHandler is a stream handler that writes to a file. It takes two parameters of interest here:
maxBytes, set arbitrarily to 1200
backupCount, set arbitrarily to 10
The behaviour is that upon reaching 1200 bytes in size, the file is closed, renamed to ./logs/logs.log.<a number up to 10>, and a new file is opened.
BUT if either maxBytes or backupCount is 0, no rotation is done!
In Gunicorn
As per the documentation you can feed a config file.
This could look like:
gunicorn --bind=0.0.0.0:5000 --log-config log.conf myapp:app
You will need to tweak it to your existing setup.
On Ubuntu/Linux, I suggest using logrotate to manage your logs; do it like this: https://stackoverflow.com/a/55643449/6705684
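For illustration, a minimal logrotate sketch (the path and limits are just examples; copytruncate lets gunicorn keep writing to the same open file descriptor):

# /etc/logrotate.d/gunicorn (hypothetical path)
/var/log/gunicorn/gunicorn.log {
    size 50M
    rotate 7
    compress
    missingok
    copytruncate
}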
Since Python 3.3, with RotatingFileHandler, here is my solution (macOS/Windows/Linux/...):
import os
import logging
from logging.handlers import RotatingFileHandler
fmt_str = '[%(asctime)s]%(module)s - %(funcName)s - %(message)s'
fmt = logging.Formatter(fmt_str)
def rotating_logger(name, fmt=fmt,
                    level=logging.INFO,
                    logfile='.log',
                    maxBytes=10 * 1024 * 1024,
                    backupCount=5,
                    **kwargs):
    logger = logging.getLogger(name)
    hdl = RotatingFileHandler(logfile, maxBytes=maxBytes, backupCount=backupCount)
    hdl.setLevel(level)
    hdl.setFormatter(fmt)
    logger.addHandler(hdl)
    return logger
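A quick usage sketch (the logger name and file name are just examples):
logger = rotating_logger('myapp', logfile='myapp.log')
logger.setLevel(logging.INFO)  # the logger itself also needs a level, or records below WARNING are dropped
logger.info('hello')           # rolls over to myapp.log.1 ... myapp.log.5 once the file passes 10 MB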
For more, refer to:
https://docs.python.org/3/library/logging.handlers.html#rotatingfilehandler

neo4j: How to load CSV using conditional correctly?

What I am trying to do is to import a dataset with a tree data structure from CSV into neo4j. Nodes are stored along with their parent node and their depth level (max 6) in the tree. So I try to check the depth level using CASE and then add each node to its parent like this (creating nodes just for the 1st level so far, for testing purposes):
export FILEPATH=file:///Example.csv
CREATE CONSTRAINT ON (n:Node) ASSERT n.id IS UNIQUE;
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS
FROM {FILEPATH} AS line
WITH DISTINCT line,
line.`Level` AS level,
line.`ParentCodeID_Cal` AS parentCode,
line.`CodeSet` AS codeSet,
line.`Category` AS nodeCategory,
line.`Type` AS nodeType,
line.`L1code` AS l1Code, line.`L1Description` AS l1Description, line.`L1Name` AS l1Name, line.`L1NameAb` AS l1NameAb,
line.`L2code` AS l2Code, line.`L2Description` AS l2Description, line.`L2Name` AS l2Name, line.`L2NameAb` AS l2NameAb,
line.`L3code` AS l3Code, line.`L3Description` AS l3Description, line.`L3Name` AS l3Name, line.`L3NameAb` AS l3NameAb,
line.`L1code` AS l4Code, line.`L4Description` AS l4Description, line.`L4Name` AS l4Name, line.`L4NameAb` AS l4NameAb,
line.`L1code` AS l5Code, line.`L5Description` AS l5Description, line.`L5Name` AS l5Name, line.`L5NameAb` AS l5NameAb,
line.`L1code` AS l6Code, line.`L6Description` AS l6Description, line.`L6Name` AS l6Name, line.`L6NameAb` AS l6NameAb,
codeSet + parentCode AS nodeId
CASE line.`Level`
WHEN '1' THEN CREATE (n0:Node{id:nodeId, description:l1Description, name:l1Name, nameAb:l1NameAb, category:nodeCategory, type:nodeType})
ELSE
END;
But I get this result:
WARNING: Invalid input 'S': expected 'l/L' (line 17, column 3 (offset: 982))
"CASE level "
 ^
I'm aware there is a syntax mistake somewhere.
I'm using neo4j 3.0.4 & Windows 10 (using neo4j shell running it with D:\Program Files\Neo4j CE 3.0.4\bin>java -classpath neo4j-desktop-3.0.4.jar org.neo4j.shell.StartClient).
You have several syntax errors. For example, a Cypher CASE expression cannot contain a CREATE clause.
In any case, you should be able to greatly simplify your Cypher. For example, this might suit your needs:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS
FROM {FILEPATH} AS line
WITH DISTINCT line, ('l' + line.Level) AS prefix
CREATE (:Node{
    id: line.CodeSet + line.ParentCodeID_Cal,
    description: line[prefix + 'Description'],
    name: line[prefix + 'Name'],
    nameAb: line[prefix + 'NameAb'],
    category: line.Category,
    type: line.Type})
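The key point is that line[prefix + 'Description'] looks up a property dynamically by name, so you no longer need the per-level CASE branching or the long list of WITH aliases.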