Apache airflow - Unable to templatize queue name inherited from baseOperator - jinja2

I have a Custom operator that inherits baseoperator. I am trying to templatize 'queue' name so that task can be picked up by different Celery worker.
But it uses raw template string (un rendered jinja string) as the queue name instead of rendered string.
The same flow works if I give the intended queue name directly as a simple string.
from airflow import DAG
from operators.check_operator import CheckQueueOperator
from datetime import datetime, timedelta
from airflow.operators.python_operator import BranchPythonOperator
from airflow.utils.dates import days_ago
default_args = {
'schedule_interval': None, # exclusively “externally triggered” DAG
'owner': 'admin',
'description': 'This helps to quickly check queue templatization',
'start_date': days_ago(1),
'retries': 0,
'retry_delay': timedelta(minutes=5),
'provide_context': True
}
# this goes to wrong queue --> {{ dag_run.conf["queue"]}}
with DAG('test_queue', default_args=default_args) as dag:
t1 = CheckQueueOperator(task_id='check_q',
queue='{{ dag_run.conf["queue"]}}'
)
In the above this scenario :
In RabbitMQ, I see the task being queued under queue name '{{ dag_run.conf["queue"]}}' (raw template string )
In Airflow, under Rendered template I am able to see properly rendered value for queue field
In the screenshot, we see docker-desktop as queue name. It's my test queue and also my default airflow queue. It works perfectly if I give this queue name as a direct string.
#this goes to right queue --> my_target_queue
with DAG('test_queue', default_args=default_args) as dag:
t1 = CheckQueueOperator(task_id='check_q',
queue='my_target_queue'
)
CheckQueueOperator code :
from airflow.models.baseoperator import BaseOperator
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults
'''
Validate if queue can be templatized in base operator
'''
class CheckQueueOperator(BaseOperator):
template_fields = ['queue']
#apply_defaults
def __init__(
self,
*args,
**kwargs
):
super(CheckQueueOperator, self).__init__(*args, **kwargs)
def execute(self, context):
self.log.info('*******************************')
self.log.info('Queue name %s', self.queue)
return
Stack details:
Apache Airflow version - 1.10.12
Using CeleryExecutor
Using RabbitMQ

The queue attribute is reserved (perhaps not officially, but in practice) by the BaseOperator and while you may be able to hoodwink the webserver into rendering the attribute, the parts of Airflow that handle scheduling and task execution don't perform rendering prior to reading the queue attribute.

Related

Context Function error while using Jinja2 when trying to build templates

Im following ChristopherGS's tutorial for FASTapi but i'm stuck on part 6 because I believe his syntax may be already deprecated.
I get AttributeError: module 'jinja2' has no attribute 'contextfunction at the end when the program stops. How do I solve this, I've been stuck here for 3 days.
This is my code:
from fastapi.templating import Jinja2Templates
from typing import Optional, Any
from pathlib import Path
from app.schemas import RecipeSearchResults, Recipe, RecipeCreate
from app.recipe_data import RECIPES
BASE_PATH = Path(__file__).resolve().parent
TEMPLATES = Jinja2Templates(directory=str(BASE_PATH / "templates"))
app = FastAPI(title="Recipe API", openapi_url="/openapi.json")
api_router = APIRouter()
# Updated to serve a Jinja2 template
# https://www.starlette.io/templates/
# https://jinja.palletsprojects.com/en/3.0.x/templates/#synopsis
#api_router.get("/", status_code=200)
def root(request: Request) -> dict:
"""
Root GET
"""
return TEMPLATES.TemplateResponse(
"index.html",
{"request": request, "recipes": RECIPES},
)
#api_router.get("/recipe/{recipe_id}", status_code=200, response_model=Recipe)
def fetch_recipe(*, recipe_id: int) -> Any:
"""
Fetch a single recipe by ID
"""
result = [recipe for recipe in RECIPES if recipe["id"] == recipe_id]
if not result:
# the exception is raised, not returned - you will get a validation
# error otherwise.
raise HTTPException(
status_code=404, detail=f"Recipe with ID {recipe_id} not found"
)
return result[0]
if __name__ == "__main__":
# Use this for debugging purposes only
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8001, log_level="debug")
It could be due to the version mismatch bewteen jinja and starlette(fastapi).
I faced similar issue with the latest fastapi docker image(python3.9). It was resolved by installing an older version of jinja2.
Try downgrading jinja2, if you are using jinja2 >3.0.3:
pip install jinja2==3.0.3
Other option would be to upgrade fastapi/starlette.
Ref:
FastAPI Jinja2Templates - Error while running initialising templates?
https://github.com/pallets/jinja/blob/1b714c7e82c73575d1dba48f560db07fe9a5cb74/CHANGES.rst#version-310

Converting JSON into a DataFrame within FastAPI app

I am trying to create an API for customer churn at a bank. I have completed the model and now want to create the API using FastAPI. My problem is converting the JSON passed data to a dataframe to be able to run it through the model. Here is the code.
from fastapi import FastAPI
from starlette.middleware.cors import CORSMiddleware
from pycaret.classification import *
import pandas as pd
import uvicorn # ASGI
import pickle
import pydantic
from pydantic import BaseModel
class customer_input(BaseModel):
CLIENTNUM:int
Customer_Age:int
Gender:str
Dependent_count:int
Education_Level:str
Marital_Status:str
Income_Category:str
Card_Category:str
Months_on_book:int
Total_Relationship_Count:int
Months_Inactive_12_mon:int
Contacts_Count_12_mon:int
Credit_Limit:float
Total_Revolving_Bal:int
Avg_Open_To_Buy:float
Total_Amt_Chng_Q4_Q1:float
Total_Trans_Amt:int
Total_Trans_Ct:int
Total_Ct_Chng_Q4_Q1:float
Avg_Utilization_Ratio:float
app = FastAPI()
#Loading the saved model from pycaret
model = load_model('BankChurnersCatboostModel25thDec2020')
origins = [
'*'
]
app.add_middleware(
CORSMiddleware,
allow_origins=origins,
allow_credentials=True,
allow_methods=['GET','POST'],
allow_headers=['Content-Type','application/xml','application/json'],
)
#app.get("/")
def index():
return {"Nothing to see here"}
#app.post("/predict")
def predict(data: customer_input):
# Convert input data into a dictionary
data = data.dict()
# Convert the dictionary into a dataframe
my_data = pd.DataFrame([data])
# Predicting using pycaret
prediction = predict_model(model, my_data)
return prediction
# Only use below 2 lines when testing on localhost -- remove when deploying
if __name__ == '__main__':
uvicorn.run(app, host='127.0.0.1', port=8000)
When I test this out I get the Internal Server Error from the OpenAPI interface so I check my cmd and the error says
ValueError: [TypeError("'numpy.int64' object is not iterable"), TypeError('vars() argument must have __dict__ attribute')]
How can I have the data that is passed into the predict function successfully convert into a dataframe. Thank you.
Ok so I fixed this by changing the customer_input class. Any int types I changed to a float and that fixed it. I don't understand why though. Can anyone explain?
Fundamentally those int values are only meant to be an integer because they are all discrete values (i.e choosing number of dependents in a bank) but I guess I could put a constrain on the front-end.

How do I Force db commit of a single save in django-orm in a celery task

I am using django and celery. I have a long running celery task and I would like it to report progress. I am doing this:
#shared_task
def do_the_job(tracker_id, *args, **kwargs):
while condition:
#Do a long operation
tracker = ProgressTracker.objects.get(pk=tracker_id)
tracker.task_progress = F('task_progress') + 1
tracker.last_update = timezone.now()
tracker.save(update_fields=['task_progress', 'last_update'])
The problem is that the view that is supposed to show the progress to the user cannot see the updates until the task finishes. Is there a way to get the django orm to ignore transactions for just this one table? Or just this one write?
You can use bound tasks to define custom states for your tasks and set/update the state during execution:
#celery.task(bind=True)
def show_progress(self, n):
for i in range(n):
self.update_state(state='PROGRESS', meta={'current': i, 'total': n})
You can dump the state of currently executing tasks to get the progress:
>>> from celery import Celery
>>> app = Celery('proj')
>>> i = app.control.inspect()
>>> i.active()

Using python argparse arguments as variable values within a json file

I've googled this quite a bit and am unable to find helpful insight. Basically, I need to take the user input from my argparse arguments from a python script (as shown below) and plug those values into a json file (packerfile.json) located in the same working directory. I have been experimenting with subprocess, invoke and plumbum libraries without being able to "find the shoe that fits".
From the following code, I have removed all except for the arguments as to clean up:
#!/usr/bin/python
import os, sys, subprocess
import argparse
import json
from invoke import run
import packer
parser = argparse.ArgumentParser()
parser._positionals.title = 'Positional arguments'
parser._optionals.title = 'Optional arguments'
parser.add_argument("--access_key",
required=False,
action='store',
default=os.environ['AWS_ACCESS_KEY_ID'],
help="AWS access key id")
parser.add_argument("--secret_key",
required=False,
action='store',
default=os.environ['AWS_SECRET_ACCESS_KEY'],
help="AWS secret access key")
parser.add_argument("--region",
required=False,
action='store',
help="AWS region")
parser.add_argument("--guest_os_type",
required=True,
action='store',
help="Operating system to install on guest machine")
parser.add_argument("--ami_id",
required=False,
help="AMI ID for image base")
parser.add_argument("--instance_type",
required=False,
action='store',
help="Type of instance determines overall performance (e.g. t2.medium)")
parser.add_argument("--ssh_key_path",
required=False,
action='store',
default=os.environ['HOME']+'/.ssh',
help="SSH key path (e.g. ~/.ssh)")
parser.add_argument("--ssh_key_name",
required=True,
action='store',
help="SSH key name (e.g. mykey)")
args = parser.parse_args()
print(vars(args))
json example code:
{
"variables": {
"aws_access_key": "{{ env `AWS_ACCESS_KEY_ID` }}",
"aws_secret_key": "{{ env `AWS_SECRET_ACCESS_KEY` }}",
"magic_reference_date": "{{ isotime \"2006-01-02\" }}",
"aws_region": "{{ env 'AWS_REGION' }}",
"aws_ami_id": "ami-036affea69a1101c9",
"aws_instance_type": "t2.medium",
"image_version" : "0.1.0",
"guest_os_type": "centos7",
"home": "{{ env `HOME` }}"
},
so, the user input for the --region as shown in the python script shoul get plugged into the value for aws_region in the json file.
I am aware of how to print the value of args. The full command that I am providing to the script is: python packager.py --region us-west-2 --guest_os_type rhel7 --ssh_key_name test_key and the printed results are {'access_key': 'REDACTED', 'secret_key': 'REDACTED', 'region': 'us-west-2', 'guest_os_type': 'rhel7', 'ami_id': None, 'instance_type': None, 'ssh_key_path': '/Users/REDACTEDt/.ssh', 'ssh_key_name': 'test_key'} .. what i need is to import thos values into the packerfile.json variables list.. preferably in a way that i can reuse it (so it musn't overwrite the file)
Note: I have also been experimenting with using python to export local environment variables then having the JSON file pick them up, but that doesn't really seem like a viable solution.
I think that the best solution might be to take all of these arguments and export them to its own JSON file called variables.json and import these variables from JSON (variables.json) to JSON (packerfile.json) as a seperate process. Still could use guidence here though :)
You might use the __dict__ attribute from the SimpleNamespace that is returned by the ArgumentParser. Like so:
import json
parsed = parser.parse_args()
with open('packerfile.json', 'w') as f:
json.dump(f, parsed.__dict__)
If required, you could use add_argument(dest='attrib_name') to customise attribute names.
I was actually able to come up with a pretty simple solution.
args = parser.parse_args()
print(json.dumps(vars(args), indent=4))
s.call("echo '%s' > variables.json && packer build -var-file=variables.json packerfile.json" % json_formatted, shell=True)
arguments are captured under the variable args and dumped to the output with json.dump while vars is making sure to also dump the arguments with their key values and I currently have to run my code with >> vars.json but I'll insert logic to have python handle that.
Note: s == subprocess in s.call

Multiprocessing, sqlAlchemy and scoped_sessions

I want to run multiple strategies in concurrent processes. I came up with something like this
import logging
import multiprocessing
import os
from sqlalchemy.orm import scoped_session, Session
from pyutil.sql.interfaces.symbols.symbol import Symbol
from pyutil.sql.session import get_one_or_create
class StratRunner(object):
def __init__(self, session_scope, logger=None):
assert isinstance(session_scope, scoped_session)
self.__session_scope = session_scope
self.__logger = logger or logging.getLogger(__name__)
# this function is the target for mp.Process
def _run(self, strategy):
self.__logger.debug("Pid {pid}".format(pid=os.getpid()))
symbols = self.symbols
self.__logger.info("Run strategy {s}".format(s=strategy))
configuration = strategy.configuration()
strategy.upsert(portfolio=configuration.portfolio, symbols=symbols, days=5)
def run_strategies(self):
# loop over all active strategies!
jobs = []
# we are in the main thread here...
for s in self.active_strategies:
# what shall I give to the Process? The strategy object, the strategy_id, a session instance, the session_scope...
job = multiprocessing.Process(target=self._run, kwargs={"strategy": s})
job.name = s.name
jobs.append(job)
run_jobs(jobs, logger=self.__logger)
#property
def symbols(self):
return {s.name: s for s in self.__session_scope().query(Symbol)}
#property
def active_strategies(self):
return self.__session_scope().query(Strategy).filter(Strategy.active == True).all()
I am aware of tons of documentation on this project but I am overwhelmed.
I loop over the rows of a table (The active_strategies). class Strategies(Base)... . I then hand over the strategy object to the _run method and update the strategy object within the very same method. Please feel free to shred my code.
I am in particular puzzled about what to give to the _run method? Shall I hand over the strategy object, the strategy ID, the session, the scoped_session, ... ?
I have now created a runner object:
import abc
import logging
import os
from sqlalchemy.orm import sessionmaker
class Runner(object):
__metaclass__ = abc.ABCMeta
def __init__(self, engine, logger=None):
self.__engine = engine
self._logger = logger or logging.getLogger(__name__)
self.__jobs = []
#property
def _session(self):
""" Create a fresh new session... """
self.__engine.dispose()
factory = sessionmaker(self.__engine)
return factory()
def _run_jobs(self):
self._logger.debug("PID main {pid}".format(pid=os.getpid()))
for job in self.jobs:
# all jobs get the trigge
self._logger.info("Job {j}".format(j=job.name))
job.start()
for job in self.jobs:
self._logger.info("Wait for job {j}".format(j=job.name))
job.join()
self._logger.info("Job {j} done".format(j=job.name))
#property
def jobs(self):
return self.__jobs
#abc.abstractmethod
def run(self):
""" Described in the child class """
In particular this class can provide a fresh session (via ._session). However, using this setup I see plenty of :
psycopg2.OperationalError: server closed the connection unexpectedly
| This probably means the server terminated abnormally
| before or while processing the request.