jinja2.exceptions.TemplateSyntaxError: expected token ':', got '}' - jinja2

I'm trying to use an xcom_pull inside a SQL statement executed by a SnowflakeOperator in Airflow.
I need the task_id name to come from a variable, since I want to support different tasks.
I tried the syntax below, but it seems it is not being rendered correctly.
Does anyone have an idea how to do this?
This is the Python code:
for product, val in PRODUCTS_TO_EXTRACT_INC.items():
    product_indicator, prefix = val
    params['product_prefix'] = prefix
    calculate_to_date = SnowflakeOperator(
        dag=dag,
        task_id=f'calculate_to_date_{prefix}',
        snowflake_conn_id=SF_CONNECTION_ID,
        warehouse=SF_WAREHOUSE,
        database=BI_DB,
        schema=STG_SCHEMA,
        role=SF_ROLE,
        sql=[
            """
{SQL_FILE}
""".format(SQL_FILE="{% include '" + QUERIES_DIR + ETL + "/calculate_to_date.sql'" + " %}")
        ],
        params=params,
    )
This is the SQL code for calculate_to_date.sql:
select '{{{{ (ti.xcom_pull(key="return_value", task_ids=["calculate_from_date_{}"])[0][0]).get("FROM_DATE") }}}}'.format(params.product_prefix) AS TO_DATE
This is the error message:
File "/home/airflow/gcs/dags/Test/queries/fact_subscriptions_events/calculate_to_date.sql", line 11, in template
select '{{{{ (ti.xcom_pull(key="return_value", task_ids=["calculate_from_date_{}"])[0][0]).get("FROM_DATE") }}}}'.format(params.product_prefix)
jinja2.exceptions.TemplateSyntaxError: expected token ':', got '}'

The problem is the quadrupled braces: Jinja2 does not support escaping braces by doubling them the way str.format does. It reads the first {{ as the start of an expression and the next two { characters as nested dict literals; a dict literal expects a key: value pair, hence "expected token ':', got '}'". The .format() call also has to run inside the Jinja expression rather than on the template text. The correct syntax is:
select '{{ (ti.xcom_pull(key="return_value", task_ids="calculate_from_date_{}".format(params.product_prefix))[0]).get("FROM_DATE") }}' AS TO_DATE
It works like a charm.
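The parse failure can be reproduced with plain Jinja2, outside Airflow entirely; a minimal sketch (any expression between the quadrupled braces triggers the same error):
from jinja2 import Template

# "{{{{" is read as "{{" (expression start) followed by two "{" tokens that
# open nested dict literals; a dict literal expects "key: value", so this
# raises jinja2.exceptions.TemplateSyntaxError: expected token ':', got '}'
Template("select '{{{{ 1 }}}}'").render()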

Related


Python 3 psycopg2 COPY from stdin failed: error in .read()

I am trying to apply the code found on this page, in particular the 'Copy Data from String Iterator' part of the Table of Contents, but I run into an issue with my code.
Since not all lines coming from the generator (here log_lines) can be imported into the PostgreSQL database, I try to keep only the correct lines (here row) using itertools.filterfalse, as in the code block below:
from itertools import filterfalse

def copy_string_iterator(connection, log_lines) -> None:
    with connection.cursor() as cursor:
        create_staging_table(cursor)
        log_string_iterator = StringIteratorIO(
            '|'.join(map(clean_csv_value, (
                row['date'],
                row['time'],
                row['cs_uri_query'],
                row['s_contentpath'],
                row['sc_status'],
                row['s_computername'],
                ...
                row['sc_substates'],
                row['s_port'],
                row['cs_version'],
                row['c_protocol'],
                row.update({'cs_cookie': 'x'}),
                row['timetakenms'],
                row['cs_uri_stem'],
            ))) + '\n'
            for row in filterfalse(lambda line: "#" in line.get('date'), log_lines)
        )
        cursor.copy_from(log_string_iterator, 'log_table', sep='|')
When I run this, cursor.copy_from() gives me the following error:
QueryCanceled: COPY from stdin failed: error in .read() call
CONTEXT: COPY log_table, line 112910
I understand why this error happens: the test file I use has only 112909 lines that meet the filterfalse condition. But why does it try to copy line 112910 and throw the error instead of just stopping?
The COPY aborts instead of stopping cleanly because, as soon as a row is missing one of those fields, the generator most likely raises an exception (e.g. a KeyError on the missing field) while psycopg2 is reading the next chunk, and the server reports that as an error in .read() one line past the last good one. Since Python doesn't have a null-coalescing operator, add something like:
(map(clean_csv_value, (
    row['date'] if 'date' in row else None,
    :
    row['cs_uri_stem'] if 'cs_uri_stem' in row else None,
))) + '\n'
for each of your fields, so you can handle any missing fields in the JSON file. Of course, the fields should be nullable in the database if you use None; otherwise, replace None with some default value for that field.
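For what it's worth, dict.get() reaches the same result in one step: it returns None (or a supplied default) instead of raising KeyError. A minimal sketch, reusing a few field names from the question:
# 'cs_uri_stem' is deliberately missing from this sample row
row = {'date': '2020-01-01', 'time': '12:00:00'}

values = (
    row.get('date'),         # '2020-01-01'
    row.get('time'),         # '12:00:00'
    row.get('cs_uri_stem'),  # None instead of a KeyError
    row.get('s_port', '80'), # or supply a per-field default
)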

Double Quotes in temporary JSON variable on MySQL using R

I have a table in MySQL that contains user interactions with a web page. I need to extract the rows for the users where the date of the interaction is earlier than a certain benchmark date, and that benchmark date is different for each customer (I extract that date from a different database).
My approach was to set a JSON variable in which each key is a user ID and the value is the benchmark date, and to use it in the query to extract the intended fields.
Example in R:
# MainDF contains the user and the benchmark date from a different database
json_str <- mapply(function(uid, bench_date){
  paste0(
    '{', '"', uid, '"', ':', '"', bench_date, '"', '}'
  )
}, MainDF[, 'uid'],
   MainDF[, 'date']
)
json_str <- paste0("'", '[', paste0(json_str, collapse = ','), ']', "'")
temp_var <- paste('set @test=', json_str)
The intention was for temp_var to look like:
set @test= '{"0001":"2010-05-05",
"0012":"2015-05-05",
"0101":"2018-07-20"}'
but it actually looks like:
set @test= '{\"0001\":\"2010-05-05\",
\"0012\":\"2015-05-05\",
\"0101\":\"2018-07-20\"}'
then create the main query:
main_Q <- "select user_id, date
from interaction
where 1=1
and json_contains(json_keys(@test), concat('\"',user_id,'\"')) = 1
and date <= json_unquote(json_extract(@test,
concat('$.','\"',user_id, '\"')
)
)
"
To execute, first set the temporary variable and then run the main query:
dbSendQuery(connection, temp_var)
resp <- dbSendQuery(connection, main_Q)
target_df <- fetch(resp, n=-1)
dbClearResult(resp)
When I test a fraction of it in a SQL IDE it does work. However, in R it doesn't return anything.
I think the issue is that R escapes the double quotes in temp_var, so SQL ends up reading
set @test= '{\"0001\":\"2010-05-05\",
\"0012\":\"2015-05-05\",
\"0101\":\"2018-07-20\"}'
which won't work.
For example, if I execute:
set @test= '{"0001":"2010-05-05",
"0012":"2015-05-05",
"0101":"2018-07-20"}'
select json_keys(@test)
it will return an array with the keys, but that is not the case with
set @test= '{\"0001\":\"2010-05-05\",
\"0012\":\"2015-05-05\",
\"0101\":\"2018-07-20\"}'
select json_keys(@test)
I am not sure how to solve the issue, but I need double quotes to specify the JSON. Is there any other approach that I should try or a way to make this work?
First, I think it is generally better to use a well-known library/package for converting to/from JSON, for several reasons. This gives you a string that you should be able to place just about anywhere:
json_str <- jsonlite::toJSON(setNames(as.list(MainDF$date), MainDF$uid), auto_unbox=TRUE)
json_str
# {"0001":"2010-05-05","0012":"2015-05-05","0101":"2018-07-20"}
And while looking at the object on the R console will show escaped double quotes,
as.character(json_str)
# [1] "{\"0001\":\"2010-05-05\",\"0012\":\"2015-05-05\",\"0101\":\"2018-07-20\"}"
that is merely R's representation (it shows all strings within double quotes, and therefore needs to escape any double quotes within the string).
Adding it into some script should be straightforward:
cat(paste('set @test=', sQuote(json_str)), '\n')
# set @test= '{"0001":"2010-05-05","0012":"2015-05-05","0101":"2018-07-20"}'
I'm assuming that having each on its own row is not critical. If it is, and indentation is important, perhaps this is more your style:
spaces <- strrep(' ', 2 + nchar('set @test = '))
cat(paste0('set @test = ', sQuote(gsub(",", paste0(",\n", spaces), json_str))), '\n')
# set @test = '{"0001":"2010-05-05",
#               "0012":"2015-05-05",
#               "0101":"2018-07-20"}'
Data:
MainDF <- read.csv(stringsAsFactors=FALSE, colClasses='character', text='
uid,date
0001,2010-05-05
0012,2015-05-05
0101,2018-07-20')

Error DeserializeJSON() MySQL json_object

I am getting back a JSON string from a MySQL 5.7 query in ColdFusion 9.0.1. Here is my query:
SELECT (
SELECT GROUP_CONCAT(
JSON_OBJECT(
'nrtype', nrt.nrtype,
'number', nr.number
)
)
) AS nrJSON
FROM ...
The returned data looks like this:
{"nrtype": "Phone 1", "number": "12345678"},{"nrtype": "E-Mail 1", "number": "some#email.com"}
But as soon as I try to use DeserializeJSON() on it I am getting the following error:
JSON parsing failure at character 44:',' in {"nrtype": "Phone 1", "number": "12345678"},{"nrtype": "E-Mail 1", "number": "some@email.com"}
I am a little confused. What I want to get is a structure created by the DeserializeJSON() function.
What can I do?
That is not valid JSON, as the parser is telling you. If you wrap it within square brackets '[' and ']' it becomes valid (or at least parsable); that will make it an array of structures. I'm not sure how to make MySQL return the data within those brackets.
I guess you could add the brackets using ColdFusion, but I would prefer to have the source do it correctly.
jsonhack = '[' & queryname.nrJSON & ']';
datarecord = DeserializeJSON(jsonhack);
writeDump(datarecord);
I created an example with your data that you can see here - trycf.com gist
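The failure is easy to reproduce in any strict JSON parser, not just ColdFusion's; for example, this quick Python check (an illustration, not part of the original answer):
import json

raw = '{"nrtype": "Phone 1", "number": "12345678"},{"nrtype": "E-Mail 1", "number": "some@email.com"}'

# json.loads(raw) raises json.JSONDecodeError ("Extra data") at the comma,
# the same complaint DeserializeJSON() makes at character 44
records = json.loads('[' + raw + ']')  # wrapped in brackets: a list of two dicts
print(records[1]['nrtype'])  # E-Mail 1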
From the comments
The solution indeed was [to add the following to the SQL statement]:
CONCAT('[',
GROUP_CONCAT(
JSON_OBJECT(...)
),
']')
If some of your columns already contain JSON-formatted strings, try this: https://stackoverflow.com/a/45278722/2282880
Portion of code with JSON_MERGE() :
...
CONCAT(
'{"elements": [',
GROUP_CONCAT(
JSON_MERGE(
JSON_OBJECT(
'type', T2.`type`,
'data', T2.`data`
),
CONCAT('{"info": ', T2.`info`, '}')
)
),
']}'
) AS `elements`,
...

Querying cassandra error no viable alternative at input 'ALLOW'

I'm trying to run a query on Cassandra through Spark.
When running this command:
val test = sc.cassandraTable[Person](keyspace, table)
  .where("name=?", "Jane").collect
I get the appropriate output for the query.
When I try to use the where statement to pass the query as a whole string, I get an error.
I receive the query as JSON:
{"clause": " name = 'Jane' "}
and then turn it into a string.
When running
val query = (json \ "clause").get.as[String]
//turns json value into a string
val test = sc.cassandraTable[Person](keyspace,table)
.where(query).collect
I get the following error:
java.io.IOException: Exception during preparation of SELECT "uuid", "person", "age" FROM "test"."users" WHERE token("uuid") > ? AND token("uuid") <= ? AND name = Jane ALLOW FILTERING: line 1:232 no viable alternative at input 'ALLOW' (...<= ? AND name = [Jane] ALLOW...)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.createStatement(CassandraTableScanRDD.scala:288)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.com$datastax$spark$connector$rdd$CassandraTableScanRDD$$fetchTokenRange(CassandraTableScanRDD.scala:302)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$18.apply(CassandraTableScanRDD.scala:328)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$18.apply(CassandraTableScanRDD.scala:328)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at com.datastax.spark.connector.util.CountingIterator.hasNext(CountingIterator.scala:12)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at com.datastax.spark.connector.util.CountingIterator.foreach(CountingIterator.scala:4)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
I suspect that when I turn the JSON value " name = 'Jane' " into a string, I lose the single quotes, so I get " name = Jane ", which of course raises an error. I tried escaping the single quotes with \ and with a second pair of single quotes around the name, as in {"clause": " name = ''Jane'' "}. Neither solves the issue.
Edit: After further testing, it's definitely the JSON that loses the single quotes, and CQL needs them to perform the query. Can anyone suggest a way to escape or preserve the single quotes? I tried escaping with \ and with doubled single quotes ''. Is there a way to use JSON to provide complete CQL clauses?
Please use the Unicode escape \u0027 for the single quotes inside the JSON payload; the JSON parser decodes it back into a literal ', so the quotes survive into the CQL clause.
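To see the decoding in action, a minimal check (in Python rather than the question's Scala, but the JSON semantics are the same):
import json

# the raw string keeps the backslashes, so the \u0027 escapes reach the JSON
# parser, which decodes each one back into a literal single quote
payload = r'{"clause": " name = \u0027Jane\u0027 "}'
print(json.loads(payload)["clause"])  # prints:  name = 'Jane'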