How to write this in SQLAlchemy

I need to write a query in SQLAlchemy to check some string parameters against fields that contain arrays of strings (PostgreSQL). The columns
city
state
address_line_1
zip_code
phone_numbers
are all of type text[].
select_statement = bdb.get_select_statement(business_schema)\
.where(((text('array[:acity] <# city')
and text('array[:astate] <# state')
and text('array[:aaddress] <# address_line_1'))
or
(text('array[:aaddress] <# address_line_1') and text('array[:azip_code] <# zip_code')))
and (text('array[:aphone] <# phone_numbers')))\
.params(aaddress = address_line_1, acity = city, astate = state, azip_code = zip_code, aphone = phone_number)
The problem is that I receive an exception when I do this: "Boolean value of this clause is not defined".
The plain SQL to be written is:
select * from business
where (('address_line1' = ANY (address_line_1)
        and 'acity' = ANY (city)
        and 'state' = ANY (state))
    or ('address_line1' = ANY (address_line_1) and 'zip' = ANY (zip_code)))
  and 'phone' = ANY (phone_numbers);
Any ideas on how to do it?
Thanks in advance!

You need to use the and_() and or_() functions, or alternatively the & and | operators, not the Python and and or keywords.
Also, the operations you're doing with array indexing and "<#" are easier to do (in 0.8) like this:
mytable.c.array[:"acity"].op('<#')(mytable.c.city)
see ARRAY.
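For reference, here is a rough, untested sketch of the whole predicate from the question using and_()/or_() together with the 0.8 contains() operator. It assumes the columns are mapped as postgresql.ARRAY and that a business Table object (a hypothetical name standing in for whatever bdb/business_schema gives you) exposes them via .c; the bind values are the same ones the question passes to .params():
from sqlalchemy import and_, or_

select_statement = bdb.get_select_statement(business_schema).where(
    and_(
        or_(
            # col.contains([x]) renders col @> ARRAY[x], i.e. x = ANY(col)
            and_(business.c.city.contains([city]),
                 business.c.state.contains([state]),
                 business.c.address_line_1.contains([address_line_1])),
            and_(business.c.address_line_1.contains([address_line_1]),
                 business.c.zip_code.contains([zip_code])),
        ),
        business.c.phone_numbers.contains([phone_number]),
    )
)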

With SQLAlchemy 0.8, this can be written as:
mytable.c.myarraycolumn.contains(['something'])
Or, with a declarative class:
query.filter(MyTable.myarraycolumn.contains(['something']))
http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html#sqlalchemy.dialects.postgresql.ARRAY

Here is how we got this to work in SQLAlchemy 0.9.
SQLAlchemy doesn't know how to convert a Python type into the array element type in these queries. Since our array elements were VARCHAR(256) instead of TEXT, we had to add a cast expression inside the array literal.
from sqlalchemy import cast, String
from sqlalchemy.dialects.postgresql import array

session.query(models.TableClass).filter(
    models.TableClass.arraycolumn.contains(
        # contains() generates @>, and @> always takes an array;
        # it's more like has-subset
        array([
            # the array() literal constructor needs an iterable
            cast(
                'array-element-to-find',
                # SQLAlchemy does not know how to convert a Python string
                # to an SQL VARCHAR type here
                String(256),
            )
        ])
    )
).all()
about @> http://www.postgresql.org/docs/9.1/static/functions-array.html
about the array literal http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html#sqlalchemy.dialects.postgresql.array
note about cast https://groups.google.com/forum/#!topic/sqlalchemy/5aTmT4rUJo4

Related

Pattern matching using regex with Scala Anorm

I'm using Scala (2.11) and Play Framework (2.3) and trying to run a query using a helper function to get results through pattern matching. The function is as follows:
def resultsfunc() = {
  val gradeRegex = "^Class 5\\."
  val currRegex = "\\.NCERT$"
  DB.withConnection { implicit c =>
    val filterQuery = SQL(
      """
      select * from tbl_graphs
      where graph_name REGEXP '{grade_regex}' and
      graph_name REGEXP '{curr_regex}' and org_id = 4
      """)
      .on("grade_regex" -> gradeRegex,
          "curr_regex" -> currRegex)
    filterQuery().map { graphRecord =>
      new ResultObj(graphRecord[Long]("id"),
                    graphRecord[String]("name"))
    }.toList
  }
}
I don't get any errors, but I get an empty result even though there are multiple records that match the pattern. The same query works if I run it in MySQL Workbench, and when I printed filterQuery the arguments were also mapped correctly.
Should pattern matching with regex be carried out differently in Scala Anorm?
It has absolutely nothing to do specifically with Anorm.
Make sure that when you execute the query manually with exactly the same data and parameters, you get results.
When using JDBC (even through Anorm), string parameters must not be quoted in the query statement (... '{grade_regex}' ...).
For a long time now, it has been recommended to use Anorm interpolation (SQL"SELECT x FROM y WHERE z = ${v}") rather than the SQL(..) function.

Format jinja template as INT in operator parameter in Airflow

I'm trying to format a Jinja template parameter as an integer so I can pass it to an operator that expects an INT (it could be a custom operator or a PythonOperator), and I'm not able to.
See the sample DAG below. I'm using the built-in Jinja filter | int, but that's not working: the type remains <class 'str'>.
I'm still new to Airflow, but I don't think this is possible based on what I've read about how Jinja/Airflow work. I see two main workarounds:
Change the operator parameter to expect a string and handle the conversion underneath.
Handle this conversion in a separate PythonOperator which converts the string to an int and exports it using xcom/task context. (I think this will work, but I'm not sure.)
Please let me know of any other workarounds.
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.utils.dates import days_ago


def greet(mystr):
    print(mystr)
    print(type(mystr))


default_args = {
    'owner': 'airflow',
    'start_date': days_ago(2)
}

dag = DAG(
    'template_dag',
    default_args=default_args,
    description='template',
    schedule_interval='0 13 * * *'
)

with dag:
    # foo = "{{ var.value.my_custom_var | int }}"  # from variable
    foo = "{{ execution_date.int_timestamp | int }}"  # built-in macro

    # could be MyCustomOperator
    opr_greet = PythonOperator(task_id='greet',
                               python_callable=greet,
                               op_kwargs={'mystr': foo}
                               )
    opr_greet
Airflow 1.10.11
Updated answer:
As of Airflow 2.1, you can pass render_template_as_native_obj=True to the DAG and Airflow will return the Python type (dict, int, etc.) instead of a string. See this pull request.
dag = DAG(
    dag_id="example_template_as_python_object",
    schedule_interval=None,
    start_date=days_ago(2),
    render_template_as_native_obj=True,
)
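With that flag set, the templated op_kwargs value from the question should arrive in the callable as a native Python int rather than a string. A minimal sketch reusing the greet task from the question (untested against your exact setup):
with dag:
    foo = "{{ execution_date.int_timestamp | int }}"  # built-in macro
    opr_greet = PythonOperator(task_id='greet',
                               python_callable=greet,
                               op_kwargs={'mystr': foo})
    # with render_template_as_native_obj=True, greet() now prints <class 'int'>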
Old answer for prior versions:
I found a related question that provides the best workaround, IMO.
Airflow xcom pull only returns string
The trick is to use a PythonOperator, do the datatype conversion there, and then call the main operator with the parameter. Below is an example of converting a JSON string to a dict object. The same applies to converting a string to an int, etc.
import json

def my_func(ds, **kwargs):
    ti = kwargs['ti']
    body = ti.xcom_pull(task_ids='previous_task_id')
    import_body = json.loads(body)
    op = CloudSqlInstanceImportOperator(
        project_id=GCP_PROJECT_ID,
        body=import_body,
        instance=INSTANCE_NAME,
        gcp_conn_id='postgres_default',
        task_id='sql_import_task',
        validate_body=True,
    )
    op.execute()

p = PythonOperator(task_id='python_task', python_callable=my_func,
                   provide_context=True)  # needed in Airflow 1.x so ds/**kwargs are passed
I believe Jinja is always going to return a string: the template is a string, and replacing values inside the template will still give you a string.
If you are sure that foo is always an integer, you can do
opr_greet = PythonOperator(task_id='greet',
                           python_callable=greet,
                           op_kwargs={'mystr': int(foo)}
                           )
Update: it looks like Airflow uses the render method from Jinja2, which returns a Unicode string.
At this point, if you can modify greet, it is easier to manage the input parameter in that function.
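A minimal sketch of that approach, assuming you can change the callable (names taken from the question's DAG):
def greet(mystr):
    # the rendered template always arrives as a string,
    # so do the conversion inside the callable
    value = int(mystr)
    print(value)
    print(type(value))  # <class 'int'>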

Django / PostgreSQL jsonb (JSONField) - convert select and update into one query

Versions: Django 1.10 and Postgres 9.6
I'm trying to modify a nested JSONField's key in place without a round trip to Python. The reason is to avoid race conditions and multiple queries overwriting the same field with different updates.
I tried to chain the methods in the hope that Django would make a single query, but it's being logged as two:
Original field value (demo only, real data is more complex):
from exampleapp.models import AdhocTask
record = AdhocTask.objects.get(id=1)
print(record.log)
> {'demo_key': 'original'}
Query:
from django.db.models import F
from django.db.models.expressions import RawSQL
(AdhocTask.objects.filter(id=25)
 .annotate(temp=RawSQL(
     # `jsonb_set` takes the current json value of the `log` field,
     # takes the nominated key ("demo_key" in this example)
     # and replaces its value with the json provided ("new value").
     # The raw SQL is wrapped in triple quotes to avoid escaping each quote.
     """jsonb_set(log, '{"demo_key"}','"new value"', false)""", []))
 # Finally, take the temp field and overwrite the original JSONField
 .update(log=F('temp'))
)
Query history (shows this as two separate queries):
from django.db import connection
print(connection.queries)
> {'sql': 'SELECT "exampleapp_adhoctask"."id", "exampleapp_adhoctask"."description", "exampleapp_adhoctask"."log" FROM "exampleapp_adhoctask" WHERE "exampleapp_adhoctask"."id" = 1', 'time': '0.001'},
> {'sql': 'UPDATE "exampleapp_adhoctask" SET "log" = (jsonb_set(log, \'{"demo_key"}\',\'"new value"\', false)) WHERE "exampleapp_adhoctask"."id" = 1', 'time': '0.001'}]
It would be much nicer without RawSQL.
Here's how to do it:
from django.db.models.expressions import Func

class ReplaceValue(Func):
    function = 'jsonb_set'
    template = "%(function)s(%(expressions)s, '{\"%(keyname)s\"}','\"%(new_value)s\"', %(create_missing)s)"
    arity = 1

    def __init__(
        self, expression: str, keyname: str, new_value: str,
        create_missing: bool=False, **extra,
    ):
        super().__init__(
            expression,
            keyname=keyname,
            new_value=new_value,
            create_missing='true' if create_missing else 'false',
            **extra,
        )

AdhocTask.objects.filter(id=25) \
    .update(log=ReplaceValue(
        'log',
        keyname='demo_key',
        new_value='another value',
        create_missing=False,
    ))
ReplaceValue.template is the same as your raw SQL statement, just parametrized.
(jsonb_set(log, \'{"demo_key"}\',\'"another value"\', false)) from your query is now jsonb_set("exampleapp_adhoctask"."log", \'{"demo_key"}\',\'"another value"\', false). The parentheses are gone (you can get them back by adding them to the template) and log is referenced in a different way.
Anyone interested in more details regarding jsonb_set should have a look at table 9-45 in postgres' documentation: https://www.postgresql.org/docs/9.6/static/functions-json.html#FUNCTIONS-JSON-PROCESSING-TABLE
Rubber duck debugging at its best - in writing the question, I've realised the solution. Leaving the answer here in hope of helping someone in future:
Looking at the queries, I realised that the RawSQL was actually being deferred until query two, so all I was doing was storing the RawSQL as a subquery for later execution.
Solution:
Skip the annotate step altogether and pass the RawSQL expression straight to the .update() call. This lets you dynamically update PostgreSQL jsonb sub-keys on the database server without overwriting the whole field:
(AdhocTask.objects.filter(id=25)
.update(log=RawSQL(
"""jsonb_set(log, '{"demo_key"}','"another value"', false)""",[])
)
)
> 1 # Success
print(connection.queries)
> {'sql': 'UPDATE "exampleapp_adhoctask" SET "log" = (jsonb_set(log, \'{"demo_key"}\',\'"another value"\', false)) WHERE "exampleapp_adhoctask"."id" = 1', 'time': '0.001'}]
print(AdhocTask.objects.get(id=1).log)
> {'demo_key': 'another value'}

Spark UDF returns a length of field instead of length of value

Consider the code below
object SparkUDFApp {
  def main(args: Array[String]) {
    val df = ctx.read.json(".../example.json")
    df.registerTempTable("example")
    val fn = (_: String).length // % 10
    ctx.udf.register("len10", fn)
    val res0 = ctx sql "SELECT len10('id') FROM example LIMIT 1" map {_ getInt 0} collect
    println(res0.head)
  }
}
JSON example
{"id":529799371026485248,"text":"Example"}
The code should return the length of the field's value from the JSON (e.g. the value of 'id' is 18 digits long). But instead of returning '18' it returns '2', which I suppose is the length of 'id' itself.
So my question is: how do I rewrite the UDF to fix it?
The problem is that you are passing the string 'id' as a literal to your UDF, so it is interpreted as one instead of as a column (notice that it has 2 letters, which is why it returns that number). To solve this, just change the way you formulate the SQL query.
E.g.
val res0 = ctx sql "SELECT len10(id) FROM example LIMIT 1" map {_ getInt 0} collect
// Or alternatively
val len10 = udf((word: String) => word.length)
df.select(len10(df("id")).as("length")).show()

Order of fields in a type for FileHelpers

I'm reading a simple CSV file with FileHelpers - the file is just key, value pairs (string, int64).
The F# type I've written for this is:
type MapVal (key:string, value:int64) =
    new() = MapVal("", 0L)
    member x.Key = key
    member x.Value = value
I'm missing something elementary here, but FileHelpers always assumes the order of fields to be the reverse of what I've specified - as in Value, Key.
let dfe = new DelimitedFileEngine(typeof<MapVal>)
let recs = dfe.ReadFile(@"D:\e.dat")
recs |> Seq.length
What am I missing here?
The order of primary constructor parameters doesn't necessarily determine the order that fields occur within a type (in fact, depending on how the parameters are used, they may not even result in a field being generated). The fact that FileHelpers doesn't provide a way to use properties instead of fields is unfortunate, in my opinion. If you want better control over the physical layout of the class, you'll need to declare the fields explicitly:
type MapVal =
    val mutable key : string
    val mutable value : int64
    new() = { key = ""; value = 0L }
    new(k, v) = { key = k; value = v }
    member x.Key = x.key
    member x.Value = x.value
The library uses the order of the fields in the declaration, but it looks like F# works differently; in the latest stable version of the library you can use the [FieldOrder(1)] attribute to specify the order of the fields.
http://teamcity.codebetter.com/viewLog.html?buildId=lastSuccessful&buildTypeId=bt66&tab=artifacts&guest=1
Cheers