sqlalchemy/postgres: arithmetic on JSON fields?

I am using sqlalchemy on a postgres database, and I'm trying to do arithmetic in a SELECT on two JSON fields which represent floats. However, I have not figured out how to make this work.
Assume I have properly defined a table called transactions which contains a JSON column called cost_data, and assume that this JSON structure contains two attributes called cost and subtotal which represent float values.
In a SELECT statement, I generate the sum of those two fields as follows:
(cast(transactions.c.cost_data['subtotal'], sqlalchemy.Float) + cast(transactions.c.cost_data['cost'], sqlalchemy.Float)).label('total_cost')
This generates the following SQL fragment ...
CAST((transactions.cost_data -> %(cost_data_6)s) AS FLOAT) + CAST((transactions.cost_data -> %(cost_data_7)s) AS FLOAT) AS total_cost
(where cost_data_6 and cost_data_7 get set to subtotal and cost, respectively).
However, I get the following error:
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) cannot cast type json to double precision
If I remove the casting and do it as follows, it also fails ...
(transactions.c.cost_data['subtotal'] + transactions.c.cost_data['cost']).label('total_cost')
I get this error ...
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) operator does not exist: json || json
LINE 9: ... (transactions.cost_data -> 'subtotal') || (transa...
^
Clearly, this is because the fields come in as strings, and the "+" operator gets interpreted as string concatenation.
Also, if I use the Python float() built-in, it fails as well ...
(float(transactions.c.cost_data['subtotal']) + float(transactions.c.cost_data['cost'])).label('total_cost')
The Python interpreter doesn't even execute the code; it gives this error:
TypeError: float() argument must be a string or a number, not 'BinaryExpression'
So how can I perform the addition of those two fields using sqlalchemy?
PS: the following is a typical cost_data column value ...
{"cost":3.99,"subtotal":12.34}

OK. I finally figured it out. I have to pass each reference through the astext operator before applying cast, as follows ...
(transactions.c.cost_data['subtotal'].astext.cast(sqlalchemy.Float) + transactions.c.cost_data['cost'].astext.cast(sqlalchemy.Float)).label('total_cost')
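For context, a minimal sketch of the full query built this way, assuming the transactions table from the question and an already-configured engine (names are illustrative):
import sqlalchemy

total_cost = (
    transactions.c.cost_data['subtotal'].astext.cast(sqlalchemy.Float)
    + transactions.c.cost_data['cost'].astext.cast(sqlalchemy.Float)
).label('total_cost')

with engine.connect() as conn:
    for row in conn.execute(sqlalchemy.select([total_cost])):
        print(row.total_cost)  # e.g. 16.33 for {"cost":3.99,"subtotal":12.34}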

The accepted answer works when using SQLAlchemy with PostgreSQL only, but it is not portable to other databases that have JSON types.
If you want to write code that is portable between PostgreSQL, MySQL, and SQLite, then you should cast a JSON column using one of the column's as_*() methods.
OP's answer would look like:
transactions.c.cost_data['subtotal'].as_float()
The methods .as_float(), .as_string(), .as_boolean(), and .as_integer() have been available on SQLAlchemy JSON columns since SQLAlchemy 1.3.11; see the SQLAlchemy documentation for details.
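For completeness, a hedged sketch of the portable spelling against the same (assumed) transactions table, on SQLAlchemy 1.3.11 or later:
import sqlalchemy

total_cost = (
    transactions.c.cost_data['subtotal'].as_float()
    + transactions.c.cost_data['cost'].as_float()
).label('total_cost')

query = sqlalchemy.select([total_cost])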

Related

How to extract values from a numeric-keyed nested JSON field in MySQL

I have a MySQL table with a JSON column called sent. The entries in the column have information like below:
{
    "data": {
        "12": "1920293"
    }
}
I'm trying to use the MySQL query:
select sent->"$.data.12" from mytable
but I get an exception:
Invalid JSON path expression. The error is around character position 9.
Any idea how I can extract the information? The query works fine for non-numeric subfields.
@Ibrahim,
You have an error in your query: if you use a number (or words with spaces) as a key in a MySQL JSON column, you need to double-quote it in the path expression.
The correct MySQL statement in your case is therefore:
select sent->'$.data."12"' FROM mytable;
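A quick sketch of running the corrected query from Python, assuming mysql-connector-python and the mytable table from the question (connection parameters are placeholders):
import mysql.connector

conn = mysql.connector.connect(user="app", database="mydb")
cur = conn.cursor()
# The numeric key must be double-quoted inside the JSON path.
cur.execute("""SELECT sent->'$.data."12"' FROM mytable""")
print(cur.fetchone()[0])  # -> "1920293"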

MySQL 5.7 - Query to set the value of a JSON key to a JSON Object

Using MySQL 5.7, how do I set the value of a JSON key in a JSON column to a JSON object rather than a string?
I used this query:
SELECT json_set(profile, '$.twitter', '{"key1":"val1", "key2":"val2"}')
from account WHERE id=2
Output:
{"twitter": "{\"key1\":\"val1\", \"key2\":\"val2\"}", "facebook": "value", "googleplus": "google_val"}
But it seems to treat it as a string, since the output escapes the JSON characters in it. Is it possible to do that without using JSON_OBJECT()?
There are a couple of options that I know of:
1) Use the JSON_UNQUOTE function to unquote the output (i.e. not cast it to a string), as documented in the MySQL manual.
2) Use the ->> operator and select a specific path, as in the sketch after this list; also documented in the manual.
3) Disable backslashes as an escape character. This has a lot of implications; I haven't tried it, so I don't even know whether it works, but it's mentioned in the docs.
On balance, I'd either use the ->> operator or handle the conversion on the client side, depending on what you want to do.
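A hedged sketch of options 1 and 2 from Python, again with mysql-connector-python; note the ->> operator needs MySQL 5.7.13+, and the connection parameters are placeholders:
import mysql.connector

conn = mysql.connector.connect(user="app", database="mydb")
cur = conn.cursor()

# Option 1: JSON_UNQUOTE strips the surrounding quotes and unescapes the value.
cur.execute("SELECT JSON_UNQUOTE(profile->'$.twitter') FROM account WHERE id = 2")

# Option 2: ->> is shorthand for JSON_UNQUOTE(JSON_EXTRACT(...)).
cur.execute("SELECT profile->>'$.twitter' FROM account WHERE id = 2")
print(cur.fetchone()[0])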

How to aggregate (min/max etc.) over Django JSONField data?

I'm using Django 1.9 with its built-in JSONField and Postgres 9.4.
In my model's attrs json field I store objects with some values, including numbers. And I need to aggregate over them to find min/max values.
Something like this:
Model.objects.aggregate(min=Min('attrs__my_key'))
Also, it would be useful to extract specific keys:
Model.objects.values_list('attrs__my_key', flat=True)
The above queries fail with
FieldError: "Cannot resolve keyword 'my_key' into field. Join on 'attrs' not permitted."
Is it possible somehow?
Notes:
I know how to make a plain Postgres query to do the job, but am searching specifically for an ORM solution to have the ability to filter etc.
I suppose this can be done with a (relatively) new query expressions/lookups API, but I haven't studied it yet.
From Django 1.11 (which isn't out yet, so this might change) you can use django.contrib.postgres.fields.jsonb.KeyTextTransform instead of RawSQL.
In Django 1.10 you have to copy/paste KeyTransform into your own KeyTextTransform and replace the -> operator with ->> and #> with #>>, so it returns text instead of json objects.
from django.contrib.postgres.fields.jsonb import KeyTextTransform
from django.db.models import Min

Model.objects.annotate(
    val=KeyTextTransform('json_field_key', 'blah__json_field')
).aggregate(min=Min('val'))
You can even include KeyTextTransforms in SearchVectors for full-text search:
from django.contrib.postgres.search import SearchVector

Model.objects.annotate(
    search=SearchVector(
        KeyTextTransform('jsonb_text_field_key', 'json_field')
    )
).filter(search='stuff I am searching for')
Remember you can also index jsonb fields, so you should consider that based upon your specific workload.
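As a hedged sketch of that indexing note, a GIN index can be declared on the model (model and field names here are illustrative; GinIndex is available from Django 1.11 in django.contrib.postgres.indexes):
from django.contrib.postgres.fields import JSONField
from django.contrib.postgres.indexes import GinIndex
from django.db import models

class Item(models.Model):
    attrs = JSONField()

    class Meta:
        # GIN indexes speed up jsonb containment and key lookups.
        indexes = [GinIndex(fields=['attrs'])]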
For those who are interested, I've found a solution (or at least a workaround).
from django.db.models import Min
from django.db.models.expressions import RawSQL

Model.objects.annotate(
    val=RawSQL("((attrs->>%s)::numeric)", (json_field_key,))
).aggregate(min=Min('val'))
Note that the attrs->>%s expression becomes something like attrs->>'width' after parameter binding (note the single quotes). So if you hardcode the key name, remember to insert the quotes yourself or you will get an error.
/// A little bit off-topic ///
One more tricky issue, not related to Django itself, still needs to be handled somehow. Since attrs is a json field with no restrictions on its keys and values, you can (depending on your application logic) end up with non-numeric values under, for example, the width key. In that case Postgres raises a DataError when executing the above query (NULL values, meanwhile, are ignored, so those are fine). If you can simply catch the error, no problem, you're lucky. In my case I needed to ignore wrong values, and the only way to do that here is to write a custom Postgres function that suppresses casting errors.
create or replace function safe_cast_to_numeric(text) returns numeric as $$
begin
    return cast($1 as numeric);
exception
    when invalid_text_representation then
        return null;
end;
$$ language plpgsql immutable;
And then use it to cast text to numbers:
Model.objects.annotate(
    val=RawSQL("safe_cast_to_numeric(attrs->>%s)", (json_field_key,))
).aggregate(min=Min('val'))
Thus we get a fairly solid solution for something as dynamic as JSON.
I know this is a bit late (several months) but I came across the post while trying to do this. Managed to do it by:
1) using KeyTextTransform to convert the jsonb value to text
2) using Cast to convert it to integer, so that the SUM works:
from django.contrib.postgres.fields.jsonb import KeyTextTransform
from django.db.models import IntegerField, Sum
from django.db.models.functions import Cast

q = myModel.objects.filter(type=9) \
    .annotate(numeric_val=Cast(KeyTextTransform(sum_field, 'data'), IntegerField())) \
    .aggregate(Sum('numeric_val'))
print(q)
where 'data' is the jsonb property, and 'numeric_val' is the name of the variable I create by annotating.
Hope this helps somebody!
It is possible to do this using a Postgres function
https://www.postgresql.org/docs/9.5/functions-json.html
from django.db.models import F, FloatField, Func, Min
from django.db.models.expressions import Value
from django.db.models.functions import Cast

text = Func(F(json_field), Value(json_key), function='jsonb_extract_path_text')
floatfield = Cast(text, FloatField())
Model.objects.aggregate(min=Min(floatfield))
This is much better than using RawSQL because it doesn't break if you write a more complex query, where Django uses aliases and where there are field-name collisions. There is so much going on in the ORM that can bite you with hand-written implementations.
Since Django 3.1 the KeyTextTransform function on a JSON field works for all database backends. It maps to the ->> operator in Postgres.
It can be used to annotate a specific JSON value inside a JSONField on the queryset results before you aggregate it. A clearer example of how to use it:
First we need to annotate the key you want to aggregate. So if you have a Django model with a JSONField named data, and the JSON content looks like this:
{
    "age": 43,
    "name": "John"
}
You would annotate the queryset as follows:
from django.db.models import IntegerField
from django.db.models.fields.json import KeyTextTransform
from django.db.models.functions import Cast

qs = Model.objects.annotate(
    age=Cast(KeyTextTransform("age", "data"), IntegerField())
)
The Cast is needed to stay compatible with all database backends.
Now you can aggregate to your liking:
from django.db.models import Min, Max, Avg, IntegerField
from django.db.models.functions import Cast, Round
qs.aggregate(
    min_age=Round(Min("age")),
    max_age=Round(Max("age")),
    avg_age=Cast(Round(Avg("age")), IntegerField()),
)
>>> {'min_age': 25, 'max_age': 82, 'avg_age': 33}
Seems there is no native way to do it.
I worked around like this:
my_queryset = Product.objects.all() # Or .filter()...
max_val = max(o.my_json_field.get(my_attrib, '') for o in my_queryset)
This is far from ideal, since it is done at the Python level (and not at the SQL level).
from django.db.models import FloatField, Max, Min
from django.db.models.functions import Cast

qs = Model.objects.annotate(
    val=Cast('attrs__key', FloatField())
).aggregate(
    min=Min("val"),
    max=Max("val")
)

Handling Unicode sequences in postgresql

I have some JSON data stored in a JSON (not JSONB) column in my postgresql database (9.4.1). Some of these JSON structures contain unicode sequences in their attribute values. For example:
{"client_id": 1, "device_name": "FooBar\ufffd\u0000\ufffd\u000f\ufffd" }
When I try to query this JSON column (even if I'm not directly trying to access the device_name attribute), I get the following error:
ERROR: unsupported Unicode escape sequence
Detail: \u0000 cannot be converted to text.
You can recreate this error by executing the following command on a postgresql server:
select '{"client_id": 1, "device_name": "FooBar\ufffd\u0000\ufffd\u000f\ufffd" }'::json->>'client_id'
The error makes sense to me - there is simply no way to represent the unicode sequence NULL in a textual result.
Is there any way for me to query the same JSON data without having to perform "sanitation" on the incoming data? These JSON structures change regularly so scanning a specific attribute (device_name in this case) would not be a good solution since there could easily be other attributes that might hold similar data.
After some more investigations, it seems that this behavior is new for version 9.4.1 as mentioned in the changelog:
...Therefore \u0000 will now also be rejected in json values when conversion to de-escaped form is required. This change does not break the ability to store \u0000 in json columns so long as no processing is done on the values...
Was this really the intention? Is a downgrade to pre 9.4.1 a viable option here?
As a side note, this property is taken from the name of the client's mobile device - it's the user that entered this text into the device. How on earth did a user insert NULL and REPLACEMENT CHARACTER values?!
\u0000 is the one Unicode code point that is not valid in a PostgreSQL text value. I see no other way than to sanitize the string.
Since json is just a string in a specific format, you can use the standard string functions, without worrying about the JSON structure. A one-line sanitizer to remove the code point would be:
SELECT (regexp_replace(the_string::text, '\\u0000', '', 'g'))::json;
But you can also insert any character of your liking, which would be useful if the zero code point is used as some form of delimiter.
Note also the subtle difference between what is stored in the database and how it is presented to the user. You can store the code point in a JSON string, but you have to pre-process it to some other character before processing the value as a json data type.
The solution by Patrick didn't work out of the box for me; an error was still always thrown. I then researched a little more and was able to write a small custom function that fixed the issue for me.
First I could reproduce the error by writing:
select json '{ "a": "null \u0000 escape" }' ->> 'a' as fails
Then I added a custom function which I used in my query:
CREATE OR REPLACE FUNCTION null_if_invalid_string(json_input JSON, record_id UUID)
RETURNS JSON AS $$
DECLARE json_value JSON DEFAULT NULL;
BEGIN
    BEGIN
        -- Accessing a key as text forces de-escaping; \u0000 raises an exception here.
        json_value := json_input ->> 'location';
    EXCEPTION WHEN OTHERS
    THEN
        RAISE NOTICE 'Invalid json value: "%". Returning NULL.', record_id;
        RETURN NULL;
    END;
    RETURN json_input;
END;
$$ LANGUAGE plpgsql;
To call the function, do this; you should not receive an error:
select null_if_invalid_string('{ "a": "null \u0000 escape" }', id) from my_table
Whereas this should return the json as expected:
select null_if_invalid_string('{ "a": "null" }', id) from my_table
You can fix all entries with SQL like this:
update ___MY_TABLE___
set settings = REPLACE(settings::text, '\u0000', '' )::json
where settings::text like '%\u0000%'
I found a solution that works for me:
SELECT (regexp_replace(the_string::text, '(?<!\\)\\u0000', '', 'g'))::json;
Note the match pattern '(?<!\\)\\u0000': the negative lookbehind only removes \u0000 sequences that are not preceded by another backslash, i.e. real escape sequences rather than literal backslash-u text.
Just for web searchers who land here:
This is not a solution to the exact question, but in some similar cases it is the solution, if you just don't want the rows whose JSON contains null bytes. Just add:
AND json NOT LIKE '%\u0000%'
to your WHERE clause.
You could also use the REPLACE SQL-syntax to sanitize the data:
REPLACE(source_field, '\u0000', '' );

Get data type of JSON field in Postgres

I have a Postgres JSON column where some rows have data like:
{"value":90}
{"value":99.9}
...whereas other rows have data like:
{"value":"A"}
{"value":"B"}
The -> operator (i.e. fields->'value') would cast the value to JSON, whereas the ->> operator (i.e. fields->>'value') casts the value to text, as reported by pg_typeof. Is there a way to find the "actual" data type of a JSON field?
My current approach would be to use a regex to determine whether the occurrence of fields->>'value' in fields::text is surrounded by double quotes.
Is there a better way?
As @pozs mentioned in a comment, the json_typeof(json) and jsonb_typeof(jsonb) functions are available from version 9.4:
Returns the type of the outermost JSON value as a text string. Possible types are object, array, string, number, boolean, and null.
https://www.postgresql.org/docs/current/functions-json.html
Applying this to your case, an example of how it could be used for this problem (cast the column to jsonb first if it is a plain json column, since jsonb_each expects jsonb):
SELECT
    json_data.key,
    jsonb_typeof(json_data.value) AS json_data_type,
    COUNT(*) AS occurrences
FROM tablename, jsonb_each(tablename.columnname::jsonb) AS json_data
GROUP BY 1, 2
ORDER BY 1, 2;
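If you only need the type of that one value rather than a survey of all keys, a minimal sketch with psycopg2 (table and column names taken from the question, connection parameters are placeholders):
import psycopg2

conn = psycopg2.connect(dbname="mydb")
cur = conn.cursor()
# json_typeof returns 'number' for {"value":90} and 'string' for {"value":"A"}.
cur.execute("SELECT json_typeof(fields->'value') FROM tablename")
for (value_type,) in cur.fetchall():
    print(value_type)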
I ended up getting access to PLv8 in my environment, which made this easy:
CREATE FUNCTION value_type(fields JSON) RETURNS TEXT AS $$
return typeof fields.value;
$$ LANGUAGE plv8;
As mentioned in the comments, there will be a native function for this in 9.4.