Sum values from multiple JSON rows in Postgres SQL - json

I have JSON data stored in a Postgres DB, e.g. in a column jsondata.
Each row in the table represents one JSON document of the same format, e.g.
{
"ID": "001",
"Name": "Britney",
"DebtAmount": "100.23"
}
There are multiple records with the above structure, each with a different ID.
How do I write a JSON query to get the total sum of DebtAmount across multiple records?
Thanks a lot for your help.
If there is already an existing solution, please point me to it.

You can use the json_each_text() function to extract key-value pairs from the JSON object, then apply a SUM() aggregation, filtering on the key name DebtAmount and casting the values to float, such as:
SELECT SUM(value::float)
FROM t,
LATERAL json_each_text(js)
WHERE key = 'DebtAmount'
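Since the key name is fixed, a simpler alternative is to pull the field out directly with the ->> operator and cast it; a sketch assuming a table t with the JSON in a column jsondata, as in the question:

```sql
-- ->> returns the field as text; cast to numeric before summing
SELECT SUM((jsondata ->> 'DebtAmount')::numeric) AS total_debt
FROM t;
```

This avoids unpacking every key-value pair when only one key is needed.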

Related

Aggregating JSON arrays and calculating set union size in MySQL

I have a use case where I need to calculate set overlaps over arbitrary time periods.
My data looks like this, when loaded into pandas. In MySQL the user_ids is stored with the data type JSON.
I need to calculate the size of the union set when grouping by the date column. E.g., in the example below, if 2021-01-31 is grouped with 2021-02-28, then the result should be
In [1]: len(set([46, 44, 14] + [44, 7, 36]))
Out[1]: 5
Doing this in Python is trivial, but I'm struggling with how to do this in MySQL.
Aggregating the arrays into an array of arrays is easy:
SELECT
date,
JSON_ARRAYAGG(user_ids) as uids
FROM mytable
GROUP BY date
but after that I face two problems:
How to flatten the array of arrays into a single array
How to extract distinct values (e.g. convert the array into a set)
Any suggestions? Thank you!
PS. In my case I can probably get by with doing the flattening and set conversion on the client side, but I was pretty surprised at how difficult something simple like this turned out to be... :/
As mentioned in other comments, storing JSON arrays in your database is sub-optimal and should really be avoided. That aside, it is actually easier to first extract the JSON array elements (which also gets you the result you wanted from your second point):
SELECT mytable.date, jtable.VAL as user_id
FROM mytable, JSON_TABLE(user_ids, '$[*]' COLUMNS(VAL INT PATH '$')) jtable;
From here on out, we can group the dates again and recombine the user_ids into a JSON array with the JSON_ARRAYAGG function you already found:
SELECT mytable.date, JSON_ARRAYAGG(jtable.VAL) as user_ids
FROM mytable, JSON_TABLE(user_ids, '$[*]' COLUMNS(VAL INT PATH '$')) jtable
GROUP BY mytable.date;
You can try this out in this DB fiddle.
NOTE: this does require MySQL 8+ / MariaDB 10.6+.
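If the end goal is only the size of the union per group, the recombined array isn't needed at all: once JSON_TABLE has flattened the values, COUNT(DISTINCT ...) gives the set size directly. A sketch against the same mytable/user_ids names:

```sql
SELECT mytable.date,
       COUNT(DISTINCT jtable.VAL) AS num_unique_users
FROM mytable,
     JSON_TABLE(user_ids, '$[*]' COLUMNS (VAL INT PATH '$')) jtable
GROUP BY mytable.date;
```

Grouping several dates together (e.g. by quarter) would then just mean grouping on a derived date expression instead of the raw column.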
Thank you for the answers.
For anybody who's interested, the solution that I ended up with in the end was to store the data like this:
And then do the set calculations in pandas.
(
    df.groupby(pd.Grouper(key="date", freq="QS")).aggregate(
        num_unique_users=(
            "user_ids",
            lambda uids: len({u for ul in uids for u in ul}),
        ),
    )
)
I was able to reduce a 20GiB table to around 300MiB, which is fast enough to query and retrieve data from.

How to query an array field (AWS Glue)?

I have a table in AWS Glue, and the crawler has defined one field as array.
The content is in S3 files that have a json format.
The table is TableA, and the field is members.
There are a lot of other fields such as strings, booleans, doubles, and even structs.
I am able to query them all using a simple query such as:
SELECT
content.my_boolean,
content.my_string,
content.my_struct.value
FROM schema.tableA;
The issue is when I add content.members to the query.
The error I get is: [Amazon](500310) Invalid operation: schema "content" does not exist.
Content exists, because I am able to select other fields from the main key in the JSON (content).
It is probably related to how to query an array field in Spectrum.
Any ideas?
You have to alias the table to extract the fields from the external schema:
SELECT
a.content.my_boolean,
a.content.my_string,
a.content.my_struct.value
FROM schema.tableA a;
I had the same issue with my data; I really don't know why it needs this alias, but it works. If you need to access elements of an array, you have to explode it, like:
SELECT member.<your-field>
FROM schema.tableA a, a.content.members as member;
You need to create a Glue Classifier.
Select JSON as Classifier type
and for the JSON Path input the following:
$[*]
then run your crawler. It will infer your schema and populate your table with the correct fields instead of just one big array. Not sure if this was what you were looking for but figured I'd drop this here just in case others had the same problem I had.

Extract certain members of JSON array in MySQL

I have a table in MySQL where each row contains JSON returned from another system. The JSON looks something like:
[{"userId": "Dave"},{"userId": "Mary", "errorCode" : "DB Fail"}, {"userId": "Lorenza", "errorCode": "Web Error"}]
and I'm only interested in the members of the array containing an error code. In the future, these will be parsed into separate rows of their own table, but in the meantime, does MySQL offer a way to extract only those with an errorCode?
I can use JSON_EXTRACT to extract the errorCodes only
JSON_EXTRACT(jsonData, '$[*].errorCode') AS errorCodes
but I really want the rest of the member (userId in the example above)
You could use the JSON_CONTAINS function to find the records with an errorCode and then use JSON_EXTRACT on those records. Put the JSON_CONTAINS in the WHERE clause.
I don't think you can do this with a single query without known bounds on the number of elements, but you could use a stored procedure to run a loop.
E.g., each iteration runs LOCATE to find the position of "errorCode", and uses that location to run SUBSTR and/or SUBSTRING_INDEX to get the userId value and append it to another variable. The loop variable would just be the offset used in the LOCATE query.
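On MySQL 8.0+, the JSON_TABLE function avoids both the loop and the string surgery: it turns each array element into a row, and missing keys become NULL by default. A sketch assuming the JSON sits in a column jsonData of a table t (the table name is hypothetical):

```sql
SELECT jt.userId, jt.errorCode
FROM t,
     JSON_TABLE(jsonData, '$[*]'
       COLUMNS (userId    VARCHAR(64) PATH '$.userId',
                errorCode VARCHAR(64) PATH '$.errorCode')) jt
WHERE jt.errorCode IS NOT NULL;
```

This returns the whole member (userId plus errorCode) for exactly the elements that carry an errorCode.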

How to find the minimum value in a jsonb data using postgres?

I have this jsonb data format-
data
{"20161110" : {"6" : ["12", "11", "171.00"],"12" : ["13", "11", "170.00"],"18" : ["16", "11", "174.00"]}}
I want to find the minimum value out of the prices,
in this case, 170.00.
I have tried indexing, but I am only able to find data for specific terms (6, 12, 18), not the minimum across them.
What I have tried:
data::json->(select key from json_each_text(data::json) limit 1))::json#>>'{6,2}'
which gives me the result for the 6th term, that is 171.00.
If you want the minimum value of the third element in the arrays, then you will have to unpack the JSON document to get to the array to compare values. That goes somewhat like this (and assuming you indeed have a jsonb column and a primary key called id):
SELECT id, min((arr ->> 2)::numeric) AS min_price
FROM ( SELECT id, jdoc
FROM my_table, jsonb_each(data) d (key, jdoc) ) sub,
jsonb_each(jdoc) doc (key, arr)
GROUP BY 1;
In PostgreSQL there are table functions, functions that return a set of rows, like jsonb_each(). You should use these functions in the FROM list. These table functions can implicitly refer to columns from tables defined earlier in the list, like FROM my_table, jsonb_each(my_table.data), in which case a link between the two sources is made as if a join condition were specified between the two; in practice, the function gets called once for each of the rows of the source table and the function output is added to the list of available columns.
The JSON functions work only on the level of the JSON document that is explicitly specified. That could be the entire document (my_table.data in this case) or down to some path that you can specify. I am assuming here that the first key is a date value and that you therefore do not know the key in advance. The same goes for the sub-document. In these cases you use functions like jsonb_each(). The array position you apparently know exactly, so you can just index the array to find the price information. Note that these are apparently also in JSON format, so you should get the price as a text value with the ->> operator and then cast that to numeric so you can feed it to the min() function.
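On PostgreSQL 12+, the two levels of unknown keys can also be walked with a single SQL/JSON path expression instead of nested jsonb_each() calls; a sketch using the wildcard member accessor .* (assuming the same my_table.data column):

```sql
-- $.*.*[2] : any date key, any term key, third array element (the price)
SELECT min((price #>> '{}')::numeric) AS min_price
FROM my_table,
     jsonb_path_query(data, '$.*.*[2]') AS price;
```

The #>> '{}' step extracts the jsonb string value as text so it can be cast to numeric, as described above.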
I created this function for this. Hope it helps.
CREATE OR REPLACE FUNCTION min_json (data json)
returns numeric AS $$
BEGIN
  RETURN (
    SELECT min(value)
    FROM (
      SELECT (hstore(json_each_text(data)) -> 'value')::numeric AS value
    ) AS t
  );
END;
$$ LANGUAGE plpgsql;

MySQL getting JSON value

In my MySQL players table I have a column called achievements; it is a text field, which in this particular row has this value:
[
{
"value":11,
"globalID":23000000
},
{
"value":11,
"globalID":23000001
},
{
"value":11,
"globalID":23000002
},
...
{
"value":6044730,
"globalID":23000065
}
]
Near the bottom of the array you can see this object:
{
"value":48,
"globalID":23000062
},
I need to be able to parse the value field and show it as a warhero field. But how can I do this? The globalID will stay the same, but the value changes. And because the globalID comes after the value field, I can't use what was used in this post: https://stackoverflow.com/a/21596032/4942382
What SQL query would I need to run to get that value?
Thanks!
A table design does not even meet the first normal form if it stores a non-atomic value in a single column, which is the case with JSON-encoded values like this.
Now, if you have no access to the JSON functions available in MySQL 5.7+, and your globalID has a fixed number of digits, then you could do some string matching as follows.
For example, if you need the value that goes with globalID 23000062, then you could do this:
SELECT players.*,
CAST(
SUBSTRING_INDEX(
SUBSTRING_INDEX(achievements, '"globalID":23000062', 1),
'"value":',
-1
)
AS UNSIGNED) AS json_extracted_value
FROM players
WHERE INSTR(achievements, '"globalID":23000062') > 0
But really, you should seriously consider redesigning your database.
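If upgrading is an option, note that MySQL 5.7's JSON_SEARCH won't help here (it only matches string values, and globalID is a number), but MySQL 8.0's JSON_TABLE sidesteps the key-order problem entirely; a sketch against the players.achievements column from the question:

```sql
SELECT jt.value AS warhero
FROM players,
     JSON_TABLE(achievements, '$[*]'
       COLUMNS (value    BIGINT PATH '$.value',
                globalID BIGINT PATH '$.globalID')) jt
WHERE jt.globalID = 23000062;
```

Each array element becomes a row with both fields, so it no longer matters that globalID appears after value in the JSON text.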
You should never have saved the JSON in your table; it already breaks the rules of relational database design.
MySQL cannot make sense of the data, so no SQL query will help you. You have two options:
Fix your database design: create tables to hold that data instead of JSON.
Fetch the data, decode the JSON, and do all kinds of manipulations and hacks to get your desired value.