Postgres query on json with empty value - json

I have a query in JSON to filter out the data based on data present inside JSON field .
Table name: audit_rules
Column name: rule_config (json)
rule_config contains JSON which contain 'applicable_category' as an attribute in it.
Example
{
"applicable_category":[
{
"status":"active",
"supported":"yes",
"expense_type":"Meal",
"acceptable_variation":0.18,
"minimum_value":25.0
},
{
"status":"active",
"supported":"yes",
"expense_type":"Car Rental",
"acceptable_variation":0.0,
"minimum_value":25.0
},
{
"status":"active",
"supported":"yes",
"expense_type":"Airfare",
"acceptable_variation":0.0,
"minimum_value":75
},
{
"status":"active",
"supported":"yes",
"expense_type":"Hotel",
"acceptable_variation":0.0,
"minimum_value":75
}
],
"minimum_required_keys":[
"amount",
"date",
"merchant",
"location"
],
"value":[
0,
0.5
]
}
But some of the rows doesn't have any data or doesn't have the 'applicable_category' attribute in it.
So while running following query i am getting error:
select s.*,j from
audit_rules s
cross join lateral json_array_elements ( s.rule_config#>'{applicable_category}' ) as j
WHERE j->>'expense_type' in ('Direct Bill');
Error: SQL Error [22023]: ERROR: cannot call json_array_elements on a scalar

You can restrict the result to only rows that contain an array:
select j.*
from audit_rules s
cross join lateral json_array_elements(s.rule_config#>'{applicable_category}') as j
WHERE json_typeof(s.rule_config -> 'applicable_category') = 'array'
and j ->> 'expense_type' in ('Meal')

Related

JSON object: Query a value from unkown node based on a condition

I'm trying to query two values (DISCOUNT_TOTAL and ITEM_TOTAL) from a JSON object in a PostgreSQL database. Take the following query as reference:
SELECT
mt.customer_order
totals -> 0 -> 'amount' -> centAmount DISCOUNT_TOTAL
totals -> 1 -> 'amount' -> centAmount ITEM_TOTAL
FROM
my_table mt
to_jsonb(my_table.my_json -> 'data' -> 'order' -> 'totals') totals
WHERE
mt.customer_order in ('1000001', '1000002')
The query code works just fine, the big problem is that, for some reason out of my control, the values DISCOUNT_TOTAL and ITEM_TOTAL some times change their positions in the JSON object from one customer_order to other:
JSON Object
So i cannot aim to totals -> 0 -> 'amount' -> centAmount assuming that it contains the value related to type : DISCOUNT_TOTAL (same for type: ITEM_TOTAL). Is there any work around to get the correct centAmount for each type?
Use a path query instead of hardcoding the array positions:
with sample (jdata) as (
values (
'{
"data": {
"order": {
"email": "something",
"totals": [
{
"type": "ITEM_TOTAL",
"amount": {
"centAmount": 14990
}
},
{
"type": "DISCOUNT_TOTAL",
"amount": {
"centAmount": 6660
}
}
]
}
}
}'::jsonb)
)
select jsonb_path_query_first(
jdata,
'$.data.order.totals[*] ? (#.type == "DISCOUNT_TOTAL").amount.centAmount'
) as discount_total,
jsonb_path_query_first(
jdata,
'$.data.order.totals[*] ? (#.type == "ITEM_TOTAL").amount.centAmount'
) as item_total
from sample;
db<>fiddle here
EDIT: In case your PostgreSQL version does not support json path queries, you can do it by expanding the array into rows and then doing a pivot by case and sum:
with sample (order_id, jdata) as (
values ( 1,
'{
"data": {
"order": {
"email": "something",
"totals": [
{
"type": "ITEM_TOTAL",
"amount": {
"centAmount": 14990
}
},
{
"type": "DISCOUNT_TOTAL",
"amount": {
"centAmount": 6660
}
}
]
}
}
}'::jsonb)
)
select order_id,
sum(
case
when el->>'type' = 'DISCOUNT_TOTAL' then (el->'amount'->'centAmount')::int
else 0
end
) as discount_total,
sum(
case
when el->>'type' = 'ITEM_TOTAL' then (el->'amount'->'centAmount')::int
else 0
end
) as item_total
from sample
cross join lateral jsonb_array_elements(jdata->'data'->'order'->'totals') as a(el)
group by order_id;
db<>fiddle here

SQL JSON Array, extract field from each element items into concatenated string

I've got a JSON column containing an array of items:
[
{
"type": "banana"
},
{
"type": "apple"
},
{
"type": "orange"
}
]
I want to select one column with a concatenated type, resulting in 'banana, apple, orange'.
Thanks,
David
You need to parse and aggregate the stored JSON:
SELECT
JsonColumn,
NewColumn = (
SELECT STRING_AGG(JSON_VALUE([value], '$.type'), ',')
WITHIN GROUP (ORDER BY CONVERT(int, [key]))
FROM OPENJSON(t.JsonColumn)
)
FROM (VALUES
('[{"type":"banana"},{"type":"apple"},{"type":"orange"}]')
) t (JsonColumn)
Result:
JsonColumn
NewColumn
[{"type":"banana"},{"type":"apple"},{"type":"orange"}]
banana,apple,orange

How to query nested array with heterogeneous elements in PostgreSQL JSONB column

I have a JSONB field in PostgreSQL (12.5) table Data_Source with the data like that inside:
{
"C1": [
{
"id": 13371,
"class": "class_A1",
"inputs": {
"input_A1": 403096
},
"outputs": {
"output_A1": 403097
}
},
{
"id": 10200,
"class": "class_A2",
"inputs": {
"input_A2_1": 403096,
"input_A2_2": 403095
},
"outputs": {
"output_A2": [
[
403098,
{
"output_A2_1": 403101
},
{
"output_A2_2": [
403099,
403100
]
}
]
],
"output_A2_3": 403102,
"output_A2_4": 403103,
"output_A2_5": 403104
}
}
]
}
Could you please suggest me some SQL query to extract outputs from the JSONB field.
What I need to get as a results:
Output:
name
value
output_A1
403096
output_A2
403098
output_A2_1
403101
output_A2_2
403099
output_A2_2
403100
output_A2_3
403102
output_A2_4
403103
output_A2_5
403104
Any ideas?
Whenever an array is encountered, then JSONB_ARRAY_ELEMENTS(), or an object is encountered, then JSONB_EACH() functions might be applied, along with auxiliary JSONB_TYPEOF() function to determine respective types, consecutively. At the end, combine the results whether of type array or object or not through use of UNION ALL such as
WITH j AS
(
SELECT j2.*, JSONB_TYPEOF(j2.value) AS type
FROM t,
JSONB_EACH(jsdata) AS j0(k,v),
JSONB_ARRAY_ELEMENTS(v) AS j1,
JSONB_EACH((j1.value ->> 'outputs')::JSONB) AS j2
), jj AS
(
SELECT key,j1.*,JSONB_TYPEOF(j1.value::JSONB) AS type
FROM j,
JSONB_ARRAY_ELEMENTS(value) AS j0(v),
JSONB_ARRAY_ELEMENTS(v) AS j1
WHERE type = 'array'
), jjj AS
(
SELECT key,j0.v,JSONB_TYPEOF(j0.v::JSONB) AS type,k
FROM jj,
JSONB_EACH(value) AS j0(k,v)
WHERE type IN ('array','object')
)
SELECT key,value
FROM
(
SELECT key,value,type
FROM j
UNION ALL
SELECT key,value,type
FROM jj
UNION ALL
SELECT k,v,type
FROM jjj
) jt
WHERE type NOT IN ('array','object')
UNION ALL
SELECT k,value
FROM jjj,JSONB_ARRAY_ELEMENTS(v) AS j0
WHERE type IN ('array','object')
Demo

How To Merge Multiple PostgreSQL JSON Arrays That Are The Result Of Another JSON Array Update On All Of Its Elements?

Given the following two table columns jsonb type:
dividend_actual
{
"dividends": [
{
"amount": "2.9800",
"balanceDate": "2020-06-30T00:00:00Z"
},
{
"amount": "4.3100",
"balanceDate": "2019-06-30T00:00:00Z"
}
],
"lastUpdated": "2020-11-16T14:50:51.289649512Z",
"providerUpdateDate": "2020-11-16T00:00:00Z"
}
dividend_forecast
{
"dividends": [
{
"amount": "2.3035",
"balanceDate": "2021-06-01T00:00:00Z"
},
{
"amount": "3.0452",
"balanceDate": "2022-06-01T00:00:00Z"
},
{
"amount": "3.1845",
"balanceDate": "2023-06-01T00:00:00Z"
}
],
"lastForecasted": "2020-11-13T00:00:00Z",
"providerUpdateDate": "2020-11-16T00:00:00Z"
}
I would like to merge both dividends arrays from dividend_actual and dividend_forecast, but before merging them I want to add an extra field (forecast) on every single object.
I did try the following:
SELECT
dividends
FROM
stock_financial AS f
INNER JOIN instrument AS i ON i.id = f.instrument_id,
jsonb_array_elements(
(f.dividend_forecast->'dividends' || jsonb '{"forecast": true}') ||
(f.dividend_actual->'dividends' || jsonb '{"forecast": false}')
) AS dividends
WHERE
i.symbol = 'ASX_CBA'
ORDER BY
dividends ->>'balanceDate' DESC;
The above query gives me the following results:
{"forecast":true}
{"forecast":false}
{"amount":"3.1845","balanceDate":"2023-06-01T00:00:00Z"}
{"amount":"3.0452","balanceDate":"2022-06-01T00:00:00Z"}
{"amount":"2.3035","balanceDate":"2021-06-01T00:00:00Z"}
{"amount":"2.9800","balanceDate":"2020-06-30T00:00:00Z"}
{"amount":"4.3100","balanceDate":"2019-06-30T00:00:00Z"}
But what I need instead is the following output:
{"amount":"3.1845","balanceDate":"2023-06-01T00:00:00Z","forecast":true}
{"amount":"3.0452","balanceDate":"2022-06-01T00:00:00Z","forecast":true}
{"amount":"2.3035","balanceDate":"2021-06-01T00:00:00Z","forecast":true}
{"amount":"2.9800","balanceDate":"2020-06-30T00:00:00Z","forecast":false}
{"amount":"4.3100","balanceDate":"2019-06-30T00:00:00Z","forecast":false}
It turns out that it is not possible to update multiple jsons objects within a json array in a single operation by default.
To be able to do that a Postgres function needs to be created:
-- the params are the same as in aforementioned `jsonb_set`
CREATE OR REPLACE FUNCTION update_json_array_elements(target jsonb, path text[], new_value jsonb)
RETURNS jsonb language sql AS $$
-- aggregate the jsonb from parts created in LATERAL
SELECT jsonb_agg(updated_jsonb)
-- split the target array to individual objects...
FROM jsonb_array_elements(target) individual_object,
-- operate on each object and apply jsonb_set to it. The results are aggregated in SELECT
LATERAL jsonb_set(individual_object, path, new_value) updated_jsonb
$$;
The above function was suggested by kubak in this answer: https://stackoverflow.com/a/53712268/782390
Combined with this query:
SELECT
dividends
FROM
stock_financial AS f
INNER JOIN instrument AS i ON i.id = f.instrument_id,
jsonb_array_elements(
update_json_array_elements(f.dividend_forecast->'dividends', '{forecast}', 'true') ||
update_json_array_elements(f.dividend_actual->'dividends', '{forecast}', 'false')
) AS dividends
WHERE
i.symbol = 'ASX_CBA'
ORDER BY
dividends ->>'balanceDate' DESC;
I then get the following output, that it is exactly what I need:
{"amount":"3.1845","forecast":true,"balanceDate":"2023-06-01T00:00:00Z"}
{"amount":"3.0452","forecast":true,"balanceDate":"2022-06-01T00:00:00Z"}
{"amount":"2.3035","forecast":true,"balanceDate":"2021-06-01T00:00:00Z"}
{"amount":"2.9800","forecast":false,"balanceDate":"2020-06-30T00:00:00Z"}
{"amount":"4.3100","forecast":false,"balanceDate":"2019-06-30T00:00:00Z"}

Issue in JOIN query in apache drill

File stored in Hive:
[
{
"occupation": "guitarist",
"fav_game": "football",
"name": "d1"
},
{
"occupation": "dancer",
"fav_game": "chess",
"name": "k1"
},
{
"occupation": "traveller",
"fav_game": "cricket",
"name": "p1"
},
{
"occupation": "drummer",
"fav_game": "archery",
"name": "d2"
},
{
"occupation": "farmer",
"fav_game": "cricket",
"name": "k2"
},
{
"occupation": "singer",
"fav_game": "football",
"name": "s1"
}
]
CSV file in hadoop:
name,age,city
d1,23,delhi
k1,23,indore
p1,23,blore
d2,25,delhi
k2,30,delhi
s1,25,delhi
I queried them individually, it's working fine. Then, I tried join query:
select * from hdfs.`/demo/distribution.csv` d join hive.demo.`user_details` u on d.name = u.name
I got the following issue:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts between 1. Numeric data 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: INT, Right type: VARCHAR. Add explicit casts to avoid this error Fragment 0:0 [Error Id: b01db9c8-fb35-4ef8-a1c0-31b68ff7ae8d on IMPETUS-DSRV03.IMPETUS.CO.IN:31010]
Please refer this https://drill.apache.org/docs/data-type-conversion/
We need to do explicit typecasting to deal with such scenario.
Consider we have a JSON file employee.json and a csv file sample.csv. In order to query on both at the same time , in one query we need to do type casting.
0: jdbc:drill:zk=local> select emp.employee_id, dept.department_description, phy.columns[2], phy.columns[3] FROM cp.`employee.json` emp , cp.`department.json` dept, dfs.`/tmp/sample.csv` phy where CAST(emp.employee_id AS INT) = CAST(phy.columns[0] AS INT) and emp.department_id = dept.department_id;
Here we are typecasting CAST(emp.employee_id AS INT) = CAST(phy.columns[0] AS INT) so that equality does not fail.
Refer this for more detail:- http://www.devinline.com/2015/11/apache-drill-setup-and-SQL-query-execution.html#multiple_src
You need to cast even though by default it has taken varchar. Try this:
select * from hdfs.`/demo/distribution.csv` d join hive.demo.`user_details` u on cast(d.name as VARCHAR) = cast(u.name as VARCHAR)
But you cannot refer to column name directly from csv. you need to consider columns[0] for name.