I have a query that filters rows based on data inside a JSON field.
Table name: audit_rules
Column name: rule_config (json)
rule_config contains JSON that has 'applicable_category' as one of its attributes.
Example:
{
"applicable_category":[
{
"status":"active",
"supported":"yes",
"expense_type":"Meal",
"acceptable_variation":0.18,
"minimum_value":25.0
},
{
"status":"active",
"supported":"yes",
"expense_type":"Car Rental",
"acceptable_variation":0.0,
"minimum_value":25.0
},
{
"status":"active",
"supported":"yes",
"expense_type":"Airfare",
"acceptable_variation":0.0,
"minimum_value":75
},
{
"status":"active",
"supported":"yes",
"expense_type":"Hotel",
"acceptable_variation":0.0,
"minimum_value":75
}
],
"minimum_required_keys":[
"amount",
"date",
"merchant",
"location"
],
"value":[
0,
0.5
]
}
But some rows don't have any data, or don't have the 'applicable_category' attribute.
So when I run the following query, I get an error:
select s.*,j from
audit_rules s
cross join lateral json_array_elements ( s.rule_config#>'{applicable_category}' ) as j
WHERE j->>'expense_type' in ('Direct Bill');
Error: SQL Error [22023]: ERROR: cannot call json_array_elements on a scalar
You can restrict the result to only rows that contain an array:
select j.*
from audit_rules s
cross join lateral json_array_elements(s.rule_config#>'{applicable_category}') as j
WHERE json_typeof(s.rule_config -> 'applicable_category') = 'array'
and j ->> 'expense_type' in ('Meal')
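To see what the json_typeof guard buys you, here is a minimal Python sketch of the same logic outside the database. The sample rows and the function name matching_categories are made up for illustration and are not taken from the real audit_rules table.

```python
import json

# Hypothetical rows standing in for audit_rules.rule_config (assumption, not real data).
rows = [
    '{"applicable_category": [{"expense_type": "Meal"}, {"expense_type": "Hotel"}]}',
    '{"applicable_category": "n/a"}',  # scalar instead of an array
    '{}',                              # attribute missing
    None,                              # SQL NULL
]

def matching_categories(raw, wanted):
    """Yield category objects whose expense_type is in `wanted`."""
    config = json.loads(raw) if raw is not None else {}
    categories = config.get("applicable_category") if isinstance(config, dict) else None
    # Counterpart of json_typeof(...) = 'array': skip anything that is not a list.
    if not isinstance(categories, list):
        return
    for item in categories:
        if isinstance(item, dict) and item.get("expense_type") in wanted:
            yield item

result = [c for raw in rows for c in matching_categories(raw, {"Meal"})]
print(result)
```

Only the first row contributes anything; the scalar, missing and NULL rows are skipped instead of raising an error.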
I'm trying to query two values (DISCOUNT_TOTAL and ITEM_TOTAL) from a JSON object in a PostgreSQL database. Take the following query as reference:
SELECT
mt.customer_order,
totals -> 0 -> 'amount' -> 'centAmount' AS DISCOUNT_TOTAL,
totals -> 1 -> 'amount' -> 'centAmount' AS ITEM_TOTAL
FROM
my_table mt,
to_jsonb(mt.my_json -> 'data' -> 'order' -> 'totals') totals
WHERE
mt.customer_order in ('1000001', '1000002')
The query works fine. The big problem is that, for reasons out of my control, the DISCOUNT_TOTAL and ITEM_TOTAL entries sometimes change position in the JSON array from one customer_order to another.
So I cannot rely on totals -> 0 -> 'amount' -> 'centAmount' holding the value for type DISCOUNT_TOTAL (and likewise for type ITEM_TOTAL). Is there any workaround to get the correct centAmount for each type?
Use a path query instead of hardcoding the array positions:
with sample (jdata) as (
values (
'{
"data": {
"order": {
"email": "something",
"totals": [
{
"type": "ITEM_TOTAL",
"amount": {
"centAmount": 14990
}
},
{
"type": "DISCOUNT_TOTAL",
"amount": {
"centAmount": 6660
}
}
]
}
}
}'::jsonb)
)
select jsonb_path_query_first(
jdata,
'$.data.order.totals[*] ? (@.type == "DISCOUNT_TOTAL").amount.centAmount'
) as discount_total,
jsonb_path_query_first(
jdata,
'$.data.order.totals[*] ? (@.type == "ITEM_TOTAL").amount.centAmount'
) as item_total
from sample;
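The idea behind the path filter is "find the entry by its type, not by its index". A rough Python sketch of that lookup, using the same sample document, might look like this (the helper name first_cent_amount is made up for illustration):

```python
import json

doc = json.loads("""
{"data": {"order": {"email": "something", "totals": [
    {"type": "ITEM_TOTAL", "amount": {"centAmount": 14990}},
    {"type": "DISCOUNT_TOTAL", "amount": {"centAmount": 6660}}
]}}}
""")

def first_cent_amount(document, total_type):
    """Return centAmount of the first totals entry with the given type, or None."""
    totals = document["data"]["order"]["totals"]
    return next(
        (t["amount"]["centAmount"] for t in totals if t.get("type") == total_type),
        None,  # no entry of that type, similar to a NULL result
    )

discount_total = first_cent_amount(doc, "DISCOUNT_TOTAL")
item_total = first_cent_amount(doc, "ITEM_TOTAL")
print(discount_total, item_total)
```

The result does not depend on which of the two entries comes first in the array.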
EDIT: If your PostgreSQL version does not support JSON path queries (they were added in PostgreSQL 12), you can expand the array into rows and then pivot with CASE and SUM:
with sample (order_id, jdata) as (
values ( 1,
'{
"data": {
"order": {
"email": "something",
"totals": [
{
"type": "ITEM_TOTAL",
"amount": {
"centAmount": 14990
}
},
{
"type": "DISCOUNT_TOTAL",
"amount": {
"centAmount": 6660
}
}
]
}
}
}'::jsonb)
)
select order_id,
sum(
case
when el->>'type' = 'DISCOUNT_TOTAL' then (el->'amount'->>'centAmount')::int
else 0
end
) as discount_total,
sum(
case
when el->>'type' = 'ITEM_TOTAL' then (el->'amount'->>'centAmount')::int
else 0
end
) as item_total
from sample
cross join lateral jsonb_array_elements(jdata->'data'->'order'->'totals') as a(el)
group by order_id;
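For readers who find the expand-then-pivot step hard to picture, here is a small Python sketch of the same aggregation. The second order (id 2) is invented for illustration and has its entries in swapped positions to show that the order inside the array does not matter.

```python
from collections import defaultdict

# Order 1 is the sample from the query; order 2 is a hypothetical extra row.
orders = [
    (1, [{"type": "ITEM_TOTAL", "amount": {"centAmount": 14990}},
         {"type": "DISCOUNT_TOTAL", "amount": {"centAmount": 6660}}]),
    (2, [{"type": "DISCOUNT_TOTAL", "amount": {"centAmount": 500}},
         {"type": "ITEM_TOTAL", "amount": {"centAmount": 2000}}]),
]

totals = defaultdict(lambda: {"discount_total": 0, "item_total": 0})
for order_id, elements in orders:
    for el in elements:  # one "row" per array element, like the lateral join
        cents = int(el["amount"]["centAmount"])
        if el["type"] == "DISCOUNT_TOTAL":
            totals[order_id]["discount_total"] += cents
        elif el["type"] == "ITEM_TOTAL":
            totals[order_id]["item_total"] += cents

pivot = {order_id: dict(sums) for order_id, sums in totals.items()}
print(pivot)
```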
I've got a JSON column containing an array of items:
[
{
"type": "banana"
},
{
"type": "apple"
},
{
"type": "orange"
}
]
I want to select one column with a concatenated type, resulting in 'banana, apple, orange'.
Thanks,
David
You need to parse and aggregate the stored JSON:
SELECT
JsonColumn,
NewColumn = (
SELECT STRING_AGG(JSON_VALUE([value], '$.type'), ',')
WITHIN GROUP (ORDER BY CONVERT(int, [key]))
FROM OPENJSON(t.JsonColumn)
)
FROM (VALUES
('[{"type":"banana"},{"type":"apple"},{"type":"orange"}]')
) t (JsonColumn)
Result:
JsonColumn                                             | NewColumn
[{"type":"banana"},{"type":"apple"},{"type":"orange"}] | banana,apple,orange
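As a cross-check of what the parse-and-aggregate step is meant to produce, the same thing in a few lines of Python (variable names mirror the column names above, purely for illustration):

```python
import json

json_column = '[{"type":"banana"},{"type":"apple"},{"type":"orange"}]'

# Parse the array, then join the "type" values in their original array order.
items = json.loads(json_column)
new_column = ",".join(item["type"] for item in items)
print(new_column)
```

Use ", " as the separator instead if you want the spaced form from the question.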
I have a JSONB field in a PostgreSQL (12.5) table Data_Source with data like this:
{
"C1": [
{
"id": 13371,
"class": "class_A1",
"inputs": {
"input_A1": 403096
},
"outputs": {
"output_A1": 403097
}
},
{
"id": 10200,
"class": "class_A2",
"inputs": {
"input_A2_1": 403096,
"input_A2_2": 403095
},
"outputs": {
"output_A2": [
[
403098,
{
"output_A2_1": 403101
},
{
"output_A2_2": [
403099,
403100
]
}
]
],
"output_A2_3": 403102,
"output_A2_4": 403103,
"output_A2_5": 403104
}
}
]
}
Could you please suggest an SQL query to extract the outputs from the JSONB field?
This is what I need to get as a result:
Output:
name        | value
output_A1   | 403097
output_A2   | 403098
output_A2_1 | 403101
output_A2_2 | 403099
output_A2_2 | 403100
output_A2_3 | 403102
output_A2_4 | 403103
output_A2_5 | 403104
Any ideas?
Apply JSONB_ARRAY_ELEMENTS() whenever an array is encountered and JSONB_EACH() whenever an object is encountered, using the auxiliary JSONB_TYPEOF() function to determine the type at each level. At the end, combine the scalar results from every level with UNION ALL, such as:
WITH j AS
(
SELECT j2.*, JSONB_TYPEOF(j2.value) AS type
FROM t,
JSONB_EACH(jsdata) AS j0(k,v),
JSONB_ARRAY_ELEMENTS(v) AS j1,
JSONB_EACH((j1.value ->> 'outputs')::JSONB) AS j2
), jj AS
(
SELECT key,j1.*,JSONB_TYPEOF(j1.value::JSONB) AS type
FROM j,
JSONB_ARRAY_ELEMENTS(value) AS j0(v),
JSONB_ARRAY_ELEMENTS(v) AS j1
WHERE type = 'array'
), jjj AS
(
SELECT key,j0.v,JSONB_TYPEOF(j0.v::JSONB) AS type,k
FROM jj,
JSONB_EACH(value) AS j0(k,v)
WHERE type IN ('array','object')
)
SELECT key,value
FROM
(
SELECT key,value,type
FROM j
UNION ALL
SELECT key,value,type
FROM jj
UNION ALL
SELECT k,v,type
FROM jjj
) jt
WHERE type NOT IN ('array','object')
UNION ALL
SELECT k,value
FROM jjj,JSONB_ARRAY_ELEMENTS(v) AS j0
WHERE type IN ('array','object')
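The query unrolls the nesting by hand, one CTE per level. The underlying rule (descend into arrays and objects, emit a name/value pair for every scalar) is easier to see as a recursive walk. Below is a minimal Python sketch of that rule over the sample document; it is not a translation of the SQL, and the function name flatten is made up for illustration.

```python
data = {
    "C1": [
        {"id": 13371, "class": "class_A1",
         "inputs": {"input_A1": 403096},
         "outputs": {"output_A1": 403097}},
        {"id": 10200, "class": "class_A2",
         "inputs": {"input_A2_1": 403096, "input_A2_2": 403095},
         "outputs": {
             "output_A2": [[403098,
                            {"output_A2_1": 403101},
                            {"output_A2_2": [403099, 403100]}]],
             "output_A2_3": 403102,
             "output_A2_4": 403103,
             "output_A2_5": 403104}},
    ]
}

def flatten(name, value):
    """Yield (name, scalar) pairs; a scalar keeps the nearest enclosing key."""
    if isinstance(value, dict):      # object: each key becomes the new name
        for key, child in value.items():
            yield from flatten(key, child)
    elif isinstance(value, list):    # array: elements inherit the current name
        for child in value:
            yield from flatten(name, child)
    else:                            # scalar: emit it
        yield (name, value)

pairs = [
    pair
    for elements in data.values()
    for element in elements
    for pair in flatten(None, element["outputs"])
]
for name, value in pairs:
    print(name, value)
```

Unlike the fixed three-level query, this handles any nesting depth, which is roughly what a recursive CTE would give you on the SQL side.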
Given the following two table columns of type jsonb:
dividend_actual
{
"dividends": [
{
"amount": "2.9800",
"balanceDate": "2020-06-30T00:00:00Z"
},
{
"amount": "4.3100",
"balanceDate": "2019-06-30T00:00:00Z"
}
],
"lastUpdated": "2020-11-16T14:50:51.289649512Z",
"providerUpdateDate": "2020-11-16T00:00:00Z"
}
dividend_forecast
{
"dividends": [
{
"amount": "2.3035",
"balanceDate": "2021-06-01T00:00:00Z"
},
{
"amount": "3.0452",
"balanceDate": "2022-06-01T00:00:00Z"
},
{
"amount": "3.1845",
"balanceDate": "2023-06-01T00:00:00Z"
}
],
"lastForecasted": "2020-11-13T00:00:00Z",
"providerUpdateDate": "2020-11-16T00:00:00Z"
}
I would like to merge both dividends arrays from dividend_actual and dividend_forecast, but before merging them I want to add an extra field (forecast) on every single object.
I did try the following:
SELECT
dividends
FROM
stock_financial AS f
INNER JOIN instrument AS i ON i.id = f.instrument_id,
jsonb_array_elements(
(f.dividend_forecast->'dividends' || jsonb '{"forecast": true}') ||
(f.dividend_actual->'dividends' || jsonb '{"forecast": false}')
) AS dividends
WHERE
i.symbol = 'ASX_CBA'
ORDER BY
dividends ->>'balanceDate' DESC;
The above query gives me the following results:
{"forecast":true}
{"forecast":false}
{"amount":"3.1845","balanceDate":"2023-06-01T00:00:00Z"}
{"amount":"3.0452","balanceDate":"2022-06-01T00:00:00Z"}
{"amount":"2.3035","balanceDate":"2021-06-01T00:00:00Z"}
{"amount":"2.9800","balanceDate":"2020-06-30T00:00:00Z"}
{"amount":"4.3100","balanceDate":"2019-06-30T00:00:00Z"}
But what I need instead is the following output:
{"amount":"3.1845","balanceDate":"2023-06-01T00:00:00Z","forecast":true}
{"amount":"3.0452","balanceDate":"2022-06-01T00:00:00Z","forecast":true}
{"amount":"2.3035","balanceDate":"2021-06-01T00:00:00Z","forecast":true}
{"amount":"2.9800","balanceDate":"2020-06-30T00:00:00Z","forecast":false}
{"amount":"4.3100","balanceDate":"2019-06-30T00:00:00Z","forecast":false}
It turns out that, by default, it is not possible to update multiple JSON objects within a JSON array in a single operation.
To do that, a Postgres function needs to be created:
-- the params are the same as in aforementioned `jsonb_set`
CREATE OR REPLACE FUNCTION update_json_array_elements(target jsonb, path text[], new_value jsonb)
RETURNS jsonb language sql AS $$
-- aggregate the jsonb from parts created in LATERAL
SELECT jsonb_agg(updated_jsonb)
-- split the target array to individual objects...
FROM jsonb_array_elements(target) individual_object,
-- operate on each object and apply jsonb_set to it. The results are aggregated in SELECT
LATERAL jsonb_set(individual_object, path, new_value) updated_jsonb
$$;
The above function was suggested by kubak in this answer: https://stackoverflow.com/a/53712268/782390
Combined with this query:
SELECT
dividends
FROM
stock_financial AS f
INNER JOIN instrument AS i ON i.id = f.instrument_id,
jsonb_array_elements(
update_json_array_elements(f.dividend_forecast->'dividends', '{forecast}', 'true') ||
update_json_array_elements(f.dividend_actual->'dividends', '{forecast}', 'false')
) AS dividends
WHERE
i.symbol = 'ASX_CBA'
ORDER BY
dividends ->>'balanceDate' DESC;
I then get the following output, which is exactly what I need:
{"amount":"3.1845","forecast":true,"balanceDate":"2023-06-01T00:00:00Z"}
{"amount":"3.0452","forecast":true,"balanceDate":"2022-06-01T00:00:00Z"}
{"amount":"2.3035","forecast":true,"balanceDate":"2021-06-01T00:00:00Z"}
{"amount":"2.9800","forecast":false,"balanceDate":"2020-06-30T00:00:00Z"}
{"amount":"4.3100","forecast":false,"balanceDate":"2019-06-30T00:00:00Z"}
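The shape of the fix is "set the field on every element of each array first, then concatenate and sort". A short Python sketch of those three steps on the sample data follows; with_field is a made-up helper playing the role of update_json_array_elements.

```python
actual = [
    {"amount": "2.9800", "balanceDate": "2020-06-30T00:00:00Z"},
    {"amount": "4.3100", "balanceDate": "2019-06-30T00:00:00Z"},
]
forecast = [
    {"amount": "2.3035", "balanceDate": "2021-06-01T00:00:00Z"},
    {"amount": "3.0452", "balanceDate": "2022-06-01T00:00:00Z"},
    {"amount": "3.1845", "balanceDate": "2023-06-01T00:00:00Z"},
]

def with_field(elements, key, value):
    """Return copies of the objects with key set to value (inputs are not mutated)."""
    return [{**element, key: value} for element in elements]

# Tag each array separately, then merge; merging first would lose the distinction.
merged = with_field(forecast, "forecast", True) + with_field(actual, "forecast", False)

# ISO 8601 timestamps in the same format sort correctly as plain strings.
merged.sort(key=lambda d: d["balanceDate"], reverse=True)
for dividend in merged:
    print(dividend)
```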
File stored in Hive:
[
{
"occupation": "guitarist",
"fav_game": "football",
"name": "d1"
},
{
"occupation": "dancer",
"fav_game": "chess",
"name": "k1"
},
{
"occupation": "traveller",
"fav_game": "cricket",
"name": "p1"
},
{
"occupation": "drummer",
"fav_game": "archery",
"name": "d2"
},
{
"occupation": "farmer",
"fav_game": "cricket",
"name": "k2"
},
{
"occupation": "singer",
"fav_game": "football",
"name": "s1"
}
]
CSV file in Hadoop:
name,age,city
d1,23,delhi
k1,23,indore
p1,23,blore
d2,25,delhi
k2,30,delhi
s1,25,delhi
I queried them individually and that works fine. Then I tried a join query:
select * from hdfs.`/demo/distribution.csv` d join hive.demo.`user_details` u on d.name = u.name
I got the following issue:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts between 1. Numeric data 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: INT, Right type: VARCHAR. Add explicit casts to avoid this error Fragment 0:0 [Error Id: b01db9c8-fb35-4ef8-a1c0-31b68ff7ae8d on IMPETUS-DSRV03.IMPETUS.CO.IN:31010]
Please refer to https://drill.apache.org/docs/data-type-conversion/
We need explicit type casts to deal with this scenario.
Suppose we have a JSON file employee.json and a CSV file sample.csv. To query both in a single query, we need to cast the join columns to a common type.
0: jdbc:drill:zk=local> select emp.employee_id, dept.department_description, phy.columns[2], phy.columns[3] FROM cp.`employee.json` emp , cp.`department.json` dept, dfs.`/tmp/sample.csv` phy where CAST(emp.employee_id AS INT) = CAST(phy.columns[0] AS INT) and emp.department_id = dept.department_id;
Here we are typecasting CAST(emp.employee_id AS INT) = CAST(phy.columns[0] AS INT) so that equality does not fail.
Refer to this for more detail: http://www.devinline.com/2015/11/apache-drill-setup-and-SQL-query-execution.html#multiple_src
You need an explicit cast even though the column is read as VARCHAR by default. Try this:
select * from hdfs.`/demo/distribution.csv` d join hive.demo.`user_details` u on cast(d.name as VARCHAR) = cast(u.name as VARCHAR)
However, you cannot refer to a CSV column by name directly; use columns[0] for name, i.e. cast(d.columns[0] as VARCHAR) in the join condition above.
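To illustrate the two points (positional CSV columns and an explicit cast on both join keys), here is a small Python sketch using a subset of the sample data. It only models the idea and does not use Drill; the variable names are made up for illustration.

```python
import csv
import io
import json

csv_text = """name,age,city
d1,23,delhi
k1,23,indore
p1,23,blore
"""
users = json.loads("""[
    {"occupation": "guitarist", "fav_game": "football", "name": "d1"},
    {"occupation": "dancer", "fav_game": "chess", "name": "k1"},
    {"occupation": "traveller", "fav_game": "cricket", "name": "p1"}
]""")

# Each CSV row is a positional list, so the name is columns[0]; skip the header row.
rows = list(csv.reader(io.StringIO(csv_text)))[1:]

# Cast both join keys to the same type (str) before comparing them.
users_by_name = {str(user["name"]): user for user in users}
joined = [
    (columns, users_by_name[str(columns[0])])
    for columns in rows
    if str(columns[0]) in users_by_name
]
for columns, user in joined:
    print(columns, user["occupation"])
```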