Split JSON into columns in a dynamic way in Big Query - json

I have the following JSON:
{
"rewards": {
"reward_1": {
"type": "type 1",
"amount": "amount 1"
},
"reward_2": {
"type": "type 2",
"amount": "amount 2"
},
"reward_3": {
"type": "type 3",
"amount": "amount 3"
},
"reward_4": {
"type": "type 4",
"amount": "amount 4"
}
}
}
This JSON is dynamic and I don't necessarily know how many rewards it will get, here it's 4 but it can be 2 or 8 etc.
I want to write a query in Big Query that will parse those values dynamically without knowing how many of them exist, and then split them into column, like this:
Thank you!

Hope these are helpful.
since a JSON data is dynamic, first step is to find a max reward sequence. (I've used a regular expression and max_reward UDF.)
and then, extract each reward from a json rewards field in an iterative way.
lastly, make the result to be a wide form using PIVOT query.
If you want a more generic solution, you need to use BigQuery dynamic SQL to generate PIVOT columns. I've hard-coded them in the query.
('reward_1', 'reward_2', 'reward_3', 'reward_4')
query:
CREATE TEMP TABLE sample AS
SELECT 1 AS id, '{"rewards": { "reward_1": { ... ' AS json -- put your json here
UNION ALL
SELECT 2 AS id, '{"rewards": { "reward_1": { ... ' AS json -- put your another json here
;
CREATE TEMP FUNCTION extract_reward(json STRING, seq INT64)
RETURNS STRUCT<type STRING, amount STRING>
LANGUAGE js AS """
return JSON.parse(json)['reward_' + seq];
""";
CREATE TEMP FUNCTION max_reward(arr ARRAY<STRING>) AS ((
SELECT MAX(CAST(v AS INT64)) FROM UNNEST(arr) v
));
SELECT * FROM (
SELECT id,
'reward_' || seq AS reward,
extract_reward(FORMAT('%t', JSON_QUERY(json, '$.rewards')), seq) AS value
FROM sample, UNNEST(GENERATE_ARRAY(1, max_reward(REGEXP_EXTRACT_ALL(json, r'"reward_([0-9]+)"')))) seq
) PIVOT (ANY_VALUE(value) FOR reward IN ('reward_1', 'reward_2', 'reward_3', 'reward_4'));
output:
▶ Split a reward STRUCT column into separate columns
SELECT * FROM (
SELECT id,
'reward_' || seq || '_' || IF (offset = 0, 'type', 'amount') AS reward,
value
FROM sample,
UNNEST(GENERATE_ARRAY(1, max_reward(REGEXP_EXTRACT_ALL(json, r'"reward_([0-9]+)"')))) seq,
UNNEST([extract_reward(FORMAT('%t', JSON_QUERY(json, '$.rewards')), seq)]) pair,
UNNEST([pair.type, pair.amount]) value WITH OFFSET
) PIVOT (ANY_VALUE(value) FOR reward IN ('reward_1_type', 'reward_2_type', 'reward_3_type', 'reward_4_type', 'reward_1_amount', 'reward_2_amount', 'reward_3_amount', 'reward_4_amount'));
output:

Related

Get element value in a mysql nested json data

What would be the right way of getting Ajax, i.e. the value for the last occurence for key child1Dob1, from a json field that has a data structure that looks like the below,
{
"data": {
"data": {
"data": {
"child1Dob1": "Andy"
},
"child1Dob1": "Bob"
},
"child1Dob1": "Rick"
},
"child1Dob1": "Ajax"
}
The below query was an attempt from a similar question but i am getting a null value, so obviously i am missing something.
SELECT JSON_EXTRACT(`containerValue`,CONCAT("$.data[",JSON_LENGTH(`containerValue` ->> '$.data')-1,"]")) from myTable where containerKey = 'theContainer';
For CREATE TABLE test (data JSON):
WITH RECURSIVE
cte AS (
SELECT data, data -> '$.data' subdata
FROM test
UNION ALL
SELECT subdata, subdata -> '$.data'
FROM cte
WHERE subdata IS NOT NULL
)
SELECT data ->> '$.child1Dob1'
FROM cte
WHERE subdata IS NULL;

SQL JSON Array, extract field from each element items into concatenated string

I've got a JSON column containing an array of items:
[
{
"type": "banana"
},
{
"type": "apple"
},
{
"type": "orange"
}
]
I want to select one column with a concatenated type, resulting in 'banana, apple, orange'.
Thanks,
David
You need to parse and aggregate the stored JSON:
SELECT
JsonColumn,
NewColumn = (
SELECT STRING_AGG(JSON_VALUE([value], '$.type'), ',')
WITHIN GROUP (ORDER BY CONVERT(int, [key]))
FROM OPENJSON(t.JsonColumn)
)
FROM (VALUES
('[{"type":"banana"},{"type":"apple"},{"type":"orange"}]')
) t (JsonColumn)
Result:
JsonColumn
NewColumn
[{"type":"banana"},{"type":"apple"},{"type":"orange"}]
banana,apple,orange

Complex JSON using JSON_MODIFY without nested arrays or escape characters (WITHOUT_ARRAY_WRAPPER)

I am using JSON_MODIFY to build complex JSON. Moving from MySQL I am struggling with the JSON functions provided by SQL Server. The issue I'm having is that SQL Server seems to construct all JSON objects in an array. There is the WITHOUT_ARRAY_WRAPPER mechanism, which seems like it should do what I want, however; there are two undesirable consequences.
It only returns one result depending on how it is used
The result is a single string with escape characters
I have constructed a simple query which illustrates my needs and the issue.
QUERY 1
SELECT JSON_MODIFY(
JSON_QUERY('{"definitions": {"id": "INT", "name": "VARCHAR(23)"}}'),
'append $.data',
(
SELECT * FROM (
SELECT 1 AS id, '123abc' AS "name" UNION
SELECT 2 AS id, '234bcd' AS "name"
) AS "data"
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
)
) AS "data";
OUTPUT 1
{
"definitions":{
"id":"INT",
"name":"VARCHAR(23)"
},
"data":[
"{\"id\":1,\"name\":\"123abc\"},{\"id\":2,\"name\":\"234bcd\"}"
]
}
QUERY 2
SELECT JSON_MODIFY(
JSON_QUERY('{"definitions": {"id": "INT", "name": "VARCHAR(23)"}}'),
'append $.data',
(
SELECT * FROM (
SELECT 1 AS id, '123abc' AS "name" UNION
SELECT 2 AS id, '234bcd' AS "name"
) AS "data"
FOR JSON PATH
)
) AS "data";
OUTPUT 2
{
"definitions":{
"id":"INT",
"name":"VARCHAR(23)"
},
"data":[
[
{"id":1, "name":"123abc"},
{"id":2, "name":"234bcd"}
]
]
}
QUERY 1
The data object is an array (which is expected), but the problem is what is in the array... A single string with escape characters.
QUERY 2
The data object is an array, which contains an array. In order to access the actual array of data, I would use something like for each obj in data[0].... The problem this poses is, for anyone consuming the JSON object, I would have to tell them:
"In this particular object the data element is an array of
arrays--You'll want to use the first and only the first
element to access the actual array of data."
I've naively tried many different combinations of JSON_MODIFY, JSON_QUERY, and CONCAT to no avail. How can I properly use JSON_MODIFY to get the following output, without the double array in data?
{
"definitions":{
"id":"INT",
"name":"VARCHAR(23)"
},
"data":[
{"id":1, "name":"123abc"},
{"id":2, "name":"234bcd"}
]
}
You are over-thinking this by trying to JSON_MODIFY an existing object.
Construct the definitions and data properties that you need, inside a subquery if necessary.
Then use FOR JSON a second time to get the outer object.
SELECT
definitions = JSON_QUERY('{"id": "INT", "name": "VARCHAR(23)"}'),
data =
(
SELECT id, name
FROM (VALUES
(1, '123abc'),
(2, '234bcd')
) v(id, name)
FOR JSON PATH
)
FOR JSON PATH;
SQL Fiddle
By trial and error, I found the solution.
Removed the append keyword from the path parameter in the JSON_MODIFY statement
Removed the WITHOUT_ARRAY_WRAPPER parameter from the FOR JSON statement.
Now the results are as expected and I don't need to explain to any consumers to "Just use data[0]"
The Query
SELECT JSON_MODIFY(
JSON_QUERY('{"definitions": {"id": "INT", "name": "VARCHAR(23)"}}'),
'$.data',
(
SELECT * FROM (
SELECT 1 AS id, '123abc' AS "name" UNION
SELECT 2 AS id, '234bcd' AS "name"
) AS "data"
FOR JSON PATH
)
) AS "data";
Produces the following output
{
"definitions":{
"id":"INT",
"name":"VARCHAR(23)"
},
"data":[
{"id":1, "name":"123abc"},
{"id":2, "name":"234bcd"}
]
}

How To Merge Multiple PostgreSQL JSON Arrays That Are The Result Of Another JSON Array Update On All Of Its Elements?

Given the following two table columns jsonb type:
dividend_actual
{
"dividends": [
{
"amount": "2.9800",
"balanceDate": "2020-06-30T00:00:00Z"
},
{
"amount": "4.3100",
"balanceDate": "2019-06-30T00:00:00Z"
}
],
"lastUpdated": "2020-11-16T14:50:51.289649512Z",
"providerUpdateDate": "2020-11-16T00:00:00Z"
}
dividend_forecast
{
"dividends": [
{
"amount": "2.3035",
"balanceDate": "2021-06-01T00:00:00Z"
},
{
"amount": "3.0452",
"balanceDate": "2022-06-01T00:00:00Z"
},
{
"amount": "3.1845",
"balanceDate": "2023-06-01T00:00:00Z"
}
],
"lastForecasted": "2020-11-13T00:00:00Z",
"providerUpdateDate": "2020-11-16T00:00:00Z"
}
I would like to merge both dividends arrays from dividend_actual and dividend_forecast, but before merging them I want to add an extra field (forecast) on every single object.
I did try the following:
SELECT
dividends
FROM
stock_financial AS f
INNER JOIN instrument AS i ON i.id = f.instrument_id,
jsonb_array_elements(
(f.dividend_forecast->'dividends' || jsonb '{"forecast": true}') ||
(f.dividend_actual->'dividends' || jsonb '{"forecast": false}')
) AS dividends
WHERE
i.symbol = 'ASX_CBA'
ORDER BY
dividends ->>'balanceDate' DESC;
The above query gives me the following results:
{"forecast":true}
{"forecast":false}
{"amount":"3.1845","balanceDate":"2023-06-01T00:00:00Z"}
{"amount":"3.0452","balanceDate":"2022-06-01T00:00:00Z"}
{"amount":"2.3035","balanceDate":"2021-06-01T00:00:00Z"}
{"amount":"2.9800","balanceDate":"2020-06-30T00:00:00Z"}
{"amount":"4.3100","balanceDate":"2019-06-30T00:00:00Z"}
But what I need instead is the following output:
{"amount":"3.1845","balanceDate":"2023-06-01T00:00:00Z","forecast":true}
{"amount":"3.0452","balanceDate":"2022-06-01T00:00:00Z","forecast":true}
{"amount":"2.3035","balanceDate":"2021-06-01T00:00:00Z","forecast":true}
{"amount":"2.9800","balanceDate":"2020-06-30T00:00:00Z","forecast":false}
{"amount":"4.3100","balanceDate":"2019-06-30T00:00:00Z","forecast":false}
It turns out that it is not possible to update multiple jsons objects within a json array in a single operation by default.
To be able to do that a Postgres function needs to be created:
-- the params are the same as in aforementioned `jsonb_set`
CREATE OR REPLACE FUNCTION update_json_array_elements(target jsonb, path text[], new_value jsonb)
RETURNS jsonb language sql AS $$
-- aggregate the jsonb from parts created in LATERAL
SELECT jsonb_agg(updated_jsonb)
-- split the target array to individual objects...
FROM jsonb_array_elements(target) individual_object,
-- operate on each object and apply jsonb_set to it. The results are aggregated in SELECT
LATERAL jsonb_set(individual_object, path, new_value) updated_jsonb
$$;
The above function was suggested by kubak in this answer: https://stackoverflow.com/a/53712268/782390
Combined with this query:
SELECT
dividends
FROM
stock_financial AS f
INNER JOIN instrument AS i ON i.id = f.instrument_id,
jsonb_array_elements(
update_json_array_elements(f.dividend_forecast->'dividends', '{forecast}', 'true') ||
update_json_array_elements(f.dividend_actual->'dividends', '{forecast}', 'false')
) AS dividends
WHERE
i.symbol = 'ASX_CBA'
ORDER BY
dividends ->>'balanceDate' DESC;
I then get the following output, that it is exactly what I need:
{"amount":"3.1845","forecast":true,"balanceDate":"2023-06-01T00:00:00Z"}
{"amount":"3.0452","forecast":true,"balanceDate":"2022-06-01T00:00:00Z"}
{"amount":"2.3035","forecast":true,"balanceDate":"2021-06-01T00:00:00Z"}
{"amount":"2.9800","forecast":false,"balanceDate":"2020-06-30T00:00:00Z"}
{"amount":"4.3100","forecast":false,"balanceDate":"2019-06-30T00:00:00Z"}

converting xml to json using postgresql

I am working on converting XML to j son string using the PostgreSQL.
we have attributecentric XML and would like to know how to convert it to j son.
Example XML:
<ROOT><INPUT id="1" name="xyz"/></ROOT>
Need to get the j son as follows:
{ "ROOT": { "INPUT": { "id": "1", "name": "xyz" }}}
got the above json format from an online tool.
any help or guidance will be appreciated.
Regards
Abdul Azeem
Basically, breakdown of this problem is the following:
extract values from given XML using xpath() function
generate and combine JSON entries using json_object_agg() function
The only tricky thing here is to combine key,value pairs together from xpath()/json_object_agg() results, which are{ "id": "1"} and { "name" : "xyz"}.
WITH test_xml(data) AS ( VALUES
('<ROOT><INPUT id="1" name="xyz"/></ROOT>'::XML)
), attribute_id(value) AS (
-- get '1' value from id
SELECT (xpath('//INPUT/#id',test_xml.data))[1] AS value
FROM test_xml
), attribute_name(value) AS (
-- get 'xyz' value from name
SELECT (xpath('//INPUT/#name',test_xml.data))[1] AS value
FROM test_xml
), json_1 AS (
-- generate JSON 1 {"id": "1"}
SELECT json_object_agg('id',attribute_id.value) AS payload
FROM attribute_id
), json_2 AS (
-- generate JSON 2 {"name" : "xyz"}
SELECT json_object_agg('name',attribute_name.value) AS payload FROM attribute_name
), merged AS (
-- Generate INPUT result - Step 1 - combine JSON 1 and 2 as single key,value source
SELECT key,value
FROM json_1,json_each(json_1.payload)
UNION ALL
SELECT key,value
FROM json_2,json_each(json_2.payload)
), input_json_value AS (
-- Generate INPUT result - Step 2 - use json_object_agg to create JSON { "id" : "1", "name" : "xyz" }
SELECT json_object_agg(merged.key,merged.value) AS data
FROM merged
), input_json AS (
-- Generate INPUT JSON as expected { "INPUT" : { "id" : "1", "name" : "xyz" } }
SELECT json_object_agg('INPUT',input_json_value.data) AS data
FROM input_json_value
)
-- Generate final reult
SELECT json_object_agg('ROOT',input_json.data)
FROM input_json;