Unnest non-array JSON in BigQuery - json

I have data arriving as separate events in JSON form resembling:
{
"id":1234,
"data":{
"packet1":{"name":"packet1", "value":1},
"packet2":{"name":"packet2", "value":2}
}
}
I'd like to unnest the data to essentially have one row per 'packet' (there may be any number of packets).
id
name
value
1234
packet1
1
1234
packet2
2
I've looked at using the unnest function with the various JSON functions but it seems limited to working with arrays. I have not been able to find a way to treat the 'data' field as if it were an array.
At the moment, I cannot change these events to store packets in an array, and ideally the unnesting should be happening within BigQuery.

1. Regular expressions
There might be other ways but you can consider below approach using regular expressions.
WITH sample_table AS (
SELECT """{
"id":1234,
"data":{
"packet1":{"name":"packet1", "value":1},
"packet2":{"name":"packet2", "value":2}
}
}""" AS events
)
SELECT JSON_VALUE(events, '$.id') AS id, name, value
FROM sample_table,
UNNEST (REGEXP_EXTRACT_ALL(events, r'"name":"(\w+)"')) name WITH offset
JOIN UNNEST (REGEXP_EXTRACT_ALL(events, r'"value":([0-9.]+)')) value WITH offset
USING (offset);
Query results
2. Javascript UDF
or, you might consider below using Javascript UDF.
CREATE TEMP FUNCTION extract_pair(json STRING)
RETURNS ARRAY<STRUCT<name STRING, value STRING>>
LANGUAGE js AS """
result = [];
for (const [key, value] of Object.entries(JSON.parse(json))) {
result.push(value);
}
return result;
""";
WITH sample_table AS (
SELECT """{
"id":1234,
"data":{
"packet1":{"name":"packet1", "value":1},
"packet2":{"name":"packet2", "value":2}
}
}""" AS events
)
SELECT JSON_VALUE(events, '$.id') AS id, obj.*
FROM sample_table, UNNEST(extract_pair(JSON_QUERY(events, '$.data'))) obj;

#Jaytiger's suggestion of unnesting a regex extract led me to the following solution.
The example I showed was simplified - there are more fields within the packets. To avoid requiring separate regex for each field name, I used regex to split/extract each individual packet, and then read the JSON.
This iteration doesn't do everything in one step but works when just looking at packets.
with sample_data
AS (SELECT """{"packet1":{"name":"packet1", "value":1},
"packet2":{"name":"packet2", "value":2}}""" as packets)
select
json_value('{'||packet||'}', "$.name") name,
json_value('{'||packet||'}', "$.value") value
from sample_data,
unnest(regexp_extract_all(packets, r'\:{(.*?)\}')) packet

Related

Using json_extract in sqlite to pull data from parent and child objects

I'm starting to explore the JSON1 library for sqlite and have been so far successful in the basic queries I've created. I'm now looking to create a more complicated query that pulls data from multiple levels.
Here's the example JSON object I'm starting with (and most of the data is very similar).
{
"height": 140.0,
"id": "cp",
"label": {
"bind": "cp_label"
},
"type": "color_picker",
"user_data": {
"my_property": 2
},
"uuid": "948cb959-74df-4af8-9e9c-c3cb53ac9915",
"value": {
"bind": "cp_color"
},
"width": 200.0
}
This json object is buried about seven levels deep in a json structure and I pulled it from the larger json construct using an sql statement like this:
SELECT value FROM forms, json_tree(forms.formJSON, '$.root')
WHERE type = 'object'
AND json_extract(value, '$.id') = #sControlID
// In this example, #sControlID is a variable that represents the `id` value we're looking for, which is 'cp'
But what I really need to pull from this object are the following:
the value from key type ("color_picker" in this example)
the values from keys bind ("cp_color" and "cp_label" in this example)
the keys value and label (which have values of {"bind":"<string>"} in this example)
For that last item, the key name (value and label in this case) can be any number of keywords, but no matter the keyword, the value will be an object of the form {"bind":"<some_string>"}. Also, there could be multiple keys that have a bind object associated with them, and I'd need to return all of them.
For the first two items, the keywords will always be type and bind.
With the json example above, I'd ideally like to retrieve two rows:
type key value
color_picker value cp_color
color_picker label cp_label
When I use json_extract methods, I end up retrieving the object {"bind":"cp_color"} from the json_tree table, but I also need to retrieve the data from the parent object. I feel like I need to do some kind of union, but my attempts have so far been unsuccessful. Any ideas here?
Note: if the {"bind":"<string>"} object doesn't exist as a child of the parent object, I don't want any rows returned.
Well, I was on the right track and eventually figured out it. I created a separate query for each of the items I was looking for, then INNER JOINed all the json_tree tables from each of the queries to have all the required fields available. Then I json_extracted the required data from each of the json fields I needed data from. In the end, it gave me exactly what I was looking for, though I'm sure it could be written more efficiently.
For anyone interested, this is what hte final query ended up looking like:
SELECT IFNULL(json_extract(parent.value, '$.type'), '_window_'), child.key, json_extract(child.value, '$.bind') FROM (SELECT json_tree.* FROM nui_forms, json_tree(nui_forms.formJSON, '$') WHERE type = 'object' AND json_extract(nui_forms.formJSON, '$.id') = #sWindowID) parent INNER JOIN (SELECT json_tree.* FROM nui_forms, json_tree(nui_forms.formJSON, '$') WHERE type = 'object' AND json_extract(value, '$.bind') != 'NULL' AND json_extract(nui_forms.formJSON, '$.id') = #sWindowID) child ON child.parent = parent.id;
If you have any tips on reducing its complexity, feel free to comment!

Subquery for element of JSON column

I have a big JSON data in one column called response_return in a Postgres DB, with a response like:
{
"customer_payment":{
"OrderId":"123456789",
"Customer":{
"Full_name":"Francis"
},
"Payment":{
"AuthorizationCode":"9874565",
"Recurrent":false,
"Authenticate":false,
...
}
}
}
I tried to use Postgres functions like -> ,->> ,#> or #> to walk through headers to achieve AuthorizationCode for a query.
When I use -> in customer_payment in a SELECT, returns all after them. If I try with OrderId, it's returned NULL.
The alternatives and sources:
Using The JSON Datatype In PostgreSQL
Operator ->
Allows you to select an element based on its name.
Allows you to select an element within an array based on its index.
Can be used sequentially: ::json->'elementL'->'subelementM'->…->'subsubsubelementN'.
Return type is json and the result cannot be used with functions and operators that require a string-based datatype. But the result can be used with operators and functions that require a json datatype.
Query for element of array in JSON column
This is not helpful because I don't want filter and do not believe that need to transform to array.
If you just want to get a single attribute, you can use:
select response_return -> 'customer_payment' -> 'Payment' ->> 'AuthorizationCode'
from the_table;
You need to use -> for the intermediate access to the keys (to keep the JSON type) and ->> for the last key to return the value as a string.
Alternatively you can provide the path to the element as an array and use #>>
select response_return #>> array['customer_payment', 'Payment', 'AuthorizationCode']
from the_table;
Online example

Unpacking JSON Into Flat Format

I have JSON that resembles the following:
{
"ANNOTATIONS": [
{
"Label": "CommingledProduct",
"Text": "NBP"
},
{
"Label": "CommingledVenue",
"Text": "OTC"
}
]
}
I need to unpack this into a flat table with columns matching the annotation labels. So columns based on the above json become:
comingled_product
comingled_venue
The JSON is coming from a json field in a source table and being unpacked into another table.
I know that I could code as follows:
INSERT INTO my_target_table (comingled_product, comingled_venue)
SELECT
payload->'ANNOTATIONS'->0->>'Text',
payload->'ANNOTATIONS'->1->>'Text'
FROM my_source_table;
However, I would rather not use the ordinals of the annotations. I would prefer to use some syntax mirroring the psuedo-code below:
INSERT INTO my_target_table (comingled_product, comingled_venue)
SELECT
payload->'ANNOTATIONS'->'label="ComingledProduct"'->>'Text',
payload->'ANNOTATIONS'->'label="ComingledVenueID"'->>'Text'
FROM my_source_table;
Can anyone tell me if what I'm trying to ahcieve is possible and how to do it? There are more than the two annotations I have included in the sample, so anything that involves multiple joins is probably a no go.
Using PostGres 10.7
demo:db<>fiddle
WITH cte AS (
SELECT
elems.value
FROM
my_source_table,
json_array_elements(payload -> 'ANNOTATIONS') elems
)
SELECT
(SELECT value ->> 'Text' FROM cte WHERE value ->> 'Label' = 'CommingledProduct'),
(SELECT value ->> 'Text' FROM cte WHERE value ->> 'Label' = 'CommingledVenue')
Expanding the array into one row per array element and store this result for further usage into a CTE
This result can be used to query the expected values (without doing the expanding twice)
Could be a little bit faster:
demo:db<>fiddle
SELECT
payload,
MIN(the_text) FILTER (WHERE label = 'CommingledProduct'),
MIN(the_text) FILTER (WHERE label = 'CommingledVenue')
FROM (
SELECT
payload::text AS payload,
elems ->> 'Label' AS label,
elems ->> 'Text' AS the_text
FROM
my_source_table,
json_array_elements(payload -> 'ANNOTATIONS') elems
) s
GROUP BY payload
The answer from #S-Man is great and you should use that for your postgres 10.7. json_path will be added in postgres 12, which will allow you to do something a little bit closer to your pseudo-code, but only with jsonb (not json):
INSERT INTO my_target_table (comingled_product, comingled_venue)
SELECT jsonb_path_query(payload,
'$.ANNOTATIONS[*] ? (#.Label == "CommingledProduct")')->>'Text',
jsonb_path_query(payload,
'$.ANNOTATIONS[*] ? (#.Label == "CommingledVenue")')->>'Text'
FROM my_source_table;
The jsonb_path_query syntax takes a bit to figure out, but it is basically returning elements of the ANNOTATIONS array for which the Label equals either CommingledProduct or CommingledVenue. jsonb_path_query returns a jsonb object, so we can use the ->> operator to grab the value of 'Text' from the object.

Combine multiple JSON rows into one JSON object in PostgreSQL

I want to combine the following JSON from multiple rows into one single JSON object as a row.
{"Salary": ""}
{"what is your name?": ""}
{"what is your lastname": ""}
Expected output
{
"Salary": "",
"what is your name?": "",
"what is your lastname": ""
}
With only built-in functions, you need to expand the rows into key/value pairs and aggregate that back into a single JSON value:
select jsonb_object_agg(t.k, t.v)
from the_table, jsonb_each(ob) as t(k,v);
If your column is of type json rather than jsonb you need to cast it:
select jsonb_object_agg(t.k, t.v)
from the_table, jsonb_each(ob::jsonb) as t(k,v);
A slightly more elegant solution is to define a new aggregate that does that:
CREATE AGGREGATE jsonb_combine(jsonb)
(
SFUNC = jsonb_concat(jsonb, jsonb),
STYPE = jsonb
);
Then you can aggregate the values directly:
select jsonb_combine(ob)
from the_table;
(Again you need to cast your column if it's json rather than jsonb)
Online example

How to check if a certain string is contained within an array returned by JSON_QUERY

I'm trying to write a SQL statement that will parse some JSON and return only rows where one of the arrays in the JSON Object contains a given value.
Example JSON:
Object 1:
{
"Key1": ["item1", "item2", "item3"]
}
Object 2:
{
"Key1": ["item1", "item3"]
}
I would like to only return rows where JSON_QUERY(object, '$.Key1').Contains("item2") is true (in this example, Object 1).
Of course, this magical function 'Contains()' does not exist in tsql, and I can't find any documentation of a function that performs as I'd like.
EDIT:
My current solution (which I'm not very fond of and would like to replace) checks if the string literal '"item1"' is contained within the value returned by JSON_QUERY. I don't like this, because it's possible an entry in the array could have a value like '123123"item1"123123', and then the conditional would return true.
Christian,
Sounds to me like your SQL Query could use a where clause using a LIKE comparison on a column
WHERE col1 LIKE 'some%funky%string'
The % is a wildcard declaritive
You can use OPENJSON to get a derived result set out of a json list or array:
The followin query uses JSON_QUERY to retrieve the list of strings within Key1. This is passed as argument into OPENJSON to retrive the list of strings as derived table:
DECLARE #jsonTable TABLE(ID INT IDENTITY, JsonString NVARCHAR(MAX));
INSERT INTO #jsonTable VALUES
(N'{
"Key1": ["item1", "item2", "item3"]
}')
,(N'{
"Key1": ["item1", "item3"]
}');
DECLARE #LookFor NVARCHAR(100)='Item2'
SELECT jt.ID
,jt.JsonString
,A.value
FROM #jsonTable jt
OUTER APPLY OPENJSON(JSON_QUERY(jt.JsonString,N'$.Key1')) AS A
--WHERE A.value=#LookFor
Uncomment the final WHERE to reduce the list to the rows with a value of Item2 (as defined in the variable #LookFor).