BigQuery JSON element extraction

I have a table in BigQuery with a JSON column, shown below.
+--------+-------+
| doc_id | data  |
+--------+-------+
| 222    | {...} |
| 333    | {...} |
+--------+-------+
In the data JSON column, the IDs are used as the top-level keys:
{
"1675223776617": {
"author": "aaa",
"new": "2023-02-01",
"old": null,
"property": "asd",
"sender": "wew"
},
"1675223776618": {
"author": "aaa",
"new": true,
"old": null,
"property": "asd",
"sender": "ewew"
},
"1675223776619": {
"author": "bbb",
"new": "ySk2btk7",
"old": null,
"property": "qwe",
"sender": "yyy"
}
}
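I would like to extract this JSON into the following format using SQL in BigQuery.
Note that the id column isn't a field inside the JSON; it comes from the top-level keys.
+--------+---------------+--------+------------+------+----------+--------+
| doc_id | id            | author | new        | old  | property | sender |
+--------+---------------+--------+------------+------+----------+--------+
| 222    | 1675223776617 | aaa    | 2023-02-01 | null | asd      | wew    |
| 222    | 1675223776618 | aaa    | true       | null | asd      | ewew   |
| 222    | 1675223776619 | bbb    | ySk2btk7   | null | qwe      | yyy    |
+--------+---------------+--------+------------+------+----------+--------+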
I tried using the JSON_EXTRACT function without any success.

You might consider the approach below, using a JavaScript UDF.
CREATE TEMP FUNCTION flatten_json(json STRING)
RETURNS ARRAY<STRUCT<id STRING, author STRING, new STRING, old STRING, property STRING, sender STRING>>
LANGUAGE js AS """
const result = [];
// Each top-level key is a row id; copy it into the row object.
for (const [key, value] of Object.entries(JSON.parse(json))) {
value["id"] = key;
result.push(value);
}
return result;
""";
WITH sample_table AS (
SELECT 222 doc_id, '''{
"1675223776617": {
"author": "aaa",
"new": "2023-02-01",
"old": null,
"property": "asd",
"sender": "wew"
},
"1675223776618": {
"author": "aaa",
"new": true,
"old": null,
"property": "asd",
"sender": "ewew"
},
"1675223776619": {
"author": "bbb",
"new": "ySk2btk7",
"old": null,
"property": "qwe",
"sender": "yyy"
}
}''' data
)
SELECT doc_id, flattened.*
FROM sample_table, UNNEST(flatten_json(data)) flattened;
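The UDF's logic does not depend on BigQuery: it walks the top-level keys, copies each key into its object as id, and emits one row per key. Below is a minimal Python sketch of the same transformation, useful for checking the expected rows locally. The name flatten_keyed_json is hypothetical, and rendering non-string values as text is an assumption made to mirror the STRING columns declared in the UDF; it says nothing about how BigQuery itself coerces a JavaScript boolean.

```python
import json

FIELDS = ("author", "new", "old", "property", "sender")

def flatten_keyed_json(doc_id, data):
    """Turn {"<id>": {...}, ...} into one row (dict) per top-level key."""
    rows = []
    for key, value in json.loads(data).items():
        row = {"doc_id": doc_id, "id": key}
        for field in FIELDS:
            raw = value.get(field)
            if raw is None or isinstance(raw, str):
                row[field] = raw              # keep JSON null as None, strings as-is
            else:
                row[field] = json.dumps(raw)  # assumption: render true/123 as text
        rows.append(row)
    return rows

sample = '''{
  "1675223776617": {"author": "aaa", "new": "2023-02-01", "old": null, "property": "asd", "sender": "wew"},
  "1675223776618": {"author": "aaa", "new": true, "old": null, "property": "asd", "sender": "ewew"}
}'''

for row in flatten_keyed_json(222, sample):
    print(row)
```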

Related

How to write JSON to Mysql?

Sorry for my bad English.
I am inserting JSON into MySQL like this:
set @json = '[{"name":"ivan","city":"london","kurs":"1"},{"name":"lena","city":"tokio","kurs":"5"},{"name":"misha","city":"kazan","kurs":"3"}]';
select * from json_table(@json,'$[*]' columns(name varchar(20) path '$.name',
city varchar(20) path '$.city',
kurs varchar(20) path '$.kurs')) as jsontable;
But now there is a task to insert an unknown number of additional properties:
set @json = '[{"name":"ivan","city":"london","kurs":"1","options": [{
"ao_id": 90630,
"name": "Высота предмета",
"value": "3.7 см"
}, {
"ao_id": 90673,
"name": "Ширина предмета",
"value": "4 см"
}, {
"ao_id": 90745,
"name": "Ширина упаковки",
"value": "4 см"
}]},{"name":"lena","city":"tokio","kurs":"5", "options": [{
"ao_id": 90630,
"name": "Высота предмета",
"value": "9.7 см"
}]},{"name":"misha","city":"kazan","kurs":"3", "options": [{
"ao_id": 90999,
"name": "Высота",
"value": "5.7 см"
}]}]';
How can I best do this so that I can access the table in the future (search, index, output)?

Retrieve JSON from sql

My json format in one of the sql columns "jsoncol" in the table "jsontable" is like below.
Kindly help me to get this data using JSON_QUERY or JSON_VALUE
Please pay attention to the brackets and double quotes in the key value pairs...
{
"Company": [
{
"Info": {
"Address": "123"
},
"Name": "ABC",
"Id": 999
},
{
"Info": {
"Address": "456"
},
"Name": "XYZ",
"Id": 888
}
]
}
I am trying to retrieve all the company names using sql query. Thanks in advance
You can use:
SELECT j.name
FROM table_name t
CROSS APPLY JSON_TABLE(
t.value,
'$.Company[*]'
COLUMNS(
name VARCHAR2(200) PATH '$.Name'
)
) j
Which, for the sample data:
CREATE TABLE table_name (
value CLOB CHECK (value IS JSON)
);
INSERT INTO table_name (value)
VALUES ('{
"Company": [
{
"Info": {
"Address": "123"
},
"Name": "ABC",
"Id": 999
},
{
"Info": {
"Address": "456"
},
"Name": "XYZ",
"Id": 888
}
]
}');
Outputs:
NAME
ABC
XYZ
You can use the JSON_TABLE() function for this case, provided the DB version is at least 12.1.0.2, such as:
SELECT name
FROM jsontable,
JSON_TABLE(jsoncol,
'$' COLUMNS(NESTED PATH '$."Company"[*]'
COLUMNS(name VARCHAR2 PATH '$."Name"')))
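
Both answers do the same thing: the row path '$.Company[*]' produces one row per array element, and the column path '$.Name' picks one field from each element. A small Python sketch of that evaluation, for intuition only (company_names is a hypothetical helper, not an Oracle API):

```python
import json

def company_names(jsoncol):
    """Emulate row path '$.Company[*]' with a column on '$.Name'."""
    doc = json.loads(jsoncol)
    # One output value per element of the Company array; a missing Name gives None.
    return [company.get("Name") for company in doc.get("Company", [])]

sample = '''{
  "Company": [
    {"Info": {"Address": "123"}, "Name": "ABC", "Id": 999},
    {"Info": {"Address": "456"}, "Name": "XYZ", "Id": 888}
  ]
}'''

print(company_names(sample))  # ['ABC', 'XYZ']
```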

How do you use JSON_QUERY with null json array inside of json object?

SELECT JSON_query([json], '$') from mytable
Returns fine the contents of [json] field
SELECT JSON_query([json], '$.Guid') from mytable
Returns null
SELECT JSON_query([json], '$.Guid[1]') from mytable
Returns null
I've also now tried:
SELECT JSON_query([json], '$[1].Guid')
SELECT JSON_query([json], '$[2].Guid')
SELECT JSON_query([json], '$[3].Guid')
SELECT JSON_query([json], '$[4].Guid')
and they all return null
So I'm stuck on how to build the path that gets to the info. Maybe SQL Server's JSON_QUERY can't handle the null as the first array element?
Below is the string that is stored inside of the [json] field in the database.
[
null,
{
"Round": 1,
"Guid": "15f4fe9d-403c-4820-8e35-8a8c8d78c33b",
"Team": "2",
"PlayerNumber": "78"
},
{
"Round": 1,
"Guid": "8e91596b-cc33-4ce7-bfc0-ac3d1dc5eb67",
"Team": "2",
"PlayerNumber": "54"
},
{
"Round": 1,
"Guid": "f53cd74b-ed5f-47b3-aab5-2f3790f3cd34",
"Team": "1",
"PlayerNumber": "23"
},
{
"Round": 1,
"Guid": "30297678-f2cf-4b95-a789-a25947a4d4e6",
"Team": "1",
"PlayerNumber": "11"
}
]
You need to follow the comments below your question. I'll just summarize them:
Probably the most appropriate approach in your case is to use OPENJSON() with explicit schema (the WITH clause).
JSON_QUERY() extracts a JSON object or a JSON array from a JSON string. If the path points to a scalar JSON value, the function returns NULL in lax mode and an error in strict mode. The stored JSON is an array at the root and has no $.Guid key, so NULL is the expected result of the SELECT JSON_query([json], '$.Guid') FROM mytable statement. Likewise, '$[1].Guid' points to a scalar string, so JSON_QUERY() returns NULL for it; JSON_VALUE() is the function for scalar values.
The following statements provide a working solution to your problem:
Table:
SELECT *
INTO Data
FROM (VALUES
(N'[
null,
{
"Round": 1,
"Guid": "15f4fe9d-403c-4820-8e35-8a8c8d78c33b",
"Team": "2",
"PlayerNumber": "78",
"TheProblem": "doesn''t"
},
{
"Round": 1,
"Guid": "8e91596b-cc33-4ce7-bfc0-ac3d1dc5eb67",
"Team": "2",
"PlayerNumber": "54"
},
{
"Round": 1,
"Guid": "f53cd74b-ed5f-47b3-aab5-2f3790f3cd34",
"Team": "1",
"PlayerNumber": "23"
},
{
"Round": 1,
"Guid": "30297678-f2cf-4b95-a789-a25947a4d4e6",
"Team": "1",
"PlayerNumber": "11"
}
]')
) v (Json)
Statements:
SELECT j.Guid
FROM Data d
OUTER APPLY OPENJSON(d.Json) WITH (
Guid uniqueidentifier '$.Guid',
Round int '$.Round',
Team nvarchar(1) '$.Team',
PlayerNumber nvarchar(2) '$.PlayerNumber'
) j
SELECT JSON_VALUE(j.[value], '$.Guid')
FROM Data d
OUTER APPLY OPENJSON(d.Json) j
Result:
Guid
------------------------------------
15f4fe9d-403c-4820-8e35-8a8c8d78c33b
8e91596b-cc33-4ce7-bfc0-ac3d1dc5eb67
f53cd74b-ed5f-47b3-aab5-2f3790f3cd34
30297678-f2cf-4b95-a789-a25947a4d4e6
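
To see why the original paths return NULL while iterating over the elements works, here is a rough Python sketch of the two behaviors described above. It is a simplified model, not SQL Server's implementation: json_query_lax and guids are hypothetical names, and the lax-mode rule is reduced to "return only objects and arrays".

```python
import json

def json_query_lax(node, key):
    """Rough stand-in for JSON_QUERY in lax mode: only objects and arrays are returned."""
    target = node.get(key) if isinstance(node, dict) else None
    return target if isinstance(target, (dict, list)) else None

def guids(json_text):
    """Rough stand-in for OPENJSON + JSON_VALUE: visit each element and read its Guid."""
    return [
        # A JSON null element has no Guid, so it yields None in this sketch.
        element.get("Guid") if isinstance(element, dict) else None
        for element in json.loads(json_text)
    ]

sample = '''[
  null,
  {"Round": 1, "Guid": "15f4fe9d-403c-4820-8e35-8a8c8d78c33b", "Team": "2", "PlayerNumber": "78"},
  {"Round": 1, "Guid": "8e91596b-cc33-4ce7-bfc0-ac3d1dc5eb67", "Team": "2", "PlayerNumber": "54"}
]'''

root = json.loads(sample)
print(json_query_lax(root, "Guid"))     # None: the root is an array, not an object with a Guid key
print(json_query_lax(root[1], "Guid"))  # None: Guid is a scalar string, not an object or array
print(guids(sample))
```

In this model the leading null element produces a None entry, so you may want to filter such entries out if they are not wanted.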

TSQL JSON_QUERY can you use a filter in the JSON Path

I have a table with a column that holds valid JSON. Here's an example of the JSON structure:
{
"Requirements": {
"$values": [
{
"$type": "List",
"ListId": "956cf9c5-24ab-47d9-8082-940118f2f1a3",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value1",
"Value2",
"Value3"
]
}
},
{
"$type": "List",
"ListId": "D11149DD-A682-4BC7-A87D-567954779234",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value4",
"Value5",
"Value6",
"Value7"
]
}
}
]
}
}
I need to return the values from MultiSelected collection depending on the value of ListID.
I'm using the following JSON Path to return the values:
$.Requirements."$values"[?(@.ListId=='956cf9c5-24ab-47d9-8082-940118f2f1a3')].MultiSelected."$values"
This worked fine in a JSON Expression tester.
But when I try to use it to query the table I get the following error:
JSON path is not properly formatted. Unexpected character '?' is found at position 25.
The query I'm using is as follows:
SELECT ID AS PayloadID,
Items.Item AS ItemsValues
FROM dbo.Payload
CROSS APPLY ( SELECT *
FROM OPENJSON( JSON_QUERY( Payload, '$.Requirements."$values"[?(@.ListId==''956cf9c5-24ab-47d9-8082-940118f2f1a3'')].MultiSelected."$values"' ) )
WITH ( Item nvarchar(200) '$' ) ) AS Items
WHERE ID = 3
I've tried replacing
?(@.ListId==''956cf9c5-24ab-47d9-8082-940118f2f1a3'')
with 0 and it works fine on SQL Server.
My question is, is filter syntax ?(...) supported in JSON_QUERY or is there something else I should be doing?
The database is running on Azure, where the database compatibility level is set to SQL Server 2017 (140).
Thanks for your help in advance.
Andy
I would use OPENJSON twice instead:
drop table if exists #payload
create table #payload(ID int,Payload nvarchar(max))
insert into #payload VALUES
(3,N'
{
"Requirements": {
"$values": [
{
"$type": "List",
"ListId": "956cf9c5-24ab-47d9-8082-940118f2f1a3",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value1",
"Value2",
"Value3"
]
}
},
{
"$type": "List",
"ListId": "D11149DD-A682-4BC7-A87D-567954779234",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value4",
"Value5",
"Value6",
"Value7"
]
}
}
]
}
}'
)
SELECT ID AS PayloadID,
Items.[value]
FROM #Payload a
CROSS APPLY OPENJSON( Payload, '$.Requirements."$values"' ) with ( ListId varchar(50),MultiSelected nvarchar(max) as json) b
CROSS APPLY OPENJSON( MultiSelected,'$."$values"' ) Items
where
a.id=3
AND b.listid='956cf9c5-24ab-47d9-8082-940118f2f1a3'
Result:
+-----------+--------+
| PayloadID | value |
+-----------+--------+
| 3 | Value1 |
| 3 | Value2 |
| 3 | Value3 |
+-----------+--------+
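
The query replaces the unsupported filter expression with two explicit steps: expand Requirements."$values" into rows, keep the row whose ListId matches, then expand that row's MultiSelected."$values". A minimal Python sketch of the same expand, filter, expand sequence (multiselected_values is a hypothetical name; the case-insensitive comparison is an assumption, since in SQL Server that depends on the collation in use):

```python
import json

def multiselected_values(payload, list_id):
    """Expand Requirements.$values, filter by ListId, then expand MultiSelected.$values."""
    doc = json.loads(payload)
    result = []
    for item in doc["Requirements"]["$values"]:      # first expansion
        # Assumption: compare case-insensitively; SQL Server's result depends on collation.
        if item.get("ListId", "").lower() == list_id.lower():
            result.extend(item["MultiSelected"]["$values"])  # second expansion
    return result

sample = '''{
  "Requirements": {"$values": [
    {"ListId": "956cf9c5-24ab-47d9-8082-940118f2f1a3",
     "MultiSelected": {"$values": ["Value1", "Value2", "Value3"]}},
    {"ListId": "D11149DD-A682-4BC7-A87D-567954779234",
     "MultiSelected": {"$values": ["Value4", "Value5", "Value6", "Value7"]}}
  ]}
}'''

print(multiselected_values(sample, "956cf9c5-24ab-47d9-8082-940118f2f1a3"))
```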

Error in Nested JSON in Hive

I was trying to load this json data in hive
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "Chocolate" },
{ "id": "1003", "type": "Blueberry" },
{ "id": "1004", "type": "Devil's Food" }
]
},
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5005", "type": "Sugar" },
{ "id": "5007", "type": "Powdered Sugar" },
{ "id": "5006", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
}
using DDL commands
ADD JAR /home/cloudera/Downloads/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar;
CREATE EXTERNAL TABLE format.json_serde (
`id` string,
`type` string,
`name` string,
`ppu` float,
batters` struct < `batter`:array < struct <`bid`:string, `btype`:string >>>,
`topping`:array < struct<`tid`:int, `ttype`:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
is throwing this error
FAILED: ParseException line 7:11 cannot recognize input near ':' 'array' '<' in column type
You have typos:
ttype`:string should be ttype:string
batters` struct should be batters struct (the opening backtick is missing)
`topping`:array should be topping array (no colon between a column name and its type)
JSON SerDe mapping is done by name.
Your structs fields names should match the actual names, e.g. id and not bid or tid, otherwise you'll get NULL values for these fields.
There is already a JSON SerDe which is part of the Hive installation.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormats&SerDe
create external table json_serde
(
id string
,type string
,name string
,ppu float
,batters struct<batter:array<struct<id:string,type:string>>>
,topping array<struct<id:string,type:string>>
)
row format serde
'org.apache.hive.hcatalog.data.JsonSerDe'
stored as textfile
;
select * from json_serde
;
+------+-------+------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | type | name | ppu | batters | topping |
+------+-------+------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0001 | donut | Cake | 0.550000011920929 | {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil'sFood"}]} | [{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"PowderedSugar"},{"id":"5006","type":"ChocolatewithSprinkles"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}] |
+------+-------+------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
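
The point about mapping by name can be illustrated outside Hive. In this simplified Python model (map_struct is a hypothetical helper, not the SerDe's actual code), each declared struct field is looked up in the JSON object by its exact name, so a declared name that does not exist in the data comes back empty:

```python
import json

def map_struct(record, field_names):
    """Name-based mapping: each declared field is looked up by its exact name."""
    return {name: record.get(name) for name in field_names}

batter = json.loads('{ "id": "1001", "type": "Regular" }')

print(map_struct(batter, ["id", "type"]))    # {'id': '1001', 'type': 'Regular'}
print(map_struct(batter, ["bid", "btype"]))  # {'bid': None, 'btype': None}
```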
It worked when I removed the colon after topping. Thanks
CREATE EXTERNAL TABLE format.json_serde (
id string,
type string,
name string,
ppu float,
batters struct<batter:array<
struct<bid:string, btype:string >>>,
topping array< struct<tid:string, ttype:string>>
)