I've recently discovered that PostgreSQL can be used to store JSON.
Before I import loads of data, I need to understand how to retrieve it, in particular the nested objects.
This PostgreSQL tutorial is a good starting point, but it doesn't really explain how to query a nested JSON array.
In the sample below I need to select codes -> code, where the entry with codes -> level "1" (adminCode1_iso) relates to adminName1 and, if it exists, the entry with codes -> level "2" (adminCode2_iso) relates to adminName2.
CREATE TABLE gn_json (
id serial NOT NULL PRIMARY KEY,
info json NOT NULL
);
comment on table gn_json is 'How PG holds json';
insert into gn_json (info)
VALUES ('{
"adminCode2": "C3",
"codes": [
{
"code": "ENG",
"level": "1",
"type": "ISO3166-2"
},
{
"code": "CAM",
"level": "2",
"type": "ISO3166-2"
}
],
"adminCode3": "12UE",
"adminName4": "Yelling",
"adminName3": "Huntingdonshire",
"adminCode1": "ENG",
"adminName2": "Cambridgeshire",
"distance": 0,
"countryCode": "GB",
"countryName": "United Kingdom",
"adminName1": "England",
"adminCode4": "12UE085"
}'),
('{
"codes": [
{
"code": "81",
"level": "1",
"type": "ISO3166-2"
}
],
"adminCode1": "63",
"distance": 0,
"countryCode": "TH",
"countryName": "Thailand",
"adminName1": "Krabi"
}');
select info ->> 'countryName' as countryName,info ->> 'countryCode' as countryCode,
info ->> 'adminName1' as adminName1, info ->> 'adminCode1' as adminCode1,
info ->> 'adminName2' as adminName2, info ->> 'adminCode2' as adminCode2,
info -> 'codes' -> 0 ->> 'code' as adminCode1_iso,
info -> 'codes' -> 1 ->> 'code' as adminCode2_iso
FROM gn_json;
Edit: expected outcome:
countryname    | countrycode | adminname1 | admincode1 | adminname2     | admincode2 | admincode1_iso | admincode2_iso
United Kingdom | GB          | England    | ENG        | Cambridgeshire | C3         | ENG            | CAM
Thailand       | TH          | Krabi      | 63         | NULL           | NULL       | 81             | NULL
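The expected outcome picks each ISO code by its level, not by its position in the codes array. As a rough plain-Python sketch (not PostgreSQL), this is the lookup the query needs, run over the two sample rows; `iso_code_for_level` and `extract` are hypothetical helper names. In PostgreSQL the same idea would typically be a subquery over json_array_elements(info -> 'codes') filtered on ->> 'level'.

```python
import json

# The two sample documents from the INSERT above, trimmed to the fields used here.
ROWS = [
    '{"codes": [{"code": "ENG", "level": "1", "type": "ISO3166-2"},'
    ' {"code": "CAM", "level": "2", "type": "ISO3166-2"}],'
    ' "adminName1": "England", "adminName2": "Cambridgeshire",'
    ' "countryName": "United Kingdom"}',
    '{"codes": [{"code": "81", "level": "1", "type": "ISO3166-2"}],'
    ' "adminName1": "Krabi", "countryName": "Thailand"}',
]


def iso_code_for_level(info: dict, level: str):
    """Hypothetical helper: return codes[].code whose level matches, else None.

    Searching by "level" instead of by array index keeps the result correct
    even if the codes array is not ordered by level.
    """
    for entry in info.get("codes", []):
        if entry.get("level") == level:
            return entry.get("code")
    return None  # plays the role of SQL NULL


def extract(raw: str) -> tuple:
    info = json.loads(raw)
    return (
        info.get("countryName"),
        info.get("adminName1"),
        info.get("adminName2"),         # missing key -> None, like ->> in PostgreSQL
        iso_code_for_level(info, "1"),  # adminCode1_iso
        iso_code_for_level(info, "2"),  # adminCode2_iso
    )


if __name__ == "__main__":
    for raw in ROWS:
        print(extract(raw))
```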
SELECT JSON_query([json], '$') from mytable
Returns the contents of the [json] field correctly.
SELECT JSON_query([json], '$.Guid') from mytable
Returns null
SELECT JSON_query([json], '$.Guid[1]') from mytable
Returns null
I've also now tried:
SELECT JSON_query([json], '$[1].Guid')
SELECT JSON_query([json], '$[2].Guid')
SELECT JSON_query([json], '$[3].Guid')
SELECT JSON_query([json], '$[4].Guid')
and they all return NULL.
So I'm stuck figuring out how to build the path to get to the info. Maybe SQL Server's JSON_QUERY can't handle null as the first array element?
Below is the string stored in the [json] field in the database.
[
null,
{
"Round": 1,
"Guid": "15f4fe9d-403c-4820-8e35-8a8c8d78c33b",
"Team": "2",
"PlayerNumber": "78"
},
{
"Round": 1,
"Guid": "8e91596b-cc33-4ce7-bfc0-ac3d1dc5eb67",
"Team": "2",
"PlayerNumber": "54"
},
{
"Round": 1,
"Guid": "f53cd74b-ed5f-47b3-aab5-2f3790f3cd34",
"Team": "1",
"PlayerNumber": "23"
},
{
"Round": 1,
"Guid": "30297678-f2cf-4b95-a789-a25947a4d4e6",
"Team": "1",
"PlayerNumber": "11"
}
]
You need to follow the comments below your question. I'll just summarize them:
Probably the most appropriate approach in your case is to use OPENJSON() with explicit schema (the WITH clause).
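JSON_QUERY() extracts an object or an array from a JSON string. If the path points to a scalar JSON value, the function returns NULL in lax mode and an error in strict mode. The stored JSON is an array at its root and has no $.Guid key, so NULL is the actual result of the SELECT JSON_query([json], '$.Guid') FROM mytable statement. Likewise, $[1].Guid points to a scalar string, so JSON_QUERY() returns NULL for it too; to extract a single scalar such as one Guid, use JSON_VALUE([json], '$[1].Guid') instead.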
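To see why every attempted path came back NULL, here is a rough Python model of lax-mode path evaluation. It is an illustration under simplifying assumptions (only `$`, `.key` and `[n]` steps, no strict mode, return types simplified), not SQL Server's implementation; `lax_path`, `json_query` and `json_value` are hypothetical names modelled on JSON_QUERY() and JSON_VALUE().

```python
import json
import re

# Trimmed copy of the stored array: a leading null plus one player object.
DOC = json.loads(
    '[null, {"Round": 1, "Guid": "15f4fe9d-403c-4820-8e35-8a8c8d78c33b",'
    ' "Team": "2", "PlayerNumber": "78"}]'
)

_MISSING = object()  # sentinel: some step of the path did not resolve


def lax_path(doc, path):
    """Walk a simple path such as '$[1].Guid'; return _MISSING if a step fails."""
    current = doc
    # Split everything after the leading '$' into .key and [index] steps.
    for key, index in re.findall(r"\.(\w+)|\[(\d+)\]", path[1:]):
        if key:
            if not isinstance(current, dict) or key not in current:
                return _MISSING
            current = current[key]
        else:
            position = int(index)
            if not isinstance(current, list) or position >= len(current):
                return _MISSING
            current = current[position]
    return current


def json_query(doc, path):
    """Hypothetical JSON_QUERY analogue: objects/arrays only, otherwise None."""
    value = lax_path(doc, path)
    return value if isinstance(value, (dict, list)) else None


def json_value(doc, path):
    """Hypothetical JSON_VALUE analogue: scalars only, otherwise None."""
    value = lax_path(doc, path)
    if value is _MISSING or isinstance(value, (dict, list)):
        return None
    return value


if __name__ == "__main__":
    print(json_query(DOC, "$.Guid"))     # None: the root is an array, not an object
    print(json_query(DOC, "$[1].Guid"))  # None: Guid is a scalar string
    print(json_value(DOC, "$[1].Guid"))  # 15f4fe9d-403c-4820-8e35-8a8c8d78c33b
```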
The following statements provide a working solution to your problem:
Table:
SELECT *
INTO Data
FROM (VALUES
(N'[
null,
{
"Round": 1,
"Guid": "15f4fe9d-403c-4820-8e35-8a8c8d78c33b",
"Team": "2",
"PlayerNumber": "78",
"TheProblem": "doesn''t"
},
{
"Round": 1,
"Guid": "8e91596b-cc33-4ce7-bfc0-ac3d1dc5eb67",
"Team": "2",
"PlayerNumber": "54"
},
{
"Round": 1,
"Guid": "f53cd74b-ed5f-47b3-aab5-2f3790f3cd34",
"Team": "1",
"PlayerNumber": "23"
},
{
"Round": 1,
"Guid": "30297678-f2cf-4b95-a789-a25947a4d4e6",
"Team": "1",
"PlayerNumber": "11"
}
]')
) v (Json)
Statements:
SELECT j.Guid
FROM Data d
OUTER APPLY OPENJSON(d.Json) WITH (
Guid uniqueidentifier '$.Guid',
Round int '$.Round',
Team nvarchar(1) '$.Team',
PlayerNumber nvarchar(2) '$.PlayerNumber'
) j
SELECT JSON_VALUE(j.[value], '$.Guid')
FROM Data d
OUTER APPLY OPENJSON(d.Json) j
Result:
Guid
------------------------------------
15f4fe9d-403c-4820-8e35-8a8c8d78c33b
8e91596b-cc33-4ce7-bfc0-ac3d1dc5eb67
f53cd74b-ed5f-47b3-aab5-2f3790f3cd34
30297678-f2cf-4b95-a789-a25947a4d4e6
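For comparison, a hedged plain-Python analogue of the OUTER APPLY OPENJSON(...) WITH (...) step: each array element becomes one row, each WITH column is read from its path and converted to its declared type, and paths that don't resolve become None. It is a sketch for reasoning about the output, not how SQL Server executes it; `shred` and `Player` are hypothetical names. The sketch drops elements without a Guid explicitly; in T-SQL, a WHERE j.Guid IS NOT NULL filter would state the same intent.

```python
import json
import uuid
from typing import NamedTuple, Optional

# Trimmed copy of the stored array (leading null plus two players).
RAW = """[
  null,
  {"Round": 1, "Guid": "15f4fe9d-403c-4820-8e35-8a8c8d78c33b", "Team": "2", "PlayerNumber": "78"},
  {"Round": 1, "Guid": "8e91596b-cc33-4ce7-bfc0-ac3d1dc5eb67", "Team": "2", "PlayerNumber": "54"}
]"""


class Player(NamedTuple):
    """One output row; each field mirrors a column of the WITH (...) clause."""
    guid: Optional[uuid.UUID]     # Guid uniqueidentifier '$.Guid'
    round: Optional[int]          # Round int '$.Round'
    team: Optional[str]           # Team nvarchar(1) '$.Team'
    player_number: Optional[str]  # PlayerNumber nvarchar(2) '$.PlayerNumber'


def shred(raw: str) -> list:
    """Hypothetical analogue of OPENJSON(...) WITH (...): one row per array element."""
    rows = []
    for element in json.loads(raw):
        # Paths that don't resolve (e.g. on the leading null) become None.
        obj = element if isinstance(element, dict) else {}
        guid, rnd = obj.get("Guid"), obj.get("Round")
        rows.append(Player(
            guid=uuid.UUID(guid) if guid is not None else None,
            round=int(rnd) if rnd is not None else None,
            team=obj.get("Team"),
            player_number=obj.get("PlayerNumber"),
        ))
    return rows


if __name__ == "__main__":
    # Keep only elements that actually carry a Guid.
    for row in shred(RAW):
        if row.guid is not None:
            print(row.guid)
```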
I'm using MySQL 5.6.43 and trying to search through a JSON array stored in a table. The JSON in my MySQL database looks like this:
[
{
"Name": "AAA",
"CountryCode": "AFG",
"District": "Kabol",
"Population": 1780000
},
{
"Name": "BBB",
"CountryCode": "AFG",
"District": "Qandahar",
"Population": 237500
},
{
"Name": "CCC",
"CountryCode": "USD",
"District": "Qandahar",
"Population": 237500
}
]
I want to keep only the objects whose CountryCode is AFG, so my result should look like this:
[
{
"Name": "AAA",
"CountryCode": "AFG",
"District": "Kabol",
"Population": 1780000
},
{
"Name": "BBB",
"CountryCode": "AFG",
"District": "Qandahar",
"Population": 237500
}
]
How can I achieve that?
With MySQL 8.0 or later, JSON_TABLE() can do this directly:
SELECT test.data, JSON_ARRAYAGG(JSON_OBJECT('Name',jsontable.Name,
'CountryCode',jsontable.CountryCode,
'District',jsontable.District,
'Population',jsontable.Population)) filtered
FROM test,
JSON_TABLE(test.data,
'$[*]' COLUMNS (Name VARCHAR(255) PATH '$.Name',
CountryCode VARCHAR(255) PATH '$.CountryCode',
District VARCHAR(255) PATH '$.District',
Population INT PATH '$.Population')) jsontable
WHERE jsontable.CountryCode = 'AFG'
GROUP BY test.data;
In MySQL 5.6 you must use string functions, because the JSON data type and JSON functions were only added in MySQL 5.7:
SELECT CONCAT('[', GROUP_CONCAT('{', SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(test.data, '}', nums.num), '}', -1), '{', -1), '}'), ']') filtered
FROM test
CROSS JOIN (SELECT 1 num UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) nums
WHERE LOCATE('"CountryCode":"AFG"', SUBSTRING_INDEX(SUBSTRING_INDEX(REPLACE(test.data, ' ', ''), '}', nums.num), '}', -1))
GROUP BY test.data
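If the source array may contain more than 4 objects per value, extend the nums subquery.
Restriction: neither the key nor the value being searched for may contain a space, and because the objects are split on { and }, they must not contain nested objects or braces inside string values.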
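Both queries boil down to "keep the array elements whose CountryCode is AFG". A hedged Python sketch of that transformation, handy for checking the expected output; because it parses the JSON properly, it has none of the space or nesting restrictions of the 5.6 string-function workaround. `filter_by_country` is a hypothetical name.

```python
import json

DATA = """[
  {"Name": "AAA", "CountryCode": "AFG", "District": "Kabol", "Population": 1780000},
  {"Name": "BBB", "CountryCode": "AFG", "District": "Qandahar", "Population": 237500},
  {"Name": "CCC", "CountryCode": "USD", "District": "Qandahar", "Population": 237500}
]"""


def filter_by_country(raw: str, code: str) -> str:
    """Return a JSON array containing only the objects with the given CountryCode."""
    items = json.loads(raw)
    kept = [item for item in items if item.get("CountryCode") == code]
    return json.dumps(kept)  # numbers stay numbers, e.g. Population


if __name__ == "__main__":
    print(filter_by_country(DATA, "AFG"))
```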
I have JSON stored in a SQL Server database table in the format below. I have been able to fudge a way to get the values I need, but I feel there must be a better way to do it in T-SQL. The JSON is the output of a report, and the column names in "columns" correspond by position to the values in each "rows" -> "data" array.
So column "Fiscal Month" corresponds to data value "11", "Fiscal Year" to "2019", etc.
{
"report": "Property ETL",
"id": 2648,
"columns": [
{
"name": "Fiscal Month",
"dataType": "int"
},
{
"name": "Fiscal Year",
"dataType": "int"
},
{
"name": "Portfolio",
"dataType": "varchar(50)"
},
{
"name": "Rent",
"dataType": "int"
}
],
"rows": [
{
"rowName": "1",
"type": "Detail",
"data": [
11,
2019,
"West Group",
10
]
},
{
"rowName": "2",
"type": "Detail",
"data": [
11,
2019,
"East Group",
10
]
},
{
"rowName": "3",
"type": "Detail",
"data": [
11,
2019,
"East Group",
10
]
},
{
"rowName": "Totals: ",
"type": "Total",
"data": [
null,
null,
null,
30
]
}
]
}
To get at the values in the 'data' array, I currently have a two-step process in T-SQL: I create a temp table and insert the row key/values from '$.rows' into it. Then I can select the individual columns for each row:
CREATE TABLE #TempData
(
Id INT,
JsonData VARCHAR(MAX)
)
DECLARE @json VARCHAR(MAX);
DECLARE @LineageKey INT;
SET @json = (SELECT JsonString FROM Stage.Report);
SET @LineageKey = (SELECT LineageKey FROM Stage.Report);
INSERT INTO #TempData(Id, JsonData)
(SELECT [key], value FROM OPENJSON(@json, '$.rows'))
MERGE [dbo].[DestinationTable] TARGET
USING
(
SELECT
JSON_VALUE(JsonData, '$.data[0]') AS FiscalMonth,
JSON_VALUE(JsonData, '$.data[1]') AS FiscalYear,
JSON_VALUE(JsonData, '$.data[2]') AS Portfolio,
JSON_VALUE(JsonData, '$.data[3]') AS Rent
FROM #TempData
WHERE JSON_VALUE(JsonData, '$.data[0]') is not null
) AS SOURCE
...
etc., etc.
This works, but I want to know if there is a way to select the data values directly, without the intermediate step of putting them into the temp table. The documentation and examples I've read all seem to require that the data have a name associated with it in order to access it. When I try to access the data directly at a position by index, I just get NULL.
I hope I understand your question correctly. If you know the column names, you need a single OPENJSON() call with an explicit schema, but if you want to read the structure from $.columns, you need a dynamic statement.
JSON:
DECLARE @json nvarchar(max) = N'{
"report": "Property ETL",
"id": 2648,
"columns": [
{
"name": "Fiscal Month",
"dataType": "int"
},
{
"name": "Fiscal Year",
"dataType": "int"
},
{
"name": "Portfolio",
"dataType": "varchar(50)"
},
{
"name": "Rent",
"dataType": "int"
}
],
"rows": [
{
"rowName": "1",
"type": "Detail",
"data": [
11,
2019,
"West Group",
10
]
},
{
"rowName": "2",
"type": "Detail",
"data": [
11,
2019,
"East Group",
10
]
},
{
"rowName": "3",
"type": "Detail",
"data": [
11,
2019,
"East Group",
10
]
},
{
"rowName": "Totals: ",
"type": "Total",
"data": [
null,
null,
null,
30
]
}
]
}'
Statement for fixed structure:
SELECT *
FROM OPENJSON(@json, '$.rows') WITH (
[Fiscal Month] int '$.data[0]',
[Fiscal Year] int '$.data[1]',
[Portfolio] varchar(50) '$.data[2]',
[Rent] int '$.data[3]'
)
Dynamic statement:
DECLARE @stm nvarchar(max) = N''
SELECT @stm = CONCAT(
@stm,
N',',
QUOTENAME(j2.name),
N' ',
j2.dataType,
N' ''$.data[',
j1.[key],
N']'''
)
FROM OPENJSON(@json, '$.columns') j1
CROSS APPLY OPENJSON(j1.value) WITH (
name varchar(50) '$.name',
dataType varchar(50) '$.dataType'
) j2
SELECT @stm = CONCAT(
N'SELECT * FROM OPENJSON(@json, ''$.rows'') WITH (',
STUFF(@stm, 1, 1, N''),
N')'
)
PRINT @stm
EXEC sp_executesql @stm, N'@json nvarchar(max)', @json
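The dynamic statement is easier to follow once you see the text it generates. Below is a hedged Python sketch that builds the same SELECT ... WITH (...) string from $.columns; `quotename` is a simplified, hypothetical stand-in for T-SQL's QUOTENAME() (bracket-quoting, with any ] doubled), and the sketch only produces the text, it doesn't execute anything.

```python
import json

REPORT = json.loads("""{
  "columns": [
    {"name": "Fiscal Month", "dataType": "int"},
    {"name": "Fiscal Year", "dataType": "int"},
    {"name": "Portfolio", "dataType": "varchar(50)"},
    {"name": "Rent", "dataType": "int"}
  ]
}""")


def quotename(name: str) -> str:
    """Simplified QUOTENAME(): wrap in brackets, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_statement(report: dict) -> str:
    # One "[name] type '$.data[index]'" entry per column, in array order;
    # the index is what OPENJSON(@json, '$.columns') exposes as [key].
    parts = [
        f"{quotename(col['name'])} {col['dataType']} '$.data[{index}]'"
        for index, col in enumerate(report["columns"])
    ]
    # Joining with "," plays the role of CONCAT(...) followed by STUFF(@stm, 1, 1, N'').
    return "SELECT * FROM OPENJSON(@json, '$.rows') WITH (" + ",".join(parts) + ")"


if __name__ == "__main__":
    print(build_statement(REPORT))
```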
Result:
--------------------------------------------
Fiscal Month Fiscal Year Portfolio  Rent
--------------------------------------------
11           2019        West Group 10
11           2019        East Group 10
11           2019        East Group 10
NULL         NULL        NULL       30
Yes, it is possible without a temporary table:
DECLARE @json NVARCHAR(MAX) =
N'
{
"report": "Property ETL",
"id": 2648,
"columns": [
{
"name": "Fiscal Month",
"dataType": "int"
},
{
"name": "Fiscal Year",
"dataType": "int"
},
{
"name": "Portfolio",
"dataType": "varchar(50)"
},
{
"name": "Rent",
"dataType": "int"
}
],
"rows": [
{
"rowName": "1",
"type": "Detail",
"data": [
11,
2019,
"West Group",
10
]
},
{
"rowName": "2",
"type": "Detail",
"data": [
11,
2019,
"East Group",
10
]
},
{
"rowName": "3",
"type": "Detail",
"data": [
11,
2019,
"East Group",
10
]
},
{
"rowName": "Totals: ",
"type": "Total",
"data": [
null,
null,
null,
30
]
}
]
}';
And query:
SELECT s.value,
rowName = JSON_VALUE(s.value, '$.rowName'),
[type] = JSON_VALUE(s.value, '$.type'),
s2.[key],
s2.value
FROM OPENJSON(JSON_QUERY(@json, '$.rows')) s
CROSS APPLY OPENJSON(JSON_QUERY(s.value, '$.data')) s2;
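The query above produces one output row per element of each data array, tagged with its position. A hedged Python analogue of the two nested OPENJSON calls (outer over $.rows, inner over each $.data), using a trimmed copy of the report; `shred_rows` is a hypothetical name.

```python
import json

# Trimmed copy of the report: one detail row and the totals row.
REPORT = json.loads("""{
  "rows": [
    {"rowName": "1", "type": "Detail", "data": [11, 2019, "West Group", 10]},
    {"rowName": "Totals: ", "type": "Total", "data": [null, null, null, 30]}
  ]
}""")


def shred_rows(report: dict) -> list:
    """One (rowName, type, position, value) tuple per element of every data array."""
    out = []
    for row in report.get("rows", []):                          # outer OPENJSON over $.rows
        for position, value in enumerate(row.get("data", [])):  # CROSS APPLY over $.data
            out.append((row.get("rowName"), row.get("type"), position, value))
    return out


if __name__ == "__main__":
    for record in shred_rows(REPORT):
        print(record)
```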
Or as a single row per detail:
SELECT s.value,
rowName = JSON_VALUE(s.value, '$.rowName'),
[type] = JSON_VALUE(s.value, '$.type'),
JSON_VALUE(s.value, '$.data[0]') AS FiscalMonth,
JSON_VALUE(s.value, '$.data[1]') AS FiscalYear,
JSON_VALUE(s.value, '$.data[2]') AS Portfolio,
JSON_VALUE(s.value, '$.data[3]') AS Rent
FROM OPENJSON(JSON_QUERY(@json, '$.rows')) s;
I have just realised that, on my AWS Aurora PostgreSQL cluster, functions that use temp tables are not friendly with read replicas, so I need to rewrite them using CTEs. Anyway, how do I take a JSON object with nested arrays and flatten it into a table like this:
{
"data": [
{
"groupName": "TeamA",
"groupCode": "12",
"subGroupCodes": [
"11"
]
},
{
"groupName": "TeamB",
"groupCode": "13",
"subGroupCodes": [
"15", "22"
]
}
]
}
I would like the output table to be:
groupName groupCode subGroupCodes
TeamA 12 11
TeamB 13 15
TeamB 13 22
I know I can get most of the way there with:
SELECT j."groupCode" as int, j."groupName" as pupilgroup_name
FROM json_to_recordset(p_in_filters->'data') j ("groupName" varchar(50), "groupCode" int)
But I also need to get the subGroupCodes, unpacking that array and joining each code to the correct parent groupCode.
You need to first unnest the data array, and then unnest each element's subGroupCodes array to get the subgroup codes:
with data (j) as (
values ('{
"data": [
{
"groupName": "TeamA",
"groupCode": "12",
"subGroupCodes": [
"11"
]
},
{
"groupName": "TeamB",
"groupCode": "13",
"subGroupCodes": [
"15", "22"
]
}
]
}'::jsonb)
)
select e ->> 'groupName' as group_name,
e ->> 'groupCode' as code,
sg.*
from data d
cross join lateral jsonb_array_elements(d.j -> 'data') as e(g)
cross join lateral jsonb_array_elements_text(g -> 'subGroupCodes') as sg(subgroup_code)
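The two lateral joins amount to a nested loop: the outer one walks the data array and the inner one walks each element's subGroupCodes, repeating the parent's columns for every code. A plain-Python sketch of that shape, useful for checking the expected table; `flatten` is a hypothetical helper. As with CROSS JOIN LATERAL, a group whose subGroupCodes array is empty or missing yields no rows; in PostgreSQL, LEFT JOIN LATERAL ... ON true would keep such groups.

```python
import json

PAYLOAD = json.loads("""{
  "data": [
    {"groupName": "TeamA", "groupCode": "12", "subGroupCodes": ["11"]},
    {"groupName": "TeamB", "groupCode": "13", "subGroupCodes": ["15", "22"]}
  ]
}""")


def flatten(payload: dict) -> list:
    """Return (groupName, groupCode, subGroupCode) rows, one per sub-group code."""
    rows = []
    for group in payload.get("data", []):            # like jsonb_array_elements(d.j -> 'data')
        for code in group.get("subGroupCodes", []):  # like jsonb_array_elements_text(g -> 'subGroupCodes')
            rows.append((group["groupName"], group["groupCode"], code))
    return rows


if __name__ == "__main__":
    for row in flatten(PAYLOAD):
        print(*row)
```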
I was trying to load this JSON data into Hive:
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "Chocolate" },
{ "id": "1003", "type": "Blueberry" },
{ "id": "1004", "type": "Devil's Food" }
]
},
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5005", "type": "Sugar" },
{ "id": "5007", "type": "Powdered Sugar" },
{ "id": "5006", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
}
using these DDL commands:
ADD JAR /home/cloudera/Downloads/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar;
CREATE EXTERNAL TABLE format.json_serde (
`id` string,
`type` string,
`name` string,
`ppu` float,
batters` struct < `batter`:array < struct <`bid`:string, `btype`:string >>>,
`topping`:array < struct<`tid`:int, `ttype`:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
It throws this error:
FAILED: ParseException line 7:11 cannot recognize input near ':' 'array' '<' in column type
You have typos:
ttype`:string should be ttype:string
battersstruct should be batters struct
topping:array should be topping array
JSON SerDe mapping is done by name.
Your struct field names should match the actual JSON key names, e.g. id and not bid or tid; otherwise you'll get NULL values for these fields.
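To make the "mapping is done by name" point concrete, here is a hedged Python model of how a name-based JSON SerDe fills a struct: each declared field is looked up by its name, and anything not found becomes NULL. It is an illustration only, ignoring details such as case handling, and not the openx or HCatalog SerDe code; `map_struct` is a hypothetical name.

```python
import json

BATTER = json.loads('{"id": "1001", "type": "Regular"}')


def map_struct(obj: dict, field_names: list) -> dict:
    """Fill each declared struct field from the JSON key of the same name.

    Keys the struct does not declare are ignored, and declared fields with no
    matching key come back as None (NULL in Hive).
    """
    return {name: obj.get(name) for name in field_names}


if __name__ == "__main__":
    print(map_struct(BATTER, ["bid", "btype"]))  # field names from the question's DDL
    print(map_struct(BATTER, ["id", "type"]))    # field names that match the JSON keys
```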
There is already a JSON SerDe which is part of the Hive installation.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormats&SerDe
create external table json_serde
(
id string
,type string
,name string
,ppu float
,batters struct<batter:array<struct<id:string,type:string>>>
,topping array<struct<id:string,type:string>>
)
row format serde
'org.apache.hive.hcatalog.data.JsonSerDe'
stored as textfile
;
select * from json_serde
;
+------+-------+------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | type | name | ppu | batters | topping |
+------+-------+------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0001 | donut | Cake | 0.550000011920929 | {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil'sFood"}]} | [{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"PowderedSugar"},{"id":"5006","type":"ChocolatewithSprinkles"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}] |
+------+-------+------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
It worked when I removed the colon after topping. Thanks!
CREATE EXTERNAL TABLE format.json_serde (
id string,
type string,
name string,
ppu float,
batters struct<batter:array<
struct<bid:string, btype:string >>>,
topping array< struct<tid:string, ttype:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';