T-SQL OpenJson Nested Array - json

Iam Struggling with following JSOn Structure
Declare #Json Nvarchar(max)
Set #Json = '
{
"entities": [
{
"Fields": [
{
"Name": "test-id",
"values": [
{
"value": "1851"
}
]
},
{
"Name": "test-name",
"values": [
{
"value": "01_DUMMY"
}
]
}
],
"Type": "run",
"children-count": 0
},
{
"Fields": [
{
"Name": "test-id",
"values": [
{
"value": "1852"
}
]
},
{
"Name": "test-name",
"values": [
{
"value": "02_DUMMY"
}
]
}
],
"Type": "run",
"children-count": 0
}
],
"TotalResults": 2
}'
My Output should look like this:
test-id|test-name|Type|Children-count
1851 |01_DUMMY |run |0
1852 |02_DUMMY |run |0
I tried to use the Examples posted here but none is matching my Needs.
My closest apporach was this T-SQL Syntax
Select
*
From OPENJSON (#JSON,N'$.entities') E
CROSS APPLY OPENJSON (E.[value]) F
CROSS APPLY OPENJSON (F.[value],'$') V where F.type = 4
My next idea was to use this SQL CODE to open the next nested Array but iam always getting an error msg(
Lookup Error - SQL Server Database Error: Incorrect syntax near the
keyword 'CROSS'.
)
Select
*
From OPENJSON (#JSON,N'$.entities') E
CROSS APPLY OPENJSON (E.[value]) F
CROSS APPLY OPENJSON (F.[value]) V where F.type = 4
CROSS APPLY OPENJSON (V.[value]) N
Iam not sure how to get Closer to my needed Output.
To be honest I just started with T-SQL and never worked before with JSON Files.
Regards Johann

This is rather deeply nested. I think, you've got the right idea to dive deeper and deeper using a serie of OPENJSON. Try it like this to get your values:
Declare #Json Nvarchar(max)
Set #Json = '
{
"entities": [
{
"Fields": [
{
"Name": "test-id",
"values": [
{
"value": "1851"
}
]
},
{
"Name": "test-name",
"values": [
{
"value": "01_DUMMY"
}
]
}
],
"Type": "run",
"children-count": 0
},
{
"Fields": [
{
"Name": "test-id",
"values": [
{
"value": "1852"
}
]
},
{
"Name": "test-name",
"values": [
{
"value": "02_DUMMY"
}
]
}
],
"Type": "run",
"children-count": 0
}
],
"TotalResults": 2
}';
--This is the query
WITH ReadJson AS
(
SELECT A.TotalResults
,C.[Type]
,C.[children-count]
,D.[Name]
,E.*
FROM OPENJSON(#Json)
WITH(TotalResults INT, entities NVARCHAR(MAX) AS JSON) A
CROSS APPLY OPENJSON(A.entities) B
CROSS APPLY OPENJSON(B.[value])
WITH(Fields NVARCHAR(MAX) AS JSON,[Type] VARCHAR(100),[children-count] INT) C
CROSS APPLY OPENJSON(C.Fields)
WITH([Name] VARCHAR(100),[values] NVARCHAR(MAX) AS JSON) D
CROSS APPLY OPENJSON(D.[values])
WITH([value] VARCHAR(100)) E
)
SELECT * FROM ReadJson;
The result
+---+-----+---+-----------+----------+
| 2 | run | 0 | test-id | 1851 |
+---+-----+---+-----------+----------+
| 2 | run | 0 | test-name | 01_DUMMY |
+---+-----+---+-----------+----------+
| 2 | run | 0 | test-id | 1852 |
+---+-----+---+-----------+----------+
| 2 | run | 0 | test-name | 02_DUMMY |
+---+-----+---+-----------+----------+
Do you think you can manage the rest?

Related

Azure Synapse Analytics Json Flatten

I'm new to Azure Synapse and currently have the following problem:
I get a json that looks like the following:
{
"2022-02-01":[
{
"shiftId": ,
"employeeId": ,
"duration": ""
},
{
"shiftId": ,
"employeeId": ,
"duration": ""
}
],
"2022-02-02": [
{
"shiftId": ,
"employeeId": ,
"duration": ""
}
],
"2022-02-03": [
{
"shiftId": ,
"employeeId": ,
"duration": ""
},
{
"shiftId": ,
"employeeId": ,
"duration": ""
}
],
"2022-02-4": []
}
Now I would like to convert this so that I get it in a view. I have already tried with a dataflow as array of documents but I get an error.
"Malformed records are detected in schema inference. Parse Mode: FAILFAST"
I want something like:
date shiftId employeeId duration
___________|_________|____________|_________
2022-02-01 | 1234 | 345345 | 420
2022-02-01 | 2345 | 345345 | 124
2022-02-02 | 5345 | 123567 | 424
2022-02-03 | 5675 | 987542 | 123
2022-02-03 | 9456 | 234466 | 754
Azure Synapse Analytics, dedicated SQL pools are actually very capable with JSON, supporting OPENJSON and JSON_VALUE, so you could just use a Stored Procedure with the JSON as a parameter. A simple exmaple:
SELECT
k.[key] AS [shiftDate],
JSON_VALUE( d.[value], '$.shiftId' ) shiftId,
JSON_VALUE( d.[value], '$.employeeId' ) employeeId,
JSON_VALUE( d.[value], '$.duration' ) duration
FROM OPENJSON( #json, '$' ) k
CROSS APPLY OPENJSON( k.value, '$' ) d;
The full code:
DECLARE #json NVARCHAR(MAX) = '{
"2022-02-01": [
{
"shiftId": 1234,
"employeeId": 345345,
"duration": 420
},
{
"shiftId": 2345,
"employeeId": 345345,
"duration": 124
}
],
"2022-02-02": [
{
"shiftId": 5345,
"employeeId": 123567,
"duration": 424
}
],
"2022-02-03": [
{
"shiftId": 5675,
"employeeId": 987542,
"duration": 123
},
{
"shiftId": 9456,
"employeeId": 234466,
"duration": 754
}
]
}'
SELECT
k.[key] AS [shiftDate],
JSON_VALUE( d.[value], '$.shiftId' ) shiftId,
JSON_VALUE( d.[value], '$.employeeId' ) employeeId,
JSON_VALUE( d.[value], '$.duration' ) duration
FROM OPENJSON( #json, '$' ) k
CROSS APPLY OPENJSON( k.value, '$' ) d;
My results:
You could use a Synapse Notebook or Mapping Data Flows if you wanted something more dynamic.

TSQL JSON_QUERY can you use a filter in the JSON Path

I have a table with a column that holds valid JSON, heres an example of the JSON structure:
{
"Requirements": {
"$values": [
{
"$type": "List",
"ListId": "956cf9c5-24ab-47d9-8082-940118f2f1a3",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value1",
"Value2",
"Value3"
]
}
},
{
"$type": "List",
"ListId": "D11149DD-A682-4BC7-A87D-567954779234",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value4",
"Value5",
"Value6",
"Value7"
]
}
}
]
}
}
I need to return the values from MultiSelected collection depending on the value of ListID.
I'm using the following JSON Path to retun value
$.Requirements."$values"[?(#.ListId=='956cf9c5-24ab-47d9-8082-940118f2f1a3')].MultiSelected."$values"
This worked fine in a JSON Expression tester.
But when I try to use it to query the table I get the following error:
JSON path is not properly formatted. Unexpected character '?' is found at position 25.
The query I'm using is as follows:
SELECT ID AS PayloadID,
Items.Item AS ItemsValues
FROM dbo.Payload
CROSS APPLY ( SELECT *
FROM OPENJSON( JSON_QUERY( Payload, '$.Requirements."$values"[?(#.ListId==''956cf9c5-24ab-47d9-8082-940118f2f1a3'')].MultiSelected."$values"' ) )
WITH ( Item nvarchar(200) '$' ) ) AS Items
WHERE ID = 3
I've tried replacing
?(#.ListId==''956cf9c5-24ab-47d9-8082-940118f2f1a3'')
with 0 and it works fine on SQL Server.
My question is, is filter syntax ?(...) supported in JSON_QUERY or is there something else I should be doing?
The database is running on Azure, were the database compatability level is set to SQL Server 2017 (140).
Thanks for your help in advance.
Andy
I would use openjson twice in stead
drop table if exists #payload
create table #payload(ID int,Payload nvarchar(max))
insert into #payload VALUES
(3,N'
{
"Requirements": {
"$values": [
{
"$type": "List",
"ListId": "956cf9c5-24ab-47d9-8082-940118f2f1a3",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value1",
"Value2",
"Value3"
]
}
},
{
"$type": "List",
"ListId": "D11149DD-A682-4BC7-A87D-567954779234",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value4",
"Value5",
"Value6",
"Value7"
]
}
}
]
}
}'
)
SELECT ID AS PayloadID,
Items.[value]
FROM #Payload a
CROSS APPLY OPENJSON( Payload, '$.Requirements."$values"' ) with ( ListId varchar(50),MultiSelected nvarchar(max) as json) b
CROSS APPLY OPENJSON( MultiSelected,'$."$values"' ) Items
where
a.id=3
AND b.listid='956cf9c5-24ab-47d9-8082-940118f2f1a3'
Reply:
+-----------+--------+
| PayloadID | value |
+-----------+--------+
| 3 | Value1 |
| 3 | Value2 |
| 3 | Value3 |
+-----------+--------+

How to join nested JSON indices to multiple rows in SQL by primary key

I am trying to update several rows in SQL with JSON.
I'd like to match a primary key on a table row to an index nested in an array of JS objects.
Sample data:
let json = [{
"header": object_data,
"items": [{
"id": {
"i": 0,
"name": "item_id"
},
"meta": {
"data": object_data,
"text": "some_text"
}
}, {
"id": {
"i": 4,
"name": "item_id4"
},
"meta": {
"data": object_data,
"text": "some_text"
}
}, {
"id": {
"i": 17,
"name": "item_id17"
},
"meta": {
"data": object_data,
"text": "some_text"
}}]
}]
Sample table:
i | json | item_id
---+---------------------------+---------
0 | entire_object_at_index_0 | item_id
4 | entire_object_at_index_4 | item_id4
17 | entire_object_at_index_17 | item_id17
entire_object_at_index, meaning appending the item data to the header to create a new object for each row.
"header" "some_data",
"items": [{
"id": {
"i": 0,
"name": "item_id1"
},
"meta": {
"data": "some_data",
"text": "some_text"
}
}]
SQL:
update someTable set
json = json_value(#jsons, '$') -- not sure how to join on index here
item_id = json_value(#jsons, '$.items[?].id.name' -- not sure how to select by index here
where [i] = json_query(#jsons, '$.items.id.i')
The requirement to repeat the other properties complicates this a bit, because we need to build a new object explicitly. Even so it's not too hard:
update someTable
set
[json] = (
select (
select
"header" = json_query(#json, '$.header'),
"items" = json_query(N'[' + items.item + N']')
for json path, without_array_wrapper
)
),
item_id = items.item_id
from openjson(#json, '$.items') with (
item nvarchar(max) '$' as json,
item_id varchar(50) '$.id.name',
i int '$.id.i'
) items
join someTable on [someTable].i = items.i
Here I'm assuming the #json has already been unwrapped from its array, as your query seems to assume. If it's not, substitute $.[0] for $ in the outer query.
Update:
It's an attempt to improve my answer (I missed the header part of the JSON content in the original answer). Of course, the #JeroenMostert's answer is an excellent solution, so this is just another possible approach. Note, that if header part of JSON content is scalar value, you should use JSON_VALUE().
Table and JSON:
-- Table
CREATE TABLE #Data (
i int,
[json] nvarchar(max),
item_id nvarchar(100)
)
INSERT INTO #Data
(i, [json], [item_id])
VALUES
(0 , N'entire_object_at_index_0', N'item_id'),
(4 , N'entire_object_at_index_4', N'item_id4'),
(17, N'entire_object_at_index_17', N'item_id17')
-- JSON
DECLARE #json nvarchar(max) = N'[{
"header": {"key": "some_data"},
"items": [{
"id": {
"i": 0,
"name": "item_id"
},
"meta": {
"data": "some_data",
"text": "some_text"
}
}, {
"id": {
"i": 4,
"name": "item_id4"
},
"meta": {
"data": "some_data",
"text": "some_text"
}
}, {
"id": {
"i": 17,
"name": "item_id17"
},
"meta": {
"data": "some_data",
"text": "some_text"
}}]
}]'
Statement:
UPDATE #Data
SET #Data.Json = j.Json
FROM #Data
CROSS APPLY (
SELECT
JSON_QUERY(#json, '$[0].header') AS header,
JSON_QUERY(j.[value], '$') AS items
FROM OPENJSON(#json, '$[0].items') j
WHERE JSON_VALUE(j.[value], '$.id.i') = #Data.[i]
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
) j ([Json])
Original answer:
One possible approach is to use OPENJSON and appropriate join:
Table and JSON:
-- Table
CREATE TABLE #Data (
i int,
[json] nvarchar(max),
item_id nvarchar(100)
)
INSERT INTO #Data
(i, [json], [item_id])
VALUES
(0 , N'entire_object_at_index_0', N'item_id'),
(4 , N'entire_object_at_index_4', N'item_id4'),
(17, N'entire_object_at_index_17', N'item_id17')
-- JSON
DECLARE #json nvarchar(max) = N'[{
"header": "some_data",
"items": [{
"id": {
"i": 0,
"name": "item_id"
},
"meta": {
"data": "some_data",
"text": "some_text"
}
}, {
"id": {
"i": 4,
"name": "item_id4"
},
"meta": {
"data": "some_data",
"text": "some_text"
}
}, {
"id": {
"i": 17,
"name": "item_id17"
},
"meta": {
"data": "some_data",
"text": "some_text"
}}]
}]'
Statement:
UPDATE #Data
SET [json] = j.[value]
FROM #Data
LEFT JOIN (
SELECT
[value],
JSON_VALUE([value], '$.id.i') AS [i]
FROM OPENJSON(#json, '$[0].items')
) j ON (#Data.[i] = j.[i])

How do you OPENJSON on Arrays of Arrays

I have a JSON structure where there are Sections, consisting of multiple Renders, which consist of multiple Fields.
How do I do 1 OPENJSON call on the lowest level (Fields) to get all information from there?
Here is an example JSON:
Declare #layout NVARCHAR(MAX) = N'
{
"Sections": [
{
"SectionName":"Section1",
"SectionOrder":1,
"Renders":[
{
"RenderName":"Render1",
"RenderOrder":1,
"Fields":[
{
"FieldName":"Field1",
"FieldData":"Data1"
},
{
"FieldName":"Field2",
"FieldData":"Data2"
}
]
},
{
"RenderName":"Render2",
"RenderOrder":2,
"Fields":[
{
"FieldName":"Field1",
"FieldData":"Data1"
},
{
"FieldName":"Field2",
"FieldData":"Data2"
}
]
}
]
},
{
"SectionName":"Section2",
"SectionOrder":2,
"Renders":[
{
"RenderName":"Render1",
"RenderOrder":1,
"Fields":[
{
"FieldName":"Field1",
"FieldData":"Data1"
}
]
},
{
"RenderName":"Render2",
"RenderOrder":2,
"Fields":[
{
"FieldName":"Field1",
"FieldData":"Data1"
},
{
"FieldName":"Field2",
"FieldData":"Data2"
}
]
}
]
}
]
}
'
Here is some example of code of a nested OPENJSON call, which works, but is very complex and can't be generated dynamically, how do I make it one level call?
SELECT SectionName, SectionOrder, RenderName, RenderOrder, FieldName, FieldData FROM (
SELECT SectionName, SectionOrder, RenderName, RenderOrder, Fields FROM (
select SectionName, SectionOrder, Renders
from OPENJSON(#layout,'$.Sections')
WITH (
SectionName nvarchar(MAX) '$.SectionName',
SectionOrder nvarchar(MAX) '$.SectionOrder',
Renders nvarchar(MAX) '$.Renders' as JSON
)
) as Sections
CROSS APPLY OPENJSON(Renders,'$')
WITH (
RenderName nvarchar(MAX) '$.RenderName',
RenderOrder nvarchar(MAX) '$.RenderOrder',
Fields nvarchar(MAX) '$.Fields' as JSON
)
) as Renders
CROSS APPLY OPENJSON(Fields,'$')
WITH (
FieldName nvarchar(MAX) '$.FieldName',
FieldData nvarchar(MAX) '$.FieldData'
)
This is what I would like to achieve:
select FieldName, FieldData
from OPENJSON(#layout,'$.Sections.Renders.Fields')
WITH (
FieldName nvarchar(MAX) '$.Sections.Renders.Fields.FieldName',
FieldData nvarchar(MAX) '$.Sections.Renders.Fields.FieldData'
)
While you can't get away with using only a single OPENJSON, you can simplify your query a bit to make it easier to create dynamically by removing the nested subqueries:
SELECT SectionName, SectionOrder, RenderName, RenderOrder, FieldName, FieldData
FROM OPENJSON(#layout, '$.Sections')
WITH (
SectionName NVARCHAR(MAX) '$.SectionName',
SectionOrder NVARCHAR(MAX) '$.SectionOrder',
Renders NVARCHAR(MAX) '$.Renders' AS JSON
)
CROSS APPLY OPENJSON(Renders,'$')
WITH (
RenderName NVARCHAR(MAX) '$.RenderName',
RenderOrder NVARCHAR(MAX) '$.RenderOrder',
Fields NVARCHAR(MAX) '$.Fields' AS JSON
)
CROSS APPLY OPENJSON(Fields,'$')
WITH (
FieldName NVARCHAR(MAX) '$.FieldName',
FieldData NVARCHAR(MAX) '$.FieldData'
)
EDIT:
If you have a primitive array, you can access the data using the value property after you expose the nested array as a JSON field. Using the JSON from the comment below, you can do this to get the values from a primitive array:
DECLARE #layout NVARCHAR(MAX) = N'{ "id":123, "locales":["en", "no", "se"] }'
SELECT
a.id
, [Locale] = b.value
FROM OPENJSON(#layout, '$')
WITH (
id INT '$.id',
locales NVARCHAR(MAX) '$.locales' AS JSON
) a
CROSS APPLY OPENJSON(a.locales,'$') b
This can be done by CROSS Applying the JSON child node with the parent node and using the JSON_Value() function, like shown below:
DECLARE #json NVARCHAR(1000)
SELECT #json =
N'{
"OrderHeader": [
{
"OrderID": 100,
"CustomerID": 2000,
"OrderDetail": [
{
"ProductID": 2000,
"UnitPrice": 350
},
{
"ProductID": 3000,
"UnitPrice": 450
},
{
"ProductID": 4000,
"UnitPrice": 550
}
]
}
]
}'
SELECT
JSON_Value (c.value, '$.OrderID') as OrderID,
JSON_Value (c.value, '$.CustomerID') as CustomerID,
JSON_Value (p.value, '$.ProductID') as ProductID,
JSON_Value (p.value, '$.UnitPrice') as UnitPrice
FROM OPENJSON (#json, '$.OrderHeader') as c
CROSS APPLY OPENJSON (c.value, '$.OrderDetail') as p
Result
-------
OrderID CustomerID ProductID UnitPrice
100 2000 2000 350
100 2000 3000 450
100 2000 4000 550
select
json_query(questionsList.value, '$.answers'),
json_value(answersList2.value, '$.text[0].text') as text,
json_value(answersList2.value, '$.value') as code
from RiskAnalysisConfig c
outer apply OpenJson(c.Configuration, '$.questions') as questionsList
outer apply OpenJson(questionsList.value, '$.answers') as answersList2
where questionsList.value like '%"CATEGORIES"%'
Sample of List of answers for each question into list
questionsList list level 1
answersList2 list level 2
Json sample
{
"code": "Code x",
"questions": [
{},
{},
{
"code": "CATEGORIES",
"text": [
{
"text": "How old years are you?",
"available": true
}
],
"answers": [
{
"text": [
{
"text": "more than 18",
"available": true
}
],
"text": [
{
"text": "less than 18",
"available": true
}
]
}
]
}
]
}
Partial result
| text | code |
| --- | --- |
| How old years are you? | more than 18 |
| How old years are you? | lass than 18 |
I have the JSON code and inserted into the table called MstJson, the column name which contains JSON code is JSON data.
JSON Code :
[
{
"id":100,
"type":"donut",
"name":"Cake",
"ppu":0.55,
"batters":{
"batter":[
{
"id":"1001",
"type":"Regular"
},
{
"id":"1002",
"type":"Chocolate"
},
{
"id":"1003",
"type":"Blueberry"
},
{
"id":"1004",
"type":"Havmor",
"BusinessName":"HussainM"
},
"id",
"type"
]
},
"topping":[
{
"id":"5001",
"type":"None"
},
{
"id":"5002",
"type":"Glazed"
},
{
"id":"5005",
"type":"Sugar"
},
{
"id":"5007",
"type":"Powdered Sugar"
},
{
"id":"5006",
"type":"Chocolate with Sprinkles"
},
{
"id":"5003",
"type":"Chocolate"
},
{
"id":"5004",
"type":"Maple"
}
]
},
{
"id":"0002",
"type":"donut",
"name":"Raised",
"ppu":0.55,
"batters":{
"batter":[
{
"id":"1001",
"type":"Regular"
}
]
},
"topping":[
{
"id":"5001",
"type":"None"
},
{
"id":"5002",
"type":"Glazed"
},
{
"id":"5005",
"type":"Sugar"
},
{
"id":"5003",
"type":"Chocolate"
},
{
"id":"5004",
"type":"Maple"
}
]
},
{
"id":"0003",
"type":"donut",
"name":"Old Fashioned",
"ppu":0.55,
"batters":{
"batter":[
{
"id":"1001",
"type":"Regular"
},
{
"id":"1002",
"type":"Chocolate"
}
]
},
"topping":[
{
"id":"5001",
"type":"None"
},
{
"id":"5002",
"type":"Glazed"
},
{
"id":"5003",
"type":"Chocolate"
},
{
"id":"5004",
"type":"Maple"
}
]
}
]
Sql Code for OpenJson using CrossApply (Nested Array):
SELECT
d.ID
,a.ID
,a.Type
,a.Name
,a.PPU
,c.Batterid
,c.Battertype
FROM MstJson d
CROSS APPLY
OPENJSON(Jsondata)
WITH
(
ID NVARCHAR(MAX) '$.id'
,Type NVARCHAR(MAX) '$.type'
,Name NVARCHAR(MAX) '$.name'
,PPU DECIMAL(18, 2) '$.ppu'
,Batters NVARCHAR(MAX) '$.batters' AS JSON
) AS a
CROSS APPLY
OPENJSON(Batters, '$')
WITH
(
Batter NVARCHAR(MAX) '$.batter' AS JSON
) AS b
CROSS APPLY
OPENJSON(Batter, '$')
WITH
(
Batterid INT '$.id'
,Battertype NVARCHAR(MAX) '$.type'
) AS c
WHERE d.ID = 12; ---above Json Code is on Id 12 of Table MstJson

Unnesting nested JSON structures in Apache Drill

I have the following JSON (roughly) and I'd like to extract the information from the header and defects fields separately:
{
"file": {
"header": {
"timeStamp": "2016-03-14T00:20:15.005+04:00",
"serialNo": "3456",
"sensorId": "1234567890",
},
"defects": [
{
"info": {
"systemId": "DEFCHK123",
"numDefects": "3",
"defectParts": [
"003", "006", "008"
]
}
}
]
}
}
I have tried to access the individual elements with file.header.timeStamp etc but that returns null. I have tried using flatten(file) but that gives me
Cannot cast org.apache.drill.exec.vector.complex.MapVector to org.apache.drill.exec.vector.complex.RepeatedValueVector
I've looked into kvgen() but don't see how that fits in my case. I tried kvgen(file.header) but that gets me
kvgen function only supports Simple maps as input
which is what I had expected anyway.
Does anyone know how I can get header and defects, so I can process the information contained in them. Ideally, I'd just select the information from header because it contains no arrays or maps, so I can take individual records as they are. For defects I'd simply use FLATTEN(defectParts) to obtain a table of the defective parts.
Any help would be appreciated.
What version of Drill are you using ? I tried querying the following file on latest master (1.7.0-SNAPHOT):
{
"file": {
"header": {
"timeStamp": "2016-03-14T00:20:15.005+04:00",
"serialNo": "3456",
"sensorId": "1234567890"
},
"defects": [
{
"info": {
"systemId": "DEFCHK123",
"numDefects": "3",
"defectParts": [
"003", "006", "008"
]
}
}
]
}
}
{
"file": {
"header": {
"timeStamp": "2016-03-14T00:20:15.005+04:00",
"serialNo": "3456",
"sensorId": "1234567890"
},
"defects": [
{
"info": {
"systemId": "DEFCHK123",
"numDefects": "3",
"defectParts": [
"003", "006", "008"
]
}
}
]
}
}
And the following queries are working fine:
1.
select t.file.header.serialno as serialno from `parts.json` t;
+-----------+
| serialno |
+-----------+
| 3456 |
| 3456 |
+-----------+
2 rows selected (0.098 seconds)
2.
select flatten(t.file.defects) defects from `parts.json` t;
+---------------------------------------------------------------------------------------+
| defects |
+---------------------------------------------------------------------------------------+
| {"info":{"systemId":"DEFCHK123","numDefects":"3","defectParts":["003","006","008"]}} |
| {"info":{"systemId":"DEFCHK123","numDefects":"3","defectParts":["003","006","008"]}} |
+---------------------------------------------------------------------------------------+
3.
select q.h.serialno as serialno, q.d.info.defectParts as defectParts from (select t.file.header h, flatten(t.file.defects) d from `parts.json` t) q;
+-----------+----------------------+
| serialno | defectParts |
+-----------+----------------------+
| 3456 | ["003","006","008"] |
| 3456 | ["003","006","008"] |
+-----------+----------------------+
2 rows selected (0.126 seconds)
PS: This should've been a comment but I don't have enough rep yet!
I don't have experience with Apache Drill, but checked the manual. Isn't this what you're looking for?
https://drill.apache.org/docs/selecting-multiple-columns-within-nested-data/
https://drill.apache.org/docs/selecting-nested-data-for-a-column/