Unnesting nested JSON structures in Apache Drill

I have the following JSON (roughly) and I'd like to extract the information from the header and defects fields separately:
{
"file": {
"header": {
"timeStamp": "2016-03-14T00:20:15.005+04:00",
"serialNo": "3456",
"sensorId": "1234567890",
},
"defects": [
{
"info": {
"systemId": "DEFCHK123",
"numDefects": "3",
"defectParts": [
"003", "006", "008"
]
}
}
]
}
}
I have tried to access the individual elements with file.header.timeStamp etc., but that returns null. I have also tried flatten(file), but that gives me:
Cannot cast org.apache.drill.exec.vector.complex.MapVector to org.apache.drill.exec.vector.complex.RepeatedValueVector
I've looked into kvgen() but don't see how that fits in my case. I tried kvgen(file.header) but that gets me
kvgen function only supports Simple maps as input
which is what I had expected anyway.
Does anyone know how I can get header and defects so that I can process the information contained in them? Ideally, I'd just select the information from header, because it contains no arrays or maps, so I can take the individual records as they are. For defects I'd simply use FLATTEN(defectParts) to obtain a table of the defective parts.
Any help would be appreciated.

What version of Drill are you using? I tried querying the following file on the latest master (1.7.0-SNAPSHOT):
{
"file": {
"header": {
"timeStamp": "2016-03-14T00:20:15.005+04:00",
"serialNo": "3456",
"sensorId": "1234567890"
},
"defects": [
{
"info": {
"systemId": "DEFCHK123",
"numDefects": "3",
"defectParts": [
"003", "006", "008"
]
}
}
]
}
}
{
"file": {
"header": {
"timeStamp": "2016-03-14T00:20:15.005+04:00",
"serialNo": "3456",
"sensorId": "1234567890"
},
"defects": [
{
"info": {
"systemId": "DEFCHK123",
"numDefects": "3",
"defectParts": [
"003", "006", "008"
]
}
}
]
}
}
And the following queries are working fine:
1.
select t.file.header.serialno as serialno from `parts.json` t;
+-----------+
| serialno |
+-----------+
| 3456 |
| 3456 |
+-----------+
2 rows selected (0.098 seconds)
2.
select flatten(t.file.defects) defects from `parts.json` t;
+---------------------------------------------------------------------------------------+
| defects |
+---------------------------------------------------------------------------------------+
| {"info":{"systemId":"DEFCHK123","numDefects":"3","defectParts":["003","006","008"]}} |
| {"info":{"systemId":"DEFCHK123","numDefects":"3","defectParts":["003","006","008"]}} |
+---------------------------------------------------------------------------------------+
3.
select q.h.serialno as serialno, q.d.info.defectParts as defectParts from (select t.file.header h, flatten(t.file.defects) d from `parts.json` t) q;
+-----------+----------------------+
| serialno | defectParts |
+-----------+----------------------+
| 3456 | ["003","006","008"] |
| 3456 | ["003","006","008"] |
+-----------+----------------------+
2 rows selected (0.126 seconds)
PS: This should've been a comment but I don't have enough rep yet!
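For readers who want to check the shape of the data outside Drill, here is a minimal Python sketch of what the three queries above do, taken one step further by also unnesting defectParts as the question asks. The helper name unnest_parts is hypothetical and the record is the sample from the question; this illustrates the idea and is not how Drill implements FLATTEN.

```python
import json

# One record, copied from the sample file in the answer above.
record = json.loads("""
{
  "file": {
    "header": {
      "timeStamp": "2016-03-14T00:20:15.005+04:00",
      "serialNo": "3456",
      "sensorId": "1234567890"
    },
    "defects": [
      {"info": {"systemId": "DEFCHK123", "numDefects": "3",
                "defectParts": ["003", "006", "008"]}}
    ]
  }
}
""")


def unnest_parts(rec):
    """Yield one (serialNo, systemId, part) row per defective part.

    Hypothetical helper: the two nested loops play the role of two
    flatten steps, first over `defects`, then over `defectParts`.
    """
    header = rec["file"]["header"]
    for defect in rec["file"]["defects"]:
        info = defect["info"]
        for part in info["defectParts"]:
            yield (header["serialNo"], info["systemId"], part)


rows = list(unnest_parts(record))
for row in rows:
    print(row)
```

Each header value is repeated once per part, which is the same row multiplication a flatten over an array produces.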

I don't have experience with Apache Drill, but checked the manual. Isn't this what you're looking for?
https://drill.apache.org/docs/selecting-multiple-columns-within-nested-data/
https://drill.apache.org/docs/selecting-nested-data-for-a-column/


Parse nested Json to splunk query which has string

I have multiple results, keyed by MAC address, each of which contains the device details.
This is the sample data:
"data": {
"a1:b2:c3:d4:11:22": {
"deviceIcons": {
"type": "Phone",
"icons": {
"3x": null,
"2x": "image.png"
}
},
"advancedDeviceId": {
"agentId": 113,
"partnerAgentId": "131",
"dhcpHostname": "Galaxy-J7",
"mac": "a1:b2:c3:d4:11:22",
"lastSeen": 12,
"model": "Android Phoe",
"id": 1
}
},
"a0:b2:c3:d4:11:22": {
"deviceIcons": {
"type": "Phone",
"icons": {
"3x": null,
"2x": "image.png"
}
},
"advancedDeviceId": {
"agentId": 113,
"partnerAgentId": "131",
"dhcpHostname": "Galaxy",
"mac": "a0:b2:c3:d4:11:22",
"lastSeen": 12,
"model": "Android Phoe",
"id": 1
}
}
}
}
How can I write a Splunk query over results like the sample above that returns advancedDeviceId.model and advancedDeviceId.id in tabular format?
I think this will do what you want:
| spath
| untable _time column value
| rex field=column "data.(?<address>[^.]+)\.advancedDeviceId\.(?<item>[^.]+)"
| table _time address item value
| eval {item}=value
| stats list(model) as model
list(id) as id
list(dhcpHostname) as dhcpHostname
list(mac) as mac
by address
Here is a "run anywhere" example that has two events each with two addresses:
| makeresults
| eval _raw="{\"data\":{\"a1:b2:c3:d4:11:21\":{\"deviceIcons\":{\"type\":\"Phone\",\"icons\":{\"3x\":null,\"2x\":\"image.png\"}},\"advancedDeviceId\":{\"agentId\":113,\"partnerAgentId\":\"131\",\"dhcpHostname\":\"Galaxy-J7\",\"mac\":\"a1:b2:c3:d4:11:21\",\"lastSeen\":12,\"model\":\"Android Phoe\",\"id\":1}},\"a0:b2:c3:d4:11:22\":{\"deviceIcons\":{\"type\":\"Phone\",\"icons\":{\"3x\":null,\"2x\":\"image.png\"}},\"advancedDeviceId\":{\"agentId\":113,\"partnerAgentId\":\"131\",\"dhcpHostname\":\"iPhone 6\",\"mac\":\"a0:b2:c3:d4:11:22\",\"lastSeen\":12,\"model\":\"Apple Phoe\",\"id\":2}}}}"
| append [
| makeresults
| eval _raw="{\"data\":{\"b1:b2:c3:d4:11:23\":{\"deviceIcons\":{\"type\":\"Phone\",\"icons\":{\"3x\":null,\"2x\":\"image.png\"}},\"advancedDeviceId\":{\"agentId\":113,\"partnerAgentId\":\"131\",\"dhcpHostname\":\"Nokia\",\"mac\":\"b1:b2:c3:d4:11:23\",\"lastSeen\":12,\"model\":\"Symbian Phoe\",\"id\":3}},\"b0:b2:c3:d4:11:24\":{\"deviceIcons\":{\"type\":\"Phone\",\"icons\":{\"3x\":null,\"2x\":\"image.png\"}},\"advancedDeviceId\":{\"agentId\":113,\"partnerAgentId\":\"131\",\"dhcpHostname\":\"Windows\",\"mac\":\"b0:b2:c3:d4:11:24\",\"lastSeen\":12,\"model\":\"Windows Phoe\",\"id\":4}}}}"
]
| spath
| untable _time column value
| rex field=column "data.(?<address>[^.]+)\.advancedDeviceId\.(?<item>[^.]+)"
| table _time address item value
| eval {item}=value
| stats list(model) as model
list(id) as id
list(dhcpHostname) as dhcpHostname
list(mac) as mac
by address
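As a cross-check of the reshaping this search performs, the same idea can be sketched in Python: walk every address key under data and pull the wanted fields out of advancedDeviceId. The function name device_table and the trimmed event are assumptions made for this illustration; it says nothing about how Splunk evaluates the pipeline.

```python
import json

# Trimmed version of the first event from the "run anywhere" example.
raw = """
{"data": {
  "a1:b2:c3:d4:11:21": {"advancedDeviceId": {"dhcpHostname": "Galaxy-J7",
      "mac": "a1:b2:c3:d4:11:21", "model": "Android Phoe", "id": 1}},
  "a0:b2:c3:d4:11:22": {"advancedDeviceId": {"dhcpHostname": "iPhone 6",
      "mac": "a0:b2:c3:d4:11:22", "model": "Apple Phoe", "id": 2}}
}}
"""


def device_table(event, fields=("model", "id")):
    """Return one row per address with the requested advancedDeviceId fields."""
    rows = []
    for address, device in event["data"].items():
        details = device.get("advancedDeviceId", {})
        rows.append({"address": address, **{f: details.get(f) for f in fields}})
    return rows


table = device_table(json.loads(raw))
for row in table:
    print(row)
```

The address is dynamic in the data, which is why both this sketch and the rex in the search treat it as a captured value rather than a fixed field name.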

Create a composite object from a complex json object using jq

I have a complex configuration file in JSON:
{
"config": {
...,
"extra": {
...
"auth_namespace.com": {
...
"name": "some_name",
"id": 1,
...
}
},
...,
"endpoints": [
{ ...,
"extra": {
"namespace_1.com": {...},
"namespace_auth.com": { "scope": "scope1" }
}},
{ ...
# object without "extra" property
...
},
...,
{ ...
"extra": {
"namespace_1.com": {...},
"namespace_auth.com": { "scope": "scope2" }
}},
{ ...
"extra": {
# scopes may repeat
"namespace_auth.com": { "scope": "scope2" }
}}
]
}
}
I want to get an output object with the properties "name", "id" and "scopes", where "scopes" is an array of unique values.
Something like this:
{
"name": "some_name",
"id": 1,
"scopes": ["scope1", "scope2" ... "scopeN"]
}
I can get these properties separately. But I don't know how to combine them together.
[
.config |
(
.extra["auth_namespace.com"] |
select(.name) |
{name, id}
) as $name_id |
.endpoints[] |
.extra["namespace_auth.com"].scope |
select(.)
] | unique | {scopes: .}
Perhaps the following is closer to what you're looking for:
.config
| (.extra."auth_namespace.com" | {id, name})
+ {scopes: .endpoints
| map( select(has("extra"))
| .extra."namespace_auth.com"
| select(has("scope"))
| .scope )
| unique }
Well, I found a solution. It's ugly, but it works.
Would be grateful if someone could write a more elegant version.
.config
| (
.endpoints
| map(.extra["namespace_auth.com"] | select(.scope) | .[])
| unique
) as $s
| .extra["auth_namespace.com"] | select(.name)
| {name, id, scopes: $s}
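To make the intended result concrete, here is a rough Python sketch of the same composition: take name and id from one place, collect the scopes from the endpoints, and merge them into one object. The sample config is made up to follow the shape in the question (the elided parts stay elided), and summarize is a hypothetical helper name.

```python
# Made-up sample that follows the structure shown in the question.
config = {
    "config": {
        "extra": {"auth_namespace.com": {"name": "some_name", "id": 1}},
        "endpoints": [
            {"extra": {"namespace_1.com": {},
                       "namespace_auth.com": {"scope": "scope1"}}},
            {},  # endpoint without an "extra" property
            {"extra": {"namespace_1.com": {},
                       "namespace_auth.com": {"scope": "scope2"}}},
            {"extra": {"namespace_auth.com": {"scope": "scope2"}}},  # repeat
        ],
    }
}


def summarize(doc):
    """Combine name/id with the sorted, de-duplicated endpoint scopes."""
    cfg = doc["config"]
    auth = cfg["extra"]["auth_namespace.com"]
    scopes = {
        ep["extra"]["namespace_auth.com"]["scope"]
        for ep in cfg["endpoints"]
        # Skip endpoints that lack "extra", the namespace, or a scope.
        if "scope" in ep.get("extra", {}).get("namespace_auth.com", {})
    }
    return {"name": auth["name"], "id": auth["id"], "scopes": sorted(scopes)}


result = summarize(config)
print(result)
```

Sorting the set mirrors jq's unique, which returns its result in sorted order.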

group objects by a field and sum another, then produce a CSV report

How can I create a CSV from this JSON? I have:
[
{
"name": "John",
"cash": 5
},
{
"name": "Anna",
"cash": 4
},
{
"name": "Anna",
"cash": 3
},
{
"name": "John",
"cash": 8
}
]
I need to group by name, sum the cash, and write the result to a .csv file like this:
John,13
Anna,7
Thanks!
jq has group_by as a builtin. Use it, then apply map(.cash) | add to sum the cash values for each group. Run jq with -r so that @csv emits raw CSV lines rather than JSON-encoded strings:
group_by(.name)[] | [.[0].name, (map(.cash) | add)] | @csv
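The same group-and-sum step can be sketched in Python to verify the totals independently. The helper name totals_by_name is hypothetical. As with jq's group_by, the groups come out sorted by name, so Anna precedes John here, unlike the order shown in the question.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

records = [
    {"name": "John", "cash": 5},
    {"name": "Anna", "cash": 4},
    {"name": "Anna", "cash": 3},
    {"name": "John", "cash": 8},
]


def totals_by_name(items):
    """Return (name, total cash) pairs, sorted by name."""
    # groupby only merges adjacent items, so sort by the key first.
    ordered = sorted(items, key=itemgetter("name"))
    return [
        (name, sum(r["cash"] for r in group))
        for name, group in groupby(ordered, key=itemgetter("name"))
    ]


totals = totals_by_name(records)

# Write the rows as CSV into an in-memory buffer.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(totals)
csv_text = buffer.getvalue()
print(csv_text, end="")
```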

Karate API framework how to match the response values with the table columns?

I have below API response sample
{
"items": [
{
"id":11,
"name": "SMITH",
"prefix": "SAM",
"code": "SSO"
},
{
"id":10,
"name": "James",
"prefix": "JAM",
"code": "BBC"
}
]
}
According to the response above, my test expects that whenever I send the API request, the item with id 11 is SMITH and the item with id 10 is James.
So I thought I would store this in a table and assert against the actual response:
* table person
| id | name |
| 11 | SMITH |
| 10 | James |
| 9 | RIO |
Now, how do I match them one by one? That is, take the first id and name from the API response and match them against the first id and name in the table.
Please share a convenient way of doing this in Karate.
There are a few possible ways, here is one:
* def lookup = { 11: 'SMITH', 10: 'James' }
* def items =
"""
[
{
"id":11,
"name":"SMITH",
"prefix":"SAM",
"code":"SSO"
},
{
"id":10,
"name":"James",
"prefix":"JAM",
"code":"BBC"
}
]
"""
* match each items contains { name: "#(lookup[_$.id+''])" }
And you already know how to use table instead of JSON.
Please read the docs and other stack-overflow answers to get more ideas.
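The check that the match each line expresses can be spelled out in plain Python, which may help when reasoning about what should pass or fail. This is only an illustration of the logic, not Karate code; mismatches is a hypothetical helper, and the lookup keys are strings on the assumption that this is why the original expression appends '' to the id.

```python
# Expected id-to-name pairs from the question's table, keyed by string id.
lookup = {"11": "SMITH", "10": "James", "9": "RIO"}

items = [
    {"id": 11, "name": "SMITH", "prefix": "SAM", "code": "SSO"},
    {"id": 10, "name": "James", "prefix": "JAM", "code": "BBC"},
]


def mismatches(items, lookup):
    """Return the items whose name differs from the expected name for their id."""
    return [
        item for item in items
        if lookup.get(str(item["id"])) != item["name"]
    ]


bad = mismatches(items, lookup)
print("all items match" if not bad else f"mismatches: {bad}")
```

Because only the items in the response are checked, the extra table row for id 9 does not cause a failure.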

T-SQL OpenJson Nested Array

I am struggling with the following JSON structure:
Declare @Json Nvarchar(max)
Set @Json = '
{
"entities": [
{
"Fields": [
{
"Name": "test-id",
"values": [
{
"value": "1851"
}
]
},
{
"Name": "test-name",
"values": [
{
"value": "01_DUMMY"
}
]
}
],
"Type": "run",
"children-count": 0
},
{
"Fields": [
{
"Name": "test-id",
"values": [
{
"value": "1852"
}
]
},
{
"Name": "test-name",
"values": [
{
"value": "02_DUMMY"
}
]
}
],
"Type": "run",
"children-count": 0
}
],
"TotalResults": 2
}'
My Output should look like this:
test-id|test-name|Type|Children-count
1851 |01_DUMMY |run |0
1852 |02_DUMMY |run |0
I tried the examples posted here, but none of them matches my needs.
My closest approach was this T-SQL:
Select
*
From OPENJSON (@Json,N'$.entities') E
CROSS APPLY OPENJSON (E.[value]) F
CROSS APPLY OPENJSON (F.[value],'$') V where F.type = 4
My next idea was to open the next nested array with another CROSS APPLY, but I always get the error message "Lookup Error - SQL Server Database Error: Incorrect syntax near the keyword 'CROSS'." for this query:
Select
*
From OPENJSON (@Json,N'$.entities') E
CROSS APPLY OPENJSON (E.[value]) F
CROSS APPLY OPENJSON (F.[value]) V where F.type = 4
CROSS APPLY OPENJSON (V.[value]) N
I am not sure how to get closer to the output I need.
To be honest, I have only just started with T-SQL and have never worked with JSON files before.
Regards Johann
This is rather deeply nested. I think you have the right idea: dive deeper and deeper using a series of OPENJSON calls. Try it like this to get your values:
Declare @Json Nvarchar(max)
Set @Json = '
{
"entities": [
{
"Fields": [
{
"Name": "test-id",
"values": [
{
"value": "1851"
}
]
},
{
"Name": "test-name",
"values": [
{
"value": "01_DUMMY"
}
]
}
],
"Type": "run",
"children-count": 0
},
{
"Fields": [
{
"Name": "test-id",
"values": [
{
"value": "1852"
}
]
},
{
"Name": "test-name",
"values": [
{
"value": "02_DUMMY"
}
]
}
],
"Type": "run",
"children-count": 0
}
],
"TotalResults": 2
}';
--This is the query
WITH ReadJson AS
(
SELECT A.TotalResults
,C.[Type]
,C.[children-count]
,D.[Name]
,E.*
FROM OPENJSON(@Json)
WITH(TotalResults INT, entities NVARCHAR(MAX) AS JSON) A
CROSS APPLY OPENJSON(A.entities) B
CROSS APPLY OPENJSON(B.[value])
WITH(Fields NVARCHAR(MAX) AS JSON,[Type] VARCHAR(100),[children-count] INT) C
CROSS APPLY OPENJSON(C.Fields)
WITH([Name] VARCHAR(100),[values] NVARCHAR(MAX) AS JSON) D
CROSS APPLY OPENJSON(D.[values])
WITH([value] VARCHAR(100)) E
)
SELECT * FROM ReadJson;
The result:
+--------------+------+----------------+-----------+----------+
| TotalResults | Type | children-count | Name      | value    |
+--------------+------+----------------+-----------+----------+
| 2            | run  | 0              | test-id   | 1851     |
| 2            | run  | 0              | test-name | 01_DUMMY |
| 2            | run  | 0              | test-id   | 1852     |
| 2            | run  | 0              | test-name | 02_DUMMY |
+--------------+------+----------------+-----------+----------+
Do you think you can manage the rest?
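As a hint for "the rest", the remaining step is a pivot: one row per entity, with each field Name turned into a column. The Python sketch below shows the target shape under the assumption that every field carries exactly one value, as in the sample; entity_rows is a hypothetical helper. In T-SQL the analogous step needs something that identifies the entity to group on, for example the array index that OPENJSON returns as [key] for B.

```python
import json

doc = json.loads("""
{"entities": [
  {"Fields": [{"Name": "test-id",   "values": [{"value": "1851"}]},
              {"Name": "test-name", "values": [{"value": "01_DUMMY"}]}],
   "Type": "run", "children-count": 0},
  {"Fields": [{"Name": "test-id",   "values": [{"value": "1852"}]},
              {"Name": "test-name", "values": [{"value": "02_DUMMY"}]}],
   "Type": "run", "children-count": 0}
 ],
 "TotalResults": 2}
""")


def entity_rows(document):
    """Return one dict per entity, with each field Name as its own column."""
    rows = []
    for entity in document["entities"]:
        # Assumption: each field has exactly one entry in "values".
        row = {f["Name"]: f["values"][0]["value"] for f in entity["Fields"]}
        row["Type"] = entity["Type"]
        row["children-count"] = entity["children-count"]
        rows.append(row)
    return rows


rows = entity_rows(doc)
for row in rows:
    print(row)
```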