Create a composite object from a complex JSON object using jq

I have a complex configuration file in JSON:
{
"config": {
...,
"extra": {
...
"auth_namespace.com": {
...
"name": "some_name",
"id": 1,
...
}
},
...,
"endpoints": [
{ ...,
"extra": {
"namespace_1.com": {...},
"namespace_auth.com": { "scope": "scope1" }
}},
{ ...
# object without "extra" property
...
},
...,
{ ...
"extra": {
"namespace_1.com": {...},
"namespace_auth.com": { "scope": "scope2" }
}},
{ ...
"extra": {
# scopes may repeat
"namespace_auth.com": { "scope": "scope2" }
}}
]
}
}
I want to get an output object with the properties "name", "id", and "scopes", where "scopes" is an array of unique values.
Something like this:
{
"name": "some_name",
"id": 1,
"scopes": ["scope1", "scope2" ... "scopeN"]
}
I can get these properties separately, but I don't know how to combine them:
[
.config |
(
.extra["auth_namespace.com"] |
select(.name) |
{name, id}
) as $name_id |
.endpoints[] |
.extra["namespace_auth.com"].scope |
select(.)
] | unique | {scopes: .}

Perhaps the following is closer to what you're looking for:
.config
| (.extra."auth_namespace.com" | {id, name})
+ {scopes: .endpoints
| map( select(has("extra"))
| .extra."namespace_auth.com"
| select(has("scope"))
| .scope )
| unique }

Well, I found a solution. It's ugly, but it works.
Would be grateful if someone could write a more elegant version.
.config
| (
.endpoints
| map(.extra["namespace_auth.com"] | select(.scope) | .[])
| unique
) as $s
| .extra["auth_namespace.com"] | select(.name)
| {name, id, scopes: $s}
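
As a cross-check outside jq, here is a minimal Python sketch of the same transformation. The sample document is an assumption: it mirrors the shape shown in the question with the elided parts left out, and the function name `composite` is purely illustrative.

```python
import json

# Hypothetical sample shaped like the question's config (elided parts omitted).
doc = json.loads("""
{
  "config": {
    "extra": {"auth_namespace.com": {"name": "some_name", "id": 1}},
    "endpoints": [
      {"extra": {"namespace_1.com": {}, "namespace_auth.com": {"scope": "scope1"}}},
      {},
      {"extra": {"namespace_1.com": {}, "namespace_auth.com": {"scope": "scope2"}}},
      {"extra": {"namespace_auth.com": {"scope": "scope2"}}}
    ]
  }
}
""")


def composite(doc):
    """Combine name/id with the unique scopes found across all endpoints."""
    config = doc["config"]
    auth = config["extra"]["auth_namespace.com"]
    scopes = set()
    for endpoint in config.get("endpoints", []):
        # Endpoints without "extra" or without the auth namespace are skipped.
        scope = endpoint.get("extra", {}).get("namespace_auth.com", {}).get("scope")
        if scope:
            scopes.add(scope)
    return {"name": auth["name"], "id": auth["id"], "scopes": sorted(scopes)}


result = composite(doc)
print(json.dumps(result))
```

Like jq's `unique`, the sketch returns the scopes sorted and de-duplicated.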


Parse nested Json to splunk query which has string

I have multiple results keyed by MAC address, each containing the device details.
This is the sample data:
"data": {
"a1:b2:c3:d4:11:22": {
"deviceIcons": {
"type": "Phone",
"icons": {
"3x": null,
"2x": "image.png"
}
},
"advancedDeviceId": {
"agentId": 113,
"partnerAgentId": "131",
"dhcpHostname": "Galaxy-J7",
"mac": "a1:b2:c3:d4:11:22",
"lastSeen": 12,
"model": "Android Phoe",
"id": 1
}
},
"a0:b2:c3:d4:11:22": {
"deviceIcons": {
"type": "Phone",
"icons": {
"3x": null,
"2x": "image.png"
}
},
"advancedDeviceId": {
"agentId": 113,
"partnerAgentId": "131",
"dhcpHostname": "Galaxy",
"mac": "a0:b2:c3:d4:11:22",
"lastSeen": 12,
"model": "Android Phoe",
"id": 1
}
}
}
}
How can I query Splunk for events like the sample above to get advancedDeviceId.model and advancedDeviceId.id in tabular format?
I think this will do what you want:
| spath
| untable _time column value
| rex field=column "data.(?<address>[^.]+)\.advancedDeviceId\.(?<item>[^.]+)"
| table _time address item value
| eval {item}=value
| stats list(model) as model
list(id) as id
list(dhcpHostname) as dhcpHostname
list(mac) as mac
by address
Here is a "run anywhere" example that has two events each with two addresses:
| makeresults
| eval _raw="{\"data\":{\"a1:b2:c3:d4:11:21\":{\"deviceIcons\":{\"type\":\"Phone\",\"icons\":{\"3x\":null,\"2x\":\"image.png\"}},\"advancedDeviceId\":{\"agentId\":113,\"partnerAgentId\":\"131\",\"dhcpHostname\":\"Galaxy-J7\",\"mac\":\"a1:b2:c3:d4:11:21\",\"lastSeen\":12,\"model\":\"Android Phoe\",\"id\":1}},\"a0:b2:c3:d4:11:22\":{\"deviceIcons\":{\"type\":\"Phone\",\"icons\":{\"3x\":null,\"2x\":\"image.png\"}},\"advancedDeviceId\":{\"agentId\":113,\"partnerAgentId\":\"131\",\"dhcpHostname\":\"iPhone 6\",\"mac\":\"a0:b2:c3:d4:11:22\",\"lastSeen\":12,\"model\":\"Apple Phoe\",\"id\":2}}}}"
| append [
| makeresults
| eval _raw="{\"data\":{\"b1:b2:c3:d4:11:23\":{\"deviceIcons\":{\"type\":\"Phone\",\"icons\":{\"3x\":null,\"2x\":\"image.png\"}},\"advancedDeviceId\":{\"agentId\":113,\"partnerAgentId\":\"131\",\"dhcpHostname\":\"Nokia\",\"mac\":\"b1:b2:c3:d4:11:23\",\"lastSeen\":12,\"model\":\"Symbian Phoe\",\"id\":3}},\"b0:b2:c3:d4:11:24\":{\"deviceIcons\":{\"type\":\"Phone\",\"icons\":{\"3x\":null,\"2x\":\"image.png\"}},\"advancedDeviceId\":{\"agentId\":113,\"partnerAgentId\":\"131\",\"dhcpHostname\":\"Windows\",\"mac\":\"b0:b2:c3:d4:11:24\",\"lastSeen\":12,\"model\":\"Windows Phoe\",\"id\":4}}}}"
]
| spath
| untable _time column value
| rex field=column "data.(?<address>[^.]+)\.advancedDeviceId\.(?<item>[^.]+)"
| table _time address item value
| eval {item}=value
| stats list(model) as model
list(id) as id
list(dhcpHostname) as dhcpHostname
list(mac) as mac
by address
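
To sanity-check what the search should return, the same extraction can be sketched outside Splunk. This is only an illustration in Python, not SPL: the event below is trimmed from the question's sample to the fields used, and `device_rows` is a hypothetical helper name.

```python
import json

# Sample event trimmed from the question (only the fields used here).
raw = (
    '{"data": {'
    '"a1:b2:c3:d4:11:22": {"advancedDeviceId": {"model": "Android Phoe", "id": 1}}, '
    '"a0:b2:c3:d4:11:22": {"advancedDeviceId": {"model": "Android Phoe", "id": 1}}'
    '}}'
)


def device_rows(event):
    """Return one row per MAC address with the model and id of the device."""
    rows = []
    for address, device in event.get("data", {}).items():
        details = device.get("advancedDeviceId", {})
        rows.append(
            {"address": address, "model": details.get("model"), "id": details.get("id")}
        )
    return rows


rows = device_rows(json.loads(raw))
for row in rows:
    print(row["address"], row["model"], row["id"], sep="\t")
```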

How to separate & group objects according to key-value

I'm new to this, and these are my first steps. I guess I've started with a case that is not simple.
Let's see:
I have objects with an ID (name) and resource groups (rgs). Each object may be part of several groups, and what I need is to get the intersections of the groups.
It is important to note that an object may be part of several groups that are in parent-child relationships, and I only need the parent group. The parent-child relationships are easy to identify because the groups share prefixes.
E.g. group PROM_FD_ARCNA contains the child groups PROM_FD_ARCNA_TGM and PROM_FD_ARCNA_TGM_TGA.
The child groups contain the objects themselves, but as long as I can get the information from the object, that is enough.
The parent groups are PROM_FD_ARCNA, PROM_JOB_ICMP and PROM_JOB_WIN. That is to say, I need to get the objects that belong to the intersections of those groups.
The JSON file looks like this:
[
{
"id_ci": "487006",
"name": "LABTNSARWID625",
"id_ci_class": "host",
"rgs": "PROM_FD_ARCNA, PROM_FD_ARCNA_TGM, PROM_FD_ARCNA_TGM_TGA"
},
{
"id_ci": "5706",
"name": "HCCQ2001",
"id_ci_class": "host",
"rgs": "PROM_JOB_ICMP"
},
{
"id_ci": "9106",
"name": "HCC02155",
"id_ci_class": "host",
"rgs": "PROM_FD_ARCNA, PROM_FD_ARCNA_TGA, PROM_JOB_ICMP"
},
{
"id_ci": "2306",
"name": "VM00006",
"id_ci_class": "host",
"rgs": "PROM_FD_ARCNA, PROM_FD_ARCNA_TGA, PROM_JOB_WIN, PROM_JOB_WIN_TGA"
}
]
In case my explanation was unclear, I need to get something like this:
PROM_FD_ARCNA, PROM_JOB_ICMP
{
"HCC02155"
}
PROM_FD_ARCNA, PROM_JOB_WIN
{
"VM00006"
}
As those are the intersections.
So far, I tried this:
jq '[.[] | select(.id_ci_class == "host") | select (.rgs | startswith("PROM_FD_ARCNA")) | .rgs = "PROM_FD_ARCNA"]
| group_by(.rgs) | map({"rgs": .[0].rgs, "Hosts": map(.name)}) ' ./prom_jobs.json >> Step0A.json
jq '[.[] | select(.id_ci_class == "host") | select (.rgs | startswith("PROM_JOB_WIN")) | .rgs = "PROM_JOB_WIN"]
| group_by(.rgs) | map({"rgs": .[0].rgs, "Hosts": map(.name)}) ' ./prom_jobs.json >> Step0A.json
jq '[.[] | select(.id_ci_class == "host") | select (.rgs | startswith("PROM_JOB_ICMP")) | .rgs = "PROM_JOB_ICMP"]
| group_by(.rgs) | map({"rgs": .[0].rgs, "Hosts": map(.name)}) ' ./prom_jobs.json >> Step0A.json
And the result is:
[
{
"rgs": "PROM_FD_ARCNA",
"Hosts": [
"LABTNSARWID625",
"HCC02155",
"VM00006"
]
}
]
[
{
"rgs": "PROM_JOB_WIN",
"Hosts": [
"VM00006"
]
}
]
[
{
"rgs": "PROM_JOB_ICMP",
"Hosts": [
"HCCQ2001",
"HCC02155"
]
}
]
Of course, the full JSON is quite long, and I need the processing to be as lightweight as possible. I don't know whether I've started in the right direction.
def to_set(s): reduce s as $_ ( {}; .[ $_ ] = true );
[ "PROM_FD_ARCNA", "PROM_JOB_ICMP", "PROM_JOB_WIN" ] as $roots |
map(
{
name,
has_rg: to_set( .rgs | split( ", " )[] )
}
) as $hosts |
[
range( 0; $roots | length ) as $i | $roots[ $i ] as $g1 |
range( $i+1; $roots | length ) as $j | $roots[ $j ] as $g2 |
{
root_rgs: [ $g1, $g2 ],
names: [
$hosts[] |
select( .has_rg[ $g1 ] and .has_rg[ $g2 ] ) |
.name
]
} |
select( .names | length > 0 )
]
produces
[
{
"root_rgs": [
"PROM_FD_ARCNA",
"PROM_JOB_ICMP"
],
"names": [
"HCC02155"
]
},
{
"root_rgs": [
"PROM_FD_ARCNA",
"PROM_JOB_WIN"
],
"names": [
"VM00006"
]
}
]
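
The pairwise-intersection idea in the jq program above can also be sketched in Python. This is an illustrative equivalent under the same assumption as the jq version (the three root groups are known in advance); `root_intersections` is a made-up name, and the host list is reduced to the two fields used.

```python
from itertools import combinations

# Root groups are assumed to be known up front, as in the jq program.
ROOTS = ["PROM_FD_ARCNA", "PROM_JOB_ICMP", "PROM_JOB_WIN"]

hosts = [
    {"name": "LABTNSARWID625", "rgs": "PROM_FD_ARCNA, PROM_FD_ARCNA_TGM, PROM_FD_ARCNA_TGM_TGA"},
    {"name": "HCCQ2001", "rgs": "PROM_JOB_ICMP"},
    {"name": "HCC02155", "rgs": "PROM_FD_ARCNA, PROM_FD_ARCNA_TGA, PROM_JOB_ICMP"},
    {"name": "VM00006", "rgs": "PROM_FD_ARCNA, PROM_FD_ARCNA_TGA, PROM_JOB_WIN, PROM_JOB_WIN_TGA"},
]


def root_intersections(hosts, roots):
    """Return the non-empty intersections of every pair of root groups."""
    # Build one set of host names per root group in a single pass.
    members = {root: set() for root in roots}
    for host in hosts:
        groups = set(host["rgs"].split(", "))
        for root in roots:
            if root in groups:
                members[root].add(host["name"])
    result = []
    for g1, g2 in combinations(roots, 2):
        names = sorted(members[g1] & members[g2])
        if names:
            result.append({"root_rgs": [g1, g2], "names": names})
    return result


pairs = root_intersections(hosts, ROOTS)
for pair in pairs:
    print(pair)
```

One difference from the jq version: the names in each intersection are sorted here, whereas the jq program keeps them in input order.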

jq - Get a higher level key after a selection

Given a JSON like the following:
{
"data": [{
"id": "1a2b3c",
"info": {
"a": {
"number": 0
},
"b": {
"number": 1
},
"c": {
"number": 2
}
}
}]
}
I want to select on a number that is greater than or equal to 2 and for that selection I want to return the values of id and number. I did this like so:
$ jq -r '.data[] | .id as $ID | .info[] | select(.number >= 2) | [$ID, .number]' in.json
[
"1a2b3c",
2
]
Now I would also like to return a higher-level key for my selection; in my case I need to return c. How can I accomplish this?
Assuming you want the string "c" instead of 2 in the output, this will work:
$ jq '.data[] | .id as $ID | .info | to_entries[] | select(.value.number >= 2) | [$ID, .key]' input.json
[
"1a2b3c",
"c"
]
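
The key step is that `to_entries` turns each key of `.info` into a `{key, value}` pair, so the key is still available after filtering on the value. A rough Python analogue, given only as an illustration of that idea, iterates over the dictionary's items:

```python
doc = {
    "data": [
        {
            "id": "1a2b3c",
            "info": {"a": {"number": 0}, "b": {"number": 1}, "c": {"number": 2}},
        }
    ]
}

# Iterating over (key, value) pairs keeps the key alongside the value,
# which is what to_entries provides in jq.
matches = [
    [item["id"], key]
    for item in doc["data"]
    for key, value in item["info"].items()
    if value["number"] >= 2
]
print(matches)
```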

Loading JSON file with repeating elements into hive table

Given this simple JSON file:
{
"EVT": {
"EVT_ID": "12345",
"LINES": {
"LINE": {
"LINE_NUM" : 1,
"AMT" : 100,
"EVT_DT" : "2018-01-01"
},
"LINE": {
"LINE_NUM" : 2,
"AMT" : 150,
"EVT_DT" : "2018-01-02"
}
}
}
}
We need to load that into a hive table. The ultimate goal is to flatten the json, something like this:
+--------+----------+-----+------------+
| EVT_ID | Line_Num | Amt | Evt_Dt |
+--------+----------+-----+------------+
| 12345 | 1 | 100 | 2018-01-01 |
| 12345 | 2 | 150 | 2018-01-02 |
+--------+----------+-----+------------+
Here's my current DDL for the table:
create table foo.bar (
`EVT` struct<
`EVT_ID`:string,
`LINES`:struct<
LINE: struct<`LINE_NUM`: int,`AMT`:int,`EVT_DT`:string>
>
>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
It seems like the second "line" is overwriting the first. A simple select * from the table returns:
{"evt_id":"12345","lines":{"line":{"line_num":2,"amt":150,"evt_dt":"2018-01-02"}}}
What am I doing wrong?
Both the JSON and the table definition are wrong. Repeating elements should be modeled as an array, so LINES should be array<struct>, not struct<struct>. The JSON should look like this (note the square brackets):
{
"EVT": {
"EVT_ID": "12345",
"LINES": [
{
"LINE_NUM" : 1,
"AMT" : 100,
"EVT_DT" : "2018-01-01"
},
{
"LINE_NUM" : 2,
"AMT" : 150,
"EVT_DT" : "2018-01-02"
}
]
}
}
You also do not need the "LINE": key, because each line is just an array element.
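
The overwriting symptom is typical of how JSON parsers treat a repeated key within one object. The Python sketch below illustrates this with Python's own `json` module, which keeps only the last value for a duplicated key. It says nothing about the internals of Hive's JsonSerDe; it only shows the same effect and how the array form flattens into the desired rows.

```python
import json

# Repeated "LINE" key inside one object, as in the original file.
duplicate_keys = '{"LINES": {"LINE": {"LINE_NUM": 1}, "LINE": {"LINE_NUM": 2}}}'
parsed = json.loads(duplicate_keys)
print(parsed)  # only the last "LINE" survives in Python's parser

# Array form, as suggested in the answer.
event = json.loads("""
{
  "EVT": {
    "EVT_ID": "12345",
    "LINES": [
      {"LINE_NUM": 1, "AMT": 100, "EVT_DT": "2018-01-01"},
      {"LINE_NUM": 2, "AMT": 150, "EVT_DT": "2018-01-02"}
    ]
  }
}
""")

# One output row per array element, repeating the parent EVT_ID.
flat_rows = [
    (event["EVT"]["EVT_ID"], line["LINE_NUM"], line["AMT"], line["EVT_DT"])
    for line in event["EVT"]["LINES"]
]
for row in flat_rows:
    print(row)
```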

Unnesting nested JSON structures in Apache Drill

I have the following JSON (roughly) and I'd like to extract the information from the header and defects fields separately:
{
"file": {
"header": {
"timeStamp": "2016-03-14T00:20:15.005+04:00",
"serialNo": "3456",
"sensorId": "1234567890",
},
"defects": [
{
"info": {
"systemId": "DEFCHK123",
"numDefects": "3",
"defectParts": [
"003", "006", "008"
]
}
}
]
}
}
I have tried to access the individual elements with file.header.timeStamp etc but that returns null. I have tried using flatten(file) but that gives me
Cannot cast org.apache.drill.exec.vector.complex.MapVector to org.apache.drill.exec.vector.complex.RepeatedValueVector
I've looked into kvgen() but don't see how that fits in my case. I tried kvgen(file.header) but that gets me
kvgen function only supports Simple maps as input
which is what I had expected anyway.
Does anyone know how I can get header and defects so that I can process the information contained in them? Ideally, I'd just select the information from header because it contains no arrays or maps, so I can take individual records as they are. For defects I'd simply use FLATTEN(defectParts) to obtain a table of the defective parts.
Any help would be appreciated.
What version of Drill are you using? I tried querying the following file on the latest master (1.7.0-SNAPSHOT):
{
"file": {
"header": {
"timeStamp": "2016-03-14T00:20:15.005+04:00",
"serialNo": "3456",
"sensorId": "1234567890"
},
"defects": [
{
"info": {
"systemId": "DEFCHK123",
"numDefects": "3",
"defectParts": [
"003", "006", "008"
]
}
}
]
}
}
{
"file": {
"header": {
"timeStamp": "2016-03-14T00:20:15.005+04:00",
"serialNo": "3456",
"sensorId": "1234567890"
},
"defects": [
{
"info": {
"systemId": "DEFCHK123",
"numDefects": "3",
"defectParts": [
"003", "006", "008"
]
}
}
]
}
}
And the following queries are working fine:
1.
select t.file.header.serialno as serialno from `parts.json` t;
+-----------+
| serialno |
+-----------+
| 3456 |
| 3456 |
+-----------+
2 rows selected (0.098 seconds)
2.
select flatten(t.file.defects) defects from `parts.json` t;
+---------------------------------------------------------------------------------------+
| defects |
+---------------------------------------------------------------------------------------+
| {"info":{"systemId":"DEFCHK123","numDefects":"3","defectParts":["003","006","008"]}} |
| {"info":{"systemId":"DEFCHK123","numDefects":"3","defectParts":["003","006","008"]}} |
+---------------------------------------------------------------------------------------+
3.
select q.h.serialno as serialno, q.d.info.defectParts as defectParts from (select t.file.header h, flatten(t.file.defects) d from `parts.json` t) q;
+-----------+----------------------+
| serialno | defectParts |
+-----------+----------------------+
| 3456 | ["003","006","008"] |
| 3456 | ["003","006","008"] |
+-----------+----------------------+
2 rows selected (0.126 seconds)
PS: This should've been a comment but I don't have enough rep yet!
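
For comparison, the result the question is ultimately after (one row per defective part, with header fields attached) can be sketched in Python. This is not Drill SQL, only an illustration of the intended flattening; `flatten_defect_parts` and the output column names are hypothetical.

```python
import json

record = json.loads("""
{
  "file": {
    "header": {
      "timeStamp": "2016-03-14T00:20:15.005+04:00",
      "serialNo": "3456",
      "sensorId": "1234567890"
    },
    "defects": [
      {"info": {"systemId": "DEFCHK123", "numDefects": "3",
                "defectParts": ["003", "006", "008"]}}
    ]
  }
}
""")


def flatten_defect_parts(record):
    """Emit one row per entry in defectParts, carrying header and defect info."""
    header = record["file"]["header"]
    return [
        {
            "serialNo": header["serialNo"],
            "systemId": defect["info"]["systemId"],
            "defectPart": part,
        }
        for defect in record["file"]["defects"]
        for part in defect["info"]["defectParts"]
    ]


parts = flatten_defect_parts(record)
for part in parts:
    print(part)
```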
I don't have experience with Apache Drill, but checked the manual. Isn't this what you're looking for?
https://drill.apache.org/docs/selecting-multiple-columns-within-nested-data/
https://drill.apache.org/docs/selecting-nested-data-for-a-column/