How do I structure JSON? [closed] - json

I have this data sample that I need to put into JSON format.
What's the best way/structure to do that? If it helps, I'll be developing an Angular product selection tool for this.
Item 1: Federation Phaser
Options:
| FORM FACTOR | PRICE |
| Compact | $545 |
| Pistol Grip | $600 |
Item 2: Sith Lightsaber
Options:
| BLADE COLOR | BLADE COUNT | PRICE |
| Red | Single | $1000 |
| Red | Double | $1750 |
| Blue | Single | $1125 |
| Blue | Double | $1875 |
| Green | Single | $1250 |

JSON objects are made of name/value pairs and are surrounded by curly braces {}. The name/value pairs are separated by commas, and the values themselves can be JSON objects or arrays.
Example 1 (Simple):
{
  "fruit1": "apple",
  "fruit2": "pear"
}
Example 2 (more complex):
{
  "fruitBasket1": { "fruit1": "apple", "fruit2": "pear" },
  "fruitBasket2": { "fruit1": "grape", "fruit2": "orange" }
}
For your example, you could construct the JSON as follows with an array:
{
  "item": {
    "name": "Federation Phaser",
    "options": [
      {
        "form": "compact",
        "price": "$545"
      },
      {
        "form": "Pistol Grip",
        "price": "$600"
      }
    ]
  },
  "item2": {
    "name": "Sith Lightsaber",
    "options": [
      {
        "bladeColor": "red",
        "count": "single",
        "price": "$1000"
      },
      {
        "bladeColor": "blue",
        "count": "double",
        "price": "$1875"
      }
    ]
  }
}
If you want to have a variable number of "items" you could put them into an array too. For example:
{
  "items": [
    {
      "name": "Federation Phaser",
      "options": [
        {
          "form": "compact",
          "price": "$545"
        },
        {
          "form": "Pistol Grip",
          "price": "$600"
        }
      ]
    },
    {
      "name": "Sith Lightsaber",
      "options": [
        {
          "bladeColor": "red",
          "count": "single",
          "price": "$1000"
        },
        {
          "bladeColor": "blue",
          "count": "double",
          "price": "$1875"
        }
      ]
    }
  ]
}

Related

jq parse json with stream flag into different json file

I have a JSON file called data.json, shown below. I want to parse it with the jq tool in streaming mode (that is, without loading the whole file into memory), because the real data is about 20 GB.
Streaming mode in jq seems to require the --stream flag, which parses the JSON file event by event.
{
  "id": {
    "bioguide": "E000295",
    "thomas": "02283",
    "govtrack": 412667,
    "opensecrets": "N00035483",
    "lis": "S376"
  },
  "bio": {
    "gender": "F",
    "birthday": "1970-07-01"
  },
  "tooldatareports": [
    {
      "name": "A",
      "tooldata": [
        {
          "toolid": 12345,
          "data": [
            {
              "time": "2021-01-01",
              "value": 1
            },
            {
              "time": "2021-01-02",
              "value": 10
            },
            {
              "time": "2021-01-03",
              "value": 5
            }
          ]
        },
        {
          "toolid": 12346,
          "data": [
            {
              "time": "2021-01-01",
              "value": 10
            },
            {
              "time": "2021-01-02",
              "value": 100
            },
            {
              "time": "2021-01-03",
              "value": 50
            }
          ]
        }
      ]
    }
  ]
}
The final result I hope for is shown below:
a list containing two dicts, each with a "data" array whose entries have two keys (time and value).
[
  {
    "data": [
      {
        "time": "2021-01-01",
        "value": 1
      },
      {
        "time": "2021-01-02",
        "value": 10
      },
      {
        "time": "2021-01-03",
        "value": 5
      }
    ]
  },
  {
    "data": [
      {
        "time": "2021-01-01",
        "value": 10
      },
      {
        "time": "2021-01-02",
        "value": 100
      },
      {
        "time": "2021-01-03",
        "value": 50
      }
    ]
  }
]
For this problem, I use the command line below to get a result, but it still has some differences.
cat data.json | jq --stream 'select(.[0][0]=="tooldatareports" and .[0][2]=="tooldata" and .[1]!=null) | .'
The result is not a list containing dicts; each time and value ends up in a separate list.
Does anyone have any idea how to fix this?
Here's a solution that does not use truncate_stream:
jq -n --stream '
[fromstream(
inputs
| (.[0] | index("data")) as $ix
| select($ix)
| .[0] |= .[$ix:] )]
' input.json
The following produces the required output:
jq -n --stream '
[{data: fromstream(5|truncate_stream(inputs))}]
' input.json
Needless to say, there are other variations ...
Here's a step-by-step explanation of peak's first answer (the one that does not use truncate_stream).
First, let's convert the JSON to a stream.
https://jqplay.org/s/VEunTmDSkf
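(The jqplay snippet corresponds roughly to the following local command; the file name input.json is just an assumption, matching peak's commands above.)
jq -cn --stream 'inputs' input.json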
[["id","bioguide"],"E000295"]
[["id","thomas"],"02283"]
[["id","govtrack"],412667]
[["id","opensecrets"],"N00035483"]
[["id","lis"],"S376"]
[["id","lis"]]
[["bio","gender"],"F"]
[["bio","birthday"],"1970-07-01"]
[["bio","birthday"]]
[["tooldatareports",0,"name"],"A"]
[["tooldatareports",0,"tooldata",0,"toolid"],12345]
[["tooldatareports",0,"tooldata",0,"data",0,"time"],"2021-01-01"]
[["tooldatareports",0,"tooldata",0,"data",0,"value"],1]
[["tooldatareports",0,"tooldata",0,"data",0,"value"]]
[["tooldatareports",0,"tooldata",0,"data",1,"time"],"2021-01-02"]
[["tooldatareports",0,"tooldata",0,"data",1,"value"],10]
[["tooldatareports",0,"tooldata",0,"data",1,"value"]]
[["tooldatareports",0,"tooldata",0,"data",2,"time"],"2021-01-03"]
[["tooldatareports",0,"tooldata",0,"data",2,"value"],5]
[["tooldatareports",0,"tooldata",0,"data",2,"value"]]
[["tooldatareports",0,"tooldata",0,"data",2]]
[["tooldatareports",0,"tooldata",0,"data"]]
[["tooldatareports",0,"tooldata",1,"toolid"],12346]
[["tooldatareports",0,"tooldata",1,"data",0,"time"],"2021-01-01"]
[["tooldatareports",0,"tooldata",1,"data",0,"value"],10]
[["tooldatareports",0,"tooldata",1,"data",0,"value"]]
[["tooldatareports",0,"tooldata",1,"data",1,"time"],"2021-01-02"]
[["tooldatareports",0,"tooldata",1,"data",1,"value"],100]
[["tooldatareports",0,"tooldata",1,"data",1,"value"]]
[["tooldatareports",0,"tooldata",1,"data",2,"time"],"2021-01-03"]
[["tooldatareports",0,"tooldata",1,"data",2,"value"],50]
[["tooldatareports",0,"tooldata",1,"data",2,"value"]]
[["tooldatareports",0,"tooldata",1,"data",2]]
[["tooldatareports",0,"tooldata",1,"data"]]
[["tooldatareports",0,"tooldata",1]]
[["tooldatareports",0,"tooldata"]]
[["tooldatareports",0]]
[["tooldatareports"]]
Now apply .[0] to extract the path portion of each stream event.
https://jqplay.org/s/XdPrp8RuEj
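(Locally, assuming the same input.json, this step is roughly:)
jq -cn --stream 'inputs | .[0]' input.json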
["id","bioguide"]
["id","thomas"]
["id","govtrack"]
["id","opensecrets"]
["id","lis"]
["id","lis"]
["bio","gender"]
["bio","birthday"]
["bio","birthday"]
["tooldatareports",0,"name"]
["tooldatareports",0,"tooldata",0,"toolid"]
["tooldatareports",0,"tooldata",0,"data",0,"time"]
["tooldatareports",0,"tooldata",0,"data",0,"value"]
["tooldatareports",0,"tooldata",0,"data",0,"value"]
["tooldatareports",0,"tooldata",0,"data",1,"time"]
["tooldatareports",0,"tooldata",0,"data",1,"value"]
["tooldatareports",0,"tooldata",0,"data",1,"value"]
["tooldatareports",0,"tooldata",0,"data",2,"time"]
["tooldatareports",0,"tooldata",0,"data",2,"value"]
["tooldatareports",0,"tooldata",0,"data",2,"value"]
["tooldatareports",0,"tooldata",0,"data",2]
["tooldatareports",0,"tooldata",0,"data"]
["tooldatareports",0,"tooldata",1,"toolid"]
["tooldatareports",0,"tooldata",1,"data",0,"time"]
["tooldatareports",0,"tooldata",1,"data",0,"value"]
["tooldatareports",0,"tooldata",1,"data",0,"value"]
["tooldatareports",0,"tooldata",1,"data",1,"time"]
["tooldatareports",0,"tooldata",1,"data",1,"value"]
["tooldatareports",0,"tooldata",1,"data",1,"value"]
["tooldatareports",0,"tooldata",1,"data",2,"time"]
["tooldatareports",0,"tooldata",1,"data",2,"value"]
["tooldatareports",0,"tooldata",1,"data",2,"value"]
["tooldatareports",0,"tooldata",1,"data",2]
["tooldatareports",0,"tooldata",1,"data"]
["tooldatareports",0,"tooldata",1]
["tooldatareports",0,"tooldata"]
["tooldatareports",0]
["tooldatareports"]
Let me first quickly explain index/1.
index("data") applied to the path ["tooldatareports",0,"tooldata",0,"data",0,"time"] is 4, since that is the index of the first occurrence of "data".
Knowing that, let's now do .[0] | index("data").
https://jqplay.org/s/ny0bV1xEED
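(Roughly equivalent to:)
jq -cn --stream 'inputs | .[0] | index("data")' input.json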
null
null
null
null
null
null
null
null
null
null
null
4
4
4
4
4
4
4
4
4
4
4
null
4
4
4
4
4
4
4
4
4
4
4
null
null
null
null
As you can see, in our case the indexes are either 4 or null. We want to keep each input whose corresponding index is not null; those are the inputs that have "data" as part of their path.
(.[0] | index("data")) as $ix | select($ix) does just that. Remember that $ix is computed for each input, so only inputs whose $ix is not null are passed through.
For example, see https://jqplay.org/s/NwcD7_USZE where inputs | select(null) gives no output but inputs | select(true) outputs every input.
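You can also check this behavior without jqplay; for instance, these one-liners (purely illustrative) show the difference:
echo '1 2 3' | jq -n 'inputs | select(true)'   # prints 1, 2 and 3
echo '1 2 3' | jq -n 'inputs | select(null)'   # prints nothing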
This is the filtered stream:
https://jqplay.org/s/SgexvhtaGe
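(Roughly equivalent to:)
jq -cn --stream 'inputs | (.[0] | index("data")) as $ix | select($ix)' input.json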
[["tooldatareports",0,"tooldata",0,"data",0,"time"],"2021-01-01"]
[["tooldatareports",0,"tooldata",0,"data",0,"value"],1]
[["tooldatareports",0,"tooldata",0,"data",0,"value"]]
[["tooldatareports",0,"tooldata",0,"data",1,"time"],"2021-01-02"]
[["tooldatareports",0,"tooldata",0,"data",1,"value"],10]
[["tooldatareports",0,"tooldata",0,"data",1,"value"]]
[["tooldatareports",0,"tooldata",0,"data",2,"time"],"2021-01-03"]
[["tooldatareports",0,"tooldata",0,"data",2,"value"],5]
[["tooldatareports",0,"tooldata",0,"data",2,"value"]]
[["tooldatareports",0,"tooldata",0,"data",2]]
[["tooldatareports",0,"tooldata",0,"data"]]
[["tooldatareports",0,"tooldata",1,"data",0,"time"],"2021-01-01"]
[["tooldatareports",0,"tooldata",1,"data",0,"value"],10]
[["tooldatareports",0,"tooldata",1,"data",0,"value"]]
[["tooldatareports",0,"tooldata",1,"data",1,"time"],"2021-01-02"]
[["tooldatareports",0,"tooldata",1,"data",1,"value"],100]
[["tooldatareports",0,"tooldata",1,"data",1,"value"]]
[["tooldatareports",0,"tooldata",1,"data",2,"time"],"2021-01-03"]
[["tooldatareports",0,"tooldata",1,"data",2,"value"],50]
[["tooldatareports",0,"tooldata",1,"data",2,"value"]]
[["tooldatareports",0,"tooldata",1,"data",2]]
[["tooldatareports",0,"tooldata",1,"data"]]
Before we go further, let's review update-assignment (|=).
Have a look at https://jqplay.org/s/g4P6j8f9FG
Let's say we have the input [["tooldatareports",0,"tooldata",0,"data",0,"time"],"2021-01-01"].
Then the filter .[0] |= .[4:] produces [["data",0,"time"],"2021-01-01"].
Why?
Remember that the right-hand side (.[4:]) inherits the context of the left-hand side (.[0]). So in this case it has the effect of updating the path ["tooldatareports",0,"tooldata",0,"data",0,"time"] to ["data",0,"time"].
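You can try this particular example directly from the command line:
jq -cn '[["tooldatareports",0,"tooldata",0,"data",0,"time"],"2021-01-01"] | .[0] |= .[4:]'
# => [["data",0,"time"],"2021-01-01"]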
Let's move on then.
So (.[0] | index("data")) as $ix | select($ix) | .[0] |= .[$ix:] has the output:
https://jqplay.org/s/AwcQpVyHO2
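(Roughly equivalent to:)
jq -cn --stream 'inputs | (.[0] | index("data")) as $ix | select($ix) | .[0] |= .[$ix:]' input.json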
[["data",0,"time"],"2021-01-01"]
[["data",0,"value"],1]
[["data",0,"value"]]
[["data",1,"time"],"2021-01-02"]
[["data",1,"value"],10]
[["data",1,"value"]]
[["data",2,"time"],"2021-01-03"]
[["data",2,"value"],5]
[["data",2,"value"]]
[["data",2]]
[["data"]]
[["data",0,"time"],"2021-01-01"]
[["data",0,"value"],10]
[["data",0,"value"]]
[["data",1,"time"],"2021-01-02"]
[["data",1,"value"],100]
[["data",1,"value"]]
[["data",2,"time"],"2021-01-03"]
[["data",2,"value"],50]
[["data",2,"value"]]
[["data",2]]
[["data"]]
Now all we need to do is convert this stream back to json.
https://jqplay.org/s/j2uyzEU_Rc
[fromstream(inputs)] gives:
[
  {
    "data": [
      {
        "time": "2021-01-01",
        "value": 1
      },
      {
        "time": "2021-01-02",
        "value": 10
      },
      {
        "time": "2021-01-03",
        "value": 5
      }
    ]
  },
  {
    "data": [
      {
        "time": "2021-01-01",
        "value": 10
      },
      {
        "time": "2021-01-02",
        "value": 100
      },
      {
        "time": "2021-01-03",
        "value": 50
      }
    ]
  }
]
This is the output we wanted.

parsing json file with jq [closed]

{
  "name": "ford",
  "availableVersions": [
    {
      "version": 111,
      "count": 3
    },
    {
      "version": 122,
      "count": 2
    },
    {
      "version": 133,
      "count": 3
    },
    {
      "version": 144,
      "count": 1
    }
  ],
  "RealVersion": 155
}
{
  "name": "bmw",
  "availableVersions": [
    {
      "version": 244,
      "count": 1
    },
    {
      "version": 255,
      "count": 3
    }
  ],
  "RealVersion": 120
}
I have this demo.json file. If name == "ford" (name can be a variable), I want to get all the versions whose count != 3; and if all of the versions have count == 3, I want to get the RealVersion of ford. So in the first case the output should be:
EXPECTED OUTPUT = [122 144]
I am using the jq tool for parsing the JSON file.
Now, if all of the version counts are 3:
{
  "name": "ford",
  "availableVersions": [
    {
      "version": 111,
      "count": 3
    },
    {
      "version": 122,
      "count": 3
    },
    {
      "version": 133,
      "count": 3
    },
    {
      "version": 144,
      "count": 3
    }
  ],
  "RealVersion": 155
}
{
  "name": "bmw",
  "availableVersions": [
    {
      "version": 244,
      "count": 1
    },
    {
      "version": 255,
      "count": 3
    }
  ],
  "RealVersion": 120
}
In this case all of the versions have count == 3, so I want to get the RealVersion, which is 155.
EXPECTED OUTPUT SHOULD BE 155
Can anyone help me with this?
The following program, when invoked with the -n command-line option, produces the expected output in both cases:
inputs
| .RealVersion as $RealVersion
| select(.name == "ford")
| .availableVersions
| map(select(.count != 3))
| if length > 0 then map(.version)
else $RealVersion
end
Specifically, in the first case it produces a JSON array, and in the second case the actual value of .RealVersion.
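For example, assuming the two objects above are saved as demo.json (as in the question), a complete invocation would be:
jq -n '
  inputs
  | .RealVersion as $RealVersion
  | select(.name == "ford")
  | .availableVersions
  | map(select(.count != 3))
  | if length > 0 then map(.version)
    else $RealVersion
    end
' demo.json
With the first demo.json this prints the array [122, 144]; with the second it prints 155.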

How do I create a jq query that extracts data from 2 separate levels of a JSON file?

I have a JSON file that looks something like this:
{
  "people": {
    "company": "Acme",
    "department": "Dev",
    "perks": {
      "eat": "pizza",
      "drink": "beer",
      "play": "twister"
    },
    "names": [
      {
        "last_name": "Smith",
        "first_names": [
          { "name": "Bill" },
          { "name": "Alice" },
          { "name": "Mary" }
        ]
      },
      {
        "last_name": "Brown",
        "first_names": [
          { "name": "Gil" },
          { "name": "Bob" },
          { "name": "Mary" }
        ]
      },
      {
        "last_name": "Sanchez",
        "first_names": [
          { "name": "Gil" },
          { "name": "Jose" },
          { "name": "Marlena" }
        ]
      }
    ]
  }
}
The output I'm looking for is:
Acme
Dev
twister
Smith, Bill
Smith, Alice
Smith, Mary
Brown, Gil
Brown, Bob
Brown, Mary
Sanchez, Gil
Sanchez, Jose
Sanchez, Marlena
I have the jq query that gets the names:
jq -r '.people | .names[] | "\(.last_name), \(.first_names[].name)"'
And I have the query that gets me the first 3 lines (Acme, Dev, twister):
jq -r '.people | .company, .department, .perks.play'
But when I try to combine them (in too many ways to list here!), I get an error. I don't know how to combine these to get the query to walk the first level below ".people" and then the level below ".people.names[]" (all in one query).
Simply use the "," operator to join the two queries, e.g.
.people
| (.company, .department, .perks.play),
(.names[] | "\(.last_name), \(.first_names[].name)")
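As a complete command (the input file name is just a placeholder):
jq -r '
  .people
  | (.company, .department, .perks.play),
    (.names[] | "\(.last_name), \(.first_names[].name)")
' input.json
This prints the three scalar fields first, followed by the "last name, first name" lines, matching the desired output.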

How to Bulk Upload Complex JSON to MySQL

I have a JSON file that I am trying to bulk upload to MySQL. The file is around 50 GB. Is there a simple method to get all of the data into MySQL? I tried watching YouTube videos on how to do this, but all of the tutorials were for very simple JSON data without nested structures like this. Any help would be amazing. Here is a sample so you can see the structure:
{
  "PatentData": [
    {
      "patentCaseMetadata": {
        "applicationNumberText": {
          "value": "16315092",
          "electronicText": "16315092"
        },
        "filingDate": "2019-07-03",
        "applicationTypeCategory": "Utility",
        "partyBag": {
          "applicantBagOrInventorBagOrOwnerBag": [
            {
              "applicant": [
                {
                  "contactOrPublicationContact": [
                    {
                      "name": { "personNameOrOrganizationNameOrEntityName": [ { "organizationStandardName": { "content": [ "SEB S.A." ] } } ] },
                      "cityName": "ECULLY",
                      "geographicRegionName": {
                        "value": "",
                        "geographicRegionCategory": "STATE"
                      },
                      "countryCode": "FR"
                    }
                  ]
                }
              ]
            },
            {
              "inventorOrDeceasedInventor": [
                {
                  "contactOrPublicationContact": [
                    {
                      "name": {
                        "personNameOrOrganizationNameOrEntityName": [
                          {
                            "personStructuredName": {
                              "firstName": "Johan",
                              "middleName": "",
                              "lastName": "SABATTIER"
                            }
                          }
                        ]
                      },
                      "cityName": "Mornant",
                      "geographicRegionName": {
                        "value": "",
                        "geographicRegionCategory": "STATE"
The end goal is to have the JSON file in a MySQL database in the following format:
| Name | Address   | State | Country | ... | Abstract        |
| Tim  | 23 North  | TX    | US      | ... | The tissue...   |
| Tom  | 33 North  | TX    | US      | ... | The engineer... |
| Kim  | 78 North  | TX    | US      | ... | The lung...     |
| Bob  | 123 North | TX    | US      | ... | The tissue...   |
| Rob  | 93 North  | TX    | US      | ... | The scope...    |

Parsing Nested JSON using SCALA

I am looking to ingest telemetry data, and the output is a multi-layered nested JSON file. I am interested in very specific fields, but I am not able to parse the JSON file to get at the data.
Data Sample:
{ "version_str": "1.0.0", "node_id_str": "router-01", "encoding_path":
"sys/intf", "collection_id": 241466, "collection_start_time": 0,
"collection_end_time": 0, "msg_timestamp": 0, "subscription_id": [ ],
"sensor_group_id": [ ], "data_source": "DME", "data": {
"interfaceEntity": { "attributes": { "childAction": "", "descr": "",
"dn": "sys/intf", "modTs": "2017-09-19T13:24:14.751+00:00",
"monPolDn": "uni/fabric/monfab-default", "persistentOnReload": "true",
"status": "" }, "children": [ { "l3LbRtdIf": { "attributes": {
"adminSt": "up", "childAction": "", "descr": "Nothing", "id":
"lo103", "linkLog": "default", "modTs":
"2017-11-06T23:18:02.974+00:00", "monPolDn":
"uni/fabric/monfab-default", "name": "", "persistentOnReload": "true",
"rn": "lb-[lo103]", "status": "", "uid": "0" }, "children": [ {
"ethpmLbRtdIf": { "attributes": { "currErrIndex": "4294967295",
"ifIndex": "335544423", "iod": "14", "lastErrors": "0,0,0,0",
"operBitset": "", "operDescr": "Nothing", "operMtu": "1500",
"operSt": "up", "operStQual": "none", "rn": "lbrtdif" } } }, {
"nwRtVrfMbr": { "attributes": { "childAction": "", "l3vmCfgFailedBmp":
"", "l3vmCfgFailedTs": "00:00:00:00.000", "l3vmCfgState": "0",
"modTs": "2017-11-06T23:18:02.945+00:00", "monPolDn": "",
"parentSKey": "unspecified", "persistentOnReload": "true", "rn":
"rtvrfMbr", "status": "", "tCl": "l3Inst", "tDn": "sys/inst-default",
"tSKey": "" } } } ] } }, { "l3LbRtdIf": { "attributes": { "adminSt":
"up", "childAction": "", "descr": "Nothing", "id": "lo104",
"linkLog": "default", "modTs": "2018-01-25T15:54:20.367+00:00",
"monPolDn": "uni/fabric/monfab-default", "name": "",
"persistentOnReload": "true", "rn": "lb-[lo104]", "status": "", "uid":
"0" }, "children": [ { "ethpmLbRtdIf": { "attributes": {
"currErrIndex": "4294967295", "ifIndex": "335544424", "iod": "77",
"lastErrors": "0,0,0,0", "operBitset": "", "operDescr":
"Nothing", "operMtu": "1500", "operSt": "up", "operStQual":
"none", "rn": "lbrtdif" } } }, { "nwRtVrfMbr": { "attributes": {
"childAction": "", "l3vmCfgFailedBmp": "", "l3vmCfgFailedTs":
"00:00:00:00.000", "l3vmCfgState": "0", "modTs":
"2018-01-25T15:53:55.757+00:00", "monPolDn": "", "parentSKey":
"unspecified", "persistentOnReload": "true", "rn": "rtvrfMbr",
"status": "", "tCl": "l3Inst", "tDn": "sys/inst-default", "tSKey": ""
} } } ] } }, { "l3LbRtdIf": { "attributes": { "adminSt": "up",
"childAction": "", "descr": "Nothing", "id": "lo101",
"linkLog": "default", "modTs": "2017-11-13T21:39:58.910+00:00",
"monPolDn": "uni/fabric/monfab-default", "name": "",
"persistentOnReload": "true", "rn": "lb-[lo101]", "status": "", "uid":
"0" }, "children": [ { "ethpmLbRtdIf": { "attributes": {
"currErrIndex": "4294967295", "ifIndex": "335544421", "iod": "12",
"lastErrors": "0,0,0,0", "operBitset": "", "operDescr":
"Nothing", "operMtu": "1500", "operSt": "up", "operStQual":
"none", "rn": "lbrtdif" } } }, { "nwRtVrfMbr": { "attributes": {
"childAction": "", "l3vmCfgFailedBmp": "", "l3vmCfgFailedTs":
"00:00:00:00.000", "l3vmCfgState": "0", "modTs":
"2017-11-13T21:39:58.880+00:00", "monPolDn": "", "parentSKey":
"unspecified", "persistentOnReload": "true", "rn": "rtvrfMbr",
"status": "", "tCl": "l3Inst", "tDn": "sys/inst-default", "tSKey": ""
} } } ] } }, { "l3LbRtdIf": { "attributes": { "adminSt": "up",
"childAction": "", "descr": "\"^:tier2:if:loopback:mgmt:l3\"", "id":
"lo0", "linkLog": "default", "modTs": "2017-09-25T20:29:54.003+00:00",
"monPolDn": "uni/fabric/monfab-default", "name": "",
"persistentOnReload": "true", "rn": "lb-[lo0]", "status": "", "uid":
"0" }, "children": [ { "ethpmLbRtdIf": { "attributes": {
"currErrIndex": "4294967295", "ifIndex": "335544320", "iod": "11",
"lastErrors": "0,0,0,0", "operBitset": "", "operDescr":
"\"^:tier2:if:loopback:mgmt:l3\"", "operMtu": "1500", "operSt": "up",
"operStQual": "none", "rn": "lbrtdif" } } }, { "nwRtVrfMbr":...
I am interested in these attributes:
| | | | | | | |-- rmonIfIn: struct (nullable = true)
| | | | | | | | |-- attributes: struct (nullable = true)
| | | | | | | | | |-- broadcastPkts: string (nullable = true)
| | | | | | | | | |-- discards: string (nullable = true)
| | | | | | | | | |-- errors: string (nullable = true)
| | | | | | | | | |-- multicastPkts: string (nullable = true)
| | | | | | | | | |-- nUcastPkts: string (nullable = true)
| | | | | | | | | |-- packetRate: string (nullable = true)
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.explode
import spark.implicits._
val spark = SparkSession.builder().getOrCreate
val df = spark.read.option("header","true").option("inferSchema","true").json("file:///usr/local/Projects/out.txt")
val mapDF = df.select($"node_id_str" as "nodename", $"data".getItem("InterfaceEntity").getItem("children").getItem("l1PhysIf").getItem("children").getItem("element"))
I keep getting a data type error whenever I attempt to go any deeper:
stringJsonDF: org.apache.spark.sql.DataFrame = [nestDevice: string]
org.apache.spark.sql.AnalysisException: cannot resolve '`data`.`InterfaceEntity`.`children`.`l1PhysIf`.`children`['element']' due to data type mismatch: argument 2 requires integral type, however, ''element'' is of string type.;;
You can use the Google Gson library, which is designed for working with JSON. You can convert any object to JSON and, of course, do the reverse. Here is an example:
Gson gson = new Gson();
List<Map<Long, String>> listOfMaps = new ArrayList<>();
//here you can new some maps and add them to the listOfMaps.
String listOfMapsInJsonFormat = gson.toJson(listOfMaps);
The sample code above converts an object to JSON. To do the reverse, you can use the following:
Gson gson = new Gson();
List list = gson.fromJson(listOfMapsInJsonFormat, List.class);
The code above will convert your input JSON string into a list containing maps. Of course, there may be a difference between the type of map you had before converting the original object to JSON and the one Gson builds from the JSON string. To avoid that, you can use the TypeToken class:
// requires java.lang.reflect.Type and com.google.gson.reflect.TypeToken
Gson gson = new Gson();
Type type = new TypeToken<ArrayList<Map<Long, String>>>() {}.getType();
ArrayList<Map<Long, String>> listOfMaps = gson.fromJson(listOfMapsInJsonFormat, type);
Since the fields are part of multiple nested arrays, the logic assumes that you are interested in all occurrences of those fields per record (so if one record contains n rmonIfIn items due to nested arrays, you would be interested in each of them?).
If so, it makes sense to explode these nested arrays and process the expanded dataframe.
Based on your code and the incomplete JSON example, it could look something like this:
val nested = df
.select(explode($"data.InterfaceEntity").alias("l1"))
.select(explode($"l1.l1PhysIf").alias("l2"))
.select($"l2.rmonIfIn.attributes".alias("l3"))
.select($"l3.broadcastPkts", $"l3.discards", $"l3.errors", $"l3.multicastPkts", $"l3.packetRate")
Returning a dataframe that could look like
+-------------+--------+------+-------------+----------+
|broadcastPkts|discards|errors|multicastPkts|packetRate|
+-------------+--------+------+-------------+----------+
|1 |1 |1 |1 |1 |
|2 |2 |2 |2 |2 |
|3 |3 |3 |3 |3 |
|4 |4 |4 |4 |4 |
+-------------+--------+------+-------------+----------+