Parsing nested json with jq - json

I am parsing a nested json to get specific values from the json response. The json response is as follows:
{
"custom_classes": 2,
"images":
[
{
"classifiers":
[
{
"classes":
[
{
"class": "football",
"score": 0.867376
}
],
"classifier_id": "players_367677167",
"name": "players"
}
],
"image": "1496A400EDC351FD.jpg"
}
],
"images_processed": 1
}
From the class images=>classifiers=>classes:"class" & "score" are the values that I want to save in a csv file. I have found how to save the result in a csv file. But I am unable to parse the images alone. I can get custom_classes and image_processed.
I am using jq-1.5.
The different commands I have tried :
curl "Some address"| jq '.["images"]'
curl "Some address"| jq '.[.images]'
curl "Some address"| jq '.[.images["image"]]'
Most of the times the error is about not being able to index the array images.
Any hints?

I must say, I'm not terribly good at jq, so probably all those array iterations could be shorthanded somehow, but this yields the values you mentioned:
cat foo.json | jq ".images | .[] | .classifiers | .[] | .classes | .[] | .[]"
If you want the keys, too, just omit that last .[].
Edit
As @chepner pointed out in the comments, this can indeed be shortened to
cat foo.json | jq ".images[].classifiers[].classes[] | [.class, .score] | @csv"

Depending on the data, this filter, which uses recursive descent (..), objects, and has, may work:
.. | objects | select(has("class")) | [.class,.score] | @csv
Sample Run (assuming data in data.json)
$ jq -Mr '.. | objects | select(has("class")) | [.class,.score] | @csv' data.json
"football",0.867376
Try it online at jqplay.org
Here is another variation, which uses paths and getpath:
getpath( paths(has("class")?) ) | [.class,.score] | @csv
Try it online at jqplay.org
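The filters above emit only the class and score. A minimal sketch that also carries the image name into each row, which may help once a response holds more than one image, could look like this. The inline sample is trimmed from the question, and the extra image column and the variable names json and rows are my own assumptions, not something the question asked for.

```shell
# Sample response trimmed from the question (illustrative data).
json='{"images":[{"classifiers":[{"classes":[{"class":"football","score":0.867376}],"name":"players"}],"image":"1496A400EDC351FD.jpg"}]}'

# Bind each image name, then emit one CSV row per class found under that image.
rows=$(printf '%s' "$json" | jq -r '
  .images[]
  | .image as $img
  | .classifiers[].classes[]
  | [$img, .class, .score]
  | @csv')

printf '%s\n' "$rows"
```

The -r option matters here: without it, each CSV row is printed as a JSON string with escaped quotes.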

jq solution to obtain a prepared csv record:
jq -r '.images[0].classifiers[0].classes[0] | [.class, .score] | @csv' input.json
The output:
"football",0.867376

Related

How do I write a jq query to convert a JSON file to CSV?

The JSON files look like:
{
"name": "My Collection",
"description": "This is a great collection.",
"date": 1639717379161,
"attributes": [
{
"trait_type": "Background",
"value": "Sand"
},
{
"trait_type": "Skin",
"value": "Dark Brown"
},
{
"trait_type": "Mouth",
"value": "Smile Basic"
},
{
"trait_type": "Eyes",
"value": "Confused"
}
]
}
I found a shell script that uses jq and has this code:
i=1
for eachFile in *.json; do
cat $i.json | jq -r '.[] | {column1: .name, column2: .description} | [.[] | tostring] | @csv' > extract-$i.csv
echo "converted $i of many json files..."
((i=i+1))
done
But its output is:
jq: error (at <stdin>:34): Cannot index string with string "name"
converted 1 of many json files...
Any suggestions on how I can make this work? Thank you!
Quick jq lesson
===========
jq filters are applied like this:
jq -r '.name_of_json_field_0 <optional filter>, .name_of_json_field_1 <optional filter>'
and so on and so forth. A single dot is the simplest filter; it leaves the data field untouched.
jq -r '.name_of_field | .'
You may also omit the trailing filter for the same effect.
In your case:
jq -r '.name, .description'
will extract the values of both those fields.
.[] will unwrap an array to have the next piped filter applied to each unwrapped value. Example:
jq -r '.attributes | .[]'
extracts each object in the attributes array (one per trait_type).
You may sometime want to repackage objects in an array by surrounding the filter in brackets:
jq -r '[.name, .description, .date]'
You may sometime want to repackage data in an object by surrounding the filter in curly braces:
jq -r '{new_field_name: .name, super_new_field_name: .description}'
Playing around with these, I was able to get
jq -r '[.name, .description, .date, (.attributes | [.[] | .trait_type] | @csv | gsub(",";";") | gsub("\"";"")), (.attributes | [.[] | .value] | @csv | gsub(",";";") | gsub("\"";""))] | @csv'
to give us:
"My Collection","This is a great collection.",1639717379161,"Background;Skin;Mouth;Eyes","Sand;Dark Brown;Smile Basic;Confused"
Name, description, and date were left as is, so let's break down the weird parts, one step at a time.
.attributes | [.[] | .trait_type]
.[] extracts each element of the attributes array and pipes the result of that into the next filter, which says to simply extract trait_type, where they are re-packaged in an array.
.attributes | [.[] | .trait_type] | @csv
turns the array into a CSV-formatted string.
(.attributes | [.[] | .trait_type] | @csv | gsub(",";";") | gsub("\"";""))
Parens separate this from the rest of the evaluations, obviously.
The first gsub here replaces commas with semicolons so they don't get interpreted as a separate field, the second removes all extra double quotes.
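The loop from the question can be adjusted along the same lines. Below is a minimal sketch, assuming each file holds a single object shaped like the sample. It reads the file named by the loop variable instead of $i.json, and it drops the leading .[] from the filter, since .[] iterates over the top-level values and is what leads to the "Cannot index string" error. The temporary directory, the file name 1.json, and the choice of columns are placeholders.

```shell
# Work in a throwaway directory with one sample file (illustrative data).
workdir=$(mktemp -d)
cat > "$workdir/1.json" <<'EOF'
{"name": "My Collection", "description": "This is a great collection.", "date": 1639717379161,
 "attributes": [{"trait_type": "Background", "value": "Sand"}, {"trait_type": "Skin", "value": "Dark Brown"}]}
EOF

for f in "$workdir"/*.json; do
  base=$(basename "$f" .json)
  # One CSV row per file: name, description, date, then each trait value.
  jq -r '[.name, .description, .date] + [.attributes[].value] | @csv' "$f" \
    > "$workdir/extract-$base.csv"
  echo "converted $f"
done

cat "$workdir/extract-1.csv"
```

Appending the trait values as separate columns is just one possible layout; the semicolon-joined form shown earlier works the same way inside this loop.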

Convert JSON file to CSV file using jq

I have a JSON file called myresponse.json.
"status":"CONTENT",
"valid":true,
"success":true,
"failure":false,
"content":{
"id":0,"resources":[{
"id":0,"value":52.51742935180664
},
{
"id":1,"value":13.392845153808594
},
{
"id":5,"value":"2021-02-09T13:15:15Z"
},
{
"id":6,"value":20.754192352294922
}]}}}
"status":"CONTENT",
"valid":true,
"success":true,
"failure":false,
"content":{
"id":0,"resources":[{
"id":0,"value":52.51742935180664
},
{
"id":1,"value":13.392845153808594
},
{
"id":5,"value":"2021-02-09T13:15:15Z"
},
{
"id":6,"value":20.754192352294922
}]}}}
obtained with a curl.
How can I use jq to convert the JSON to a CSV file where "0,1,5,6" are the columns and the values of "0,1,5,6" occupy one row per object, like this:
0,1,5,6
52.51742935180664, 13.392845153808594, "2021-02-09T13:15:15Z", 20.754192352294922
52.51742935180664, 13.392845153808594, "2021-02-09T13:15:15Z", 20.754192352294922
Thanks for your help!
The following assumes the input consists of a stream of valid JSON objects along the lines shown in the question.
If the relevant values of .id are known beforehand, then generating the header row is trivial, and a solution implemented with flexibility in mind is as follows:
def oneline:
.content.resources
| INDEX(.[]; .id) | map_values(.value);
def emit($keys):
[.[ $keys[] | tostring ]];
[0,1,5,6] as $keys
| $keys,
(inputs | oneline | emit($keys))
| join(",")
Since this relies on inputs to read the input, jq should be invoked with the -r and -n options (e.g. jq -rn -f program.jq)
Using the first object to determine the relevant .id values
If the relevant values of .id are determined by the first JSON object in the stream, the above defs can be reused with the following:
(input | oneline) as $first
| ($first | keys) as $keys
| $keys,
($first | emit($keys)),
(inputs | oneline | emit($keys))
| join(",")
This solution would be used with jq's -r and -n options.
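For reference, here is a minimal end-to-end sketch of the second variant run against a small stream. The two input objects are made up (only their shape follows the question), and the variable names stream and csv are placeholders. Note that INDEX builds an object with string keys, so emit converts each key with tostring before indexing. As far as I know, INDEX requires jq 1.6 or later.

```shell
# Two made-up objects in the same shape as the question's input.
stream='{"content":{"id":0,"resources":[{"id":0,"value":1.5},{"id":1,"value":"x"}]}}
{"content":{"id":0,"resources":[{"id":0,"value":2.5},{"id":1,"value":"y"}]}}'

# -n: do not read input implicitly; input/inputs pull objects from the stream.
# -r: print the joined strings raw instead of as JSON strings.
csv=$(printf '%s\n' "$stream" | jq -rn '
  def oneline: .content.resources | INDEX(.[]; .id) | map_values(.value);
  # INDEX produces string keys, so convert each key before indexing.
  def emit($keys): [.[ $keys[] | tostring ]];
  (input | oneline) as $first
  | ($first | keys) as $keys
  | $keys, ($first | emit($keys)), (inputs | oneline | emit($keys))
  | join(",")')

printf '%s\n' "$csv"
```

Because join(",") does no quoting, string values that contain commas would break the row; replacing join(",") with @csv is one way around that.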

How to print path and key values of JSON file using JQ

I would like to print each path and value of a json file with included key values line by line. I would like the output to be comma delimited or at least very easy to cut and sort using Linux command line tools. Given the following json and jq, I have been given jq code which seems to do this for the test JSON, but I am not sure it works in all cases or is the proper approach.
Is there a function in jq which does this automatically? If not, is there a "most concise best way" to do it?
My wish would be something like:
$ cat short.json | jq -doit '.'
Reservations,0,Instances,0,ImageId,ami-a
Reservations,0,Instances,0,InstanceId,i-a
Reservations,0,Instances,0,InstanceType,t2.micro
Reservations,0,Instances,0,KeyName,ubuntu
Test JSON:
$ cat short.json | jq '.'
{
"Reservations": [
{
"Groups": [],
"Instances": [
{
"ImageId": "ami-a",
"InstanceId": "i-a",
"InstanceType": "t2.micro",
"KeyName": "ubuntu"
}
]
}
]
}
Code Recommended:
https://unix.stackexchange.com/questions/561460/how-to-print-path-and-key-values-of-json-file
Supporting:
https://unix.stackexchange.com/questions/515573/convert-json-file-to-a-key-path-with-the-resulting-value-at-the-end-of-each-k
JQ Code Too long and complicated!
jq -r '
paths(scalars) as $p
| [ ( [ $p[] | tostring ] | join(".") )
, ( getpath($p) | tojson )
]
| join(": ")
' short.json
Result:
Reservations.0.Instances.0.ImageId: "ami-a"
Reservations.0.Instances.0.InstanceId: "i-a"
Reservations.0.Instances.0.InstanceType: "t2.micro"
Reservations.0.Instances.0.KeyName: "ubuntu"
A simple jq query to achieve the requested format:
paths(scalars) as $p
| $p + [getpath($p)]
| join(",")
If your jq is ancient and you cannot upgrade, insert | map(tostring) before the last line above.
Output with the -r option
Reservations,0,Instances,0,ImageId,ami-a
Reservations,0,Instances,0,InstanceId,i-a
Reservations,0,Instances,0,InstanceType,t2.micro
Reservations,0,Instances,0,KeyName,ubuntu
Caveat
If a key or atomic value contains ",", then using a comma as the delimiter may be inadvisable. For this reason, it might be preferable to use TAB as the delimiter via @tsv, which escapes any literal tab inside a key or value as \t:
paths(scalars) as $p
| $p + [getpath($p)]
| @tsv
(The comment above about ancient versions of jq applies here too.)
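A small sketch of why the tab-separated form is easier to cut and sort: the input below is made up so that one value contains a comma and another contains a tab, and the variable names json, tsv and field_counts are placeholders. With a reasonably recent jq, every output line should still split into exactly three tab-separated fields.

```shell
# Made-up input: one value contains a comma, another a tab (written as \t in JSON).
json='{"tags":["a,b","c\td"]}'

tsv=$(printf '%s' "$json" | jq -r '
  paths(scalars) as $p
  | $p + [getpath($p)]
  | @tsv')

printf '%s\n' "$tsv"

# Count tab-separated fields per line; a single distinct count means no line was split oddly.
field_counts=$(printf '%s\n' "$tsv" | awk -F'\t' '{ print NF }' | sort -u)
printf 'fields per line: %s\n' "$field_counts"
```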
Read it as a stream.
$ jq --stream -r 'select(.[1]|scalars!=null) | "\(.[0]|join(".")): \(.[1]|tojson)"' short.json
Use -c paths as follows:
cat short.json | jq -c paths | tr -d '[' | tr -d ']'
I am using jq-1.5-1-a5b5cbe

Quoted string in CSV input become double-escaped

I'm trying to use JQ to process CSV like this which has no column headings:
cat "input.csv"
"12345678901234567890","2019-03-19",12
Is there more elegant and readable way to remove escaped quotes for the first and second fields--and overall, to build a stream of objects given such input?
Ideally I would like to have a reusable script which builds JSON from an arbitrary CSV, given a file and a list of fields in it passed as a command-line argument.
Current JQ script and output:
cat "input.csv" |
jq \
--raw-input '
. |
split("\n") |
map( split(",")) |
.[0] |
{
ID: (.[0] | fromjson),
date: (.[1] | fromjson),
count: (.[2] | tonumber)
}'
{
"ID": "12345678901234567890",
"date": "2019-03-19",
"count": 12
}
Output of the same script without | fromjson used which results in quoted quotes, which I would like to avoid:
{
"ID": "\"12345678901234567890\"",
"date": "\"2019-03-19\"",
"count": 12
}
Your invocation of jq can be simplified to:
jq -R '
split(",")
| map(fromjson)
| {ID: .[0], date: .[1], count: .[2] }'
Generic solution
jq -R --argjson header '["ID", "date", "count"]' '
split(",")
| map(fromjson)
| [ $header, . ]
| transpose
| reduce .[] as $kv ({}; .[$kv[0]] =$kv[1]) '
If you want to specify the headers in a file, use the --argfile command-line option instead.
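To get closer to the reusable script the question asks for, the generic filter could be wrapped in a shell function. This is only a sketch: the function name csv2json and its argument order are my own invention, and like the filters above it splits on every comma, so it will not cope with quoted fields that themselves contain commas.

```shell
# csv2json HEADER_JSON FILE
# HEADER_JSON is a JSON array of field names, e.g. '["ID","date","count"]'.
# Prints one compact JSON object per CSV line.
csv2json() {
  jq -c -R --argjson header "$1" '
    split(",")
    | map(fromjson)
    | [$header, .]
    | transpose
    | reduce .[] as $kv ({}; .[$kv[0]] = $kv[1])' "$2"
}

# Usage with the sample line from the question.
sample=$(mktemp)
printf '%s\n' '"12345678901234567890","2019-03-19",12' > "$sample"
result=$(csv2json '["ID","date","count"]' "$sample")
printf '%s\n' "$result"
```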

How to map an object to arrays so it can be converted to csv?

I'm trying to convert an object that looks like this:
{
"123" : "abc",
"231" : "dbh",
"452" : "xyz"
}
To csv that looks like this:
"123","abc"
"231","dbh"
"452","xyz"
I would prefer to use the command line tool jq but can't seem to figure out how to do the assignment. I managed to get the keys with jq '. | keys' test.json but couldn't figure out what to do next.
The problem is you can't convert a k:v object like this straight into CSV with @csv. It needs to be an array, so we need to convert to an array first. If the keys were named, it would be simple, but they're dynamic, so it's not so easy.
Try this filter:
to_entries[] | [.key, .value]
to_entries converts an object to an array of key/value objects. [] breaks up the array into its individual items.
Then each item is converted to an array containing the key and value.
This produces the following output:
[
"123",
"abc"
]
[
"231",
"dbh"
]
[
"452",
"xyz"
]
Then you can use the @csv filter to convert the rows to CSV rows.
$ echo '{"123":"abc","231":"dbh","452":"xyz"}' | jq -r 'to_entries[] | [.key, .value] | @csv'
"123","abc"
"231","dbh"
"452","xyz"
Jeff's answer is a good starting point. Here is something closer to what you expect:
cat input.json | jq 'to_entries | map([.key, .value]|join(","))'
[
"123,abc",
"231,dbh",
"452,xyz"
]
But I did not find a way to join with newlines; without the -r option, jq prints the joined result as a single JSON string:
cat input.json | jq 'to_entries | map([.key, .value]|join(","))|join("\n")'
"123,abc\n231,dbh\n452,xyz"
Here's an example I ended up using this morning (processing PagerDuty alerts):
cat /tmp/summary.json | jq -r '
.incidents
| map({desc: .trigger_summary_data.description, id:.id})
| group_by(.desc)
| map(length as $len
| {desc:.[0].desc, length: $len})
| sort_by(.length)
| map([.desc, .length] | @csv)
| join("\n") '
This dumps a CSV document that looks something like:
"[Triggered] Something annoyingly frequent",31
"[Triggered] Even more frequent alert!",35
"[No data] Stats Server is probably acting up",55
Try this. It gives the output you want:
echo '{"123":"abc","231":"dbh","452":"xyz"}' | jq -r 'to_entries | .[] | "\"" + .key + "\",\"" + (.value | tostring)+ "\""'
onecol2txt () {
awk 'BEGIN { RS="_end_"; FS="\n"}
{ for (i=2; i <= NF; i++){
printf "%s ",$i
}
printf "\n"
}'
}
cat jsonfile | jq -r -c '....,"_end_"' | onecol2txt