Convert JSON file to CSV file using jq

I have a JSON file called myresponse.json:
{
"status":"CONTENT",
"valid":true,
"success":true,
"failure":false,
"content":{
"id":0,"resources":[{
"id":0,"value":52.51742935180664
},
{
"id":1,"value":13.392845153808594
},
{
"id":5,"value":"2021-02-09T13:15:15Z"
},
{
"id":6,"value":20.754192352294922
}]}}
{
"status":"CONTENT",
"valid":true,
"success":true,
"failure":false,
"content":{
"id":0,"resources":[{
"id":0,"value":52.51742935180664
},
{
"id":1,"value":13.392845153808594
},
{
"id":5,"value":"2021-02-09T13:15:15Z"
},
{
"id":6,"value":20.754192352294922
}]}}
It was obtained with curl.
How can I use jq to convert this JSON to a CSV file where "0,1,5,6" are the columns, and the values for ids 0, 1, 5, and 6 respectively occupy each row of the CSV file, like this:
0,1,5,6
52.51742935180664, 13.392845153808594, "2021-02-09T13:15:15Z", 20.754192352294922
52.51742935180664, 13.392845153808594, "2021-02-09T13:15:15Z", 20.754192352294922
Thanks for your help!

The following assumes the input consists of a stream of valid JSON objects along the lines shown in the question.
If the relevant values of .id are known beforehand, then generating the header row is trivial, and a solution implemented with flexibility in mind is as follows:
# Reduce one response object to {"0": v0, "1": v1, ...},
# keyed by the stringified resource ids.
# (INDEX requires jq 1.6 or later.)
def oneline:
  .content.resources
  | INDEX(.[]; .id)
  | map_values(.value);

# Emit the values at the given keys, in the given order;
# tostring matches the stringified keys produced by INDEX.
def emit($keys):
  [ .[ $keys[] | tostring ] ];

[0,1,5,6] as $keys
| $keys,
  (inputs | oneline | emit($keys))
| join(",")
Since this relies on inputs to read the input, jq should be invoked with the -r and -n options (e.g. jq -rn -f program.jq myresponse.json).
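For instance, with the two response objects from the question stored as a stream in myresponse.json, the run would look like this (a sketch assuming jq 1.6+, whose join stringifies numbers; note that join(",") does not quote string values, so pipe the row arrays through @csv instead if you need quoting):
$ jq -rn -f program.jq myresponse.json
0,1,5,6
52.51742935180664,13.392845153808594,2021-02-09T13:15:15Z,20.754192352294922
52.51742935180664,13.392845153808594,2021-02-09T13:15:15Z,20.754192352294922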
Using the first object to determine the relevant .id values
If the relevant values of .id are determined by the first JSON object in the stream, the above defs can be reused with the following:
(input | oneline) as $first
| ($first | keys) as $keys
| $keys,
($first | emit($keys)),
(inputs | oneline | emit($keys))
| join(",")
This solution would be used with jq's -r and -n options.
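Assuming this variant (together with the two defs above) is saved as program2.jq — a file name chosen here just for illustration — it is invoked the same way, and on the question's sample stream it produces the same output as the fixed-keys version, since keys sorts the stringified ids produced by INDEX:
$ jq -rn -f program2.jq myresponse.json    # same output as above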

Related

jq - Looping through json and concatenate the output to single string

I am currently learning jq. I have a JSON file and am able to loop through it and filter out the values I need. However, I run into an issue when I try to combine the output into a single string instead of having it on multiple lines.
File svcs.json:
[
{
"name": "svc-A",
"run" : "True"
},
{
"name": "svc-B",
"run" : "False"
},
{
"name": "svc-C",
"run" : "True"
}
]
I am using jq to filter and output the service names whose run value is "True":
jq -r '.[] | select(.run=="True") | .name' svcs.json
I get the following output:
svc-A
svc-C
I am looking to get the output as a single string separated by commas.
Expected Output:
"svc-A,svc-C"
I tried using join, but have been unable to get it to work so far.
The .[] expression explodes the array into a stream of its elements. You'll need to collect the transformed stream (the names) back into an array. Then you can use the @csv filter for the final output:
$ jq -r '[ .[] | select(.run=="True") | .name ] | @csv' svcs.json
"svc-A","svc-C"
But here's where map comes in handy to operate on an array's elements:
$ jq -r 'map(select(.run=="True") | .name) | @csv' svcs.json
"svc-A","svc-C"
Keep the array using map instead of decomposing it with .[], then join with a glue string:
jq -r 'map(select(.run=="True") | .name) | join(",")' svcs.json
svc-A,svc-C
If your goal is to create CSV output, there is a special @csv filter that takes care of quoting, escaping, etc.
jq -r 'map(select(.run=="True") | .name) | @csv' svcs.json
"svc-A","svc-C"

How do I write a jq query to convert a JSON file to CSV?

The JSON files look like:
{
"name": "My Collection",
"description": "This is a great collection.",
"date": 1639717379161,
"attributes": [
{
"trait_type": "Background",
"value": "Sand"
},
{
"trait_type": "Skin",
"value": "Dark Brown"
},
{
"trait_type": "Mouth",
"value": "Smile Basic"
},
{
"trait_type": "Eyes",
"value": "Confused"
}
]
}
I found a shell script that uses jq and has this code:
i=1
for eachFile in *.json; do
cat $i.json | jq -r '.[] | {column1: .name, column2: .description} | [.[] | tostring] | @csv' > extract-$i.csv
echo "converted $i of many json files..."
((i=i+1))
done
But its output is:
jq: error (at <stdin>:34): Cannot index string with string "name"
converted 1 of many json files...
Any suggestions on how I can make this work? Thank you!
Quick jq lesson
===========
jq filters are applied like this:
jq -r '.name_of_json_field_0 <optional filter>, .name_of_json_field_1 <optional filter>'
and so on and so forth. A single dot is the simplest filter; it leaves the data field untouched.
jq -r '.name_of_field .'
You may also omit the optional filter entirely for the same effect.
In your case:
jq -r '.name, .description'
will extract the values of both those fields.
.[] will unwrap an array to have the next piped filter applied to each unwrapped value. Example:
jq -r '.attributes | .[]'
extracts each of the attribute objects (the ones containing trait_type and value).
You may sometime want to repackage objects in an array by surrounding the filter in brackets:
jq -r '[.name, .description, .date]'
You may sometime want to repackage data in an object by surrounding the filter in curly braces:
jq -r '{new_field_name: .name, super_new_field_name: .description}'
Playing around with these, I was able to get
jq -r '[.name, .description, .date, (.attributes | [.[] | .trait_type] | @csv | gsub(",";";") | gsub("\"";"")), (.attributes | [.[] | .value] | @csv | gsub(",";";") | gsub("\"";""))] | @csv'
to give us:
"My Collection","This is a great collection.",1639717379161,"Background;Skin;Mouth;Eyes","Sand;Dark Brown;Smile Basic;Confused"
Name, description, and date were left as is, so let's break down the weird parts, one step at a time.
.attributes | [.[] | .trait_type]
.[] extracts each element of the attributes array and pipes each result into the next filter, which simply extracts trait_type; the surrounding brackets re-package the results in an array.
.attributes | [.[] | .trait_type] | @csv
turns the array into a CSV-parsable format.
(.attributes | [.[] | .trait_type] | @csv | gsub(",";";") | gsub("\"";""))
Parens separate this from the rest of the evaluations.
The first gsub here replaces commas with semicolons so they don't get interpreted as a separate field; the second removes all extra double quotes.
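Putting it all together: the original loop failed because the leading .[] explodes the top-level object into its values (strings and numbers) before .name is applied, hence "Cannot index string with string "name"". A minimal corrected sketch of the loop, reusing the filter built above and the question's extract-$i.csv naming:
i=1
for f in *.json; do
  jq -r '[.name, .description, .date, (.attributes | [.[] | .trait_type] | @csv | gsub(",";";") | gsub("\"";"")), (.attributes | [.[] | .value] | @csv | gsub(",";";") | gsub("\"";""))] | @csv' "$f" > "extract-$i.csv"
  echo "converted $i of many json files..."
  ((i=i+1))
done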

How to convert arbitrary nested JSON to CSV with jq – so you can convert it back?

How do I use jq to convert an arbitrary JSON array of objects to CSV, while objects in this array are nested?
StackOverflow has a sea of questions/answers where specific input or output fields are referenced, but I'd like to have a generic solution that
includes a header row,
works for any JSON input including nested arrays + objects,
allows records that have missing values for keys that are present in other records
does not hard-code any field names,
allows converting the CSV back into the nested JSON structure if needed, and
uses key paths as header names (see the following description).
Dot notation
Many JSON-using products (like CouchDB, MongoDB, …) and libraries (like Lodash, …) use variations of syntax that allows access to nested property values / subfields by joining key fragments with a character, often a dot (‘dot notation’).
An example of a key path like this would be "a.b.0.c" to refer to the deeply nested property in this JSON snippet:
{
"a": {
"b": [
{
"c": 123,
}
]
}
}
Caveat: Using this method is a pragmatic solution for most cases, but means that either dot characters have to be banned in property names, or a more complex (and definitely never used property name) has to be invented for escaping dots in property names / accessing nested fields. MongoDB simply banned usage of "." in documents until v5.0, some libraries have workarounds for field access (Lodash example).
Despite this, for simplicity, a solution should use the described dot syntax in the CSV output’s header for nested properties. Bonus if there is a solution variant that solves this problem, e.g. with JSONPath.
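For reference, jq itself represents such key paths as arrays rather than dot-joined strings; a quick illustration against the snippet above, using the getpath builtin:
$ echo '{"a":{"b":[{"c":123}]}}' | jq 'getpath(["a","b",0,"c"])'
123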
Example JSON array as input
[
{
"a": {
"b": [
{
"c": 123
}
]
}
},
{
"a": {
"b": [
{
"c": "foo \" bar",
"d": "qux"
}
]
}
},
{
"a": {
"b": [
{
"d": 456
}
]
}
}
]
Example CSV output
The output should have a header that includes all fields (even if the object at the first array does not have defined values for all existing key paths).
To make the output intuitively editable by humans, each row should represent one object in the input array.
The expected output should look like this:
"a.b.0.c","a.b.0.d"
123,
"foo "" bar","qux"
,456
Command line
This is what I need:
cat example.json | jq <MISSING CODE HERE>
Solution 1, using dot notation
Here is the jq call to convert your array of nested JSON objects to CSV:
jq -r '(. | map(leaf_paths) | unique) as $cols | map(. as $row | ($cols | map(. as $col | $row | getpath($col)))) as $rows | ([($cols | map(. | map(tostring) | join(".")))] + $rows) | map(@csv) | .[]'
The fastest way to try this solution out is to use JQPlay.
The CSV output will have a header row. It will contain all properties that exist anywhere in the input objects, including nested ones, in dot notation. Each input array element will be represented as a single row, properties that are missing will be represented as empty CSV fields.
Using solution 1 in bash or a similar shell
Create the JSON input file…
echo '[{"a": {"b": [{"c": 123}]}},{"a": {"b": [{"c": "foo \" bar","d": "qux"}]}},{"a": {"b": [{"d": 456}]}}]' > example.json
Then use this jq command to output the CSV on the standard output:
cat example.json | jq -r '(. | map(leaf_paths) | unique) as $cols | map(. as $row | ($cols | map(. as $col | $row | getpath($col)))) as $rows | ([($cols | map(. | map(tostring) | join(".")))] + $rows) | map(@csv) | .[]'
…or write the output to example.csv:
cat example.json | jq -r '(. | map(leaf_paths) | unique) as $cols | map(. as $row | ($cols | map(. as $col | $row | getpath($col)))) as $rows | ([($cols | map(. | map(tostring) | join(".")))] + $rows) | map(@csv) | .[]' > example.csv
Converting the data from solution 1 back to JSON
Here is a Node.js example that you can try on RunKit. It converts a CSV generated with the method in solution 1 back to an array of nested JSON objects.
Explanation for solution 1
Here is a longer, commented version of the jq filter.
# 1) Find all unique leaf property names of all objects in the input array. Each nested property name is an array with the components of its key path, for example ["a", 0, "b"].
(. | map(leaf_paths) | unique) as $cols |
# 2) Use the found key paths to determine all (nested) property values in the given input records.
map (. as $row | ($cols | map(. as $col | $row | getpath($col)))) as $rows |
# 3) Create the raw output array of rows. Each row is represented as an array of values, one element per existing column.
(
# 3.1) This represents the header row. Key paths are generated here.
[($cols | map(. | map(tostring) | join(".")))]
+ # 3.2) concatenate the header row with all other rows
$rows
)
# 4) Convert each row to an escaped CSV string.
| map(@csv)
# 5) output each array element directly. Without this, the result would be a JSON array of CSV strings.
| .[]
Solution 2: for input that does have dots in property names
If you do need to support dot characters in property names, you can either use a different separator string for the key path syntax (replace the dot in "." with something else), or replace the map(tostring) | join(".") part with tostring - this yields a JSON array of strings that you can use as key paths - no dot notation needed. Here is a JQPlay with this solution variant.
Full jq command:
jq -r '(. | map(leaf_paths) | unique) as $cols | map(. as $row | ($cols | map(. as $col | $row | getpath($col)))) as $rows | ([($cols | map(. | tostring))] + $rows) | map(@csv) | .[]'
The output CSV for the variant would look like this then – it’s less readable and not useful for cases where you want humans to intuitively understand the CSV’s header:
"[""a"",""b"",0,""c""]","[""a"",""b"",0,""d""]"
123,
"foo "" bar","qux"
,456
See below for an idea how to convert this format back to a representation in your programming language.
Bonus: Converting the generated CSV back to JSON
If the input's nested properties contain no ".", it’s simple to convert the CSV back to JSON, for example with a library that supports dot notation, or with JSONPath.
JavaScript: Use Lodash's _.set()
Other languages: Find a package/library that implements JSONPath and use selectors like $.a.b.0.c or $['a']['b'][0]['c'] to set each nested property of each record.
Solution 2 (with JSON arrays as headers) allows you to interpret the headers as JSON array strings. Then you can generate a JSON Path from each header, and re-create all records/objects:
"[""a"",""b"",0,""c""]" (CSV)
→ ["a","b",0,"c"] (array of key-path components after unescaping and parsing as JSON)
→ $.["a"]["b"][0]["c"] (JSONPath)
→ { a: { b: [{c: … }] } } (Nested regenerated object)
I've written an example Node.js script to convert a CSV like this back to JSON. You can try solution 2 in RunKit.
The following tocsv and fromcsv functions provide a solution to the stated problem except for one complication regarding requirement (6) concerning the headers (that key paths be used as header names). Essentially, this requirement can be met using the functions given here by adding a matrix transposition step.
Whether or not a transposition step is added, the advantage of the approach taken here is that there are no restrictions on the JSON keys or values. In particular, they may contain periods (dots), newlines, and/or NUL characters.
In the example, an array of objects is given, but in fact any stream of valid JSON documents could be used as input to tocsv; thanks to the magic of jq, the original stream will be recreated by fromcsv (in the sense of entity-by-entity equality).
Of course, since there is no CSV standard, the CSV produced by the tocsv function might not be understood by all CSV processors. In particular, please note that the tocsv function defined here maps embedded newlines in JSON strings or key names to the two-character string "\n" (i.e., a literal backslash followed by the letter "n"); the inverse operation performs the inverse translation to meet the "round-trip" requirement.

(The use of tail is just to simplify the presentation; it would be trivial to modify the solution to make it jq-only.)

The CSV is generated on the assumption that any value can be included in a field so long as (a) the field is quoted, and (b) double-quotes within the field are doubled.

Any generic solution that supports "round-trips" is bound to be somewhat complicated. The main reason why the solution presented here is more complex than one might expect is that a third column is added, partly to make it easy to distinguish between integers and integer-valued strings, but mainly because it makes it easy to distinguish between the size-1 and size-2 arrays produced by jq's --stream option. Needless to say, there are other ways these issues could be addressed; the number of calls to jq could also be reduced.
The solution is presented as a test script that checks the round-trip requirement on a telling test case:
#!/bin/bash
function json {
cat<<EOF
[
{
"a": 1,
"b": [
1,
2,
"1"
],
"c": "d\",ef",
"embed\"ed": "quote",
"null": null,
"string": "null",
"control characters": "a\u0000c",
"newline": "a\nb"
},
{
"x": 1
}
]
EOF
}
function tocsv {
jq -ncr --stream '
(["path", "value", "stringp"],
(inputs | . + [.[1]|type=="string"]))
| map( tostring|gsub("\"";"\"\"") | gsub("\n"; "\\n"))
| "\"\(.[0])\",\"\(.[1])\",\(.[2])"
'
}
function fromcsv {
tail -n +2 | # first duplicate backslashes and deduplicate double-quotes
jq -rR '"[\(gsub("\\\\";"\\\\") | gsub("\"\"";"\\\"") ) ]"' |
jq -c '.[2] as $s
| .[0] |= fromjson
| .[1] |= if $s then . else fromjson end
| if $s == null then [.[0]] else .[:-1] end
# handle newlines
| map(if type == "string" then gsub("\\\\n";"\n") else . end)' |
jq -n 'fromstream(inputs)'
}
# Check the roundtrip:
json | tocsv | fromcsv | jq -s '.[0] == .[1]' - <(json)
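Assuming the script is saved as roundtrip.sh (a name used here just for illustration), a successful round trip prints true:
$ bash roundtrip.sh
true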
Here is the CSV that would be produced by json | tocsv, except that SO seems to disallow literal NULs, so I have replaced it with \0:
"path","value",stringp
"[0,""a""]","1",false
"[0,""b"",0]","1",false
"[0,""b"",1]","2",false
"[0,""b"",2]","1",true
"[0,""b"",2]","false",null
"[0,""c""]","d"",ef",true
"[0,""embed\""ed""]","quote",true
"[0,""null""]","null",false
"[0,""string""]","null",true
"[0,""control characters""]","a\0c",true
"[0,""newline""]","a\nb",true
"[0,""newline""]","false",null
"[1,""x""]","1",false
"[1,""x""]","false",null
"[1]","false",null

Parsing nested json with jq

I am parsing a nested json to get specific values from the json response. The json response is as follows:
{
"custom_classes": 2,
"images":
[
{
"classifiers":
[
{
"classes":
[
{
"class": "football",
"score": 0.867376
}
],
"classifier_id": "players_367677167",
"name": "players"
}
],
"image": "1496A400EDC351FD.jpg"
}
],
"images_processed": 1
}
From images => classifiers => classes, "class" and "score" are the values that I want to save in a CSV file. I have found how to save the result in a CSV file, but I am unable to parse the images alone. I can get custom_classes and images_processed.
I am using jq-1.5.
The different commands I have tried:
curl "Some address"| jq '.["images"]'
curl "Some address"| jq '.[.images]'
curl "Some address"| jq '.[.images["image"]]'
Most of the time the error is about not being able to index the images array.
Any hints?
I must say, I'm not terribly good at jq, so probably all those array iterations could be shorthanded somehow, but this yields the values you mentioned:
cat foo.json | jq ".[] | .images | .[] | .classifiers | .[] | .classes | .[] | .[]"
If you want the keys, too, just omit that last .[].
Edit
As @chepner pointed out in the comments, this can indeed be shortened to
cat foo.json | jq ".images[].classifiers[].classes[] | [.class, .score] | #csv "
Depending on the data, this filter, which uses recursive descent (..), objects, and has, may work:
.. | objects | select(has("class")) | [.class,.score] | @csv
Sample Run (assuming data in data.json)
$ jq -Mr '.. | objects | select(has("class")) | [.class,.score] | @csv' data.json
"football",0.867376
Try it online at jqplay.org
Here is another variation which uses paths and getpath
getpath( paths(has("class")?) ) | [.class,.score] | @csv
Try it online at jqplay.org
A jq solution to obtain a prepared CSV record:
jq -r '.images[0].classifiers[0].classes[0] | [.class, .score] | @csv' input.json
The output:
"football",0.867376

How to map an object to arrays so it can be converted to csv?

I'm trying to convert an object that looks like this:
{
"123" : "abc",
"231" : "dbh",
"452" : "xyz"
}
To csv that looks like this:
"123","abc"
"231","dbh"
"452","xyz"
I would prefer to use the command line tool jq but can't seem to figure out how to do the assignment. I managed to get the keys with jq '. | keys' test.json but couldn't figure out what to do next.
The problem is you can't convert a k:v object like this straight into CSV with @csv. It needs to be an array, so we need to convert it to an array first. If the keys were named it would be simple, but they're dynamic, so it's not so easy.
Try this filter:
to_entries[] | [.key, .value]
to_entries converts an object to an array of key/value objects. The [] breaks up that array into a stream of its items;
then each of those items is converted to an array containing the key and value.
This produces the following output:
[
"123",
"abc"
],
[
"231",
"dbh"
],
[
"452",
"xyz"
]
Then you can use the @csv filter to convert the rows to CSV rows.
$ echo '{"123":"abc","231":"dbh","452":"xyz"}' | jq -r 'to_entries[] | [.key, .value] | @csv'
"123","abc"
"231","dbh"
"452","xyz"
Jeff's answer is a good starting point; here is something closer to what you expect:
cat input.json | jq 'to_entries | map([.key, .value]|join(","))'
[
"123,abc",
"231,dbh",
"452,xyz"
]
But I did not find a way to join using a newline:
cat input.json | jq 'to_entries | map([.key, .value]|join(","))|join("\n")'
"123,abc\n231,dbh\n452,xyz"
Here's an example I ended up using this morning (processing PagerDuty alerts):
cat /tmp/summary.json | jq -r '
.incidents
| map({desc: .trigger_summary_data.description, id:.id})
| group_by(.desc)
| map(length as $len
| {desc:.[0].desc, length: $len})
| sort_by(.length)
| map([.desc, .length] | @csv)
| join("\n") '
This dumps a CSV document that looks something like:
"[Triggered] Something annoyingly frequent",31
"[Triggered] Even more frequent alert!",35
"[No data] Stats Server is probably acting up",55
Try this; it gives the same output you want:
echo '{"123":"abc","231":"dbh","452":"xyz"}' | jq -r 'to_entries | .[] | "\"" + .key + "\",\"" + (.value | tostring)+ "\""'
onecol2txt () {
awk 'BEGIN { RS="_end_"; FS="\n"}
{ for (i=2; i <= NF; i++){
printf "%s ",$i
}
printf "\n"
}'
}
cat jsonfile | jq -r -c '....,"_end_"' | onecol2txt