Quoted strings in CSV input become double-escaped JSON

I'm trying to use jq to process CSV input like this, which has no column headings:
cat "input.csv"
"12345678901234567890","2019-03-19",12
Is there a more elegant and readable way to remove the escaped quotes from the first and second fields, and more generally, to build a stream of objects from such input?
Ideally I would like a reusable script which builds JSON from arbitrary CSV, given a file and a list of its fields passed as a command-line argument.
Current JQ script and output:
cat "input.csv" |
jq \
--raw-input '
. |
split("\n") |
map( split(",")) |
.[0] |
{
ID: (.[0] | fromjson),
date: (.[1] | fromjson),
count: (.[2] | tonumber)
}'
{
"ID": "12345678901234567890",
"date": "2019-03-19",
"count": 12
}
Output of the same script without | fromjson, which results in escaped quotes that I would like to avoid:
{
"ID": "\"12345678901234567890\"",
"date": "\"2019-03-19\"",
"count": 1
}
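What fromjson does to each field can be seen outside jq as well. This is not part of the original answer, just an illustration in Python: a quoted CSV field is itself a valid JSON string literal, so parsing it once removes the embedded quotes.

```python
import json

# A quoted CSV field is itself a valid JSON string literal, so parsing it
# once strips the embedded quotes -- this is what jq's fromjson does.
field = '"12345678901234567890"'   # raw field text, quotes included
print(json.loads(field))           # prints: 12345678901234567890
```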

Your invocation of jq can be simplified to:
jq -R '
split(",")
| map(fromjson)
| {ID: .[0], date: .[1], count: .[2] }'
Generic solution
jq -R --argjson header '["ID", "date", "count"]' '
split(",")
| map(fromjson)
| [ $header, . ]
| transpose
| reduce .[] as $kv ({}; .[$kv[0]] = $kv[1])'
If you want to specify the headers in a file, use the --argfile command-line option instead.
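The generic filter above pairs a header list with each parsed row. A rough Python equivalent (an illustration, not from the answer; like the jq version, it splits on every comma, so fields containing commas are not handled):

```python
import json

header = ["ID", "date", "count"]
line = '"12345678901234567890","2019-03-19",12'

# Each comma-separated field is a JSON literal, so parse it with json.loads
# (the fromjson step), then zip with the header (the transpose/reduce step).
values = [json.loads(f) for f in line.split(",")]
record = dict(zip(header, values))
print(record)
# {'ID': '12345678901234567890', 'date': '2019-03-19', 'count': 12}
```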

Related

How do I write a jq query to convert a JSON file to CSV?

The JSON files look like:
{
"name": "My Collection",
"description": "This is a great collection.",
"date": 1639717379161,
"attributes": [
{
"trait_type": "Background",
"value": "Sand"
},
{
"trait_type": "Skin",
"value": "Dark Brown"
},
{
"trait_type": "Mouth",
"value": "Smile Basic"
},
{
"trait_type": "Eyes",
"value": "Confused"
}
]
}
I found a shell script that uses jq and has this code:
i=1
for eachFile in *.json; do
cat $i.json | jq -r '.[] | {column1: .name, column2: .description} | [.[] | tostring] | @csv' > extract-$i.csv
echo "converted $i of many json files..."
((i=i+1))
done
But its output is:
jq: error (at <stdin>:34): Cannot index string with string "name"
converted 1 of many json files...
Any suggestions on how I can make this work? Thank you!
Quick jq lesson
===========
jq filters are applied like this:
jq -r '.name_of_json_field_0 <optional filter>, .name_of_json_field_1 <optional filter>'
and so on. A single dot is the simplest filter; it leaves the data field untouched.
jq -r '.name_of_field | .'
You may also omit the filter entirely for the same effect.
In your case:
jq -r '.name, .description'
will extract the values of both those fields.
.[] will unwrap an array to have the next piped filter applied to each unwrapped value. Example:
jq -r '.attributes | .[]'
extracts each of the attribute objects.
You may sometime want to repackage objects in an array by surrounding the filter in brackets:
jq -r '[.name, .description, .date]'
You may sometime want to repackage data in an object by surrounding the filter in curly braces:
jq -r '{new_field_name: .name, super_new_field_name: .description}'
Playing around with these, I was able to get
jq -r '[.name, .description, .date, (.attributes | [.[] | .trait_type] | @csv | gsub(",";";") | gsub("\"";"")), (.attributes | [.[] | .value] | @csv | gsub(",";";") | gsub("\"";""))] | @csv'
to give us:
"My Collection","This is a great collection.",1639717379161,"Background;Skin;Mouth;Eyes","Sand;Dark Brown;Smile Basic;Confused"
Name, description, and date were left as is, so let's break down the weird parts, one step at a time.
.attributes | [.[] | .trait_type]
.[] extracts each element of the attributes array and pipes the result of that into the next filter, which says to simply extract trait_type, where they are re-packaged in an array.
.attributes | [.[] | .trait_type] | @csv
turns the array into a CSV-parsable format.
(.attributes | [.[] | .trait_type] | @csv | gsub(",";";") | gsub("\"";""))
Parens separate this from the rest of the evaluations, obviously.
The first gsub here replaces commas with semicolons so they don't get interpreted as a separate field, the second removes all extra double quotes.
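The same packing trick, step by step, sketched in Python (an illustration only, not part of the answer): build the @csv-style row, swap commas for semicolons, then drop the quotes.

```python
traits = ["Background", "Skin", "Mouth", "Eyes"]

# Step 1: the @csv-style row: "Background","Skin","Mouth","Eyes"
row = ",".join(f'"{t}"' for t in traits)
# Step 2: gsub(",";";") -- commas become semicolons
# Step 3: gsub("\"";"") -- strip the extra double quotes
packed = row.replace(",", ";").replace('"', "")
print(packed)  # prints: Background;Skin;Mouth;Eyes
```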

Grouping and sorting JSON records in Bash

I'm using curl to get a JSON file. My problem is that I would like to get each group of four values on one line, and then sort the lines by the first column.
I'm trying:
curl -L 'http://mylink/' | jq '.[] | .location, .host_name, .serial_number, .model'
I'm getting
"Office-1"
"work-1"
"11xxx111"
"hp"
"Office-2"
"work-2"
"33xxx333"
"lenovo"
"Office-1"
"work-3"
"22xxx222"
"dell"
I would like to have:
"Office-1", "work-1", "11xxx111", "hp"
"Office-1", "work-3", "22xxx222", "dell"
"Office-2", "work-2", "33xxx333", "lenovo"
I tried jq -S '.[] | .location | group_by(.location)' and a few other combinations like sort_by(.location), but it doesn't work. I'm getting this error: jq: error (at <stdin>:1): Cannot iterate over string ("Office-1")
Sample of my JSON file:
[
{
"location": "Office-1",
"host_name": "work-1",
"serial_number": "11xxx111",
"model": "hp"
},
{
"location": "Office-2",
"host_name": "work-2",
"serial_number": "33xxx333",
"model": "lenovo"
},
{
"location": "Office-1",
"host_name": "work-3",
"serial_number": "22xxx222",
"model": "dell"
}
]
To sort by .location only, without an external sort:
map( [ .location, .host_name, .serial_number, .model] )
| sort_by(.[0])[]
| map("\"\(.)\"") | join(", ")
The ", " is per the stated requirements.
If you want the output as CSV, simply replace the last line in the jq program above by @csv.
If minimizing keystrokes is a goal and you are certain that the keys are always in the desired order, you could get away with replacing the first line by map( [ .[] ] ).
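The sort_by(.[0]) step corresponds to sorting the rows on their first element. A Python rendering of that answer, using the question's sample data (an illustration only):

```python
records = [
    {"location": "Office-1", "host_name": "work-1", "serial_number": "11xxx111", "model": "hp"},
    {"location": "Office-2", "host_name": "work-2", "serial_number": "33xxx333", "model": "lenovo"},
    {"location": "Office-1", "host_name": "work-3", "serial_number": "22xxx222", "model": "dell"},
]

# map([...]) step: turn each object into an array of its four values
rows = [[r["location"], r["host_name"], r["serial_number"], r["model"]] for r in records]

# sort_by(.[0]) step: sort on the first element (stable, like jq's sort_by)
for row in sorted(rows, key=lambda r: r[0]):
    print(", ".join(f'"{v}"' for v in row))
```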
You can ask jq to produce arbitrary formatted strings.
curl -L 'http://mylink/' |
jq -r '.[]| "\"\(.location)\", \"\(.host_name)\", \"\(.serial_number)\", \"\(.model)\""' |
sort
Inside the double quotes, \" produces literal double quotes, and \(.field) interpolates a field name. The -r option is required to produce output which isn't JSON.
This will get you the output you wanted:
jq -r 'group_by(.location) | .[] | .[] | map(values) | "\"" + join ("\", \"") + "\""'
like so:
$ jq -r 'group_by(.location) | .[] | .[] | map(values) | "\"" + join ("\", \"") + "\""' /tmp/so7713.json
"Office-1", "work-1", "11xxx111", "hp"
"Office-1", "work-3", "22xxx222", "dell"
"Office-2", "work-2", "33xxx333", "lenovo"
If you want it all as one string, it's a bit simpler:
$ jq 'group_by(.location) | .[] | .[] | map(values) | join (", ")' /tmp/so7713.json
"Office-1, work-1, 11xxx111, hp"
"Office-1, work-3, 22xxx222, dell"
"Office-2, work-2, 33xxx333, lenovo"
Note the lack of -r in the second example.
I feel there has to be a better way of doing .[] | .[], but I don't know what it is (yet).
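What group_by does, and why the answer then needs the double .[] to flatten, can be mirrored in Python with itertools.groupby (an illustration, not from the answer; jq's group_by sorts by the key first, so the Python version sorts explicitly):

```python
from itertools import groupby

records = [
    {"location": "Office-1", "model": "hp"},
    {"location": "Office-2", "model": "lenovo"},
    {"location": "Office-1", "model": "dell"},
]

records.sort(key=lambda r: r["location"])  # group_by sorts by the key first
groups = [list(g) for _, g in groupby(records, key=lambda r: r["location"])]

flat = [r for g in groups for r in g]      # the .[] | .[] flattening step
print([r["model"] for r in flat])          # prints: ['hp', 'dell', 'lenovo']
```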

Nested array in JSON to different rows in CSV

I have the following JSON:
{
"transmitterId": "30451155eda2",
"rssiSignature": [
{
"receiverId": "001bc509408201d5",
"receiverIdType": 1,
"rssi": -52,
"numberOfDecodings": 5,
"rssiSum": -52
},
{
"receiverId": "001bc50940820228",
"receiverIdType": 1,
"rssi": -85,
"numberOfDecodings": 5,
"rssiSum": -85
}
],
"timestamp": 1574228579837
}
I want to convert it to CSV format, where each row corresponds to an entry in rssiSignature (I have added the header row for visualization purposes):
timestamp,transmitterId,receiverId,rssi
1574228579837,"30451155eda2","001bc509408201d5",-52
1574228579837,"30451155eda2","001bc50940820228",-85
My current attempt is the following, but I get a single CSV row:
$ jq -r '[.timestamp, .transmitterId, .rssiSignature[].receiverId, .rssiSignature[].rssi] | @csv' test.jsonl
1574228579837,"30451155eda2","001bc509408201d5","001bc50940820228",-52,-85
How can I use jq to generate different rows for each entry of the rssiSignature array?
In order to reuse a value of the upper level, like the timestamp, for every item of the rssiSignature array, you can define it as a variable. You can get your csv like this:
jq -r '.timestamp as $t | .transmitterId as $tid |
.rssiSignature[] | [ $t, $tid, .receiverId, .rssi] | @csv
' file.json
Output:
1574228579837,"30451155eda2","001bc509408201d5",-52
1574228579837,"30451155eda2","001bc50940820228",-85
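The fan-out this answer performs (repeat the top-level timestamp and transmitterId for every element of rssiSignature) looks like this in Python, using a trimmed version of the question's document (an illustration only):

```python
import json

doc = json.loads("""{
  "transmitterId": "30451155eda2",
  "timestamp": 1574228579837,
  "rssiSignature": [
    {"receiverId": "001bc509408201d5", "rssi": -52},
    {"receiverId": "001bc50940820228", "rssi": -85}
  ]
}""")

# One output row per rssiSignature entry; the top-level values repeat.
rows = [
    [doc["timestamp"], doc["transmitterId"], sig["receiverId"], sig["rssi"]]
    for sig in doc["rssiSignature"]
]
for row in rows:
    print(row)
```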
Here is also a way to print headers for an output file in bash, independent of the commands we call, using command grouping.
(
printf "timestamp,transmitterId,receiverId,rssi\n"
jq -r '.timestamp as $t | .transmitterId as $tid |
.rssiSignature[] | [ $t, $tid, .receiverId, .rssi] | @csv
' file.json
) > output.csv
Actually, the task can be accomplished without the use of any variables; one can also coax jq to include a header:
jq -r '
["timestamp","transmitterId","receiverId","rssi"],
[.timestamp, .transmitterId] + (.rssiSignature[] | [.receiverId,.rssi])
| @csv'
A single header with multiple files
One way to produce a single header with multiple input files would be to use inputs in conjunction with the -n command-line option. This happens also to be efficient:
jq -nr '
["timestamp","transmitterId","receiverId","rssi"],
(inputs |
[.timestamp, .transmitterId] + (.rssiSignature[] | [.receiverId,.rssi]))
| @csv'

Parsing nested json with jq

I am parsing a nested json to get specific values from the json response. The json response is as follows:
{
"custom_classes": 2,
"images":
[
{
"classifiers":
[
{
"classes":
[
{
"class": "football",
"score": 0.867376
}
],
"classifier_id": "players_367677167",
"name": "players"
}
],
"image": "1496A400EDC351FD.jpg"
}
],
"images_processed": 1
}
From images => classifiers => classes, "class" and "score" are the values that I want to save in a CSV file. I have found how to save the result in a CSV file, but I am unable to parse the images alone; I can only get custom_classes and images_processed.
I am using jq-1.5.
The different commands I have tried :
curl "Some address"| jq '.["images"]'
curl "Some address"| jq '.[.images]'
curl "Some address"| jq '.[.images["image"]]'
Most of the times the error is about not being able to index the array images.
Any hints?
I must say, I'm not terribly good at jq, so probably all those array iterations could be shorthanded somehow, but this yields the values you mentioned:
cat foo.json | jq ".images | .[] | .classifiers | .[] | .classes | .[] | .[]"
If you want the keys, too, just omit that last .[].
Edit
As @chepner pointed out in the comments, this can indeed be shortened to
cat foo.json | jq -r ".images[].classifiers[].classes[] | [.class, .score] | @csv"
Depending on the data this filter which uses Recursive Descent: .., objects and has may work:
.. | objects | select(has("class")) | [.class,.score] | @csv
Sample Run (assuming data in data.json)
$ jq -Mr '.. | objects | select(has("class")) | [.class,.score] | @csv' data.json
"football",0.867376
Try it online at jqplay.org
Here is another variation which uses paths and getpath
getpath( paths(has("class")?) ) | [.class,.score] | @csv
Try it online at jqplay.org
jq solution to obtain a prepared csv record:
jq -r '.images[0].classifiers[0].classes[0] | [.class, .score] | @csv' input.json
The output:
"football",0.867376
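The recursive-descent approach (.. | objects | select(has("class"))) visits every value in the document and keeps the objects carrying a "class" key. A Python walk doing the same, as an illustration only:

```python
# Recursively visit every dict/list in a parsed JSON document and yield
# the dicts that have a "class" key -- the Python analogue of jq's
# `.. | objects | select(has("class"))`.
def find_class_objects(node):
    if isinstance(node, dict):
        if "class" in node:
            yield node
        for v in node.values():
            yield from find_class_objects(v)
    elif isinstance(node, list):
        for v in node:
            yield from find_class_objects(v)

data = {"images": [{"classifiers": [
    {"classes": [{"class": "football", "score": 0.867376}]}
]}]}
print([(o["class"], o["score"]) for o in find_class_objects(data)])
# [('football', 0.867376)]
```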

How to map an object to arrays so it can be converted to csv?

I'm trying to convert an object that looks like this:
{
"123" : "abc",
"231" : "dbh",
"452" : "xyz"
}
To csv that looks like this:
"123","abc"
"231","dbh"
"452","xyz"
I would prefer to use the command line tool jq but can't seem to figure out how to do the assignment. I managed to get the keys with jq '. | keys' test.json but couldn't figure out what to do next.
The problem is that you can't convert a k:v object like this straight into CSV with @csv. It needs to be an array, so we need to convert to an array first. If the keys were named it would be simple, but they're dynamic, so it's not so easy.
Try this filter:
to_entries[] | [.key, .value]
to_entries converts an object to an array of key/value objects. [] breaks up the array into each of its items;
then each item is converted to an array containing the key and value.
This produces the following output:
[
"123",
"abc"
],
[
"231",
"dbh"
],
[
"452",
"xyz"
]
Then you can use the @csv filter to convert each row to a CSV row.
$ echo '{"123":"abc","231":"dbh","452":"xyz"}' | jq -r 'to_entries[] | [.key, .value] | @csv'
"123","abc"
"231","dbh"
"452","xyz"
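The to_entries[] | [.key, .value] pattern maps directly onto iterating a dict's items. A Python sketch of the same conversion (an illustration, not part of the answer):

```python
import csv, io, json

obj = json.loads('{"123": "abc", "231": "dbh", "452": "xyz"}')

# Each (key, value) pair becomes one quoted CSV row, like
# to_entries[] | [.key, .value] | @csv in jq.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
for key, value in obj.items():
    writer.writerow([key, value])

print(buf.getvalue(), end="")
# "123","abc"
# "231","dbh"
# "452","xyz"
```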
Jeff's answer is a good starting point; here is something closer to what you expect:
cat input.json | jq 'to_entries | map([.key, .value]|join(","))'
[
"123,abc",
"231,dbh",
"452,xyz"
]
But I did not find a way to join using a newline:
cat input.json | jq 'to_entries | map([.key, .value]|join(","))|join("\n")'
"123,abc\n231,dbh\n452,xyz"
Here's an example I ended up using this morning (processing PagerDuty alerts):
cat /tmp/summary.json | jq -r '
.incidents
| map({desc: .trigger_summary_data.description, id:.id})
| group_by(.desc)
| map(length as $len
| {desc:.[0].desc, length: $len})
| sort_by(.length)
| map([.desc, .length] | @csv)
| join("\n") '
This dumps a CSV document that looks something like:
"[Triggered] Something annoyingly frequent",31
"[Triggered] Even more frequent alert!",35
"[No data] Stats Server is probably acting up",55
Try this; it gives the same output you want:
echo '{"123":"abc","231":"dbh","452":"xyz"}' | jq -r 'to_entries | .[] | "\"" + .key + "\",\"" + (.value | tostring)+ "\""'
onecol2txt () {
awk 'BEGIN { RS="_end_"; FS="\n"}
{ for (i=2; i <= NF; i++){
printf "%s ",$i
}
printf "\n"
}'
}
cat jsonfile | jq -r -c '....,"_end_"' | onecol2txt