Convert json to csv / jq: Cannot iterate over string

[
  {
    "Description": "Copied for Destination xxx from Source 30c for Snapshot 1. Task created on X,52,87,14,76.",
    "Encrypted": false,
    "ID": "snap-074",
    "Progress": "100%",
    "Time": "2019-06-11T09:25:23.110Z",
    "Owner": "883065",
    "Status": "completed",
    "Volume": "vol1",
    "Size": 16
  },
  {
    "Description": "Copied for Destination yy from Source 31c for Snapshot 2. Task created on X,52,87,14,76.",
    "Encrypted": false,
    "ID": "snap-096",
    "Progress": "100%",
    "Time": "2019-06-11T10:18:01.410Z",
    "Owner": "1259",
    "Status": "completed",
    "Volume": "vol-2",
    "Size": 4
  }
]
I have that json file that I'm trying to convert to csv using the following command:
jq -r '. | map(.Description[], .Encrypted, .ID, .Progress, .Time, .Owner, .Status, .Volume, .Size | join(",")) | join("\n")' snapshots1.json
But I'm getting error:
jq: error (at snapshots1.json:24): Cannot iterate over string ("Copied for...)
I looked at the similar post jq: error: Cannot iterate over string, but I can't figure out the error. Any help is appreciated.

jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv' snapshots1.json >> myfile.csv
I found this post that explains the code, and it worked for me.

I think you were on the right track. Here is how I'd do it:
jq -r '.[] | map(..) | @csv' snapshots1.json > snapshots1.csv
There are a couple of small problems with your code:
.Description[] - Description isn't an array, so the square brackets don't work; there's no array to open.
Suppose we get rid of the square brackets: the code then works insofar as it puts the contents of the objects into an array. However, it puts the contents of all the objects into one array, so your CSV would only have one line (and I'm assuming that you want each object on a separate row). This is because the map function collects all its results into a single array (see the jq Manual), so you have to split open the outer array first with .[].
The leading dot (.) at the start of your filter doesn't do anything - it simply returns the whole JSON as is. If you want to play around with it, try .[] and then experiment from there.
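Putting those fixes together while keeping your join approach, a corrected command might look like this (a sketch; map(tostring) is needed because join only concatenates strings, and @csv would additionally quote the commas inside Description):
jq -r '.[] | [.Description, .Encrypted, .ID, .Progress, .Time, .Owner, .Status, .Volume, .Size] | map(tostring) | join(",")' snapshots1.json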

There's a risk in using .. here to extract the "values" in an object: what if the ordering of the keys in the input objects differs between objects?
Here's a generic filter which addresses this and other issues. It also emits a suitable "header" line:
def object2array(stream):
  foreach stream as $x (null;
    if . == null then $x | [true, keys_unsorted] else .[0] = false end;
    (if .[0] then .[1] else empty end),
    (.[1] as $keys | $x | [getpath($keys[] | [.])]));
Example
def data: [{a:1,b:2}, {b:22,a:11,c:0}];
object2array(data[])
produces:
["a","b"]
[1,2]
[11,22]
Just right for piping to @csv or @tsv.
Solution
So the solution to the original problem would essentially be:
object2array(.[]) | @csv
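Putting it all together on the command line (a sketch, assuming the snapshots1.json file from the question):
jq -r '
  def object2array(stream):
    foreach stream as $x (null;
      if . == null then $x | [true, keys_unsorted] else .[0] = false end;
      (if .[0] then .[1] else empty end),
      (.[1] as $keys | $x | [getpath($keys[] | [.])]));
  object2array(.[]) | @csv
' snapshots1.json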

Related

JQ stop after the first match

I have a large file (about 500 megabytes) and the data in it is in JSON format.
{
"0001": [
"aaaaa",
"qqqqq"
],
"0002": [
"aaaaa"
],
"0003": [
"ccccc"
],
"0004": [
"bbbbb"
]
...
}
I need to extract from it:
aaaaa
qqqqq
At the moment I do the following: jq -r 'try ."0001" | .[]' ./1.txt. It works, but the problem is that it takes a very long time, because the search continues through the whole file instead of stopping immediately after the first match.
Please advise a way to stop further scanning once the match has been found. I know that there is first(inputs | ...), but I don't understand how to implement it here.
If it is known beforehand that the relevant key is the first one in the JSON object, then the solution using --stream and first/1 as given by @pmf is applicable; otherwise, it could be adapted as follows:
jq --stream -n 'first(fromstream(1 | truncate_stream(inputs | select(.[0][0] == "0001"))))[]' input.json
This works for the sample input:
jq --null-input --raw-output --stream 'label $out | inputs | if .[0][0] == "0001" then (if length == 2 then .[1] else break $out end) else empty end' file
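For the record, the label/break construct is what makes the early exit possible: break $out aborts evaluation, so jq stops consuming input at that point. A minimal sketch with hypothetical numeric input:
$ seq 1000000 | jq -n 'label $out | inputs | if . > 3 then break $out else . end'
1
2
3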

Map arrays to objects with no common fields

How might one use jq-1.5-1-a5b5cbe to join a filtered set of arrays from STDIN to a set of objects which contains no common fields, assuming that all elements will be in predictable order?
Standard Input (pre-slurpfile; generated by multiple GETs):
{"ref":"objA","arr":["alpha"]}
{"ref":"objB","arr":["bravo"]}
Existing File:
[{"name":"foo"},{"name":"bar"}]
Desired Output:
[{"name":"foo","arr":["alpha"]},{"name":"bar","arr":["bravo"]}]
Current Bash:
$ multiGET | jq --slurpfile stdin /dev/stdin '.[].arr = $stdin[].arr' file
[
  {
    "name": "foo",
    "arr": [
      "alpha"
    ]
  },
  {
    "name": "bar",
    "arr": [
      "alpha"
    ]
  }
]
[
  {
    "name": "foo",
    "arr": [
      "bravo"
    ]
  },
  {
    "name": "bar",
    "arr": [
      "bravo"
    ]
  }
]
Sidenote: I wasn't sure when to use pretty/compact JSON in this question; please comment with your opinion on best practice.
Get jq to read file before stdin, so that the first entity in file will be . and you can get everything else using inputs.
$ multiGET | jq -c '. as $objects
    | [ foreach (inputs | {arr}) as $x (-1; .+1;
        . as $i | $objects[$i] + $x
      ) ]' file -
[{"name":"foo","arr":["alpha"]},{"name":"bar","arr":["bravo"]}]
"Slurping" (whether using -s or --slurpfile) is sometimes necessary but rarely desirable, because of the memory requirements. So here's a solution that takes advantage of the fact that your multiGET produces a stream:
multiGET | jq -n --argjson objects '[{"name":"foo"},{"name":"bar"}]' '
  $objects
  | [foreach inputs as $in (-1; .+1;
      . as $ix
      | $objects[$ix] + ($in | del(.ref)))]
'
Here's a functional approach that might be appropriate if your stream was in fact already packaged as an array:
multiGET | jq -s --argjson objects '[{"name":"foo"},{"name":"bar"}]' '
  [$objects, map(del(.ref))]
  | transpose
  | map(add)
'
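To see why this works: transpose pairs the two arrays element-wise, and add merges each pair of objects. A quick demonstration:
$ jq -nc '[[{"name":"foo"},{"name":"bar"}], [{"arr":["alpha"]},{"arr":["bravo"]}]] | transpose | map(add)'
[{"name":"foo","arr":["alpha"]},{"name":"bar","arr":["bravo"]}]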
If the $objects array is in a file or too big for the command line, I'd suggest using --argfile, even though it is technically deprecated.
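For instance (a sketch, assuming the array lives in a hypothetical objects.json):
multiGET | jq -n --argfile objects objects.json '
  $objects
  | [foreach inputs as $in (-1; .+1;
      . as $ix | $objects[$ix] + ($in | del(.ref)))]
'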
If the $objects array is in a file, and if you want to avoid --argfile, you could still avoid slurping, e.g. by using the fact that unless -n is used, jq will automatically read one JSON entity from stdin:
(echo '[{"name":"foo"},{"name":"bar"}]';
 multiGET) | jq '
  . as $objects
  | [foreach inputs as $in (-1; .+1;
      . as $ix | $objects[$ix] + $in | del(.ref))]
'

Parsing nested json with jq

I am parsing a nested json to get specific values from the json response. The json response is as follows:
{
  "custom_classes": 2,
  "images": [
    {
      "classifiers": [
        {
          "classes": [
            {
              "class": "football",
              "score": 0.867376
            }
          ],
          "classifier_id": "players_367677167",
          "name": "players"
        }
      ],
      "image": "1496A400EDC351FD.jpg"
    }
  ],
  "images_processed": 1
}
From images => classifiers => classes, "class" and "score" are the values that I want to save in a csv file. I have found how to save the result in a csv file, but I am unable to parse the images alone. I can get custom_classes and images_processed.
I am using jq-1.5.
The different commands I have tried :
curl "Some address"| jq '.["images"]'
curl "Some address"| jq '.[.images]'
curl "Some address"| jq '.[.images["image"]]'
Most of the time the error is about not being able to index the array images.
Any hints?
I must say, I'm not terribly good at jq, so probably all those array iterations could be shorthanded somehow, but this yields the values you mentioned:
cat foo.json | jq ".[] | .images | .[] | .classifiers | .[] | .classes | .[] | .[]"
If you want the keys, too, just omit that last .[].`
Edit
As @chepner pointed out in the comments, this can indeed be shortened to
cat foo.json | jq ".images[].classifiers[].classes[] | [.class, .score] | @csv"
Depending on the data, this filter, which uses recursive descent (..), objects and has, may work:
.. | objects | select(has("class")) | [.class, .score] | @csv
Sample Run (assuming data in data.json)
$ jq -Mr '.. | objects | select(has("class")) | [.class, .score] | @csv' data.json
"football",0.867376
Try it online at jqplay.org
Here is another variation which uses paths and getpath
getpath(paths(has("class")?)) | [.class, .score] | @csv
Try it online at jqplay.org
jq solution to obtain a prepared csv record:
jq -r '.images[0].classifiers[0].classes[0] | [.class, .score] | @csv' input.json
The output:
"football",0.867376

jq: Conditionally update/replace/add json elements using an input file

I receive the following input file:
input.json:
[
{"ID":"aaa_12301248","time_CET":"00:00:00","VALUE":10,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:15:00","VALUE":18,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:30:00","VALUE":160,"FLAG":"0"},
{"ID":"bbb_0021122","time_CET":"00:00:00","VALUE":null,"FLAG":"?"},
{"ID":"bbb_0021122","time_CET":"00:15:00","VALUE":null,"FLAG":"?"},
{"ID":"bbb_0021122","time_CET":"00:30:00","VALUE":22,"FLAG":"0"},
{"ID":"ccc_0021122","time_CET":"00:00:00","VALUE":null,"FLAG":"?"},
{"ID":"ccc_0021122","time_CET":"00:15:00","VALUE":null,"FLAG":"?"},
{"ID":"ccc_0021122","time_CET":"00:30:00","VALUE":20,"FLAG":"0"},
{"ID":"ddd_122455","time_CET":"00:00:00","VALUE":null,"FLAG":"?"},
{"ID":"ddd_122455","time_CET":"00:15:00","VALUE":null,"FLAG":"?"},
{"ID":"ddd_122455","time_CET":"00:30:00","VALUE":null,"FLAG":"?"},
]
As you can see there are some valid values (FLAG: 0) and some invalid values (FLAG: "?").
Now I got a file looking like this (one for each ID):
aaa.json:
[
{"ID":"aaa_12301248","time_CET":"00:00:00","VALUE":10,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:15:00","VALUE":null,"FLAG":"?"},
{"ID":"aaa_12301248","time_CET":"00:55:00","VALUE":45,"FLAG":"0"}
]
As you can see, object one is the same as in input.json but object two is invalid (FLAG: "?"). That's why object two has to be replaced by the correct object from input.json (with VALUE:18).
Objects can be identified by "time_CET" and "ID" element.
Additionally, there will be new objects in input.json, that have not been part of aaa.json etc. These objects should be added to the array, and valid objects from aaa.json should be kept.
In the end, aaa.json should look like this:
[
{"ID":"aaa_12301248","time_CET":"00:00:00","VALUE":10,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:15:00","VALUE":18,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:30:00","VALUE":160,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:55:00","VALUE":45,"FLAG":"0"}
]
So, to summarize:
- look for FLAG "?" in aaa.json
- replace this object with the matching object from input.json, using "ID" and "time_CET" for mapping
- keep existing valid objects, and add objects from input.json that did not exist in aaa.json before (this means only objects whose "ID" field starts with "aaa")
- repeat this for bbb.json, ccc.json and ddd.json
I am not sure if it's possible to get this done all at once with a command like this, because the output has to go back to the correct ID files (aaa.json, bbb.json, ccc.json):
jq --argfile aaa aaa.json --argfile bbb bbb.json .... -f prog.jq input.json
The problem is, that the number after the identifier (aaa, bbb, ccc etc.) may change. So to make sure objects are added to the correct file/array, a statement like this would be required:
if (."ID"|contains("aaa")) then ....
Or is it better to run the program several times with different input parameters? I am not sure.
Thank you in advance!!
Here is one approach
#!/bin/bash
# usage: update.sh input.json aaa.json bbb.json....
# updates each of aaa.json bbb.json....

input_json="$1"
shift

for i in "$@"; do
  jq -M --argfile input_json "$input_json" '

    # functions to restrict input.json to keys of current xxx.json file
    def prefix: input_filename | split(".")[0];
    def selectprefix: select(.ID | startswith(prefix));

    # functions to build and probe a lookup table
    def pk: [.ID, .time_CET];
    def lookup($t; $k): $t | getpath($k);
    def lookup($t): lookup($t; pk);
    def organize(s): reduce s as $r ({}; setpath($r | pk; $r));

    # functions to identify objects in input.json missing from xxx.json
    def pks: paths | select(length == 2);
    def missing($t1; $t2): [$t1 | pks] - [$t2 | pks] | .[];
    def getmissing($t1; $t2): [missing($t1; $t2) as $p | lookup($t1; $p)];

    # main routine
    organize(.[]) as $xxx
    | organize($input_json[] | selectprefix) as $inp
    | map(if .FLAG != "?" then . else . += lookup($inp) end)
    | . + getmissing($inp; $xxx)

  ' "$i" | sponge "$i"
done
The script uses jq in a loop to read and update each aaa.json... file (sponge is from moreutils; it soaks up the output before overwriting the input file).
The filter creates temporary objects to facilitate looking up values by [ID, time_CET], updates any values in aaa.json with FLAG == "?", and finally adds any values from input.json that are missing in aaa.json.
The temporary lookup table for input.json uses input_filename so that only keys starting with a prefix matching the name of the currently processed file will be included.
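To make the lookup table concrete, here is a standalone equivalent of the organize/pk helpers applied to the original aaa.json (output reformatted for readability):
$ jq 'reduce .[] as $r ({}; setpath([$r.ID, $r.time_CET]; $r))' aaa.json
{
  "aaa_12301248": {
    "00:00:00": {"ID": "aaa_12301248", "time_CET": "00:00:00", "VALUE": 10, "FLAG": "0"},
    "00:15:00": {"ID": "aaa_12301248", "time_CET": "00:15:00", "VALUE": null, "FLAG": "?"},
    "00:55:00": {"ID": "aaa_12301248", "time_CET": "00:55:00", "VALUE": 45, "FLAG": "0"}
  }
}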
Sample Run:
$ ./update.sh input.json aaa.json
aaa.json after run:
[
  {
    "ID": "aaa_12301248",
    "time_CET": "00:00:00",
    "VALUE": 10,
    "FLAG": "0"
  },
  {
    "ID": "aaa_12301248",
    "time_CET": "00:15:00",
    "VALUE": 18,
    "FLAG": "0"
  },
  {
    "ID": "aaa_12301248",
    "time_CET": "00:55:00",
    "VALUE": 45,
    "FLAG": "0"
  },
  {
    "ID": "aaa_12301248",
    "time_CET": "00:30:00",
    "VALUE": 160,
    "FLAG": "0"
  }
]

How to convert arbitrary simple JSON to CSV using jq?

Using jq, how can arbitrary JSON encoding an array of shallow objects be converted to CSV?
There are plenty of Q&As on this site that cover specific data models which hard-code the fields, but answers to this question should work given any JSON, with the only restriction that it's an array of objects with scalar properties (no deep/complex/sub-objects, as flattening these is another question). The result should contain a header row giving the field names. Preference will be given to answers that preserve the field order of the first object, but it's not a requirement. Results may enclose all cells with double-quotes, or only enclose those that require quoting (e.g. 'a,b').
Examples
Input:
[
{"code": "NSW", "name": "New South Wales", "level":"state", "country": "AU"},
{"code": "AB", "name": "Alberta", "level":"province", "country": "CA"},
{"code": "ABD", "name": "Aberdeenshire", "level":"council area", "country": "GB"},
{"code": "AK", "name": "Alaska", "level":"state", "country": "US"}
]
Possible output:
code,name,level,country
NSW,New South Wales,state,AU
AB,Alberta,province,CA
ABD,Aberdeenshire,council area,GB
AK,Alaska,state,US
Possible output:
"code","name","level","country"
"NSW","New South Wales","state","AU"
"AB","Alberta","province","CA"
"ABD","Aberdeenshire","council area","GB"
"AK","Alaska","state","US"
Input:
[
{"name": "bang", "value": "!", "level": 0},
{"name": "letters", "value": "a,b,c", "level": 0},
{"name": "letters", "value": "x,y,z", "level": 1},
{"name": "bang", "value": "\"!\"", "level": 1}
]
Possible output:
name,value,level
bang,!,0
letters,"a,b,c",0
letters,"x,y,z",1
bang,"""!""",0
Possible output:
"name","value","level"
"bang","!","0"
"letters","a,b,c","0"
"letters","x,y,z","1"
"bang","""!""","1"
First, obtain an array containing all the different object property names in your object array input. Those will be the columns of your CSV:
(map(keys) | add | unique) as $cols
Then, for each object in the object array input, map the column names you obtained to the corresponding properties in the object. Those will be the rows of your CSV.
map(. as $row | $cols | map($row[.])) as $rows
Finally, put the column names before the rows, as a header for the CSV, and pass the resulting row stream to the @csv filter.
$cols, $rows[] | @csv
All together now. Remember to use the -r flag to get the result as a raw string:
jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv'
The Skinny
jq -r '(.[0] | keys_unsorted) as $keys | $keys, map([.[ $keys[] ]])[] | @csv'
or:
jq -r '(.[0] | keys_unsorted) as $keys | ([$keys] + map([.[ $keys[] ]]))[] | @csv'
The Details
Aside
Describing the details is tricky because jq is stream-oriented, meaning it operates on a sequence of JSON data, rather than a single value. The input JSON stream gets converted to some internal type which is passed through the filters, then encoded in an output stream at program's end. The internal type isn't modeled by JSON, and doesn't exist as a named type. It's most easily demonstrated by examining the output of a bare index (.[]) or the comma operator (examining it directly could be done with a debugger, but that would be in terms of jq's internal data types, rather than the conceptual data types behind JSON).
$ jq -c '.[]' <<<'["a", "b"]'
"a"
"b"
$ jq -cn '"a", "b"'
"a"
"b"
Note that the output isn't an array (which would be ["a", "b"]). Compact output (the -c option) shows that each array element (or argument to the , filter) becomes a separate object in the output (each is on a separate line).
A stream is like a JSON-seq, but uses newlines rather than RS as an output separator when encoded. Consequently, this internal type is referred to by the generic term "sequence" in this answer, with "stream" being reserved for the encoded input and output.
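A quick way to see those RS separators is jq's --seq flag, which emits true JSON-seq; od -c shows the RS bytes as octal 036 (a quick sketch):
$ jq -n --seq '"a", "b"' | od -c
0000000 036   "   a   "  \n 036   "   b   "  \n
0000012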
Constructing the Filter
The first object's keys can be extracted with:
.[0] | keys_unsorted
Keys will generally be kept in their original order, but preserving the exact order isn't guaranteed. Consequently, they will need to be used to index the objects to get the values in the same order. This will also prevent values being in the wrong columns if some objects have a different key order.
To both output the keys as the first row and make them available for indexing, they're stored in a variable. The next stage of the pipeline then references this variable and uses the comma operator to prepend the header to the output stream.
(.[0] | keys_unsorted) as $keys | $keys, ...
The expression after the comma is a little involved. The index operator on an object can take a sequence of strings (e.g. "name", "value"), returning a sequence of property values for those strings. $keys is an array, not a sequence, so [] is applied to convert it to a sequence,
$keys[]
which can then be passed to .[]
.[ $keys[] ]
This, too, produces a sequence, so the array constructor is used to convert it to an array.
[.[ $keys[] ]]
This expression is to be applied to a single object. map() is used to apply it to all objects in the outer array:
map([.[ $keys[] ]])
Lastly for this stage, this is converted to a sequence so each item becomes a separate row in the output.
map([.[ $keys[] ]])[]
Why bundle the sequence into an array within the map only to unbundle it outside? map produces an array; .[ $keys[] ] produces a sequence. Applying map to the sequence from .[ $keys[] ] would produce an array of sequences of values, but since sequences aren't a JSON type, you instead get a flattened array containing all the values.
["NSW","AU","state","New South Wales","AB","CA","province","Alberta","ABD","GB","council area","Aberdeenshire","AK","US","state","Alaska"]
The values from each object need to be kept separate, so that they become separate rows in the final output.
Finally, the sequence is passed through the @csv formatter.
Alternate
The items can be separated late, rather than early. Instead of using the comma operator to get a sequence (passing a sequence as the right operand), the header sequence ($keys) can be wrapped in an array, and + used to append the array of values. This still needs to be converted to a sequence before being passed to @csv.
The following filter is slightly different in that it will ensure every value is converted to a string. (jq 1.5+)
# For an array of many objects
jq -f filter.jq [file]
# For many objects (not within array)
jq -s -f filter.jq [file]
Filter: filter.jq
def tocsv:
  (map(keys) | add | unique | sort) as $cols
  | map(. as $row | $cols | map($row[.] | tostring)) as $rows
  | $cols, $rows[]
  | @csv;

tocsv
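Run with -r against the first example input from the question (saved here as a hypothetical example.json), the sorted column order shows up in the header:
$ jq -r -f filter.jq example.json
"code","country","level","name"
"NSW","AU","state","New South Wales"
"AB","CA","province","Alberta"
"ABD","GB","council area","Aberdeenshire"
"AK","US","state","Alaska"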
$ cat test.json
[
{"code": "NSW", "name": "New South Wales", "level":"state", "country": "AU"},
{"code": "AB", "name": "Alberta", "level":"province", "country": "CA"},
{"code": "ABD", "name": "Aberdeenshire", "level":"council area", "country": "GB"},
{"code": "AK", "name": "Alaska", "level":"state", "country": "US"}
]
$ jq -r '["Code", "Name", "Level", "Country"], (.[] | [.code, .name, .level, .country]) | #tsv ' test.json
Code Name Level Country
NSW New South Wales state AU
AB Alberta province CA
ABD Aberdeenshire council area GB
AK Alaska state US
$ jq -r '["Code", "Name", "Level", "Country"], (.[] | [.code, .name, .level, .country]) | #csv ' test.json
"Code","Name","Level","Country"
"NSW","New South Wales","state","AU"
"AB","Alberta","province","CA"
"ABD","Aberdeenshire","council area","GB"
"AK","Alaska","state","US"
I created a function that outputs an array of objects or arrays to csv with headers. The columns would be in the order of the headers.
def to_csv($headers):
  def _object_to_csv:
    ($headers | @csv),
    (.[] | [.[$headers[]]] | @csv);
  def _array_to_csv:
    ($headers | @csv),
    (.[][:$headers | length] | @csv);
  if .[0] | type == "object"
  then _object_to_csv
  else _array_to_csv
  end;
So you could use it like so:
to_csv([ "code", "name", "level", "country" ])
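For example, inlining the function on the command line and running it against the test.json shown earlier (a sketch; note the -r flag for raw output):
$ jq -r '
  def to_csv($headers):
    def _object_to_csv:
      ($headers | @csv),
      (.[] | [.[$headers[]]] | @csv);
    def _array_to_csv:
      ($headers | @csv),
      (.[][:$headers | length] | @csv);
    if .[0] | type == "object"
    then _object_to_csv
    else _array_to_csv
    end;
  to_csv(["code", "name", "level", "country"])
' test.json
"code","name","level","country"
"NSW","New South Wales","state","AU"
"AB","Alberta","province","CA"
"ABD","Aberdeenshire","council area","GB"
"AK","Alaska","state","US"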
This variant of Santiago's program is also safe but ensures that the key names in the first object are used as the first column headers, in the same order as they appear in that object:
def tocsv:
  if length == 0 then empty
  else
    (.[0] | keys_unsorted) as $firstkeys
    | (map(keys) | add | unique) as $allkeys
    | ($firstkeys + ($allkeys - $firstkeys)) as $cols
    | ($cols, (.[] as $row | $cols | map($row[.])))
    | @csv
  end;

tocsv
If you're open to using other Unix tools, csvkit has an in2csv tool:
in2csv example.json
Using your sample data:
> in2csv example.json
code,name,level,country
NSW,New South Wales,state,AU
AB,Alberta,province,CA
ABD,Aberdeenshire,council area,GB
AK,Alaska,state,US
I like the pipe approach for piping directly into in2csv:
cat example.json | in2csv -f json -
A simple way is to just use string concatenation. If your input is a proper array:
# filename.txt
[
{"field1":"value1", "field2":"value2"},
{"field1":"value1", "field2":"value2"},
{"field1":"value1", "field2":"value2"}
]
then index with .[]:
cat filename.txt | jq -r '.[] | .field1 + ", " + .field2'
or if it's just line by line objects:
# filename.txt
{"field1":"value1", "field2":"value2"}
{"field1":"value1", "field2":"value2"}
{"field1":"value1", "field2":"value2"}
just do this:
cat filename.txt | jq -r '.field1 + ", " + .field2'
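One caveat: + only concatenates strings, so this will error if a field is a number or null. A defensive variant (a sketch) casts each field first:
cat filename.txt | jq -r '(.field1 | tostring) + ", " + (.field2 | tostring)'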