How to convert json into csv file using jq? - json

This is my json file:
{
"ClientCountry": "ca",
"ClientASN": 812,
"CacheResponseStatus": 404,
"CacheResponseBytes": 130756,
"CacheCacheStatus": "hit"
}
{
"ClientCountry": "ua",
"ClientASN": 206996,
"CacheResponseStatus": 301,
"CacheResponseBytes": 142,
"CacheCacheStatus": "unknown"
}
{
"ClientCountry": "ua",
"ClientASN": 206996,
"CacheResponseStatus": 0,
"CacheResponseBytes": 0,
"CacheCacheStatus": "unknown"
}
I want to convert these json into csv like below.
"ClientCountry", "ClientASN","CacheResponseStatus", "CacheResponseBytes", "CacheCacheStatus"
"ca", 812, 404, 130756, "hit";
"ua", 206996, 301, 142,"unknown";
"ua", 206996, 0,0,"unknown";
Please let me know how to achieve this using jq?
I just tried below. But its not working.
jq 'to_entries[] | [.key, .value] | #csv'
Regards
Palani

Since you want all the key-values,
then assuming that the keys are presented in a consistent order in the input file, you can simply write:
jq -r '[.[]] | #csv' palanikumar.json
With the given input, this produces the following CSV:
"ca",812,404,130756,"hit"
"ua",206996,301,142,"unknown"
"ua",206996,0,0,"unknown"
Adding the headers and the trailing semicolons (if you really want them) is left as a (very easy) exercise.
Inconsistent ordering
If the ordering of the keys varies or might vary, then the following could be used to produce suitable CSV, assuming that the ordering of the keys in the first object in the input stream should be used:
input
| . as $first
| keys_unsorted as $keys
| $keys, [$first[]], (inputs | [.[$keys[]]]) | #csv
The appropriate invocation of jq would include both the -n and -r command-line options.

Look at these links
How to convert arbirtrary simple JSON to CSV using jq?
http://bigdatums.net/2017/09/30/convert-json-to-csv-with-jq/
( jq -r '.myarray | #csv' )

Related

jq - Looping through json and concatenate the output to single string

I was currently learning the usage of jq. I have a json file and I am able to loop through and filter out the values I need from the json. However, I am running into issue when I try to combine the output into single string instead of having the output in multiple lines.
File svcs.json:
[
{
"name": "svc-A",
"run" : "True"
},
{
"name": "svc-B",
"run" : "False"
},
{
"name": "svc-C",
"run" : "True"
}
]
I was using the jq to filter to output the service names with run value as True
jq -r '.[] | select(.run=="True") | .name ' svcs.json
I was getting the output as follows:
svc-A
svc-C
I was looking to get the output as single string separated by commas.
Expected Output:
"svc-A,svc-C"
I tried to using join, but was unable to get it to work so far.
The .[] expression explodes the array into a stream of its elements. You'll need to collect the transformed stream (the names) back into an array. Then you can use the #csv filter for the final output
$ jq -r '[ .[] | select(.run=="True") | .name ] | #csv' svcs.json
"svc-A","svc-C"
But here's where map comes in handy to operate on an array's elements:
$ jq -r 'map(select(.run=="True") | .name) | #csv' svcs.json
"svc-A","svc-C"
Keep the array using map instead of decomposing it with .[], then join with a glue string:
jq -r 'map(select(.run=="True") | .name) | join(",")' svcs.json
svc-A,svc-C
Demo
If your goal is to create a CSV output, there is a special #csv command taking care of quoting, escaping etc.
jq -r 'map(select(.run=="True") | .name) | #csv' svcs.json
"svc-A","svc-C"
Demo

Generate csv files from a JSON

Unfortunately I have considerable difficulties to generate three csv files from one json format. Maybe someone has a good hint how I could do this. Thanks
Here is the output. Within dropped1 and dropped2 can be several different and multiple addresses.
{
"result": {
"found": 0,
"dropped1": {
"address10": 1140
},
"rates": {
"total": {
"1min": 3579,
"5min": 1593,
"15min": 5312,
"60min": 1328
},
"dropped2": {
"address20": {
"1min": 9139,
"5min": 8355,
"15min": 2785,
"60min": 8196
}
}
},
"connections": 1
},
"id": "whatever",
"jsonrpc": "2.0"
}
The 3 csv files should be displayed in this form.
address10,1140
total,3579,1593,5312,1328
address20,9139,8355,2785,8196
If you decide to use jq, then unless there is some specific reason not to, I'd suggest invoking jq once for each of the three output files. The three invocations would then look like these:
jq -r '.result.dropped1 | [to_entries[][]] | #csv' > 1.csv
jq -r '.result.rates.total | ["total", .["1min"], .["5min"], .["15min"], .["60min"]] | #csv' > 2.csv
jq -r '.result.rates.dropped2
| to_entries[]
| [.key] + ( .value | [ .["1min"], .["5min"], .["15min"], .["60min"]] )
| #csv
' > 3.csv
If you can be sure the ordering of keys within the total and address20 objects is fixed and in the correct order, then the last two invocations can be simplified.
Did you try using this library?
https://www.npmjs.com/package/json-to-csv-stream
npm i json-to-csv-stream

jq - parsing& replacement based on key-value pairs within json

I have a json file in the form of a key-value map. For example:
{
"users":[
{
"key1":"user1",
"key2":"user2"
}
]
}
I have another json file. The values in the second file has to be replaced based on the keys in first file.
For example 2nd file is:
{
"info":
{
"users":["key1","key2","key3","key4"]
}
}
This second file should be replaced with
{
"info":
{
"users":["user1","user2","key3","key4"]
}
}
Because the value of key1 in first file is user1. this could be done with any python program, but I am learning jq and would like to try if it is possible with jq itself. I tried different combinations with reading file using slurpfile, then select & walk etc. But couldn't arrive at the required solution.
Any suggestions for the same will be appreciated.
Since .users[0] is a JSON dictionary, it would make sense to use it as such (e.g. for efficiency):
Invocation:
jq -c --slurpfile users users.json -f program.jq input.json
program.jq:
$users[0].users[0] as $dict
| .info.users |= map($dict[.] // .)
Output:
{"info":{"users":["user1","user2","key3","key4"]}}
Note: the above assumes that the dictionary contains no null or false values, or rather that any such values in the dictionary should be ignored. This avoids the double lookup that would otherwise be required. If this assumption is invalid, then a solution using has or in (e.g. as provided by RomanPerekhrest) would be appropriate.
Solution to supplemental problem
(See "comments".)
$users[0].users[0] as $dict
| second
| .info.users |= (map($dict[.] | select(. != null)))
sponge
It is highly inadvisable to use redirection to overwrite an input file.
If you have or can install sponge, then it would be far better to use it. For further details, see e.g. "What is jq's equivalent of sed -i?" in the jq FAQ.
jq solution:
jq --slurpfile users 1st.json '$users[0].users[0] as $users
| .info.users |= map(if in($users) then $users[.] else . end)' 2nd.json
The output:
{
"info": {
"users": [
"user1",
"user2",
"key3",
"key4"
]
}
}

Parsing nested json with jq

I am parsing a nested json to get specific values from the json response. The json response is as follows:
{
"custom_classes": 2,
"images":
[
{
"classifiers":
[
{
"classes":
[
{
"class": "football",
"score": 0.867376
}
],
"classifier_id": "players_367677167",
"name": "players"
}
],
"image": "1496A400EDC351FD.jpg"
}
],
"images_processed": 1
}
From the class images=>classifiers=>classes:"class" & "score" are the values that I want to save in a csv file. I have found how to save the result in a csv file. But I am unable to parse the images alone. I can get custom_classes and image_processed.
I am using jq-1.5.
The different commands I have tried :
curl "Some address"| jq '.["images"]'
curl "Some address"| jq '.[.images]'
curl "Some address"| jq '.[.images["image"]]'
Most of the times the error is about not being able to index the array images.
Any hints?
I must say, I'm not terribly good at jq, so probably all those array iterations could be shorthanded somehow, but this yields the values you mentioned:
cat foo.json | jq ".[] | .images | .[] | .classifiers | .[] | .classes | .[] | .[]"
If you want the keys, too, just omit that last .[].`
Edit
As #chepner pointed out in the comments, this can indeed be shortened to
cat foo.json | jq ".images[].classifiers[].classes[] | [.class, .score] | #csv "
Depending on the data this filter which uses Recursive Descent: .., objects and has may work:
.. | objects | select(has("class")) | [.class,.score] | #csv
Sample Run (assuming data in data.json)
$ jq -Mr '.. | objects | select(has("class")) | [.class,.score] | #csv' data.json
"football",0.867376
Try it online at jqplay.org
Here is another variation which uses paths and getpath
getpath( paths(has("class")?) ) | [.class,.score] | #csv
Try it online at jqplay.org
jq solution to obtain a prepared csv record:
jq -r '.images[0].classifiers[0].classes[0] | [.class, .score] | #csv' input.json
The output:
"football",0.867376

How to convert arbitrary simple JSON to CSV using jq?

Using jq, how can arbitrary JSON encoding an array of shallow objects be converted to CSV?
There are plenty of Q&As on this site that cover specific data models which hard-code the fields, but answers to this question should work given any JSON, with the only restriction that it's an array of objects with scalar properties (no deep/complex/sub-objects, as flattening these is another question). The result should contain a header row giving the field names. Preference will be given to answers that preserve the field order of the first object, but it's not a requirement. Results may enclose all cells with double-quotes, or only enclose those that require quoting (e.g. 'a,b').
Examples
Input:
[
{"code": "NSW", "name": "New South Wales", "level":"state", "country": "AU"},
{"code": "AB", "name": "Alberta", "level":"province", "country": "CA"},
{"code": "ABD", "name": "Aberdeenshire", "level":"council area", "country": "GB"},
{"code": "AK", "name": "Alaska", "level":"state", "country": "US"}
]
Possible output:
code,name,level,country
NSW,New South Wales,state,AU
AB,Alberta,province,CA
ABD,Aberdeenshire,council area,GB
AK,Alaska,state,US
Possible output:
"code","name","level","country"
"NSW","New South Wales","state","AU"
"AB","Alberta","province","CA"
"ABD","Aberdeenshire","council area","GB"
"AK","Alaska","state","US"
Input:
[
{"name": "bang", "value": "!", "level": 0},
{"name": "letters", "value": "a,b,c", "level": 0},
{"name": "letters", "value": "x,y,z", "level": 1},
{"name": "bang", "value": "\"!\"", "level": 1}
]
Possible output:
name,value,level
bang,!,0
letters,"a,b,c",0
letters,"x,y,z",1
bang,"""!""",0
Possible output:
"name","value","level"
"bang","!","0"
"letters","a,b,c","0"
"letters","x,y,z","1"
"bang","""!""","1"
First, obtain an array containing all the different object property names in your object array input. Those will be the columns of your CSV:
(map(keys) | add | unique) as $cols
Then, for each object in the object array input, map the column names you obtained to the corresponding properties in the object. Those will be the rows of your CSV.
map(. as $row | $cols | map($row[.])) as $rows
Finally, put the column names before the rows, as a header for the CSV, and pass the resulting row stream to the #csv filter.
$cols, $rows[] | #csv
All together now. Remember to use the -r flag to get the result as a raw string:
jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | #csv'
The Skinny
jq -r '(.[0] | keys_unsorted) as $keys | $keys, map([.[ $keys[] ]])[] | #csv'
or:
jq -r '(.[0] | keys_unsorted) as $keys | ([$keys] + map([.[ $keys[] ]])) [] | #csv'
The Details
Aside
Describing the details is tricky because jq is stream-oriented, meaning it operates on a sequence of JSON data, rather than a single value. The input JSON stream gets converted to some internal type which is passed through the filters, then encoded in an output stream at program's end. The internal type isn't modeled by JSON, and doesn't exist as a named type. It's most easily demonstrated by examining the output of a bare index (.[]) or the comma operator (examining it directly could be done with a debugger, but that would be in terms of jq's internal data types, rather than the conceptual data types behind JSON).
$ jq -c '.[]' <<<'["a", "b"]'
"a"
"b"
$ jq -cn '"a", "b"'
"a"
"b"
Note that the output isn't an array (which would be ["a", "b"]). Compact output (the -c option) shows that each array element (or argument to the , filter) becomes a separate object in the output (each is on a separate line).
A stream is like a JSON-seq, but uses newlines rather than RS as an output separator when encoded. Consequently, this internal type is referred to by the generic term "sequence" in this answer, with "stream" being reserved for the encoded input and output.
Constructing the Filter
The first object's keys can be extracted with:
.[0] | keys_unsorted
Keys will generally be kept in their original order, but preserving the exact order isn't guaranteed. Consequently, they will need to be used to index the objects to get the values in the same order. This will also prevent values being in the wrong columns if some objects have a different key order.
To both output the keys as the first row and make them available for indexing, they're stored in a variable. The next stage of the pipeline then references this variable and uses the comma operator to prepend the header to the output stream.
(.[0] | keys_unsorted) as $keys | $keys, ...
The expression after the comma is a little involved. The index operator on an object can take a sequence of strings (e.g. "name", "value"), returning a sequence of property values for those strings. $keys is an array, not a sequence, so [] is applied to convert it to a sequence,
$keys[]
which can then be passed to .[]
.[ $keys[] ]
This, too, produces a sequence, so the array constructor is used to convert it to an array.
[.[ $keys[] ]]
This expression is to be applied to a single object. map() is used to apply it to all objects in the outer array:
map([.[ $keys[] ]])
Lastly for this stage, this is converted to a sequence so each item becomes a separate row in the output.
map([.[ $keys[] ]])[]
Why bundle the sequence into an array within the map only to unbundle it outside? map produces an array; .[ $keys[] ] produces a sequence. Applying map to the sequence from .[ $keys[] ] would produce an array of sequences of values, but since sequences aren't a JSON type, so you instead get a flattened array containing all the values.
["NSW","AU","state","New South Wales","AB","CA","province","Alberta","ABD","GB","council area","Aberdeenshire","AK","US","state","Alaska"]
The values from each object need to be kept separate, so that they become separate rows in the final output.
Finally, the sequence is passed through #csv formatter.
Alternate
The items can be separated late, rather than early. Instead of using the comma operator to get a sequence (passing a sequence as the right operand), the header sequence ($keys) can be wrapped in an array, and + used to append the array of values. This still needs to be converted to a sequence before being passed to #csv.
The following filter is slightly different in that it will ensure every value is converted to a string. (jq 1.5+)
# For an array of many objects
jq -f filter.jq [file]
# For many objects (not within array)
jq -s -f filter.jq [file]
Filter: filter.jq
def tocsv:
(map(keys)
|add
|unique
|sort
) as $cols
|map(. as $row
|$cols
|map($row[.]|tostring)
) as $rows
|$cols,$rows[]
| #csv;
tocsv
$cat test.json
[
{"code": "NSW", "name": "New South Wales", "level":"state", "country": "AU"},
{"code": "AB", "name": "Alberta", "level":"province", "country": "CA"},
{"code": "ABD", "name": "Aberdeenshire", "level":"council area", "country": "GB"},
{"code": "AK", "name": "Alaska", "level":"state", "country": "US"}
]
$ jq -r '["Code", "Name", "Level", "Country"], (.[] | [.code, .name, .level, .country]) | #tsv ' test.json
Code Name Level Country
NSW New South Wales state AU
AB Alberta province CA
ABD Aberdeenshire council area GB
AK Alaska state US
$ jq -r '["Code", "Name", "Level", "Country"], (.[] | [.code, .name, .level, .country]) | #csv ' test.json
"Code","Name","Level","Country"
"NSW","New South Wales","state","AU"
"AB","Alberta","province","CA"
"ABD","Aberdeenshire","council area","GB"
"AK","Alaska","state","US"
I created a function that outputs an array of objects or arrays to csv with headers. The columns would be in the order of the headers.
def to_csv($headers):
def _object_to_csv:
($headers | #csv),
(.[] | [.[$headers[]]] | #csv);
def _array_to_csv:
($headers | #csv),
(.[][:$headers|length] | #csv);
if .[0]|type == "object"
then _object_to_csv
else _array_to_csv
end;
So you could use it like so:
to_csv([ "code", "name", "level", "country" ])
This variant of Santiago's program is also safe but ensures that the key names in
the first object are used as the first column headers, in the same order as they
appear in that object:
def tocsv:
if length == 0 then empty
else
(.[0] | keys_unsorted) as $firstkeys
| (map(keys) | add | unique) as $allkeys
| ($firstkeys + ($allkeys - $firstkeys)) as $cols
| ($cols, (.[] as $row | $cols | map($row[.])))
| #csv
end ;
tocsv
If you're open to using other Unix tools, csvkit has an in2csv tool:
in2csv example.json
Using your sample data:
> in2csv example.json
code,name,level,country
NSW,New South Wales,state,AU
AB,Alberta,province,CA
ABD,Aberdeenshire,council area,GB
AK,Alaska,state,US
I like the pipe approach for piping directly from jq:
cat example.json | in2csv -f json -
A simple way is to just use string concatenation. If your input is a proper array:
# filename.txt
[
{"field1":"value1", "field2":"value2"},
{"field1":"value1", "field2":"value2"},
{"field1":"value1", "field2":"value2"}
]
then index with .[]:
cat filename.txt | jq -r '.[] | .field1 + ", " + .field2'
or if it's just line by line objects:
# filename.txt
{"field1":"value1", "field2":"value2"}
{"field1":"value1", "field2":"value2"}
{"field1":"value1", "field2":"value2"}
just do this:
cat filename.txt | jq -r '.field1 + ", " + .field2'