jq: Conditionally update/replace/add json elements using an input file - json

I receive the following input file:
input.json:
[
{"ID":"aaa_12301248","time_CET":"00:00:00","VALUE":10,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:15:00","VALUE":18,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:30:00","VALUE":160,"FLAG":"0"},
{"ID":"bbb_0021122","time_CET":"00:00:00","VALUE":null,"FLAG":"?"},
{"ID":"bbb_0021122","time_CET":"00:15:00","VALUE":null,"FLAG":"?"},
{"ID":"bbb_0021122","time_CET":"00:30:00","VALUE":22,"FLAG":"0"},
{"ID":"ccc_0021122","time_CET":"00:00:00","VALUE":null,"FLAG":"?"},
{"ID":"ccc_0021122","time_CET":"00:15:00","VALUE":null,"FLAG":"?"},
{"ID":"ccc_0021122","time_CET":"00:30:00","VALUE":20,"FLAG":"0"},
{"ID":"ddd_122455","time_CET":"00:00:00","VALUE":null,"FLAG":"?"},
{"ID":"ddd_122455","time_CET":"00:15:00","VALUE":null,"FLAG":"?"},
{"ID":"ddd_122455","time_CET":"00:30:00","VALUE":null,"FLAG":"?"},
]
As you can see there are some valid values (FLAG: 0) and some invalid values (FLAG: "?").
Now I got a file looking like this (one for each ID):
aaa.json:
[
{"ID":"aaa_12301248","time_CET":"00:00:00","VALUE":10,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:15:00","VALUE":null,"FLAG":"?"},
{"ID":"aaa_12301248","time_CET":"00:55:00","VALUE":45,"FLAG":"0"}
]
As you can see, object one is the same as in input.json but object two is invalid (FLAG: "?"). That's why object two has to be replaced by the correct object from input.json (with VALUE:18).
Objects can be identified by "time_CET" and "ID" element.
Additionally, there will be new objects in input.json, that have not been part of aaa.json etc. These objects should be added to the array, and valid objects from aaa.json should be kept.
In the end, aaa.json should look like this:
[
{"ID":"aaa_12301248","time_CET":"00:00:00","VALUE":10,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:15:00","VALUE":18,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:30:00","VALUE":160,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:55:00","VALUE":45,"FLAG":"0"}
]
So, to summarize:
look for FLAG: "?" in aaa.json
replace this object with matching object from input.json using "ID"
and "time_CET" for mapping.
Keep exisiting valid objects and add objects from input.json that
did not exist in aaa.json before (this means only objects starting
with "aaa" in "ID" field)
repeat this for bbb.json, ccc.json and ddd.json
I am not sure if it's possible to get this done all at once with a command like this, because the output has to go to back to the correct id files (aaa, bbb ccc.json):
jq --argfile aaa aaa.json --argfile bbb bbb.json .... -f prog.jq input.json
The problem is, that the number after the identifier (aaa, bbb, ccc etc.) may change. So to make sure objects are added to the correct file/array, a statement like this would be required:
if (."ID"|contains("aaa")) then ....
Or is it better to run the program several times with different input parameters? I am not sure..
Thank you in advance!!

Here is one approach
#!/bin/bash
# usage: update.sh input.json aaa.json bbb.json....
# updates each of aaa.json bbb.json....
input_json="$1"
shift
for i in "$#"; do
jq -M --argfile input_json "$input_json" '
# functions to restrict input.json to keys of current xxx.json file
def prefix: input_filename | split(".")[0];
def selectprefix: select(.ID | startswith(prefix));
# functions to build and probe a lookup table
def pk: [.ID, .time_CET];
def lookup($t;$k): $t | getpath($k);
def lookup($t): lookup($t;pk);
def organize(s): reduce s as $r ({}; setpath($r|pk; $r));
# functions to identify objects in input.json missing from xxx.json
def pks: paths | select(length==2);
def missing($t1;$t2): [$t1|pks] - [$t2|pks] | .[];
def getmissing($t1;$t2): [ missing($t1;$t2) as $p | lookup($t1;$p)];
# main routine
organize(.[]) as $xxx
| organize($input_json[] | selectprefix) as $inp
| map(if .FLAG != "?" then . else . += lookup($inp) end)
| . + getmissing($inp;$xxx)
' "$i" | sponge "$i"
done
The script uses jq in a loop to read and update each aaa.json... file.
The filter creates temporary objects to facilitate looking up values by [ID,time_CET], updates any values in the aaa.json with a FLAG=="?" and finally adds any values from input.json that are missing in aaa.json.
The temporary lookup table for input.json uses input_filename so that only keys starting with a prefix matching the name of the currently processed file will be included.
Sample Run:
$ ./update.sh input.json aaa.json
aaa.json after run:
[
{
"ID": "aaa_12301248",
"time_CET": "00:00:00",
"VALUE": 10,
"FLAG": "0"
},
{
"ID": "aaa_12301248",
"time_CET": "00:15:00",
"VALUE": 18,
"FLAG": "0"
},
{
"ID": "aaa_12301248",
"time_CET": "00:55:00",
"VALUE": 45,
"FLAG": "0"
},
{
"ID": "aaa_12301248",
"time_CET": "00:30:00",
"VALUE": 160,
"FLAG": "0"
}
]

Related

Filter in jq based on an array of string

This is close to this question: jq filter JSON array based on value in list , the only difference being the condition. I don't want equality as a condition, but a "contains", but I am struggling to do it correctly.
The problem:
If want to filter elements of an input file based on the presence of a specific value in an entry
The input file:
[{
"id":"type 1 is great"
},{
"id":"type 2"
},
{
"id":"this is another type 2"
},
{
"id":"type 4"
}]
The filter file:
ids.txt:
type 2
type 1
The select.jq file:
def isin($a): . as $in | any($a[]; contains($in));
map( select( .id | contains($ids) ) )
The jq command:
jq --argjson ids "$(jq -R . ids.txt | jq -s .)" -f select.jq test.json
The expected result should be something like:
[{
"id":"type 1 is great"
},{
"id":"type 2"
},
{
"id":"this is another type 2"
}]
The problem is obviously in the "isin" function of the select.jq file, so how to write it correctly to check if an entry contains one of the string of another array?
Here is a solution that illustrates that there is no need to invoke jq more than once. In accordance with the jq program shown in the question, it also assumes that the selection is to be based on the values of the "id" key.
< ids.txt jq -nR --argfile json input.json '
# Is the input an acceptable value?
def acceptable($array): any(contains($array[]); .);
[inputs] as $substrings
| $json
| map( select(.id | acceptable($substrings)) )
'
Note on --argfile
--argfile has been deprecated, so you may wish to use --slurpfile instead, in which case you'd have to write $json[0].

JQ Recursive Tree expansion

I am attempting to parse a JSON structure to extract a dependency path, for use in an automation script.
The structure of this JSON is extracted to a format like this:
[
{
"Id": "abc",
"Dependencies": [
]
},
{
"Id": "def",
"Dependencies": [
"abc"
]
},
{
"Id": "ghi",
"Dependencies": [
"def"
]
}
]
Note: Lots of other irrelevant fields removed.
The plan is to be able to pass into my JQ command the Id of one of these and get back out a list.
Eg:
Input: abc
Expected Output: []
Input: def
Expected Output: ["abc"]
Input: ghi
Expected Output: ["abc", "def"]
Currently have a jq script like this (https://jqplay.org/s/NAhuXNYXXO):
jq
'. as $original | .[] |
select(.Id == "INPUTVARIABLE") |
[.Dependencies[]] as $level1Dep | [$original[] | select( [ .Id == $level1Dep[] ] | any )] as $level1Full | $level1Full[] |
[.Dependencies[]] as $level2Dep | [$original[] | select ( [ .Id == $level2Dep[] ] | any )] as $level2Full |
[$level1Dep[], $level2Dep[]]'
Input: abc
Output: empty
Input: def
Output: ["abc"]
Input: ghi
Output: ["def","abc"]
Great! However, as you can see this is not particularly scale-able and will only handle two dependency levels (https://jqplay.org/s/Zs0xIvJ2Zn), and also falls apart horribly when there are multiple dependencies on an item (https://jqplay.org/s/eB9zHQSH2r).
Is there a way of constructing this within JQ or do I need to move out to a different language?
I know that the data cannot have circular dependencies, it is pulled from a database that enforces this.
It's trivial then. Reduce your input JSON down to an object where each Id and corresponding Dependencies array are paired, and walk through it aggregating dependencies using a recursive function.
def deps($depdb; $id):
def _deps($id): $depdb[$id] // empty
| . + map(_deps(.)[]);
_deps($id);
deps(map({(.Id): .Dependencies}) | add; $fid)
Invocation:
jq -c --arg fid 'ghi' -f prog.jq file
Online demo - arbitrary dependency levels
Online demo - multiple dependencies per Id
Here's a short program that handles circular dependencies efficiently and illustrates how a subfunction can be defined after the creation of a local variable (here, $next) for efficiency:
def dependents($x):
(map( {(.Id): .Dependencies}) | add) as $next
# Input: array of dependents computed so far
# Output: array of all dependents
| def tc($x):
($next[$x] - .) as $new
| if $new == [] then .
else (. + $new | unique)
# avoid calling unique again:
| . + ([tc($new[])[]] - .)
end ;
[] | tc($x);
dependents($start)
Usage
With the given input and an invocation such as
jq --arg start START -f program.jq input.json
the output for various values of START is:
START output
abc []
def ["abc"]
ghi ["def", "abc"]
If the output must be sorted, then simply add a call to sort.

How to use jq to find all paths to a certain key

In a very large nested json structure I'm trying to find all of the paths that end in a key.
ex:
{
"A": {
"A1": {
"foo": {
"_": "_"
}
},
"A2": {
"_": "_"
}
},
"B": {
"B1": {}
},
"foo": {
"_": "_"
}
}
would print something along the lines of:
["A","A1","foo"], ["foo"]
Unfortunately I don't know at what level of nesting the keys will appear, so I haven't been able to figure it out with a simple select. I've gotten close with jq '[paths] | .[] | select(contains(["foo"]))', but the output contains all the permutations of any tree that contains foo.
output: ["A", "A1", "foo"]["A", "A1", "foo", "_"]["foo"][ "foo", "_"]
Bonus points if I could keep the original data structure format but simply filter out all paths that don't contain the key (in this case the sub trees under "foo" wouldn't need to be hidden).
With your input:
$ jq -c 'paths | select(.[-1] == "foo")'
["A","A1","foo"]
["foo"]
Bonus points:
(1) If your jq has tostream:
$ jq 'fromstream(tostream| select(.[0]|index("foo")))'
Or better yet, since your input is large, you can use the streaming parser (jq -n --stream) with this filter:
fromstream( inputs|select( (.[0]|index("foo"))))
(2) Whether or not your jq has tostream:
. as $in
| reduce (paths(scalars) | select(index("foo"))) as $p
(null; setpath($p; $in|getpath($p)))
In all three cases, the output is:
{
"A": {
"A1": {
"foo": {
"_": "_"
}
}
},
"foo": {
"_": "_"
}
}
I had the same fundamental problem.
With (yaml) input like:
developer:
android:
members:
- alice
- bob
oncall:
- bob
hr:
members:
- charlie
- doug
this:
is:
really:
deep:
nesting:
members:
- example deep nesting
I wanted to find all arbitrarily nested groups and get their members.
Using this:
yq . | # convert yaml to json using python-yq
jq '
. as $input | # Save the input for later
. | paths | # Get the list of paths
select(.[-1] | tostring | test("^(members|oncall|priv)$"; "ix")) | # Only find paths which end with members, oncall, and priv
. as $path | # save each path in the $path variable
( $input | getpath($path) ) as $members | # Get the value of each path from the original input
{
"key": ( $path | join("-") ), # The key is the join of all path keys
"value": $members # The value is the list of members
}
' |
jq -s 'from_entries' | # collect kv pairs into a full object using slurp
yq --sort-keys -y . # Convert back to yaml using python-yq
I get output like this:
developer-android-members:
- alice
- bob
developer-android-oncall:
- bob
hr-members:
- charlie
- doug
this-is-really-deep-nesting-members:
- example deep nesting

Create JSON using jq from pipe-separated keys and values in bash

I am trying to create a json object from a string in bash. The string is as follows.
CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0
The output is from docker stats command and my end goal is to publish custom metrics to aws cloudwatch. I would like to format this string as json.
{
"CONTAINER":"nginx_container",
"CPU%":"0.02%",
....
}
I have used jq command before and it seems like it should work well in this case but I have not been able to come up with a good solution yet. Other than hardcoding variable names and indexing using sed or awk. Then creating a json from scratch. Any suggestions would be appreciated. Thanks.
Prerequisite
For all of the below, it's assumed that your content is in a shell variable named s:
s='CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0'
What (modern jq)
# thanks to #JeffMercado and #chepner for refinements, see comments
jq -Rn '
( input | split("|") ) as $keys |
( inputs | split("|") ) as $vals |
[[$keys, $vals] | transpose[] | {key:.[0],value:.[1]}] | from_entries
' <<<"$s"
How (modern jq)
This requires very new (probably 1.5?) jq to work, and is a dense chunk of code. To break it down:
Using -n prevents jq from reading stdin on its own, leaving the entirety of the input stream available to be read by input and inputs -- the former to read a single line, and the latter to read all remaining lines. (-R, for raw input, causes textual lines rather than JSON objects to be read).
With [$keys, $vals] | transpose[], we're generating [key, value] pairs (in Python terms, zipping the two lists).
With {key:.[0],value:.[1]}, we're making each [key, value] pair into an object of the form {"key": key, "value": value}
With from_entries, we're combining those pairs into objects containing those keys and values.
What (shell-assisted)
This will work with a significantly older jq than the above, and is an easily adopted approach for scenarios where a native-jq solution can be harder to wrangle:
{
IFS='|' read -r -a keys # read first line into an array of strings
## read each subsequent line into an array named "values"
while IFS='|' read -r -a values; do
# setup: positional arguments to pass in literal variables, query with code
jq_args=( )
jq_query='.'
# copy values into the arguments, reference them from the generated code
for idx in "${!values[#]}"; do
[[ ${keys[$idx]} ]] || continue # skip values with no corresponding key
jq_args+=( --arg "key$idx" "${keys[$idx]}" )
jq_args+=( --arg "value$idx" "${values[$idx]}" )
jq_query+=" | .[\$key${idx}]=\$value${idx}"
done
# run the generated command
jq "${jq_args[#]}" "$jq_query" <<<'{}'
done
} <<<"$s"
How (shell-assisted)
The invoked jq command from the above is similar to:
jq --arg key0 'CONTAINER' \
--arg value0 'nginx_container' \
--arg key1 'CPU%' \
--arg value1 '0.0.2%' \
--arg key2 'MEMUSAGE/LIMIT' \
--arg value2 '25.09MiB/15.26GiB' \
'. | .[$key0]=$value0 | .[$key1]=$value1 | .[$key2]=$value2' \
<<<'{}'
...passing each key and value out-of-band (such that it's treated as a literal string rather than parsed as JSON), then referring to them individually.
Result
Either of the above will emit:
{
"CONTAINER": "nginx_container",
"CPU%": "0.02%",
"MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
"MEM%": "0.16%",
"NETI/O": "0B/0B",
"BLOCKI/O": "22.09MB/4.096kB",
"PIDS": "0"
}
Why
In short: Because it's guaranteed to generate valid JSON as output.
Consider the following as an example that would break more naive approaches:
s='key ending in a backslash\
value "with quotes"'
Sure, these are unexpected scenarios, but jq knows how to deal with them:
{
"key ending in a backslash\\": "value \"with quotes\""
}
...whereas an implementation that didn't understand JSON strings could easily end up emitting:
{
"key ending in a backslash\": "value "with quotes""
}
I know this is an old post, but the tool you seek is called jo: https://github.com/jpmens/jo
A quick and easy example:
$ jo my_variable="simple"
{"my_variable":"simple"}
A little more complex
$ jo -p name=jo n=17 parser=false
{
"name": "jo",
"n": 17,
"parser": false
}
Add an array
$ jo -p name=jo n=17 parser=false my_array=$(jo -a {1..5})
{
"name": "jo",
"n": 17,
"parser": false,
"my_array": [
1,
2,
3,
4,
5
]
}
I've made some pretty complex stuff with jo and the nice thing is that you don't have to worry about rolling your own solution worrying about the possiblity of making invalid json.
You can ask docker to give you JSON data in the first place
docker stats --format "{{json .}}"
For more on this, see: https://docs.docker.com/config/formatting/
JSONSTR=""
declare -a JSONNAMES=()
declare -A JSONARRAY=()
LOOPNUM=0
cat ~/newfile | while IFS=: read CONTAINER CPU MEMUSE MEMPC NETIO BLKIO PIDS; do
if [[ "$LOOPNUM" = 0 ]]; then
JSONNAMES=("$CONTAINER" "$CPU" "$MEMUSE" "$MEMPC" "$NETIO" "$BLKIO" "$PIDS")
LOOPNUM=$(( LOOPNUM+1 ))
else
echo "{ \"${JSONNAMES[0]}\": \"${CONTAINER}\", \"${JSONNAMES[1]}\": \"${CPU}\", \"${JSONNAMES[2]}\": \"${MEMUSE}\", \"${JSONNAMES[3]}\": \"${MEMPC}\", \"${JSONNAMES[4]}\": \"${NETIO}\", \"${JSONNAMES[5]}\": \"${BLKIO}\", \"${JSONNAMES[6]}\": \"${PIDS}\" }"
fi
done
Returns:
{ "CONTAINER": "nginx_container", "CPU%": "0.02%", "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB", "MEM%": "0.16%", "NETI/O": "0B/0B", "BLOCKI/O": "22.09MB/4.096kB", "PIDS": "0" }
Here is a solution which uses the -R and -s options along with transpose:
split("\n") # [ "CONTAINER...", "nginx_container|0.02%...", ...]
| (.[0] | split("|")) as $keys # [ "CONTAINER", "CPU%", "MEMUSAGE/LIMIT", ... ]
| (.[1:][] | split("|")) # [ "nginx_container", "0.02%", ... ] [ ... ] ...
| select(length > 0) # (remove empty [] caused by trailing newline)
| [$keys, .] # [ ["CONTAINER", ...], ["nginx_container", ...] ] ...
| [ transpose[] | {(.[0]):.[1]} ] # [ {"CONTAINER": "nginx_container"}, ... ] ...
| add # {"CONTAINER": "nginx_container", "CPU%": "0.02%" ...
json_template='{"CONTAINER":"%s","CPU%":"%s","MEMUSAGE/LIMIT":"%s", "MEM%":"%s","NETI/O":"%s","BLOCKI/O":"%s","PIDS":"%s"}'
json_string=$(printf "$json_template" "nginx_container" "0.02%" "25.09MiB/15.26GiB" "0.16%" "0B/0B" "22.09MB/4.096kB" "0")
echo "$json_string"
Not using jq but possible to use args and environment in values.
CONTAINER=nginx_container
json_template='{"CONTAINER":"%s","CPU%":"%s","MEMUSAGE/LIMIT":"%s", "MEM%":"%s","NETI/O":"%s","BLOCKI/O":"%s","PIDS":"%s"}'
json_string=$(printf "$json_template" "$CONTAINER" "$1" "25.09MiB/15.26GiB" "0.16%" "0B/0B" "22.09MB/4.096kB" "0")
echo "$json_string"
If you're starting with tabular data, I think it makes more sense to use something that works with tabular data natively, like sqawk to make it into json, and then use jq work with it further.
echo 'CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0' \
| sqawk -FS '[|]' -RS '\n' -output json 'select * from a' header=1 \
| jq '.[] | with_entries(select(.key|test("^a.*")|not))'
{
"CONTAINER": "nginx_container",
"CPU%": "0.02%",
"MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
"MEM%": "0.16%",
"NETI/O": "0B/0B",
"BLOCKI/O": "22.09MB/4.096kB",
"PIDS": "0"
}
Without jq, sqawk gives a bit too much:
[
{
"anr": "1",
"anf": "7",
"a0": "nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0",
"CONTAINER": "nginx_container",
"CPU%": "0.02%",
"MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
"MEM%": "0.16%",
"NETI/O": "0B/0B",
"BLOCKI/O": "22.09MB/4.096kB",
"PIDS": "0",
"a8": "",
"a9": "",
"a10": ""
}
]

How to convert arbitrary simple JSON to CSV using jq?

Using jq, how can arbitrary JSON encoding an array of shallow objects be converted to CSV?
There are plenty of Q&As on this site that cover specific data models which hard-code the fields, but answers to this question should work given any JSON, with the only restriction that it's an array of objects with scalar properties (no deep/complex/sub-objects, as flattening these is another question). The result should contain a header row giving the field names. Preference will be given to answers that preserve the field order of the first object, but it's not a requirement. Results may enclose all cells with double-quotes, or only enclose those that require quoting (e.g. 'a,b').
Examples
Input:
[
{"code": "NSW", "name": "New South Wales", "level":"state", "country": "AU"},
{"code": "AB", "name": "Alberta", "level":"province", "country": "CA"},
{"code": "ABD", "name": "Aberdeenshire", "level":"council area", "country": "GB"},
{"code": "AK", "name": "Alaska", "level":"state", "country": "US"}
]
Possible output:
code,name,level,country
NSW,New South Wales,state,AU
AB,Alberta,province,CA
ABD,Aberdeenshire,council area,GB
AK,Alaska,state,US
Possible output:
"code","name","level","country"
"NSW","New South Wales","state","AU"
"AB","Alberta","province","CA"
"ABD","Aberdeenshire","council area","GB"
"AK","Alaska","state","US"
Input:
[
{"name": "bang", "value": "!", "level": 0},
{"name": "letters", "value": "a,b,c", "level": 0},
{"name": "letters", "value": "x,y,z", "level": 1},
{"name": "bang", "value": "\"!\"", "level": 1}
]
Possible output:
name,value,level
bang,!,0
letters,"a,b,c",0
letters,"x,y,z",1
bang,"""!""",0
Possible output:
"name","value","level"
"bang","!","0"
"letters","a,b,c","0"
"letters","x,y,z","1"
"bang","""!""","1"
First, obtain an array containing all the different object property names in your object array input. Those will be the columns of your CSV:
(map(keys) | add | unique) as $cols
Then, for each object in the object array input, map the column names you obtained to the corresponding properties in the object. Those will be the rows of your CSV.
map(. as $row | $cols | map($row[.])) as $rows
Finally, put the column names before the rows, as a header for the CSV, and pass the resulting row stream to the #csv filter.
$cols, $rows[] | #csv
All together now. Remember to use the -r flag to get the result as a raw string:
jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | #csv'
The Skinny
jq -r '(.[0] | keys_unsorted) as $keys | $keys, map([.[ $keys[] ]])[] | #csv'
or:
jq -r '(.[0] | keys_unsorted) as $keys | ([$keys] + map([.[ $keys[] ]])) [] | #csv'
The Details
Aside
Describing the details is tricky because jq is stream-oriented, meaning it operates on a sequence of JSON data, rather than a single value. The input JSON stream gets converted to some internal type which is passed through the filters, then encoded in an output stream at program's end. The internal type isn't modeled by JSON, and doesn't exist as a named type. It's most easily demonstrated by examining the output of a bare index (.[]) or the comma operator (examining it directly could be done with a debugger, but that would be in terms of jq's internal data types, rather than the conceptual data types behind JSON).
$ jq -c '.[]' <<<'["a", "b"]'
"a"
"b"
$ jq -cn '"a", "b"'
"a"
"b"
Note that the output isn't an array (which would be ["a", "b"]). Compact output (the -c option) shows that each array element (or argument to the , filter) becomes a separate object in the output (each is on a separate line).
A stream is like a JSON-seq, but uses newlines rather than RS as an output separator when encoded. Consequently, this internal type is referred to by the generic term "sequence" in this answer, with "stream" being reserved for the encoded input and output.
Constructing the Filter
The first object's keys can be extracted with:
.[0] | keys_unsorted
Keys will generally be kept in their original order, but preserving the exact order isn't guaranteed. Consequently, they will need to be used to index the objects to get the values in the same order. This will also prevent values being in the wrong columns if some objects have a different key order.
To both output the keys as the first row and make them available for indexing, they're stored in a variable. The next stage of the pipeline then references this variable and uses the comma operator to prepend the header to the output stream.
(.[0] | keys_unsorted) as $keys | $keys, ...
The expression after the comma is a little involved. The index operator on an object can take a sequence of strings (e.g. "name", "value"), returning a sequence of property values for those strings. $keys is an array, not a sequence, so [] is applied to convert it to a sequence,
$keys[]
which can then be passed to .[]
.[ $keys[] ]
This, too, produces a sequence, so the array constructor is used to convert it to an array.
[.[ $keys[] ]]
This expression is to be applied to a single object. map() is used to apply it to all objects in the outer array:
map([.[ $keys[] ]])
Lastly for this stage, this is converted to a sequence so each item becomes a separate row in the output.
map([.[ $keys[] ]])[]
Why bundle the sequence into an array within the map only to unbundle it outside? map produces an array; .[ $keys[] ] produces a sequence. Applying map to the sequence from .[ $keys[] ] would produce an array of sequences of values, but since sequences aren't a JSON type, so you instead get a flattened array containing all the values.
["NSW","AU","state","New South Wales","AB","CA","province","Alberta","ABD","GB","council area","Aberdeenshire","AK","US","state","Alaska"]
The values from each object need to be kept separate, so that they become separate rows in the final output.
Finally, the sequence is passed through #csv formatter.
Alternate
The items can be separated late, rather than early. Instead of using the comma operator to get a sequence (passing a sequence as the right operand), the header sequence ($keys) can be wrapped in an array, and + used to append the array of values. This still needs to be converted to a sequence before being passed to #csv.
The following filter is slightly different in that it will ensure every value is converted to a string. (jq 1.5+)
# For an array of many objects
jq -f filter.jq [file]
# For many objects (not within array)
jq -s -f filter.jq [file]
Filter: filter.jq
def tocsv:
(map(keys)
|add
|unique
|sort
) as $cols
|map(. as $row
|$cols
|map($row[.]|tostring)
) as $rows
|$cols,$rows[]
| #csv;
tocsv
$cat test.json
[
{"code": "NSW", "name": "New South Wales", "level":"state", "country": "AU"},
{"code": "AB", "name": "Alberta", "level":"province", "country": "CA"},
{"code": "ABD", "name": "Aberdeenshire", "level":"council area", "country": "GB"},
{"code": "AK", "name": "Alaska", "level":"state", "country": "US"}
]
$ jq -r '["Code", "Name", "Level", "Country"], (.[] | [.code, .name, .level, .country]) | #tsv ' test.json
Code Name Level Country
NSW New South Wales state AU
AB Alberta province CA
ABD Aberdeenshire council area GB
AK Alaska state US
$ jq -r '["Code", "Name", "Level", "Country"], (.[] | [.code, .name, .level, .country]) | #csv ' test.json
"Code","Name","Level","Country"
"NSW","New South Wales","state","AU"
"AB","Alberta","province","CA"
"ABD","Aberdeenshire","council area","GB"
"AK","Alaska","state","US"
I created a function that outputs an array of objects or arrays to csv with headers. The columns would be in the order of the headers.
def to_csv($headers):
def _object_to_csv:
($headers | #csv),
(.[] | [.[$headers[]]] | #csv);
def _array_to_csv:
($headers | #csv),
(.[][:$headers|length] | #csv);
if .[0]|type == "object"
then _object_to_csv
else _array_to_csv
end;
So you could use it like so:
to_csv([ "code", "name", "level", "country" ])
This variant of Santiago's program is also safe but ensures that the key names in
the first object are used as the first column headers, in the same order as they
appear in that object:
def tocsv:
if length == 0 then empty
else
(.[0] | keys_unsorted) as $firstkeys
| (map(keys) | add | unique) as $allkeys
| ($firstkeys + ($allkeys - $firstkeys)) as $cols
| ($cols, (.[] as $row | $cols | map($row[.])))
| #csv
end ;
tocsv
If you're open to using other Unix tools, csvkit has an in2csv tool:
in2csv example.json
Using your sample data:
> in2csv example.json
code,name,level,country
NSW,New South Wales,state,AU
AB,Alberta,province,CA
ABD,Aberdeenshire,council area,GB
AK,Alaska,state,US
I like the pipe approach for piping directly from jq:
cat example.json | in2csv -f json -
A simple way is to just use string concatenation. If your input is a proper array:
# filename.txt
[
{"field1":"value1", "field2":"value2"},
{"field1":"value1", "field2":"value2"},
{"field1":"value1", "field2":"value2"}
]
then index with .[]:
cat filename.txt | jq -r '.[] | .field1 + ", " + .field2'
or if it's just line by line objects:
# filename.txt
{"field1":"value1", "field2":"value2"}
{"field1":"value1", "field2":"value2"}
{"field1":"value1", "field2":"value2"}
just do this:
cat filename.txt | jq -r '.field1 + ", " + .field2'