How can I completely sort arbitrary JSON using jq? - json

I want to diff two JSON text files. Unfortunately they're constructed in arbitrary order, so I get diffs when they're semantically identical. I'd like to use jq (or whatever) to sort them in any kind of full order, to eliminate differences due only to element ordering.
--sort-keys solves half the problem, but it doesn't sort arrays.
I'm pretty ignorant of jq and don't know how to write a jq recursive filter that preserves all data; any help would be appreciated.
I realize that line-by-line 'diff' output isn't necessarily the best way to compare two complex objects, but in this case I know the two files are very similar (nearly identical) and line-by-line diffs are fine for my purposes.
Using jq or alternative command line tools to diff JSON files answers a very similar question, but doesn't print the differences. Also, I want to save the sorted results, so what I really want is just a filter program to sort JSON.

Here is a solution using a generic function sorted_walk/1 (so named for the reason described in the postscript below).
normalize.jq:
# Apply f to composite entities recursively using keys[], and to atoms
def sorted_walk(f):
  . as $in
  | if type == "object" then
      reduce keys[] as $key
        ( {}; . + { ($key): ($in[$key] | sorted_walk(f)) } ) | f
    elif type == "array" then map( sorted_walk(f) ) | f
    else f
    end;

def normalize: sorted_walk(if type == "array" then sort else . end);

normalize
Example using bash:
diff <(jq -S -f normalize.jq FILE1) <(jq -S -f normalize.jq FILE2)
POSTSCRIPT: The builtin definition of walk/1 was revised after this response was first posted: it now uses keys_unsorted rather than keys.
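With a jq whose built-in walk/1 includes that revision, a shorter near-equivalent of normalize is the sketch below (the -S flag still takes care of key order on output):
def normalize: walk(if type == "array" then sort else . end);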

I want to diff two JSON text files.
Use jd with the -set option:
No output means no difference.
$ jd -set A.json B.json
Differences are shown as an @ path and + or -.
$ jd -set A.json C.json
@ ["People",{}]
+ "Carla"
The output diffs can also be used as patch files with the -p option.
$ jd -set -o patch A.json C.json; jd -set -p patch B.json
{"City":"Boston","People":["John","Carla","Bryan"],"State":"MA"}
https://github.com/josephburnett/jd#command-line-usage
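For context, minimal file contents that would reproduce the output above might look like this (these contents are assumed, not part of the original answer):
A.json: {"City":"Boston","State":"MA","People":["John","Bryan"]}
B.json: {"State":"MA","People":["Bryan","John"],"City":"Boston"}
C.json: {"City":"Boston","State":"MA","People":["John","Carla","Bryan"]}
A and B differ only in ordering, so jd -set reports no difference; C adds "Carla" to People, which is what the diff above shows.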

I'm surprised this isn't a more popular question/answer. I haven't seen any other json deep sort solutions. Maybe everyone likes solving the same problem over and over.
Here's a wrapper for @peak's excellent solution above, packaged as a shell script that works in a pipe or with file args.
#!/usr/bin/env bash
#
# json normalizer
#
# Recursively sort an entire json file, keys and arrays.
# jq --sort-keys sorts object keys (at every level) but leaves arrays alone.
# Alphabetize a json file's dicts and sort its arrays so that they are always
# in the same order. Makes json diff'able and should be run on any json data
# that's in source control to prevent excessive diffs from dict reordering.

[ "${DEBUG}" ] && set -x

TMP_FILE="$(mktemp)"
trap 'rm -f -- "${TMP_FILE}"' EXIT

cat > "${TMP_FILE}" <<-EOT
# Apply f to composite entities recursively using keys[], and to atoms
def sorted_walk(f):
  . as \$in
  | if type == "object" then
      reduce keys[] as \$key
        ( {}; . + { (\$key): (\$in[\$key] | sorted_walk(f)) } ) | f
    elif type == "array" then map( sorted_walk(f) ) | f
    else f
    end;

def normalize: sorted_walk(if type == "array" then sort else . end);

normalize
EOT

# Don't pollute stdout with debug output
[ "${DEBUG}" ] && cat "${TMP_FILE}" > /dev/stderr

if [ "$1" ] ; then
  jq -S -f "${TMP_FILE}" "$1"
else
  jq -S -f "${TMP_FILE}" < /dev/stdin
fi
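Usage (the script name normalize_json.sh is hypothetical; save it under whatever name you like and make it executable):
./normalize_json.sh FILE1 > FILE1.normalized
cat FILE2 | ./normalize_json.sh > FILE2.normalized
diff FILE1.normalized FILE2.normalized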

Related

JQ Group Multiple Files

I have a set of JSON files that all contain JSON in the following format:
File 1:
{ "host" : "127.0.0.1", "port" : "80", "data": {}}
File 2:
{ "host" : "127.0.0.2", "port" : "502", "data": {}}
File 3:
{ "host" : "127.0.0.1", "port" : "443", "data": {}}
These files can be rather large, up to several gigabytes.
I want to use jq or some other command-line JSON processing tool that can merge these JSON files into one file with a grouped format like so:
[{ "host" : "127.0.0.1", "data": {"80": {}, "443" : {}}},
{ "host" : "127.0.0.2", "data": {"502": {}}}]
Is this possible with jq and if yes, how could I possibly do this? I have looked at the group_by function in jq, but it seems like I need to combine all files first and then group on this big file. However, since the files can be very large, it might make sense to stream the data and group them on the fly.
With really big files, I'd look into a primarily disk-based approach instead of trying to load everything into memory. The following script leverages sqlite's JSON1 extension to load the JSON files into a database and generate the grouped results:
#!/usr/bin/env bash

DB=json.db

# Delete existing database if any.
rm -f "$DB"

# Create table. Assuming each host,port pair is unique.
sqlite3 -batch "$DB" <<'EOF'
CREATE TABLE data(host TEXT, port INTEGER, data TEXT,
                  PRIMARY KEY (host, port)) WITHOUT ROWID;
EOF

# Insert the objects from the files into the database.
for file in file*.json; do
  sqlite3 -batch "$DB" <<EOF
INSERT INTO data(host, port, data)
  SELECT json_extract(j, '\$.host'), json_extract(j, '\$.port'), json_extract(j, '\$.data')
    FROM (SELECT json(readfile('$file')) AS j) AS json;
EOF
done

# And display the results of joining the objects. Could use
# json_group_array() instead of this sed hackery, but we're trying to
# avoid building a giant string with the entire results. It might still
# run into sqlite maximum string length limits...
sqlite3 -batch -noheader -list "$DB" <<'EOF' | sed '1s/^/[/; $s/,$/]/'
SELECT json_object('host', host,
                   'data', json_group_object(port, json(data))) || ','
  FROM data
  GROUP BY host
  ORDER BY host;
EOF
Running this on your sample data prints out:
[{"host":"127.0.0.1","data":{"80":{},"443":{}}},
{"host":"127.0.0.2","data":{"502":{}}}]
If the goal is really to produce a single ginormous JSON entity, then presumably that entity is still small enough to have a chance of fitting into the memory of some computer, say C. So there is a good chance of jq being up to the job on C. At any rate, to utilize memory efficiently, you would:
use inputs while performing the grouping operation;
avoid the built-in group_by (since it requires an in-memory sort).
Here then is a two-step candidate using jq, which assumes grouping.jq contains the following program:
# emit a stream of arrays assuming that f is always string-valued
def GROUPS_BY(stream; f):
  reduce stream as $x ({}; ($x|f) as $s | .[$s] += [$x]) | .[];

GROUPS_BY(inputs | .data=.port | del(.port); .host)
| {host: .[0].host, data: map({(.data): {}}) | add}
If the JSON files can be captured by *.json, you could then consider:
jq -n -f grouping.jq *.json | jq -s .
One advantage of this approach is that if it fails, you could try using a temporary file to hold the output of the first step, and then processing it later, either by "slurping" it, or perhaps more sensibly distributing it amongst several files, one per .host.
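For reference, on the three sample files the first invocation emits one object per host (assuming the files are read in the order shown), and the trailing jq -s . simply wraps that stream into the final array:
{"host":"127.0.0.1","data":{"80":{},"443":{}}}
{"host":"127.0.0.2","data":{"502":{}}}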
Removing extraneous data
Obviously, if the input files contain extraneous data, you might want to remove it first, e.g. by running
for f in *.json ; do
  jq '{host,port}' "$f" | sponge "$f"
done
or by performing the projection in the jq program, e.g. using:
GROUPS_BY(inputs | {host, data: .port}; .host)
| {host: .[0].host, data: map( {(.data):{}} )}
Here's a script which uses jq to solve the problem without requiring more memory than is needed for the largest group. For simplicity:
it reads *.json and directs output to $OUT as defined at the top of the script.
it uses sponge
#!/usr/bin/env bash
# Requires: sponge

OUT=big.json
/bin/rm -i "$OUT"
if [ -s "$OUT" ] ; then
  echo "$OUT already exists"
  exit 1
fi

### Step 0: setup

TDIR=$(mktemp -d /tmp/grouping.XXXX)
function cleanup {
  if [ -d "$TDIR" ] ; then
    /bin/rm -r "$TDIR"
  fi
}
trap cleanup EXIT

### Step 1: find the groups

for f in *.json ; do
  host=$(jq -r '.host' "$f")
  echo "$f" >> "$TDIR/$host"
done

for f in "$TDIR"/* ; do
  echo "$f" ...
  jq -n 'reduce (inputs | {host, data: {(.port): {} }}) as $in (null;
           .host = $in.host | .data += $in.data)' $(cat "$f") | sponge "$f"
done

### Step 2: assembly

i=0
echo "[" > "$OUT"
find "$TDIR" -type f | while read -r f ; do
  i=$((i + 1))
  if [ $i -gt 1 ] ; then echo , >> "$OUT" ; fi
  cat "$f" >> "$OUT"
done
echo "]" >> "$OUT"
Discussion
Besides requiring enough memory to handle the largest group, the main deficiencies of the above implementation are:
it assumes that the .host string is suitable as a file name.
the resultant file is not strictly speaking pretty-printed.
These two issues could however be addressed quite easily with minor modifications to the script without requiring additional memory.
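For instance, a sketch of the first fix (not part of the original script) is to name each group file after a digest of .host rather than the raw string; pretty-printing could likewise be handled one group at a time (jq . "$f" | sponge "$f") before assembly, so neither change needs extra memory:
# in Step 1, use a digest of the host as the group file name
group=$(printf '%s' "$host" | md5sum | cut -d' ' -f1)
echo "$f" >> "$TDIR/$group"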

JSON to CSV: variable number of columns per row

I need to convert JSON to CSV where JSON has arrays of variable length, for example:
JSON objects:
{"labels": ["label1"]}
{"labels": ["label2", "label3"]}
{"labels": ["label1", "label4", "label5"]}
Resulting CSV:
labels,labels,labels
"label1",,
"label2","label3",
"label1","label4","label5"
There are many other properties in the source JSON; this is just an excerpt for the sake of simplicity.
Also, I need to say that the process has to work with JSON as a stream, because the source JSON could be very large (>1GB).
I wanted to use jq with two passes, the first pass would collect the maximum length of the 'labels' array, the second pass would create CSV as the number of the resulting columns is known by this time. But jq doesn't have a concept of global variables, so I don't know where I can store the running total.
I'd like to be able to do that on Windows via CLI.
Thank you in advance.
The question shows a stream of JSON objects, so the following solutions assume that the input file is already a sequence as shown. These solutions can also easily be adapted to cover the case where the input file contains a huge array of objects, e.g. as discussed in the epilog.
A two-invocation solution
Here's a two-pass solution using two invocations of jq. The presentation assumes a bash-like environment, in case you have WSL:
n=$(jq -n 'reduce (inputs|.labels|length) as $i (-1;
      if $i > . then $i else . end)' stream.json)

jq -nr --argjson n $n '
  def fill($n): . + [range(length;$n)|null];
  [range(0;$n)|"labels"],
  (inputs | .labels | fill($n))
  | @csv' stream.json
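For reference, on the three sample objects shown in the question, n evaluates to 3 and the second invocation prints the following (note that @csv also quotes the header strings):
"labels","labels","labels"
"label1",,
"label2","label3",
"label1","label4","label5"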
Assuming the input is as described, this is guaranteed to produce valid CSV. Hopefully you can adapt the above to your shell as necessary -- maybe this link will help:
Assign output of a program to a variable using a MS batch file
Using input_filename and a single invocation of jq
Unfortunately, jq does not have a "rewind" facility, but there is an alternative: read the file twice within a single invocation of jq. This is more cumbersome than the two-invocation solution above but avoids any difficulties associated with the latter.
cat sample.json | jq -nr '
  def fill($n): . + [range(length;$n)|null];
  def max($x): if . < $x then $x else . end;
  foreach (inputs|.labels) as $in ( {n:0};
    if input_filename == "<stdin>"
    then .n |= max($in|length)
    else .printed += 1
    end;
    if .printed == null then empty
    else .n as $n
    | (if .printed == 1 then [range(0;$n)|"labels"] else empty end),
      ($in | fill($n))
    end)
  | @csv' - sample.json
Another single-invocation solution
The following solution uses a special value (here null) to delineate the two streams:
(cat stream.json; echo null; cat stream.json) | jq -nr '
  def fill($n): . + [range(length; $n) | null];
  def max($x): if . < $x then $x else . end;
  (label $loop | foreach inputs as $in (0;
     if $in == null then . else max($in|.labels|length) end;
     if $in == null then ., break $loop else empty end)) as $n
  | [range(0;$n)|"labels"],
    (inputs | .labels | fill($n))
  | @csv '
Epilog
A file with a top-level JSON array that is too large to fit into memory can be converted into a stream of the array's items by invoking jq with the --stream option, e.g. as follows:
jq -cn --stream 'fromstream(1|truncate_stream(inputs))'
For such a large file, you will probably want to do this in two separate invocations: one to get the count, then another to actually output the CSV. If you wanted to read the whole file into memory, you could do this in one invocation, but we definitely don't want to do that; we'll want to stream it in where possible.
Things get a little ugly when it comes to storing the result of a command in a variable; writing to a file might be simpler. But I'd rather not use temp files if we don't have to.
REM assuming in a batch file
for /f "usebackq delims=" %%i in (`jq -n --stream "reduce (inputs | .[0][1] + 1) as $l (0; if $l > . then $l else . end)" input.json`) do set cols=%%i
jq -rn --stream --argjson cols "%cols%" "[range($cols)|\"labels\"],(fromstream(1|truncate_stream(inputs))|[.[],(range($cols-length)|null)])|@csv" input.json
> jq -n --stream "reduce (inputs | .[0][1] + 1) as $l (0; if $l > . then $l else . end)" input.json
For the first invocation to get the count of columns, we're just taking advantage of the fact that the paths to the array values could be used to indicate the lengths of the arrays. We'll just want to take the max across all items.
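To illustrate, the third sample object streams as the events below; taking .[0][1] + 1 over them and keeping the maximum yields 3 (the closing events contribute at most 1, since null + 1 is 1 in jq):
[["labels",0],"label1"]
[["labels",1],"label4"]
[["labels",2],"label5"]
[["labels",2]]
[["labels"]]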
> jq -rn --stream --argjson cols "%cols%" ^
"[range($cols)|\"labels\"],(fromstream(1|truncate_stream(inputs))|[.[],(range($cols-length)|null)])|@csv" input.json
Then to output the rest, we're just taking the labels array (assuming it's the only property on the objects) and padding them out with null up to the $cols count. Then output as csv.
If the labels are in a different, deeply nested path than what's in your example here, you'll need to select based on the appropriate paths.
set labelspath=foo.bar.labels
jq -rn --stream --argjson cols "%cols%" --arg labelspath "%labelspath%" ^
"($labelspath|split(\".\")|[.,length]) as [$path,$depth] | [range($cols)|\"labels\"],(fromstream($depth|truncate_stream(inputs|select(.[0][:$depth] == $path)))|[.[],(range($cols-length)|null)])|@csv" input.json

How to merge a remote json referenced by a url string into the current json

Given
[
  {"json1": "http://example.com/remote1.json"},
  {"json2": "http://example.com/remote2.json"}
]
with remote1.json and remote2.json containing [1] and [2] respectively
How to turn it into
[{"json1": [1], "json2": [2]}]
using jq? I think other CLI tools like bash and curl are needed. But I have no idea how to merge the responses back.
XPath/XQuery has network access functions, since the W3C loves URI-references. If you are open to other tools, you could try my XPath/XQuery/JSONiq interpreter:
xidel master.json -e '[$json()()!{.:json($json()(.))}]'
Syntax:
$json is the input data
json() is a function to retrieve JSON
() are array values or object keys
! maps a sequence of values, whereby . is a single value
First, our test framework:
curl() {
  case $1 in
    http://example.com/remote1.json) echo "[1]" ;;
    http://example.com/remote2.json) echo "[2]" ;;
    *) echo "IMABUG" ;;
  esac
}

input_json='[
  {"json1": "http://example.com/remote1.json"},
  {"json2": "http://example.com/remote2.json"}
]'
Then, our actual code:
# defines the "walk" function, which is not yet included in a released version of jq
# ...in the future, this will not be necessary.
walk_fn='
  def walk(f):
    . as $in
    | if type == "object" then
        reduce keys[] as $key
          ( {}; . + { ($key): ($in[$key] | walk(f)) } ) | f
      elif type == "array" then map( walk(f) ) | f
      else f
      end;
'

get_url_keys() {
  jq -r "$walk_fn
    walk(
      if type == \"object\" then
        to_entries
      else . end
    )
    | flatten
    | .[]
    | select(.value | test(\"://\"))
    | [.key, .value]
    | @tsv"
}

operations=( )
options=( )
i=0
while IFS=$'\t' read -r key url; do
  options+=( --arg "key$i" "$key" --argjson "value$i" "$(curl "$url")" )
  operations+=(
    " walk(
        if type == \"object\" then
          if .[\$key$i] then .[\$key$i]=\$value$i else . end
        else . end
      ) "
  )
  (( ++i ))
done < <(get_url_keys <<<"$input_json")

IFS='|' # separate operations with a | character
jq -c "${options[@]}" "${walk_fn} ${operations[*]}" <<<"$input_json"
The output is, correctly:
[{"json1":[1]},{"json2":[2]}]
Network access has been proposed for jq but rejected, because of some combination of security, complexity, portability, and bloatware concerns.
Shelling out has likewise been proposed but still seems some way off.
It would be quite easy to achieve what I understand to be the goal here, using jq and curl in conjunction with a scripting language such as bash. One way would be to serialize the JSON, and then "edit" the serialized JSON using curl, before deserializing it. For serialization/deserialization functions in jq, see e.g. How to Flatten JSON using jq and Bash into Bash Associative Array where Key=Selector?
If all strings that are valid URLs are to be replaced, then identifying them could in principle be done before or after serialization. If only a subset of such strings are to be dereferenced, then the choice might depend on the specific requirements.
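A minimal sketch of that serialize/edit/deserialize approach (the file names and the "value starts with http" heuristic are assumptions, and each fetched document is compacted to one line so the line-oriented format survives):
# 1. Serialize master.json into alternating (path, value) lines
jq -c 'leaf_paths as $p | ($p, getpath($p))' master.json > serialized.json

# 2. "Edit": replace URL-valued lines with the fetched JSON
: > edited.json
while read -r path && read -r value; do
  case $value in
    '"http'*) value=$(curl -s "$(jq -rn "$value")" | jq -c .) ;;   # strip quotes, fetch, compact
  esac
  printf '%s\n%s\n' "$path" "$value" >> edited.json
done < serialized.json

# 3. Deserialize back into a single JSON document
jq -n '
  def pairwise(s):
    foreach s as $i ([]; if length == 1 then . + [$i] else [$i] end; select(length == 2));
  reduce pairwise(inputs) as $p (null; setpath($p[0]; $p[1]))' edited.json

On the sample input this prints [{"json1":[1]},{"json2":[2]}].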

How to Flatten JSON using jq and Bash into Bash Associative Array where Key=Selector?

As a follow-up to Flatten Arbitrary JSON, I'm looking to take the flattened results and make them suitable for doing queries and updates back to the original JSON file.
Motivation: I'm writing Bash (4.2+) scripts (on CentOS 7) that read JSON into a Bash associative array using the JSON selector/filter as the key. I do processing on the associative arrays, and in the end I want to update the JSON with those changes.
The preceding solution gets me close to this goal. I think there are two things that it doesn't do:
It doesn't quote keys that require quoting. For example, the key com.acme would need to be quoted because it contains a special character.
Array indexes are not represented in a form that can be used to query the original JSON.
Existing Solution
The solution from the above is:
$ jq --stream -n --arg delim '.' 'reduce (inputs|select(length==2)) as $i ({};
    [$i[0][]|tostring] as $path_as_strings
    | ($path_as_strings|join($delim)) as $key
    | $i[1] as $value
    | .[$key] = $value
  )' input.json
For example, if input.json contains:
{
  "a.b":
    [
      "value"
    ]
}
then the output is:
{
  "a.b.0": "value"
}
What is Really Wanted
An improvement would have been:
{
  "\"a.b\"[0]": "value"
}
But what I really want is output formatted so that it could be sourced directly in a Bash program (implying the array name is passed to jq as an argument):
ArrayName['"a.b"[0]']='value' # Note 'value' might need escapes for Bash
I'm looking to have the more human-readable syntax above as opposed to the more general:
ArrayName['.["a.b"][0]']='value'
I don't know if jq can handle all of this. My present solution is to take the output from the preceding solution and post-process it into the form that I want. Here's the work in progress:
#!/bin/bash

Flatten()
{
  local -r OPTIONS=$(getopt -o d:m:f: -l "delimiter:,mapname:,file:" -n "${FUNCNAME[0]}" -- "$@")
  eval set -- "$OPTIONS"
  local Delimiter='.' MapName=map File=
  while true ; do
    case "$1" in
      -d|--delimiter) Delimiter="$2"; shift 2;;
      -m|--mapname)   MapName="$2";   shift 2;;
      -f|--file)      File="$2";      shift 2;;
      --) shift; break;;
    esac
  done

  local -a Array=()
  readarray -t Array <<<"$(
    jq -c -S --stream -n --arg delim "$Delimiter" 'reduce (inputs|select(length==2)) as $i ({}; .[[$i[0][]|tostring]|join($delim)] = $i[1])' <<<"$(sed 's|^\s*[#%].*||' "$File")" |
    jq -c "to_entries|map(\"\(.key)=\(.value|tostring)\")|.[]" |
    sed -e 's|^"||' -e 's|"$||' -e 's|=|\t|')"

  if [[ ! -v $MapName ]]; then
    local -gA $MapName
  fi

  . <(
    IFS=$'\t'
    while read -r Key Value; do
      printf "$MapName[\"%s\"]=%q\n" "$Key" "$Value"
    done <<<"$(printf "%s\n" "${Array[@]}")"
  )
}

declare -A Map
Flatten -m Map -f "$1"
declare -p Map
With the output:
$ ./Flatten.sh <(echo '{"a.b":["value"]}')
declare -A Map='([a.b.0]="value" )'
1) jq is Turing complete, so it's all just a question of which hammer to use.
2)
An improvement would have been:
{
  "\"a.b\"[0]": "value"
}
That is easily accomplished using a helper function along these lines:
def flattenPath(delim):
  reduce .[] as $s ("";
    if $s|type == "number"
    then ((if . == "" then "." else . end) + "[\($s)]")
    else . + ($s | tostring | if index(delim) then "\"\(.)\"" else . end)
    end );
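For example, here is a sketch that combines this helper with the streaming reduce from the "Existing Solution" above ($i[0] is the path array); on the {"a.b":["value"]} input it produces the {"\"a.b\"[0]": "value"} form shown earlier:
jq --stream -n --arg delim '.' '
  def flattenPath(delim):
    reduce .[] as $s ("";
      if $s|type == "number"
      then ((if . == "" then "." else . end) + "[\($s)]")
      else . + ($s | tostring | if index(delim) then "\"\(.)\"" else . end)
      end );
  reduce (inputs|select(length==2)) as $i ({};
    ($i[0] | flattenPath($delim)) as $key
    | .[$key] = $i[1]
  )' input.json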
3)
I do processing on the associative arrays, and in the end I want to update the JSON with those changes.
This suggests you might have posed an xy-problem. However, if you really do want to serialize and unserialize some JSON text, then the natural way to do so using jq is using leaf_paths, as illustrated by the following serialization/deserialization functions:
# Emit (path, value) pairs
# Usage: jq -c -f serialize.jq input.json > serialized.json
def serialize: leaf_paths as $p | ($p, getpath($p));

# Usage: jq -n -f unserialize.jq serialized.json
def unserialize:
  def pairwise(s):
    foreach s as $i ([];
      if length == 1 then . + [$i] else [$i] end;
      select(length == 2));
  reduce pairwise(inputs) as $p (null; setpath($p[0]; $p[1]));
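A quick round-trip check (assuming the two programs are saved as serialize.jq and unserialize.jq, per the usage comments above):
jq -c -f serialize.jq input.json | jq -n -f unserialize.jq
# reproduces input.json, modulo key order; empty objects/arrays (and, with some
# jq versions, null/false leaves) have no leaf paths and are therefore dropped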
If using bash, you could use readarray (mapfile) to read the paths and values into a single array, or if you want to distinguish between the paths and values more easily, you could (for example) use the approach illustrated by the following:
i=0
while read -r line ; do
  path[$i]="$line"; read -r line; value[$i]="$line"
  i=$((i + 1))
done < serialized.json
But there are many other alternatives.

Using jq, Flatten Arbitrary JSON to Delimiter-Separated Flat Dictionary

I'm looking to transform JSON using jq to a delimiter-separated and flattened structure.
There have been attempts at this. For example, Flatten nested JSON using jq.
However the solutions on that page fail if the JSON contains arrays. For example, if the JSON is:
{"a":{"b":[1]},"x":[{"y":2},{"z":3}]}
Those solutions will fail to transform the above to:
{"a.b.0":1,"x.0.y":2,"x.1.z":3}
In addition, I'm looking for a solution that will also allow for an arbitrary delimiter. For example, suppose the space character is the delimiter. In this case, the result would be:
{"a b 0":1,"x 0 y":2,"x 1 z":3}
I'm looking to have this functionality accessed via a Bash (4.2+) function as is found in CentOS 7, something like this:
flatten_json()
{
  local JSONData="$1"
  # jq command to flatten $JSONData, putting the result to stdout
  jq ... <<<"$JSONData"
}
The solution should work with all JSON data types, including null and boolean. For example, consider the following input:
{"a":{"b":["p q r"]},"w":[{"x":null},{"y":false},{"z":3}]}
It should produce:
{"a b 0":"p q r","w 0 x":null,"w 1 y":false,"w 2 z":3}
If you stream the data in, you'll get pairs of paths and values for all leaf values; anything that isn't a pair is a path marking the end of an object or array at that path. Using leaf_paths as you found would only give you paths to truthy leaf values, so you'd miss out on null or even false values. As a stream, you won't have this problem.
There are many ways this could be combined into an object; I'm partial to using reduce and assignment in these situations.
$ cat input.json
{"a":{"b":["p q r"]},"w":[{"x":null},{"y":false},{"z":3}]}
$ jq --arg delim '.' 'reduce (tostream|select(length==2)) as $i ({};
    .[[$i[0][]|tostring]|join($delim)] = $i[1]
  )' input.json
{
  "a.b.0": "p q r",
  "w.0.x": null,
  "w.1.y": false,
  "w.2.z": 3
}
Here's the same solution broken up a bit to allow room for explanation of what's going on.
$ jq --arg delim '.' 'reduce (tostream|select(length==2)) as $i ({};
    [$i[0][]|tostring] as $path_as_strings
    | ($path_as_strings|join($delim)) as $key
    | $i[1] as $value
    | .[$key] = $value
  )' input.json
Converting the input to a stream with tostream, we'll receive multiple values of pairs/paths as input to our filter. With this, we can pass those multiple values into reduce which is designed to accept multiple values and do something with them. But before we do, we want to filter those pairs/paths by only the pairs (select(length==2)).
Then in the reduce call, we're starting with a clean object and assigning new values using a key derived from the path and the corresponding value. Remember that every value produced in the reduce call is used for the next value in the iteration. Binding values to variables doesn't change the current context, and assignments effectively "modify" the current value (the initial object) and pass it along.
$path_as_strings is just the path (an array of strings and numbers) converted to strings only. [$i[0][]|tostring] is a shorthand I use as an alternative to using map when the array I want to map is not the current array. This is more compact, since the mapping is done as a single expression, instead of having to do this to get the same result: ($i[0]|map(tostring)). The outer parentheses might not be necessary in general, but it's still two separate filter expressions vs one (and more text).
Then from there we convert that array of strings to the desired key using the provided delimiter. Then assign the appropriate values to the current object.
The following has been tested with jq 1.4, jq 1.5 and the current "master" version. The requirement about including paths to null and false is the reason for "allpaths" and "all_leaf_paths".
# all paths, including paths to null
def allpaths:
  def conditional_recurse(f): def r: ., (select(.!=null) | f | r); r;
  path(conditional_recurse(.[]?)) | select(length > 0);

def all_leaf_paths:
  def isscalar: type | (. != "object" and . != "array");
  allpaths as $p
  | select(getpath($p)|isscalar)
  | $p ;

. as $in
| reduce all_leaf_paths as $path ({};
    . + { ($path | map(tostring) | join($delim)): $in | getpath($path) })
With this jq program in flatten.jq:
$ cat input.json
{"a":{"b":["p q r"]},"w":[{"x":null},{"y":false},{"z":3}]}
$ jq --arg delim . -f flatten.jq input.json
{
  "a.b.0": "p q r",
  "w.0.x": null,
  "w.1.y": false,
  "w.2.z": 3
}
Collisions
Here is a helper function that illustrates an alternative path-flattening algorithm. It converts keys that contain the delimiter to quoted strings, and array elements are presented in square brackets (see the example below):
def flattenPath(delim):
  reduce .[] as $s ("";
    if $s|type == "number"
    then ((if . == "" then "." else . end) + "[\($s)]")
    else . + ($s | tostring | if index(delim) then "\"\(.)\"" else . end)
    end );
Example: Using flattenPath instead of map(tostring) | join($delim), the object:
{"a.b": [1]}
would become:
{
  "\"a.b\"[0]": 1
}
To add a new option to the solutions already given, jqg is a script I wrote to flatten any JSON file and then search it using a regex. For your purposes your regex would simply be '.' which would match everything.
$ echo '{"a":{"b":[1]},"x":[{"y":2},{"z":3}]}' | jqg .
{
  "a.b.0": 1,
  "x.0.y": 2,
  "x.1.z": 3
}
and can produce compact output:
$ echo '{"a":{"b":[1]},"x":[{"y":2},{"z":3}]}' | jqg -q -c .
{"a.b.0":1,"x.0.y":2,"x.1.z":3}
It also handles the more complicated example that @peak used:
$ echo '{"a":{"b":["p q r"]},"w":[{"x":null},{"y":false},{"z":3}]}' | jqg .
{
  "a.b.0": "p q r",
  "w.0.x": null,
  "w.1.y": false,
  "w.2.z": 3
}
as well as empty arrays and objects (and a few other edge-case values):
$ jqg . test/odd-values.json
{
  "one.start-string": "foo",
  "one.null-value": null,
  "one.integer-number": 101,
  "two.two-a.non-integer-number": 101.75,
  "two.two-a.number-zero": 0,
  "two.true-boolean": true,
  "two.two-b.false-boolean": false,
  "three.empty-string": "",
  "three.empty-object": {},
  "three.empty-array": [],
  "end-string": "bar"
}
(reporting empty arrays & objects can be turned off with the -E option).
jqg was tested with jq 1.6.
Note: I am the author of the jqg script.