jq get value or default if nested key not present - json

I would like to make a small jq-like function that does get-or-default-if-not-present similar to Python's dict.get(key, default). This is the desired behavior:
% echo '{"nested": {"key": "value", "tricky": null}}' > file.json
% my-jq nested.key \"default\" file.json
"value"
% my-jq nested.tricky \"default\" file.json
null
% my-jq nested.dne \"default\" file.json
"default"
I have tried playing with this answer to a similar question but it doesn't work for nested keys. Does anyone have a suggestion?
function my-jq () {
jq --arg key "$1" --arg default "$2" \
'if has($key) then .[$key] else $default | fromjson end' "$3"
}

Based on your projection to make a "jq-like" function, it's fair to assume that your parameter should then be considered not just a path expression but a general jq filter, i.e. code. With the current versions of jq, there is no option or other shorthand method that works for code similar to how --arg and --argjson do for data.
You can, however, import code fragments by means of jq's library/module system, but with the task at hand you'd need to store the code from your function's parameter into a (temporary) file, reference it in the static part of the actual jq filter, and delete the file afterwards. This not only is cumbersome, it also needs some static overhead in the module file, which inevitably opens up Pandora's box labeled "Code injection", so you could just as well submit to the unleashed evil, and compose the actual jq filter on the fly using the literal (and potentially malicious) content of the parameter. (Note that this assumes a valid jq expression, thus using .nested.key etc. with a dot up front):
function my-jq() { jq "($1) // ($2)" "$3"; }
% my-jq .nested.key \"default\" file.json
"value"
my-jq .nested.tricky \"default\" file.json # fails
"default"
% my-jq .nested.dne \"default\" file.json
"default"
This minimal approach uses the alternative operator //, which fails to tell an actual but falsy value ("null" or "false") apart from an empty stream (missing value). To counteract that, you could perform a check on the existence of the path input among the all the paths of the base document. This drastically reduces the kind of filters trivially accepted by your function (which with your use case in mind may even be considered a good thing, yet malicious injection is still possible), and the comparison with all paths may come with a performance penalty for base documents with complex structuring, but it meets your three test cases:
function my-jq() { jq "if any(path($1) == paths; .) then ($1) else ($2) end" "$3"; }
% my-jq .nested.key \"default\" file.json
"value"
my-jq .nested.tricky \"default\" file.json
null
% my-jq .nested.dne \"default\" file.json
"default"
Eventually, the potential performance penalty could be mitigated by combining both approaches, i.e. starting off with the faster first one for the general case, but reverting to the possibly slower second one if the first one produced an ambiguous falsy value:
function my-jq() { jq "($1) // if any(path($1) == paths; .) then ($1) else ($2) end" "$3"; }
% my-jq .nested.key \"default\" file.json
"value"
my-jq .nested.tricky \"default\" file.json
null
% my-jq .nested.dne \"default\" file.json
"default"

You can't use a dotted path as a sequence of keys. You can turn such a path into a valid path for use with the getpath function, however. (There may be a cleaner, more robust way to do this.) The // operator provides for an alternate value should the left-hand side produce false or null.
$ jq 'getpath($key | ltrimstr(".") | split(".")) // $default' file.json --arg key .nested.key --arg default foobar
"value"
$ jq 'getpath($key | ltrimstr(".") | split(".")) // $default' file.json --arg key .nested.dne --arg default foobar
"foobar"
$key | ltrimstr(".") | split(".") first gets rid of the leading ., then splits the remaining string on the remaining . to produce a list of separate keys. getpath produces a filter using that list of keys; getpath(["nested", "key"]) is equivalent (AFAIK) to .nested.key.

Your requirements go against JSON's grain a bit, but here is one possible solution, or the basis of a family of solutions.
This particular solution assumes that the path is given in array form:
function my-jq () {
jq -c --argjson key "$1" --arg default "$2" '
first(tostream | select(length == 2 and .[0] == $key)) // null
| if . then .[1] else $default end
'
}
Examples:
echo '{"nested": {"key": "value", "tricky": null}}' | my-jq2 '["nested","key"]' haha
"value"
echo '{"nested": {"key": "value", "tricky": null}}' | my-jq2 '["nested","nokey"]' haha
"haha"
echo '{"nested": {"key": "value", "tricky": null}}' | my-jq2 '["nested","tricky"]' haha
null

Related

Constructing a JSON object from a bash associative array

I would like to convert an associative array in bash to a JSON hash/dict. I would prefer to use JQ to do this as it is already a dependency and I can rely on it to produce well formed json. Could someone demonstrate how to achieve this?
#!/bin/bash
declare -A dict=()
dict["foo"]=1
dict["bar"]=2
dict["baz"]=3
for i in "${!dict[#]}"
do
echo "key : $i"
echo "value: ${dict[$i]}"
done
echo 'desired output using jq: { "foo": 1, "bar": 2, "baz": 3 }'
There are many possibilities, but given that you already have written a bash for loop, you might like to begin with this variation of your script:
#!/bin/bash
# Requires bash with associative arrays
declare -A dict
dict["foo"]=1
dict["bar"]=2
dict["baz"]=3
for i in "${!dict[#]}"
do
echo "$i"
echo "${dict[$i]}"
done |
jq -n -R 'reduce inputs as $i ({}; . + { ($i): (input|(tonumber? // .)) })'
The result reflects the ordering of keys produced by the bash for loop:
{
"bar": 2,
"baz": 3,
"foo": 1
}
In general, the approach based on feeding jq the key-value pairs, with one key on a line followed by the corresponding value on the next line, has much to recommend it. A generic solution following this general scheme, but using NUL as the "line-end" character, is given below.
Keys and Values as JSON Entities
To make the above more generic, it would be better to present the keys and values as JSON entities. In the present case, we could write:
for i in "${!dict[#]}"
do
echo "\"$i\""
echo "${dict[$i]}"
done |
jq -n 'reduce inputs as $i ({}; . + { ($i): input })'
Other Variations
JSON keys must be JSON strings, so it may take some work to ensure that the desired mapping from bash keys to JSON keys is implemented. Similar remarks apply to the mapping from bash array values to JSON values. One way to handle arbitrary bash keys would be to let jq do the conversion:
printf "%s" "$i" | jq -Rs .
You could of course do the same thing with the bash array values, and let jq check whether the value can be converted to a number or to some other JSON type as desired (e.g. using fromjson? // .).
A Generic Solution
Here is a generic solution along the lines mentioned in the jq FAQ and advocated by #CharlesDuffy. It uses NUL as the delimiter when passing the bash keys and values to jq, and has the advantage of only requiring one call to jq. If desired, the filter fromjson? // . can be omitted or replaced by another one.
declare -A dict=( [$'foo\naha']=$'a\nb' [bar]=2 [baz]=$'{"x":0}' )
for key in "${!dict[#]}"; do
printf '%s\0%s\0' "$key" "${dict[$key]}"
done |
jq -Rs '
split("\u0000")
| . as $a
| reduce range(0; length/2) as $i
({}; . + {($a[2*$i]): ($a[2*$i + 1]|fromjson? // .)})'
Output:
{
"foo\naha": "a\nb",
"bar": 2,
"baz": {
"x": 0
}
}
This answer is from nico103 on freenode #jq:
#!/bin/bash
declare -A dict=()
dict["foo"]=1
dict["bar"]=2
dict["baz"]=3
assoc2json() {
declare -n v=$1
printf '%s\0' "${!v[#]}" "${v[#]}" |
jq -Rs 'split("\u0000") | . as $v | (length / 2) as $n | reduce range($n) as $idx ({}; .[$v[$idx]]=$v[$idx+$n])'
}
assoc2json dict
You can initialize a variable to an empty object {} and add the key/values {($key):$value} for each iteration, re-injecting the result in the same variable :
#!/bin/bash
declare -A dict=()
dict["foo"]=1
dict["bar"]=2
dict["baz"]=3
data='{}'
for i in "${!dict[#]}"
do
data=$(jq -n --arg data "$data" \
--arg key "$i" \
--arg value "${dict[$i]}" \
'$data | fromjson + { ($key) : ($value | tonumber) }')
done
echo "$data"
This has been posted, and credited to nico103 on IRC, which is to say, me.
The thing that scares me, naturally, is that these associative array keys and values need quoting. Here's a start that requires some additional work to dequote keys and values:
function assoc2json {
typeset -n v=$1
printf '%q\n' "${!v[#]}" "${v[#]}" |
jq -Rcn '[inputs] |
. as $v |
(length / 2) as $n |
reduce range($n) as $idx ({}; .[$v[$idx]]=$v[$idx+$n])'
}
$ assoc2json a
{"foo\\ bar":"1","b":"bar\\ baz\\\"\\{\\}\\[\\]","c":"$'a\\nb'","d":"1"}
$
So now all that's needed is a jq function that removes the quotes, which come in several flavors:
if the string starts with a single-quote (ksh) then it ends with a single quote and those need to be removed
if the string starts with a dollar sign and a single-quote and ends in a double-quote, then those need to be removed and internal backslash escapes need to be unescaped
else leave as-is
I leave this last iterm as an exercise for the reader.
I should note that I'm using printf here as the iterator!
bash 5.2 introduces the #k parameter transformation which, makes this much easier. Like:
$ declare -A dict=([foo]=1 [bar]=2 [baz]=3)
$ jq -n '[$ARGS.positional | _nwise(2) | {(.[0]): .[1]}] | add' --args "${dict[#]#k}"
{
"foo": "1",
"bar": "2",
"baz": "3"
}

Using jq, Flatten Arbitrary JSON to Delimiter-Separated Flat Dictionary

I'm looking to transform JSON using jq to a delimiter-separated and flattened structure.
There have been attempts at this. For example, Flatten nested JSON using jq.
However the solutions on that page fail if the JSON contains arrays. For example, if the JSON is:
{"a":{"b":[1]},"x":[{"y":2},{"z":3}]}
The solution above will fail to transform the above to:
{"a.b.0":1,"x.0.y":2,"x.1.z":3}
In addition, I'm looking for a solution that will also allow for an arbitrary delimiter. For example, suppose the space character is the delimiter. In this case, the result would be:
{"a b 0":1,"x 0 y":2,"x 1 z":3}
I'm looking to have this functionality accessed via a Bash (4.2+) function as is found in CentOS 7, something like this:
flatten_json()
{
local JSONData="$1"
# jq command to flatten $JSONData, putting the result to stdout
jq ... <<<"$JSONData"
}
The solution should work with all JSON data types, including null and boolean. For example, consider the following input:
{"a":{"b":["p q r"]},"w":[{"x":null},{"y":false},{"z":3}]}
It should produce:
{"a b 0":"p q r","w 0 x":null,"w 1 y":false,"w 2 z":3}
If you stream the data in, you'll get pairings of paths and values of all leaf values. If not a pair, then a path marking the end of a definition of an object/array at that path. Using leaf_paths as you found would only give you paths to truthy leaf values so you'll miss out on null or even false values. As a stream, you won't get this problem.
There are many ways this could be combined to an object, I'm partial to using reduce and assignment in these situations.
$ cat input.json
{"a":{"b":["p q r"]},"w":[{"x":null},{"y":false},{"z":3}]}
$ jq --arg delim '.' 'reduce (tostream|select(length==2)) as $i ({};
.[[$i[0][]|tostring]|join($delim)] = $i[1]
)' input.json
{
"a.b.0": "p q r",
"w.0.x": null,
"w.1.y": false,
"w.2.z": 3
}
Here's the same solution broken up a bit to allow room for explanation of what's going on.
$ jq --arg delim '.' 'reduce (tostream|select(length==2)) as $i ({};
[$i[0][]|tostring] as $path_as_strings
| ($path_as_strings|join($delim)) as $key
| $i[1] as $value
| .[$key] = $value
)' input.json
Converting the input to a stream with tostream, we'll receive multiple values of pairs/paths as input to our filter. With this, we can pass those multiple values into reduce which is designed to accept multiple values and do something with them. But before we do, we want to filter those pairs/paths by only the pairs (select(length==2)).
Then in the reduce call, we're starting with a clean object and assigning new values using a key derived from the path and the corresponding value. Remember that every value produced in the reduce call is used for the next value in the iteration. Binding values to variables doesn't change the current context and assignments effectively "modify" the current value (the initial object) and passes it along.
$path_as_strings is just the path which is an array of strings and numbers to just strings. [$i[0][]|tostring] is a shorthand I use as an alternative to using map when the array I want to map is not the current array. This is more compact since the mapping is done as a single expression. That instead of having to do this to get the same result: ($i[0]|map(tostring)). The outer parentheses might not be necessary in general but, it's still two separate filter expressions vs one (and more text).
Then from there we convert that array of strings to the desired key using the provided delimiter. Then assign the appropriate values to the current object.
The following has been tested with jq 1.4, jq 1.5 and the current "master" version. The requirement about including paths to null and false is the reason for "allpaths" and "all_leaf_paths".
# all paths, including paths to null
def allpaths:
def conditional_recurse(f): def r: ., (select(.!=null) | f | r); r;
path(conditional_recurse(.[]?)) | select(length > 0);
def all_leaf_paths:
def isscalar: type | (. != "object" and . != "array");
allpaths as $p
| select(getpath($p)|isscalar)
| $p ;
. as $in
| reduce all_leaf_paths as $path ({};
. + { ($path | map(tostring) | join($delim)): $in | getpath($path) })
With this jq program in flatten.jq:
$ cat input.json
{"a":{"b":["p q r"]},"w":[{"x":null},{"y":false},{"z":3}]}
$ jq --arg delim . -f flatten.jq input.json
{
"a.b.0": "p q r",
"w.0.x": null,
"w.1.y": false,
"w.2.z": 3
}
Collisions
Here is a helper function that illustrates an alternative path-flattening algorithm. It converts keys that contain the delimiter to quoted strings, and array elements are presented in square brackets (see the example below):
def flattenPath(delim):
reduce .[] as $s ("";
if $s|type == "number"
then ((if . == "" then "." else . end) + "[\($s)]")
else . + ($s | tostring | if index(delim) then "\"\(.)\"" else . end)
end );
Example: Using flattenPath instead of map(tostring) | join($delim), the object:
{"a.b": [1]}
would become:
{
"\"a.b\"[0]": 1
}
To add a new option to the solutions already given, jqg is a script I wrote to flatten any JSON file and then search it using a regex. For your purposes your regex would simply be '.' which would match everything.
$ echo '{"a":{"b":[1]},"x":[{"y":2},{"z":3}]}' | jqg .
{
"a.b.0": 1,
"x.0.y": 2,
"x.1.z": 3
}
and can produce compact output:
$ echo '{"a":{"b":[1]},"x":[{"y":2},{"z":3}]}' | jqg -q -c .
{"a.b.0":1,"x.0.y":2,"x.1.z":3}
It also handles the more complicated example that #peak used:
$ echo '{"a":{"b":["p q r"]},"w":[{"x":null},{"y":false},{"z":3}]}' | jqg .
{
"a.b.0": "p q r",
"w.0.x": null,
"w.1.y": false,
"w.2.z": 3
}
as well as empty arrays and objects (and a few other edge-case values):
$ jqg . test/odd-values.json
{
"one.start-string": "foo",
"one.null-value": null,
"one.integer-number": 101,
"two.two-a.non-integer-number": 101.75,
"two.two-a.number-zero": 0,
"two.true-boolean": true,
"two.two-b.false-boolean": false,
"three.empty-string": "",
"three.empty-object": {},
"three.empty-array": [],
"end-string": "bar"
}
(reporting empty arrays & objects can be turned off with the -E option).
jqg was tested with jq 1.6
Note : I am the author of the jqg script.

Numeric argument passed with jq --arg not matching data with ==

Here is a sample JSON response from my curl:
{
"success": true,
"message": "jobStatus",
"jobStatus": [
{
"ID": 9,
"status": "Successful"
},
{
"ID": 2,
"status": "Successful"
},
{
"ID": 99,
"status": "Failed"
}
]
}
I want to check the status of ID=2. Here is the command I tried:
cat test.txt|jq --arg v "2" '.jobStatus[]|select(.ID == $v)|.status'
response: there is none
I tried value 2 without quotes and still no result.
By contrast, if I try the command with a literal 2, it works:
cat test.txt | jq '.jobStatus[]|select(.ID == 2)|.status'
response:
"Successful"
I'm stuck. Can anyone help me identify the problem?
jq is data-type-aware:
.ID, as defined in the JSON input, is a number,
but any command-line argument passed with --arg (such as v here) is invariably a string (whether you quote the value or not),
so, in order to compare them, you must use an explicit type conversion, such as with tonumber/1:
jq --arg v '2' '.jobStatus[] | select(.ID == ($v | tonumber)) | .status' test.txt
Given that you're only passing a scalar argument here, the following solution, using --argjson (jq v1.5+) is a bit of an overkill, but it is an alternative to explicit type conversion in that passing a JSON argument in effect passes typed data:
jq --argjson v '{ "ID": 2 }' '.jobStatus[] | select(.ID == $v.ID) | .status' test.txt
peak's answer demonstrates that even --argjson v 2 works (in which case comparing to $v works directly), which is certainly the most concise solution, but may require an explanation:
Even though 2 may not look like JSON, it is: it is a valid JSON text containing a single value of type number (see json.org).
Specifically, it is the fact that 2 is an unquoted token that starts with a digit that makes it a number in the context of JSON (the JSON string-value equivalent is "2", which from the shell would have to be passed as '"2"' - note the embedded double quotes).
Therefore jq interprets --argjson -v 2 as a number, and comparison .ID == $v works as intended (note that the same applies to --argjson -v '2' / --argjson -v "2", where the shell removes the quotes before jq sees the value).
By contrast, anything you pass with --arg is always a string value that is used as-is.
In other words: --argjson, whose purpose is to accept arbitrary JSON texts as strings (such as '{ "ID": 2 }' in the example above), can also be used to pass number-string scalars to force their interpretation as numbers.
The same technique also works with Boolean strings true and false.
Tip of the hat to peak for his help.
Assuming you want to check for the JSON value 2, you have a choice to make - either convert the argument of --arg to a number, or use --argjson with a numeric argument. These alternatives are illustrated by the following:
jq --arg v 2 '.jobStatus[] | select(.ID == ($v|tonumber) | .status'
jq --argjson v 2 '.jobStatus[] | select(.ID == $v) | .status'
Note that --argjson requires a relatively recent version of jq.
Of course, if you want to "normalize" .ID so that it's always treated as a string, you could write:
jq --arg v 2 '.jobStatus[] | select((.ID|tostring) == $v) | .status'

Create JSON using jq from pipe-separated keys and values in bash

I am trying to create a json object from a string in bash. The string is as follows.
CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0
The output is from docker stats command and my end goal is to publish custom metrics to aws cloudwatch. I would like to format this string as json.
{
"CONTAINER":"nginx_container",
"CPU%":"0.02%",
....
}
I have used jq command before and it seems like it should work well in this case but I have not been able to come up with a good solution yet. Other than hardcoding variable names and indexing using sed or awk. Then creating a json from scratch. Any suggestions would be appreciated. Thanks.
Prerequisite
For all of the below, it's assumed that your content is in a shell variable named s:
s='CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0'
What (modern jq)
# thanks to #JeffMercado and #chepner for refinements, see comments
jq -Rn '
( input | split("|") ) as $keys |
( inputs | split("|") ) as $vals |
[[$keys, $vals] | transpose[] | {key:.[0],value:.[1]}] | from_entries
' <<<"$s"
How (modern jq)
This requires very new (probably 1.5?) jq to work, and is a dense chunk of code. To break it down:
Using -n prevents jq from reading stdin on its own, leaving the entirety of the input stream available to be read by input and inputs -- the former to read a single line, and the latter to read all remaining lines. (-R, for raw input, causes textual lines rather than JSON objects to be read).
With [$keys, $vals] | transpose[], we're generating [key, value] pairs (in Python terms, zipping the two lists).
With {key:.[0],value:.[1]}, we're making each [key, value] pair into an object of the form {"key": key, "value": value}
With from_entries, we're combining those pairs into objects containing those keys and values.
What (shell-assisted)
This will work with a significantly older jq than the above, and is an easily adopted approach for scenarios where a native-jq solution can be harder to wrangle:
{
IFS='|' read -r -a keys # read first line into an array of strings
## read each subsequent line into an array named "values"
while IFS='|' read -r -a values; do
# setup: positional arguments to pass in literal variables, query with code
jq_args=( )
jq_query='.'
# copy values into the arguments, reference them from the generated code
for idx in "${!values[#]}"; do
[[ ${keys[$idx]} ]] || continue # skip values with no corresponding key
jq_args+=( --arg "key$idx" "${keys[$idx]}" )
jq_args+=( --arg "value$idx" "${values[$idx]}" )
jq_query+=" | .[\$key${idx}]=\$value${idx}"
done
# run the generated command
jq "${jq_args[#]}" "$jq_query" <<<'{}'
done
} <<<"$s"
How (shell-assisted)
The invoked jq command from the above is similar to:
jq --arg key0 'CONTAINER' \
--arg value0 'nginx_container' \
--arg key1 'CPU%' \
--arg value1 '0.0.2%' \
--arg key2 'MEMUSAGE/LIMIT' \
--arg value2 '25.09MiB/15.26GiB' \
'. | .[$key0]=$value0 | .[$key1]=$value1 | .[$key2]=$value2' \
<<<'{}'
...passing each key and value out-of-band (such that it's treated as a literal string rather than parsed as JSON), then referring to them individually.
Result
Either of the above will emit:
{
"CONTAINER": "nginx_container",
"CPU%": "0.02%",
"MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
"MEM%": "0.16%",
"NETI/O": "0B/0B",
"BLOCKI/O": "22.09MB/4.096kB",
"PIDS": "0"
}
Why
In short: Because it's guaranteed to generate valid JSON as output.
Consider the following as an example that would break more naive approaches:
s='key ending in a backslash\
value "with quotes"'
Sure, these are unexpected scenarios, but jq knows how to deal with them:
{
"key ending in a backslash\\": "value \"with quotes\""
}
...whereas an implementation that didn't understand JSON strings could easily end up emitting:
{
"key ending in a backslash\": "value "with quotes""
}
I know this is an old post, but the tool you seek is called jo: https://github.com/jpmens/jo
A quick and easy example:
$ jo my_variable="simple"
{"my_variable":"simple"}
A little more complex
$ jo -p name=jo n=17 parser=false
{
"name": "jo",
"n": 17,
"parser": false
}
Add an array
$ jo -p name=jo n=17 parser=false my_array=$(jo -a {1..5})
{
"name": "jo",
"n": 17,
"parser": false,
"my_array": [
1,
2,
3,
4,
5
]
}
I've made some pretty complex stuff with jo and the nice thing is that you don't have to worry about rolling your own solution worrying about the possiblity of making invalid json.
You can ask docker to give you JSON data in the first place
docker stats --format "{{json .}}"
For more on this, see: https://docs.docker.com/config/formatting/
JSONSTR=""
declare -a JSONNAMES=()
declare -A JSONARRAY=()
LOOPNUM=0
cat ~/newfile | while IFS=: read CONTAINER CPU MEMUSE MEMPC NETIO BLKIO PIDS; do
if [[ "$LOOPNUM" = 0 ]]; then
JSONNAMES=("$CONTAINER" "$CPU" "$MEMUSE" "$MEMPC" "$NETIO" "$BLKIO" "$PIDS")
LOOPNUM=$(( LOOPNUM+1 ))
else
echo "{ \"${JSONNAMES[0]}\": \"${CONTAINER}\", \"${JSONNAMES[1]}\": \"${CPU}\", \"${JSONNAMES[2]}\": \"${MEMUSE}\", \"${JSONNAMES[3]}\": \"${MEMPC}\", \"${JSONNAMES[4]}\": \"${NETIO}\", \"${JSONNAMES[5]}\": \"${BLKIO}\", \"${JSONNAMES[6]}\": \"${PIDS}\" }"
fi
done
Returns:
{ "CONTAINER": "nginx_container", "CPU%": "0.02%", "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB", "MEM%": "0.16%", "NETI/O": "0B/0B", "BLOCKI/O": "22.09MB/4.096kB", "PIDS": "0" }
Here is a solution which uses the -R and -s options along with transpose:
split("\n") # [ "CONTAINER...", "nginx_container|0.02%...", ...]
| (.[0] | split("|")) as $keys # [ "CONTAINER", "CPU%", "MEMUSAGE/LIMIT", ... ]
| (.[1:][] | split("|")) # [ "nginx_container", "0.02%", ... ] [ ... ] ...
| select(length > 0) # (remove empty [] caused by trailing newline)
| [$keys, .] # [ ["CONTAINER", ...], ["nginx_container", ...] ] ...
| [ transpose[] | {(.[0]):.[1]} ] # [ {"CONTAINER": "nginx_container"}, ... ] ...
| add # {"CONTAINER": "nginx_container", "CPU%": "0.02%" ...
json_template='{"CONTAINER":"%s","CPU%":"%s","MEMUSAGE/LIMIT":"%s", "MEM%":"%s","NETI/O":"%s","BLOCKI/O":"%s","PIDS":"%s"}'
json_string=$(printf "$json_template" "nginx_container" "0.02%" "25.09MiB/15.26GiB" "0.16%" "0B/0B" "22.09MB/4.096kB" "0")
echo "$json_string"
Not using jq but possible to use args and environment in values.
CONTAINER=nginx_container
json_template='{"CONTAINER":"%s","CPU%":"%s","MEMUSAGE/LIMIT":"%s", "MEM%":"%s","NETI/O":"%s","BLOCKI/O":"%s","PIDS":"%s"}'
json_string=$(printf "$json_template" "$CONTAINER" "$1" "25.09MiB/15.26GiB" "0.16%" "0B/0B" "22.09MB/4.096kB" "0")
echo "$json_string"
If you're starting with tabular data, I think it makes more sense to use something that works with tabular data natively, like sqawk to make it into json, and then use jq work with it further.
echo 'CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0' \
| sqawk -FS '[|]' -RS '\n' -output json 'select * from a' header=1 \
| jq '.[] | with_entries(select(.key|test("^a.*")|not))'
{
"CONTAINER": "nginx_container",
"CPU%": "0.02%",
"MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
"MEM%": "0.16%",
"NETI/O": "0B/0B",
"BLOCKI/O": "22.09MB/4.096kB",
"PIDS": "0"
}
Without jq, sqawk gives a bit too much:
[
{
"anr": "1",
"anf": "7",
"a0": "nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0",
"CONTAINER": "nginx_container",
"CPU%": "0.02%",
"MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
"MEM%": "0.16%",
"NETI/O": "0B/0B",
"BLOCKI/O": "22.09MB/4.096kB",
"PIDS": "0",
"a8": "",
"a9": "",
"a10": ""
}
]

Flatten nested JSON using jq

I'd like to flatten a nested json object, e.g. {"a":{"b":1}} to {"a.b":1} in order to digest it in solr.
I have 11 TB of json files which are both nested and contains dots in field names, meaning not elasticsearch (dots) nor solr (nested without the _childDocument_ notation) can digest it as is.
The other solutions would be to replace dots in the field names with underscores and push it to elasticsearch, but I have far better experience with solr therefore I prefer the flatten solution (unless solr can digest those nested jsons as is??).
I will prefer elasticsearch only if the digestion process will take far less time than solr, because my priority is digesting as fast as I can (thus I chose jq instead of scripting it in python).
Kindly help.
EDIT:
I think the pair of examples 3&4 solves this for me:
https://lucidworks.com/blog/2014/08/12/indexing-custom-json-data/
I'll try soon.
You can also use the following jq command to flatten nested JSON objects in this manner:
[leaf_paths as $path | {"key": $path | join("."), "value": getpath($path)}] | from_entries
The way it works is: leaf_paths returns a stream of arrays which represent the paths on the given JSON document at which "leaf elements" appear, that is, elements which do not have child elements, such as numbers, strings and booleans. We pipe that stream into objects with key and value properties, where key contains the elements of the path array as a string joined by dots and value contains the element at that path. Finally, we put the entire thing in an array and run from_entries on it, which transforms an array of {key, value} objects into an object containing those key-value pairs.
This is just a variant of Santiago's jq:
. as $in
| reduce leaf_paths as $path ({};
. + { ($path | map(tostring) | join(".")): $in | getpath($path) })
It avoids the overhead of the key/value construction and destruction.
(If you have access to a version of jq later than jq 1.5, you can omit the "map(tostring)".)
Two important points about both these jq solutions:
Arrays are also flattened.
E.g. given {"a": {"b": [0,1,2]}} as input, the output would be:
{
"a.b.0": 0,
"a.b.1": 1,
"a.b.2": 2
}
If any of the keys in the original JSON contain periods, then key collisions are possible; such collisions will generally result in the loss of a value. This would happen, for example, with the following input:
{"a.b":0, "a": {"b": 1}}
Here is a solution that uses tostream, select, join, reduce and setpath
reduce ( tostream | select(length==2) | .[0] |= [join(".")] ) as [$p,$v] (
{}
; setpath($p; $v)
)
I've recently written a script called jqg that flattens arbitrarily complex JSON and searches the results using a regex; to simply flatten the JSON, your regex would be '.', which matches everything. Unlike the answers above, the script will handle embedded arrays, false and null values, and can optionally treat empty arrays and objects ([] & {}) as leaf nodes.
$ jq . test/odd-values.json
{
"one": {
"start-string": "foo",
"null-value": null,
"integer-number": 101
},
"two": [
{
"two-a": {
"non-integer-number": 101.75,
"number-zero": 0
},
"true-boolean": true,
"two-b": {
"false-boolean": false
}
}
],
"three": {
"empty-string": "",
"empty-object": {},
"empty-array": []
},
"end-string": "bar"
}
$ jqg . test/odd-values.json
{
"one.start-string": "foo",
"one.null-value": null,
"one.integer-number": 101,
"two.0.two-a.non-integer-number": 101.75,
"two.0.two-a.number-zero": 0,
"two.0.true-boolean": true,
"two.0.two-b.false-boolean": false,
"three.empty-string": "",
"three.empty-object": {},
"three.empty-array": [],
"end-string": "bar"
}
jqg was tested using jq 1.6
Note: I am the author of the jqg script.
As it turns out, curl -XPOST 'http://localhost:8983/solr/flat/update/json/docs' -d #json_file does just this:
{
"a.b":[1],
"id":"24e3e780-3a9e-4fa7-9159-fc5294e803cd",
"_version_":1535841499921514496
}
EDIT 1: solr 6.0.1 with bin/solr -e cloud. collection name is flat, all the rest are default (with data-driven-schema which is also default).
EDIT 2: The final script I used: find . -name '*.json' -exec curl -XPOST 'http://localhost:8983/solr/collection1/update/json/docs' -d #{} \;.
EDIT 3: Is is also possible to parallel with xargs and to add the id field with jq: find . -name '*.json' -print0 | xargs -0 -n 1 -P 8 -I {} sh -c "cat {} | jq '. + {id: .a.b}' | curl -XPOST 'http://localhost:8983/solr/collection/update/json/docs' -d #-" where -P is the parallelism factor. I used jq to set an id so multiple uploads of the same document won't create duplicates in the collection (when I searched for the optimal value of -P it created duplicates in the collection)
As #hraban mentioned, leaf_paths does not work as expected (furthermore, it is deprecated). leaf_paths is equivalent to paths(scalars), it returns the paths of any values for which scalars returns a truthy value. scalars returns its input value if it is a scalar, or null otherwise. The problem with that is that null and false are not truthy values, so they will be removed from the output. The following code does work, by checking the type of the values directly:
. as $in
| reduce paths(type != "object" and type != "array") as $path ({};
. + { ($path | map(tostring) | join(".")): $in | getpath($path) })