jq: JSON object concatenation to bash string array

I want to use jq (or something else, if jq is the wrong tool) to concatenate a JSON object like this:
{
  "https://github.com": {
    "user-one": {
      "repository-one": "version-one",
      "repository-two": "version-two"
    },
    "user-two": {
      "repository-three": "version-three",
      "repository-four": "version-four"
    }
  },
  "https://gitlab.com": {
    "user-three": {
      "repository-five": "version-five",
      "repository-six": "version-six"
    },
    "user-four": {
      "repository-seven": "version-seven",
      "repository-eight": "version-eight"
    }
  }
}
recursively to a bash string array like this:
(
"https://github.com/user-one/repository-one/archive/refs/heads/version-one.tar.gz"
"https://github.com/user-one/repository-two/archive/refs/heads/version-two.tar.gz"
"https://github.com/user-two/repository-three/archive/refs/heads/version-three.tar.gz"
"https://github.com/user-two/repository-four/archive/refs/heads/version-four.tar.gz"
"https://gitlab.com/user-three/repository-five/-/archive/version-five/repository-five-version-five.tar.gz"
"https://gitlab.com/user-three/repository-six/-/archive/version-six/repository-six-version-six.tar.gz"
"https://gitlab.com/user-four/repository-seven/-/archive/version-seven/repository-seven-version-seven.tar.gz"
"https://gitlab.com/user-four/repository-eight/-/archive/version-eight/repository-eight-version-eight.tar.gz"
)
for subsequent use in a loop:
for i in "${arr[@]}"
do
  echo "$i"
done
I have no idea how to do that.
As you can see, the values must be handled differently depending on the site (the top-level key):
"https://github.com" + "/" + $user_name + "/" + $repository_name + "/archive/refs/heads/" + $version + ".tar.gz"
"https://gitlab.com" + "/" + $user_name + "/" + $repository_name + "/-/archive/" + $version + "/" + $repository_name + "-" + $version + ".tar.gz"
Could anyone help?

Easily done.
First, let's focus on the jq code alone:
to_entries[]                # split items into keys and values
| .key as $site             # store first key in $site
| .value                    # code below deals with the value
| to_entries[]              # split that value into keys and values
| .key as $user             # store the key in $user
| .value                    # code below deals with the value
| to_entries[]              # split that value into keys and values
| .key as $repository_name  # store the key in $repository_name
| .value as $version        # store the value in $version
| if $site == "https://github.com" then
    "\($site)/\($user)/\($repository_name)/archive/refs/heads/\($version).tar.gz"
  else
    "\($site)/\($user)/\($repository_name)/-/archive/\($version)/\($repository_name)-\($version).tar.gz"
  end
That generates one line per URL. Reading lines into a bash array looks like readarray -t arrayname < ...datasource...
Thus, using a process substitution to redirect jq's stdout as if it were a file:
readarray -t uris < <(jq -r '
  to_entries[]
  | .key as $site
  | .value
  | to_entries[]
  | .key as $user
  | .value
  | to_entries[]
  | .key as $repository_name
  | .value as $version
  | if $site == "https://github.com" then
      "\($site)/\($user)/\($repository_name)/archive/refs/heads/\($version).tar.gz"
    else
      "\($site)/\($user)/\($repository_name)/-/archive/\($version)/\($repository_name)-\($version).tar.gz"
    end
' <config.json
)
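As a quick end-to-end check, here is a minimal sketch that runs the same filter on a trimmed copy of the question's JSON and then loops over the resulting array, as the question wanted. The JSON is inlined as a here-string instead of being read from config.json; that substitution is only to make the example self-contained. It assumes bash 4+ for readarray.

```shell
#!/usr/bin/env bash
# Trimmed, inlined copy of the question's JSON (stands in for config.json).
json='{
  "https://github.com": {"user-one": {"repository-one": "version-one"}},
  "https://gitlab.com": {"user-three": {"repository-five": "version-five"}}
}'

# Same filter as above: one URL per line, read into the bash array "uris".
readarray -t uris < <(jq -r '
  to_entries[] | .key as $site
  | .value | to_entries[] | .key as $user
  | .value | to_entries[] | .key as $repository_name | .value as $version
  | if $site == "https://github.com" then
      "\($site)/\($user)/\($repository_name)/archive/refs/heads/\($version).tar.gz"
    else
      "\($site)/\($user)/\($repository_name)/-/archive/\($version)/\($repository_name)-\($version).tar.gz"
    end' <<<"$json")

# The loop from the question, with "${arr[@]}" spelled for this array.
for uri in "${uris[@]}"; do
  printf '%s\n' "$uri"
done
```

jq keeps object keys in insertion order, so the URLs come out in the same order as in the input.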

The basic task of generating the strings can be done efficiently and generically (i.e., without any limit on the nesting depth) using the jq filter:
paths(strings) as $p | $p + [getpath($p)] | join("/")
There are several ways to populate a bash array accordingly, but if you merely wish to iterate through the values, you could use a bash while loop, like so:
< input.json jq -r '
  paths(strings) as $p | $p + [getpath($p)] | join("/")' |
while IFS= read -r line; do
  echo "$line"
done
You might also wish to consider using jq's @sh or @uri filter. For a jq urlencode function, see e.g.
https://rosettacode.org/wiki/URL_encoding#jq
(If the strings contain newlines or tabs, then the above would need to be tweaked accordingly.)
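One possible tweak for values that contain newlines, sketched here under the assumption of bash 4.4+ (for readarray -d '') and jq 1.6+: end each string with a NUL byte and print with -j (--join-output), which writes raw strings without appending a newline. The second entry in the sample JSON is hypothetical, added only so that one value has an embedded newline. Newer jq releases (1.7+) also offer --raw-output0 for the same purpose.

```shell
#!/usr/bin/env bash
# Sample input; the example.org entry is hypothetical and contains "\n".
json='{"https://github.com": {"user-one": {"repository-one": "version-one"}},
       "https://example.org": {"user-x": {"repo": "line1\nline2"}}}'

# Generic paths(strings) filter, NUL-terminated instead of newline-terminated,
# read with NUL as the delimiter (-d '') and the delimiter stripped (-t).
readarray -d '' -t items < <(
  jq -j 'paths(strings) as $p | $p + [getpath($p)] | join("/") + "\u0000"' <<<"$json"
)

for item in "${items[@]}"; do
  printf '[%s]\n' "$item"
done
```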

Related

Split/Slice large JSON using jq

I would like to slice a huge (~20 GB) JSON file into smaller chunks based on array size (10000, 50000, etc.).
Input:
{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
{"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77832"},
{"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
{"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77834"}
]
}
Currently I run jq in a loop, incrementing the x/y values to get the desired output, but performance is very poor: each iteration takes 8-20 seconds to complete, depending on the size of the file. I am using jq 1.6. Is there an alternative way to get the result below?
Expected output, for slices of 2 objects per array:
{"recDt":"2021-01-05","country":"US","name":"ABC","number":"9828","add":[{"rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},{"rngNum":"2","state":"TX","city":"ANDERSON","postal":"77832"}]}
{"recDt":"2021-01-05","country":"US","name":"ABC","number":"9828","add":[{"rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},{"rngNum":"4","state":"TX","city":"ANDERSON","postal":"77834"}]}
I tried:
cat $inFile | jq -cn --stream 'fromstream(1|truncate_stream(inputs))' | jq --arg x $x --arg y $y -c '{recDt: .recDt, country: .country, name: .name, number: .number, add: .add[$x|tonumber:$y|tonumber]}' >> $outFile
cat $inFile | jq --arg x $x --arg y $y -c '{recDt: .recDt, country: .country, name: .name, number: .number, add: .add[$x|tonumber:$y|tonumber]}' >> $outFile
Are there any alternatives available?
In this response, which calls jq just once, I'm going to assume your computer has enough memory to read the entire JSON. I'll also assume you want to create separate files for each slice, and that you want the JSON to be pretty-printed in each file.
Assuming a chunk size of 2, and that the output files are to be named using the template part-N.json, you could write:
< input.json jq -r --argjson size 2 '
  del(.add) as $object
  | (.add | _nwise($size) | ("\t", $object + {add: .}))
' | awk '
  /^\t/ { fn++; next }
  { print >> ("part-" fn ".json") }'
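The trick being used here is that jq's JSON output never contains a literal tab character (a tab inside a string is escaped as \t, and pretty-printing indents with spaces), so a line starting with a tab can only be the separator.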
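A variation on the same idea, sketched under the same fits-in-memory assumption: instead of the undocumented _nwise, it slices .add with range/3 and prints each chunk as one compact line, so awk only has to count lines. The question's sample document is written to input.json first, and the part-N.json naming follows the answer above. Unlike the question's expected output, this keeps evnCd; piping each slice through map(del(.evnCd)) would drop it.

```shell
#!/usr/bin/env bash
size=2

# Sample input: the question's document.
cat > input.json <<'EOF'
{"recDt":"2021-01-05","country":"US","name":"ABC","number":"9828",
 "add":[{"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
        {"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77832"},
        {"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
        {"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77834"}]}
EOF

# range(0; length; $size) yields the start index of each chunk; the other
# top-level keys are copied onto every chunk via $object.
jq -c --argjson size "$size" '
  del(.add) as $object
  | .add as $add
  | range(0; $add | length; $size) as $i
  | $object + {add: $add[$i:$i + $size]}
' input.json |
# One output file per line; close() keeps the number of open files at one.
awk '{ f = "part-" (++n) ".json"; print > f; close(f) }'
```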
The following assumes the input JSON is too large to read into memory and therefore uses jq's --stream command-line option.
To keep things simple, I'll focus on the "slicing" of the .add array and won't worry about the other keys, pretty-printing, or other details, as you can easily adapt the following to your needs:
< input.json jq -nc --stream --argjson size 2 '
  def regroup(stream; $n):
    foreach (stream, null) as $x ({a: []};
      if $x == null then .emit = .a
      elif .a | length == $n then .emit = .a | .a = [$x]
      else .emit = null | .a += [$x] end;
      select(.emit).emit);
  regroup(fromstream(2 | truncate_stream(inputs | select(.[0][0] == "add")));
          $size)' |
awk '{ fn++; print > (fn ".json") }'
This writes the arrays to files named N.json (1.json, 2.json, and so on).
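To see what regroup/2 does on its own, here is a small sketch that feeds it a plain stream of integers (range(1; 6), i.e. 1 through 5) instead of the --stream events; the helper definition is copied from the answer. One caveat: regroup uses null as its end-of-stream marker, so a stream that itself contains null would be mishandled.

```shell
#!/usr/bin/env bash
# regroup/2 from the answer, applied to the integers 1..5 in groups of 2.
out=$(jq -nc '
  def regroup(stream; $n):
    foreach (stream, null) as $x ({a: []};
      if $x == null then .emit = .a
      elif .a | length == $n then .emit = .a | .a = [$x]
      else .emit = null | .a += [$x] end;
      select(.emit).emit);
  regroup(range(1; 6); 2)
')
printf '%s\n' "$out"
```

This prints [1,2], [3,4] and finally the short group [5], one per line: a full group is only emitted when the next element arrives, and the trailing null flushes whatever is left.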

How to recursively merge inherited json array elements?

I have the following JSON file, named CMakePresets.json, which is a CMake presets file:
{
  "configurePresets": [
    {
      "name": "default",
      "hidden": true,
      "generator": "Ninja",
      "binaryDir": "${sourceDir}/_build/${presetName}",
      "cacheVariables": {
        "YIO_DEV": "1",
        "BUILD_TESTING": "1"
      }
    },
    {
      "name": "debug",
      "inherits": "default",
      "cacheVariables": {
        "CMAKE_BUILD_TYPE": "Debug"
      }
    },
    {
      "name": "release",
      "inherits": "default",
      "binaryDir": "${sourceDir}/_build/Debug",
      "cacheVariables": {
        "CMAKE_BUILD_TYPE": "Release"
      }
    },
    {
      "name": "arm",
      "inherits": "debug",
      "cacheVariables": {
        "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/Toolchain/arm-none-eabi-gcc.cmake"
      }
    }
  ]
}
For a specific entry name, I want to recursively merge (with *) the configurePresets elements along its inheritance chain. For example, for the node named arm, I want the resulting JSON object with its inheritance resolved. Each element stores its parent's name in .inherits: arm inherits from debug, which inherits from default.
With the help of "Remove a key:value from a JSON object using jq" and this answer, I wrote a bash shell loop that I believe works:
input=arm
# extract one element
g() { jq --arg name "$1" '.configurePresets[] | select(.name == $name)' CMakePresets.json; }
# get arm element
acc=$(g "$input")
# If .inherits field exists
while i=$(<<<"$acc" jq -r .inherits) && [[ -n "$i" && "$i" != "null" ]]; do
  # remove it from input
  a=$(<<<"$acc" jq 'del(.inherits)')
  # get parent element
  b=$(g "$i")
  # merge parent with current
  acc=$(printf "%s\n" "$b" "$a" | jq -s 'reduce .[] as $item ({}; . * $item)')
done
echo "$acc"
It outputs the following, which I believe is the expected output for arm:
{
  "name": "arm",
  "hidden": true,
  "generator": "Ninja",
  "binaryDir": "${sourceDir}/_build/${presetName}",
  "cacheVariables": {
    "YIO_DEV": "1",
    "BUILD_TESTING": "1",
    "CMAKE_BUILD_TYPE": "Debug",
    "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/Toolchain/arm-none-eabi-gcc.cmake"
  }
}
But I want to write it in jq. I tried, but the jq language is not intuitive for me. I can do it for a fixed number of elements, for example two:
< CMakePresets.json jq --arg name "arm" '
def g(n): .configurePresets[] | select(.name == n);
g($name) * (g($name) | .inherits) as $name2 | g($name2)
'
But I do not know how to write reduce .[] as $item ({}; . * $item) when each $item is really g($name) for a name that depends on the previous g($name) | .inherits. I read the jq manual about variables and loops, but jq's syntax is very different from what I am used to. I tried to use while, but I got a syntax error that I do not understand and do not know how to fix. I guess while and until might not be right here, because they operate on the previous iteration's output, whereas the elements always come from the root.
$ < CMakePresets.json jq --arg name "arm" 'def g(n): .configurePresets[] | select(.name == n);
while(g($name) | .inherits as $name; g($name))
'
jq: error: syntax error, unexpected ';', expecting '|' (Unix shell quoting issues?) at <top-level>, line 2:
while(g($name) | .inherits as $name; g($name))
jq: 1 compile error
How to write such loop in jq language?
Assuming the inheritance hierarchy contains no loops, as is the case with the example, we can break the problem down into the pieces shown below:
# Use an inner function of arity 0 to take advantage of jq's TCO
def inherits_from($dict):
  def from:
    if .name == "default" then .
    else $dict[.inherits] as $next
    | ., ($next | from)
    end;
  from;
def chain($start):
  INDEX(.configurePresets[]; .name) as $dict
  | $dict[$start] | inherits_from($dict);
reduce chain("arm") as $x (null;
  ($x.cacheVariables + .cacheVariables) as $cv
  | $x + .
  | .cacheVariables = $cv)
| del(.inherits)
This produces the desired output efficiently.
One advantage of the above formulation of a solution is that it can easily be modified to handle circular dependencies.
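Here is one possible way to add that cycle handling. It is my own variant, not code from the answer: carry a list of the names already visited, and stop instead of following an .inherits link back into that list. It also removes the hard-coded "default". The two presets below are hypothetical and inherit from each other. INDEX and IN require jq 1.6 or later.

```shell
#!/usr/bin/env bash
# Hypothetical presets forming a cycle: a -> b -> a.
presets='{"configurePresets": [
  {"name": "a", "inherits": "b", "cacheVariables": {"X": "a"}},
  {"name": "b", "inherits": "a", "cacheVariables": {"Y": "b"}}
]}'

chain=$(jq -c '
  def inherits_from($dict):
    # $seen: names already emitted; never follow a link back into it.
    def up($seen):
      ., ( .inherits as $p
           | select($p != null and ($p | IN($seen[]) | not))
           | $dict[$p]
           | select(. != null)          # ignore a dangling parent name
           | up($seen + [$p]) );
    up([.name]);
  INDEX(.configurePresets[]; .name) as $dict
  | [$dict["a"] | inherits_from($dict) | .name]
' <<<"$presets")
printf '%s\n' "$chain"
```

On the question's file, the same inherits_from/1 yields arm, debug, default. The output can feed the reduce shown above unchanged.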
Using recurse/1
inherits_from/1 could also be defined using the built-in function recurse/1:
def inherits_from($dict):
recurse( select(.name != "default") | $dict[.inherits]) ;
or perhaps more interestingly:
def inherits_from($dict):
recurse( select(.inherits) | $dict[.inherits]) ;
Using *
Using * to combine objects has a high overhead because of its recursive semantics, which are often either not required or not wanted. However, if it is acceptable here to use * for combining the objects, the above can be simplified to:
def inherits_from($dict):
recurse( select(.inherits) | $dict[.inherits]) ;
INDEX(.configurePresets[]; .name) as $dict
| $dict["arm"]
| reduce inherits_from($dict) as $x ({}; $x * .)
| del(.inherits)
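For a self-contained check, here is a minimal sketch that runs the recurse/1 + * variant on a trimmed, inlined copy of the question's CMakePresets.json. The "release" preset is omitted because it is not on arm's chain. Passing the preset name with --arg instead of hard-coding "arm" is my own adjustment.

```shell
#!/usr/bin/env bash
# Trimmed copy of CMakePresets.json (single quotes keep ${sourceDir} literal).
presets='{"configurePresets": [
  {"name": "default", "hidden": true, "generator": "Ninja",
   "binaryDir": "${sourceDir}/_build/${presetName}",
   "cacheVariables": {"YIO_DEV": "1", "BUILD_TESTING": "1"}},
  {"name": "debug", "inherits": "default",
   "cacheVariables": {"CMAKE_BUILD_TYPE": "Debug"}},
  {"name": "arm", "inherits": "debug",
   "cacheVariables": {"CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/Toolchain/arm-none-eabi-gcc.cmake"}}
]}'

resolved=$(jq -c --arg name arm '
  def inherits_from($dict):
    recurse(select(.inherits) | $dict[.inherits]);
  INDEX(.configurePresets[]; .name) as $dict
  | $dict[$name]
  # child first, then each ancestor; "$x * ." lets the child win on conflicts
  | reduce inherits_from($dict) as $x ({}; $x * .)
  | del(.inherits)
' <<<"$presets")
printf '%s\n' "$resolved"
```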
Writing a recursive function is actually simple, once you get the hang of it:
jq --arg name "$1" '
  def _get_in(input; n):
    (input[] | select(.name == n))
    | (if .inherits then .inherits as $n | _get_in(input; $n) else {} end) * .;
  def get(name):
    .configurePresets as $input | _get_in($input; name);
  get($name)
' "$presetfile"
First, get/1 passes only .configurePresets to the helper function. Inside _get_in, input[] | select(.name == n) picks out just the element I am interested in. If it has an .inherits key, .inherits as $n | _get_in(input; $n) takes the parent's name and calls the function recursively; otherwise, else {} end returns an empty object. That result is then merged (* .) with the element itself, i.e. the result of input[] | select(.name == n). So the recursion effectively evaluates {} * (input[]|select()) * (input[]|select()) * (input[]|select()), from the root ancestor down to the requested element.

Accessing nested values in JQ using path passed through command line

Is it possible to extract data from a JSON structure by passing the path as an argument to the command, and if so, how? For simplicity, I took this small snippet from a larger script, and I have trouble getting it to work:
#!/bin/bash
DATA='{
"level1": {
"level2": {
"level3": {
"foo": "bar",
"bar": "baz",
"baz": "bar"
}
}
}
}'
field="level1.level2.level3"
# does not work
jq -r --arg f ${field} '.[$f] | to_entries | .[] | "\"" + .key + "\"=\"" + .value + "\""' <<< ${DATA}
# works
jq -r --arg f ${field} '.level1.level2.level3 | to_entries | .[] | "\"" + .key + "\"=\"" + .value + "\""' <<< ${DATA}
# also works
field2="level3"
jq -r --arg f ${field2} '.level1.level2 | .[$f] | to_entries | .[] | "\"" + .key + "\"=\"" + .value + "\""' <<< ${DATA}
Gives the following output:
user@astra:~/test$ ./jqtest
jq: error (at <stdin>:12): null (null) has no keys
"foo"="bar"
"bar"="baz"
"baz"="bar"
"foo"="bar"
"bar"="baz"
"baz"="bar"
What am I doing wrong?
In this case .[$f] means "return the value associated with the key named level1.level2.level3". See:
$ jq --arg f 'level1.level2.level3' '.[$f]' <<< '{ "level1.level2.level3": "foo" }'
"foo"
As long as none of the path components contains a dot, splitting $f on dots and passing the result to getpath should work:
getpath($f / ".")
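Applied to the question's snippet, a sketch might look like this. As above, it assumes that no key in the path contains a dot. It also builds each key=value line with string interpolation instead of +.

```shell
#!/usr/bin/env bash
DATA='{"level1": {"level2": {"level3": {"foo": "bar", "bar": "baz", "baz": "bar"}}}}'
field="level1.level2.level3"

# $f / "." splits the string into ["level1","level2","level3"] for getpath.
out=$(jq -r --arg f "$field" '
  getpath($f / ".")
  | to_entries[]
  | "\"\(.key)\"=\"\(.value)\""
' <<<"$DATA")
printf '%s\n' "$out"
```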
jq does not interpret a string passed with --arg as a path: .[$f] always treats it as a single key.
One solution is to provide the path as a JSON array, and then use getpath to look up the value at that path:
field='["level1", "level2", "level3"]'
jq -r --argjson f "$field" 'getpath($f)' <<< "$DATA"
Or for your specific question:
field='["level1", "level2", "level3"]'
jq -r --argjson f "${field}" 'getpath($f) | to_entries | .[] | "\"" + .key + "\"=\"" + .value + "\""' <<< "$DATA"

Can I output boolean based on values in a list?

Edit: I used the solution provided by @peak to do the following:
$ jq -r --argjson whitelist '["role1", "role2"]' '
  select(has("roles") and any(.roles[]; . == "role1" or . == "role2"))
  | (reduce ."roles"[] as $r ({}; .[$r]=true)) as $roles
  | [.email, .username, .given_name, .family_name, ($roles[$whitelist[]] | . != null)]
  | @csv
' users.json
I added the select() to filter out users who haven't been onboarded yet and have no roles, and to ensure that every user in the output has at least one of the target roles.
Scenario: user profiles are JSON documents, where each profile has a list of the user's assigned roles. Example:
{
"username": "janedoe",
"roles": [
"role1",
"role4",
"role5"
]
}
The actual data file is an ndjson file, one user object as above per line.
I am only interested in specific roles, say role1, role3, and role4. I want to produce a CSV formatted as:
username,role1?,role3?,role4?
e.g.,
janedoe,true,false,true
The part I haven't figured out is how to output booleans (or Y/N) based on the values in the list. Is this something I can do in jq itself?
With your input, the invocation:
jq -r --argjson whitelist '["role1", "role3", "role4"]' '
  (["username"] + $whitelist),
  [.username, ($whitelist[] as $w | .roles | index([$w]) != null)]
  | @csv
'
produces:
"username","role1","role3","role4"
"janedoe",true,false,true
Notes:
The second-to-last line of the jq filter above could be shortened to:
[.username, (.roles | index($whitelist[]) != null)]
Presumably if there were more than one user, you'd only want the header row once, in which case the above solution would need to be tweaked.
Using IN/1
Because index/1 is not as efficient as it might be, you might like to consider this alternative:
(["username"] + $whitelist),
(.roles as $roles | [.username, ($whitelist[] | IN($roles[]))])
| @csv
Using a JSON dictionary
If the number of roles were very large, then it would probably be more efficient to construct a JSON dictionary to avoid repeated linear lookups:
(reduce .roles[] as $r ({}; .[$r]=true)) as $roles
| (["username"] + $whitelist),
  [.username, ($roles[$whitelist[]] != null)]
| @csv
With ndjson as input
For efficiency, and to ensure there's just one header, you could use inputs with the -n command-line option. Adding the extra fields mentioned in the revised question, you might end up with:
jq -nr --argjson whitelist '["role1", "role2"]' '
  ["email", "username", "given_name", "family_name"] as $greenlist
  | ($greenlist + $whitelist),
    (inputs
     | select(has("roles") and any(.roles[]; IN($whitelist[])))
     | (reduce ."roles"[] as $r ({}; .[$r]=true)) as $roles
     | [ .[$greenlist[]], ($roles[$whitelist[]] != null) ])
  | @csv
' users.json
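To try the ndjson approach without the real users.json, here is a sketch with three hypothetical user records: one with role1, one with only role3, and one with no roles yet. Only the first should pass the select(), and the header should appear exactly once. The membership test is written as any(.roles[]; IN($whitelist[])), which is true only when at least one role is whitelisted. IN/1 requires jq 1.6 or later.

```shell
#!/usr/bin/env bash
# Hypothetical ndjson records, one user per line.
csv=$(jq -nr --argjson whitelist '["role1", "role2"]' '
  ["email", "username", "given_name", "family_name"] as $greenlist
  | ($greenlist + $whitelist),
    (inputs
     | select(has("roles") and any(.roles[]; IN($whitelist[])))
     | (reduce .roles[] as $r ({}; .[$r] = true)) as $roles
     | [ .[$greenlist[]], ($roles[$whitelist[]] != null) ])
  | @csv
' <<'EOF'
{"email":"jane@example.com","username":"janedoe","given_name":"Jane","family_name":"Doe","roles":["role1","role4","role5"]}
{"email":"bob@example.com","username":"bob","given_name":"Bob","family_name":"Roe","roles":["role3"]}
{"username":"newbie"}
EOF
)
printf '%s\n' "$csv"
```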

Is there a fix for this Expression to get the key value pairs right

I have a Python script that prints the following output as a raw string:
{"A":"ab3241c","B":"d12e31234f","c":"[g$x>Q)M&.N+v8"}
I am using jq to set the values of A, B and c, something like this:
expression='. | to_entries | .[] | .key + "=\"" + .value + "\""'
eval "$(python script.py | jq -r "$expression")"
This works fine for A and B. For example:
echo $A
ab3241c
But the problem is with c, for which I get:
[g\u003eQ)M\u0026.N+v8
so $x has disappeared, and > and & are converted to Unicode escape sequences.
Can I fix the expression to avoid this?
I fixed it by using this expression:
expression='to_entries | map("\(.key)=\(.value | @sh)") | .[]'
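For reference, here is a minimal sketch of that @sh approach with one extra assumption: it skips any key that is not a valid shell variable name. @sh quotes only the value, so the key would otherwise reach eval unquoted. The JSON is inlined in place of the Python script's output, and all values are assumed to be strings (@sh rejects objects).

```shell
#!/usr/bin/env bash
# Stand-in for the Python script's output.
json='{"A":"ab3241c","B":"d12e31234f","c":"[g$x>Q)M&.N+v8"}'

# @sh single-quotes each value, so eval assigns $x, > and & literally.
eval "$(jq -r '
  to_entries[]
  | select(.key | test("^[A-Za-z_][A-Za-z0-9_]*$"))
  | "\(.key)=\(.value | @sh)"
' <<<"$json")"

printf '%s\n' "$A" "$B" "$c"
```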
Using eval here is like shooting yourself in the foot.
Why not just pipe the output of your python command to the jq command?
Consider:
function mypython {
cat <<"EOF"
{"A":"ab3241c","B":"d12e31234f","c":"[g$x>Q)M&.N+v8"}
EOF
}
expression='to_entries[] | .key + "=\"" + .value + "\""'
mypython | jq -r "$expression"
Note that storing the filter in the variable expression also seems pointless. In general, it would be better either to inline it or to put it in a file and use jq's -f command-line option.
(Notice also that the first "." in your expression is not needed.)
If you really must use eval, then consider:
function mypython {
cat <<"EOF"
{"A":"ab3241c","B":"d12e31234f","c":"[g$x>Q)M&.N+v8"}
EOF
}
expression='to_entries[] | .key + \"=\\\"\" + .value + \"\\\"\"'
eval "mypython | jq -r \"$expression\""
Output
A="ab3241c"
B="d12e31234f"
c="[g$x>Q)M&.N+v8"