Merge 2 YAML files with update in-place only - json

I need to merge two YAML files with yq or jq so that only the keys present in the map file are filled in from the data file. Let me explain my scenario.
data.yaml
data1:
  subkey1: subvalue1
  subkey2: subvalue2
data2: value2
data3:
  subkey3: subvalue3
map.yaml
data1:
data3:
  subkey3:
Expected outcome:
data1:
  subkey1: subvalue1
  subkey2: subvalue2
data3:
  subkey3: subvalue3
I searched but couldn't find a solution. Both the data file and the map file are expected to change over time. Is there any way to achieve this in jq or yq?

This produces the desired output and might be close to what you want:
gojq -n --yaml-input --yaml-output '
  input as $data
  | input as $map
  | reduce ($map | keys[]) as $k ({}; .[$k] = $data[$k])
' data.yaml map.yaml
Then again, maybe this is even closer:
gojq -n --yaml-input --yaml-output '
  input as $data
  | input as $map
  | reduce ($map | tostream | select(length==2)) as [$k, $_]
      ({}; setpath($k; $data | getpath($k)))
' data.yaml map.yaml
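If gojq is not at hand, the second approach can be tried with plain jq on JSON. The following is only a sketch: the two shell variables are hypothetical JSON equivalents of data.yaml and map.yaml (empty YAML values become null), and it assumes a jq with destructuring support (1.5 or later).

```shell
#!/usr/bin/env bash
# Hypothetical JSON equivalents of data.yaml and map.yaml from the question.
data='{"data1":{"subkey1":"subvalue1","subkey2":"subvalue2"},"data2":"value2","data3":{"subkey3":"subvalue3"}}'
map='{"data1":null,"data3":{"subkey3":null}}'

# tostream emits [path, leaf] pairs for every leaf of the map;
# for each such path, copy whatever the data holds at that path.
merged=$(jq -n -c --argjson data "$data" --argjson map "$map" '
  reduce ($map | tostream | select(length == 2)) as [$path, $leaf]
    ({}; setpath($path; $data | getpath($path)))
')
echo "$merged"
```

Because data2 never appears as a path in the map, it is left out of the result, which matches the expected outcome in the question.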

jq json object concatenation to bash string array

I want to use jq (or anything else when it's the wrong tool) to concatenate a json object like this:
{
  "https://github.com": {
    "user-one": {
      "repository-one": "version-one",
      "repository-two": "version-two"
    },
    "user-two": {
      "repository-three": "version-three",
      "repository-four": "version-four"
    }
  },
  "https://gitlab.com": {
    "user-three": {
      "repository-five": "version-five",
      "repository-six": "version-six"
    },
    "user-four": {
      "repository-seven": "version-seven",
      "repository-eight": "version-eight"
    }
  }
}
recursively to a bash string array like this:
(
"https://github.com/user-one/repository-one/archive/refs/heads/version-one.tar.gz"
"https://github.com/user-one/repository-two/archive/refs/heads/version-two.tar.gz"
"https://github.com/user-two/repository-three/archive/refs/heads/version-three.tar.gz"
"https://github.com/user-two/repository-four/archive/refs/heads/version-four.tar.gz"
"https://gitlab.com/user-three/repository-five/-/archive/version-five/repository-five-version-five.tar.gz"
"https://gitlab.com/user-three/repository-six/-/archive/version-six/repository-six-version-six.tar.gz"
"https://gitlab.com/user-four/repository-seven/-/archive/version-seven/repository-seven-version-seven.tar.gz"
"https://gitlab.com/user-four/repository-eight/-/archive/version-eight/repository-eight-version-eight.tar.gz"
)
for subsequent use in a loop.
for i in "${arr[@]}"
do
  echo "$i"
done
I have no idea how to do that.
As you can see, the values must be handled differently depending on the object name.
"https://github.com" + "/" + $user_name + "/" + $repository_name + "/archive/refs/heads/" + $version + ".tar.gz"
"https://gitlab.com" + "/" + $user_name + "/" + $repository_name + "/-/archive/" + $version + "/" + $repository_name + "-" + $version + ".tar.gz"
Could anyone help?
Easily done.
First, let's focus on the jq code alone:
to_entries[]                  # split items into keys and values
| .key as $site               # store first key in $site
| .value                      # code below deals with the value
| to_entries[]                # split that value into keys and values
| .key as $user               # store the key in $user
| .value                      # code below deals with the value
| to_entries[]                # split that value into keys and values
| .key as $repository_name    # store the key in $repository_name
| .value as $version          # store the value in $version
| if $site == "https://github.com" then
    "\($site)/\($user)/\($repository_name)/archive/refs/heads/\($version).tar.gz"
  else
    "\($site)/\($user)/\($repository_name)/-/archive/\($version)/\($repository_name)-\($version).tar.gz"
  end
That generates a list of lines. Reading lines into a bash array looks like readarray -t arrayname < ...datasource...
Thus, using a process substitution to redirect jq's stdout as if it were a file:
readarray -t uris < <(jq -r '
  to_entries[]
  | .key as $site
  | .value
  | to_entries[]
  | .key as $user
  | .value
  | to_entries[]
  | .key as $repository_name
  | .value as $version
  | if $site == "https://github.com" then
      "\($site)/\($user)/\($repository_name)/archive/refs/heads/\($version).tar.gz"
    else
      "\($site)/\($user)/\($repository_name)/-/archive/\($version)/\($repository_name)-\($version).tar.gz"
    end
  ' <config.json
)
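To see the whole round trip in one place, here is a small sketch that can be pasted into a terminal. It assumes bash 4 or later (for readarray) and uses a hypothetical, shortened config held in a shell variable instead of config.json; $repo is just a shorter name for $repository_name.

```shell
#!/usr/bin/env bash
# Hypothetical, shortened config with one repository per site.
config='{"https://github.com":{"user-one":{"repository-one":"version-one"}},
         "https://gitlab.com":{"user-three":{"repository-five":"version-five"}}}'

# jq emits one URL per line; readarray -t stores each line as one array element.
readarray -t uris < <(jq -r '
  to_entries[] | .key as $site | .value
  | to_entries[] | .key as $user | .value
  | to_entries[] | .key as $repo | .value as $version
  | if $site == "https://github.com"
    then "\($site)/\($user)/\($repo)/archive/refs/heads/\($version).tar.gz"
    else "\($site)/\($user)/\($repo)/-/archive/\($version)/\($repo)-\($version).tar.gz"
    end
' <<<"$config")

# Iterate over the array exactly as the question intends.
for uri in "${uris[@]}"; do
  echo "$uri"
done
```

The process substitution matters here: piping jq into readarray would run readarray in a subshell, and the array would be gone once the pipeline finishes.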
The basic task of generating the strings can be done efficiently and generically (i.e., without any limits on the depths of the basenames) using the jq filter:
paths(strings) as $p | $p + [getpath($p)] | join("/")
There are several ways to populate a bash array accordingly, but if you merely wish to iterate through the values, you could use a bash while loop, like so:
< input.json jq -r '
paths(strings) as $p | $p + [getpath($p)] | join("/")' |
while read -r line ; do
echo "$line"
done
You might also wish to consider using jq's @sh or @uri filter. For a jq urlencode function, see e.g.
https://rosettacode.org/wiki/URL_encoding#jq
(If the strings contain newlines or tabs, then the above would need to be tweaked accordingly.)

Can I output boolean based on values in a list?

Edit: I used the solution provided by @peak to do the following:
$ jq -r --argjson whitelist '["role1", "role2"]' '
  select(has("roles") and any(.roles[]; . == "role1" or . == "role2"))
  | (reduce ."roles"[] as $r ({}; .[$r]=true)) as $roles
  | [.email, .username, .given_name, .family_name, ($roles[$whitelist[]] | . != null)]
  | @csv
' users.json
Added the select() to filter out users who haven't onboarded yet and don't have any roles, and to ensure the users included in the output have at least one of the target roles.
Scenario: user profiles as JSON docs, where each profile has a list object with their assigned roles. Example:
{
  "username": "janedoe",
  "roles": [
    "role1",
    "role4",
    "role5"
  ]
}
The actual data file is an ndjson file, one user object as above per line.
I am only interested in specific roles, say role1, role3, and role4. I want to produce a CSV formatted as:
username,role1?,role3?,role4?
e.g.,
janedoe,true,false,true
The part I haven't figured out is how to output booleans or Y / N in response to the values in the list object. Is this something I can do in jq itself?
With your input, the invocation:
jq -r --argjson whitelist '["role1", "role3", "role4"]' '
  (["username"] + $whitelist),
  [.username, ($whitelist[] as $w | .roles | index([$w]) != null)]
  | @csv
'
produces:
"username","role1","role3","role4"
"janedoe",true,false,true
Notes:
The second last line of the jq filter above could be shortened to:
[.username, (.roles | index($whitelist[]) != null)]
Presumably if there were more than one user, you'd only want
the header row once, in which case the above solution
would need to be tweaked.
Using IN/1
Because index/1 is not as efficient as it might be,
you might like to consider this alternative:
(["username"] + $whitelist),
(.roles as $roles | [.username, ($whitelist[] | IN($roles[]) )])
| @csv
Using a JSON dictionary
If the number of roles was very large, then it would probably be more
efficient to construct a JSON dictionary to avoid repeated linear lookups:
(reduce .roles[] as $r ({}; .[$r]=true)) as $roles
| (["username"] + $whitelist),
  [.username, ($roles[$whitelist[]] != null)]
| @csv
With ndjson as input
For efficiency, and to ensure there's just one header, you could use inputs with the -n command-line option. Adding the extra fields mentioned in the revised Q, you might end up with:
jq -nr --argjson whitelist '["role1", "role2"]' '
  ["email", "username", "given_name", "family_name"] as $greenlist
  | ($greenlist + $whitelist),
    (inputs
     | select(has("roles") and any(.roles[] == $whitelist[]; .))
     | (reduce ."roles"[] as $r ({}; .[$r]=true)) as $roles
     | [ .[$greenlist[]], ($roles[$whitelist[]] != null) ])
  | @csv
' users.json
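To check the behaviour on a small sample, here is a sketch that feeds three hypothetical users (janedoe from the question plus two invented ones) through the same pipeline, reduced to the username column. Note that the second argument of any/2 is the condition applied to each generated value, so `.` is used here to require that at least one comparison is actually true.

```shell
#!/usr/bin/env bash
# Hypothetical ndjson sample: the second user has no whitelisted role,
# the third has not been onboarded and has no roles key at all.
users='{"username":"janedoe","roles":["role1","role4","role5"]}
{"username":"johndoe","roles":["role2"]}
{"username":"newuser"}'

csv=$(jq -nr --argjson whitelist '["role1", "role3", "role4"]' '
  (["username"] + $whitelist),
  (inputs
   | select(has("roles") and any(.roles[] == $whitelist[]; .))
   | (reduce .roles[] as $r ({}; .[$r] = true)) as $roles
   | [.username, ($roles[$whitelist[]] != null)])
  | @csv
' <<<"$users")
echo "$csv"
```

Only the header row and the janedoe row should come out, since the other two users are dropped by select.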

Zip lists in jq's objects construction by {} instead of multiplying them like default

I have a JSON object like this:
{"user":"stedolan","titles":["JQ Primer", "More JQ"],"years":[2013, 2016]}
I want to convert it so that the lists (assume they all have equal length N) are zipped, with output like this:
{"user":"stedolan","title":"JQ Primer","year":2013}
{"user":"stedolan","title":"More JQ","year":2016}
I followed the object construction ({}) example in the jq manual and tried:
tmp='{"user":"stedolan","titles":["JQ Primer", "More JQ"],"years":[2013, 2016]}'
echo $tmp | jq '{user, title: .titles[], year: .years[]}'
which output:
{"user":"stedolan","title":"JQ Primer","year":2013}
{"user":"stedolan","title":"JQ Primer","year":2016}
{"user":"stedolan","title":"More JQ","year":2013}
{"user":"stedolan","title":"More JQ","year":2016}
It produces N*N*... lines of output instead of N lines.
Any suggestion is appreciated!
transpose/0 can be used to zip the values together. A nice property of destructuring is that the zipped values can be bound to several variables at once.
([.titles,.years]|transpose[]) as [$title,$year] | {user,$title,$year}
If you want the results in an array rather than a stream, just wrap it all in [].
https://jqplay.org/s/ZIFU5gBnZ7
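For anyone who prefers to try this locally rather than on jqplay, here is a sketch of the array-wrapped variant, run against the sample object from the question. It assumes jq 1.5 or later, since it relies on destructuring and on the {$title} shorthand.

```shell
#!/usr/bin/env bash
tmp='{"user":"stedolan","titles":["JQ Primer", "More JQ"],"years":[2013, 2016]}'

# transpose turns [[t1,t2],[y1,y2]] into [[t1,y1],[t2,y2]];
# each pair is destructured into $title and $year, and the outer [] collects the stream.
zipped=$(jq -c '
  [ ([.titles, .years] | transpose[]) as [$title, $year]
    | {user, $title, $year} ]
' <<<"$tmp")
echo "$zipped"
```

This yields one object per title/year pair (N objects rather than N*N), because the two lists are paired up before the object is constructed.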
For a jq 1.4 compatible version, you'll have to rewrite it to not use destructuring but you could use the same transpose/0 implementation from the builtins.
transpose/0:
def transpose:
  if . == [] then []
  else . as $in
  | (map(length) | max) as $max
  | length as $length
  | reduce range(0; $max) as $j
      ([]; . + [reduce range(0;$length) as $i ([]; . + [ $in[$i][$j] ] )] )
  end;
Here's an alternative implementation that I cooked up that should also be compatible. :)
def transpose2:
  length as $cols
  | (map(length) | max) as $rows
  | [range(0;$rows) as $r | [.[range(0;$cols)][$r]]];
([.titles,.years]|transpose[]) as $p | {user,title:$p[0],year:$p[1]}
If you want the output to have the keys in the order indicated in the Q, then the solution is a bit trickier than would otherwise be the case.
Here's one way to retain the order:
with_entries( .key |= (if . == "titles" then "title" elif . == "years" then "year" else . end) )
| range(0; .title|length) as $i
| .title |= .[$i]
| .year |= .[$i]
The (potential) advantage of this approach is that one does not have to mention any of the other keys.

jq add value of a key in nested array and given to a new key

I have a stream of JSON arrays like this
[{"id":"AQ","Count":0}]
[{"id":"AR","Count":1},{"id":"AR","Count":3},{"id":"AR","Count":13},
{"id":"AR","Count":12},{"id":"AR","Count":5}]
[{"id":"AS","Count":0}]
I want to use jq to get a new json like this
{"id":"AQ","Count":0}
{"id":"AR","Count":34}
{"id":"AS","Count":0}
34=1+3+13+12+5 which are in the second array.
I don't know how to describe it in detail. But the basic idea is shown in my example.
I use bash and prefer to use jq to solve this problem. Thank you!
If you want an efficient but generic solution that does NOT assume each input array has the same ids, then the following helper function makes a solution easy:
# Input: a JSON object representing the subtotals
# Output: the object augmented with additional subtotals
def adder(stream; id; filter):
reduce stream as $s (.; .[$s|id] += ($s|filter));
Assuming your jq has inputs, then the most efficient approach is to use it (but remember to use the -n command-line option):
reduce inputs as $row ({}; adder($row[]; .id; .Count) )
This produces:
{"AQ":0,"AR":34,"AS":0}
From here, it's easy to get the answer you want, e.g. using to_entries[] | {id: .key, Count: .value}
If your jq does not have inputs and if you don't want to upgrade, then use the -s option (instead of -n) and replace inputs by .[]
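As a sketch of that -s variant, the following runs the adder helper over the three sample arrays from the question. The input variable is simply those arrays held in a shell string; nothing else is assumed beyond bash and jq.

```shell
#!/usr/bin/env bash
# The three sample arrays from the question, one JSON array per line.
input='[{"id":"AQ","Count":0}]
[{"id":"AR","Count":1},{"id":"AR","Count":3},{"id":"AR","Count":13},{"id":"AR","Count":12},{"id":"AR","Count":5}]
[{"id":"AS","Count":0}]'

# -s slurps all inputs into one outer array, so .[] takes the place of inputs.
totals=$(jq -s -c '
  def adder(stream; id; filter):
    reduce stream as $s (.; .[$s|id] += ($s|filter));
  reduce .[] as $row ({}; adder($row[]; .id; .Count))
' <<<"$input")
echo "$totals"   # {"AQ":0,"AR":34,"AS":0}
```

The trade-off is memory: -s holds every input array at once, whereas inputs with -n processes them one at a time.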
Assuming the .id is the same in each array:
first + {Count: map(.Count) | add}
Or perhaps more intelligibly:
(map(.Count) | add) as $sum | first | .Count = $sum
Or more declaratively:
{ id: (first|.id), Count: (map(.Count) | add) }
It's a bit kludgey, but given your input:
jq -c '
reduce .[] as $item ({}; .[($item.id)] += ($item.Count))
| to_entries
| .[] | {"id": .key, "Count": .value}
'
Yields the output:
{"id":"AQ","Count":0}
{"id":"AR","Count":34}
{"id":"AS","Count":0}

How to map an object to arrays so it can be converted to csv?

I'm trying to convert an object that looks like this:
{
"123" : "abc",
"231" : "dbh",
"452" : "xyz"
}
To csv that looks like this:
"123","abc"
"231","dbh"
"452","xyz"
I would prefer to use the command line tool jq but can't seem to figure out how to do the assignment. I managed to get the keys with jq '. | keys' test.json but couldn't figure out what to do next.
The problem is that you can't convert a key/value object like this straight into CSV with @csv. @csv needs an array as input, so the object has to be converted to arrays first. If the keys were fixed, it would be simple, but they're dynamic, so it's not so easy.
Try this filter:
to_entries[] | [.key, .value]
to_entries converts an object to an array of key/value objects. [] breaks that array up into its individual items.
Then each item is converted to an array containing its key and value.
This produces the following output:
[
  "123",
  "abc"
]
[
  "231",
  "dbh"
]
[
  "452",
  "xyz"
]
Then you can use the @csv filter to convert the rows to CSV rows.
$ echo '{"123":"abc","231":"dbh","452":"xyz"}' | jq -r 'to_entries[] | [.key, .value] | @csv'
"123","abc"
"231","dbh"
"452","xyz"
Jeff's answer is a good starting point; here is something closer to what you expect:
cat input.json | jq 'to_entries | map([.key, .value]|join(","))'
[
"123,abc",
"231,dbh",
"452,xyz"
]
But I did not find a way to join the rows with newlines; without the -r option, jq prints the result as a single JSON string with escaped \n:
cat input.json | jq 'to_entries | map([.key, .value]|join(","))|join("\n")'
"123,abc\n231,dbh\n452,xyz"
Here's an example I ended up using this morning (processing PagerDuty alerts):
cat /tmp/summary.json | jq -r '
  .incidents
  | map({desc: .trigger_summary_data.description, id:.id})
  | group_by(.desc)
  | map(length as $len
        | {desc:.[0].desc, length: $len})
  | sort_by(.length)
  | map([.desc, .length] | @csv)
  | join("\n") '
This dumps a CSV document that looks something like:
"[Triggered] Something annoyingly frequent",31
"[Triggered] Even more frequent alert!",35
"[No data] Stats Server is probably acting up",55
Try this; it gives the output you want:
echo '{"123":"abc","231":"dbh","452":"xyz"}' | jq -r 'to_entries | .[] | "\"" + .key + "\",\"" + (.value | tostring)+ "\""'
onecol2txt () {
awk 'BEGIN { RS="_end_"; FS="\n"}
{ for (i=2; i <= NF; i++){
printf "%s ",$i
}
printf "\n"
}'
}
cat jsonfile | jq -r -c '....,"_end_"' | onecol2txt