How to perform a check on multiple sub-trees - json

What I'm trying to do is, within each environment, compare the mainAccount and secondAccount values.
If they do match, then I will trigger some downstream code to check the file version. If they do not, then I will pass. That part is not really relevant, though: the struggle is comparing the values within each environment, since each .json file can have a different number of environments.
Meaning, in the testing environment I want to check whether mainAccount = secondAccount, and the same in the production environment.
I'm running into issues parsing this JSON with jq:
json1
{
"file_version": 1.0,
"config": [
{
"environment": "testing",
"main": [
{
"mainAccount": "123"
}
],
"second": [
{
"secondAccount": "456"
}
]
},
{
"environment": "production",
"main": [
{
"mainAccount": "789"
}
],
"second": [
{
"secondAccount": "789"
}
]
}
]
}
Here's another sample .json file for comparison:
json2
{
"file_version": 1.3,
"config": [
{
"environment": "testing",
"main": [
{
"mainAccount": "123"
}
],
"second": [
{
"secondAccount": "456"
}
]
},
{
"environment": "production",
"main": [
{
"mainAccount": "789"
}
],
"second": [
{
"secondAccount": "789"
}
]
},
{
"environment": "pre-production",
"main": [
{
"mainAccount": "456"
}
],
"second": [
{
"secondAccount": "789"
}
]
},
{
"environment": "staging",
"main": [
{
"mainAccount": "234"
}
],
"second": [
{
"secondAccount": "456"
}
]
}
]
}
If I run this command:
jq -r '.config[] | select(.main != null) | .main[].mainAccount'
My output is:
123
789
If I store this output in a variable, it'll be 123 789, so comparing this to the "secondAccount" value is troublesome.
I think what I'm looking for here is iteration; however, I'm not sure how to implement it. I wanted to take a pythonic approach: check the length of the config array, create a for loop over that range, then collect each value based on an index, like
.config[0] | select(.main != null) | .main[].mainAccount
.config[1] | select(.main != null) | .main[].mainAccount
etc. The issue, however, is that when I read the .config[] value into a bash variable, bash doesn't interpret it like that: the length will be the number of characters, not the number of objects in the array.
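For what it's worth, the number of objects can be obtained with jq's length builtin, so the index-based loop I had in mind looks roughly like this (a sketch, assuming the file is named file.json); it works, but it re-reads the file on every iteration:

len=$(jq '.config | length' file.json)   # number of objects, not characters
for ((i = 0; i < len; i++)); do
  jq -r --argjson i "$i" \
    '.config[$i] | select(.main != null) | .main[].mainAccount' file.json
done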
EXPECTED OUTPUT
Nothing. I simply want to, for each .json file above, compare the mainAccount and secondAccount values with each other, within each environment.
In json1, I want to compare mainAccount == secondAccount in environment: testing, then mainAccount == secondAccount in environment: production.
Then move on to json2 and compare mainAccount == secondAccount in environment: testing, then in production, pre-production, staging, and so on.

Since all the information is within this one JSON file, it is better to do as much of the processing as possible in jq and to keep the shell out of it.
Given your input you can try this jq:
jq '
.config[]
| {
environment,
condition: (.main[0].mainAccount == .second[0].secondAccount)
}' input.json
The result is:
{
"environment": "testing",
"condition": false
}
{
"environment": "production",
"condition": true
}
Some questions though:
Why are the values of main and second arrays of objects and not plain objects?
Is it really intended to match the first one of both?
Can there be more items in the arrays?
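If there can be, a variant that compares all the accounts per environment rather than just the first elements might look like this (a sketch, assuming the order of the accounts should not matter):

jq '
.config[]
| {
environment,
condition: (([.main[].mainAccount] | sort) == ([.second[].secondAccount] | sort))
}' input.json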
Also: If you want to process the results in a shell, I propose this expression because the output can be used (source or eval) in a shell:
jq -r '
.config[]
| "\(.environment)=\(.main[0].mainAccount == .second[0].secondAccount)"' input.json
The output is:
testing=false
production=true
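Alternatively, a small bash loop can consume these lines directly (a sketch; the echo stands in for whatever downstream file-version check you trigger):

while IFS='=' read -r env cond; do
  if [ "$cond" = true ]; then
    echo "$env: accounts match, checking file version..."
  fi
done < <(jq -r '
.config[]
| "\(.environment)=\(.main[0].mainAccount == .second[0].secondAccount)"' input.json)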

You can do the comparison within jq, return the boolean result as its exit status using the -e option, and react upon that in bash, e.g. using an if statement.
if jq -e '
.config | map(select(.main != null) | .main[].mainAccount) | .[0] == .[1]
' file.json >/dev/null
then echo "equal"
else echo "not equal"
fi
not equal
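The same idea extends to several files with a per-file verdict; a sketch (hypothetical file names json1.json and json2.json), using jq's all builtin so that -e reports whether every environment's accounts match:

for f in json1.json json2.json; do
  if jq -e '
    [.config[] | .main[0].mainAccount == .second[0].secondAccount] | all
  ' "$f" >/dev/null
  then echo "$f: accounts match in every environment"
  else echo "$f: at least one environment differs"
  fi
done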

Related

Merging multiple JSON Lines files into a single JSON object

I'm trying to merge / reduce many JSON objects and somehow I'm not getting the expected result.
I'm only interested in getting all keys; the values and the number of items inside arrays are irrelevant.
file1.json:
{
"customerId": "xx",
"emails": [
{
"address": "james#zz.com",
"customType": "",
"type": "custom"
},
{
"address": "sales#x.com",
"primary": true
},
{
"address": "info#x.com"
}
]
}
{
"id": "654",
"emails": [
{
"address": "peter#x.com",
"primary": true
}
]
}
The desired output is a JSON object with all possible keys from all input objects. The values are irrelevant; any value from any input object is OK. But all keys from the input objects must be present in the output object:
{
"emails": [
{
"address": "james#zz.com", <--- any existing value works
"customType": "", <--- any existing value works
"type": "custom", <--- any existing value works
"primary": true <--- any existing value works
}
],
"customerId": "xx", <--- any existing value works
"id": "654" <--- any existing value works
}
I tried reducing it, but it misses many of the keys in the array:
$ jq -s 'reduce .[] as $item ({}; . + $item)' file1.json
{
"customerId": "xx",
"emails": [
{
"address": "peter#x.com",
"primary": true
}
],
"id": "654"
}
The structure of the objects contained in file1.json is unknown, so the solution must be agnostic of any keys/values and the solution must not assume any structure or depth.
Is it possible to fix this somehow considering how jq works? Or is it possible to solve this issue using another tool?
PS: For those of you that are curious, this is useful to infer a schema that can be created in a database. Given an arbitrary number of JSON objects with an arbitrary structure, it's easy to create a single JSON squished/merged/fused structure that will "accommodate" all JSON objects.
BigQuery is able to autodetect a schema, but only 500 lines are analyzed to come up with it. This presents problems if objects have different structures past that 500 line mark.
With this approach I can squish a JSON Lines file with 1000000s of objects into one line that can then be imported into BigQuery with the autodetect schema flag, and it will work every time, since BigQuery only has one line to analyze and this line is the "super-schema" of all the objects. After extracting the autodetected schema I can manually fine-tune it to make sure the types are correct, and then recreate the table specifying my tuned schema:
$ ls -1 users*.json | wc --lines
3672
$ cat users*.json > users-all.json
$ cat users-all.json | wc --lines
146482633
$ jq 'squish' users-all.json > users-all-squished.json
$ cat users-all-squished.json | wc --lines
1
$ bq load --autodetect users users-all-squished.json
$ bq show schema --format=prettyjson users > users-schema.json
$ vi users-schema.json
$ bq rm --table users
$ bq mk --table users --schema=users-schema.json
$ bq load users users-all.json
[Some options are missing or changed for readability]
Here is a solution that produces the expected result in the sample example, and seems to meet all the stated requirements. It is similar to one proposed by @pmf on this page.
jq -n --stream '
def squish: map(if type == "number" then 0 else . end);
reduce (inputs | select(length==2)) as [$p, $v] ({}; setpath($p|squish; $v))
'
Output
For the example given in the Q, the output is:
{
"customerId": "xx",
"emails": [
{
"address": "peter#x.com",
"customType": "",
"type": "custom",
"primary": true
}
],
"id": "654"
}
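For reference, --stream decomposes the input into [path, value] leaf events and [path] events that mark the end of an array or object; select(length==2) keeps only the leaf events that carry a value. A quick way to inspect this (using file1.json from above):

$ jq -c --stream '.' file1.json | head -4
[["customerId"],"xx"]
[["emails",0,"address"],"james@zz.com"]
[["emails",0,"customType"],""]
[["emails",0,"type"],"custom"]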
As @peak has pointed out, some aspects are underspecified. For instance, what should happen with .customerId and .id? Are they always the same across all files (as suggested by the sample files provided)? Do you want the items of the .emails array just thrown into one large array, or do you want to have them "merged" by some criteria (e.g. by a common value in their .address field)? Here are some stubs to start from:
Simply concatenate the .emails arrays and take all other parts from the first file:
jq 'reduce inputs as $in (.; .emails += $in.emails)' file*.json
# or simpler
jq '.emails += [inputs.emails[]]' file*.json
{
"emails": [
{
"address": "cc@xx.com"
},
{
"address": "james@zz.com",
"customType": "",
"type": "custom"
},
{
"address": "james@x.com"
},
{
"address": "sales@x.com",
"primary": true
},
{
"address": "info@x.com"
},
{
"address": "james@x.com"
},
{
"address": "sales@x.com",
"primary": true
},
{
"address": "info@x.com"
}
],
"customerId": "xx",
"id": "654"
}
Merge the objects in the .emails array by a common value in their .address field, with latter values overwriting former values for other fields with colliding names, and discard all other parts from the files:
jq -n 'reduce inputs.emails[] as $e ({}; .[$e.address] += $e) | map(.)' file*.json
[
{
"address": "cc@xx.com"
},
{
"address": "james@zz.com",
"customType": "",
"type": "custom"
},
{
"address": "james@x.com"
},
{
"address": "sales@x.com",
"primary": true
},
{
"address": "info@x.com"
}
]
If you are only interested in a list of unique field names for a given address, regardless of the counts and values used, you can also go with:
jq -n '
reduce inputs.emails[] as $e ({}; .[$e.address][$e | keys_unsorted[]] = 1)
| map_values(keys)
'
{
"cc@xx.com": [
"address"
],
"james@zz.com": [
"address",
"customType",
"type"
],
"james@x.com": [
"address"
],
"sales@x.com": [
"address",
"primary"
],
"info@x.com": [
"address"
]
}
The structure of the objects contained in file1.json is unknown, so the solution must be agnostic of any keys/values and the solution must not assume any structure or depth.
You can use the --stream flag to break down the structure into an array of paths and values, discard the values part and make the paths unique:
jq --stream -nc '[inputs[0]] | unique[]' file*.json
["customerId"]
["emails"]
["emails",0,"address"]
["emails",0,"customType"]
["emails",0,"primary"]
["emails",0,"type"]
["emails",1,"address"]
["emails",2]
["emails",2,"address"]
["emails",2,"primary"]
["emails",3]
["emails",3,"address"]
["id"]
Trying to build a representation of this, similar to any of the input files, comes with a lot of caveats. For instance, how would you represent it in a single structure if one file had .emails as an array of objects, and another had .emails as just an atomic value, say, a string? You would not be able to represent this plurality without introducing new, possibly ambiguous structures (e.g. putting all possibilities into an array).
Therefore, having a list of paths could be a fair compromise. Judging by your desired output, you want to focus more on the object structure, so you could further reduce complexity by discarding the array indices. Depending on your use case, you could replace them with a single value to retain the information of the presence of an array, or discard them entirely:
jq --stream -nc '[inputs[0] | map(numbers = 0)] | unique[]' file*.json
["customerId"]
["emails"]
["emails",0]
["emails",0,"address"]
["emails",0,"customType"]
["emails",0,"primary"]
["emails",0,"type"]
["id"]
jq --stream -nc '[inputs[0] | map(strings)] | unique[]' file*.json
["customerId"]
["emails"]
["emails","address"]
["emails","customType"]
["emails","primary"]
["emails","type"]
["id"]
The following program meets these two key requirements:
"all keys from input objects must be present in output object";
"the solution must be agnostic of any keys/values and the solution must not assume any structure or depth."
The approach is the same as one suggested by @pmf, and for the example given in the Q, produces results that are very similar to the one that is shown:
jq -n --stream '
def squish: map(select(type == "string"));
reduce (inputs | select(length==2)) as [$p, $v] ({};
setpath($p|squish; $v))
'
With the given input, this produces:
{
"customerId": "xx",
"emails": {
"address": "peter#x.com",
"customType": "",
"type": "custom",
"primary": true
},
"id": "654"
}

Integrate json values into another file

I'm trying to update an existing json file from values in another json file using jq in a bash shell.
I've got a settings json file
{
"Logging": {
"MinimumLevel": {
"Default": "Information",
"Override": "Warning"
},
"WriteTo": [
{
"Name": "File",
"Args": {
"path": "./logs/log-.txt",
"rollingInterval": "Day"
}
}
]
},
"Settings": {
"DataServerUrl": "https://address.to.server.com",
"ServerKey": "1f969476798adfe95114dd28ed3a3ff"
"ServerTimeZone": "Mountain Standard Time",
"MaxOccupantCount": 6
}
}
In an integration step, I'm attempting to incorporate values for specific environments (think dev/staging/prod) from an external json file with limited setting values. An example of such a file is
{
"DataServerUrl": "https://dev.server.addr.com",
"ServerKey": "2a4d99233efea456b95114aa23ed342ae"
}
I can get to the data using jq. I can update the data using jq if I hard-code the updates. I'm looking for something general to take in any environment settings values and update them in the base settings file. My searches suggest I can do this in a single step without knowing the specific values. A command similar to
jq -r 'to_entries[]' settings.dev.json |
while IFS= read -r key value; do
jq -r '.[$key] |= [$value]' settings.json
done
What happens is I get error messages stating jq: error: $key is not defined at <top-level> (as well as the same message for $value). The messages appear several times in pairs. settings.json is not changed. Now, this makes partial sense because the output from just jq -r 'to_entries[]' settings.dev.json looks like (empty space in this output is included as produced by the command).
"key": "DataServerUrl",
"value": "https://dev.server.addr.com"
"key": "ServerKey",
"value": "2a4d99233efea456b95114aa23ed342ae"
How do I go about iterating over the values in the environment settings file such that I can use those values to update the base settings file for further processing (i.e., publishing to the target environment)?
The simplest way is to provide both files and address the second one using input. That way, all you need is the assignment:
jq '.Settings = input' settings.json insert.json
{
"Logging": {
"MinimumLevel": {
"Default": "Information",
"Override": "Warning"
},
"WriteTo": [
{
"Name": "File",
"Args": {
"path": "./logs/log-.txt",
"rollingInterval": "Day"
}
}
]
},
"Settings": {
"DataServerUrl": "https://dev.server.addr.com",
"ServerKey": "2a4d99233efea456b95114aa23ed342ae"
}
}
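Note that = replaces the whole Settings object, which is why ServerTimeZone and MaxOccupantCount are gone from the output above. If the keys missing from insert.json should survive, a merging variant (a sketch) would be:

jq '.Settings += input' settings.json insert.json

Here + merges the two objects, with insert.json winning on colliding keys; for nested settings objects, * would perform a recursive (deep) merge instead.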
You could do something like
jq -s '.[1] as $insert | .[0].Settings |= $insert | .[0]' settings.json insert.json
Where we:
slurp both files
save insert.json in a variable called $insert
update (|=) .[0].Settings with $insert
show only the first file, .[0]
So the output will become:
{
"Logging": {
"MinimumLevel": {
"Default": "Information",
"Override": "Warning"
},
"WriteTo": [
{
"Name": "File",
"Args": {
"path": "./logs/log-.txt",
"rollingInterval": "Day"
}
}
]
},
"Settings": {
"DataServerUrl": "https://dev.server.addr.com",
"ServerKey": "2a4d99233efea456b95114aa23ed342ae"
}
}
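If you prefer to name the environment file explicitly instead of relying on argument order, --slurpfile achieves the same result (a sketch; $env[0] is the first, and here only, JSON value in settings.dev.json):

jq --slurpfile env settings.dev.json '.Settings = $env[0]' settings.json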

jq - return array value if its length is not null

I have a report.json generated by a gitlab pipeline.
It looks like:
{"version":"14.0.4","vulnerabilities":[{"id":"64e69d1185ecc48a1943141dcb6dbd628548e725f7cef70d57403c412321aaa0","category":"secret_detection"....and so on
If no vulnerabilities are found, then "vulnerabilities":[]. I'm trying to come up with a bash script that checks whether the vulnerabilities array is empty or not. If it's not, print the value of the vulnerabilities key. Sadly, I'm very far from being a scripting genius, so it's been a struggle.
While searching the web for a solution to this, I've come across jq. It seems like select() should do the job.
I've tried:
jq "select(.vulnerabilities!= null)" report.json
but it returned {"version":"14.0.4","vulnerabilities":[{"id":"64e69d1185ecc48a194314... instead of the expected "vulnerabilities":[{"id":"64e69d1185ecc48a194314...
and
map(select(.vulnerabilities != null)) report.json
returns "No matches found"
Would you mind pointing out what's wrong apart from my 0 experience with bash and JSON parsing? :)
Thanks in advance
Just use the .vulnerabilities filter to extract the value under the vulnerabilities key.
Here are some cases below.
$ jq '.vulnerabilities' <<END
heredoc> {"version":"14.0.4","vulnerabilities":[{"id":"64e69d1185ecc48a1943141dcb6dbd628548e725f7cef70d57403c412321aaa0","category":"secret_detection"}]}
heredoc> END
[
{
"id": "64e69d1185ecc48a1943141dcb6dbd628548e725f7cef70d57403c412321aaa0",
"category": "secret_detection"
}
]
If vulnerabilities is null, then jq will return null:
$ jq '.vulnerabilities' <<END
{"version":"14.0.4","vulnerabilities":null}
END
null
Then, with a pipe |, you can change it to any output you want:
change null to []: .vulnerabilities | if . == null then [] else . end
filter out an empty array: .vulnerabilities | select(length > 0)
For further information about jq filters, you can read the jq manual.
Assuming, by "print the value of the vulnerabilities key" you mean the value of an item's id field. You can retrieve it using .id and have it extracted to bash with the -r option.
If in case the array is not empty you want all of the "keys", iterate over the array using .[]. If you just wanted a specific key, let's say the first, address it using a 0-based index: .[0].
To check the length of an array there is a dedicated length builtin. However, as your final goal is to extract, you can also attempt to do so right anyway, suppress a potential unreachability error using the ? operator, and have your bash script read an appropriate exit status using the -e option.
Your bash script could then include the following snippet:
if key=$(jq -re '.vulnerabilities[0].id?' report.json)
then
# If the array was not empty, $key contains the first key
echo "There is a vulnerability in key $key."
fi
# or
if keys=$(jq -re '.vulnerabilities[].id?' report.json)
then
# If the array was not empty, $keys contains all the keys
for k in $keys
do echo "There is a vulnerability in key $k."
done
fi
Firstly, please note that in the JSON world, it is important to distinguish
between [] (the empty array), the values 0 and null, and the absence of a value (e.g. as the result of the absence of a key in an object).
In the following, I'll assume that the output should be the value of .vulnerabilities if it is not [], or nothing otherwise:
< sample.json jq '
select(.vulnerabilities != []).vulnerabilities
'
If the goal were to differentiate between two cases based on the return code from jq, you could use the -e command-line option.
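Combined in a script, that could look like this (a sketch, assuming the file is named report.json); with -e, jq exits non-zero when select produces no output, so the shell branch falls out naturally:

if vulns=$(jq -e 'select(.vulnerabilities != []).vulnerabilities' report.json); then
  printf '%s\n' "$vulns"    # non-empty: print the array
else
  echo "no vulnerabilities found" >&2
fi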
You can use if-then-else.
Filter
if (.vulnerabilities | length) > 0 then {vulnerabilities} else empty end
Input
{
"version": "1.1.1",
"vulnerabilities": [
{
"id": "111",
"category": "secret_detection"
},
{
"id": "112",
"category": "secret_detection"
}
]
}
{
"version": "1.2.1",
"vulnerabilities": [
{
"id": "121",
"category": "secret_detection 2"
}
]
}
{
"version": "3.1.1",
"vulnerabilities": []
}
{
"version": "4.1.1",
"vulnerabilities": [
{
"id": "411",
"category": "secret_detection 4"
},
{
"id": "412",
"category": "secret_detection"
},
{
"id": "413",
"category": "secret_detection"
}
]
}
Output
{
"vulnerabilities": [
{
"id": "111",
"category": "secret_detection"
},
{
"id": "112",
"category": "secret_detection"
}
]
}
{
"vulnerabilities": [
{
"id": "121",
"category": "secret_detection 2"
}
]
}
{
"vulnerabilities": [
{
"id": "411",
"category": "secret_detection 4"
},
{
"id": "412",
"category": "secret_detection"
},
{
"id": "413",
"category": "secret_detection"
}
]
}
Demo: https://jqplay.org/s/wicmr4uVRm

jq output is empty when tag name does not exist

When I run the jq command to parse a json document from the amazon cli I have the following problem.
I'm parsing out the IP address and a tag called "Environment". The Environment tag does not exist on one of the instances, and therefore the command doesn't give me any result for it.
Here's an example of the relevant output returned by the AWS CLI:
{
"Reservations": [
{
"Instances": [
{
"PrivateIpAddress": "10.0.0.1",
"Tags": [
{
"Key": "Name",
"Value": "Balance-OTA-SS_a"
},
{
"Key": "Environment",
"Value": "alpha"
}
]
}
]
},
{
"Instances": [
{
"PrivateIpAddress": "10.0.0.2",
"Tags": [
{
"Key": "Name",
"Value": "Balance-OTA-SS_a"
}
]
}
]
}
]
}
I’m running the following command
aws ec2 describe-instances --filters "Name=tag:Name,Values=Balance-OTA-SS_a" | jq -c '.Reservations[].Instances[] | ({IP: .PrivateIpAddress, Ambiente: (.Tags[]|select(.Key=="Environment")|.Value)})'
## output
empty
How do I show the IP address in the output of the command even if the Environment tag does not exist?
Regards,
Let's assume this input:
{
"Reservations": [
{
"Instances": [
{
"PrivateIpAddress": "10.0.0.1",
"Tags": [
{
"Key": "Name",
"Value": "Balance-OTA-SS_a"
},
{
"Key": "Environment",
"Value": "alpha"
}
]
}
]
},
{
"Instances": [
{
"PrivateIpAddress": "10.0.0.2",
"Tags": [
{
"Key": "Name",
"Value": "Balance-OTA-SS_a"
}
]
}
]
}
]
}
This is the format returned by describe-instances, but with all the irrelevant fields removed.
Note that tags is always a list of objects, each of which has a Key and a Value. This format is perfect for from_entries, which can transform this list of tags into a convenient mapping object. Try this:
.Reservations[].Instances[] |
{
IP: .PrivateIpAddress,
Ambiente: (.Tags|from_entries.Environment)
}
{"IP":"10.0.0.1","Ambiente":"alpha"}
{"IP":"10.0.0.2","Ambiente":null}
That answers how to do it. But you probably want to understand why your approach didn't work.
.Reservations[].Instances[] |
{
IP: .PrivateIpAddress,
Ambiente: (.Tags[]|select(.Key=="Environment")|.Value)
}
The .[] filter you're using on the tags can return zero or multiple results. Similarly, the select filter can eliminate some or all items. When you apply this inside an object constructor (the expression from { to }), you're causing that whole object to be created a variable number of times. You need to be very careful where you use these filters, because often that's not what you want at all. Often you instead want to do one of the following:
Wrap the expression that returns multiple results in an array constructor [ ... ]. That way instead of outputting the parent object potentially zero or multiple times, you output it once containing an array that potentially has zero or multiple items. E.g.
[.Tags[]|select(.Key=="Environment")]
Apply map to the array to keep it an array but process its contents, e.g.
.Tags|map(select(.Key=="Environment"))
Apply first(expr) to capture only the first value emitted by the expression. If the expression might emit zero items, you can use the comma operator to provide a default, e.g.
first((.Tags[]|select(.Key=="Environment")),null)
Apply some other array-level function, such as from_entries.
.Tags|from_entries.Environment
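Putting that last variant back into the original pipeline might look like this (a sketch; .Tags // [] is an extra guard of mine, in case an instance carries no Tags key at all):

aws ec2 describe-instances --filters "Name=tag:Name,Values=Balance-OTA-SS_a" \
| jq -c '.Reservations[].Instances[]
  | {IP: .PrivateIpAddress, Ambiente: (.Tags // [] | from_entries.Environment)}'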
You can either use an if ... then ... else ... end construct, or //. For example:
.Reservations[].Instances[]
| {IP: .PrivateIpAddress} +
({Ambiente: (.Tags[]|select(.Key=="Environment")|.Value)}
// null)
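When the tag is missing, the inner expression emits nothing, so {Ambiente: ...} is never built and // falls back to null; adding null to the {IP: ...} object then simply leaves the key out. If you would rather keep an explicit Ambiente: null, the fallback can go inside the constructor instead (a sketch):

.Reservations[].Instances[]
| {
IP: .PrivateIpAddress,
Ambiente: ((.Tags[] | select(.Key=="Environment") | .Value) // null)
}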

Merge and Sort JSON using JQ

I have a file containing the following structure and unknown number of results:
{
"results": [
[
{
"field": "AccountID",
"value": "5177497"
},
{
"field": "Requests",
"value": "50900"
}
],
[
{
"field": "AccountID",
"value": "pro"
},
{
"field": "Requests",
"value": "251"
}
]
],
"statistics": {
"Matched": 51498,
"Scanned": 8673577,
"ScannedByte": 2.72400814E10
},
"status": "HOLD"
}
{
"results": [
[
{
"field": "AccountID",
"value": "5577497"
},
{
"field": "Requests",
"value": "51900"
}
]
],
"statistics": {
"Matched": 51498,
"Scanned": 8673577,
"ScannedByte": 2.72400814E10
},
"status": "HOLD"
}
There are multiple such results, indexed as arrays within the results key. They are not separated by a comma.
I am trying to print just the "AccountID" sorted by "Requests" in ZSH using jq. I have tried flattening them and using:
jq -r '.results[][0] |.value ' filename
jq -r '.results[][1] |.value ' filename
To get the Account IDs and Requests separately and sort them. I don't think bash has a dictionary that can be used here. The problem lies in the file: field and value are not a key-value pair, but are both pairs. Therefore extracting them using the above two lines into separate arrays and sorting by the second array seems a bit too long, and I was wondering if there is a way to combine both operations.
The other way is to combine it all into a string and sort it in ascending order. Python would probably have the best solution, but the code needs to be a zsh or bash script.
Solutions that use sed, jq or any other ZSH-supported tools are welcome. If there is a way to create a dictionary in bash, please do let me know.
The projected output requirement is just the Account ID vs request number:
5577497 has 51900 requests
5177497 has 50900 requests
pro has 251 requests
If you don't mind learning a little jq, it will probably be best to write a small jq program to do what you want.
To get you started, consider the following jq program, which assumes your input is a stream of valid JSON objects with a "results" key similar to your sample:
[inputs | .results[] | map( { (.field) : .value} ) | add]
After making minor changes to your input so that it consists of valid JSON objects, an invocation of jq with the -n option produces an array of AccountID/Requests objects:
[
{
"AccountID": "5177497",
"Requests": "50900"
},
{
"AccountID": "pro",
"Requests": "251"
},
{
"AccountID": "5577497",
"Requests": "51900"
}
]
You could (for example) now use jq's group_by to group these objects by AccountID, and thereby produce the result you want.
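For example, a complete program along these lines (assuming the input has been fixed up into a stream of valid JSON objects; sort_by suffices here instead of group_by because each AccountID appears only once):

jq -rn '
[inputs | .results[] | map( { (.field) : .value} ) | add]
| sort_by(.Requests | tonumber) | reverse[]
| "\(.AccountID) has \(.Requests) requests"
' filename

tonumber matters because the values are strings; a plain string sort would order them lexicographically.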
jq -S '.results[] | map( { (.field) : .value} ) | add' query-results-aggregate \
| jq -s -c 'group_by(.number_of_requests) | .[]'
This does the trick. Thanks to peak for the guidance.