Bash. How to access parameters while looping through JSON with jq?

I have a details.json file with a lot of entries and a shops.txt file like below. I'd like a little script which compares two values and returns just the matching JSON entries.
[
{
"userName": "Anne",
"email": "anne#stack.com",
"company": {
"name": "Stack GmbH",
},
"details": {
"key": "EFHJKI-KJEFT-DHMNEB",
"prod": "Car",
},
"store": {
"id": "05611a7f-a679-12ad-a3u2-0745e3650a03",
"storeName": "shop-a57ca0a3-120c-1a73-153b-fa4231cab768",
}
},
{
"userName": "Tom",
"email": "tom#stack.com",
"company": {
"name": "Stack GmbH",
},
"details": {
"key": "DFSGSE-FGEAR-GWRTGW",
"prod": "Bike",
},
"store": null
},
]
This is the other file, "shops.txt" (there can be many more shops inside):
shop-a57ca0a3-120c-1a73-153b-fa4231cab768
The script loops through the shops; for every shop it loops through the JSON, and it should compare the currentShop with store.storeName from the JSON and then echo the user and the shop.
But I cannot access the specific parameters inside the JSON. How can I do this?
#!/bin/bash
shops="shops.txt"
while IFS= read -r line
do
currentShop="$line"
jq -c '.[.userName, .store.storeName]' details.json | while read i; do
if [[ $i.store.storeName == *$currentShop* ]]; then
echo $i.userName
echo $currentShop
fi
done
done < "$shops"

First of all, you might want to 'clean' your JSON: remove any trailing commas, etc.
Then, looping through each line in the file, we just need one select() to get the matching object.
The script could look something like:
#!/bin/bash
while read shop; do
echo "Check: $shop"
jq -r --arg storeName "$shop" '.[] | select(.store.storeName == "\($storeName)") | "\(.userName) - \(.store.storeName)"' details.json
done < "shops.txt"
Which will produce
Check: shop-a57ca0a3-120c-1a73-153b-fa4231cab768
Anne - shop-a57ca0a3-120c-1a73-153b-fa4231cab768
You can test this jq selector online at jqplay.org.
I guess this could be combined into a single jq call, but it seems like you want to loop over each entry found; a sketch of such a single call follows below.
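For reference, a minimal sketch of that single call, assuming jq 1.6+ (for --rawfile) and a details.json cleaned of its trailing commas:
jq -r --rawfile shops shops.txt '
  ($shops | split("\n") | map(select(length > 0))) as $list   # shop names, one per line
  | .[]
  | select(.store != null)                                    # skip users without a store
  | .store.storeName as $s
  | select($list | index($s))                                 # keep only listed shops
  | "\(.userName) - \($s)"
' details.json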

I was able to access the values with the following command:
echo $i | jq -r '.userName'
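For completeness, a sketch of how that fits into the loop from the question (assuming currentShop is set by the outer read loop, as above):
jq -c '.[]' details.json | while IFS= read -r i; do
  # extract the store name from the current entry; store may be null
  storeName=$(echo "$i" | jq -r '.store.storeName // empty')
  if [[ $storeName == *"$currentShop"* ]]; then
    echo "$i" | jq -r '.userName'
    echo "$currentShop"
  fi
done
Note that this spawns one jq process per entry, so the single select() above remains the faster option.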

Related

Merging multiple JSON Lines files into a single JSON object

I'm trying to merge / reduce many JSON objects and somehow I'm not getting the expected result.
I'm only interested in getting all keys; the values and the number of items inside arrays are irrelevant.
file1.json:
{
"customerId": "xx",
"emails": [
{
"address": "james#zz.com",
"customType": "",
"type": "custom"
},
{
"address": "sales#x.com",
"primary": true
},
{
"address": "info#x.com"
}
]
}
{
"id": "654",
"emails": [
{
"address": "peter#x.com",
"primary": true
}
]
}
The desired output is a JSON object with all possible keys from all input objects. The values are irrelevant; any value from any input object is OK. But all keys from the input objects must be present in the output object:
{
"emails": [
{
"address": "james#zz.com", <--- any existing value works
"customType": "", <--- any existing value works
"type": "custom", <--- any existing value works
"primary": true <--- any existing value works
}
],
"customerId": "xx", <--- any existing value works
"id": "654" <--- any existing value works
}
I tried reducing it, but it misses many of the keys in the array:
$ jq -s 'reduce .[] as $item ({}; . + $item)' file1.json
{
"customerId": "xx",
"emails": [
{
"address": "peter#x.com",
"primary": true
}
],
"id": "654"
}
The structure of the objects contained in file1.json is unknown, so the solution must be agnostic of any keys/values and the solution must not assume any structure or depth.
Is it possible to fix this somehow considering how jq works? Or is it possible to solve this issue using another tool?
PS: For those of you who are curious, this is useful for inferring a schema that can be created in a database. Given an arbitrary number of JSON objects with an arbitrary structure, it's easy to create a single JSON squished/merged/fused structure that will "accommodate" all JSON objects.
BigQuery is able to autodetect a schema, but only 500 lines are analyzed to come up with it. This presents problems if objects have different structures past that 500-line mark.
With this approach I can squish a JSON Lines file with millions of objects into one line that can then be imported into BigQuery with the autodetect-schema flag, and it will work every time, since BigQuery only has one line to analyze and this line is the "super-schema" of all the objects. After extracting the autodetected schema I can manually fine-tune it to make sure types are correct and then recreate the table specifying my tuned schema:
$ ls -1 users*.json | wc --lines
3672
$ cat users*.json > users-all.json
$ cat users-all.json | wc --lines
146482633
$ jq 'squish' users-all.json > users-all-squished.json
$ cat users-all-squished.json | wc --lines
1
$ bq load --autodetect users users-all-squished.json
$ bq show schema --format=prettyjson users > users-schema.json
$ vi users-schema.json
$ bq rm --table users
$ bq mk --table users --schema=users-schema.json
$ bq load users users-all.json
[Some options are missing or changed for readability]
Here is a solution that produces the expected result for the sample input and seems to meet all the stated requirements. It is similar to one proposed by @pmf on this page.
jq -n --stream '
def squish: map(if type == "number" then 0 else . end);
reduce (inputs | select(length==2)) as [$p, $v] ({}; setpath($p|squish; $v))
'
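Assuming the program above is saved as squish.jq (a hypothetical filename), it can be applied to a JSON Lines file with -f/--from-file:
jq -n --stream -f squish.jq users-all.json > users-all-squished.json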
Output
For the example given in the Q, the output is:
{
"customerId": "xx",
"emails": [
{
"address": "peter#x.com",
"customType": "",
"type": "custom",
"primary": true
}
],
"id": "654"
}
As @peak has pointed out, some aspects are underspecified. For instance, what should happen with .customerId and .id? Are they always the same across all files (as suggested by the sample files provided)? Do you want the items of the .emails array just thrown into one large array, or do you want to have them "merged" by some criteria (e.g. by a common value in their .address field)? Here are some stubs to start from:
Simply concatenate the .emails arrays and take all other parts from the first file:
jq 'reduce inputs as $in (.; .emails += $in.emails)' file*.json
# or simpler
jq '.emails += [inputs.emails[]]' file*.json
{
"emails": [
{
"address": "cc#xx.com"
},
{
"address": "james#zz.com",
"customType": "",
"type": "custom"
},
{
"address": "james#x.com"
},
{
"address": "sales#x.com",
"primary": true
},
{
"address": "info#x.com"
},
{
"address": "james#x.com"
},
{
"address": "sales#x.com",
"primary": true
},
{
"address": "info#x.com"
}
],
"customerId": "xx",
"id": "654"
}
Merge the objects in the .emails array by a common value in their .address field, with latter values overwriting former values for other fields with colliding names, and discard all other parts from the files:
jq -n 'reduce inputs.emails[] as $e ({}; .[$e.address] += $e) | map(.)' file*.json
[
{
"address": "cc#xx.com"
},
{
"address": "james#zz.com",
"customType": "",
"type": "custom"
},
{
"address": "james#x.com"
},
{
"address": "sales#x.com",
"primary": true
},
{
"address": "info#x.com"
}
]
If you are only interested in a list of unique field names for a given address, regardless of the counts and values used, you can also go with:
jq -n '
reduce inputs.emails[] as $e ({}; .[$e.address][$e | keys_unsorted[]] = 1)
| map_values(keys)
'
{
"cc#xx.com": [
"address"
],
"james#zz.com": [
"address",
"customType",
"type"
],
"james#x.com": [
"address"
],
"sales#x.com": [
"address",
"primary"
],
"info#x.com": [
"address"
]
}
The structure of the objects contained in file1.json is unknown, so the solution must be agnostic of any keys/values and the solution must not assume any structure or depth.
You can use the --stream flag to break down the structure into an array of paths and values, discard the values part and make the paths unique:
jq --stream -nc '[inputs[0]] | unique[]' file*.json
["customerId"]
["emails"]
["emails",0,"address"]
["emails",0,"customType"]
["emails",0,"primary"]
["emails",0,"type"]
["emails",1,"address"]
["emails",2]
["emails",2,"address"]
["emails",2,"primary"]
["emails",3]
["emails",3,"address"]
["id"]
Trying to build a representation of this, similar to any of the input files, comes with a lot of caveats. For instance, how would you represent, in a single structure, a case where one file had .emails as an array of objects, and another had .emails as just an atomic value, say, a string? You would not be able to represent this plurality without introducing new, possibly ambiguous structures (e.g. putting all possibilities into an array).
Therefore, having a list of paths could be a fair compromise. Judging by your desired output, you want to focus more on the object structure, so you could further reduce complexity by discarding the array indices. Depending on your use case, you could replace them with a single value to retain the information of the presence of an array, or discard them entirely:
jq --stream -nc '[inputs[0] | map(numbers = 0)] | unique[]' file*.json
["customerId"]
["emails"]
["emails",0]
["emails",0,"address"]
["emails",0,"customType"]
["emails",0,"primary"]
["emails",0,"type"]
["id"]
jq --stream -nc '[inputs[0] | map(strings)] | unique[]' file*.json
["customerId"]
["emails"]
["emails","address"]
["emails","customType"]
["emails","primary"]
["emails","type"]
["id"]
The following program meets these two key requirements:
"all keys from input objects must be present in output object";
"the solution must be agnostic of any keys/values and the solution must not assume any structure or depth."
The approach is the same as one suggested by @pmf, and, for the example given in the Q, it produces results very similar to the one shown:
jq -n --stream '
def squish: map(select(type == "string"));
reduce (inputs | select(length==2)) as [$p, $v] ({};
setpath($p|squish; $v))
'
With the given input, this produces:
{
"customerId": "xx",
"emails": {
"address": "peter#x.com",
"customType": "",
"type": "custom",
"primary": true
},
"id": "654"
}

select specific values from json & convert into bash variables

I'm new to jq. I have the following JSON and I need to extract four values, i.e. SAUCE_KEY, sauceKey, SAUCE_VALUE, sauceValue. I need to convert these into bash variables, i.e.:
SAUCE_KEY=sauceKey
SAUCE_VALUE=sauceValue
If I echo one, it should print its value, i.e. echo $SAUCE_KEY.
I have used this code:
jq -r '.env_vars[] | with_entries(select([.key] | inside([".env_vars[].name", ".env_vars[].value"])))' | jq -r "to_entries|map(\"\(.value)=\(.value|tostring)\")|.[]"
By doing so, I was able to get values like
name=SAUCE_KEY, value=sauceKey and so on.
{
"#type": "env_vars",
"env_vars": [
{
"#type": "env_var",
"#href": "/repo/xxxxx/env_var/xxxxx",
"#representation": "standard",
"#permissions": {
"read": true,
"write": true
},
"name": "SAUCE_KEY",
"value": "sauceKey"
},
{
"#type": "env_var",
"#href": "/repo/xxxxx/env_var/xxxxx",
"#representation": "standard",
"#permissions": {
"read": true,
"write": true
},
"name": "SAUCE_VALUE",
"value": "sauceValue"
}
]
}
If, instead of trying to extract variable names from the JSON, you populate an associative array with the names as keys, it's pretty straightforward.
#!/usr/bin/env bash
declare -A sauces
while IFS=$'\001' read -r name value; do
sauces[$name]=$value
done < <(jq -r '.env_vars[] | "\(.name)\u0001\(.value)"' saucy.json)
for name in "${!sauces[@]}"; do
echo "$name is ${sauces[$name]}"
done
prints out
SAUCE_VALUE is sauceValue
SAUCE_KEY is sauceKey
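If you literally want SAUCE_KEY and SAUCE_VALUE as individual shell variables, as asked in the question, a minimal eval-free sketch uses declare (this assumes every name in the JSON is a valid shell identifier):
#!/usr/bin/env bash
while IFS=$'\001' read -r name value; do
  declare "$name=$value"   # creates e.g. SAUCE_KEY=sauceKey
done < <(jq -r '.env_vars[] | "\(.name)\u0001\(.value)"' saucy.json)
echo "$SAUCE_KEY"   # prints sauceKey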

How to update a subitem in a json file using jq?

Using jq I tried to update this json document:
{
"git_defaults": {
"branch": "master",
"email": "jenkins#host",
"user": "Jenkins"
},
"git_namespaces": [
{
"name": "NamespaceX",
"modules": [
"moduleA",
"moduleB",
"moduleC",
"moduleD"
]
},
{
"name": "NamespaceY",
"modules": [
"moduleE"
]
}
]
}
by adding moduleF to NamespaceY. I need to write the file back again to the original source file.
I came close (but no cigar) with:
jq '. | .git_namespaces[] | select(.name=="namespaceY").modules |= (.+ ["moduleF"])' config.json
and
jq '. | select(.git_namespaces[].name=="namespaceY").modules |= (.+ ["moduleF"])' config.json
The following filter should perform the update you want:
(.git_namespaces[] | select(.name=="NamespaceY").modules) += ["moduleF"]
Note that the initial '. |' in your attempt is not needed; that "NamespaceY" is capitalized in config.json; that the parens as shown are the key to success; and that += can be used here.
One way to write back to the original file would be to use 'sponge'; other possibilities are discussed in the jq FAQ: https://github.com/stedolan/jq/wiki/FAQ
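For example, with sponge from moreutils, or with a temporary file where sponge isn't available:
jq '(.git_namespaces[] | select(.name=="NamespaceY").modules) += ["moduleF"]' config.json | sponge config.json
# without sponge: write to a temp file, then replace the original
tmp=$(mktemp) &&
jq '(.git_namespaces[] | select(.name=="NamespaceY").modules) += ["moduleF"]' config.json > "$tmp" &&
mv "$tmp" config.json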

json array into json stream with jq

This task is similar to an existing question, but in my case I would like to go the other way around.
So say we have input:
[
{
"name": "John",
"email": "john#company.com"
},
{
"name": "Brad",
"email": "brad#company.com"
}
]
and desired output is:
{
"name": "John",
"email": "john#company.com"
}
{
"name": "Brad",
"email": "brad#company.com"
}
I tried to write a bash function which will do it in loop:
#!/bin/bash
json=`cat $1`
length=`echo $json | jq '. | length'`
for (( i=0; i<$length ; i++ ))
do
echo $json | jq ".[$i]"
done
but it is obviously extremely slow...
Is there any way how to use jq better for this?
You can use this:
jq '.[]' file
If you use the .[index] syntax, but omit the index entirely, it will return all of the elements of an array.
Test:
$ jq '.[]' file
{
"email": "john#company.com",
"name": "John"
}
{
"email": "brad#company.com",
"name": "Brad"
}
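If you want each object on a single line (JSON Lines), add the -c/--compact-output flag:
$ jq -c '.[]' file
{"email":"john@company.com","name":"John"}
{"email":"brad@company.com","name":"Brad"}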
You can apply the ".[]" filter.
This tutorial is very informative: https://stedolan.github.io/jq/tutorial/

How to use `jq` to obtain the keys

My JSON looks like this:
{
"20160522201409-jobsv1-1": {
"vmStateDisplayName": "Ready",
"servers": {
"20160522201409 jobs_v1 1": {
"serverStateDisplayName": "Ready",
"creationDate": "2016-05-22T20:14:22.000+0000",
"state": "READY",
"provisionStatus": "PENDING",
"serverRole": "ROLE",
"serverType": "SERVER",
"serverName": "20160522201409 jobs_v1 1",
"serverId": 2902
}
},
"isAdminNode": true,
"creationDate": "2016-05-22T20:14:23.000+0000",
"totalStorage": 15360,
"shapeId": "ot1",
"state": "READY",
"vmId": 4353,
"hostName": "20160522201409-jobsv1-1",
"label": "20160522201409 jobs_v1 ADMIN_SERVER 1",
"ipAddress": "10.252.159.39",
"publicIpAddress": "10.252.159.39",
"usageType": "ADMIN_SERVER",
"role": "ADMIN_SERVER",
"componentType": "jobs_v1"
}
}
My key keeps changing from time to time. So, for example, 20160522201409-jobsv1-1 may be something else tomorrow. Also, I may have more than one such entry in the JSON payload.
I want to echo $KEYS and I am trying to do it using jq.
Things I have tried :
| jq .KEYS is the command I use frequently.
Is there a jq command to display all the primary keys in the json?
I only care about the hostname field. And I would like to extract that out. I know how to do it using grep but it is NOT a clean approach.
You can simply use keys:
% jq 'keys' my.json
[
"20160522201409-jobsv1-1"
]
And to get the first:
% jq -r 'keys[0]' my.json
20160522201409-jobsv1-1
-r is for raw output:
--raw-output / -r: With this option, if the filter’s result is a string then it will be written directly to standard output rather than being formatted as a JSON string with quotes. This can be useful for making jq filters talk to non-JSON-based systems.
(Source: the jq manual.)
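Without -r the key is printed as a quoted JSON string:
% jq 'keys[0]' my.json
"20160522201409-jobsv1-1"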
If you want a known value below an unknown property, eg xxx.hostName:
% jq -r '.[].hostName' my.json
20160522201409-jobsv1-1
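And since there may be more than one top-level entry, you can list every key together with its hostName (here the key and the hostName happen to be identical):
% jq -r 'to_entries[] | "\(.key) \(.value.hostName)"' my.json
20160522201409-jobsv1-1 20160522201409-jobsv1-1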