Merge two JSON files using jq in bash - json

I'm hoping someone could help, I'm trying to merge two json files. Here is my bash script:
script_directory="/home/joey/scripts/scripts/delete"
region_file="US.en.json"
cnmts_file="cnmts.json"
wget https://github.com/blawar/titledb/raw/master/$region_file -O $script_directory/$region_file
wget https://github.com/blawar/titledb/raw/master/$cnmts_file -O $script_directory/$cnmts_file
#This is here just to simplify the json files
cat $script_directory/$region_file | jq '.[] | {id: .id}' > $script_directory/region_file_id.txt
cat $script_directory/$cnmts_file | jq '.[] | .[] | {titleId: .titleId, otherApplicationId: .otherApplicationId}' > $script_directory/cnmts_titleId_otherApplicationId.txt
Essentially, I'm given two files:
region_file_id.txt:
{
"id": "01007EF00011E000"
}
{
"id": "0100225000FEE000"
}
{
"id": "0100BCE000598000"
}
{
"id": "0100B42001DB4000"
}
{
"id": "01008A9001DC2000"
}
and cnmts_titleId_otherApplicationId.txt:
{
"titleId": "0100000000010000",
"otherApplicationId": "0100000000010800"
}
{
"titleId": "0100000000010800",
"otherApplicationId": "0100000000010000"
}
{
"titleId": "010000000e5ee000",
"otherApplicationId": "010000000e5ee800"
}
{
"titleId": "010000000eef0000",
"otherApplicationId": "010000000eef0800"
}
{
"titleId": "010000000eef0800",
"otherApplicationId": "010000000eef0000"
}
{
"titleId": "0100000011d90000",
"otherApplicationId": "0100000011d90800"
}
{
"titleId": "0100000011d90800",
"otherApplicationId": "0100000011d90000"
}
{
"titleId": "0100000011d90800",
"otherApplicationId": "0100000011d90000"
}
Please note, this is only a snippet of the files, feel free to run the bash script to get a more accurate file.
Every "id" in 'region_file_id' equals a "titleId" somewhere in 'cnmts_titleId_otherApplicationId' (the reverse is not true, since that file also includes ids from other regions). I'm trying to grab the "otherApplicationId" value for each "id" in 'region_file_id' by cross-referencing the two files and creating JSON like this (repeated for every 'id' in region_file_id):
{
"id": "111000",
"titleId": "111000", (this one is optional as it duplicates 'id')
"otherApplicationId": "111800"
}
I've tried searching and tried different snippets:
jq -s '.[0] * .[1]' $script_directory/region_file_id.txt cnmts_titleId_otherApplicationId.txt (only returned 1 object for some reason)
jq -s '{ .[0] as $u | .[1] | select(.id == $u.titleId) |= $u }' $script_directory/region_file_id.txt cnmts_titleId_otherApplicationId.txt
Update:
As peak pointed out:
jq -n --slurpfile ids region_file_id.txt '
INDEX(inputs; .titleId | ascii_upcase) as $dict
| $ids[].id as $id
| {$id} + $dict[$id]
' cnmts_titleId_otherApplicationId.txt > merged.txt
This seems to work until I hit "null" values where my file doesn't include the correct id, which is another problem altogether!

Every "id" in 'region_file_id' equals a "titleId" somewhere in 'cnmts_titleId_otherApplicationId'
If that really is the case, then you could proceed as follows:
< cnmts_titleId_otherApplicationId.txt jq -n --slurpfile ids region_file_id.txt '
INDEX(inputs; .titleId) as $dict
| $ids[].id as $id
| {$id} + $dict[$id]
'
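In the sample data the "id" values are upper-case while the "titleId" values are lower-case, and an id with no matching entry produces a null lookup. The following is only a sketch, using made-up miniature versions of the two files written to a temporary directory; it upper-cases the dictionary keys (as in the question's update) and skips ids that have no match:

```shell
# Hypothetical miniature inputs; the real files come from the wget step in the question.
tmp=$(mktemp -d)
printf '%s\n' '{"id":"010000000EEF0000"}' '{"id":"01007EF00011E000"}' \
  > "$tmp/region_file_id.txt"
printf '%s\n' \
  '{"titleId":"010000000eef0000","otherApplicationId":"010000000eef0800"}' \
  '{"titleId":"0100000011d90000","otherApplicationId":"0100000011d90800"}' \
  > "$tmp/cnmts_titleId_otherApplicationId.txt"

merged=$(jq -nc --slurpfile ids "$tmp/region_file_id.txt" '
  # dictionary keyed by the upper-cased titleId
  INDEX(inputs; .titleId | ascii_upcase) as $dict
  | $ids[].id as $id
  # keep only ids that have an entry in the dictionary
  | select($dict[$id] != null)
  | {$id} + $dict[$id]
' "$tmp/cnmts_titleId_otherApplicationId.txt")
echo "$merged"
rm -r "$tmp"
```

Here the second id has no counterpart, so only one merged object is printed. INDEX is available from jq 1.6 onwards.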

Related

Retrieving required sets of JSON objects using jq

I have the JSON file shown below and want to do the following:
Get the first 5 objects from the whole list and save them in a separate file (i.e. FirstTopObject.json)
Get the next set of 5 objects and store them in another file (i.e. SecondTopObject.json)
Get the last 5 objects and store them in another file (i.e. ThirdTopObject.json)
Basically, I want to split the objects by position and save each group in a separate file.
Is there a way to achieve this with jq?
Input File:
{
"storeId": "0001"
}
{
"storeId": "0002"
}
{
"storeId": "0003"
}
{
"storeId": "0004"
}
{
"storeId": "0005"
}
{
"storeId": "0006"
}
{
"storeId": "0007"
}
{
"storeId": "0008"
}
{
"storeId": "0009"
}
{
"storeId": "00010"
}
{
"storeId": "00011"
}
{
"storeId": "00012"
}
{
"storeId": "00013"
}
{
"storeId": "00014"
}
{
"storeId": "00015"
}
Expecting output:
FirstTopObject.json should have the below set.
{
"storeId": "0001"
}
{
"storeId": "0002"
}
{
"storeId": "0003"
}
{
"storeId": "0004"
}
{
"storeId": "0005"
}
SecondTopObject.json should contain the set of objects below.
{
"storeId": "0006"
}
{
"storeId": "0007"
}
{
"storeId": "0008"
}
{
"storeId": "0009"
}
{
"storeId": "00010"
}
Likewise for the other set.
Any help would be much appreciated.
Thanks in advance!
You could use jq to process the input file, reformatting it with -c (compact), and then use standard Unix tools to split the output, e.g.
cat input | jq --slurp -c '.[]' | head -5 | jq . > FirstTopObject.json
cat input | jq --slurp -c '.[]' | sed '6,10!d' | jq . > SecondTopObject.json
cat input | jq --slurp -c '.[]' | tail -5 | jq . > ThirdTopObject.json
You can use a combination of the jq functions to_entries and group_by for this, along with a little bash.
This snippet will create 25 strings ("line 0", "line 1", etc.), group them by 5s, and write them into files 0.json, 1.json, etc. Everything before to_entries can be replaced with any list. In your case, you can use the slurp flag -s to get all your JSON objects in your input file into a list.
FILE_NUM=0
jq -nc '
# create input
["line " + (range(25) | tostring)] |
# process input
to_entries | group_by(.key / 5 | floor)[] | map(.value)
' | while IFS= read -r LINE; do echo "$LINE" > "/tmp/$((FILE_NUM++)).json"; done
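Applied to an input like the one in the question, the same grouping could look roughly like this. It is a sketch only: the 15-object input is generated on the spot as a stand-in, and the numbered output file names are an assumption rather than the names the question asks for.

```shell
tmp=$(mktemp -d)
# Hypothetical stand-in for the input file: 15 objects with storeId "1" .. "15".
jq -nc 'range(1; 16) | {storeId: tostring}' > "$tmp/input.json"

file_num=0
# -s slurps all objects into one array; each output line is one group of five.
jq -sc 'to_entries | group_by(.key / 5 | floor)[] | map(.value)' "$tmp/input.json" |
while IFS= read -r line; do
  # unpack the group again so each file holds a stream of objects
  printf '%s\n' "$line" | jq '.[]' > "$tmp/$file_num.json"
  file_num=$((file_num + 1))
done
ls "$tmp"
```

This writes 0.json, 1.json and 2.json next to the generated input, each holding five objects.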
There is no need to slurp the input file! Even if the output must be pretty-printed, there is no need for more than four invocations of jq altogether.
Handling a small input file
If the input is not so big, you can simply run
jq -c . input
directing the output to a temporary file, and then split that file into three using whichever standard command-line tools you find most convenient (a single invocation of awk might be worth considering ...).
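As a rough illustration of the small-file approach, assuming a generated 15-object input and using head, sed and tail as the splitting tools (any equivalent tools would do):

```shell
tmp=$(mktemp -d)
# Hypothetical stand-in for the input file: 15 objects with storeId "1" .. "15".
jq -nc 'range(1; 16) | {storeId: tostring}' > "$tmp/input"

# One compact object per line, so line-oriented tools can split the stream.
jq -c . "$tmp/input" > "$tmp/input.tmp"

# Pretty-print each slice again; this is four jq invocations in total
# (not counting the one that only generates the sample input).
head -n 5 "$tmp/input.tmp"      | jq . > "$tmp/FirstTopObject.json"
sed -n '6,10p' "$tmp/input.tmp" | jq . > "$tmp/SecondTopObject.json"
tail -n 5 "$tmp/input.tmp"      | jq . > "$tmp/ThirdTopObject.json"
```

If pretty-printing is not required, the three trailing jq calls can be dropped.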
Handling a very large input file
If the input file is very large, then it would make sense to use jq just to copy the (15) items of interest into a temporary file, and then process that file:
Step 1
Invoke the following program with jq -cn:
def echo($n1; $n2; $last):
foreach (inputs,null) as $in ({ix:-1, first:[], second:[], last:[]};
if $in then
.ix += 1
| if .ix < $n1
then .first += [$in]
elif .ix < $n1+$n2 then .second += [$in]
else .last += [$in]
| .last = (.last[ - $last: ])
end
else . end;
if $in == null then del(.ix) else empty end
)
| .[];
echo(5;5;5)
(This program is somewhat complex because it makes no assumptions about the relative sizes of the three blocks.)
Step 2
Assuming the output from Step 1 is in input.tmp, then run:
sed -n 1p input.tmp | jq '.[]' > FirstTopObject.json
sed -n 2p input.tmp | jq '.[]' > SecondTopObject.json
sed -n 3p input.tmp | jq '.[]' > ThirdTopObject.json

JQ : Parse specific output (get IP) from JSON file

I want to get only the IPs of the nodes that have the 'server.sh' value. My current script gets all the IPs.
test.json
{
"nodes": {
"test1.local": {
":ip": "192.168.56.30",
":server": "server.sh",
":client": "client.sh"
},
"test2.local": {
":ip": "192.168.56.31",
":server": "server.sh",
":client": "client.sh"
},
"test3.local": {
":ip": "192.168.56.32",
":client": "client.sh"
}
}
}
test.sh
ips=`jq -c '.nodes | to_entries | map(.value.":ip")| map_values(.+":4648")' test.json`
echo $ips
["192.168.56.30:4648","192.168.56.31:4648","192.168.56.32:4648"]
Is it ok for your task?
jq '.nodes|.[]|select(.":server"=="server.sh")|.":ip"+":4648"' test.json
"192.168.56.30:4648"
"192.168.56.31:4648"

Print key if any nested value matches a set value

This is best explained with expected input and output.
Given this input:
{
"27852380038": {
"compute_id": 34234234,
"to_compute": [
{
"asset_id": 304221854,
"new_scheme": "mynewscheme",
"original_host": "oldscheme1234.location.com"
},
{
"asset_id": 12123121,
"new_scheme": "myotherscheme",
"original_host": "olderscheme1234.location.com"
}
]
},
"31352333022": {
"compute_id": 43888877,
"to_compute": [
{
"asset_id": 404221555,
"new_scheme": "mynewscheme",
"original_host": "oldscheme1234.location.com"
},
{
"asset_id": 52123444,
"new_scheme": "myotherscheme",
"original_host": "olderscheme1234.location.com"
}
]
}
}
And the asset_id that I'm searching for, 12123121, the output should be:
27852380038
So I want the top level keys where any of the asset_ids in to_compute match my input asset_id.
I haven't seen any jq example so far that combines nested access with an any test / if else.
The task can be accomplished without using environment variables, e.g.
< input.json jq -r --argjson ASSET_ID 12123121 '
to_entries[]
| {key, asset_id: .value.to_compute[].asset_id}
| select(.asset_id==$ASSET_ID)
| .key'
or more efficiently, using the filter:
to_entries[]
| select( any( .value.to_compute[]; .asset_id==$ASSET_ID) )
| .key
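For completeness, a full invocation of the any-based filter might look like this. It is a sketch that uses a trimmed copy of the question's input (only the fields the filter touches), written to a temporary file:

```shell
tmp=$(mktemp -d)
# Trimmed copy of the sample input from the question.
cat > "$tmp/input.json" <<'EOF'
{
  "27852380038": {"compute_id": 34234234, "to_compute": [{"asset_id": 304221854}, {"asset_id": 12123121}]},
  "31352333022": {"compute_id": 43888877, "to_compute": [{"asset_id": 404221555}, {"asset_id": 52123444}]}
}
EOF

# --argjson passes the id as a JSON number, so == compares numbers with numbers.
key=$(jq -r --argjson ASSET_ID 12123121 '
  to_entries[]
  | select(any(.value.to_compute[]; .asset_id == $ASSET_ID))
  | .key' "$tmp/input.json")
echo "$key"
```

any stops at the first matching asset, so each top-level key is printed at most once even if several of its assets match.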
With some help from a coworker I was able to figure it out:
$ export ASSET_ID=12123121
$ cat input.json | jq -r "to_entries[] | .value.to_compute[] + {job: .key} | select(.asset_id==$ASSET_ID) | .job"
27852380038

Building new JSON with JQ and bash

I am trying to create JSON from scratch using bash.
The final structure needs to be like:
{
"hosts": {
"a_hostname" : {
"ips" : [
1,
2,
3
]
},
{...}
}
}
First I'm creating an input file with the format:
hostname ["1.1.1.1","2.2.2.2"]
host-name2 ["3.3.3.3","4.4.4.4"]
This is being created by:
for host in $( ansible -i hosts all --list-hosts ) ; \
do echo -n "${host} " ; \
ansible -i hosts $host -m setup | sed '1c {' | jq -r -c '.ansible_facts.ansible_all_ipv4_addresses' ; \
done > hosts.txt
The key point here is that the IP list/array, is coming from a JSON file and being extracted by jq. This extraction outputs an already valid / quoted JSON array, but as a string in a txt file.
Next I'm using jq to parse the whole text file into the desired JSON:
jq -Rn '
{ "hosts": [inputs |
split("\\s+"; "g") |
select(length > 0 and .[0] != "") |
{(.[0]):
{ips:.[1]}
}
] | add }
' < ~/hosts.txt
This is almost correct, everything except for the IPs value which is treated as a string and quoted leading to:
{
"hosts": {
"hostname1": {
"ips": "[\"1.1.1.1\",\"2.2.2.2\"]"
},
"host-name2": {
"ips": "[\"3.3.3.3\",\"4.4.4.4\"]"
}
}
}
I'm now stuck at this final hurdle - how to insert the IPs without causing them to be quoted again.
Edit - quoting solved by using {ips: (.[1] | fromjson)} instead of {ips:.[1]}.
However, this was made unnecessary by @CharlesDuffy's suggestion to convert to TSV.
Original Q body:
So far I've got to
jq -n {hosts:{}} | \
for host in $( ansible -i hosts all --list-hosts ) ; \
do jq ".hosts += {$host:{}}" | \
jq ".hosts.$host += {ips:[1,2,3]}" ; \
done ;
([1,2,3] is actually coming from a subshell but including it seemed unnecessary as that part works, and made it harder to read)
This sort of works, but there seems to be 2 problems.
1) The final output has only a single host in it, containing data from the first host in the list (this persists even if the second problem is bypassed):
{
"hosts": {
"host_1": {
"ips": [
1,
2,
3
]
}
}
}
2) One of the hostnames has a - in it, which causes syntax and compiler errors from jq. I'm stuck going around quote hell trying to get it to be interpreted but also quoted. Help!
Thanks for any input.
Let's say your input format is:
host_1 1 2 3
host_2 2 3 4
host-with-dashes 3 4 5
host-with-no-addresses
...re: the edit specifying a different format: add @tsv to the jq command that produces the existing format to generate this one instead.
If you want to transform that to the format in question, it might look like:
jq -Rn '
{ "hosts": [inputs |
split("\\s+"; "g") |
select(length > 0 and .[0] != "") |
{(.[0]): .[1:]}
] | add
}' <input.txt
Which yields as output:
{
"hosts": {
"host_1": [
"1",
"2",
"3"
],
"host_2": [
"2",
"3",
"4"
],
"host-with-dashes": [
"3",
"4",
"5"
],
"host-with-no-addresses": []
}
}
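The structure asked for in the question nests each address list under an "ips" key. A small variation of the filter above, sketched here with three of the sample lines fed in through printf, would produce that shape:

```shell
result=$(printf '%s\n' 'host_1 1 2 3' 'host-with-dashes 3 4 5' 'host-with-no-addresses' |
  jq -Rnc '
    { hosts: [ inputs
               | split("\\s+"; "g")
               | select(length > 0 and .[0] != "")
               # wrap the remaining fields in an object instead of using them directly
               | {(.[0]): {ips: .[1:]}} ]
             | add }')
echo "$result"
```

The addresses are still strings here, because they come from splitting raw text lines.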

jq json move object to nested object and iterating over unknown names/numbers of objects

I've been looking over several examples of 'jq' parsing of json strings, all very helpful, but not conclusive for my particular problem
Here's my input json :
{
"outcome" : "TrBean",
"result" : {"TrAct" : {
"executiontime" : 16938570,
"invocations" : 133863,
"waittime" : 4981
}}
}
{
"outcome" : "WwwBean",
"result" : {}
}
{
"outcome": "CRFeatureBean",
"result": {
"CRChannels": {
"executiontime": 78127,
"invocations": 9983,
"waittime": 213
},
"getCRChannels": {
"executiontime": 98704,
"invocations": 10113,
"waittime": 212
},
"getCRToMigrate": {
"executiontime": 32,
"invocations": 4,
"waittime": 0
},
"getCRId": {
"executiontime": 28198633,
"invocations": 747336,
"waittime": 19856
}
}
}
I'm trying to feed Graphite via the collectd exec plugin (PUTVAL), so I need the info on one line. I tried ./jq '.result|to_entries[]|{"method": .key, "inv": .value.invocations}|"PUTVAL \(.method)/invoke:\(.inv)"' ... but I need to have "outcome" in every line too.
Also, I know neither the number nor the names of the result objects.
So, I'd like to end up with :
TrBean_TrAct
WwwBean
CRFeatureBean_CRChannels
CRFeatureBean_getCRChannels
CRFeatureBean_getCRToMigrate
CRFeatureBean_getCRId
The following jq filter produces the desired output when jq is invoked with the -r command-line option:
((.result | keys_unsorted[]) // null) as $key
| if $key == null then .outcome
else [.outcome, $key] | join("_")
end
There are of course many possible variations, e.g.
((.result | keys_unsorted[]) // null) as $key
| [.outcome, ($key // empty)]
| join("_")
or if you want a short one-liner:
.outcome + ("_" + (.result | keys_unsorted[]) // null)
In any case, the key to simplicity here is to generate the keys of .result as a stream. Handling the edge case (an empty .result) makes the solution slightly more complicated than it would otherwise be; without it, the filter would simply be .outcome + "_" + (.result | keys_unsorted[])
Example invocation: jq -r -f program.jq input.json
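A self-contained run of the second variant, as a sketch with a trimmed copy of the question's input supplied on standard input, could look like this:

```shell
out=$(jq -r '
  # one $key per entry of .result, or a single null when .result is empty
  ((.result | keys_unsorted[]) // null) as $key
  | [.outcome, ($key // empty)]
  | join("_")' <<'EOF'
{"outcome": "TrBean", "result": {"TrAct": {"invocations": 133863}}}
{"outcome": "WwwBean", "result": {}}
{"outcome": "CRFeatureBean", "result": {"CRChannels": {"invocations": 9983}, "getCRId": {"invocations": 747336}}}
EOF
)
echo "$out"
```

This prints TrBean_TrAct, WwwBean, CRFeatureBean_CRChannels and CRFeatureBean_getCRId, one per line.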