Bash JSON compare two list and delete id - json

I have a JSON endpoint whose values I can fetch with curl, and a local yml file. I want to find the difference between the two and delete the entries that are only present on the JSON endpoint, using their id.
JSON's endpoint
[
{
"hosts": [
"server1"
],
"id": "qz9o847b-f07c-49d1-b1fa-e5ed0b2f0519",
"name": "V1_toto_a"
},
{
"hosts": [
"server2"
],
"id": "a6aa847b-f07c-49d1-b1fa-e5ed0b2f0519",
"name": "V1_tata_b"
},
{
"hosts": [
"server3"
],
"id": "a6d9ee7b-f07c-49d1-b1fa-e5ed0b2f0519",
"name": "V1_titi_c"
}
]
files.yml
---
instance:
toto:
name: "toto"
tata:
name: "tata"
Comparing the JSON endpoint with the local file, I want to delete the entry for titi by its id, because it is the difference between the two sources.
declare -a arr=(_a _b _c)
ar=$(cat files.yml | grep name | cut -d '"' -f2 | tr "\n" " ")
fileItemArray=($ar)
ARR_PRE=("${fileItemArray[@]/#/V1_}")
for i in "${arr[@]}"; do local_var+=("${ARR_PRE[@]/%/$i}"); done
remote_var=$(curl -sX GET "XXXX" | jq -r '.[].name | @sh' | tr -d \'\")
diff_=$(echo ${local_var[@]} ${remote_var[@]} | tr ' ' '\n' | sort | uniq -u)
output = titi
The code works, but I want to delete titi by its id dynamically:
curl -X DELETE "XXXX" $id_titi
I am trying to do the deletion in a Bash script, but I have no idea how to continue...

Your endpoint is not proper JSON, as it has
commas after the .name field but no following field, and
no commas between the elements of the top-level array.
If this is not just a typo from pasting your example into this question, then you'd need to address this first before proceeding. This is how it should look:
[
{
"hosts": [
"server1"
],
"id": "qz9o847b-f07c-49d1-b1fa-e5ed0b2f0519",
"name": "toto"
},
{
"hosts": [
"server2"
],
"id": "a6aa847b-f07c-49d1-b1fa-e5ed0b2f0519",
"name": "tata"
},
{
"hosts": [
"server3"
],
"id": "a6d9ee7b-f07c-49d1-b1fa-e5ed0b2f0519",
"name": "titi"
}
]
If your endpoint is proper JSON, try the following. It extracts the names from your .yml file (just as you do; there are plenty of more efficient and less error-prone ways, but I'm trying to adapt your approach as much as possible), but instead of a Bash array it generates a JSON array using jq, which for Bash is a simple string. For your curl output it's basically the same thing: extracting a (JSON) array of names into a Bash string. Note that in both cases I use quotes, <var>="$(…)", to capture strings that may include spaces (although I also use the -c option for jq to compact its output to a single line). The difference between the two is computed entirely by jq, as it can easily be fed the JSON arrays as variables, perform the subtraction, and output the result in your preferred format:
fromyml="$(cat files.yml | grep name | cut -d '"' -f2 | jq -Rnc '[inputs]')"
fromcurl="$(curl -sX GET "XXXX" | jq -c 'map(.name)')"
diff="$(jq -nr --argjson fromyml "$fromyml" --argjson fromcurl "$fromcurl" '
$fromcurl - $fromyml | .[]
')"
The Bash variable diff now contains a list of names only present in the curl output ($fromcurl - $fromyml), one per line (if, other than in your example, there happens to be more than one). If the curl output had duplicates, they will still be included (use $fromcurl - $fromyml | unique | .[] to get rid of them):
titi
As you can see, this solution has three calls to jq. I'll leave it to you to further reduce that number as it fits your general workflow (basically, it can be put together into one).
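Since the question ultimately needs the id rather than the name, the comparison can also be folded into a single jq call that emits ids directly. The following is only a sketch under assumptions: it uses the simplified endpoint JSON shown above (names without the V1_ prefix and suffix), the function name ids_to_delete and the variables remote and keep are made up for illustration, and the real DELETE request is replaced by an echo.

```shell
#!/usr/bin/env bash
# Sample data standing in for the curl output and for the names read from files.yml.
remote='[
  {"hosts":["server1"],"id":"qz9o847b-f07c-49d1-b1fa-e5ed0b2f0519","name":"toto"},
  {"hosts":["server2"],"id":"a6aa847b-f07c-49d1-b1fa-e5ed0b2f0519","name":"tata"},
  {"hosts":["server3"],"id":"a6d9ee7b-f07c-49d1-b1fa-e5ed0b2f0519","name":"titi"}
]'
keep='["toto","tata"]'

# Print the id of every remote entry whose name is not in the keep list.
# $1: JSON array of objects with .id and .name; $2: JSON array of names.
ids_to_delete() {
  jq -r --argjson keep "$2" '
    .[] | select(.name as $n | $keep | index($n) | not) | .id
  ' <<<"$1"
}

ids_to_delete "$remote" "$keep" | while IFS= read -r id; do
  # Replace echo with the real curl -X DELETE call once the output looks right.
  echo "would delete: $id"
done
```

Keeping the deletion as a dry-run echo first makes it easy to check which ids would be affected before anything is removed.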

Getting the output of a program into a variable can be done using read.
perl -M5.010 -MYAML -MJSON::PP -e'
sub get_next_file { local $/; "".<> }
my %filter = map { $_->{name} => 1 } values %{ Load(get_next_file)->{instance} };
say for grep !$filter{$_}, map $_->{name}, @{ decode_json(get_next_file) };
' b.yaml a.json |
while IFS= read -r id; do
curl -X DELETE ..."$id"...
done
I used Perl here because the grep/cut pipeline you had is not a way to parse a YAML file. The snippet requires the YAML Perl module to be installed.

Related

How do I print a specific value of an array given a condition in jq if there is no key specified

I am trying to output the value for .metadata.name followed by the student's name in .spec.template.spec.containers[].students[] array using the regex test() function in jq.
I am having trouble retrieving the individual array value since there is no key specified for the students[] array.
For example, if I check the students[] array if it contains the word "Jeff", I would like the output to display as below:
student-deployment: Jefferson
What I have tried:
I've tried the command below which somewhat works but I am not sure how to get only the "Jefferson" value. The command below would print out all of the students[] array values which is not what I want. I am using Powershell to run the command below.
kubectl get deployments -o json | jq -r '.items[] | select(.spec.template.spec.containers[].students[]?|test("\"^Jeff.\"")) | .metadata.name, "\":\t\"", .spec.template.spec.containers[].students'
Is there a way to print a specific value of an array given a condition in jq if there is no key specified? Also, would the solution work if there are multiple deployments?
The deployment template below is in json and I shortened it to only the relevant parts.
{
"apiVersion": "v1",
"items": [
{
"apiVersion": "apps/v1",
"kind": "Deployment",
"metadata": {
"name": "student-deployment",
"namespace": "default"
},
"spec": {
"template": {
"spec": {
"containers": [
{
"students": [
"Alice",
"Bob",
"Peter",
"Sally",
"Jefferson"
]
}
]
}
}
}
}
]
}
For this approach, we introduce a variable $pattern. You may set it with --arg pattern to your regex, e.g. "Jeff" or "^Al" or "e$" to have the student list filtered by test, or leave it empty to see all students.
Now, we iterate over all .items[] elements (i.e. over "all deployments"). For each one, we output the content of .metadata.name followed by a literal colon and a space. Then we iterate again over all .spec.template.spec.containers[].students[], perform the pattern test and concatenate the outcome.
To print out raw strings instead of JSON, we use the -r option when calling jq.
kubectl get deployments -o json \
| jq --arg pattern "Jeff" -r '
.items[]
| .metadata.name + ": " + (
.spec.template.spec.containers[].students[]
| select(test($pattern))
)
'
To retrieve the "students" array(s) in the input, you could use this filter:
.items[]
| paths(objects) as $p
| getpath($p)
| select( objects | has("students") )
| .students
You can then add additional filters to select the particular student(s) of interest, e.g.
| .[]
| select(test("Jeff"))
And then add any postprocessing filters, e.g.
| "student-deployment: \(.)"
Of course you can obtain the students array in numerous other ways.

How to iterate a JSON array of objects with jq and grab multiple variables from each object in each loop

I need to grab variables from JSON properties.
The JSON array looks like this (GitHub API for repository tags), which I obtain from a curl request.
[
{
"name": "my-tag-name",
"zipball_url": "https://api.github.com/repos/path-to-my-tag-name",
"tarball_url": "https://api.github.com/repos/path-to-my-tag-name-tarball",
"commit": {
"sha": "commit-sha",
"url": "https://api.github.com/repos/path-to-my-commit-sha"
},
"node_id": "node-id"
},
{
"name": "another-tag-name",
"zipball_url": "https://api.github.com/repos/path-to-my-tag-name",
"tarball_url": "https://api.github.com/repos/path-to-my-tag-name-tarball",
"commit": {
"sha": "commit-sha",
"url": "https://api.github.com/repos/path-to-my-commit-sha"
},
"node_id": "node-id"
}
]
In my actual JSON there are 100s of objects like these.
While I loop each one of these I need to grab the name and the commit URL, then perform more operations with these two variables before I get to the next object and repeat.
I tried (with and without -r)
tags=$(curl -s -u "${GITHUB_USERNAME}:${GITHUB_TOKEN}" -H "Accept: application/vnd.github.v3+json" "https://api.github.com/repos/path-to-my-repository/tags?per_page=100&page=${page}")
for row in $(jq -r '.[]' <<< "$tags"); do
tag=$(jq -r '.name' <<< "$row")
# I have also tried with the syntax:
url=$(echo "${row}" | jq -r '.commit.url')
# do stuff with $tag and $url...
done
But I get errors like:
parse error: Unfinished JSON term at EOF at line 2, column 0 jq: error
(at :1): Cannot index string with string "name" } parse error:
Unmatched '}' at line 1, column 1
And from the terminal output it appears that it is trying to parse $row in a strange way, trying to grab .name from every substring? Not sure.
I am assuming the output from $(jq '.[]' <<< "$tags") could be valid JSON, from which I could again use jq to grab the object properties I need, but maybe that is not the case? If I output ${row} it does look like valid JSON to me, and I tried pasting the results in a JSON validator, everything seems to check out...
How do I grab the ".name" and ".commit.url" for each of these object before I move onto the next one?
Thanks
It would be better to avoid calling jq more than once. Consider, for example:
while read -r name ; do
read -r url
echo "$name" "$url"
done < <( curl .... | jq -r '.[] | .name, .commit.url' )
where curl .... signifies the relevant invocation of curl.
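The loop in the question most likely fails because for row in $(...) splits jq's output on whitespace, so each "row" is only a fragment of an object. As an alternative to two read calls per object, jq can emit one tab-separated line per object with @tsv. This is a sketch with sample data in place of the curl call; the function name process_tags is made up for illustration.

```shell
#!/usr/bin/env bash
# Sample data standing in for the GitHub API response.
tags='[
  {"name":"my-tag-name","commit":{"sha":"commit-sha","url":"https://api.github.com/repos/path-to-my-commit-sha"}},
  {"name":"another-tag-name","commit":{"sha":"commit-sha","url":"https://api.github.com/repos/path-to-my-commit-sha"}}
]'

# $1: JSON array of tag objects. Reads name and commit URL together, one object per line.
process_tags() {
  while IFS=$'\t' read -r name url; do
    # Do stuff with $name and $url here.
    echo "tag=$name url=$url"
  done < <(jq -r '.[] | [.name, .commit.url] | @tsv' <<<"$1")
}

process_tags "$tags"
```

Note that @tsv escapes tabs and newlines inside values, so this suits simple fields such as tag names and URLs.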

Building new JSON with JQ and bash

I am trying to create JSON from scratch using bash.
The final structure needs to be like:
{
"hosts": {
"a_hostname" : {
"ips" : [
1,
2,
3
]
},
{...}
}
}
First I'm creating an input file with the format:
hostname ["1.1.1.1","2.2.2.2"]
host-name2 ["3.3.3.3","4.4.4.4"]
This is being created by:
for host in $( ansible -i hosts all --list-hosts ) ; \
do echo -n "${host} " ; \
ansible -i hosts $host -m setup | sed '1c {' | jq -r -c '.ansible_facts.ansible_all_ipv4_addresses' ; \
done > hosts.txt
The key point here is that the IP list/array, is coming from a JSON file and being extracted by jq. This extraction outputs an already valid / quoted JSON array, but as a string in a txt file.
Next I'm using jq to parse the whole text file into the desired JSON:
jq -Rn '
{ "hosts": [inputs |
split("\\s+"; "g") |
select(length > 0 and .[0] != "") |
{(.[0]):
{ips:.[1]}
}
] | add }
' < ~/hosts.txt
This is almost correct, everything except for the IPs value which is treated as a string and quoted leading to:
{
"hosts": {
"hostname1": {
"ips": "[\"1.1.1.1\",\"2.2.2.2\"]"
},
"host-name2": {
"ips": "[\"3.3.3.3\",\"4.4.4.4\"]"
}
}
}
I'm now stuck at this final hurdle - how to insert the IPs without causing them to be quoted again.
Edit - quoting solved by using {ips: .[1] | fromjson }} instead of {ips:.[1]}.
However this was completely negated by @CharlesDuffy's help suggesting converting to TSV.
Original Q body:
So far I've got to
jq -n {hosts:{}} | \
for host in $( ansible -i hosts all --list-hosts ) ; \
do jq ".hosts += {$host:{}}" | \
jq ".hosts.$host += {ips:[1,2,3]}" ; \
done ;
([1,2,3] is actually coming from a subshell but including it seemed unnecessary as that part works, and made it harder to read)
This sort of works, but there seems to be 2 problems.
1) Final output only has a single host in it containing data from the first host in the list (this persists even if the second problem is bypassed):
{
"hosts": {
"host_1": {
"ips": [
1,
2,
3
]
}
}
}
2) One of the hostnames has a - in it, which causes syntax and compiler errors from jq. I'm stuck going around quote hell trying to get it to be interpreted but also quoted. Help!
Thanks for any input.
Let's say your input format is:
host_1 1 2 3
host_2 2 3 4
host-with-dashes 3 4 5
host-with-no-addresses
...re: edit specifying a different format: Add @tsv onto the JQ command producing the existing format to generate this one instead.
If you want to transform that to the format in question, it might look like:
jq -Rn '
{ "hosts": [inputs |
split("\\s+"; "g") |
select(length > 0 and .[0] != "") |
{(.[0]): .[1:]}
] | add
}' <input.txt
Which yields as output:
{
"hosts": {
"host_1": [
"1",
"2",
"3"
],
"host_2": [
"2",
"3",
"4"
],
"host-with-dashes": [
"3",
"4",
"5"
],
"host-with-no-addresses": []
}
}
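For the format from the edited question (a hostname followed by a JSON array), and for hostnames containing dashes, one way to sidestep the quoting problems is to pass the values with --arg and --argjson instead of splicing them into the jq program text. A minimal sketch, assuming one "hostname array" pair per line; the function name build_hosts is made up for illustration.

```shell
#!/usr/bin/env bash
# stdin: lines of "<hostname> <JSON array of addresses>".
build_hosts() {
  while read -r host ips; do
    [ -n "$host" ] || continue   # skip blank lines
    # --arg passes the hostname as a literal string, --argjson parses the array as JSON.
    jq -n --arg host "$host" --argjson ips "$ips" '{($host): {ips: $ips}}'
  done | jq -s '{hosts: add}'    # merge the per-host objects into one
}

printf '%s\n' \
  'hostname ["1.1.1.1","2.2.2.2"]' \
  'host-name2 ["3.3.3.3","4.4.4.4"]' | build_hosts
```

This runs jq once per host, which is slower than the single-call version above but keeps each value out of the program text.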

Linux CLI - How to get substring from JSON jq + grep?

I need to pull a substring from JSON. In the JSON doc below, I need the end of the value of jq '.[].networkProfile.networkInterfaces[].id'. In other words, I need just A10NICvw4konls2vfbw-data to pass to another command. I can't seem to figure out how to pull a substring using grep. I've seen regex examples out there but haven't been successful with them.
[
{
"id": "/subscriptions/blah/resourceGroups/IPv6v2/providers/Microsoft.Compute/virtualMachines/A10VNAvw4konls2vfbw",
"instanceView": null,
"licenseType": null,
"location": "centralus",
"name": "A10VNAvw4konls2vfbw",
"networkProfile": {
"networkInterfaces": [
{
"id": "/subscriptions/blah/resourceGroups/IPv6v2/providers/Microsoft.Network/networkInterfaces/A10NICvw4konls2vfbw-data",
"resourceGroup": "IPv6v2"
}
]
}
}
]
In your case, sub(".*/";"") will do the trick as * is greedy:
.[].networkProfile.networkInterfaces[].id | sub(".*/";"")
Try this:
jq -r '.[]|.networkProfile.networkInterfaces[].id | split("/") | last'
The -r tells JQ to print the output in "raw" form - in this case, that means no double-quotes around the string value.
As for the jq expression, after you access the id you want, piping it (still inside jq) through split("/") turns it into an array of the parts between slashes. Piping that through the last function (thanks, @Thor) returns just the last element of the array.
If you want to do it with grep here is one way:
jq -r '.[].networkProfile.networkInterfaces[].id' | grep -o '[^/]*$'
Output:
A10NICvw4konls2vfbw-data
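If the id is already in a shell variable, the shell itself can strip everything up to the last slash, with no extra process. A small sketch, assuming the value was captured from jq -r beforehand; the variable names id and nic_name are made up for illustration.

```shell
#!/usr/bin/env bash
# Value as jq -r would print it for the sample document.
id='/subscriptions/blah/resourceGroups/IPv6v2/providers/Microsoft.Network/networkInterfaces/A10NICvw4konls2vfbw-data'

# ${var##*/} removes the longest prefix matching "*/", leaving the last path segment.
nic_name=${id##*/}
echo "$nic_name"
```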

Create JSON using jq from pipe-separated keys and values in bash

I am trying to create a json object from a string in bash. The string is as follows.
CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0
The output is from docker stats command and my end goal is to publish custom metrics to aws cloudwatch. I would like to format this string as json.
{
"CONTAINER":"nginx_container",
"CPU%":"0.02%",
....
}
I have used jq command before and it seems like it should work well in this case but I have not been able to come up with a good solution yet. Other than hardcoding variable names and indexing using sed or awk. Then creating a json from scratch. Any suggestions would be appreciated. Thanks.
Prerequisite
For all of the below, it's assumed that your content is in a shell variable named s:
s='CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0'
What (modern jq)
# thanks to @JeffMercado and @chepner for refinements, see comments
jq -Rn '
( input | split("|") ) as $keys |
( inputs | split("|") ) as $vals |
[[$keys, $vals] | transpose[] | {key:.[0],value:.[1]}] | from_entries
' <<<"$s"
How (modern jq)
This requires very new (probably 1.5?) jq to work, and is a dense chunk of code. To break it down:
Using -n prevents jq from reading stdin on its own, leaving the entirety of the input stream available to be read by input and inputs -- the former to read a single line, and the latter to read all remaining lines. (-R, for raw input, causes textual lines rather than JSON objects to be read).
With [$keys, $vals] | transpose[], we're generating [key, value] pairs (in Python terms, zipping the two lists).
With {key:.[0],value:.[1]}, we're making each [key, value] pair into an object of the form {"key": key, "value": value}
With from_entries, we're combining those pairs into objects containing those keys and values.
What (shell-assisted)
This will work with a significantly older jq than the above, and is an easily adopted approach for scenarios where a native-jq solution can be harder to wrangle:
{
IFS='|' read -r -a keys # read first line into an array of strings
## read each subsequent line into an array named "values"
while IFS='|' read -r -a values; do
# setup: positional arguments to pass in literal variables, query with code
jq_args=( )
jq_query='.'
# copy values into the arguments, reference them from the generated code
for idx in "${!values[@]}"; do
[[ ${keys[$idx]} ]] || continue # skip values with no corresponding key
jq_args+=( --arg "key$idx" "${keys[$idx]}" )
jq_args+=( --arg "value$idx" "${values[$idx]}" )
jq_query+=" | .[\$key${idx}]=\$value${idx}"
done
# run the generated command
jq "${jq_args[@]}" "$jq_query" <<<'{}'
done
} <<<"$s"
How (shell-assisted)
The invoked jq command from the above is similar to:
jq --arg key0 'CONTAINER' \
--arg value0 'nginx_container' \
--arg key1 'CPU%' \
--arg value1 '0.02%' \
--arg key2 'MEMUSAGE/LIMIT' \
--arg value2 '25.09MiB/15.26GiB' \
'. | .[$key0]=$value0 | .[$key1]=$value1 | .[$key2]=$value2' \
<<<'{}'
...passing each key and value out-of-band (such that it's treated as a literal string rather than parsed as JSON), then referring to them individually.
Result
Either of the above will emit:
{
"CONTAINER": "nginx_container",
"CPU%": "0.02%",
"MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
"MEM%": "0.16%",
"NETI/O": "0B/0B",
"BLOCKI/O": "22.09MB/4.096kB",
"PIDS": "0"
}
Why
In short: Because it's guaranteed to generate valid JSON as output.
Consider the following as an example that would break more naive approaches:
s='key ending in a backslash\
value "with quotes"'
Sure, these are unexpected scenarios, but jq knows how to deal with them:
{
"key ending in a backslash\\": "value \"with quotes\""
}
...whereas an implementation that didn't understand JSON strings could easily end up emitting:
{
"key ending in a backslash\": "value "with quotes""
}
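The escaping behaviour described above can be checked directly by passing the awkward key and value through --arg and reading the value back. This is only a sketch; the variable names key, value, json and roundtrip are made up for illustration.

```shell
#!/usr/bin/env bash
key='key ending in a backslash\'
value='value "with quotes"'

# jq escapes the backslash and the quotes when it serializes the strings.
json=$(jq -cn --arg k "$key" --arg v "$value" '{($k): $v}')
echo "$json"

# Round trip: reading the value back yields the original string unchanged.
roundtrip=$(jq -r --arg k "$key" '.[$k]' <<<"$json")
echo "$roundtrip"
```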
I know this is an old post, but the tool you seek is called jo: https://github.com/jpmens/jo
A quick and easy example:
$ jo my_variable="simple"
{"my_variable":"simple"}
A little more complex
$ jo -p name=jo n=17 parser=false
{
"name": "jo",
"n": 17,
"parser": false
}
Add an array
$ jo -p name=jo n=17 parser=false my_array=$(jo -a {1..5})
{
"name": "jo",
"n": 17,
"parser": false,
"my_array": [
1,
2,
3,
4,
5
]
}
I've made some pretty complex stuff with jo, and the nice thing is that you don't have to roll your own solution or worry about the possibility of producing invalid JSON.
You can ask docker to give you JSON data in the first place
docker stats --format "{{json .}}"
For more on this, see: https://docs.docker.com/config/formatting/
JSONSTR=""
declare -a JSONNAMES=()
declare -A JSONARRAY=()
LOOPNUM=0
cat ~/newfile | while IFS=: read CONTAINER CPU MEMUSE MEMPC NETIO BLKIO PIDS; do
if [[ "$LOOPNUM" = 0 ]]; then
JSONNAMES=("$CONTAINER" "$CPU" "$MEMUSE" "$MEMPC" "$NETIO" "$BLKIO" "$PIDS")
LOOPNUM=$(( LOOPNUM+1 ))
else
echo "{ \"${JSONNAMES[0]}\": \"${CONTAINER}\", \"${JSONNAMES[1]}\": \"${CPU}\", \"${JSONNAMES[2]}\": \"${MEMUSE}\", \"${JSONNAMES[3]}\": \"${MEMPC}\", \"${JSONNAMES[4]}\": \"${NETIO}\", \"${JSONNAMES[5]}\": \"${BLKIO}\", \"${JSONNAMES[6]}\": \"${PIDS}\" }"
fi
done
Returns:
{ "CONTAINER": "nginx_container", "CPU%": "0.02%", "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB", "MEM%": "0.16%", "NETI/O": "0B/0B", "BLOCKI/O": "22.09MB/4.096kB", "PIDS": "0" }
Here is a solution which uses the -R and -s options along with transpose:
split("\n") # [ "CONTAINER...", "nginx_container|0.02%...", ...]
| (.[0] | split("|")) as $keys # [ "CONTAINER", "CPU%", "MEMUSAGE/LIMIT", ... ]
| (.[1:][] | split("|")) # [ "nginx_container", "0.02%", ... ] [ ... ] ...
| select(length > 0) # (remove empty [] caused by trailing newline)
| [$keys, .] # [ ["CONTAINER", ...], ["nginx_container", ...] ] ...
| [ transpose[] | {(.[0]):.[1]} ] # [ {"CONTAINER": "nginx_container"}, ... ] ...
| add # {"CONTAINER": "nginx_container", "CPU%": "0.02%" ...
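The filter above is shown without its invocation. One possible way to run it, assuming the table is in a shell variable s as in the earlier answer; the function name to_json is made up for illustration, and -c is added only to get one object per line.

```shell
#!/usr/bin/env bash
s='CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0'

# stdin: header line followed by data lines, all pipe-separated.
# -R reads raw text, -s slurps all of it into one string for split("\n").
to_json() {
  jq -Rsc '
    split("\n")
    | (.[0] | split("|")) as $keys
    | (.[1:][] | split("|"))
    | select(length > 0)
    | [$keys, .]
    | [ transpose[] | {(.[0]): .[1]} ]
    | add
  '
}

printf '%s\n' "$s" | to_json
```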
# A literal % in a printf format string must be written as %%
json_template='{"CONTAINER":"%s","CPU%%":"%s","MEMUSAGE/LIMIT":"%s", "MEM%%":"%s","NETI/O":"%s","BLOCKI/O":"%s","PIDS":"%s"}'
json_string=$(printf "$json_template" "nginx_container" "0.02%" "25.09MiB/15.26GiB" "0.16%" "0B/0B" "22.09MB/4.096kB" "0")
echo "$json_string"
Not using jq but possible to use args and environment in values.
CONTAINER=nginx_container
json_template='{"CONTAINER":"%s","CPU%%":"%s","MEMUSAGE/LIMIT":"%s", "MEM%%":"%s","NETI/O":"%s","BLOCKI/O":"%s","PIDS":"%s"}'
json_string=$(printf "$json_template" "$CONTAINER" "$1" "25.09MiB/15.26GiB" "0.16%" "0B/0B" "22.09MB/4.096kB" "0")
echo "$json_string"
If you're starting with tabular data, I think it makes more sense to use something that works with tabular data natively, like sqawk, to turn it into JSON, and then use jq to work with it further.
echo 'CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0' \
| sqawk -FS '[|]' -RS '\n' -output json 'select * from a' header=1 \
| jq '.[] | with_entries(select(.key|test("^a.*")|not))'
{
"CONTAINER": "nginx_container",
"CPU%": "0.02%",
"MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
"MEM%": "0.16%",
"NETI/O": "0B/0B",
"BLOCKI/O": "22.09MB/4.096kB",
"PIDS": "0"
}
Without jq, sqawk gives a bit too much:
[
{
"anr": "1",
"anf": "7",
"a0": "nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0",
"CONTAINER": "nginx_container",
"CPU%": "0.02%",
"MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
"MEM%": "0.16%",
"NETI/O": "0B/0B",
"BLOCKI/O": "22.09MB/4.096kB",
"PIDS": "0",
"a8": "",
"a9": "",
"a10": ""
}
]