Compare two JSON arrays and iterate over the remaining items - json

I have two arrays with numbers that are already stored in variables:
$SLOT_IDS = [1,2,3,4,5]
$PR_IDS = [3,4]
I would like to find which numbers are in array 1 but not array 2. So in this case it would be
$OBSOLETE_SLOT_IDS = [1,2,5]
and then I would like to run a command for each of those numbers, inserting the number into a placeholder.
So this command:
az webapp deployment slot delete -g group --name webapp --slot pr-<PLACEHOLDER>
Should be run three times:
az webapp deployment slot delete -g group --name webapp --slot pr-1
az webapp deployment slot delete -g group --name webapp --slot pr-2
az webapp deployment slot delete -g group --name webapp --slot pr-5
I know it should look something like this (it is required to be inline):
for i in $OBSOLETE_SLOT_IDS; do az webapp deployment slot delete -g group --name webapp --slot pr-$i; done
So my questions:
How can I calculate $OBSOLETE_SLOT_IDS from the other two variables with an inline command?
What is the correct version of the for loop?
Comment: it seems that the variables do not contain actual bash arrays; they are basically the string output of some curl calls that I stored in variables.

A shorter approach that uses jq to get the difference of the two arrays:
#!/usr/bin/env bash
slot_ids="[1,2,3,4,5]"
pr_ids="[3,4]"
while read -r id; do
az webapp deployment slot delete -g group --name webapp --slot "pr-$id"
done < <(jq -n --argjson a "$slot_ids" --argjson b "$pr_ids" '$a - $b | .[]')
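Since the comment mentions that the arrays are really the output of curl calls stored in variables, populating them might look like this (a sketch; the URLs are hypothetical placeholders, not part of the question):
slot_ids=$(curl -s "https://example.invalid/api/slots")   # hypothetical endpoint returning e.g. [1,2,3,4,5]
pr_ids=$(curl -s "https://example.invalid/api/prs")       # hypothetical endpoint returning e.g. [3,4]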

jq -r '.[]' will transform your array to a stream with one number per line -- which is the format that standard UNIX tools expect to work with.
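For example, with one of the arrays from the question:
jq -r '.[]' <<<'[1,2,3,4,5]'
prints 1 through 5, one number per line.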
Once we have the numbers in sorted form, we can use comm to compare the two streams. -3 tells comm to suppress lines present in both streams, and -2 tells it to suppress lines present only in the second stream, so comm -23 prints only the lines unique to the first stream.
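A minimal illustration of that step, with the two arrays already converted to sorted streams:
comm -23 <(printf '%s\n' 1 2 3 4 5 | sort) <(printf '%s\n' 3 4 | sort)
prints 1, 2 and 5, each on its own line.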
Using readarray (added in bash 4.0) then lets us read that content into an array, which we can iterate over in our for loop.
#!/usr/bin/env bash
slot_ids='[1,2,3,4,5]'
pr_ids='[3,4]'
readarray -t filtered_slot_ids < <(
comm -23 \
<(jq -r '.[]' <<<"$slot_ids" | sort) \
<(jq -r '.[]' <<<"$pr_ids" | sort))
for i in "${filtered_slot_ids[#]}"; do
az webapp deployment slot delete -g group --name webapp --slot "pr-$i"
done

Related

Batch Processing Curl API Requests in Bash?

Need to query an API endpoint for specific parameters, but there's a parameter limit of 20.
The params are gathered into an array and stored in a JSON file, then referenced in a variable tacked onto the end of my curl command, which generates the full curl API request.
curl -s -g GET '/api/endpoint?parameters='$myparams
eg.
curl -s -g GET '/api/endpoint?parameters=["1","2","3","etc"]'
This works fine when the params JSON is small and below the parameter limit per request. The only problem is that the params list fluctuates and is many times larger than the request limit.
My normal thinking would be to iterate through the param lines, but that would create many requests and probably get me blocked too.
What would a good approach be to parse the parameter array JSON and generate curl API requests that respect the parameter limit, with the minimum number of requests? Say it's 115 params now: that'd create 5 API requests of 20 params each and 1 of 15.
You can chunk the array with the undocumented _nwise function and then use that, e.g.:
<<JSON jq -r '_nwise(3) | "/api/endpoint?parameters=\(.)"'
["1","2","3","4","5","6","7","8"]
JSON
Output:
/api/endpoint?parameters=["1","2","3"]
/api/endpoint?parameters=["4","5","6"]
/api/endpoint?parameters=["7","8"]
This will generate the URLs for your curl calls, which you can then save in a file or consume directly:
<input.json jq -r ... | while read -r url; do curl -s -g -XGET "$url"; done
Or generate the query string only and use it in your curl call (pay attention to proper escaping/quoting):
<input.json jq -c '_nwise(3)' | while read -r qs; do curl -s -g -XGET "/api/endpoint?parameters=$qs"; done
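If the endpoint expects the brackets and quotes to be percent-encoded, one option is to let curl do the encoding with -G and --data-urlencode (a sketch; the full URL is a placeholder, since the question only shows the /api/endpoint path):
<input.json jq -c '_nwise(3)' | while read -r qs; do curl -s -G --data-urlencode "parameters=$qs" "https://example.invalid/api/endpoint"; done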
Depending on your input format and requirements regarding robustness, you might not need jq at all; sed and paste can do the trick:
<<IN sed 's/\\/&&/g;s/"/\\"/g' | sed 's/^/"/;s/$/"/' | paste -sd ',,\n' | while read -r items; do curl -s -g -XGET "/api/endpoint?parameters=[$items]"; done
1
2
3
4
5
6
7
8
IN
Resulting curl calls:
curl -s -g -XGET /api/endpoint?parameters=["1","2","3"]
curl -s -g -XGET /api/endpoint?parameters=["4","5","6"]
curl -s -g -XGET /api/endpoint?parameters=["7","8"]
Explanation:
sed 's/\\/&&/g;s/"/\\"/g': replace \ with \\ and " with \".
sed 's/^/"/;s/$/"/': wrap each line/item in quotes
paste -sd ',,\n': take 3 lines and join them by a comma (repeat the comma character as many times as you need items minus 1)
while read -r items; do curl -s -g -XGET "/api/endpoint?parameters=[$items]"; done;: read generated items, wrap them in brackets and run curl

shell script: use pipelines instead of files for batch processing

I'm using these two commands in order to process a huge single sequence of json objects:
$ jq -c '.[]' csvjson.json | split -l 25 - splitted
The above command creates several splitted-* files, containing 25 lines each.
$ jq --slurp 'map({PutRequest: {Item: map_values({S: .})}})' splitted-n > output-n.json
Is there any way to pipeline above two commands?
We can make use of the split --filter option:
jq -c '.[]' csvjson.json |
split -l25 --filter='jq --slurp "map({PutRequest: {Item: map_values({S: .})}})" >$FILE.json' - output
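If the intermediate files are not needed at all, the chunking can also be done entirely inside jq, reusing the undocumented _nwise function shown earlier (a sketch, assuming csvjson.json contains a single top-level array):
jq -c '_nwise(25) | map({PutRequest: {Item: map_values({S: .})}})' csvjson.json
This writes one 25-element batch per line to stdout instead of one file per batch.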

Shell command - How to run?

When I run a command in the terminal, it works just fine, but when I put the same command in a .sh script and run it, it doesn't give any output. What might be the reason for this?
The command:
IFS=$'\t'; while read -r k v; do
export "$k=\"$v\""
done < <(jq -r '.data | to_entries[] | [(.key|ascii_upcase), .value] | @tsv' data.json)
This is kind of expected since export sets the environment variable for that particular shell.
Docs -
The export command is used to export a variable or function to the environment of all the child processes running in the current shell.
export -f functionname   # exports a function in the current shell
It exports a variable or function with a value.
So when you create an sh script, it runs the specified commands in a different shell, which terminates once the script exits.
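A quick way to see this in isolation (a minimal illustration using the same variable name):
bash -c 'export HELLO1=world1'   # the export happens in a child shell
echo "$HELLO1"                   # prints an empty line in the parent shell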
It works with the sh script too -
data.sh
#!/bin/bash
IFS=$'\t'; while read -r k v; do
export "$k=\"$v\""
echo $HELLO1
echo $SAMPLEKEY
done < <(jq -r '.data | to_entries[] | [(.key|ascii_upcase), .value] | @tsv' data.json)
Output -
$ ./data.sh
"world1"
"world1"
"samplevalue"
Which suggests that your variables are getting exported, but only for that particular shell environment.
If you want to make them persistent, export them from ~/.bashrc or ~/.profile.
Once you put them in ~/.bashrc or ~/.profile, you will see output like the one below.
I used ~/.bash_profile on my macOS machine -
Last login: Thu Jan 25 15:15:42 on ttys006
"world1"
"world1"
"samplevalue"
viveky4d4v@020:~$ echo $SAMPLEKEY
"samplevalue"
viveky4d4v@020:~$ echo $HELLO1
"world1"
viveky4d4v@020:~$
Which clarifies that your env variables will get exported whenever you open a new shell, the logic for this lies in .bashrc (https://unix.stackexchange.com/questions/129143/what-is-the-purpose-of-bashrc-and-how-does-it-work)
Put your script as-is at the end of ~/.bashrc -
IFS=$'\t'; while read -r k v; do
export "$k=\"$v\""
echo $HELLO1
echo $SAMPLEKEY
done < <(jq -r '.data | to_entries[] | [(.key|ascii_upcase), .value] | @tsv' data.json)
You need to make sure that data.json stays in the user's home directory.
Basically: a child process can't change the environment of its parent process.
You need to source the script instead of executing it:
source your_script.sh
source runs the script in the current shell which makes it possible to modify the environment.
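For example, with the data.sh script from above (assuming data.json is in the current directory):
source ./data.sh
echo "$SAMPLEKEY"   # now prints "samplevalue" in the current shell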
Alternatively you can create a function in your shell startup files (e.g. ~/.bashrc):
my_function() {
IFS=$'\t'; while read -r k v; do
export "$k=\"$v\""
done < <(jq -r '.data | to_entries[] | [(.key|ascii_upcase), .value] | @tsv' /path/to/data.json)
}
After you've started a new shell you can run
my_function

Setting the SGE cluster job name with Snakemake while using DRMAA?

Problem
I'm not sure if the -N argument is being saved on my SGE cluster. Everything works except for the -N argument:
Snakemake requires a valid -N call
It doesn't set the job name properly.
It always reverts to the default name.
This is my call, which has the same results with or without the -N argument:
snakemake --jobs 100 --drmaa "-V -S /bin/bash -o log/mpileup/mpileupSPLIT -e log/mpileup/mpileupSPLIT -l h_vmem=10G -pe ncpus 1 -N {rule}.{wildcards}.varScan"
The only way I have found to influence the job name is to use --jobname.
snakemake --jobs 100 --drmaa "-V -S /bin/bash -o log/mpileup/mpileupSPLIT -e log/mpileup/mpileupSPLIT -l h_vmem=10G -pe ncpus 1 -N {rule}.{wildcards}.varScan" --jobname "{rule}.{wildcards}.{jobid}"
Background
I've tried a variety of things. Usually I just use a cluster configuration file, but that isn't working either, so that's why, in the call above, I ditched the config file to make sure it's the '-N' option which isn't being saved.
My usual call is:
snakemake --drmaa "{cluster.clusterSpec}" --jobs 10 --cluster-config input/config.json
1) If I use '-n' instead of '-N', I receive a workflow error:
drmaa.errors.DeniedByDrmException: code 17: ERROR! invalid option argument "-n"
2) If I use '-N', but give it an incorrect wildcard, say {rule.name}:
AttributeError: 'str' object has no attribute 'name'
3) I cannot use both --drmaa AND --cluster:
snakemake: error: argument --cluster/-c: not allowed with argument --drmaa
4) If I specify the {jobid} in the config.json file, then Snakemake doesn't know what to do with it.
RuleException in line 13 of /extscratch/clc/projects/tboyarski/gitRepo-LCR-BCCRC/Snakemake/modules/mpileup/mpileupSPLIT:
NameError: The name 'jobid' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}
EDIT Added #5 w/ Solution
5) I can set the job name using the config.json and just concatenate the jobid on afterwards in my snakemake call. That way I have a generic snakemake call (--jobname "{cluster.jobName}.{jobid}"), and a highly configurable and specific job name ({rule}-{wildcards.sampleMPUS}_chr{wildcards.chrMPUS}) which results in:
mpileupSPLIT-Pfeiffer_chr19.1.e7152298
The 1 is the Snakemake jobid according to the DAG.
The 7152298 is my cluster's job number.
2nd EDIT - Just tried v3.12, same thing. Concatenation must occur in snakemake call.
Alternative solution
I would also be okay with something like this:
snakemake --drmaa "{cluster.clusterSpec}" --jobname "{cluster.jobName}" --jobs 10 --cluster-config input/config.json
With my cluster file like this:
"mpileupSPLIT": {
"clusterSpec": "-V -S /bin/bash -o log/mpileup/mpileupSPLIT -e log/mpileup/mpileupSPLIT -l h_vmem=10G -pe ncpus 1 -n {rule}.{wildcards}.varScan",
"jobName": "{rule}-{wildcards.sampleMPUS}_chr{wildcards.chrMPUS}.{jobid}"
}
Documentation Reviewed
I've read the documentation but I was unable to figure it out.
http://snakemake.readthedocs.io/en/latest/executable.html?-highlight=job_name#cluster-execution
http://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#snakefiles-cluster-configuration
https://groups.google.com/forum/#!topic/snakemake/whwYODy_I74
System
Snakemake v3.10.2 (Will try newest conda version tomorrow)
Red Hat Enterprise Linux Server release 5.4
SGE Cluster
Solution
Use '--jobname' in your snakemake call instead of '-N' in your qsub parameter submission
Set up your cluster config file to have a targetable parameter for the job name suffix. In this case these are the overrides for my Snakemake rule named "mpileupSPLIT":
"mpileupSPLIT": {
"clusterSpec": "-V -S /bin/bash -o log/mpileup/mpileupSPLIT -e log/mpileup/mpileupSPLIT -l h_vmem=10G -pe ncpus 1",
"jobName": "{rule}-{wildcards.sampleMPUS}_chr{wildcards.chrMPUS}"
}
Utilize a generic Snakemake call which includes {jobid}. On a cluster (SGE), the 'jobid' variable contains both the Snakemake job # and the cluster job #; both are valuable, as the first corresponds to the Snakemake DAG and the latter is useful for cluster logging. (E.g. --jobname "{cluster.jobName}.{jobid}"; the full call is assembled below.)
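Putting the pieces together, the resulting call (assembled from the snippets above) looks something like this:
snakemake --jobs 10 --cluster-config input/config.json --drmaa "{cluster.clusterSpec}" --jobname "{cluster.jobName}.{jobid}"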
EDIT Added solution to resolve post.

How to pass arguments from cmd to tcl script of ModelSim

I run ModelSim from the command line via a Python program.
I use the following code, which calls a Tcl script that runs ModelSim:
os.system("vsim -c -do top_tb_simulate_reg.tcl " )
The tcl script contain the following:
vsim -voptargs="+acc" +UVM_TESTNAME=test_name +UVM_MAX_QUIT_COUNT=1 +UVM_VERBOSITY=UVM_LOW \
-t 1ps -L unisims_verm -L generic_baseblocks_v2_1_0 -L axi_infrastructure_v1_1_0 \
-L dds_compiler_v6_0_12 -lib xil_defaultlib xil_defaultlib.girobo2_tb_top \
xil_defaultlib.glbl
I want the value of +UVM_TESTNAME to be an argument that I pass from the command line when I execute:
os.system("vsim -c -do top_tb_simulate_reg.tcl " )
How can I do it?
I tried the following with no success:
Python script:
os.system("vsim -c -do top_tb_simulate_reg.tcl axi_rd_only_test" )
Simulation file (tcl script)
vsim -voptargs="+acc" +UVM_TESTNAME=$argv +UVM_MAX_QUIT_COUNT=1 +UVM_VERBOSITY=UVM_LOW \
-t 1ps -L unisims_verm -L generic_baseblocks_v2_1_0 -L axi_infrastructure_v1_1_0 \
-L dds_compiler_v6_0_12 -lib xil_defaultlib xil_defaultlib.girobo2_tb_top \
xil_defaultlib.glbl
I got the following error:
# ** Error: (vsim-3170) Could not find 'C:/raft/raftortwo/girobo2/ver/sim/work.axi_rd_only_test'.
The problem is that the vsim binary is doing its own processing of the arguments, and that is interfering. While you can probably find a way around this by reading the vsim documentation, the simplest workaround is to pass values via environment variables. They're inherited by a process from its parent process, and are fine for passing most things. (The exception is security tokens, which should always be passed in files with correctly set permissions, rather than in either environment variables or command-line arguments.)
In your python code:
# Store the value in the *inheritable* environment
os.environ["MY_TEST_CASE"] = "axi_rd_only_test"
# Do the call; the environment gets passed over behind the scenes
os.system("vsim -c -do top_tb_simulate_reg.tcl " )
In your tcl code:
# Read out of the inherited environment
set name $env(MY_TEST_CASE)
# Use it! (Could do this as one line, but that's hard to read)
vsim -voptargs="+acc" +UVM_TESTNAME=$name +UVM_MAX_QUIT_COUNT=1 +UVM_VERBOSITY=UVM_LOW \
-t 1ps -L unisims_verm -L generic_baseblocks_v2_1_0 -L axi_infrastructure_v1_1_0 \
-L dds_compiler_v6_0_12 -lib xil_defaultlib xil_defaultlib.girobo2_tb_top \
xil_defaultlib.glbl
Late to the party, but I found a great workaround for your obstacle: the do command within ModelSim's Tcl instance does accept parameters. See the command reference.
vsim -c -do filename.tcl can't take parameters, but you can use vsim -c -do "do filename.tcl params".
In your case this translates to os.system('vsim -c -do "do top_tb_simulate_reg.tcl axi_rd_only_test"'). Your .tcl script will find the parameter passed through the variable $1.
I hope this helps!