I'm writing this post because, since I started using Slurm, I have not been able to use Ray correctly. Whenever I run the following commands:
ray.init()
trainer = A3CTrainer(env="my_env") (I have registered my env with tune)
the program crashes with the following message:
core_worker.cc:137: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory
The program works fine on my computer; the problem appeared when I started using Slurm. I only request one GPU from Slurm.
Thank you for reading and perhaps answering.
Have a great day
Some details about the code
I used the following code:
import ray
from ray.rllib.agents.a3c import A3CTrainer
import tensorflow as tf
from MM1c_queue_env import my_env #my_env is already registered in tune
ray.shutdown()
ray.init(ignore_reinit_error=True)
trainer = A3CTrainer(env = "my_env")
print("success")
Both the trainer line and the init line cause the program to crash with the error mentioned above. To launch the program with Slurm, I use the following script:
#!/bin/bash
#SBATCH --job-name=rl_for_insensitive_policies
#SBATCH --time=0:05:00
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
#SBATCH --partition=gpu
module load anaconda3/2020.02/gcc-9.2.0
python test.py
Limit the number of CPUs
Ray will launch as many worker processes as your execution node has CPUs (or CPU cores). If that is more than you reserved, Slurm will start killing processes.
You can limit the number of worker processes as such:
import ray
ray.init(ignore_reinit_error=True, num_cpus=4)
print("success")
You can find detailed instructions for running Ray with SLURM in the documentation; the instructions below are based on it.
I used the information in this link too.
You should launch one process for the head node and one process for each worker node, then connect the worker nodes to the head node.
#!/bin/bash
#SBATCH -p gpu
#SBATCH -t 00:05:00
#SBATCH --job-name=rl_for_insensitive_policies
# --tasks-per-node must be 1 according to the documentation
#SBATCH --nodes=2
#SBATCH --exclusive
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1
After specifying the resources, load your environment:
module load anaconda3/2020.02/gcc-9.2.0
Then you need to obtain the head node's IP address.
# Getting the node names
nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
nodes_array=($nodes)
head_node=${nodes_array[0]}
head_node_ip=$(srun --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address)
if [[ "$head_node_ip" == *" "* ]]; then
IFS=' ' read -ra ADDR <<<"$head_node_ip"
if [[ ${#ADDR[0]} -gt 16 ]]; then
head_node_ip=${ADDR[2]}
else
head_node_ip=${ADDR[0]}
fi
echo "IPV6 address detected. We split the IPV4 address as $head_node_ip"
fi
port=6379
ip_head=$head_node_ip:$port
export ip_head
echo "IP Head: $ip_head"
redis_password=$(uuidgen)
echo "redis_password: "$redis_password
nodeManagerPort=6700
objectManagerPort=6701
rayClientServerPort=10001
redisShardPorts=6702
minWorkerPort=10002
maxWorkerPort=19999
The below code launches the head node.
echo "Starting HEAD at $head_node"
srun --nodes=1 --ntasks=1 -w "$head_node" \
ray start --head --node-ip-address="$head_node_ip" \
--port=$port \
--node-manager-port=$nodeManagerPort \
--object-manager-port=$objectManagerPort \
--ray-client-server-port=$rayClientServerPort \
--redis-shard-ports=$redisShardPorts \
--min-worker-port=$minWorkerPort \
--max-worker-port=$maxWorkerPort \
--redis-password=$redis_password \
--num-cpus "${SLURM_CPUS_PER_TASK}" \
--num-gpus "${SLURM_GPUS_PER_TASK}" \
--block &
sleep 10
# number of nodes other than the head node
worker_num=$((SLURM_JOB_NUM_NODES - 1))
The below loop launches some workers (one worker for each node).
for ((i = 1; i <= worker_num; i++)); do
node_i=${nodes_array[$i]}
echo "Starting WORKER $i at $node_i"
srun --nodes=1 --ntasks=1 -w "$node_i" \
ray start --address "$ip_head" \
--redis-password=$redis_password \
--num-cpus "${SLURM_CPUS_PER_TASK}" \
--num-gpus "${SLURM_GPUS_PER_TASK}" \
--block &
sleep 5
done
It is better to add some argparse arguments to your code so that you can pass it the specified resources and the Redis password:
python test.py --redis-password $redis_password --num-cpus $SLURM_CPUS_PER_TASK --num-gpus $SLURM_GPUS_PER_TASK
if you get "unable to connect to GCS server" error , use the below values or use some new values. Two users cannot use same port.
port=6380
nodeManagerPort=6800
objectManagerPort=6801
rayClientServerPort=20001
redisShardPorts=6802
minWorkerPort=20002
maxWorkerPort=29999
In your test.py, add the arguments and initialize Ray:
import argparse
import os

import ray

parser = argparse.ArgumentParser(description="Script for training RLlib agents")
parser.add_argument("--num-cpus", type=int, default=0)
parser.add_argument("--num-gpus", type=int, default=0)
parser.add_argument("--redis-password", type=str, default=None)
args = parser.parse_args()

ray.init(_redis_password=args.redis_password, address=os.environ["ip_head"])

config = {}  # your RLlib trainer config, e.g. for A3CTrainer(env="my_env", config=config)
config["num_gpus"] = args.num_gpus
config["num_workers"] = args.num_cpus
I'm using Nextflow to analyse MinION data. BLAST+ terminates with an error exit status (2), Command exit status: 2, and Command output: (empty)
-HP-Z6-G4-Workstation:~/nextflow_pipelines/nf_pipeline/20221025_insect$ nextflow cat_working_nextflow.nf
N E X T F L O W ~ version 22.04.5
Launching `cat_working_nextflow.nf` [admiring_hopper] DSL1 - revision: 2916bc12af
executor > local (78)
[38/2d0584] process > concatinate (AIG363_pass_barcode01_0eb3c2c3_2.fastq) [100%] 38 of 38 ✔
[dd/3cabdf] process > fastqconvert (output.fastq) [100%] 38 of 38 ✔
[47/dab2cd] process > blast_raw (insect.fasta) [ 0%] 0 of 38
executor > local (78)
[38/2d0584] process > concatinate (AIG363_pass_barcode01_0eb3c2c3_2.fastq) [100%] 38 of 38 ✔
[dd/3cabdf] process > fastqconvert (output.fastq) [100%] 38 of 38 ✔
[47/dab2cd] process > blast_raw (insect.fasta) [ 2%] 1 of 37, failed: 1
Error executing process > 'blast_raw (insect.fasta)'
Caused by:
Process `blast_raw (insect.fasta)` terminated with an error exit status (2)
Command executed:
blastn -query insect.fasta -db /home/blast/nt_db_20221011/nt -outfmt 11 -out blastrawreads.asn -evalue 0.1 -num_alignments 1
blast_formatter -archive blastrawreads.asn -outfmt 5 -out blastrawreads.xml
blast_formatter -archive blastrawreads.asn -outfmt "6 qaccver saccver pident length evalue bitscore stitle" -out blastrawreads_unsort.tsv
sort -n -r -k 6 blastrawreads_unsort.tsv > blastrawreads.tsv
Command exit status:
2
Command output:
(empty)
Command error:
Warning: [blastn] Examining 5 or more matches is recommended
BLAST Database error: No alias or index file found for nucleotide database [/home/blast/nt_db_20221011/nt] in search path [/home/shaextflow_pipelines/nf_pipeline/20221025_insect/work/96/e885b7e53e1bcf30e33526265e9a3c::]
Work dir:
/home/nextflow_pipelines/nf_pipeline/20221025_insect/work/96/e885b7e53e1bcf30e33526265e9a3c
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
The nf file:
#!/usr/bin/env nextflow
//data_location
params.outdir = './results'
params.in = "$PWD/*.fastq"
dataset = Channel.fromPath(params.in)
params.db = "/home/blast/nt_db_20221011/nt"
process concatenate {
tag "$x"
publishDir "${params.outdir}", mode:'copy'
input:
path (x) from dataset
output:
path ("output.fastq") into cat_ch
script:
"""
cat $x > output.fastq
"""
}
process fastqconvert {
tag "$y"
publishDir "${params.outdir}", mode:'copy'
input:
path (y) from cat_ch
output:
path ("insect.fasta") into convert1_ch,convert2_ch,convert3_ch
script:
"""
seqtk seq -a $y > insect.fasta
"""
}
process blast_raw {
tag "$z"
publishDir "${params.outdir}", mode:'copy'
input:
path (z) from convert1_ch
output:
path ('blastrawreads.xml') into blastrawreads_xml_ch
script:
"""
blastn \
-query $z -db ${params.db} \
-outfmt 11 -out blastrawreads.asn \
-evalue 0.1 \
-num_alignments 1
blast_formatter \
-archive blastrawreads.asn \
-outfmt 5 -out blastrawreads.xml
blast_formatter \
-archive blastrawreads.asn \
-outfmt "6 qaccver saccver pident length evalue bitscore stitle" -out blastrawreads_unsort.tsv
sort -n -r -k 6 blastrawreads_unsort.tsv > blastrawreads.tsv
"""
}
I can see that the insect.fasta file has been produced, has the appropriate permissions, and is located in the expected directory.
I used the following command to download the nt database:
update_blastdb.pl --decompress nt --passive --source gcp
gcp is the Google Cloud source (I'm in Australia).
The nt database is ~26 GiB in size.
I really need an Excel, ASN, and FASTA file from the BLAST results for downstream analysis.
Any help would be much appreciated.
BLAST Database error: No alias or index file found for nucleotide
database [/home/blast/nt_db_20221011/nt]
I think you should be able to re-create the above error independently of Nextflow using:
blastdbcmd -db /home/blast/nt_db_20221011/nt -info
Note that the db argument must be a dbname, not a path. For /home/blast/nt_db_20221011/nt to work correctly, you should be able to list your db files using: ls /home/blast/nt_db_20221011/nt.*
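If the database is intact, that listing should show the BLAST index files. For a multi-volume nucleotide database like nt you would typically expect an alias file plus numbered volumes, along these (illustrative, hypothetical) lines:
/home/blast/nt_db_20221011/nt.nal
/home/blast/nt_db_20221011/nt.00.nhr
/home/blast/nt_db_20221011/nt.00.nin
/home/blast/nt_db_20221011/nt.00.nsq
If the listing is empty or shows only compressed archives, the download or decompression did not complete.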
Not sure if there's a typo in your question, but the nt database is about an order of magnitude larger than that, at approximately 250G. I wonder if simply re-downloading the database fixes the problem. Note that you can get a list of BLAST databases (showing their sizes and dates last updated) using:
update_blastdb.pl --showall pretty --source gcp
Note also that DSL1 is now end-of-life and will be removed going forward. I strongly recommend migrating to DSL2 syntax when you get a chance.
From the comments:
The problem is that when you use params to specify a path, the path or files specified will not be localized inside the process working directory when the task is run. What you want is just a second input (value) channel. For example, using DSL2 syntax:
params.db = "/home/blast/Geminiviridae_db_20221118/geminiviridae"
process blast_raw {
tag { query_fasta }
input:
path query_fasta
path db
output:
path "geminiviridae.xml"
"""
blastn \\
-query "${query_fasta}" \\
-db "${db}" \\
-max_target_seqs 10 \\
-outfmt 5 \\
-out "geminiviridae.xml"
"""
}
workflow {
db = file( params.db )
blast_raw( your_upstream_ch, db)
}
I'm monitoring a web page that displays the status of a few hundred items. The page looks like this:
{"arrisId":"a000098","status":"Running","startTime":"2018-05-10T08:02:19.563Z"},{"arrisId":"a000101","status":"Running","startTime":"2018-05-10T08:02:19.892Z"},{"arrisId":"a000107","status":"Running","startTime":"2018-05-10T08:02:28.556Z"},...
What I want to do is trigger when one of the items is "Not Running", but I would like to display only the item that is not working, not the entire page. I could use web.page.regexp and send a message that something is not running, but if I use web.page.get, is there a way to configure a trigger to display the "Not Running" text plus the 25 or so characters in front of it?
I hope this question makes sense.
Your best course of action is to use Low-Level Discovery (LLD).
Your LLD rule will run a script to ingest your main status page, then parse it and use the fields to create your items according to the "item prototypes" you define.
The item prototypes themselves will need a script as well to get their respective information (unless you are willing to use the Zabbix 4.0 beta).
I've done a simple setup, using mock JSON from jsonplaceholder.typicode.com:
LLD script: parses the mock JSON and converts it into a Zabbix LLD-compliant format:
import requests
import json
jsonSource = "https://jsonplaceholder.typicode.com/users"
lld = {}
data = []
lld['data'] = data
session = requests.Session()
response = session.get(jsonSource)
for jsonObject in response.json():
    data.append({
        '{#NAME}': jsonObject['name'],
        '{#ID}': jsonObject['id'],
        '{#URL}': jsonSource + '/' + str(jsonObject['id'])
    })

print(json.dumps(lld))
Item GET Script: get a specific field of a specific item (will become obsolete with http agent item from Zabbix 4.0):
import requests
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('-i', required=True, metavar='User ID')
parser.add_argument('-f', required=True, metavar='\"Requested JSON Field\"')
args = parser.parse_args()
jsonSource = "https://jsonplaceholder.typicode.com/users/" + args.i
session = requests.Session()
response = session.get(jsonSource)
print(response.json()[args.f])
Command line usage sample:
$ jsonLLD.py
{"data": [{"{#ID}": 1, "{#URL}": "https://jsonplaceholder.typicode.com/users/1", "{#NAME}": "Leanne Graham"}, {"{#ID}": 2, "{#URL}": "https://jsonplaceholder.typicode.com/users/2", "{#NAME}": "Ervin Howell"},
[cut]
$ jsonGet.py -i 10 -f phone
024-648-3804
$ jsonGet.py -i 10 -f name
Clementina DuBuque
Then you have to set it up into Zabbix:
create a new template
create a Discovery rule of "Zabbix agent" type and set it to run system.run[/usr/bin/jsonLLD.py] (mind the path!)
create an item prototype for each JSON field you want to work on (e.g. Item name: {#NAME} telephone number, Item key: system.run[/usr/bin/jsonGet.py -i {#ID} -f phone])
create trigger prototypes accordingly (an example expression follows below)
associate an host to the template
In your situation I'd use the Zabbix server itself as host, and install the scripts in its /usr/bin.
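For your "Not Running" case, a trigger prototype can then fire per discovered item. A hypothetical expression, assuming a status item prototype and a template named Template JSON LLD (both names are placeholders, and pre-4.0 trigger syntax is assumed):
{Template JSON LLD:system.run[/usr/bin/jsonGet.py -i {#ID} -f status].str(Not Running)}=1
Because each discovered item carries its own identifier, the alert names only the item that is not running rather than the whole page.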
Watch the Zabbix Agent's log to see the discovery and item gathering process:
1972:20180519:121849.052 Executing command '/usr/bin/jsonGet.py -i 1 -f phone'
1971:20180519:121850.054 Executing command '/usr/bin/jsonGet.py -i 2 -f phone'
1974:20180519:121851.055 Executing command '/usr/bin/jsonGet.py -i 3 -f phone'
1974:20180519:121852.073 Executing command '/usr/bin/jsonGet.py -i 4 -f phone'
1974:20180519:121853.076 Executing command '/usr/bin/jsonGet.py -i 5 -f phone'
1973:20180519:121854.077 Executing command '/usr/bin/jsonGet.py -i 6 -f phone'
1972:20180519:121855.079 Executing command '/usr/bin/jsonGet.py -i 7 -f phone'
[cut]
I have been dealing with this problem for almost a month now, and I feel frustrated. Any help would be greatly appreciated.
I am trying to write a widget for my takenote command. The purpose of the widget is to feed all the markdown files in the ~/notes folder into fzf so that the user can select one of them and start editing it.
After the user types takenote and presses <tab> I expect the widget to run.
Here is the _takenote.zsh widget definition:
#compdef takenote
local file=$( find -L "$HOME/notes/" -print 2> /dev/null | fzf-tmux +m )
zle reset-prompt
compadd $file
return 1
Unfortunately, the above code doesn't work because of zle reset-prompt. If I remove it, the completion is inserted into a corrupted display: after selecting the file, both the prompt and the command itself are mangled.
It appears to me that what I need to do is call zle reset-prompt before calling compadd, but this only works when I bind the function to a key; otherwise, I get the following error:
widgets can only be called when ZLE is active
I finally found a workaround for the issue. I am not satisfied with the strategy, since it is not self-contained in the widget itself, but it works. The solution involves trapping fzf-completion after it is invoked and calling zle reset-prompt.
To register the trap, add the following snippet to your .zshrc file (see Zsh menu completion causes problems after zle reset-prompt):
TMOUT=1
TRAPALRM() {
if [[ "$WIDGET" =~ ^(complete-word|fzf-completion)$ ]]; then
# limit the reset-prompt functionality to the `takenote` script
if [[ "$LBUFFER" == "takenote "* ]]; then
zle reset-prompt
fi
fi
}
The _takenote widget:
#compdef takenote
local file=$( find -L "$HOME/notes/" -print 2> /dev/null | fzf-tmux +m )
compadd $file
return 0
P.S.: I would still love to move the trap inside the widget and avoid registering it in the init script (.zshrc).
After two days, I finally managed to find a hint on how to achieve it thanks to the excellent fzf-tab-completion project:
https://github.com/lincheney/fzf-tab-completion/blob/c91959d81320935ae88c090fedde8dcf1ca70a6f/zsh/fzf-zsh-completion.sh#L120
So actually, all that you need to do is:
#compdef takenote
local file=$( find -L "$HOME/notes/" -print 2> /dev/null | fzf-tmux +m )
compadd $file
TRAPEXIT() {
zle reset-prompt
}
return 0
And it finally works. Cheers!
I was getting the same error when trying to bindkey a widget that opens the fzf-selected file in Vim. It turns out I had to open the file in one function (function1), then have a second function (function2) call function1 followed by reset-prompt, to avoid the "widgets can only be called when ZLE is active" error. Like you said, it is really frustrating; it took me almost a day to figure out!
Example code:
## use rg to get file list
export FZF_DEFAULT_COMMAND='rg --files --hidden'
## file open (function1)
__my-fo() (
setopt localoptions pipefail no_aliases 2> /dev/null
local file=$(eval "${FZF_DEFAULT_COMMAND}" | FZF_DEFAULT_OPTS="--height ${FZF_TMUX_HEIGHT:-40%} --reverse $FZF_DEFAULT_OPTS --preview 'bat --color=always --line-range :500 {}'" $(__fzfcmd) -m "$#" | while read item; do
echo -n "${(q)item}"
done)
local ret=$?
if [[ -n $file ]]; then
$EDITOR $file
fi
return $ret
)
## define zsh widget(function2)
__my-fo-widget(){
__my-fo
local ret=$?
zle reset-prompt
return $ret
}
zle -N __my-fo-widget
bindkey ^p __my-fo-widget
I cannot get GNU parallel to run a custom function that I built.
My function is:
function run_cuffLinks() {
inputBAM="${HOME}/Analyses/P_miniata/CleanUpPipeline/TH_${1}/${1}.realigned.bam"
if [[ ! -f $inputBAM ]]; then echo "$inputBAM could not be found"; exit 1; fi
WORKING_DIR="${HOME}/data/CuffLinks/TH_$1"
if [[ ! -d $WORKING_DIR ]]; then mkdir -p $WORKING_DIR; fi
REF="${HOME}/ReferenceSequences/GATK_pmin.scaf.fa"
if [[ ! -f $REF ]]; then echo "$REF could not be found"; exit 1; fi
GTF_FILE="${HOME}/ReferenceSequences/genes.sorted.gff3"
if [[ ! -f $GTF_FILE ]]; then echo "$GTF_FILE could not be found"; exit 1; fi
cufflinks \
--output-dir $WORKING_DIR \
--num-threads 2 \
--frag-len-mean 100 \
--GTF-guide $GTF_FILE \
--frag-bias-correct $REF \
-L "HH" \
$inputBAM ;
}
When I enter:
parallel --no-notice -j+2 run_cuffLinks {} ::: sample1 sample2 sample3
I get the output:
/bin/bash: run_cuffLinks: command not found
/bin/bash: run_cuffLinks: command not found
/bin/bash: run_cuffLinks: command not found
If I include a '$' symbol in front of the function name, I get:
/bin/bash: sample1: command not found
/bin/bash: sample2: command not found
/bin/bash: sample3: command not found
I have also tried the --pipe, --recend, and --rrs options, but without success.
Is GNU parallel not able to process user-defined functions?
You do not say whether you have walked through the tutorial (man parallel_tutorial). It shows that you must export -f the function; since you do not mention doing that, I believe you may have forgotten it:
export -f run_cuffLinks
parallel ...
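Applied to the invocation from the question, that would be (a sketch reusing your sample names and -j+2 setting):
export -f run_cuffLinks
parallel -j+2 run_cuffLinks {} ::: sample1 sample2 sample3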
Since version 20180522 you can also use env_parallel:
env_parallel --session
[define functions and variables here that you want parallel to see]
# Use env_parallel like you would parallel
env_parallel run_cuffLinks ...
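A concrete sketch of the same thing, assuming the function lives in a hypothetical my_functions.sh:
env_parallel --session
source my_functions.sh   # hypothetical file defining run_cuffLinks
env_parallel -j+2 run_cuffLinks {} ::: sample1 sample2 sample3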
PS: Use --bibtex once to avoid --no-notice in the future.
I am working with a Bash shell script. I need to call a URL from the script and then parse the JSON data coming back from it.
This is my URL: http://localhost:8080/test_beat, and the response after hitting the URL will be one of these two:
{"error": "error_message"}
{"success": "success_message"}
Below is my shell script, which calls the URL using wget:
#!/bin/bash
DATA=$(wget -O - -q -t 1 http://localhost:8080/test_beat)
#grep $DATA for error and success key
Now I am not sure how to parse the JSON response in $DATA and see whether the key is success or error. If the key is success, I want to print "success", print the $DATA value, and exit the script with a zero status code; if the key is error, I want to print "error", print the $DATA value, and exit with a non-zero status code.
How can I parse the JSON response and extract the key from it in a shell script?
I don't want to install any library to do this, since my JSON response is fixed and will always be the same as shown above, so any simpler way is fine.
Update:
Below is my final shell script:
#!/bin/bash
DATA=$(wget -O - -q -t 1 http://localhost:8080/tester)
echo $DATA
#grep $DATA for error and success key
IFS=\" read __ KEY __ MESSAGE __ <<< "$DATA"
case "$KEY" in
success)
exit 0
;;
error)
exit 1
;;
esac
Does this look right?
If you are going to be using any more complicated json from the shell and you can install additional software, jq is going to be your friend.
So, for example, if you want to just extract the error message if present, then you can do this:
$ echo '{"error": "Some Error"}' | jq ".error"
"Some Error"
If you try this on the success case, it will do:
$echo '{"success": "Yay"}' | jq ".error"
null
The main advantage of the tool is simply that it fully understands json. So, no need for concern over corner cases and whatnot.
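Putting it together for the exact success/error contract in the question, a minimal sketch (assuming jq is installed) could be:
#!/bin/bash
DATA=$(wget -O - -q -t 1 http://localhost:8080/test_beat)
if jq -e '.success' >/dev/null 2>&1 <<<"$DATA"; then
    echo "success"
    echo "$DATA"
    exit 0
else
    echo "error"
    echo "$DATA"
    exit 1
fi
jq -e sets its exit status from the last value it outputs (failure for null or false), so the success branch fires only when a non-null "success" key is present.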
#!/bin/bash
IFS= read -d '' DATA < temp.txt ## Imitates your DATA=$(wget ...). Just replace it.
while IFS=\" read -ra LINE; do
case "${LINE[1]}" in
error)
# ERROR_MSG=${LINE[3]}
printf -v ERROR_MSG '%b' "${LINE[3]}"
;;
success)
# SUCCESS_MSG=${LINE[3]}
printf -v SUCCESS_MSG '%b' "${LINE[3]}"
;;
esac
done <<< "$DATA"
echo "$ERROR_MSG|$SUCCESS_MSG" ## Shows: error_message|success_message
* %b expands backslash escape sequences in the corresponding argument.
Update, as I didn't really get the question at first. It should simply be:
IFS=\" read __ KEY __ MESSAGE __ <<< "$DATA"
[[ $KEY == success ]] ## Gives $? = 0 if true or else 1 if false.
And you can examine it further:
case "$KEY" in
success)
echo "Success message: $MESSAGE"
exit 0
;;
error)
echo "Error message: $MESSAGE"
exit 1
;;
esac
Of course similar obvious tests can be done with it:
if [[ $KEY == success ]]; then
echo "It was successful."
else
echo "It wasn't."
fi
From your last comment, it can simply be done as:
IFS=\" read __ KEY __ MESSAGE __ <<< "$DATA"
echo "$DATA" ## Your really need to show $DATA and not $MESSAGE right?
[[ $KEY == success ]]
exit ## Exits with code based from current $?. Not necessary if you're on the last line of the script.
You probably already have python installed, which has json parsing in the standard library. Python is not a great language for one-liners in shell scripts, but here is one way to use it:
#!/bin/bash
DATA=$(wget -O - -q -t 1 http://localhost:8080/test_beat)
if python -c '
import json, sys
exit(1 if "error" in json.loads(sys.stdin.read()) else 0)' <<<"$DATA"
then
echo "SUCCESS: $DATA"
else
echo "ERROR: $DATA"
exit 1
fi
Given that you don't want to use JSON libraries, and that the response you're parsing is simple and the only thing you care about is the presence of the substring "success", I suggest the following simplification:
#!/bin/bash
wget -O - -q -t 1 http://localhost:8080/tester | grep -F -q '"success"'
exit $?
-F tells grep to search for a fixed (literal) string.
-q tells grep to produce no output and instead only reflect via its exit code whether a match was found or not.
exit $? simply exits with grep's exit code ($? is a special variable that reflects the most recently executed command's exit code).
Note that if all you care about is whether wget's output contains "success", the above pipeline will do; there is no need to capture wget's output in an auxiliary variable.