BLAST+ exits with error exit status (2) when using Nextflow

I'm using Nextflow to analyse MinION data. BLAST+ terminates with an error exit status (2), Command exit status: 2, and Command output: (empty).
-HP-Z6-G4-Workstation:~/nextflow_pipelines/nf_pipeline/20221025_insect$ nextflow cat_working_nextflow.nf
N E X T F L O W ~ version 22.04.5
Launching `cat_working_nextflow.nf` [admiring_hopper] DSL1 - revision: 2916bc12af
executor > local (78)
[38/2d0584] process > concatinate (AIG363_pass_barcode01_0eb3c2c3_2.fastq) [100%] 38 of 38 ✔
[dd/3cabdf] process > fastqconvert (output.fastq) [100%] 38 of 38 ✔
[47/dab2cd] process > blast_raw (insect.fasta) [ 0%] 0 of 38
executor > local (78)
[38/2d0584] process > concatinate (AIG363_pass_barcode01_0eb3c2c3_2.fastq) [100%] 38 of 38 ✔
[dd/3cabdf] process > fastqconvert (output.fastq) [100%] 38 of 38 ✔
[47/dab2cd] process > blast_raw (insect.fasta) [ 2%] 1 of 37, failed: 1
Error executing process > 'blast_raw (insect.fasta)'
Caused by:
Process `blast_raw (insect.fasta)` terminated with an error exit status (2)
Command executed:
blastn -query insect.fasta -db /home/blast/nt_db_20221011/nt -outfmt 11 -out blastrawreads.asn -evalue 0.1 -num_alignments 1
blast_formatter -archive blastrawreads.asn -outfmt 5 -out blastrawreads.xml
blast_formatter -archive blastrawreads.asn -outfmt "6 qaccver saccver pident length evalue bitscore stitle" -out blastrawreads_unsort.tsv
sort -n -r -k 6 blastrawreads_unsort.tsv > blastrawreads.tsv
Command exit status:
2
Command output:
(empty)
Command error:
Warning: [blastn] Examining 5 or more matches is recommended
BLAST Database error: No alias or index file found for nucleotide database [/home/blast/nt_db_20221011/nt] in search path [/home/nextflow_pipelines/nf_pipeline/20221025_insect/work/96/e885b7e53e1bcf30e33526265e9a3c::]
Work dir:
/home/nextflow_pipelines/nf_pipeline/20221025_insect/work/96/e885b7e53e1bcf30e33526265e9a3c
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
The nf file:
#!/usr/bin/env nextflow

//data_location
params.outdir = './results'
params.in = "$PWD/*.fastq"
dataset = Channel.fromPath(params.in)
params.db = "/home/blast/nt_db_20221011/nt"
process concatenate {
    tag "$x"
    publishDir "${params.outdir}", mode:'copy'

    input:
    path(x) from dataset

    output:
    path("output.fastq") into cat_ch

    script:
    """
    cat $x > output.fastq
    """
}
process fastqconvert {
    tag "$y"
    publishDir "${params.outdir}", mode:'copy'

    input:
    path(y) from cat_ch

    output:
    path("insect.fasta") into convert1_ch,convert2_ch,convert3_ch

    script:
    """
    seqtk seq -a $y > insect.fasta
    """
}
process blast_raw {
    tag "$z"
    publishDir "${params.outdir}", mode:'copy'

    input:
    path(z) from convert1_ch

    output:
    path('blastrawreads.xml') into blastrawreads_xml_ch

    script:
    """
    blastn \
        -query $z -db ${params.db} \
        -outfmt 11 -out blastrawreads.asn \
        -evalue 0.1 \
        -num_alignments 1

    blast_formatter \
        -archive blastrawreads.asn \
        -outfmt 5 -out blastrawreads.xml

    blast_formatter \
        -archive blastrawreads.asn \
        -outfmt "6 qaccver saccver pident length evalue bitscore stitle" \
        -out blastrawreads_unsort.tsv

    sort -n -r -k 6 blastrawreads_unsort.tsv > blastrawreads.tsv
    """
}
I can see that the insect.fasta file has been produced, has the appropriate permissions, and is located in the expected directory.
I used the following command to download the nt database:
update_blastdb.pl --decompress nt --passive --source gcp
gcp is the Google Cloud source, which is the mirror closest to me in Australia.
The nt database is ~26 GiB in size.
I really need Excel, ASN.1 and FASTA files from the BLAST results for downstream analysis.
Any help would be much appreciated.

BLAST Database error: No alias or index file found for nucleotide
database [/home/blast/nt_db_20221011/nt]
I think you should be able to re-create the above error independently of Nextflow using:
blastdbcmd -db /home/blast/nt_db_20221011/nt -info
Note that the db argument must be a dbname, not a path. For /home/blast/nt_db_20221011/nt to work correctly, you should be able to list your db files using: ls /home/blast/nt_db_20221011/nt.*
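With a healthy download, that listing should show a set of index volumes plus an alias file; something like the following (illustrative only, since the exact volume names vary by release):

$ ls /home/blast/nt_db_20221011/nt.*
/home/blast/nt_db_20221011/nt.00.nhr
/home/blast/nt_db_20221011/nt.00.nin
/home/blast/nt_db_20221011/nt.00.nsq
...
/home/blast/nt_db_20221011/nt.nal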
Not sure if there's a typo in your question, but the size of the nt database is about an order of magnitude larger, at approximately 250G. I wonder if simply re-downloading the database fixes the problem? Note that you can get a list of BLAST databases (showing their sizes and dates last updated) using:
update_blastdb.pl --showall pretty --source gcp
Note also that DSL1 is now end-of-life and will be removed in a future release. I strongly recommend migrating to DSL2 syntax when you get a chance.
From the comments:
The problem is that when you use params to specify a path, the path or files specified will not be localized inside the process working directory when the task is run. What you want is just a second input (value) channel. For example, using DSL2 syntax:
params.db = "/home/blast/Geminiviridae_db_20221118/geminiviridae"

process blast_raw {
    tag { query_fasta }

    input:
    path query_fasta
    path db

    output:
    path "geminiviridae.xml"

    """
    blastn \\
        -query "${query_fasta}" \\
        -db "${db}" \\
        -max_target_seqs 10 \\
        -outfmt 5 \\
        -out "geminiviridae.xml"
    """
}

workflow {
    db = file( params.db )
    blast_raw( your_upstream_ch, db )
}
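Alternatively, since BLAST+ also resolves database names against the directories listed in the BLASTDB environment variable (after the working directory), you can keep a bare database name and export the directory instead of staging the index files. A minimal bash sketch under that assumption, reusing the paths from the question:

#!/usr/bin/env bash
# BLAST+ looks for databases in the working directory and then in each
# directory listed in $BLASTDB, so '-db nt' resolves without a full path.
export BLASTDB=/home/blast/nt_db_20221011
blastn -query insect.fasta -db nt -outfmt 11 -out blastrawreads.asn

In a Nextflow process you could set this through the env config scope or a beforeScript directive, at the cost of tying the pipeline to a host-specific path.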

Related

YAML_FILE_ERROR: did not find expected key

I'm trying to run a buildspec with AWS CodeBuild, generating a process.json file on the fly using a jq command. But it gives an error during execution and the build fails.
build:
  commands:
    - cp $CODEBUILD_SRC_DIR/qe/performance/* apache-jmeter-5.2/bin/
    - cd apache-jmeter-5.2/bin/
    - DATE=`date "+%Y%m%d-%H-%M-%S"`
    - aws s3 cp $DATE-Report s3://$JMeterScanResultBucket/${ProjectName}/$DATE --recursive
    - jq -n --arg appname "$appname" '{apps: [ {project: wsg, issuetype: "Test Execution", summary: "Test Execution for junit Execution"}]}' > process.json
However, I received the following error (line 20 refers to the jq command above):
DOWNLOAD_SOURCE
Failed
YAML_FILE_ERROR: did not find expected key at line 20
A colon plus a space (or newline) in YAML means it's a key-value pair in a mapping:
key: value
Your jq command contains several colons followed by spaces.
Since you want a single string, you must quote it.
There are several ways to do that in YAML.
Single or double quoting wouldn't be ideal here because the string contains both quote types.
A folded block scalar is probably the best solution here. Newlines will be folded together as spaces.
- >
  jq -n --arg appname "$appname"
  '{apps: [ {project: wsg, issuetype: "Test Execution",
  summary: "Test Execution for junit Execution"}]}'
  > process.json
An alternative would be the literal block scalar, where you escape each line break as in a shell script. Take care that a continuation backslash never falls inside the single-quoted jq program, or it becomes part of the program text:
- |
  jq -n --arg appname "$appname" \
    '{apps: [ {project: wsg, issuetype: "Test Execution", summary: "Test Execution for junit Execution"}]}' \
    > process.json
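Either way, you can catch this class of error before pushing by parsing the buildspec locally. A quick sketch, assuming python3 with PyYAML is installed; the buildspec.yml filename is an assumption, so adjust it to your file:

# hypothetical pre-flight check for YAML syntax (filename assumed)
python3 -c 'import yaml; yaml.safe_load(open("buildspec.yml")); print("OK")'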

mongoexport - issue with JSON query (extended JSON - Invalid JSON input)

I started learning MongoDB recently. Today the instructor taught us the mongoexport command, and while practicing it I faced an issue that none of the other batchmates, including the instructor, faced. I use MongoDB version 4.2.0 on my Windows 10 machine.
If I use mongoexport on my collection without a -q parameter to specify a filtering condition, it works fine.
mongoexport -d trainingdb -c employee -f empId,name,designation -o \mongoexport\all-employees.json
2019-09-17T18:00:30.300+0530 connected to: mongodb://localhost/
2019-09-17T18:00:30.314+0530 exported 3 records
However, whenever I specify a JSON query with -q (or --query), it gives an error as follows.
mongoexport -d trainingdb -c employee -f empId,name,designation -q {'designation':'Developer'} -o \mongoexport\developers.json
2019-09-17T18:01:45.381+0530 connected to: mongodb://localhost/
2019-09-17T18:01:45.390+0530 Failed: error parsing query as Extended JSON: invalid JSON input
The same error persists in all the different flavors I attempted for the query.
-q {'designation':'Developer'}
--query {'designation':'Developer'}
-q "{'designation':'Developer'}"
I even attempted a different query condition on empId, -q {'empId':'1001'}, but no luck; I keep getting the same error.
As per one of the suggestions on Stack Overflow, I tried the following option, but got a different error.
-q '{"designation":"Developer"}'
The error is : 'query '[39 123 101 109 112 73 100 58 49 48 48 49 125 39]' is not valid JSON: json: cannot unmarshal string into Go value of type map[string]interface {}'.
2019-09-17T20:24:58.878+0530 query '[39 123 101 109 112 73 100 58 49 48 48 49 125 39]' is not valid JSON: json: cannot unmarshal string into Go value of type map[string]interface {}
2019-09-17T20:24:58.882+0530 try 'mongoexport --help' for more information
I am really not sure what is missing here. I tried a bit of Googling and also went through the official MongoDB documentation for mongoexport, but no luck.
The employee collection on my system looks as follows, with 3 documents.
> db.employee.find().pretty()
{
    "_id" : ObjectId("5d80d1ae0d4d526a42fd95ad"),
    "empId" : 1001,
    "name" : "Raghavan",
    "designation" : "Developer"
}
{
    "_id" : ObjectId("5d80d1b20d4d526a42fd95ae"),
    "empId" : 1002,
    "name" : "Kannan",
    "designation" : "Architect"
}
{
    "_id" : ObjectId("5d80d1b40d4d526a42fd95af"),
    "empId" : 1003,
    "name" : "Sathish",
    "designation" : "Developer"
}
>
Update
As suggested by @NikosM, I saved the query in a .json file (query.json) and tried the same mongoexport command with that approach. Still no luck; same Marshal error.
cat query.json
{"designation":"Developer"}
mongoexport -d trainingdb -c employee -f empId,name,designation -q 'query.json' -o \mongoexport\developers.json
2019-09-17T21:16:32.849+0530 query '[39 113 117 101 114 121 46 106 115 111 110 39]' is not valid JSON: json: cannot unmarshal string into Go value of type map[string]interface {}
2019-09-17T21:16:32.852+0530 try 'mongoexport --help' for more information
Any help on this will be highly appreciated.
The following approach made it work at last: specifying the JSON query with the double quotes escaped by backslashes: -q "{\"designation\":\"Developer\"}".
mongoexport -d trainingdb -c employee -f empId,name,designation -q "{\"designation\":\"Developer\"}" -o \mongoexport\developers.json
2019-09-17T21:33:01.642+0530 connected to: mongodb://localhost/
2019-09-17T21:33:01.658+0530 exported 2 records
cat developers.json
{"_id":{"$oid":"5d80d1ae0d4d526a42fd95ad"},"empId":1001.0,"name":"Raghavan","designation":"Developer"}
{"_id":{"$oid":"5d80d1b40d4d526a42fd95af"},"empId":1003.0,"name":"Sathish","designation":"Developer"}
Thank you very much @Caconde, your suggestion helped.
But I am really not sure why this does not work on my machine alone, and why the format of the query needs this tweak.
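The reason is the shell, not mongoexport: cmd.exe on Windows does not treat single quotes as quoting characters, so they are handed to mongoexport literally, whereas double quotes (and backslash-escaped double quotes inside them) are recognized. You can see this in the byte dump from the earlier error, which is the literal argument mongoexport received. A bash sketch, purely illustrative, to decode it (it prints '{empId:1001}', evidently from the empId attempt, the leading and trailing byte 39 being the literal apostrophe):

for b in 39 123 101 109 112 73 100 58 49 48 48 49 125 39; do
  printf "\\$(printf '%03o' "$b")"   # render each byte as its character
done; echo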
There is another approach that I found to work, which uses triple double-quotes (""") for the outer encasing.
mongoexport -d trainingdb -c employee -f empId,name,designation -q """ {"designation":"Developer"} """ -o \mongoexport\developers.json
For me it was:
"{\"sensor_name\":\"Heat Recovery System Header Mass Flow\"}"

Zabbix - triggering on text, displaying only part of the text

I'm monitoring a web page that displays the status of a few hundred items. The page looks like this:
{"arrisId":"a000098","status":"Running","startTime":"2018-05-10T08:02:19.563Z"},{"arrisId":"a000101","status":"Running","startTime":"2018-05-10T08:02:19.892Z"},{"arrisId":"a000107","status":"Running","startTime":"2018-05-10T08:02:28.556Z"},...
What I want to do is trigger when one of the items is "Not Running", but display only the item that is not working rather than the entire page. I could use web.page.regexp and send a message that something is not running, but if I use web.page.get, is there a way to configure a trigger to display the "Not Running" match and the 25 or so characters in front of it?
I hope this question makes sense.
Your best course of action is to use Low Level Discovery.
Your LLD rule will run a script to ingest your main status page, then parse it and use the fields to create your items according to the "Item prototypes" you define.
The item prototypes themselves will need a script as well to get their respective information (unless you are willing to use the HTTP agent item type from the Zabbix 4.0 beta).
I've done a simple setup, using mock JSON from jsonplaceholder.typicode.com:
LLD script: parses the mock JSON and converts it into a Zabbix LLD compliant format:
import requests
import json

jsonSource = "https://jsonplaceholder.typicode.com/users"

# Zabbix LLD expects {"data": [ { "{#MACRO}": value, ... }, ... ]}
lld = {}
data = []
lld['data'] = data

session = requests.Session()
response = session.get(jsonSource)
for jsonObject in response.json():
    data.append({
        '{#NAME}': jsonObject['name'],
        '{#ID}': jsonObject['id'],
        '{#URL}': jsonSource + '/' + str(jsonObject['id'])
    })

print(json.dumps(lld))
Item GET script: gets a specific field of a specific item (this will become obsolete with the HTTP agent item in Zabbix 4.0):
import requests
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-i', required=True, metavar='User ID')
parser.add_argument('-f', required=True, metavar='Requested JSON Field')
args = parser.parse_args()

jsonSource = "https://jsonplaceholder.typicode.com/users/" + args.i

session = requests.Session()
response = session.get(jsonSource)
print(response.json()[args.f])
Command line usage sample:
$ jsonLLD.py
{"data": [{"{#ID}": 1, "{#URL}": "https://jsonplaceholder.typicode.com/users/1", "{#NAME}": "Leanne Graham"}, {"{#ID}": 2, "{#URL}": "https://jsonplaceholder.typicode.com/users/2", "{#NAME}": "Ervin Howell"},
[cut]
$ jsonGet.py -i 10 -f phone
024-648-3804
$ jsonGet.py -i 10 -f name
Clementina DuBuque
Then you have to set it up in Zabbix:
create a new template
create a discovery rule of "Zabbix agent" type and set it to run system.run[/usr/bin/jsonLLD.py] (mind the path!)
create an item prototype for each JSON field you want to work on (e.g. item name: {#NAME} telephone number, item key: system.run[/usr/bin/jsonGet.py -i {#ID} -f phone])
create trigger prototypes accordingly
associate a host with the template
In your situation I'd use the Zabbix server itself as the host, and install the scripts in its /usr/bin.
Watch the Zabbix Agent's log to see the discovery and item gathering process:
1972:20180519:121849.052 Executing command '/usr/bin/jsonGet.py -i 1 -f phone'
1971:20180519:121850.054 Executing command '/usr/bin/jsonGet.py -i 2 -f phone'
1974:20180519:121851.055 Executing command '/usr/bin/jsonGet.py -i 3 -f phone'
1974:20180519:121852.073 Executing command '/usr/bin/jsonGet.py -i 4 -f phone'
1974:20180519:121853.076 Executing command '/usr/bin/jsonGet.py -i 5 -f phone'
1973:20180519:121854.077 Executing command '/usr/bin/jsonGet.py -i 6 -f phone'
1972:20180519:121855.079 Executing command '/usr/bin/jsonGet.py -i 7 -f phone'
[cut]
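Adapting the same idea to the status page in the question, the LLD script only needs to emit one entry per arrisId. If you have jq available, a minimal bash sketch will do; the URL is a placeholder, and it assumes the page body is a bare comma-separated list of objects that just needs wrapping in [ ] to become a JSON array:

#!/usr/bin/env bash
# Fetch the status page, wrap it into a JSON array, and emit one Zabbix
# LLD entry per device.
curl -s "http://example.com/status" \
| sed -e 's/^/[/' -e 's/$/]/' \
| jq '{data: [ .[] | { "{#ARRISID}": .arrisId } ]}'

An item prototype can then fetch the status for each {#ARRISID}, and a trigger prototype matching the text "Not Running" fires per item, so the alert names only the failing device instead of the whole page.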

How do I get field from HTTP GET JSON result to file?

I am trying to make an HTTP GET request to an API service and push one of the returned fields in the JSON result to a txt file.
Based on this previously asked question (Getting JSON value from cURL in Linux Bash),
...I have a bash script as follows...
TOKEN_FILE="/myhome/project/resources/auto_token.txt"
AUTH_RESULT=$(curl -i -H "Content-Type: application/json" "https://access.mywebservice.com/access/oauth/token?grant_type=client_credentials&client_id=123456&client_secret=MySecretPassword");
RESULT_FIELDS=$( cat <<EOF | json_reformat | \
sed -rne '/:/s#^\s+"(\w+)":\s+"([^"]+)",?#json_\1="\2"#gp'
[$AUTH_RESULT]
EOF
)
if [ -f "$TOKEN_FILE" ]
then
echo "$RESULT_FIELDS" > "$TOKEN_FILE"
fi
The expected JSON result looks like this (copied from Postman):
{
    "access_token": "eyJ5bGciOiJSUzI1NiJ6.eyJzY29wZSI6WyJDUl7iLCJNQVAiLCJQVFkiLCJ8R1QiLCJTVFMiLCJUVEwiXSwiaXNzIjoiaHR0cHM6Ly9hY2Nlc3MtdWF0LWFwaS5jb3JlbG9naWMuYXNpYSIsImVudl9hx2Nlc3NfcmVzdHJpY3QiOmZhbHNlLCJleHAiOjE0NjcyODMwODcsImNsaWVudF9pZCI6IjhhOTY4OGJjIn0.F2iQfVsi9zntOxKYrNRukSIwuQ_LGSi_WMIXKII2A3GOEaqs-WmFTi7az9rvvfDsOl9rHy_s_66A6PiCpPftyw21Fl0aZZRoFcKv2H_zDUHuxOEs8V36jHeLghV7pjHwYI_nG68CIGvfuRWFNzQuiMFWc_i8oB3n5noSd8fQqa4",
    "token_type": "bearer",
    "expires_in": 43199,
    "scope": "PROD1 PROD2 PROD3",
    "iss": "https://access.mywebservice.com",
    "env_access_restrict": false
}
I get the following errors returned...
bash-4.1$ ./token_renewal_test_05.sh
: command not foundt_05.sh: line 2:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
115 576 0 576 0 0 2266 0 --:--:-- --:--:-- --:--:-- 30315
: command not foundt_05.sh: line 3:
: command not foundt_05.sh: line 4:
./token_renewal_test_05.sh: line 14: warning: here-document at line 10 delimited by end-of-file (wanted `EOF')
./token_renewal_test_05.sh: line 13: warning: here-document at line 9 delimited by end-of-file (wanted `EOF')
: command not foundt_05.sh: line 13:
lexical error: invalid char in json text.
sed -rne '/:/s#^\s+"(\w+)":\s+"
(right here) ------^
: command not foundt_05.sh: line 10:
./token_renewal_test_05.sh: line 16: syntax error: unexpected end of file
I'm a bit new to bash, and despite what appears to be a direct pointer to the issue, I'm having problems resolving this one (note this is version 5 of the script)!
Can anyone offer any assistance with this one?
PS: I do not have jq either.
Caveat emptor as per this comment on Parsing JSON with UNIX tools.
A working solution for your format:
eval $(cat <<EOF | \
sed -re 's/(,|\{|\})//g' | \
sed -re 's/"(\w+)":\s*"?([^"]*)"?$/json_\1='\''\2'\''/'
$JSON
EOF
)
set | grep '^json_'
json_access_token=eyJ5bGciOiJSUzI1NiJ6.eyJzY29wZSI6WyJDUl7iLCJNQVAiLCJQVFkiLCJ8R1QiLCJTVFMiLCJUVEwiXSwiaXNzIjoiaHR0cHM6Ly9hY2Nlc3MtdWF0LWFwaS5jb3JlbG9naWMuYXNpYSIsImVudl9hx2Nlc3NfcmVzdHJpY3QiOmZhbHNlLCJleHAiOjE0NjcyODMwODcsImNsaWVudF9pZCI6IjhhOTY4OGJjIn0.F2iQfVsi9zntOxKYrNRukSIwuQ_LGSi_WMIXKII2A3GOEaqs-WmFTi7az9rvvfDsOl9rHy_s_66A6PiCpPftyw21Fl0aZZRoFcKv2H_zDUHuxOEs8V36jHeLghV7pjHwYI_nG68CIGvfuRWFNzQuiMFWc_i8oB3n5noSd8fQqa4
json_env_access_restrict=false
json_expires_in=43199
json_iss=https://access.mywebservice.com
json_scope='PROD1 PROD2 PROD3'
json_token_type=bearer
Thanks again Chepner and Drew.
I was having too many issues with sed (probably due to my lack of experience). As it turns out, what I needed was a lookbehind; sed doesn't have those, but grep does. Since the structure of my JSON response will never change, I was able to extract my token using grep instead...
grep -o -P '(?<="access_token":").*(?=","token_type")'
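If you want to make that a little more robust (assuming the token itself never contains a double quote), match non-quote characters instead of a greedy .*; the match then stops at the token's closing quote and no longer depends on "token_type" being the next field:

grep -o -P '(?<="access_token":")[^"]*'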

stat missing operand error when run from tcl file using exec

I have the following in script.tcl:
#!/usr/bin/env tclsh
set disk(free) [exec -- stat -f -c 'scale=3;(%a*%S)/1024/1024/1024' / | bc ]
When I execute the script, I get the following output (translated from Hungarian):
stat: missing operand
For more information execute the „stat --help” command.
while executing
"exec -- stat -f -c 'scale=3"
invoked from within
"set disk(free) [exec -- stat -f -c 'scale=3;(%a*%S)/1024/1024/1024' / | bc ]"
(file "~/script.tcl" line 2)
What am I doing wrong? Running the command on its own works just fine.
You have to brace your expressions instead of using single quotes. Single quotes are not special characters in Tcl, and the unbraced semicolon in scale=3;(%a*%S)/1024/1024/1024 terminates the exec command early; that is exactly why the trace shows the command cut off at "exec -- stat -f -c 'scale=3", leaving stat with no file operand.
% exec stat -f -c {scale=3;(%a*%S)/1024/1024/1024} / | bc
137.916
%
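For comparison, the same command works in a POSIX shell because there the single quotes do group the word; braces are Tcl's equivalent, passing everything inside them through verbatim, semicolons included:

stat -f -c 'scale=3;(%a*%S)/1024/1024/1024' / | bc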