Lines with common words in outputs of 2 scripts - output

I need to run two Linux shell scripts and get the lines from the second script's output that contain the same words as lines in the first script's output (the whole lines are not identical). For example:
Script #1 output:
Router 1: Ip address 10.0.0.1
Router 2: Ip address 10.0.1.1
Router 3: Ip address 10.0.2.1
Script #2 output:
Router 1: Model: Cisco 2960
Router 2: Model: Juniper MX960
Router 5: Model: Huwei S3300
So, in the end I need a list of the routers that are present in both outputs, but only the lines from the second script, i.e. the lines with the model.

Assuming the outputs of the two scripts above are stored/redirected to tmp1 and tmp2 respectively, the script below will print each Router X that is present in both files.
#!/bin/bash
tmp1="$1"
tmp2="$2"
while read -r line
do
    # take the "Router X" part before the first colon
    routerName=$(echo "$line" | cut -d ":" -f 1)
    # anchor the match so "Router 1" does not also match "Router 10"
    if grep -q "^$routerName:" "$tmp2"
    then
        # Instead of printing you can add any logic
        echo "$routerName"
    fi
done < "$tmp1"
Save the above script as filename.sh and pass the two files as arguments:
./filename.sh tmpScript1output_file tmpScript2output_file
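If you want the matching model lines from the second output rather than just the router names, a short pipeline sketch (assuming the same tmp1 and tmp2 files) is to build the "Router X:" prefixes from the first file and use them as fixed-string patterns against the second:
cut -d ":" -f 1 tmp1 | sed 's/$/:/' | grep -Ff - tmp2
The trailing colon in the pattern keeps "Router 1" from also matching "Router 10".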

hard-coded output without expansion in Snakefile

I have a Snakefile as follows:
SAMPLES, = glob_wildcards("data/{sample}_R1.fq.gz")

rule all:
    input:
        expand("samtools_sorted_out/{sample}.raw.snps.indels.g.vcf", sample=SAMPLES),
        expand("samtools_sorted_out/combined_gvcf")

rule combine_gvcf:
    input: "samtools_sorted_out/{sample}.raw.snps.indels.g.vcf"
    output: directory("samtools_sorted_out/combined_gvcf")
    params:
        gvcf_file_list="gvcf_files.list",
        gatk4="/storage/anaconda3/envs/exome/share/gatk4-4.1.0.0-0/gatk-package-4.1.0.0-local.jar"
    shell: """
        java -DGATK_STACKTRACE_ON_USER_EXCEPTION=true \
            -jar {params.gatk4} GenomicsDBImport \
            -V {params.gvcf_file_list} \
            --genomicsdb-workspace-path {output}
        """
When I test it with a dry run, I get this error:
RuleException in line 335 of /data/yifangt/exomecapture/Snakefile:
Wildcards in input, params, log or benchmark file of rule combine_gvcf cannot be determined from output files:
'sample'
There are two places where I need some help:
The {output} is a folder that will be created by the shell part;
The {output} folder is hard-coded manually, as required by the command line (and its contents are unknown ahead of time).
The problem seems to be that {output} is not expanded, unlike {input}, which is.
How should I handle this situation? Thanks a lot!
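A common way to resolve this kind of error (a sketch, not tested against this workflow) is to make the aggregating rule take all per-sample files as input via expand, so that no unresolved {sample} wildcard remains in combine_gvcf:
rule combine_gvcf:
    input: expand("samtools_sorted_out/{sample}.raw.snps.indels.g.vcf", sample=SAMPLES)
    output: directory("samtools_sorted_out/combined_gvcf")
The {output} directory itself needs no wildcard; Snakemake passes the literal path to the shell command.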

Search in large csv files

The problem
I have thousands of csv files in a folder. Every file has 128,000 entries with four columns in each line.
From time to time (two times a day) I need to compare a list (10,000 entries) with all csv files. If one of the entries is identical to the third or fourth column of a row in one of the csv files, I need to write the whole csv row to an extra file.
Possible solutions
Grep
#!/bin/bash

getArray() {
    array=()
    while IFS= read -r line
    do
        array+=("$line")
    done < "$1"
}

getArray "entries.log"

for e in "${array[@]}"
do
    echo "$e"
    /bin/grep "$e" ./csv/* >> found
done
This seems to work, but it takes forever. After almost 48 hours the script had checked only 48 of about 10,000 entries.
MySQL
The next attempt was to import all csv files into a MySQL database, but there I ran into problems with my table at around 50,000,000 entries.
So I wrote a script which created a new table after 49,000,000 entries, and with that I was able to import all csv files.
I tried to create an index on the second column, but it always failed (timeout). Creating the index before the import wasn't possible either; it slowed the import down to days instead of only a few hours.
The select statement was horrible, but it worked. It was much faster than the "grep" solution but still too slow.
My question
What else can I try to search within the csv files?
To speed things up I copied all the csv files to an SSD, but I hope there are other ways.
This is unlikely to offer you meaningful benefits, but here are some improvements to your script:
use the built-in mapfile to slurp a file into an array:
mapfile -t array < entries.log
use grep with a file of patterns and appropriate flags.
I assume you want to match items in entries.log as fixed strings, not as regex patterns.
I also assume you want to match whole words.
grep -Fwf entries.log ./csv/*
This means you don't have to grep the thousands of csv files thousands of times (once for each item in entries.log). This alone should give you a real, meaningful performance improvement.
This also removes the need to read entries.log into an array at all.
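If you want only the raw csv rows in the output file (without the filename prefix that grep adds when searching multiple files), the -h flag, which the xargs answer below also uses, suppresses it; a sketch under the same assumptions:
grep -hFwf entries.log ./csv/* > found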
In awk, assuming all the csv files change; otherwise it would be wise to keep track of the files already checked. But first, some test material:
$ mkdir test # the csvs go here
$ cat > test/file1 # has a match in 3rd
not not this not
$ cat > test/file2 # no match
not not not not
$ cat > test/file3 # has a match in 4th
not not not that
$ cat > list # these we look for
this
that
Then the script:
$ awk 'NR==FNR{a[$1];next} ($3 in a) || ($4 in a){print >> "out"}' list test/*
$ cat out
not not this not
not not not that
Explained:
$ awk '                          # awk
NR==FNR {                        # process the list file
    a[$1]                        # hash list entries to a
    next                         # next list item
}
($3 in a) || ($4 in a) {         # if 3rd or 4th field entry in hash
    print >> "out"               # append whole record to file "out"
}' list test/*                   # first list then the rest of the files
The script hashes all the list entries into a, then reads through the csv files looking for 3rd and 4th field entries in the hash, outputting the record when there is a match.
If you test it, let me know how long it ran.
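Note that the test files above are whitespace-separated; for real comma-separated csv files the field separator has to be set explicitly. A hypothetical invocation on the original data (assuming the list is entries.log, the csv files live under ./csv, and the list entries themselves contain no commas) would be:
awk -F',' 'NR==FNR{a[$1];next} ($3 in a) || ($4 in a){print >> "out"}' entries.log ./csv/*.csv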
You can build a patterns file and then use xargs and grep -Ef to search for all patterns in batches of csv files, rather than one pattern at a time as in your current solution:
# prepare patterns file
while read -r line; do
    printf '%s\n' "^[^,]+,[^,]+,$line,[^,]+$"   # find value in third column
    printf '%s\n' "^[^,]+,[^,]+,[^,]+,$line$"   # find value in fourth column
done < entries.log > patterns.dat
find /path/to/csv -type f -name '*.csv' -print0 | xargs -0 grep -hEf patterns.dat > found.dat
find ... - emits a NUL-delimited list of all csv files found
xargs -0 ... - passes the file list to grep, in batches
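For illustration, a single (hypothetical) entry abc123 in entries.log would produce these two lines in patterns.dat, matching it in the third and fourth column respectively:
^[^,]+,[^,]+,abc123,[^,]+$
^[^,]+,[^,]+,[^,]+,abc123$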

Sending email via mysql bash

I have a query that sends the results to an email. I would like not to send an email if the query has NO results. How can I do that?
Here's the code:
mysql -umy -hmysql1.com -P2 -pmysq <<<" Select * from Data.data "| mail -aFrom:test@test.com -s 'test' test@gmail.com
Not every task can be done easily in a single command pipeline. Trying to force it into a one-liner can make it hard to code and hard to maintain.
Feel free to write some statements in a script:
result=$(mysql -umy -hmysql1.com -P2 -pmysq -e " Select * from Data.data ")
if [ -n "$result" ]
then
    echo "$result" | mail -aFrom:test@test.com -s 'test' test@gmail.com
fi
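If you'd rather not have the column-header line in the email body, mysql's -N (--skip-column-names) option drops it; only the first line of the script changes:
result=$(mysql -umy -hmysql1.com -P2 -pmysq -N -e " Select * from Data.data ")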
The -n test is for strings being nonzero length. Read http://linuxcommand.org/lc3_man_pages/testh.html for more details on that.
Re your comment:
The statements I showed above are things you could type at the command-line in bash. Bash supports variables and "if/then/else" constructs and a lot more.
Writing a bash script is easy. Anything you can type at the command-line can be in a file. Open a text editor and write the lines I showed above. Save the file. For example it could be called "mailmyquery.sh" (the .sh extension is only customary, it's not required).
Exit the text editor. Then run:
bash mailmyquery.sh
And it runs the statements in the file as if you had written them yourself at the command-line.
Voilà! You are now a shell script programmer!

Export blob from mysql to file in bash script

I'm trying to write a bash script that, among other things, extracts information from a mysql database. I tried the following to extract a file from entry 20:
mysql -se "select file_column_name from table where id=20;" >file.txt
That gave me a file.txt with the file name, not the file contents. How would I get the actual blob into file.txt?
Turn the value in file.txt into a variable and then use it as you need to, i.e.:
blobFile=$(cat file.txt)
echo "----- contents of $blobFile ---------"
cat "$blobFile"
# copy the file somewhere else
scp "$blobFile" user@Remote:/path/to/remote/loc/for/blobFile
# look for info in blobfile
grep specialInfo "$blobFile"
# etc ...
Is that what you want/need to do?
I hope this helps.
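If the column actually stores the blob bytes themselves rather than a file name, one common approach (a sketch; table, column, and path names are placeholders) is to have the server dump the raw value straight to a file:
mysql -se "select file_column_name into dumpfile '/tmp/blob_20.bin' from table_name where id=20;"
INTO DUMPFILE writes the file on the database server host and requires the FILE privilege; alternatively the client's -N and --raw options can pipe the unescaped value into a local file, at the cost of a trailing newline.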

Processing MySQL result in bash

I currently have a bash script of a few thousand lines which sends various queries to MySQL to generate output suitable for munin.
Up until now the results were simply numbers, which weren't a problem, but now I'm facing the challenge of working with a more complex query of the form:
$ echo "SELECT id, name FROM type ORDER BY sort" | mysql test
id name
2 Name1
1 Name2
3 Name3
From this result I need to store the id and name (and their association), and based on the IDs I need to perform further queries, e.g. SELECT COUNT(*) FROM somedata WHERE type = 2, and later output that result paired with the associated name column from the first result.
I'd easily know how to do it in PHP/Ruby, but I'd like to avoid forking another process, especially since it's polled regularly; however, I'm completely lost as to where to start with bash.
Maybe using bash is the wrong approach anyway and I should just fork out?
I'm using GNU bash, version 3.2.39(1)-release (i486-pc-linux-gnu).
My example is not Bash, but I'd like to point out the parameters I use when invoking the mysql command; they suppress the boxing and the headers.
#!/bin/sh
mysql dbname -B -N -s -e "SELECT * FROM tbl" | while read -r line
do
    echo "$line" | cut -f1   # outputs col #1
    echo "$line" | cut -f2   # outputs col #2
    echo "$line" | cut -f3   # outputs col #3
done
You would use a while read loop to process the output of that command.
echo "SELECT id, name FROM type ORDER BY sort" | mysql test | while read -r line
do
# you could use an if statement to skip the header line
do_something "$line"
done
or store it in an array:
while read -r line
do
    array+=("$line")
done < <(echo "SELECT id, name FROM type ORDER BY sort" | mysql test)
That's a general overview of the technique. If you have more specific questions post them separately or if they're very simple post them in a comment or as an edit to your original question.
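Putting those pieces together with the -B/-N/-s flags from the first answer, a rough sketch of the follow-up step the question describes (the somedata query is the example from the question) might look like:
#!/bin/bash
# read each id/name pair, run the per-id count query,
# and print the count next to the associated name
while read -r id name
do
    count=$(mysql test -B -N -s -e "SELECT COUNT(*) FROM somedata WHERE type = $id")
    echo "$name: $count"
done < <(mysql test -B -N -s -e "SELECT id, name FROM type ORDER BY sort")
Note that $id here comes from the database itself, not from user input, which sidesteps the injection concern mentioned below.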
You're going to "fork out," as you put it, to the mysql command line client program anyhow. So either way you're going to have process-creation overhead. With your approach of using a new invocation of mysql for each query you're also going to incur the cost of connecting to and authenticating to the mysqld server multiple times. That's expensive, but the expense may not matter if this app doesn't scale up.
Making it secure against sql injection is another matter. If you prompt a user for her name and she answers "sally;drop table type;" she's laughing and you're screwed.
You might be wise to use a language that's more expressive in the areas that are important for database access for some of your logic. Ruby, PHP, and Perl are all good choices. Perl happens to be tuned and designed to run snappily under shell-script control.