AWK or grep: 1 instance of repeated output - CSV

So what I have here is some output from a Cisco switch, and I need to capture the hostname and use it to populate a CSV file.
Basically, I run a show mac address-table, pull the MAC addresses, and write them into a CSV file. That part works; however, I can't figure out how to grab the hostname so that I can put it in a separate column.
I have done this:
awk '/#/{print $1}'
but that will print every line that has '#' in it. I only need one match to populate a variable so I can reuse it. The end result needs to look like this (the CSV file has MAC address, port number and hostname; I use commas to indicate the column separation):
0011.2233.4455,Gi1/1,Switch1#
0011.2233.4488,Gi1/2,Switch1#
0011.2233.4499,Gi1/3,Switch1#

Without knowing what the input file looks like, the exact solution required is uncertain. However, as an example, given an input file like the requested output (which I've called switch.txt):
0011.2233.4455,Gi1/1,Switch1#
0011.2233.4488,Gi1/2,Switch1#
0011.2233.4499,Gi1/3,Switch1#
0011.2233.4455,Gi1/1,Switch3#
0011.2233.4488,Gi1/2,Switch2#
0011.2233.4498,Gi1/3,Switch3#
... a list of the unique values of the first field (comma-separated) can be obtained from:
$ awk -F, '{print $1}' <switch.txt | sort | uniq
0011.2233.4455
0011.2233.4488
0011.2233.4498
0011.2233.4499
An approach like this might help with extracting unique values from the actual input file.
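To address the original question of grabbing the hostname only once, one option is to stop awk after the first match with exit. This is only a sketch: show_output.txt and macs.csv are hypothetical names standing in for the captured switch output and the two-column MAC/port CSV.
# print the first field of the first line containing '#', then stop reading
hostname=$(awk '/#/{print $1; exit}' show_output.txt)
# append the captured prompt as a third column to an existing mac,port CSV
awk -v h="$hostname" -F, 'BEGIN{OFS=","} {print $1, $2, h}' macs.csv > with_host.csv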

Related

How to avoid a string value returned by an SQL query being split into different array fields in a bash/shell script

The output of my SQL query has multiple columns and contains string values that include spaces. I need to write a bash script in which I read the values into variables, use them further in the script, and also insert them into another database.
When I store the output into an array, the string values get split on spaces and stored in different indexes of the array. How can I handle this situation in a bash script?
CMD="SELECT * FROM upload where upload_time>='2020-11-18 00:19:48' LIMIT 1;"
output=($(mysql $DBCONNECT --database=uploads -N --execute="$CMD"))
echo ${output[9]}
Output:
version test_id upload_time parser_result 25 567 2020-11-18 00:19:48 <p1>box crashed with exit status 0</p1>
The upload time "2020-11-18 00:19:48" gets stored in two indexes.
More problematic is the 'parser_result' value, which is a string: '<p1>box crashed with exit status 0</p1>' gets split on spaces and stored across several indexes.
${output[8]} contains '<p1>box'
${output[9]} contains 'crashed'
The database is very large and I need to parse every row in it.
Since the string value can be anything, I am unable to come up with generic code. What is the best way to handle this scenario? I am using bash scripting for the first time! I have to use a bash script since it will run as a cron job inside a Docker container.
The fields are separated by TAB. Use that as your $IFS to parse the result.
IFS=$'\t' output=($(mysql $DBCONNECT --database=uploads -N --execute="$CMD"))
echo "${output[9]}"
If $DBCONNECT contains options separated with spaces, you need to do this in two steps, since it's using $IFS to split that as well.
result=$(mysql $DBCONNECT --database=uploads -N --execute="$CMD")
IFS=$'\t' output=($result)
echo "${ouptut[9]}"

Delete rows of a CSV file based on a column value on the command line

I have a large file that I cannot open on my computer. I am trying to delete rows of information that are unneeded.
My file looks like this:
NODE,107983_gene,382,666,-,cd10161,8,49,9.0E-100,49.4,0.52,domain
NODE,107985_gene,24,659,-,PF09699.9,108,148,6.3E-500,22.5,0.8571428571428571,domain
NODE,33693_gene,213,1433,-,PF01966.21,92,230,9.0E-10,38.7,0.9344262295081968,domain
NODE,33693_gene,213,1433,-,PRK04926,39,133,1.0E-8,54.5,0.19,domain
NODE,33693_gene,213,1433,-,cd00077,88,238,4.0E-6,44.3,0.86,domain
NODE,33693_gene,213,1433,-,smart00471,88,139,9.0E-7,41.9,0.42,domain
NODE,33694_gene,1430,1912,-,cd16326,67,135,4.0E-50,39.5,0.38,domain
I am trying to remove all lines that have an e-value greater than 1.0E-10. This information is located in column 9. I have tried on the command line:
awk '$9 >= 1E-10' filename > outputfile
This has given me a smaller file, but the e-values are all over the place and nothing above 1E-10 is actually being removed. I want small e-values only.
Does anyone have any suggestions?
Almost there; you need to specify the field delimiter (and keep the rows below the threshold rather than above it):
$ awk -F, '$9<1E-10' file > small.values
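With the sample lines above, that filter should keep only the rows whose ninth field really is below 1E-10 (the 9.0E-100, 6.3E-500 and 4.0E-50 hits) and drop the rest, e.g.:
$ awk -F, '$9<1E-10' file
NODE,107983_gene,382,666,-,cd10161,8,49,9.0E-100,49.4,0.52,domain
NODE,107985_gene,24,659,-,PF09699.9,108,148,6.3E-500,22.5,0.8571428571428571,domain
NODE,33694_gene,1430,1912,-,cd16326,67,135,4.0E-50,39.5,0.38,domain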

Get lines from file that match strings in another file using AWK

I have a file named key and another CSV file named val.csv. As you can imagine, the file named key looks something like this:
123
012
456
The file named val.csv has multiple columns and corresponding values. It looks like this:
V1,V2,V3,KEY,V5,V6
1,2,3,012,X,t
9,0,0,452,K,p
1,2,2,000,L,x
I would like to get the subset of lines from val.csv whose value in the KEY column matches the values in the key file. Using the above example, I would like to get an output like this:
V1,V2,V3,KEY,V5,V6
1,2,3,012,X,t
Obviously these are just toy examples. The real key file I am using has nearly 500,000 'keys' and the val.csv file has close to 5 million lines in it. Thanks.
$ awk -F, 'FNR==NR{k[$1]=1;next;} FNR==1 || k[$4]' key val.csv
V1,V2,V3,KEY,V5,V6
1,2,3,012,X,t
How it works
FNR==NR { k[$1]=1;next; }
This saves the values of all keys read from the first file, key.
The condition is FNR==NR. FNR is the number of lines read so far from the current file and NR is the total number of lines read. Thus, if FNR==NR, we are still reading the first file.
When reading the first file, key, this stores each key as an index of the associative array k. The next then skips the rest of the commands and starts over on the next line.
FNR==1 || k[$4]
If we get here, we are working on the second file.
This condition is true either for the first line of the file, FNR==1, or for lines whose fourth field is in array k. If the condition is true, awk performs the default action which is to print the line.
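For comparison, a plain grep would also be quick here:
$ grep -F -f key val.csv
but it matches a key anywhere on the line rather than only in the KEY column (and drops the header), so a key such as 012 could also hit 0123 in some other field. The awk version restricts the lookup to field 4 and only holds the key file, not the 5-million-line CSV, in memory, so it should scale to the sizes mentioned.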

DB load CSV into multiple tables

UPDATE: added an example to clarify the format of the data.
Considering a CSV with each line formatted like this:
tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5,[tbl2.col1:tbl2.col2]+
where [tbl2.col1:tbl2.col2]+ means that there could be any number of these pairs repeated
ex:
tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2
The tables would relate to each other using the line number as a key, which would have to be created in addition to any columns mentioned above.
Is there a way to use MySQL's LOAD DATA INFILE to load the data into two separate tables?
If not, what Unix command-line tools would be best suited for this?
No, not directly: LOAD DATA can only insert into one table or partitioned table.
What you can do is load the data into a staging table, then use INSERT INTO ... SELECT to copy the individual columns into the two final tables. You may also need SUBSTRING_INDEX if you're using different delimiters for tbl2's values. The line number is handled by an auto-incrementing column in the staging table (the easiest way is to make the auto column last in the staging table definition).
The format is not exactly clear, and this is best done with Perl/PHP/Python, but if you really want to use shell tools:
cut -d , -f 1-5 file | awk -F, '{print NR "," $0}' > table1
cut -d , -f 6- file | sed 's/:/,/g' | \
awk -F, '{i=1; while (i<=NF) {print NR "," $(i) "," $(i+1); i+=2;}}' > table2
This creates the files table1 and table2 with these contents:
1,tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5
2,tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5
3,tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5
and
1,tbl2.col1,tbl2.col2
1,tbl2.col1,tbl2.col2
2,tbl2.col1,tbl2.col2
2,tbl2.col1,tbl2.col2
3,tbl2.col1,tbl2.col2
3,tbl2.col1,tbl2.col2
As you say, the problematic part is the unknown number of [tbl2.col1:tbl2.col2] pairs declared in each line. I would be tempted to solve this with sed: split the one file into two files, one for each table. Then you can use LOAD DATA INFILE to load each file into its corresponding table.
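Another shell-only sketch, if a single pass is preferred, is to let awk write both files at once. This assumes the input is in a file called file and that no field contains embedded commas or colons:
awk -F, '{
    # line number plus the first five fields go to table1
    print NR "," $1 "," $2 "," $3 "," $4 "," $5 > "table1"
    # every remaining field is a col1:col2 pair destined for table2
    for (i = 6; i <= NF; i++) {
        split($i, pair, ":")
        print NR "," pair[1] "," pair[2] > "table2"
    }
}' file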

CSV field delimiter problem

This CSV file has a field delimiter of $
It looks like this:
14$"ALL0053"$$$"A"$$$"Direct Deposit in FOGSI A/c"$$"DR"$"DAS PRADIP ...
How can I view the file as columns, with each field shown as a column in a table?
I've tried many ways and none of them work. Does anyone know how?
I am using Ubuntu.
That's a weird CSV, since a comma-separated file is usually separated by, well, commas. I think all you need to do is a simple find-and-replace, which is available in any text editor.
Open the file in Gnome Edit and look under Edit > Replace...
From there you can replace all $s with ,s.
Once your file is a real CSV, you can open it in OpenOffice Calc (spreadsheet), or really any other spreadsheet program for Ubuntu (GNOME).
cut -d $ -f 1,2,...x filename | sed 's/\$/ /g'
if you only want particular columns, and you don't want to see the $
or
sed 's/\$/ /g' filename
if you just want the $ to be replaced by a space
In Ubuntu, right-click on the file and hit Open With..., then OpenOffice Calc. You should then see a dialog box asking for delimiters etc.; uncheck Comma and, in the "Other" field, type a $. Then hit OK and it will import it for you.
As a first attempt:
column -ts'$' path
but this doesn't handle empty fields well, so fix that with this ugly hack:
sed 's/\$\$/$ $/g' path | column -ts'$'
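Note that sed does not rescan text it has just substituted, so a run of three or more consecutive $s (several empty fields in a row) still leaves adjacent delimiters after one pass; running the substitution twice covers that case:
sed -e 's/\$\$/$ $/g' -e 's/\$\$/$ $/g' path | column -ts'$'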