Delete rows of a CSV file based on a column value on the command line - csv

I have a large file that I cannot open on my computer. I am trying to delete rows of information that are unneeded.
My file looks like this:
NODE,107983_gene,382,666,-,cd10161,8,49,9.0E-100,49.4,0.52,domain
NODE,107985_gene,24,659,-,PF09699.9,108,148,6.3E-500,22.5,0.8571428571428571,domain
NODE,33693_gene,213,1433,-,PF01966.21,92,230,9.0E-10,38.7,0.9344262295081968,domain
NODE,33693_gene,213,1433,-,PRK04926,39,133,1.0E-8,54.5,0.19,domain
NODE,33693_gene,213,1433,-,cd00077,88,238,4.0E-6,44.3,0.86,domain
NODE,33693_gene,213,1433,-,smart00471,88,139,9.0E-7,41.9,0.42,domain
NODE,33694_gene,1430,1912,-,cd16326,67,135,4.0E-50,39.5,0.38,domain
I am trying to remove all lines that have an e-value greater than 1.0E-10. This information is located in column 9. I have tried on the command line:
awk '$9 >=1E-10' file name > outputfile
This has given me a smaller file, but the e-values are all over the place and nothing above 1E-10 is actually being removed. I want small e-values only.
Does anyone have any suggestions?

Almost there; you need to specify the field delimiter:
$ awk -F, '$9<1E-10' file > small.values
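For illustration, here is the corrected command applied to a few of the sample rows above (the file name hits.csv is hypothetical). With -F, set, $9 is the e-value column, and comparing it against the numeric constant 1E-10 forces a numeric rather than string comparison:

```shell
# A few of the sample rows from the question (hypothetical file name)
cat > hits.csv <<'EOF'
NODE,107983_gene,382,666,-,cd10161,8,49,9.0E-100,49.4,0.52,domain
NODE,33693_gene,213,1433,-,PF01966.21,92,230,9.0E-10,38.7,0.93,domain
NODE,33693_gene,213,1433,-,PRK04926,39,133,1.0E-8,54.5,0.19,domain
NODE,33694_gene,1430,1912,-,cd16326,67,135,4.0E-50,39.5,0.38,domain
EOF

# -F, splits on commas so $9 is the e-value column; the numeric
# comparison keeps only rows with e-values below the cutoff
awk -F, '$9 < 1E-10' hits.csv > small.values
```

Here only the 9.0E-100 and 4.0E-50 rows are kept; the 9.0E-10 row is dropped because 9.0E-10 is greater than 1E-10.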

Related

How to split string in excel file with bash script?

Good Afternoon
I am trying to develop a bash script which fetches data from a database and then fills a CSV file with said data.
So far I have managed to do just that, but the way the data is presented is not good: all the data is written in one single cell, like so:
and I would like the data to be presented like this:
Here is my bash script code so far:
#! /bin/bash
currentDate=`date`
mysql -u root -p -D cms -e 'SELECT * from bill' > test_"${currentDate}".csv
Can anyone tell me what bash commands I can use to achieve the desired result?
Running cat on the file gives the following result:
Thank you in advance.
Using sed, you can change the delimiter from the output displayed in your image (please use text in the future):
$ sed 's/ \+/,/g' test.csv
If happy with the output, you can then save the file in place.
$ sed -i 's/ \+/,/g' test.csv
You should now have the output in different cells when opened in Excel.
The data appears to be tab-delimited (cat -T test.csv should show a ^I between each column); I believe Excel's default behavior when opening a .csv file is to parse it based on a comma delimiter.
To override this default behavior and have Excel parse the file based on a different delimiter (tab in this case):
open a clean/new worksheet
(menu) DATA -> From Text (file browser should pop up)
select test.csv and hit Import (new pop up asks for details on how to parse)
make sure Delimited radio button is chosen (the default), hit Next >
make sure Tab checkbox is selected (the default), hit Next >
verify the format in the Data preview window (at the bottom of the pop-up) and, if OK, hit Finish
Alternatively, save the file as test.txt; upon opening it with Excel you should be prompted with the same pop-ups asking for parsing details.
I'm not a big Excel user, so I'm not sure if there's a way to get Excel to automatically parse your files based on tabs (a google/web search will likely provide more help at this point).
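If the columns really are tab-separated, another option is to convert the tabs to commas before Excel ever sees the file. This is a sketch that assumes no field contains a literal comma or tab (the sample rows are made up):

```shell
# Hypothetical sample mimicking tab-delimited mysql output
printf '1\tJohn\t100\n2\tJane\t200\n' > test.csv

# translate every tab to a comma, producing a conventional CSV
tr '\t' ',' < test.csv > test_comma.csv
```

Excel will then split the file on commas by default, with no import dialog needed.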

Windows batch command to remove last line from a csv file created from sqlplus spooling

I am creating a CSV file from an Oracle DB using SQL*Plus spooling. The last line of the CSV file contains a spool summary of how many rows were selected; e.g. for a CSV with 1641 rows in it (including the header) the last line says
1641 rows selected.
I want to remove this line from the CSV. Not sure if this can be achieved with a SQL*Plus parameter or a Windows batch script.
Appreciate any inputs to help me remove this last line (or not to create it at all) from the csv file.
I believe that in SQL*Plus you would need to set feedback off:
SET FEEDBACK OFF
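In a spool script, that setting would sit alongside the spooling commands something like this (a sketch; the file name is a placeholder and the query is elided):

```sql
SET FEEDBACK OFF   -- suppresses the "1641 rows selected." trailer
SPOOL bill.csv
SELECT ... ;
SPOOL OFF
```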

Csv parser - Evaluate header for each file

I have multiple CSV files in a directory. They may have different column combinations, but I would like to COPY them all with a single command, as there are a lot of them and they all go into the same table. But FDelimitedParser only evaluates the header row of the first file, then rejects all rows that do not fit, i.e. all rows from most of the other files. I've been using FDelimitedParser, but anything else is fine.
1 - Is this expected behavior, and if so, why?
2 - I want it to evaluate the headers for each file; is there a way?
Thanks
(Vertica 7.2)
Looks like you need a flex table for that; see http://vertica-howto.info/2014/07/how-to-load-csv-files-into-flex-tables/
Here's a small workaround that I use when I need to load a bunch of files in at once. This assumes all your files have the same column order.
Download and run Cygwin
Navigate to folder with csv files
cd your_folder_name_with_csv_files
Combine all csv files into a new file
cat *.csv >> new_file_name.csv
Run a copy statement in Vertica from the new file. If file headers are an issue, you can follow the instructions in this link and run through Cygwin to remove the first line from every file.
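Since cat *.csv keeps every file's header, a variant that strips the repeated headers might look like this (a sketch assuming every file has exactly one header line and the same column order; the directory and file names are made up):

```shell
# Two hypothetical CSVs with identical headers
mkdir -p csvdir
printf 'id,name\n1,a\n' > csvdir/one.csv
printf 'id,name\n2,b\n' > csvdir/two.csv

set -- csvdir/*.csv                # expand the file list once
head -n 1 "$1" > combined.out      # header from the first file only
for f in "$@"; do
  tail -n +2 "$f" >> combined.out  # data rows, headers stripped
done
```

The output file deliberately doesn't end in .csv, so the glob never picks it up on a re-run.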

AWK or GREP 1 instance of repeated output

So what I have here is some output from a Cisco switch, and I need to capture the hostname and use that to populate a CSV file.
Basically, I run a show mac address-table and pull MAC addresses and populate them into a CSV file. That part I've got; however, I can't figure out how to grab the hostname so that I can put it in a separate column.
I have done this:
awk '/#/{print $1}'
but that will print every line that has '#' in it. I only need one to populate a variable so I can reuse it. The end result needs to look like this (the CSV file has MAC address, port number, hostname; I use commas to indicate the column separation):
0011.2233.4455,Gi1/1,Switch1#
0011.2233.4488,Gi1/2,Switch1#
0011.2233.4499,Gi1/3,Switch1#
Without knowing what the input file looks like, the exact solution required is uncertain. However, as an example, given an input file like the requested output (which I've called switch.txt):
0011.2233.4455,Gi1/1,Switch1#
0011.2233.4488,Gi1/2,Switch1#
0011.2233.4499,Gi1/3,Switch1#
0011.2233.4455,Gi1/1,Switch3#
0011.2233.4488,Gi1/2,Switch2#
0011.2233.4498,Gi1/3,Switch3#
... a list of the unique values of the first field (comma-separated) can be obtained from:
$ awk -F, '{print $1}' <switch.txt | sort | uniq
0011.2233.4455
0011.2233.4488
0011.2233.4498
0011.2233.4499
An approach like this might help with extracting unique values from the actual input file.
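Alternatively, if all that's needed is the hostname from the first prompt line so it can populate a variable, awk can stop after the first match. This is a sketch; session.txt and its contents are made up, and it assumes the prompt is followed by a space so that $1 is just the prompt:

```shell
# Hypothetical captured switch session; prompt lines contain '#'
printf 'Switch1# show mac address-table\nVlan  Mac Address  Ports\nSwitch1# exit\n' > session.txt

# print the first field of the first line containing '#', then stop reading
hostname=$(awk '/#/{print $1; exit}' session.txt)
echo "$hostname"
```

The exit makes awk quit after the first match, so the variable holds a single value that can be appended to every CSV row.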

CSV field delimiter problem

This CSV file has a field delimiter of $
It looks like this:
14$"ALL0053"$$$"A"$$$"Direct Deposit in FOGSI A/c"$$"DR"$"DAS PRADIP ...
How can I view the file as columns, with each field shown as a column in a table?
I've tried many ways; none work. Does anyone know how?
I am using Ubuntu
That's a weird CSV, since a comma-separated file is usually separated by, well, commas. I think all you need to do is use a simple find/replace, available in any text editor.
Open the file in Gnome Edit and look under Edit > Replace...
From there you can specify to replace all $s with ,s
Once your file is a real CSV, you can open it in Open Office Calc (spreadsheet), or really any other spreadsheet program for Ubuntu (GNOME).
cut -d'$' -f 1,2,...x filename | sed 's/\$/ /g'
if you only want particular columns, and you don't want to see the $
or
sed 's/\$/ /g' filename
if you just want the $ to be replaced by a space
In Ubuntu, right-click on the file and hit Open With..., then OpenOffice Calc. You should see a dialog box asking for delimiters etc. Uncheck Comma, and in the "Other" field type a $. Then hit OK and it will import the file for you.
As a first attempt:
column -ts'$' path
but this doesn't handle empty fields well, so fix that with this ugly hack (the substitution is run twice because adjacent matches can't overlap):
sed 's/\$\$/$ $/g; s/\$\$/$ $/g' path | column -ts'$'
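awk offers another way to line the fields up while keeping the empty ones, since splitting with -F'$' preserves empty fields. This is a sketch using a truncated sample record from the question; the 12-character column width is arbitrary:

```shell
# Truncated sample record from the question
printf '14$"ALL0053"$$$"A"$$"DR"\n' > data.txt

# split on '$'; empty fields stay empty rather than collapsing
awk -F'$' '{ for (i = 1; i <= NF; i++) printf "%-12s", $i; print "" }' data.txt
```

Fields wider than the chosen padding will still misalign, but unlike column's default behavior, consecutive delimiters keep their places.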