I have a large file that I cannot open on my computer. I am trying to delete rows of unneeded information.
My file looks like this:
NODE,107983_gene,382,666,-,cd10161,8,49,9.0E-100,49.4,0.52,domain
NODE,107985_gene,24,659,-,PF09699.9,108,148,6.3E-500,22.5,0.8571428571428571,domain
NODE,33693_gene,213,1433,-,PF01966.21,92,230,9.0E-10,38.7,0.9344262295081968,domain
NODE,33693_gene,213,1433,-,PRK04926,39,133,1.0E-8,54.5,0.19,domain
NODE,33693_gene,213,1433,-,cd00077,88,238,4.0E-6,44.3,0.86,domain
NODE,33693_gene,213,1433,-,smart00471,88,139,9.0E-7,41.9,0.42,domain
NODE,33694_gene,1430,1912,-,cd16326,67,135,4.0E-50,39.5,0.38,domain
I am trying to remove all lines that have an E-value greater than 1.0E-10. This information is located in column 9. I have tried on the command line:
awk '$9 >=1E-10' file > outputfile
This has given me a smaller file, but the E-values in it are all over the place; nothing above 1E-10 is actually being removed. I want small E-values only.
Does anyone have any suggestions?
Almost there: you need to specify the field delimiter, and flip the comparison so that you keep the small E-values:
$ awk -F, '$9<1E-10' file > small.values
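Without -F, awk splits fields on whitespace, so $9 never lands on the E-value column of these comma-separated lines, which is why your results looked random. If the column might contain stray spaces, a common safeguard (a minimal sketch; file is a placeholder name) is to force a numeric comparison by adding zero:
$ awk -F, '($9 + 0) < 1E-10' file > small.values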
I have a very large SQL file (14 GB). Currently, I am not able to open this file in my browser or in VS Code because it is too huge: they keep crashing, and it would take too long. However, there is a single table that I want from this huge SQL file.
Is there a way of splitting the SQL file to get the specific table that I am searching for? Any helpful answer please?
You can do:
Step 1: grep -ni "${YourTableName}" path/to/your/file
In the output you'll see each line matching ${YourTableName} together with its line number.
Step 2: tail -n +25 path/to/your/file > path/to/your/fileChunk (where 25 must be replaced with the line number from the grep command; the + makes tail print from that line to the end of the file). Now the file path/to/your/fileChunk will have the stuff related to your table at the top.
Step 3 (optional): The file path/to/your/fileChunk has the stuff related to your table at the top, but in the middle and at the bottom it may still contain stuff related to other tables, so repeat steps 1 and 2 on path/to/your/fileChunk and delete the needless info.
PS: This is only an idea of how to split your huge file into chunks; you have to adapt these commands to your values.
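If the dump was produced by mysqldump or a similar tool, every table typically begins with its own CREATE TABLE statement (an assumption about your dump format), in which case sed can extract the whole range from your table's CREATE TABLE up to the next one in a single pass:
sed -n '/CREATE TABLE `your_table`/,/CREATE TABLE `/p' path/to/your/file > your_table.sql
Here your_table is a placeholder for your table name; the last line of the output will be the next table's CREATE TABLE line, which you can simply delete.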
I am creating a CSV file from an Oracle DB using SQL*Plus spooling. The last line of the CSV file contains a spool summary of how many rows were selected; e.g. for a CSV with 1641 rows in it (including the header) the last line says
1641 rows selected.
I want to remove this line from the CSV. I am not sure if this can be achieved with a SQL*Plus parameter or with a Windows batch script.
I would appreciate any input to help me remove this last line from the CSV file (or not create it at all).
I believe that in SQL*Plus you need to set feedback off:
SET FEEDBACK OFF
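For context, a minimal sketch of a spool script with the feedback suppressed might look like this (the table, columns, and file name are placeholders; TRIMSPOOL ON just trims trailing blanks from the spooled lines):
SET FEEDBACK OFF
SET TRIMSPOOL ON
SPOOL out.csv
SELECT col1 || ',' || col2 FROM your_table;
SPOOL OFF
With FEEDBACK OFF, the "1641 rows selected." summary is never written, so the CSV needs no post-processing.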
I have a txt file which has 1400 columns and 3.1M rows.
I want to convert this file into CSV.
I tried doing it from Excel (Data > From Text).
The file was created, but it had only 120k rows (though all 1400 columns).
I am not sure how I should convert this whole file into CSV.
It would be great to have help on this.
Thanks
I see you selected the "notepad" tag. You should try gVim ( https://gvim.en.softonic.com/ ). I used it to open 2 GB files and it worked like a charm.
You can find more programs that allow you to open big files here: https://stackoverflow.com/a/159537/1564840
On the other hand, I suggest you split that big txt file into multiple smaller txt files. Then you can convert the smaller txt files one by one.
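If you have access to a Unix-like command line, the splitting and the conversion can both be done there. A minimal sketch, assuming the txt file is tab-delimited and that no field itself contains a tab, comma, or quote (file names are placeholders):
split -l 500000 big.txt part_
for f in part_*; do tr '\t' ',' < "$f" > "$f.csv"; done
Each part_*.csv piece stays within Excel's row limit, or you can concatenate them back into a single CSV with cat part_*.csv > big.csv.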
I have a folder containing a number of csv files, e.g. "leeds dz.csv", "leeds gh.csv", "leeds fr.csv". The first part of the file names is constant (i.e. always "leeds").
I want to import each into Stata individually, convert it to a .dta file, and save it. Currently I have this code:
cd "etcetc"
clear
local myfilelist : dir . files "*.csv"
foreach file of local myfilelist {
drop _all
insheet using `file', comma
local outfile = subinstr("`file'",".csv","",.)
save "`outfile'", replace
}
The code works fine if I manually rename all the .csv files to delete the "leeds" part, i.e. if each .csv is named "dz.csv" instead of "leeds dz.csv", etc.
However, if I do not do this renaming, I receive the error "invalid 'dz.csv' ".
I'm guessing this has something to do with my 3rd line of code, in particular the "*.csv". But I'm unsure how to adapt the code, or why it won't allow me to import files with a space in the name.
The line
insheet using `file', comma
will be problematic with any filename containing spaces.
Try
insheet using "`file'", comma
The help for insheet is quite explicit on this:
If filename is specified without an extension, .raw is assumed. If your
filename contains embedded spaces, remember to enclose it in double
quotes.
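Putting that fix back into the original loop, the corrected version (a sketch, unchanged apart from the added double quotes) would be:
cd "etcetc"
clear
local myfilelist : dir . files "*.csv"
foreach file of local myfilelist {
    drop _all
    insheet using "`file'", comma
    local outfile = subinstr("`file'", ".csv", "", .)
    save "`outfile'", replace
}
This way "leeds dz.csv" is passed to insheet as a single quoted token, so the embedded space no longer splits the filename.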