I've been trying to use uniq -u to print the unique lines of a single-column file:
uniq -u samples3
but I get nothing! No output! The original file is pretty big (ca. 40 million lines), but it still makes no sense that I get no output...
I have already sorted it, so what am I doing wrong?
Example file (only part of the original one):
https://filebin.net/okmotq7t6p2g67o3
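This does the trick perfectly (uniq -u prints only lines that are never repeated, so a sorted file in which every value occurs at least twice gives no output; the awk idiom instead prints the first occurrence of each value):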
awk '!a[$1]++' < samples3
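A side-by-side run on a tiny hypothetical sample (the values A, B and C are made up, and each of them is repeated) shows how uniq -u, plain uniq and the awk idiom differ:

```shell
# Work in a scratch directory so the sketch can be run anywhere.
cd "$(mktemp -d)" || exit 1

# Hypothetical single-column sample, already sorted; every value repeats.
printf '%s\n' A A B B B C C > samples3

# -u keeps only lines that are NOT repeated, so this is empty here.
only_once=$(uniq -u samples3)

# Plain uniq collapses each run of identical adjacent lines to one copy.
distinct=$(uniq samples3)

# The awk idiom prints a line the first time its first field is seen;
# it does not need sorted input and keeps the original order.
first_seen=$(awk '!a[$1]++' < samples3)

printf 'uniq -u: [%s]\n' "$only_once"
printf 'uniq:\n%s\n' "$distinct"
printf 'awk:\n%s\n' "$first_seen"
```

For a file that is already sorted, plain uniq (or sort -u) gives the same distinct list; the awk version keeps every distinct key in memory, which may matter at 40 million lines.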
Related
I have a large file that I cannot open on my computer. I am trying to delete rows of information that are unneeded.
My file looks like this:
NODE,107983_gene,382,666,-,cd10161,8,49,9.0E-100,49.4,0.52,domain
NODE,107985_gene,24,659,-,PF09699.9,108,148,6.3E-500,22.5,0.8571428571428571,domain
NODE,33693_gene,213,1433,-,PF01966.21,92,230,9.0E-10,38.7,0.9344262295081968,domain
NODE,33693_gene,213,1433,-,PRK04926,39,133,1.0E-8,54.5,0.19,domain
NODE,33693_gene,213,1433,-,cd00077,88,238,4.0E-6,44.3,0.86,domain
NODE,33693_gene,213,1433,-,smart00471,88,139,9.0E-7,41.9,0.42,domain
NODE,33694_gene,1430,1912,-,cd16326,67,135,4.0E-50,39.5,0.38,domain
I am trying to remove all lines that have an E-value greater than 1.0E-10. This information is located in column 9. I have tried this on the command line:
awk '$9 >=1E-10' file name > outputfile
This gives me a smaller file, but the E-values are all over the place; it is not actually removing anything above 1E-10. I want small E-values only.
Does anyone have any suggestions?
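Almost there: you need to specify the field delimiter. By default awk splits fields on whitespace, so $9 is not the ninth comma-separated column. The comparison also needs to be flipped, since >= keeps the large E-values you want to remove: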
$ awk -F, '$9<1E-10' file > small.values
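Here is a self-contained run of the corrected filter on the sample rows from the question. The +0 on $9 is an optional addition (not part of the answer above) that makes the numeric comparison explicit; the scratch directory is only there so the sketch runs anywhere:

```shell
cd "$(mktemp -d)" || exit 1

# Sample rows copied from the question; column 9 holds the E-value.
cat > file <<'EOF'
NODE,107983_gene,382,666,-,cd10161,8,49,9.0E-100,49.4,0.52,domain
NODE,107985_gene,24,659,-,PF09699.9,108,148,6.3E-500,22.5,0.8571428571428571,domain
NODE,33693_gene,213,1433,-,PF01966.21,92,230,9.0E-10,38.7,0.9344262295081968,domain
NODE,33693_gene,213,1433,-,PRK04926,39,133,1.0E-8,54.5,0.19,domain
NODE,33693_gene,213,1433,-,cd00077,88,238,4.0E-6,44.3,0.86,domain
NODE,33693_gene,213,1433,-,smart00471,88,139,9.0E-7,41.9,0.42,domain
NODE,33694_gene,1430,1912,-,cd16326,67,135,4.0E-50,39.5,0.38,domain
EOF

# -F, splits on commas; keep only rows whose E-value is below 1E-10.
awk -F, '$9+0 < 1E-10' file > small.values
cat small.values
```

All four 33693_gene rows are dropped, because 9.0E-10, 1.0E-8, 4.0E-6 and 9.0E-7 are all larger than 1E-10.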
I have a real odd one...
I'm outputting a table from a local MySQL database into a text file that has INSERT statements for each record (it's part of a much larger script and is the most efficient way to load data into an Aurora table).
Everything works except for one bugbear: the first INSERT has odd characters at the start of the very first field, but no other field does.
My export command:
.\mysqldump.exe -h localhost -u $localuser --password=$localpass --default-character-set=utf8 --extended-insert=FALSE --add-drop-table abcdatabase exporttable | Out-File $dataoutfile
The first insert statement says "INSERT INTO exporttable VALUES ('13150',..."
Any idea what those first three characters are and, more importantly, how I get rid of them?
Thanks in advance
OK, so I solved it thanks to a comment by JBurace near the bottom of this post (link). I added -Encoding default to the end of the Out-File statement and the problem went away. Bizarre that it was only one part of one field, but hey!
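If a file that has already been written needs cleaning on a Unix-like shell, and assuming the three odd characters are a UTF-8 byte-order mark (the bytes EF BB BF; the question does not show them, so check first), a sketch like the following inspects and strips them. The file name dump.sql is hypothetical:

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical file that starts with a UTF-8 BOM (octal 357 273 277).
printf '\357\273\277INSERT INTO exporttable VALUES (1);\n' > dump.sql

# Show the first three bytes in hex to confirm what they actually are.
head -c 3 dump.sql | od -An -tx1

# Remove those bytes from the first line only; LC_ALL=C makes sed match raw bytes.
bom=$(printf '\357\273\277')
LC_ALL=C sed "1s/^$bom//" dump.sql > dump.nobom.sql
```

Fixing the encoding at the source, as with -Encoding above, is still the cleaner option; this only helps when the output cannot be regenerated.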
grep -n "Table Structure" dumpfile.sql
returns
XXXXXX:-- Table structure for table `table_name_1`
XXXXXX:-- Table structure for table `table_name_2`
XXXXXX:-- Table structure for table `table_name_3`
But after this point it breaks, and I'm not sure why.
Also, to retrieve a single table from a huge dump file (around 489 GB), I used:
sed -n -e '/Table Structure 'table_name'/p' dump_file_name.sql > extracted_file.sql
But it is not able to locate the table_name.
So my question is: how can all the tables be accessed? Or why, after a certain table, is it unable to find the table?
If anyone can help me with this, it would be a great deed!
You have two problems with your sed command.
First, you're using single quotes inside the string that's delimited by single quotes. That won't work, because the inside quotes will just end the shell string, not be included literally.
Second, the quotes in the dump file are backticks, not single quotes.
Also, you're missing "for table" in your pattern, and the "s" in "structure" should be lowercase.
sed -n -e '/Table structure for table `table_name`/p' dump_file_name.sql > extracted_file.sql
But you can just use grep for this; you don't need sed:
grep 'Table structure for table `table_name`' dump_file_name.sql > extracted_file.sql
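Both commands print only the matching comment line, not the table's statements. To pull out everything belonging to one table, one option (a swapped-in awk technique, assuming the dump keeps mysqldump's default "-- Table structure for table" comments) is to start printing at that table's header and stop at the next one. The miniature dump and the table names are hypothetical:

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical miniature dump in mysqldump's commented layout.
cat > dump_file_name.sql <<'EOF'
-- Table structure for table `other_table`
CREATE TABLE `other_table` (`id` int);
INSERT INTO `other_table` VALUES (1);
-- Table structure for table `table_name`
DROP TABLE IF EXISTS `table_name`;
CREATE TABLE `table_name` (`id` int);
INSERT INTO `table_name` VALUES (42);
-- Table structure for table `last_table`
CREATE TABLE `last_table` (`id` int);
EOF

# p turns on at the wanted table's header; at the next header we exit,
# so the rest of a very large file is never read.
awk -v t='table_name' '
  /^-- Table structure for table `/ {
    if (p) exit
    p = (index($0, "`" t "`") > 0)
  }
  p
' dump_file_name.sql > extracted_file.sql
cat extracted_file.sql
```

Matching the name together with its surrounding backticks keeps table_name from also matching something like table_name_2.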
So what I have here is some output from a Cisco switch, and I need to capture the hostname and use it to populate a CSV file.
Basically, I run show mac address-table, pull the MAC addresses and write them into a CSV file. That part works; however, I can't figure out how to grab the hostname so that I can put it in a separate column.
I have done this:
awk '/#/{print $1}'
But that prints every line that has '#' in it. I only need one, to populate a variable so I can reuse it. The end result needs to look like this (the CSV file has MAC address, port number, hostname; I use commas to indicate the column separation):
0011.2233.4455,Gi1/1,Switch1#
0011.2233.4488,Gi1/2,Switch1#
0011.2233.4499,Gi1/3,Switch1#
Without knowing what the input file looks like, the exact solution is uncertain. However, as an example, given an input file like the requested output (which I've called switch.txt):
0011.2233.4455,Gi1/1,Switch1#
0011.2233.4488,Gi1/2,Switch1#
0011.2233.4499,Gi1/3,Switch1#
0011.2233.4455,Gi1/1,Switch3#
0011.2233.4488,Gi1/2,Switch2#
0011.2233.4498,Gi1/3,Switch3#
... a list of the unique values of the first field (comma-separated) can be obtained from:
$ awk -F, '{print $1}' <switch.txt | sort | uniq
0011.2233.4455
0011.2233.4488
0011.2233.4498
0011.2233.4499
An approach like this might help with extracting unique values from the actual input file.
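For the original problem (one hostname reused on every row), the awk '/#/{print $1}' attempt can stop after the first match with exit. This sketch assumes a hypothetical saved session log, session.txt, whose prompt line looks like "Switch1#show mac address-table" and whose table rows have the MAC address in field 2 and the port in field 4; the real output layout may differ, so adjust the field numbers to match it:

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical captured session; the real layout may differ.
cat > session.txt <<'EOF'
Switch1#show mac address-table
          Mac Address Table
-------------------------------------------

Vlan    Mac Address       Type        Ports
----    -----------       --------    -----
   1    0011.2233.4455    DYNAMIC     Gi1/1
   1    0011.2233.4488    DYNAMIC     Gi1/2
   1    0011.2233.4499    DYNAMIC     Gi1/3
Total Mac Addresses for this criterion: 3
Switch1#
EOF

# First line containing '#': cut it just after the '#', print it, stop reading.
switch_name=$(awk '/#/ { sub(/#.*/, "#"); print; exit }' session.txt)

# Keep rows whose 2nd field looks like a dotted MAC and append the hostname.
awk -v host="$switch_name" -v OFS=, '
  $2 ~ /^[0-9A-Fa-f]+\.[0-9A-Fa-f]+\.[0-9A-Fa-f]+$/ { print $2, $4, host }
' session.txt > macs.csv
cat macs.csv
```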
What's the easiest way to get the data for a single table, delete a single table or break up the whole dump file into files each containing individual tables? I usually end up doing a lot of vi regex munging, but I bet there are easier ways to do these things with awk/perl, etc. The first page of Google results brings back a bunch of non-working perl scripts.
When I need to pull a single table from an SQL dump, I use a combination of grep, head and tail.
E.g.:
grep -n "CREATE TABLE" dump.sql
This then gives you the line numbers for each one, so if your table is on line 200 and the one after is on line 269, I do:
head -n 268 dump.sql > tophalf.sql
tail -n 69 tophalf.sql > yourtable.sql
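The head/tail pair can also be done in one sed pass; adding a q command at the last wanted line stops sed from reading the rest of a large dump. The 300-line numbered file below is only a hypothetical stand-in used to check that both approaches select the same lines:

```shell
cd "$(mktemp -d)" || exit 1

# Stand-in "dump": 300 numbered lines, just to compare the two methods.
seq 300 > dump.sql

# The head/tail approach: lines 200..268 inclusive (69 lines).
head -n 268 dump.sql > tophalf.sql
tail -n 69 tophalf.sql > yourtable.sql

# Same range in one pass; "268q" quits right after the last wanted line.
sed -n '200,268p;268q' dump.sql > yourtable_sed.sql
```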
I would imagine you could extend upon those principles to knock up a script that would split the whole thing down into one file per table.
Anyone want a go doing it here?
Another bit that might help start a bash loop going:
grep -n "CREATE TABLE " dump.sql | tr ':`(' ' ' | awk '{print $1, $4}'
That gives you a nice list of line numbers and table names like:
200 FooTable
269 BarTable
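As an alternative to a bash loop that re-reads the dump once per table, the whole file can be split in a single awk pass. This is a sketch, not the answerer's script: it cuts at mysqldump's "-- Table structure for table" comments rather than at CREATE TABLE lines, because with the default --add-drop-table each table's DROP TABLE IF EXISTS statement comes before its CREATE TABLE, and cutting at CREATE TABLE would leave that statement at the end of the previous table's file. The sample dump and the preamble.sql name are hypothetical:

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical miniature dump in mysqldump's default layout.
cat > dump.sql <<'EOF'
SET NAMES utf8mb4;
-- Table structure for table `FooTable`
DROP TABLE IF EXISTS `FooTable`;
CREATE TABLE `FooTable` (`id` int);
INSERT INTO `FooTable` VALUES (1);
-- Table structure for table `BarTable`
DROP TABLE IF EXISTS `BarTable`;
CREATE TABLE `BarTable` (`id` int);
INSERT INTO `BarTable` VALUES (2);
EOF

awk '
  /^-- Table structure for table `/ {
    if (out != "") close(out)          # finish the previous table file
    name = $0
    sub(/^-- Table structure for table `/, "", name)
    sub(/`.*/, "", name)               # keep only the text between the backticks
    out = name ".sql"
  }
  # Lines before the first table header go to preamble.sql.
  { print > (out != "" ? out : "preamble.sql") }
' dump.sql
ls
```

Table names are used directly as file names here, so names containing characters such as "/" would need extra handling.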
Save yourself a lot of hassle and use mysqldump -T if you can.
From the documentation:
--tab=path, -T path
Produce tab-separated data files. For each dumped table, mysqldump
creates a tbl_name.sql file that contains the CREATE TABLE statement
that creates the table, and a tbl_name.txt file that contains its
data. The option value is the directory in which to write the files.
By default, the .txt data files are formatted using tab characters
between column values and a newline at the end of each line. The
format can be specified explicitly using the --fields-xxx and
--lines-terminated-by options.
Note: This option should be used only when mysqldump is run on the
same machine as the mysqld server. You must have the FILE privilege,
and the server must have permission to write files in the directory
that you specify.
This shell script will grab the tables you want and pass them to splitted.sql.
It’s capable of understanding regular expressions as I’ve added a sed -r option.
Also MyDumpSplitter can split the dump into individual table dumps.
Maatkit seems quite appropriate for this with mk-parallel-dump and mk-parallel-restore.
I'm a bit late to this one, but in case it helps anyone: I had to split a huge SQL dump file in order to import the data into another MySQL server.
What I ended up doing was splitting the dump file with the split system command:
split -l 1000 import.sql splited_file
The above splits the SQL file every 1000 lines, writing the pieces to splited_fileaa, splited_fileab, and so on.
Hope this helps someone
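Splitting purely by line count can cut a multi-line statement, such as a CREATE TABLE, across two files. A variation (a sketch, not the answerer's method) only starts a new piece once the current one has at least max lines and the current line ends with a semicolon. It assumes every statement ends with ";" at the end of a line and that no line inside a string literal ends that way; the part_NNN.sql names and the tiny max=3 are hypothetical, chosen so the example is small (use something like max=1000 on a real dump):

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical input; the CREATE TABLE statements span several lines.
cat > import.sql <<'EOF'
CREATE TABLE t (
  id INT
);
INSERT INTO t VALUES (1);
INSERT INTO t VALUES (2);
CREATE TABLE u (
  id INT
);
EOF

# Write lines to part_000.sql, part_001.sql, ...; once the current part has
# at least max lines, roll over only after a line that ends with ";".
awk -v max=3 '
  { out = sprintf("part_%03d.sql", part); print > out; lines++ }
  lines >= max && /;[ \t\r]*$/ { close(out); part++; lines = 0 }
' import.sql
wc -l part_*.sql
```

Here the second CREATE TABLE would have been split by a plain 3-line cut, but stays whole in part_001.sql.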