Remove trailing commas at the end of the string using Perl - csv

I'm parsing a CSV file in which each line looks something like the one below.
10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403,GO:0005524,GO:0016874,GO:0047747,GO:0004467,GO:0015245,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
There seem to be trailing commas at the end of each line.
I want to get the first term, in this case "10998" and get the number of GO terms related to it.
So my output in this case should be:
10998,7
But instead it shows 299. I realized there are 303 commas in each line overall, and I can't figure out an easy way to remove the trailing commas. Can anyone help me solve this issue?
Thanks!
My Code:
use strict;
use warnings;
open my $IN, '<', 'test.csv' or die "can't find file: $!";
open(CSV, ">GO_MF_counts_Genes.csv") or die "Error!! Cannot create the file: $!\n";
my @genes = ();
my $mf;
foreach my $line (<$IN>) {
    chomp $line;
    my @array = split(/,/, $line);
    my @GO = splice(@array, 4);
    my $GO = join(',', @GO);
    $mf = count($GO);
    print CSV "$array[0],$mf\n";
}
sub count {
    my $go = shift @_;
    my $count = my @go = split(/,/, $go);
    return $count;
}

I'd use juanrpozo's solution for counting, but if you still want to do it your way, remove the trailing commas with a regex substitution:
$line =~ s/,+$//;
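A minimal sketch of that approach, assuming each line is shaped like the sample in the question. The substitution here also strips trailing whitespace, which is an assumption about the input: if the file has Windows (CRLF) line endings and is read without line-ending translation, chomp removes only the "\n", the last field is then "\r" rather than empty, and split keeps every empty field before it. That would be one possible explanation for the count of 299. The helper name count_go_terms is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first field and the number of GO terms
# in one line shaped like "id,id,symbol,acc,GO:...,GO:...,,,,"
sub count_go_terms {
    my ($line) = @_;
    # Strip trailing commas and any trailing whitespace, including a "\r"
    # that chomp leaves behind on CRLF files (an assumption about the input)
    $line =~ s/[,\s]+\z//;
    my @fields = split /,/, $line;
    # As in the question, fields 0..3 are identifiers and the rest are GO terms
    my @go = @fields[4 .. $#fields];
    return ($fields[0], scalar @go);
}

my $sample = "10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403,GO:0005524,"
           . "GO:0016874,GO:0047747,GO:0004467,GO:0015245,,,,,,\r\n";
my ($id, $n) = count_go_terms($sample);
print "$id,$n\n";    # prints "10998,7"
```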

I suggest this more concise way of coding your program.
Note that the line my @data = split /,/, $line discards trailing empty fields (@data has only 11 fields with your sample data), so the program will produce the same result whether or not the trailing commas are removed beforehand.
use strict;
use warnings;
open my $in, '<', 'test.csv' or die "Cannot open file for input: $!";
open my $out, '>', 'GO_MF_counts_Genes.csv' or die "Cannot open file for output: $!";
foreach my $line (<$in>) {
    chomp $line;
    my @data = split /,/, $line;
    printf $out "%s,%d\n", $data[0], scalar grep /^GO:/, @data;
}
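To see the point about split discarding trailing empty fields, compare it with a negative LIMIT, which keeps them. The last case shows that a single non-empty trailing field, such as a leftover "\r", is enough to keep all the empty fields before it. The sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $plain = "10998,GO:1,GO:2,,,,";
my $crlf  = "10998,GO:1,GO:2,,,,\r";   # e.g. a CRLF line after chomp

my @default  = split /,/, $plain;       # trailing empty fields are dropped
my @keep_all = split /,/, $plain, -1;   # a negative LIMIT keeps them
my @with_cr  = split /,/, $crlf;        # last field is "\r", so nothing is dropped

printf "%d %d %d\n", scalar @default, scalar @keep_all, scalar @with_cr;
# prints "3 7 7"
```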

You can apply grep to @array:
my $mf = grep { /^GO:/ } @array;
This assumes $array[0] never matches /^GO:/.
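In scalar context grep returns the number of matching elements, so the count can be taken straight from the split fields without removing the commas first. A minimal sketch using the sample line from the question; because the pattern is anchored on GO:, empty trailing fields are never counted.

```perl
use strict;
use warnings;

my $line = "10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403,GO:0005524,"
         . "GO:0016874,GO:0047747,GO:0004467,GO:0015245,,,,,,,,";
my @array = split /,/, $line;

# Assigning to a scalar puts grep in scalar context: it returns the match count
my $mf = grep { /^GO:/ } @array;
print "$array[0],$mf\n";    # prints "10998,7"
```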

For each of your lines:
foreach my $line (<$IN>) {
    my ($first_term) = ($line =~ /(\d+),/);
    my @tmp = split('GO', " $line ");
    my $nr_of_GOs = @tmp - 1;
    print CSV "$first_term,$nr_of_GOs\n";
}
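An alternative to splitting on 'GO' is to count the matches directly with the my $n = () = $str =~ /.../g idiom, which runs the match in list context and assigns the number of matches. This sketch anchors on GO: at the start of a field. That anchoring is an assumption on my part: it avoids also counting a GO that appears inside another field (for example a gene symbol), which the split('GO', ...) approach would count.

```perl
use strict;
use warnings;

my $line = "10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403,GO:0005524,"
         . "GO:0016874,GO:0047747,GO:0004467,GO:0015245,,,,,,,,\n";

my ($first_term) = $line =~ /^([^,]+)/;   # everything up to the first comma
# "(?:^|,)GO:" matches "GO:" only at the start of a field
my $nr_of_GOs = () = $line =~ /(?:^|,)GO:/g;
print "$first_term,$nr_of_GOs\n";         # prints "10998,7"
```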

Related

How to read a csv using Perl?

I want to read a CSV file using Perl, skipping the first row. Further, the values in columns 2 and 3 need to be stored in another file, and the row that was read must be deleted.
Edit: The following code works. I just need the deletion part.
use strict;
use warnings;
my ($field1, $field2, $field3, $line);
my $file = 'D:\Patching_test\ptch_file.csv';
open( my $data, '<', $file ) or die;
while ( $line = <$data> ) {
    next if $. == 1;
    ( $field1, $field2, $field3 ) = split ',', $line;
    print "$field1 : $field2 : $field3 ";
    my $filename = 'D:\DB_Patch.properties';
    unlink $filename;
    open( my $sh, '>', $filename )
        or die "Could not open file '$filename' $!";
    print $sh "Patch_id=$field2\n";
    print $sh "Patch_Name=$field3";
    close($sh);
    close($data);
    exit 0;
}
The OP's problem is poorly presented for processing:
no input data sample is provided
no desired output data is presented
no modified input file after processing is presented
The following code is based on the problem description.
use strict;
use warnings;
use feature 'say';
my $input = 'D:\Patching_test\ptch_file.csv';
my $output = 'D:\DB_Patch.properties';
my $temp = 'D:\script_temp.dat';
open my $in, '<', $input
    or die "Couldn't open $input: $!";
open my $out, '>', $output
    or die "Couldn't open $output: $!";
open my $tmp, '>', $temp
    or die "Couldn't open $temp: $!";
while ( <$in> ) {
    chomp;    # strip the newline so say() does not add a second one
    if( $. == 1 ) {
        say $tmp $_;    # keep the header line
    } else {
        my($patch_id, $patch_name) = (split ',')[1,2];
        say $out "Patch_id=$patch_id";
        say $out "Patch_Name=$patch_name";
    }
}
close $in;
close $out;
close $tmp;
rename $temp, $input
    or die "Couldn't rename $temp to $input: $!";
exit 0;
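The answer above keeps only the header line in the rewritten input file. If the intent is instead to consume one data row per run (one possible reading of the question, given the exit 0 inside its loop), a minimal sketch is to read the whole file, take the first data row, write the properties file, and write the remaining lines back. The helper name take_first_row and the sample rows are hypothetical; it assumes plain comma-separated fields with no quoted commas and a file small enough to hold in memory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: removes the first data row (line 2) from $csv_file,
# writes its 2nd and 3rd columns to $props_file, and returns them.
sub take_first_row {
    my ($csv_file, $props_file) = @_;

    open my $in, '<', $csv_file or die "Can't read '$csv_file': $!";
    my @lines = <$in>;
    close $in;

    return if @lines < 2;            # header only: nothing left to process
    my $row = splice @lines, 1, 1;   # remove line 2 from the list
    chomp $row;
    my ($patch_id, $patch_name) = (split /,/, $row)[1, 2];

    open my $sh, '>', $props_file or die "Can't write '$props_file': $!";
    print $sh "Patch_id=$patch_id\n";
    print $sh "Patch_Name=$patch_name\n";
    close $sh;

    # Write the header plus the remaining rows back to the CSV file
    open my $out, '>', $csv_file or die "Can't write '$csv_file': $!";
    print $out @lines;
    close $out;

    return ($patch_id, $patch_name);
}

# Usage on a throwaway file in a temporary directory
my $dir = tempdir(CLEANUP => 1);
my $csv = "$dir/ptch_file.csv";
open my $fh, '>', $csv or die "Can't write '$csv': $!";
print $fh "Id,Patch_id,Patch_Name\n1,P100,first\n2,P200,second\n";
close $fh;

my ($id, $name) = take_first_row($csv, "$dir/DB_Patch.properties");
print "$id $name\n";    # prints "P100 first"
```

Writing the remaining lines back replaces the file in place; for anything important, writing to a temporary file and renaming it, as the answer does, is the safer pattern.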

How to extract row data from a csv file using perl?

I have a CSV file like this:
Genome Name,Resistance_phenotype,Amikacin,Gentamycin,Aztreonam
AB1,,Susceptible,Resistant,Resistant
AB2,,Susceptible,Susceptible,Susceptible
AB3,,Resistant,Resistant,NA
I need to fill the 2nd column, i.e. Resistance_phenotype, with MDR, XDR or Susceptible, for which I have to match the antibiotic resistance profile. For example, in the AB1 row Gentamycin and Aztreonam are both Resistant, so its 2nd column should be filled with MDR, and in the AB2 row all 3 are Susceptible, so its 2nd column should be filled with Susceptible.
I have written the code below, which only displays one column of the CSV file. I'm stuck on what to do next.
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'text.csv';
my @data;
open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n";
while (my $line = <$fh>) {
    chomp $line;
    my @fields = split(/,/, $line);
    print $fields[0], "\n";
    #print $fields[1], "\n";
}
close $fh;
The expected output is:
Genome Name,Resistance_phenotype,Amikacin,Gentamycin,Aztreonam
AB1,MDR,Susceptible,Resistant,Resistant
AB2,Susceptible,Susceptible,Susceptible,Susceptible
AB3,MDR,Resistant,Resistant,NA
Use the Text::CSV_XS module. Read a line, assign the right value to that column, then print the line again. In your sample code, you were only printing one column instead of all of them; the module will handle all of that for you:
use strict;
use warnings;
use Text::CSV_XS;
my $csv = Text::CSV_XS->new;
# replace *DATA and *STDOUT with whatever filehandles you want
# to read then write.
while( my $row = $csv->getline(*DATA) ) {
    $row->[1] = 'Some value';
    $csv->say( *STDOUT, $row );
}
__DATA__
Genome Name,Resistance_phenotype,Amikacin,Gentamycin,Aztreonam
AB1,,Susceptible,Resistant,Resistant
AB2,,Susceptible,Susceptible,Susceptible
AB3,,Resistant,Resistant,NA
The output is:
"Genome Name","Some value",Amikacin,Gentamycin,Aztreonam
AB1,"Some value",Susceptible,Resistant,Resistant
AB2,"Some value",Susceptible,Susceptible,Susceptible
AB3,"Some value",Resistant,Resistant,NA
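The answer leaves the actual value as a placeholder. Below is a minimal sketch of the classification step using plain split (assuming, as in the sample, that no field contains a quoted comma). The rule is an assumption inferred from the expected output in the question: two or more Resistant gives MDR, all tested antibiotics Susceptible gives Susceptible, and anything else is left empty. The question does not say when a row is XDR, so that case is not implemented, and classify is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical rule inferred from the expected output (XDR criteria not given):
#   two or more Resistant              -> "MDR"
#   every non-NA result is Susceptible -> "Susceptible"
#   anything else                      -> left empty
sub classify {
    my @results     = grep { $_ ne 'NA' } @_;
    my $resistant   = grep { $_ eq 'Resistant' } @results;
    my $susceptible = grep { $_ eq 'Susceptible' } @results;
    return 'MDR'         if $resistant >= 2;
    return 'Susceptible' if @results && $susceptible == @results;
    return '';
}

# Sample input from the question, read through an in-memory filehandle
my $csv_text = <<'END';
Genome Name,Resistance_phenotype,Amikacin,Gentamycin,Aztreonam
AB1,,Susceptible,Resistant,Resistant
AB2,,Susceptible,Susceptible,Susceptible
AB3,,Resistant,Resistant,NA
END
open my $fh, '<', \$csv_text or die $!;

my @out;
while (my $line = <$fh>) {
    chomp $line;
    my @fields = split /,/, $line, -1;   # -1 keeps the empty 2nd column
    $fields[1] = classify(@fields[2 .. $#fields]) if $. > 1;   # skip header
    push @out, join(',', @fields);
}
close $fh;
print "$_\n" for @out;
```

With real CSV data that may contain quoted fields, the same classify call can be dropped into the Text::CSV_XS loop in place of 'Some value'.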

Corrupted JSON encoding in Perl (missing comma)

My custom Perl code gives the following invalid JSON, with a comma missing between the objects:
{
"data": [{
"{#LOGFILEPATH}": "/tmp/QRZ2007.tcserverlogs",
"{#LOGFILE}": "QRZ2007"
} **missing comma** {
"{#LOGFILE}": "ARZ2007",
"{#LOGFILEPATH}": "/tmp/ARZ2007.tcserverlogs"
}]
}
My terrible code:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
use utf8;
use JSON;
binmode STDOUT, ":utf8";
my $dir = $ARGV[0];
my $json = JSON->new->utf8->space_after;
opendir(DIR, $dir) or die $!;
print '{"data": [';
while (my $file = readdir(DIR)) {
    next unless (-f "$dir/$file");
    next unless ($file =~ m/\.tcserverlogs$/);
    my $fullPath = "$dir/$file";
    my $filenameshort = basename($file, ".tcserverlogs");
    my $data_to_json = {"{#LOGFILEPATH}"=>$fullPath,"{#LOGFILE}"=>$filenameshort};
    print $json->encode($data_to_json);
}
print ']}'."\n";
closedir(DIR);
exit 0;
Dear team, I am not a programmer. Does anyone have an idea how to fix this? Thank you!
If you do not print a comma, you will not get a comma.
You are trying to build your own JSON string from pre-encoded pieces of smaller data structures. That will not work unless you tell Perl when to put commas. You could do that, but it's easier to just collect all the data into a Perl data structure that is equivalent to the JSON string you want to produce, and encode the whole thing in one go when you're done.
use strict;
use warnings;
use File::Basename;
use JSON;
my $dir = $ARGV[0];
my $json = JSON->new->utf8->space_after;
my @data;
opendir( DIR, $dir ) or die $!;
while ( my $file = readdir(DIR) ) {
    next unless ( -f "$dir/$file" );
    next unless ( $file =~ m/\.tcserverlogs$/ );
    my $fullPath = "$dir/$file";
    my $filenameshort = basename( $file, ".tcserverlogs" );
    my $data_to_json = { "{#LOGFILEPATH}" => $fullPath, "{#LOGFILE}" => $filenameshort };
    push @data, $data_to_json;
}
closedir(DIR);
print $json->encode( { data => \@data } );
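For comparison, the other option the answer mentions, putting the commas in yourself, can be done by encoding each hash separately and joining the pieces with join. A minimal sketch using the core module JSON::PP in place of JSON (both provide new, utf8, canonical and encode); the file names are made-up sample data standing in for the readdir loop. Collecting everything and encoding once, as above, remains the simpler choice.

```perl
use strict;
use warnings;
use JSON::PP;

my $json = JSON::PP->new->utf8->canonical;   # canonical: sorted keys, stable output

# Made-up sample data standing in for the readdir() loop
my @items = map {
    +{ "{#LOGFILEPATH}" => "/tmp/$_.tcserverlogs", "{#LOGFILE}" => $_ }
} qw(QRZ2007 ARZ2007);

# join() puts a comma between the encoded objects, and only between them
my $text = '{"data": [' . join(', ', map { $json->encode($_) } @items) . "]}\n";
print $text;
```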

JSON: dies on decoding when created file with pretty print

Why do I get this error, when I use the pretty print version?
'"' expected, at character offset 2 (before "(end of string)") at ./perl.pl line 29.
#!/usr/bin/env perl
use warnings;
use 5.014;
use utf8;
binmode STDOUT, ':encoding(utf-8)';
use Data::Dumper;
use JSON;
my $json = JSON->new->utf8;
my $hashref = {
    'muster, hanß' => {
        'hello' => {
            year => 2000,
            color => 'green'
        }
    }
};
my $utf8_encoded_json_text = $json->pretty->encode( $hashref ); # leads to a die
#my $utf8_encoded_json_text = $json->encode( $hashref ); # works
open my $fh, '>', 'testfile.json' or die $!;
print $fh $utf8_encoded_json_text;
close $fh;
open $fh, '<', 'testfile.json' or die $!;
$utf8_encoded_json_text = readline $fh;
close $fh;
$hashref = decode_json( $utf8_encoded_json_text );
say Dumper $hashref;
Because when you read the file back in, you're using readline, and only reading the first line of the file. When pretty is off, the entire output is on one line. When pretty is on, the JSON is spread out over multiple lines, so you're passing invalid truncated JSON to decode_json.
Read the entire file instead, e.g. by setting local $/ = undef; before calling readline, or by slurping it some other way.
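A minimal sketch of that fix, reading the whole file back before decoding. To keep the example self-contained it uses the core module JSON::PP in place of JSON and a temporary file from the core module File::Temp in place of testfile.json. Because the text is encoded with utf8 enabled, both handles use the :raw layer so the bytes round-trip unchanged.

```perl
use strict;
use warnings;
use utf8;
use JSON::PP;
use File::Temp qw(tempfile);

my $json = JSON::PP->new->utf8->pretty;
my $hashref = { 'muster, hanß' => { hello => { year => 2000, color => 'green' } } };

# Pretty output spans several lines
my ($fh, $filename) = tempfile(UNLINK => 1);
binmode $fh, ':raw';
print $fh $json->encode($hashref);
close $fh;

# Slurp: with $/ undefined, readline returns the whole file, not one line
my $text = do {
    open my $in, '<:raw', $filename or die "Can't read '$filename': $!";
    local $/;
    <$in>;
};

my $decoded = $json->decode($text);
my $year = $decoded->{'muster, hanß'}{hello}{year};
print "$year\n";    # prints "2000"
```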

How can I extract data from HTML tables in Perl?

I'm trying to use regular expressions in Perl to parse a table with the following structure. The first line is as follows:
<tr class="Highlight"><td>Time Played</a></td><td></td><td>Artist</td><td width="1%"></td><td>Title</td><td>Label</td></tr>
Here I wish to take out "Time Played", "Artist", "Title", and "Label", and print them to an output file.
I've tried many regular expressions such as:
$lines =~ / (<td>) /
OR
$lines =~ / <td>(.*)< /
OR
$lines =~ / >(.*)< /
My current program looks like so:
#!perl -w
open INPUT_FILE, "<", "FIRST_LINE_OF_OUTPUT.txt" or die $!;
open OUTPUT_FILE, ">>", "PLAYLIST_TABLE.txt" or die $!;
my $lines = join '', <INPUT_FILE>;
print "Hello 2\n";
if ($lines =~ / (\S.*\S) /) {
    print "this is 1: \n";
    print $1;
    if ($lines =~ / <td>(.*)< / ) {
        print "this is the 2nd 1: \n";
        print $1;
        print "the word was: $1.\n";
        $Time = $1;
        print $Time;
        print OUTPUT_FILE $Time;
    } else {
        print "2ND IF FAILED\n";
    }
} else {
    print "THIS FAILED\n";
}
close(INPUT_FILE);
close(OUTPUT_FILE);
Do NOT use regexps to parse HTML. There are a great many CPAN modules that do this for you much more effectively.
See also these questions:
Can you provide some examples of why it is hard to parse XML and HTML with a regex?
Can you provide an example of parsing HTML with your favorite parser?
HTML::Parser
HTML::TreeBuilder
HTML::TableExtract
Use HTML::TableExtract. Really.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use LWP::Simple;
my $file = 'Table3.htm';
unless ( -e $file ) {
    my $rc = getstore(
        'http://www.ntsb.gov/aviation/Table3.htm',
        $file);
    die "Failed to download document\n" unless $rc == 200;
}
my @headers = qw( Year Fatalities );
my $te = HTML::TableExtract->new(
    headers => \@headers,
    attribs => { id => 'myTable' },
);
$te->parse_file($file);
my ($table) = $te->tables;    # first matching table
print join("\t", @headers), "\n";
for my $row ( $table->rows ) {
    print join("\t", @$row), "\n";
}
This is what I meant in another post by "task-specific" HTML parsers.
You could have saved a lot of time by directing your energy to reading some documentation rather than throwing regexes at the wall and seeing if any stuck.
That's an easy one:
my $html = '<tr class="Highlight"><td>Time Played</a></td><td></td><td>Artist</td><td width="1%"></td><td>Title</td><td>Label</td></tr>';
my @stuff = $html =~ />([^<]+)</g;
print join (", ", @stuff), "\n";
See http://codepad.org/qz9d5Bro if you want to try running it.