How to extract row data from a CSV file using Perl?

I have a CSV file like this:
Genome Name,Resistance_phenotype,Amikacin,Gentamycin,Aztreonam
AB1,,Susceptible,Resistant,Resistant
AB2,,Susceptible,Susceptible,Susceptible
AB3,,Resistant,Resistant,NA
I need to fill the 2nd column, i.e. Resistance_phenotype, with MDR, XDR or Susceptible, based on the antibiotic resistance profile. For example, in the first row Gentamycin and Aztreonam are both Resistant, so its 2nd column should be filled with MDR; in a row where all three antibiotics are Susceptible (the second row), the 2nd column should be filled with Susceptible.
I have written the code below, which only displays columns of the CSV file. I'm stuck on what to do next.
#!/usr/bin/perl
use strict;
use warnings;

my $file = 'text.csv';
my @data;
open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n";
while (my $line = <$fh>) {
    chomp $line;
    my @fields = split(/,/, $line);
    print $fields[0], "\n";
    #print $fields[1], "\n";
}
close $fh;
The output I need is:
Genome Name,Resistance_phenotype,Amikacin,Gentamycin,Aztreonam
AB1,MDR,Susceptible,Resistant,Resistant
AB2,Susceptible,Susceptible,Susceptible,Susceptible
AB3,MDR,Resistant,Resistant,NA

Use the Text::CSV_XS module. Read a line, assign the right value to that column, then print it again. In your sample code you were only printing one column instead of all of them; the module will handle all of that for you:
use Text::CSV_XS;

my $csv = Text::CSV_XS->new;

# replace *DATA and *STDOUT with whatever filehandles you want
# to read from and write to
while( my $row = $csv->getline(*DATA) ) {
    $row->[1] = 'Some value';
    $csv->say( *STDOUT, $row );
}
__DATA__
Genome Name,Resistance_phenotype,Amikacin,Gentamycin,Aztreonam
AB1,,Susceptible,Resistant,Resistant
AB2,,Susceptible,Susceptible,Susceptible
AB3,,Resistant,Resistant,NA
The output is:
"Genome Name","Some value",Amikacin,Gentamycin,Aztreonam
AB1,"Some value",Susceptible,Resistant,Resistant
AB2,"Some value",Susceptible,Susceptible,Susceptible
AB3,"Some value",Resistant,Resistant,NA

Related

Perl CSV Parsing with Header Access - header on row 3 and only some rows instead of all rows

I am trying to parse a given CSV file/stream on a regular basis.
My requirement is to access the data via column name (header).
The column names are not given in row 1; they are given in row 2.
The CSV has about 100 rows, but I only need 2 data rows to import.
The separator is a tab.
The following script works for a header at row 1 and for all rows in the file.
I failed to modify it for a header at row 2 and to use only 2 rows (or a given number of rows).
script:
#!/usr/bin/perl
use strict;
use warnings;
use Tie::Handle::CSV;
use Data::Dumper;

my $file = "data.csv";
my $fh  = Tie::Handle::CSV->new ($file, header => 1, sep_char => "\t");
my $hfh = Tie::Handle::CSV->new ($file, header => 0, sep_char => "\t");
my $line = <$hfh>;
my $myheader;

while (my $csv_line = <$fh>)
{
    foreach (@{$line})
    {
        if ( $_ ne "" )
        {
            print $_ . "=" . $csv_line->{$_} . "\n";
        }
    }
}
The Data.csv could look like:
This is a silly sentence on the first line
Name GivenName Birthdate Number
Meier hans 18.03.1999 1
Frank Thomas 27.1.1974 2
Karl Franz 1.1.2000 3
Here could be something silly again
Thanks for any hint.
best regards
Use Text::CSV_XS instead of Tie::Handle::CSV (which depends on that module, so you have it installed already). Read and throw away the first line, use the second line to set the column names, and then read the rest of the data:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ sep    => ",", # Using CSV because TSV doesn't play well with SO formatting
                              binary => 1 });

# Read and discard the first line
$_ = <DATA>;

# Use the next line as the header and set column names
$csv->column_names($csv->getline(*DATA));

# Read some rows and access columns by name instead of position
my $nr = 0;
while (my $record = $csv->getline_hr(*DATA)) {
    last if ++$nr == 4;
    say "Row $nr: $record->{GivenName} was born on $record->{Birthdate}";
}
__DATA__
This is a silly sentence on the first line
Name,GivenName,Birthdate,Number
Meier,hans,18.03.1999,1
Frank,Thomas,27.1.1974,2
Karl,Franz,1.1.2000,3
Here could be something silly again
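For the real tab-separated file you would read from a file handle instead of DATA and set the separator to a tab. A minimal sketch of that variant (the file name and row limit are assumptions):

# Sketch only: same approach, but with the tab separator and a real file.
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ sep_char => "\t", binary => 1, auto_diag => 1 });

open my $in, '<', 'data.csv' or die "Can't open data.csv: $!";
<$in>;                                     # discard the silly first line
$csv->column_names($csv->getline($in));    # row 2 holds the header

my $nr = 0;
while (my $record = $csv->getline_hr($in)) {
    last if ++$nr > 2;                     # only the first 2 data rows are needed
    print "$record->{Name} => $record->{Birthdate}\n";
}
close $in;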
Tie::Handle::CSV accepts a filehandle instead of a filename. You can skip the first line by reading one line from it before you pass the filehandle to Tie::Handle::CSV:
use strict;
use warnings;
use Tie::Handle::CSV;
use Data::Dumper;

my $file = "data.csv";
open (my $infile, '<', $file) or die "can't open file $file: $!\n";
<$infile>;    # skip first line

my $hfh = Tie::Handle::CSV->new ($infile, header => 1, sep_char => "\t");

my @csv;
my $num_lines = 3;
while ($num_lines--){
    my $line = <$hfh>;
    push @csv, $line;
}

print Dumper \@csv;
Thanks to you both.
To clarify my requirements in more detail:
The original data file may have around 100 columns whose names are unknown to me in advance.
Another service will give me a list of columns/attributes for which this script should provide the data content of some rows.
A request, in terms of the data example, would be: please provide all Names and all Birthdates of the first 25 rows.
The next request could be all Names and GivenNames of the first 10 rows.
That means that out of 100 columns I have to provide the content for only two, four or five columns.
The output loop (foreach) is only there to test access to the row content by column name.
I mixed up your solutions and stayed with Tie::Handle::CSV.
At the moment I have to use two filehandles. Maybe you have a hint to make this more efficient.
#!/usr/bin/perl
use strict;
use warnings;
use Tie::Handle::CSV;
use Data::Dumper;

my $file = "data.csv";
open (my $infile,  '<', $file) or die "can't open file $file: $!\n";
open (my $secfile, '<', $file) or die "can't open file $file: $!\n";
<$infile>;    # skip first line
<$secfile>;

my $fh  = Tie::Handle::CSV->new ($secfile, header => 1, sep_char => "\t");
my $hfh = Tie::Handle::CSV->new ($infile,  header => 0, sep_char => "\t");
my $line = <$hfh>;

my $numberoflines = 2;
while ($numberoflines--)
{
    my $csv_line = <$fh>;
    foreach (@{$line})
    {
        if ( $_ ne "" )
        {
            print $_ . "=" . $csv_line->{$_} . "\n";
        }
    }
}
Thanks, got it running with "keys %$csv_line". I wasn't using it because of missing knowledge. ;-)
#!/usr/bin/perl
use strict;
use warnings;
use Tie::Handle::CSV;

my $file = "data.csv";
open (my $secfile, '<', $file) or die "can't open file $file: $!\n";
<$secfile>;

my $fh = Tie::Handle::CSV->new ($secfile, header => 1, sep_char => "\t");

my $numberoflines = 3;
while ($numberoflines--)
{
    my $csv_line = <$fh>;
    my @Columns = keys %{ $csv_line };
    foreach (@Columns)
    {
        if ( $_ ne "" )
        {
            print $_ . "=" . $csv_line->{$_} . "\n";
        }
    }
    print "-----------\n";
}
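Since the real requirement described above is to return only a few named columns for the first N rows, here is a minimal sketch on top of the same Tie::Handle::CSV setup; @wanted and $numberoflines stand in for whatever the other service requests:

#!/usr/bin/perl
# Sketch: print only the requested columns for the first N data rows.
use strict;
use warnings;
use Tie::Handle::CSV;

my $file = "data.csv";
open (my $secfile, '<', $file) or die "can't open file $file: $!\n";
<$secfile>;    # skip the silly first line

my $fh = Tie::Handle::CSV->new ($secfile, header => 1, sep_char => "\t");

my @wanted        = qw(Name Birthdate);   # placeholder column list
my $numberoflines = 2;                    # placeholder row count

while ($numberoflines--)
{
    my $csv_line = <$fh>;
    last unless defined $csv_line;        # stop early if the file runs out of rows
    print join(", ", map { "$_=" . $csv_line->{$_} } @wanted), "\n";
}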
One last question:
The file I read will be filled and modified by another program.
What can I do to detect a file violation in case it causes a problem? I don't want my script to die.
Thanks
regards

How to read a csv using Perl?

I want to read a CSV using Perl, excluding the first row. Further, the col 2 and col 3 values need to be stored in another file, and the row that was read must be deleted.
Edit: The following code works. I just need the deletion part.
use strict;
use warnings;

my ($field1, $field2, $field3, $line);
my $file = 'D:\Patching_test\ptch_file.csv';

open( my $data, '<', $file ) or die;

while ( $line = <$data> ) {
    next if $. == 1;
    ( $field1, $field2, $field3 ) = split ',', $line;
    print "$field1 : $field2 : $field3 ";

    my $filename = 'D:\DB_Patch.properties';
    unlink $filename;

    open( my $sh, '>', $filename )
        or die "Could not open file '$filename' $!";
    print $sh "Patch_id=$field2\n";
    print $sh "Patch_Name=$field3";
    close($sh);
    close($data);
    exit 0;
}
The OP's problem is poorly presented for processing:
no input data sample provided
no desired output data presented
no modified input file after processing presented
Based on the problem description, the following code is provided:
use strict;
use warnings;
use feature 'say';

my $input  = 'D:\Patching_test\ptch_file.csv';
my $output = 'D:\DB_Patch.properties';
my $temp   = 'D:\script_temp.dat';

open my $in, '<', $input
    or die "Couldn't open $input";
open my $out, '>', $output
    or die "Couldn't open $output";
open my $tmp, '>', $temp
    or die "Couldn't open $temp";

while ( <$in> ) {
    if( $. == 1 ) {
        print $tmp $_;          # keep the header line ($_ still has its newline)
    } else {
        my($patch_id, $patch_name) = (split ',')[1,2];
        say $out "Patch_id=$patch_id";
        say $out "Patch_Name=$patch_name";
    }
}

close $in;
close $out;
close $tmp;

rename $temp, $input;           # replace the input with the temp file,
                                # so the processed rows are deleted
exit 0;
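If only the single row that was actually processed should be deleted (as in the original code, which handles just the first data row and exits), one possible variant is to copy every other line through to the temp file unchanged. This is a sketch under that assumption, not the answer's code:

# Sketch: delete only the first data row (the one whose fields were written
# to the properties file); all other lines are copied through untouched.
use strict;
use warnings;
use feature 'say';

my $input  = 'D:\Patching_test\ptch_file.csv';
my $output = 'D:\DB_Patch.properties';
my $temp   = 'D:\script_temp.dat';

open my $in,  '<', $input  or die "Couldn't open $input: $!";
open my $out, '>', $output or die "Couldn't open $output: $!";
open my $tmp, '>', $temp   or die "Couldn't open $temp: $!";

while ( my $line = <$in> ) {
    if ( $. == 2 ) {                              # the row being processed
        chomp( my $copy = $line );
        my ($patch_id, $patch_name) = (split /,/, $copy)[1, 2];
        say $out "Patch_id=$patch_id";
        say $out "Patch_Name=$patch_name";
        next;                                     # do not copy it, so it is "deleted"
    }
    print $tmp $line;                             # header and remaining rows are kept
}

close $_ for $in, $out, $tmp;
rename $temp, $input or die "Couldn't replace $input: $!";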

write into a csv file in multiple cells

I am coding in Perl. How can I write multiple variables into a CSV file and put each one in a separate cell on the same line?
This is part of my code:
#!/usr/bin/perl
use feature qw(say);
use strict;
use warnings;
use constant BUFSIZE => 6;

my $year += 1900;
my $input_file = 'path\ZONE0.txt';
my $outputfile = 'path\outputfile.csv';

open (my $BIN, "<:raw", $input_file) or die "can't open the file $input_file: $!";
my $buffer;
open(FH, '>>', $outputfile) or die $!;

while (1) {
    my $bytes_read = sysread $BIN, $buffer, BUFSIZE;
    die "Could not read file $input_file: $!" if !defined $bytes_read;
    last if $bytes_read <= 0;

    my @decimal = map { unpack "C", $_ } split //, $buffer;
    my $start   = $decimal[0];
    my $DevType = $decimal[1];
    my @hexDevType = sprintf("0x%x", $DevType);
    my @DevUID = ($decimal[5], $decimal[4], $decimal[3], $decimal[2]);
    my @hexDevUID = map { sprintf("0x%x",$_) } @DevUID;

    print FH $start, ' ' , print FH $DevType,' ', @hexDevUID , "\n";
}
close $BIN;
This puts all the variables next to each other in one cell, which is not what I want. Can you help me separate the variables?
CSV files don't have cells. I suspect you're opening the file in a spreadsheet program.
The secret of a CSV file is that the values are separated by commas. So you need to put commas between any values that you want to appear in separate cells in your spreadsheet.
It looks like your data is in @hexDevUID. The simplest way is to turn that into a comma-separated string using join():
join(',', @hexDevUID)
But the more robust approach will be to use Text::CSV_XS.
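For example, a minimal sketch of that approach, reusing the field names from the code above (the sample values and the open call are just for illustration):

# Sketch of the Text::CSV_XS approach: one call writes the whole row,
# separating and quoting fields as needed.
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ binary => 1, eol => "\n" });

open my $out, '>>', 'path\outputfile.csv' or die "can't open output: $!";

# $start, $DevType and @hexDevUID as computed in the loop above
my ($start, $DevType) = (1, 66);
my @hexDevUID = qw(0x0a 0x0b 0x0c 0x0d);

$csv->print($out, [ $start, $DevType, @hexDevUID ]);   # -> 1,66,0x0a,0x0b,0x0c,0x0d

close $out;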
Below is the OP's code, modified so that it does not use any CSV modules for output.
Error handling has been added for read errors and for an insufficient number of read bytes.
use strict;
use warnings;
use feature 'say';
use constant BUFSIZE => 6;

my ($buffer, $bytes_read);

my $infile  = shift || 'path\ZONE0.txt';
my $outfile = 'path\outputfile.csv';

open my $in, '<:raw', $infile
    or die "Can't open $infile: $!";
open my $out, '+>>', $outfile
    or die "Can't open $outfile: $!";

do {
    $bytes_read = sysread $in, $buffer, BUFSIZE;
    die "Error: read from $infile: $!" unless defined $bytes_read;
    error_handler($bytes_read) unless $bytes_read == 6;

    my @decimal = map { ord } split //, $buffer;
    my ($start, $DevType) = @decimal[0,1];
    my @hexDevUID = map { sprintf("0x%02x", $_) } @decimal[5,4,3,2];

    say $out join(',', ($start, $DevType, @hexDevUID));
} while ( $bytes_read );

sub error_handler {
    my $bytes = shift;

    close $out;
    close $in;

    say "
Error: called error_handler(\$read_bytes)
Action: Emergency file closure to preserve data
Cause: Read insufficient $bytes bytes
" unless $bytes == 0;

    exit $bytes ? 1 : 0;
}
The loop can be rewritten using unpack as follows:
do {
    $bytes_read = sysread $in, $buffer, BUFSIZE;
    die "Error: read from $infile: $!" unless defined $bytes_read;
    error_handler($bytes_read) unless $bytes_read == 6;

    my ($start, $DevType, @devUID) = unpack('CCC4', $buffer);
    my @hexDevUID = reverse map { sprintf "0x%02x", $_ } @devUID;

    say $out join(',', ($start, $DevType, @hexDevUID));
} while ( $bytes_read );

Delete one character at End of File in PERL

So I have encountered a problem while programming in Perl. I use a foreach loop to get some data out of a hash, so it has to loop through it.
The code:
foreach $title (keys %FilterSPRINTHASH) {
    $openSP = $FilterSPRINTHASH{$title}{openSP};
    $estSP  = $FilterSPRINTHASH{$title}{estSP};
    $line   = "'$title':{'openSP' : $openSP, 'estSP' : $estSP}\n";
    print $outfile "$line\n";
}
The thing is that I am creating a separate file with Perl's write-to-a-file mechanism, which will contain JSONP text (later used for HTML).
Back to the problem:
As JSONP requires a comma after every line that is not the last one, I had to put a comma at the end of each line; when the last line comes, I have to remove that comma.
I have tried the chop function, but I'm not sure where to put it: if I put it at the end of the foreach, it will just chop the comma in $line, but this won't chop it in the new file I created.
I have also tried a while (<>) statement, with no success.
Any ideas appreciated.
BR
Using the JSON module is far less error-prone; no need to reinvent the wheel:
use JSON;
print $outfile encode_json(\%FilterSPRINTHASH), "\n";
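Since the file is meant to be JSONP, you may also want to wrap the encoded data in a callback call. A small sketch, where the callback name loadSprintData is just a placeholder for whatever your HTML expects:

# Sketch: emit JSONP by wrapping the JSON document in a callback call.
use JSON;

print $outfile 'loadSprintData(' . encode_json(\%FilterSPRINTHASH) . ");\n";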
You can check if it is the last iteration of the loop, then remove the comma from line.
So something like
my $count = keys %FilterSPRINTHASH; #Get number of keys (scalar context)
my $loop_count = 1; #Use a variable to count number of iteration
foreach $title (keys %FilterSPRINTHASH){
    $openSP = $FilterSPRINTHASH{$title}{openSP};
    $estSP  = $FilterSPRINTHASH{$title}{estSP};
    $line   = "'$title':{'openSP' : $openSP, 'estSP' : $estSP},\n";   # note the trailing comma
    if($loop_count == $count){
        # this is the last iteration, so remove the comma from the line
        $line =~ s/,+$//;
    }
    print $outfile "$line\n";
    $loop_count++;
}
I would approach this by storing your output in an array and then joining it with the line separators you want:
my @output;    # storage for output

foreach $title (keys %FilterSPRINTHASH) {
    # create each line
    my $line = sprintf "'%s':{'openSP' : %s, 'estSP' : %s}", $title,
        $FilterSPRINTHASH{$title}{openSP}, $FilterSPRINTHASH{$title}{estSP};
    # and put it in the output container
    push @output, $line;
}

# join all output lines with comma and newline, then print
print $outfile (join ",\n", @output);

Remove trailing commas at the end of the string using Perl

I'm parsing a CSV file in which each line looks something like the line below.
10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403,GO:0005524,GO:0016874,GO:0047747,GO:0004467,GO:0015245,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
There seem to be trailing commas at the end of each line.
I want to get the first term, in this case "10998", and count the number of GO terms related to it.
So my output in this case should be:
Output:
10998,7
But instead it shows 299. I realized there are 303 commas in each line overall, and I'm not able to figure out an easy way to remove the trailing commas. Can anyone help me solve this issue?
Thanks!
My Code:
use strict;
use warnings;

open my $IN, '<', 'test.csv' or die "can't find file: $!";
open(CSV, ">GO_MF_counts_Genes.csv") or die "Error!! Cannot create the file: $!\n";

my @genes = ();
my $mf;

foreach my $line (<$IN>) {
    chomp $line;
    my @array = split(/,/, $line);
    my @GO = splice(@array, 4);
    my $GO = join(',', @GO);
    $mf = count($GO);
    print CSV "$array[0],$mf\n";
}

sub count {
    my $go = shift @_;
    my $count = my @go = split(/,/, $go);
    return $count;
}
I'd use juanrpozo's solution for counting, but if you still want to go your own way, remove the trailing commas with a regex substitution:
$line =~ s/,+$//;
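For example, placed right after the chomp in your loop (a sketch using your own variable names):

chomp $line;
$line =~ s/,+$//;                  # drop the run of trailing commas
my @array = split(/,/, $line);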
I suggest this more concise way of coding your program.
Note that the line my @data = split /,/, $line discards trailing empty fields (@data has only 11 fields with your sample data), so it will produce the same result whether or not the trailing commas are removed beforehand.
use strict;
use warnings;

open my $in,  '<', 'test.csv' or die "Cannot open file for input: $!";
open my $out, '>', 'GO_MF_counts_Genes.csv' or die "Cannot open file for output: $!";

foreach my $line (<$in>) {
    chomp $line;
    my @data = split /,/, $line;
    printf $out "%s,%d\n", $data[0], scalar grep /^GO:/, @data;
}
You can apply grep to @array:
my $mf = grep { /^GO:/ } @array;
assuming $array[0] never matches /^GO:/
For each of your lines:
foreach my $line (<$IN>) {
    my ($first_term) = ($line =~ /(\d+),/);
    my @tmp = split('GO', " $line ");
    my $nr_of_GOs = @tmp - 1;
    print CSV "$first_term,$nr_of_GOs\n";
}