How do I read a file's contents into a Perl scalar?

What I am trying to do is get the contents of a file from another server. Since I'm not familiar with Perl or its modules and functions, I've gone about it this way:
my $fileContents;
if( $md5Con =~ m/\.php$/g ) {
my $ftp = Net::FTP->new($DB_ftpserver, Debug => 0) or die "Cannot connect to some.host.name: $@";
$ftp->login($DB_ftpuser, $DB_ftppass) or die "Cannot login ", $ftp->message;
$ftp->get("/" . $root . $webpage, "c:/perlscripts/" . md5_hex($md5Con) . "-code.php") or die $ftp->message;
open FILE, ">>c:/perlscripts/" . md5_hex($md5Con) . "-code.php" or die $!;
$fileContents = <FILE>;
close(FILE);
unlink("c:/perlscripts/" . md5_hex($md5Con) . "-code.php");
$ftp->quit;
}
What I thought I'd do is get the file from the server, put it on my local machine, edit the content, upload it wherever it needs to go, and then delete the temp file.
But I cannot seem to figure out how to get the contents of the file:
open FILE, ">>c:/perlscripts/" . md5_hex($md5Con) . "-code.php" or die $!;
$fileContents = <FILE>;
close(FILE);
I keep getting this error:
Use of uninitialized value $fileContents
Which I'm guessing means it isn't returning a value.
Any help much appreciated.
>>>>>>>>>> EDIT <<<<<<<<<<
my $fileContents;
if( $md5Con =~ m/\.php$/g ) {
my $ftp = Net::FTP->new($DB_ftpserver, Debug => 0) or die "Cannot connect to some.host.name: $@";
$ftp->login($DB_ftpuser, $DB_ftppass) or die "Cannot login ", $ftp->message;
$ftp->get("/" . $root . $webpage, "c:/perlscripts/" . md5_hex($md5Con) . "-code.php") or die $ftp->message;
my $file = "c:/perlscripts/" . md5_hex($md5Con) . "-code.php";
{
local( $/ ); # undefine the record separator
open FILE, "<", $file or die "Cannot open:$!\n";
my $fileContents = <FILE>;
#print $fileContents;
my $bodyContents;
my $headContents;
if( $fileContents =~ m/<\s*body[^>]*>.*$/gi ) {
print $0 . $1 . "\n";
$bodyContents = $dbh->quote($1);
}
if( $fileContents =~ m/^.*<\/head>/gi ) {
print $0 . $1 . "\n";
$headContents = $dbh->quote($1);
}
$bodyTable = $dbh->quote($bodyTable);
$headerTable = $dbh->quote($headerTable);
$dbh->do($createBodyTable) or die " error: Couldn't create body table: " . DBI->errstr;
$dbh->do($createHeadTable) or die " error: Couldn't create header table: " . DBI->errstr;
$dbh->do("INSERT INTO $headerTable ( headData, headDataOutput ) VALUES ( $headContents, $headContents )") or die " error: Couldn't connect to database: " . DBI->errstr;
$dbh->do("INSERT INTO $bodyTable ( bodyData, bodyDataOutput ) VALUES ( $bodyContents, $bodyContents )") or die " error: Couldn't connect to database: " . DBI->errstr;
$dbh->do("INSERT INTO page_names (linkFromRoot, linkTrue, page_name, table_name, navigation, location) VALUES ( $linkFromRoot, $linkTrue, $page_name, $table_name, $navigation, $location )") or die " error: Couldn't connect to database: " . DBI->errstr;
unlink("c:/perlscripts/" . md5_hex($md5Con) . "-code.php");
}
$ftp->quit;
}
With the code above, print WILL print the whole file. But for some reason the two regular expressions are returning false. Any idea why?
if( $fileContents =~ m/<\s*body[^>]*>.*$/gi ) {
print $0 . $1 . "\n";
$bodyContents = $dbh->quote($1);
}
if( $fileContents =~ m/^.*<\/head>/gi ) {
print $0 . $1 . "\n";
$headContents = $dbh->quote($1);
}

This is covered in section 5 of the Perl FAQ included with the standard distribution.
How can I read in an entire file all at once?
You can use the Path::Class::File::slurp module to do it in one step.
use Path::Class;
$all_of_it = file($filename)->slurp; # entire file in scalar
@all_lines = file($filename)->slurp; # one line per element
The customary Perl approach for processing all the lines in a file is to do so one line at a time:
open (INPUT, $file) || die "can't open $file: $!";
while (<INPUT>) {
chomp;
# do something with $_
}
close(INPUT) || die "can't close $file: $!";
This is tremendously more efficient than reading the entire file into memory as an array of lines and then processing it one element at a time, which is often—if not almost always—the wrong approach. Whenever you see someone do this:
@lines = <INPUT>;
you should think long and hard about why you need everything loaded at once. It's just not a scalable solution. You might also find it more fun to use the standard Tie::File module, or the DB_File module's $DB_RECNO bindings, which allow you to tie an array to a file so that accessing an element of the array actually accesses the corresponding line in the file.
You can read the entire filehandle contents into a scalar.
{
local(*INPUT, $/);
open (INPUT, $file) || die "can't open $file: $!";
$var = <INPUT>;
}
That temporarily undefs your record separator, and will automatically close the file at block exit. If the file is already open, just use this:
$var = do { local $/; <INPUT> };
For ordinary files you can also use the read function.
read( INPUT, $var, -s INPUT );
The third argument tests the byte size of the data on the INPUT filehandle and reads that many bytes into the buffer $var.
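As a minimal sketch of the slurp idiom described above, here is the same idea written with a lexical filehandle and three-argument open. The helper name slurp_file and the temporary demo file are illustrative assumptions, not part of the FAQ text.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the whole file as one string.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open $path: $!";
    local $/;                 # undef record separator: read to end of file
    my $content = <$fh>;
    close $fh or die "can't close $path: $!";
    return $content;
}

# Demonstration on a throwaway temporary file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "line one\nline two\n";
close $tmp_fh or die "can't close $tmp_name: $!";

my $text = slurp_file($tmp_name);
print length($text), " characters read\n";
```

Because local $/ sits inside the sub, the record separator is restored as soon as slurp_file returns, so line-by-line reads elsewhere in the program are unaffected.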

Use Path::Class::File::slurp if you want to read all file contents in one go.
However, more importantly, use an HTML parser to parse HTML.

open FILE, "c:/perlscripts/" . md5_hex($md5Con) . "-code.php" or die $!;
while (<FILE>) {
# each line is in $_
}
close(FILE);
will open the file and let you process it line by line (if that's what you want; otherwise investigate binmode). I think the problem is that you prepend >> to the filename passed to open, which opens the file for appending rather than reading. See this tutorial for more info.
I note you're also using regular expressions to parse HTML. Generally I would recommend using a parser to do this (e.g. see HTML::Parser). Regular expressions aren't suited to HTML due to HTML's lack of regularity, and won't work reliably in general cases.

Also, if you need to edit the contents of the files, take a look at the Tie::File module.
It spares you from creating a temp file to edit the content and write it back to the same file.
EDIT:
What you are looking for is a way to slurp the file. You may have to undefine
the record separator variable $/.
The below code works fine for me:
use strict;
my $file = "test.txt";
{
local( $/ ); # undefine the record separator
open FILE, "<", $file or die "Cannot open:$!\n";
my $lines =<FILE>;
print $lines;
}
Also see the section "Traditional Slurping" in this article.

BUT, for some reason the two regular expressions are returning false. Any idea why?
. in a regular expression by default matches any character except newline. Presumably you have newlines before the </head> tag and after the <body> tag. To make . match any character including newlines, use the //s flag.
I'm not sure what your print $0 . $1 ... code is about; you aren't capturing anything in your matches to be stored in $1, and $0 isn't a variable used for regular expression captures, it's something very different.
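To illustrate both points, here is a small sketch of the two matches with the /s flag and with capturing parentheses so that $1 (or a list assignment) receives something. The sample string is an assumption standing in for the downloaded file, and this is only meant for simple, known input; it is not a substitute for a real HTML parser.

```perl
use strict;
use warnings;

# Assumed sample input: a small page with newlines inside head and body.
my $html = "<html>\n<head>\n<title>t</title>\n</head>\n"
         . "<body class=\"x\">\n<p>hi</p>\n</body>\n</html>\n";

# /s lets . match newlines; the parentheses capture the matched text.
my ($head) = $html =~ m{^(.*</head>)}si;
my ($body) = $html =~ m{(<\s*body[^>]*>.*)$}si;

print "head:\n$head\n" if defined $head;
print "body:\n$body\n" if defined $body;
```

The /g flag from the question is dropped here because each pattern only needs to match once.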

If you want to get the content of the file:
@lines = <FILE>;

Use File::Slurp::Tiny. As convenient as File::Slurp, but without the bugs.

Related

write into a csv file in multiple cells

I am coding in Perl. How can I write multiple variables into a CSV file and put each one in a separate cell on the same line?
This is part of my code:
#!/usr/bin/perl
use feature qw(say);
use strict;
use warnings;
use constant BUFSIZE => 6;
my $year += 1900;
my $input_file = 'path\ZONE0.txt';
my $outputfile = 'path\outputfile.csv';
open (my $BIN, "<:raw", $input_file) or die "can't open the file $input_file: $!";
my $buffer;
open(FH, '>>', $outputfile) or die $!;
while (1) {
my $bytes_read = sysread $BIN, $buffer, BUFSIZE;
die "Could not read file $input_file: $!" if !defined $bytes_read;
last if $bytes_read <= 0;
my @decimal= map { unpack "C", $_ } split //, $buffer;
my $start= $decimal[0];
my $DevType = $decimal[1];
my @hexDevType = sprintf("0x%x", $DevType);
my @DevUID =($decimal[5], $decimal[4], $decimal[3], $decimal[2]);
my @hexDevUID = map { sprintf("0x%x",$_) } @DevUID;
print FH $start, ' ' , print FH $DevType,' ', @hexDevUID , "\n";
}
close $BIN;
This results in all the variables being put next to each other in one cell, which is not what I want. Can you help me separate the variables?

CSV files don't have cells. I suspect you're opening the file in a spreadsheet program.
The secret of a CSV file is that the values are separated by commas. So you need to put commas between any values that you want to appear in separate cells in your spreadsheet.
It looks like your data is in @hexDevUID. The simplest way is to turn that into a comma-separated string using join():
join(',', @hexDevUID)
But the more robust approach will be to use Text::CSV_XS.
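As a rough sketch of the join() approach, using made-up sample values in place of the bytes read from the question's input file:

```perl
use strict;
use warnings;

# Assumed sample values standing in for the question's variables.
my $start     = 1;
my $DevType   = 2;
my @hexDevUID = map { sprintf "0x%x", $_ } (10, 11, 12, 13);

# One comma between every value gives one spreadsheet cell per value.
my $line = join ',', $start, $DevType, @hexDevUID;
print "$line\n";    # prints 1,2,0xa,0xb,0xc,0xd
```

A plain join is only safe while no value contains a comma, a double quote, or a newline; once that can happen, a CSV module that handles quoting is the better choice.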

Below is the OP's code, modified so that it does not use any CSV modules for output.
It adds error handling for read errors and for reads that return too few bytes for further processing.
use strict;
use warnings;
use feature 'say';
use constant BUFSIZE => 6;
my($buffer,$bytes_read);
my $infile = shift || 'path\ZONE0.txt';
my $outfile = 'path\outputfile.csv';
open my $in, '<:raw', $infile
or die "Can't open $infile: $!";
open my $out, '+>>', $outfile
or die "Can't open $outfile: $!";
do {
$bytes_read = sysread $in, $buffer, BUFSIZE;
die "Error: read from $infile: $!" unless defined $bytes_read;
error_handler($bytes_read) unless $bytes_read == 6;
my @decimal = map { ord } split //, $buffer;
my($start,$DevType) = @decimal[0,1];
my @hexDevUID = map { sprintf("0x%02x",$_) } @decimal[5,4,3,2];
say $out join(',',($start,$DevType,@hexDevUID));
} while ( $bytes_read );
sub error_handler {
my $bytes = shift;
close $out;
close $in;
say "
Error: called error_handler(\$read_bytes)
Action: Emergency file closure to preserve data
Cause: Read insufficient $bytes bytes
" unless $bytes == 0;
exit $bytes ? 1 : 0;
}
The loop can be rewritten using unpack, as follows:
do {
$bytes_read = sysread $in, $buffer, BUFSIZE;
die "Error: read from $infile: $!" unless defined $bytes_read;
error_handler($bytes_read) unless $bytes_read == 6;
my($start,$DevType,@devUID) = unpack('CCC4',$buffer);
my @hexDevUID = reverse map { sprintf "0x%02x", $_ } @devUID;
say $out join(',',($start,$DevType,@hexDevUID));
} while ( $bytes_read );

Compare 2 CSV Huge CSV Files and print the differences to another csv file using perl

I have 2 csv files of multiple fields(approx 30 fields), and huge size ( approx 4GB ).
File1:
EmployeeName,Age,Salary,Address
Vinoth,12,2548.245,"140,North Street,India"
Vivek,40,2548.245,"140,North Street,India"
Karthick,10,10.245,"140,North Street,India"
File2:
EmployeeName,Age,Salary,Address
Vinoth,12,2548.245,"140,North Street,USA"
Karthick,10,10.245,"140,North Street,India"
Vivek,40,2548.245,"140,North Street,India"
I want to compare these 2 files and report the differences into another csv file. In the above example, Employee Vivek and Karthick details are present in different row numbers but still the record data is same, so it should be considered as match. Employee Vinoth record should be considered as a mismatch since there is a mismatch in the address.
Output diff.csv file can contain the mismatched record from the File1 and File 2 as below.
Diff.csv
EmployeeName,Age,Salary,Address
F1, Vinoth,12,2548.245,"140,North Street,India"
F2, Vinoth,12,2548.245,"140,North Street,USA"
I've written the code so far as below. After this I'm confused which option to choose whether a Binary Search or any other efficient way to do this. Could you please help me?
My approach
1. Load File2 into memory as a hash of hashes.
2. Read File1 line by line and match each record against the hash of hashes in memory.
use strict;
use warnings;
use Text::CSV_XS;
use Getopt::Long;
use Data::Dumper;
use Text::CSV::Hashify;
use List::BinarySearch qw( :all );
# Get Command Line Parameters
my %opts = ();
GetOptions( \%opts, "file1=s", "file2=s", )
or die("Error in command line arguments\n");
if ( !defined $opts{'file1'} ) {
die "CSV file --file1 not specified.\n";
}
if ( !defined $opts{'file2'} ) {
die "CSV file --file2 not specified.\n";
}
my $file1 = $opts{'file1'};
my $file2 = $opts{'file2'};
my $file3 = 'diff.csv';
print $file2 . "\n";
my $csv1 =
Text::CSV_XS->new(
{ binary => 1, auto_diag => 1, sep_char => ',', eol => $/ } );
my $csv2 =
Text::CSV_XS->new(
{ binary => 1, auto_diag => 1, sep_char => ',', eol => $/ } );
my $csvout =
Text::CSV_XS->new(
{ binary => 1, auto_diag => 1, sep_char => ',', eol => $/ } );
open( my $fh1, '<:encoding(utf8)', $file1 )
or die "Cannot not open '$file1' $!.\n";
open( my $fh2, '<:encoding(utf8)', $file2 )
or die "Cannot not open '$file2' $!.\n";
open( my $fh3, '>:encoding(utf8)', $file3 )
or die "Cannot not open '$file3' $!.\n";
binmode( STDOUT, ":utf8" );
my $f1line = undef;
my $f2line = undef;
my $header1 = undef;
my $f1empty = 'false';
my $f2empty = 'false';
my $reccount = 0;
my $hash_ref = hashify( "$file2", 'EmployeeName' );
if ( $f1empty eq 'false' ) {
$f1line = $csv1->getline($fh1);
}
while (1) {
if ( $f1empty eq 'false' ) {
$f1line = $csv1->getline($fh1);
}
if ( !defined $f1line ) {
$f1empty = 'true';
}
if ( $f1empty eq 'true' ) {
last;
}
else {
## Read each line from File1 and match it with the File 2 which is loaded as hashes of hashes in perl. Need help here.
}
}
print "End of Program" . "\n";

Storing data of this magnitude in a database is the most appropriate approach for tasks of this kind. SQLite is the minimum I would recommend, but other databases such as MariaDB, MySQL, or PostgreSQL will work quite well.
The following code demonstrates how the desired output can be achieved without special modules, but it does not account for possibly malformed input data. The first field of each line is used as the record key and the rest of the line is compared as a plain string, so the script will report records as different even if the only difference is one extra space.
By default, output goes to the console unless you specify the output option.
NOTE: The whole of file #1 is read into memory. Please be patient; processing big files can take a while.
use strict;
use warnings;
use feature 'say';
use Getopt::Long qw(GetOptions);
use Pod::Usage;
use Data::Dumper;   # needed for the --debug dump below
my %opt;
my @args = (
'file1|f1=s',
'file2|f2=s',
'output|o=s',
'debug|d',
'help|?',
'man|m'
);
GetOptions( \%opt, @args ) or pod2usage(2);
print Dumper(\%opt) if $opt{debug};
pod2usage(1) if $opt{help};
pod2usage(-exitval => 0, -verbose => 2) if $opt{man};
pod2usage(1) unless $opt{file1};
pod2usage(1) unless $opt{file2};
unlink $opt{output} if defined $opt{output} and -f $opt{output};
compare($opt{file1},$opt{file2});
sub compare {
my $fname1 = shift;
my $fname2 = shift;
my $hfile1 = file2hash($fname1);
open my $fh, '<:encoding(utf8)', $fname2
or die "Couldn't open $fname2";
while(<$fh>) {
chomp;
next unless /^(.*?),(.*)$/;
my($key,$data) = ($1, $2);
if( !defined $hfile1->{$key} ) {
my $msg = "$fname1 $key is missing";
say_msg($msg);
} elsif( $data ne $hfile1->{$key} ) {
my $msg = "$fname1 $key,$hfile1->{$key}\n$fname2 $_";
say_msg($msg);
}
}
}
sub say_msg {
my $msg = shift;
if( $opt{output} ) {
open my $fh, '>>:encoding(utf8)', $opt{output}
or die "Couldn't to open $opt{output}";
say $fh $msg;
close $fh;
} else {
say $msg;
}
}
sub file2hash {
my $fname = shift;
my %hash;
open my $fh, '<:encoding(utf8)', $fname
or die "Couldn't open $fname";
while(<$fh>) {
chomp;
next unless /^(.*?),(.*)$/;
$hash{$1} = $2;
}
close $fh;
return \%hash;
}
__END__
=head1 NAME
comp_cvs - compares two CSV files and stores the differences
=head1 SYNOPSIS
comp_cvs.pl -f1 file1.cvs -f2 file2.cvs -o diff.txt
Options:
-f1,--file1 input CVS filename #1
-f2,--file2 input CVS filename #2
-o,--output output filename
-d,--debug output debug information
-?,--help brief help message
-m,--man full documentation
=head1 OPTIONS
=over 4
=item B<-f1,--file1>
Input CVS filename #1
=item B<-f2,--file2>
Input CVS filename #2
=item B<-o,--output>
Output filename
=item B<-d,--debug>
Print debug information.
=item B<-?,--help>
Print a brief help message and exits.
=item B<--man>
Prints the manual page and exits.
=back
=head1 DESCRIPTION
B<This program> accepts B<input> and processes it to B<output> with the purpose of achieving some goal.
=head1 EXIT STATUS
The section describes B<EXIT STATUS> codes of the program
=head1 ENVIRONMENT
The section describes B<ENVIRONMENT VARIABLES> utilized in the program
=head1 FILES
The section describes B<FILES> which used for program's configuration
=head1 EXAMPLES
The section demonstrates some B<EXAMPLES> of the code
=head1 REPORTING BUGS
The section provides information how to report bugs
=head1 AUTHOR
The section describing the author and his contact information
=head1 ACKNOWLEDGMENT
The section to give credits people in some way related to the code
=head1 SEE ALSO
The section describing related information - reference to other programs, blogs, website, ...
=head1 HISTORY
The section gives historical information related to the code of the program
=head1 COPYRIGHT
Copyright information related to the code
=cut
Output for test files
file1.cvs Vinoth,12,2548.245,"140,North Street,India"
file2.cvs Vinoth,12,2548.245,"140,North Street,USA"

#!/usr/bin/env perl
use Data::Dumper;
use Digest::MD5;
use 5.01800;
use warnings;
my %POS;
my %chars;
open my $FILEA,'<',q{FileA.txt}
or die "Can't open 'FileA.txt' for reading! $!";
open my $FILEB,'<',q{FileB.txt}
or die "Can't open 'FileB.txt' for reading! $!";
open my $OnlyInA,'>',q{OnlyInA.txt}
or die "Can't open 'OnlyInA.txt' for writing! $!";
open my $InBoth,'>',q{InBoth.txt}
or die "Can't open 'InBoth.txt' for writing! $!";
open my $OnlyInB,'>',q{OnlyInB.txt}
or die "Can't open 'OnlyInB.txt' for writing! $!";
<$FILEA>,
$POS{FILEA}=tell $FILEA;
<$FILEB>,
$POS{FILEB}=tell $FILEB;
warn Data::Dumper->Dump([\%POS],[qw(*POS)]),' ';
{ # Scan for first character of the records involved
while (<$FILEA>) {
$chars{substr($_,0,1)}++;
};
while (<$FILEB>) {
$chars{substr($_,0,1)}--;
};
# So what characters do we need to deal with?
warn Data::Dumper->Dump([\%chars],[qw(*chars)]),' ';
};
my @chars=sort keys %chars;
{
my %_h;
# For each of the characters in our character set
for my $char (@chars) {
warn Data::Dumper->Dump([\$char],[qw(*char)]),' ';
# Beginning of data sections
seek $FILEA,$POS{FILEA},0;
seek $FILEB,$POS{FILEB},0;
%_h=();
my $pos=tell $FILEA;
while (<$FILEA>) {
next
unless (substr($_,0,1) eq $char);
# for each record save the lengthAndMD5 as the key and its start as the value
$_h{lengthAndMD5(\$_)}=$pos;
$pos=tell $FILEA;
};
my $_s;
while (<$FILEB>) {
next
unless (substr($_,0,1) eq $char);
if (exists $_h{$_s=lengthAndMD5(\$_)}) { # It's a duplicate
print {$InBoth} $_;
delete $_h{$_s};
}
else { # (Not in FILEA) It's only in FILEB
print {$OnlyInB} $_;
}
};
# only in FILEA
warn Data::Dumper->Dump([\%_h],[qw(*_h)]),' ';
for my $key (keys %_h) { # Only in FILEA
seek $FILEA,delete $_h{$key},0;
print {$OnlyInA} scalar <$FILEA>;
};
# Should be empty
warn Data::Dumper->Dump([\%_h],[qw(*_h)]),' ';
};
};
close $OnlyInB
or die "Could NOT close 'OnlyInB.txt' after writing! $!";
close $InBoth
or die "Could NOT close 'InBoth.txt' after writing! $!";
close $OnlyInA
or die "Could NOT close 'OnlyInA.txt' after writing! $!";
close $FILEB
or die "Could NOT close 'FileB.txt' after reading! $!";
close $FILEA
or die "Could NOT close 'FileA.txt' after reading! $!";
exit;
sub lengthAndMD5 {
return sprintf("%8.8lx-%32.32s",length(${$_[0]}),Digest::MD5::md5_hex(${$_[0]}));
};
__END__

Perl file I/O issue

I'm developing a Perl script that's supposed to generate an HTML file from numerical values in another file. The idea is to read the file that has these values and then list them in a separate HTML file. The file that contains the numerical values is updated at regular intervals, and those changes should be visible in the HTML.
Even though these values are read correctly (I've tested it), they are not printed in the HTML. What's more, the HTML tags are not even printed. This is the code I've written:
#!/usr/bin/perl
use IO::Handle;
use CGI qw(:standard);
print "Status: 200 OK", "\n";
print "Content-type: text/plain", "\n\n";
for(;;) {
open (my $input_file, "<", "/path/to/input/file/input_file.txt") || die "Unable to open the file: $!";
open (my $html_file, ">", "/path/to/html/file/index.html") || die "Unable to open the HTML file: $!";
print $html_file "<html><head><title>title</title><META HTTP-QUIV='refresh' CONTENT='10'></head><body>";
@lines = <$input_file>;
foreach my $line (@lines) {
print $html_file "<p>$line</p>";
}
print $html_file "</body></html>";
sleep 1;
close $input_file || die;
close $html_file || die;
}
The script only works in the first for iteration. What I mean is that the HTML tags and the numerical values are correctly printed in the output file. Then, from iteration 2 to N, the file remains literally empty. I can not see what I'm missing here. Why does it work in the first iteration but not in the following ones?
Thanks in advance

You need to close the file before the sleep. As it stands, the data is flushed to the file by the close, immediately overwritten (truncated) by the next open, and then left empty for one second while the script sleeps.
You also need to write
close $html_file or die $!
as the code you have is equivalent to
close($html_file || die)
so your program will never die as long as $html_file is true
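A minimal sketch of that ordering, assuming a hypothetical helper named write_page and a scratch directory in place of the question's real paths: the page is written and closed in one step, and any sleeping happens afterwards in the caller.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: write the page and close it right away,
# so the file is complete before the caller sleeps.
sub write_page {
    my ($path, @lines) = @_;
    open my $out, '>', $path or die "Unable to open $path: $!";
    print {$out} "<html><body>";
    print {$out} "<p>$_</p>" for @lines;
    print {$out} "</body></html>";
    # 'or' binds more loosely than '||', so this checks close's result.
    close $out or die "Unable to close $path: $!";
    return;
}

my $dir  = tempdir(CLEANUP => 1);    # assumed scratch location
my $page = "$dir/index.html";
write_page($page, 1, 2, 3);

my $size = -s $page;
print "$size bytes written\n";
# In the polling loop, call write_page(...) first and sleep afterwards.
```

With this shape the file only sits empty for the brief moment between open and close, not for the whole sleep interval.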

Error use of uninitialized value

So I made this small script today in perl.
For some reason it doesn't seem to be downloading anything, and an error keeps popping up saying
Use of uninitialized value $id in concatenation (.) or string at room.pl line 18.
Use of uninitialized value $id in concatenation (.) or string at room.pl line 18.
Could someone help me fix this code?
Also, is using File::Path okay? This is the JSON: http://media1.clubpenguin.com/play/en/web_service/game_configs/rooms.json
use strict;
use warnings;
use File::Path;
mkpath "rooms/";
use JSON;
use LWP::Simple qw(mirror);
open FILE, 'rooms.json' or die "Could not open file inputfile: $!";
sysread(FILE, my $result, -s FILE);
close FILE or die "Could not close file: $!";
my $json = decode_json($result);
foreach $item ($json) {
my $id = $item->{room_key};
mirror "http://media1.clubpenguin.com/play/v2/content/global/rooms/$id.swf" => "rooms/$id.swf";
}

foreach my $item ... at line 16 should do the trick!
As it's a hash ref, you have to loop over it this way:
...
foreach my $item (sort keys %$json) {
my $id = $json->{$item}->{room_key};
print $id . "\n";
#mirror "http://media1.clubpenguin.com/play/v2/content/global/rooms/$id.swf" => "rooms/$id.swf";
}
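Here is a self-contained sketch of the same loop using the core JSON::PP module. The inline JSON string is an assumed sample shaped like an object keyed by id with a room_key in each value; the real rooms.json may be structured differently.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Assumed sample data: an object keyed by id, each value holding a room_key.
my $result = '{"100":{"room_key":"town"},"110":{"room_key":"coffee"}}';
my $json   = decode_json($result);

my @ids;
for my $key (sort keys %$json) {
    my $id = $json->{$key}{room_key};
    next unless defined $id && length $id;   # skip entries without a room_key
    push @ids, $id;
}
print "$_\n" for @ids;
```

Skipping undefined or empty ids avoids the "uninitialized value" warning and keeps the script from requesting a URL that ends in ".swf" with no name.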

Get results to write to CSV using Perl

The following Perl script currently reads in an HTML file and strips out what I don't need. It also opens a CSV document, which is blank.
My problem is that I want to write the stripped-down results into the CSV's three fields, using Name as field 1, Lives in as field 2 and Commented as field 3.
The results are displayed in the command prompt but not in the CSV.
use warnings;
use strict;
use DBI;
use HTML::TreeBuilder;
use Text::CSV;
open (FILE, 'file.htm');
open (F1, ">file.csv") || die "couldn't open the file!";
my $csv = Text::CSV->new ({ binary => 1, empty_is_undef => 1 })
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, "<", 'file.csv' or die "ERROR: $!";
$csv->column_names('field1', 'field2', 'field3');
while ( my $l = $csv->getline_hr($fh)) {
next if ($l->{'field1'} =~ /xxx/);
printf "Field1: %s Field2: %s Field3: %s\n",
$l->{'field1'}, $l->{'field2'}, $l->{'field3'}
}
close $fh;
my $tree = HTML::TreeBuilder->new_from_content( do { local $/; <FILE> } );
for ( $tree->look_down( 'class' => 'postbody' ) ) {
my $location = $_->look_down
( 'class' => 'posthilit' )->as_trimmed_text;
my $comment = $_->look_down( 'class' => 'content' )->as_trimmed_text;
my $name = $_->look_down( '_tag' => 'h3' )->as_text;
$name =~ s/^Re:\s*//;
$name =~ s/\s*$location\s*$//;
print "Name: $name\nLives in: $location\nCommented: $comment\n";
}
An example of the html is:
<div class="postbody">
<h3><a href "foo">Re: John Smith <span class="posthilit">England</span></a></h3>
<div class="content">Is C# better than Visula Basic?</div>
</div>

You don't actually write anything to the CSV file. Firstly, it isn't clear why you open the file for writing and then later for reading. You then read from the (now empty) file. Then you read from the HTML, and display the contents you want.
Surely you will need to write to the CSV file somewhere if you want data to appear in it!
Also, it's best to avoid barewords for file handles if you want to use them through Text::CSV.
Maybe you need something like:
# eol => "\n" makes print() terminate each row with a newline
my $csv = Text::CSV->new({ binary => 1, eol => "\n" });
$csv->column_names('field1', 'field2', 'field3');
open my $fh, ">", "file.csv" or die "file.csv: $!";
...
# As you handle the HTML
$csv->print ($fh, [$name, $location, $comment]);
...
close $fh or die "file.csv: $!";
