I need help, please: my $inputfile changes on a daily basis; it gets generated and stored under the /tmp directory. The file names are dated as follows:
/tmp
570572 Sep 13 21:02 sessions_record_2021-09-13_210052.csv
570788 Sep 14 09:01 sessions_record_2021-09-14_090041.csv
I'm not sure how to pick it up as the input file instead of hardcoding it in my script:
#!/usr/bin/perl
use strict; use warnings;
use Tie::Array::CSV;
use Data::Dumper;
use Date::Parse;
use POSIX qw(strftime);
my $hours = 1;
my $timenow = time;
my $inputfile = "sessions_record_2021-09-14_090041.csv";
tie my @sessions_record, 'Tie::Array::CSV', $inputfile, {
tie_file => { recsep => "\r\n" },
text_csv => { binary => 1 }
};
tie my @incidentidlist, 'Tie::Array::CSV', 'incidentidlist.csv';
@incidentidlist = map {
([$$_[4] =~ /\A([^\s]+)/, $$_[4], $$_[18], ($timenow - str2time($$_[18])) / 60 / 60])
} grep {
$$_[0] =~ /^ServiceINC/ && ($timenow - str2time($$_[18])) / 60 / 60 > $hours
} @sessions_record;
Perl's sort function applied to a glob's results produces a sorted array, and you are interested in the last element, which can be addressed with index -1.
use strict;
use warnings;
use feature 'say';
my $in_file = (sort glob('/tmp/sessions_record_*.csv'))[-1];
say $in_file;
If you are interested in today's file, localtime can assist in forming the filename $fname.
use strict;
use warnings;
use feature 'say';
my($mask,$fname);
my($mday,$mon,$year) = (localtime)[3..5];
$year += 1900;
$mon += 1;
$mask = sprintf('/tmp/sessions_record_%4d-%02d-%02d_*.csv', $year, $mon, $mday);
$fname = (glob($mask))[0];
say 'File: ' . $fname;
say '-' x 45;
open my $fh, '<', $fname
or die "Couldn't open $fname";
print while <$fh>;
close $fh;
You can use opendir to open a directory and readdir to read it. For each file accessed you can check if it has the correct format (as per simbabque's comment) and add it to an array.
Then you can sort your array.
Due to the naming convention the latest file will always sort as the 'largest' value in your sort.
You can read more about sorting (if you need to) at https://www.perltutorial.org/perl-sort/
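A minimal sketch of that opendir/readdir approach (the filename pattern is an assumption based on the listing above):

#!/usr/bin/perl
use strict;
use warnings;

my $dir = '/tmp';
opendir my $dh, $dir or die "Can't open $dir: $!";
# keep only files that match the expected naming convention
my @files = grep { /^sessions_record_\d{4}-\d{2}-\d{2}_\d{6}\.csv$/ } readdir $dh;
closedir $dh;
# the timestamp in the name sorts lexically, so the last element is the newest
my $latest = (sort @files)[-1];
die "no matching files found\n" unless defined $latest;
my $inputfile = "$dir/$latest";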
Related
I am trying to parse a given CSV file/stream on a regular basis.
My requirement is to access the data via column name (header).
The column names are not given in row 1; they are given in row 2.
The CSV has 100 rows, but I only need to import 2 data rows.
The separator is a tab.
The following script works for a header at row 1 and for all rows in the file.
I failed to modify it for a header at row 2 and to use only 2 rows (or a given number of rows).
script:
#!/usr/bin/perl
use strict;
use warnings;
use Tie::Handle::CSV;
use Data::Dumper;
my $file = "data.csv";
my $fh = Tie::Handle::CSV->new ($file, header => 1, sep_char => "\t");
my $hfh = Tie::Handle::CSV->new ($file, header => 0, sep_char => "\t");
my $line = <$hfh>;
my $myheader;
while (my $csv_line = <$fh>)
{
foreach (@{$line})
{
if ( $_ ne "" )
{
print $_ . "=" . $csv_line->{$_} . "\n" ;
}
}
}
The Data.csv could look like:
This is a silly sentence on the first line
Name GivenName Birthdate Number
Meier hans 18.03.1999 1
Frank Thomas 27.1.1974 2
Karl Franz 1.1.2000 3
Here could be something silly again
Thanks for any hint.
best regards
Use Text::CSV_XS instead of Tie::Handle::CSV (which depends on that module, so you already have it installed): read and throw away the first line, use the second line to set the column names, and then read the rest of the data:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
use Text::CSV_XS;
my $csv = Text::CSV_XS->new({ sep => ",", # Using CSV because TSV doesn't play well with SO formatting
binary => 1});
# Read and discard the first line
$_ = <DATA>;
# Use the next line as the header and set column names
$csv->column_names($csv->getline(*DATA));
# Read some rows and access columns by name instead of position
my $nr = 0;
while (my $record = $csv->getline_hr(*DATA)) {
last if ++$nr == 4;
say "Row $nr: $record->{GivenName} was born on $record->{Birthdate}";
}
__DATA__
This is a silly sentence on the first line
Name,GivenName,Birthdate,Number
Meier,hans,18.03.1999,1
Frank,Thomas,27.1.1974,2
Karl,Franz,1.1.2000,3
Here could be something silly again
Tie::Handle::CSV accepts a filehandle instead of a filename. You can skip the first line by reading one line from it before you pass the filehandle to Tie::Handle::CSV:
use strict;
use warnings;
use Tie::Handle::CSV;
use Data::Dumper;
my $file = "data.csv";
open (my $infile, '<',$file) or die "can't open file $file: $!\n";
<$infile>; # skip first line
my $hfh = Tie::Handle::CSV->new ($infile, header => 1, sep_char => "\t");
my @csv;
my $num_lines = 3;
while ($num_lines--){
my $line = <$hfh>;
push @csv, $line;
}
print Dumper \@csv;
Thanks to you both.
To clarify my requirements in more detail:
The original data file may have around 100 columns with names that are dynamic and unknown to me.
I will receive a list of columns/attributes from another service, for which this script should provide the data content of some rows.
A request, in terms of the data example, would be:
Please provide all Names and all Birthdates of the first 25 rows.
The next request could be all Names and GivenNames of the first 10 rows.
That means that from the content of 100 columns I have to provide the content of only two, four, or five columns.
The output loop I use (foreach) is only there to test the access to the row content by column name.
I mixed your solutions together and stayed with Tie::Handle::CSV.
At the moment I have to use two filehandles. Maybe you have a hint on how to be more efficient.
#!/usr/bin/perl
use strict;
use warnings;
use Tie::Handle::CSV;
use Data::Dumper;
my $file = "data.csv";
open (my $infile, '<',$file) or die "can't open file $file: $!\n";
open (my $secfile, '<',$file) or die "can't open file $file: $!\n";
<$infile>; # skip first line
<$secfile>;
my $fh = Tie::Handle::CSV->new ($secfile, header => 1, sep_char => "\t");
my $hfh = Tie::Handle::CSV->new ($infile, header => 0, sep_char => "\t");
my $line = <$hfh>;
my $numberoflines = 2 ;
while ($numberoflines-- )
{
my $csv_line = <$fh> ;
foreach (@{$line})
{
if ( $_ ne "" )
{
print $_ . "=" . $csv_line->{$_} . "\n" ;
}
}
}
Thanks, I got it running with "keys %$csv_line". I was not using it before because of missing knowledge. ;-)
#!/usr/bin/perl
use strict;
use warnings;
use Tie::Handle::CSV;
my $file = "data.csv";
open (my $secfile, '<',$file) or die "can't open file $file: $!\n";
<$secfile>;
my $fh = Tie::Handle::CSV->new ($secfile, header => 1, sep_char => "\t");
my $numberoflines = 3 ;
while ($numberoflines-- )
{
my $csv_line = <$fh> ;
my @Columns = keys %{ $csv_line } ;
foreach (@Columns )
{
if ( $_ ne "" )
{
print $_ . "=" . $csv_line->{$_} . "\n" ;
}
}
print "-----------\n"
}
One last question:
The file I read will be filled and modified by another program.
What can I do to detect a file violation in case it causes a problem?
And I don't want my script to die.
Thanks
regards
I am getting a "csv file" from a vendor (using their API), but what they do is just spew the whole thing into their response. It wouldn't be a significant problem except that, of course, some of those pesky humans entered the data and put in "features" like line breaks. What I am doing now is creating a file for the raw data and then reopening it to read the data:
open RAW, ">", "$rawfile" or die "ERROR: Could not open $rawfile for write: $! \n";
print RAW $response->content;
close RAW;
my $csv = Text::CSV_XS->new({ binary=>1,always_quote=>1,eol=>$/ });
open my $fh, "<", "$rawfile" or die "ERROR: Could not open $rawfile for read: $! \n";
while ( $line = $csv->getline ($fh) ) { ...
Somehow this seems ... inelegant. It seems that I ought to be able to just read the data from the $response->content (multiline string) as if it were a file. But I'm drawing a total blank on how do this.
A pointer would be greatly appreciated.
Thanks,
Paul
You could use a string filehandle:
my $data = $response->content;
open my $fh, "<", \$data or croak "unable to open string filehandle : $!";
my $csv = Text::CSV_XS->new({ binary=>1,always_quote=>1,eol=>$/ });
while ( $line = $csv->getline ($fh) ) { ... }
Yes, you can use Text::CSV_XS on a string, via its functional interface
use warnings;
use strict;
use feature 'say';
use Text::CSV_XS qw(csv); # must use _XS version
my $csv = qq(a,line\nand,another);
my $aoa = csv(in => \$csv)
or die Text::CSV->error_diag;
say "#$_" for #aoa;
Note that this indeed needs Text::CSV_XS (normally Text::CSV works but not with this).
I don't know why this isn't available in the OO interface (or perhaps is but is not documented).
While the above parses the string directly, as asked, one can also lessen the "inelegant" aspect of your example by writing the content directly to a file as it's acquired, which most libraries support, for example via the :content_file option of the LWP::UserAgent get method.
Let me also note that most of the time you want the library to decode the content for you, so with LWP::UserAgent use decoded_content (see HTTP::Response).
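A short sketch of both points (the URL and filename are placeholders):

use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
# stream the body straight to disk instead of buffering it in memory
$ua->get('http://example.com/report.csv', ':content_file' => 'report.csv');
# or fetch normally and let the library decode the body for you
my $response = $ua->get('http://example.com/report.csv');
my $csv_text = $response->decoded_content;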
I cooked up this example with Mojo::UserAgent. For the CSV input I used various data sets from the NYC Open Data. This is also going to appear in the next update for Mojo Web Clients.
I build the request without making the request right away, and that gives me the transaction object, $tx. I can then replace the read event so I can immediately send the lines into Text::CSV_XS:
#!perl
use v5.10;
use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;
my $url = ...;
my $tx = $ua->build_tx( GET => $url );
$tx->res->content->unsubscribe('read')->on(read => sub {
state $csv = do {
require Text::CSV_XS;
Text::CSV_XS->new;
};
state $buffer;
state $reader = do {
open my $r, '<:encoding(UTF-8)', \$buffer;
$r;
};
my ($content, $bytes) = @_;
$buffer .= $bytes;
while (my $row = $csv->getline($reader) ) {
say join ':', $row->@[2,4];
}
});
$tx = $ua->start($tx);
That's not as nice as I'd like it to be because all the data still show up in the buffer. This is slightly more appealing, but it's fragile in the ways I note in the comments. I'm too lazy at the moment to make it any better because that gets hairy very quickly as you figure out when you have enough data to process a record. My particular code isn't as important as the idea that you can do whatever you like as the transactor reads data and passes it into the content handler:
use v5.10;
use strict;
use warnings;
use feature qw(signatures);
no warnings qw(experimental::signatures);
use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;
my $url = ...;
my $tx = $ua->build_tx( GET => $url );
$tx->res->content
->unsubscribe('read')
->on( read => process_bytes_factory() );
$tx = $ua->start($tx);
sub process_bytes_factory {
return sub ( $content, $bytes ) {
state $csv = do {
require Text::CSV_XS;
Text::CSV_XS->new( { decode_utf8 => 1 } );
};
state $buffer = '';
state $line_no = 0;
$buffer .= $bytes;
# fragile if the entire content does not end in a
# newline (or whatever the line ending is)
my $last_line_incomplete = $buffer !~ /\n\z/;
# will not work if the format allows embedded newlines
my @lines = split /\n/, $buffer;
$buffer = pop @lines if $last_line_incomplete;
foreach my $line ( @lines ) {
my $status = $csv->parse($line);
my @row = $csv->fields;
say join ':', $line_no++, @row[2,4];
}
};
}
I have a Perl script which contains the variable $env->{'arguments'}. This variable should contain a JSON object, and I want to pass that JSON object as an argument to another, external script and run it using backticks.
Value of $env->{'arguments'} before escaping:
$VAR1 = '{"text":"This is from module and backslash \\ should work too"}';
Value of $env->{'arguments'} after escaping:
$VAR1 = '"{\\"text\\":\\"This is from module and backslash \\ should work too\\"}"';
Code:
print Dumper($env->{'arguments'});
escapeCharacters(\$env->{'arguments'});
print Dumper($env->{'arguments'});
my $command = './script.pl '.$env->{'arguments'}.'';
my $output = `$command`;
Escape characters function:
sub escapeCharacters
{
#$env->{'arguments'} =~ s/\\/\\\\"/g;
$env->{'arguments'} =~ s/"/\\"/g;
$env->{'arguments'} = '"'.$env->{'arguments'}.'"';
}
I would like to ask what the correct way is to escape that JSON string so that I can use it as an argument for my script.
You're reinventing a wheel.
use String::ShellQuote qw( shell_quote );
my $cmd = shell_quote('./script.pl', $env->{arguments});
my $output = `$cmd`;
Alternatively, there's a number of IPC:: modules you could use instead of qx. For example,
use IPC::System::Simple qw( capturex );
my $output = capturex('./script.pl', $env->{arguments});
Because you have at least one argument, you could also use the following:
my $output = '';
open(my $pipe, '-|', './script.pl', $env->{arguments});
while (<$pipe>) {
$output .= $_;
}
close($pipe);
Note that the current directory isn't necessarily the directory that contains the executing script. If you want to execute the script.pl that's in the same directory as the currently executing script, you want the following changes:
Add
use FindBin qw( $RealBin );
and replace
'./script.pl'
with
"$RealBin/script.pl"
Piping it to your second program rather than passing it as an argument seems like it would make more sense (and be a lot safer).
test1.pl
#!/usr/bin/perl
use strict;
use JSON;
use Data::Dumper;
undef $/;
my $data = decode_json(<>);
print Dumper($data);
test2.pl
#!/usr/bin/perl
use strict;
use IPC::Open2;
use JSON;
my %data = ('text' => "this has a \\backslash", 'nums' => [0,1,2]);
my $json = JSON->new->encode(\%data);
my ($chld_out, $chld_in);
print("Executing script\n");
my $pid = open2($chld_out, $chld_in, "./test1.pl");
print $chld_in "$json\n";
close($chld_in);
my $out = do {local $/; <$chld_out>};
waitpid $pid, 0;
print(qq~test1.pl output =($out)~);
I am using Perl to read in variables from a json file and handle them accordingly. The spot I need help with is when I read a time in from the file that could look like the following:
"StartTime":"2015-07-08T03:38:08Z",
"EndTime":"2015-07-10T03:38:08Z"
This is easy to handle, however here is the tricky part:
"StartTime":"now-10",
"EndTime":"now+10"
I have a function which gets these variables from the json file and checks if the string contains the word "now". But after that, I'm not sure what to do. I'm trying to convert "now" to localtime(time), but it's getting ugly fast. Here is my code:
my $_StartTime = getFromJson("StartTime");
my $_EndTime = getFromJson("EndTime");
if($_StartTime =~ /now/) {
(my $sec, my $min, my $hour, my $mday, my $mon, my $year, my $wday, my $yday, my $isdst) = localtime(time);
my $now = sprintf("%04d-%02d-%02dT%02d:%02d:%02dZ", $year+1900, $mon+1, $mday, $hour, $min, $sec);
}
# end time is handled the same way
Am I on the right track? And if so, how can I add the "+/-10" after the "now" in the file? (Note: assume the +/-10 always refers to hours)
There are lots of good modules on the CPAN that could help in this instance. You don't need to use them but it's worth knowing about them nonetheless.
Firstly, JSON might make your life easier when parsing the JSON files as it has easy methods for converting the JSON into native Perl structures.
Secondly, the DateTime family of modules might make it easier to parse and manipulate the dates. Specifically, instead of using sprintf, you could use DateTime::Format::ISO8601 to parse the date:
my $dt = DateTime::Format::ISO8601->parse_datetime( $_StartTime );
DateTime has methods for accessing the day, year, month and so on. These are documented on the main module page.
You could then keep your special case for the now input and do something like:
# work out if it's addition or subtraction and grab the amount
# then use the appropriate DateTime function:
my $dt = DateTime->now()->add( seconds => 10 );
# or
my $dt = DateTime->now()->subtract( seconds => 10 );
Using POSIX::strftime will make your life easier.
use POSIX 'strftime';
use feature 'say'; # say() is used below
my @test_times = qw[now+10 now now-10];
foreach my $start_time (@test_times) {
if (my ($adjust) = $start_time =~ /^now([-+]\d+)?/) {
$adjust //= 0;
$adjust *= 60 * 60; # Convert hours to seconds
my $time = strftime '%Y-%m-%dT%H:%M:%SZ', gmtime(time + $adjust);
say $time;
}
}
Thinking about it further, I think I'd prefer to use Time::Piece. The principle is almost identical.
use Time::Piece;
use feature 'say'; # say() is used below
my @test_times = qw[now+10 now now-10];
foreach my $start_time (@test_times) {
if (my ($adjust) = $start_time =~ /^now([-+]\d+)?/) {
$adjust //= 0;
$adjust *= 60 * 60; # Convert hours to seconds
my $time = gmtime(time + $adjust);
say $time->strftime('%Y-%m-%dT%H:%M:%SZ');
}
}
I would change this to:
my $_StartTime = getFromJson("StartTime");
my $_EndTime = getFromJson("EndTime");
if($_StartTime =~ s/now//) {
my $time = time;
if ($_StartTime =~ /^([-+]?)([0-9]+)/) {
my ($sign, $number) = ($1, $2);
$time += ($sign eq '-' ? -1 : 1) * $number * 3_600;
}
(my $sec, my $min, my $hour, my $mday, my $mon, my $year, my $wday, my $yday, my $isdst) = localtime($time);
$_StartTime = sprintf("%04d-%02d-%02dT%02d:%02d:%02dZ", $year+1900, $mon+1, $mday, $hour, $min, $sec);
}
You give little information about the format of the original data and what result you want from it. I assume the code you show is meant to convert the times formatted with now to a form that you recognize, so that you can go on from there. But it's best to handle both formats in one place and generate the same final result regardless of the input.
This program uses an imaginary JSON data structure and processes all elements inside it. The core is the use of the Time::Piece module, which will parse and format times for you and do date/time arithmetic.
I have encapsulated the code that processes both sorts of time values in a subroutine convert_time, which returns a Time::Piece object. The code just uses the module's own stringify method to make the value readable, but you can generate any form of string you want using the object's methods.
use strict;
use warnings 'all';
use feature 'say';
use JSON 'from_json';
use Time::Piece;
use Time::Seconds 'ONE_HOUR';
my $json = <<END;
[
{
"StartTime": "2015-07-08T03:38:08Z",
"EndTime": "2015-07-10T03:38:08Z"
},
{
"StartTime": "now-10",
"EndTime": "now+10"
}
]
END
my $data = from_json($json);
for my $item ( @$data ) {
for my $key ( keys %$item ) {
my $time = $item->{$key};
say "$key $time";
my $ans = convert_time($time);
print $ans, "\n\n";
}
}
sub convert_time {
my ($time) = @_;
if ( $time =~ /now([+-]\d+)/ ) {
return localtime() + $1 * ONE_HOUR;
}
else {
return Time::Piece->strptime($time, '%Y-%m-%dT%H:%M:%SZ');
}
}
output
StartTime 2015-07-08T03:38:08Z
Wed Jul 8 03:38:08 2015
EndTime 2015-07-10T03:38:08Z
Fri Jul 10 03:38:08 2015
StartTime now-10
Wed Jan 6 05:57:04 2016
EndTime now+10
Thu Jan 7 01:57:04 2016
I have a little parser that parses a site - with 6150 records. But I need to have this in CSV format.
First of all see here the target site: http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=50&s=1750
I need all the data - separated into the fields of
number
schoolnumber
school-name
Address
Street
Postal Code
phone
fax
School-type
website
Well - I have a script; I am very interested in what you think about it. Not all the fields are captured yet - I need more of them!
#!/usr/bin/perl
use strict;
use HTML::TableExtract;
use LWP::Simple;
use Cwd;
use POSIX qw(strftime);
my $total_records = 0;
my $alpha = "x";
my $results = 50;
my $range = 0;
my $url_to_process = "http://192.68.214.70/km/asps/schulsuche.asp?q=";
my $processdir = "processing";
my $counter = 50;
my $percent = 0;
workDir();
chdir $processdir;
processURL();
print "\nPress <enter> to continue\n";
<>;
my $displaydate = strftime('%Y%m%d%H%M%S', localtime);
open my $outfile, '>', "webdata_for_$alpha\_$displaydate.txt" or die 'Unable to create file';
processData();
close $outfile;
print "Finished processing $total_records records...\n";
print "Processed data saved to $ENV{HOME}/$processdir/webdata_for_$alpha\_$displaydate.txt\n";
unlink 'processing.html';
sub processURL() {
print "\nProcessing $url_to_process$alpha&a=$results&s=$range\n";
getstore("$url_to_process$alpha&a=$results&s=$range", 'tempfile.html') or die 'Unable to get page';
while( <tempfile.html> ) {
open( FH, "$_" ) or die;
while( <FH> ) {
if( $_ =~ /^.*?(Treffer \<b\>)(\d+)( - )(\d+)(<\/b> \w+ \w+ \<b\>)(\d+).*/ ) {
$total_records = $6;
print "Total records to process is $total_records\n";
}
}
close FH;
}
unlink 'tempfile.html';
}
sub processData() {
while ( $range <= $total_records) {
my $te = HTML::TableExtract->new(headers => [qw(lfd Schul Schulname Telefon Schulart Webseite)]);
getstore("$url_to_process$alpha&a=$results&s=$range", 'processing.html') or die 'Unable to get page';
$te->parse_file('processing.html');
my ($table) = $te->tables;
foreach my $ts ($te->table_states) {
foreach my $row ($ts->rows) {
cleanup(@$row);
# Add a table column delimiter in this case ||
print $outfile join("||", @$row)."\n";
}
}
$| = 1;
print "Processed records $range to $counter";
print "\r";
$counter = $counter + 50;
$range = $range + 50;
}
}
sub cleanup() {
for ( @_ ) {
s/\s+/ /g;
}
}
sub workDir() {
# Use home directory to process data
chdir or die "$!";
if ( ! -d $processdir ) {
mkdir ("$ENV{HOME}/$processdir", 0755) or die "Cannot make directory $processdir: $!";
}
}
with the following output:
1||9752||Deutsche Schule Alamogordo USA Alamogorde - New Mexico || ||Deutschsprachige Auslandsschule||
2||9931||Deutsche Schule der Borromäerinnen Alexandrien ET Alexandrien - Ägypten || ||Begegnungsschule (Auslandsschuldienst)||
3||1940||Max-Keller-Schule, Berufsfachschule f.Musik Alt- ötting d.Berufsfachschule für Musik Altötting e.V. Kapellplatz 36 84503 Altötting ||08671/1735 08671/84363||Berufsfachschulen f. Musik|| www.max-keller-schule.de
4||0006||Max-Reger-Gymnasium Amberg Kaiser-Wilhelm-Ring 7 92224 Amberg ||09621/4718-0 09621/4718-47||Gymnasien|| www.mrg-amberg.de
With the || being the delimiter.
My problem is that I need to have more fields - I need to have the following divided - see an example:
name: Volksschule Abenberg (Grundschule)
street: Güssübelstr. 2
postal-code and town: 91183 Abenberg
fax and telephone: 09178/215 09178/905060
type of school: Volksschulen
website: home.t-online.de/home/vs-abenberg
How to add more fields? This obviously has to be done in this line here, doesn't it!?
my $te = HTML::TableExtract->new(headers => [qw(lfd Schul Schulname Telefon Schulart Webseite)]);
But how? I tried out several things, but I always got bad results.
I played around and tried another solution - here I have good CSV data, but unfortunately no spider logic...
#!/usr/bin/perl
use warnings;
use strict;
use LWP::Simple;
use HTML::TableExtract;
use Text::CSV;
my $html= get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50';
$html =~ tr/\r//d; # strip the carriage returns
$html =~ s/&nbsp;/ /g; # expand the non-breaking spaces
my $te = new HTML::TableExtract();
$te->parse($html);
my #cols = qw(
rownum
number
name
phone
type
website
);
my #fields = qw(
rownum
number
name
street
postal
town
phone
fax
type
website
);
my $csv = Text::CSV->new({ binary => 1 });
foreach my $ts ($te->table_states) {
foreach my $row ($ts->rows) {
# trim leading/trailing whitespace from base fields
s/^\s+//, s/\s+$// for @$row;
# load the fields into the hash using a "hash slice"
my %h;
@h{@cols} = @$row;
# derive some fields from base fields, again using a hash slice
@h{qw/name street postal town/} = split /\n+/, $h{name};
@h{qw/phone fax/} = split /\n+/, $h{phone};
# trim leading/trailing whitespace from derived fields
s/^\s+//, s/\s+$// for @h{qw/name street postal town/};
$csv->combine(@h{@fields});
print $csv->string, "\n";
}
}
How do I add the spider logic here? I need some help - either with the first or with the second script!
The website uses br tags to separate the sub-fields within each cell, very much like you want to divide the data. HTML::TableExtract turns these into newlines by default, but in your first program your cleanup routine throws this information away.
In your first program, add something like s/\n/||/sg; (assuming the same separator) before you flatten the rest of the whitespace.
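A minimal sketch of that change to the cleanup routine from the first script (keeping || as the sub-field delimiter):

sub cleanup {
for ( @_ ) {
s/\n/||/sg; # turn the <br>-derived newlines into sub-field delimiters
s/\s+/ /g; # then flatten the remaining whitespace
}
}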