Edit: modified code and output to make it more clear
Edit 2: Added example input for reproduction
I have a JSON file and a CSV file and I am running comparisons on the two. The problem is that $asset_ip is correctly defined in the outer foreach loop, but when in the nested loop $asset_ip becomes undefined.
Why is $asset_ip becoming undefined?
#!/usr/bin/perl
# perl -e'use CPAN; install "Text::CSV"'
use strict;
use warnings;
use JSON::XS;
use File::Slurp;
use Text::CSV;
my $csv = Text::CSV->new( { sep_char => ',' } );
my $csv_source = "servers.csv";
my $json_source = "assets.json";
my $dest = "servers_for_upload.csv";
# defined these here as I need to use them in foreach loop and if statement:
my $csv_ip;
my #fields;
open( my $csv_fh, '<', $csv_source ) or die "$! error trying to read";
open( my $dest_fh, '>', $dest ) or die "$! error trying to read";
my $json = read_file($json_source);
my $json_array = decode_json $json;
foreach my $item (#$json_array) {
my $id = $item->{id};
my $asset_ip = $item->{interfaces}->[0]->{ip_addresses}->[0]->{value};
# test the data is there:
if ( defined $asset_ip ) {
print "id: " . $id . "\nip: " . $asset_ip . "\n";
}
while (my $line = <$csv_fh>) {
chomp $line;
if ( $csv->parse($line) ) {
#fields = $csv->fields();
$csv_ip = $fields[0];
}
else {
warn "Line could not be parsed: $line\n";
}
if ( $csv_ip eq $asset_ip ) {
# preppend id to csv array and write these lines to new file
unshift( #fields, $id );
print $dest_fh join( ", ", #fields );
}
}
}
close $csv_fh;
Output:
Use of uninitialized value $asset_ip in string eq at script.pl line 43, <$csv_fh> line 1.
Use of uninitialized value $asset_ip in string eq at script.pl line 43, <$csv_fh> line 2.
Use of uninitialized value $asset_ip in string eq at script.pl line 43, <$csv_fh> line 3.
id: 1003
ip: 192.168.0.2
id: 1004
ip: 192.168.0.3
id: 1005
ip: 192.168.0.4
assets.json:
[{"id":1001,"interfaces":[]},{"id":1003,"interfaces":[{"ip_addresses":[{"value":"192.168.0.2"}]}]},{"id":1004,"interfaces":[{"ip_addresses":[{"value":"192.168.0.3"}]}]},{"id":1005,"interfaces":[{"ip_addresses":[{"value":"192.168.0.4"}]}]}]
Note, that for the first iteration, $asset_ip will be undefined. I will therefore alter the code to only run the eq comparison if $asset_ip is defined. However, for this example I am not doing the check because all iterations are undefined.
servers.csv:
192.168.0.3,Brian,Germany
192.168.0.4,Billy,UK
192.168.0.5,Ben,UK
I think your problem will be this:
foreach my $line (<$csv_fh>) {
You execute this within our outer loop. But when you do this, your $csv_fh ends up at the end of file.
Once you have done this, subsequent iterations of your outer loop will not execute this inner loop, because there's nothing left for it to read from $csv_fh.
An easy test if this is your problem is to add a seek e.g. seek ( $csv_fh, 0, 0 );.
But this isn't an efficient thing to do, because then you'll be looping through the file multiple times - you should instead read it into a data structure and use that.
Edit: Here is your problem:
[{"id":1001,"interfaces":[]},{"id":1003,"interfaces":[{"ip_addresses":[{"value":"192.168.0.2"}]}]},{"id":1004,"interfaces":[{"ip_addresses":[{"value":"192.168.0.3"}]}]},{"id":1005,"interfaces":[{"ip_addresses":[{"value":"192.168.0.4"}]}]}]
And specifically:
[{"id":1001,"interfaces":[]}
Your first element in that array doesn't have a $asset_ip defined.
This means - on your first pass - $asset_ip is undefined and generates the errors. (no line is printed because of your if defined test).
But then - the code proceeds to traverse $csv_fh - reading to the end of file - looking for matches (and fails 3 times, generating 3 error messages.
Second iteration - for id 1002 - the IP isn't in the file anyway, but $csv_fh has already been read to end-of-file (EOF) - so that foreach loop doesn't execute at all.
This can be made workable by:
adding else next; after that if defined.
adding seek to after the while loop.
But really - a rewrite would be in order so you're not re-reading a file over and over anyway.
Very crudely:
#!/usr/bin/perl
# perl -e'use CPAN; install "Text::CSV"'
use strict;
use warnings;
use JSON::XS;
use File::Slurp;
use Text::CSV;
my $csv = Text::CSV->new( { sep_char => ',' } );
my $csv_source = "servers.csv";
my $json_source = "assets.json";
my $dest = "servers_for_upload.csv";
# defined these here as I need to use them in foreach loop and if statement:
my $csv_ip;
my #fields;
open( my $csv_fh, '<', $csv_source ) or die "$! error trying to read";
open( my $dest_fh, '>', $dest ) or die "$! error trying to read";
my $json = read_file($json_source);
my $json_array = decode_json $json;
foreach my $item (#$json_array) {
my $id = $item->{id};
my $asset_ip = $item->{interfaces}->[0]->{ip_addresses}->[0]->{value};
# test the data is there:
if ( defined $asset_ip ) {
print "id: " . $id . "\nip: " . $asset_ip . "\n";
}
else {
print "asset_ip undefined for id $id\n";
next;
}
while ( my $line = <$csv_fh> ) {
chomp $line;
if ( $csv->parse($line) ) {
#fields = $csv->fields();
$csv_ip = $fields[0];
}
else {
warn "Line could not be parsed: $line\n";
}
if ( $csv_ip eq $asset_ip ) {
# preppend id to csv array and write these lines to new file
unshift( #fields, $id );
print {$dest_fh} join( ", ", #fields ),"\n";
}
}
seek( $csv_fh, 0, 0 );
}
close $csv_fh;
I would suggest this also needs:
change of while so you're not re-reading the file each time
You're using Text::CSV so using a print join ( ","... doesn't seem a consistent choice. If your data warrants Text::CSV it's worth keeping it for output too.
Related
I am trying to parse a given CSV File, stream on regular base.
My requirement is to Access the Data via ColumName (Header).
The ColumNames are not given in row 1. The ColumNames are given in row 2.
The CSV does have 100 rows but I only need 2 data rows to import.
The separator is a tab.
The following script works for header at row 1 and for all rows in the file
I failed to modify it to header at row 2 and to use only 2 rows or a number of rows.
script:
#!/usr/bin/perl
use strict;
use warnings;
use Tie::Handle::CSV;
use Data::Dumper;
my $file = "data.csv";
my $fh = Tie::Handle::CSV->new ($file, header => 1, sep_char => "\t");
my $hfh = Tie::Handle::CSV->new ($file, header => 0, sep_char => "\t");
my $line = <$hfh>;
my $myheader;
while (my $csv_line = <$fh>)
{
foreach(#{$line})
{
if ( $_ ne "" )
{
print $_ . "=" . $csv_line->{$_} . "\n" ;
}
}
}
The Data.csv could look like:
This is a silly sentence on the first line
Name GivenName Birthdate Number
Meier hans 18.03.1999 1
Frank Thomas 27.1.1974 2
Karl Franz 1.1.2000 3
Here could be something silly again
Thanks for any hint.
best regards
Use Text::CSV_XS instead of Tie::Handle::CSV (Which depends on the module so you have it installed already), read and throw away the first line, use the second line to set column names, and then read the rest of the data:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
use Text::CSV_XS;
my $csv = Text::CSV_XS->new({ sep => ",", # Using CSV because TSV doesn't play well with SO formatting
binary => 1});
# Read and discard the first line
$_ = <DATA>;
# Use the next line as the header and set column names
$csv->column_names($csv->getline(*DATA));
# Read some rows and access columns by name instead of position
my $nr = 0;
while (my $record = $csv->getline_hr(*DATA)) {
last if ++$nr == 4;
say "Row $nr: $record->{GivenName} was born on $record->{Birthdate}";
}
__DATA__
This is a silly sentence on the first line
Name,GivenName,Birthdate,Number
Meier,hans,18.03.1999,1
Frank,Thomas,27.1.1974,2
Karl,Franz,1.1.2000,3
Here could be something silly again
Tie::Handle::CSV accepts a filehandle instead of a filename. You can skip the first line by reading one line from it before you pass the filehandle to Tie::Handle::CSV:
use strict;
use warnings;
use Tie::Handle::CSV;
use Data::Dumper;
my $file = "data.csv";
open (my $infile, '<',$file) or die "can't open file $file: $!\n";
<$infile>; # skip first line
my $hfh = Tie::Handle::CSV->new ($infile, header => 1, sep_char => "\t");
my #csv;
my $num_lines = 3;
while ($num_lines--){
my $line = <$hfh>;
push #csv, $line;
}
print Dumper \#csv;
thanks to you both.
To clarify more detail my requirements.
The original Data File does have maybe 100 Colums with dynamic unknown Names for me.
I will create a list of Colums/Attribute from a other Service for which this script should provide the data content of some rows.
Request is in Terms of the data example:
Please provide all Names and all Birthdates of the first 25 Rows.
The next Request could be all Names and Givennames of the first 10 rows.
That means from the content of 100 Columns I have to provide the content for two, four, five Columns only.
The output I use (foreach), is only to test the Access by ColumName to the content of rows.
I mixed up your solution and stayed with Tie::Handle::CSV.
At the moment I have to use the two filehandles- Maybe you have a hint to be more effective.
#!/usr/bin/perl
use strict;
use warnings;
use Tie::Handle::CSV;
use Data::Dumper;
my $file = "data.csv";
open (my $infile, '<',$file) or die "can't open file $file: $!\n";
open (my $secfile, '<',$file) or die "can't open file $file: $!\n";
<$infile>; # skip first line
<$secfile>;
my $fh = Tie::Handle::CSV->new ($secfile, header => 1, sep_char => "\t");
my $hfh = Tie::Handle::CSV->new ($infile, header => 0, sep_char => "\t");
my $line = <$hfh>;
my $numberoflines = 2 ;
while ($numberoflines-- )
{
my $csv_line = <$fh> ;
foreach(#{$line})
{
if ( $_ ne "" )
{
print $_ . "=" . $csv_line->{$_} . "\n" ;
}
}
}
thanks got it running with "keys %$csv_line". I was not using because of missing knowlegde. ;-)
#!/usr/bin/perl
use strict;
use warnings;
use Tie::Handle::CSV;
my $file = "data.csv";
open (my $secfile, '<',$file) or die "can't open file $file: $!\n";
<$secfile>;
my $fh = Tie::Handle::CSV->new ($secfile, header => 1, sep_char => "\t");
my $numberoflines = 3 ;
while ($numberoflines-- )
{
my $csv_line = <$fh> ;
my #Columns = keys %{ $csv_line } ;
foreach (#Columns )
{
if ( $_ ne "" )
{
print $_ . "=" . $csv_line->{$_} . "\n" ;
}
}
print "-----------\n"
}
On last question:
The File I Read will be filled and modified by an other program.
What can I do to detect the File violation in case it makes a problem.
And I dont what the my script dies.
Thanks
regards
The following codes check for Duplicates in CSV file where TO Column is “USD”. I need your help to figure out how do I compare the resulted duplicate value, if the duplicate value has same value like in the below case, Perl should not give any warning, if the value is same. Perl file name is Source, just change the directory and run it.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use List::MoreUtils qw/ uniq /;
my %seen = ();
my #uniq = ();
my %uniq;
my %data;
my %dupes;
my #rows;
my $csv = Text::CSV->new ()
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, "<", 'D:\Longview\ENCDEVD740\DataServers\ENCDEVD740\lvaf\inbound\data\enc_meroll_fxrate_soa_load.csv' or die "Cannot use CSV: $!";
while ( my $row = $csv->getline( $fh ) ) {
# insert row into row list
push #rows, $row;
# join the unique keys with the
# perl 'multidimensional array emulation'
# subscript character
my $key = join( $;, #{$row}[0,1] );
# if it was just one field, just use
# my $key = $row->[$keyfieldindex];
# if you were checking for full line duplicates (header lines):
# my $key = join($;, #$row);
# if %data has an entry for the record, add it to dupes
#print "#{$row}\n ";
if (exists $data{$key}) { # duplicate
# if it isn't already duplicated
# add this row and the original
if (not exists $dupes{$key}) {
push #{$dupes{$key}}, $data{$key};
}
# add the duplicate row
push #{$dupes{$key}}, $row;
} else {
$data{ $key } = $row;
}
}
$csv->eof or $csv->error_diag();
close $fh;
# print out duplicates:
warn "Duplicate Values:\n";
warn "-----------------\n";
foreach my $key (keys %dupes) {
my #keys = split($;, $key);
if (($keys[1] ne 'USD') or ($keys[0] eq 'FROMCURRENCY')){
#print "Rejecting record since duplicate records are for Outofscope currencies\n";
#print "\$keys[0] = $keys[0]\n";
#print "\$keys[1] = $keys[1]\n";
next;
}
else {
print "Key: #keys\n";
foreach my $dupe (#{$dupes{$key}}) {
print "\tData: #$dupe\n";
}
}
}
Source - CSV File
Query
CSV File
Sample data:
FROMCURRENCY,TOCURRENCY,RATE
AED,USD,0.272257011
ANG,USD,0.557584544
ARS,USD,0.01421147
AUD,USD,0.68635
AED,USD,0.272257011
ANG,USD,0.557584544
ARS,USD,0.01421147
Different Values for duplicates
Like #Håkon wrote it seems like all your duplicates are in fact the same rate so they should not be considered duplicates. However, it could be an idea to store the rate in a hash mapped to each from and to currency. That way you don't need to check for duplicates every iteration and can rely on the uniqueness of the hash.
It's great that you use proper CSV parsers but here's an example using a single hash to keep track of duplicates by just splitting by , since the data seems reliable.
#!/usr/bin/env perl
use warnings;
use strict;
my $result = {};
my $format = "%-4s | %-4s | %s\n";
while ( my $line = <DATA> ) {
chomp $line;
my ( $from, $to, $rate ) = split( /,/x, $line );
$result->{$from}->{$to}->{$rate} = 1;
}
printf( $format, "FROM", "TO", "RATES" );
printf( "%s\n", "-" x 40 );
foreach my $from ( keys %$result ) {
foreach my $to ( keys %{ $result->{$from} } ) {
my #rates = keys %{ $result->{$from}->{$to} };
next if #rates < 2;
printf( $format, $from, $to, join( ", ", #rates ) );
}
}
__DATA__
AED,USD,0.272257011
ANG,USD,0.557584545
ANG,USD,1.557584545
ARS,USD,0.01421147
ARS,USD,0.01421147
ARS,USD,0.01421147
AUD,USD,0.68635
AUD,USD,1.68635
AUD,USD,2.68635
I change the test data to contain duplicates with the same rate and with different rates and the result would print.
FROM | TO | RATES
----------------------------------------
ANG | USD | 1.557584545, 0.557584545
AUD | USD | 1.68635, 0.68635, 2.68635
I'm trying to write a program to fetch a big MySQL table, rename some fields and write it to JSON. Here is what I have for now:
use strict;
use JSON;
use DBI;
# here goes some statement preparations and db initialization
my $rowcache;
my $max_rows = 1000;
my $LIMIT_PER_FILE = 100000;
while ( my $res = shift( #$rowcache )
|| shift( #{ $rowcache = $sth->fetchall_arrayref( undef, $max_rows ) } ) ) {
if ( $cnt % $LIMIT_PER_FILE == 0 ) {
if ( $f ) {
print "CLOSE $fname\n";
close $f;
}
$filenum++;
$fname = "$BASEDIR/export-$filenum.json";
print "OPEN $fname\n";
open $f, ">$fname";
}
$res->{some_field} = $res->{another_field}
delete $res->{another_field}
print $f $json->encode( $res ) . "\n";
$cnt++;
}
I used the database row caching technique from
Speeding up the DBI
and everything seems good.
The only problem I have for now is that on $res->{some_field} = $res->{another_field}, the row interpreter complains and says that $res is Not a HASH reference.
Please could anybody point me to my mistakes?
If you want fetchall_arrayref to return an array of hashrefs, the first parameter should be a hashref. Otherwise, an array of arrayrefs is returned resulting in the "Not a HASH reference" error. So in order to return full rows as hashref, simply pass an empty hash:
$rowcache = $sth->fetchall_arrayref({}, $max_rows)
I have a little parser that parses a site - with 6150 records. But I need to have this in a CSV-format.
First of all see here the target site: http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=50&s=1750
I need all the data - with separation in the filed of
number
schoolnumber
school-name
Adress
Street
Postal Code
phone
fax
School-type
website
Well - I have a script: I am very interested what you think about this. Not all the fields are gained yet - I need more of them!
#!/usr/bin/perl
use strict;
use HTML::TableExtract;
use LWP::Simple;
use Cwd;
use POSIX qw(strftime);
my $total_records = 0;
my $alpha = "x";
my $results = 50;
my $range = 0;
my $url_to_process = "http://192.68.214.70/km/asps/schulsuche.asp?q=";
my $processdir = "processing";
my $counter = 50;
my $percent = 0;
workDir();
chdir $processdir;
processURL();
print "\nPress <enter> to continue\n";
<>;
my $displaydate = strftime('%Y%m%d%H%M%S', localtime);
open my $outfile, '>', "webdata_for_$alpha\_$displaydate.txt" or die 'Unable to create file';
processData();
close $outfile;
print "Finished processing $total_records records...\n";
print "Processed data saved to $ENV{HOME}/$processdir/webdata_for_$alpha\_$displaydate.txt\n";
unlink 'processing.html';
sub processURL() {
print "\nProcessing $url_to_process$alpha&a=$results&s=$range\n";
getstore("$url_to_process$alpha&a=$results&s=$range", 'tempfile.html') or die 'Unable to get page';
while( <tempfile.html> ) {
open( FH, "$_" ) or die;
while( <FH> ) {
if( $_ =~ /^.*?(Treffer \<b\>)(\d+)( - )(\d+)(<\/b> \w+ \w+ \<b\>)(\d+).*/ ) {
$total_records = $6;
print "Total records to process is $total_records\n";
}
}
close FH;
}
unlink 'tempfile.html';
}
sub processData() {
while ( $range <= $total_records) {
my $te = HTML::TableExtract->new(headers => [qw(lfd Schul Schulname Telefon Schulart Webseite)]);
getstore("$url_to_process$alpha&a=$results&s=$range", 'processing.html') or die 'Unable to get page';
$te->parse_file('processing.html');
my ($table) = $te->tables;
foreach my $ts ($te->table_states) {
foreach my $row ($ts->rows) {
cleanup(#$row);
# Add a table column delimiter in this case ||
print $outfile join("||", #$row)."\n";
}
}
$| = 1;
print "Processed records $range to $counter";
print "\r";
$counter = $counter + 50;
$range = $range + 50;
}
}
sub cleanup() {
for ( #_ ) {
s/\s+/ /g;
}
}
sub workDir() {
# Use home directory to process data
chdir or die "$!";
if ( ! -d $processdir ) {
mkdir ("$ENV{HOME}/$processdir", 0755) or die "Cannot make directory $processdir: $!";
}
}
with the following output:
1||9752||Deutsche Schule Alamogordo USA Alamogorde - New Mexico || ||Deutschsprachige Auslandsschule||
2||9931||Deutsche Schule der Borromäerinnen Alexandrien ET Alexandrien - Ägypten || ||Begegnungsschule (Auslandsschuldienst)||
3||1940||Max-Keller-Schule, Berufsfachschule f.Musik Alt- ötting d.Berufsfachschule für Musik Altötting e.V. Kapellplatz 36 84503 Altötting ||08671/1735 08671/84363||Berufsfachschulen f. Musik|| www.max-keller-schule.de
4||0006||Max-Reger-Gymnasium Amberg Kaiser-Wilhelm-Ring 7 92224 Amberg ||09621/4718-0 09621/4718-47||Gymnasien|| www.mrg-amberg.de
With the || being the delimiter.
My problem is that I need to have more fields - I need to have the following divided - see an example:
name: Volksschule Abenberg (Grundschule)
street: Güssübelstr. 2
postal-code and town: 91183 Abenberg
fax and telephone: 09178/215 09178/905060
type of school: Volksschulen
website: home.t-online.de/home/vs-abenberg
How to add more fields? This obviously has to be done in this line here, doesn't it!?
my $te = HTML::TableExtract->new(headers => [qw(lfd Schul Schulname Telefon Schulart Webseite)]);
But how? I tried out several things, but I always got bad results.
I played around - and tried another solution - but here I have good CSV-data - but unfortunatly no spider logic...
#!/usr/bin/perl
use warnings;
use strict;
use LWP::Simple;
use HTML::TableExtract;
use Text::CSV;
my $html= get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50';
$html =~ tr/r//d; # strip the carriage returns
$html =~ s/ / /g; # expand the spaces
my $te = new HTML::TableExtract();
$te->parse($html);
my #cols = qw(
rownum
number
name
phone
type
website
);
my #fields = qw(
rownum
number
name
street
postal
town
phone
fax
type
website
);
my $csv = Text::CSV->new({ binary => 1 });
foreach my $ts ($te->table_states) {
foreach my $row ($ts->rows) {
# trim leading/trailing whitespace from base fields
s/^s+//, s/\s+$// for #$row;
# load the fields into the hash using a "hash slice"
my %h;
#h{#cols} = #$row;
# derive some fields from base fields, again using a hash slice
#h{qw/name street postal town/} = split /n+/, $h{name};
#h{qw/phone fax/} = split /n+/, $h{phone};
# trim leading/trailing whitespace from derived fields
s/^s+//, s/\s+$// for #h{qw/name street postal town/};
$csv->combine(#h{#fields});
print $csv->string, "\n";
}
}
Well - with this I tried another solution - but here I have good CSV-data - but unfortunately no spider logic.
How to add the spider-logic here!?
Well I need some help - either in the first or in the second script!
The website uses br tags to separate the sub-fields within each cell, very much like you want to divide the data. HTML::TableExtract turns these into newlines by default In your first program, but your cleanup routine throws this information away.
In your first program, add something like s/\n/||/sg; (assuming the same separator) before you flatten the rest of the whitespace.
I'm using Module::Build to perform build, test, testpod, html, & install actions on my Perl module that I'm developing. The HTML files that are generated are okay, but I'd be much happier if I could somehow configure Module::Build to use the perltidy -html formatting utility instead of its own HTML formatter.
Anyone know of a way I can replace the HTML formatter that comes with Module::Build with the prettier perltidy HTML formatter?
Addendum: When I said "replace" above, that was probably misleading. I don't really want to write code to replace the html formatter that comes with Module::Build. I really want to know if Module::Build has any other HTML formatter options. The HTML it generates is so plain and generic looking. It's so boring. I like perltidy's output a lot.
Here is how I got it working right now in a build script that I wrote, but it's totally a hack ... falling out to the command line perltidy script:
use strict;
use warnings;
# get list of files in directory
my $libLocation = "lib/EDF";
opendir( DIR, $libLocation );
my #filenameArray = readdir(DIR);
# iterate over all files to find *.pm set
for my $file (#filenameArray) {
if ( $file =~ m/ # matching regex
\. # literal period character
pm # the pm file extenstion
/x # end of regex
)
{
my $return = `perl D:/Perl/site/bin/perltidy -q --indent-columns=4 --maximum-line-length=80 -html -opath blib/libhtml2 -toc $libLocation/$file`;
if ($return eq "") {
print "HTMLized " . $file . "\n";
}
else {
print "Error: " . $return . "\n";
}
}
}
But I was really hoping there was a way to use Module::Build and just tell it with a flag or an argument or whatever to tell it to use a different HTML formatter. I guess that's a pipe dream, though:
use strict;
use warnings;
use Module::Build;
my $build = Module::Build->resume (
properties => {
config_dir => '_build',
},
);
$build->dispatch('build');
$build->dispatch('html', engine => 'perltidy');
or maybe:
$build->dispatch('htmltidy');
Well, the action is implemented in
htmlify_pods
in Module::Build::Base.
It should be possible to override that method.
Much Later ...
Here is my attempt (tested only once):
package My::Builder;
use strict;
use warnings;
use base 'Module::Build';
sub htmlify_pods {
my $self = shift;
my $type = shift;
my $htmldir = shift || File::Spec->catdir($self->blib, "${type}html");
require Module::Build::Base;
require Module::Build::PodParser;
require Perl::Tidy;
$self->add_to_cleanup('pod2htm*');
my $pods = $self->_find_pods(
$self->{properties}{"${type}doc_dirs"},
exclude => [ Module::Build::Base::file_qr('\.(?:bat|com|html)$') ] );
return unless %$pods; # nothing to do
unless ( -d $htmldir ) {
File::Path::mkpath($htmldir, 0, oct(755))
or die "Couldn't mkdir $htmldir: $!";
}
my #rootdirs = ($type eq 'bin') ? qw(bin) :
$self->installdirs eq 'core' ? qw(lib) : qw(site lib);
my $podpath = join ':',
map $_->[1],
grep -e $_->[0],
map [File::Spec->catdir($self->blib, $_), $_],
qw( script lib );
foreach my $pod ( keys %$pods ) {
my ($name, $path) = File::Basename::fileparse($pods->{$pod},
Module::Build::Base::file_qr('\.(?:pm|plx?|pod)$'));
my #dirs = File::Spec->splitdir( File::Spec->canonpath( $path ) );
pop( #dirs ) if $dirs[-1] eq File::Spec->curdir;
my $fulldir = File::Spec->catfile($htmldir, #rootdirs, #dirs);
my $outfile = File::Spec->catfile($fulldir, "${name}.html");
my $infile = File::Spec->abs2rel($pod);
next if $self->up_to_date($infile, $outfile);
unless ( -d $fulldir ){
File::Path::mkpath($fulldir, 0, oct(755))
or die "Couldn't mkdir $fulldir: $!";
}
my $path2root = join( '/', ('..') x (#rootdirs+#dirs) );
my $htmlroot = join( '/',
($path2root,
$self->installdirs eq 'core' ? () : qw(site) ) );
my $fh = IO::File->new($infile) or die "Can't read $infile: $!";
my $abstract = Module::Build::PodParser->new(fh => $fh)->get_abstract();
my $title = join( '::', (#dirs, $name) );
$title .= " - $abstract" if $abstract;
my %opts = (
argv => join(" ",
qw( -html --podflush ),
"--title=$title",
'--podroot='.$self->blib,
"--htmlroot=$htmlroot",
"--podpath=$podpath",
),
source => $infile,
destination => $outfile,
);
if ( eval{Pod::Html->VERSION(1.03)} ) {
$opts{argv} .= ' --podheader';
$opts{argv} .= ' --backlink=Back to Top';
if ( $self->html_css ) {
$opts{argv} .= " --css=$path2root/" . $self->html_css;
}
}
$self->log_info("HTMLifying $infile -> $outfile\n");
$self->log_verbose("perltidy %opts\n");
Perl::Tidy::perltidy(%opts); # or warn "pod2html #opts failed: $!";
}
}
1;
** To use it .. **
#!/usr/bin/perl
use strict;
use warnings;
use My::Builder;
my $builder = My::Builder->new(
module_name => 'My::Test',
license => 'perl',
);
$builder->create_build_script;
It's very easy to define new Module::Build actions that you can call with dispatch, and there are plenty of examples in the Module::Build documentation. Define an action to handle your new step:
sub ACTION_htmltidy
{
my( $self ) = #_;
$self->depends_on( ...other targets... );
require Perl::Tidy;
...do your damage...
}
If you want another action to use yours, you can extend it so you can make the dependency:
sub ACTION_install
{
my( $self ) = #_;
$self->depends_on( 'htmltidy' );
$self->SUPER::install;
}