I'm trying to follow the example in the synopsis of HTML::PrettyPrinter. I corrected the typo to create a FileHandle:
my $fh = new FileHandle ">E:\\test.html";
Now the file gets created but I'm getting another error:
Can't call method "isa" on an undefined value at C:/Strawberry/perl/site/lib/HTML/PrettyPrinter.pm line 414.
Here is the code I have so far:
use HTML::TreeBuilder;
# generate a HTML syntax tree
my $tree = new HTML::TreeBuilder;
$tree->parse_file("E:\\file.html");
# modify the tree if you want
use HTML::PrettyPrinter;
my $hpp = new HTML::PrettyPrinter ('linelength' => 130,'quote_attr' => 1);
# configure
$hpp->set_force_nl(1,qw(body head)); # for tags
$hpp->set_force_nl(1,qw(#SECTIONS)); # as above
$hpp->set_nl_inside(0,'default!'); # for all tags
# format the source
my $linearray_ref = $hpp->format($tree);
print @$linearray_ref;
# alternative: print directly to filehandle
use FileHandle;
my $fh = new FileHandle ">E:\\test.html";
if (defined $fh) {
$hpp->select($fh);
$hpp->format();
undef $fh;
$hpp->select(undef);
}
This line is causing the error:
$hpp->format();
HTML::PrettyPrinter::format attempts to call isa on the first argument:
411 sub format {
412 my ($self, $element, $indent, $lar) = @_;
413 # $lar = line array ref
414 confess "Need an HTML::Element" unless $element->isa('HTML::Element');
...
That call fails with the error you're getting when $element is undef, which is what happens when format is called without arguments. Passing $tree (which isa HTML::Element) as the first argument populates the file correctly:
$hpp->format($tree);
I am trying to parse a CSV file and iterate through it with curl. The following is my data set:
Act No. 2,Sep/1900/28
Act No. 3,Sep/1900/28
Act No. 10,Oct/1900/28
I followed this Stack Overflow question, CSV into hash, to create a hash for my data set.
Here is my code:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;
use IO::File;
use WWW::Curl::Easy;
my $url = "https://elibrary.judiciary.gov.ph/thebookshelf/docmonth/";
#my $filestoprocess = 'list_acts.csv';
# Usage example:
my $hash_ref = csv_file_hashref('toharvest_og_sourcing.csv');
foreach my $key (sort keys %{$hash_ref}){
my $urlcomplete = "$url"."@{$hash_ref->{$key}}";
#start the curl
my $user_agent = "Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20140319 Firefox/24.0 Iceweasel/24.4.0";
my $curl = WWW::Curl::Easy->new;
$curl->setopt(CURLOPT_HEADER,1);
$curl->setopt(CURLOPT_USERAGENT, $user_agent);
$curl->setopt(CURLOPT_FOLLOWLOCATION, 1);
#$curl->setopt(CURLOPT_SSL_VERIFYPEER, 1L);
#$curl->curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 1L);
$curl->setopt(CURLOPT_SSL_VERIFYPEER, 0);
$curl->setopt(CURLOPT_URL, $urlcomplete);
# A filehandle, reference to a scalar or reference to a typeglob can be used here.
my $response_body;
$curl->setopt(CURLOPT_WRITEDATA,\$response_body);
# Starts the actual request
my $retcode = $curl->perform;
# Looking at the results...
if ($retcode == 0) {
my $response_code = $curl->getinfo(CURLINFO_HTTP_CODE);
my $curledurldate = $response_body;
our ($issuancelink) = $curledurldate =~ /a href='(https.*?)'>.*?<STRONG>$key/s;
#print "$issuancelink\n";
if (defined $issuancelink) {
my $user_agent = "Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20140319 Firefox/24.0 Iceweasel/24.4.0";
#my $curl = WWW::Curl::Easy->new;
$curl->setopt(CURLOPT_HEADER,1);
$curl->setopt(CURLOPT_USERAGENT, $user_agent);
$curl->setopt(CURLOPT_FOLLOWLOCATION, 1);
#$curl->setopt(CURLOPT_SSL_VERIFYPEER, 1L);
#$curl->curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 1L);
$curl->setopt(CURLOPT_SSL_VERIFYPEER, 0);
$curl->setopt(CURLOPT_URL, $issuancelink);
# A filehandle, reference to a scalar or reference to a typeglob can be used here.
my $response_body;
$curl->setopt(CURLOPT_WRITEDATA,\$response_body);
# Starts the actual request
my $retcode = $curl->perform;
# Looking at the results...
if ($retcode == 0) {
# print("Transfer went ok\n");
my $response_code = $curl->getinfo(CURLINFO_HTTP_CODE);
my $curledsource = $response_body;
our ($ogsourcing) = $curledsource =~ /<br>\s+(\w+.*?)\s+?<CENTER>.*?H2/s;
my $filename = 'ogsourcingharvested.txt';
open (FH, '>>', $filename) or die("Could not open file. $!");
#print "Error processing ".$fh."$_\n";
print FH $ogsourcing."|"."{$key}\n";
close (FH);
}
else {
# Error code, type of error, error message
print("An error happened: $retcode ".$curl->strerror($retcode)." ".$curl->errbuf."\n");
}
} else {
# Error code, type of error, error message
print("An error happened: $retcode ".$curl->strerror($retcode)." ".$curl->errbuf."\n");
}
}
}
# Implementation:
sub csv_file_hashref {
my ($filename) = @_;
my $csv_fh = IO::File->new($filename, 'r');
my $csv = Text::CSV_XS->new ();
my %output_hash;
while(my $colref = $csv->getline ($csv_fh))
{
$output_hash{shift @{$colref}} = $colref;
}
return \%output_hash;
}
Basically, the code iterates through the second column, adds each value to the end of a URL, and curls that URL. Afterwards, the content of the curled URL is searched for a particular pattern:
our ($issuancelink) = $curledurldate =~ /a href='(https.*?)'>.*?<STRONG>$key/s;
When that link is found, it is stored in a variable ($issuancelink), and that link is then curled in turn. A particular text in the second curled page is searched for, captured and saved to a text file. My code works if the values in the second column (Sep/1900/28 and Oct/1900/28 in this case) aren't repeated. If they are repeated, though, only the first match seems to be captured. In my case, Act No. 3 has the same originating URL (https://elibrary.judiciary.gov.ph/thebookshelf/docmonth/Sep/1900/28) as Act No. 2, and the link for Act No. 2 is captured instead of the one for Act No. 3.
Thanks in advance!
However, my code is good if the second column (Sep/1900/28, Oct/1900/28 in this case) aren't repeated.
When you store values in a hash, the hash keys are unique. That means that if you have identical key names, they will overwrite each other.
This part of your code:
while(my $colref = $csv->getline ($csv_fh))
{
$output_hash{shift @{$colref}} = $colref;
}
This seems to be responsible. What you can do is save the values in an array instead of a scalar (in this case, in an array ref).
I would do something like this:
while(my $colref = $csv->getline ($csv_fh))
{
my ($key, $value) = @$colref;
push @{$output_hash{$key}}, $value; # store values in array
}
Another benefit of this is that the values themselves are copied, whereas your code stores the array ref. You are saved from problems by the limited scope of your variable my $colref, but generally speaking, copying the values will save you from trouble.
To access the array values you will probably need to loop over each hash key. Something like:
for my $key (sort keys %$hash_ref) {
    for my $value (@{$hash_ref->{$key}}) {
        # do stuff...
    }
}
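The following is a self-contained sketch of the hash-of-arrays idea. It uses the three rows from the question as hard-coded sample data, standing in for the array refs that Text::CSV_XS's getline returns. Purely for illustration, it is keyed on the repeated second column, which is where a plain hash loses rows. The variable names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample rows, shaped like the array refs returned by Text::CSV_XS->getline.
my @rows = (
    ['Act No. 2',  'Sep/1900/28'],
    ['Act No. 3',  'Sep/1900/28'],
    ['Act No. 10', 'Oct/1900/28'],
);

my (%last_only, %all);
for my $row (@rows) {
    my ($act, $date) = @$row;
    $last_only{$date} = $act;       # plain hash: a later row overwrites an earlier one
    push @{ $all{$date} }, $act;    # hash of arrays: every value is kept
}

for my $date (sort keys %all) {
    print "$date: @{ $all{$date} }\n";
}
```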
Here is my code. It opens the file, decodes its data as UTF-8, reads each line into the variable $abstract_text, and sends it back in a JSON structure.
my $fh;
if (!open($fh, '<:encoding(UTF-8)',$path))
{
returnApplicationError("Cannot read abstract file: $path ($!)\nERRORCODE|111|\n");
}
printJsonHeader;
my @lines = <$fh>;
my $abstract_text = '';
foreach my $line (@lines)
{
$abstract_text .= $line;
}
my $json = encode_json($abstract_text);
close $fh;
print $json;
With that code, I get this error:
hash- or arrayref expected (not a simple scalar, use allow_nonref to allow this)
The error message also points to this line:
my $json = encode_json($abstract_text);
I want to send the data back as a string (which is in UTF-8). Please help.
I assume you're using either JSON or JSON::XS.
Both allow non-reference data, but (at least in versions before 4.0, which made allow_nonref the default) not via the procedural encode_json routine.
You'll need to use the object-oriented approach:
use strict; # obligatory
use warnings; # obligatory
use JSON::XS;
my $encoder = JSON::XS->new();
$encoder->allow_nonref();
print $encoder->encode('Hello, world.');
# => "Hello, world."
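If JSON::XS isn't installed, the same object-oriented approach should work with JSON::PP, which has shipped with Perl since 5.14. This sketch assumes, as in the question, that the file was read through an :encoding(UTF-8) layer, so $abstract_text holds characters. The sample text is made up. The utf8 option turns those characters back into UTF-8 bytes on output, which is also what encode_json does.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Made-up stand-in for the text read from the abstract file. With an
# :encoding(UTF-8) layer, the file content arrives as Perl characters.
my $abstract_text = "Caf\x{e9} abstract\nsecond line\n";

# utf8: emit UTF-8 encoded bytes; allow_nonref: accept a plain string
# instead of requiring a hash or array reference.
my $encoder = JSON::PP->new->utf8->allow_nonref;
my $json    = $encoder->encode($abstract_text);

print $json, "\n";
```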
I'm using a JSON file in a Perl program, but I'm unable to parse it.
It gives the following error:
garbage after JSON object, at character offset 2326471 (before "{"response":{"numFou...") at /usr/local/share/perl5/JSON.pm line 171, <$f> line 1.
Here is the code:
print "input json";
open(my $f, "<", "$ARGV[1]");
my $content=<$f>;
my $structured;
eval {
$structured = from_json($content, {utf8 => 1});
};
if ($@) {
$content =~ s/\n/ /g;
my $errMsg = $@;
$errMsg =~ s/\n/ /g;
WriteInfo("Unparseable result for url=$url, error: $errMsg\n") ;
};
How can I fix this error?
You can't fix JSON data automatically. There could be many "fixes" that will get the data through the parser, but it may be difficult to tell which of them is the correct one. You should talk to the source of the data and ask for a correct version.
It may be possible to fix the data manually, but you should only attempt this if no correct version of the data is available. Finding the error in a 2.2MB+ text file by hand isn't a trivial job, and character offset 2326471 is only where the parser found an error, not necessarily where the correction should be made.
garbage after JSON object ...
This implies that from_json has found the end of the JSON data (i.e. the final closing brace } or bracket ]) but there is more data in the string after that character. It may be that the file has been written correctly but really does contain spurious data after the end of the JSON. In your case the text reported after that point, {"response":{"numFou..., looks like the start of a second JSON document. If so, that should be obvious just by examining the data file.
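The following sketch reproduces the message with JSON::PP, which is part of core Perl. JSON.pm, which provides from_json, delegates to either JSON::XS or JSON::PP. The sample data is hypothetical: two complete objects written back to back, shaped like the {"response":... text in the error.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical sample: two complete JSON documents written back to back.
my $content = '{"response":{"numFound":1}}{"response":{"numFound":2}}';

my $structured = eval { JSON::PP->new->utf8->decode($content) };
my $error      = $@;

# The first object parses fine; the parser then finds more text after its
# closing brace and reports it as garbage.
print "Decoding failed: $error" unless defined $structured;
```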
Note
Unless you have redefined the $/ variable, these lines
open(my $f, "<", "$ARGV[1]");
my $content = <$f>;
will read just the first line of the file into $content. It may be that the file contains just a single very long line of text (i.e. it contains no newline characters), but this line in your error handler
$content =~ s/\n/ /g;
implies that there are newlines in there.
Reading only the first line of a multi-line JSON file wouldn't cause the error that you're seeing, but it is best to read the entire file into memory before decoding it as JSON data, just in case unexpected newlines have crept into the data.
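Here is a minimal sketch of the difference. It uses an in-memory filehandle in place of $ARGV[1], and the sample text is made up. With the default $/, a single read returns one line. Localising $/ to undef inside a do block returns the whole file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up multi-line "file" held in memory.
my $text = qq({\n  "response": 1\n}\n);

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $first_line = <$fh>;               # default $/ is "\n": reads only "{\n"
close $fh;

open $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $whole = do { local $/; <$fh> };   # $/ is undef inside the block: reads everything
close $fh;

printf "first line: %d chars, whole file: %d chars\n", length $first_line, length $whole;
```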
Here is a better way of writing your code segment:
print "Input JSON\n";
my $content = do {
open my $fh, '<', $ARGV[1] or die qq{Unable to open "$ARGV[1]" for input: $!};
local $/;
<$fh>;
};
my $structured = eval {
from_json( $content, {utf8 => 1} );
};
if ( my $err_msg = $@ ) {
$content =~ tr/\n/ /;
$err_msg =~ tr/\n/ /;
WriteInfo("Unparsable result for URL=$url, error: $err_msg\n") ;
};
How to decode a JSON file if a value spans more than one line
a.json file:
{
"sv1" : {
"output" : "Hostname: abcd
asdkfasfjsl",
"exp_result" : "xyz"
}
}
When I try to read the above JSON file, I get the error "invalid character encountered while parsing JSON string, at character offset 50 (before "\n ...")".
Code to read the above JSON file:
#!/volume/perl/bin/perl -w
use strict;
use warnings;
use JSON;
local $/;
open(AA,"<a.json") or die "can't open json file : $!\n";
my $json = <AA>;
my $data = decode_json($json);
print "reading output $data->{'sv1'}->{'output'}\n";
print "reading output $data->{'sv1'}->{'exp_result'}\n";
close AA;
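Apart from whether the JSON is valid or not (see comments on question), check how much of the file you're actually reading:
my $json = <AA>;
In scalar context this reads only one line, unless $/ has been undefined. Your script does call local $/; before reading, so here the whole file is read and the error comes from the literal line break inside the "output" string (JSON requires line breaks in strings to be escaped as \n). Without local $/; only the first line would be read.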
Use an array to get all lines:
my @json = <AA>;
my $json = join '', @json;   # each line already ends with its newline
Or, even better, use read_file from File::Slurp to get the whole content of the file with one simple call:
use File::Slurp qw/read_file/;
my $json = read_file( "a.json" );
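The following sketch uses only core modules (File::Temp and JSON::PP) rather than File::Slurp. It writes the a.json content from the question to a temporary file, slurps it with local $/, and shows that a strict decode rejects the raw line break. JSON::PP's loose option accepts unescaped control characters in strings. If you control the file, the cleaner fix is to write the line break as \n.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use JSON::PP;

# Recreate a.json from the question, including the raw line break inside "output".
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} <<'JSON';
{
    "sv1" : {
        "output" : "Hostname: abcd
asdkfasfjsl",
        "exp_result" : "xyz"
    }
}
JSON
close $tmp;

# Core-Perl slurp: undefining $/ locally makes <$fh> return the whole file.
my $json_text = do {
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;
    <$fh>;
};

# Strict JSON forbids unescaped control characters (such as a newline) in strings.
my $strict       = eval { decode_json($json_text) };
my $strict_error = $@;
print "strict decode failed: $strict_error" unless defined $strict;

# JSON::PP's loose option accepts them.
my $data = JSON::PP->new->utf8->loose->decode($json_text);
print "output: $data->{sv1}{output}\n";
print "exp_result: $data->{sv1}{exp_result}\n";
```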
I'm using the Chart module to generate charts in PNG format from CSV data.
It works well and the charts look okay, but I get warnings for the undef values (there are 3 such values at the end of the chart):
# ~/txv3.pl "./L*TXV3*.csv" > /var/www/html/x.html
Generating chart: L_B17_C0_TXV3LIN_PA3_TI1_CI1
Use of uninitialized value $label in length at /usr/share/perl5/vendor_perl/Chart/Base.pm line 3477, <> line 69.
Use of uninitialized value in subroutine entry at /usr/share/perl5/vendor_perl/Chart/Base.pm line 3478, <> line 69.
Use of uninitialized value $label in length at /usr/share/perl5/vendor_perl/Chart/Base.pm line 3477, <> line 69.
Use of uninitialized value in subroutine entry at /usr/share/perl5/vendor_perl/Chart/Base.pm line 3478, <> line 69.
Use of uninitialized value $label in length at /usr/share/perl5/vendor_perl/Chart/Base.pm line 3477, <> line 69.
Use of uninitialized value in subroutine entry at /usr/share/perl5/vendor_perl/Chart/Base.pm line 3478, <> line 69.
I need to get rid of these warnings, as they are useless here and they make the log of my Hudson job unreadable.
So I've tried (using perl 5.10.1 on CentOS 6.4 / 64 bit):
#!/usr/bin/perl -w
use strict;
....
$pwrPng->set(%pwrOptions);
$biasPng->set(%biasOptions);
my $pwrPngFile = File::Spec->catfile(PNG_DIR, "${csv}_PWR.png");
my $biasPngFile = File::Spec->catfile(PNG_DIR, "${csv}_BIAS.png");
{
no warnings;
$pwrPng->png($pwrPngFile, $pwrData);
$biasPng->png($biasPngFile, $biasData);
}
But the warnings are still printed.
Any suggestions please?
Usually it is best not to ignore warnings.
Why don't you just handle the undef values first, before charting?
Either replace them with something sensible, or skip plotting those rows:
data.csv
RGI,BIAS,LABEL
20,130,"1146346307 #20"
21,135,"1146346307 #21"
22,140,
--
use Scalar::Util qw( looks_like_number );
my $fname = "data.csv";
open my $fh, '<', $fname
or die "Unable to open $fname : $!";
my $data = [];
while (<$fh>) {
chomp;
my ($rgi, $bias, $label) = split /,/; # Better to use Text::CSV
next unless looks_like_number($rgi);
next unless looks_like_number($bias);
$label ||= "Unknown Row $."; # Rownum
# Create whatever structure you need.
push @$data, { rgi => $rgi, bias => $bias, label => $label };
}
# Now draw chart
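Here is a runnable sketch of the filtering step. It reads the data.csv sample above from an in-memory string instead of a file. Two changes from the snippet above are my own choices. First, split with a limit of -1 keeps the empty trailing field. Second, the surrounding quotes are stripped with a simple regex; Text::CSV would handle quoting properly. The columns are collected as labels first, then one array per series. This layout is assumed to match the array-of-arrays data that Chart's png method expects, so check the Chart documentation for your version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# The data.csv sample from above, held in memory for the example.
my $csv_text = <<'CSV';
RGI,BIAS,LABEL
20,130,"1146346307 #20"
21,135,"1146346307 #21"
22,140,
CSV

open my $fh, '<', \$csv_text or die "Cannot open in-memory CSV: $!";

my (@labels, @rgi, @bias);
while (my $line = <$fh>) {
    chomp $line;
    # Limit -1 keeps trailing empty fields, so a missing label becomes ''.
    my ($rgi, $bias, $label) = split /,/, $line, -1;
    # Skip the header and any row without numeric values.
    next unless looks_like_number($rgi) && looks_like_number($bias);
    # Replace an empty or missing label so Chart never sees undef.
    $label = "Unknown Row $." unless defined $label && length $label;
    $label =~ s/^"(.*)"$/$1/;    # naive quote stripping
    push @labels, $label;
    push @rgi,    $rgi;
    push @bias,   $bias;
}
close $fh;

# Assumed layout: x-axis labels first, then one array per data series.
my @chart_data = (\@labels, \@rgi, \@bias);
print join(', ', @labels), "\n";
```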
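In your Hudson job, install a __WARN__ handler that filters warnings, so the ones you already know about won't show up. The no warnings block in the question doesn't help because that pragma is lexically scoped: it only affects code compiled inside the block, not the code in Chart/Base.pm that actually emits the warnings.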
BEGIN {
$SIG{'__WARN__'} = sub { my $w = shift; warn $w if $w !~ m|/Chart/Base.pm| };
}
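Here is a small sketch of that filter that can be run on its own. It uses local $SIG{__WARN__} instead of a BEGIN block and feeds it two simulated warnings: one in the style of the Chart log above, and one invented. What gets through is recorded in a @passed array, which exists only for the demonstration. In a real script, the BEGIN form above installs the handler before the rest of the code is compiled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @passed;   # demonstration only: warnings that got through the filter

# Drop warnings that mention Chart/Base.pm; pass everything else on to STDERR.
local $SIG{__WARN__} = sub {
    my $w = shift;
    return if $w =~ m{/Chart/Base\.pm};
    push @passed, $w;
    print STDERR $w;
};

# Simulated warnings: one in the style of the Chart messages, one unrelated.
warn "Use of uninitialized value \$label in length at /usr/share/perl5/vendor_perl/Chart/Base.pm line 3477.\n";
warn "Unrelated problem in the job script\n";

print scalar(@passed), " warning(s) passed the filter\n";
```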