HTML::TagParser not working - html

my $url = "\'http://".$server.":4080/cgi-bin/gen_graph.pl?view=5&SUBSYS=\'";
my $html = HTML::TagParser->new( $url );
my @list = $html->getElementsByTagName( "pre" );
print $list[0];
foreach my $elem ( @list ) {
    if($elem->innerText =~ /APIs/){
        my $text = $elem->innerText;
        if ( $text eq "" ) {
        } else {
            @API_list = split(/\s+/, $text);
            print $API_list[1];
        }
    }
}
return \@API_list;
}
Here the line my @list = $html->getElementsByTagName( "pre" ); is not working. If I run this as a separate script it works well, but if I include it in another script there is no value in @list. Can anyone help me?

Are you getting an error message? If so, what is it?
Have you thought to check the return value of HTML::TagParser->new()? If it's failing, it may be doing so silently, and you only find out later when you try to use your $html object.
I do think the URL you're handing to it looks odd.
"\'http://".$server.":4080/cgi-bin/gen_graph.pl?view=5&SUBSYS=\'"
Why the two layers of quotes? (double quotes, and then escaped single quotes). Wouldn't this work:
my $url = 'http://'
. $server
. ':4080/cgi-bin/gen_graph.pl?view=5&SUBSYS=';
(Extra whitespace added to make it easier to read the concatenation operator.)
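One way to check the point about the quotes is to print both forms of the string between brackets before handing either to the parser. The following is only a sketch: build_graph_url is a hypothetical helper and 'myserver' is a placeholder host name, neither of which appears in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: build the URL without embedding quote characters.
sub build_graph_url {
    my ($server) = @_;
    return 'http://' . $server . ':4080/cgi-bin/gen_graph.pl?view=5&SUBSYS=';
}

# 'myserver' is a placeholder host name, not one from the question.
my $url = build_graph_url('myserver');
print "Suggested form: [$url]\n";

# The string from the question begins and ends with a literal single quote,
# which becomes visible once the value is printed between brackets.
my $quoted = "\'http://" . 'myserver' . ":4080/cgi-bin/gen_graph.pl?view=5&SUBSYS=\'";
print "Original form:  [$quoted]\n";
```

Printing the value this way, and checking whether $html is defined right after the constructor call, should narrow down whether the failure is in the URL or in the parsing.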

Related

Regex find comment word and replace comment1

I am looking for a regular expression that would let me find every instance of id="comment" and replace it with an incrementally numbered version, like id="comment1", id="comment2", and so on. I have 400+ instances in an HTML document.
Should I use Notepad++? What is the easiest way to do this?
Any help would be appreciated. Thanks in advance.
I'm not sure what might be the easiest way, but I highly doubt that'd be any regular expression.
Maybe, with a for loop we could do that. For instance, it would look something like this in PHP:
<?php
$str = file_get_contents('path/to/html/file/filename.html');
// Split on the attribute; each gap between two pieces is one occurrence.
$arr = preg_split('/id="comment"/', $str);
$new_str = '';
$last = count($arr) - 1;
foreach ($arr as $key => $value) {
    $new_str .= $value;
    // Re-insert the attribute, now numbered, after every piece except the last.
    if ($key < $last) {
        $new_str .= 'id="comment' . ($key + 1) . '"';
    }
}
file_put_contents('path/to/html/file/filename_modified.html', $new_str);
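For comparison, the same numbering can be sketched in Perl with a single substitution whose replacement is evaluated as code. This assumes Perl is available and that every occurrence is spelled exactly id="comment"; number_comment_ids is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: number every id="comment" in the order it appears.
sub number_comment_ids {
    my ($html) = @_;
    my $n = 0;
    # /e evaluates the replacement as Perl code, /g repeats it for every match.
    $html =~ s/id="comment"/'id="comment' . ++$n . '"'/ge;
    return $html;
}

print number_comment_ids(qq{<div id="comment">a</div><div id="comment">b</div>\n});
```

The counter lives outside the substitution, so each match sees the next number.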

Corrupted JSON encoding in Perl (missing comma)

My custom Perl code produces the following invalid JSON, with a comma missing between the blocks:
{
"data": [{
"{#LOGFILEPATH}": "/tmp/QRZ2007.tcserverlogs",
"{#LOGFILE}": "QRZ2007"
} **missing comma** {
"{#LOGFILE}": "ARZ2007",
"{#LOGFILEPATH}": "/tmp/ARZ2007.tcserverlogs"
}]
}
My terrible code:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
use utf8;
use JSON;
binmode STDOUT, ":utf8";
my $dir = $ARGV[0];
my $json = JSON->new->utf8->space_after;
opendir(DIR, $dir) or die $!;
print '{"data": [';
while (my $file = readdir(DIR)) {
next unless (-f "$dir/$file");
next unless ($file =~ m/\.tcserverlogs$/);
my $fullPath = "$dir/$file";
my $filenameshort = basename($file, ".tcserverlogs");
my $data_to_json = {"{#LOGFILEPATH}"=>$fullPath,"{#LOGFILE}"=>$filenameshort};
print $json->encode($data_to_json);
}
print ']}'."\n";
closedir(DIR);
exit 0;
Dear team, I am not a programmer. Any idea how to fix it? Thank you!
If you do not print a comma, you will not get a comma.
You are trying to build your own JSON string from pre-encoded pieces of smaller data structures. That will not work unless you tell Perl when to put commas. You could do that, but it's easier to just collect all the data into a Perl data structure that is equivalent to the JSON string you want to produce, and encode the whole thing in one go when you're done.
my $dir = $ARGV[0];
my $json = JSON->new->utf8->space_after;
my @data;
opendir( DIR, $dir ) or die $!;
while ( my $file = readdir(DIR) ) {
    next unless ( -f "$dir/$file" );
    next unless ( $file =~ m/\.tcserverlogs$/ );
    my $fullPath = "$dir/$file";
    my $filenameshort = basename( $file, ".tcserverlogs" );
    my $data_to_json = { "{#LOGFILEPATH}" => $fullPath, "{#LOGFILE}" => $filenameshort };
    push @data, $data_to_json;
}
closedir(DIR);
print $json->encode( { data => \@data } );
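The collect-then-encode idea can be tried without touching a real directory. The sketch below is an assumption-laden variant: it uses the core JSON::PP module instead of JSON, takes the paths as arguments rather than reading a directory, and discovery_json is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use JSON::PP;

# Hypothetical helper: build the whole structure first, then encode it once.
sub discovery_json {
    my @paths = @_;
    my @data = map {
        +{ '{#LOGFILEPATH}' => $_, '{#LOGFILE}' => basename($_, '.tcserverlogs') }
    } @paths;
    # canonical() sorts hash keys so the output is stable between runs.
    return JSON::PP->new->canonical->encode({ data => \@data });
}

print discovery_json('/tmp/QRZ2007.tcserverlogs', '/tmp/ARZ2007.tcserverlogs'), "\n";
```

Because the encoder sees the whole array at once, it places the separators between elements itself.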

How to convert tag names and values from XML into HTML using Perl

Is there any way to convert a simple XML document into HTML using Perl that would give me a table of tag names and tag values?
The XML file output.xml is like this
<?xml version="1.0"?>
<doc>
<GI-eSTB-MIB-NPH>
<eSTBGeneralErrorCode.0>INTEGER: 0</eSTBGeneralErrorCode.0>
<eSTBGeneralConnectedState.0>INTEGER: true(1)</eSTBGeneralConnectedState.0>
<eSTBGeneralPlatformID.0>INTEGER: 2076</eSTBGeneralPlatformID.0>
<eSTBGeneralFamilyID.0>INTEGER: 25</eSTBGeneralFamilyID.0>
<eSTBGeneralModelID.0>INTEGER: 60436</eSTBGeneralModelID.0>
<eSTBMoCAMACAddress.0>STRING: 0:0:0:0:0:0</eSTBMoCAMACAddress.0>
<eSTBMoCANumberOfNodes.0>INTEGER: 0</eSTBMoCANumberOfNodes.0>
</GI-eSTB-MIB-NPH>
</doc>
I am trying to create HTML which looks like this
1. eSTBGeneralPlatformID.0 - INTEGER: 2076
2. eSTBGeneralFamilyID.0 - INTEGER: 25
3.
I was trying to use code from the web but I am really having a hard time understanding how to generate the required format for HTML tags.
What I was trying was this
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;
use XML::LibXML;
#Add TagNumberConversion.pl here
my $parser = XML::Parser->new();
$parser->setHandlers(
Start => \&start,
End => \&end,
Char => \&char,
Proc => \&proc,
);
my $header = &getXHTMLHeader();
print $header;
$parser->parsefile( '20150630104826.xml' );
my $currentTag = "";
sub start() {
my ( $parser, $name, %attr ) = @_;
$currentTag = $name;
if ( $currentTag eq 'doc' ) {
print "<head><title>"
. "Output of snmpwalk for cpeIP4"
. "</title></head>";
print "<body><h2>" . "Output of snmpwalk for cpeIP4" . "</h2>";
print '<table summary="'
. "Output of snmpwalk for cpeIP4"
. '"><tr><th>Tag Name</th><th>Tag Value</th></tr>';
}
elsif ( $currentTag eq 'GI-eSTB-MIB-NPH' ) {
print "<tr>";
}
elsif ( $currentTag =~ /^eSTB/ ) {
print "<tr>";
}
else {
print "<td>";
}
}
sub end() {
my ( $parser, $name, %attr ) = @_;
$currentTag = $name;
if ( $currentTag eq 'doc' ) {
print "</table></body></html>";
}
elsif ( $currentTag eq 'GI-eSTB-MIB-NPH' ) {
print "</tr>";
}
elsif ( $currentTag =~ /^eSTB/ ) {
print "</tr>";
}
else {
print "</td>";
}
}
sub char() {
my ( $parser, $data ) = @_;
print $data;
}
sub proc() {
my ( $parser, $target, $data ) = @_;
if ( lc( $target ) eq 'perl' ) {
$data = eval( $data );
print $data;
}
}
sub getXHTMLHeader() {
my $header = '<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">';
return $header;
}
This is code in progress, but I realize that this will be overkill for my requirement.
So I am trying to figure out if there is any quick way to do it using Perl.
Please give me some pointers if there is indeed any quick way.
The quick and dirty way is to just use a regular expression. However it comes with the risk of missing some data and getting burned by edge cases. But since you asked for it...
#!/usr/bin/env perl
use strict;
open my $fh, '<', 'filename.xml'
or die "unable to open filename.xml : $!";
my $count = 1;
print "<head><title>'Output of snmpwalk for cpeIP4'</title></head>\n";
print "<body><h2>'Output of snmpwalk for cpeIP4'</h2>\n";
print "<table summary='Output of snmpwalk for cpeIP4'><tr><th>Tag Name</th><th>Tag Value</th></tr>\n";
while (my $line = <$fh>) {
next unless $line =~ m|<eSTB|;
# Store into $tag and $value
# the result of matching whitespace, followed by '<'
# followed by anything (store into $tag)
# followed by '>'
# followed by anything (store into $value)
# followed by '<'
my ($tag, $value) = $line =~ m|\s+<(.+?)>(.+?)<|;
print "<tr><td>" . $count++ . ". $tag</td><td>$value</td></tr>\n";
}
print "</table></body></html>\n";
Produces the following:
<head><title>'Output of snmpwalk for cpeIP4'</title></head>
<body><h2>'Output of snmpwalk for cpeIP4'</h2>
<table summary='Output of snmpwalk for cpeIP4'><tr><th>Tag Name</th><th>Tag Value</th></tr>
<tr><td>1. eSTBGeneralErrorCode.0</td><td>INTEGER: 0</td></tr>
<tr><td>2. eSTBGeneralConnectedState.0</td><td>INTEGER: true(1)</td></tr>
<tr><td>3. eSTBGeneralPlatformID.0</td><td>INTEGER: 2076</td></tr>
<tr><td>4. eSTBGeneralFamilyID.0</td><td>INTEGER: 25</td></tr>
<tr><td>5. eSTBGeneralModelID.0</td><td>INTEGER: 60436</td></tr>
<tr><td>6. eSTBMoCAMACAddress.0</td><td>STRING: 0:0:0:0:0:0</td></tr>
<tr><td>7. eSTBMoCANumberOfNodes.0</td><td>INTEGER: 0</td></tr>
</table></body></html>
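To make the edge-case warning concrete, here is a small sketch that applies the same pattern to single lines. extract_pair is a hypothetical helper, and the sample lines are made up to show where the pattern stops matching: it requires whitespace before the opening tag and at least one character of content.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the answer's pattern to one line.
# Returns (tag, value) on a match and an empty list otherwise.
sub extract_pair {
    my ($line) = @_;
    return $line =~ m|\s+<(.+?)>(.+?)<| ? ($1, $2) : ();
}

for my $line (
    '  <eSTBGeneralFamilyID.0>INTEGER: 25</eSTBGeneralFamilyID.0>',  # indented
    '<eSTBGeneralFamilyID.0>INTEGER: 25</eSTBGeneralFamilyID.0>',    # not indented
    '  <eSTBEmpty.0></eSTBEmpty.0>',                                 # no content
) {
    my @pair = extract_pair($line);
    print @pair ? "matched: $pair[0] => $pair[1]\n" : "no match: $line\n";
}
```

Only the first line matches; the other two are skipped, which is the kind of data loss a real XML parser avoids.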
Firstly, I think you're using the wrong tool for this. I always find XML::LibXML far easier to use than XML::Parser. You load XML::LibXML, but you never make use of it.
Secondly, I think you'll find your life is easier if you think of this as two stages - one to extract the data and one to output the new data.
Here's the first stage, which stores the data you need in an array.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use XML::LibXML;
use Data::Dumper;
my $file = shift || die "Must give XML file\n";
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($file);
my @tags;
# Find the nodes using an XPath expression
foreach ($doc->findnodes('//GI-eSTB-MIB-NPH/*')) {
    push @tags, { name => $_->nodeName, content => $_->textContent };
}
# Just here to show the intermediate data structure
say Dumper \@tags;
You then need to use @tags to generate your output. For over fifteen years we've known that it's a terrible idea to include hard-coded HTML in amongst your Perl code, so I'd highly recommend looking at a templating system like the Template Toolkit.
I created a xml.tt file like this:
<html>
<head>
<title>Output of snmpwalk for cpeIP4</title>
</head>
<body><h2>Output of snmpwalk for cpeIP4</h2>
<table summary='Output of snmpwalk for cpeIP4'>
<tr>
<th>Tag Name</th><th>Tag Value</th>
</tr>
[% FOREACH tag IN tags -%]
<tr><td>[% loop.count %]. [% tag.name %]</td><td>[% tag.content %]</td></tr>
[% END -%]
</table>
</body>
</html>
And then the second half of my program looks like this:
use Template;
my $tt = Template->new;
# process() returns false on failure, so report the template error.
$tt->process('xml.tt', { tags => \@tags })
    or die $tt->error, "\n";
I hope you agree that all looks a lot simpler than your approach.
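The two-stage split can be tried without any CPAN modules. In this sketch the array is a hand-written stand-in for what the XPath loop would collect (two entries copied from the sample XML), and numbered_lines is a hypothetical helper that plays the role of the template, producing the plain numbered format from the question.

```perl
use strict;
use warnings;

# Stage 1 stand-in: the kind of array of hashes the XPath loop builds.
my @tags = (
    { name => 'eSTBGeneralPlatformID.0', content => 'INTEGER: 2076' },
    { name => 'eSTBGeneralFamilyID.0',   content => 'INTEGER: 25' },
);

# Stage 2 stand-in (hypothetical helper): number each entry for display.
sub numbered_lines {
    my ($tags) = @_;
    my $i = 0;
    return map { ++$i . ". $_->{name} - $_->{content}" } @$tags;
}

print "$_\n" for numbered_lines(\@tags);
```

Keeping the data in a plain structure means the output stage can be swapped for a template later without changing the extraction code.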

Perl: HTML::PrettyPrinter - Handling self-closing tags

I am a newcomer to Perl (Strawberry Perl v5.12.3 on Windows 7), trying to write a script to aid me with a repetitive HTML formatting task. The files will need to be hand-edited in future and I want them to be human-friendly, so after processing with the HTML packages (HTML::TreeBuilder etc.), I am writing the result to a file using HTML::PrettyPrinter. All of this works well and the output from PrettyPrinter is very nice and human-readable. However, PrettyPrinter is not handling self-closing tags well; basically, it seems to treat the slash as an HTML attribute. With input like:
<img />
PrettyPrinter returns:
<img /="/" >
Is there anything I can do to avoid this other than preprocessing with a regex to remove the slash?
Not sure it will be helpful, but here is my setup for the pretty printing:
my $hpp = HTML::PrettyPrinter->new('linelength' => 120, 'quote_attr' => 1);
$hpp->allow_forced_nl(1);
my $output = new FileHandle ">output.html";
if (defined $output) {
$hpp->select($output);
my $linearray_ref = $hpp->format($internal);
undef $output;
$hpp->select(undef);
}
You can print formatted, human-readable HTML with a TreeBuilder method:
$h = HTML::TreeBuilder->new_from_content($html);
print $h->as_HTML('',"\t");
But if you still prefer this buggy PrettyPrinter, try removing the problem tags; no idea why someone would need ...
$h = HTML::TreeBuilder->new_from_content($html);
while(my $n = $h->look_down(_tag=>'img','src'=>undef)) { $n->delete }
UPD:
Well... then we can fix PrettyPrinter. It's a pure-Perl module, so let's see...
I have no idea where Perl modules live on Windows; for me it's /usr/local/share/perl/5.10.1/HTML/PrettyPrinter.pm
It may not be an elegant solution, but I hope it will work.
This sub parses the attribute/value pairs; with a small fix it will add a single '/' at the end.
It is around line 756 in PrettyPrinter.pm.
I've marked the lines that I added with ###<<<<<< at the end.
#
# format the attributes
#
sub _attributes {
    my ($self, $e) = @_;
    my @result = (); # list of ATTR="value" strings to return
    my $self_closing = 0; ###<<<<<<
    my @attrs = $e->all_external_attr(); # list (name0, val0, name1, val1, ...)
    while (@attrs) {
        my ($a,$v) = (shift @attrs,shift @attrs); # get current name, value pair
        if($a eq '/') { ###<<<<<<
            $self_closing=1; ###<<<<<<
            next; ###<<<<<<
        } ###<<<<<<
        # string for output: 1. attribute name
        my $s = $self->uppercase ? "\U$a" : $a;
        # value part, skip for boolean attributes if desired
        unless ($a eq lc($v) &&
                $self->min_bool_attr &&
                exists($HTML::Tagset::boolean_attr{$e->tag}) &&
                (ref($HTML::Tagset::boolean_attr{$e->tag})
                 ? $HTML::Tagset::boolean_attr{$e->tag}{$a}
                 : $HTML::Tagset::boolean_attr{$e->tag} eq $a)) {
            my $q = '';
            # quote value?
            if ($self->quote_attr || $v =~ tr/a-zA-Z0-9.-//c) {
                # use single quote if value contains double quotes but no single quotes
                $q = ($v =~ tr/"// && $v !~ tr/'//) ? "'" : '"'; # catch emacs ");
            }
            # add value part
            $s .= '='.$q.(encode_entities($v,$q.$self->entities)).$q;
        }
        # add string to resulting list
        push @result, $s;
    }
    push @result,'/' if $self_closing; ###<<<<<<
    return @result; # return list ('attr="val"','attr="val"',...);
}
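The effect of the added lines can be seen in isolation with a simplified stand-in for the loop. This is not the module's code: format_attrs is a hypothetical helper that always double-quotes values and skips entity encoding and boolean-attribute handling, and the sample attributes are made up.

```perl
use strict;
use warnings;

# Simplified stand-in for the patched loop.
# Takes a flat list (name0, val0, name1, val1, ...).
sub format_attrs {
    my @attrs = @_;
    my @result;
    my $self_closing = 0;
    while (@attrs) {
        my ($name, $value) = (shift @attrs, shift @attrs);
        # A '/' "attribute" only marks the tag as self-closing.
        if ($name eq '/') {
            $self_closing = 1;
            next;
        }
        push @result, qq{$name="$value"};
    }
    # Emit the slash once, after all real attributes.
    push @result, '/' if $self_closing;
    return @result;
}

print join(' ', '<img', format_attrs(src => 'a.png', '/' => '/')), ">\n";
```

The slash is remembered as a flag rather than formatted as a name/value pair, which is what keeps /="/" out of the output.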

How can I merge CSS definitions in files into inline style attributes, using Perl?

Many email clients don't like linked CSS stylesheets, or even the embedded <style> tag, but rather want the CSS to appear inline as style attributes on all your markup.
BAD: <link rel=stylesheet type="text/css" href="/style.css">
BAD: <style type="text/css">...</style>
WORKS: <h1 style="margin: 0">...</h1>
However this inline style attribute approach is a right pain to manage.
I've found tools for Ruby and PHP that will take a CSS file and some separate markup as input and return you the merged result - a single file of markup with all the CSS converted to style attributes.
I'm looking for a Perl solution to this problem, but I've not found one on CPAN or by searching Google. Any pointers? Alternatively, are there CPAN modules one could combine to achieve the same result?
Ruby http://premailer.dialect.ca/
PHP http://www.pelagodesign.com/sidecar/emogrifier/
Perl ?
I do not know of a complete, pre-packaged solution.
CSS::DOM's compute_style is subject to pretty much the same caveats as emogrifier above. That module, in conjunction with HTML::TokeParser ought to be usable to cook up something.
Update: Here is a buggy mish-mash of things:
#!/usr/bin/perl
use strict;
use warnings;
use CSS::DOM;
use File::Slurp;
use HTML::DOM;
use HTML::TokeParser;
die "convert html_file css_file" unless @ARGV == 2;
my ($html_file, $css_file) = @ARGV;
my $html_parser = HTML::TokeParser->new($html_file)
or die "Cannot open '$html_file': $!";
my $sheet = CSS::DOM::parse( scalar read_file $css_file );
while ( my $token = $html_parser->get_token ) {
my $type = $token->[0];
my $text = $type eq 'T' ? $token->[1] : $token->[-1];
if ( $type eq 'S' ) {
unless ( skip( $token->[1] ) ) {
$text = insert_computed_style($sheet, $token);
}
}
print $text;
}
sub insert_computed_style {
my ($sheet, $token) = @_;
my ($tag, $attr, $attrseq) = @$token[1 .. 3];
my $doc = HTML::DOM->new;
my $element = $doc->createElement($tag);
for my $attr_name ( @$attrseq ) {
$element->setAttribute($attr_name, $attr->{$attr_name});
}
my $style = CSS::DOM::compute_style(
element => $element, user_sheet => $sheet
);
my @attrseq = (style => grep { lc $_ ne 'style' } @$attrseq );
$attr->{style} = $style->cssText;
my $text = join(" ",
"<$tag",
map{ qq/$_='$attr->{$_}'/ } @attrseq );
$text .= '>';
return $text;
}
sub skip {
my ($tag) = @_;
$tag = lc $tag;
return 1 if $tag =~ /^(?:h(?:ead|tml)|link|meta|script|title)$/;
}
You can use the CPAN module CSS::Inliner: https://metacpan.org/release/CSS-Inliner