How can I convert a file to an HTML table using Perl?

I am trying to write a simple Perl CGI script that:
runs a CLI script
reads the resulting .out file and converts the data in the file to an HTML table.
Here is some sample data from the .out file:
10.255.202.1 2472327594 1720341
10.255.202.21 2161941840 1484352
10.255.200.0 1642646268 1163742
10.255.200.96 1489876452 1023546
10.255.200.26 1289738466 927513
10.255.202.18 1028316222 706959
10.255.200.36 955477836 703926
Any help would be much appreciated.

The following is untested and probably needs a lot of polishing
but it gives a rough idea:
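use strict;
use warnings;
use CGI qw/:standard *table/;
print
    header,
    start_html('clicommand results'),
    start_table;
# Run the command and read its output line by line.
open(my $csvh, '-|', 'clicommand') or die "Cannot run clicommand: $!";
while (<$csvh>) {
    # One table row per line, one cell per whitespace-separated field.
    print Tr(map { td($_) } split);
}
close($csvh);
print
    end_table,
    end_html;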

This doesn't directly answer your question, but is it possible to use AWK instead? It shouldn't be too difficult to wrap the whole content, then each column entry with the appropriate html tags to create a basic table.

You'll very likely want to make the HTML prettier by using a CSS stylesheet or adding borders to the table, but here's a simple start.
#!/usr/bin/perl
use strict;
use warnings;
my $output = `cat dat`;
my @lines = split /\n/, $output;
my @data;
foreach my $line (@lines) {
    chomp $line;
    my @d = split /\s+/, $line;
    push @data, \@d;
}
print <<HEADER;
<html>
<table>
HEADER
foreach my $d (@data) {
    print "\t", "<tr>";
    print map { "<td>$_</td>" } @$d;
    print "</tr>", "\n";
}
print <<FOOTER;
</table>
</html>
FOOTER
This makes the following output:
<html>
<table>
<tr><td>10.255.202.1</td><td>2472327594</td><td>1720341</td></tr>
<tr><td>10.255.202.21</td><td>2161941840</td><td>1484352</td></tr>
<tr><td>10.255.200.0</td><td>1642646268</td><td>1163742</td></tr>
<tr><td>10.255.200.96</td><td>1489876452</td><td>1023546</td></tr>
<tr><td>10.255.200.26</td><td>1289738466</td><td>927513</td></tr>
<tr><td>10.255.202.18</td><td>1028316222</td><td>706959</td></tr>
<tr><td>10.255.200.36</td><td>955477836</td><td>703926</td></tr>
</table>
</html>
To understand how to modify the look of your HTML tables, the w3schools website entry on the table tag is a good start.
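One caveat about the script above: it interpolates each field straight into the markup. That is harmless for IP addresses and counters, but data containing <, > or & would need escaping first. The following is only a minimal sketch of that idea; escape_html and lines_to_rows are hypothetical helper names made up for this illustration, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: turn whitespace-separated lines into <tr> strings.
sub lines_to_rows {
    my (@lines) = @_;
    my @rows;
    for my $line (@lines) {
        my @fields = split ' ', $line;
        next unless @fields;    # skip blank lines
        my $cells = join '', map { '<td>' . escape_html($_) . '</td>' } @fields;
        push @rows, "<tr>$cells</tr>";
    }
    return @rows;
}

# Two lines taken from the sample data in the question.
my @sample = (
    '10.255.202.1 2472327594 1720341',
    '10.255.202.21 2161941840 1484352',
);
print "<table>\n";
print "\t$_\n" for lines_to_rows(@sample);
print "</table>\n";
```

In a real script the lines would come from the .out file or the command output rather than from a hard-coded list.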

Related

as_html in HTML::TagParser

I'm working in Perl.
I would like to ask whether HTML::TagParser has something like
$value->as_html()
from HTML::TreeBuilder.
I extracted the tag I needed with HTML::TagParser, but now the only option is:
$value->innerText();
which gives me only the text, without the HTML tags.
Or can I somehow combine the result from HTML::TagParser with HTML::TreeBuilder and get my HTML tags that way?
The HTML::TagParser does not only read the element content. It also keeps the element name and the attribute key/value pairs for each selected element. Therefore you can easily reproduce the complete HTML code of the element.
Actually, the HTML::TagParser CPAN page contains an example for this: The following code extracts all <a>nchor tags from a web page and reproduces them into an HTML fragment listing precisely these tags.
use HTML::TagParser;
my $url = 'http://www.kawa.net/xp/index-e.html';
my $html = HTML::TagParser->new( $url );
my @list = $html->getElementsByTagName( "a" );
foreach my $elem ( @list ) {
    my $tagname = $elem->tagName;
    my $attr = $elem->attributes;
    my $text = $elem->innerText;
    # Rebuild the opening tag from its name and attributes.
    print "<$tagname";
    foreach my $key ( sort keys %$attr ) {
        print " $key=\"$attr->{$key}\"";
    }
    if ( $text eq "" ) {
        print " />\n";
    } else {
        print ">$text</$tagname>\n";
    }
}
This works pretty well for simple element scanning. For more complex tasks (e.g. mixed inner HTML content) I would prefer to work with HTML::Parser.

MODX MIGX data in chunk

I have created a table similar to this one:
http://rtfm.modx.com/display/ADDON/MIGX.Simple+opening+hours+table
I have successfully exported the data to a resource, but I want to show it in a chunk so that I can display it with getResources.
I use getResources to display resources, and besides the title and intro text I would like to show the datesTV data.
I use this template chunk for MIGX:
[[+date:notempty=`<td>[[+date:strtotime:date=`%d.%m.%Y, %H.%M`]]</td>`:default=`<td colspan="2">No show!</td>`]]
If I use [[+tv.datesTV]] in another chunk for getResources, I get this array out:
[{"MIGX_id":"1","date":"2012-10-28 21:00:00"},{"MIGX_id":"2","date":"2012-10-28 01:45:00"},{"MIGX_id":"3","date":"2012-10-30 02:45:00"}]
How can I display this data properly in a chunk?
OK, here you can see what my snippet looks like:
<?php
$strJSON = $modx->resource->getTVValue('spored');
$arrJSON = $modx->fromJSON($strJSON);
foreach($arrJSON as $arrJSONDataSet)
{
foreach($arrJSONDataSet as $key => $value)
{
echo $key . ' => ';
echo $value;
echo '<br />';
}
}
With MIGX you need a snippet to parse and format the raw TV data as it's stored as JSON.
For a rough example of how to do this, refer back to the link you mentioned and try the getImageList snippet:
http://rtfm.modx.com/display/ADDON/MIGX.Simple+opening+hours+table#MIGX.Simpleopeninghourstable-ParsingtheData
You'll need to include that snippet call in your getResources chunk which is going to be really inefficient; it would be better to code up a custom snippet to retrieve the necessary data.
But see how that goes first...

Perl parse links from HTML Table

I'm trying to get links from a table in HTML. Using HTML::TableExtract, I'm able to parse the table and get the text (i.e. Ability, Abnormal in the example below), but I cannot get the links contained in the table. For example,
<table id="AlphabetTable">
<tr>
<td>
Ability <span class="count">2650</span>
</td>
<td>
Abnormal <span class="count">26</span>
</td>
</table>
Is there a way to get the links using HTML::TableExtract, or another module that could be used in this situation? Thanks.
part of my code:
$mech->get($link->url());
$te->parse($mech->content);
foreach $ts ($te->tables){
    foreach $row ($ts->rows){
        print @$row[0]; # it only prints the text part
        # but I want its link
    }
}
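Use HTML::LinkExtor, passing the extracted cell content to its parse method. Note that this only finds links if the cell content still contains its HTML, e.g. with keep_html enabled in HTML::TableExtract.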
my $le = HTML::LinkExtor->new();
foreach $ts ($te->tables){
    foreach $row ($ts->rows){
        $le->parse($row->[0]);
        for my $link_tag ( $le->links ) {
            my ($tag, %links) = @$link_tag;
            # next if $tag ne 'a'; # exclude other kinds of links?
            print for values %links;
        }
    }
}
Use the keep_html option in the constructor. From the HTML::TableExtract documentation:
keep_html
Return the raw HTML contained in the cell, rather than just the visible text. Embedded tables are not retained in the HTML extracted from a cell. Patterns for header matches must take into account HTML in the string if this option is enabled. This option has no effect if extracting into an element tree structure.
$te = HTML::TableExtract->new( keep_html => 1, headers => [qw(field1 ... fieldN)]);

How can I reliably parse a QuakeLive player profile using Perl?

I'm currently working on a Perl script to gather data from the QuakeLive website.
Everything was going fine until I couldn't get a set of data.
I was using regexes for that and they work for everything apart from the favourite arena, weapon and game type. I just need to get the names of those three elements in a $1 for further processing.
I tried regexing up to the favorites image, but without succeeding. If it's any use, I'm already using WWW::Mechanize in the script.
I think that the problem could be related to the class name of the paragraphs where those elements are, while the previous one was classless.
You can find an example profile HERE.
Note that for the previous part of the page, it worked using code like:
$content =~ /<b>Wins:<\/b> (.*?)<br \/>/;
$wins = $1;
print "Wins: $wins\n";
The immediate problem is that you have:
<p class="prf_faves">
<img src="http://cdn.quakelive.com/web/2010092807/images/profile/none_v2010092807.0.gif"
width="17" height="17" alt="" class="fl fivepxhr" />
<b>Arena:</b> Campgrounds
<div class="cl"></div>
</p>
That is, there is no <br /> following the value for favorites such as Arena. Now, the correct way to do this would involve using a proper HTML parser. The fragile solution is to adapt your pattern (untested):
my ($favarena) = $content =~ m{<b>Arena:</b> ([^<]+)};
That should put everything up to the < of the next <div> in $favarena. Now, if all arenas are single words with no spaces in them,
my ($favarena) = $content =~ m{<b>Arena:</b> (\S+)};
would save you the trouble of having to trim whitespace afterwards.
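To make the difference between the two patterns concrete, here is a small self-contained sketch that runs both against a value containing a space. The "Game Type" paragraph is an assumption: it is modelled on the Arena markup above, on the guess that the other favorites use the same structure.

```perl
use strict;
use warnings;

# Assumed markup, mirroring the Arena paragraph shown above.
my $content = <<'HTML';
<p class="prf_faves">
<b>Game Type:</b> Clan Arena
<div class="cl"></div>
</p>
HTML

# [^<]+ captures everything up to the next tag, trailing newline included.
my ($loose) = $content =~ m{<b>Game Type:</b> ([^<]+)};
(my $trimmed = $loose) =~ s/\s+\z//;

# \S+ stops at the first whitespace, so a multi-word value is cut short.
my ($strict) = $content =~ m{<b>Game Type:</b> (\S+)};

print "[$trimmed]\n";    # [Clan Arena]
print "[$strict]\n";     # [Clan]
```

So the \S+ variant is only safe when every possible value is a single word; otherwise the [^<]+ variant plus a trim is the less surprising choice.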
Note that it is easy for such regex based solutions to be fooled with simple things like commented out snippets in the source. E.g., if the source were to be changed to:
<p class="prf_faves">
<img src="http://cdn.quakelive.com/web/2010092807/images/profile/none_v2010092807.0.gif"
width="17" height="17" alt="" class="fl fivepxhr" />
<!-- <b>Arena: </b> here -->
<b>Arena:</b> Campgrounds
<div class="cl"></div>
</p>
your script would be in trouble, whereas a solution using an HTML parser would not.
An example using HTML::TokeParser::Simple:
#!/usr/bin/perl
use strict; use warnings;
use HTML::TokeParser::Simple;
my $p = HTML::TokeParser::Simple->new( 'martianbuddy.html' );
while ( my $tag = $p->get_tag('p') ) {
next unless $tag->is_start_tag;
next unless defined (my $class = $tag->get_attr('class'));
next unless grep { /^prf_faves\z/ } split ' ', $class;
my $fav = $p->get_tag('b');
my $type = $p->get_text('/b');
my $value = $p->get_text('/p');
$value =~ s/\s+\z//;
print "$type = $value\n";
}
Output:
Arena: Campgrounds
Game Type: Clan Arena
Weapon: Rocket Launcher
And, here is an example using HTML::TreeBuilder:
#!/usr/bin/perl
use strict; use warnings;
use HTML::TreeBuilder;
use YAML;
my $tree = HTML::TreeBuilder->new;
$tree->parse_file('martianbuddy.html');
my @p = $tree->look_down(_tag => 'p', sub {
        return unless defined (my $class = $_[0]->attr('class'));
        return unless grep { /^prf_faves\z/ } split ' ', $class;
        return 1;
    }
);
for my $p ( @p ) {
    my $text = $p->as_text;
    $text =~ s/^\s+//;
    my ($type, $value) = split ': ', $text;
    print "$type: $value\n";
}
Output:
Arena: Campgrounds
Game Type: Clan Arena
Weapon: Rocket Launcher
Given that the document is an HTML fragment rather than a full document, you will have more success with modules based on HTML::Parser rather than those that expect to operate on well-formed XML documents.
Using regular expressions for this particular task is less than ideal. There are just too many things that might change, and you're not taking advantage of inherent structure of HTML pages. Have you considered using something like HTML::TreeBuilder instead? It will allow you to say "get me the value of the 3rd table cell in the table named weapons", etc.

Why doesn't the match operator match anything?

I'm trying to parse this HTML block:
<div class="v120WrapperInner"><a href="/redirect?q=http%3A%2F%2Fwww.google.com%2Faclk%3Fsa%3DL%26ai%3DCKJh--O7tSsCVIKeyoQTwiYmRA5SnrIsB1szYhg2d2J_EAhABIJ7rxQ4oA1CLk676B2DJntmGyKOQGcgBAaoEFk_Qyu5ipY7edN5ETLuchKUCHbY4SA#0%26num%3D1%26sig%3DAGiWqtwtAf8NslosN7AuHb7qC7RviHVg7A%26q%3Dhttp%3A%2F%2Fwww.youtube.com%2Fwatch%253Fv%253D91sYT_8CN8Q%2526feature%253Dpyv%2526ad%253D3409309746%2526kw%253Dsusan%25252#0boyle&adtype=pyv&event=ad&usg=bR7ErKA_3szWtQMGe2lt1dpxzHc=" title="The Valley Downs Chicago"><img class="vimg120" alt="The Valley Downs Chicago" src="http://i2.ytimg.com/vi/91sYT_8CN8Q/1.jpg">
to capture the redirect link:
/redirect?q=http%3A%2F%2Fwww.google.com%2Faclk%3Fsa%3DL%26ai%3DCKJh--O7tSsCVIKeyoQTwiYmRA5SnrIsB1szYhg2d2J_EAhABIJ7rxQ4oA1CLk676B2DJntmGyKOQGcgBAaoEFk_Qyu5ipY7edN5ETLuchKUCHbY4SA#0%26num%3D1%26sig%3DAGiWqtwtAf8NslosN7AuHb7qC7RviHVg7A%26q%3Dhttp%3A%2F%2Fwww.youtube.com%2Fwatch%253Fv%253D91sYT_8CN8Q%2526feature%253Dpyv%2526ad%253D3409309746%2526kw%253Dsusan%25252#0boyle&adtype=pyv&event=ad&usg=bR7ErKA_3szWtQMGe2lt1dpxzHc=
and video title:
The Valley Downs Chicago
When I use this simple Perl code:
foreach $_ (@promotedVideos)
{
    if (/\s<div class="v120WrapperInner"><a href="([^"]*)" title="([^"]*)"><img/six)
    {
        print $1;
        print $2;
    }
}
nothing prints. While I'm troubleshooting this, I thought I'd ask you the experts if you see anything wrong or problematic. Thanks so much in advance for your help!
Your /x regex modifier is the problem: it changes how whitespace in the pattern is treated. Remove it.
That is, it should be
if (/\s<div class="v120WrapperInner"><a href="([^"]*)" title="([^"]*)"><img/si)
/x makes Perl ignore unescaped whitespace inside the regex, making your regex equivalent to the following:
/\s<divclass="v120WrapperInner"><ahref="([^"]*)"title="([^"]*)"><img/six
which will not match.
The \s at the beginning may also break things, because it requires a whitespace character immediately before <div.
This is the code I've used for testing:
use strict;
my $inp = '<div class="v120WrapperInner"><a href="/redirect?q=http%3A%2F%2Fwww.google.com%2Faclk%3Fsa%3DL%26ai%3DCKJh--O7tSsCVIKeyoQTwiYmRA5SnrIsB1szYhg2d2J_EAhABIJ7rxQ4oA1CLk676B2DJntmGyKOQGcgBAaoEFk_Qyu5ipY7edN5ETLuchKUCHbY4SA#0%26num%3D1%26sig%3DAGiWqtwtAf8NslosN7AuHb7qC7RviHVg7A%26q%3Dhttp%3A%2F%2Fwww.youtube.com%2Fwatch%253Fv%253D91sYT_8CN8Q%2526feature%253Dpyv%2526ad%253D3409309746%2526kw%253Dsusan%25252#0boyle&adtype=pyv&event=ad&usg=bR7ErKA_3szWtQMGe2lt1dpxzHc=" title="The Valley Downs Chicago"><img class="vimg120" alt="The Valley Downs Chicago" src="http://i2.ytimg.com/vi/91sYT_8CN8Q/1.jpg">';
print "$inp\n";
if ( $inp =~ /<div class="v120WrapperInner"><a href="([^"]*)" title="([^"]*)"><img/si )
{
print "m:\n$1\n$2\n";
}
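To see the effect of /x in isolation, here is a minimal sketch that tries the same pattern with and without the modifier, and then a /x version in which the whitespace is spelled out as \s+. The input is a shortened stand-in for the HTML in the question; its href value is made up for the example.

```perl
use strict;
use warnings;

# Shortened stand-in for the HTML in the question (the href is made up).
my $html = '<div class="v120WrapperInner"><a href="/redirect?q=example" '
         . 'title="The Valley Downs Chicago"><img class="vimg120">';

# With /x the literal spaces in the pattern are ignored, so this fails.
my $with_x = $html =~ /<div class="v120WrapperInner"><a href="([^"]*)" title="([^"]*)"><img/six
    ? 'match' : 'no match';

# Without /x the spaces are matched literally.
my $without_x = $html =~ /<div class="v120WrapperInner"><a href="([^"]*)" title="([^"]*)"><img/si
    ? 'match' : 'no match';

# If /x is wanted for readability, write the whitespace explicitly as \s+.
my ($href, $title) = $html =~ m{
    <div \s+ class="v120WrapperInner">
    <a   \s+ href="([^"]*)" \s+ title="([^"]*)">
    <img
}six;

print "with /x:    $with_x\n";       # no match
print "without /x: $without_x\n";    # match
print "explicit:   $href | $title\n";
```

The third form keeps the layout benefits of /x while still requiring whitespace between the tag name and its attributes.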
Okay, this is not exactly what you are asking, but I think (based on this and your older question) that you are parsing HTML.
Let me tell you this: regexes aren't the solution. You should use HTML::TreeBuilder to parse HTML documents, because HTML documents are horribly messy.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
# Read the HTML from the script's __DATA__ section (not shown here).
my $root = HTML::TreeBuilder->new_from_file(\*DATA);
foreach my $div ($root->find_by_tag_name('div')) {
    my $class = $div->attr('class');
    next unless defined $class && $class eq 'v120WrapperInner';
    foreach my $link ($div->find_by_tag_name('a')) {
        print "m:\n", $link->attr('href'), "\n", $link->attr('title'), "\n";
    }
}
It is good that you are gaining experience with regexes in Perl, but for this type of work you might consider using a DOM parser like XML::DOM.
G'day,
If you're having problems understanding regexps, can I suggest having a read of the regexp intro in Dale Dougherty's excellent book "sed & awk"?
It is definitely one of the best intros to regexps around.
HTH
cheers,