Perl CGI Functional Programming - generating links - html

I have a Perl CGI script. I'm trying to generate hyperlinks on an HTML page using all the elements in an array. I'm using functional CGI style programming. Here's a minimal representation of my code:
#!/usr/bin/perl
use strict; use warnings;
use CGI qw( :standard);
print header;
print start_html(
-title => 'Get LINK!'
);
my %HoA = (
    'foo' => [ '12', '23', '593' ],
    'bam' => [ '232', '65' ],
);
my @array = ("foo", "bam");
foreach my $i (@array) {
    foreach my $j (@{$HoA{$i}}) {
        my $link = get_link($i);
        print "<a href="$link">$i</a>"."\t"; # this doesn't work!!
    }
}
#-----------------------------------
# this subroutine works!
sub get_link {
    my $id = $HoA{$_[0]};
    my $link = 'http://www.example.com/'.$id;
    return $link;
}
#------------------------------------
Desired Output
foo foo foo bam bam
Any help or suggestions are appreciated.

print "<a href="$link">$i</a>"."\t"; # this doesn't work!!
That's because the quotes inside your quoted string end the string early. You need to escape them:
print "<a href=\"$link\">$i</a>"."\t";
Single quotes work on the inside, too:
print "<a href='$link'>$i</a>"."\t";
As per the comment, qq-style quoting is also available to you, so you can use double quotes and variable interpolation at the same time without the escape characters.
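The three fixes side by side, as a small runnable sketch (the $link and $i names are borrowed from the question; the URL is made up):

```perl
my $link = 'http://www.example.com/foo';
my $i    = 'foo';

# 1. Escape the inner double quotes
print "<a href=\"$link\">$i</a>\t";

# 2. Use single quotes inside the double-quoted string
print "<a href='$link'>$i</a>\t";

# 3. qq{} picks a different delimiter, so inner double quotes need no escaping
print qq{<a href="$link">$i</a>\t};
```

All three print the same anchor tag; qq{} is usually the most readable once a string contains several double quotes.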

There are a few things that don't make much sense.
The code below doesn't work because $link is built from a ref. Also, you're not using $j anywhere, which should give you a sign that there's a bit of a "design" issue.
foreach my $i (@array) {
    foreach my $j (@{$HoA{$i}}) {
        my $link = get_link($i);
        print "<a href="$link">$i</a>"."\t"; # this doesn't work!!
    }
}
Why can't this be rewritten like so:
for my $array_ref (keys %HoA) {
    for my $item ( @{ $HoA{$array_ref} } ) {
        my $link = get_link($item);
        # print ....
    }
}
How does this sub "work"? $id is an array reference here. Do you just mean to shift onto $id?
#-----------------------------------
# this subroutine works!
sub get_link {
    # my $id = $HoA{$_[0]};
    # with the refactored double for loop, or the way you had it if you fix the
    # `ref` issue, a `shift` should do the trick.
    my $id = shift;
    my $link = 'http://www.google.com/'.$id;
    return $link;
}
#------------------------------------
You also need to escape the quotes in the print statement:
print "<a href=\"$link\">$i</a>"."\t";

Here is another way you can do this ...
use strict;
use warnings;
use CGI qw(:standard);
print header,
start_html(-title => 'Get LINK!');
my %HoA = (
    foo => [ qw(12 23 593) ],
    bam => [ qw(232 65) ],
);
foreach my $i ( reverse sort keys %HoA ) {
    foreach ( @{$HoA{$i}} ) {
        print a({-href   => 'http://www.google.com/'.$_,
                 -target => '_blank'}, $i) . "\n";
    }
}
Output:
foo foo foo bam bam

Related

Corrupted JSON encoding in Perl (missing comma)

My custom code (in Perl) gives the following wrong JSON, with a missing comma between blocks:
{
"data": [{
"{#LOGFILEPATH}": "/tmp/QRZ2007.tcserverlogs",
"{#LOGFILE}": "QRZ2007"
} **missing comma** {
"{#LOGFILE}": "ARZ2007",
"{#LOGFILEPATH}": "/tmp/ARZ2007.tcserverlogs"
}]
}
My terrible code:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
use utf8;
use JSON;
binmode STDOUT, ":utf8";
my $dir = $ARGV[0];
my $json = JSON->new->utf8->space_after;
opendir(DIR, $dir) or die $!;
print '{"data": [';
while (my $file = readdir(DIR)) {
    next unless (-f "$dir/$file");
    next unless ($file =~ m/\.tcserverlogs$/);
    my $fullPath = "$dir/$file";
    my $filenameshort = basename($file, ".tcserverlogs");
    my $data_to_json = {"{#LOGFILEPATH}"=>$fullPath,"{#LOGFILE}"=>$filenameshort};
    print $json->encode($data_to_json);
}
print ']}'."\n";
closedir(DIR);
exit 0;
Dear team, I am not a programmer. Any idea how to fix it? Thank you!
If you do not print a comma, you will not get a comma.
You are trying to build your own JSON string from pre-encoded pieces of smaller data structures. That will not work unless you tell Perl when to put commas. You could do that, but it's easier to just collect all the data into a Perl data structure that is equivalent to the JSON string you want to produce, and encode the whole thing in one go when you're done.
my $dir = $ARGV[0];
my $json = JSON->new->utf8->space_after;
my @data;
opendir( DIR, $dir ) or die $!;
while ( my $file = readdir(DIR) ) {
    next unless ( -f "$dir/$file" );
    next unless ( $file =~ m/\.tcserverlogs$/ );
    my $fullPath = "$dir/$file";
    my $filenameshort = basename( $file, ".tcserverlogs" );
    my $data_to_json = { "{#LOGFILEPATH}" => $fullPath, "{#LOGFILE}" => $filenameshort };
    push @data, $data_to_json;
}
closedir(DIR);
print $json->encode( { data => \@data } );

Perl XML2JSON : How to preserve XML element order?

I have a configuration file which is in XML format. I need to parse the XML and convert it to JSON. I'm able to convert it with the XML2JSON module of Perl, but the problem is that it does not maintain the order of the XML elements. I strictly need the elements in order, otherwise I cannot configure the device correctly.
My XML file is something like this. I have to configure an IP address and set that IP as the gateway for a certain route.
<Config>
<ip>
<address>1.1.1.1</address>
<netmask>255.255.255.0</netmask>
</ip>
<route>
<network>20.20.20.0</network>
<netmask>55.255.255.0</netmask>
<gateway>1.1.1.1</gateway>
</route>
</Config>
This is my perl code to convert to JSON
my $file = 'config.xml';
use Data::Dumper;
open my $fh, '<',$file or die;
$/ = undef;
my $data = <$fh>;
my $XML = $data;
my $XML2JSON = XML::XML2JSON->new();
my $Obj = $XML2JSON->xml2obj($XML);
print Dumper($Obj);
The output I'm getting is,
$VAR1 = {'Config' => {'route' => {'netmask' => {'$t' => '55.255.255.0'},'gateway' => {'$t' => '1.1.1.1'},'network' => {'$t' => '20.20.20.0'}},'ip' => {'netmask' => {'$t' => '255.255.255.0'},'address' => {'$t' => '1.1.1.1'}}},'#encoding' => 'UTF-8','#version' => '1.0'};
I have a script which reads the JSON object and applies the configuration.
But it fails, as it first tries to set the gateway IP address on a route before that IP address has been configured, and only then adds the IP address.
I strictly want the key ip to come first and then route for proper configuration without error. I have many dependencies like this where the order of keys is a must.
Is there any way I can tackle this problem? I tried almost all the XML parsing modules, like XML::Simple, XML::Twig and XML::Parser, but nothing helped.
Here's a program that I hacked together that uses XML::Parser to parse some XML data and generate the equivalent JSON in the same order. It ignores any attributes, processing instructions etc., and requires that every XML element must contain either a list of child elements or a text node. Mixing text and elements won't work, and this isn't checked except that the program will die trying to dereference a string.
It's intended to be a framework for you to enhance as you require, but works fine as it stands with the XML data you show in your question
use strict;
use warnings 'all';
use XML::Parser;
my $parser = XML::Parser->new(Handlers => {
    Start => \&start_tag,
    End   => \&end_tag,
    Char  => \&text,
});
my $struct;
my @stack;
$parser->parsefile('config.xml');
print_json($struct->[1]);

sub start_tag {
    my $expat = shift;
    my ($tag, %attr) = @_;
    my $elem = [ $tag => [] ];
    if ( $struct ) {
        my $content = $stack[-1][1];
        push @{ $content }, $elem;
    }
    else {
        $struct = $elem;
    }
    push @stack, $elem;
}

sub end_tag {
    my $expat = shift;
    my ($elem) = @_;
    die "$elem <=> $stack[-1][0]" unless $stack[-1][0] eq $elem;
    for my $content ( $stack[-1][1] ) {
        $content = "@$content" unless grep ref, @$content;
    }
    pop @stack;
}

sub text {
    my $expat = shift;
    my ($string) = @_;
    return unless $string =~ /\S/;
    $string =~ s/\A\s+//;
    $string =~ s/\s+\z//;
    push @{ $stack[-1][1] }, $string;
}

sub print_json {
    my ($data, $indent, $comma) = (@_, 0, '');
    print "{\n";
    for my $i ( 0 .. $#$data ) {
        # Note that $data, $indent and $comma are overridden here
        # to reflect the inner context
        #
        my $elem   = $data->[$i];
        my $comma  = $i < $#$data ? ',' : '';
        my ($tag, $data) = @$elem;
        my $indent = $indent + 1;
        printf qq{%s"%s" : }, ' ' x $indent, $tag;
        if ( ref $data ) {
            print_json($data, $indent, $comma);
        }
        else {
            printf qq{"%s"%s\n}, $data, $comma;
        }
    }
    # $indent and $comma (and $data) are restored here
    #
    printf "%s}%s\n", ' ' x $indent, $comma;
}
output
{
"ip" : {
"address" : "1.1.1.1",
"netmask" : "255.255.255.0"
},
"route" : {
"network" : "20.20.20.0",
"netmask" : "55.255.255.0",
"gateway" : "1.1.1.1"
}
}
The problem isn't so much to do with XML parsing; it's that Perl hashes are not ordered, so when you write some JSON it can come out in any order.
The way to avoid this is to apply a sort function to your JSON.
You can do this by using sort_by to explicitly sort:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
use JSON::PP;
use Data::Dumper;
sub order_nodes {
    my %rank_of = ( ip => 0, route => 1, address => 2, network => 3, netmask => 4, gateway => 5 );
    print "$JSON::PP::a <=> $JSON::PP::b\n";
    return $rank_of{$JSON::PP::a} <=> $rank_of{$JSON::PP::b};
}
my $twig = XML::Twig -> parse (\*DATA);
my $json = JSON::PP -> new;
$json ->sort_by ( \&order_nodes );
print $json -> encode( $twig -> simplify );
__DATA__
<Config>
<ip>
<address>1.1.1.1</address>
<netmask>255.255.255.0</netmask>
</ip>
<route>
<network>20.20.20.0</network>
<netmask>55.255.255.0</netmask>
<gateway>1.1.1.1</gateway>
</route>
</Config>
In some scenarios, setting canonical can help, as that sets ordering to lexical order. (And means your JSON output would be consistently ordered). This doesn't apply to your case.
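For completeness, enabling canonical ordering is a single method call. This is illustrative only (with made-up data), since lexical order is not the order the question needs:

```perl
use strict;
use warnings;
use JSON::PP;

# canonical() sorts hash keys lexically on every encode,
# so the output is stable from run to run
my $json = JSON::PP->new->canonical;
print $json->encode( { route => {}, ip => {}, address => '1.1.1.1' } ), "\n";
# prints {"address":"1.1.1.1","ip":{},"route":{}}
```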
You could build the node ordering via XML::Twig, either by an xpath expression, or by using twig_handlers. I gave it a quick go, but got slightly unstuck in figuring out how you'd 'tell' how to figure out ordering based on getting address/netmask and then network/netmask/gateway.
As a simple example you could:
my $count = 0;
foreach my $node ( $twig->get_xpath('./*') ) {
    $rank_of{ $node->tag } = $count++ unless $rank_of{ $node->tag };
}
print Dumper \%rank_of;
This will ensure ip and route are always the right way around. However it doesn't order the subkeys.
That actually gets a bit more complicated, as you'd need to recurse... and then decide how to handle 'collisions' (like netmask - address comes before, but how does it sort compared to network).
Or alternatively:
my $count = 0;
foreach my $node ( $twig->get_xpath('.//*') ) {
$rank_of{ $node->tag } = $count++ unless $rank_of{ $node->tag };
}
This walks all the nodes, and puts them in order. It doesn't quite work, because netmask appears in both stanzas though.
You get:
{"ip":{"address":"1.1.1.1","netmask":"255.255.255.0"},"route":{"netmask":"55.255.255.0","network":"20.20.20.0","gateway":"1.1.1.1"}}
I couldn't figure out a neat way of collapsing both lists.

Getting links from an HTML table using HTML::TableExtract and HTML::Extor in Perl

My goal is to extract the links from the tables titled "Agonists," "Antagonists," and "Allosteric Regulators" in the following site:
http://www.iuphar-db.org/DATABASE/ObjectDisplayForward?objectId=1&familyId=1
I've been using HTML::TableExtract to extract the tables but have been unable to get HTML::LinkExtor to retrieve the links in question. Here is the code I have so far:
use warnings;
use strict;
use HTML::TableExtract;
use HTML::LinkExtor;
my #names = `ls /home/wallakin/LINDA/ligands/iuphar/data/html2/`;
foreach (@names)
{
chomp ($_);
my $te = HTML::TableExtract->new( headers => [ "Ligand",
"Sp.",
"Action",
"Affinity",
"Units",
"Reference" ] );
my $le = HTML::LinkExtor->new();
$te->parse_file("/home/wallakin/LINDA/ligands/iuphar/data/html2/$_");
my $output = $_;
$output =~ s/\.html/\.txt/g;
open (RESET, ">/home/wallakin/LINDA/ligands/iuphar/data/links/$output") or die "Can't reset";
close RESET;
#open (DATA, ">>/home/wallakin/LINDA/ligands/iuphar/data/links/$output") or die "Can't append to file";
foreach my $ts ($te->tables)
{
foreach my $row ($ts->rows)
{
$le->parse($row->[0]);
for my $link_tag ( $le->links )
{
my %links = @$link_tag;
print @$link_tag, "\n";
}
}
}
#print "Links extracted from $_\n";
}
I've tried using some sample code from another thread on this site (Perl parse links from HTML Table) to no avail. I'm not sure whether it's a problem of parsing or table recognition. Any help provided would be greatly appreciated. Thanks!
Try this as a base script (you only need to adapt it to fetch links):
use warnings; use strict;
use HTML::TableExtract;
use HTML::LinkExtor;
use WWW::Mechanize;
use utf8;
binmode(STDIN, ":utf8");
binmode(STDOUT, ":utf8");
binmode(STDERR, ":utf8");
my $m = WWW::Mechanize->new( autocheck => 1, quiet => 0 );
$m->agent_alias("Linux Mozilla");
$m->cookie_jar({});
my $te = HTML::TableExtract->new(
headers => [
"Ligand",
"Sp.",
"Action",
"Affinity",
"Units",
"Reference"
]
);
$te->parse(
$m->get("http://tinyurl.com/jvwov9m")->content
);
foreach my $ts ($te->tables) {
print "Table (", join(',', $ts->coords), "):\n";
foreach my $row ($ts->rows) {
print join(',', @$row), "\n";
}
}
You don't describe what the problem is...what exactly doesn't work? What does $row->[0] contain? But part of the problem might be that TableExtract returns just the 'visible' text, not the raw html, by default. You probably want to use the keep_html option in HTML::TableExtract.
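A sketch of that keep_html suggestion, combined with the question's LinkExtor loop. The table below is a tiny made-up stand-in for the real page, and the headers are the ones from the question:

```perl
use strict;
use warnings;
use HTML::TableExtract;
use HTML::LinkExtor;

# Stand-in page; the real script would feed in the saved HTML files
my $html = <<'HTML';
<table>
<tr><th>Ligand</th><th>Sp.</th><th>Action</th><th>Affinity</th><th>Units</th><th>Reference</th></tr>
<tr><td><a href="http://www.iuphar-db.org/ligand/1">ligand one</a></td>
<td>Hs</td><td>Agonist</td><td>7.2</td><td>pKi</td><td>1</td></tr>
</table>
HTML

# keep_html => 1 makes each cell keep its raw markup, so HTML::LinkExtor
# has <a> tags to find instead of bare visible text
my $te = HTML::TableExtract->new(
    keep_html => 1,
    headers   => [ "Ligand", "Sp.", "Action", "Affinity", "Units", "Reference" ],
);
$te->parse($html);

for my $ts ( $te->tables ) {
    for my $row ( $ts->rows ) {
        next unless defined $row->[0];
        my $le = HTML::LinkExtor->new;
        $le->parse( $row->[0] );
        for my $link_tag ( $le->links ) {
            my ( $tag, %attr ) = @$link_tag;
            print "$attr{href}\n" if $tag eq 'a' && defined $attr{href};
        }
    }
}
```

A fresh HTML::LinkExtor per row keeps links() from accumulating results across rows.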

Parse specific text from html using Perl

I have an HTML page that has particular text that I want to parse into a database using a Perl script.
I want to be able to strip off all the stuff I don't want. An example of the HTML is:
<div class="postbody">
<h3><a href="foo">Re: John Smith <span class="posthilit">England</span></a></h3>
<div class="content">Is C# better than Visual Basic?</div>
</div>
Therefore I would want to import into the database
Name: John Smith.
Lives in: England.
Commented: Is C# better than Visual Basic?
I have started to create a Perl script but it needs to be changed to work for what I want;
use DBI;
open (FILE, "list") || die "couldn't open the file!";
open (F1, ">list.csv") || die "couldn't open the file!";
print F1 "Name\|Lives In\|Commented\n";
while ($line=<FILE>)
{
chop($line);
$text = "";
$add = 0;
open (DATA, $line) || die "couldn't open the data!";
while ($data=<DATA>)
{
if ($data =~ /ds\-div/)
{
$data =~ s/\,//g;
$data =~ s/\"//g;
$data =~ s/\'//g;
$text = $text . $data;
}
}
@p = split(/\\/, $line);
print F1 $p[2];
print F1 ",";
print F1 $p[1];
print F1 ",";
print F1 $p[1];
print F1 ",";
print F1 "\n";
$a = $a + 1;
Any input would be greatly appreciated.
Please do not use regular expressions to parse HTML as HTML is not a regular language. Regular expressions describe regular languages.
It is easy to parse HTML with HTML::TreeBuilder (and its family of modules):
#!/usr/bin/env perl
use warnings;
use strict;
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new_from_content(
do { local $/; <DATA> }
);
for ( $tree->look_down( 'class' => 'postbody' ) ) {
my $location = $_->look_down( 'class' => 'posthilit' )->as_trimmed_text;
my $comment = $_->look_down( 'class' => 'content' )->as_trimmed_text;
my $name = $_->look_down( '_tag' => 'h3' )->as_trimmed_text;
$name =~ s/^Re:\s*//;
$name =~ s/\s*$location\s*$//;
print "Name: $name\nLives in: $location\nCommented: $comment\n";
}
__DATA__
<div class="postbody">
<h3>Re: John Smith <span class="posthilit">England</span></h3>
<div class="content">Is C# better than Visual Basic?</div>
</div>
Output
Name: John Smith
Lives in: England
Commented: Is C# better than Visual Basic?
However, if you require much more control, have a look at HTML::Parser as has already been answered by ADW.
Use an HTML parser, like HTML::TreeBuilder to parse the HTML--don't do it yourself.
Also, don't use two-arg open with global handles, don't use chop--use chomp (read the perldoc to understand why). Find yourself a newer tutorial. You are using a ton of OLD OLD OLD Perl. And damnit, USE STRICT and USE WARNINGS. I know you've been told to do this. Do it. Leaving it out will do nothing but buy you pain.
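A minimal sketch of those two points (three-argument open with a lexical filehandle, and chomp rather than chop). An in-memory "file" is used here purely to keep the example self-contained; a real path works the same way:

```perl
use strict;
use warnings;

# Three-argument open with a lexical filehandle: the mode is separate
# from the filename, and the handle can't collide with a global one
my $content = "first line\nsecond line\n";
open my $fh, '<', \$content or die "Can't open: $!";
while ( my $line = <$fh> ) {
    chomp $line;   # removes a trailing newline only; chop blindly removes the last character
    print "$line\n";
}
close $fh;
```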
Go. Read. Modern Perl. It is free.
my $page = HTML::TreeBuilder->new_from_file( $file_name );
$page->elementify;
my @posts;
for my $post ( $page->look_down( class => 'postbody' ) ) {
    my %post = (
        name    => get_name($post),
        loc     => get_loc($post),
        comment => get_comment($post),
    );
    push @posts, \%post;
}
# Persist @posts however you want to.

sub get_name {
    my $post = shift;
    my $name = $post->look_down( _tag => 'h3' );
    return unless defined $name;
    $name = $name->look_down( _tag => 'a' );
    return unless defined $name;
    $name = ($name->content_list)[0];
    return unless defined $name;
    $name =~ s/^Re:\s*//;
    $name =~ s/\s*$//;
    return $name;
}

sub get_loc {
    my $post = shift;
    my $loc = $post->look_down( _tag => 'span', class => 'posthilit' );
    return unless defined $loc;
    return $loc->as_text;
}

sub get_comment {
    my $post = shift;
    my $com = $post->look_down( _tag => 'div', class => 'content' );
    return unless defined $com;
    return $com->as_text;
}
Now you have a nice data structure with all your post data. You can write it to CSV or a database or whatever it is you really want to do. You seem to be trying to do both.
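For instance, writing out the collected posts in the pipe-separated layout the original script was printing could look like this (a sketch with hypothetical example data, not the author's code):

```perl
use strict;
use warnings;

# Example data in the shape built by the look_down loop above
my @posts = (
    { name => 'John Smith', loc => 'England', comment => 'Is C# better than Visual Basic?' },
);

open my $out, '>', 'list.csv' or die "Can't write list.csv: $!";
print {$out} "Name|Lives In|Commented\n";
for my $post (@posts) {
    # a hash slice pulls the three fields out in a fixed order
    print {$out} join('|', @{$post}{qw(name loc comment)}), "\n";
}
close $out;
```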
You'd be much better off using the HTML::Parser module from the CPAN.

Can I use perltidy's HTML formatter in my automated Perl build?

I'm using Module::Build to perform build, test, testpod, html, & install actions on my Perl module that I'm developing. The HTML files that are generated are okay, but I'd be much happier if I could somehow configure Module::Build to use the perltidy -html formatting utility instead of its own HTML formatter.
Anyone know of a way I can replace the HTML formatter that comes with Module::Build with the prettier perltidy HTML formatter?
Addendum: When I said "replace" above, that was probably misleading. I don't really want to write code to replace the html formatter that comes with Module::Build. I really want to know if Module::Build has any other HTML formatter options. The HTML it generates is so plain and generic looking. It's so boring. I like perltidy's output a lot.
Here is how I got it working right now in a build script that I wrote, but it's totally a hack, shelling out to the command-line perltidy script:
use strict;
use warnings;
# get list of files in directory
my $libLocation = "lib/EDF";
opendir( DIR, $libLocation );
my @filenameArray = readdir(DIR);
# iterate over all files to find *.pm set
for my $file (@filenameArray) {
if ( $file =~ m/ # matching regex
\. # literal period character
pm # the pm file extenstion
/x # end of regex
)
{
my $return = `perl D:/Perl/site/bin/perltidy -q --indent-columns=4 --maximum-line-length=80 -html -opath blib/libhtml2 -toc $libLocation/$file`;
if ($return eq "") {
print "HTMLized " . $file . "\n";
}
else {
print "Error: " . $return . "\n";
}
}
}
But I was really hoping there was a way to use Module::Build and just tell it with a flag or an argument or whatever to tell it to use a different HTML formatter. I guess that's a pipe dream, though:
use strict;
use warnings;
use Module::Build;
my $build = Module::Build->resume (
properties => {
config_dir => '_build',
},
);
$build->dispatch('build');
$build->dispatch('html', engine => 'perltidy');
or maybe:
$build->dispatch('htmltidy');
Well, the action is implemented in htmlify_pods in Module::Build::Base. It should be possible to override that method.
Much Later ...
Here is my attempt (tested only once):
package My::Builder;
use strict;
use warnings;
use base 'Module::Build';
sub htmlify_pods {
    my $self = shift;
    my $type = shift;
    my $htmldir = shift || File::Spec->catdir($self->blib, "${type}html");

    require Module::Build::Base;
    require Module::Build::PodParser;
    require Perl::Tidy;

    $self->add_to_cleanup('pod2htm*');

    my $pods = $self->_find_pods(
        $self->{properties}{"${type}doc_dirs"},
        exclude => [ Module::Build::Base::file_qr('\.(?:bat|com|html)$') ] );
    return unless %$pods; # nothing to do

    unless ( -d $htmldir ) {
        File::Path::mkpath($htmldir, 0, oct(755))
            or die "Couldn't mkdir $htmldir: $!";
    }

    my @rootdirs = ($type eq 'bin') ? qw(bin) :
        $self->installdirs eq 'core' ? qw(lib) : qw(site lib);

    my $podpath = join ':',
        map $_->[1],
        grep -e $_->[0],
        map [File::Spec->catdir($self->blib, $_), $_],
        qw( script lib );

    foreach my $pod ( keys %$pods ) {
        my ($name, $path) = File::Basename::fileparse($pods->{$pod},
            Module::Build::Base::file_qr('\.(?:pm|plx?|pod)$'));
        my @dirs = File::Spec->splitdir( File::Spec->canonpath( $path ) );
        pop( @dirs ) if $dirs[-1] eq File::Spec->curdir;

        my $fulldir = File::Spec->catfile($htmldir, @rootdirs, @dirs);
        my $outfile = File::Spec->catfile($fulldir, "${name}.html");
        my $infile  = File::Spec->abs2rel($pod);

        next if $self->up_to_date($infile, $outfile);

        unless ( -d $fulldir ) {
            File::Path::mkpath($fulldir, 0, oct(755))
                or die "Couldn't mkdir $fulldir: $!";
        }

        my $path2root = join( '/', ('..') x (@rootdirs+@dirs) );
        my $htmlroot  = join( '/',
            ($path2root,
             $self->installdirs eq 'core' ? () : qw(site) ) );

        my $fh = IO::File->new($infile) or die "Can't read $infile: $!";
        my $abstract = Module::Build::PodParser->new(fh => $fh)->get_abstract();

        my $title = join( '::', (@dirs, $name) );
        $title .= " - $abstract" if $abstract;

        my %opts = (
            argv => join(" ",
                qw( -html --podflush ),
                "--title=$title",
                '--podroot='.$self->blib,
                "--htmlroot=$htmlroot",
                "--podpath=$podpath",
            ),
            source      => $infile,
            destination => $outfile,
        );

        if ( eval{Pod::Html->VERSION(1.03)} ) {
            $opts{argv} .= ' --podheader';
            $opts{argv} .= ' --backlink=Back to Top';
            if ( $self->html_css ) {
                $opts{argv} .= " --css=$path2root/" . $self->html_css;
            }
        }

        $self->log_info("HTMLifying $infile -> $outfile\n");
        $self->log_verbose("perltidy %opts\n");
        Perl::Tidy::perltidy(%opts); # or warn "pod2html @opts failed: $!";
    }
}

1;
To use it:
#!/usr/bin/perl
use strict;
use warnings;
use My::Builder;
my $builder = My::Builder->new(
module_name => 'My::Test',
license => 'perl',
);
$builder->create_build_script;
It's very easy to define new Module::Build actions that you can call with dispatch, and there are plenty of examples in the Module::Build documentation. Define an action to handle your new step:
sub ACTION_htmltidy
{
    my( $self ) = @_;
    $self->depends_on( ...other targets... );
    require Perl::Tidy;
    ...do your damage...
}
If you want another action to use yours, you can extend it so you can make the dependency:
sub ACTION_install
{
    my( $self ) = @_;
    $self->depends_on( 'htmltidy' );
    $self->SUPER::ACTION_install;
}