Trouble with "unblessed reference" when using HTML::TokeParser - html

I've been poking at this and can't get around this "unblessed reference" error. Here's my simplified code:
#!/usr/local/bin/perl
use strict;
use warnings;
use HTML::TokeParser;
my $p = HTML::TokeParser->new( $ARGV[0] );
while (my $t = $p->get_tag('img')) {
my $src = $t->get_attr('src');
print "$src\n";
}
And here's the error message when I try it:
Can't call method "get_attr" on unblessed reference at M:\list_images_in_html.pl line 9.
I gather that somehow it's not recognizing $t as a token object with a get_attr method, but I don't understand why.

According to the manual (HTML::TokeParser at MetaCPAN), get_tag() returns an array reference, not an object.
You cannot call get_attr() on a bog-standard array ref.
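If you would rather stay with plain HTML::TokeParser, the attributes can be read straight out of that array reference. This is a minimal sketch: the inline HTML string and the @sources array are assumptions made so it runs without an input file. It relies on get_tag() returning [$tag, $attr, $attrseq, $text], where $attr is a hash reference.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical inline document, so the sketch runs without an input file.
my $html = '<p><img src="a.png" alt="A"><img src="b.jpg"></p>';

# A reference to a scalar is parsed as the literal document.
my $p = HTML::TokeParser->new(\$html) or die "Cannot parse document";

my @sources;
while (my $tag = $p->get_tag('img')) {
    # $tag is a plain array ref: [$tagname, \%attr, \@attrseq, $text]
    my $src = $tag->[1]{src};
    push @sources, $src if defined $src;
}
print "$_\n" for @sources;
```

In the original script the same idea would be my $src = $t->[1]{src}; in place of the get_attr call.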

get_attr is a convenience method in HTML::TokeParser::Simple (a wrapper for HTML::TokeParser) but does not exist in HTML::TokeParser.
Replace two lines in your code with this:
use HTML::TokeParser::Simple;
my $p = HTML::TokeParser::Simple->new( $ARGV[0] );
and your script will work.

Grepping data from an html file gives some random value

I have an XML file from which I am extracting some values using a regex.
The XML file looks like this:
<Instance>Fuse_Name</Instance>
<Id>8'hed</ID>
<SomeAddr>17'h00baf</SomeAddr>
<PSomeAddr>17'h00baf</PSomeAddr>
I want to retrieve the value 17'h00baf from the "SomeAddr" tag. I match the regex "SomeAddr" to find that row in the file, and then use the index and substr functions to retrieve the value with the code below:
my $i = index($row,">");
my $j = index($row,"<");
$Size_in_bits = substr $row,$i+1,$j-$i-3;
But after doing this I do not get 17'h00baf; instead I get 17'h01191. With the same approach I can extract other values that are decimal or strings; I only face this problem with the hexadecimal values. Can somebody please tell me what is wrong with the approach?
Please don't parse XML with regexes. Use a proper XML parser.
But, ignoring that advice temporarily, I don't get the behaviour you describe when testing your code.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
while (<DATA>) {
next unless /<SomeAddr>/;
my $i = index($_, ">");
my $j = index($_, "<");
my $Size_in_bits = substr $_, $i + 1, $j - $i - 3;
say $Size_in_bits;
}
__END__
<Instance>Fuse_Name</Instance>
<Id>8'hed</ID>
<SomeAddr>17'h00baf</SomeAddr>
<PSomeAddr>17'h00baf</PSomeAddr>
And running it:
$ perl parsexml
17'h00baf
Of course, I've had to guess at what a lot of your code looks like because you didn't give us a complete example to test. So it looks likely that your problems are in bits of the code that you haven't shown us.
(My guess would be that there's another <SomeAddr> tag in the file somewhere.)
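One thing worth checking is that a bare "SomeAddr" pattern also matches the <PSomeAddr> row, so a later row can overwrite the value. The sketch below is only an illustration of a stricter index/substr approach, not a replacement for a real parser. The helper name tag_value and the sample rows are assumptions, and the PSomeAddr value is invented to show the mix-up.

```perl
use strict;
use warnings;

# Hypothetical rows; the PSomeAddr value is made up for illustration.
my @rows = (
    "<Instance>Fuse_Name</Instance>\n",
    "<SomeAddr>17'h00baf</SomeAddr>\n",
    "<PSomeAddr>17'h01191</PSomeAddr>\n",
);

# Return the text between <$tag> and </$tag>, or nothing if the row
# does not contain exactly that opening tag.
sub tag_value {
    my ($row, $tag) = @_;
    my $open  = "<$tag>";
    my $start = index($row, $open);
    return if $start < 0;
    $start += length $open;
    # Search for the closing tag only after the opening tag.
    my $end = index($row, "</$tag>", $start);
    return if $end < 0;
    return substr($row, $start, $end - $start);
}

my @values = map { tag_value($_, 'SomeAddr') } @rows;
print "$_\n" for @values;
```

Because the length is computed from the position of the closing tag, the result does not depend on trailing newlines or carriage returns, unlike $j - $i - 3.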
Never, ever use a regex to parse HTML/XML/.... Always use a proper parser and then implement your algorithm in the DOM domain.
My solution shows how to parse the XML and then extract the text content from <SomeAddr> nodes at the top-level of the XML document.
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
my $doc = XML::LibXML->load_xml(IO => \*DATA);
my $xpc = XML::LibXML::XPathContext->new();
# register default NS
$xpc->registerNs('default', 'http://some.domain.com/some/path/to');
foreach my $node ($xpc->findnodes('//default:SomeAddr', $doc)) {
print $node->textContent, "\n";
}
exit 0;
__DATA__
<Root xmlns="http://some.domain.com/some/path/to">
<Instance>Fuse_Name</Instance>
<Id>8'hed</Id>
<SomeAddr>17'h00baf</SomeAddr>
<PSomeAddr>17'h00baf</PSomeAddr>
</Root>
Test run
$ perl dummy.pl
17'h00baf

Perl - Print div by class

I need to print a specific div with class productSpecs from a webpage. Here is my code.
use strict;
use LWP::Simple;
use HTML::TreeBuilder::XPath qw();
my $url="http://www.flipkart.com/samsung-b310e-guru-music-2/p/itmdz9am8xehucbx";
my $content = get($url);
my $t = HTML::TreeBuilder::XPath->new;
$t->parse($content);
my $rank = $t->findvalue('//*[@class="productSpecs"]');
print $rank;
But I am not getting the content I want. What is wrong with my code?
Inspecting the HTML code you are trying to parse, the required div node has this declaration:
<div class="productSpecs specSection">
so your code should be:
my $rank = $t->findnodes('//div[@class="productSpecs specSection"]');
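An exact comparison against the class attribute only matches while the attribute string stays identical. A common XPath 1.0 idiom tests for a single class token instead. The sketch below assumes an inline HTML snippet in place of the live page, and the variable names are illustrative.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical markup standing in for the fetched page.
my $html = '<html><body>'
         . '<div class="productSpecs specSection"><p>Dual SIM</p></div>'
         . '<div class="other">ignored</div>'
         . '</body></html>';

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Match any div whose space-separated class list contains "productSpecs".
my $xpath = q{//div[contains(concat(' ', normalize-space(@class), ' '), ' productSpecs ')]};
my @nodes = $tree->findnodes($xpath);
my @texts = map { $_->as_text } @nodes;

print "$_\n" for @texts;
$tree->delete;
```

This keeps working if the site adds or reorders other classes on the same div.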
Just for comparison I tried this with Mojolicious using the ojo tool (great for oneliners) and it seems Mojo::DOM returns the HTML by default unless you ask for the text with a ->text() method. e.g. this seems to do what you want:
perl -Mojo -E 'g("http://www.flipkart.com/samsung-b310e-guru-music-2/p/itmdz9am8xehucbx")
->dom->find("div.productSpecs")->each(sub{say $_})'
cheers,
Hi user2186465 and welcome to Stack Exchange :-)
When you assign and print the output from HTML::TreeBuilder::XPath's findnodes() method, it seems to default to parsing/rendering the <div> node and returning the content as text. Along with that it returns an XML::XPathEngine::NodeSet object (which HTML::TreeBuilder::XPath uses) and an array with a reference to an HTML::Element object that has what you want. You need to assign that array element reference to your $rank variable or else you'll just get the text:
my $rank = $t->findnodes('//div[@class="productSpecs specSection"]')->[0];
(NB: this appears somewhere in the documentation as an example, but it is not prominent). Once you have the HTML::Element object you can use one of its methods with your print statement to get at the contents.
Without the ->[0] you get the rendered text, and print $rank just shows that; with ->[0] you get access to the object and its methods, so print $rank->as_HTML can show the raw HTML content from the node (->as_XML works as well). HTML::TreeBuilder::XPath also has an as_XML_indented convenience method to make the output easier to read. So:
use strict;
use LWP::Simple;
use HTML::TreeBuilder::XPath qw();
my $url="http://www.flipkart.com/samsung-b310e-guru-music-2/p/itmdz9am8xehucbx";
my $content = get($url);
my $t = HTML::TreeBuilder::XPath->new;
$t->parse($content);
my $rank = $t->findnodes('//div[@class="productSpecs specSection"]')->[0];
print $rank->as_XML_indented ;
should do what you want.
HTH

Perl: Accessing part of a JSON query

I have been writing part of a website; part of the stats page will display information from a website's JSON response.
The address of the website is: http://steamcommunity.com/market/listings/440/Name%20Tag/render/?count=1&start=1&query=.
Here is a link to a parser so the code is easier to read http://json.parser.online.fr/.
The code I have written so far works, but no matter what I try I can't get the information I need.
use JSON::XS;
use WWW::Mechanize;
use HTTP::Cookies;
use LWP::Simple;
use strict;
use warnings;
my $url = "http://steamcommunity.com/market/listings/440/Name%20Tag/render/?count=2&start=2";
my $json = get $url;
my $data = decode_json $json;
my $info = $data -> {listinginfo};
My problem is that I would like to access the price of the listing; however, when new listings are made available, the reference for them changes. I have no idea how to deal with this and Google is not helping. Any help would be greatly appreciated, thanks in advance.
Seb Morris.
EDIT: Thanks for the replies, I have progressed my code and ended up with:
my $data = decode_json $json;
my @infoids = keys %{$data -> {listinginfo}};
foreach my $infoid (@infoids) {
my $price = $data -> {listinginfo}{$infoid}{converted_price};
print "$price" . "\n";
}
However, I am getting the error: Use of uninitialized value $price in string at line 30. I don't understand why I am getting this error as I have declared the variable. Any help would be really appreciated.
If I understand, your problem is that the listinginfo object contains key(s) which change for each request, and you don't know how to find out what the key is for the request you just made.
You can find the keys to a perl hash using the 'keys' function. So you can get all of the keys of the listinginfo hash like this:
my @infoids = keys %{$data -> {listinginfo}};
Note the need to use %{ } to de-reference listinginfo, which is itself a hash reference.
There could be more than one info ID, although when I tested the web service you linked in your question it only ever returned one. If you are sure there will only ever be one, you can use:
my $price = $data -> {listinginfo}{$infoids[0]}{price};
Or, if there might be more than one, you can loop through them:
foreach my $infoid (@infoids) {
my $price = $data -> {listinginfo}{$infoid}{price};
# Now do something with price
}
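The "Use of uninitialized value" warning from the edit does not mean the variable is undeclared. It means the looked-up value is undef, presumably because a listing has no key with that exact name. The sketch below guards against that. The JSON text, listing IDs and prices are invented, and JSON::PP stands in for JSON::XS only because it ships with Perl.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical response: the listing IDs and prices are invented.
my $json = '{"listinginfo":{"1001":{"converted_price":120},"1002":{"price":95}}}';
my $data = decode_json($json);

my %price_for;
for my $id (sort keys %{ $data->{listinginfo} }) {
    my $listing = $data->{listinginfo}{$id};
    # Prefer converted_price, fall back to price; either may be absent.
    my $price = $listing->{converted_price} // $listing->{price};
    $price_for{$id} = $price;
    print defined $price ? "$id: $price\n" : "$id: no price field\n";
}
```

Dumping one listing with Data::Dumper is a quick way to see which price keys the service actually returns.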

Passing a hashref to a sub

My code is as follows. It is super simplistic, but I am just not getting it to work as intended.
use strict;
use warnings;
use CGI::Carp qw(fatalsToBrowser);
use CGI qw(-debug);
use warnings;
use diagnostics;
use strict;
use JSON;
use Data::Dumper;
my $q = CGI->new;
my $data = $q->param('POSTDATA');
my $data_hash;
if (defined($data)) {
$data_hash = decode_json($data);
}
sub test {
my $return_hash = shift;
return \$return_hash;
}
my $return_to_print = test($data_hash);
print $q->header();
print "This is a test: \n";
print Dumper($return_to_print);
Basically I am sending JSON to the Perl script and decoding it into a hashref. Then I'd like to pass that data to the test sub, which does nothing more than return it so the CGI can print it, all while keeping its structure. So far I am unsuccessful, and I am hoping someone can shed some light on how to properly write something like this.
So in the end dumper should print something like:
$VAR1 = { 'key' => 'value', 'key2' => 'value' };
Your code boils down to
my $data_hash = decode_json($data);
my $return_hash = $data_hash;
my $return_to_print = \$return_hash;
It should not be a surprise that $return_hash is different from $return_to_print. You assigned a reference to a scalar to $return_to_print rather than copying its value (a reference to a hash). You would need the following for them to be the same:
my $return_to_print = $return_hash;
Which is to say you'd need the following:
return $return_hash;
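The difference is easy to see with ref. This is a small sketch with hypothetical sub names (wrapped, unwrapped) and made-up data:

```perl
use strict;
use warnings;

# Returns a reference to the scalar that holds the hashref.
sub wrapped   { my $h = shift; return \$h; }
# Returns the hashref itself.
sub unwrapped { my $h = shift; return $h; }

my $data = { key => 'value', key2 => 'value' };

my $via_ref = wrapped($data);
my $direct  = unwrapped($data);

print ref($via_ref), "\n";      # REF: one extra level of indirection
print ref($direct), "\n";       # HASH: same structure as $data
print $$via_ref->{key}, "\n";   # the extra level must be dereferenced first
print $direct->{key}, "\n";
```

Data::Dumper shows the extra level as a leading backslash in front of the hash, which is the symptom in the question.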

Problems parsing Reddit's JSON

I'm working on a perl script that parses reddit's JSON using the JSON module.
However I do have the problem of being very new to both perl and json.
I managed to parse the front page and subreddits successfully, but the comments have a different structure and I can't figure out how to access the data I need.
Here's the code that successfully finds the "data" hash for the front page and subreddits:
foreach my $children(@{$json_text->{"data"}->{"children"}}) #For values of children.
{
my $data = $children->{"data"}; #accessing each data hash.
my %phsh = (); #my hash to collect and print.
$phsh{author} = $data->{"author"};#Here I get the "author" value from "data"
*Etc....
This successfully gets what I need from http://www.reddit.com/.json
But when I go to the json of a comment, this one for example, it has a different format and I can't figure out how to parse it. If I try the same thing as before my parser crashes, saying it is not a HASH reference.
So my question is: How do I access the "children" in the second JSON? I need to get both the data for the Post and the data for the comments. Can anybody help?
Thanks in advance!
(I know it may be obvious, but I'm running on very little sleep XD)
You need to either look at the JSON data or dump the decoded data to see what form it takes. The comment data, for example is an array at the top level.
Here is some code that prints the body field of all top-level comments. Note that a comment may have an array of replies in its replies field, and each reply may also have replies in turn.
Depending on what you want to do you may need to check whether a reference is to an array or a hash by checking the value returned by the ref operator.
use strict;
use warnings;
binmode STDOUT, ':utf8';
use JSON;
use LWP;
use Data::Dump;
my $ua = LWP::UserAgent->new;
my $resp = $ua->get('http://www.reddit.com/r/funny/comments/wx3n5/caption_win.json');
die $resp->status_line unless $resp->is_success;
my $json = $resp->decoded_content;
my $data = decode_json($json);
die "Error: $data->{error}" if ref $data eq 'HASH' and exists $data->{error};
dd $data->[1]{data}{children}[0];
print "\n\n";
my $children = $data->[1]{data}{children};
print scalar @$children, " comments:\n\n";
for my $child (@$children) {
print $child->{data}{body}, "\n";
}
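To reach the nested replies as well, the same loop can recurse, using ref to decide whether a replies field holds more data. The sketch below runs on a heavily trimmed, invented sample. It assumes replies is either an empty string or a hash with the same data/children layout as the top level. The walk sub and @bodies array are illustrative names, and JSON::PP is used only because it ships with Perl.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical, trimmed comment JSON: element 0 is the post, element 1 the comments.
my $json = <<'JSON';
[
  {"data": {"children": [{"data": {"title": "Caption win"}}]}},
  {"data": {"children": [
    {"data": {"body": "first", "replies": {"data": {"children": [
        {"data": {"body": "nested", "replies": ""}}
    ]}}}},
    {"data": {"body": "second", "replies": ""}}
  ]}}
]
JSON

my $data = decode_json($json);
my @bodies;

# Collect each comment body, indented by nesting depth.
sub walk {
    my ($children, $depth) = @_;
    for my $child (@$children) {
        my $comment = $child->{data};
        next unless defined $comment->{body};
        push @bodies, ('  ' x $depth) . $comment->{body};
        my $replies = $comment->{replies};
        # Recurse only when replies is a hash, not an empty string.
        walk($replies->{data}{children}, $depth + 1) if ref $replies eq 'HASH';
    }
}

walk($data->[1]{data}{children}, 0) if ref $data eq 'ARRAY';
print "$_\n" for @bodies;
```

The post itself sits in $data->[0]{data}{children}[0]{data} and can be read the same way as the front-page entries.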