Problems parsing Reddit's JSON

I'm working on a Perl script that parses Reddit's JSON using the JSON module.
However, I'm very new to both Perl and JSON.
I managed to parse the front page and subreddits successfully, but the comments have a different structure and I can't figure out how to access the data I need.
Here's the code that successfully finds the "data" hash for the front page and subreddits:
foreach my $children (@{$json_text->{"data"}->{"children"}}) # For each of the children.
{
    my $data = $children->{"data"}; # accessing each data hash.
    my %phsh = (); # my hash to collect and print.
    $phsh{author} = $data->{"author"}; # Here I get the "author" value from "data"
    # etc....
}
This successfully gets what I need from http://www.reddit.com/.json
But when I go to the JSON of a comment page, this one for example, it has a different format and I can't figure out how to parse it. If I try the same thing as before, my parser crashes, saying it is not a HASH reference.
So my question is: how do I access the "children" in the second JSON? I need both the data for the post and the data for the comments. Can anybody help?
Thanks in advance!
(I know it may be obvious, but I'm running on very little sleep XD)

You need to either look at the JSON data or dump the decoded data to see what form it takes. The comment data, for example, is an array at the top level.
Here is some code that prints the body field of all top-level comments. Note that a comment may have an array of replies in its replies field, and each reply may also have replies in turn.
Depending on what you want to do you may need to check whether a reference is to an array or a hash by checking the value returned by the ref operator.
use strict;
use warnings;
binmode STDOUT, ':utf8';
use JSON;
use LWP;
use Data::Dump;
my $ua = LWP::UserAgent->new;
my $resp = $ua->get('http://www.reddit.com/r/funny/comments/wx3n5/caption_win.json');
die $resp->status_line unless $resp->is_success;
my $json = $resp->decoded_content;    # already a character string
my $data = JSON->new->decode($json);  # decode() without ->utf8 expects characters, not UTF-8 bytes
die "Error: $data->{error}" if ref $data eq 'HASH' and exists $data->{error};
dd $data->[1]{data}{children}[0];
print "\n\n";
my $children = $data->[1]{data}{children};
print scalar @$children, " comments:\n\n";
for my $child (@$children) {
print $child->{data}{body}, "\n";
}
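Building on the ref advice above, here is a minimal sketch that walks the whole comment tree recursively instead of only the top level. It uses JSON::PP (a core module that also provides decode_json) and an inline, heavily trimmed sample instead of a live request. The sample data, and the assumption that replies is either an empty string or a nested Listing hash, are modeled on Reddit's format rather than copied from a real response; walk_comments is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical, trimmed sample modeled on a Reddit comment page:
# element 0 is the Listing holding the post, element 1 the Listing of comments.
my $json = <<'END_JSON';
[
  {"kind": "Listing", "data": {"children": [
    {"kind": "t3", "data": {"title": "Caption win", "author": "op"}}
  ]}},
  {"kind": "Listing", "data": {"children": [
    {"kind": "t1", "data": {"author": "a", "body": "first", "replies":
      {"kind": "Listing", "data": {"children": [
        {"kind": "t1", "data": {"author": "b", "body": "reply to first", "replies": ""}}
      ]}}}},
    {"kind": "t1", "data": {"author": "c", "body": "second", "replies": ""}}
  ]}}
]
END_JSON

my $data = decode_json($json);

# Walk a list of children, recursing into each comment's "replies".
# Returns a flat list of [depth, author, body] triples.
sub walk_comments {
    my ($children, $depth) = @_;
    my @out;
    for my $child (@$children) {
        # Only t1 entries are comments; skip anything else (assumption).
        next unless ref $child eq 'HASH' && ($child->{kind} // '') eq 't1';
        my $c = $child->{data};
        push @out, [ $depth, $c->{author}, $c->{body} ];
        # "replies" is assumed to be "" when empty, else a Listing hash.
        my $replies = $c->{replies};
        if (ref $replies eq 'HASH') {
            push @out, walk_comments($replies->{data}{children}, $depth + 1);
        }
    }
    return @out;
}

# The post lives in the first Listing.
my $post = $data->[0]{data}{children}[0]{data};
print "Post: $post->{title} by $post->{author}\n";

# The comments live in the second Listing; indent replies by depth.
my @comments = walk_comments($data->[1]{data}{children}, 0);
for my $c (@comments) {
    my ($depth, $author, $body) = @$c;
    print '  ' x $depth, "$author: $body\n";
}
```

Flattening into [depth, author, body] keeps the recursion separate from the printing, so the same walker can feed a hash like %phsh from the question.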

Related

Grepping data from an html file gives some random value

I have an XML file from which I am grepping some values based on a regex.
The XML file looks like this:
<Instance>Fuse_Name</Instance>
<Id>8'hed</ID>
<SomeAddr>17'h00baf</SomeAddr>
<PSomeAddr>17'h00baf</PSomeAddr>
I want to retrieve the value 17'h00baf from the "SomeAddr" tag. I match the regex "SomeAddr" to find that row in the file, and then retrieve the value with the index and substr functions using the code below:
my $i = index($row,">");
my $j = index($row,"<");
$Size_in_bits = substr $row,$i+1,$j-$i-3;
But doing this, I don't get 17'h00baf. Instead I get 17'h01191. With the same approach I can extract other values that are decimal numbers or strings; only the hexadecimal values give me this problem. Can somebody please tell me what is wrong with my approach?
Please don't parse XML with regexes. Use a proper XML parser.
But, ignoring that advice temporarily, I don't get the behaviour you describe when testing your code.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
while (<DATA>) {
next unless /<SomeAddr>/;
my $i = index($_, ">");
my $j = index($_, "<");
my $Size_in_bits = substr $_, $i + 1, $j - $i - 3;
say $Size_in_bits;
}
__END__
<Instance>Fuse_Name</Instance>
<Id>8'hed</ID>
<SomeAddr>17'h00baf</SomeAddr>
<PSomeAddr>17'h00baf</PSomeAddr>
And running it:
$ perl parsexml
17'h00baf
Of course, I've had to guess at much of your code because you didn't give us a complete example to test. So it seems likely that your problem lies in code you haven't shown us.
(My guess would be that there's another <SomeAddr> tag in the file somewhere.)
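To make that guess more concrete, here is a small self-contained sketch (core Perl, with made-up input lines) showing two fragile spots in the question's approach. First, the question says it matches the regex SomeAddr, and an unanchored /SomeAddr/ also matches the <PSomeAddr> line. Second, index($row, "<") is 0 for a line starting with a tag, so the substr length comes out negative and only yields the right text while the line still ends in its newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up input lines in the shape shown in the question.
my @lines = (
    "<Instance>Fuse_Name</Instance>\n",
    "<SomeAddr>17'h00baf</SomeAddr>\n",
    "<PSomeAddr>17'h00baf</PSomeAddr>\n",
);

# An unanchored /SomeAddr/ matches both <SomeAddr> and <PSomeAddr>.
my @loose  = grep { /SomeAddr/ } @lines;
my @strict = grep { /<SomeAddr>/ } @lines;

# The question's arithmetic on the <SomeAddr> line.
my $row = $lines[1];
my $i   = index($row, ">");   # 9: end of the opening tag
my $j   = index($row, "<");   # 0: the FIRST "<", i.e. start of the line
my $len = $j - $i - 3;        # -12: a negative LENGTH counts back from the end

# -12 strips "</SomeAddr>" (11 chars) plus the trailing "\n".
my $with_newline = substr $row, $i + 1, $len;

# After chomp the same length cuts one character too many.
chomp(my $chomped = $row);
my $without_newline = substr $chomped, $i + 1, $len;

print scalar(@loose), " loose matches, ", scalar(@strict), " strict match\n";
print "$with_newline\n$without_newline\n";
```

So the extraction works in the test script only because each line keeps exactly one trailing newline; a chomped line, CRLF line endings, or an extra matching line would all change the result.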
Never, ever use a regex to parse HTML/XML/.... Always use a proper parser and then implement your algorithm in the DOM domain.
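My solution shows how to parse the XML and then extract the text content of the <SomeAddr> elements (the XPath //default:SomeAddr matches them at any depth, not only at the top level). Because the sample below wraps the data in a root element with a default namespace, that namespace has to be registered under a prefix before XPath can match it.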
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
my $doc = XML::LibXML->load_xml(IO => \*DATA);
my $xpc = XML::LibXML::XPathContext->new();
# register default NS
$xpc->registerNs('default', 'http://some.domain.com/some/path/to');
foreach my $node ($xpc->findnodes('//default:SomeAddr', $doc)) {
print $node->textContent, "\n";
}
exit 0;
__DATA__
<Root xmlns="http://some.domain.com/some/path/to">
<Instance>Fuse_Name</Instance>
<Id>8'hed</Id>
<SomeAddr>17'h00baf</SomeAddr>
<PSomeAddr>17'h00baf</PSomeAddr>
</Root>
Test run
$ perl dummy.pl
17'h00baf

Perl: Accessing part of a JSON query

I'm building a website, and part of its stats page will display information from another website's JSON response.
The address of the website is: http://steamcommunity.com/market/listings/440/Name%20Tag/render/?count=1&start=1&query=.
Here is a link to an online JSON parser, to make the response easier to read: http://json.parser.online.fr/.
The code I have written so far works, but no matter what I try I can't get the information I need.
use JSON::XS;
use WWW::Mechanize;
use HTTP::Cookies;
use LWP::Simple;
use strict;
use warnings;
my $url = "http://steamcommunity.com/market/listings/440/Name%20Tag/render/?count=2&start=2";
my $json = get $url;
my $data = decode_json $json;
my $info = $data -> {listinginfo};
My problem is that I would like to access the price of the listing, but when new listings become available, the key for them changes. I have no idea how to deal with this and Google is not helping. Any help would be greatly appreciated; thanks in advance.
Seb Morris.
EDIT: Thanks for the replies, I have progressed my code and ended up with:
my $data = decode_json $json;
my @infoids = keys %{$data -> {listinginfo}};
foreach my $infoid (@infoids) {
my $price = $data -> {listinginfo}{$infoid}{converted_price};
print "$price" . "\n";
}
However, I get the error "Use of uninitialized value $price in string at line 30". I don't understand why, as I have declared the variable. Any help would be really appreciated.
If I understand correctly, your problem is that the listinginfo object contains key(s) which change with each request, and you don't know how to find out what the key is for the request you just made.
You can find the keys to a perl hash using the 'keys' function. So you can get all of the keys of the listinginfo hash like this:
my @infoids = keys %{$data -> {listinginfo}};
Note the need to use %{ } to dereference $data->{listinginfo}, which is itself a hash reference.
There could be more than one info ID, although when I tested the web service you linked in your question it only ever returned one. If you are sure there will only ever be one, you can use:
my $price = $data -> {listinginfo}{$infoids[0]}{price};
Or, if there might be more than one, you can loop through them:
foreach my $infoid (@infoids) {
my $price = $data -> {listinginfo}{$infoid}{price};
# Now do something with price
}
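A minimal sketch of the keys-based loop that also explains the "uninitialized value" warning from the edit: declaring $price doesn't give it a value, and looking up a key that isn't in the hash simply returns undef. The inline JSON is hypothetical (the listing IDs, and the missing converted_price on one entry, are invented for illustration); which fields the real Steam response includes is not confirmed here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical sample: listing IDs are keys that change between requests,
# and one entry deliberately lacks "converted_price".
my $json = <<'END_JSON';
{
  "success": true,
  "listinginfo": {
    "1111": {"listingid": "1111", "price": 250, "converted_price": 255},
    "2222": {"listingid": "2222", "price": 300}
  }
}
END_JSON

my $data    = decode_json($json);
my $listing = $data->{listinginfo};   # hash reference keyed by listing ID

my %price_for;
for my $infoid (sort keys %$listing) {
    my $entry = $listing->{$infoid};
    # A missing key yields undef, which is what triggers
    # "Use of uninitialized value" when it is interpolated into a string.
    if (defined $entry->{converted_price}) {
        $price_for{$infoid} = $entry->{converted_price};
    } else {
        # Show which keys ARE present, to find the right field name.
        warn "listing $infoid has no converted_price; keys: ",
             join(', ', sort keys %$entry), "\n";
    }
}

print "$_: $price_for{$_}\n" for sort keys %price_for;
```

Printing the available keys when a lookup fails is a quick way to spot a misspelled or absent field without dumping the whole response.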

perl script to create xml from mysql query - out of memory

I need to generate an XML file from database records, and I get the error "out of memory". Here's the script I'm using; I found it on Google. It isn't quite suitable for me, and it also exhausts the server's allocated memory, but it's a start.
#!/usr/bin/perl
use warnings;
use strict;
use XML::Simple;
use DBI;
my $dbh = DBI->connect('DBI:mysql:db_name;host=host_address','db_user','db_pass')
or die DBI->errstr;
# Get an array of hashes
my $recs = $dbh->selectall_arrayref('SELECT * FROM my_table',{ Columns => {} });
# Convert to XML where each hash element becomes an XML element
my $xml = XMLout( {record => $recs}, NoAttr => 1 );
print $xml;
$dbh->disconnect;
The script only prints the records (I tested it with a WHERE clause selecting a single row id).
First, I couldn't manage to make it save the output to a file (file.xml).
Second, I somehow need to split the "job" into multiple jobs and then assemble the XML file in one piece.
I have no idea how to achieve both.
Constraint: No access to change server settings.
These are the problem lines:
my $recs = $dbh->selectall_arrayref('SELECT * FROM my_table',{ Columns => {} });
This reads the whole table into memory, representing every single row as a hash of column values.
my $xml = XMLout( {record => $recs}, NoAttr => 1 );
This is probably an even larger structure: the whole XML document as a single string.
The lowest memory-use solution needs to involve loading the table one item at a time, and printing that item out immediately. In DBI, it is possible to make a query so that you fetch one row at a time in a loop.
You will need to play with this before the result looks like your intended output (I haven't tried to exactly match your XML::Simple output; I'm leaving that to you):
print "<records>\n";
my $sth = $dbh->prepare('SELECT * FROM my_table');
$sth->execute;
while ( my $row = $sth->fetchrow_hashref ) {
    # Convert one db row (column name => value) to one XML <record> element
    print XMLout( $row, RootName => 'record', NoAttr => 1 ),"\n";
}
print "</records>\n";
Perl can use e.g. open(my $fh, '>', 'table.xml') to open a file for writing and print {$fh} $string to write to it, or you can just run your script and redirect its output to a file, e.g. perl myscript.pl > table.xml
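Combining the file-writing suggestion with the row-at-a-time loop might look like the sketch below. To keep it runnable without a database, next_row() is a hypothetical stand-in for $sth->fetchrow_hashref, and the XML is written by hand (escaping &, < and >) instead of with XML::Simple. It also assumes column names are valid XML element names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a DBI statement handle: each call returns the
# next row as a hash reference, or undef when done (like fetchrow_hashref).
my @fake_rows = (
    { id => 1, name => 'Widget & Co' },
    { id => 2, name => '<b>bold</b>' },
);
sub next_row { return shift @fake_rows }

# Minimal escaping for element text content.
sub xml_escape {
    my ($s) = @_;
    return '' unless defined $s;   # NULL columns become empty elements
    $s =~ s/&/&amp;/g;             # must run first
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

my $file = 'table.xml';
open(my $fh, '>:encoding(UTF-8)', $file) or die "Cannot open $file: $!";
print {$fh} qq{<?xml version="1.0" encoding="UTF-8"?>\n<records>\n};

# Only one row is held in memory at a time; each is written immediately.
while (my $row = next_row()) {
    print {$fh} "  <record>\n";
    for my $col (sort keys %$row) {
        print {$fh} "    <$col>", xml_escape($row->{$col}), "</$col>\n";
    }
    print {$fh} "  </record>\n";
}

print {$fh} "</records>\n";
close($fh) or die "Cannot close $file: $!";
```

Because memory use no longer depends on the table size, splitting the job into chunks becomes optional rather than required.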
It's the SELECT * with no constraints that is killing your memory. Add a constraint to your query, e.g. on a date or id range, and use a loop to run the query and write your output in chunks. That way you won't need to load the whole table into memory before you start on the output.

Trouble getting Google Books API data via PHP

The current code takes form input and does THIS to it:
$apikey = 'myapikey';
$q = urlencode($bookSearchTerm);
$endpoint = 'https://www.googleapis.com/books/v1/volumes?q=' . $q . '&key=' . $apikey;
$session = curl_init($endpoint);
curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($session);
curl_close($session);
$search_results = json_decode($data);
if ($search_results === NULL) die('Error parsing json');
Just for kicks, I also did
echo $endpoint;
which shows
https://www.googleapis.com/books/v1/volumes?q=lord+of+the+rings&key=myapikey
When I enter that URL into a browser, I get a screen full o' data, telling me that, among other things, there are 814 items.
Yet when I run the code on a page, I get
Error parsing json
Can someone show me where I'm going wrong?
Thanks in advance.
Judging by your response to the comment, the setting in question may or may not be the cause. It may also be that what is returned isn't parseable because it isn't valid JSON. Check what $data actually contains (curl_exec returns false if the request itself failed, and json_decode cannot parse that). If it looks correct, then according to the json_decode documentation on the PHP site, the data may be nested too deeply for the parser (json_decode returns NULL when its depth limit, 512 by default, is exceeded), though I doubt that.
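It is also possible that the response overflows PHP's memory limit. By default json_decode turns JSON objects into PHP objects (or associative arrays if you ask for them), which can get expensive for large responses. Note that running out of memory usually stops the script with a fatal error rather than making json_decode return NULL, so json_last_error() is worth checking to see why parsing failed.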

Is there a neat way to serialize a Perl hash into an HTML querystring

I have a perl script using CGI.
The browser calls it with some parameters.
I want to take those parameters, modify some of them and then send back a redirect with a new querystring representing the modified parameters.
I know that I could do it like this:
my $cgi = CGI->new();
my %vars = $cgi->Vars;
$vars{'modify_me'} .= ' more stuff';
my $serialized = join '&', map {$_.'='.$cgi->escapeHTML($vars{$_})} keys %vars;
However, this just feels like it might be missing something. In addition, it doesn't do anything to handle multivalued parameters. Who knows what else it fails to do.
So, is there a module out there that just deals with this problem? I'm not interested in reinventing a wheel that a more talented wright wrought. Right?
The URI module is your friend. It has a query_form method that takes a hash, hashref or arrayref of parameters and generates a query string from it.
It will URL-encode your data for you (and note that you do want it URL-encoded, not HTML-encoded).
So you might have something like:
#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use URI;
my $q = CGI->new;
my @data = map {
    my $name = $_;
    my @values = $q->param($name);
    my $value;
    if (scalar @values == 1) {
        ($value) = @values;
    } else {
        $value = \@values;
    }
    if ($name eq "foo") {
        $value = "replaced";
    }
    ($name, $value);
} $q->param;
my $uri = URI->new('http://example.com/myAlternative.cgi');
$uri->query_form(\@data);
print $q->redirect(
-uri=> $uri,
-status => 301
);
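To see why URL encoding (not escapeHTML) matters, and how multivalued parameters can be handled, here is a core-Perl-only sketch of roughly what a query-string serializer does. url_encode and to_query_string are hypothetical helpers, not part of CGI or URI; in practice URI's query_form is the better choice, and its output may differ in details such as how spaces are encoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode everything outside the RFC 3986 unreserved set,
# after turning characters into UTF-8 bytes.
sub url_encode {
    my ($s) = @_;
    utf8::encode($s);
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Serialize a hash whose values are scalars or array refs (multivalued params).
sub to_query_string {
    my ($params) = @_;
    my @pairs;
    for my $name (sort keys %$params) {
        my $v = $params->{$name};
        # An array ref becomes one name=value pair per element.
        for my $value (ref $v eq 'ARRAY' ? @$v : $v) {
            push @pairs, url_encode($name) . '=' . url_encode($value);
        }
    }
    return join '&', @pairs;
}

my %vars = (modify_me => 'a&b', colour => ['red', 'dark blue']);
$vars{modify_me} .= ' more stuff';
print to_query_string(\%vars), "\n";
# prints colour=red&colour=dark%20blue&modify_me=a%26b%20more%20stuff
```

With escapeHTML instead, the & inside a value would become &amp;, which the receiving script would treat as a parameter separator.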
Have you looked at Data::URIEncode or URI::QueryParam?
Turns out, there's a way to achieve my specific need using just the CGI module. However, the other answers cover a wider need, to serialize an arbitrary hash.
If you want to modify incoming parameters and then create a link to the same script with modified parameters you can do this:
my $params = $cgi->Vars;
# Modify the values in the hash that $params references
my $new_url = $cgi->self_url(); # URL with modified parameters