The program listed below fails with the following error:
JSON text must be an object or array (but found number, string, true, false or null, use allow_nonref to allow this) at json_test.pl line 10.
It works fine when I comment out the thread startup/join, or when the JSON is parsed before the thread is started.
The message seems to come from the JSON library, so I suppose something is wrong with it.
Any ideas what's going on and how to fix it?
# json_test.pl
use strict;
use warnings;
use threads;
use JSON;
use Data::Dumper;
my $t = threads->new(\&DoSomething);
my $str = '{"category":"dummy"}';
my $json = JSON->new();
my $data = $json->decode($str);
print Dumper($data);
$t->join();
sub DoSomething
{
    sleep 10;
    return 1;
}
JSON uses JSON::XS if it is installed, and JSON::XS is not compatible with Perl threads (please don't take the author's words at face value: threads are discouraged and difficult to use effectively, but they are not deprecated and there are no plans to remove them). The community-preferred fork Cpanel::JSON::XS is thread safe and is used by default by JSON::MaybeXS, which is a mostly drop-in replacement for JSON.
I have written Perl code that was working until recently, when I tried to run it again. The problem seems to originate from the JSON::XS "decode_json" method.
Code Snippet:
use warnings;
use strict;
use MooseX::Singleton;
use Array::Utils qw(:all);
use Data::Dumper;
use JSON::XS qw(encode_json decode_json);
use Storable;
use Tie::IxHash;
open (my $observations_fh, '<', 'observations.json') or die "Could not open observations.json\n";
my $observations_json = <$observations_fh>;
my @decoded_observations = @{decode_json($observations_json)};
Usually, after this code I was able to go through each JSON component in a for loop and take specific information, but now I get the error:
, or ] expected while parsing array, at character offset 5144816
(before "(end of string)")
I saw a similar question here, but it didn't resolve my problem.
I also have similar JSON decoding going on that doesn't use @{decode_json($variable)}, but when I tried that with this observations.json file, the same error was output.
I also tried just using the JSON module, but same error occurred.
Any insight would be greatly appreciated!
-cookersjs
That probably indicates you have incomplete JSON in $observations_json. Your assumption that the entire file consists of just one line is probably incorrect. Use
my $observations;
{
    open (my $observations_fh, '<', 'observations.json')
        or die("Can't open observations.json: $!\n");
    local $/;  # no record separator, so <> reads the whole file at once
    my $observations_json = <$observations_fh>;
    $observations = decode_json($observations_json);
}
If that doesn't help, observations.json doesn't contain valid JSON.
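To see the difference between reading one line and slurping the file, here is a minimal sketch. It assumes a hypothetical multi-line sample written to a temporary file, and it uses the core module JSON::PP in place of JSON::XS so that it runs without extra installs; the variable names are illustrative only.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # core module, swapped in for JSON::XS in this sketch
use File::Temp qw(tempfile);

# Hypothetical sample: a JSON array spread over several lines.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} qq([\n  {"id": 1},\n  {"id": 2}\n]\n);
close $tmp_fh or die "Can't close $tmp_name: $!\n";

# Reading a single line yields only "[\n", which is incomplete JSON.
open(my $fh, '<', $tmp_name) or die "Can't open $tmp_name: $!\n";
my $first_line = <$fh>;
close $fh;
my $line_ok = eval { decode_json($first_line); 1 } ? 1 : 0;

# Slurping the whole file yields the complete document.
my $whole_file = do {
    open(my $in, '<', $tmp_name) or die "Can't open $tmp_name: $!\n";
    local $/;   # undefined record separator: read everything
    <$in>;
};
my $observations = decode_json($whole_file);

print "first line parses: ", ($line_ok ? "yes" : "no"), "\n";
print "records in whole file: ", scalar(@$observations), "\n";
```

With this sample, decoding the first line alone should fail, while the slurped content decodes to an array of two records.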
I'm relatively new to Perl and trying to teach myself. However, I have read all of the related threads on this page and others, and none of them seem to work for me.
Below is my code. I'm trying to get a lot of data from a webpage in Perl format and export it to update values in an SQL table.
Currently I can't even print the results of the URL with Data::Dumper.
Any help would be great.
#!/usr/bin/perl
#
use LWP::Simple;
use warnings;
use strict;
use JSON qw( decode_json from_json );
use LWP::Simple;
use Data::Dumper;
use utf8;
my $url = "http://.sensitivedata.txt";
my @json= from_json(get ( $url ));
die "Couldn't get $url" if not defined @json;
##my $decoded_json = decode_json( @json);
print Dumper @json;
exit 0;
This is the error message it is giving me:
defined(@array) is deprecated at alarms.pl line 14.
(Maybe you should just omit the defined()?)
malformed JSON string, neither array, object, number, string or atom, at character offset 0 (before "(end of string)") at /opt/csw/share/perl/csw/JSON.pm line 168
The error message is pretty clear about a) what the problem is and b) how to get rid of it.
defined(@array) is deprecated at alarms.pl line 14. (Maybe you should
just omit the defined()?)
Calling defined() on @json is pointless. You're really just checking to see if there is any data in the array, so replace if not defined @json with if not @json.
That will get rid of the error message. But you'll still have a problem as your program will almost certainly now die on the same line with the error message "Couldn't get http://.sensitivedata.txt". And that's probably not an accurate error message.
The problem is that this error can be caused by two problems. Either you can't get the data or you can't parse the data. Your error message only mentions one of these possibilities. Better to split the error checking into two.
# Step 1: Get the data
my $raw_json = get($url);
die "Can't get data from $url" unless $raw_json;
# Step 2: Parse the data
my @json = from_json($raw_json);
if (!@json) {
    warn $raw_json;
    die "Can't parse data from $url";
}
With code more like this, you'll be able to see what the problem is.
There's another little problem here, so to pre-empt your next question...
from_json always returns a scalar. It will either be a hash reference or an array reference (depending on the JSON you get). Looks like you're expecting an array. You'll need to store the reference in a scalar and dereference it.
my $json_array_ref = from_json($raw_json);
if (!@$json_array_ref) {
    warn $raw_json;
    die "Can't parse data from $url";
}
my @json = @$json_array_ref;
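One more point worth checking: the decoders in the JSON family die on malformed input rather than returning an empty value, so a parse failure is easiest to detect with eval. The following is only a sketch under stated assumptions: parse_json_array is a hypothetical helper name, the sample data is invented, and the core module JSON::PP with decode_json stands in for JSON's from_json.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # core module; stands in for JSON's from_json here

# Hypothetical helper: decode $raw_json and insist on a top-level array.
sub parse_json_array {
    my ($raw_json) = @_;
    my $decoded;
    eval { $decoded = decode_json($raw_json); 1 }
        or die "Can't parse JSON: $@";
    die "Expected a JSON array, got " . (ref($decoded) || 'a plain scalar') . "\n"
        unless ref($decoded) eq 'ARRAY';
    return @$decoded;
}

# Invented sample data for illustration.
my @json = parse_json_array('[{"alarm": "low"}, {"alarm": "high"}]');
print scalar(@json), " records, first alarm: $json[0]{alarm}\n";

# A malformed document is reported as a parse failure, not as a download failure.
my $ok = eval { parse_json_array('not json'); 1 };
print "malformed input rejected: $@" unless $ok;
```

Keeping the download check and the parse check separate this way means each failure produces its own message.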
What I am trying to do should be VERY straightforward and simple.
use JSON;
use YAML;
use Data::Dumper;
my $yaml_hash = YAML::LoadFile("data_file.yaml");
print ref($yaml_hash);    # prints HASH as expected
print Dumper($yaml_hash); # correctly prints the hash
my $json_text = encode_json($yaml_hash);
The encode_json errors out saying:
cannot encode reference to scalar 'SCALAR(0x100ab630)' unless the scalar is 0 or 1
I am not able to understand why encode_json thinks that $yaml_hash is a reference to a scalar when in fact it is a reference to a HASH
What am I doing wrong?
It is not $yaml_hash that it is complaining about, it is some reference in one of the hash values (or deeper). Scalar references can be represented in YAML but not in JSON.
YAML enables you to load objects and scalar references. JSON does not by default.
I suspect that your data file most likely contains an inside-out object, and JSON doesn't know how to work with the scalar reference.
The following demonstrates loading a YAML hash containing a scalar reference in one of the values and then failing to encode it using JSON:
use strict;
use warnings;
use YAML;
use JSON;
# Load a YAML hash containing a scalar ref as a value.
my ($hashref) = Load(<<'END_YAML');
---
bar: !!perl/ref
  =: 17
foo: 1
END_YAML
use Data::Dump;
dd $hashref;
my $json_text = encode_json($hashref);
Output:
{ bar => \17, foo => 1 }
cannot encode reference to scalar at script.pl line 18.
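To locate the offending value in a larger structure, one option is to walk the data and report where scalar references occur. This is a rough sketch, not part of YAML or JSON: find_scalar_refs is a hypothetical helper, the sample hash is invented, and blessed objects are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: return the paths of all scalar references
# found in a nested structure of hashes and arrays.
sub find_scalar_refs {
    my ($node, $path) = @_;
    my $type = ref $node;
    my @found;
    if ($type eq 'SCALAR' || $type eq 'REF') {
        push @found, $path;
    }
    elsif ($type eq 'HASH') {
        push @found, find_scalar_refs($node->{$_}, $path . "{$_}")
            for sort keys %$node;
    }
    elsif ($type eq 'ARRAY') {
        push @found, find_scalar_refs($node->[$_], $path . "[$_]")
            for 0 .. $#$node;
    }
    return @found;
}

# Invented sample: one scalar ref at the top level, one nested deeper.
my $hashref = { bar => \17, foo => 1, list => [ 1, { deep => \'text' } ] };
my @paths = find_scalar_refs($hashref, '$hashref->');
print "$_\n" for @paths;
```

Note that this sketch also reports \0 and \1, which the JSON encoders accept as false and true, so those entries can be ignored.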
Here are one liners that can be used to pipe YAML in and produce JSON on STDOUT
perl -0777 -MYAML -MJSON -e 'print(JSON->new()->utf8()->pretty()->encode(Load(<STDIN>)))'
or even shorter if you don't care for formatting
perl -0777 -MYAML -MJSON -e 'print encode_json(Load(<STDIN>))'
For large volumes and faster parsing, I'd also recommend using the YAML::XS and JSON::XS counterparts.
I have an HTML page that contains URLs like:
<h3><a href="http://site.com/path/index.php" h="blablabla">
<h3><a href="https://www.site.org/index.php?option=com_content" h="vlavlavla">
I want to extract:
site.com/path
www.site.org
that is, the text between <h3><a href=" and /index.php.
I've tried this code:
#!/usr/local/bin/perl
use strict;
use warnings;
open (MYFILE, 'MyFileName.txt');
while (<MYFILE>)
{
    my $values1 = split('http://', $_); #VALUE WILL BE: www.site.org/path/index2.php
    my @values2 = split('index.php', $values1); #VALUE WILL BE: www.site.org/path/ ?option=com_content
    print $values2[0]; # here it must print www.site.org/path/ but it don't
    print "\n";
}
close (MYFILE);
but this gives the output:
2
1
2
2
1
1
and it doesn't handle https URLs.
I hope that's clear. Regards.
The main thing wrong with your code is that you call split in scalar context, as in this line:
my $values1 = split('http://', $_);
In scalar context, split returns the size of the list created by the split, not the pieces. See split.
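A small sketch of the difference, using one of the sample lines from the question (the variable names are illustrative only):

```perl
use strict;
use warnings;

my $line = '<h3><a href="http://site.com/path/index.php" h="blablabla">';

# Scalar context: split returns the number of fields.
my $count = split('http://', $line);

# List context: split returns the fields themselves.
my @fields = split('http://', $line);

print "count: $count\n";
print "second field: $fields[1]\n";
```

Here $count is 2, which explains the numbers in the output, while $fields[1] holds the text that follows http://.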
But I don't think split is appropriate for this task anyway. If you know that the value you are looking for will always lie between 'http[s]://' and '/index.php' you just need a regex substitution in your loop (you should also be more careful opening your file...):
open(my $myfile_fh, '<', 'MyFileName.txt') or die "Couldn't open $!";
while(<$myfile_fh>) {
    s{.*http[s]?://(.*)/index\.php.*}{$1} && print;
}
close($myfile_fh);
It's likely you will need a more general regex than that, but I think this would work based on your description of the problem.
This feels to me like a job for modules
HTML::LinkExtor
URI
Generally using regexps to parse HTML is risky.
dms explained in his answer why using split isn't the best solution here:
It returns the number of items in scalar context
A normal regex is better suited for this task.
However, I do not think that line-based processing of the input is valid for HTML, or that using a substitution makes sense (it does not, especially when the pattern looks like .*Pattern.*).
Given a URL, we can extract the required information like this:
if ($url =~ m{^https?://(.+?)/index\.php}s) { # domain+path now in $1
    say $1;
}
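As a quick check, the same pattern can be applied to the two URLs from the question. This is only a sketch; the array names are made up for illustration.

```perl
use strict;
use warnings;

# The two sample URLs from the question.
my @urls = (
    'http://site.com/path/index.php',
    'https://www.site.org/index.php?option=com_content',
);

my @hosts_and_paths;
for my $url (@urls) {
    # https? accepts both schemes; (.+?) stops at the first "/index.php".
    push @hosts_and_paths, $1 if $url =~ m{^https?://(.+?)/index\.php}s;
}
print "$_\n" for @hosts_and_paths;
```

This prints site.com/path and www.site.org, the two values asked for.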
But how do we extract the URLs? I'd recommend the wonderful Mojolicious suite.
use strict; use warnings;
use feature 'say';
use File::Slurp 'slurp'; # makes it easy to read files.
use Mojo::DOM; # load the DOM parser explicitly
my $html_file = shift @ARGV; # take file name from command line
my $dom = Mojo::DOM->new(scalar slurp $html_file);
for my $link ($dom->find('a[href]')->each) {
    say $1 if $link->attr('href') =~ m{^https?://(.+?)/index\.php}s;
}
The find method can take CSS selectors (here: all a elements that have an href attribute). The each flattens the result set into a list which we can loop over.
As I print to STDOUT, we can use shell redirection to put the output into a wanted file, e.g.
$ perl the-script.pl html-with-links.html >only-links.txt
The whole script as a one-liner:
$ perl -Mojo -E'$_->attr("href") =~ m{^https?://(.+?)/index\.php}s and say $1 for x(b("test.html")->slurp)->find("a[href]")->each'
In Perl, using the WWW::Mechanize module (it has to be this module, not another one), is it possible to "parse" a document from a string variable instead of a URL?
I mean instead of
$mech->get($url);
to do something like
$html = '<html...';
$mech->???($html);
Possible?
You could write the data to disk and then get() it in the usual manner. Something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp;
use URI::file;
use WWW::Mechanize;
my $data = '<html><body>foo</body></html>';
# write the data to disk
my $fh = File::Temp->new;
print $fh $data;
$fh->close;
my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new( $fh->filename ) );
print $mech->content;
prints: <html><body>foo</body></html>
Got it:
$mech->get(0);
$mech->update_html('<html>...</html>');
It works!
Not really. You could try getting the HTTP::Response object using $mech->response and then using that object's content method to replace the content with your own string. But you would have to adjust all the message headers as well and it would get quite messy.
What is it that you want to do? The methods like forms and images that WWW::Mechanize provides are based on other modules and are fairly simple to code.