I pass a utf8 encoded string from my command line into a Perl program:
> ./test.pl --string='ḷet ūs try ṭhiñgs'
which seems to recognize the string correctly:
use utf8;
GetOptions(
'string=s' => \$string,
) or die;
print Dumper($string);
print Dumper(utf8::is_utf8($string));
print Dumper(utf8::valid($string));
prints
$VAR1 = 'ḷet ūs try ṭhiñgs';
$VAR1 = '';
$VAR1 = 1;
When I store this string into a hash and call encode_json on it, the string seems to be again encoded whereas to_json seems to work (if I read the output correctly):
my %a = ( 'nāme' => $string ); # Note the Unicode character
print Dumper(\%a);
print Dumper(encode_json(\%a));
print Dumper(to_json(\%a));
prints
$VAR1 = {
"n\x{101}me" => 'ḷet ūs try ṭhiñgs'
};
$VAR1 = '{"nāme":"ḷet Å«s try á¹hiñgs"}';
$VAR1 = "{\"n\x{101}me\":\"\x{e1}\x{b8}\x{b7}et \x{c5}\x{ab}s try \x{e1}\x{b9}\x{ad}hi\x{c3}\x{b1}gs\"}";
Turning this back into the original hash, however, doesn't seem to work with either methods and in both cases hash and string and broken:
print Dumper(decode_json(encode_json(\%a)));
print Dumper(from_json(to_json(\%a)));
prints
$VAR1 = {
"n\x{101}me" => "\x{e1}\x{b8}\x{b7}et \x{c5}\x{ab}s try \x{e1}\x{b9}\x{ad}hi\x{c3}\x{b1}gs"
};
$VAR1 = {
"n\x{101}me" => "\x{e1}\x{b8}\x{b7}et \x{c5}\x{ab}s try \x{e1}\x{b9}\x{ad}hi\x{c3}\x{b1}gs"
};
A hash lookup $a{'nāme'} now fails.
Question: How do I handle utf8 encoding and strings and JSON encode/decode correctly in Perl?
You need to decode your input:
use Encode;
my $string;
GetOptions('string=s' => \$string) or die;
$string = decode('UTF-8', $string);
Putting it all together, we get:
use strict;
use warnings;
use 5.012;
use utf8;
use Encode;
use Getopt::Long;
use JSON;
my $string;
GetOptions('string=s' => \$string) or die;
$string = decode('UTF-8', $string);
my %hash = ('nāme' => $string);
my $json = encode_json(\%hash);
my $href = decode_json($json);
binmode(STDOUT, ':encoding(utf8)');
say $href->{nāme};
Example:
$ perl test.pl --string='ḷet ūs try ṭhiñgs'
ḷet ūs try ṭhiñgs
Make sure your source file is actually encoded as UTF-8!
I am sending a request to the endpoint url ,from there i am getting the response in case of success in form of JSON,but if it fails it return certain text .
Sending request:
$data->{response} = $self->{_http}->send($myData);
So before doing this:
$resp = from_json($data->{response});
i want to check whether the reponse is in json format or not .How we can handle this in Perl kindly help in this
You can catch exception thrown by from_json(),
my $resp;
my $ok = eval { $resp = from_json("{}"); 1 };
$ok or die "Not valid json";
or simpler,
my $resp = eval { from_json("rrr") };
$resp // die "Not valid json";
Use JSON or JSON::XS to decode the JSON into a Perl structure.
Simple example:
use strict;
use warnings;
use JSON::XS;
my $json = '[{"Year":"2012","Quarter":"Q3","DataType":"Other 3","Environment":"STEVE","Amount":125},{"Year":"2012","Quarter":"Q4","DataType":"Other 2","Environment":"MIKE","Amount":500}]';
my $arrayref = decode_json $json;
foreach my $item( #$arrayref ) {
# fields are in $item->{Year}, $item->{Quarter}, etc.
}
You could use a try/catch block using Try::Tiny
use Try::Tiny;
try {
$resp = from_json($data->{response});
} catch {
# Do something if it does not parse
warn 'Could not parse json'
};
I am new to perl and I writing a simple cgi script in perl read Json data.. that looks like this
use CGI;
use JSON;
use strict;
my $cgi = CGI->new;
my $error=0;
$cgi->param();
my $data = $cgi->param('POSTDATA') || '{
"field1":"value1",
"field2":"value2"
}'; # Used a sample JSON with
my $json = JSON->new->utf8;
my $input = $json->decode ($data) || $error++;
my #errors=();
my %slots;
$slots{'coloumn1'} = $input->{'field1'} || $error++;
$slots{'coloumn2'} = $input->{'field2'} || $error++; # if there is no field2 in JSON it will increment value
if ( $error > 0) {
print $cgi->header('text/html','400 Bad Data');
print "error with $data ";
exit;
}
How can i do more sophisticated error handling for JSON to check if its a valid json string.....and raise exception if a field is missing in JSON string using eval or other methods??
I'm usually no Perl coder. However I've got to complete this task.
The following code works for me:
#!/usr/bin/perl
use LWP::UserAgent;
use JSON;
use strict;
my $md5 = $ARGV[0];
$md5 =~ s/[^A-Fa-f0-9 ]*//g;
die "invalid MD5" unless ( length($md5) == 32 );
my $ua = LWP::UserAgent->new(ssl_opts => { verify_hostname => 1 }, timeout => 10);
my $key="12345...7890";
my $url='https://www.virustotal.com/vtapi/v2/file/report';
my $response = $ua->post( $url, ['apikey' => $key, 'resource' => $md5] );
die "$url error: ", $response->status_line unless $response->is_success;
my $results=$response->content;
my $json = JSON->new->allow_nonref;
my $decjson = $json->decode( $results);
print "md5: ",$md5,"\n";
print "positives: ", $decjson->{"positives"}, "\n";
print "total: ", $decjson->{"total"}, "\n";
print "date: ", $decjson->{"scan_date"}, "\n";
Now I would like to recode the above for using asynchronous http using Mojo. I'm trying this:
#!/usr/bin/perl
use warnings;
use strict;
use Mojo;
use Mojo::UserAgent;
my $md5 = $ARGV[0];
$md5 =~ s/[^A-Fa-f0-9 ]*//g;
die "invalid MD5" unless ( length($md5) == 32 );
my ($vt_positives, $vt_scandate, $response_vt);
my $url='https://www.virustotal.com/vtapi/v2/file/report';
my $key="12345...7890";
my $ua = Mojo::UserAgent->new;
my $delay = Mojo::IOLoop->delay;
$ua->max_redirects(0)->connect_timeout(3)->request_timeout(6);
$ua->max_redirects(5);
$delay->begin;
$response_vt = $ua->post( $url => ['apikey' => $key, 'resource' => $md5] => sub {
my ($ua, $tx) = #_;
$vt_positives=$tx->res->json->{"positives"};
print "Got response: $vt_positives\n";
});
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
The first code is OK, the second isn't working. I must be doing something wrong when sending the request since I seem to get a 403 response (incorrect API usage). I also tried -> json calls but it didn't work out.
And even if I had done the request correctly, I'm not sure if I'm correctly decoding the json results with Mojo.
Help will be appreciated!
EDIT
It seems that we missed the real question, how to post forms. Oops sorry about that.
Posting forms depends on which version of Mojolicious you are using. Until recently (v3.85 -- 2013-02-13) there was a post_form method. On reflection however, it was decided there should either be *_form methods for every request type, or we should do something smarter, and thus the form generator was born.
$response_vt = $ua->post(
$url,
form => {'apikey' => $key, 'resource' => $md5},
sub { ... }
);
It can be added to any request method, making it much more consistent than the old form. Also note that it should be a hashref, not an arrayref as LWP allows. BTW there is also a json generator that works like this too, or you can even add your own!
I'm leaving my original answer, showing non-blocking usage, which you may now amend given the above.
ORIGINAL
Building off the logic from creaktive, this is how I would start. The major difference is that there isn't a monitor watching to be sure that there are works going, rather when one finishes it checks to be sure that there are no idlers.
I have also made some changes in the parsing logic, but nothing major.
#!/usr/bin/env perl
use Mojo::Base -strict;
use utf8::all;
use Mojo::URL;
use Mojo::UserAgent;
# FIFO queue
my #urls = qw(
http://sysd.org/page/1/
http://sysd.org/page/2/
http://sysd.org/page/3/
);
# User agent following up to 5 redirects
my $ua = Mojo::UserAgent
->new(max_redirects => 5)
->detect_proxy;
start_urls($ua, \#urls, \&get_callback);
sub start_urls {
my ($ua, $queue, $cb) = #_;
# Limit parallel connections to 4
state $idle = 4;
state $delay = Mojo::IOLoop->delay(sub{say #$queue ? "Loop ended before queue depleated" : "Finished"});
while ( $idle and my $url = shift #$queue ) {
$idle--;
print "Starting $url, $idle idle\n\n";
$delay->begin;
$ua->get($url => sub{
$idle++;
print "Got $url, $idle idle\n\n";
$cb->(#_, $queue);
# refresh worker pool
start_urls($ua, $queue, $cb);
$delay->end;
});
}
# Start event loop if necessary
$delay->wait unless $delay->ioloop->is_running;
}
sub get_callback {
my ($ua, $tx, $queue) = #_;
# Parse only OK HTML responses
return unless
$tx->res->is_status_class(200)
and $tx->res->headers->content_type =~ m{^text/html\b}ix;
# Request URL
my $url = $tx->req->url;
say "Processing $url";
parse_html($url, $tx, $queue);
}
sub parse_html {
my ($url, $tx, $queue) = #_;
state %visited;
my $dom = $tx->res->dom;
say $dom->at('html title')->text;
# Extract and enqueue URLs
$dom->find('a[href]')->each(sub{
# Validate href attribute
my $link = Mojo::URL->new($_->{href});
return unless eval { $link->isa('Mojo::URL') };
# "normalize" link
$link = $link->to_abs($url)->fragment(undef);
return unless grep { $link->protocol eq $_ } qw(http https);
# Don't go deeper than /a/b/c
return if #{$link->path->parts} > 3;
# Access every link only once
return if $visited{$link->to_string}++;
# Don't visit other hosts
return if $link->host ne $url->host;
push #$queue, $link;
say " -> $link";
});
say '';
return;
}
Take a look at this concurrent-requesting Mojolicious-based web crawler I wrote to illustrate my article Web Scraping with Modern Perl:
#!/usr/bin/env perl
use 5.010;
use open qw(:locale);
use strict;
use utf8;
use warnings qw(all);
use Mojo::UserAgent;
# FIFO queue
my #urls = map { Mojo::URL->new($_) } qw(
http://sysd.org/page/1/
http://sysd.org/page/2/
http://sysd.org/page/3/
);
# Limit parallel connections to 4
my $max_conn = 4;
# User agent following up to 5 redirects
my $ua = Mojo::UserAgent
->new(max_redirects => 5)
->detect_proxy;
# Keep track of active connections
my $active = 0;
Mojo::IOLoop->recurring(
0 => sub {
for ($active + 1 .. $max_conn) {
# Dequeue or halt if there are no active crawlers anymore
return ($active or Mojo::IOLoop->stop)
unless my $url = shift #urls;
# Fetch non-blocking just by adding
# a callback and marking as active
++$active;
$ua->get($url => \&get_callback);
}
}
);
# Start event loop if necessary
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
sub get_callback {
my (undef, $tx) = #_;
# Deactivate
--$active;
# Parse only OK HTML responses
return
if not $tx->res->is_status_class(200)
or $tx->res->headers->content_type !~ m{^text/html\b}ix;
# Request URL
my $url = $tx->req->url;
say $url;
parse_html($url, $tx);
return;
}
sub parse_html {
my ($url, $tx) = #_;
say $tx->res->dom->at('html title')->text;
# Extract and enqueue URLs
for my $e ($tx->res->dom('a[href]')->each) {
# Validate href attribute
my $link = Mojo::URL->new($e->{href});
next if 'Mojo::URL' ne ref $link;
# "normalize" link
$link = $link->to_abs($tx->req->url)->fragment(undef);
next unless grep { $link->protocol eq $_ } qw(http https);
# Don't go deeper than /a/b/c
next if #{$link->path->parts} > 3;
# Access every link only once
state $uniq = {};
++$uniq->{$url->to_string};
next if ++$uniq->{$link->to_string} > 1;
# Don't visit other hosts
next if $link->host ne $url->host;
push #urls, $link;
say " -> $link";
}
say '';
return;
}
LWP::UserAgent takes arguments to post as either ref to array or ref to hash format.
http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm#REQUEST_METHODS
$ua->post( $url, \%form )
$ua->post( $url, \#form )
which you provide in the first script in the ref to array format "\#form"
my $response = $ua->post( $url, ['apikey' => $key, 'resource' => $md5] );
being as it is a hash this is probably better written in the hash format "\%form"
my $response = $ua->post( $url, {'apikey' => $key, 'resource' => $md5} );
In Mojo::UserAgent the arguments to post are a little more complex, but essentially appear to be a "string" of hash refs to hash keys, with which I am unfamiliar. However you may find using the hash ref format provides the expected arguments correctly.
http://search.cpan.org/~sri/Mojolicious-3.87/lib/Mojo/UserAgent.pm#post
POST
my $tx = $ua->post('kraih.com');
my $tx = $ua->post('http://kraih.com' => {DNT => 1} => 'Hi!');
my $tx = $ua->post('http://kraih.com' => {DNT => 1} => form => {a => 'b'});
my $tx = $ua->post('http://kraih.com' => {DNT => 1} => json => {a => 'b'});
try this ?:
$response_vt = $ua->post( $url => form => {'apikey' => $key, 'resource' => $md5} => sub {... });
When trying to run my perl script via cmd prompt my json string is returning [] I have read other posts, and fixed my database to be utf8 and the error still persists. I have tried two didfferent way to encode my perl string the first was $json = encode_json #temp_array which returns this error hash- or arrayref expected (not a simple scalar, use allow_nonref to allow this However, when I use this line $json_text = $json->encode(#temp_array) I just get []
Here is my perl:
my $json_text;
my $json = JSON->new->utf8;
my #temp_array =[];
my $temp_array;
while (#data = $query_handle->fetchrow_array())
{
my %json_hash = ();
my $hash_ref;
%json_hash = (
"User ID" => $data[0],
"Status" => $data[1],
"Last Password Reset" => $data[2],
"Reset Needed" => $data[3]
);
$hash_ref = \%json_hash;
push (#temp_array, $hash_ref);
}
print $json = encode_json #temp_array . "\n"; #encode with error
print $json_text = $json->encode(#temp_array) . "\n"; #encode with []
print $cgi->header(-type => "application/json", -charset => "utf-8");
print $json_text; #Prints []
So in my own testing, via the cmd prompt I know the while is retrieving the data from my db correctly and is building a hash, which I am assuming is correct.
Is it the fact I am pushing my hash reference to the array instead of the hash itself? Once I get this string built correctly, I will be calling it to an html via jquery
Thank you.
JSON expects references:
print $json = encode_json(\#temp_array) . "\n";
print $json_text = $json->encode(\#temp_array) . "\n";
Edit: Unless you enable allow_nonref.
Another edit: This line is wrong--
my #temp_array =[]; ## should be my #temp_array = ();
and this line overwrites the $json variable:
print $json = encode_json #temp_array . "\n"; ## the next line in your script shouldn't work
Last edit - untested:
my $json = JSON->new->utf8;
my #temp_array;
while (my #data = $query_handle->fetchrow_array()) {
my %json_hash = (
"User ID" => $data[0],
"Status" => $data[1],
"Last Password Reset" => $data[2],
"Reset Needed" => $data[3]
);
push (#temp_array, \%json_hash);
}
print $json->encode(\#temp_array) . "\n";