How do I use encode_json with string in Perl? - json

Here is my code that I try to open the file to get data and change it to UTF-8, then read each line and store it in variable my $abstract_text and send it back in JSON structure.
my $fh;
if (!open($fh, '<:encoding(UTF-8)',$path))
{
returnApplicationError("Cannot read abstract file: $path ($!)\nERRORCODE|111|\n");
}
printJsonHeader;
my #lines = <$fh>;
my $abstract_text = '';
foreach my $line (#lines)
{
$abstract_text .= $line;
}
my $json = encode_json($abstract_text);
close $fh;
print $json;
By using that code, I get this error;
hash- or arrayref expected (not a simple scalar, use allow_nonref to allow this)
error message also point out that the problem is in this line;
my $json = encode_json($abstract_text);
I want to send the data back as a string (which is in UTF-8). Please help.

I assume you're using either JSON or JSON::XS.
Both allow for non-reference data, but not via the procedural encode_json routine.
You'll need to use the object-oriented approach:
use strict; # obligatory
use warnings; # obligatory
use JSON::XS;
my $encoder = JSON::XS->new();
$encoder->allow_nonref();
print $encoder->encode('Hello, world.');
# => "Hello, world."

Related

Can I use Text::CSV_XS to parse a csv-format string without writing it to disk?

I am getting a "csv file" from a vendor (using their API), but what they do is just spew the whole thing into their response. It wouldn't be a significant problem except that, of course, some of those pesky humans entered the data and put in "features" like line breaks. What I am doing now is creating a file for the raw data and then reopening it to read the data:
open RAW, ">", "$rawfile" or die "ERROR: Could not open $rawfile for write: $! \n";
print RAW $response->content;
close RAW;
my $csv = Text::CSV_XS->new({ binary=>1,always_quote=>1,eol=>$/ });
open my $fh, "<", "$rawfile" or die "ERROR: Could not open $rawfile for read: $! \n";
while ( $line = $csv->getline ($fh) ) { ...
Somehow this seems ... inelegant. It seems that I ought to be able to just read the data from the $response->content (multiline string) as if it were a file. But I'm drawing a total blank on how do this.
A pointer would be greatly appreciated.
Thanks,
Paul
You could use a string filehandle:
my $data = $response->content;
open my $fh, "<", \$data or croak "unable to open string filehandle : $!";
my $csv = Text::CSV_XS->new({ binary=>1,always_quote=>1,eol=>$/ });
while ( $line = $csv->getline ($fh) ) { ... }
Yes, you can use Text::CSV_XS on a string, via its functional interface
use warnings;
use strict;
use feature 'say';
use Text::CSV_XS qw(csv); # must use _XS version
my $csv = qq(a,line\nand,another);
my $aoa = csv(in => \$csv)
or die Text::CSV->error_diag;
say "#$_" for #aoa;
Note that this indeed needs Text::CSV_XS (normally Text::CSV works but not with this).
I don't know why this isn't available in the OO interface (or perhaps is but is not documented).
While the above parses the string directly as asked, one can also lessen the "inelegant" aspect in your example by writing content directly to a file as it's acquired, what most libraries support like with :content_file option in LWP::UserAgent::get method.
Let me also note that most of the time you want the library to decode content, so for LWP::UA to use decoded_content (see HTTP::Response).
I cooked up this example with Mojo::UserAgent. For the CSV input I used various data sets from the NYC Open Data. This is also going to appear in the next update for Mojo Web Clients.
I build the request without making the request right away, and that gives me the transaction object, $tx. I can then replace the read event so I can immediately send the lines into Text::CSV_XS:
#!perl
use v5.10;
use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;
my $url = ...;
my $tx = $ua->build_tx( GET => $url );
$tx->res->content->unsubscribe('read')->on(read => sub {
state $csv = do {
require Text::CSV_XS;
Text::CSV_XS->new;
};
state $buffer;
state $reader = do {
open my $r, '<:encoding(UTF-8)', \$buffer;
$r;
};
my ($content, $bytes) = #_;
$buffer .= $bytes;
while (my $row = $csv->getline($reader) ) {
say join ':', $row->#[2,4];
}
});
$tx = $ua->start($tx);
That's not as nice as I'd like it to be because all the data still show up in the buffer. This is slightly more appealing, but it's fragile in the ways I note in the comments. I'm too lazy at the moment to make it any better because that gets hairy very quickly as you figure out when you have enough data to process a record. My particular code isn't as important as the idea that you can do whatever you like as the transactor reads data and passes it into the content handler:
use v5.10;
use strict;
use warnings;
use feature qw(signatures);
no warnings qw(experimental::signatures);
use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;
my $url = ...;
my $tx = $ua->build_tx( GET => $url );
$tx->res->content
->unsubscribe('read')
->on( read => process_bytes_factory() );
$tx = $ua->start($tx);
sub process_bytes_factory {
return sub ( $content, $bytes ) {
state $csv = do {
require Text::CSV_XS;
Text::CSV_XS->new( { decode_utf8 => 1 } );
};
state $buffer = '';
state $line_no = 0;
$buffer .= $bytes;
# fragile if the entire content does not end in a
# newline (or whatever the line ending is)
my $last_line_incomplete = $buffer !~ /\n\z/;
# will not work if the format allows embedded newlines
my #lines = split /\n/, $buffer;
$buffer = pop #lines if $last_line_incomplete;
foreach my $line ( #lines ) {
my $status = $csv->parse($line);
my #row = $csv->fields;
say join ':', $line_no++, #row[2,4];
}
};
}

Escape special characters in JSON string

I have Perl script which contains variable $env->{'arguments'}, this variable should contain a JSON object and I want to pass that JSON object as argument to my other external script and run it using backticks.
Value of $env->{'arguments'} before escaping:
$VAR1 = '{"text":"This is from module and backslash \\ should work too"}';
Value of $env->{'arguments'} after escaping:
$VAR1 = '"{\\"text\\":\\"This is from module and backslash \\ should work too\\"}"';
Code:
print Dumper($env->{'arguments'});
escapeCharacters(\$env->{'arguments'});
print Dumper($env->{'arguments'});
my $command = './script.pl '.$env->{'arguments'}.'';
my $output = `$command`;
Escape characters function:
sub escapeCharacters
{
#$env->{'arguments'} =~ s/\\/\\\\"/g;
$env->{'arguments'} =~ s/"/\\"/g;
$env->{'arguments'} = '"'.$env->{'arguments'}.'"';
}
I would like to ask you what is correct way and how to parse that JSON string into valid JSON string which I can use as argument for my script.
You're reinventing a wheel.
use String::ShellQuote qw( shell_quote );
my $cmd = shell_quote('./script.pl', $env->{arguments});
my $output = `$cmd`;
Alternatively, there's a number of IPC:: modules you could use instead of qx. For example,
use IPC::System::Simple qw( capturex );
my $output = capturex('./script.pl', $env->{arguments});
Because you have at least one argument, you could also use the following:
my $output = '';
open(my $pipe, '-|', './script.pl', $env->{arguments});
while (<$pipe>) {
$output .= $_;
}
close($pipe);
Note that current directory isn't necessarily the directory that contains the script that executing. If you want to executing script.pl that's in the same directory as the currently executing script, you want the following changes:
Add
use FindBin qw( $RealBin );
and replace
'./script.pl'
with
"$RealBin/script.pl"
Piping it to your second program rather than passing it as an argument seems like it would make more sense (and be a lot safer).
test1.pl
#!/usr/bin/perl
use strict;
use JSON;
use Data::Dumper;
undef $/;
my $data = decode_json(<>);
print Dumper($data);
test2.pl
#!/usr/bin/perl
use strict;
use IPC::Open2;
use JSON;
my %data = ('text' => "this has a \\backslash", 'nums' => [0,1,2]);
my $json = JSON->new->encode(\%data);
my ($chld_out, $chld_in);
print("Executing script\n");
my $pid = open2($chld_out, $chld_in, "./test1.pl");
print $chld_in "$json\n";
close($chld_in);
my $out = do {local $/; <$chld_out>};
waitpid $pid, 0;
print(qq~test1.pl output =($out)~);

Perl encoding from JSON issue

apologies if this is a really stupid question or already asked elsewhere. I'm reading in some JSON and using decode_json on it, then extracting text from it and outputting that to a file.
My problem is that Unicode characters are encoded as eg \u2019 in the JSON, decode_json appears to convert this to \x{2019}. When I grab this text and output to a UTF8-encoded file, it appears as garbage.
Sample code:
use warnings;
use strict;
use JSON qw( decode_json );
use Data::Dumper;
open IN, $file or die;
binmode IN, ":utf8";
my $data = <IN>;
my $json = decode_json( $data );
open OUT, ">$outfile" or die;
binmode OUT, ":utf8";
binmode STDOUT, ":utf8";
foreach my $textdat (#{ $json->{'results'} }) {
print STDOUT Dumper($textdat);
my $text = $textdat->{'text'};
print OUT "$text\n";
}
The Dumper output shows that the \u encoding has been converted to \x encoding. What am I doing wrong?
decode_json needs UTF-8 encoded input, so use from_json instead that accepts unicode:
my $json = from_json($data);
Another option would be to encode the data yourself:
use Encode;
my $encoded_data = encode('UTF-8', $data);
...
my $json = decode_json($data);
But it makes little sense to encode data just to decode it.
decode_json expects UTF-8, but you're passing decoded text (Unicode Code Points) instead.
So, you could remove the existing character decoding.
use feature qw( say );
use open 'std', ':encoding(UTF-8)';
use JSON qw( decode_json );
my $json_utf8 = do {
open(my $fh, '<:raw', $in_qfn)
or die("Can't open \"$in_qfn\": $!\n");
local $/;
<$fh>;
};
my $data = decode_json($json_utf8);
{
open(my $fh, '>', $out_qfn)
or die("Can't create \"$out_qfn\": $!\n");
for my $result (#{ $data->{results} }) {
say $fh $result->{text};
}
}
Or, you could use from_json (or JSON->new->decode) instead of decode_json.
use feature qw( say );
use open 'std', ':encoding(UTF-8)';
use JSON qw( from_json ); # <---
my $json_ucp = do {
open(my $fh, '<', $in_qfn) # <---
or die("Can't open \"$in_qfn\": $!\n");
local $/;
<$fh>;
};
my $data = from_json($json_ucp); # <---
{
open(my $fh, '>', $out_qfn)
or die("Can't create \"$out_qfn\": $!\n");
for my $result (#{ $data->{results} }) {
say $fh $result->{text};
}
}
The arrows point to the three minor differences between the two snippets.
I made a number of cleanups.
Missing local $/; in case there are line breaks in the JSON.
Don't use 2-arg open.
Don't needlessly use global variables.
Use better names for variables. $data and $json were notably reversed, and $file didn't contain a file.
Limit the scope of your variables, especially if they use up system resources (e.g. file handles).
Use :encoding(UTF-8) (the standard encoding) instead of :encoding(utf8) (an encoding only used by Perl). :utf8 is even worse as it uses the internal encoding rather than the standard one, and it can lead to corrupt scalars if provided bad input.
Get rid of the noisy quotes around identifiers used as hash keys.

Script extracts and prints UTF-8 words well, but prints JSON as garbage

I have tried my script on Mac OS Mavericks (perl 5.16.2) and Yosemite and also with Windows 7 (strawberry-perl-5.20.1.1-64bit-portable).
It is supposed to read UTF-8 data (russian text) and put it into a data structure - and finally print the data structure as JSON string (the output will be used to feed Core Data in an iOS word game).
The first part works (extracting words and printing them - to verify) works well, but the final part not: the resulting JSON string contains garbage:
Does anybody please know, how to fix my simple test script?
#!/usr/bin/perl -w
use strict;
use warnings;
use utf8;
use JSON;
binmode(STDOUT, ':utf8');
my $root = { words => [] };
while (<DATA>) {
chomp;
utf8::decode($_);
my #a = split /\s*[:,]\s*/;
my $words = [];
for my $word (#a[1 .. $#a]) {
print "WORD: $word\n";
#push #$words, utf8::encode($word);
push #$words, $word;
}
push #{$root->{words}}, $words;
}
print to_json($root, {utf8 => 1, pretty => 1});
__DATA__
Голова: небо, язык, мозг, глотка, надгортанник, пищевод, горло, гортань
Сумки: портмоне, кошелек, портфель, рюкзак, лямка, застежка
You're double encoding. You're encoding using from_json (utf8 => 1), then you're encoding again when outputting to STDOUT (binmode(STDOUT, ':utf8');).
The solution isn't clear, because it's not clear what you are trying to achieve. If you're really going to output non-JSON and JSON to STDOUT, don't ask from_json to encode.
The output looks "wrong", but that's OK: it's encoded. To see it correctly, just set
binmode STDOUT, ':raw';
before printing the JSON.
You can simplify the script by using encode_json:
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use JSON;
binmode(STDIN, ":utf8");
binmode(STDOUT, ":utf8");
my $root;
while (<DATA>) {
chomp;
my #words = split /\s*[:,]\s*/;
push #{ $root->{words} }, [];
for my $word (#words[1 .. $#words]) {
print "WORD: $word\n";
push #{ $root->{words}[-1] }, $word;
}
}
my $json = encode_json($root);
binmode STDOUT, ':raw';
print $json;

how to decode json in perl if value has \n or more than one line

how to decode the json file if a value has more than one line
a.json file:
{
"sv1" : {
"output" : "Hostname: abcd
asdkfasfjsl",
"exp_result" : "xyz"
}
}
when I try to read the above json file, I am hitting with an error "invalid character encountered while parsing JSON string, at character offset 50 (before "\n ...")"
code to read the above json file:
#!/volume/perl/bin/perl -w
use strict;
use warnings;
use JSON;
local $/;
open(AA,"<a.json") or die "can't open json file : $!\n";
my $json = <AA>;
my $data = decode_json($json);
print "reading output $data->{'sv1'}->{'output'}\n";
print "reading output $data->{'sv1'}->{'exp_result'}\n";
close AA;
Besides from whether the JSON is valid or not (see comments on question), you're reading only the first line from the file.
my $json = <AA>;
This is a scalar variable and receives only one line.
Use an array to get all lines:
my #json = <AA>;
my $json = join "\n", #json;
or even better: use File::Slurp::read_file to get the whole content of the file with one simple command.
use File::Slurp qw/read_file/;
my $json = read_file( "a.json" );