I pass a utf8 encoded string from my command line into a Perl program:
> ./test.pl --string='ḷet ūs try ṭhiñgs'
which seems to recognize the string correctly:
use utf8;
GetOptions(
'string=s' => \$string,
) or die;
print Dumper($string);
print Dumper(utf8::is_utf8($string));
print Dumper(utf8::valid($string));
prints
$VAR1 = 'ḷet ūs try ṭhiñgs';
$VAR1 = '';
$VAR1 = 1;
When I store this string into a hash and call encode_json on it, the string seems to be again encoded whereas to_json seems to work (if I read the output correctly):
my %a = ( 'nāme' => $string ); # Note the Unicode character
print Dumper(\%a);
print Dumper(encode_json(\%a));
print Dumper(to_json(\%a));
prints
$VAR1 = {
"n\x{101}me" => 'ḷet ūs try ṭhiñgs'
};
$VAR1 = '{"nāme":"ḷet Å«s try á¹hiñgs"}';
$VAR1 = "{\"n\x{101}me\":\"\x{e1}\x{b8}\x{b7}et \x{c5}\x{ab}s try \x{e1}\x{b9}\x{ad}hi\x{c3}\x{b1}gs\"}";
Turning this back into the original hash, however, doesn't seem to work with either methods and in both cases hash and string and broken:
print Dumper(decode_json(encode_json(\%a)));
print Dumper(from_json(to_json(\%a)));
prints
$VAR1 = {
"n\x{101}me" => "\x{e1}\x{b8}\x{b7}et \x{c5}\x{ab}s try \x{e1}\x{b9}\x{ad}hi\x{c3}\x{b1}gs"
};
$VAR1 = {
"n\x{101}me" => "\x{e1}\x{b8}\x{b7}et \x{c5}\x{ab}s try \x{e1}\x{b9}\x{ad}hi\x{c3}\x{b1}gs"
};
A hash lookup $a{'nāme'} now fails.
Question: How do I handle utf8 encoding and strings and JSON encode/decode correctly in Perl?
You need to decode your input:
use Encode;
my $string;
GetOptions('string=s' => \$string) or die;
$string = decode('UTF-8', $string);
Putting it all together, we get:
use strict;
use warnings;
use 5.012;
use utf8;
use Encode;
use Getopt::Long;
use JSON;
my $string;
GetOptions('string=s' => \$string) or die;
$string = decode('UTF-8', $string);
my %hash = ('nāme' => $string);
my $json = encode_json(\%hash);
my $href = decode_json($json);
binmode(STDOUT, ':encoding(utf8)');
say $href->{nāme};
Example:
$ perl test.pl --string='ḷet ūs try ṭhiñgs'
ḷet ūs try ṭhiñgs
Make sure your source file is actually encoded as UTF-8!
Related
I work with data from web forms in cp1251;
I need to convert hash with data to JSON
When i used
my $href = {a => "дот"};
my $str = to_json($href, {utf8 => 0});
use Data::Dumper;
print Dumper($str); # will print something like this {"a": "\x{c4}\x{ee}\x{e8}"}
i need to make JSON string without converting.
In this case helps something like this:
my $str = encode('cp1251', decode('cp1251', to_json($href, {utf8 => 0}) ));
print Dumper($str); # will print {"a": "дот"}
or
my $str = to_json($href, {utf8 => 0});
utf8::downgrade($str);
print Dumper($str); # will print {"a": "дот"}
What other right options are there to solve the problem?
You can use the encode_json function from the JSON module to generate a JSON string from a data structure in a specified encoding. For example:
use JSON qw(encode_json);
my $href = {a => "дот"};
my $str = encode_json($href, {utf8 => 0, encode_check_method => 'sv'});
print Dumper($str); # will print {"a": "дот"}
The encode_check_method option specifies the method used to check for characters that cannot be encoded in the specified encoding. The value sv checks the SVf_UTF8 flag on the scalar, which means that it will only encode characters that are not flagged as UTF-8.
Alternatively, you can use the encode function from the Encode module to encode the JSON string after it has been generated:
use Encode qw(encode);
my $href = {a => "дот"};
my $json = to_json($href, {utf8 => 0});
my $str = encode('cp1251', $json);
print Dumper($str); # will print {"a": "дот"}
This will encode the JSON string in the cp1251 encoding after it has been generated.
I am trying to decode an UTF-8 encoded json string with Cpanel::JSON::XS:
use strict;
use warnings;
use open ':std', ':encoding(utf-8)';
use utf8;
use Cpanel::JSON::XS;
use Data::Dumper qw(Dumper);
my $str = '{ "title": "Outlining — How to outline" }';
my $hash = decode_json $str;
#my $hash = Cpanel::JSON::XS->new->utf8->decode_json( $str );
print Dumper($hash);
but this throws an exception at decode_json:
Wide character in subroutine entry
I also tried Cpanel::JSON::XS->new->utf8->decode_json( $str ) (see commented out line), but this gives another error:
malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "(end of string)")
What am I missing here?
decode_json expects UTF-8, but you are providing decoded text (a string of Unicode Code Points).
Use
use utf8;
use Encode qw( encode_utf8 );
my $json_utf8 = encode_utf8( '{ "title": "Outlining — How to outline" }' );
my $data = decode_json( $json_utf8 );
or
use utf8;
my $json_utf8 = do { no utf8; '{ "title": "Outlining — How to outline" }' };
my $data = decode_json( $json_utf8 );
or
use utf8;
my $json_ucp = '{ "title": "Outlining — How to outline" }';
my $data = Cpanel::JSON::XS->new->decode( $json_ucp ); # Implied: ->utf8(0)
(The middle one seems hackish to me. The first one might be used if you get data from multiple source, and the others provide it encoded.)
I have Perl script which contains variable $env->{'arguments'}, this variable should contain a JSON object and I want to pass that JSON object as argument to my other external script and run it using backticks.
Value of $env->{'arguments'} before escaping:
$VAR1 = '{"text":"This is from module and backslash \\ should work too"}';
Value of $env->{'arguments'} after escaping:
$VAR1 = '"{\\"text\\":\\"This is from module and backslash \\ should work too\\"}"';
Code:
print Dumper($env->{'arguments'});
escapeCharacters(\$env->{'arguments'});
print Dumper($env->{'arguments'});
my $command = './script.pl '.$env->{'arguments'}.'';
my $output = `$command`;
Escape characters function:
sub escapeCharacters
{
#$env->{'arguments'} =~ s/\\/\\\\"/g;
$env->{'arguments'} =~ s/"/\\"/g;
$env->{'arguments'} = '"'.$env->{'arguments'}.'"';
}
I would like to ask you what is correct way and how to parse that JSON string into valid JSON string which I can use as argument for my script.
You're reinventing a wheel.
use String::ShellQuote qw( shell_quote );
my $cmd = shell_quote('./script.pl', $env->{arguments});
my $output = `$cmd`;
Alternatively, there's a number of IPC:: modules you could use instead of qx. For example,
use IPC::System::Simple qw( capturex );
my $output = capturex('./script.pl', $env->{arguments});
Because you have at least one argument, you could also use the following:
my $output = '';
open(my $pipe, '-|', './script.pl', $env->{arguments});
while (<$pipe>) {
$output .= $_;
}
close($pipe);
Note that current directory isn't necessarily the directory that contains the script that executing. If you want to executing script.pl that's in the same directory as the currently executing script, you want the following changes:
Add
use FindBin qw( $RealBin );
and replace
'./script.pl'
with
"$RealBin/script.pl"
Piping it to your second program rather than passing it as an argument seems like it would make more sense (and be a lot safer).
test1.pl
#!/usr/bin/perl
use strict;
use JSON;
use Data::Dumper;
undef $/;
my $data = decode_json(<>);
print Dumper($data);
test2.pl
#!/usr/bin/perl
use strict;
use IPC::Open2;
use JSON;
my %data = ('text' => "this has a \\backslash", 'nums' => [0,1,2]);
my $json = JSON->new->encode(\%data);
my ($chld_out, $chld_in);
print("Executing script\n");
my $pid = open2($chld_out, $chld_in, "./test1.pl");
print $chld_in "$json\n";
close($chld_in);
my $out = do {local $/; <$chld_out>};
waitpid $pid, 0;
print(qq~test1.pl output =($out)~);
I have a function i cannot control which returns a string which is acutally a hash. It looks something like below:
{"offset":0,"limit":500,"count":0,"virtual_machines":[]}
I need to check if the count is greater than 0. Because the output is a string and not a hash, i am trying to split the string and get the output from it.
The snippet for the same is below:
my $output = '{"offset":0,"limit":500,"count":0,"virtual_machines":[]}';
$output =~ s/ +/ /g;
my #words = split /[:,"\s\/]+/, $output;
print Dumper(#words);
The output for this is:
$VAR1 = '{';
$VAR2 = 'offset';
$VAR3 = '0';
$VAR4 = 'limit';
$VAR5 = '500';
$VAR6 = 'count';
$VAR7 = '0';
$VAR8 = 'virtual_machines';
$VAR9 = '[]}';
Now, i can get the value $VAR7 and get the count.
Is there a way to convert a string to hash and then use the keys to get the values instead of using regex and split. Can someone help me out here!
That string is in JSON format. I'd simply do
use strict;
use warnings;
use JSON::PP qw(decode_json);
my $output = '{"offset":0,"limit":500,"count":0,"virtual_machines":[]}';
my $data = decode_json $output;
print $data->{count}, "\n";
If all colons are just separators, then you can replace them with '=>'s and eval the string.
That's probably unrealistic, though. So you can use JSON ... looks like the string is in JSON format. Try the following (worked for me :-):
#!/usr/bin/perl
use JSON::Parse 'parse_json';
# the string is JSON
my $jstr = '{"offset":0,"limit":500,"count":0,"virtual_machines":[]}';
# oversimplified (not using json ... o.k. if no colons anywhere but as separators
my $sstr = $jstr;
$sstr =~ s/:/=>/g;
my $href = eval "$sstr";
printf("From oversimplified eval, limit == %d\n", $href->{limit});
# using JSON (looks like string is in JSON format).
# get JSON::Parse from CPAN (sudo cpan JSON::Parse)
my $jref = parse_json($jstr);
printf("From JSON::Parse, limit == %d\n", $jref->{limit});
1;
Output:
From oversimplified eval, limit == 500
From JSON::Parse, limit == 500
My custom code (on Perl) give next wrong JSON, missing comma between blocks:
{
"data": [{
"{#LOGFILEPATH}": "/tmp/QRZ2007.tcserverlogs",
"{#LOGFILE}": "QRZ2007"
} **missing comma** {
"{#LOGFILE}": "ARZ2007",
"{#LOGFILEPATH}": "/tmp/ARZ2007.tcserverlogs"
}]
}
My terrible code:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
use utf8;
use JSON;
binmode STDOUT, ":utf8";
my $dir = $ARGV[0];
my $json = JSON->new->utf8->space_after;
opendir(DIR, $dir) or die $!;
print '{"data": [';
while (my $file = readdir(DIR)) {
next unless (-f "$dir/$file");
next unless ($file =~ m/\.tcserverlogs$/);
my $fullPath = "$dir/$file";
my $filenameshort = basename($file, ".tcserverlogs");
my $data_to_json = {"{#LOGFILEPATH}"=>$fullPath,"{#LOGFILE}"=>$filenameshort};
my $data_to_json = {"{#LOGFILEPATH}"=>$fullPath,"{#LOGFILE}"=>$filenameshort};
print $json->encode($data_to_json);
}
print ']}'."\n";
closedir(DIR);
exit 0;
Dear Team i am not a programmer, please any idea how fix it, thank you!
If you do not print a comma, you will not get a comma.
You are trying to build your own JSON string from pre-encoded pieces of smaller data structures. That will not work unless you tell Perl when to put commas. You could do that, but it's easier to just collect all the data into a Perl data structure that is equivalent to the JSON string you want to produce, and encode the whole thing in one go when you're done.
my $dir = $ARGV[0];
my $json = JSON->new->utf8->space_after;
my #data;
opendir( DIR, $dir ) or die $!;
while ( my $file = readdir(DIR) ) {
next unless ( -f "$dir/$file" );
next unless ( $file =~ m/\.tcserverlogs$/ );
my $fullPath = "$dir/$file";
my $filenameshort = basename( $file, ".tcserverlogs" );
my $data_to_json = { "{#LOGFILEPATH}" => $fullPath, "{#LOGFILE}" => $filenameshort };
push #data, $data_to_json;
}
closedir(DIR);
print $json->encode( { data => \#data } );