How to convert \u0421 into letter "C"? - json

I made a post query to server and got json. It contains wrong symbol: instead "Correct" I got "\u0421orrect". How can I encode this text?
A parse_json function performs it like "РЎorrect";
I found out that
$a = "\x{0421}orrect";
$a= encode("utf-8", $a);
returns "РЎorrect", and
$a = "\x{0421}orrect";
$a= encode("cp1251", $a);
returns "Correct"
So I've decided to change \u to \x and then to use cp1251.
\u to \x
I wrote:
Encode::Escape::enmode 'unicode-escape', 'perl';
Encode::Escape::demode 'unicode-escape', 'python';
$content= encode 'unicode-escape', decode 'unicode-escape', $content;
and got \x{0421}orrect.
And then I tried:
$content = encode( 'cp1251', $content );
And... nothing changed! I still have \x{0421}orrect...
I notice something interesting:
$a = "\x{0421}orrect";
$a= encode("cp1251", $a);
returns "Correct"
BUT
$a = '\x{0421}orrect';
$a= encode("cp1251", $a);
still returns "\x{0421}orrect".
Maybe this is a key, but I don't know what I can do with this.
I've already tried to encode and decode, Encode:: from_to,JSON::XS and utf8.

You mention escaping multiple times, but you want to do the opposite (unescape).
decode_json/from_json will correctly return "Сorrect" (Where the "C" is CYRILLIC CAPITAL LETTER ES).
use JSON::XS qw( decode_json );
my $json_utf8 = '{"value":"\u0421orrect"}';
my $data = decode_json($json_utf8);
You do need to encode your outputs, though. For example, if you have Cyrillic-based Windows system, and you wanted to create a native file, you could use
open(my $fh, '>:encoding(cp1251)', $qfn)
or die("Can't create \"$qfn\": $!\n");
say $fh $data->{value};
If you want to hardcode the encoding, or if you're interested in the encoding output to STDOUT and STDERR as well, check out this.

Apologies if you realise this already - I just think it's worth pointing out so we're all on the same page.
Character number \x{0421} has the description "CYRILLIC CAPITAL LETTER ES" and looks like this: С
Character number \x{0043} has the description "LATIN CAPITAL LETTER C" and looks like this: C
So depending on the font you're using, it's entirely likely that the two characters appear identical.
You asked "How can I encode this text?" but you didn't explain what you mean by that or why you want to "encode" it. There is no encoding that will convert 'С' (\x{0421}) into 'C' (\x{0043}) - they are two different characters from two different alphabets.
So the question is, what are you trying to achieve? Are you trying to check if the string returned from the server matched "Correct"? If so, that simply won't work, because the server is returning the string "Сorrect". They might look the same, but they are two different strings.
It's possible that whole situation is an error in the server code and it should be returning "Correct". If that is the case and you can't rely on the server reliably returning the "Correct", one workaround would be to use a character replacement, to "normalise" the string before you inspect its contents. For example:
use JSON::XS qw( decode_json );
my $response = <<EOF;
{
"status": "\u0421orrect"
}
EOF
my $data = decode_json($response);
my $status = $data->{status};
$status =~ tr/\x{0421}/C/;
if($status eq "Correct") {
say "The status is correct";
}
else {
say "The status is not correct";
}
This code will work now, and in the future if the server code is fixed to return "Correct".

Related

JSON_decode() acting weird, adding letters and renaming characters

I have an object $code which contains [{"id":863183023486434}]. After performing a decode, $code = json_decode($code);, the decoded $code returns8.6318302348643E+14. How is that possible?
PHP CODE:
<?php
$a = '[{"id":863183023486434}]';
$code = json_decode($a, true, 512, JSON_BIGINT_AS_STRING);
echo '<pre>';print_r($code);exit();
?>
OUTPUT:
8.6318302348643E+14 is just scientific notation for 863183023486434. (Well, nearly; it's a bit truncated, probably because of IEEE-754 double-precision floating point precision issues.) You're seeing that because of how you're outputting the value, it's not that the value itself is different.
To output the number without scientific notation, this answer says you use sprintf (I'm not a PHP guy).

How can I delete parts of a JSON web response?

I have a simple Perl script and I want to remove everything up to the word "city". Or remove everything up to the nth occurrence (the 2nd in my particular case) of the comma's " , ". Here's what is looks like below.
#!/usr/bin/perl
use warnings;
use strict;
my $CMD = `curl http://ip-api.com/json/8.8.8.8`;
chomp($CMD);
my $find = "^[^city]*city";
$CMD =~ s/$find//;
print $CMD;
The output is this:
{"as":"AS15169 Google Inc.","city":"Mountain View","country":"United States","countryCode":"US","isp":"Google","lat" :37.386,"lon":-122.0838,"org":"Google","query":"8.8.8.8","region":"CA","regionName":"California","status":"success","timezone":"America/Los_Angeles","zip":"94035"}
So i want do drop
" {"as":"AS15169 Google Inc.","
or drop up to
{"as":"AS15169 Google Inc.","city":"Mountain View",
EDIT:
I see I was doing far too much when matching the string. I simplified the fix for my problem with removing all before "city". My $find has been changed to
my $find = ".*city";
While I also changed the replace function like so,
$CMD =~ s/$find/city/;
Still haven't figured out how to remove all before the nth occurrence of a comma or any character / string for that matter.
The content you get back is JSON, so you can easily turn it into a Perl data structure, play with it, and even turn it back into JSON if you like. That's the point! And, it's so easy:
use Mojo::UserAgent;
use Mojo::JSON qw(decode_json encode_json);
my $ua = Mojo::UserAgent->new;
my $tx = $ua->get( 'http://ip-api.com/json/8.8.8.8' );
my $json = $tx->res->body;
my $perl = decode_json( $json );
delete $perl->{'as'};
my $new_json = encode_json( $perl );
print $new_json;
Mojolicious is wonderful for this. It's my preferred way for dealing with JSON even without the user-agent stuff. If you play with the JSON string directly, you're likely to have problems when the order of elements change or it contains wide characters.
You don't have to manually decode_json() with Mojolicious. Simply do this:
my $tx = $ua->get('http://ip-api.com/json/8.8.8.8');
my $json = $tx->res->json;
my $as = $json->{as}
You can even go fancy with JSON pointers:
my $as = $tx->res->json("/as");
Something like
#!/usr/bin/perl -w
my $results = `curl http://ip-api.com/json/8.8.8.8`;
chomp $results;
$results =~ s/^.*city":"\w+\s?\w+",//g;
print $results . "\n";
should do the trick.. unless there's a misunderstanding of what you want to keep v.s. remove.
FYI, http://regexr.com/ is totally my go to for regex happiness.

Processing form data using CGI

I'm programming in Perl and need to get data from the following HTML form:
<FORM action="./cgi-bin/Perl.pl" method="GET">
<br>
Full name: <br><input type="text" name="full_name" maxlength="20"><br>
Username: <br><input type="text" name="user_name" maxlength="8"><br>
Password: <br><input type="password" name="password" maxlength="15"><br>
Confirm password: <br><input type="password" name="new_password" maxlength="15"><br>
<input type="submit" value ="Submit"><br><br>
</FORM>
EDIT: If i cannot use CGI.pm, will the following work?
local ($buffer, #pairs, $pair, $name, $value, %FORM);
# Read in text
$ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
if ($ENV{'REQUEST_METHOD'} eq "GET") {
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
}
else {
$buffer = $ENV{'QUERY_STRING'};
}
# Split information into name/value pairs
#pairs = split(/&/, $buffer);
foreach $pair (#pairs)
{
($name, $value) = split(/=/, $pair);
$value =~ tr/+/ /;
$value =~ s/%(..)/pack("C", hex($1))/eg;
$FORM{$name} = $value;
}
but every time I attempt to use these values I get the error:
Use of unitilialized value
How can I properly use CGI to handle my form data?
EDIT: It's possible that my error lies elsewhere. This is my code. Could it be the way in which I'm using grep? Should I not be using the GET method?
#!/usr/bin/perl
use CGI qw(:standard);
use strict;
use warnings;
print "Content-type: text/html\n\n";
#getting these from HTML form
my $full_name = param('full_name');
my $user_name= param('user_name');
my $password = param('password');
my $new_password = param('new_password');
#checking that inputs are alphanumeric or an underscore
my $mismatch = grep /^[a-zA-Z0-9_]*$/i, $full_name, $user_name, $password, $new_password;
if($mismatch) {
#error message if invalid input
print qq(<html>\n);
print qq(<head>\n);
print qq(<title> Error: alphanumeric inputs only. </title>\n);
print qq{<meta http-equiv="refresh" content="5;URL="http://www.cs.mcgill.ca/~amosqu/registration.html">\n};
print qq(</head>\n);
print qq(<body>\n);
print qq(<b><center> Inputs with alphanumeric characters only please. </b></center>\n\n);
print qq(</body>\n);
print qq(</html>\n);
}
You have altered the regex that I suggested in my answer to your previous question, which was
grep /[^A-Z0-9]/i, $full_name, $user_name, $password, $new_password
You have changed it so that $mismatch is now set to the number of parameters that are valid, and the condition for an invalid set of arguments is now the awkward $mismatch < 4.
If your requirement has altered from alphanumeric to alphanumeric plus underscore, then you can restore the sense of the grep by writing
my $mismatch = grep /\W/, $full_name, $user_name, $password, $new_password
which will set $mismatch to a positive true value if any of the values contains a "non-word" character, which is alphanumeric plus underscore as you wanted.
However, the problem you are seeing
Use of uninitialized value $_ in pattern match (m//)
is because at least one of the parameters $full_name, $user_name, $password, or $new_password is undefined. You need to find out which one and why it is happening. Are you sure that all four query parameters full_name, user_name, password, and new_password are present in the query string you're getting back? Take a look at what the query_string method returns to see.
Well, "use of initialized value" isn't an error, it's just a warning. More recent versions of Perl will tell you which variable is causing the problem.
Are you sure that it's the grep line that is generating the errors? Are you sure that you are filling in all of the form inputs when you're testing this?
The following suggestions, don't address your warning. But they are problems with your code.
The regex you are using for your grep seems broken. Your code says "set $missing to the number of variables that include nothing but alphanumeric characters". That will be set to four if you get four alphanumeric-only inputs. You then trigger the error page - which seems to be the inverse of what you want. Your regex also checks for zero or more alphanumeric characters. So it accepts the empty string. Is that what you want?
Also, your regex is too complicated. There's no need to include both A-Z and a-z if you're using /i to make the match case-insensitive. In fact your regex can be collapsed to /^\w*$/i (as \w means "alphanumeric characters plus an underscore").
Checking that inputs only contain alphanumeric characters is probably a bad idea as well. Most people's full names will include at least one space. And limiting passwords to just containing alphanumeric characters is a terrible idea.
When people point out that CGI is no longer recommended, that doesn't mean that you should go back to using Matt Wright's broken CGI parameter parser from twenty years ago. That code is just as broken as it always was. No-one should be using it. You should be looking at one of the modern Perl web development frameworks that are based on PSGI - something like Web::Simple, Dancer or Mojolicious. See CGI::Alternatives for details.

Why does the JSON module quote some numbers but not others?

We recently switched to the new JSON2 perl module.
I thought all and everything gets returned quoted now.
But i encountered some cases in which a number (250) got returned as unquoted number in the json string created by perl.
Out of curiosity:
Does anyone know why such cases exist and how the json module decides if to quote a value?
It will be unquoted if it's a number. Without getting too deeply into Perl internals, something is a number if it's a literal number or the result of an arithmetic operation, and it hasn't been stringified since its numeric value was produced.
use JSON::XS;
my $json = JSON::XS->new->allow_nonref;
say $json->encode(42); # 42
say $json->encode("42"); # "42"
my $x = 4;
say $json->encode($x); # 4
my $y = "There are $x lights!";
say $json->encode($x); # "4"
$x++; # modifies the numeric value of $x
say $json->encode($x); # 5
Note that printing a number isn't "stringifying it" even though it produces a string representation of the number to output; print $x doesn't cause a number to be a string, but print "$x" does.
Anyway, all of this is a bit weird, but if you want a value to be reliably unquoted in JSON then put 0 + $value into your structure immediately before encoding it, and if you want it to be reliably quoted then use "" . $value or "$value".
You can force it into a string by doing something like this:
$number_str = '' . $number;
For example:
perl -MJSON -le 'print encode_json({foo=>123, bar=>"".123})'
{"bar":"123","foo":123}
It looks like older versions of JSON has autoconvert functionality that can be set. Did you not have $JSON::AUTOCONVERT set to a true value?

Retrieving original JSON code from a decoded array node in Perl

I'm working on a script which receives JSON code for an array of objects similar to this:
{
"array":[
{ "id": 1, "text": "Some text" },
{ "id": 2, "text": "Some text" }
]
}
I decode it using JSON::XS and then filter out some of the results. After this, I need to store the JSON code for each individual node into a queue for later processing. The format this queue requires is JSON too, so the code I'd need to insert for each node would be something like this:
{ "id": 1, "text": "Some text" }
However, after decode_json has decoded a node, all that's left are hash references for each node:
print $json->{'array'}[0]; # Would print something like HASH(0x7ffa80c83270)
I know I could get something similar to the original JSON code using encode_json on the hash reference, but the resulting code is different from the original code, UTF-8 characters get all weird, and it seems like a lot of extra processing, specially considering the amount of data this script has to deal with.
Is there a way to retrieve the original JSON code from a decoded array node? Does JSON::XS keep the original chunks somewhere after they have been decoded?
EDIT
About the weird UTF-8 characters, they just look weird on the screen:
#!/usr/bin/perl
use utf8;
use JSON::XS;
binmode STDOUT, ":utf8";
$old_json = '{ "text": "Drag\u00f3n" }';
$json = decode_json($old_json);
print $json->{'text'}; # Dragón
$new_json = encode_json($json);
print $new_json; # {"text":"Dragón"}
$json = decode_json($new_json);
print $json->{'text'}; # Dragón
encode_json will produce equivalent JSON to what you originally had before you decoded it with decode_json. Characters encoded using UTF-8 do not get all weird.
$ cat a.pl
use Encode qw( encode_utf8 );
use JSON::XS qw( decode_json encode_json );
my $json = encode_utf8(qq!{"name":"\x{C9}ric" }!);
print($json, "\n");
print(encode_json(decode_json($json)), "\n");
$ perl a.pl | od -c
0000000 { " n a m e " : " 303 211 r i c "
0000020 } \n { " n a m e " : " 303 211 r i c
0000040 " } \n
0000043
If you want a parser that preserves the original JSON, you'll surely have to write your own; the existing ones don't do that.
No, it doesn't exist anywhere. The "original JSON" isn't stored per-element; it's decoded in a single pass.
No, this is not possible. Every JSON object can have multiple, but equivalent representations:
{ "key": "abc" }
and
{
"key" : "abc"
}
are pretty much the same.
So just use the re-encoded JSON your module gives you.
Even if JSON::XS caches the chunks, extracting them would be a breach of encapsulation, therefore having no guarantee of working if the module is upgraded. And it is bad design.
Don't care about performance. The XS modules have exceptional performance, as they are coded in C. And if you were paranoid about performance, you wouldn't use JSON but some binary format. And you wouldn't be using Perl, but Fortran ;-)
You should treat equivalent data as equivalent data. Even if the presentation is different.
If the unicode chars look weird, but process fine, there is no problem. If they don't get processed correctly, you might have to specify an exact encoding.