We recently switched to the new JSON2 Perl module.
I thought everything gets returned quoted now.
But I encountered some cases in which a number (250) was returned as an unquoted number in the JSON string created by Perl.
Out of curiosity:
Does anyone know why such cases exist and how the JSON module decides whether to quote a value?
It will be unquoted if it's a number. Without getting too deeply into Perl internals, something is a number if it's a literal number or the result of an arithmetic operation, and it hasn't been stringified since its numeric value was produced.
use feature 'say';
use JSON::XS;
my $json = JSON::XS->new->allow_nonref;
say $json->encode(42); # 42
say $json->encode("42"); # "42"
my $x = 4;
say $json->encode($x); # 4
my $y = "There are $x lights!";
say $json->encode($x); # "4"
$x++; # modifies the numeric value of $x
say $json->encode($x); # 5
Note that printing a number isn't "stringifying it" even though it produces a string representation of the number to output; print $x doesn't cause a number to be a string, but print "$x" does.
Anyway, all of this is a bit weird, but if you want a value to be reliably unquoted in JSON then put 0 + $value into your structure immediately before encoding it, and if you want it to be reliably quoted then use "" . $value or "$value".
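As a minimal sketch of that advice, the snippet below forces the same value both ways just before encoding. It assumes JSON::PP (a core module, swapped in here for JSON::XS so the example has no extra dependencies); the variable $value and the key names as_number and as_string are hypothetical.

```perl
use strict;
use warnings;
use JSON::PP;    # assumption: core JSON::PP stands in for JSON::XS

# canonical() sorts the keys so the output order is stable
my $json = JSON::PP->new->canonical;

# Hypothetical value that arrived as a string, e.g. from a file or a database
my $value = "250";

my $out = $json->encode({
    as_number => 0 + $value,     # numeric result: encoded unquoted
    as_string => "" . $value,    # string result: encoded quoted
});

print "$out\n";
```

Doing the conversion inside the structure passed to encode keeps the numeric or string form from being disturbed by any later use of the variable.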
You can force it into a string by doing something like this:
$number_str = '' . $number;
For example:
perl -MJSON -le 'print encode_json({foo=>123, bar=>"".123})'
{"bar":"123","foo":123}
It looks like older versions of JSON have autoconvert functionality that can be set. Did you previously have $JSON::AUTOCONVERT set to a true value?
I have pored over this site (and others) trying to glean the answer for this but have been unsuccessful.
use Text::CSV;
my $csv = Text::CSV->new ( { binary => 1, auto_diag => 1 } );
$line = q(data="a=1,b=2",c=3);
my $csvParse = $csv->parse($line);
my @fields = $csv->fields();
for my $field (@fields) {
print "FIELD ==> $field\n";
}
Here's the output:
# CSV_XS ERROR: 2034 - EIF - Loose unescaped quote @ rec 0 pos 6 field 1
FIELD ==>
I am expecting 2 array elements:
data="a=1,b=2"
c=3
What am I missing?
You may get away with using Text::ParseWords. Since you are not using real csv, it may be fine. Example:
use strict;
use warnings;
use Data::Dumper;
use Text::ParseWords;
my $line = q(data="a=1,b=2",c=3);
my @fields = quotewords(',', 1, $line);
print Dumper \@fields;
This will print
$VAR1 = [
'data="a=1,b=2"',
'c=3'
];
As you requested. You may want to test further on your data.
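To help with that further testing, here is a small sketch that compares the two settings of the keep argument of quotewords on the same input. It uses only the core Text::ParseWords module; the variable names @kept and @stripped are just illustrative.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $line = q(data="a=1,b=2",c=3);

# keep = 1: the quote characters are retained in the returned fields
my @kept = quotewords(',', 1, $line);

# keep = 0: the quote characters are removed, but they still protect the comma
my @stripped = quotewords(',', 0, $line);

print "kept:     $_\n" for @kept;
print "stripped: $_\n" for @stripped;
```

Either way the line splits into two fields; the only difference is whether the quotes survive in the first one.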
Your input data isn't "standard" CSV, at least not the kind that Text::CSV expects and not the kind that things like Excel produce. An entire field has to be quoted or not at all. The "standard" encoding of that would be "data=""a=1,b=2""",c=3 (which you can see by asking Text::CSV to print your expected data using say).
If you pass the allow_loose_quotes option to the Text::CSV constructor, it won't error on your input, but it won't consider the quotes to be "protecting" the comma, so you will get three fields: data="a=1 as the first, b=2" as the second, and c=3 as the third.
I need to extract data from a single line of JSON data that sits in between two known strings (PowerShell).
my Variables:
in front of Data:
DeviceAddresses":[{"Id":
after Data:
,"
I tried this, but there must be some error because of all the special characters I'm using:
$devicepattern = {DeviceAddresses":[{"Id":{.*?},"}
#$deviceid = [regex]::match($changeduserdata, $devicepattern).Groups[1].Value
#$deviceid
As you've found, some character literals can't be used as-is in a regex pattern because they carry special meaning - we call these meta-characters.
In order to match the corresponding character literal in an input string, we need to escape it with \ -
to match a literal (, we use the escape sequence \(,
for a literal }, we use \}, and so on...
Fortunately, you don't need to know or remember which ones are meta-characters or escapable sequences - we can use Regex.Escape() to escape all the special character literals in a given pattern string:
$prefix = [regex]::Escape('DeviceAddresses":[{"Id":')
$capture = '(.*?)'
$suffix = [regex]::Escape(',"')
$devicePattern = "${prefix}${capture}${suffix}"
You also don't need to call [regex]::Match directly; PowerShell will populate the automatic $Matches variable with match groups whenever a scalar -match succeeds:
if($changeduserdata -match $devicePattern){
$deviceid = $Matches[1]
} else {
Write-Error 'DeviceID not found'
}
For reference, the following ASCII literals need to be escaped in .NET's regex grammar:
$ ( ) * + . ? [ \ ^ { |
Additionally, # and ' ' (the regular space character) need to be escaped, and a number of other whitespace characters have to be translated to their respective escape sequences, to make patterns safe for use with the IgnorePatternWhitespace option (this is not applicable to your current scenario):
\u0009 => '\t' # Tab
\u000A => '\n' # Line Feed
\u000C => '\f' # Form Feed
\u000D => '\r' # Carriage Return
... all of which Regex.Escape() takes into account for you :)
To complement Mathias R. Jessen's helpful answer:
Generally, note that JSON data is much easier to work with - and processed more robustly - if you parse it into objects whose properties you can access - see the bottom section.
As for your regex attempt:
Note: The following also applies to all PowerShell-native regex features, such as the -match, -replace, and -split operators, the switch statement, and the Select-String cmdlet.
Mathias' answer uses [regex]::Escape() to escape the parts of the regex pattern to be used verbatim by the regex engine.
This is unequivocally the best approach if those verbatim parts aren't known in advance - e.g., when provided via a variable or expression, or passed as an argument.
However, in a regex pattern that is specified as a string literal it is often easier to individually \-escape the regex metacharacters, i.e. those characters that would otherwise have special meaning to the regex engine.
The list of characters that need escaping is (it can be inferred from the .NET Regular-Expression Quick Reference):
\ ( ) | . * + ? ^ $ [ {
If you enable the IgnorePatternWhitespace option (which you can do inline with (?x) at the start of a pattern), you'll additionally have to \-escape:
#
significant whitespace characters (those you actually want matched) specified verbatim (e.g., ' ', or via string interpolation, "`t"); this does not apply to those specified via escape sequences (e.g., \t or \n).
Therefore, the solution could be simplified to:
# Sample JSON
$changeduserdata = '{"DeviceAddresses":[{"Id": 42,"More": "stuff"}]}'
# Note how [ and { are \-escaped
$deviceId = if ($changeduserdata -match 'DeviceAddresses":\[\{"Id":(.*?),"') {
$Matches[1]
}
Using ConvertFrom-Json to properly parse JSON into objects is both more robust and more convenient, as it allows property access (dot notation) to extract the value of interest:
# Sample JSON
$changeduserdata = '{"DeviceAddresses":[{"Id": 42,"More": "stuff"}]}'
# Convert to an object ([pscustomobject]) and drill down to the property
# of interest; note that the value of .DeviceAddresses is an *array* ([...]).
$deviceId = (ConvertFrom-Json $changeduserdata).DeviceAddresses[0].Id # -> 42
I made a POST request to a server and got JSON back. It contains a wrong symbol: instead of "Correct" I got "\u0421orrect". How can I encode this text?
A parse_json function renders it as "РЎorrect".
I found out that
$a = "\x{0421}orrect";
$a= encode("utf-8", $a);
returns "РЎorrect", and
$a = "\x{0421}orrect";
$a= encode("cp1251", $a);
returns "Correct"
So I've decided to change \u to \x and then to use cp1251.
\u to \x
I wrote:
Encode::Escape::enmode 'unicode-escape', 'perl';
Encode::Escape::demode 'unicode-escape', 'python';
$content= encode 'unicode-escape', decode 'unicode-escape', $content;
and got \x{0421}orrect.
And then I tried:
$content = encode( 'cp1251', $content );
And... nothing changed! I still have \x{0421}orrect...
I notice something interesting:
$a = "\x{0421}orrect";
$a= encode("cp1251", $a);
returns "Correct"
BUT
$a = '\x{0421}orrect';
$a= encode("cp1251", $a);
still returns "\x{0421}orrect".
Maybe this is a key, but I don't know what I can do with this.
I've already tried encode and decode, Encode::from_to, JSON::XS and utf8.
You mention escaping multiple times, but you want to do the opposite (unescape).
decode_json/from_json will correctly return "Сorrect" (Where the "C" is CYRILLIC CAPITAL LETTER ES).
use JSON::XS qw( decode_json );
my $json_utf8 = '{"value":"\u0421orrect"}';
my $data = decode_json($json_utf8);
You do need to encode your outputs, though. For example, if you have a Cyrillic-based Windows system and you want to create a native file, you could use
open(my $fh, '>:encoding(cp1251)', $qfn)
or die("Can't create \"$qfn\": $!\n");
say $fh $data->{value};
If you want to hardcode the encoding, or if you're interested in the encoding output to STDOUT and STDERR as well, check out this.
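To make the role of the output encoding concrete, the sketch below encodes the same decoded string to cp1251 and to UTF-8 and prints the resulting bytes in hex. It relies only on the core Encode module; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Decoded text whose first character is CYRILLIC CAPITAL LETTER ES (U+0421)
my $text = "\x{0421}orrect";

my $cp1251 = encode('cp1251', $text);   # one byte per character
my $utf8   = encode('UTF-8',  $text);   # two bytes for U+0421

# Print each encoded form as space-separated hex bytes
printf "cp1251: %s\n", join ' ', map { sprintf '%02X', ord } split //, $cp1251;
printf "UTF-8:  %s\n", join ' ', map { sprintf '%02X', ord } split //, $utf8;
```

The UTF-8 form starts with the two bytes D0 A1, and a viewer that interprets those bytes as cp1251 shows them as two separate characters, which matches the "РЎorrect" seen in the question.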
Apologies if you realise this already - I just think it's worth pointing out so we're all on the same page.
Character number \x{0421} has the description "CYRILLIC CAPITAL LETTER ES" and looks like this: С
Character number \x{0043} has the description "LATIN CAPITAL LETTER C" and looks like this: C
So depending on the font you're using, it's entirely likely that the two characters appear identical.
You asked "How can I encode this text?" but you didn't explain what you mean by that or why you want to "encode" it. There is no encoding that will convert 'С' (\x{0421}) into 'C' (\x{0043}) - they are two different characters from two different alphabets.
So the question is, what are you trying to achieve? Are you trying to check if the string returned from the server matched "Correct"? If so, that simply won't work, because the server is returning the string "Сorrect". They might look the same, but they are two different strings.
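A quick way to check which character actually came back is to print its code point. This is only a sketch: it assumes JSON::PP (core) in place of JSON::XS, and the key name value is made up for the example.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # assumption: core JSON::PP stands in for JSON::XS

# Single quotes, so the \u0421 escape reaches the JSON parser untouched
my $data  = decode_json('{"value":"\u0421orrect"}');
my $first = substr($data->{value}, 0, 1);

printf "first character: U+%04X\n", ord $first;
printf "Latin C:         U+%04X\n", ord 'C';
print  "equal to 'Correct'? ", ($data->{value} eq 'Correct' ? "yes" : "no"), "\n";
```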
It's possible that the whole situation is an error in the server code and it should be returning "Correct". If that is the case and you can't rely on the server reliably returning "Correct", one workaround would be to use a character replacement to "normalise" the string before you inspect its contents. For example:
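use feature 'say';
use JSON::XS qw( decode_json );
# Single-quoted heredoc, so Perl passes \u0421 through to the JSON parser untouched
my $response = <<'EOF';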
{
"status": "\u0421orrect"
}
EOF
my $data = decode_json($response);
my $status = $data->{status};
$status =~ tr/\x{0421}/C/;
if($status eq "Correct") {
say "The status is correct";
}
else {
say "The status is not correct";
}
This code will work now, and in the future if the server code is fixed to return "Correct".
I am trying to get the values in powershell within specific characters. Basically I have a json with thousands of objects like this
"Name": "AllZones_APOPreface_GeographyMatch_FromBRE_ToSTR",
"Sequence": 0,
"Condition": "this.TripOriginLocationCode==\"BRE\"&&this.TripDestinationLocationCode==\"STR\"",
"Action": "this.FeesRate=0.19m;this.ZoneCode=\"Zone1\";halt",
"ElseAction": ""
I want everything within \" \"
I.e., here I would see that BRE and STR are Zone1
All I need is those 3 things outputted.
I have been searching for how to do it with ConvertFrom-Json but with no success; maybe I just haven't found a good article on this.
Thanks
Start by representing your JSON as a string:
$myjson = @'
{
"Name": "AllZones_APOPreface_GeographyMatch_FromBRE_ToSTR",
"Sequence": 0,
"Condition": "this.TripOriginLocationCode==\"BRE\"&&this.TripDestinationLocationCode==\"STR\"",
"Action": "this.FeesRate=0.19m;this.ZoneCode=\"Zone1\";halt",
"ElseAction": ""
}
'@
Next, create a regular expression that matches everything in between \" and \" that is at most 10 characters long (otherwise it will match unwanted results).
$regex = [regex]::new('\\"(?<content>.{1,10})\\"')
Next, perform the regular expression comparison, by calling the Matches() method on the regular expression. Pass your JSON string into the method parameters, as the text that you want to perform the comparison against.
$matchlist = $regex.Matches($myjson)
Finally, grab the content match group that was defined in the regular expression, and extract the values from it.
$matchlist.Groups.Where({ $PSItem.Name -eq 'content' }).Value
Result
BRE
STR
Zone1
Approach #2: Use Regex Look-behinds for more accurate matching
Here's a more specific regular expression that uses look-behinds to validate each field appropriately. Then we assign each match to a developer-friendly variable name.
$regex = [regex]::new('(?<=TripOriginLocationCode==\\")(?<OriginCode>\w+)|(?<=TripDestinationLocationCode==\\")(?<DestinationCode>\w+)|(?<=ZoneCode=\\")(?<ZoneCode>\w+)')
$matchlist = $regex.Matches($myjson)
### Assign each component to its own friendly variable name
$OriginCode, $DestinationCode, $ZoneCode = $matchlist[0].Value, $matchlist[1].Value, $matchlist[2].Value
### Construct a string from the individual components
'Your origin code is {0}, your destination code is {1}, and your zone code is {2}' -f $OriginCode, $DestinationCode, $ZoneCode
Result
Your origin code is BRE, your destination code is STR, and your zone code is Zone1
I have a JSON object containing key-value pairs, and the value of one such pair is 0E10.
The problem is that this value should be a string, but it is being treated as a float because of the letter E after the digit, so it shows 0 whenever I print it (0*e+10).
Can somebody please help me solve this problem?
I am using perl to pass the JSON and reading it through Javascript. (Solution in any language would be acceptable)
This is what I get when I print the JSON.
KEY1 : 0E10
KEY2 : "XYZ"
You can clearly see that if the value is a string it is put in quotes ("), but 0E10 is not quoted.
The problem in my case is that I am reading the JSON from an API whose control is beyond my reach. I have a backend service which is written in perl which passes the JSON returned by the API. So whenever I hit a URL, the backend service written in perl is called. This service gets the JSON from the API and return back the JSON to the service which is hitting the URL.
See the difference:
Option A
use strict;
use warnings;
use JSON;
my $value = 12345;
my $hr = { KEY1=> $value, KEY2=> "XYZ" };
my $json = encode_json $hr;
print $json, "\n";
#<-- prints: {"KEY2":"XYZ","KEY1":12345}
Option B: double quote the $value assign to KEY1
use strict;
use warnings;
use JSON;
my $value = 12345;
my $hr = { KEY1=> "$value", KEY2=> "XYZ" };
my $json = encode_json $hr;
print $json, "\n";
#<-- prints: {"KEY2":"XYZ","KEY1":"12345"}
If you want to generate key: 0E10 (as opposed to key: 0 and key: '0E10'), then you'll have to generate your own JSON. Perl doesn't have a way of storing 0E10 differently than 0E9. (Neither do JavaScript, Java, C, C++, ...)
If you're willing to accept any exponent, you'll probably still have to generate your own JSON. Perl doesn't have a type system, and JSON encoders tend to use integer notation for integers (in the mathematical sense).
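The loss of the exponent can be seen in a short round trip. This sketch assumes JSON::PP (core) and reuses the KEY1/KEY2 names from the question; it only illustrates the point above and is not a fix.

```perl
use strict;
use warnings;
use JSON::PP;

# canonical() sorts the keys so the output order is stable
my $json = JSON::PP->new->canonical;
my $data = $json->decode('{"KEY1":0E10,"KEY2":"XYZ"}');

# The exponent is not preserved: the decoded value is simply zero
print "KEY1 is numerically zero\n" if $data->{KEY1} == 0;
print "0E10 == 0E9\n"              if 0E10 == 0E9;

# Re-encoding therefore cannot reproduce the original 0E10 token
my $out = $json->encode($data);
print "$out\n";
```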
I specifically tested that JSON::XS and JSON::PP will use 0 for a zero internally stored as a floating point number.
$ perl -MJSON::XS -MDevel::Peek -E'($_=1.1)-=$_; Dump $_; say encode_json([$_]);'
SV = PVNV(0x8002b7d8) at 0x800720f0
REFCNT = 1
FLAGS = (NOK,pNOK)
IV = 1
NV = 0
PV = 0
[0]
$ perl -MJSON::PP -MDevel::Peek -E'($_=1.1)-=$_; Dump $_; say encode_json([$_]);'
SV = PVNV(0x801602b0) at 0x8008e520
REFCNT = 1
FLAGS = (NOK,pNOK)
IV = 1
NV = 0
PV = 0
[0]
(NOK indicates the scalar contains a value stored as a floating point number.)