I'm using JsonParser.parseString which is passing if i give a value like "1234" or "abcd" (ignore case). Can anyone explain this?
Your first example is valid JSON for a number. Using JavaScript's parser:
console.log(JSON.parse('1234'));
It seems unlikely to me that a JSON parser was happy with your second example, because it's not valid:
console.log(JSON.parse('abcd'));
...unless it actually had quotes around it, in which case it's valid JSON for a string:
console.log(JSON.parse('"abcd"'));
But even objects don't have to have whitespace, for instance:
console.log(JSON.parse('{"example":"object"}'));
I'm using JsonParser.parseString which is passing if i give a value like "1234" or "abcd" (ignore case). Can anyone explain this?
The current specification for application/json is RFC 8259
Highlights from the production rules:
JSON-text = ws value ws
value = false / null / true / object / array / number / string
number = [ minus ] int [ frac ] [ exp ]
int = zero / ( digit1-9 *DIGIT )
string = quotation-mark *char quotation-mark
quotation-mark = %x22 ; "
Therefore:
A sequence of digits (ex: 1234) is a valid JSON-text
A sequence of digits enclosed in double quotes (ex: "1234") is a valid JSON-text
A sequence of "unescaped" characters enclosed in double quotes (ex: "abcd") is a valid JSON-text
A sequence of "unescaped" characters without enclosing double quotes (ex: abcd) is not a valid JSON-text.
cat <<TEST | jq .
"abcd"
TEST
"abcd"
cat <<TEST | jq .
abcd
TEST
parse error: Invalid numeric literal at line 2, column 0
Related
I need to extract Data from a single line of json-data which is inbetween two variables (Powershell)
my Variables:
in front of Data:
DeviceAddresses":[{"Id":
after Data:
,"
I tried this, but there needs to be some error because of all the special characters I'm using:
$devicepattern = {DeviceAddresses":[{"Id":{.*?},"}
#$deviceid = [regex]::match($changeduserdata, $devicepattern).Groups[1].Value
#$deviceid
As you've found, some character literals can't be used as-is in a regex pattern because they carry special meaning - we call these meta-characters.
In order to match the corresponding character literal in an input string, we need to escape it with \ -
to match a literal (, we use the escape sequence \(,
for a literal }, we use \}, and so on...
Fortunately, you don't need to know or remember which ones are meta-characters or escapable sequences - we can use Regex.Escape() to escape all the special character literals in a given pattern string:
$prefix = [regex]::Escape('DeviceAddresses":[{"Id":')
$capture = '(.*?)'
$suffix = [regex]::Escape(',"')
$devicePattern = "${prefix}${capture}${suffix}"
You also don't need to call [regex]::Match directly, PowerShell will populate the automatic $Matches variable with match groups whenever a scalar -match succeeds:
if($changeduserdata -match $devicePattern){
$deviceid = $Matches[1]
} else {
Write-Error 'DeviceID not found'
}
For reference, the following ASCII literals needs to be escaped in .NET's regex grammar:
$ ( ) * + . ? [ \ ^ { |
Additionally, # and (regular space character) needs to be escaped and a number of other whitespace characters have to be translated to their respective escape sequences to make patterns safe for use with the IgnorePatternWhitespace option (this is not applicable to your current scenario):
\u0009 => '\t' # Tab
\u000A => '\n' # Line Feed
\u000C => '\f' # Form Feed
\u000D => '\r' # Carriage Return
... all of which Regex.Escape() takes into account for you :)
To complement Mathias R. Jessen's helpful answer:
Generally, note that JSON data is much easier to work with - and processed more robustly - if you parse it into objects whose properties you can access - see the bottom section.
As for your regex attempt:
Note: The following also applies to all PowerShell-native regex features, such as the -match, -replace, and -split operators, the switch statement, and the Select-String cmdlet.
Mathias' answer uses [regex]::Escape() to escape the parts of the regex pattern to be used verbatim by the regex engine.
This is unequivocally the best approach if those verbatim parts aren't known in advance - e.g., when provided via a variable or expression, or passed as an argument.
However, in a regex pattern that is specified as a string literal it is often easier to individually \-escape the regex metacharacters, i.e. those characters that would otherwise have special meaning to the regex engine.
The list of characters that need escaping is (it can be inferred from the .NET Regular-Expression Quick Reference):
\ ( ) | . * + ? ^ $ [ {
If you enable the IgnorePatternWhiteSpace option (which you can do inline with
(?x), at the start of a pattern), you'll additionally have to \-escape:
#
significant whitespace characters (those you actually want matched) specified verbatim (e.g., ' ', or via string interpolation,"`t"); this does not apply to those specified via escape sequences (e.g., \t or \n).
Therefore, the solution could be simplified to:
# Sample JSON
$changeduserdata = '{"DeviceAddresses":[{"Id": 42,"More": "stuff"}]}'
# Note how [ and { are \-escaped
$deviceId = if ($changeduserdata -match 'DeviceAddresses":\[\{"Id":(.*?),"') {
$Matches[1]
}
Using ConvertFrom-Json to properly parse JSON into objects is both more robust and more convenient, as it allows property access (dot notation) to extract the value of interest:
# Sample JSON
$changeduserdata = '{"DeviceAddresses":[{"Id": 42,"More": "stuff"}]}'
# Convert to an object ([pscustomobject]) and drill down to the property
# of interest; note that the value of .DeviceAddresses is an *array* ([...]).
$deviceId = (ConvertFrom-Json $changeduserdata).DeviceAddresses[0].Id # -> 42
I am trying to get the values in powershell within specific characters. Basically I have a json with thousands of objects like this
"Name": "AllZones_APOPreface_GeographyMatch_FromBRE_ToSTR",
"Sequence": 0,
"Condition": "this.TripOriginLocationCode==\"BRE\"&&this.TripDestinationLocationCode==\"STR\"",
"Action": "this.FeesRate=0.19m;this.ZoneCode=\"Zone1\";halt",
"ElseAction": ""
I want everything within \" \"
IE here I would see that BRE and STR is Zone1
All I need is those 3 things outputted.
I have been searching how to do it with ConvertFrom-Json but no success, maybe I just havent found a good article on this.
Thanks
Start by representing your JSON as a string:
$myjson = #'
{
"Name": "AllZones_APOPreface_GeographyMatch_FromBRE_ToSTR",
"Sequence": 0,
"Condition": "this.TripOriginLocationCode==\"BRE\"&&this.TripDestinationLocationCode==\"STR\"",
"Action": "this.FeesRate=0.19m;this.ZoneCode=\"Zone1\";halt",
"ElseAction": ""
}
'#
Next, create a regular expression that matches everything in between \" and \", that's under 10 characters long (else it'll match unwanted results).
$regex = [regex]::new('\\"(?<content>.{1,10})\\"')
Next, perform the regular expression comparison, by calling the Matches() method on the regular expression. Pass your JSON string into the method parameters, as the text that you want to perform the comparison against.
$matchlist = $regex.Matches($myjson)
Finally, grab the content match group that was defined in the regular expression, and extract the values from it.
$matchlist.Groups.Where({ $PSItem.Name -eq 'content' }).Value
Result
BRE
STR
Zone1
Approach #2: Use Regex Look-behinds for more accurate matching
Here's a more specific regular expression that uses look-behinds to validate each field appropriately. Then we assign each match to a developer-friendly variable name.
$regex = [regex]::new('(?<=TripOriginLocationCode==\\")(?<OriginCode>\w+)|(?<=TripDestinationLocationCode==\\")(?<DestinationCode>\w+)|(?<=ZoneCode=\\")(?<ZoneCode>\w+)')
$matchlist = $regex.Matches($myjson)
### Assign each component to its own friendly variable name
$OriginCode, $DestinationCode, $ZoneCode = $matchlist[0].Value, $matchlist[1].Value, $matchlist[2].Value
### Construct a string from the individual components
'Your origin code is {0}, your destination code is {1}, and your zone code is {2}' -f $OriginCode, $DestinationCode, $ZoneCode
Result
Your origin code is BRE, your destination code is STR, and your zone code is Zone1
Here is my code:
import json
a = "\1"
print json.dumps(a)
It returns "\u0001", instead of desired "\1".
Is there any way to get the exact character after passing with json dumps.
In Python, the string literal "\1" represents just one character of which the character code is 1. The backslash functions here as an escape to provide the character code as an octal value.
So either escape the backslash like this:
a = "\\1"
Or use the raw string literal notation with the r prefix:
a = r"\1"
Both will assign exactly the same string: print a produces:
\1
The output of json.dumps(a) will be:
"\\1"
... as also in JSON format, a literal backslash (reverse solidus) needs to be escaped by another backslash. But it truly represents \1.
The following prints True:
print a == json.loads(json.dumps(a))
I'm getting some corrupted JSON and I've reduced it down to this test case.
use utf8;
use 5.18.0;
use Test::More;
use Test::utf8;
use JSON::XS;
BEGIN {
# damn it
my $builder = Test::Builder->new;
foreach (qw/output failure_output todo_output/) {
binmode $builder->$_, ':encoding(UTF-8)';
}
}
foreach my $string ( 'Deliver «French Bread»', '日本国' ) {
my $hashref = { value => $string };
is_sane_utf8 $string, "String: $string";
my $json = encode_json($hashref);
is_sane_utf8 $json, "JSON: $json";
say STDERR $json;
}
diag ord('»');
done_testing;
And this is the output:
utf8.t ..
ok 1 - String: Deliver «French Bread»
not ok 2 - JSON: {"value":"Deliver «French Bread»"}
# Failed test 'JSON: {"value":"Deliver «French Bread»"}'
# at utf8.t line 17.
# Found dodgy chars "<c2><ab>" at char 18
# String not flagged as utf8...was it meant to be?
# Probably originally a LEFT-POINTING DOUBLE ANGLE QUOTATION MARK char - codepoint 171 (dec), ab (hex)
{"value":"Deliver «French Bread»"}
ok 3 - String: 日本国
ok 4 - JSON: {"value":"æ¥æ¬å½"}
1..4
{"value":"日本国"}
# 187
So the string containing guillemets («») is valid UTF-8, but the resulting JSON is not. What am I missing? The utf8 pragma is correctly marking my source. Further, that trailing 187 is from the diag. That's less than 255, so it almost looks like a variant of the old Unicode bug in Perl. (And the test output still looks like crap. Never could quite get that right with Test::Builder).
Switching to JSON::PP produces the same output.
This is Perl 5.18.1 running on OS X Yosemite.
is_sane_utf8 doesn't do what you think it does. You're suppose to pass strings you've decoded to it. I'm not sure what's the point of it, but it's not the right tool. If you want to check if a string is valid UTF-8, you could use
ok(eval { decode_utf8($string, Encode::FB_CROAK | Encode::LEAVE_SRC); 1 },
'$string is valid UTF-8');
To show that JSON::XS is correct, let's look at the sequence is_sane_utf8 flagged.
+--------------------- Start of two byte sequence
| +---------------- Not zero (good)
| | +---------- Continuation byte indicator (good)
| | |
v v v
C2 AB = [110]00010 [10]101011
00010 101011 = 000 1010 1011 = U+00AB = «
The following shows that JSON::XS produces the same output as Encode.pm:
use utf8;
use 5.18.0;
use JSON::XS;
use Encode;
foreach my $string ('Deliver «French Bread»', '日本国') {
my $hashref = { value => $string };
say(sprintf("Input: U+%v04X", $string));
say(sprintf("UTF-8 of input: %v02X", encode_utf8($string)));
my $json = encode_json($hashref);
say(sprintf("JSON: %v02X", $json));
say("");
}
Output (with some spaces added):
Input: U+0044.0065.006C.0069.0076.0065.0072.0020.00AB.0046.0072.0065.006E.0063.0068.0020.0042.0072.0065.0061.0064.00BB
UTF-8 of input: 44.65.6C.69.76.65.72.20.C2.AB.46.72.65.6E.63.68.20.42.72.65.61.64.C2.BB
JSON: 7B.22.76.61.6C.75.65.22.3A.22.44.65.6C.69.76.65.72.20.C2.AB.46.72.65.6E.63.68.20.42.72.65.61.64.C2.BB.22.7D
Input: U+65E5.672C.56FD
UTF-8 of input: E6.97.A5.E6.9C.AC.E5.9B.BD
JSON: 7B.22.76.61.6C.75.65.22.3A.22.E6.97.A5.E6.9C.AC.E5.9B.BD.22.7D
JSON::XS is generating valid UTF-8, but you're using the resulting UTF-8 encoded byte strings in two different contexts that expect character strings.
Issue 1: Test::utf8
Here are the two main situations when is_sane_utf8 will fail:
You have a miscoded character string that had been decoded from a UTF-8 byte string as if it were Latin-1 or from double encoded UTF-8, or the character string is perfectly fine and looks like a potentially "dodgy" miscoding (using the terminology from its docs).
You have a valid UTF-8 byte string containing the encoded code points U+0080 through U+00FF, for example «French Bread».
The is_sane_utf8 test is intended only for character strings and has the documented potential for false negatives.
Issue 2: Output Encoding
All of your non-JSON strings are character strings while your JSON strings are UTF-8 encoded byte strings, as returned from the JSON encoder. Since you're using the :encoding(UTF-8) PerlIO layer for TAP output, the character strings are being implicitly encoded to UTF-8 with good results, while the byte strings containing JSON are being double encoded. STDERR however does not have an :encoding PerlIO layer set, so the encoded JSON byte strings look good in your warnings since they're already encoded and being passed straight out.
Only use the :encoding(UTF-8) PerlIO layer for IO with character strings, as opposed to the UTF-8 encoded byte strings returned by default from the JSON encoder.
We recently switched to the new JSON2 perl module.
I thought all and everything gets returned quoted now.
But i encountered some cases in which a number (250) got returned as unquoted number in the json string created by perl.
Out of curiosity:
Does anyone know why such cases exist and how the json module decides if to quote a value?
It will be unquoted if it's a number. Without getting too deeply into Perl internals, something is a number if it's a literal number or the result of an arithmetic operation, and it hasn't been stringified since its numeric value was produced.
use JSON::XS;
my $json = JSON::XS->new->allow_nonref;
say $json->encode(42); # 42
say $json->encode("42"); # "42"
my $x = 4;
say $json->encode($x); # 4
my $y = "There are $x lights!";
say $json->encode($x); # "4"
$x++; # modifies the numeric value of $x
say $json->encode($x); # 5
Note that printing a number isn't "stringifying it" even though it produces a string representation of the number to output; print $x doesn't cause a number to be a string, but print "$x" does.
Anyway, all of this is a bit weird, but if you want a value to be reliably unquoted in JSON then put 0 + $value into your structure immediately before encoding it, and if you want it to be reliably quoted then use "" . $value or "$value".
You can force it into a string by doing something like this:
$number_str = '' . $number;
For example:
perl -MJSON -le 'print encode_json({foo=>123, bar=>"".123})'
{"bar":"123","foo":123}
It looks like older versions of JSON has autoconvert functionality that can be set. Did you not have $JSON::AUTOCONVERT set to a true value?