Is there a suggested practice for encoding the following three values in JSON?
NaN
Infinity
+Infinity
For example, should they just be encoded as-is as strings? Should NaN be converted to null instead? What is the suggested practice for encoding these values?
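For concreteness, here is a minimal Java sketch of the two workarounds the question mentions, using the Jackson library (which appears later in this thread); the field names asNull and asString are invented purely for illustration:

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.LinkedHashMap;
import java.util.Map;

public class NonFiniteJson {
    public static void main(String[] args) throws Exception {
        double value = Double.NaN; // or Double.POSITIVE_INFINITY, etc.

        Map<String, Object> payload = new LinkedHashMap<>();
        // Option 1: replace the non-finite value with null (the information is lost)
        payload.put("asNull", Double.isFinite(value) ? value : null);
        // Option 2: send it as a string the receiver has to recognise and parse
        payload.put("asString", Double.isFinite(value) ? (Object) value : Double.toString(value));

        System.out.println(new ObjectMapper().writeValueAsString(payload));
        // prints {"asNull":null,"asString":"NaN"}
    }
}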
RFC 7515 Section 3 mentions:
In both serializations, the JWS Protected Header, JWS Payload, and JWS Signature are base64url encoded, since JSON lacks a way to directly represent arbitrary octet sequences.
Why can't JSON directly represent arbitrary octet sequences?
JSON is by definition Unicode text (UTF-8 in practice), so there is no way to (usefully) represent a byte sequence that is not valid UTF-8.
For example, you cannot encode the bytes \x80 \x80.
(You could set up a mutual agreement between both sides on additional semantics beyond what JSON provides and encode the bytes as, say, \\x80\\x80, but then your format is no longer plain JSON. Alternatively, you could embed the character U+0080 directly, twice, but its UTF-8 encoding spells out two bytes for each occurrence, so you still would not be transmitting the original octets. Base64 is simply the better convention: it is more compact and avoids any confusion between characters and bytes.)
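As a sketch of that base64 convention: arbitrary octets can be carried through JSON using java.util.Base64's URL-safe (base64url) encoder, the same alphabet JWS uses. The class name here is just for illustration:

import java.util.Base64;

public class Base64UrlDemo {
    public static void main(String[] args) {
        byte[] raw = { (byte) 0x80, (byte) 0x80 };   // not valid UTF-8 on its own

        // base64url: the URL- and filename-safe alphabet from RFC 4648
        String encoded = Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        byte[] decoded = Base64.getUrlDecoder().decode(encoded);

        System.out.println(encoded);                 // gIA
        System.out.println(decoded.length);          // 2 - the original octets survive the round trip
    }
}

The resulting string ("gIA") is plain ASCII, so it fits into any JSON string value without ambiguity.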
To escape a code point that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair. So for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E".
ECMA-404: The JSON Data Interchange Format
I believe that there is no need to encode this character at all, so it could be represented directly as "𝄞". However, should one wish to encode it, it must, per spec, be encoded as "\uD834\uDD1E", not (as would seem reasonable) as "\u1d11e". Why is this?
One of the key architectural features of JSON is that JSON-encoded objects are valid JavaScript literals that can, for example, be evaluated with the eval function. Unfortunately, older JavaScript implementations only support 16-bit Unicode escape sequences with four hex characters in string literals, so the only portable way to escape code points above 0xFFFF is with UTF-16 surrogate pairs. (The \u{...} syntax that allows arbitrary code points was only introduced in ECMAScript 6.)
But as you mentioned, there's no need to use escape sequences if your application supports Unicode JSON text. Simply encode the characters directly in the respective Unicode format.
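As a quick illustration, the surrogate pair from the spec example can be computed from the code point with standard Java (the class name is made up for the example):

public class SurrogatePairDemo {
    public static void main(String[] args) {
        int codePoint = 0x1D11E;                     // MUSICAL SYMBOL G CLEF
        char[] utf16 = Character.toChars(codePoint); // the UTF-16 surrogate pair

        for (char c : utf16) {
            System.out.printf("\\u%04X ", (int) c);  // prints \uD834 \uDD1E
        }
        System.out.println();
        System.out.println(new String(utf16));       // prints the character itself (console encoding permitting)
    }
}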
I use Jackson to parse JSON data. Now I have a problem handling a \uXXXX escape.
The data I got here is like
{"UID":"here_\ud83d\udc3b"}
After I use ObjectMapper.readValue(jsonContent, UserId.class); to convert the JSON to an instance of UserId, the UID property is not literally "here_\ud83d\udc3b": Jackson converts \ud83d\udc3b into the two chars of the actual Unicode character.
My question is: is it possible to have Jackson skip this "Unicode transformation" and keep the literal text "\ud83d\udc3b" as it is?
No. JSON parsers are required to handle Unicode escapes to produce underlying Unicode characters.
When writing, on the other hand, some characters may also be encoded using similar Unicode escapes.
So if you need to use escaping, you need to re-encode such values yourself.
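One way to do that re-encoding, assuming a reasonably recent Jackson 2.x, is to enable non-ASCII escaping on the generator when writing back out; a minimal sketch:

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;

public class EscapeRoundTrip {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // The parser always turns the \u escapes into the real surrogate pair.
        Map<?, ?> parsed = mapper.readValue("{\"UID\":\"here_\\ud83d\\udc3b\"}", Map.class);

        // On output, ESCAPE_NON_ASCII re-escapes everything outside ASCII,
        // which restores \uXXXX escapes for the emoji.
        mapper.getFactory().enable(JsonGenerator.Feature.ESCAPE_NON_ASCII);
        System.out.println(mapper.writeValueAsString(parsed));
        // e.g. {"UID":"here_\uD83D\uDC3B"}
    }
}

Note the escaping is applied to every non-ASCII character in the output, not only to the one field.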
I'm sorry if this is really trivial, but I've got a series of data as follows:
{"color":{"red":255,"blue":123,"green",1}}
I know it's in this format because, for some reason, it's easy to work with. What is this format called so that I might look it up?
Edit: if there is any significance to the organization of the data, of course.
That's JSON (JavaScript Object Notation), a text-based data serialisation format built on a subset of JavaScript. To learn more about JSON, visit http://json.org
In JSON, there are the following data types:
object
array
number
string
null
boolean
Objects are represented using the following syntax, and are key-value pairings, similar to a dictionary (the key must be a string):
{ "number": 1, "string": "test" }
Like dictionaries, objects are unordered.
An array is an ordered, heterogeneous data structure, represented using the following syntax:
[0, true, false, "1", null]
Numbers are what you'd expect; however, unlike in JavaScript itself, they cannot be Infinity or NaN (i.e. they must be plain decimals or integers) and may not contain leading 0s. Exponents are represented using the following format (the e is not case sensitive):
10e6
where 10 is the significand and 6 is the exponent - this is equivalent to 10000000 (ten million). The exponent part may have leading 0s, though there is not much point, and doing so may reduce compatibility with parsers that are not 100% compliant.
Booleans are case sensitive and always lowercase. In JSON, there are only two boolean values:
true
false
To represent an intentionally left out or otherwise unknown field, use null (case sensitive too).
Strings must be delimited using double quotes (single quotes are invalid syntax), and single quotes need not be escaped.
"This string is valid, and it's alright to do this."
'No, this won't work'
'Nor will this.'
There are numerous escapes available using the backslash character - to use a literal backslash, use \\.
As JSON is a data transmission format, there is no syntax for comments available.
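Putting those pieces together: here is a small sketch using Jackson (the parser from the earlier question in this thread) that reads one document containing each of the types above and prints the Java type each value maps to. The document contents are invented for illustration:

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;

public class JsonTypesDemo {
    public static void main(String[] args) throws Exception {
        // One object exercising number (with exponent), string (with an unescaped
        // single quote and an escaped backslash), boolean, null and array.
        String json = "{ \"number\": 10e6, \"string\": \"it's fine\", \"path\": \"C:\\\\temp\","
                    + " \"flag\": true, \"missing\": null, \"list\": [0, true, \"1\", null] }";

        Map<?, ?> parsed = new ObjectMapper().readValue(json, Map.class);
        for (Map.Entry<?, ?> e : parsed.entrySet()) {
            Object v = e.getValue();
            System.out.println(e.getKey() + " -> " + v
                    + " (" + (v == null ? "null" : v.getClass().getSimpleName()) + ")");
        }
        // number -> 1.0E7 (Double), string -> it's fine (String), path -> C:\temp (String),
        // flag -> true (Boolean), missing -> null (null), list -> [0, true, 1, null] (ArrayList)
    }
}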
I have a column of type char(32) where I want to store an MD5 hash key. The problem is I've used SQL to update the existing records using the HashKey() function, which creates values like
:›=k! ©úw"5Ýâ‘<\
but when I do the insert via .NET it comes through as
3A9B3D6B2120A9FA772235DDE2913C5C
What do I need to do to get these to match up? Is it the encoding?
HashKey isn't a SQL function - did you mean HASHBYTES? Some actual code would help. SQL Server appears to be computing the raw binary hash and displaying it as ASCII characters.
.NET is computing the hash and then converting it to hexadecimal (or so it appears). CHAR(32) isn't a good way to store raw binary data; you would want to use the BINARY type for that.
An Example in SQL:
-- fn_varbintohexstr returns a string prefixed with '0x'; SUBSTRING(..., 3, 32) drops that prefix
SELECT SUBSTRING(sys.fn_varbintohexstr(HASHBYTES('MD5', 0x2040)), 3, 32)
And an Example in .NET:
using System;
using System.Security.Cryptography;

using (MD5 md5 = MD5.Create())
{
    var data = new byte[] { 0x20, 0x40 };                          // same sample input as 0x2040 above
    var hashed = md5.ComputeHash(data);                            // raw 16-byte MD5 hash
    var hexHash = BitConverter.ToString(hashed).Replace("-", "");  // hex-encode and drop the dashes
    Console.Out.WriteLine("hexHash = {0}", hexHash);
}
These will both produce the same value. (Where 0x2040 is sample data).
You can store the hash either as hexadecimal text in CHAR(32) or as raw bytes in BINARY(16). Storing the binary form is twice as space-efficient as storing it as hex. What you should not be doing is storing the raw binary data in a CHAR(16) column.
It's not clear what you mean by "when I do the insert via .NET" - but you shouldn't be storing binary data in its raw form, as it looks like you're doing using HashKey(). (Do you definitely mean HashKey, by the way? I can't find a reference for it, but there's HashBytes...)
Two common options are to encode the raw binary data as hex - which it looks like you're doing in the second case - or to use base64. Either way should be easy from .NET (Base64 marginally easier, using Convert.ToBase64String) and you probably just need to find the equivalent SQL Server function.
MD5 hashes are typically stored in hex encoding. I'd guess that your HashKey() SQL function is not hex encoding the MD5 hash; rather, it's returning the raw hash bytes, which are then displayed as ASCII characters. Your .NET method, on the other hand, is hex encoding. If you store the hash consistently as hex (or consistently not - up to you, but hex is usual), then the results from the two sides will always match.
For example, the : symbol at the start of your SQL value corresponds to the first two characters of the .NET output, 3A. Hex 3A is 58 in decimal, and ASCII code 58 is the colon (:) character. You can work your way through each remaining character the same way, doing the hex conversion byte by byte.
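To make the byte-versus-hex distinction concrete, here is a tiny illustrative sketch (in Java, though the idea is language-independent; the thread itself uses T-SQL and C#):

public class HexVsRawByte {
    public static void main(String[] args) {
        byte firstHashByte = 0x3A;                   // first byte of the hash in the question

        System.out.println((char) firstHashByte);    // ':'  - the raw byte rendered as a character (what the CHAR column shows)
        System.out.printf("%02X%n", firstHashByte);  // "3A" - the same byte hex-encoded (what the .NET code produces)
    }
}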
See any ASCII code table for reference, e.g. http://www.asciitable.com/