Are whitespace characters insignificant in JSON? - json

Are blank characters like spaces, tabs and carriage returns ignored in json strings?
For example, is {"a":"b"} equal to {"a" : "b"}?

Yes, blanks outside a double-quoted string literal are ignored in the syntax. Specifically, the ws production in the JSON grammar in RFC 4627 shows:
Insignificant whitespace is allowed before or after any of the six
structural characters.
ws = *(
%x20 / ; Space
%x09 / ; Horizontal tab
%x0A / ; Line feed or New line
%x0D ; Carriage return
)

In standard JSON, whitespace outside of string literals is ignored, as has been said.
However, since your question is tagged C#, I should note that there's at least one other case in C#/.NET where whitespace in JSON does matter.
The DataContractJsonSerializer uses a special __type property to support deserializing to the correct subclass. This property is required to be the first property in an object, and to have no whitespace between the property name and the preceeding {. See this previous thread:
DataContractJsonSerializer doesn't work with formatted JSON?
At least, I have tested that the no-whitespace requirement is true as of .NET 4. Perhaps this will be changed in a future version to bring it more into line with the JSON standard?

Related

Are both comma and colon redundant for JSON parser?

JSON I mentioned below is valid JSON.
I finished writing a parser of JSON which allowing only two basic data types of String and Object. Let me show what parser does in case of any ambiguity.
parse("{ "Mon": "weekday", "Tue": "weekday", "Sun": "weekend" }").get("Sun");//return value: "weekend"
parse("{ "weekday" : { "Mon": "1", "Tue": "2"} }").get("weekday").get("Mon");//return value: "1"
Function parse returns a dictionary from which we can get what we want.
I found that I didn't use any commas or colons to parse JSON, then I guess those notations may be also redundant for a full-data-type-supported JSON parser, is that true? If it is, they are for readability, right?
PS: what if it's invalid JSON? Same answer?
According to RFC 8259 (The JavaScript Object Notation (JSON) Data Interchange Format), the colon and comma are listed as name-separator and value-separator respectively.
See under section 2. JSON Grammar:
These are the six structural characters:
begin-array = ws %x5B ws ; [ left square bracket
begin-object = ws %x7B ws ; { left curly bracket
end-array = ws %x5D ws ; ] right square bracket
end-object = ws %x7D ws ; } right curly bracket
name-separator = ws %x3A ws ; : colon
value-separator = ws %x2C ws ; , comma
So, they are both valid JSON separators with specific uses.
Refer section 9. Parsers:
A JSON parser transforms a JSON text into another representation. A
JSON parser MUST accept all texts that conform to the JSON grammar.
A JSON parser MAY accept non-JSON forms or extensions.
An implementation may set limits on the size of texts that it
accepts. An implementation may set limits on the maximum depth of
nesting. An implementation may set limits on the range and precision
of numbers. An implementation may set limits on the length and
character contents of strings.
From the Parsers section, one can gather that there's no mention of skipping (ignoring) colon and/or comma because then the parser in question would not be conforming to JSON grammar.
Summing up, from the above sections, it is safe to say that any such decision of ignoring the JSON grammar would certainly be completely subjective implying that such parser is not conforming to the grammar.
So, that answers the question that the colon or comma are not redundant and they are essential part of the JSON grammar.
Hope that helps!
Json is a subset of JavaScript syntax. It's very small subset, and so not all of the punctuation is necessary. But it is necessary in full expression syntax, because in many cases you cannot know where one expression in a list ends and the next one starts, unless there is a comma between them.
(There are alternatives to commas, of course. Lisp S-expressions don't need commas, as Ira Baxter points out, but they use more parentheses, which many people find noisier than commas.)
So as long as you consider it important to be able to insert JSON into a JavaScript text, you need to keep the JavaScript form, commas and colons and all.
One important aspect of JSON is that correct JSON is safe. You cannot insert untested JSON into an executable string, of course. That would be insane. But a JSON parser should validate its input, and validated JSON is safe to the ninect into code. If your parser lets you leave out the commas, that would no longer be rhe case.

Clojure: Escaping unicode `\U` in JSON encoding

Postamble for future readers
Elm allows literal C:\Users\myuser in strings
This is consistent with the JSON spec
My problem was unrelated to this, but several layers of escaping convoluted the problem. Future lesson: fully producing a minimal working example would have found the error!
Original question
I have a Clojure backend that talks to an Elm frontend. I hit a bump when decoding JSON values in Elm.
\U below means the literal characters backslash and U, as if read from a text file. "\\U" is the same string as input in Clojure and Elm source (\ must be escaped). Note enclosing "".
Problem: encoding \U
The literal string \U, escaped "\\U" is not accepted by the Elm string decoder.
A blog post suggests that to obtain the literal string \U, this should be encoded in source code as "\\\\U", "escaping the unicode escape".
The literal string I want to send to the client is C:\Users\myuser. I prefer to send valid JSON from the server to the client.
Clojure standard library behavior
clojure.data.json does not do anything special for strings containing the literal \U. The example below shows that \U and \m are threated equally, the backslash is escaped, and the following character ignored.
project.core> (clojure.data.json/write-str "C:\\Users\\myuser")
"\"C:\\\\Users\\\\myuser\""
Manual workaround
Temporary workaround is manually escaping the strings I need:
(defn escape-backslash-u [s]
(clojure.string/replace s "\\U" "\\\\U"))
Concrete questions
Is clojure.data.json/write-str behaving correctly? As I understand the documentation, output should be valid unicode.
Are other JSON libraries behaving similarly?
Is Elm's Json.Decode behaving correctly by rejecting the literal string \U?
Solution progress
A friendly Clojurians Slack user pointed to the JSON standard specification, specifically sections 7. Strings and 8.2. Unicode characters.
I think you may be on the wrong track here.
The string you gave as an example, "C:\\Users\\myuser" is completely unproblematic, it does not contain any Unicode escape sequences. It is a string containing the ASCII characters ‘C’, ‘:’, ‘\’, ‘U’, and so on. The backslash is the escape character in Clojure strings so it needs to be escaped itself to represent a literal backslash.
In any case the string "C:\\Users\\myuser" can be serialized with (clojure.data.json/write-str "C:\\Users\\myuser"), and, as you know, this gives "\"C:\\\\Users\\\\myuser\"". All of this seems perfectly straightforward and sound.
Printing "\"C:\\\\Users\\\\myuser\"" results in the original string "C:\\Users\\myuser" being printed. That string is accepted as valid by JSONLint, again as expected.
I understood it as Elm beeing unable to decode \"C:\\\\User... to "C:\\User... because it interprets \u as start for an escape sequence.
I tried elm here with the following code:
import Html exposing (text)
main =
text "\"c:\\\\user\\\\foo\"" // from clojure.data.json/write-str
which in turn compiles/runs to
"c:\\user\\foo"
which looks fine to me.
Are you sure there is nothing else going on (middleware, transport) ?

JSON and escaping characters

I have a string which gets serialized to JSON in Javascript, and then deserialized to Java.
It looks like if the string contains a degree symbol, then I get a problem.
I could use some help in figuring out who to blame:
is it the Spidermonkey 1.8 implementation? (this has a JSON implementation built-in)
is it Google gson?
is it me for not doing something properly?
Here's what happens in JSDB:
js>s='15\u00f8C'
15°C
js>JSON.stringify(s)
"15°C"
I would have expected "15\u00f8C' which leads me to believe that Spidermonkey's JSON implementation isn't doing the right thing... except that the JSON homepage's syntax description (is that the spec?) says that a char can be
any-Unicode-character-
except-"-or-\-or-
control-character"
so maybe it passes the string along as-is without encoding it as \u00f8... in which case I would think the problem is with the gson library.
Can anyone help?
I suppose my workaround is to use either a different JSON library, or manually escape strings myself after calling JSON.stringify() -- but if this is a bug then I'd like to file a bug report.
This is not a bug in either implementation. There is no requirement to escape U+00B0. To quote the RFC:
2.5. Strings
The representation of strings is
similar to conventions used in the C
family of programming languages. A
string begins and ends with quotation
marks. All Unicode characters may be
placed within the quotation marks
except for the characters that must be
escaped: quotation mark, reverse
solidus, and the control characters
(U+0000 through U+001F).
Any character may be escaped.
Escaping everything inflates the size of the data (all code points can be represented in four or fewer bytes in all Unicode transformation formats; whereas encoding them all makes them six or twelve bytes).
It is more likely that you have a text transcoding bug somewhere in your code and escaping everything in the ASCII subset masks the problem. It is a requirement of the JSON spec that all data use a Unicode encoding.
hmm, well here's a workaround anyway:
function JSON_stringify(s, emit_unicode)
{
var json = JSON.stringify(s);
return emit_unicode ? json : json.replace(/[\u007f-\uffff]/g,
function(c) {
return '\\u'+('0000'+c.charCodeAt(0).toString(16)).slice(-4);
}
);
}
test case:
js>s='15\u00f8C 3\u0111';
15°C 3◄
js>JSON_stringify(s, true)
"15°C 3◄"
js>JSON_stringify(s, false)
"15\u00f8C 3\u0111"
This is SUPER late and probably not relevant anymore, but if anyone stumbles upon this answer, I believe I know the cause.
So the JSON encoded string is perfectly valid with the degree symbol in it, as the other answer mentions. The problem is most likely in the character encoding that you are reading/writing with. Depending on how you are using Gson, you are probably passing it a java.io.Reader instance. Any time you are creating a Reader from an InputStream, you need to specify the character encoding, or java.nio.charset.Charset instance (it's usually best to use java.nio.charset.StandardCharsets.UTF_8). If you don't specify a Charset, Java will use your platform default encoding, which on Windows is usually CP-1252.

Does Jackson JSON do special char escaping?

I was assuming that Jackson would automatically escape special characters during serialization i.e. serialize "/path/" as "\/path\/". It appears not to be the case - at least out of the box with 1.6:
#Test
public void testJacksonSerialize() throws Exception
{
ObjectMapper om = new ObjectMapper();
assertEquals("\\/path\\/", om.writeValueAsString("/path/"));
}
...fails - the output produced is "/path/". Do I have to write my own serializer or is there a way to enable special char escaping in Jackson?
thanks,
-nikita
Jackson only escapes mandatory things. "/" is not something you must escape, hence it is not. This as per JSON specification.
Now: if you absolutely want escaping, you can use methods to write "raw" content or values (in which case Jackson does no processing whatsoever and dumps String in output).
But do you really need such escaping? I know some generators do escape it (for reasons unknown to me), but no parser expects it so it should be just fine to leave slashes unescaped. This is different from backslashes that obviously must be escaped.
The slash "/" doesn't need to be escaped in JSON because it has no special meaning. Yet JSON allows the slash to be escaped for the following reason.
If you dump a JSON text right into a <SCRIPT> element of an HTML text, you have to make sure that the two-character-sequence "</" does not occur in the text. That sequence would end the script element immediately according to HTML rules. But if the JSON text reads "<\/", this has the same meaning to JSON while not interferring with HTML rules. Consequently, some JSON generators escape the slash if and only if it's preceded by a less-than-sign.
That being said, I don't know the direct answer to your question (how to absolutely do the escaping in Jackson).
New line character will not work in case of Tooltip with some browsers.
Not working \r\n Not working or \n Not working
Use double backslash to skip characters
Solution : use '\\r\\n' in place of '\r\n' ,
it will solve your problem.

Do the JSON keys have to be surrounded by quotes?

Example:
Is the following code valid against the JSON Spec?
{
precision: "zip"
}
Or should I always use the following syntax? (And if so, why?)
{
"precision": "zip"
}
I haven't really found something about this in the JSON specifications. Although they use quotes around their keys in their examples.
Yes, you need quotation marks. This is to make it simpler and to avoid having to have another escape method for javascript reserved keywords, ie {for:"foo"}.
You are correct to use strings as the key. Here is an excerpt from RFC 4627 - The application/json Media Type for JavaScript Object Notation (JSON)
2.2. Objects
An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members). A name is a
string. A single colon comes after each name, separating the name
from the value. A single comma separates a value from a following
name. The names within an object SHOULD be unique.
object = begin-object [ member *( value-separator member ) ] end-object
member = string name-separator value
[...]
2.5. Strings
The representation of strings is similar to conventions used in the C
family of programming languages. A string begins and ends with
quotation marks. [...]
string = quotation-mark *char quotation-mark
quotation-mark = %x22 ; "
Read the whole RFC here.
From 2.2. Objects
An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string.
and from 2.5. Strings
A string begins and ends with quotation marks.
So I would say that according to the standard: yes, you should always quote the key (although some parsers may be more forgiving)
Yes, quotes are mandatory. http://json.org/ says:
string
""
" chars "
Not if you use JSON5
For regular JSON, yes keys must be quoted. But if you need otherwise, checkout widely used JSON5, which is so-named because is a superset of JSON that allows ES5 syntax, including:
unquoted property keys
single-quoted, escaped and multi-line strings
alternate number formats
comments
extra whitespace
The JSON5 reference implementation (json5 npm package) provides a JSON5 object that has parse and stringify methods with the same args and semantics as the built-in JSON object.
widely used, and depended on by many high profile projects
JSON5 was started in 2012, and as of 2022, now gets >65M downloads/week, ranks in the top 0.1% of the most depended-upon packages on npm, and has been adopted by major projects like Chromium, Next.js, Babel, Retool, WebStorm, and more. It's also natively supported on Apple platforms like MacOS and iOS.
~ json5.org homepage
In your situation, both of them are valid, meaning that both of them will work.
However, you still should use the one with quotation marks in the key names because it is more conventional, which leads to more simplicity and ability to have key names with white spaces etc.
Therefore, use the one with the quotation marks.
edit// check this: What is the difference between JSON and Object Literal Notation?
Since you can put "parent.child" dotted notation and you don't have to put parent["child"] which is also valid and useful, I'd say both ways is technically acceptable. The parsers all should do both ways just fine. If your parser does not need quotes on keys then it's probably better not to put them (saves space). It makes sense to call them strings because that is what they are, and since the square brackets gives you the ability to use values for keys essentially it makes perfect sense not to.
In Json you can put...
>var keyName = "someKey";
>var obj = {[keyName]:"someValue"};
>obj
Object {someKey: "someValue"}
just fine without issues, if you need a value for a key and none quoted won't work, so if it doesn't, you can't, so you won't so "you don't need quotes on keys". Even if it's right to say they are technically strings. Logic and usage argue otherwise. Nor does it officially output Object {"someKey": "someValue"} for obj in our example run from the console of any browser.