Are both comma and colon redundant for a JSON parser? - json

The JSON shown below is valid JSON.
I have finished writing a JSON parser that allows only two basic data types, String and Object. Here is what the parser does in cases of possible ambiguity:
parse('{ "Mon": "weekday", "Tue": "weekday", "Sun": "weekend" }').get("Sun"); // returns "weekend"
parse('{ "weekday" : { "Mon": "1", "Tue": "2"} }').get("weekday").get("Mon"); // returns "1"
The parse function returns a dictionary from which values can be retrieved by key.
I found that my parser doesn't use the commas or colons at all, so I suspect these symbols may also be redundant for a JSON parser that supports all the data types. Is that true? If so, are they only there for readability?
PS: what if it's invalid JSON? Same answer?
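For the restricted String/Object subset described above, the observation is easy to reproduce. The JavaScript below is a hypothetical reconstruction, not the asker's actual code: `parseLoose` treats commas and colons exactly like whitespace and tells keys from values purely by their position. It assumes strings contain no escape sequences and returns nested `Map` objects so that `.get` works as in the question.

```javascript
// Hypothetical sketch: a parser for strings and objects only, in which
// commas and colons are skipped as if they were whitespace.
function parseLoose(text) {
  let i = 0;

  // Skip whitespace plus the two "optional" separators.
  function skip() {
    while (i < text.length && /[\s,:]/.test(text[i])) i++;
  }

  // Assumes text[i] is an opening quote; no escape handling in this sketch.
  function parseString() {
    const end = text.indexOf('"', i + 1);
    if (end < 0) throw new SyntaxError("unterminated string");
    const s = text.slice(i + 1, end);
    i = end + 1;
    return s;
  }

  function parseValue() {
    skip();
    if (text[i] === '"') return parseString();
    if (text[i] === "{") return parseObject();
    throw new SyntaxError(`unexpected character at ${i}`);
  }

  function parseObject() {
    i++; // consume "{"
    const obj = new Map();
    for (;;) {
      skip();
      if (text[i] === "}") {
        i++;
        return obj;
      }
      if (text[i] !== '"') throw new SyntaxError(`expected key at ${i}`);
      const key = parseString();
      // Whatever comes next is this member's value: position alone decides.
      obj.set(key, parseValue());
    }
  }

  const result = parseValue();
  skip();
  if (i !== text.length) throw new SyntaxError("trailing characters");
  return result;
}

console.log(parseLoose('{ "Mon": "weekday", "Tue": "weekday", "Sun": "weekend" }').get("Sun")); // weekend
console.log(parseLoose('{ "weekday" : { "Mon": "1", "Tue": "2"} }').get("weekday").get("Mon")); // 1
console.log(parseLoose('{ "Mon" "weekday" "Sun" "weekend" }').get("Sun")); // weekend
```

The last call shows the flip side: the same sketch also accepts text with no separators at all, which is not valid JSON.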

According to RFC 8259 (The JavaScript Object Notation (JSON) Data Interchange Format), the colon and comma are listed as name-separator and value-separator respectively.
See under section 2. JSON Grammar:
These are the six structural characters:
begin-array = ws %x5B ws ; [ left square bracket
begin-object = ws %x7B ws ; { left curly bracket
end-array = ws %x5D ws ; ] right square bracket
end-object = ws %x7D ws ; } right curly bracket
name-separator = ws %x3A ws ; : colon
value-separator = ws %x2C ws ; , comma
So, they are both valid JSON separators with specific uses.
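To see that a grammar-conforming parser treats both separators as mandatory, this short JavaScript sketch gives the built-in JSON.parse the same object with and without each separator. The helper name `accepts` exists only for this illustration.

```javascript
// Returns true if the built-in JSON.parse accepts `text`, false on a SyntaxError.
function accepts(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(accepts('{ "Mon": "weekday", "Sun": "weekend" }')); // true
console.log(accepts('{ "Mon" "weekday", "Sun" "weekend" }'));   // false: no name-separator
console.log(accepts('{ "Mon": "weekday" "Sun": "weekend" }'));  // false: no value-separator
```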
Refer to section 9, Parsers:
A JSON parser transforms a JSON text into another representation. A
JSON parser MUST accept all texts that conform to the JSON grammar.
A JSON parser MAY accept non-JSON forms or extensions.
An implementation may set limits on the size of texts that it
accepts. An implementation may set limits on the maximum depth of
nesting. An implementation may set limits on the range and precision
of numbers. An implementation may set limits on the length and
character contents of strings.
From the Parsers section, one can gather that there is no provision for skipping (ignoring) the colon and/or comma, because a parser that did so would not conform to the JSON grammar.
Summing up, it is safe to say that any decision to ignore parts of the JSON grammar is a subjective implementation choice, and a parser that makes it does not conform to the grammar.
So, to answer the question: the colon and comma are not redundant; they are an essential part of the JSON grammar.
Hope that helps!

JSON is a subset of JavaScript syntax. It's a very small subset, so not all of the punctuation is necessary. But the punctuation is necessary in full expression syntax, because in many cases you cannot know where one expression in a list ends and the next one starts unless there is a comma between them.
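A small JavaScript example of the problem: the same characters mean different things with and without the comma, because `-` is also a binary operator. The variable names are only illustrative.

```javascript
// In JavaScript expression syntax, dropping a comma can silently change meaning.
const withComma = [1, -2];   // two elements: 1 and -2
const withoutComma = [1 -2]; // one element: the expression 1 - 2, i.e. -1
console.log(withComma);      // [ 1, -2 ]
console.log(withoutComma);   // [ -1 ]

// JSON has no operators, so a JSON parser cannot reinterpret the text;
// it rejects the comma-less form instead.
let rejected = false;
try {
  JSON.parse("[1 -2]");
} catch (e) {
  rejected = e instanceof SyntaxError;
}
console.log(rejected); // true
```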
(There are alternatives to commas, of course. Lisp S-expressions don't need commas, as Ira Baxter points out, but they use more parentheses, which many people find noisier than commas.)
So as long as you consider it important to be able to insert JSON into a JavaScript text, you need to keep the JavaScript form, commas and colons and all.
One important aspect of JSON is that correct JSON is safe. You should never insert untested JSON into an executable string, of course; that would be insane. But a JSON parser should validate its input, and validated JSON is safe to inject into code. If your parser lets you leave out the commas, that would no longer be the case.

Related

Why doesn't the "official" json checker allow top-level primitives? [duplicate]

I've carefully read the JSON description http://json.org/ but I'm not sure I know the answer to the simple question. What strings are the minimum possible valid JSON?
"string": is a string valid JSON?
42: is a plain number valid JSON?
true: is a boolean value valid JSON?
{}: is an empty object valid JSON?
[]: is an empty array valid JSON?
At the time of writing, JSON was solely described in RFC4627. It describes (at the start of "2") a JSON text as being a serialized object or array.
This means that only {} and [] are valid, complete JSON strings in parsers and stringifiers which adhere to that standard.
However, the introduction of ECMA-404 changes that, and the updated advice can be read here. I've also written a blog post on the issue.
To confuse the matter further however, the JSON object (e.g. JSON.parse() and JSON.stringify()) available in web browsers is standardised in ES5, and that clearly defines the acceptable JSON texts like so:
The JSON interchange format used in this specification is exactly that described by RFC 4627 with two exceptions:
The top level JSONText production of the ECMAScript JSON grammar may consist of any JSONValue rather than being restricted to being a JSONObject or a JSONArray as specified by RFC 4627.
[...]
This would mean that all JSON values (including strings, nulls and numbers) are accepted by the JSON object, even though the JSON object technically adheres to RFC 4627.
Note that you could therefore stringify a number in a conformant browser via JSON.stringify(5), which would be rejected by another parser that adheres to RFC 4627 but lacks the specific exception listed above. Ruby, for example, seems to be one such parser, accepting only objects and arrays at the root. PHP, on the other hand, specifically adds the exception that "it will also encode and decode scalar types and NULL".
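The mismatch is easy to reproduce in JavaScript. This sketch shows JSON.stringify producing a bare number. It then defines a hypothetical `parseRfc4627Text` wrapper that emulates a parser restricted to RFC 4627's rule by rejecting any top-level value that is not an object or array. It illustrates the rule only and does not show how Ruby or any other library actually implements it.

```javascript
// ES5+ JSON.stringify/JSON.parse happily handle top-level scalars.
console.log(JSON.stringify(5));     // 5
console.log(JSON.parse("5") === 5); // true

// Hypothetical helper that mimics an RFC 4627-style parser: it accepts
// only an object or an array as the top-level JSON text.
function parseRfc4627Text(text) {
  const value = JSON.parse(text); // accepts any JSON value
  if (value === null || typeof value !== "object") {
    throw new SyntaxError("RFC 4627 JSON text must be an object or array");
  }
  return value; // arrays are objects too, so both pass the check
}

console.log(Array.isArray(parseRfc4627Text("[]"))); // true
try {
  parseRfc4627Text("5");
} catch (e) {
  console.log(e.message); // RFC 4627 JSON text must be an object or array
}
```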
There are at least four documents which can be considered JSON standards on the Internet. The RFCs referenced all describe the mime type application/json. Here is what each has to say about the top-level values, and whether anything other than an object or array is allowed at the top:
RFC-4627: No.
A JSON text is a sequence of tokens. The set of tokens includes six
structural characters, strings, numbers, and three literal names.
A JSON text is a serialized object or array.
JSON-text = object / array
Note that RFC-4627 was marked "informational" as opposed to "proposed standard", and that it is obsoleted by RFC-7159, which in turn is obsoleted by RFC-8259.
RFC-8259: Yes.
A JSON text is a sequence of tokens. The set of tokens includes six
structural characters, strings, numbers, and three literal names.
A JSON text is a serialized value. Note that certain previous
specifications of JSON constrained a JSON text to be an object or an
array. Implementations that generate only objects or arrays where a
JSON text is called for will be interoperable in the sense that all
implementations will accept these as conforming JSON texts.
JSON-text = ws value ws
RFC-8259 is dated December 2017 and is marked "INTERNET STANDARD".
ECMA-262: Yes.
The JSON Syntactic Grammar defines a valid JSON text in terms of tokens defined by the JSON lexical
grammar. The goal symbol of the grammar is JSONText.
Syntax
JSONText :
JSONValue
JSONValue :
JSONNullLiteral
JSONBooleanLiteral
JSONObject
JSONArray
JSONString
JSONNumber
ECMA-404: Yes.
A JSON text is a sequence of tokens formed from Unicode code points that conforms to the JSON value
grammar. The set of tokens includes six structural tokens, strings, numbers, and three literal name tokens.
According to the old definition in RFC 4627 (which was obsoleted in March 2014 by RFC 7159), those were all valid "JSON values", but only the last two would constitute a complete "JSON text":
A JSON text is a serialized object or array.
Depending on the parser used, the lone "JSON values" might be accepted anyway. For example (sticking to the "JSON value" vs "JSON text" terminology):
the JSON.parse() function now standardised in modern browsers accepts any "JSON value"
the PHP function json_decode was introduced in version 5.2.0 only accepting a whole "JSON text", but was amended to accept any "JSON value" in version 5.2.1
Python's json.loads accepts any "JSON value" according to examples on this manual page
the validator at http://jsonlint.com expects a full "JSON text"
the Ruby JSON module will only accept a full "JSON text" (at least according to the comments on this manual page)
The distinction is a bit like the distinction between an "XML document" and an "XML fragment", although technically <foo /> is a well-formed XML document (it would be better written as <?xml version="1.0" ?><foo />, but as pointed out in comments, the <?xml declaration is technically optional).
JSON stands for JavaScript Object Notation. Only {} and [] define a Javascript object. The other examples are value literals. There are object types in Javascript for working with those values, but the expression "string" is a source code representation of a literal value and not an object.
Keep in mind that JSON is not Javascript. It is a notation that represents data. It has a very simple and limited structure. JSON data is structured using {},:[] characters. You can only use literal values inside that structure.
It is perfectly valid for a server to respond with either an object description or a literal value. All JSON parsers should be able to handle a lone literal value, but only one value. JSON can only represent a single object at a time, so for a server to return more than one value it would have to structure them as an object or an array.
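Here is a short JavaScript sketch of the single-value rule. JSON.parse rejects two top-level values in one text, so a server that wants to send several values wraps them in an array (or object). The payload values are made up for illustration.

```javascript
// Two top-level values in one text are rejected.
let rejected = false;
try {
  JSON.parse('"a" "b"');
} catch (e) {
  rejected = e instanceof SyntaxError;
}
console.log(rejected); // true

// A single literal value on its own is fine (ES5+ JSON.parse).
console.log(JSON.parse('"a"')); // a

// Several values are sent as one array instead.
const body = JSON.stringify(["a", "b", 42]);
console.log(body);             // ["a","b",42]
console.log(JSON.parse(body)); // [ 'a', 'b', 42 ]
```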
The ecma specification might be useful for reference:
http://www.ecma-international.org/ecma-262/5.1/
The parse function parses a JSON text (a JSON-formatted String) and produces an ECMAScript value. The
JSON format is a restricted form of ECMAScript literal. JSON objects are realized as ECMAScript objects.
JSON arrays are realized as ECMAScript arrays. JSON strings, numbers, booleans, and null are realized as
ECMAScript Strings, Numbers, Booleans, and null. JSON uses a more limited set of white space characters
than WhiteSpace and allows Unicode code points U+2028 and U+2029 to directly appear in JSONString literals
without using an escape sequence. The process of parsing is similar to 11.1.4 and 11.1.5 as constrained by
the JSON grammar.
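JSON.parse("string");   // SyntaxError: the text is string without quotes, which is not a JSON value (message varies by engine)
JSON.parse('"string"'); // "string"
JSON.parse(43);         // 43 (the argument is converted to the text "43" first)
JSON.parse("43");       // 43
JSON.parse(true);       // true (converted to the text "true" first)
JSON.parse("true");     // true
JSON.parse(false);      // false
JSON.parse("false");    // false
JSON.parse("trueee");   // SyntaxError (message varies by engine)
JSON.parse("{}");       // {}
JSON.parse("[]");       // []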
Yes, yes, yes, yes, and yes. All of them are valid JSON value literals.
However, the official RFC 4627 states:
A JSON text is a serialized object or array.
So a whole "file" should consist of an object or array as the outermost structure, which of course can be empty. Yet, many JSON parsers accept primitive values as well for input.
Just follow the railroad diagrams given on the json.org page. [] and {} are the minimum possible valid JSON objects. So the answer is [] and {}.
var x = {};
console.log(JSON.stringify(x)); // outputs {} (with a bare `var x;`, x is undefined and JSON.stringify returns undefined, not "{}")
So your answer is "{}", which denotes an empty object.

What is the format of this data?

I'm sorry if this is really trivial, but I've got a series of data as follows:
{"color":{"red":255,"blue":123,"green":1}}
I know it's in this format because, for some reason, it's easy to work with. What is this format called so that I might look it up?
Edit: And whether there is any significance to the organization of the data, of course.
That's JSON (JavaScript Object Notation), a text data serialization format based on a subset of JavaScript syntax. To learn more about JSON, visit: http://json.org
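To check whether data like this is JSON, hand it to a JSON parser. In JavaScript, JSON.parse turns the sample into nested objects. Each key must be followed by a colon, so a pair written as "green",1 is rejected.

```javascript
// The sample data, with a colon after every key as JSON requires.
const text = '{"color":{"red":255,"blue":123,"green":1}}';
const data = JSON.parse(text);
console.log(data.color.red);   // 255
console.log(data.color.green); // 1

// A comma in place of the colon after "green" is not valid JSON.
let rejected = false;
try {
  JSON.parse('{"color":{"red":255,"blue":123,"green",1}}');
} catch (e) {
  rejected = e instanceof SyntaxError;
}
console.log(rejected); // true
```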
In JSON, there are the following data types:
object
array
number
string
null
boolean
Objects are represented using the following syntax, and are key-value pairings, similar to a dictionary (the key must be a string):
{ "number": 1, "string": "test" }
Like dictionaries, objects are unordered.
An array is an ordered, heterogeneous data structure, represented using the following syntax:
[0, true, false, "1", null]
Numbers are what you'd expect; however, unlike in JavaScript itself, they cannot be Infinity or NaN (i.e. they must be decimals or integers), and they cannot have leading 0s (a lone 0 before the decimal point, as in 0.5, is fine). Exponents are represented using the following format (the e is not case sensitive):
10e6
where 10 is the significand and 6 is the power of ten, so this is equivalent to 10 × 10^6 = 10000000 (10 million). The exponent section may have leading 0s, though there is not much point and it may lower compatibility with parsers which are not 100% compliant.
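You can check these number rules against any conforming parser. This JavaScript snippet uses the built-in JSON.parse and JSON.stringify. The specific inputs are just illustrative.

```javascript
// Exponent notation: e/E is case-insensitive and the exponent may have leading zeros.
console.log(JSON.parse("10e6")); // 10000000 (10 × 10^6)
console.log(JSON.parse("1E6"));  // 1000000
console.log(JSON.parse("1e06")); // 1000000

// Number forms that JSON does not allow.
for (const bad of ["01", "NaN", "Infinity", ".5", "+1"]) {
  let rejected = false;
  try {
    JSON.parse(bad);
  } catch (e) {
    rejected = e instanceof SyntaxError;
  }
  console.log(bad, rejected); // each line ends in true
}

// Non-finite numbers cannot be represented, so JSON.stringify emits null.
console.log(JSON.stringify([NaN, Infinity])); // [null,null]
```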
Booleans are case sensitive and must be lowercase. In JSON, there are only two booleans:
true
false
To represent an intentionally left out or otherwise unknown field, use null (case sensitive too).
Strings must be delimited using double quotes (single quotes are invalid syntax), and single quotes need not be escaped.
"This string is valid, and it's alright to do this."
'No, this won't work'
'Nor will this.'
There are numerous escapes available using the backslash character - to use a literal backslash, use \\.
As JSON is a data transmission format, there is no syntax for comments available.
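The string rules above can be checked with JavaScript's built-in JSON.parse. Note the two layers of escaping in the source below. In the JavaScript literal '"a\\\\b"', each \\\\ becomes a single backslash pair \\ in the text, and JSON then turns that pair into one literal backslash.

```javascript
// Double-quoted strings may contain unescaped single quotes.
console.log(JSON.parse(`"This string is valid, and it's alright to do this."`));

// Backslash escapes: \\ is a literal backslash, \" a quote, \u0041 is "A".
console.log(JSON.parse('"a\\\\b"'));        // a\b
console.log(JSON.parse('"say \\"hi\\""'));  // say "hi"
console.log(JSON.parse('"\\u0041"'));       // A

// Single-quoted strings and comments are both syntax errors.
const invalid = [
  "'Nor will this.'",       // single-quoted string
  '{"a": 1 /* comment */}', // block comment
  '{"a": 1} // comment',    // line comment
];
for (const text of invalid) {
  let rejected = false;
  try {
    JSON.parse(text);
  } catch (e) {
    rejected = e instanceof SyntaxError;
  }
  console.log(rejected); // true for each
}
```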

JSON alternatives (for the purpose of specifying configuration)?

I like json as a format for configuration files for the software I write. I like that it's lightweight, simple, and widely supported. However, I'm finding that there are some things I'd really like in json that it doesn't have.
Json doesn't have multiline strings or here documents ( http://en.wikipedia.org/wiki/Here_document ), and that is often very awkward when you want your json file to be human-readable and -editable. You can use arrays of strings, but that's a kludgy workaround.
Json doesn't allow comments.
If you look at the formats of unix configuration files, you see a lot of people designing their own awkward formats for things that it would really make more sense to do using some kind of general-purpose thing. For example, here's some code from an Apache config file:
RewriteEngine on
RewriteBase /temp
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml\s*;\s*q=0
RewriteCond %{REQUEST_URI} \.html
RewriteCond %{THE_REQUEST} HTTP/1\.1
RewriteRule t\.html t.xhtml [T=application/xhtml+xml]
Essentially, what's going on here is that they've invented an extremely painful way of writing a boolean function f(w,x,y,z)=w&!x&y&z. You want a logical "or"? They've got some separate (ugly) mechanism for that, too.
What this seems to point toward is some kind of data description language that is simple and Turing-incomplete, but still more expressive, flexible, and convenient than json. Does anyone know of such a language?
To my taste, XML is too complicated, and lisp expressions have the wrong features (Turing-completeness) and lack the right features (here documents, expressive syntax).
[EDIT] The title is misleading. I'm not literally interested in the next iteration of json. I'm not interested in languages that are a subset of javascript. I'm interested in alternative data-description languages.
The EDN format is one option based on Clojure literals. It is almost a superset of JSON, except that no special symbol separates keys and values in maps (as : does in JSON); rather, all elements are separated by whitespace and/or a comma and a map is encoded as a list with an even number of elements, enclosed in {..}.
EDN allows for comments (to newline using ;, or to end of the next element read using #_), but not here-docs. It is extensible to new types using a tag notation:
#myapp/Person {:first "Fred" :last "Mertz"}
The argument of the myapp/Person tag (i.e. {:first "Fred" :last "Mertz"}) must be a valid EDN expression, so tags cannot be used to add here-doc support.
It has two built-in tags: #inst for timestamps and #uuid. It also supports namespaced symbol (i.e. identifier) and keyword (i.e. map key consts) types; it distinguishes lists (..) and vectors [..]. An element of any type may be used as a key in a map.
In the context of your above problem, one could invent an #apache/rule-or tag which accepts a sequence of elements, whose semantics I leave up to you!
Have a look at http://github.com/igagis/puu/
It is even simpler than JSON.
It has C++ style comments.
It is possible to format multiline strings and use escaped new line \n and tab \t chars if "real" new line or tab is needed.
Here is the example snippet:
"String object"
AnotherStringObject
"String with children"{
"child 1"
Child2
"child three"{
SubChild1
"Subchild two"
Property1 {Value1}
"Property two" {"Value 2"}
//comment
/* multi-line
comment */
"multi-line
string"
"Escape sequences \" \n \r \t \\"
}
R"qwerty(
This is a
raw string, "Hello world!"
int main(argc, argv){
int a = 10;
printf("Hello %d", a);
}
)qwerty"
}
Consider TOML.
Designed for configuration. Appears to be pretty friendly and powerful. Easy to read and supports a wide range of datatypes and structures. There are parsers for a lot of languages:
C
C#
C++
Common Lisp
Crystal
Dart
Erlang
Fortran
Go
Janet
Java
JavaScript
Julia
Kotlin
Lua
Nim
OCaml
Perl
Perl6/Raku
Python
Rust
Swift
V
The 'J' in JSON is "Javascript". If a particular desired syntax construct isn't in Javascript, then it won't be on JSON.
Heredocs are beyond JSON's purview. That's a language syntax construct for simplified multi-line string definition, but JSON is a transport notation. It has nothing to do with construction. It does, however, support multiline strings by allowing the \n escape sequence within strings. What it does not allow is a literal (unescaped) line break inside a string, because control characters must be escaped. As long as the line break is written as \n inside correctly quoted strings, it's perfectly valid, e.g.
{"x":"y\nz"}
is 100% legitimate valid JSON, and is a multiline string, whereas
{"x":"y
z"}
isn't and will fail on parsing.
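The difference is between the two characters \ and n (an escape sequence) and a real line-feed character inside the quotes. In the JavaScript source below, '\\n' produces the two-character escape, while '\n' inserts an actual line break before JSON.parse ever sees the text.

```javascript
// Escaped newline (backslash + n) inside the string: valid JSON.
const ok = JSON.parse('{"x":"y\\nz"}');
console.log(ok.x.includes("\n")); // true
console.log(ok.x);                // prints y and z on separate lines

// A raw line break inside the string: rejected, because control
// characters must be escaped in JSON strings.
let rejected = false;
try {
  JSON.parse('{"x":"y\nz"}');
} catch (e) {
  rejected = e instanceof SyntaxError;
}
console.log(rejected); // true
```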
There's always what I like to call "real JSON". JSON stands for JavaScript Object Notation, and JavaScript does have comments and something close enough to heredocs.
For the heredoc, you would use JavaScript's E4X inline XML:
{
longString: <>
Hello, world!
This is a long string made possible with the magic of E4X.
Implementing a parser isn't so difficult.
</>.toString() // And a comment
/* And another
comment */
}
You could use Firefox's JavaScript engine (Firefox was the only browser to support E4X, and it has since removed that support) or implement your own parser, which really isn't so difficult.
Here's the E4X quickstart guide, too.
Since March 2018 you can use JSON5 which seems to have added everything you (& many others) were missing from JSON.
Short Example (JSON5)
{
// comments
unquoted: 'and you can quote me on that',
singleQuotes: 'I can use "double quotes" here',
lineBreaks: "Look, Mom! \
No \\n's!",
hexadecimal: 0xdecaf,
leadingDecimalPoint: .8675309, andTrailing: 8675309.,
positiveSign: +1,
trailingComma: 'in objects', andIn: ['arrays',],
"backwardsCompatible": "with JSON",
}
The JSON5 Data Interchange Format (JSON5) is a superset of JSON that
aims to alleviate some of the limitations of JSON by expanding its
syntax to include some productions from ECMAScript 5.1.
Summary of Features
The following ECMAScript 5.1 features, which are not supported in
JSON, have been extended to JSON5.
Objects
Object keys may be an ECMAScript 5.1 IdentifierName.
Objects may have a single trailing comma.
Arrays
Arrays may have a single trailing comma.
Strings
Strings may be single quoted.
Strings may span multiple lines by escaping new line characters.
Strings may include character escapes.
Numbers
Numbers may be hexadecimal.
Numbers may have a leading or trailing decimal point.
Numbers may be IEEE 754 positive infinity, negative infinity, and NaN.
Numbers may begin with an explicit plus sign.
Comments
Single and multi-line comments are allowed.
White Space
Additional white space characters are allowed.
GitHub: https://github.com/json5/json5
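Because these are extensions, a plain JSON parser rejects them, and a JSON5 parser such as the one linked above is needed to read such files. This sketch uses only the built-in JSON.parse to show that it refuses several of the listed JSON5 features. The sample inputs are illustrative.

```javascript
// Samples of JSON5-only syntax, keyed by feature name.
const json5Only = {
  trailingComma: '{"a": 1,}',
  hexadecimal: '{"a": 0xdecaf}',
  leadingDecimalPoint: '{"a": .5}',
  positiveSign: '{"a": +1}',
  infinity: '{"a": Infinity}',
};

for (const [feature, text] of Object.entries(json5Only)) {
  let rejected = false;
  try {
    JSON.parse(text);
  } catch (e) {
    rejected = e instanceof SyntaxError;
  }
  console.log(feature, rejected); // each line ends in true
}
```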
One important attribute of JSON (probably the most important) is that you can easily "flip" between the string representation and the representation in object form, and the objects used to represent the object form are relatively simple arrays and maps. This is what makes JSON so useful in a networking context.
The functions you want would conflict with this dual nature of JSON.
For configuration you could use an embeddable scripting language, such as Lua or Python; in fact, this is not an uncommon thing to do for configuration. That gives you multiline strings or here documents, and comments. It also makes it easier to express things like the boolean function you describe. However, these scripting languages are, of course, Turing complete.
There is also ELDF.
Although it does not support comments, they can be emulated via empty keys:
config_var1 = value1
=some comment
config_var2 = value2

Are whitespace characters insignificant in JSON?

Are blank characters like spaces, tabs and carriage returns ignored in json strings?
For example, is {"a":"b"} equal to {"a" : "b"}?
Yes, blanks outside a double-quoted string literal are ignored in the syntax. Specifically, the ws production in the JSON grammar in RFC 4627 shows:
Insignificant whitespace is allowed before or after any of the six
structural characters.
ws = *(
%x20 / ; Space
%x09 / ; Horizontal tab
%x0A / ; Line feed or New line
%x0D ; Carriage return
)
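A quick way to see this in JavaScript is to parse both spellings and compare the results. Re-serializing with JSON.stringify is used here only as a convenient comparison. It works for this example, but in general the output also depends on key order.

```javascript
const compact = '{"a":"b"}';
const spaced = '{ "a" : "b" }\n';

// Whitespace between tokens is insignificant: both parse to the same value.
const x = JSON.parse(compact);
const y = JSON.parse(spaced);
console.log(x.a === y.a);                             // true
console.log(JSON.stringify(x) === JSON.stringify(y)); // true

// Whitespace inside a string literal is part of the value.
console.log(JSON.parse('{"a":" b"}').a === "b"); // false
```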
In standard JSON, whitespace outside of string literals is ignored, as has been said.
However, since your question is tagged C#, I should note that there's at least one other case in C#/.NET where whitespace in JSON does matter.
The DataContractJsonSerializer uses a special __type property to support deserializing to the correct subclass. This property is required to be the first property in an object, and to have no whitespace between the property name and the preceding {. See this previous thread:
DataContractJsonSerializer doesn't work with formatted JSON?
At least, I have tested that the no-whitespace requirement is true as of .NET 4. Perhaps this will be changed in a future version to bring it more into line with the JSON standard?

Do the JSON keys have to be surrounded by quotes?

Example:
Is the following code valid against the JSON Spec?
{
precision: "zip"
}
Or should I always use the following syntax? (And if so, why?)
{
"precision": "zip"
}
I haven't found anything about this in the JSON specifications, although they use quotes around their keys in their examples.
Yes, you need quotation marks. This keeps the format simple and avoids needing another escape method for JavaScript reserved keywords, e.g. {for:"foo"}.
You are correct to use strings as the key. Here is an excerpt from RFC 4627 - The application/json Media Type for JavaScript Object Notation (JSON)
2.2. Objects
An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members). A name is a
string. A single colon comes after each name, separating the name
from the value. A single comma separates a value from a following
name. The names within an object SHOULD be unique.
object = begin-object [ member *( value-separator member ) ] end-object
member = string name-separator value
[...]
2.5. Strings
The representation of strings is similar to conventions used in the C
family of programming languages. A string begins and ends with
quotation marks. [...]
string = quotation-mark *char quotation-mark
quotation-mark = %x22 ; "
Read the whole RFC here.
From 2.2. Objects
An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string.
and from 2.5. Strings
A string begins and ends with quotation marks.
So I would say that according to the standard: yes, you should always quote the key (although some parsers may be more forgiving)
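This is easy to confirm with a conforming parser. In the JavaScript below, the built-in JSON.parse rejects the unquoted form from the question and accepts the quoted one. JSON.stringify always writes keys with double quotes.

```javascript
// Unquoted key: fine in a JavaScript object literal, but not in JSON.
let rejected = false;
try {
  JSON.parse('{ precision: "zip" }');
} catch (e) {
  rejected = e instanceof SyntaxError;
}
console.log(rejected); // true

// Quoted key: valid JSON.
console.log(JSON.parse('{ "precision": "zip" }').precision); // zip

// JSON.stringify always emits quoted keys, even for identifier-like names.
console.log(JSON.stringify({ precision: "zip" })); // {"precision":"zip"}
```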
Yes, quotes are mandatory. http://json.org/ says:
string
""
" chars "
Not if you use JSON5
For regular JSON, yes, keys must be quoted. But if you need otherwise, check out the widely used JSON5, so named because it is a superset of JSON that allows ES5 syntax, including:
unquoted property keys
single-quoted, escaped and multi-line strings
alternate number formats
comments
extra whitespace
The JSON5 reference implementation (json5 npm package) provides a JSON5 object that has parse and stringify methods with the same args and semantics as the built-in JSON object.
widely used, and depended on by many high profile projects
JSON5 was started in 2012, and as of 2022, now gets >65M downloads/week, ranks in the top 0.1% of the most depended-upon packages on npm, and has been adopted by major projects like Chromium, Next.js, Babel, Retool, WebStorm, and more. It's also natively supported on Apple platforms like MacOS and iOS.
~ json5.org homepage
In your situation, both of them are valid, meaning that both of them will work.
However, you still should use the one with quotation marks in the key names because it is more conventional, which leads to more simplicity and ability to have key names with white spaces etc.
Therefore, use the one with the quotation marks.
edit// check this: What is the difference between JSON and Object Literal Notation?
Since you can use the dotted notation parent.child and don't have to write parent["child"] (which is also valid and useful), I'd say both ways are technically acceptable. All parsers should handle both ways just fine. If your parser does not need quotes on keys, then it's probably better to leave them out (it saves space). It makes sense to call keys strings because that is what they are, and since square brackets let you use values as keys, it makes perfect sense not to quote them.
In JavaScript you can write...
>var keyName = "someKey";
>var obj = {[keyName]:"someValue"};
>obj
Object {someKey: "someValue"}
just fine. If you need a computed value as a key, an unquoted name won't work; otherwise, you don't need quotes on keys, even if it's right to say they are technically strings. Logic and usage argue otherwise. Nor does the console of any browser officially display Object {"someKey": "someValue"} for obj in our example.