YAML as a JSON superset and TAB characters

I am unable to find an exact reference to this error, but YAML 1.2 says it is a JSON superset, and yet when I use tab characters in a JSON document the parser treats them as an error.
e.g.
"root": {
"key": "value"
}
(The online validator says that '\t' cannot start any token.)
I know why YAML historically disallows tabs, but how can I interpret this in the context of JSON-superset?
(e.g. Is YAML not an actual superset or does JSON also disallow tabs? Or the spec does allow for tabs in this case but the implementation is not there yet?)
Thanks.

Tabs ARE allowed in YAML, but only where indentation does not apply.
According to YAML 1.2 Section 5.5:
YAML recognizes two white space characters: space and tab.
The following examples will use · to denote spaces and → to denote tabs. All examples can be validated using the official YAML Reference Parser.
YAML has a block style and flow style. In block style, indentation determines the structure of a document. The following document uses block style.
root:
··key: value
In flow style, special characters indicate the structure of the document. The following equivalent document uses flow style.
{
→ root: {
→ → key: value
→ }
}
You can even mix indentation in flow style.
{
→ root: {
··→ key: value
····}
}
If you're mixing block and flow style, the entire flow style part must respect the block style indentation.
root:
··{
····key: value
··}
But you can still mix your indentation within the flow style part.
root:
··{
··→ key: value
··}
If you have a single value document, you can surround the value with all manner of whitespace.
→ ··value··→
The point is, every JSON document that is parsed as YAML will put the document into flow style (because of the initial { or [ character) which supports tabs, unless it is a single value JSON document, in which case YAML still allows padding with whitespace.
If a YAML parser throws because of tabs in a JSON document, then it is not a valid parser.
That being said, your example is failing because a block style mapping value must always be indented if it's not on the same line as the mapping name.
root: {
··key: value
}
is not valid, however
root:
··{
····key: value
··}
is valid, and
root: { key: value }
is also valid.
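For the JSON side of this claim, here is a minimal sketch using Python's standard json module (chosen only for illustration; the variable names are made up). It shows that a tab-indented JSON document parses without complaint, because tab is one of JSON's whitespace characters. It does not exercise any YAML parser.

```python
import json

# Tab-indented JSON. The whole document is wrapped in braces, which is
# also what would put a YAML parser into flow style.
tabbed = '{\n\t"root": {\n\t\t"key": "value"\n\t}\n}'

parsed = json.loads(tabbed)
print(parsed)
```

Note that the snippet in the question has no outer braces, so it is not a complete JSON document to begin with.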

I know why YAML historically disallows tabs, but how can I interpret this in the context of JSON-superset?
Taking the rest of the specifications into account, we can only conclude that the "superset" comment is inaccurate. The YAML specification is fundamentally inconsistent in the Relation to JSON section:
YAML can therefore be viewed as a natural superset of JSON, offering
improved human readability and a more complete information model. This
is also the case in practice; every JSON file is also a valid YAML
file. This makes it easy to migrate from JSON to YAML if/when the
additional features are required.
JSON's RFC4627 requires that mappings keys merely “SHOULD” be unique,
while YAML insists they “MUST” be. Technically, YAML therefore
complies with the JSON spec, choosing to treat duplicates as an error.
In practice, since JSON is silent on the semantics of such duplicates,
the only portable JSON files are those with unique keys, which are
therefore valid YAML files.
Despite asserting YAML as a "natural superset of JSON" and stating that "every JSON file is also a valid YAML file", the spec immediately notes some differences regarding key uniqueness. Arguably, the spec should also note the differences around using tabs for indentation here as well.
Speaking of which, as the validator implied, YAML explicitly prohibits tabs as indentation characters:
To maintain portability, tab characters must not be used in
indentation, since different systems treat tabs differently. Note that
most modern editors may be configured so that pressing the tab key
results in the insertion of an appropriate number of spaces.
This is, of course, stricter than the JSON specification, which simply states:
Whitespace can be inserted between any pair of tokens.
So, to directly answer your questions...
(e.g. Is YAML not an actual superset or does JSON also disallow tabs? Or the spec does allow for tabs in this case but the implementation is not there yet?)
...YAML is not actually a superset, JSON does not disallow tabs, whereas the YAML specification does indeed disallow tabs explicitly.
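The key-uniqueness difference quoted above can be made concrete on the JSON side. This is a sketch only, using Python's standard json module; the helper names reject_duplicates and loads_unique are hypothetical. It makes a JSON loader behave the way the YAML spec demands, treating a repeated key as an error instead of silently keeping one value.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result

def loads_unique(text):
    # json.loads calls the hook once per JSON object, innermost first.
    return json.loads(text, object_pairs_hook=reject_duplicates)

print(loads_unique('{"a": 1, "b": 2}'))
try:
    loads_unique('{"a": 1, "a": 2}')
except ValueError as exc:
    print("rejected:", exc)
```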

According to the specification, tabs were never allowed. So when JSON is used inside YAML, tabs are not allowed there either.
The problem arises when we think of JSON as a pure subset of YAML. It is not: according to the Relation to JSON section of the specification, a few small differences keep JSON from being a pure subset of YAML.
To address those dissimilarities, we would need something like YSON, which is also mentioned in the spec.
Fortunately, some YAML engines do support tabs as indentation. SnakeYAML is one example.

Related

Is there value/purpose in declaring a pattern ^(.*)$ for JSON properties of type string?

I'm learning REST webservices and I've been assigned the task of wrapping (creating a new JSON schema on top of) an existing REST API for which I have been given its JSON schema. The schema that I am wrapping specifies a "pattern": "^(.*)$" for properties (such as city or streetAddress) that are of "type": "string". The regex matches everything until a line terminator is encountered. I know that the REST API that I am wrapping in turn wraps a SOAP message (and may have been mechanically converted from SOAP to JSON - so I suspect a conversion artifact is at work here).
My question is: is this a typical pattern to apply to strings passed to and from webservice endpoints, or is its specificity redundant and unnecessary?
My thought is that the generation of this pattern within the JSON schema is an artifact of the automated conversion process and as such it would make sense to simplify my wrapper by omitting the "pattern": "^(.*)$".
I would make an informed guess that someone has previously taken a JSON instance, and used a tool to generate some or all of the JSON Schema files you are looking at.
I couldn't tell you why they have done this, but it seems pretty pointless.
It could be to make sure there are no line breaks in each of those fields, but I've also seen this in generated schemas more than a few times.
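To see what the pattern actually accepts, here is a rough sketch using Python's re module. This is an approximation: JSON Schema patterns follow the ECMA-262 regex dialect, which differs from Python's in edge cases such as a trailing newline or a carriage return, so only the clear-cut cases are shown. The helper name accepted is made up.

```python
import re

PATTERN = re.compile(r"^(.*)$")

def accepted(value):
    # JSON Schema patterns are not implicitly anchored, hence search().
    return PATTERN.search(value) is not None

print(accepted("221B Baker Street"))      # ordinary single-line string
print(accepted(""))                       # the empty string also matches
print(accepted("line one\nline two"))     # an embedded line break does not
```

So the only strings this pattern rules out are those containing a line break, which supports the guess that it is either a deliberate "single line only" check or a generator artifact.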

JSON REST endpoint returning / consuming JSON literals

Is it advisable or not in a RESTful web service to use JSON literal values (string / number) as input parameter in the payload or in the response body?
If I have an endpoint PUT /mytodolist is it OK for it to accept a JSON string literal value "Take out the rubbish" in the request payload (with Content-Type=application/json) or should it accept a JSON object instead ({"value":"Take out the rubbish"})?
Similarly, is it fine for GET /mytodolist/1 to return "Take out the rubbish" in the response body or should it return a proper JSON object {"value":"Take out the rubbish"}
Spring MVC makes implementing and testing such endpoints easy, however clients have flagged this as non-standard or hard to implement. In my view JSON literals are JSON, just not JSON objects, so I'd say it is fine. I have found no recommendations using Google.
EDIT 1: Clarification
The question is entirely about the 'standard', if it allows this or not.
I understand the problem with extensibility, but one can never design a fully extensible interface IMHO. If changes need to be made, we can try extending what we have in a backwards compatible way, but there will come a time when it becomes messy and another approach is required, which is commonly handled by versioning the API in one way or another. I still find it a fair point, though, because using a literal as the request/response body is immediately inextensible, while a reasonable one-attribute JSON object is not.
It is also understood that some frameworks have problems with handling JSON literals, this is the origin of this question. The tool I used happened to support this, so I thought this was all right, but the front-end library did not.
Still, what I am intending to find out right now, is if using JSON literals is according to the de-facto standard (even if it is a cornercase) or not.
I would recommend always using a JSON object. One reason is that for Content-Type application/json people expect something starting with "{", and not all frameworks handle JSON literals properly. The second reason is that you will probably add some additional attributes to your list item (due date, category, priority, etc.), and then you'll break backward compatibility by adding a new field.
It may be acceptable in the context of your example, but keep in mind that unambiguous interfaces are easier to use and that will encourage adoption.
For example, your interface could interpret "Take out the rubbish" as the same as {task:"take out the rubbish"}, but once you add additional properties (e.g. "when" or "who") the meaning of a solitary string in the request becomes ambiguous. It's inevitable that you'll add support for new properties as your interface matures.
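On the narrow question of whether a bare literal is JSON at all: parsers that follow RFC 7159 or later accept any value at the top level. A small sketch with Python's standard json module (the payload strings mirror the question; the variable names are made up) shows both shapes being parsed.

```python
import json

literal_body = '"Take out the rubbish"'
object_body = '{"value": "Take out the rubbish"}'

# A top-level string is parsed straight to a Python str.
task_from_literal = json.loads(literal_body)

# The object form needs one extra lookup, but leaves room for more fields.
task_from_object = json.loads(object_body)["value"]

print(task_from_literal == task_from_object)
```

Whether a given client library accepts the literal form is a separate matter, as the answers above point out.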

What valid JSON files are not valid YAML 1.1 files?

YAML 1.2 is (with one minor caveat regarding duplicate keys) a superset of JSON, so any valid JSON file is also a valid YAML file. However, the YAML 1.1 specification (which has the most library support) doesn't mention JSON. Most valid JSON files are valid YAML 1.1 files, but I found at least one exception by experimenting with PyYAML and Python's standard JSON library:
a double-precision floating-point overflow such as 12345e999 is interpreted as a string by PyYAML and IEEE infinity by Python's JSON library.
Does anyone have a complete list of differences, determined more robustly than by testing edge cases in a particular implementation? (That is, from a comparison of the specifications?) For example, I want to generate JSON strings that will be interpreted the same way by a JSON parser and a YAML 1.1 parser: what constraints must I place on my strings?
See here (specifically footnote 25). It says:
The incompatibilities were as follows: JSON allows extended character
sets like UTF-32 and had incompatible unicode character escape syntax
relative to YAML; YAML required a space after separators like comma,
equals, and colon while JSON does not. Some non-standard
implementations of JSON extend the grammar to include Javascript's
/*...*/ comments. Handling such edge cases may require light
pre-processing of the JSON before parsing as in-line YAML
See also https://metacpan.org/pod/JSON::XS#JSON-and-YAML
Related
What is the difference between YAML and JSON? When to prefer one over the other
As you noticed, what the specifications say is one thing; what commonly available parsers (both YAML and JSON) actually process is another. You should therefore take several aspects into account and use the least common denominator to be able to load your JSON with a YAML parser.
On the JSON side there are multiple standards and best practises. Originally a JSON text would have to have an object or array at the topmost level. This is still so according to the fail1.json files available on the json.org site:
"A JSON payload should be an object or array, not a string."
According to RFC7159 any value can be at the top level (apart from using a string, this leads to rather boring JSON files):
A JSON text is a serialized value. Note that certain previous
specifications of JSON constrained a JSON text to be an object or an
array. Implementations that generate only objects or arrays where a
JSON text is called for will be interoperable in the sense that all
implementations will accept these as conforming JSON texts.
Because of the problems with JSON hijacking (by redefining the array handling in older browsers) there have been implementations that only accept an object at the top level (i.e. the first character of the file has to be {).
On the YAML side there are fewer competing standards than with JSON, but things get muddled by the persistent usage of YAML 1.1, and is not helped by the fact that if you google for "yaml current spec" the first hit is yaml.org/spec/current.html and that is actually an old working-draft for YAML 1.1
Apart from the UTF-32 support the other answer mentioned, which is largely a non-issue in a world using UTF-8 almost exclusively, there are a few things to take into account, especially if you want PyYAML to be able to parse your JSON (PyYAML still implements most of YAML 1.1 only, close to eight years after the YAML 1.2 spec release):
numbers in JSON don't need a dot in the mantissa, even if such a number has an exponent:
but the Floating-Point Language-Independent Type for YAML™ Version 1.1 does require that dot:
|[-]?0\.([0-9]*[1-9])?e[-+](0|[1-9][0-9]+) (scientific)
^--- no ? or * associated with this dot
(in the YAML 1.2 spec this regex has changed to:
-? [1-9] ( \. [0-9]* [1-9] )? ( e [-+] [1-9] [0-9]* )?.
allowing the dot to disappear even if there is an e (and no E) and exponent.)
This is the cause for your 12345e999 being handled differently by JSON (overflow) and PyYAML (string). In YAML 1.1 this can only be interpreted as a string and hence doesn't need quotes and can be plain scalar.
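The difference can be illustrated without a YAML library. In the sketch below, DOTTED_FLOAT is a deliberately simplified stand-in for a "dot is mandatory" float pattern; it is not the full YAML 1.1 regex, nor the one any particular parser uses. Python's standard json module plays the JSON side.

```python
import json
import re

# Simplified assumption: digits, a mandatory dot, optional signed exponent.
DOTTED_FLOAT = re.compile(r"[-+]?[0-9]+\.[0-9]*([eE][-+][0-9]+)?")

def looks_like_dotted_float(token):
    return DOTTED_FLOAT.fullmatch(token) is not None

# JSON accepts the dot-less form; the value overflows to infinity.
print(json.loads("12345e999"))

# A pattern that insists on the dot does not recognise it as a number,
# so a resolver built on such a pattern would fall back to a string.
print(looks_like_dotted_float("12345e999"))
print(looks_like_dotted_float("12345.0e+999"))
```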
In YAML 1.1 there are escape sequences, but these are not a superset of what JSON supports. The forward slash (/) can be escaped in JSON, but not in YAML 1.1 (it can in YAML 1.2, rule 53).
In JSON as well as in YAML 1.1 you can use \uNNNN to indicate a 16 bit unicode code point. Although the YAML 1.1 spec (and YAML 1.2) mentions surrogate pairs in conjunction with using UTF-16, nothing is mentioned about such pairs as escaped sequences ("\uD834\uDD1E"). This string sequence is explicitly mentioned in RFC 7159 as representing the G clef character (U+1D11E). I don't know of any YAML parser that supports this; PyYAML throws a:
yaml.reader.ReaderError: unacceptable character #xd834: special characters are not allowed
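For comparison, here is how the JSON side treats the same escape, sketched with Python's standard json module (variable names are illustrative). The decoder joins the surrogate pair into one code point, and ensure_ascii=False makes the encoder emit the raw character instead of the escaped pair, which is one way to avoid producing such escapes in the first place.

```python
import json

escaped = '"\\uD834\\uDD1E"'      # the JSON text "\uD834\uDD1E"
decoded = json.loads(escaped)

# One character, code point U+1D11E (musical symbol G clef).
print(len(decoded), hex(ord(decoded)))

# Default encoding re-escapes it as a surrogate pair...
print(json.dumps(decoded))
# ...while ensure_ascii=False writes the character itself.
raw = json.dumps(decoded, ensure_ascii=False)
```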
So as long as you write your JSON
as UTF-8
with the top-level being an object
scientific numbers always with a dot
no \/ escape sequence
no \uNNNN characters between \uD7FF and \uE000 (exclusive), nor \uFFFE, nor \uFFFF
you should be fine for both JSON and YAML (1.1) parsers.
¹ In ruamel.yaml a YAML 1.2 parser of which I am the author, the \/ and scientific numbers without dot are handled correctly: your 12345e999 loads as type float and prints as inf.

OData Standard element names

I've been looking into the OData standard and would like to update my services to provide this standard and to consume it. I know that for the XML, it uses the ATOM XML standard, so the names of the elements, such as id, title, author, etc. must be exactly that, conforming to the Atom standard. The JSON format for OData has different names for its elements. Are those required to be that way, or can I have my JSON structure use the same names for its elements as the Atom XML structure?
An example is the link - in Atom it's called link, with an href and rel attribute. The JSON name for this element is __metadata with the key being uri. It seems like those names are standard and can't change. I'm wondering if the root elements, __metadata, resource, etc. are standard, but perhaps the internal elements are more flexible. For instance, the title element in Atom corresponds to the name element in the JSON structure; could I keep the JSON structure using title instead of name?
The names mentioned in the standard for JSON are part of the standard and can't be changed (otherwise clients won't be able to understand the OData JSON payload). This applies to pretty much anything starting with a double underscore (so __metadata, __deferred and so on). Also, the value of the __metadata property (or any other __ property) is defined by the standard and should match it exactly. There are a couple of other places where the names are defined by the standard; just read through it.
The properties which are not defined by the standard are usually treated as OData properties, so those are defined by the model you're exposing through OData (they are the same as the elements under the m:properties element in OData ATOM). So those are somewhat customizable, by changing the OData model, but then you're changing both ATOM and JSON formats.

Can comments be used in JSON?

Can I use comments inside a JSON file? If so, how?
No.
JSON is data-only. If you include a comment, then it must be data too.
You could have a designated data element called "_comment" (or something) that should be ignored by apps that use the JSON data.
You would probably be better off having the comment in the processes that generate/receive the JSON, as they are supposed to know what the JSON data will be in advance, or at least the structure of it.
But if you decided to:
{
"_comment": "comment text goes here...",
"glossary": {
"title": "example glossary",
"GlossDiv": {
"title": "S",
"GlossList": {
"GlossEntry": {
"ID": "SGML",
"SortAs": "SGML",
"GlossTerm": "Standard Generalized Markup Language",
"Acronym": "SGML",
"Abbrev": "ISO 8879:1986",
"GlossDef": {
"para": "A meta-markup language, used to create markup languages such as DocBook.",
"GlossSeeAlso": ["GML", "XML"]
},
"GlossSee": "markup"
}
}
}
}
}
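If you go this route, the consuming side has to ignore those keys. A minimal sketch in Python of how an application might do that after parsing; the helper name drop_comments and the default key "_comment" are assumptions taken from the example above, not part of any standard.

```python
import json

def drop_comments(node, comment_key="_comment"):
    """Return a copy of parsed JSON data with every comment_key entry removed."""
    if isinstance(node, dict):
        return {k: drop_comments(v, comment_key)
                for k, v in node.items() if k != comment_key}
    if isinstance(node, list):
        return [drop_comments(item, comment_key) for item in node]
    return node

text = '{"_comment": "comment text goes here...", "glossary": {"title": "example glossary", "_comment": "nested"}}'
clean = drop_comments(json.loads(text))
print(clean)
```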
No, comments of the form //… or /*…*/ are not allowed in JSON. This answer is based on:
https://www.json.org
RFC 4627:
The application/json Media Type for JavaScript Object Notation (JSON)
RFC 8259 The JavaScript Object Notation (JSON) Data Interchange Format (supersedes RFCs 4627, 7158, 7159)
Include comments if you choose; strip them out with a minifier before parsing or transmitting.
I just released JSON.minify() which strips out comments and whitespace from a block of JSON and makes it valid JSON that can be parsed. So, you might use it like:
JSON.parse(JSON.minify(my_str));
When I released it, I got a huge backlash of people disagreeing with even the idea of it, so I decided that I'd write a comprehensive blog post on why comments make sense in JSON. It includes this notable comment from the creator of JSON:
Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser. - Douglas Crockford, 2012
Hopefully that's helpful to those who disagree with why JSON.minify() could be useful.
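The same strip-then-parse idea can be sketched in Python. This is not the JSON.minify() implementation, just an illustrative pre-processor (the function name strip_json_comments is made up) that removes // and /* */ comments while leaving string contents alone, so that a URL such as "http://..." inside a string survives.

```python
import json

def strip_json_comments(text):
    """Remove // and /* */ comments that appear outside string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = n if end == -1 else end          # keep the newline itself
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

sample = '''{
  // line comment
  "url": "http://example.com/*x*/", /* block comment */
  "n": 1
}'''
config = json.loads(strip_json_comments(sample))
print(config)
```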
Comments were removed from JSON by design.
I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability. I know that the lack of comments makes some people sad, but it shouldn't.
Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
Source: Public statement by Douglas Crockford on G+
JSON does not support comments. It was also never intended to be used for configuration files where comments would be needed.
Hjson is a configuration file format for humans. Relaxed syntax, fewer mistakes, more comments.
See hjson.github.io for JavaScript, Java, Python, PHP, Rust, Go, Ruby, C++ and C# libraries.
DISCLAIMER: YOUR WARRANTY IS VOID
As has been pointed out, this hack takes advantage of the implementation of the spec. Not all JSON parsers will understand this sort of JSON. Streaming parsers in particular will choke.
It's an interesting curiosity, but you should really not be using it for anything at all. Below is the original answer.
I've found a little hack that allows you to place comments in a JSON file that will not affect the parsing, or alter the data being represented in any way.
It appears that when declaring an object literal you can specify two values with the same key, and the last one takes precedence. Believe it or not, it turns out that JSON parsers work the same way. So we can use this to create comments in the source JSON that will not be present in a parsed object representation.
({a: 1, a: 2});
// => Object {a: 2}
Object.keys(JSON.parse('{"a": 1, "a": 2}')).length;
// => 1
If we apply this technique, your commented JSON file might look like this:
{
"api_host" : "The hostname of your API server. You may also specify the port.",
"api_host" : "hodorhodor.com",
"retry_interval" : "The interval in seconds between retrying failed API calls",
"retry_interval" : 10,
"auth_token" : "The authentication token. It is available in your developer dashboard under 'Settings'",
"auth_token" : "5ad0eb93697215bc0d48a7b69aa6fb8b",
"favorite_numbers": "An array containing my all-time favorite numbers",
"favorite_numbers": [19, 13, 53]
}
The above code is valid JSON. If you parse it, you'll get an object like this:
{
"api_host": "hodorhodor.com",
"retry_interval": 10,
"auth_token": "5ad0eb93697215bc0d48a7b69aa6fb8b",
"favorite_numbers": [19,13,53]
}
Which means there is no trace of the comments, and they won't have weird side-effects.
Happy hacking!
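For what it's worth, Python's standard json module behaves the same way by default: it keeps only the last value for a repeated key. The sketch below (variable names are illustrative) also shows that the "comments" are still visible to any consumer that asks for the raw key/value pairs, which is one reason the disclaimer above applies.

```python
import json

commented = '''{
  "api_host": "The hostname of your API server.",
  "api_host": "hodorhodor.com",
  "retry_interval": "Seconds between retries",
  "retry_interval": 10
}'''

# Default behaviour: the last value for each key wins.
as_dict = json.loads(commented)

# object_pairs_hook=list exposes every pair, comments included.
as_pairs = json.loads(commented, object_pairs_hook=list)

print(as_dict)
print(len(as_pairs))
```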
Consider using YAML. It's nearly a superset of JSON (virtually all valid JSON is valid YAML) and it allows comments.
You can't. At least that's my experience from a quick glance at json.org.
JSON has its syntax visualized on that page. There isn't any note about comments.
Comments are not an official standard, although some parsers support C++-style comments. One that I use is JsonCpp. In the examples there is this one:
// Configuration options
{
// Default encoding for text
"encoding" : "UTF-8",
// Plug-ins loaded at start-up
"plug-ins" : [
"python",
"c++",
"ruby"
],
// Tab indent size
"indent" : { "length" : 3, "use_space": true }
}
jsonlint does not validate this. So comments are a parser specific extension and not standard.
Another parser is JSON5.
An alternative to JSON is TOML.
A further alternative is jsonc.
The latest version of nlohmann/json has optional support for ignoring comments on parsing.
Here is what I found in the Google Firebase documentation that allows you to put comments in JSON:
{
"//": "Some browsers will use this to enable push notifications.",
"//": "It is the same for all projects, this is not your project's sender ID",
"gcm_sender_id": "1234567890"
}
You should write a JSON schema instead. JSON schema is currently a proposed Internet draft specification. Besides documentation, the schema can also be used for validating your JSON data.
Example:
{
"description": "A person",
"type": "object",
"properties": {
"name": {
"type": "string"
},
"age": {
"type": "integer",
"maximum": 125
}
}
}
You can provide documentation by using the description schema attribute.
If you are using Jackson as your JSON parser then this is how you enable it to allow comments:
ObjectMapper mapper = new ObjectMapper().configure(JsonParser.Feature.ALLOW_COMMENTS, true);
Then you can have comments like this:
{
"key": "value" // Comment
}
And you can also have comments starting with # by setting:
mapper.configure(JsonParser.Feature.ALLOW_YAML_COMMENTS, true);
But in general (as answered before) the specification does not allow comments.
NO. JSON used to support comments but they were abused and removed from the standard.
From the creator of JSON:
I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability. I know that the lack of comments makes some people sad, but it shouldn't. - Douglas Crockford, 2012
The official JSON site is at JSON.org. JSON is defined as a standard by ECMA International. There is always a petition process to have standards revised. It is unlikely that annotations will be added to the JSON standard for several reasons.
JSON by design is an easily reverse-engineered (human parsed) alternative to XML. It is simplified even to the point that annotations are unnecessary. It is not even a markup language. The goal is stability and interoperability.
Anyone who understands the "has-a" relationship of object orientation can understand any JSON structure - that is the whole point. It is just a directed acyclic graph (DAG) with node tags (key/value pairs), which is a near universal data structure.
The only annotation required might be "//These are DAG tags". The key names can be as informative as required, allowing arbitrary semantic arity.
Any platform can parse JSON with just a few lines of code. XML requires complex OO libraries that are not viable on many platforms.
Annotations would just make JSON less interoperable. There is simply nothing else to add unless what you really need is a markup language (XML), and don't care if your persisted data is easily parsed.
BUT as the creator of JSON also observed, there has always been JS pipeline support for comments:
Go ahead and insert all the comments you like.
Then pipe it through JSMin before handing it to your JSON parser. - Douglas Crockford, 2012
If you are using the Newtonsoft.Json library with ASP.NET to read/deserialize you can use comments in the JSON content:
//"name": "string"
//"id": int
or
/* This is a
comment example */
PS: Single-line comments are only supported with 6+ versions of Newtonsoft Json.
Additional note for people who can't think out of the box: I use the JSON format for basic settings in an ASP.NET web application I made. I read the file, convert it into the settings object with the Newtonsoft library and use it when necessary.
I prefer writing comments about each individual setting in the JSON file itself, and I really don't care about the integrity of the JSON format as long as the library I use is OK with it.
I think this is an 'easier to use/understand' way than creating a separate 'settings.README' file and explaining the settings in it.
If you have a problem with this kind of usage; sorry, the genie is out of the lamp. People would find other usages for JSON format, and there is nothing you can do about it.
If your text file, which is a JSON string, is going to be read by some program, how difficult would it be to strip out either C or C++ style comments before using it?
Answer: It would be a one liner. If you do that then JSON files could be used as configuration files.
The idea behind JSON is to provide simple data exchange between applications. These are typically web based and the language is JavaScript.
It doesn't really allow for comments as such, however, passing a comment as one of the name/value pairs in the data would certainly work, although that data would obviously need to be ignored or handled specifically by the parsing code.
All that said, it's not the intention that the JSON file should contain comments in the traditional sense. It should just be the data.
Have a look at the JSON website for more detail.
JSON does not support comments natively, but you can make your own decoder or at least preprocessor to strip out comments, that's perfectly fine (as long as you just ignore comments and don't use them to guide how your application should process the JSON data).
JSON does not have comments. A JSON encoder MUST NOT output comments.
A JSON decoder MAY accept and ignore comments.
Comments should never be used to transmit anything meaningful. That is
what JSON is for.
Cf: Douglas Crockford, author of JSON spec.
I just encountered this for configuration files. I don't want to use XML (verbose, graphically ugly, hard to read), or "ini" format (no hierarchy, no real standard, etc.) or Java "Properties" format (like .ini).
JSON can do all they can do, but it is way less verbose and more human readable - and parsers are easy and ubiquitous in many languages. It's just a tree of data. But out-of-band comments are a necessity often to document "default" configurations and the like. Configurations are never to be "full documents", but trees of saved data that can be human readable when needed.
I guess one could use "#": "comment", for "valid" JSON.
It depends on your JSON library. Json.NET supports JavaScript-style comments, /* comment */.
See another Stack Overflow question.
Yes, the new standard, JSON5 allows the C++ style comments, among many other extensions:
// A single line comment.
/* A multi-
line comment. */
The JSON5 Data Interchange Format (JSON5) is a superset of JSON that aims to alleviate some of the limitations of JSON. It is fully backwards compatible, and using it is probably better than writing a custom non-standard parser, turning on non-standard features in an existing one, or using hacks like string fields for commenting. Or, if the parser in use supports it, simply agree to use the subset of JSON5 that is JSON plus C++-style comments. That is much better than tweaking the JSON standard however we see fit.
There are already an npm package, a Python package, a Java package and a C library available. It is backwards compatible. I see no reason to stay with the "official" JSON restrictions.
I think that removing comments from JSON was driven by the same reasons as removing operator overloading in Java: it can be used the wrong way, yet some clearly legitimate use cases were overlooked. For operator overloading, it is matrix algebra and complex numbers. For JSON comments, it is configuration files and other documents that may be written, edited or read by humans and not just by a parser.
JSON makes a lot of sense for config files and other local usage because it's ubiquitous and because it's much simpler than XML.
If people have strong reasons against having comments in JSON when communicating data (whether valid or not), then possibly JSON could be split into two:
JSON-COM: JSON on the wire, or rules that apply when communicating JSON data.
JSON-DOC: JSON document, or JSON in files or locally. Rules that define a valid JSON document.
JSON-DOC will allow comments, and other minor differences might exist such as handling whitespace. Parsers can easily convert from one spec to the other.
With regard to the remark made by Douglas Crockford on this issue (referenced by @Artur Czajka)
Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
We're talking about a generic config file issue (cross language/platform), and he's answering with a JS specific utility!
Sure, a JSON-specific minifier can be implemented in any language,
but this should be standardized so that it becomes ubiquitous across parsers in all languages and platforms. Then people would stop wasting their time: lacking the feature despite having good use cases for it, looking the issue up in online forums, and being told that it's a bad idea or that it's easy to strip comments out of text files.
The other issue is interoperability. Suppose you have a library or API or any kind of subsystem which has some config or data files associated with it. And this subsystem is
to be accessed from different languages. Then do you go about telling people: by the way
don't forget to strip out the comments from the JSON files before passing them to the parser!
If you use JSON5 you can include comments.
JSON5 is a proposed extension to JSON that aims to make it easier for humans to write and maintain by hand. It does this by adding some minimal syntax features directly from ECMAScript 5.
The Dojo Toolkit JavaScript toolkit (at least as of version 1.4), allows you to include comments in your JSON. The comments can be of /* */ format. Dojo Toolkit consumes the JSON via the dojo.xhrGet() call.
Other JavaScript toolkits may work similarly.
This can be helpful when experimenting with alternate data structures (or even data lists) before choosing a final option.
JSON is not a framed protocol. It is a language free format. So a comment's format is not defined for JSON.
As many people have suggested, there are some tricks, for example, duplicate keys or a specific key _comment that you can use. It's up to you.
Disclaimer: This is silly
There is actually a way to add comments and stay within the specification (no additional parser needed). It will not result in human-readable comments without some sort of parsing, though.
You could abuse the following:
Insignificant whitespace is allowed before or after any token.
Whitespace is any sequence of one or more of the following code
points: character tabulation (U+0009), line feed (U+000A), carriage
return (U+000D), and space (U+0020).
In a hacky way, you can abuse this to add a comment. For instance, start and end your comment with a tab, encode the comment in base 3, and use the other whitespace characters to represent the digits. For example:
010212 010202 011000 011000 011010 001012 010122 010121 011021 010202 001012 011022 010212 011020 010202 010202
(That is "hello base three" in ASCII.) But instead of 0 use a space, instead of 1 a line feed, and instead of 2 a carriage return.
This will just leave you with a lot of unreadable whitespace (unless you make an IDE plugin to encode/decode it on the fly).
I never even tried this, for obvious reasons, and neither should you.
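Purely to make the scheme concrete, a rough sketch of it could look like the following. The function names are invented, the six-digit width per character is taken from the example above, and the sketch assumes ASCII-only comments and that the comment is the first tab-delimited run in the text.

```javascript
// Digit-to-whitespace mapping from the answer: 0 = space, 1 = line feed, 2 = carriage return.
const DIGITS = [" ", "\n", "\r"];

// Encode an ASCII comment as whitespace: a tab, six base-3 digits per character, a tab.
function encodeComment(comment) {
  let out = "\t";
  for (const ch of comment) {
    const code = ch.charCodeAt(0);
    if (code > 127) throw new RangeError("this sketch handles ASCII only");
    const trits = code.toString(3).padStart(6, "0");
    for (const t of trits) out += DIGITS[Number(t)];
  }
  return out + "\t";
}

// Decode the first tab-delimited run of whitespace back into text.
function decodeComment(text) {
  const start = text.indexOf("\t");
  const end = text.indexOf("\t", start + 1);
  if (start === -1 || end === -1) return null;
  const body = text.slice(start + 1, end);
  let out = "";
  for (let i = 0; i + 6 <= body.length; i += 6) {
    const trits = [...body.slice(i, i + 6)].map((c) => DIGITS.indexOf(c)).join("");
    out += String.fromCharCode(parseInt(trits, 3));
  }
  return out;
}

const json = '{"key":"value"}';
const annotated = encodeComment("hello base three") + json;
console.log(JSON.parse(annotated).key); // the parser skips the leading whitespace
console.log(decodeComment(annotated));
```

The annotated text still parses as ordinary JSON because everything added is insignificant whitespace in front of the first token.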
You can have comments in JSONP, but not in pure JSON. I've just spent an hour trying to make my program work with this example from Highcharts.
If you follow the link, you will see
?(/* AAPL historical OHLC data from the Google Finance API */
[
/* May 2006 */
[1147651200000,67.79],
[1147737600000,64.98],
...
[1368057600000,456.77],
[1368144000000,452.97]
]);
Since I had a similar file in my local folder, there were no issues with the Same-origin policy, so I decided to use pure JSON… and, of course, $.getJSON was failing silently because of the comments.
Eventually I just sent a manual HTTP request to the address above and realized that the content-type was text/javascript since, well, JSONP returns pure JavaScript. In this case comments are allowed. But my application returned content-type application/json, so I had to remove the comments.
JSON doesn't allow comments, per se. The reasoning is utterly foolish, because you can use JSON itself to create comments. That obviates the reasoning entirely, and it loads the parser's data space for no good reason, with exactly the same result and the same potential issues, such as they are: a JSON file with comments.
If you try to put comments in (using // or /* */ or #, for instance), some parsers will fail, because comments are strictly not part of the JSON specification. So you should never do that.
Here, for instance, is a file in which my image manipulation system has saved image notations along with some basic formatted (comment) information relating to them (at the bottom):
{
"Notations": [
{
"anchorX": 333,
"anchorY": 265,
"areaMode": "Ellipse",
"extentX": 356,
"extentY": 294,
"opacity": 0.5,
"text": "Elliptical area on top",
"textX": 333,
"textY": 265,
"title": "Notation 1"
},
{
"anchorX": 87,
"anchorY": 385,
"areaMode": "Rectangle",
"extentX": 109,
"extentY": 412,
"opacity": 0.5,
"text": "Rect area\non bottom",
"textX": 98,
"textY": 385,
"title": "Notation 2"
},
{
"anchorX": 69,
"anchorY": 104,
"areaMode": "Polygon",
"extentX": 102,
"extentY": 136,
"opacity": 0.5,
"pointList": [
{
"i": 0,
"x": 83,
"y": 104
},
{
"i": 1,
"x": 69,
"y": 136
},
{
"i": 2,
"x": 102,
"y": 132
},
{
"i": 3,
"x": 83,
"y": 104
}
],
"text": "Simple polygon",
"textX": 85,
"textY": 104,
"title": "Notation 3"
}
],
"imageXW": 512,
"imageYW": 512,
"imageName": "lena_std.ato",
"tinyDocs": {
"c01": "JSON image notation data:",
"c02": "-------------------------",
"c03": "",
"c04": "This data contains image notations and related area",
"c05": "selection information that provides a means for an",
"c06": "image gallery to display notations with elliptical,",
"c07": "rectangular, polygonal or freehand area indications",
"c08": "over an image displayed to a gallery visitor.",
"c09": "",
"c10": "X and Y positions are all in image space. The image",
"c11": "resolution is given as imageXW and imageYW, which",
"c12": "you use to scale the notation areas to their proper",
"c13": "locations and sizes for your display of the image,",
"c14": "regardless of scale.",
"c15": "",
"c16": "For Ellipses, anchor is the center of the ellipse,",
"c17": "and the extents are the X and Y radii respectively.",
"c18": "",
"c19": "For Rectangles, the anchor is the top left and the",
"c20": "extents are the bottom right.",
"c21": "",
"c22": "For Freehand and Polygon area modes, the pointList",
"c23": "contains a series of numbered XY points. If the area",
"c24": "is closed, the last point will be the same as the",
"c25": "first, so all you have to be concerned with is drawing",
"c26": "lines between the points in the list. Anchor and extent",
"c27": "are set to the top left and bottom right of the indicated",
"c28": "region, and can be used as a simplistic rectangular",
"c29": "detect for the mouse hover position over these types",
"c30": "of areas.",
"c31": "",
"c32": "The textx and texty positions provide basic positioning",
"c33": "information to help you locate the text information",
"c34": "in a reasonable location associated with the area",
"c35": "indication.",
"c36": "",
"c37": "Opacity is a value between 0 and 1, where .5 represents",
"c38": "a 50% opaque backdrop and 1.0 represents a fully opaque",
"c39": "backdrop. Recommendation is that regions be drawn",
"c40": "only if the user hovers the pointer over the image,",
"c41": "and that the text associated with the regions be drawn",
"c42": "only if the user hovers the pointer over the indicated",
"c43": "region."
}
}
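As a sketch of how a consumer might treat such a file, the snippet below pulls the tinyDocs text out for display and produces a copy of the data without it. The readDocs and withoutDocs helpers are hypothetical, and the input is an abbreviated stand-in for the file above; the sketch assumes the zero-padded c01, c02, ... key naming shown there.

```javascript
// Abbreviated stand-in for the file above; the "tinyDocs" layout is taken from it.
const text = `{
  "imageXW": 512,
  "imageYW": 512,
  "tinyDocs": {
    "c02": "-------------------------",
    "c01": "JSON image notation data:",
    "c03": ""
  }
}`;

const data = JSON.parse(text);

// Pull the documentation out as plain text, ordered by key (c01, c02, ...).
function readDocs(obj) {
  const docs = obj.tinyDocs || {};
  return Object.keys(docs).sort().map((k) => docs[k]).join("\n");
}

// Return a copy without the documentation, for code that only wants the data.
function withoutDocs(obj) {
  const { tinyDocs, ...rest } = obj;
  return rest;
}

console.log(readDocs(data));
console.log(Object.keys(withoutDocs(data)));
```

Sorting by key is what keeps the lines in order even if a writer emits the members in a different sequence, since JSON object members are unordered as far as the format is concerned.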
This is a "can you" question. And here is a "yes" answer.
No, you shouldn't use duplicative object members to stuff side channel data into a JSON encoding. (See "The names within an object SHOULD be unique" in the RFC).
And yes, you could insert comments around the JSON, which you could parse out.
But if you want a way of inserting and extracting arbitrary side-channel data to a valid JSON, here is an answer. We take advantage of the non-unique representation of data in a JSON encoding. This is allowed* in section two of the RFC under "whitespace is allowed before or after any of the six structural characters".
*The RFC only states "whitespace is allowed before or after any of the six structural characters", not explicitly mentioning strings, numbers, "false", "true", and "null". This omission is ignored in ALL implementations.
First, canonicalize your JSON by minifying it:
$jsonMin = json_encode(json_decode($json));
Then encode your comment in binary:
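// Convert one byte at a time: base_convert() loses precision on long input and drops leading zeros.
$bytes = unpack('C*', $comment);
$commentBinary = implode('', array_map(function ($b) { return sprintf('%08b', $b); }, $bytes));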
Then steg your binary:
$steg = str_replace('0', ' ', $commentBinary);
$steg = str_replace('1', "\t", $steg);
Here is your output:
$jsonWithComment = $steg . $jsonMin;
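The extraction side is not shown above, so here is a rough sketch of it in JavaScript, together with a mirror of the encoding steps so the round trip can be tried. The stegEncode and stegDecode names are invented, and the sketch assumes an ASCII-only comment made of eight bits per byte with 0 as a space and 1 as a tab. The decoder left-pads the bit string to a whole number of bytes, in case leading zero bits were lost in a numeric base conversion.

```javascript
// Encode: eight bits per byte, 0 -> space, 1 -> tab (mirrors the PHP steps; ASCII only).
function stegEncode(comment, jsonMin) {
  let steg = "";
  for (const ch of comment) {
    const bits = ch.charCodeAt(0).toString(2).padStart(8, "0");
    steg += bits.replace(/0/g, " ").replace(/1/g, "\t");
  }
  return steg + jsonMin;
}

// Decode: read the leading run of spaces and tabs and turn it back into bytes.
function stegDecode(text) {
  const lead = text.match(/^[ \t]*/)[0];
  let bits = lead.replace(/ /g, "0").replace(/\t/g, "1");
  // Left-pad to a whole number of bytes in case leading zero bits were dropped.
  bits = bits.padStart(Math.ceil(bits.length / 8) * 8, "0");
  let out = "";
  for (let i = 0; i < bits.length; i += 8) {
    out += String.fromCharCode(parseInt(bits.slice(i, i + 8), 2));
  }
  return out;
}

const jsonWithComment = stegEncode("debug: cache miss", '{"a":1}');
console.log(JSON.parse(jsonWithComment).a); // still valid JSON
console.log(stegDecode(jsonWithComment));
```

This only works as long as nothing in between re-serializes or trims the document, since any such step throws the leading whitespace away.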
In my case, I need to use comments for debug purposes just before the output of the JSON. So I put the debug information in the HTTP header, to avoid breaking the client:
header("My-Json-Comment: Yes, I know it's a workaround ;-) ");
We are using strip-json-comments for our project. It supports something like:
/*
* Description
*/
{
// rainbows
"unicorn": /* ❤ */ "cake"
}
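Simply run npm install --save strip-json-comments to install it, and use it like this:
// require() works with releases before version 4; later releases are ESM-only and need import.
var strip_json_comments = require('strip-json-comments');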
var json = '{/*rainbows*/"unicorn":"cake"}';
JSON.parse(strip_json_comments(json));
//=> {unicorn: 'cake'}
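If pulling in a dependency is not an option, the same idea can be sketched by hand. The following is an illustrative scanner, not the library's actual implementation; stripComments is a made-up name, and the sketch only handles // and /* */ comments while leaving string contents alone.

```javascript
// Remove // and /* */ comments from JSON-like text, leaving string contents untouched.
function stripComments(text) {
  let out = "";
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    const next = text[i + 1];
    if (ch === '"') {
      // Copy the whole string literal, honouring backslash escapes.
      let j = i + 1;
      while (j < text.length && text[j] !== '"') {
        j += text[j] === "\\" ? 2 : 1;
      }
      out += text.slice(i, j + 1);
      i = j + 1;
    } else if (ch === "/" && next === "/") {
      // Line comment: skip to the end of the line, keeping the newline itself.
      while (i < text.length && text[i] !== "\n") i++;
    } else if (ch === "/" && next === "*") {
      // Block comment: skip past the closing */ (or to the end if unterminated).
      const end = text.indexOf("*/", i + 2);
      i = end === -1 ? text.length : end + 2;
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

const commented = `/*
 * Description
 */
{
  // rainbows
  "unicorn": /* heart */ "cake",
  "url": "http://example.com/*not a comment*/"
}`;

console.log(JSON.parse(stripComments(commented)));
```

Tracking string literals is the part that a naive regular expression tends to get wrong: the // in a URL or a /* inside a value must survive untouched.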