YAML 1.2 is (with one minor caveat regarding duplicate keys) a superset of JSON, so any valid JSON file is also a valid YAML file. However, the YAML 1.1 specification (which has the most library support) doesn't mention JSON. Most valid JSON files are valid YAML 1.1 files, but I found at least one exception by experimenting with PyYAML and Python's standard JSON library:
a double-precision floating-point overflow such as 12345e999 is interpreted as a string by PyYAML but as IEEE infinity by Python's JSON library.
Does anyone have a complete list of differences, determined more robustly than by testing edge cases in a particular implementation? (That is, from a comparison of the specifications?) For example, I want to generate JSON strings that will be interpreted the same way by a JSON parser and a YAML 1.1 parser: what constraints must I place on my strings?
See here (specifically footnote 25). It says:
The incompatibilities were as follows: JSON allows extended character
sets like UTF-32 and had incompatible unicode character escape syntax
relative to YAML; YAML required a space after separators like comma,
equals, and colon while JSON does not. Some non-standard
implementations of JSON extend the grammar to include Javascript's
/*...*/ comments. Handling such edge cases may require light
pre-processing of the JSON before parsing as in-line YAML
See also https://metacpan.org/pod/JSON::XS#JSON-and-YAML
As you noticed, one thing is what the specifications say and another is what commonly available parsers (both YAML and JSON) actually process. You should therefore take several aspects into account and stick to the least common denominator, so that your JSON can be loaded by a YAML parser.
On the JSON side there are multiple standards and best practices. Originally a JSON text had to have an object or array at the topmost level. This is still so according to the fail1.json file available on the json.org site:
"A JSON payload should be an object or array, not a string."
According to RFC 7159, any value can be at the top level (although a bare scalar at the top level makes for a rather boring JSON file):
A JSON text is a serialized value. Note that certain previous
specifications of JSON constrained a JSON text to be an object or an
array. Implementations that generate only objects or arrays where a
JSON text is called for will be interoperable in the sense that all
implementations will accept these as conforming JSON texts.
Because of the problems with JSON hijacking (by redefining the array handling in older browsers) there have been implementations that only accept an object at the top level (i.e. the first character of the file has to be {).
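As a quick illustration of the RFC 7159 behaviour, here is a sketch using Python's standard json module (just one example of a modern parser, with PyYAML shown for comparison):

import json
import yaml  # PyYAML

# RFC 7159 allows any value at the top level, not just an object or array.
print(json.loads('123'))        # 123
print(json.loads('"hello"'))    # hello
print(yaml.safe_load('123'))    # 123 -- YAML also accepts a bare scalar document

# Validators following the older json.org rules would reject everything
# except an object or array as a complete JSON text.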
On the YAML side there are fewer competing standards than with JSON, but things get muddled by the persistent use of YAML 1.1. This is not helped by the fact that if you google for "yaml current spec", the first hit is yaml.org/spec/current.html, which is actually an old working draft for YAML 1.1.
Apart from the UTF-32 support the other answer mentioned, which is largely a non-issue in a world using UTF-8 almost exclusively, there are a few things to take into account, especially if you want PyYAML to be able to parse your JSON (PyYAML still implements only most of YAML 1.1, close to eight years after the YAML 1.2 spec release):
numbers in JSON don't need a dot in the mantissa, even if such a number has an exponent:
but the Floating-Point Language-Independent Type for YAML™ Version 1.1 does require that dot:
|[-]?0\.([0-9]*[1-9])?e[-+](0|[1-9][0-9]+) (scientific)
       ^--- no ? or * associated with this dot
(In the YAML 1.2 spec this regex has changed to:
-? [1-9] ( \. [0-9]* [1-9] )? ( e [-+] [1-9] [0-9]* )?
allowing the dot to disappear even when there is an e (but no E) and an exponent.)
This is why your 12345e999 is handled differently by JSON (overflow to infinity) and PyYAML (string). In YAML 1.1 it can only be interpreted as a string, and hence doesn't need quotes and can be a plain scalar.
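A small sketch of that divergence, assuming PyYAML (which implements YAML 1.1 tag resolution) and Python's standard json module:

import json
import yaml  # PyYAML

text = '{"value": 12345e999}'

print(json.loads(text))      # {'value': inf} -- the literal overflows to IEEE infinity
print(yaml.safe_load(text))  # {'value': '12345e999'} -- no dot, so YAML 1.1 resolves a string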
In YAML 1.1 there are escape sequences, but these are not a superset of what JSON supports. The forward slash (/) can be escaped in JSON, but not in YAML 1.1 (it can in YAML 1.2, rule 53).
In JSON as well as in YAML 1.1 you can use \uNNNN to indicate a 16 bit unicode code point. Although the YAML 1.1 spec (and YAML 1.2) mentions surrogate pairs in conjunction with using UTF-16, nothing is said about such pairs as escape sequences ("\uD834\uDD1E"). This string sequence is explicitly mentioned in RFC 7159 as representing the G clef character (U+1D11E). I don't know of any YAML parser that supports this; PyYAML throws a:
yaml.reader.ReaderError: unacceptable character #xd834: special characters are not allowed
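For reference, this is how the JSON side behaves (Python's json module shown as just one example; the YAML behaviour varies by parser as described above):

import json

# RFC 7159: a surrogate pair escape represents a single code point.
s = json.loads('"\\uD834\\uDD1E"')
print(s)       # 𝄞 (U+1D11E, MUSICAL SYMBOL G CLEF)
print(len(s))  # 1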
So as long as you write your JSON
as UTF-8
with the top-level being an object
scientific numbers always with a dot
no \/ escape sequence
no \uNNNN escapes for code points between \uD7FF and \uE000 (exclusive), nor \uFFFE or \uFFFF
you should be fine for both JSON and YAML (1.1) parsers.
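A rough way to check those constraints in practice, assuming PyYAML and Python's standard json module (a sketch, not an exhaustive test):

import json
import yaml  # PyYAML, i.e. a YAML 1.1 parser

data = {"name": "example", "ratio": 1.5e10}

# json.dumps emits a top-level object, a space after ':' and ',',
# floats with a dot, and (with ensure_ascii=False) raw UTF-8 instead
# of \uNNNN escapes.
text = json.dumps(data, ensure_ascii=False)

assert json.loads(text) == yaml.safe_load(text)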
¹ In ruamel.yaml, a YAML 1.2 parser of which I am the author, the \/ escape and scientific numbers without a dot are handled correctly: your 12345e999 loads as type float and prints as inf.
I've got a project that gets metadata from Minecraft mods, and I'm having some trouble with Minecraft Forge's old mcmod.info format, which (for those who don't know) is a JSON format read with GSON.
Specifically, GSON unfortunately allows for strings to be multi-line (it allows for unescaped newlines in a string) - which Go's encoding/json doesn't allow for. See the below example from the Chisel mod to see what I mean.
[{
"credits": "AUTOMATIC_MAIDEN for the original mod,
asie for porting to 1.7.2,
and Pokenfenn/Cricket for continuing it in 1.7.
This mod uses textures from the Painterly Pack: http://painterlypack.net/."
}]
This results in an error of invalid character '\n' in string literal.
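For illustration, the same strictness shows up in Python's standard json module (used here only as a readily available strict parser), which also has a lenient mode reminiscent of GSON's behaviour:

import json

text = '["line one\nline two"]'  # raw newline inside a string, as in mcmod.info

try:
    json.loads(text)                   # strict parsing rejects raw control characters
except json.JSONDecodeError as e:
    print(e)                           # Invalid control character at ...

print(json.loads(text, strict=False))  # ['line one\nline two'] -- lenient parsing accepts it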
I did take a brief look at using an alternative JSON parser (the aptly-named jsonparser specifically caught my eye), but without testing them all I've been unable to determine which, if any, supports what I need.
I suspect the solution to this problem will be in using an alternative JSON parser, I'm just not aware enough of the available libraries or JSON's use in Golang to make a highly informed decision.
I'm confused about what free-form JSON means. Does it mean you can put any type as a value in {}?
I thought all JSON is like that (i.e. {'hi':123, 'abc':{'d':[1,2,3],'e':'yay'}}).
So what does free-form JSON mean? Is it a redundant term? Is there such a thing as non-free-form JSON?
What you have just posted looks like a JavaScript object, but it is not valid JSON. JSON has some very strict rules, such as all strings must be in double quotes. See JSON.org
There's no such thing as "free-form"/"non-free-form" JSON as every valid JSON-string must be formatted according to the specific rules mentioned on JSON.org.
It's possible that "free-form" JSON is JSON that has been generated "by hand" meaning that you typed it out manually (error prone). Most languages either have built-in methods, or libraries that are available, that turn native data structures (like multi-dimensional arrays) into valid JSON.
PHP, for instance, has a native method called json_encode.
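Python's standard library equivalent is json.dumps; a minimal sketch:

import json

data = {"hi": 123, "abc": {"d": [1, 2, 3], "e": "yay"}}

# json.dumps produces strictly valid JSON: keys and strings are double-quoted.
print(json.dumps(data))
# {"hi": 123, "abc": {"d": [1, 2, 3], "e": "yay"}}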
I am unable to find an exact reference to this error, but YAML 1.2 says it's a JSON superset, yet if I use tab characters in a JSON document the parser treats them as an error.
e.g.
"root": {
"key": "value"
}
(Online validation here says that '\t' cannot start any token.)
I know why YAML historically disallows tabs, but how can I interpret this in the context of JSON-superset?
(E.g. is YAML not an actual superset, or does JSON also disallow tabs? Or does the spec allow tabs in this case but the implementation is not there yet?)
Thanks.
Tabs ARE allowed in YAML, but only where indentation does not apply.
According to YAML 1.2 Section 5.5:
YAML recognizes two white space characters: space and tab.
The following examples will use · to denote spaces and → to denote tabs. All examples can be validated using the official YAML Reference Parser.
YAML has a block style and flow style. In block style, indentation determines the structure of a document. The following document uses block style.
root:
··key: value
In flow style, special characters indicate the structure of the document. The following equivalent document uses flow style.
{
→ root: {
→ → key: value
→ }
}
You can even mix indentation in flow style.
{
→ root: {
··→ key: value
····}
}
If you're mixing block and flow style, the entire flow style part must respect the block style indentation.
root:
··{
····key: value
··}
But you can still mix your indentation within the flow style part.
root:
··{
··→ key: value
··}
If you have a single value document, you can surround the value with all manner of whitespace.
→ ··value··→
The point is, every JSON document parsed as YAML ends up in flow style (because of the initial { or [ character), which supports tabs, unless it is a single-value JSON document, in which case YAML still allows padding with whitespace.
If a YAML parser throws because of tabs in a JSON document, then it is not a valid parser.
That being said, your example is failing because a block style mapping value must always be indented if it's not on the same line as the mapping name.
root: {
··key: value
}
is not valid, however
root:
··{
····key: value
··}
is valid, and
root: { key: value }
is also valid.
I know why YAML historically disallows tabs, but how can I interpret this in the context of JSON-superset?
Taking the rest of the specifications into account, we can only conclude that the "superset" comment is inaccurate. The YAML specification is fundamentally inconsistent in the Relation to JSON section:
YAML can therefore be viewed as a natural superset of JSON, offering
improved human readability and a more complete information model. This
is also the case in practice; every JSON file is also a valid YAML
file. This makes it easy to migrate from JSON to YAML if/when the
additional features are required.
JSON's RFC4627 requires that mappings keys merely “SHOULD” be unique,
while YAML insists they “MUST” be. Technically, YAML therefore
complies with the JSON spec, choosing to treat duplicates as an error.
In practice, since JSON is silent on the semantics of such duplicates,
the only portable JSON files are those with unique keys, which are
therefore valid YAML files.
Despite asserting YAML as a "natural superset of JSON" and stating that "every JSON file is also a valid YAML file", the spec immediately notes some differences regarding key uniqueness. Arguably, the spec should note the differences around using tabs for indentation here as well.
Speaking of which, as the validator implied, YAML explicitly prohibits tabs as indentation characters:
To maintain portability, tab characters must not be used in
indentation, since different systems treat tabs differently. Note that
most modern editors may be configured so that pressing the tab key
results in the insertion of an appropriate number of spaces.
This is, of course, stricter than the JSON specification, which simply states:
Whitespace can be inserted between any pair of tokens.
So, to directly answer your questions...
(E.g. is YAML not an actual superset, or does JSON also disallow tabs? Or does the spec allow tabs in this case but the implementation is not there yet?)
...YAML is not actually a superset, JSON does not disallow tabs, whereas the YAML specification does indeed disallow tabs explicitly.
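A quick illustration of the JSON side, using Python's json module as one conforming parser among many:

import json

# A tab used purely as whitespace between tokens is fine in JSON.
print(json.loads('{\n\t"key": "value"\n}'))  # {'key': 'value'}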
According to the specification, tabs were never allowed as indentation. So when JSON is used inside YAML, tabs are not allowed there either.
The problem occurs when we think of JSON as a pure subset of YAML. It is not: according to the Relation to JSON section in the specification, there are some small things that keep JSON from being a pure subset of YAML.
If we were to address those dissimilarities, what we would need is something like YSON, which is also mentioned in the spec.
But fortunately there are some YAML engines that support tabs as indentation; SnakeYAML is one example.
Is there a standard or specification which defines json file extensions?
I've seen .json used - is this just a commonly accepted practice or is it a requirement of some standards body for json saved in file format?
According to Douglas Crockford's draft of the JSON format found here:
"A JSON parser transforms a JSON text
into another representation. A
JSON parser MUST accept all texts that
conform to the JSON grammar. A JSON
parser MAY accept non-JSON forms or
extensions."
So, it's just a commonly-accepted practice; as long as your file conforms to the JSON grammar the extension doesn't necessarily need to be *.json (although it can certainly be helpful to you and other developers if it is).
{ members: [
[
{
c1: [{fft: 5,v: 'asdead#asdas.com'}],
c2: [{fft: 9,v: 'tst'}],
c3: [{sft: 1,v: 'Corporate Member'}]},
{
c1: [{fft: 5,v: 'asdk#asda.com'}],
c2: [{fft: 9,v: 'asd'}],
c3: [{sft: 1,v: 'Company'}]}
...etc
What is this JSON format? The full version is here.
It just doesn't look like any other JSON I've seen. I would be very thankful for a pointer in the right direction for parsing it, so long as the answer isn't just "regex it", which I'm sure is possible but not something I can accomplish.
This appears to be the result of an ASP.NET web service, based on the .asmx in the URL. What looks non-standard to me (based on the http://www.json.org/ definition) is the lack of double-quotes around the keys, and single-quotes instead of double-quotes wrapping the string values. E.g. v: 'asdk#asda.com' should be "v": "asdk#asda.com". I believe this is object literal notation of JavaScript (http://www.dyn-web.com/tutorials/obj_lit.php) rather than strict JSON, which is itself a subset of object literal notation.
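To make the difference concrete, here is a small sketch using Python's json module (the data is shortened from the question):

import json

literal = "{c1: [{fft: 5, v: 'tst'}]}"       # object-literal style, as in the question
strict = '{"c1": [{"fft": 5, "v": "tst"}]}'  # the same data as strict JSON

print(json.loads(strict))  # {'c1': [{'fft': 5, 'v': 'tst'}]}

try:
    json.loads(literal)
except json.JSONDecodeError as e:
    print(e)               # Expecting property name enclosed in double quotes ...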
How you choose to parse it could depend on what language/platform constraints you have, but I believe JavaScript will handle it. For an example, see this JSON/JavaScript code on Google Code Playground: http://code.google.com/apis/ajax/playground/#json_data_table. It constructs a JSON object using object literal notation for its visualization service.
Judging by this question and its followup on the Wild Apricot forums, you're poking at an undocumented tool primarily intended for internal use. Your best bet is to leave it alone. Your second-best bet is to hack at an existing parser in whatever language you are handling this with so that the parser tolerates unquoted keys.
You would probably be best off using a standard JSON library to parse it. A full list, organized by platform, is available at the json.org site.
That's not JSON. It actually looks like a Lua source-code encoding of the data. But if it is undocumented, it could be anything, so you're probably not going to be able to handle it reliably.