What is the format of this data? - json

I'm sorry if this is really trivial, but I've got a series of data as follows:
{"color":{"red":255,"blue":123,"green":1}}
I know it's in this format because, for some reason, it's easy to work with. What is this format called so that I might look it up?
Edit: If there is any significance to the organization of the data, of course.

That's JSON (JavaScript Object Notation), a serialised text data format based on a subset of JavaScript. To learn more about JSON, visit: http://json.org
In JSON, there are the following data types:
object
array
number
string
null
boolean
Objects are represented using the following syntax, and are key-value pairings, similar to a dictionary (the key must be a string):
{ "number": 1, "string": "test" }
Like dictionaries, objects are unordered.
An array is an ordered, heterogeneous data structure, represented using the following syntax:
[0, true, false, "1", null]
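As a quick check of the types and syntax above, here is a minimal sketch that uses Python's standard json module. The module is only a convenient conforming parser and is not mentioned in the answer. It maps object to dict, array to list, number to int/float, string to str, true/false to bool, and null to None:

```python
import json

# One document that uses all six JSON types.
text = ('{"object": {"number": 1, "string": "test"}, '
        '"array": [0, true, false, "1", null], '
        '"number": 1.5, "string": "test", "boolean": true, "null": null}')
data = json.loads(text)

for key, value in data.items():
    print(f"{key:8} -> {type(value).__name__}")

# Key order carries no meaning: these two objects parse to equal dicts.
same_members = json.loads('{"a": 1, "b": 2}') == json.loads('{"b": 2, "a": 1}')
print(same_members)  # True
```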
Numbers are what you'd expect. However, unlike JavaScript itself, they cannot be Infinity or NaN (i.e. they must be finite decimals or integers), and the integer part cannot have leading 0s (0.5 is fine, 05 is not). Exponents are represented using the following format (the e is not case sensitive):
10e6
where 10 is the significand and 6 is the power of ten (10 × 10^6), so this is equivalent to 10000000 (10 million). The exponent section may have leading 0s, though there is not much point, and doing so may lower compatibility with parsers which are not 100% compliant.
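The number rules above can be checked with Python's json module, used here as an example parser. One caveat is that it accepts NaN and Infinity by default as a non-standard extension. The sketch passes a hypothetical `reject_constant` callback through the real `parse_constant` parameter so that it behaves strictly:

```python
import json

def reject_constant(name: str):
    # Python's json module accepts NaN/Infinity by default as an extension;
    # raising here makes it behave like a strict JSON parser.
    raise ValueError(f"non-standard constant: {name}")

def is_strict_json(text: str) -> bool:
    try:
        json.loads(text, parse_constant=reject_constant)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False

big = json.loads("10e6")     # 10 x 10**6
same = json.loads("10E6")    # the e is not case sensitive
padded = json.loads("1e06")  # leading 0s in the exponent are accepted
print(big, same, padded)     # 10000000.0 10000000.0 1000000.0

print(is_strict_json("0.5"))  # True
print(is_strict_json("05"))   # False: leading zero in the integer part
print(is_strict_json("NaN"))  # False: not a JSON number
```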
Booleans are case sensitive and are both lowercase. In JSON, there are only two booleans:
true
false
To represent an intentionally left out or otherwise unknown field, use null (case sensitive too).
Strings must be delimited using double quotes (single quotes are invalid syntax), and single quotes need not be escaped.
"This string is valid, and it's alright to do this."
'No, this won't work'
'Nor will this.'
There are numerous escapes available using the backslash character - to use a literal backslash, use \\.
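The string rules can also be illustrated with Python's json module (again used only as an example parser). Double-quoted text with an unescaped apostrophe parses fine, a single-quoted string is rejected, and backslash escapes decode as expected:

```python
import json

valid = '"This string is valid, and it\'s alright to do this."'
invalid = "'No, this won't work'"

print(json.loads(valid))  # This string is valid, and it's alright to do this.

try:
    json.loads(invalid)
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False
print(single_quotes_ok)  # False

# Backslash escapes: \\ is one literal backslash, \n a newline, \u00e9 is "é".
path = json.loads(r'"C:\\temp\\new"')
print(path)              # C:\temp\new
print(json.dumps(path))  # "C:\\temp\\new"
print(json.loads(r'"caf\u00e9\nline 2"'))
```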
As JSON is a data transmission format, there is no syntax for comments available.

Related

ConvertFrom-Json converting lowercase e's into capital case (sometimes)

I'm processing JSON files in PowerShell, and it seems that ConvertFrom-Json changes case on its inputs only on some (rare) occasions.
For example, when I do:
$JsonStringSrc = '{"x":2.2737367544323206e-13,"y":1759,"z":33000,"width":664}'
$JsonStringTarget = $JsonStringSrc | ConvertFrom-Json | ConvertTo-Json -Depth 100 -Compress
$JsonStringTarget
It returns:
{"x":2.2737367544323206E-13,"y":1759,"z":33000,"width":664}
The lowercase e became an uppercase E, messing up my hashes when validating proper I/O during processing.
Is this expected behavior (perhaps a regional setting)? Is there a setting for ConvertFrom-Json to leave my inputs alone for the output?
The problem lies in the way PowerShell's JSON library outputs CLR floating-point numbers. By converting from JSON, you turn the JSON string into a CLR/PowerShell object with associated types for numbers, strings and so on. Converting back to JSON serializes that object again, but uses the .NET default formatter configuration to do so. There is no metadata from the original JSON document to aid the conversion. Rounding errors, truncation, and a different element order may also occur here.
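The effect is not specific to PowerShell. As an illustration, this sketch uses Python's json module, not the PowerShell cmdlets from the question. A parse/serialize round trip keeps the values but not the original spelling of the numbers:

```python
import json

original = '{"x":1E3,"y":1759,"z":2.5e-13}'
roundtrip = json.dumps(json.loads(original), separators=(",", ":"))

print(roundtrip)                                        # {"x":1000.0,"y":1759,"z":2.5e-13}
print(roundtrip == original)                            # False: the text changed
print(json.loads(roundtrip) == json.loads(original))    # True: the values did not
```

A hash of the raw text therefore only matches when both sides use the same serializer. Comparing parsed values, or hashing a canonical serialization, avoids the problem.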
The Canonical JSON specification, which defines the form you want to use when hashing, says:
MUST represent all non-integer numbers in exponential notation
including a nonzero single-digit significant integer part, and
including a nonempty significant fractional part, and
including no trailing zeroes in the significant fractional part (other than as part of a “.0” required to satisfy the preceding point), and
including a capital “E”, and
including no plus sign in the exponent, and
including no insignificant leading zeroes in the exponent
Source: https://gibson042.github.io/canonicaljson-spec/
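The non-integer number rules listed above can be sketched in Python as follows. This is a minimal illustration that covers only those rules, not the full Canonical JSON spec. The function name `canonical_non_integer` is hypothetical, and the code assumes that `repr()` gives the shortest digit string that round-trips the float:

```python
import math
from decimal import Decimal

def canonical_non_integer(x: float) -> str:
    """Format a finite, non-integer float per the rules listed above (sketch)."""
    if not math.isfinite(x) or x.is_integer():
        raise ValueError("only finite, non-integer numbers are handled here")
    # repr() gives the shortest round-tripping digits; normalize() strips any
    # trailing zeros from the significant digits.
    sign, digits, exponent = Decimal(repr(x)).normalize().as_tuple()
    first, rest = str(digits[0]), "".join(map(str, digits[1:]))
    # Put the decimal point after the first digit and adjust the exponent:
    # capital E, no plus sign, no leading zeros in the exponent.
    sci_exponent = exponent + len(digits) - 1
    return f"{'-' if sign else ''}{first}.{rest or '0'}E{sci_exponent}"

print(canonical_non_integer(2.2737367544323206e-13))  # 2.2737367544323206E-13
print(canonical_non_integer(0.5))                     # 5.0E-1
```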
The JSON grammar itself, however, supports both options (e and E):
exponent
""
'E' sign digits
'e' sign digits
Source: https://www.crockford.com/mckeeman.html
You may be able to convert the object to JSON using the Newtonsoft.Json classes directly and passing in a custom converter.
https://stackoverflow.com/a/28743082/736079
A better solution would probably be to use a specialized formatter component that directly manipulates the existing JSON document without converting it to CLR objects first.

Are both comma and colon redundant for JSON parser?

The JSON I mention below is valid JSON.
I have written a JSON parser that allows only two basic data types, String and Object. To avoid any ambiguity, here is what the parser does:
parse('{ "Mon": "weekday", "Tue": "weekday", "Sun": "weekend" }').get("Sun"); // return value: "weekend"
parse('{ "weekday" : { "Mon": "1", "Tue": "2"} }').get("weekday").get("Mon"); // return value: "1"
The parse function returns a dictionary from which we can get what we want.
I found that I didn't use any commas or colons to parse JSON, so I guess those notations may also be redundant for a JSON parser that supports all data types. Is that true? If it is, they are only there for readability, right?
PS: what if it's invalid JSON? Same answer?
According to RFC 8259 (The JavaScript Object Notation (JSON) Data Interchange Format), the colon and comma are listed as name-separator and value-separator respectively.
See under section 2. JSON Grammar:
These are the six structural characters:
begin-array = ws %x5B ws ; [ left square bracket
begin-object = ws %x7B ws ; { left curly bracket
end-array = ws %x5D ws ; ] right square bracket
end-object = ws %x7D ws ; } right curly bracket
name-separator = ws %x3A ws ; : colon
value-separator = ws %x2C ws ; , comma
So, they are both valid JSON separators with specific uses.
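To see what this means in practice, here is a small sketch that uses Python's json module as an example of a parser that enforces the grammar. The helper name `accepts` is hypothetical. Removing either separator makes the text invalid:

```python
import json

def accepts(text: str) -> bool:
    """Return True if Python's json parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(accepts('{"Mon": "weekday", "Tue": "weekday"}'))  # True
print(accepts('{"Mon" "weekday" "Tue" "weekday"}'))     # False: no colons or commas
print(accepts('{"Mon": "weekday" "Tue": "weekday"}'))   # False: missing comma
```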
Refer section 9. Parsers:
A JSON parser transforms a JSON text into another representation. A
JSON parser MUST accept all texts that conform to the JSON grammar.
A JSON parser MAY accept non-JSON forms or extensions.
An implementation may set limits on the size of texts that it
accepts. An implementation may set limits on the maximum depth of
nesting. An implementation may set limits on the range and precision
of numbers. An implementation may set limits on the length and
character contents of strings.
The Parsers section makes no mention of skipping (ignoring) the colon and/or comma, because a parser that did so would no longer be following the JSON grammar.
In summary, from the above sections it is safe to say that any decision to ignore parts of the JSON grammar is the implementation's own choice, and such a parser does not conform to the grammar.
So the colon and comma are not redundant; they are an essential part of the JSON grammar.
Hope that helps!
JSON is a subset of JavaScript syntax. It's a very small subset, so not all of the punctuation is strictly necessary. But the punctuation is necessary in full expression syntax, because in many cases you cannot know where one expression in a list ends and the next one starts unless there is a comma between them.
(There are alternatives to commas, of course. Lisp S-expressions don't need commas, as Ira Baxter points out, but they use more parentheses, which many people find noisier than commas.)
So as long as you consider it important to be able to insert JSON into a JavaScript text, you need to keep the JavaScript form, commas and colons and all.
One important aspect of JSON is that correct JSON is safe. You cannot insert untested JSON into an executable string, of course; that would be insane. But a JSON parser should validate its input, and validated JSON is safe to inject into code. If your parser lets you leave out the commas, that would no longer be the case.
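Here is a small Python illustration of why commas matter in a general expression syntax (JavaScript evaluates `[1 -2]` the same way). Without the comma, two list items merge into one expression, whereas a JSON parser (Python's json module here) rejects the text instead of guessing:

```python
import json

with_comma = [1, -2]
without_comma = [1 -2]  # no comma: read as the single expression 1 - 2

print(with_comma)     # [1, -2]
print(without_comma)  # [-1]

try:
    json.loads("[1 -2]")
    json_accepted = True
except json.JSONDecodeError:
    json_accepted = False
print(json_accepted)  # False
```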

Can JSON start with "["?

From what I can read on json.org, all JSON strings should start with { (curly brace), and [ characters (square brackets) represent an array element in JSON.
I use the json4j library, and I got an input that starts with [, so I didn't think this was valid JSON. I looked briefly at the JSON schema, but I couldn't really find it stated that a JSON file cannot start with [, or that it can only start with {.
JSON can be either an array or an object. Specifically off of json.org:
JSON is built on two structures:
A collection of name/value pairs. In various languages, this is
realized as an object, record,
struct, dictionary, hash table,
keyed list, or associative array.
An ordered list of values. In most languages, this is realized as an
array, vector, list, or sequence.
It then goes on to describe the two structures in detail.
Note that an object starts and ends with curly brackets, and an array starts and ends with square brackets.
Edit
And from here: http://www.ietf.org/rfc/rfc4627.txt
A JSON text is a sequence of tokens.
The set of tokens includes six
structural characters, strings,
numbers, and three literal names.
A JSON text is a serialized object or array.
Update (2014)
As of March 2014, there is a new JSON RFC (7159) that modifies the definition slightly (see pages 4/5).
The definition per RFC 4627 was: JSON-text = object / array
This has been changed in RFC 7159 to: JSON-text = ws value ws
Where ws represents whitespace and value is defined as follows:
A JSON value MUST be an object, array, number, or string, or one of
the following three literal names:
false null true
So, the answer to the question is still yes, JSON text can start with a square bracket (i.e. an array). But in addition to objects and arrays, it can now also be a number, string or the values false, null or true.
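With a parser that follows RFC 7159 or its successor RFC 8259, any single value can be a complete JSON text. Python's json module is used below as an example of such a parser; it is not mentioned in the answer:

```python
import json

samples = ['{"a": 1}', '[1, 2]', '10', '"hello"', 'true', 'null']
parsed = {text: json.loads(text) for text in samples}

for text, value in parsed.items():
    print(f"{text:10} -> {value!r}")
```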
Also, this has changed from my previous RFC 4627 quote (emphasis added):
A JSON text is a sequence of tokens. The set of tokens includes six
structural characters, strings, numbers, and three literal names.
A JSON text is a serialized value. Note that certain previous
specifications of JSON constrained a JSON text to be an object or an
array. Implementations that generate only objects or arrays where a
JSON text is called for will be interoperable in the sense that all
implementations will accept these as conforming JSON texts.
If the string you are parsing begins with a left square bracket ([), you can use JSONArray.parse to get back a JSONArray object, and then you can use get(i), where i is an index from 0 through the returned JSONArray's size()-1.
import java.io.IOException;
import com.ibm.json.java.JSONArray;
import com.ibm.json.java.JSONObject;
public class BookListTest {
    public static void main(String[] args) {
        String jsonBookList = "{\"book_list\":{\"book\":[{\"title\":\"title 1\"},{\"title\":\"title 2\"}]}}";
        Object book_list;
        try {
            // The outer text starts with '{', so JSONObject.parse is the right call.
            book_list = JSONObject.parse(jsonBookList);
            System.out.println(book_list);
            Object bookList = JSONObject.parse(book_list.toString()).get("book_list");
            System.out.println(bookList);
            Object books = JSONObject.parse(bookList.toString()).get("book");
            System.out.println(books);
            // The "book" value starts with '[', so it must be parsed as an array.
            JSONArray bookArray = JSONArray.parse(books.toString());
            for (Object book : bookArray) {
                System.out.println(book);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Which produced output like:
{"book_list":{"book":[{"title":"title 1"},{"title":"title 2"}]}}
{"book":[{"title":"title 1"},{"title":"title 2"}]}
[{"title":"title 1"}, {"title":"title 2"}]
{"title":"title 1"}
{"title":"title 2"}
Note: if you attempted to call JSONObject.parse(books.toString()); you would get the error you encountered:
java.io.IOException: Expecting '{' on line 1, column 2 instead, obtained token: 'Token: ['
JSON.ORG WEBSITE SAYS ....
https://www.json.org/
The site clearly states the following:
JSON is built on two structures:
A collection of name/value pairs. In various languages, this is
realized as an object, record, struct, dictionary, hash table, keyed
list, or associative array.
An ordered list of values. In most languages, this is realized as
an array, vector, list, or sequence.
These are universal data structures. Virtually all modern programming languages support them in one form or another. It makes sense that a data format that is interchangeable with programming languages also be based on these structures.
In JSON, they take on these forms:
OBJECT:
An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).
{string: value, string: value}
ARRAY:
An array is an ordered collection of values. An array begins with [ (left bracket) and ends with ] (right bracket). Values are separated by , (comma).
[value, value, value ….]
VALUE:
A value can be a string in double quotes, or a number, or true or false or null, or an object or an array. These structures can be nested.
STRING:
A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes. A character is represented as a single character string. A string is very much like a C or Java string.
NUMBER:
A number is very much like a C or Java number, except that the octal and hexadecimal formats are not used.
ABOUT WHITESPACE:
Whitespace can be inserted between any pair of tokens. Excepting a few encoding details, that completely describes the language.
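The whitespace rule and the nesting of objects and arrays can be checked with Python's json module (an example parser, not part of the quoted site). A compact and a spaced-out version of the same document parse to equal values:

```python
import json

compact = '{"a":[1,2],"b":{"c":null}}'
spaced = ' {\n  "a" : [ 1 , 2 ] ,\n  "b" : { "c" : null }\n} '

print(json.loads(compact) == json.loads(spaced))  # True
```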
Short answer is YES
In a .json file you can put Numbers (even just 10), Strings (even just "hello"), Booleans (true, false), Null (even just null), arrays and objects.
https://www.json.org/json-en.html
Using just a number, string, boolean or null on its own is rarely useful, because .json files usually hold more complex structured data such as arrays and objects (mostly nested mixtures of both).
Below is sample JSON data that is an array of objects, and so starts with "[":
https://jsonplaceholder.typicode.com/posts

mochijson2 or mochijson

I'm encoding some data using mochijson2.
But I found that it behaves strangely on strings given as lists.
Example:
mochijson2:encode("foo").
[91,"102",44,"111",44,"111",93]
Where "102", "111", "111" are $f, $o, $o encoded as strings
44 are commas, and 91 and 93 are square brackets.
Of course, if I output this somewhere I'll get the string "[102,111,111]", which is obviously not what I want.
If I try
mochijson2:encode(<<"foo">>).
[34,<<"foo">>,34]
So again I get a list of two double quotes with a binary in between, which can be converted to a binary with list_to_binary/1.
Here is the question: why is it so inconsistent? I understand that there is a problem distinguishing an Erlang list that should be encoded as a JSON array from an Erlang string that should be encoded as a JSON string, but can it at least output a binary when I pass it a binary?
And the second question:
It looks like mochijson outputs everything nicely (because it uses a special tuple, {array, ...}, to designate arrays):
mochijson:encode(<<"foo">>).
"\"foo\""
What's the difference between mochijson2 and mochijson? Performance? Unicode handling? Anything else?
Thanks
My guess is that the design decision in mochijson2 is to treat a binary as a string and a list of integers as a list of integers. (Un?)fortunately, strings in Erlang are in fact lists of integers.
As a result, your "foo", or in other words your [102,111,111], is translated into text representing "[102,111,111]". In the second case, your <<"foo">> string becomes "foo".
Regarding the second question, mochijson seems to always return a string, whereas mochijson2 returns an iodata type. Iodata is basically a recursive list of strings, binaries and iodatas (in fact iolists). If you only intend to send the result "through the wire", it is more efficient to just nest them in a list than convert them to a flat string.
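The iodata idea can be modelled in Python as a rough sketch. This is only an analogy. The function `iolist_to_bytes` is hypothetical and plays the role of Erlang's iolist_to_binary/1, and improper lists (a binary in tail position) are not modelled. Flattening the two mochijson2 outputs from the question shows why the first one yields "[102,111,111]":

```python
def iolist_to_bytes(io) -> bytes:
    """Flatten a simplified Erlang-style iolist into one bytes object.

    Model: ints 0..255 are bytes, bytes objects are binaries, and Python lists
    are (possibly nested) lists. An Erlang string such as "102" is a list of
    integers, so it is modelled as list(b"102").
    """
    out = bytearray()

    def walk(item):
        if isinstance(item, int):
            if not 0 <= item <= 255:
                raise ValueError(f"not a byte: {item}")
            out.append(item)
        elif isinstance(item, (bytes, bytearray)):
            out.extend(item)
        elif isinstance(item, list):
            for sub in item:
                walk(sub)
        else:
            raise TypeError(f"not iodata: {item!r}")

    walk(io)
    return bytes(out)

# mochijson2:encode("foo") -> [91,"102",44,"111",44,"111",93]
from_list = [91, list(b"102"), 44, list(b"111"), 44, list(b"111"), 93]
# mochijson2:encode(<<"foo">>) -> [34,<<"foo">>,34]
from_binary = [34, b"foo", 34]

print(iolist_to_bytes(from_list))    # b'[102,111,111]'
print(iolist_to_bytes(from_binary))  # b'"foo"'
```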

Do the JSON keys have to be surrounded by quotes?

Example:
Is the following code valid against the JSON Spec?
{
precision: "zip"
}
Or should I always use the following syntax? (And if so, why?)
{
"precision": "zip"
}
I haven't found anything about this in the JSON specifications, although they use quotes around the keys in their examples.
Yes, you need quotation marks. This keeps things simple and avoids needing another escape method for JavaScript reserved keywords, i.e. {for:"foo"}.
You are correct to use strings as the key. Here is an excerpt from RFC 4627 - The application/json Media Type for JavaScript Object Notation (JSON)
2.2. Objects
An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members). A name is a
string. A single colon comes after each name, separating the name
from the value. A single comma separates a value from a following
name. The names within an object SHOULD be unique.
object = begin-object [ member *( value-separator member ) ] end-object
member = string name-separator value
[...]
2.5. Strings
The representation of strings is similar to conventions used in the C
family of programming languages. A string begins and ends with
quotation marks. [...]
string = quotation-mark *char quotation-mark
quotation-mark = %x22 ; "
Read the whole RFC 4627 at http://www.ietf.org/rfc/rfc4627.txt.
From 2.2. Objects
An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string.
and from 2.5. Strings
A string begins and ends with quotation marks.
So I would say that according to the standard: yes, you should always quote the key (although some parsers may be more forgiving)
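A strict parser shows the difference directly. In this sketch, Python's json module (an example parser, not mentioned in the answers) accepts the quoted key and rejects the unquoted one:

```python
import json

quoted = '{"precision": "zip"}'
unquoted = '{precision: "zip"}'

print(json.loads(quoted))  # {'precision': 'zip'}

try:
    json.loads(unquoted)
    unquoted_ok = True
except json.JSONDecodeError:
    unquoted_ok = False
print(unquoted_ok)  # False
```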
Yes, quotes are mandatory. http://json.org/ says:
string
""
" chars "
Not if you use JSON5
For regular JSON, yes, keys must be quoted. But if you need otherwise, check out the widely used JSON5, which is so named because it is a superset of JSON that allows ES5 syntax, including:
unquoted property keys
single-quoted, escaped and multi-line strings
alternate number formats
comments
extra whitespace
The JSON5 reference implementation (json5 npm package) provides a JSON5 object that has parse and stringify methods with the same args and semantics as the built-in JSON object.
widely used, and depended on by many high profile projects
JSON5 was started in 2012, and as of 2022, now gets >65M downloads/week, ranks in the top 0.1% of the most depended-upon packages on npm, and has been adopted by major projects like Chromium, Next.js, Babel, Retool, WebStorm, and more. It's also natively supported on Apple platforms like MacOS and iOS.
~ json5.org homepage
In your situation, both of them are valid JavaScript object literals, meaning that both of them will work in JavaScript code.
However, you should still use the one with quotation marks around the key names, because it is more conventional, which leads to more simplicity and the ability to have key names with spaces, etc.
Therefore, use the one with the quotation marks.
Edit: see also What is the difference between JSON and Object Literal Notation?
Since you can use "parent.child" dot notation and don't have to write parent["child"] (which is also valid and useful), I'd say both ways are technically acceptable. A lenient parser can handle both ways just fine. If your parser does not need quotes on keys, then it's probably better to leave them out (it saves space). It makes sense to call keys strings, because that is what they are, and since the square brackets give you the ability to use values for keys, it essentially makes perfect sense not to quote them.
In JavaScript you can write...
>var keyName = "someKey";
>var obj = {[keyName]:"someValue"};
>obj
Object {someKey: "someValue"}
just fine, without issues. If you need a value for a key, an unquoted name won't work; if it doesn't work, you can't use it, so you won't, and otherwise "you don't need quotes on keys", even if it's right to say they are technically strings. Logic and usage argue otherwise. Nor does any browser console output Object {"someKey": "someValue"} for obj when you run our example.