Unified data model for JSON, BSON and YAML - json

Originally, JSON borrowed its syntax from JavaScript (object literals), but it then became a language-agnostic data interchange format. Its structures (string, array, object) can be mapped directly to primitive data types in most dynamic programming languages and vice versa.
Now, since it is no longer tied to JavaScript, what is the abstract data model of JSON today? In other words, if we compare XML with JSON, is there an XML Infoset equivalent for JSON?
Obviously, JSON is not the only format that can be used for serialization of JSON-like documents. Alternatives include YAML, BSON, and even XML. Is there a name for that unified data model and perhaps a formal specification available?
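As a small illustration of the direct mapping described in the question, the sketch below parses a JSON text into native JavaScript values and writes it back out. The sample document and the variable names are made up for illustration.

```javascript
// A JSON text is just a string until it is parsed.
const text =
  '{"title": "Unified model", "tags": ["json", "yaml"], "meta": {"draft": true, "rev": 3, "owner": null}}';

// JSON.parse maps JSON objects, arrays, strings, numbers, booleans and null
// onto the corresponding native JavaScript values.
const doc = JSON.parse(text);

const kinds = {
  title: typeof doc.title,          // "string"
  tagsIsArray: Array.isArray(doc.tags), // true
  meta: typeof doc.meta,            // "object"
  draft: typeof doc.meta.draft,     // "boolean"
  rev: typeof doc.meta.rev,         // "number"
  ownerIsNull: doc.meta.owner === null, // true
};

// JSON.stringify goes the other way and produces a compact JSON text.
const roundTrip = JSON.stringify(doc);
console.log(kinds, roundTrip);
```

Other languages expose the same mapping through their own types (dictionaries, lists and so on), which is why the question of a language-neutral data model comes up at all.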

XML is more complicated than the JSON format. Some common features that XML has and JSON lacks are namespaces, attributes, and comments. Both formats can represent any kind of data, though potentially with a different structural logic.
What's the abstract data model of JSON? The same as it was when it was created; nothing has changed. JSON served as a data format for server-client communication. It was never tied to JavaScript, since it is just a formatted data string and not some kind of binary executable. Its format originates from JavaScript, yes, but any language can interpret it with a text parser.
I am not sure what kind of information you are looking for, but the process that converts language-specific structured data into strings and vice versa is called serialization/deserialization, and you already know these terms...
"Unified data model", "formal specification": what are you even looking for? Are you looking for principles of data formatting? Data storing? People need to store/transmit/present their data and they come up with ways to do it; there is nothing more to it.

Related

JSON parse with replaceable values

Is there currently any syntax to allow for replaceable values in a .json file?
If not, I would propose adding this capability.
After all, this is 2022; shouldn't we be able to populate values easily without having to write our own parsers?
Simple example:
{
  "%h": "some value"
}
This would replace "%h" with the system-specific hostname, and would be completely optional: if the "parameter" does not exist, there is no change in the parse.
And if .json has a specific "parameter" syntax that is native to them, that's fine.
Just the idea, we could have this option, "%h", or "", whatever syntax .json would like.
The JSON standard does not allow for dynamic values. It is a very simple text representation format meant for data interchange. The standard publication describes it like this:
JSON is a lightweight, text-based, language-independent syntax for defining data interchange formats. It was derived from the ECMAScript programming language, but is programming language independent. JSON defines a small set of structuring rules for the portable representation of structured data.
The goal of this specification is only to define the syntax of valid JSON texts. Its intent is not to provide any semantics or interpretation of text conforming to that syntax. It also intentionally does not define how a valid JSON text might be internalized into the data structures of a programming language. There are many possible semantics that could be applied to the JSON syntax and many ways that a JSON text can be processed or mapped by a programming language. Meaningful interchange of information using JSON requires agreement among the involved parties on the specific semantics to be applied. Defining specific semantic interpretations of JSON is potentially a topic for other specifications. Similarly, language mappings of JSON can also be independently specified. For example, ECMA-262 defines mappings between valid JSON texts and ECMAScript’s runtime data structures.
Therefore there is no way to define a template value of some sort. You can achieve such functionality by processing your JSON separately but that would depend on the technology stack and the tools you use. However, it would only work for your project, not in any other third party projects.
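As a rough sketch of what "processing your JSON separately" could look like, the snippet below parses the text with the standard JSON.parse and then walks the result, substituting placeholders in keys and string values. The function name expandPlaceholders, the %-letter placeholder syntax and the sample host name are all assumptions made for this example; none of it is part of JSON or of any standard library.

```javascript
// Hypothetical helper: replaces "%x" placeholders in keys and string values
// with entries from `vars`. Unknown placeholders are left untouched.
function expandPlaceholders(value, vars) {
  const substitute = (text) =>
    text.replace(/%([A-Za-z])/g, (match, name) =>
      Object.prototype.hasOwnProperty.call(vars, name) ? String(vars[name]) : match
    );

  if (typeof value === "string") return substitute(value);
  if (Array.isArray(value)) return value.map((item) => expandPlaceholders(item, vars));
  if (value !== null && typeof value === "object") {
    const result = {};
    for (const [key, item] of Object.entries(value)) {
      result[substitute(key)] = expandPlaceholders(item, vars);
    }
    return result;
  }
  return value; // numbers, booleans and null pass through unchanged
}

// The file content stays plain, valid JSON; substitution happens after parsing.
const raw = '{"%h": "some value", "log": "/var/log/%h.log", "mode": "%z"}';
const config = expandPlaceholders(JSON.parse(raw), { h: "build-01" });
console.log(config);
```

In Node.js the value for h could come from os.hostname(). As the answer points out, a convention like this is private to the project that implements it; other consumers of the same file will see the literal "%h".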

JSON.stringify versus serialization

Is JSON.stringify( ) equivalent to serialization or effectively serialization or is it just a necessary step towards serialization?
In other words, is JSON.stringify( ) sufficient but not necessary for serialization? Or is necessary but not sufficient? Or is it neither necessary nor sufficient for serialization of JavaScript objects?
Serialization is the act of converting data into a format that can be written to disk or transmitted over the network (or written on paper if that's what you want). Usually, serialization means transforming objects to text, but that's not required, since there are several serialization formats, such as BitTorrent's bencoding and the old/ancient standard ASN.1 formats, which are binary.
JSON is one form of text-based serialization format and is currently very popular due to its simplicity. It's not the only one though. Other popular formats include XML and CSV.
Due to its popularity and its origin as JavaScript object literal syntax, ES5 introduced JSON.stringify() to generate a JSON string from an object. Previously you had to use a library or write your own recursive serializer to do the job.
So, is JSON.stringify() enough for serialization? Yes, if the output format you want is JSON. No, if you want other output formats such as XML or CSV or bencode.
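For contrast with JSON.stringify(), here is a minimal sketch of a serializer for one of the other formats mentioned, bencode. The function name bencode is chosen for this example. It assumes safe integers and ASCII strings, and it returns a JavaScript string for readability, whereas a real encoder would emit bytes.

```javascript
// Minimal bencode sketch (assumption: integers are safe integers and
// strings/keys are ASCII, so code-unit sort order equals byte order).
function bencode(value) {
  if (Number.isSafeInteger(value)) {
    return "i" + value + "e"; // integers: i<digits>e
  }
  if (typeof value === "string") {
    // strings: <length in bytes>:<contents>
    const byteLength = new TextEncoder().encode(value).length;
    return byteLength + ":" + value;
  }
  if (Array.isArray(value)) {
    // lists: l<items>e
    return "l" + value.map((item) => bencode(item)).join("") + "e";
  }
  if (value !== null && typeof value === "object") {
    // dictionaries: d<key><value>...e with keys in sorted order
    const keys = Object.keys(value).sort();
    return "d" + keys.map((k) => bencode(k) + bencode(value[k])).join("") + "e";
  }
  // bencode has no floats, booleans or null
  throw new TypeError("bencode has no encoding for " + String(value));
}

const data = { spam: ["a", "b"], n: 42 };
console.log(JSON.stringify(data)); // JSON text for the same data
console.log(bencode(data));        // bencoded form of the same data
```

The same in-memory object yields two different serialized forms, which is the point of the answer: JSON.stringify() covers exactly one output format.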
There are limitations to the JSON format. One limitation is that JSON cannot encode functions, so JSON.stringify() omits function-valued properties (methods) when serializing an object. JSON also can't encode circular references; JSON.stringify() throws a TypeError when it meets one. Most other serialization formats have this limitation as well, but since JSON looks like JavaScript syntax some people assume it can do what JavaScript object literals can. It can't.
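Both limitations are easy to observe directly. The object names below are made up for the example.

```javascript
// A method is silently left out of the JSON output.
const user = {
  name: "Ada",
  greet() {
    return "hi";
  },
};
const userJson = JSON.stringify(user);

// A circular reference cannot be represented, so stringify throws.
const node = { name: "loop" };
node.self = node;

let circularError = null;
try {
  JSON.stringify(node);
} catch (err) {
  circularError = err;
}

console.log(userJson);
console.log(circularError instanceof TypeError);
```

Anything that must survive a round trip therefore has to be plain data, or be converted to plain data first (for example through a toJSON method or a replacer function).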
So the relationship between "JSON" and "serialization" is like the relationship between "Toyota Prius" and "car". JSON.stringify() is simply a function that generates JSON strings so I guess that would make it a Toyota factory.
Old question, but the following information may be useful for posterity.
Of course, you can serialise any way you want, including any number of custom methods, but JSON has become an increasingly popular method.
The most obvious benefit of JSON is that it represents objects in the same way that JavaScript object literals do, though it is slightly less flexible. Nevertheless, if you can represent normal data in JavaScript then JSON is a good match.
The most significant feature is that, since it represents objects as well as arrays, it can represent fairly complex & hierarchical data.
For one reason or another, JSON has more-or-less supplanted XML as the preferred serialisation for sending data between the server and browser. It is so useful that many languages include their own JSON functions (PHP, for example, has the better-named json_encode and json_decode functions), as do some modern databases. I myself have found it convenient to use JSON functions to store a more complex data structure in a single field of a database, without JavaScript anywhere in sight.
The short answer is yes, for the most part it is a sufficient step to serializing most data (non-binary). It is not, however, necessary as there are alternatives.
Serializing binary data, on the other hand, now that’s another story …
Short answer... Serialize means the same thing as Stringify, IMHO.

can json-ld be used to build a unique hash signature of a json object?

This is a near duplicate of "How to reliably hash JavaScript objects?", where someone wants to reliably hash JavaScript objects.
Now that the JSON-LD specification has been validated, I saw that there is a normalization procedure that is advertised as a potential way to normalize a JSON object:
normalize the data using the RDF Dataset normalization algorithm, and then dump the output to normalized NQuads format. The NQuads can then be processed via SHA-256, or similar algorithm, to get a deterministic hash of the contents of the Dataset.
Building a hash of a json object has always been a pain because something like
sha1(JSON.stringify(object))
does not work, or is not guaranteed to work the same across implementations (the order of the keys is not defined, for example).
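The key-order problem can be shown in a few lines. The sketch below also includes a hypothetical helper, stableStringify, that sorts object keys before serializing; it is a common workaround for plain JSON data, not the JSON-LD normalization algorithm discussed here and not a complete canonicalization scheme (number formatting and Unicode normalization are left to JSON.stringify).

```javascript
// Two objects with the same content but different key insertion order.
const a = { b: 1, a: { d: 2, c: 3 } };
const b = { a: { c: 3, d: 2 }, b: 1 };

// Plain JSON.stringify follows insertion order, so the strings differ,
// and so would any hash computed from them.
const plainA = JSON.stringify(a);
const plainB = JSON.stringify(b);

// Hypothetical helper: a replacer that rebuilds every plain object with
// its keys in sorted order before it is serialized.
function stableStringify(value) {
  return JSON.stringify(value, (key, val) => {
    if (val !== null && typeof val === "object" && !Array.isArray(val)) {
      const sorted = {};
      for (const k of Object.keys(val).sort()) {
        sorted[k] = val[k];
      }
      return sorted;
    }
    return val;
  });
}

const stableA = stableStringify(a);
const stableB = stableStringify(b);
console.log(plainA === plainB, stableA === stableB);
```

Feeding the stable string into a hash function gives the same digest for both objects, as long as every producer uses the same helper.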
Does JSON-LD work as advertised? Is it safe to use it as a universal JSON normalization procedure for hashing objects? Can those objects be standard JSON objects, or do they need some JSON-LD decorations (@context, ...) to be normalized?
Yes, normalization works with JSON-LD, but the objects do need to be given context (via the @context property) in order for them to produce any RDF. It is the RDF that is deterministically output in NQuads format (and that can then be hashed, for example).
If a property in a JSON-LD document is not defined via @context, then it will be dropped during processing. JSON-LD requires that you provide global meaning (semantics) to the properties in your document by associating them with URLs. These URLs may provide further machine-readable information about the meaning of the properties, their range, domain, etc. In this way data becomes "linked": you can both understand the meaning of a JSON document from one API in the context of another and you can traverse documents (via HTTP) to find more information.
So the short answer to the main question is "Yes, you can use JSON-LD normalization to build a unique hash for a JSON object". The caveat is that the JSON object must be a JSON-LD object, which really constitutes a subset of JSON. One of the main reasons for the invention of the normalization algorithm was for hashing and digitally signing graphs (JSON-LD documents) for comparison.
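To make the role of @context concrete, here is a small sketch of a JSON-LD document in which one property is mapped to a URL and one is not. The helper undefinedTerms is hypothetical and only a rough approximation of what a real JSON-LD processor does: it looks at a flat, inline context and ignores @vocab, remote contexts, nested contexts and other features. The example.org identifier is a placeholder.

```javascript
// "name" is given global meaning through @context; "nickname" is not.
const person = {
  "@context": { name: "http://schema.org/name" },
  "@id": "http://example.org/people/ada",
  name: "Ada Lovelace",
  nickname: "AAL",
};

// Hypothetical, simplified check: which top-level keys have no mapping
// in the inline context? Keywords (starting with "@") and keys that look
// like IRIs or compact IRIs (containing ":") are skipped.
function undefinedTerms(doc) {
  const context = doc["@context"] || {};
  return Object.keys(doc).filter(
    (key) =>
      !key.startsWith("@") &&
      !key.includes(":") &&
      !Object.prototype.hasOwnProperty.call(context, key)
  );
}

const dropped = undefinedTerms(person);
console.log(dropped);
```

Under these assumptions, nickname is the property that would not contribute to the RDF output, so it would not influence a hash computed from the normalized NQuads either.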

Performance Advantages in Storing Documents as JSON in MarkLogic 6

If I were to store the same markup in 2 separate documents, one XML, the other JSON, in MarkLogic 6, does MarkLogic automatically convert the JSON equivalent to XML, and index it in that regard, or are both stored in their respective formats?
What I'm getting at is, does MarkLogic store ALL documents as XML, regardless, and simply apply JSON transformations to JSON documents when queried?
If documents are stored in native format, is there any advantage, in terms of performance, to storing documents in JSON over XML?
Below is an example code-snippet:
if ($outputFormat = "json") then (: result in json format :)
  let $custom-config :=
    let $config := json:config("custom")
    return (
      map:put($config, "array-element-names", (
        xs:QName("lp:lesson_plan"),
        xs:QName("lp:instructional_segment"),
        xs:QName("lp:strand_type"),
        xs:QName("lp:resource"),
        xs:QName("lp:level"),
        xs:QName("lp:discipline"),
        xs:QName("lp:language"),
        xs:QName("lp:program"),
        xs:QName("lp:grade"),
        xs:QName("res:strand_type"),
        xs:QName("res:resource"),
        xs:QName("res:ISBN"),
        xs:QName("res:level"),
        xs:QName("res:standard"),
        xs:QName("res:secondaryURL"),
        xs:QName("res:grade"),
        xs:QName("res:keyword"))),
      map:put($config, "whitespace", "ignore"),
      map:put($config, "text-value", "value"),
      $config)
  return json:transform-to-json($finalResult, $custom-config)
else (: finalResult in xml format :)
  $finalResult
MarkLogic is XML-native and does need to convert JSON to XML to store it in the database. There is a high-level JSON library to perform transformations. The main functions are json:transform-to-json and json:transform-from-json, and when configured correctly should provide lossless conversions.
I think the main difference from your example is whether you want to convert to XML using your own process or use MarkLogic's toolkit.
For more detailed information, see MarkLogic's docs:
http://docs.marklogic.com/guide/app-dev/json
On disk, MarkLogic stores highly compressed C++ data structures that represent hierarchical trees and corresponding indexes. (OK, that’s an over-simplification, but illustrative nonetheless.) There are two places where you as a developer will typically interact with those data structures: 1) building queries and application logic 2) deserializing/serializing data into and out of this internal data model. Today, MarkLogic uses the XML data model (XDM) for the latter and, correspondingly, XQuery, XPath, and XSLT for the former. We chose this stack for several reasons: XML is good at representing both text mark-up as well as data structures and the tooling around XML is mature and widespread.
Having said that, JSON has emerged as a popular serialization of hierarchical data structures—the “X” in AJAX. While we don't have the same watertight abstraction between JSON and MarkLogic’s internal data model today, we do provide a set of tools that allow you to efficiently and losslessly convert between JSON and the XML data model. Additionally, our REST and Java APIs allow you to store, retrieve, and even query tree structures that originated as JSON without having to think about this conversion step; the APIs handle this in the plumbing.
As for performance, there will be a little overhead converting between a JSON and XDM representation. However, I’d expect that to be negligible for most applications. The real benefits of XML will be in the expressiveness of XQuery, XPath, and XSLT in working with the data. There is no widespread equivalent to these in the JSON world today.
One footnote: The REST API (and thus the Java API wrapper around the REST API) provides a facade for the JSON conversion to XML; that is, the APIs do the conversion to XML for you.
Usually, you don't need to think about the conversion except when you are creating range and geospatial indexes over the converted elements.
If you need to support JSON documents in your client, then the facade is convenient.
On the other hand, expressing the structure as JSON has no advantages for database operations and some limitations. (For instance, XML has standards-based, baked-in atomic data types, schema validation, and server processing with XQuery or XSLT.) So, if you have complete control over the data structure, you might want to write it to the server as XML.
As of MarkLogic 8 (February 2015), JSON is now a native data type, just like XML. This eliminates the need for a translation layer for applications that want to work exclusively in JSON. In addition, we’ve added JavaScript as a first-class language in the database itself (using Google’s V8 engine). This means that you can write stored procedures, triggers, and even full HTTP applications with JavaScript that runs in the database, close to the data.

ASN.1 vs JSON: when is it appropriate to use them?

When is using ASN.1 preferable to using JSON? What are some advantages and disadvantages of both approaches?
ASN.1 and JSON aren't strictly comparable. JSON is a data format. ASN.1 is a schema language plus multiple sets of encoding rules, each of which produces different data formats for a given schema. So, the original question somewhat parallels the question "XML Schema vs. XML: when is it appropriate to use them?" A fairer comparison would be between ASN.1 and JSON Schema.
That said, a few points to consider:
ASN.1 has binary encoding rules. Consider whether binary or text encoding is preferable for your application.
ASN.1 also has XML and JSON encoding rules. You can opt to go with a text-based encoding using ASN.1, if you like.
ASN.1 allows other encoding rules to be developed. Before ITU-T specified encoding rules for JSON, we specified our own rules to encode ASN.1 to JSON. I blogged about this on our company website here
As with XML Schema, tools exist for compiling ASN.1. These are commonly referred to as data binding tools. The compiler output consists of data structures to hold your data, and code for encoding/decoding to/from the various encodings (binary, XML, JSON).
I am not sure what, if any, data binding tools exist for JSON Schema. I am also not sure how mature/stable JSON Schema is, whereas ASN.1 is quite mature and stable.
Choosing between JSON Schema and ASN.1, note that JSON Schema is bound to JSON, whereas ASN.1 is not bound to any particular representation.
You can use ASN.1 regardless of whether you need to serialize messages that might go to a recipient using C, C++, C#, Java, or any other programming language with an ASN.1 encoder/decoder engine. ASN.1 also provides multiple encoding rules which have benefits under different circumstances. For example, DER is used when a canonical encoding is crucial, such as in digital certificates, while PER is used when bandwidth is critical, such as in cellular protocols, and E-XER is used when you don't care about bandwidth and would like to display an encoding in XML for manipulation in a browser or exchange messages with an XML Schema engine.
Note that with a good ASN.1 tool, you don't have to change your application code to switch between these ASN.1 encoding rules. A simple function call can select the encoding rules you would like to use.
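To give a feel for what a binary encoding rule produces compared with JSON text, here is a hand-written sketch of the DER encoding of a single small INTEGER. In practice an ASN.1 compiler generates this kind of code from a schema; the function name derEncodeSmallInteger and the restriction to non-negative values below 2^31 are assumptions made to keep the example short.

```javascript
// DER encodes an INTEGER as: tag 0x02, a length byte, then the value as
// the shortest big-endian two's complement byte string.
function derEncodeSmallInteger(n) {
  if (!Number.isInteger(n) || n < 0 || n > 0x7fffffff) {
    throw new RangeError("sketch only handles integers in [0, 2^31 - 1]");
  }
  const bytes = [];
  do {
    bytes.unshift(n & 0xff);   // take the lowest byte
    n = Math.floor(n / 256);   // shift the rest down
  } while (n > 0);
  // A leading byte with the top bit set would read as negative,
  // so a zero byte is prepended in that case.
  if (bytes[0] & 0x80) bytes.unshift(0x00);
  return Uint8Array.from([0x02, bytes.length, ...bytes]);
}

// Hex rendering for display purposes.
const toHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");

const value = 128;
console.log(JSON.stringify(value));                // JSON text form
console.log(toHex(derEncodeSmallInteger(value)));  // DER bytes in hex
```

The JSON form is self-describing text, while the DER form is a tag-length-value byte sequence whose meaning comes from the schema; other encoding rules such as PER would produce yet another byte sequence for the same value.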
Here you can find a paper with a thorough study of JSON, XML, ASN.1, EXI and ProtoBuf.