I'm wondering what the difference is between data parsing and transformation.
For example, if I need to convert data from XML format to JSON format, is that a transformation or parsing?
Transformation is a mapping from one form to another.
An XSLT transformation maps from XML to JSON, HTML, (different) XML, etc.
Parsing is an analysis of a sequential form to identify structural parts.
An XML parser reads XML and identifies its elements, attributes, and other parts.
Data conversion is fundamentally a transformation. Note, though, that transformations often leverage structure identified during parsing of the input form to create the output form.
Parsing, technically, is the process of establishing the logical structure of the textual input: for example, establishing that <a b="3"/> represents an element named a carrying an attribute named b whose value is 3.
Unfortunately the term seems to be increasingly misunderstood, and programmers without formal computer science training often misuse the term to mean almost any processing of the parsed data: we see questions on SO saying "I am writing a parser", when actually they are writing an application that consumes the output of a parser.
Converting XML to JSON is a three-stage process: parsing the XML, transforming the resulting data structure to a different data structure, and then serializing the transformed data structure into JSON syntax.
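To make those three stages concrete, here is a minimal sketch in JavaScript/TypeScript. It assumes a browser-style DOMParser is available, and the element and attribute names are invented purely for illustration.

    const xml = '<book title="Dune" year="1965"/>';

    // 1. Parse: turn the XML text into a tree of elements and attributes.
    const doc = new DOMParser().parseFromString(xml, "application/xml");
    const book = doc.documentElement;

    // 2. Transform: map that tree onto a different data structure.
    const data = {
      title: book.getAttribute("title"),
      year: Number(book.getAttribute("year")),
    };

    // 3. Serialize: write the transformed structure out in JSON syntax.
    const json = JSON.stringify(data); // {"title":"Dune","year":1965}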
Originally, JSON borrowed its syntax from JavaScript (object literals), but then became a programming language agnostic data interchange format. Its structures (string, array, object) can be mapped directly to primitive data types in most dynamic programming languages and vice versa.
Now that it is no longer tied to JavaScript, what is the abstract data model of JSON today? In other words, if we compare XML with JSON, is there an XML Infoset equivalent for JSON?
Obviously, JSON is not the only format that can be used for serialization of JSON-like documents. Alternatives include YAML, BSON, and even XML. Is there a name for that unified data model and perhaps a formal specification available?
XML is more complicated than JSON. Common features that XML has and JSON lacks include namespaces, attributes, and comments. Both formats can represent any kind of data, but potentially with a different structural logic.
What's the abstract data model of JSON? The same as it was when it was created; nothing has changed. JSON was created as a data format for server-client communication. It was never technically tied to JavaScript, since it is just formatted text and not some kind of executable. Its syntax originates from JavaScript, yes, but any language can interpret it with a text parser.
I am not sure what kind of information you are looking for, but the process that converts language-specific structured data into strings and vice versa is called serialization/deserialization, and you already know those terms ...
"Unified data model", "formal specification": what are you actually looking for? Principles of data formatting? Of data storage? People need to store, transmit, and present their data, and they come up with ways to do it; there is not much more to it.
I'm as new as can be to JSON. I understand that both JSON-LD and JSON Schema are used to validate JSON data. I, however, cannot find much information comparing and contrasting the two.
Which one is better?
Why use one over the other?
Advantages vs disadvantages?
Can these two even be compared?
Am I misunderstanding what JSON-LD and JSON Schema are?
JSON-LD's goal is to make JSON documents understandable by machines by linking them to well-defined vocabularies. It is not used to validate JSON data; JSON Schema is used for that purpose. So you can't really compare the two.
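For a concrete sense of the validation side, here is a minimal, purely illustrative JSON Schema (draft-04 style); the property names are invented:

    {
      "$schema": "http://json-schema.org/draft-04/schema#",
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "age": { "type": "integer", "minimum": 0 }
      },
      "required": ["name"]
    }

A validator would accept {"name": "Alice", "age": 30} against this schema and reject {"age": -1} (missing required name, negative age).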
Am I misunderstanding what JSON-LD and JSON Schema are?
About JSON-LD, yes. In the JSON-LD specification, JSON-LD is defined as:
a JSON-based format to serialize Linked Data
Can these two even be compared?
It would be better to compare JSON-LD with JSON (not JSON Schema). Or you could compare JSON Schema with the other encoding syntax schemas e.g. XML Schema.
Note: the rest of this answer addresses the remaining questions, treated as a comparison between JSON-LD and JSON.
Choosing between JSON and JSON-LD depends on the situation and context of use. Generally, JSON works at the syntactic level: the encoded data is merely machine-readable. JSON-LD is used to mark the data up semantically, making it not only machine-readable but also machine-understandable, by adding syntax to JSON for the serialization of Linked Data.
A recommended resource for understanding the detailed differences between JSON-LD and JSON is the JSON-LD specification published by the W3C.
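As a purely illustrative contrast (the terms below are borrowed from schema.org and the values are invented), here is the same record first as plain JSON and then as JSON-LD:

    { "name": "Alice", "homepage": "http://example.com/alice" }

    {
      "@context": {
        "name": "http://schema.org/name",
        "homepage": { "@id": "http://schema.org/url", "@type": "@id" }
      },
      "name": "Alice",
      "homepage": "http://example.com/alice"
    }

The @context block is what links the otherwise opaque keys to well-defined vocabulary terms, which is exactly the "machine-understandable" part.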
Following is my requirement:
Application A is creating a JSON document based on its Java Beans and sending it to my application.
I have to take this JSON, convert it into XML (the XSD for this is completely different from my JSON structure), and send it to Application B.
Solution 1) I am currently converting this JSON to XML using the json.org library. Then, using Apache Xalan and an XSL stylesheet, I convert that to the XML format required by Application B.
Solution 2) Converting this JSON to a Java Bean (JB1), then converting JB1 to another Java Bean (JB2) matching the XML structure required by Application B, then converting JB2 to XML for Application B.
Solution 3) Using Apache Xalan and Xerces to parse through the input JSON and build the XML in Java itself, without using XSL.
Which is the better approach (in terms of code simplicity and throughput)? As the JSON becomes more complex, is Solution 1 still easy to work with? Please suggest if there is a better approach than these three.
XSLT 3.0 offers a built-in json-to-xml() function. Once you have the XML, you can easily transform it to your required format. It is implemented in Saxon 9.7 (PE or higher) and I believe in Exselt.
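Roughly, json-to-xml() parses a JSON string into XML that uses the standard XPath 3.1 vocabulary (elements named map, array, string, number, and so on, in the http://www.w3.org/2005/xpath-functions namespace), which you can then transform with ordinary templates. For example, an input like {"name": "Alice", "tags": ["x", "y"]} comes back approximately as:

    <map xmlns="http://www.w3.org/2005/xpath-functions">
      <string key="name">Alice</string>
      <array key="tags">
        <string>x</string>
        <string>y</string>
      </array>
    </map>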
Solution 1: Yes. This is the conventional and best path for both simple and complex JSON and simple or complex targeted XML.
Solution 2: No. There's no reason to introduce Java Beans as an intermediate form, especially if you have no other need for Java Beans. This option unnecessarily introduces transformational and marshalling complexity.
Solution 3: No. Neither Xalan nor Xerces are designed to parse JSON; they are designed to parse XML.
There are sample programs that will map a JSON document into an equivalent XML document and back; I wrote one as a demo for Liberty's support of json-p (javax.json), using an XML vocabulary I called JinX (JSON in XML). That could be used as a pre/post processor wrapped around XSLT, if desired.
Better solutions are possible (redefining XSLT to operate on JSON trees, for example), but they would take a bit more work.
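The following is a rough sketch of that generic mapping idea in JavaScript/TypeScript, not the JinX vocabulary itself; the element names are invented and no escaping of XML special characters is done:

    function toXml(value: unknown, name?: string): string {
      const key = name ? ` key="${name}"` : "";
      if (value === null) return `<null${key}/>`;
      if (Array.isArray(value))
        return `<array${key}>${value.map(v => toXml(v)).join("")}</array>`;
      if (typeof value === "object")
        return `<object${key}>${Object.entries(value as object)
          .map(([k, v]) => toXml(v, k))
          .join("")}</object>`;
      // strings, numbers, and booleans become elements named after their type
      return `<${typeof value}${key}>${String(value)}</${typeof value}>`;
    }

    console.log(toXml(JSON.parse('{"id": 7, "tags": ["a", "b"]}')));
    // <object><number key="id">7</number><array key="tags"><string>a</string><string>b</string></array></object>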
JSON is, pure and simple, a data serialization format: it exists specifically to allow arbitrary (JavaScript-style) data structures to be conveyed between some client and some host, typically over the HTTP(S) protocol.
Therefore it is not XML, and (leaving aside XSLT 3.0's built-in JSON support) it is not appropriate input for XSLT as-is.
"Thou shalt not mix apples and oranges!"
If you wish to apply XSLT technologies to a JSON-derived input (which is, by definition, a data structure), you must first, by whatever suitable means, convert that data structure into XML.
Is JSON.stringify() equivalent to serialization, effectively serialization, or just a necessary step towards serialization?
In other words, is JSON.stringify() sufficient but not necessary for serialization? Necessary but not sufficient? Or neither necessary nor sufficient for serializing JavaScript objects?
Serialization is the act of converting data into a format that can be written to disk or transmitted over the network (or written on paper, if that's what you want). Usually serialization transforms objects to text, but that isn't required: there are binary serialization formats as well, such as BitTorrent's bencoding and the venerable ASN.1 encodings.
JSON is one text-based serialization format, and it is currently very popular due to its simplicity. It's not the only one, though. Other popular formats include XML and CSV.
Because of its popularity, and its origin in JavaScript object-literal syntax, ES5 introduced JSON.stringify() to generate a JSON string from an object. Previously you had to use a library or write the serialization routine yourself.
So, is JSON.stringify() enough for serialization? Yes, if the output format you want is JSON. No, if you want other output formats such as XML or CSV or bencode.
There are limitations to the JSON format. One limitation is that JSON cannot encode functions, so JSON.stringify() ignores functions/methods when serializing. JSON also can't encode circular references. Most other serialization formats have this limitation as well, but since JSON looks like JavaScript syntax some people assume it can do everything JavaScript object literals can. It can't.
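A small illustrative sketch of those two limitations (the object and property names are made up):

    const obj = {
      name: "example",
      greet: function () { return "hi"; } // function-valued property
    };
    console.log(JSON.stringify(obj)); // {"name":"example"} - the function is silently dropped

    const a: { self?: unknown } = {};
    a.self = a; // circular reference
    try {
      JSON.stringify(a); // throws TypeError: Converting circular structure to JSON
    } catch (e) {
      console.log("circular structures cannot be serialized:", e);
    }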
So the relationship between "JSON" and "serialization" is like the relationship between "Toyota Prius" and "car". JSON.stringify() is simply a function that generates JSON strings so I guess that would make it a Toyota factory.
Old question, but the following information may be useful for posterity.
Of course, you can serialise any way you want, including any number of custom methods, but JSON has become an increasingly popular method.
The most obvious benefit of JSON is that it represents objects in the same way that JavaScript object literals do, though it is slightly less flexible. Nevertheless, if you can represent normal data in JavaScript then JSON is a good match.
The most significant feature is that, since it represents objects as well as arrays, it can represent fairly complex & hierarchical data.
For one reason or another, JSON has more or less supplanted XML as the preferred serialisation for sending data between the server and browser. It is so useful that many languages include their own JSON functions (PHP, for example, has the better-named json_encode and json_decode functions), as do some modern databases. I myself have found it convenient to use JSON functions to store a more complex data structure in a single field of a database, without JavaScript anywhere in sight.
The short answer is yes: for the most part it is a sufficient step for serializing most (non-binary) data. It is not, however, necessary, as there are alternatives.
Serializing binary data, on the other hand, now that’s another story …
Short answer... Serialize means the same thing as Stringify, IMHO.
If I were to store the same markup in 2 separate documents, one XML, the other JSON, in MarkLogic 6, does MarkLogic automatically convert the JSON equivalent to XML, and index it in that regard, or are both stored in their respective formats?
What I'm getting at is, does MarkLogic store ALL documents as XML, regardless, and simply apply JSON transformations to JSON documents when queried?
If documents are stored in native format, is there any advantage, in terms of performance, to storing documents in JSON over XML?
Below is an example code-snippet:
if ($outputFormat = "json") then (: result in json format :)
  let $custom-config :=
    let $config := json:config("custom")
    return (
      map:put($config, "array-element-names", (
        xs:QName("lp:lesson_plan"),
        xs:QName("lp:instructional_segment"),
        xs:QName("lp:strand_type"),
        xs:QName("lp:resource"),
        xs:QName("lp:level"),
        xs:QName("lp:discipline"),
        xs:QName("lp:language"),
        xs:QName("lp:program"),
        xs:QName("lp:grade"),
        xs:QName("res:strand_type"),
        xs:QName("res:resource"),
        xs:QName("res:ISBN"),
        xs:QName("res:level"),
        xs:QName("res:standard"),
        xs:QName("res:secondaryURL"),
        xs:QName("res:grade"),
        xs:QName("res:keyword"))),
      map:put($config, "whitespace", "ignore"),
      map:put($config, "text-value", "value"),
      $config
    )
  return json:transform-to-json($finalResult, $custom-config)
else (: finalResult in xml format :)
  $finalResult
MarkLogic is XML-native and does need to convert JSON to XML to store it in the database. There is a high-level JSON library to perform the transformations. The main functions are json:transform-to-json and json:transform-from-json, which, when configured correctly, should provide lossless conversions.
I think the main difference from your example is whether you want to convert to XML using your own process or use MarkLogic's toolkit.
For more detailed information, see MarkLogic's docs:
http://docs.marklogic.com/guide/app-dev/json
On disk, MarkLogic stores highly compressed C++ data structures that represent hierarchical trees and corresponding indexes. (OK, that’s an over-simplification, but illustrative nonetheless.) There are two places where you as a developer will typically interact with those data structures: 1) building queries and application logic 2) deserializing/serializing data into and out of this internal data model. Today, MarkLogic uses the XML data model (XDM) for the latter and, correspondingly, XQuery, XPath, and XSLT for the former. We chose this stack for several reasons: XML is good at representing both text mark-up as well as data structures and the tooling around XML is mature and widespread.
Having said that, JSON has emerged as a popular serialization of hierarchical data structures—the “X” in AJAX. While we don't have the same watertight abstraction between JSON and MarkLogic’s internal data model today, we do provide a set of tools that allow you to efficiently and losslessly convert between JSON and the XML data model. Additionally, our REST and Java APIs allow you to store, retrieve, and even query tree structures that originated as JSON without having to think about this conversion step; the APIs handle this in the plumbing.
As for performance, there will be a little overhead converting between a JSON and XDM representation. However, I’d expect that to be negligible for most applications. The real benefits of XML will be in the expressiveness of XQuery, XPath, and XSLT in working with the data. There is no widespread equivalent to these in the JSON world today.
One footnote: the REST API (and thus the Java API wrapper around the REST API) provides a facade for the JSON conversion to XML -- that is, the APIs do the conversion to XML for you.
Usually, you don't need to think about the conversion except when you are creating range and geospatial indexes over the converted elements.
If you need to support JSON documents in your client, then the facade is convenient.
On the other hand, expressing the structure as JSON has no advantages for database operations and some limitations. (For instance, XML has standards-based, built-in atomic data types, schema validation, and server-side processing with XQuery or XSLT.) So, if you have complete control over the data structure, you might want to write it to the server as XML.
As of MarkLogic 8 (February 2015), JSON is now a native data type, just like XML. This eliminates the need for a translation layer for applications that want to work exclusively in JSON. In addition, we’ve added JavaScript as a first-class language in the database itself (using Google’s V8 engine). This means that you can write stored procedures, triggers, and even full HTTP applications with JavaScript that runs in the database, close to the data.
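A hedged sketch of what that looks like in Server-Side JavaScript; the URI and properties are invented, and the xdmp/cts calls are the MarkLogic built-ins as I recall them, so treat the details as illustrative rather than definitive:

    declareUpdate(); // this module performs an update
    xdmp.documentInsert("/orders/1001.json", { id: 1001, status: "open" });

    // In a later request, read it back as an ordinary JavaScript object:
    const order = cts.doc("/orders/1001.json").toObject();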