Potential problems of mapping JSON to XML

What are the major problems of mapping JSON to XML and vice versa? I have a list of problems that I can run into, but it would be very helpful if others could add what they have run into when converting between the two.
My list is:
Root object required in JSON
Unique keys (although only one of the two specifications requires this)
Keys cannot start with a number
Order may not be preserved (see http://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html)
Any other one?

Disclaimer: I am the author of Jsonix, an XML<->JSON conversion library written in JavaScript. So I'm speaking a bit from experience of mapping between complex XML and JSON.
Top-level production in JSON may be JSONArray or JSONObject (in JSON interchange format even any JSONText - also null, boolean, string, number). XML requires a single root element.
JSON objects have properties, XML elements may have attributes, contain sub-elements and text values (I'm even leaving comments and PIs out).
You mention "keys cannot start with a number", but there are more syntactic incompatibilities. JSON object property names can be basically any string. XML element and attribute names are restricted in syntax.
Normally no namespaces in JSON, often namespaces in XML.
Strict typing. You always know the JSON type just by looking at the value. In XML, you can't guess the type from the value. For instance, 1 may be a string, a boolean, one of a dozen numeric types, etc. You have to know the schema to know the types.
In JSON, you can guess the structure from the value (object or array). In XML, if you see a single element, you don't know whether it may be repeated or not. You have to know the schema to know the structure.
Collections are normally expressed as arrays in JSON. In XML, you can express a collection as repeatable elements (item*), possibly wrapped (items/item*), or in case of simple types as list types (<items>a b c d</items>).
In XML, the order of sub-elements or text nodes of the element is significant. In JSON, properties of the JSONObject are not ordered. (You mention this.)
In XML, an element may contain several sub-elements of the same name. In JSONObject, property names will be unique. (You mention this.)
In XML, an element may contain attributes, sub-elements and text nodes. In JSON, the only complex structures are JSONObject and JSONArray. In JSONArray you just have items, no named components (which would be analogous to attributes or sub-elements). In JSONObject you just have properties (JSONMembers) which are always "named" (this would be analogous to attributes and sub-elements of XML, but not to text nodes).
Processing instructions and comments in XML, no direct analogs in JSON.
There's also xsi:type construct which is a bit hard to handle. Specifies the type of the element value in the document instance.
In XML, values of certain types (like QNames) depend on declarations in other parts of the XML document. For example, if my:Element appears somewhere as an xs:QName value, this value depends on how the my namespace prefix is declared in the document. Since namespaces may be declared and re-declared, you have to follow their declarations quite precisely to be able to find out the namespace of the qualified name.
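To illustrate the point about structure, here is a minimal Python sketch of a naive XML-to-JSON-style converter. The function name element_to_value and the sample documents are made up for illustration; this is not how Jsonix works. Without a schema, the same item element comes out as a string when it occurs once and as a list when it occurs twice.

```python
import xml.etree.ElementTree as ET


def element_to_value(elem):
    """Naive conversion (illustrative only): leaf elements become strings,
    repeated child names become lists, everything else becomes a dict."""
    children = list(elem)
    if not children:
        return elem.text or ""
    result = {}
    for child in children:
        value = element_to_value(child)
        if child.tag in result:
            # Second occurrence of a name: promote the entry to a list.
            existing = result[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                result[child.tag] = [existing, value]
        else:
            result[child.tag] = value
    return result


one = element_to_value(ET.fromstring("<items><item>a</item></items>"))
two = element_to_value(ET.fromstring("<items><item>a</item><item>b</item></items>"))

print(one)  # {'item': 'a'}
print(two)  # {'item': ['a', 'b']}
```

A consumer of the JSON side then has to cope with item being either a string or an array, unless the converter is told (by a schema or a mapping) that item is repeatable.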

Converting a specific JSON object (or class of objects) into XML is usually no problem at all. What is difficult is writing a converter that can handle any JSON object. The problem essentially arises because you want simple JSON to end up as simple XML, but you find yourself contorting the design to handle edge cases, such as characters that are legal in JSON but not in XML, preserving distinctions such as the distinction between the number 10 and the string "10", or worrying about the best representation of a JSON "null".
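A small Python sketch of those edge cases, assuming a hypothetical naive mapping (naive_to_xml is my own name, not a library function) where keys become element names and scalars become text. The number 10 and the string "10" end up as identical XML, and so do null and the empty string.

```python
import json
import xml.etree.ElementTree as ET


def naive_to_xml(name, value):
    """Hypothetical naive mapping: keys become element names, scalars become text."""
    elem = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(naive_to_xml(key, child))
    elif isinstance(value, list):
        for item in value:
            elem.append(naive_to_xml("item", item))
    elif value is not None:
        elem.text = str(value)
    return elem


def render(json_text):
    return ET.tostring(naive_to_xml("root", json.loads(json_text)), encoding="unicode")


as_number = render('{"n": 10}')
as_string = render('{"n": "10"}')
as_null = render('{"n": null}')
as_empty = render('{"n": ""}')

print(as_number == as_string)  # True: the type distinction is gone
print(as_null == as_empty)     # True: null and "" are indistinguishable
```

Keeping those distinctions means adding type information to the XML (an attribute or a typed element name), which is exactly where the simple output stops being simple.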

Related

Does JSON to XML lose me anything?

We have a program that accepts as data XML, JSON, SQL, OData, etc. For the XML we use Saxon and its XPath support and that works fantastic.
For JSON we use the jsonPath library which is not as powerful as XPath 3.1. And jsonPath is a little squirrelly in some corner cases.
So... what if we convert the JSON we get to XML and then use Saxon? Are there limitations to that approach? Are there JSON constructs that won't convert to XML, like anonymous arrays?
The headline question: The json-to-xml() function in XPath 3.1 is lossless, except that by default, characters that are invalid in XML (such as NUL, or unpaired surrogates) are replaced by a SUB character -- you can change this behaviour with the option escape=true.
The losslessness has been achieved at some cost in convenience. For example, JSON property names are not translated to XML element or attribute names, but rather to values of the key attribute.
Lots of different people have come up with lots of different conversions of JSON to XML. As already pointed out, the XPath 3.1 and the XSLT 3.0 spec have a loss-less, round-tripping conversion with json-to-xml and xml-to-json that can handle any JSON.
There are simpler conversions that handle limited sets of JSON, the main problem is how to represent property names of JSON that don't map to XML names e.g. { "prop 1" : "value" } is represented by json-to-xml as <string key="prop 1">value</string> while conversions trying to map the property name to an element or attribute name either fail to create well-formed XML (e.g. <prop 1>value</prop 1>) or have to escape the space in the element name (e.g. <prop_1>value</prop_1> or some hex representation of the Unicode of the space inserted).
In the end I guess you want to select the property foo in { "foo" : "value" } as foo which the simple conversion would give you; in XPath 3.1 you would need ?foo for the XDM map or fn:string[@key = 'foo'] for the json-to-xml result format.
With { "prop 1" : "value" } the latter kind of remains as fn:string[@key = 'prop 1'], the ? approach needs to be changed to ?('prop 1') or .('prop 1'). Any conversion that has escaped the space in an element name requires you to change the path to e.g. prop_1.
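To make the key-attribute layout concrete, here is a rough Python imitation of it using ElementTree. This is only a sketch of the shape of the output (map, array, string, number, boolean and null elements in the XPath functions namespace, with the property name in a key attribute). It is not the json-to-xml function itself, and to_keyed_xml is a name I made up.

```python
import json
import xml.etree.ElementTree as ET

FN = "http://www.w3.org/2005/xpath-functions"


def to_keyed_xml(value, key=None):
    """Build an element in the style of the json-to-xml layout (imitation only)."""
    if isinstance(value, dict):
        elem = ET.Element(f"{{{FN}}}map")
        for k, v in value.items():
            elem.append(to_keyed_xml(v, k))
    elif isinstance(value, list):
        elem = ET.Element(f"{{{FN}}}array")
        for v in value:
            elem.append(to_keyed_xml(v))
    elif isinstance(value, bool):  # tested before numbers: bool is a subclass of int
        elem = ET.Element(f"{{{FN}}}boolean")
        elem.text = "true" if value else "false"
    elif value is None:
        elem = ET.Element(f"{{{FN}}}null")
    elif isinstance(value, (int, float)):
        elem = ET.Element(f"{{{FN}}}number")
        elem.text = json.dumps(value)
    else:
        elem = ET.Element(f"{{{FN}}}string")
        elem.text = value
    if key is not None:
        # The property name is data in an attribute, so any string is fine.
        elem.set("key", key)
    return elem


root = to_keyed_xml(json.loads('{"prop 1": "value", "n": 10, "s": "10"}'))
hit = root.find(f"{{{FN}}}string[@key='prop 1']")
print(hit.text)  # value
```

Because the name never has to be a legal XML name, "prop 1" needs no escaping, and 10 and "10" stay distinguishable as number and string elements. The price is the more verbose path shown above.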
There is no ideal way for all kind of JSON I think, in the end it depends on the JSON formats you expect and the willingness or time of users to learn a new selection/querying approach.
Of course you can use JSON to XML conversions other than json-to-xml and then use XPath 3.1 on any XML format; I think that is what the oXygen guys opted for. They had a JSON to XML conversion before XPath 3.1 provided one and are mainly sticking with it, so in oXygen you can write "path" expressions against JSON because, under the hood, the path is evaluated against an XML conversion of the JSON. I am not sure how much effort it takes to indicate which values in the original JSON have been selected by XPath expressions evaluated against the XML format; that is probably not easy or straightforward.

Json Object with one attribute or primitive Json Data type?

I am building a REST API which creates a resource. The resource has only one attribute which is a rather long and unique string. I am planning to send this data to the API as JSON. I see two choices for modeling the data as JSON
A primitive JSON String data type
A JSON object with one String attribute.
Both the options work.
Which of these two options is preferred for this context? And why?
Basic Answer for Returning
I would personally use option 2, which is: "A JSON object with one String attribute."
Also, in terms of design: I prefer to return an object that has a key/value pair. The key is also a name that provides context as to what has been returned.
Returning just a string, basically a "" or {""}, lacks that context (the name of the returned variable).
Debate: Are primitive Strings Json Objects?
There seems to be also some confusion as to if a String by itself is a valid JSON document.
This confusion and debate, are quite evident in the following posts where various technical specs are mentioned: Is a primitive type considered JSON?
The only thing for sure is that a JSON object with a key-value pair is definitely valid!
As to a string by itself... I'm not sure (requires more reading).
Update: Answer In terms of creating/updating an entity (Post/Put)
In the specific case above, relating to such a large string that "runs into a few kilobytes"... my feeling is that this would be included within the request body.
In the specific context of sending data, I would actually be comfortable with using either 1 or 2. Additionally, 1 seems more optimized (if your frameworks support it), since the context about what the data is comes from the REST API method itself.
However, if in the future you need to add one more parameter, you will have to use a JSON entity with more than one key.
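A quick Python sketch of the two request bodies, just to show the trade-off. The key names value and label are placeholders I picked, not anything from the question.

```python
import json

long_value = "x" * 16  # stand-in for the long unique string

as_string = json.dumps(long_value)             # option 1: a bare JSON string
as_object = json.dumps({"value": long_value})  # option 2: "value" is a hypothetical key

# Python's json module parses both bodies.
assert json.loads(as_string) == long_value
assert json.loads(as_object)["value"] == long_value

# Adding a second field later only fits option 2 without changing the body's shape.
extended = json.loads(as_object)
extended["label"] = "hypothetical extra field"
print(json.dumps(extended))
```

With option 1, the same change would turn the body from a string into an object, which breaks existing clients.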

How to create diff of two generic T:Codable structs in Swift?

Given: I have two structs of the same type, conforming to Codable Protocol.
The structs can be multi-level (nested properties also conform to Codable). The type is not known at the time of implementation, so I treat it as a generic type conforming to Codable.
One object is the "base" (say, received from the server); the second is a copy of the "base" that has been modified inside the application.
The intention is: To send a request for saving new data, but sending only the "diff" of two structs. So, only the fields, that are different should be present in resulting JSON.
The straightforward way, getting JSON strings for both structs and manipulating them, is understandable but seems to be a last-chance approach...
I've tried an approach with Mirror and recursion, but so far have managed to make it work only for the first level: on the second level of nesting I lose the type of the nested property (struct or array) and cannot cast it correctly...
I wonder if it can be made somehow with custom encoder?
P.S.: the generic type should have all properties as Optionals, so should not provide any explicit initializers.
Instead of your "last-chance approach" -- matching JSON strings -- you could use JSONSerialization.jsonObject to convert the JSON data to Foundation objects and perform your comparison on that higher level of abstraction (if that's what you meant in your question in the first place, then sorry - nevermind).
Of course you'd pay an extra penalty of converting your Codable objects to data and then parsing that data into an object hierarchy.
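The comparison on the parsed object hierarchy can be sketched as a small recursive function. This is written in Python rather than Swift, purely to show the idea, with made-up sample data. In Swift the same recursion would walk the dictionaries returned by JSONSerialization.jsonObject.

```python
import json


def diff(base, modified):
    """Return the parts of `modified` that differ from `base`.
    Nested objects are diffed recursively; arrays and scalars are replaced wholesale."""
    changes = {}
    for key, new_value in modified.items():
        old_value = base.get(key)
        if isinstance(old_value, dict) and isinstance(new_value, dict):
            nested = diff(old_value, new_value)
            if nested:
                changes[key] = nested
        elif key not in base or old_value != new_value:
            changes[key] = new_value
    return changes


base = json.loads('{"name": "A", "address": {"city": "X", "zip": "1"}, "tags": ["a"]}')
modified = json.loads('{"name": "A", "address": {"city": "Y", "zip": "1"}, "tags": ["a", "b"]}')

print(json.dumps(diff(base, modified)))
# {"address": {"city": "Y"}, "tags": ["a", "b"]}
```

Note that this sketch does not report keys that were removed in the modified object. If a property set to nil has to be sent explicitly, that case needs its own rule.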

Is it valid for JSON data structure to vary between a list and a boolean

The JSON data structure for jstree is defined at https://github.com/vakata/jstree; here is an example:
[ { "text" : "Root node", "children" : [ "Child node 1", "Child node 2" ] } ]
Notably it says
The children key can be used to add children to the branch, it should
be an array
However, later on, in the section "Populating the tree using AJAX and lazy loading nodes", it shows children set to a boolean (true in the example below) to indicate that a node's children have not been processed yet:
[{
"id":1,"text":"Root node","children":[
{"id":2,"text":"Child node 1","children":true},
{"id":3,"text":"Child node 2"}
]
}]
So here we see children used both as an array and as a boolean.
I am using jstree as an example because this is where I encountered the issue, but my question is really a general JSON question: is it valid JSON for the same element to have two different types (an array and a boolean)?
Structure-wise, both are valid JSON documents. This is okay, as JSON is less strict than XML (with an XSD or a DTD). As per https://www.w3schools.com/js/js_json_objects.asp:
JSON objects are surrounded by curly braces {}.
JSON objects are written in key/value pairs.
Keys must be strings, and values must be a valid JSON data type (string, number, object, array, boolean or null).
Keys and values are separated by a colon.
Each key/value pair is separated by a comma.
Having said that, if the sender is allowed to send such JSON, the only caveat is that the server side has to handle the discrepancy when it receives packets of differing shape. This makes for an awkward contract, so the server may need extra work to manage it, and handling such incoming JSON can become tricky.
See: How do I create JSON data structure when element can be different types in for use by
You could validate whether a JSON is okay or not at https://jsonlint.com/
See more about JSON in this answer: https://stackoverflow.com/a/4862511/945214
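For what it's worth, the receiving side usually ends up with a small normalising step like the following Python sketch. The helper name child_nodes is made up. Reading true as "children exist but have not been sent yet" is my assumption, based on the lazy-loading example in the question.

```python
import json


def child_nodes(node):
    """Return (children, needs_loading) for a node whose "children" may be
    a list, a boolean, or absent."""
    children = node.get("children")
    if isinstance(children, list):
        return children, False
    if children is True:
        return [], True   # assumed meaning: children exist but are not loaded yet
    return [], False      # absent or false: treat the node as a leaf


tree = json.loads("""
[{"id": 1, "text": "Root node", "children": [
    {"id": 2, "text": "Child node 1", "children": true},
    {"id": 3, "text": "Child node 2"}
]}]
""")

kids, _ = child_nodes(tree[0])
print([child_nodes(k) for k in kids])  # [([], True), ([], False)]
```

The parser accepts both shapes without complaint. The type check has to live in your own code.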
It is valid Json. JSON RFC 8259 defines a general syntax but it contains nothing that would allow a tool to identify that two equally named entries are meant to describe the same conceptual thing.
The need for criteria to check two JSON structures for instance equality has been one motivation for creating something like JSON Schema.
I also think it is not too unusual for javascript to provide this kind of mixed data. Sometimes it might help to explicitly convert the javascript object to JSON. Like in JSON.stringify(testObject)
A thing for json validation
https://www.npmjs.com/package/json-validation
https://davidwalsh.name/json-validation.

Does JSON syntax allow duplicate keys in an object?

Is this valid json?
{
"a" : "x",
"a" : "y"
}
http://jsonlint.com/ says yes.
http://www.json.org/ doesn't say anything about it being forbidden.
But obviously it doesn't make much sense, does it?
Most implementations probably use a hashtable, so the value gets overridden anyway.
The short answer: Yes, but it is not recommended.
The long answer: It depends on what you call valid...
ECMA-404 "The JSON Data Interchange Syntax" doesn't say anything about duplicated names (keys).
However, RFC 8259 "The JavaScript Object Notation (JSON) Data Interchange Format" says:
The names within an object SHOULD be unique.
In this context SHOULD must be understood as specified in BCP 14:
SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
RFC 8259 explains why unique names (keys) are good:
An object whose names are all unique is interoperable in the sense
that all software implementations receiving that object will agree on
the name-value mappings. When the names within an object are not
unique, the behavior of software that receives such an object is
unpredictable. Many implementations report the last name/value pair
only. Other implementations report an error or fail to parse the
object, and some implementations report all of the name/value pairs,
including duplicates.
Also, as Serguei pointed out in the comments: ECMA-262 "ECMAScript® Language Specification", reads:
In the case where there are duplicate name Strings within an object, lexically preceding values for the same key shall be overwritten.
In other words, last-value-wins.
Trying to parse a string with duplicated names with the Java implementation by Douglas Crockford (the creator of JSON) results in an exception:
org.json.JSONException: Duplicate key "status" at
org.json.JSONObject.putOnce(JSONObject.java:1076)
From the standard (p. ii):
It is expected that other standards will refer to this one, strictly adhering to the JSON text format, while
imposing restrictions on various encoding details. Such standards may require specific behaviours. JSON
itself specifies no behaviour.
Further down in the standard (p. 2), the specification for a JSON object:
An object structure is represented as a pair of curly bracket tokens surrounding zero or more name/value pairs.
A name is a string. A single colon token follows each name, separating the name from the value. A single
comma token separates a value from a following name.
It does not make any mention of duplicate keys being invalid or valid, so according to the specification I would safely assume that means they are allowed.
That most implementations of JSON libraries do not accept duplicate keys does not conflict with the standard, because of the first quote.
Here are two examples related to the C++ standard library. When deserializing some JSON object into a std::map it would make sense to refuse duplicate keys. But when deserializing some JSON object into a std::multimap it would make sense to accept duplicate keys as normal.
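The same choice exists in other languages. As a rough Python analogue (not C++, just to show the two policies side by side), the standard json module lets the caller decide via object_pairs_hook:

```python
import json

text = '{"a": "x", "a": "y"}'

# Default: the object becomes a dict, so the last value for a repeated name wins.
print(json.loads(text))  # {'a': 'y'}

# object_pairs_hook receives every name/value pair in document order;
# keeping them as a list preserves duplicates, much like a multimap would.
pairs = json.loads(text, object_pairs_hook=list)
print(pairs)  # [('a', 'x'), ('a', 'y')]
```

Both results come from the same JSON text. Only the target data structure differs.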
There are 2 documents specifying the JSON format:
http://json.org/
https://www.rfc-editor.org/rfc/rfc7159
The accepted answer quotes from the 1st document. I think the 1st document is more clear, but the 2nd contains more detail.
The 2nd document says:
Objects
An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members). A name is a
string. A single colon comes after each name, separating the name
from the value. A single comma separates a value from a following
name. The names within an object SHOULD be unique.
So it is not forbidden to have a duplicate name, but it is discouraged.
I came across a similar question when dealing with an API that accepts both XML and JSON, but doesn't document how it would handle what you'd expect to be duplicate keys in the JSON accepted.
The following is a valid XML representation of your sample JSON:
<object>
<a>x</a>
<a>y</a>
</object>
When this is converted into JSON, you get the following:
{
"object": {
"a": [
"x",
"y"
]
}
}
A natural mapping from a language that handles what you might call duplicate keys to another, can serve as a potential best practice reference here.
Hope that helps someone!
The JSON spec says this:
An object is an unordered set of name/value pairs.
The important part here is "unordered": it implies uniqueness of keys, because the only thing you can use to refer to a specific pair is its key.
In addition, most JSON libs will deserialize JSON objects to hash maps/dictionaries, where keys are guaranteed unique. What happens when you deserialize a JSON object with duplicate keys depends on the library: in most cases, you'll either get an error, or only the last value for each duplicate key will be taken into account.
For example, in Python, json.loads('{"a": 1, "a": 2}') returns {"a": 2}.
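If you would rather get an error than a silent last-value-wins, a minimal sketch is to pass your own object_pairs_hook. The function name reject_duplicates is mine, not part of the json module.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises instead of silently keeping the last value."""
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError(f"duplicate name: {name!r}")
        result[name] = value
    return result


print(json.loads('{"a": 1, "b": 2}', object_pairs_hook=reject_duplicates))  # {'a': 1, 'b': 2}

try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)  # duplicate name: 'a'
```

The hook is called for every object in the document, so nested objects are checked too.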
Posting an answer because there are a lot of outdated ideas and confusion about the standards. As of December 2017, there are two competing standards:
RFC 8259 - https://www.rfc-editor.org/rfc/rfc8259
ECMA-404 - http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
json.org suggests ECMA-404 is the standard, but this site does not appear to be an authority. While I think it's fair to consider ECMA the authority, what's important here is, the only difference between the standards (regarding unique keys) is that RFC 8259 says the keys should be unique, and the ECMA-404 says they are not required to be unique.
RFC-8259:
"The names within an object SHOULD be unique."
The word "should" in all caps like that, has a meaning within the RFC world, that is specifically defined in another standard (BCP 14, RFC 2119 - https://www.rfc-editor.org/rfc/rfc2119) as,
SHOULD This word, or the adjective "RECOMMENDED", mean that
there may exist valid reasons in particular circumstances to ignore
a particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
ECMA-404:
"The JSON syntax does not impose any restrictions on the strings used
as names, does not require that name strings be unique, and does not
assign any significance to the ordering of name/value pairs."
So, no matter how you slice it, it's syntactically valid JSON.
The reason given for the unique key recommendation in RFC 8259 is,
An object whose names are all unique is interoperable in the sense
that all software implementations receiving that object will agree on
the name-value mappings. When the names within an object are not
unique, the behavior of software that receives such an object is
unpredictable. Many implementations report the last name/value pair
only. Other implementations report an error or fail to parse the
object, and some implementations report all of the name/value pairs,
including duplicates.
In other words, from the RFC 8259 viewpoint, it's valid but your parser may barf and there's no promise as to which, if any, value will be paired with that key. From the ECMA-404 viewpoint (which I'd personally take as the authority), it's valid, period. To me this means that any parser that refuses to parse it is broken. It should at least parse according to both of these standards. But how it gets turned into your native object of choice is, in any case, unique keys or not, completely dependent on the environment and the situation, and none of that is in the standard to begin with.
SHOULD be unique does not mean MUST be unique. However, as stated, some parsers would fail and others would just use the last value parsed. However, if the spec was cleaned up a little to allow for duplicates then I could see a use where you may have an event handler which is transforming the JSON to HTML or some other format... In such cases it would be perfectly valid to parse the JSON and create another document format...
{
"div":
{
"p": "hello",
"p": "universe"
},
"div":
{
"h1": "Heading 1",
"p": "another paragraph"
}
}
could then easily parse to html for example:
<body>
<div>
<p>hello</p>
<p>universe</p>
</div>
<div>
<h1>Heading 1</h1>
<p>another paragraph</p>
</div>
</body>
I can see the reasoning behind the question but as it stands... I wouldn't trust it.
It's not defined in the ECMA JSON standard. And generally speaking, a lack of definition in a standard means, "Don't count on this working the same way everywhere."
If you're a gambler, "many" JSON engines will allow duplication and simply use the last-specified value. This:
var o = {"a": 1, "b": 2, "a": 3}
Becomes this:
Object {a: 3, b: 2}
But if you're not a gambler, don't count on it!
Asking for purpose, there are different answers:
When using JSON to serialize objects (JavaScriptObjectNotation), each dictionary element maps to an individual object property, so different entries defining a value for the same property have no meaning.
However, I came across the same question from a very specific use case:
Writing JSON samples for API testing, I was wondering how to add comments into our JSON file without breaking the usability. The JSON spec does not know comments, so I came up with a very simple approach:
To use duplicate keys to comment our JSON samples.
Example:
{
"property1" : "value1", "REMARK" : "... prop1 controls ...",
"property2" : "value2", "REMARK" : "... value2 raises an exception ..."
}
The JSON serializers which we are using have no problems with these "REMARK" duplicates and our application code simply ignores this little overhead.
So, even though there is no meaning on the application layer, these duplicates for us provide a valuable workaround to add comments to our testing samples without breaking the usability of the JSON.
The standard does say this:
Programming languages vary widely on whether they support objects, and
if so, what characteristics and constraints the objects offer. The
models of object systems can be wildly divergent and are continuing to
evolve. JSON instead provides a simple notation for expressing
collections of name/value pairs. Most programming languages will have
some feature for representing such collections, which can go by names
like record, struct, dict, map, hash, or object.
The bug is in node.js at least. This code succeeds in node.js.
try {
var json = {"name":"n","name":"v"};
console.log(json); // outputs { name: 'v' }
} catch (e) {
console.log(e);
}
RFC 7159, published by the Internet Engineering Task Force (IETF) and since obsoleted by RFC 8259 (which keeps the same sentence), states "The names within an object SHOULD be unique". However, according to RFC 2119, which defines the terminology used in IETF documents, the word "should" in fact means "... there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course." What this essentially means is that while having unique keys is recommended, it is not a must. We can have duplicate keys in a JSON object, and it would still be valid.
In practice, I have seen the value of the last duplicate key being used when duplicate keys are found in a JSON document.
In C# if you deserialise to a Dictionary<string, string> it takes the last key value pair:
string json = @"{""a"": ""x"", ""a"": ""y""}";
var d = JsonConvert.DeserializeObject<Dictionary<string, string>>(json);
// { "a" : "y" }
If you try to deserialise to a class that maps two properties to the same name:
class Foo
{
[JsonProperty("a")]
public string Bar { get; set; }
[JsonProperty("a")]
public string Baz { get; set; }
}
var f = JsonConvert.DeserializeObject<Foo>(json);
you get a Newtonsoft.Json.JsonSerializationException exception.