Generate XML from XSD repeatable elements - json

What tool can generate XML from an XSD such that the generated XML contains more than one entry for the repeatable elements? I tried tools available in Eclipse and some online tools like xml-generator, but none of these work: they all generate only one entry for the repeatable elements.
Note: I want to convert the generated XML to JSON, but the XML-to-JSON converter treats a repeatable element in the XML as an array only if it has more than one entry.

Generating XML from an XSD can be quite challenging, if only because what people expect to see may not be expressible in an XSD.
QTAssistant (I am associated with it) has quite extensive features when it comes to sample XML creation.
The simplest (and dumbest) one (available by right-clicking the element in the graphical XSD visualizer) is still able to create two elements if the associated maxOccurs is greater than one.
However, the XML may be off: just because one has named a field dateTime, it doesn't mean the generated text node will be a valid dateTime value if the schema defines it as a string. The tool will also create only one of the choices (if your schema uses xsd:choice), etc.
QTAssistant can make use of additional metadata which gives the user ultimate control over the generated samples. It can even create thousands of XMLs by doing combinations captured using metadata items. (You should contact us on the support site if you're interested in these scenarios).
Regarding XML to JSON conversion, QTAssistant can also correctly convert XML to JSON for repeating fields. Given this XML:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!-- Sample XML generated by QTAssistant (http://www.paschidev.com) -->
<fundamo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:common="http://www.myorg.com/xsd/gen_fin">
  <response>
    <common:code>code1</common:code>
    <common:description>description1</common:description>
  </response>
  <transaction>
    <transactionRef>transactionRef1</transactionRef>
    <dateTime>dateTime1</dateTime>
    <userName>userName1</userName>
  </transaction>
</fundamo>
The corresponding JSON is:
{
  "response": {
    "code": "code1",
    "description": "description1"
  },
  "transaction": [
    {
      "transactionRef": "transactionRef1",
      "dateTime": "dateTime1",
      "userName": "userName1"
    }
  ]
}
You may notice that transaction is an array, even though the XML has only one such element. This conversion works for valid XML, as long as you have an XSD defined for all of its content. For the past 2+ years we've been calling it "XSD-aware JSON conversion". It is also possible to define casing conversion strategies (e.g. change upper case to lower case, since XML elements tend to be upper case while JSON "people" prefer them lower case).
If you're after a free tool, or want to write your own, I am sure you can use the free evaluation as a source of inspiration and address only the specific features you're interested in.
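If you do write your own, the core of the repeatable-element trick is small enough to sketch. Below is a rough TypeScript illustration (not what QTAssistant does internally): it forces element names known from the XSD to be repeatable (maxOccurs greater than one) into JSON arrays even when only one instance is present. The names repeatable, toJson and fundamoXml are invented for the example, and DOMParser assumes a browser or a DOM library.

// Hypothetical set of element names with maxOccurs > 1, derived from the XSD by hand or by a schema walker.
const repeatable = new Set(["transaction"]);

type Json = string | Json[] | { [name: string]: Json };

function toJson(el: Element): Json {
  if (el.children.length === 0) return el.textContent ?? "";   // leaf element -> its text value
  const obj: { [name: string]: Json } = {};
  for (const child of Array.from(el.children)) {
    const name = child.localName;                              // drop namespace prefixes, as in the sample above
    const value = toJson(child);
    if (repeatable.has(name)) {
      const existing = obj[name];
      obj[name] = Array.isArray(existing) ? [...existing, value] : [value];   // always an array, even for a single entry
    } else {
      obj[name] = value;
    }
  }
  return obj;
}

// Abbreviated stand-in for the sample document above.
const fundamoXml =
  "<fundamo><response><code>code1</code></response>" +
  "<transaction><transactionRef>transactionRef1</transactionRef></transaction></fundamo>";
const doc = new DOMParser().parseFromString(fundamoXml, "application/xml");
console.log(JSON.stringify(toJson(doc.documentElement), null, 2));

With the maxOccurs information wired in, a single transaction element still comes out as a one-element array, which is exactly what the XML-to-JSON step in the question needs.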

Related

Does JSON to XML lose me anything?

We have a program that accepts XML, JSON, SQL, OData, etc. as data. For the XML we use Saxon and its XPath support, and that works fantastically.
For JSON we use the jsonPath library, which is not as powerful as XPath 3.1, and jsonPath is a little squirrelly in some corner cases.
So... what if we convert the JSON we get to XML and then use Saxon? Are there limitations to that approach? Are there JSON constructs that won't convert to XML, like anonymous arrays?
The headline question: The json-to-xml() function in XPath 3.1 is lossless, except that by default, characters that are invalid in XML (such as NUL, or unpaired surrogates) are replaced by a SUB character -- you can change this behaviour with the option escape=true.
The losslessness has been achieved at some cost in convenience. For example, JSON property names are not translated to XML element or attribute names, but rather to values of the key attribute.
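For example, fn:json-to-xml('{"prop 1": "value", "n": 10}') returns, roughly (modulo whitespace):
<map xmlns="http://www.w3.org/2005/xpath-functions">
  <string key="prop 1">value</string>
  <number key="n">10</number>
</map>
so an awkward property name survives as an attribute value instead of having to become an element name.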
Lots of different people have come up with lots of different conversions of JSON to XML. As already pointed out, the XPath 3.1 and XSLT 3.0 specs have a lossless, round-tripping conversion with json-to-xml and xml-to-json that can handle any JSON.
There are simpler conversions that handle limited sets of JSON; the main problem is how to represent JSON property names that don't map to XML names. For example, { "prop 1" : "value" } is represented by json-to-xml as <string key="prop 1">value</string>, while conversions that try to map the property name to an element or attribute name either fail to create well-formed XML (e.g. <prop 1>value</prop 1>) or have to escape the space in the element name (e.g. <prop_1>value</prop_1>, or some hex representation of the Unicode code point of the space).
In the end I guess you want to select the property foo in { "foo" : "value" } as foo, which the simple conversion would give you; in XPath 3.1 you would need ?foo for the XDM map, or fn:string[@key = 'foo'] for the json-to-xml result format.
With { "prop 1" : "value" } the latter more or less stays the same, fn:string[@key = 'prop 1'], while the ? approach needs to be changed to ?('prop 1') or .('prop 1'). Any conversion that has escaped the space in an element name requires you to change the path to e.g. prop_1.
There is no ideal way for all kinds of JSON, I think; in the end it depends on the JSON formats you expect and on the willingness or time of users to learn a new selection/querying approach.
Of course you can use JSON-to-XML conversions other than json-to-xml and then use XPath 3.1 on that XML format; I think that is what the oXygen guys opted for. They had a JSON-to-XML conversion before XPath 3.1 provided one and have mainly stuck with it, so in oXygen you can write "path" expressions against JSON because, under the hood, the path is evaluated against an XML conversion of the JSON. I am not sure how much effort it takes to indicate which values in the original JSON have been selected by XPath expressions against the XML format; that is probably not easy or straightforward.

Should I be using JSON or XML when the order of the properties, of the server-returned object, matters?

The client uses ReactJS and the server uses Node, Express and MySQL for the DB. I'm making a website that allows users to post news articles, e.g. long bodies of text with images inserted at random points inside them. Think of a website like Medium that allows users to write blog posts and insert images at any point.
My issue lies with how to store the information for each article. Originally, I was going to use XML - use <text> and <img> tags and represent each article as an XML document. This way, when the server parses the XML document after pulling it from the DB, the order of the elements is preserved.
For example:
<text> /* long body of text (e.g. about 300 - 500 words?) */ </text>
<img> First image path goes here </img>
<text> /* Another body of text! */ </text>
<img> Some cool graph that's relevant for the article </img>
When my server tries to parse this XML document, the order is very important. The contents of the first text tag (line 1 in the code sample above) must be parsed first. Then below that should be the first image (line 2). Once the full XML document is parsed, it should be sent to the client, where React will iterate through the returned object and create a paragraph or image ReactElement for each object.
I use JSON for most things to represent the objects that the client and server exchange. But I am quite aware that within JSON objects the order of the keys {key: value} is not preserved. Therefore, it would be possible (referring to my silly code snippet above) that line 2 could be added to my VirtualDOM before line 1, making the output order incorrect.
Therefore, should I be using JSON or XML as 1) the format of the object that the server returns and 2) the representation of each article?
Both XML and JSON have the ability to represent data in a way that preserves order. The nearest JSON equivalent to your XML might be:
[
  { "text": "long body of text (e.g. about 300 - 500 words?)" },
  { "img": "First image path goes here" },
  { "text": "Another body of text!" },
  { "img": "Some cool graph that's relevant for the article" }
]
Because order matters, you have to use an array here.
You could argue that this structure looks pretty strange, and you would be right. JSON was never designed for representing document structures. I would strongly question the wisdom of using JSON to represent free-format articles in the way you are proposing. There are some things JSON does better than XML, and there are other things XML does better than JSON, and your application is firmly in the second category.
MySQL's JSON data type does not preserve key order; however, you can store the JSON as a string, which keeps the document exactly as written. But then you lose the ability to use the MySQL JSON functions, which you might be able to live with if your processing is mainly in your application.
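Whichever way you store it, consuming the ordered array on the React side is straightforward. Here is a minimal sketch; the Block type and Article component are invented for illustration and assume the server returns the array shown above.

import React from "react";

type Block = { text: string } | { img: string };

// Render the blocks in exactly the order the array provides them.
function Article({ blocks }: { blocks: Block[] }) {
  return (
    <div>
      {blocks.map((block, i) =>
        "text" in block
          ? <p key={i}>{block.text}</p>
          : <img key={i} src={block.img} alt="" />
      )}
    </div>
  );
}

export default Article;

Because JSON arrays iterate in order, the paragraphs and images land in the DOM in the same sequence the author wrote them.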

strategy for returning JSON from XML canonical model

I am using the envelope pattern, and my canonical model part is in XML format. I usually return the model in full or in a summary version. Retrieval of documents is pretty quick, but when returning results as part of my REST call, where I need to return JSON to the browser, the version that calls json:transform-to-json takes about double the time of the one that just returns the XML.
Would a reasonable strategy be to also keep the canonical model in JSON format in the envelope, or maybe to keep rendered JSON, in full and summary versions, in other documents outside of the envelope that don't get searched but are mainly used when returning results? That way I wouldn't have to incur the cost of transforming the canonical model to JSON every time.
Are there any other ways that this has been done?
Conversion from XML to JSON should be relatively light, but the mere fact it has to do something adds overhead. Doing that work up front will definitely save time. You can put both formats in the same envelope (though the JSON will have to be stored as a string then), or in a different document as you suggest. Alternatively, you could also store it in document-properties; unfortunately, that only takes XML as well, so you would be storing your JSON as a string in there too.
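For instance, such an envelope might look something like this (the element names are purely illustrative):
<envelope>
  <canonical>
    <!-- canonical XML model, as today -->
  </canonical>
  <rendered-json>{"id": 1, "summary": "..."}</rendered-json>
</envelope>
The REST layer can then return the pre-rendered string without calling json:transform-to-json at request time.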
Alternatively, have you profiled the transform to see if there is a particular reason why it slows down so much? Using XSLT versus XQuery for the transform could make a difference too.
HTH!
json:transform-to-json has 3 algorithms optimized for different purposes, which perform with different tradeoffs of flexibility, fidelity and performance.
"basic" (default) - useful only to reverse json:transform-from-json().
"full" - preserves as much information fidelity as possible, in exchange for a non-'pretty' format in many cases.
"custom" - is ... custom ... designed for when the JSON format is fixed, or when you want control over the JSON output, at the expense of handling only a subset of XML accurately.
Basic and full are the most efficient. However, all variants are fairly involved and require completely traversing the XML node tree and building a JSON object tree bottom-up. In ML version 8 this is then translated into the native JSON node structure. In a REST call it would then be serialized as text.
Compared to a direct return of an XML document via fn:doc("file.xml"), there are at least 2 orders of magnitude more operations involved in the transform case.
For small documents in a REST call that is still a small fraction of the total request time, especially if the REST call was performing a complex operation itself and then returning a small result. Your use case seems the opposite - returning an XML document directly bypasses almost all of the XQuery processing and is sent straight from internal storage to the output or assigned to a variable.
If that's an important use case to optimize, especially if the documents can be large, then saving them as text or binary will be much faster -- at the expense of more storage used. If this is only a variant representation of the XML, try storing the text JSON as binary, as it will not incur any indexing overhead.
Otherwise, if you need to query over the JSON: in ML7 storing as text gives you simple word queries, in ML8 storing as native JSON gives you structured queries -- both with efficient text serialization.

Potential problems of mapping JSON to XML

What are the major problems of mapping JSON to XML and vice versa? I have a set of problems I know I can run into, but it would be very helpful if others could add what they have run into when converting between the two.
My list is:
Root object required in JSON
Unique keys (although only one of the two specifications requires this)
Keys cannot start with a number
Order may not be preserved (see http://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html)
Any other one?
Disclaimer: I am the author of Jsonix, an XML<->JSON conversion library written in JavaScript. So I'm speaking a bit from the experience of mapping between complex XML and JSON.
The top-level production in JSON may be a JSONArray or a JSONObject (in the JSON interchange format, even any JSONText - also null, boolean, string, number). XML requires a single root element.
JSON objects have properties, XML elements may have attributes, contain sub-elements and text values (I'm even leaving comments and PIs out).
You mention "keys cannot start with a number", but there are more syntactic incompatibilities. JSON object properties can be basically any strings; XML element and attribute names are restricted in syntax.
Normally no namespaces in JSON, often namespaces in XML.
Strict typing. You always know the JSON type just by looking at the value. In XML, you can't guess the type from the value: for instance, 1 may be a string, a boolean, one of a dozen numeric types, etc. You have to know the schema to know the types.
In JSON, you can guess the structure from the value (object or array). In XML, if you see a single element, you don't know whether it may be repeated or not. You have to know the schema to know the structure.
Collections are normally expressed as arrays in JSON. In XML, you can express a collection as repeatable elements (item*), possibly wrapped (items/item*), or in case of simple types as list types (<items>a b c d</items>).
In XML, the order of sub-elements or text nodes of the element is significant. In JSON, properties of the JSONObject are not ordered. (You mention this.)
In XML, an element may contain several sub-elements of the same name. In JSONObject, property names will be unique. (You mention this.)
In XML, an element may contain attributes, sub-elements and text nodes. In JSON, the only complex structures are JSONObject and JSONArray. In JSONArray you just have items, no named components (which would be analogous to attributes or sub-elements). In JSONObject you just have properties (JSONMembers) which are always "named" (this would be analogous to attributes and sub-elements of XML, but not to text nodes).
Processing instructions and comments in XML, no direct analogs in JSON.
There's also the xsi:type construct, which is a bit hard to handle: it specifies the type of the element value in the document instance.
In XML, values of certain types (like QNames) depend on declarations in other parts of the XML document. For example, having my:Element as an xs:QName value somewhere, this value will depend on how the my namespace prefix is declared in the document. Since namespaces may be declared and re-declared, you have to follow their declarations quite precisely to be able to find out the namespace of the qualified name.
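For instance, the same text value names two different qualified names below, purely because the prefix binding differs, so a converter has to carry the in-scope bindings along with the value:
<a xmlns:my="http://example.com/one">my:Element</a>
<b xmlns:my="http://example.com/two">my:Element</b>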
Converting a specific JSON object (or class of objects) into XML is usually no problem at all. What is difficult is writing a converter that can handle any JSON object. The problem essentially arises because you want simple JSON to end up as simple XML, but you find yourself contorting the design to handle edge cases, such as characters that are legal in JSON but not in XML, preserving distinctions such as the distinction between the number 10 and the string "10", or worrying about the best representation of a JSON "null".
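For what it's worth, the XPath 3.1 json-to-xml representation discussed earlier on this page keeps exactly these distinctions explicit; { "a": 10, "b": "10", "c": null } maps, roughly, to:
<map xmlns="http://www.w3.org/2005/xpath-functions">
  <number key="a">10</number>
  <string key="b">10</string>
  <null key="c"/>
</map>
which is lossless, but not the "simple XML" most people hope for.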

JSON posting, am I pushing JSON too far?

I'm just wondering if I am pushing JSON too far, and if anyone has hit this before?
I have an XML file:
<?xml version="1.0" encoding="UTF-8"?>
<customermodel:Customer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:customermodel="http://customermodel" xmlns:personal="http://customermodel/personal" id="1" age="1" name="Joe">
  <bankAccounts xsi:type="customermodel:BankAccount" accountNo="10" bankName="HSBC" testBoolean="true" testDate="2006-10-23" testDateTime="2006-10-23T22:15:01+08:00" testDecimal="20.2" testTime="22:15:01+08:00">
    <count>0</count>
    <bankAddressLine>HSBC</bankAddressLine>
    <bankAddressLine>London</bankAddressLine>
    <bankAddressLine>31 florence</bankAddressLine>
    <bankAddressLine>Swindon</bankAddressLine>
  </bankAccounts>
</customermodel:Customer>
Which contains elements and attributes....
Which, when I convert it to JSON, gives me:
{"customermodel:Customer":{"id":"1","name":"Joe","age":"1","xmlns:xsi":"http://www.w3.org/2001/XMLSchema-instance","bankAccounts":{"testDate":"2006-10-23","testDecimal":"20.2","count":"0","testDateTime":"2006-10-23T22:15:01+08:00","bankAddressLine":["HSBC","London","31 florence","Swindon"],"testBoolean":"true","bankName":"HSBC","accountNo":"10","xsi:type":"customermodel:BankAccount","testTime":"22:15:01+08:00"},"xmlns:personal":"http://customermodel/personal","xmlns:customermodel":"http://customermodel"}}
So then I send this to the client, which converts it to a JS object (or whatever), edits some values (the elements), and then sends it back to the server.
So I get the JSON string and convert it back into XML:
<customermodel:Customer>
  <id>1</id>
  <age>1</age>
  <name>Joe</name>
  <xmlns:xsi>http://www.w3.org/2001/XMLSchema-instance</xmlns:xsi>
  <bankAccounts>
    <testDate>2006-10-23</testDate>
    <testDecimal>20.2</testDecimal>
    <testDateTime>2006-10-23T22:15:01+08:00</testDateTime>
    <count>0</count>
    <bankAddressLine>HSBC</bankAddressLine>
    <bankAddressLine>London</bankAddressLine>
    <bankAddressLine>31 florence</bankAddressLine>
    <bankAddressLine>Swindon</bankAddressLine>
    <accountNo>10</accountNo>
    <bankName>HSBC</bankName>
    <testBoolean>true</testBoolean>
    <xsi:type>customermodel:BankAccount</xsi:type>
    <testTime>22:15:01+08:00</testTime>
  </bankAccounts>
  <xmlns:personal>http://customermodel/personal</xmlns:personal>
  <xmlns:customermodel>http://customermodel</xmlns:customermodel>
</customermodel:Customer>
And there is the problem: it doesn't seem to know the difference between elements and attributes, so I cannot check the result against an XSD to verify that it is still valid.
Is there a solution to this?
I cannot be the first to hit this problem?
JSON does not make sense as an XML encoding, no. If you want to be working with and manipulating XML, then work with and manipulate XML.
JSON is for when you need something that's lighter weight, easier to parse, and easier to write and read. It has a fairly simple structure, that is neither better nor worse than XML, just different. It has lists, associations, strings, and numbers, while XML has nested elements, attributes, and entities. While you could encode each one in the other precisely, you have to ask yourself why you're doing that; if you want JSON use JSON, and if you want XML use XML.
JsonML provides a well thought out standard mapping from XML<->JSON. If you use it, you'll get the benefit of ease-of-manipulation you're looking for on the client with no loss of fidelity in elements/attributes.
I wouldn't encode the XML schema information in the JSON string -- that seems a little backwards. If you're going to send them JSON, they shouldn't have any inkling that this is anything but JSON. The extra XML will serve to confuse and make your interface look "leaky".
You might even consider just using XML and avoiding the additional layer of abstraction. JSON makes the most sense when you know at least one party is actually using JavaScript. If this isn't the case it'll still work as well as any other transport format, but if you already have an XML representation it's a little excessive.
On the other hand, if your customer is really using javascript it will make it easier for them to use the data. The only concern is the return trip, and once it's in JSON who do you trust more to do the conversion back to xml correctly? You're probably better qualified for that, since it's your schema.
For this to work you would need to build additional logic/data into your serialize/unserialize methods - probably create something like "attributes" and "data" to hold the different parts:
{"customermodel:Customer":
{
"attributes": {"xmlns:xsi":"...", "xmlns:customermodel":"..."},
"data":
{
"bankAccounts":
{
"attributes": { ... }
"data" :
{
"count":0,
"bankAddressLine":"..."
}
}
}
}
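A minimal TypeScript sketch of that serialize direction (assuming a browser DOMParser; elementToJson and the sample string are illustrative names, not an existing API):

type Json = string | Json[] | { attributes: Record<string, string>; data: Record<string, Json> };

// Keep attributes and child elements in separate buckets, so the reverse
// conversion knows what must become an attribute again and what stays an element.
function elementToJson(el: Element): Json {
  const attributes: Record<string, string> = {};
  for (const attr of Array.from(el.attributes)) {
    attributes[attr.name] = attr.value;              // includes xmlns:* declarations and xsi:type
  }
  const data: Record<string, Json> = {};
  for (const child of Array.from(el.children)) {
    const value =
      child.children.length > 0 || child.attributes.length > 0
        ? elementToJson(child)
        : (child.textContent ?? "");
    if (!(child.tagName in data)) {
      data[child.tagName] = value;                   // first occurrence
    } else {
      const existing = data[child.tagName];
      data[child.tagName] = Array.isArray(existing) ? [...existing, value] : [existing, value];   // repeated element -> array
    }
  }
  return { attributes, data };
}

const xml =
  '<customermodel:Customer xmlns:customermodel="http://customermodel" id="1" name="Joe">' +
  '<bankAccounts accountNo="10"><count>0</count><bankAddressLine>HSBC</bankAddressLine></bankAccounts>' +
  '</customermodel:Customer>';
const doc = new DOMParser().parseFromString(xml, "application/xml");
console.log(JSON.stringify({ [doc.documentElement.tagName]: elementToJson(doc.documentElement) }, null, 2));

The unserialize direction then puts everything under "attributes" back onto the element and turns everything under "data" back into child elements, so the round-tripped document can be validated against the XSD again.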