I have an XML file and its XSD. I am using Apache NiFi to convert the XML to JSON. Because the data is nested many levels deep, I want to check that the conversion is correct, and I would like to do that using the XSD in Apache NiFi.
I cannot share the data itself because it is company-sensitive.
Is there any processor or script that I can use? One option is to write a Python script in a processor called ExecuteScript.
Thanks in advance
There are two parts to your question.
Can JSON be validated via XSD?
Does NiFi have a processor that validates JSON via XSD?
The first part already is answered here:
Validate JSON against XML Schema (XSD)
Now for the second part: whichever solution you end up going with, none of them is implemented as a NiFi processor, and ExecuteScript will not work for you because these solutions require imported non-native modules. Instead, you would need to write your own custom processor in Java and import it into NiFi. That would solve your problem, but it is labor intensive.
Alternatively, you could try a reverse conversion back to XML into an attribute and then validate that attribute content against the original XSD. This is a method I use a lot when writing unit tests. I haven't personally tried this in NiFi, but it sounds possible and would likely be the least complicated solution.
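The reverse-conversion idea can be sketched outside NiFi as follows. This is a minimal Python illustration, assuming a simple mapping convention that is not taken from the question (object keys become child elements, arrays repeat the element, scalars become text). The helper names and the sample `order` document are hypothetical. The resulting XML string is what you would then hand to an XSD validator together with the original schema.

```python
import json
import xml.etree.ElementTree as ET


def json_to_element(name, value):
    """Build an XML element named `name` from a decoded JSON value.

    Assumes every JSON key is also a valid XML element name.
    """
    elem = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            if isinstance(child, list):
                # Assumed convention: an array repeats the element once per item.
                for item in child:
                    elem.append(json_to_element(key, item))
            else:
                elem.append(json_to_element(key, child))
    elif isinstance(value, list):
        # A list with no key of its own (top level or nested) gets <item> children.
        for item in value:
            elem.append(json_to_element("item", item))
    elif isinstance(value, bool):
        elem.text = "true" if value else "false"
    elif value is not None:
        elem.text = str(value)
    return elem


def json_to_xml(json_text, root_name):
    """Convert a JSON document to an XML string under the assumed convention."""
    root = json_to_element(root_name, json.loads(json_text))
    return ET.tostring(root, encoding="unicode")


sample = '{"id": 7, "line": [{"sku": "A"}, {"sku": "B"}], "paid": true}'
print(json_to_xml(sample, "order"))
```

Whether the rebuilt XML matches the original XSD depends on the forward conversion being reversible. Attributes and namespaces, for instance, are not recovered by this sketch.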
I came across this service from Stack Overflow:
https://api.stackexchange.com/2.3/questions?fromdate=1519862400&todate=1522368000&order=desc&sort=activity&site=stackoverflow&tagged=python
I believe the source is a database. How do I build XML that gives me the data in a similar format?
I use the lines below to go from XML to JSON:
xmldoc.Load(xmlFileName);
string json = Newtonsoft.Json.JsonConvert.SerializeXmlNode(xmldoc);
Any recommendation on how to build the XML, which is the reverse process? My solutions depend heavily on XML and flat files.
According to https://api.stackexchange.com/docs the StackExchange API only supports JSON output, not XML. So you will have to convert the JSON to XML.
My own preference is to do this conversion "by hand" using XSLT 3.0 rather than using a standard library, because standard libraries often give you XML that's rather difficult to work with.
Is there a tool like Google's Protobuf for JSON? I know you can convert from a Protobuf format to JSON but that requires a whole lot of extra serialization/deserialization, and I was wondering if there is some kind of tool that lets you specify the structure of a JSON message and then automatically generates libraries for use in a specified language (direct serialization/deserialization not just a wrapper around Protobuf's JSON formatter class)
I know nearly all languages provide their own in-house way of handling JSON, and many higher-level ones even let you avoid the boilerplate parsing code, but I was looking for a universal tool where you would only need to specify the format once, and then just get the generated libraries for use in multiple languages.
The Protobuf equivalent would be JSON-Schema, but you still depend on a serializer or code generator being available for each language, just as you do with Protobuf.
If you're looking at making a REST-API, then OpenAPI Spec + swagger-codegen could be an option.
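To make the "specify the format once" idea concrete, here is a small sketch of a JSON-Schema document together with a toy checker. The schema (`MESSAGE_SCHEMA`) and the `check` helper are hypothetical and cover only a tiny subset of the specification (`type`, `required`, `properties`, `items`). In practice you would rely on a full validator or a code generator for your language rather than hand-rolled code like this.

```python
# Hypothetical message format, written once as a JSON-Schema document.
MESSAGE_SCHEMA = {
    "type": "object",
    "required": ["id", "tags"],
    "properties": {
        "id": {"type": "integer"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}

# Mapping from JSON-Schema type names to Python types (subset only).
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def check(value, schema, path="$"):
    """Return a list of error strings; an empty list means the value conforms."""
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(value, _TYPES[expected])
        # bool is a subclass of int in Python, but not a JSON number.
        if isinstance(value, bool) and expected in ("integer", "number"):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors.extend(check(value[key], sub_schema, f"{path}.{key}"))
    if isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            errors.extend(check(item, schema["items"], f"{path}[{index}]"))
    return errors


print(check({"id": 1, "tags": ["a", "b"]}, MESSAGE_SCHEMA))
print(check({"id": "1"}, MESSAGE_SCHEMA))
```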
Morning!
I've got an app with a config file that's become unwieldy - many switches with no intuition as to which combinations are valid. Right now, all the switches are stored in an XML file. The config file specifies inputs for a large HPC job.
I'm thinking of writing a formal grammar for a run, that is, a description of the combinations that are acceptable; from parsing it, the switches needed would be inferred automatically. The values would still be read from the XML file, but only when needed.
Is this sort of approach reasonable? How would I go about implementing a grammar without a parser?
If I understand you correctly, you want to implement a Domain Specific Language (DSL), the purpose of which is to specify validation rules for the contents of an XML-based configuration file.
Some people implement a DSL by defining a parser specific to the needs of the DSL. However, some other people shoehorn the semantics of their DSL into the syntax of an existing file format, such as XML or JSON. So if you want to avoid having to write a parser, you could express your DSL in XML syntax.
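As a rough illustration of expressing such a DSL in XML syntax, the sketch below stores "if this switch has this value, that switch must be present" rules in an XML document and checks a configuration against them. The rule vocabulary (`rule`, `when`, `requires`) and the switch names are invented for this example and are not from the question. An off-the-shelf XML parser does all the parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file: each rule says that a switch value needs another switch.
RULES_XML = """
<rules>
  <rule when="solver=implicit" requires="preconditioner"/>
  <rule when="output=checkpoint" requires="checkpoint_dir"/>
</rules>
"""


def load_rules(xml_text):
    """Parse the rule document into (switch, value, required_switch) tuples."""
    rules = []
    for rule in ET.fromstring(xml_text).findall("rule"):
        switch, _, value = rule.get("when").partition("=")
        rules.append((switch, value, rule.get("requires")))
    return rules


def violations(config, rules):
    """List every rule whose condition holds but whose required switch is absent."""
    problems = []
    for switch, value, required in rules:
        if config.get(switch) == value and required not in config:
            problems.append(f"{switch}={value} requires {required}")
    return problems


rules = load_rules(RULES_XML)
print(violations({"solver": "implicit"}, rules))
print(violations({"solver": "implicit", "preconditioner": "ilu"}, rules))
```

The same rule list also tells you which switches are actually needed for a given run, so values can be read from the existing XML config only when a rule asks for them.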
Following is my requirement:
Application A creates a JSON document from its Java Beans and sends it to my application.
I have to take this JSON, convert it into XML (the XSD for this is completely different from my JSON structure) and send it to Application B.
Solution 1) I am currently converting this JSON to XML using the json.org library. Then, using Apache Xalan and an XSL stylesheet, I convert that to the XML format required by App B.
Solution 2) Convert this JSON to a Java Bean (JB1), then convert JB1 to another Java Bean (JB2) matching the XML structure required by Application B, then convert JB2 to XML for App B.
Solution 3) Use Apache Xalan and Xerces to parse the input JSON and build the XML in Java itself, without using XSL.
Which is the better approach (in terms of code simplicity and throughput)? As the JSON becomes more complex, is solution 1 still easy to use? Please suggest a better approach if there is one other than these 3.
XSLT 3.0 offers a built-in json-to-xml() function. Once you have the XML, you can easily transform it to your required format. It is implemented in Saxon 9.7 (PE or higher) and I believe in Exselt.
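For orientation, json-to-xml() returns elements named map, array, string, number, boolean and null in the http://www.w3.org/2005/xpath-functions namespace, with a key attribute on each member of a map. The Python sketch below imitates that output shape so you can see what a stylesheet would be matching against. It is an illustration only, not a substitute for an XSLT 3.0 processor; `to_fn_xml` is a hypothetical helper name and the sample document is made up.

```python
import json
import xml.etree.ElementTree as ET

FN_NS = "http://www.w3.org/2005/xpath-functions"
ET.register_namespace("fn", FN_NS)  # only affects the prefix used when serializing


def to_fn_xml(value, key=None):
    """Mirror the element vocabulary that json-to-xml() produces."""
    if isinstance(value, dict):
        tag = "map"
    elif isinstance(value, list):
        tag = "array"
    elif isinstance(value, bool):  # must be tested before int
        tag = "boolean"
    elif value is None:
        tag = "null"
    elif isinstance(value, (int, float)):
        tag = "number"
    else:
        tag = "string"

    elem = ET.Element(f"{{{FN_NS}}}{tag}")
    if key is not None:
        elem.set("key", key)  # members of a map carry their key as an attribute

    if tag == "map":
        for child_key, child in value.items():
            elem.append(to_fn_xml(child, child_key))
    elif tag == "array":
        for item in value:
            elem.append(to_fn_xml(item))
    elif tag in ("boolean", "number"):
        elem.text = json.dumps(value)  # "true"/"false" or the numeric literal
    elif tag == "string":
        elem.text = value
    return elem


doc = json.loads('{"name": "widget", "sizes": [1, 2.5], "active": true, "note": null}')
print(ET.tostring(to_fn_xml(doc), encoding="unicode"))
```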
Solution 1: Yes. This is the conventional and best path for both simple and complex JSON and simple or complex targeted XML.
Solution 2: No. There's no reason to introduce Java Beans as an intermediate form, especially if you have no other need for Java Beans. This option unnecessarily introduces transformational and marshalling complexity.
Solution 3: No. Neither Xalan nor Xerces is designed to parse JSON; they are designed to parse XML.
There are sample programs that will map a JSON document into an equivalent XML document and back; I wrote one as a demo for Liberty's support of json-p (javax.json), using an XML vocabulary I called JinX (JSON in XML). That could be used as a pre/post processor wrapped around XSLT, if desired.
Better solutions are possible -- redefine XSLT to operate on JSON trees, for example -- but would take a bit more work.
JSON is, pure-and-simple, "a communications protocol." In other words, "it specifically exists(!) to allow 'arbitrary (JavaScript) data structures' to be conveyed between some-client and some-host," over "the HTTP(S) protocol."
Therefore: "it is not(!) XML," and therefore must never be considered to be "appropriate input to XSLT!"
"Thou shalt not mix Apples and Oranges!"
If you wish to apply "XSLT" technologies to a "JSON-derived" input (which is, by definition, "a data structure ...") then you must first, and "by whatever suitable means," convert that data structure into XML.
I'm creating a web application and I want to load a JSON file into a visualization library. The problem is that the JSON file needs to be in a certain format.
I'm using Jena to get data into a JSON file that is in the Talis format. How can I get the data written in a custom format? Is it easier to first get the data in Talis format and then transform it, or to get it in the desired form from the beginning?
I'd appreciate every possible help!
You don't say how you are serving your data to the client-side JavaScript application. I'm going to take a guess, and assume you are using Jena Fuseki to serve the data. If that's not a correct guess, you'll need to update the question to be more precise about your setup.
I don't think that Fuseki currently supports pluggable writers. So your best solution would be to apply a transformation in the client-side JavaScript to turn the JSON you get from the server into the format that's needed by the visualisation library. I've done this myself in a number of rich-client applications that consume RDF data. I usually find that I would need to apply client-side transform code in any case - often it's not just a difference in the format of the JSON, but also that you need to project some slice or aggregation of the data that's just easier to express in JavaScript rather than in SPARQL or equivalent.
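The kind of reshaping described above might look like the following. It is written in Python here purely as a sketch; in the browser the same logic would be a few lines of JavaScript. The input follows the Talis RDF/JSON layout (subject, then predicate, then a list of objects with "type" and "value"), while the nodes/links target shape and the `talis_to_graph` name are assumptions, since the question does not say which format the visualization library expects.

```python
def talis_to_graph(talis):
    """Reshape Talis RDF/JSON into a hypothetical nodes/links structure."""
    nodes = {}
    links = []
    for subject, predicates in talis.items():
        nodes.setdefault(subject, {"id": subject})
        for predicate, objects in predicates.items():
            for obj in objects:
                if obj["type"] == "literal":
                    # Literals become properties on the subject node.
                    nodes[subject].setdefault("props", {})[predicate] = obj["value"]
                else:
                    # URIs and blank nodes become nodes joined by a labelled link.
                    target = obj["value"]
                    nodes.setdefault(target, {"id": target})
                    links.append({"source": subject, "target": target, "label": predicate})
    return {"nodes": list(nodes.values()), "links": links}


# Made-up sample data in Talis RDF/JSON form.
sample = {
    "http://example.org/a": {
        "http://xmlns.com/foaf/0.1/name": [{"type": "literal", "value": "Alice"}],
        "http://xmlns.com/foaf/0.1/knows": [{"type": "uri", "value": "http://example.org/b"}],
    }
}
print(talis_to_graph(sample))
```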