What is MFL? Why is it used? Does it actually convert data from one encoding to another?
According to Oracle:
A Message Format Language (MFL)
document is a specialized XML document
used to describe the layout of binary
data. It is an Oracle proprietary
language used to define rules to
transform formatted binary data into
XML data.
Related
We have a program that accepts as data XML, JSON, SQL, OData, etc. For the XML we use Saxon and its XPath support and that works fantastic.
For JSON we use the jsonPath library which is not as powerful as XPath 3.1. And jsonPath is a little squirrelly in some corner cases.
So... what if we convert the JSON we get to XML and then use Saxon? Are there limitations to that approach? Are there JSON constructs that won't convert to XML, like anonymous arrays?
The headline question: The json-to-xml() function in XPath 3.1 is lossless, except that by default, characters that are invalid in XML (such as NUL, or unpaired surrogates) are replaced by a SUB character -- you can change this behaviour with the option escape=true.
The losslessness has been achieved at some cost in convenience. For example, JSON property names are not translated to XML element or attribute names, but rather to values of the key attribute.
Lots of different people have come up with lots of different conversions of JSON to XML. As already pointed out, the XPath 3.1 and the XSLT 3.0 spec have a loss-less, round-tripping conversion with json-to-xml and xml-to-json that can handle any JSON.
There are simpler conversions that handle limited sets of JSON, the main problem is how to represent property names of JSON that don't map to XML names e.g. { "prop 1" : "value" } is represented by json-to-xml as <string key="prop 1">value</string> while conversions trying to map the property name to an element or attribute name either fail to create well-formed XML (e.g. <prop 1>value</prop 1>) or have to escape the space in the element name (e.g. <prop_1>value</prop_1> or some hex representation of the Unicode of the space inserted).
In the end I guess you want to select the property foo in { "foo" : "value" } as foo which the simple conversion would give you; in XPath 3.1 you would need ?foo for the XDM map or fn:string[#key = 'foo'] for the json-to-xml result format.
With { "prop 1" : "value" } the latter kind of remains as fn:string[#key = 'prop 1'], the ? approach needs to be changed to ?('prop 1') or .('prop 1'). Any conversion that has escaped the space in an element name requires you to change the path to e.g. prop_1.
There is no ideal way for all kind of JSON I think, in the end it depends on the JSON formats you expect and the willingness or time of users to learn a new selection/querying approach.
Of course you can use other JSON to XML conversions than the json-to-xml and then use XPath 3.1 on any XML format; I think that is what the oXygen guys opted for, they had some JSON to XML conversion before XPath 3.1 provided one and are mainly sticking with it, so in oXygen you can write "path" expressions against JSON as under the hood the path is evaluated against an XML conversion of the JSON. I am not sure which effort it takes to indicate which JSON values in the original JSON have been selected by XPath path expressions in the XML format, that is probably not that easy and straightforward.
It appears that the standard Apache NiFi readers/writers can only parse JSON input based on Avro schema.
Avro schema is limiting for JSON, e.g. it does not allow valid JSON properties starting with digits.
JoltTransformJSON processor can help here (it doesn't impose Avro limitations to how the input JSON may look like), but it seems that this processor does not support batch FlowFiles. It is also not based on the readers and writers (maybe because of that).
Is there a way to read arbitrary valid batch JSON input, e.g. in multi-line form
{"myprop":"myval","12345":"12345",...}
{"myprop":"myval2","12345":"67890",...}
and transform it to other JSON structure, e.g. defined by JSON schema, and e.g. using JSON Patch transformation, without writing my own processor?
Update
I am using Apache NiFi 1.7.1
Update 2
Unfortunately, #Shu's suggestion did work. I am getting same error.
Reduced the case to a single UpdateRecord processor that reads JSON with numeric properties and writes to a JSON without such properties using
myprop : /data/5836c846e4b0f28d05b40202
mapping. Still same error :(
it does not allow valid JSON properties starting with digits?
This bug NiFi-4612 fixed in NiFi-1.5 version, We can use AvroSchemaRegistry to defined your schema and change the
Validate Field Names
false
Then we can have avro schema field names starting with digits.
For more details refer to this link.
Is there a way to read arbitrary valid batch JSON input, e.g. in multi-line form?
This bug NiFi-4456 fixed in NiFi-1.7, if you are not using this version of NiFi then we can do workaround to create an array of json messages with ,(comma delimiter) by using.
Flow:
1.SplitText //split the flowfile with 1 line count
2.MergeRecord //merge the flowfiles into one
3.ConvertRecord
For more details regards to this particular issues refer to this link(i have explained with the flow).
What is more correct? Say that JSON is a data structure or a data format?
It's almost certainly ambiguous and depends on your interpration of the words.
The way I see it:
A datastructure, in conventional Comp Sci / Programming i.e. array, queue, binary tree usually has a specific purpose. Json can be pretty much be used for anything, hence why it's a data-format. But both definitions make sense
In my opinion both is correct.
JSON is also a data format (.json) and also a data strucuture which you can use for instance in Java etc.
But more correct is data strucutre.
JSON (canonically pronounced /ˈdʒeɪsən/ jay-sən;[1] sometimes
JavaScript Object Notation) is an open-standard format that uses
human-readable text to transmit data objects consisting of
attribute–value pairs. It is the most common data format used for
asynchronous browser/server communication (AJAJ), largely replacing
XML which is used by AJAX.
Source: Wikipedia
I am doing XML to HTML Transformation and I need to convert some character entities. My XML file has Unicode values like è which I need to convert to its corresponding html value è. Other entities also need to be converted respectively. Character mapping for each and every entity can be quiet difficult as there are many.
I am using XSLT 2.0. My output method is xhtml. And currently I am getting the actual characters (in the above case è) in my HTML code. Need help. My Saxon Processor version is 9.1.0.5.
With normal XSLT processing Saxon will simply use an XML parser like Xerces or the version of Xerces the Sun/Oracle JRE comes with and once the parser has done its work and Saxon operates on its tree model there is no way to know whether the original input had a literal character like è or a decimal character references like è or a hexadecimal one like è.
And when serializing the result tree of a transformation you can of course use a character map to map characters to any representation you want but that will then happy of any è in the result tree, not only for ones resulting from hexadecimal character references in the input.
If you want to make sure all non-ASCII characters are serialized as character references then you need to use xsl:output encoding="US-ASCII".
Saxon 9.1 also provides http://saxonica.com/documentation9.1/extensions/output-extras/character-representation.html to control the format.
But I agree with the comments made, these days having UTF-8 as the output encoding and then simply literal characters in the serialization of the result tree should not pose any problems.
I have MS Access 2010 forms linking to a mySQL5 (utf8) database.
I have the following data stored in a varchar field:
"Jarosław Kot"
MS Access is just display this raw, as opposed to converting it to:
Jarosław Kot
Can anyone offer assistance?
Thanks Paul
The notation ł is a character reference in SGML, HTML, and XML. There is in general no reason to expect any software to treat it as anything but a literal of six characters “&”, “#”, etc., unless the software is interpreting the data as SGML, HTML, or XML.
So if you have data stored so that ł should be interpreted as a character reference, then you should convert the data, statically or dynamically. The specifics depend on what actual data there is—for example, do all the constructs use decimal notation (not hexadecimal), and is it certain that all numbers are to be interpreted as Unicode numbers for characters?
If I understand you correctly you can use Replace function:
Replace("Jarosław Kot", "ł", "ł")
Assuming that your mySQL database character set is effectively set to UTF8 and that all inserts and updates are utf8 compatible (I do not know that much about mySQL, but SQL Server has some specific syntax rules for utf8-compliant data...), you could then convert the available HTML data to plain UTF8 data.
You will not have any problem finding some conversion table (here for example), and, if you are lucky, you could even find a conversion function...