NiFi XML to JSON null values

I am trying to run a very simple conversion flow that takes an input XML document and converts it to json using an Avro schema.
I must be doing something very wrong because I always get null values for even something simple like this:
<key1>value1</key1>
and the output is:
[{"key1":null}]
I followed this tutorial line by line here: https://pierrevillard.com/2018/06/28/nifi-1-7-xml-reader-writer-and-forkrecord-processor/
My AVRO Schema is defined as:
{
  "type" : "record",
  "name" : "MyClass",
  "namespace" : "com.test.avro",
  "fields" : [ {
    "name" : "key1",
    "type" : "string"
  } ]
}
What am I missing? The only difference I can see is that my XML has a slightly different structure than the example. In the example, the XML seems to have a structure like:
<key1 value="value1"/>
But even trying to change my input to that results in the same null values.
I see other posts on this question but no real solutions. Some of the comments on those threads suggest the XML structure is incorrect; I have changed mine to match and it is still not working. I'm sure it is something simple; any help would be appreciated.
I'm very new to NIFI, but really like the potential of the tool!

Turns out that the record reader used by NiFi's ConvertRecord does not expect the record fields at the root level of the XML document; the fields are expected to start at the second level.
So my Avro schema was correct, I just had to wrap my input XML data in a root tag, like this:
<root><key1>value1</key1></root>
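For anyone who wants to see the convention outside NiFi, here is a minimal sketch in Python (standard library only; the helper name is my own invention, not a NiFi API). It mirrors how a record-oriented XML reader treats the input: the fields of the record live one level below the root element, which is why the bare `<key1>` element produced nulls.

```python
import json
import xml.etree.ElementTree as ET

def xml_record_to_json(xml_text):
    """Parse XML whose record fields sit one level below the root,
    mirroring the convention NiFi's XML record reader expects."""
    root = ET.fromstring(xml_text)
    # Each child of the root element becomes one field of the record.
    record = {child.tag: child.text for child in root}
    return json.dumps([record])

# With the wrapper tag, key1 is a child of the root and is read as a field.
print(xml_record_to_json("<root><key1>value1</key1></root>"))
# Without the wrapper, <key1> itself would be the root element and
# there would be no field-level children to map onto the schema.
```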

Related

In Power Automate Parse JSON step I get the errors "missing required properties"

I am working on a Power Automate flow to get a JSON file from SharePoint and Parse it. On one of my previous questions I received a solution that worked with a testing JSON file. However when I ran a couple of tests with some JSON files that I need to use, the Parse JSON step gives out errors regarding "missing" required properties.
Basically, the JSON file's arrays do not always have exactly the same elements (properties). For example, the element "minimum_version" does not always appear under the element "link", and because of this the Parse JSON step reports the missing-property errors.
How can I Parse such a JSON file successfully?
Any idea or suggestion will help me get unstuck.
Thank you
You can allow null values in your Parse JSON schema by simply adding "null" to the allowed types. April Dunnam has a nice article about this approach:
https://www.sharepointsiren.com/2018/10/flow-parse-json-null-error-fix/
I assume you have something like below in your schema?
"minimum_version": {
"type": "number"
}
You can change that to this to allow null values
"minimum_version": {
"type": [
"number",
"null"
]
}
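To see why the union type matters, here is a minimal hand-rolled sketch in Python (this is not how Power Automate is implemented, just an illustration of the semantics of the JSON Schema "type" keyword):

```python
# A tiny validator for the JSON Schema "type" keyword.
# The bool exclusion matters because bool is a subclass of int in Python.
TYPE_CHECKS = {
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "null": lambda v: v is None,
}

def matches_type(value, schema_type):
    # "type" may be a single type name or a list of allowed names.
    types = schema_type if isinstance(schema_type, list) else [schema_type]
    return any(TYPE_CHECKS[t](value) for t in types)

print(matches_type(2, "number"))               # True
print(matches_type(None, "number"))            # False: the strict schema fails
print(matches_type(None, ["number", "null"]))  # True: the relaxed schema passes
```

The list form `["number", "null"]` is exactly what the answer's second snippet uses: a value is valid if it matches any of the listed types, so a missing or null property no longer fails validation.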

Manipulating (nested) JSON keys and their values using NiFi

I am currently facing an issue where I have to read a JSON file that has mostly the same structure, has about 10k+ lines, and is nested.
I thought about creating my own custom processor which reads the JSON and replaces several matching key/values to the ones needed. As I am trying to use NiFi I assume that there should be a more comfortable way as the JSON-structure itself is mostly consistent.
I already tried using the ReplaceText processor as well as the JoltTransformJson processor, but I could not figure it out. How can I transform both keys and values when needed? For example, given something like this:
{
  "id": "test"
},
{
  "id": "14"
}
It might be necessary to turn the key "id" into "Number" and map the value "test" to "3", as I am using different keys/values in my JSON files/database, so they need to match those. Is there a way of doing this without having to create my own processor?
Regards,
Steve
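One way to approach this without writing a full custom processor is a small recursive remap, sketched here in Python; in NiFi the equivalent logic could run in an ExecuteScript processor, or the key renames alone could be expressed as a Jolt spec. The mapping tables below are hypothetical stand-ins for the question's actual keys and values:

```python
import json

# Hypothetical mapping tables -- replace with the real keys/values
# your JSON files and database actually use.
KEY_MAP = {"id": "Number"}
VALUE_MAP = {"test": "3"}

def remap(node):
    """Recursively rename keys and rewrite string values in nested JSON."""
    if isinstance(node, dict):
        return {KEY_MAP.get(k, k): remap(v) for k, v in node.items()}
    if isinstance(node, list):
        return [remap(item) for item in node]
    if isinstance(node, str):
        return VALUE_MAP.get(node, node)
    return node

data = [{"id": "test"}, {"id": "14"}]
print(json.dumps(remap(data)))  # [{"Number": "3"}, {"Number": "14"}]
```

Because the function recurses through dicts and lists, the same tables apply at every nesting level, which suits a file whose structure is "mostly consistent".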

Using regex in notepad++ to reduce json objects

I have a huge Json file containing objects of the form:
{ "country" : "UK", "city" : "London" }
I want to limit the number of instances for each "country". Can I do this using regex?
I want to remove the whole block of object that contains "UK".
I tried something like ^{.*"UK".*},
(starts with {
contains "UK"
ends with },)
But this is wrong and I can't figure out the correct way.
Any help would be appreciated.
In general, you should not be using regex alone to handle JSON content, and even less so if the JSON is nested. If your JSON is always single-level as you wrote above, then the following pattern might work:
\{[^}]*"UK"[^}]*\}
Replace the above by nothing, and do the find/replace in regex mode.
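The same pattern works with any regex engine, not just Notepad++. Here is a sketch in Python with invented sample data; an optional trailing comma and newline are consumed so the deletion does not leave dangling commas between the remaining objects:

```python
import re

# Invented sample data in the single-level shape from the question.
text = '''[
{ "country" : "UK", "city" : "London" },
{ "country" : "FR", "city" : "Paris" },
{ "country" : "UK", "city" : "Leeds" }
]'''

# The suggested pattern, extended with an optional comma and newline.
stripped = re.sub(r'\{[^}]*"UK"[^}]*\},?\n?', '', text)
print(stripped)
```

Note the caveat still applies: if the last object in the array is the one removed, a trailing comma before the closing bracket can remain, which is the kind of edge case a real JSON parser handles and a regex does not.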

Convert XML document to JSON string preserving boolean types

I have XML files following an XSD and I need to transform them into JSON.
The files are typically like this
example.xml :
<object name="foo">
<values>one</values>
<values>two</values>
<values>three</values>
<param attr="2" value="true" />
</object>
Which translates into this JSON:
{
  "name" : "foo",
  "values" : [
    "one",
    "two",
    "three"
  ],
  "param" : {
    "attr" : "2",
    "value" : "true"
  }
}
This is almost fine, except that I would like the data to be typed, so that param becomes:
"param" : {
  "attr" : 2,
  "value" : true
}
The XML files reference an XSD schema that defines the data type for each element or attribute, such as
<xs:attribute name="attr" type="xs:integer" />
The XML-to-JSON transformation is done using XML::Simple to read the XML into a Perl hash, and the JSON module to encode it into JSON.
How could I do the same job but using the definitions from the XSD Schema to load the XML with the right type for each field?
I need to use the XSD because it may happen that text field are made of only numbers.
Well, the summary answer is - you can't do what you're trying to do, the way you're trying to do it.
XML is a 'deeper' structure than JSON, in that it has style sheets and attributes, so inherently you'll be discarding data in the transformation process. How much is acceptable to discard is always going to be a case by case sort of thing.
More importantly, both XML and JSON are designed for similar use cases: to be machine readable/parsable. Almost everything that can 'read' JSON programmatically can also read XML, because libraries for both are generally available.
And most importantly of all: don't use XML::Simple, as it's not "Simple", it's for "Simple" XML, which yours isn't. Both XML::Twig and XML::LibXML are much better tools for almost any XML parsing job.
So really - what you need to do is backtrack a bit, and explain what you're trying to accomplish and why.
Failing that though - I would probably try a simplistic 'type test' within perl, using regex to detect if something is 'just' numeric, or 'just' boolean, and treat everything else as string.
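As a sketch of that fallback idea (written in Python for brevity; the same regex tests port directly to Perl), types are guessed purely by pattern. Note the limitation the question already anticipates: a genuine text field that happens to contain only digits would be mis-typed as a number, which is exactly why the XSD is needed for full correctness.

```python
import json
import re
import xml.etree.ElementTree as ET

def coerce(text):
    """Simplistic type test: booleans and numbers by pattern,
    everything else stays a string."""
    if text in ("true", "false"):
        return text == "true"
    if re.fullmatch(r"-?\d+", text):
        return int(text)
    if re.fullmatch(r"-?\d+\.\d+", text):
        return float(text)
    return text

root = ET.fromstring('<object name="foo"><param attr="2" value="true"/></object>')
param = root.find("param")
typed = {k: coerce(v) for k, v in param.attrib.items()}
print(json.dumps(typed))  # {"attr": 2, "value": true}
```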

Are JSON schemas necessary for defining the structure of a JSON?

I am asking this because I see that the current JSON schema draft (http://json-schema.org/) proposes to have the schema of JSON in the following way:
for the JSON :
{
  "a": "abc",
  "b": 123
}
the schema proposed in the draft is like
{
  "type": "object",
  "properties": {
    "a": { "type": "string" },
    "b": { "type": "integer" }
  }
}
My question here is does the JSON itself not define its structure? Is a separate schema necessary?
The schema proposed by the draft validates JSON documents that have the above structure, and those documents always look like:
{
  "a": "string",
  "b": 1 (or some number)
}
So what is the need for a separate schema? We could simply use the JSON itself to define its structure.
PS. I know that we can specify some restrictions on the values that the JSON can take through the schemas proposed in the draft, but from the point of view of defining structure of a JSON, are the proposed schemas necessary?
The JSON itself does not define the structure. For example, I could write:
{
  "a": "string",
  "b": "another string"
}
That's also valid JSON - but it's "differently structured" JSON, because "b" is now a string. But your API might only accept JSON with a particular structure, so although it's valid JSON, it's not the shape you need.
Now, do you need JSON Schema to define the structure of your JSON data? No. You could instead say:
The value must be an object. It must have two properties:
"a" - must be a string
"b" - must be an integer
A programmer could understand this very easily, with no squiggly brackets or anything.
However, there are advantages to having a machine-readable description of the format, because it lets you automate various things (e.g. testing, generating documentation, generating code/classes, etc.)
Edit: As pointed out in the comments, you can take the type information from some example data, and use that as a model for other data. In this case, you're basically using your example data as a super-basic schema.
For very simple constraints (basic type), this works. However, how would you say that "b" has to be an integer instead of a float? How do you say that "b" must be > 0? How do you say that "a" must not be the empty string ("")?
There are indeed tools that generate a basic JSON Schema from example data - however, the resulting schema usually requires a bit of tweaking to actually describe the format (e.g. min/max, required/optional properties, etc.).
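To make that point concrete, here is a hand-rolled check in Python (names are illustrative) for exactly the constraints mentioned above that example data alone cannot express: "b" must be an integer greater than zero, and "a" must be a non-empty string.

```python
def validate(doc):
    """Check constraints that an example document cannot convey."""
    errors = []
    a, b = doc.get("a"), doc.get("b")
    if not isinstance(a, str) or a == "":
        errors.append('"a" must be a non-empty string')
    # Exclude bool explicitly, since bool is a subclass of int in Python.
    if not isinstance(b, int) or isinstance(b, bool) or b <= 0:
        errors.append('"b" must be an integer > 0')
    return errors

print(validate({"a": "abc", "b": 123}))  # []
print(validate({"a": "", "b": -1.5}))    # both constraints reported
```

A JSON Schema expresses the same rules declaratively (`"minimum": 1`, `"minLength": 1`, `"required": [...]`), so every consumer can enforce them without each writing its own checker like this one.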
So far, schemas have never been strictly necessary, and I don't think they will become so in the short term.
I personally like JSON because of its simplicity, portability and flexibility.
I don't see even major brands using that schema, so it seems they don't take it seriously yet.