How to convert XML to JSON with a new JSON structure in NiFi? - json

I receive different XML documents from web services. I want to convert this XML to JSON, but the structure must change.
For example, I have an XML structure like this:
<root>
<A attr="attr1">VAL</A>
<B attr="attr2">VAL</B>
</root>
And this is the JSON result I want:
{
"root":{
"Items":[
{
"tag_name":"A",
"attr":"attr1",
"value":"VAL"
},
{
"tag_name":"B",
"attr":"attr2",
"value":"VAL"
}
]
}
}
How can I do this in NiFi? With ConvertRecord or UpdateRecord? Also, if record-based processors can be used, what should the read and write schemas look like?

You can do this with a pure NiFi flow in two steps:
1. Convert the XML to JSON with a ValidateRecord processor. You must define the schema of the JSON, so this step also checks that the input data is valid.
2. Modify the JSON structure with the JoltTransformJSON processor.
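To make the target of the second step concrete, here is a minimal standalone Python sketch of the reshaping itself. It is not NiFi configuration and not a Jolt spec; the function name `reshape` is hypothetical, and the sketch assumes every child of the root element is a simple tag with attributes and text, as in the question.

```python
import json
import xml.etree.ElementTree as ET


def reshape(xml_text):
    """Turn each child of the root element into an entry of an "Items" list."""
    root = ET.fromstring(xml_text)
    items = []
    for child in root:
        item = {"tag_name": child.tag}   # the element name becomes a value
        item.update(child.attrib)        # attributes are copied as-is (attr="...")
        item["value"] = child.text       # the element text becomes "value"
        items.append(item)
    return {root.tag: {"Items": items}}


xml_text = '<root><A attr="attr1">VAL</A><B attr="attr2">VAL</B></root>'
result = reshape(xml_text)
print(json.dumps(result, indent=2))
```

Whatever processor performs the restructuring has to do the same thing: move each tag name into a field and collect the children into one array.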

Related

ConvertRecord processor issue with XML to JSON conversion Apache Nifi

I have a requirement where I am converting data from an API to JSON format. The output of the API is initially in XML, so I am using the XMLReader controller service to read the XML data and the JsonRecordSetWriter controller service to convert it to JSON format in Apache NiFi 1.9.2.
When I use the ConvertRecord processor with these controller services, my output merely shows the fields of the Avro schema and not the expected data. I have tried many options, such as using the AvroSchemaRegistry controller service, but only the schema fields appear and null values are passed. Can anyone explain this behavior?
XML flowfile output:
<field1 value="AAAA"/>
<field2 value="BBBB"/>
<field3 value="male"/>
JSON output:
[ {
"field1" : null,
"field2" : null,
"field3" : null
} ]
The documentation specifies that "Records are expected in the second level of XML data, embedded in an enclosing root tag." Your input file appears to be a list of XML tags with no root tag enclosing them.
You could use ReplaceText to wrap the XML in a root tag, then the XMLReader should parse the fields as expected.
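The effect of wrapping can be seen outside NiFi with a small Python sketch. It only illustrates why a bare list of tags is not a parseable XML document; it does not model how XMLReader maps fields or attributes to a schema, and the helper name `parses` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The flowfile content from the question: three sibling tags, no root.
fragment = '<field1 value="AAAA"/>\n<field2 value="BBBB"/>\n<field3 value="male"/>'


def parses(text):
    """Return True if text is a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Equivalent of the ReplaceText step: enclose the content in a root tag.
wrapped = "<root>" + fragment + "</root>"

# Once wrapped, each field is an addressable child of the root.
fields = {child.tag: child.get("value") for child in ET.fromstring(wrapped)}

print(parses(fragment), parses(wrapped))
print(fields)
```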

How can I customize the conversion of JSON to XML?

I have a JSON object such as { name: "Tyler", age: 10, dogName: "Spot", dogAge: "40" }
Using NiFi, I want to convert it to XML in a format similar to
<person>
<name>Tyler</name>
<age>10</age>
<dog>
<dogName>Spot</dogName>
<dogAge>40</dogAge>
</dog>
</person>
I am using a ConvertRecord processor. I am using JsonTreeReader for the Record Reader and XMLRecordSetWriter for the Record Writer. I am able to read in the JSON just fine. Is there a way to customize XMLRecordSetWriter to be able to output the xml in a specific format? Right now all I can do is turn the above json object into the following:
<name>Tyler</name>
<age>10</age>
<dogName>Spot</dogName>
<dogAge>40</dogAge>
It just directly converts JSON to XML. Is there a way to customize this? Is there an alternative to XMLRecordSetWriter that I could use?

Using NiFi to ingest json data in HBase

I'm trying to write a pretty simple XML file stored in HDFS to HBase. I'd like to transform the XML file into JSON format and create one row in HBase for each element of the JSON array. The XML structure is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<customers>
<customer customerid="1" name="John Doe"></customer>
<customer customerid="2" name="Tommy Mels"></customer>
</customers>
And these are the desired HBase output rows:
1 {"customerid"="1","name"="John Doe"}
2 {"customerid"="2","name"="Tommy Mels"}
I've tried out many different processors for my flow but this is what I've got now: GetHDFS -> ConvertRecord -> SplitJson -> PutHBaseCell. The ConvertRecord is working fine and is converting the XML file to json format properly but I can't manage to split the json records into 2. See following what I've managed to write in HBase so far (with a different processors combination):
c5927a55-d217-4dc1-af04-0aff743 column=person:rowkey, timestamp=1574329272237, value={"customerid":"1","name":"John Doe"}\x0A{
cfe4e "customerid":"2","name":"Tommy Mels"}
For the SplitJson processor I'm using the following JsonPath expression: $.*
As of now, I'm getting an IllegalArgumentException in the PutHBaseCell processor stating that the Row length is 0, see following the PutHBaseCell processor settings:
Any hints?
I think the issue is that SplitJson isn't working properly since technically the content of your flow file is multiple json documents, with one per-line. I think SplitJson would expect them to be inside an array like:
[
{"customerid":"1","name":"John Doe"},
{"customerid":"2","name":"Tommy Mels"}
]
One option is to use SplitRecord with a JsonTreeReader, which should be able to understand the JSON-per-line format.
Another option is to avoid splitting altogether and go from ConvertRecord -> PutHBaseRecord with a JsonTreeReader.
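The difference between one JSON document and one JSON document per line can be shown with a short Python sketch. It is an illustration of the parsing problem only, not of SplitRecord or PutHBaseRecord; the helper names are hypothetical, and using `customerid` as the row key is an assumption taken from the desired output in the question.

```python
import json

# Flowfile content as written by the record writer: one JSON object per line.
content = '{"customerid":"1","name":"John Doe"}\n{"customerid":"2","name":"Tommy Mels"}'


def is_single_json_document(text):
    """Return True if the whole text parses as exactly one JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def split_records(text):
    """Parse each non-empty line as its own JSON record."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = split_records(content)

# One row per record, keyed by customerid (assumed row key).
rows = {rec["customerid"]: json.dumps(rec) for rec in records}

print(is_single_json_document(content))
for row_id, value in rows.items():
    print(row_id, value)
```

A JSON path such as $.* assumes the first case, which is why a line-aware reader is needed for the second.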

Is there any tool to flatten and convert JSON schema or object in JSON format to display in plain object notation

I need to convert an object in JSON format, or a JSON schema, such as the following:
{
"ArrayofObjects":[
{
"Item":{
"property":""
}
}
]
}
I want to convert and write it as:
ArrayofObjects[].Item.property
That way I can represent the structure of the object in one line and refer to a specific property.
I actually have two huge JSON schemas that I want to compare, and I want to describe the relationship between them. I think it would be more convenient to show them side by side in the ArrayofObjects[].Item.property format.
Is there any tool that can achieve the same?
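For a JSON instance, the notation described above can be produced with a short recursive walk. The following Python sketch is one possible approach, not an existing tool; the function name `flatten_paths` is hypothetical. It assumes the input is a parsed JSON object, not a JSON Schema document, whose `properties` and `items` keywords would need separate handling.

```python
def flatten_paths(node, prefix=""):
    """Return the dotted paths of all leaf properties; arrays are shown as []."""
    if isinstance(node, dict):
        paths = []
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else key
            paths.extend(flatten_paths(value, child))
        return paths
    if isinstance(node, list):
        paths = []
        for element in node:
            # Every element shares the same "[]" marker, so duplicates are dropped.
            for path in flatten_paths(element, prefix + "[]"):
                if path not in paths:
                    paths.append(path)
        # An empty array still gets a path so that it stays visible.
        return paths or [prefix + "[]"]
    return [prefix]


doc = {"ArrayofObjects": [{"Item": {"property": ""}}]}
for path in flatten_paths(doc):
    print(path)
```

Running the function on both documents and comparing the two sorted lists of paths gives the side-by-side view.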

Avro Schema: force to interpret value (map, array) as string

I want to convert JSON to Avro via NiFi. Unfortunately the JSON has complex types as values that I want to see as a simple string!
JSON:
"FLAGS" : {"FLAG" : ["STORED","ACTIVE"]}
How can I tell Avro to simply store a value such as {"FLAG" : ["STORED","ACTIVE"]} or [1,2,3,"X"] as a string?
Thank you sincerely!
The JSON to Avro conversion performed in NiFi's ConvertJSONToAvro processor does not really do transformation in the same step. There is a very limited ability to transform based on the Avro schema, mostly omitting input data fields in the output. But it won't coerce a complex structure to a string.
Instead, you should do a JSON-to-JSON transformation first, then convert your summarized JSON to Avro. I think what you are looking for is a structure like this:
{
"FLAGS": "{\"FLAG\":[\"STORED\",\"ACTIVE\"]}"
}
NiFi's JoltTransformJSON or ExecuteScript processors are well suited to this. If your records are simple enough, a combination of EvaluateJsonPath (extracting $.FLAGS into an attribute named flags) and ReplaceText (with the replacement { "FLAGS": "${flags:escapeJson()}" }) may be enough.
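The JSON-to-JSON step amounts to serializing the nested value and storing the resulting text under the same key. Here is a minimal standalone Python sketch of that idea; it is not an ExecuteScript body, and the function name `stringify_field` is hypothetical.

```python
import json


def stringify_field(record, field):
    """Return a copy of record in which record[field] is replaced by its JSON text."""
    out = dict(record)
    # Compact separators avoid extra whitespace inside the stored string.
    out[field] = json.dumps(record[field], separators=(",", ":"))
    return out


record = {"FLAGS": {"FLAG": ["STORED", "ACTIVE"]}}
converted = stringify_field(record, "FLAGS")
print(json.dumps(converted))
```

After this step the field is an ordinary string, so the Avro schema can declare it with type string.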