Avro schema: force a value (map, array) to be interpreted as a string - JSON

I want to convert JSON to Avro via NiFi. Unfortunately, the JSON has complex types (maps, arrays) as values that I want to treat as simple strings!
JSON:
"FLAGS" : {"FLAG" : ["STORED","ACTIVE"]}
How can I tell Avro to simply store {"FLAG" : ["STORED","ACTIVE"]} or [1,2,3,"X"] as a string?
Thank you sincerely!

The JSON-to-Avro conversion performed by NiFi's ConvertJSONToAvro processor does not really transform the data in the same step. There is a very limited ability to transform based on the Avro schema, mostly omitting input data fields from the output, but it won't coerce a complex structure into a string.
Instead, you should do a JSON-to-JSON transformation first, then convert your simplified JSON to Avro. I think what you are looking for is a structure like this:
{
"FLAGS": "{\"FLAG\":[\"STORED\",\"ACTIVE\"]}"
}
NiFi's JoltTransformJSON and ExecuteScript processors are great for this. If your records are simple enough, maybe even a combination of EvaluateJsonPath with $.FLAGS and ReplaceText with { "FLAGS": "${flags:escapeJson()}" } would work.
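Outside of NiFi, the pre-transformation itself is small; as an illustration only (plain Python, not an ExecuteScript body), stringifying the complex value looks like this:

```python
import json

record = {"FLAGS": {"FLAG": ["STORED", "ACTIVE"]}}

# Serialize the complex value back into a JSON string so that the
# Avro schema can declare FLAGS as a plain string field.
record["FLAGS"] = json.dumps(record["FLAGS"], separators=(",", ":"))

print(record)  # {'FLAGS': '{"FLAG":["STORED","ACTIVE"]}'}
```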

Related

Read JSON data values

For example,
if the data in the Kafka topic looks like this:
{
  "header": {
    "name": "jake"
  },
  "body": {
    "Data": "!#$%&&"
  }
}
So how do I read the value "!#$%&&" from my consumer application? I need to process the data once I receive it.
You'll need to consume the data using a String Serde, a JSON Serde, or one you define yourself.
If you define your own, then you'd call value.getBody().getData(), like on any other Java object, where value is the argument to mapValues, peek, filter, etc. in the Kafka Streams DSL.
For the others, the answer depends on which JSON library you're using, but the answer isn't unique to Kafka, so read that library's documentation on parsing strings.
Here's one example of consuming using String Serde - https://github.com/confluentinc/kafka-streams-examples/blob/7.1.1-post/src/main/java/io/confluent/examples/streams/JsonToAvroExample.java#L118
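Whichever library you choose, the parsing step looks roughly the same; here is a minimal Python sketch (the field names come from the question's example):

```python
import json

# Raw message value as delivered by the consumer, as a string
raw = '{"header": {"name": "jake"}, "body": {"Data": "!#$%&&"}}'

msg = json.loads(raw)        # parse the JSON text into a dict
data = msg["body"]["Data"]   # navigate to the nested value
print(data)  # !#$%&&
```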

Extracting a JSON out of a string using JSONPath

I have Json data as follows:
{
  "template": [
    "{ \"Id\": \"abc\" }"
  ]
}
I am using JSONPath to extract data from the JSON above. I would like to extract the "Id" value using JSONPath.
The problem is that the data is treated as a string and not as JSON, as shown below:
"{ \"Id\": \"abc\" }"
If there were no double quotes, I could have used a JSONPath expression like this:
$.template[0].Id
But because of the double quotes, I cannot access the "Id" value. I suspect there is a way to access this data with a JSONPath expression, but I am pretty much a novice here and no amount of research has turned up a resolution.
How do I treat it as JSON and not as a string using JSONPath? Kindly help me out here.
JSONPath isn't going to be able to parse JSON that's encoded within a string. You need to perform three operations:
1. Get the string (using JSONPath or something else).
2. Parse the string as JSON.
3. Extract the data you're looking for from the parsed result (again with JSONPath or something else).
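The three steps above can be sketched in Python, where plain indexing stands in for the JSONPath queries:

```python
import json

doc = {
    "template": [
        '{ "Id": "abc" }'   # JSON encoded inside a string
    ]
}

# Step 1: get the string (JSONPath $.template[0] or plain indexing)
inner_text = doc["template"][0]
# Step 2: parse the string as JSON
inner = json.loads(inner_text)
# Step 3: extract the value from the parsed document
print(inner["Id"])  # abc
```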

How to convert XML to JSON with a new JSON structure in NiFi?

I get various XML documents from web services. I want to convert this XML to JSON, but the structure must be changed.
For example, I have XML structure like this;
<root>
  <A attr="attr1">VAL</A>
  <B attr="attr2">VAL</B>
</root>
And this is the JSON result I want:
{
  "root": {
    "Items": [
      {
        "tag_name": "A",
        "attr": "attr1",
        "value": "VAL"
      },
      {
        "tag_name": "B",
        "attr": "attr2",
        "value": "VAL"
      }
    ]
  }
}
How can I do this in NiFi? ConvertRecord or UpdateRecord? Also, how should the read and write schemas be defined if record-based processors are used?
You can do it with a pure NiFi flow; the steps are:
1. Convert the XML to JSON. This can be done with a ValidateRecord processor; you must define the schema of the JSON, so in this step you also check that the input data is valid.
2. Modify the JSON structure using the JoltTransformJSON processor.
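Outside of NiFi, the reshaping itself can be sketched in a few lines of Python; this is only an illustration of the target structure, not the Jolt spec you would write:

```python
import json
import xml.etree.ElementTree as ET

xml = '<root><A attr="attr1">VAL</A><B attr="attr2">VAL</B></root>'

root = ET.fromstring(xml)
# Turn each child element into a {tag_name, attr, value} object
items = [
    {"tag_name": child.tag, "attr": child.get("attr"), "value": child.text}
    for child in root
]
print(json.dumps({"root": {"Items": items}}, indent=2))
```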

Clickhouse/Kafka: reading a JSON Object type into a field

I have this kind of data in a Kafka Topic:
{..., fields: { "a": "aval", "b": "bval" } }
If I create a Kafka Engine table, I get an error when using a field definition like this:
fields String
because it (correctly) doesn't recognize it as a String:
2018.07.09 17:09:54.362061 [ 27 ] <Error> void DB::StorageKafka::streamThread(): Code: 26, e.displayText() = DB::Exception: Cannot parse JSON string: expected opening quote: (while read the value of key fields): (at row 1)
As ClickHouse does not currently have a Map or JSONObject type, what would be the best way to work over it, provided I don't know in advance the name of the inner fields ("a" or "b" in the example - so I cannot see Nested structures helping)?
Apparently, at the moment ClickHouse does not support complex JSON parsing.
From this answer in ClickHouse Github:
ClickHouse uses a quick and dirty JSON parser, which does not know how to read complex deep structures. So it can't skip that field, as it does not know where the nested structure ends.
Sorry. :/
So you should preprocess your JSON with some external tool, or you can contribute to ClickHouse and improve the JSON parser.
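As a sketch of such preprocessing (the message layout is taken from the question's example, with a hypothetical id field added to make it self-contained): re-encoding the nested object as a JSON string lets the target column be declared as a plain String.

```python
import json

# Message with a nested object that ClickHouse's parser cannot handle
raw = '{"id": 1, "fields": {"a": "aval", "b": "bval"}}'

msg = json.loads(raw)
# Re-encode the nested object as a JSON string so the Kafka Engine
# table can declare the column as `fields String`
msg["fields"] = json.dumps(msg["fields"], separators=(",", ":"))
print(json.dumps(msg))
```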

Is there any tool to flatten and convert JSON schema or object in JSON format to display in plain object notation

I have a need to convert an object in JSON format (or a JSON schema) into something like the following:
{
  "ArrayofObjects": [
    {
      "Item": {
        "property": ""
      }
    }
  ]
}
I want to convert and write it as:
ArrayofObjects[].Item.property
so that I can represent the structure in one line, conveying the shape of the object when talking about a property.
Actually, I have two huge JSON schemas that I want to compare and discuss the relation between. I thought it would be more convenient to represent them side by side in this ArrayofObjects[].Item.property format.
Is there any tool that can achieve the same?
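I don't know of a dedicated tool, but the flattening described above is a small recursive walk; here is a Python sketch that marks array elements with []:

```python
def flatten(node, prefix=""):
    """Recursively turn nested JSON into dotted path strings,
    marking arrays with []."""
    paths = []
    if isinstance(node, dict):
        for key, value in node.items():
            paths.extend(flatten(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(node, list):
        for value in node:
            paths.extend(flatten(value, prefix + "[]"))
    else:
        paths.append(prefix)  # leaf value: emit the accumulated path
    return paths

doc = {"ArrayofObjects": [{"Item": {"property": ""}}]}
print(flatten(doc))  # ['ArrayofObjects[].Item.property']
```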