How to specify a key for a Kafka producer in Apache NiFi?

I have a simple pipeline using Apache NiFi and I want to publish some messages to a Kafka topic using the existing Kafka publisher processor.
The problem is how to specify the Kafka key using the Apache NiFi expression language.
I tried something like ${message:jsonPath('$.key')} but, of course, I got an error because the object message does not exist.
I also tried to use the filename object, which is something like a default object name for input messages, but it didn't help.
With another Kafka publisher processor this is possible by setting the message key field property, but what about the PublishKafka processor?

NiFi expression language can only reference flow file attributes; it cannot directly reference the content (this is by design).
So if you want to use the value of a field from your JSON document as the key, you first need another processor such as EvaluateJsonPath to extract the value of that field into a flow file attribute.
Let's say you have a field "foo" in your JSON document. You might use EvaluateJsonPath with Destination set to "flowfile-attribute" and then add a dynamic property like:
foo = $.foo
Then in PublishKafka set the key property to ${foo}.
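A minimal property sketch, assuming a field named foo and the stock processors (exact property names can vary slightly between PublishKafka versions):

EvaluateJsonPath
    Destination = flowfile-attribute
    foo = $.foo

PublishKafka
    Kafka Key = ${foo}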
Keep in mind this only makes sense if you have a single json document per flow file, otherwise if you have multiple then it is unclear what the key is since you can only have one "foo" attribute for the flow file, but many "foo" fields in the content of the flow file.

Related

Apache NiFi: Changing Date and Time format in csv

I have a CSV which contains a column with a date and time. I want to change the format of the date-time column. The first 3 rows of my CSV look like the following.
Dater,test1,test2,test3,test4,test5,test6,test7,test8,test9,test10,test11
20011018182036,,,,,166366183,,,,,,
20191018182037,,27,94783564564,,162635463,817038655446,,,0,,
I want to change the csv to look like this.
Dater,test1,test2,test3,test4,test5,test6,test7,test8,test9,test10,test11
2001-10-18-18-20-36,,,,,166366183,,,,,,
2019-10-18-18-20-37,,27,94783564564,,162635463,817038655446,,,0,,
How is this possible?
I tried using the UpdateRecord processor, but this approach doesn't work: the data gets routed to the failure relationship of the UpdateRecord processor. Please suggest a method to complete the task.
I was able to accomplish this using the UpdateRecord processor. The expression language I used is ${field.value:toDate('yyyyMMddHHmmss'):format('yyyy-MM-dd HH:mm:ss')}.
This alone didn't work: every time, the data was routed to the failure path of the UpdateRecord processor.
To fix this error I changed the configuration of the CSVRecordSetWriter: the Schema Access Strategy must be changed to Use String Fields From Header (by default it is Use Schema Name Property).
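A sketch of the UpdateRecord dynamic property, assuming the date column is named Dater as in the sample CSV and that Replacement Value Strategy is set to Literal Value (the format pattern here matches the output requested in the question; adjust it as needed):

UpdateRecord
    Replacement Value Strategy = Literal Value
    /Dater = ${field.value:toDate('yyyyMMddHHmmss'):format('yyyy-MM-dd-HH-mm-ss')}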
Strategy: use UpdateRecord to manipulate the timestamp value using expression language:
${field.value:toDate():format('ddMMyyyy')}
Flow: GenerateFlowFile -> UpdateRecord.
Set up the reader and writer to inherit the record schema and include the header line; leave the other properties untouched.
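A property sketch of that flow (controller-service and property names vary by NiFi version; the /Dater path assumes the column name from the question):

GenerateFlowFile
    Custom Text = the sample CSV from the question
UpdateRecord
    Record Reader = CSVReader (Treat First Line as Header = true)
    Record Writer = CSVRecordSetWriter (Include Header Line = true)
    Replacement Value Strategy = Literal Value
    /Dater = ${field.value:toDate():format('ddMMyyyy')}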
However, this solution might not satisfy you because of a strange problem. When you format the date like this:
${field.value:toDate():format('dd-MM-yyyy')}
the flow file is routed to the failure relationship.
Type coercion does not work properly. Maybe it is a bug; I could not find a solution for this problem.

NiFi non-Avro JSON Reader/Writer

It appears that the standard Apache NiFi readers/writers can only parse JSON input based on an Avro schema.
The Avro schema is limiting for JSON; e.g. it does not allow valid JSON properties starting with digits.
The JoltTransformJSON processor can help here (it doesn't impose Avro limitations on how the input JSON may look), but it seems that this processor does not support batch FlowFiles. It is also not based on the readers and writers (maybe because of that).
Is there a way to read arbitrary valid batch JSON input, e.g. in multi-line form
{"myprop":"myval","12345":"12345",...}
{"myprop":"myval2","12345":"67890",...}
and transform it to another JSON structure, e.g. one defined by a JSON schema, e.g. using a JSON Patch transformation, without writing my own processor?
Update
I am using Apache NiFi 1.7.1
Update 2
Unfortunately, @Shu's suggestion did not work; I am getting the same error.
I reduced the case to a single UpdateRecord processor that reads JSON with numeric properties and writes JSON without such properties, using the
myprop : /data/5836c846e4b0f28d05b40202
mapping. Still the same error :(
it does not allow valid JSON properties starting with digits?
This bug (NIFI-4612) was fixed in NiFi 1.5. You can use an AvroSchemaRegistry to define your schema and change its Validate Field Names property to false. Then you can have Avro schema field names starting with digits.
For more details refer to this link.
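A rough controller-service sketch, assuming NiFi 1.5+; the schema name mySchema and its fields are illustrative only:

AvroSchemaRegistry
    Validate Field Names = false
    mySchema = {"type":"record","name":"rec","fields":[{"name":"12345","type":"string"},{"name":"myprop","type":"string"}]}

JsonTreeReader / JsonRecordSetWriter
    Schema Access Strategy = Use 'Schema Name' Property
    Schema Registry = AvroSchemaRegistry
    Schema Name = mySchema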
Is there a way to read arbitrary valid batch JSON input, e.g. in multi-line form?
This bug (NIFI-4456) was fixed in NiFi 1.7. If you are not on this version of NiFi, a workaround is to create an array of JSON messages (comma-delimited) using the following flow:
Flow:
1. SplitText //split the flow file with a line count of 1
2. MergeRecord //merge the flow files into one
3. ConvertRecord
For more details regarding this particular issue refer to this link (I have explained the flow there).
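A property sketch of the workaround flow (the reader/writer controller-service choices are assumptions):

SplitText
    Line Split Count = 1
MergeRecord
    Record Reader = JsonTreeReader
    Record Writer = JsonRecordSetWriter
ConvertRecord
    Record Reader = JsonTreeReader
    Record Writer = whatever output format you need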

Kafka connection transformations. Parse input string and get a record key

I use a simple file source reader
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
The file content is a simple JSON object on each line. I found that there is a way to replace the record key using transformations, like
# Add the `id` field as the key using Simple Message Transformations
transforms=InsertKey
# `ValueToKey`: push an object of one of the column fields (`id`) into the key
transforms.InsertKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.InsertKey.fields=ip
But I got an error
Only Struct objects supported for [copying fields from value to key],
found: java.lang.String
Is there a way to parse the JSON string and get a key from it, like I can do with Flume and regex_extractor?
When using a Transformation on a SourceConnector, the transformation is applied to the List<SourceRecord> returned by SourceTask.poll().
In your case, the FileStreamSourceConnector reads the lines of the file and puts each line as a String into the SourceRecord. Therefore, when the transformation gets the SourceRecord, it only sees a String and does not know the structure of the object.
To solve this problem:
Either you modify the FileStreamSourceConnector code so that it returns SourceRecords with a valid Struct and Schema for your input JSON string. You can use Kafka's SchemaBuilder class for this (a sketch follows after this answer).
Or, in case you're consuming this data in a sink connector, you can have Kafka Connect convert it to JSON by setting the following config on the sink connector and then do the transformations there:
"value.converter": "org.apache.kafka.connect.json.JsonConverter"
"value.converter.schemas.enable": "false"
If you go with the second option, don't forget to put these configs on your SourceConnector:
"value.converter": "org.apache.kafka.connect.storage.StringConverter"
"value.converter.schemas.enable": "false"
there is a way to replace a record key
There is a separate transform called org.apache.kafka.connect.transforms.ReplaceField$Key
InsertKey takes a field from the value and attempts to insert it into a Struct/Map key, but your record value appears to be a plain String.

How to read the values of attributes in Apache NiFi

I'm working on a sample NiFi flow where I get a JSON file as input. I use the EvaluateJsonPath processor to get the value of the desired path. I've set the Destination of EvaluateJsonPath to "flowfile-attribute" and added new properties with the required JsonPath, for example property name: username, value: $.input.username. Now I need this value in the next processor, so I want to know which processor I should use to read the attributes of the flow file.
You don't need a special processor to read the attributes of a FlowFile.
Say this is your attribute key/value pair:
username : $.input.username
You can read that value as shown below in any processor property that supports Expression Language:
${username}
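For example, a downstream processor property could reference the attribute like this (the PutFile directory path is purely illustrative):

EvaluateJsonPath (Destination = flowfile-attribute)
    username = $.input.username
PutFile
    Directory = /data/${username}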

Deserialize an anonymous JSON array?

I have an anonymous array which I want to deserialize; here is an example with the first array object shown formatted:
[
{ "time":"08:55:54",
"date":"2016-05-27",
"timestamp":1464332154807,
"level":3,
"message":"registerResourcePath ('', '/sap/bc/ui5_ui5/ui2/ushell/resources/')",
"details":"","component":"sap.ui.ModuleSystem"},
{"time":"08:55:54","date":"2016-05-27","timestamp":1464332154808,"level":3,"message":"URL prefixes set to:","details":"","component":"sap.ui.ModuleSystem"},
{"time":"08:55:54","date":"2016-05-27","timestamp":1464332154808,"level":3,"message":" (default) : /sap/bc/ui5_ui5/ui2/ushell/resources/","details":"","component":"sap.ui.ModuleSystem"}
]
I tried deserializing using CL_TREX_JSON_SERIALIZER, but it is buggy and does not work with my JSON; here is why.
Then I tried /UI2/CL_JSON, but it needs a "structure" that perfectly fits the object given by the JSON object. "Structure" in my case means an internal table of objects with the attributes time, date, timestamp, level, message and details. And there was the problem: it does not properly handle references and uses the class description to describe the field assigned to the field symbol. Since I cannot have a list of objects but only a list of references to objects, that solution also doesn't work.
As a third attempt I tried CALL TRANSFORMATION as described by Horst Keller, but with this method I was not able to read in an anonymous array; here is why.
My major points:
I do not want to change the JSON, since that is what I get from sap.ui.log.
I prefer to use built-in functionality and not a third-party framework.
Your problem stems not from the anonymity of the array, but from the awkwardness of the SAP JSON (de)serializer, which does not respect the double quotes that enclose JSON attributes. The issue is thoroughly described in this answer.
If you don't want to change your JSON on the fly, the only way is to change the CL_TREX_JSON_DESERIALIZER class like this.
/UI5/CL_JSON_PARSER parses JSONs with unknown format.
Note that it has "for internal use" written on it so many times that you should probably take that seriously and clone its code to freeze it.