Does NiFi support transferring a single FlowFile to multiple Relationships in a Processor?

I am exploring options for transferring a single FlowFile to two or more Relationships in my custom processor. I didn't find any help in the documentation for this. Is this feature supported by NiFi?
Example Code:
session.transfer(flowFile, REL_SUCCESS_1);
session.transfer(flowFile, REL_SUCCESS_2);

You can use the session.clone() method to create a full clone of the flow file; the content and the attributes will be the same:
FlowFile flowFile2 = session.clone(flowFile);
session.transfer(flowFile, REL_SUCCESS_1);
session.transfer(flowFile2, REL_SUCCESS_2);
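For context, a minimal sketch of how this could look inside a custom processor's onTrigger method (the relationship names come from the question; the null check and overall structure follow the usual processor pattern):

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.exception.ProcessException;

@Override
public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        return; // nothing queued on this invocation
    }

    // clone() copies both the content and the attributes of the FlowFile
    FlowFile flowFile2 = session.clone(flowFile);

    // route the original to one relationship and the clone to the other
    session.transfer(flowFile, REL_SUCCESS_1);
    session.transfer(flowFile2, REL_SUCCESS_2);
}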

Related

Processing JSON data from Kafka using Structured Streaming

I want to convert incoming JSON data from Kafka into a dataframe.
I am using Structured Streaming with Scala 2.12.
Most people add a hard-coded schema, but if the JSON can have additional fields, that requires changing the code base every time, which is tedious.
One approach is to write the data into a file and infer the schema from it, but I would rather avoid doing that.
Is there any other way to approach this problem?
Edit: I found a way to turn a JSON string into a dataframe, but I can't extract it from the stream source. Is it possible to extract it?
One way is to store the schema itself in the message headers (not in the key or value).
Though this increases the message size, it makes it easy to parse the JSON value without the need for any external resource like a file or a schema registry.
New messages can carry new schemas while old messages can still be processed with their old ones, because each schema travels within its message.
Alternatively, you can version the schemas and include an id for every schema in the message headers, or a magic byte in the key or value, and infer the schema from there.
This is the approach followed by the Confluent Schema Registry. It lets you walk through different versions of the same schema and see how your schema has evolved over time.
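A minimal sketch of reading such headers with Spark's Java API (the broker address, topic, and the header name "schema" are placeholder assumptions; includeHeaders requires Spark 3.0+):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().appName("kafka-json").getOrCreate();

// read the Kafka stream with the record headers included
Dataset<Row> raw = spark.readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .option("includeHeaders", "true")
    .load();

// pull the per-message schema out of the headers array, so every row
// carries both its JSON payload and the schema needed to parse it
Dataset<Row> withSchema = raw.selectExpr(
    "CAST(value AS STRING) AS json",
    "CAST(filter(headers, h -> h.key = 'schema')[0].value AS STRING) AS schema");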
Read the data as a string and then convert it to a Map[String, String]; this way you can process any JSON without even knowing its schema.
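A minimal sketch of that idea in Spark's Java API, reusing a Kafka source Dataset named raw like the one in the sketch above. Note that with a map<string, string> schema every value is parsed as a string, so this fits flat JSON best:

import org.apache.spark.sql.types.DataTypes;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

// cast the Kafka value bytes to text, then parse into a schemaless map
Dataset<Row> parsed = raw
    .selectExpr("CAST(value AS STRING) AS json")
    .select(from_json(col("json"),
        DataTypes.createMapType(DataTypes.StringType, DataTypes.StringType)).alias("data"));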
Based on JavaTechnical's answer, the best approach would be to use a schema registry and
Avro data instead of JSON; there is no way around hard-coding a schema (for now).
Include your schema name and id as a header and use them to read the schema from the schema registry.
Use the from_avro function to turn that data into a dataframe!
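A minimal sketch of the decoding step in Spark's Java API, again reusing the raw Kafka stream from the first sketch (requires the spark-avro package; the schema string here is hardcoded purely for illustration, where it would normally be fetched from the registry using the id carried in the headers). Note that from_avro expects plain Avro bytes; Confluent's wire format prepends a magic byte and schema id that must be stripped first:

import static org.apache.spark.sql.avro.functions.from_avro;
import static org.apache.spark.sql.functions.col;

// hypothetical record schema standing in for the registry lookup
String avroSchema = "{\"type\":\"record\",\"name\":\"Event\","
    + "\"fields\":[{\"name\":\"message\",\"type\":\"string\"}]}";

// decode the Avro payload into named columns
Dataset<Row> decoded = raw
    .select(from_avro(col("value"), avroSchema).alias("event"))
    .select("event.*");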

JSON logfile into Solr via Apache NiFi

I'm trying to read a JSON log file and insert it into a Solr collection using Apache NiFi. The log file is in the following format (one JSON object per line):
{"#timestamp": "2017-02-18T02:16:50.496+04:00","message": "hello"}
{"#timestamp": "2017-02-18T02:16:50.496+04:00","message": "hello"}
{ "#timestamp": "2017-02-18T02:16:50.496+04:00","message": "hello"}
I was able to load the file and split it by lines using different processors. How can I proceed further?
You can use the PutSolrContentStream processor to write content to Solr from Apache NiFi. If each flowfile contains a single JSON record, each will be written to Solr as a separate document (make sure you are splitting the JSON correctly even if a record spans multiple lines, so compare SplitJson vs. SplitText). You can also use MergeContent to write in batches and be more efficient.
Bryan Bende wrote a good article on the Apache site about how to use this processor.

Apache NiFi - Extract Attributes From Avro

I'm trying to get my head around extracting attributes from Avro and JSON. I'm able to extract attributes from JSON by using the EvaluateJsonPath processor. I'm trying to do the same with Avro, but I'm not sure whether it is achievable.
Here is my flow: ExecuteSQL -> SplitAvro -> UpdateAttribute
UpdateAttribute is the processor where I want to extract the attributes.
So, my basic question is: can we extract attributes from Avro? If yes, please show me the right approach. Or is it always necessary to use ConvertAvroToJSON before extracting the attributes?
Currently there is no way in NiFi to extract attributes directly from Avro (there is not yet an "AvroPath" like XPath for XML or JsonPath for JSON), so, as you said, you can use ConvertAvroToJSON before extracting the attributes.
Alternatively, I wrote a Groovy script for use in an ExecuteScript processor. It takes "Avro path" values as dynamic properties (each starting with avro.path, and whose value is really a JsonPath), does the conversion of Avro to JSON in memory, and requires you to download and point to the Avro JARs. I can post it here if you are interested, but really its only advantage is keeping the flow file content in Avro; although it may be annoying, you could use ConvertAvroToJson -> EvaluateJsonPath -> ConvertJsonToAvro as the workaround.

How to validate against a definition within a schema?

I want to have a single schema file with many definitions.
I then want to validate messages against different definitions within that schema.
Is there a way of doing this with a JSON Schema?
I'm trying two NodeJS validators to see which works best:
https://github.com/geraintluff/tv4
and https://github.com/tdegrunt/jsonschema
Apologies if this is not logically possible - I'm new to JSON Schema.
Cross-posted to https://github.com/geraintluff/tv4/issues/170 and https://github.com/tdegrunt/jsonschema/issues/94
I found what I needed in the API section:
tv4.addSchema() and tv4.getSchema('...#subschema_id')
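For illustration, a single schema file with several definitions might look like this (the id values are hypothetical); after registering it with tv4.addSchema(), a subschema can be fetched by its fragment, e.g. tv4.getSchema('http://example.com/all.json#address'), and passed to tv4.validate():

{
  "id": "http://example.com/all.json",
  "definitions": {
    "address": {
      "id": "#address",
      "type": "object",
      "properties": { "street": { "type": "string" } }
    },
    "person": {
      "id": "#person",
      "type": "object",
      "properties": { "address": { "$ref": "#address" } }
    }
  }
}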

Best practices to produce JSON from NotesViews or DocumentCollections

I'm working on a custom control that will be fed JSON content, and I'm trying to find the best approach to produce and consume it.
Let's say the JSON could come from:
Notes View (all documents)
Notes View (subset of documents based on a category or filter)
Notes Document Collection (from database.Search or database.FTSearch)
What I have in mind is to define some Custom Properties where I can set:
URL that produces the JSON
Object
etc.
So far I'm considering:
REST Service control from ExtLib
XAgent that produces JSON
Domino URL commands: ?ReadViewEntries&OutputFormat=JSON
Does anyone know if the JSON object loaded in memory has a size limit?
Any suggestions will be appreciated.
Definitely go for the REST Service control from the Extension Library; it offers by far the best combination of flexibility vs. performance vs. development time.
Matt
What about creating the JSON in the view itself and then just reading the column values? http://www.eknori.de/2011-07-23/formula-magic/
If you want to parse the JSON object using SSJS, you can fetch it with a URLConnection and put the resulting object into a repeat control using the eval statement.
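A minimal Java sketch of the fetch step (the server, database, and view names are placeholders; an SSJS version would follow the same pattern):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

static String fetchViewJson() throws Exception {
    // ?ReadViewEntries&OutputFormat=JSON makes Domino return the view entries as JSON
    URL url = new URL("http://server/db.nsf/myview?ReadViewEntries&OutputFormat=JSON");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();

    // read the whole response body into a string
    StringBuilder json = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
        String line;
        while ((line = reader.readLine()) != null) {
            json.append(line);
        }
    }
    return json.toString(); // raw JSON, ready to be parsed downstream
}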