How do I best construct complex NiFi routing - json

I'm a total noob when it comes to NiFi - so please feel free to highlight any stupidity/ignorance.
I'm reading messages from a Kafka topic using NiFi.
Each message contains JSON with a field called Function, plus a whole bunch of other fields that vary based on the Function. For example, if Function = "Login", you can expect username and password fields, but if Function = "Pay", you can expect "From", "To" and "Amount" fields.
I need to process each type of Function differently. So, basically, I want to read the message from Kafka, determine the Function, and then route the message to the appropriate set of rules based on that Function.
It sounds like this should be simple - but there is one small complication: I have about 500 different types of Functions. So, I don't want to add a RouteOnAttribute node for each function.
Is there a better way to do this? If this was "real code", I suppose I'm looking for the difference between a chain of "if" statements and some sort of "switch/case" statement...

You would first use EvaluateJsonPath to extract the Function into a flow file attribute, then use RouteOnAttribute, which would need 500 conditions added to it, and then connect each of those 500 conditions to whatever follow-on processing is required. The only other option is to implement a custom processor that handles the 500 conditions internally.
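If you do end up writing the custom processor, a minimal sketch of the idea in Java could look like the following. The attribute name "function" and the relationship names are assumptions on my part, and in a real processor you would populate the function-to-relationship map from the full list of ~500 functions (e.g. loaded from configuration) rather than hard-coding two of them:

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.ProcessorInitializationContext;
    import org.apache.nifi.processor.Relationship;
    import org.apache.nifi.processor.exception.ProcessException;

    public class RouteByFunction extends AbstractProcessor {

        public static final Relationship REL_UNMATCHED = new Relationship.Builder()
                .name("unmatched").description("Functions with no matching route").build();

        // One relationship per function; in practice this would be built
        // from the full list of ~500 function names, not hard-coded.
        private final Map<String, Relationship> functionToRelationship = new HashMap<>();

        @Override
        protected void init(final ProcessorInitializationContext context) {
            for (String fn : new String[] {"Login", "Pay"}) {   // ...plus the other functions
                functionToRelationship.put(fn,
                        new Relationship.Builder().name(fn.toLowerCase()).build());
            }
        }

        @Override
        public Set<Relationship> getRelationships() {
            Set<Relationship> rels = new HashSet<>(functionToRelationship.values());
            rels.add(REL_UNMATCHED);
            return rels;
        }

        @Override
        public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
            FlowFile flowFile = session.get();
            if (flowFile == null) {
                return;
            }
            // "function" is assumed to have been set by EvaluateJsonPath upstream
            String function = flowFile.getAttribute("function");
            Relationship target = functionToRelationship.getOrDefault(function, REL_UNMATCHED);
            session.transfer(flowFile, target);
        }
    }

Each function then appears as its own relationship on the single processor, so the canvas needs one routing step instead of 500 separate conditions.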

Related

Receive Excel data and turn it into objects to format as JSON

I have a solution that helps me create a wizard to fill in some data and turn it into JSON. The problem now is that I have to receive an xlsx file and turn specific data from it into JSON - not all the data, only the fields I want, which are documented in the last link.
In this link: https://stackblitz.com/edit/xlsx-to-json I can access the Excel data and turn it into an object (when I print document.getElementById('output').innerHTML = JSON.parse(dataString); it shows [object Object]).
I want to implement this solution and automatically get the fields specified in config.ts, but I can't get it to work. For now, I have the following in my HTML and app-component.ts:
https://stackblitz.com/edit/angular-xbsxd9 (It's probably not compiling but it's to show the code only)
It wasn't quite clear what you were asking, but based on the assumption that what you are trying to do is:
Given the data in the spreadsheet that is uploaded
Use a config that holds the list of column names you want returned in the JSON when the user clicks to download
Based on this, I've created a fork of your sample here -> Forked StackBlitz
What I've done is:
Use the map operator on the array returned from the sheet_to_json method.
Within the map, loop through each key of the record (each key being a column in this case).
If a column in the row is defined in the property map file (the config), then return it.
This approach strips out all the columns you don't care about up front, so that by the time the user clicks to download the file, only the columns you want are returned. If you need to keep the original columns, you can move this logic somewhere more convenient for you.
I also augmented the property map a little to give you more granular control over how the data is formatted in the returned JSON, i.e. numbers are not treated as strings in the final output. You can use this as a template for any additional formatting if it suits your needs.
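For what it's worth, the filtering idea above can be expressed outside of Angular as well - here is a rough Java sketch in which each row returned by sheet_to_json is treated as a key/value map and only the configured columns are kept. The column names in WANTED_COLUMNS are made-up placeholders for whatever your config.ts lists:

    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.stream.Collectors;

    public class ColumnFilter {

        // Stand-in for the property map (config.ts): the columns to keep.
        private static final Set<String> WANTED_COLUMNS = Set.of("Name", "Email", "Amount");

        // Drop every column that is not listed in the config, row by row.
        public static List<Map<String, Object>> keepConfiguredColumns(List<Map<String, Object>> rows) {
            return rows.stream()
                    .map(row -> {
                        Map<String, Object> filtered = new LinkedHashMap<>();
                        row.forEach((column, value) -> {
                            if (WANTED_COLUMNS.contains(column)) {
                                filtered.put(column, value);
                            }
                        });
                        return filtered;
                    })
                    .collect(Collectors.toList());
        }
    }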
Hope it helps.

How do I identify this JSON-like data structure?

I just came across a JSON wannabe that decides to "improve" it by adding datatypes... of course, the syntax makes it nearly impossible to google.
a:4:{
s:3:"cmd";
s:4:"save";
s:5:"token";
s:22:"5a7be6ad267d1599347886";
}
Full data is... much larger...
The first letter seems to be a for array, s for string, then the quantity of data (# of array items or length of string), then the actual piece of data.
With this type of syntax, I currently can't Google meaningful results. Does anyone recognize what god-forsaken language or framework this is from?
Note: some genius decided to stuff this data as a single field inside a database, and it included critical fields that I need to perform aggregate functions on. The rest I can handle if I can get a way to parse this data without resorting to ugly serial processing.
If this can be parsed using MSSQL 2008 in a way that results in a view, I'll throw in a bounty...
I would parse it with a UDF written in .NET - https://learn.microsoft.com/en-us/sql/relational-databases/clr-integration-database-objects-user-defined-functions/clr-user-defined-functions
You can either write a custom aggregate function to parse and calculate these nutty fields, or a scalar value function that returns the field as JSON.
I'd probably opt for the latter in the name of separation of concerns.
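To make the parsing concrete, here is a rough sketch of the logic such a scalar function would need, written in Java for illustration (the same approach ports to a .NET CLR function). It only handles the flat array-of-strings subset shown in the question - the sample resembles the output of PHP's serialize(), so nested arrays, integers and the other type codes would need extra cases:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class SerializedFieldParser {

        // Parses a:<count>:{ s:<len>:"<key>"; s:<len>:"<value>"; ... }
        // where keys and values alternate as plain strings.
        public static Map<String, String> parseFlatArray(String data) {
            Map<String, String> result = new LinkedHashMap<>();
            String body = data.substring(data.indexOf('{') + 1, data.lastIndexOf('}'));
            String key = null;
            int pos = 0;
            while ((pos = body.indexOf("s:", pos)) >= 0) {
                int lenEnd = body.indexOf(':', pos + 2);
                int length = Integer.parseInt(body.substring(pos + 2, lenEnd).trim());
                int valueStart = body.indexOf('"', lenEnd) + 1;
                String value = body.substring(valueStart, valueStart + length);
                if (key == null) {
                    key = value;              // first string of a pair is the key
                } else {
                    result.put(key, value);   // second string is the value
                    key = null;
                }
                pos = valueStart + length;    // skip past the value just read
            }
            return result;
        }

        public static void main(String[] args) {
            // The (truncated) sample from the question.
            String sample = "a:4:{s:3:\"cmd\";s:4:\"save\";s:5:\"token\";s:22:\"5a7be6ad267d1599347886\";}";
            System.out.println(parseFlatArray(sample)); // {cmd=save, token=5a7be6ad267d1599347886}
        }
    }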

Can an Input Operator be used in the middle of a DAG in Apache Apex

All examples of Apex say that the first operator of the DAG should be an input operator. Can this operator appear somewhere in the middle of the DAG instead?
Consider a case in which I have data to be fetched from the database based on some data that has just been processed by a previous operator; this would mean that an input operator would come somewhere in the middle of the DAG.
According to its definition, an input operator is one which does not have any input stream, but it also does the work of fetching data if a connector is used. So will it work if I fetch data somewhere in the middle of a DAG?
This is an interesting use-case. You should be able to extend an input operator (say JdbcInputOperator, since you want to read from a database) and add an input port to it. This input port receives data (tuples) from another operator in your DAG and updates the "where" clause of the JdbcInputOperator so it reads data based on that. Hope that is what you were looking for.
Yes, it is possible. You may extend an existing InputOperator and add InputPort(s) to it. In this case, the Apex platform will treat your operator as a generic operator and not call InputOperator.emitTuples(). It will be your extended operator's responsibility to call super.emitTuples() or emit directly on the output port(s).
No, an input operator cannot be used in the middle of the DAG.
As you have already pointed out, since there is no input stream, you will not be able to get data from the previous operator for use with this operator.
For the example you pointed out, it would be better to write your own generic operator with an input stream that has functionality similar to the input operator, in that it can read data from an external source based on the data arriving on the input stream (see the sketch after this answer).
Also, just a point to note:
If the query is too heavy, it's better to have an asynchronous thread query the database. This thread can write data to a queue from which the main operator thread reads records and emits them on the output stream. This ensures that the main operator thread is not blocked and the operator won't fail.
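A rough sketch of that "generic operator with an input port" approach, assuming the usual Apex port classes (DefaultInputPort / DefaultOutputPort on top of BaseOperator) and plain JDBC. The package names are from memory, and the connection URL, query and column names are placeholders, so treat this as a starting point rather than tested code:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    import com.datatorrent.api.Context.OperatorContext;
    import com.datatorrent.api.DefaultInputPort;
    import com.datatorrent.api.DefaultOutputPort;
    import com.datatorrent.common.util.BaseOperator;

    public class JdbcLookupOperator extends BaseOperator {

        private String jdbcUrl = "jdbc:postgresql://localhost/mydb";   // placeholder
        private transient Connection connection;

        public final transient DefaultOutputPort<String> output = new DefaultOutputPort<>();

        // Each incoming tuple (e.g. a key produced by the previous operator)
        // drives a query; the matching rows are emitted downstream.
        public final transient DefaultInputPort<String> input = new DefaultInputPort<String>() {
            @Override
            public void process(String key) {
                try (PreparedStatement stmt =
                         connection.prepareStatement("SELECT payload FROM my_table WHERE id = ?")) {
                    stmt.setString(1, key);
                    try (ResultSet rs = stmt.executeQuery()) {
                        while (rs.next()) {
                            output.emit(rs.getString("payload"));
                        }
                    }
                } catch (SQLException e) {
                    throw new RuntimeException(e);
                }
            }
        };

        @Override
        public void setup(OperatorContext context) {
            try {
                connection = DriverManager.getConnection(jdbcUrl);
            } catch (SQLException e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        public void teardown() {
            try {
                if (connection != null) {
                    connection.close();
                }
            } catch (SQLException e) {
                // ignore errors on shutdown
            }
        }
    }

As the note above says, if the query is heavy the JDBC call should be handed off to an asynchronous thread feeding a queue rather than run directly inside process(), so the operator thread is not blocked.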

Passing a path as a parameter in Pentaho

In a Job I am checking if the file that I want to read is available or not. If this CSV exists, I want to read the data and save it in a database table within a transformation.
This is what I have done so far:
1) I have created the job, 2) I have implemented some parameters, one of them with the path of the file, 3) I have indicated that I am going to pass this value to the transformation.
Now, the thing is, I am sure this should be something very simple to implement, but even though I have followed some blogs, I have not succeeded with this part of the process. I've tried to follow this example:
http://diethardsteiner.blogspot.com.co/2011/03/pentaho-data-integration-scheduling-and.html
My question remains the same. How can I indicate to the transformation that it has to use the parameter that I am passing to it from the job?
You just mixed up the columns.
Parameter should be the name of the parameter in the transformation you are running.
Value is the value you are passing.
Since you are passing a variable and not a constant value, you use the ${} syntax to indicate this. For example, if the transformation parameter is called file_path and the job variable has the same name, the Parameter column holds file_path and the Value column holds ${file_path}.

JSON output to the browser -> providing an order

So I've read that you cannot expect a default order when requesting JSON. I've seen this in action when making calls to a little API that I built, which returns a jumbled, random order of elements each time I make a call.
How does a site like Ticketfly's API (call it here: http://www.ticketfly.com/api/events/upcoming.json?venueId=57) always ensure that the JSON returned is in a specific order?
The event IDs always come first, etc.
Thanks for shedding some light on the situation.
If you are in control of the endpoint API, then you can hardcode the order in which you render the properties. Though I have to ask: why exactly do you need the JSON properties in a particular order? You will ultimately be accessing the properties by their names, so the order in which they appear in the JSON should not really matter.
EDIT: Since your bosses insist on this (what can one say now?):
You can try and see if any of the following suits your needs:
Try hardcoding the display order in the view's representation. This means you will need to echo/print each property name explicitly in the view script. In PHP it could be something like echo $variable_representing_json["id"]; and so forth. Note that with this approach you needn't change the original JSON representation.
If you want the original JSON representation itself to be changed, then the difficulty varies depending on how you are building it:
If you are using string concatenation to build the JSON, then hard-code the order in which the properties get concatenated into the string.
In some languages the output order of properties actually reflects the order in which the properties were defined. In simple words, if $var is an empty JSON representation, then you should define $var["id"] = {some_val} first to display it first (see the sketch after this list).
If you are using a framework to process the JSON data, it may have its own quirks regardless of how you define your representation. In such cases you will have to see whether you can work around the issue or whether it provides any helper methods.
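The examples above are PHP, but the "definition order" idea in the list can be illustrated in Java as well: a LinkedHashMap keeps keys in insertion order and Jackson writes map entries in that same order. The field names and values below are made up:

    import java.util.LinkedHashMap;
    import java.util.Map;

    import com.fasterxml.jackson.databind.ObjectMapper;

    public class OrderedJsonExample {
        public static void main(String[] args) throws Exception {
            Map<String, Object> event = new LinkedHashMap<>();
            event.put("id", 457569);                      // defined first, so rendered first
            event.put("name", "Sample show");
            event.put("startDate", "2018-03-01 20:00:00");

            String json = new ObjectMapper().writeValueAsString(event);
            System.out.println(json);
            // {"id":457569,"name":"Sample show","startDate":"2018-03-01 20:00:00"}
        }
    }

This only controls how the text is rendered; as pointed out at the start of the answer, consumers still shouldn't rely on property order.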