Can an Input Operator be used in the middle of a DAG in Apache Apex?

All examples of Apex show the first operator of the DAG being an input operator. Can such an operator also appear somewhere in the middle of the DAG?
Consider a case in which data has to be fetched from the database based on data that has just been processed by a previous operator; this would mean that an input operator ends up somewhere in the middle of the DAG.
According to its definition, an input operator is one that does not have any input stream, but it also does the work of fetching data when a connector is used. So will it work if I fetch data somewhere in the middle of the DAG?

This is an interesting use-case. You should be able to extend an input operator (say JdbcInputOperator, since you want to read from a database) and add an input port to it. This input port receives data (tuples) from another operator in your DAG and updates the "where" clause of the JdbcInputOperator so that it reads data based on that. Hope that is what you were looking for.

Yes, it is possible. You may extend an existing InputOperator and add InputPort(s) to it. In this case, the Apex platform will treat your operator as a generic operator and will not call InputOperator.emitTuples(). It will be your extended operator's responsibility to call super.emitTuples() or emit directly on the output port(s).
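A minimal sketch of what both answers describe, assuming Malhar's AbstractJdbcInputOperator with its queryToRetrieveData() and getTuple() hooks (the class, table and column names below are made up; adapt them to the operator you actually extend). The extra input port records the key from the upstream tuple and then drives emitTuples() itself, since the platform no longer does so once the operator has an input port:

import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.lib.db.jdbc.AbstractJdbcInputOperator;
import java.sql.ResultSet;
import java.sql.SQLException;

public class TriggeredJdbcInputOperator extends AbstractJdbcInputOperator<String>
{
  private transient String currentKey;

  // Extra input port: each upstream tuple updates the WHERE clause below.
  public final transient DefaultInputPort<String> trigger = new DefaultInputPort<String>()
  {
    @Override
    public void process(String key)
    {
      currentKey = key;
      // With an input port present, the platform no longer calls
      // emitTuples() for us, so invoke the inherited one explicitly.
      emitTuples();
    }
  };

  @Override
  public String queryToRetrieveData()
  {
    // Hypothetical query; use proper escaping or a prepared statement in real code.
    return "SELECT payload FROM my_table WHERE my_key = '" + currentKey + "'";
  }

  @Override
  public String getTuple(ResultSet result)
  {
    try {
      return result.getString(1);
    } catch (SQLException e) {
      throw new RuntimeException(e);
    }
  }
}

Whether you drive the query per tuple (as above) or per window is up to you; emitting directly on the inherited output port instead of calling emitTuples() also works, as the second answer notes.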

No, an input operator cannot be used in the middle of the DAG.
As you have already pointed out, since there is no input stream, you will not be able to get data from the previous operator for use with this operator.
For the example you describe, it would be better to write your own generic operator with an input stream that has functionality similar to the input operator, i.e. it can read data from the external source based on the data arriving on its input stream.
Also, a point to note:
If the query is heavy, it's better to have an asynchronous thread query the database. This thread can write the data to a queue from which the main operator thread reads the records and emits them on the output stream. This ensures that the main operator thread is not blocked and the operator won't fail.
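A minimal sketch of that queue-based pattern, written as the generic operator suggested above (all names are hypothetical, the JDBC call is left as a placeholder, and package names can differ slightly between Apex versions):

import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.common.util.BaseOperator;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncDbLookupOperator extends BaseOperator
{
  private final transient Queue<String> results = new ConcurrentLinkedQueue<>();
  private transient ExecutorService executor;

  public final transient DefaultOutputPort<String> output = new DefaultOutputPort<>();

  public final transient DefaultInputPort<String> input = new DefaultInputPort<String>()
  {
    @Override
    public void process(String key)
    {
      // Hand the (potentially slow) query off to a background thread so the
      // operator thread is never blocked on the database.
      executor.submit(() -> {
        results.addAll(runQuery(key));
      });
    }
  };

  @Override
  public void setup(OperatorContext context)
  {
    executor = Executors.newSingleThreadExecutor();
  }

  @Override
  public void teardown()
  {
    executor.shutdownNow();
  }

  @Override
  public void endWindow()
  {
    // Drain whatever the background thread has produced so far and emit it
    // from the operator thread, which is the only thread allowed to emit.
    String row;
    while ((row = results.poll()) != null) {
      output.emit(row);
    }
  }

  private List<String> runQuery(String key)
  {
    // Placeholder: issue the actual JDBC query keyed on the incoming tuple here.
    return Collections.singletonList("row-for-" + key);
  }
}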

Related

How do I best construct complex NiFi routing

I'm a total noob when it comes to NiFi - so please feel free to highlight any stupidity/ignorance.
I'm reading messages from a Kafka topic using NiFi.
Each message contains JSON with a field called Function and then a whole bunch of different fields, depending on the Function. For example, if Function = "Login", you can expect a username and password field, but if Function = "Pay", you can expect "From", "To" and "Amount" fields.
I need to process each type of Function differently. So, basically, I want to read the message from Kafka, determine the function and then route the message, based on the function to the appropriate set of rules.
It sounds like this should be simple - but for one small complication. I have about 500 different types of Functions. So, I don't want to add a RouteOnAttribute node for each function.
Is there a better way to do this? If this were "real code", I suppose I'm looking for the difference between a chain of "if" statements and some sort of "switch/case" statement...
You would first use EvaluateJsonPath to extract the function into a flow file attribute, then a RouteOnAttribute processor, which would need 500 routing properties added to it, and then connect each of the resulting 500 relationships to whatever follow-on processing is required. The only other thing you could do is implement a custom processor that handles the 500 conditions internally.
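For the custom-processor route, a heavily simplified sketch (the class name, the "function" attribute name and the two relationships are made up, and the 500 cases are reduced to a lookup); the point is that the dispatch table lives inside one processor instead of 500 RouteOnAttribute properties:

import java.util.HashSet;
import java.util.Set;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class RouteByFunction extends AbstractProcessor
{
  static final Relationship REL_HANDLED = new Relationship.Builder()
      .name("handled").description("Functions processed internally").build();
  static final Relationship REL_UNMATCHED = new Relationship.Builder()
      .name("unmatched").description("Unknown function values").build();

  @Override
  public Set<Relationship> getRelationships()
  {
    Set<Relationship> relationships = new HashSet<>();
    relationships.add(REL_HANDLED);
    relationships.add(REL_UNMATCHED);
    return relationships;
  }

  @Override
  public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException
  {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
      return;
    }
    // "function" is assumed to have been extracted into an attribute earlier,
    // e.g. by EvaluateJsonPath as described above.
    String function = flowFile.getAttribute("function");
    if (function != null && handle(function, flowFile)) {
      session.transfer(flowFile, REL_HANDLED);
    } else {
      session.transfer(flowFile, REL_UNMATCHED);
    }
  }

  private boolean handle(String function, FlowFile flowFile)
  {
    // The per-function rules (all 500 of them) would be dispatched here,
    // e.g. from a lookup table or rules file, instead of in the flow graph.
    switch (function) {
      case "Login":
      case "Pay":
        return true;
      default:
        return false;
    }
  }
}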

How do I identify this JSON-like data structure?

I just came across a JSON wannabe that decides to "improve" it by adding datatypes... of course, the syntax makes it nearly impossible to google.
a:4:{
s:3:"cmd";
s:4:"save";
s:5:"token";
s:22:"5a7be6ad267d1599347886";
}
Full data is... much larger...
The first letter seems to be a for array, s for string, then the quantity of data (# of array items or length of string), then the actual piece of data.
With this type of syntax, I currently can't Google meaningful results. Does anyone recognize what god-forsaken language or framework this is from?
Note: some genius decided to stuff this data as a single field inside a database, and it included critical fields that I need to perform aggregate functions on. The rest I can handle if I can get a way to parse this data without resorting to ugly serial processing.
If this can be parsed using MSSQL 2008 in a way that results in a view, I'll throw in a bounty...
I would parse it with a UDF written in .NET - https://learn.microsoft.com/en-us/sql/relational-databases/clr-integration-database-objects-user-defined-functions/clr-user-defined-functions
You can either write a custom aggregate function to parse and calculate these nutty fields, or a scalar-valued function that returns the field as JSON.
I'd probably opt for the latter in the name of separation of concerns.
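For what it's worth, the sample looks like PHP's serialize() output: a:<count> introduces an array of <count> key/value pairs, s:<length>:"..." a string, i:<n> an integer, so an existing PHP unserializer may already do the job. If you roll your own, the parsing logic is small; here is a sketch in Java of those three cases (a CLR UDF in C#, as suggested above, would follow the same shape; note that string lengths are treated as character counts here, whereas PHP counts bytes):

import java.util.LinkedHashMap;
import java.util.Map;

public class PhpSerializeParser
{
  private final String in;
  private int pos;

  public PhpSerializeParser(String in)
  {
    this.in = in;
  }

  // Usage: Object parsed = new PhpSerializeParser(raw).parse();
  // Strings come back as String, integers as Long, arrays as LinkedHashMap.
  public Object parse()
  {
    skipWhitespace();
    char type = in.charAt(pos);
    switch (type) {
      case 's': {                        // s:<length>:"<text>";
        int length = readNumber();
        expect(':'); expect('"');
        String value = in.substring(pos, pos + length);
        pos += length;
        expect('"'); expect(';');
        return value;
      }
      case 'i': {                        // i:<integer>;
        pos += 2;                        // skip "i:"
        int start = pos;
        while (in.charAt(pos) != ';') pos++;
        long value = Long.parseLong(in.substring(start, pos));
        pos++;                           // skip ';'
        return value;
      }
      case 'a': {                        // a:<count>:{<key><value>...}
        int count = readNumber();
        expect(':'); expect('{');
        Map<Object, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
          map.put(parse(), parse());     // key first, then value
        }
        skipWhitespace();
        expect('}');
        return map;
      }
      default:
        throw new IllegalArgumentException("unsupported type '" + type + "' at " + pos);
    }
  }

  private int readNumber()
  {
    pos += 2;                            // skip the type letter and ':'
    int start = pos;
    while (Character.isDigit(in.charAt(pos))) pos++;
    return Integer.parseInt(in.substring(start, pos));
  }

  private void expect(char c)
  {
    if (in.charAt(pos++) != c) {
      throw new IllegalArgumentException("expected '" + c + "' at " + (pos - 1));
    }
  }

  private void skipWhitespace()
  {
    while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) pos++;
  }
}

Surfacing the extracted fields to MSSQL 2008 as a view would still go through the CLR wrapper described above.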

Passing a path as a parameter in Pentaho

In a Job I am checking whether the file that I want to read is available or not. If this CSV exists, I want to read the data and save it in a database table within a transformation.
This is what I have done so far:
1) I have created the job, 2) I have defined some parameters, one of them holding the path to the file, 3) I have indicated that I am going to pass these values to the transformation.
Now, the thing is, I am sure this should be something very simple to implement, but even though I have followed some blogs, I have not succeeded with this part of the process. I've tried to follow this example:
http://diethardsteiner.blogspot.com.co/2011/03/pentaho-data-integration-scheduling-and.html
My question remains the same: how can I indicate to the transformation that it has to use the parameter that I am passing to it from the job?
You just mixed up the columns.
Parameter should be the name of the parameter in the transformation you are running.
Value is the value you are passing.
Since you are passing a variable, and not a constant value, you use the ${} syntax to indicate this.
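As a concrete illustration (the parameter name is made up): if the transformation defines a parameter FILE_PATH and the job holds the path in a variable of the same name, the Parameters grid of the Transformation job entry would look like this, and the transformation can then reference ${FILE_PATH} in, for example, the filename of its CSV input step.

Parameter    Value
FILE_PATH    ${FILE_PATH}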

replace value of JSON field using SPEL in SpringXD

I get JSON in a stream and try to replace the value of one field in the payload.
transform --expression=payload.replaceAll() does not fit my needs, as it treats the payload as a String.
I was thinking of an operation such as
transform --expression=#jsonPath(payload,'$.result.grupy[*].lp')='new_value'
but it does not perform this assignment. How do I construct a SpEL/JsonPath expression to set a new value?
I need something like payload.setField('lp','new_value').
It's not possible to do that; you would need a custom processor module, or a custom SpEL function, to make changes like that.
The #jsonPath function simply returns an element from the JSON.
Not sure why the payload.replaceAll() expression doesn't fit your requirements, but the #jsonPath() SpEL function is for extracting data from JSON, not for modifying it.
On the other hand, you have slightly misunderstood the concept of the transformer component: it returns a new object, but does not modify the request.
To achieve your requirements you should take a look at the Content Enricher, which is intended exactly for modifying the incoming payload and returning it as the reply.
To simplify your life, you can also take a look at the <int:object-to-map-transformer> so that the field can be changed from the next <int:enricher> component.
Right, for this purpose you should write your own processor module.
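A minimal sketch of such a processor module's transformer bean, assuming Jackson is on the classpath (the class name is made up, and the packaging/registration of the module in Spring XD is omitted); it walks $.result.grupy[*], overwrites each lp and returns the new JSON as the reply:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class LpFieldSetter
{
  private final ObjectMapper mapper = new ObjectMapper();

  // Wired as the transformer of the module, e.g.
  // <int:transformer input-channel="input" output-channel="output"
  //                  ref="lpFieldSetter" method="transform"/>
  public String transform(String payload) throws Exception
  {
    JsonNode root = mapper.readTree(payload);
    // Equivalent of $.result.grupy[*]: each element is assumed to be an object.
    for (JsonNode element : root.path("result").path("grupy")) {
      ((ObjectNode) element).put("lp", "new_value");
    }
    // A transformer returns a new payload; the original message is untouched.
    return mapper.writeValueAsString(root);
  }
}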

Gemfire pdxInstance datatype

I am writing pdxInstances to GemFire using the sequence: rabbitmq => springxd => gemfire.
If I put this JSON into RabbitMQ: {'ID':11,'value':5}, value appears as a byte in GemFire. If I put {'ID':11,'value':500}, value appears as a word, and if I put {'ID':11,'value':50000}, it appears as an Integer.
A problem arises when I query data from GemFire and order the results. For example, a query such as select * from /my_region order by value fails, saying it cannot compare a byte with a word (or a byte with an integer).
Is there any way to declare the data type in JSON? Or any other method to get rid of this problem?
To add a bit of insight into this problem... in reviewing GemFire/Geode source code, it would seem it is not possible to configure the desired value type and override GemFire/Geode's default behavior, which can be seen in JSONFormatter.setNumberField(..).
I will not explain how GemFire/Geode involves the JSONFormatter during a Region.put(key, value) operation as it is rather involved and beyond the scope of this discussion.
However, one could argue that the problem is not necessarily with the JSONFormatter class, since storing a numeric value in a byte is more efficient than storing the value in an integer, especially when the value would indeed fit into a byte. Therefore, the problem is really that the Comparator used in the Query processor should be able to compare numeric values in the same type family (byte, short, int, long), upcasting where appropriate.
If you feel so inclined, feel free to file a JIRA ticket in the Apache Geode JIRA repository at https://issues.apache.org/jira/browse/GEODE-72?jql=project%20%3D%20GEODE
Note, Apache Geode is the open source "core" of Pivotal GemFire now. See the Apache Geode website for more details.
Cheers!
Your best bet would be to take care of this with a custom module or a Groovy script. You can either write a custom module in Java to do the conversion and upload it into Spring XD, after which you can reference it like any other processor, or you can write a script in Groovy and pass the incoming data through a transform processor.
http://docs.spring.io/spring-xd/docs/current/reference/html/#processors
The actual conversion probably won't be too tricky, but it will vary depending on which method you use. The stream creation would look something like this when you're done:
stream create --name myRabbitStream --definition "rabbit | my-custom-module | gemfire-json-server etc....."
stream create --name myRabbitStream --definition "rabbit | transform --script=file:/transform.groovy | gemfire-json-server etc...."
It seems like you have your source and sink modules set up just fine, so all you need to do is get your processor module setup to do the conversion and you should be all set.
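If you go the Java custom-module route, the transformer itself can be as small as the sketch below (Jackson-based, with made-up names). One caveat from the earlier answer: JSONFormatter picks the PDX field type from the magnitude of the number, so simply widening the literal may not be enough; whatever representation you settle on for value belongs at the marked decision point.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class ValueNormalizer
{
  private final ObjectMapper mapper = new ObjectMapper();

  // Transformer method of the custom processor module sitting between
  // "rabbit" and "gemfire-json-server" in the stream definition above.
  public String transform(String payload) throws Exception
  {
    ObjectNode root = (ObjectNode) mapper.readTree(payload);
    JsonNode value = root.get("value");
    if (value != null && value.isNumber()) {
      // Decision point: emit one consistent representation for "value".
      // Shown here: rewrite it as a 64-bit integer literal; verify this
      // survives JSONFormatter's magnitude-based type selection in your
      // GemFire version, or pick a representation that does.
      root.put("value", value.asLong());
    }
    return mapper.writeValueAsString(root);
  }
}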