Passing a path as a parameter in Pentaho - parameter-passing

In a Job I am checking whether the file that I want to read is available. If this CSV exists, I want to read the data and save it in a database table within a transformation.
This is what I have done so far:
1) I have created the job, 2) I have implemented some parameters, one of them holding the path for the file, and 3) I have indicated that I am going to pass this value to the transformation.
Now, the thing is, I am sure this should be something very simple to implement, but even though I have followed some blogs, I have not succeeded with this part of the process. I've tried to follow this example:
http://diethardsteiner.blogspot.com.co/2011/03/pentaho-data-integration-scheduling-and.html
My question remains the same: how can I indicate to the transformation that it has to use the parameter that I am giving it from the job?

You just mixed up the columns.
Parameter should be the name of the parameter in the transformation you are running.
Value is the value you are passing.
Since you are passing a variable and not a constant value, you use the ${} syntax to indicate this.
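For example, if the transformation defines a parameter called FILE_PATH (a name used here purely for illustration) and the job sets a variable with the same name, the Parameters tab of the Transformation job entry would be filled in roughly like this:
Parameter: FILE_PATH    Value: ${FILE_PATH}
The transformation can then refer to ${FILE_PATH} wherever the path is needed, for example in the Filename field of its Text file input or CSV file input step.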

Related

Azure DevOps Pipeline Json Variable Substitution - Microsoft.Hosting.Lifetime

How can I change the value of Logging:LogLevel:Microsoft.Hosting.Lifetime using a Json Variable Substitution task in Azure DevOps?
This article...
https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/transforms-variable-substitution?view=azure-devops&tabs=Classic
...states:
If a variable name includes periods ("."), the transformation will attempt to locate the item within the hierarchy. For example, if the variable name is first.second.third, the transformation process will search for:
"first" : {
"second": {
"third" : "value"
}
}
as well as "first.second.third" : "value".
Neither of these would be able to target a nested value with periods (.) in the name? Right?
Sorry, but I'm afraid you won't be able to do that with JSON variable substitution for now. The official documentation states that the JSON variable substitution option doesn't support variables whose names contain periods. It's not supported by design, and it's already documented in the Notes.
As alternative workarounds:
You can define the variable name in another format, something like Logging:LogLevel:Microsoft_Hosting_Lifetime (see the sketch at the end of this answer).
Try using the Replace Tokens task to change the value of Logging:LogLevel:Microsoft.Hosting.Lifetime. This task should work for your scenario. For more details you can check this issue.
You can also submit a feature request about the JSON variable substitution option on our UserVoice site, which is our main forum for product suggestions. The product team will provide updates if they review it. Thank you for helping us build a better Azure DevOps.
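As a sketch of the renaming workaround (this assumes you are free to rename the key in appsettings.json and adjust the code that reads it accordingly), the file would contain something like:
"Logging": {
  "LogLevel": {
    "Microsoft_Hosting_Lifetime": "Information"
  }
}
and the matching pipeline variable would be named along the lines of Logging.LogLevel.Microsoft_Hosting_Lifetime, using the period hierarchy described in the documentation you quoted, so that no single key contains a period.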

How do I best construct complex NiFi routing

I'm a total noob when it comes to NiFi - so please feel free to highlight any stupidity/ignorance.
I'm reading messages from a Kafka topic using NiFi.
Each message contains JSON that contains a field called Function and then a whole bunch of different fields, based on the Function. For example, if Function ="Login", you can expect a username and password field, but if Function = "Pay", you can expect "From", "To" and "Amount" fields.
I need to process each type of Function differently. So, basically, I want to read the message from Kafka, determine the function and then route the message, based on the function to the appropriate set of rules.
It sounds like this should be simple - but for one small complication. I have about 500 different types of Functions. So, I don't want to add a RouteOnAttribute node for each function.
Is there a better way to do this? If this was "real code", I suppose that I'm looking for the difference between an "if" statement and some sort of "switch/case" statement....
You would first use EvaluateJsonPath to extract the function into a flow file attribute, then RouteOnAttribute, which would need 500 conditions added to it, and then connect each of those 500 conditions to whatever follow-on processing is required. The only other thing you could do is implement a custom processor that handles the 500 conditions internally.
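A minimal sketch of that first approach, assuming the JSON field is literally named Function and showing only two of the routes:
EvaluateJsonPath: set Destination to flowfile-attribute and add a dynamic property such as function = $.Function
RouteOnAttribute: add one dynamic property per route, e.g. login = ${function:equals('Login')} and pay = ${function:equals('Pay')}
Each property name becomes a relationship you connect to the follow-on processing for that function; with 500 functions you would end up with 500 such properties, which is exactly the scaling problem you describe.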

Azure | ADF | How to use a String variable to lookup a Key in an Object type Parameter and retrieve its Value

I am using Azure Data Factory. I'm trying to use a String variable to look up a Key in a JSON array and retrieve its Value. I can't seem to figure out how to do this in ADF.
Details:
I have defined a Pipeline Parameter named "obj", type "Object" and content:
{"values":{"key1":"value1","key2":"value2"}}
I need this pipeline to look up a key named "key1" and return "value1", look up "key2" and return "value2", and so on. I'm planning to use my "obj" as a dictionary to accomplish this.
Technically speaking, if I want to find the value for key2, I can use the expression below, and "value2" will be returned:
@pipeline().parameters.obj.values.key2
What I can't figure out is how to do it using a variable (instead of a hardcoded "key2").
To clear things up: I have a ForEach loop and, inside it, just a Copy activity. [screenshot: ForEach contents]
The purpose of the copy activity is to copy the file named item().name, but save it in ADLS as whatever item().name translates to, according to "obj"
This is how the for-loop could be built, using Python: [screenshot: Python for-loop]
In ADF, I tried a lot of things (using concat, replace...), but none worked. The simplest would be this:
@pipeline().parameters.obj.values.item().name
but it throws the following error:
{"code":"BadRequest","message":"ErrorCode=InvalidTemplate, ErrorMessage=Unable to parse expression 'pipeline().parameters.obj.values.item().name'","target":"pipeline/name_of_the_pipeline/runid/run_id","details":null,"error":null}
So, can you please give any ideas how to define my expression?
I feel this must be really obvious, but I'm not getting there.....
Thanks.
Hello fellow Pythonista!
The solution in ADF is actually to reference the property just as you would in Python, by enclosing the 'variable' in square brackets.
I created a pipeline with a parameter obj like yours and, as a demo, the pipeline has a single Set Variable activity that gets the value for key2 into a variable.
This is documented, but you need X-ray vision to spot it here.
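For example, a sketch of that Set Variable expression (the hard-coded 'key2' could equally come from a hypothetical String variable, say myKey):
@pipeline().parameters.obj.values['key2']
@pipeline().parameters.obj.values[variables('myKey')]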
Based on your comments, this is the output of a Filter activity. The Filter activity's output is an object that contains an array named value, so you need to iterate over the "output.value":
Inside the ForEach you reference the name of the item using "item().name":
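For example (a sketch where 'MyFilter' stands in for whatever your Filter activity is actually called), the ForEach Items setting would be @activity('MyFilter').output.value, and inside the loop you would refer to the current element as @item().name.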
EDIT BASED ON MORE INFORMATION:
The task is now to take the @item().name value and use it as a dynamic property name against a JSON array. This is a bit of a challenge given the limited nature of the Pipeline Expression Language (PEL). Array elements in PEL can only be referenced by their index value, so to do this kind of complex lookup you will need to loop over the array and do some string parsing. Since you are already inside a FOR loop, and nested FOR loops are not supported, you will need to execute another pipeline to handle this process AND the Copy activity. Warning: this gets ugly, but works.
Child Pipeline
Define a pipeline with two parameters, one for the values array and one for the item().name value.
When you execute the child pipeline, pass @pipeline().parameters.obj.values as "valuesArray" and @item().name as "keyValue".
You will need several string parsing operations, so create some string variables in the Pipeline.
In the Child Pipeline, add a ForEach activity. Check the Sequential box and set the Items to the valuesArray parameter:
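For example, the Items field would simply be set to @pipeline().parameters.valuesArray.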
Inside the ForEach, start by cleaning up the current item and storing it as a variable to make it a little easier to consume.
Parse the object key out of the variable [this is where it starts to get a little ugly].
Add an IF condition to compare the value of the current key with the keyValue parameter.
Add an activity to the TRUE condition that parses the value into a variable [it gets really ugly here].
Meanwhile, back at the Pipeline
At this point, after the ForEach, you will have a variable (IterationValue) that contains the correct value from your original array.
Now that you have this value, you can use that variable as a DataSet parameter in the Copy activity.
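For example, the dataset parameter's value in the Copy activity could be set to @variables('IterationValue').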

Can an Input Operator be used in the middle of a DAG in Apache Apex

All examples of Apex say that the first operator of the DAG should be an input operator. Can this operator appear somewhere in the middle of the DAG?
Consider a case in which I have data to be fetched from the database based on some data that has just been processed by a previous operator; this would mean that an input operator would come somewhere in the middle of the DAG.
According to the definition, an input operator is one which does not have any input stream, but it also does the work of fetching data if a connector is used. So will it work if I fetch data somewhere in the middle of a DAG?
This is an interesting use-case. You should be able to extend an input operator (say JdbcInputOperator since you want to read from a database) and add an input port to it. This input port receives data (tuples) from another operator from your DAG and updates the "where" clause of the JdbcInputOperator so it reads the data based on that. Hope that is what you were looking for.
Yes, it is possible. You may extend an existing InputOperator and add InputPort(s) to it. In this case, the Apex platform will handle your operator as a generic operator and not call InputOperator.emitTuples(). It will be your extended operator's responsibility to call super.emitTuples() or directly emit on the output port(s).
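A rough sketch of what such an operator could look like (the class and port names here are illustrative and the JDBC details are omitted; this is not the exact Malhar API):

import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.InputOperator;

// Illustrative only: an operator that reads from a database like an input
// operator, but is parameterized by tuples arriving on an input port.
public class ParameterizedDbReadOperator implements InputOperator
{
  public final transient DefaultOutputPort<String> output = new DefaultOutputPort<>();

  // Latest filter received from the upstream operator.
  private transient String whereClause;

  public final transient DefaultInputPort<String> filterInput = new DefaultInputPort<String>()
  {
    @Override
    public void process(String tuple)
    {
      // Use the incoming tuple to build the WHERE clause for the next read.
      whereClause = tuple;
    }
  };

  @Override
  public void emitTuples()
  {
    // As noted above, once the operator has an input port the platform treats
    // it as a generic operator and will not call emitTuples() for you.
  }

  @Override
  public void endWindow()
  {
    // A real implementation would run the JDBC query using whereClause here
    // and emit each resulting row on the output port, e.g. output.emit(row).
  }

  @Override
  public void beginWindow(long windowId) { }

  @Override
  public void setup(OperatorContext context) { }

  @Override
  public void teardown() { }
}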
No, an input operator cannot be used in between the DAG.
As you have already pointed out, since there is no input stream, you will not be able to get data from previous operator for use with this operator.
For the example you pointed out, it would be better to write your own generic operator with an input stream, one which has functionality similar to the input operator in that it can read data from an external source based on the data in the input stream.
Also, just a point to note:
If the query is too heavy, it's better to have an asynchronous thread query the database. This thread can write data to a queue from which the main thread can read the records and emit them on the output stream. This will ensure that the main operator thread is not blocked and the operator won't fail.
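A compact sketch of that pattern (names are illustrative; error handling, the executor lifecycle and the actual JDBC code are left out, and in a real operator the transient fields would be initialized in setup()):

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.common.util.BaseOperator;

// Generic operator: queries run on a background thread so the operator
// thread is never blocked by a slow database.
public class AsyncDbLookupOperator extends BaseOperator
{
  public final transient DefaultOutputPort<String> output = new DefaultOutputPort<>();
  private final transient ExecutorService executor = Executors.newSingleThreadExecutor();
  private final transient Queue<String> results = new ConcurrentLinkedQueue<>();

  public final transient DefaultInputPort<String> input = new DefaultInputPort<String>()
  {
    @Override
    public void process(final String key)
    {
      // Hand the (potentially slow) database query to the background thread.
      executor.submit(new Runnable()
      {
        @Override
        public void run()
        {
          // A real implementation would query the database using 'key'
          // and add each resulting record to the queue.
          results.add("row-for-" + key);
        }
      });
    }
  };

  @Override
  public void endWindow()
  {
    // Drain whatever the background thread has produced so far and emit it
    // from the operator thread (emission must not happen on other threads).
    String row;
    while ((row = results.poll()) != null) {
      output.emit(row);
    }
  }
}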

Gemfire pdxInstance datatype

I am writing pdxInstances to GemFire using the sequence: rabbitmq => springxd => gemfire.
If I put this JSON into rabbitmq {'ID':11,'value':5}, value appears as a byte value in GemFire. If I put {'ID':11,'value':500}, value appears as a word and if I put {'ID':11,'value':50000} it appears as an Integer.
A problem arises when I query data from GemFire and order them. For example, if I use a query such as select * from /my_region order by value it fails, saying it cannot compare a byte with a word (or byte with an integer).
Is there any way to declare the data type in JSON? Or any other method to get rid of this problem?
To add a bit of insight into this problem... in reviewing GemFire/Geode source code, it would seem it is not possible to configure the desired value type and override GemFire/Geode's default behavior, which can be seen in JSONFormatter.setNumberField(..).
I will not explain how GemFire/Geode involves the JSONFormatter during a Region.put(key, value) operation as it is rather involved and beyond the scope of this discussion.
However, one could argue that the problem is not necessarily with the JSONFormatter class, since storing a numeric value in a byte is more efficient than storing the value in an integer, especially when the value would indeed fit into a byte. Therefore, the problem is really that the Comparator used in the Query processor should be able to compare numeric values in the same type family (byte, short, int, long), upcasting where appropriate.
If you feel so inclined, feel free to file a JIRA ticket in the Apache Geode JIRA repository at https://issues.apache.org/jira/browse/GEODE-72?jql=project%20%3D%20GEODE
Note, Apache Geode is the open source "core" of Pivotal GemFire now. See the Apache Geode website for more details.
Cheers!
Your best bet would be to take care of this with a custom module or a Groovy script. You can either write a custom module in Java to do the conversion and then upload it into Spring XD, after which you can reference it like any other processor, or you can write a script in Groovy and pass the incoming data through a transform processor.
http://docs.spring.io/spring-xd/docs/current/reference/html/#processors
The actual conversion probably won't be too tricky, but will vary depending on which method you use. The stream creation would look something like this when you're done.
stream create --name myRabbitStream --definition "rabbit | my-custom-module | gemfire-json-server etc....."
stream create --name myRabbitStream --definition "rabbit | transform --script=file:/transform.groovy | gemfire-json-server etc...."
It seems like you have your source and sink modules set up just fine, so all you need to do is get your processor module set up to do the conversion and you should be all set.
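The conversion itself is where you have some freedom. One option (an assumption on my part, not something GemFire requires) is to normalize the value field to a single JSON number form, for example a floating-point literal, so every entry ends up stored as the same type. A rough sketch of that logic in Java, using Jackson and leaving out the Spring XD module packaging:

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class ValueNormalizer
{
  private static final ObjectMapper MAPPER = new ObjectMapper();

  // Rewrites {"ID":11,"value":5} as {"ID":11,"value":5.0} so that "value"
  // is always parsed as the same numeric type on the GemFire side.
  public String transform(String json) throws Exception
  {
    ObjectNode node = (ObjectNode) MAPPER.readTree(json);
    if (node.has("value")) {
      node.put("value", node.get("value").asDouble());
    }
    return MAPPER.writeValueAsString(node);
  }
}

Whether you package these few lines as a Java custom module or port them to the transform.groovy script referenced above is up to you.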