Totally new to Talend ESB

I'm completely brand new to Talend ESB (not so much Talend for data integration, but ESB totally.)
That being said, I'm trying to build a simple route that watches a specific file path and gets the filename of any file dropped into it. It should then pass that filename to the child job (cTalendJob), and the child job will do something to the file.
I'm able to watch the directory, procure the filename itself and System.out.println the filename, but I can't seem to 'pass' it down to the child job. When it runs, the route goes into an endless loop.
Any help is GREATLY appreciated.

You must add a context parameter to your Talend job, and then pass the filename from the route to the job by assigning it to the parameter.
In my example I added a parameter named "Param" to my job. In the Context Param view of cTalendJob, click the + button and select it from the list of available parameters, and assign a value to it.
You can then do context.Param in your child job to use the filename.

I think you are making this more difficult than you need...
I don't think you need your cProcessor or cSetBody steps.
In your tRouteInput, if you want the filename, map "${header.CamelFileName}" to a field in your schema and you will get the filename. Mapping "${in.body}" would give you the file contents, but if you don't need that you can just map the required header. If your job would read the file as a whole, you could skip the filename mapping and just map the message body.
Also, check the default behaviour of the Camel file component - it is intended to put the contents of the file into a message, moving the file to a .camel subdirectory once complete. If your job writes to the directory cFile is monitoring, the route will keep running indefinitely, as it keeps finding a "new" file - you would want to write any updated files to a different directory, or use a filename mask that isn't monitored by the cFile component.

Related

Using Apache NiFi to collect files from 3rd party REST API - Flow advice

I am trying to create a flow within Apache NiFi to collect files from a 3rd party RESTful API, and I have set up my flow as follows:
InvokeHTTP - ExtractText - PutFile
I can collect the file that I am after, as I have specified it in my Remote URL. However, when I get all of the data from that file, the flow outputs multiple (100s of) copies of the same file to my output directory.
3 things I need help with:
1: How do I get the flow to output the file as a readable .csv rather than just a file with no extension?
2: How can I stop the processor once I have all of the data that I need?
3: The JSON file that I have been supplied with gives me the option to get files from a certain date range:
https://api.3rdParty.com/reports/v1/scheduledReports/877800/1553731200000
Or I can choose a specific file:
https://api.3rdParty.com/reports/v1/scheduledReports/download/877800/201904/CTDDaily/2019-04-02T01:50:00Z.csv
But how can I create a command in NiFi to automatically check for newer files? This process will be running daily and we will be looking at downloading a new file each day.
If this is too broad, please help me by letting me know so I can edit this post.
Thanks.
Note: 3rdParty host name has been renamed to comply with security - therefore links will not directly work. Thanks.
1) You change the filename of the flow file to anything you want using the UpdateAttribute processor. If you want to make it have a ".csv" extension then you can add a property named "filename" with a value of "${filename}.csv" (without the quotes when you enter it).
2) By default most processors have a scheduling strategy of timer-driven with a run schedule of 0 seconds, which means keep running as fast as possible. Go to the configuration of the processor and, on the scheduling tab, configure the appropriate schedule. It sounds like you probably want CRON scheduling so that it runs daily.
3) You can use NiFi expression language statements to create dynamic time ranges. I don't fully understand the syntax for the API that you have to communicate with, but you could do something like this for the URL:
https://api.3rdParty.com/reports/v1/scheduledReports/877800/${now()}
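Where now() returns the current timestamp; ${now():toNumber()} gives it explicitly as the number of milliseconds since the epoch, which is the form the example URL in the question uses.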
You can also format it to a date string if necessary:
${now():format('yyyy-MM-dd')}
https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
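To sanity-check outside NiFi what such a URL would look like, here is a minimal Python sketch of the same two ideas (epoch milliseconds and a formatted date string). The base URL is the placeholder from the question; the function names build_report_url and format_day are hypothetical, and whether the API really expects "now" as the last path segment is an assumption.

```python
from datetime import datetime, timezone

# Placeholder host and report id taken from the question; not a real endpoint.
BASE_URL = "https://api.3rdParty.com/reports/v1/scheduledReports/877800"


def build_report_url(moment):
    """Append the timestamp as milliseconds since the epoch (hypothetical helper)."""
    # Integer arithmetic avoids float rounding on the millisecond part.
    epoch_millis = int(moment.timestamp()) * 1000 + moment.microsecond // 1000
    return f"{BASE_URL}/{epoch_millis}"


def format_day(moment):
    """Python counterpart of the 'yyyy-MM-dd' pattern."""
    return moment.strftime("%Y-%m-%d")


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(build_report_url(now))
    print(format_day(now))
```

Passing midnight UTC on 2019-03-28 reproduces the 1553731200000 value from the date-range URL in the question.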

KNIME - Execute an EXE program in a Workflow

I have a KNIME workflow, and in the middle of it I must execute an external program to create an Excel file.
Is there a node that allows me to achieve this? I don't need to pass any input or output, only to execute the program and wait for it to generate the Excel file (I need this Excel file for the next nodes).
There are (at least) two “External Tool” nodes which allow running executables on the command line:
External Tool
External Tool (Labs)
In case that should not be enough, you can always go for a Java Snippet node. The java.lang.Runtime class should be your entry point.
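The "run the program and wait for its file" pattern can be sketched as follows. This is an illustration in Python rather than Java Snippet code; in Java the analogous calls would be Runtime.exec or ProcessBuilder followed by waitFor. The helper name run_and_wait and the stand-in command are hypothetical, and the real EXE and its arguments are not known from the question.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_wait(command, expected_output, timeout_seconds=60):
    """Run an external program, block until it exits, then check its output file."""
    # check=True raises CalledProcessError if the program exits with a non-zero code.
    subprocess.run(command, check=True, timeout=timeout_seconds)
    output_path = Path(expected_output)
    if not output_path.is_file():
        raise FileNotFoundError(f"{output_path} was not created")
    return output_path


if __name__ == "__main__":
    # Stand-in for the real EXE: a tiny Python command that writes a file.
    target = Path(tempfile.mkdtemp()) / "report.txt"
    fake_tool = [sys.executable, "-c",
                 "import sys; open(sys.argv[1], 'w').write('done')", str(target)]
    print(run_and_wait(fake_tool, target))
```

Blocking until the process exits is what guarantees that the downstream reader only starts once the file exists.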
The External Tool node could be used. The node requires inputs and outputs, but you can use a Table Creator node for the input:
This creates an empty table.
In the External Tool node you must specify an input file and an output file. Depending on your use case these settings may be meaningless, but the node requires them in order to work.
In this case, the external application creates a text file with the result of the execution, so downstream of the initial table (the Table Creator node) that file is read to get the information into KNIME.

How to do File System Task in SSIS depending on Result of Data Flow

I'm writing a (what I thought to be a) simple SSIS package to import data from a CSV file into a SQL table.
On the Control Flow tab I have a Data Flow Task. In that Data Flow Task I have
a Flat File Source "step",
followed by a Data Conversion "step",
followed by an OLE DB destination "step".
What I want to do is to move the source CSV file to a "Completed" folder or to a "Failed" folder based on the results of the Data Flow Task.
I see that I can't add a File System step inside the Data Flow Task, but I have to do it in the Control Flow tab.
My question is: how do I do a simple thing like assigning a value to a variable (I saw how to create variables and assign them a value in the bottom pane of Data Tools (2012)) depending on whether the "step" succeeds or fails?
Thanks!
(You can tell by my question that I'm an SSIS rookie - and don't assume I can write a C# script, please)
I have used VB or C# scripts to accomplish this myself. Since you do not want to use scripts, I would recommend giving the control flow two paths out of the Data Flow Task instead of setting a variable. Have the success path (precedence constraint set to Success) lead to a File System Task that moves the file to "Completed", and the failure path (precedence constraint set to Failure) lead to one that moves the file to "Failed". This keeps it simple and accomplishes what you are looking for.

Recursively navigate a directory generating dynamic xml files according to the current visited folder with SSIS

I need to visit a folder and all of its children with SSIS (SQL Server Integration Services). At the moment, by setting the folder path into a variable after reading it, I am able to loop through all the .txt files of the current folder and fill a pre-generated XML file (with head info).
What I need now is to create a new XML file for each accessed folder (the beginning content will always be the same). Once I can create it as the first action when a new folder is accessed, I can simply apply the logic I have developed so far.
However, I am blocked at the moment, since within the loop where I read the files (with their full path) I cannot find a way to express "create the xml file if the accessed folder is new".
Assuming I understand the problem, you need to walk the entirety of a directory structure and for each folder you find, you need to create a base XML file. Then for each text file you find in that folder, you will perform some operation on the XML file. The trick being how do you only create the XML file once.
I would envision a process like this.
A script task that makes use of System.IO.Directory.GetDirectories to populate a variable (directoryXML) that contains the folder structure, something like
<Dir>
<D>C:\ssisdata</D>
<D>C:\ssisdata\a</D>
<D>C:\ssisdata\a\b</D>
</Dir>
Use a Foreach Nodelist Enumerator to shred that XML out into a variable (currentDirectory).
You'd perform your one-time task of creating the XML file in currentDirectory.
Further, by using the currentDirectory variable as an expression on the Foreach File Enumerator (assign it to Directory with a FileSpec of *.txt), you can then perform your task on all the files meeting that specification. Do not check the "Traverse subfolders" option, as that will not give the desired results.
This is a fairly high level approach to the problem as I'm assuming you have some familiarity with SSIS but the approach should be sound. Let me know if you have any particular sticking points.
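The steps above can be sketched outside SSIS to show the shape of the logic. This is a rough Python illustration, not SSIS code: the file name summary.xml and the Files/File element names are hypothetical stand-ins for the real pre-generated XML, and os.walk plays the role of the directory-listing script task.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def build_directory_xml(root):
    """Return <Dir><D>path</D>...</Dir> listing root and every subfolder."""
    dir_element = ET.Element("Dir")
    for folder, _subfolders, _files in os.walk(root):
        ET.SubElement(dir_element, "D").text = folder
    return ET.tostring(dir_element, encoding="unicode")


def process_tree(root, xml_name="summary.xml"):
    """Create one XML file per folder, then add an entry per .txt file in it."""
    created = []
    # Outer loop: one iteration per <D> node, like the Foreach Nodelist Enumerator.
    for node in ET.fromstring(build_directory_xml(root)).iter("D"):
        current_directory = Path(node.text)
        # One-time task per folder: start the XML document (hypothetical layout).
        folder_xml = ET.Element("Files")
        # Inner loop: only this folder's own .txt files, no subfolder traversal.
        for txt_file in sorted(current_directory.glob("*.txt")):
            ET.SubElement(folder_xml, "File").text = txt_file.name
        target = current_directory / xml_name
        ET.ElementTree(folder_xml).write(str(target), encoding="utf-8",
                                         xml_declaration=True)
        created.append(target)
    return created


if __name__ == "__main__":
    demo_root = Path(tempfile.mkdtemp())
    (demo_root / "a" / "b").mkdir(parents=True)
    (demo_root / "a" / "notes.txt").write_text("example")
    for xml_path in process_tree(demo_root):
        print(xml_path)
```

Because the XML file is created in the outer (per-folder) loop and the files are handled in the inner loop, the "create it only once" question never has to be asked inside the file loop.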

SSIS - "switchable" file output for debug?

In an SSIS data-flow task, I'm using a Multicast transform at a key part of the flow which I want to hang a File Output destination off.
This, in itself, is no problem to do. However, I only want output in the file if I enable it; i.e., I'd be using it for debugging the data if the flow fails unexpectedly and it's not immediately obvious from the default log message output why this occurred.
My initial thought was to create a File Output whose output file was obtained from a variable, and by default, the variable would contain 'nul' - i.e., the Windows bit-bucket - which I could override through configuration in the event of needing to dig further.
Unfortunately this isn't working: the File Output complains, saying that "The filename is a device or contains invalid characters". So it looks like I can't use the bit-bucket.
Is anyone aware of a way to make output "switchable"? This would make enabling debug a less risky proposition than editing the package and dropping a File Output in directly.
I suppose I could have a Conditional Split off the Multicast which basically sends output if a variable is set to some given value, but this seems overly messy. I'll be poking at other options, but if anyone has any suggestions/solutions, they'd be welcome.
I'd go for the Conditional Split, redirecting rows to the Konesans Trash Destination adapter if your variable isn't set, and otherwise sending them to your file.