Is it possible to import a zipped CSV file directly into Talend Open Studio for Big Data 6.3.0.2? If not, are there any workarounds?
Yes, but in two steps.
Step 1. The file management section of the component palette has a tFileUnarchive component. You can use it by first placing a tFileList component and connecting it to the tFileUnarchive with an Iterate connector.
Step 2. The unzipped files will be moved to an output directory you specify. You can then use another tFileList, within the same job, to read those files in.
If you find Talend moves on faster than the unzipping finishes, or that files are still locked, you can add a tSleep and wait a few seconds or minutes. I use tSleep in cases where a job needs a pause so system resources can catch up.
Related
I have multiple text files in my source folder which I have to import into SQL Server using SSIS, and after the import all files have to be moved to an Archive folder. Can anyone suggest the easiest method?
The first link below provides the basics of using SSIS to archive imported files. The second link provides similar information with additional detail of renaming the archived files with a date/timestamp tag.
Loop through files loading them and archiving them one-by-one
Archive files and add timestamp
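If you would rather do the archiving in a Script Task than with File System Tasks, the C# below is a minimal sketch of the same move-and-rename idea as the second link. User::SourceFolder and User::ArchiveFolder are hypothetical package variables, and the *.txt mask is only an example.

// Minimal sketch of the archive step inside an SSIS Script Task (C#).
// User::SourceFolder and User::ArchiveFolder are hypothetical variables;
// replace the names and the *.txt mask with whatever your package uses.
using System;
using System.IO;

public void Main()
{
    string sourceFolder = Dts.Variables["User::SourceFolder"].Value.ToString();
    string archiveFolder = Dts.Variables["User::ArchiveFolder"].Value.ToString();

    foreach (string file in Directory.GetFiles(sourceFolder, "*.txt"))
    {
        // Tag the archived copy with a timestamp so repeated loads don't collide.
        string stamped = Path.GetFileNameWithoutExtension(file)
                         + "_" + DateTime.Now.ToString("yyyyMMdd_HHmmss")
                         + Path.GetExtension(file);
        File.Move(file, Path.Combine(archiveFolder, stamped));
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}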
Hope this helps.
A daily SQL job starts at 12:00. It runs a package that fetches CSV files from a folder (using a Foreach Loop container in SSIS).
Suppose there are no files in that folder. The package should not run until CSV files have been loaded into it. How can we do this in SSIS?
Please help me on this.
Have the job run on a schedule. If there are no files in the folder, it won't do anything. The next time it runs, if the files are there, it will process them.
Using a Script Task, you can check whether a file with that extension exists in that location, and then build an expression in the precedence constraint editor. Set the evaluation operation to Expression and Constraint, with the value set to Success.
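As a rough illustration of that Script Task, here is a minimal C# sketch that checks the folder for CSV files and writes the result to a Boolean package variable. User::SourceFolder and User::FileExists are assumed variable names, not anything your package already defines.

// Sketch of the file-check Script Task (C#). User::SourceFolder and
// User::FileExists are assumed variable names; adjust them to your package.
using System.IO;

public void Main()
{
    string folder = Dts.Variables["User::SourceFolder"].Value.ToString();

    // True only if at least one .csv file is currently sitting in the folder.
    Dts.Variables["User::FileExists"].Value =
        Directory.GetFiles(folder, "*.csv").Length > 0;

    Dts.TaskResult = (int)ScriptResults.Success;
}

On the precedence constraint leading into the Foreach Loop, an expression such as @[User::FileExists] == True then keeps the rest of the package from running while the folder is empty.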
I am trying to automate weekly generation of a database. As a first step in this process, I need to obtain a set of files from network location M:\. The process is as follows:
Delete any possibly remaining old source files from my local folder (REMOVE_OLD_FILES).
Obtain the names of the required files using regular expressions (GET_FILES).
Copy the files from the network location to my local folder for further processing (COPY/MOVE FILES).
Step 3 is where I run into trouble; I frequently receive the error below:
Error processing files. Exception : org.apache.commons.vfs.FileNotFoundException: Could not read from "file:///M:/FILESOURCE/FILENAME.zip" because it is a not a file.
However, when I manually locate the 'erroneous' file on the network location and try to open or copy it, there are no problems. If I then re-run the Spoon job, no errors occur for this file (although the next file might lead to an error).
So far, I have verified that steps 1 and 2 run correctly: more specifically, there are no errors in the file names returned from step 2.
Obviously, I would prefer not having to manually open all the files first to ensure that Spoon can correctly copy them. Does anyone have an idea what might be causing this behaviour?
For completeness, below are the parameters selected in the COPY/MOVE FILES step.
I was facing the same issue with different clients, and it was finally resolved with a fairly basic approach. It might help in your case as well, and other users can follow the same approach.
Just try this: create all the required folders with the Spoon job entry "Create a Folder", then disable or delete those hops from your job or transformation once the folders are created.
This happens because the user you are using to delete the file(s) is not recognized as a Windows user. Once your folders are in place, you can remove the "Create a Folder" steps from your job.
The path to the file is wrong. If you are running Spoon in a Windows environment, you should use the Windows format for file paths. Try changing from
"file:///M:/FILESOURCE/FILENAME.zip"
To
"M:\FILESOURCE\FILENAME.zip"
By the way, this will only work if M: is an actual drive on the machine. If you want to access a file on the network, you should use the network path to the shared folder, like this:
"\\MachineName\M$\FILESOURCE\FILENAME.zip"
or
"\\MachineName\FILESOURCE\FILENAME.zip"
If you try to access a file on a network-mounted drive, it won't work.
I have an SSIS package with several data flow tasks. Each one imports a flat file into a table in my DB. I have created a connection manager for each underlying flat file. The package works just fine if all of the files exist. However, even if one of the files is missing, the entire package fails. I don't want this behavior. For whatever files that exist, I want my package to import them. For those that don't exist, I want SSIS to simply ignore them. At least one of the files will always exist. How do I achieve this behavior? I have seen some solutions that involve either scripts or file control tasks, but I'm not sure which is appropriate for my situation.
My solution is:
1. Add a Script Task to check whether the file exists:
SSIS Script task to check if file exists in folder or not
2. Set ValidateExternalMetadata to False in the source properties.
3. Link the Script Task to the next step with a precedence constraint whose evaluation operation is Expression and Constraint, using a variable set by the script so the path is only taken if the file exists.
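For step 1, a minimal C# sketch of that per-file check might look like the following. User::FlatFilePath and User::FileExists are assumed variable names; point them at the path each flat file connection manager uses.

// Sketch of the per-file existence check (C#). User::FlatFilePath and
// User::FileExists are assumed variable names, not built-in ones.
using System.IO;

public void Main()
{
    string path = Dts.Variables["User::FlatFilePath"].Value.ToString();

    // The precedence constraint expression reads this flag to decide
    // whether the corresponding Data Flow Task should run.
    Dts.Variables["User::FileExists"].Value = File.Exists(path);

    Dts.TaskResult = (int)ScriptResults.Success;
}

The constraint into each data flow can then check @[User::FileExists] == True, as in step 3.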
Task: I'm trying to iterate through Excel files using the Foreach Loop container.
I was successful until I had to handle different extensions; it works as long as the file extension is xls or xlsx, but not both together.
Problem: I get errors when I try to iterate over files with the extensions xls and xlsx: "Cannot acquire connection to connection manager."
For instance, I have abc.xls and agh.xlsx in a folder and I have trouble iterating through the files with the Foreach Loop editor. I think I understand why it's happening, but can I write a script to handle it, or how else can I complete this task successfully?
Any ideas..
You will need to add two Foreach Loop containers to iterate through the files: the first will process only .xls (or .xlsx) and the second only .xlsx (or .xls). Other than that, I don't think writing a script would be of any help, but I could be wrong.
Presuming all xls files have the same format and all xlsx files have the same format...
What you could also do is use one Foreach Loop to loop through all the Excel files, then add a dummy task (an empty Script Task or Sequence Container) and connect it to two Data Flow Tasks, one for XLS and one for XLSX. Then add expressions on the precedence constraints between the dummy task and the Data Flow Tasks to check the extension. Something like:
LOWER(RIGHT(@[User::Filepath],4))==".xls"
LOWER(RIGHT(@[User::Filepath],4))=="xlsx"