ETL Using SSIS for csv files - configuration

I am new to SSIS and have been assigned a POC. Please help. The requirement: there are 'n' countries, and each country will load 27 files into 'n' folders. I need to create an SSIS package that fetches the .csv files from those locations and loads them into staging and target tables. All countries load the same 27 files. How do I set the file connection manager dynamically, and how do I set the package configuration so that it runs dynamically? Each run should ensure that all 27 files of one country are processed; only then should it move on to the next country's files. Everything has to be automated: at run time, the files to be fetched have to be configured for a single country. Can this be done? Somebody please help. I come from a webMethods background and this is totally new to me.

I faced this kind of problem in the past and solved it by creating two Foreach Loop Containers: 1) one to loop through the folders and 2) one to loop through the files dynamically. You can modify this logic according to your requirements. I have listed 2 references that give a step-by-step process.
How to loop through Excel files and load them into a database using SSIS package?
http://www.codeproject.com/Tips/378129/Dynamically-Configure-Excel-in-Foreach-Loop-Contai
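
As a rough illustration of the two-loop idea, here is a minimal sketch in Python rather than SSIS. It assumes one sub-folder per country under a common root; the names `files_by_country`, `process_all` and `load_file` are hypothetical and only stand in for the outer Foreach Loop, the inner Foreach Loop and the data flow.

```python
from pathlib import Path


def files_by_country(root, pattern="*.csv"):
    """Yield (country folder name, sorted list of matching files), one country at a time."""
    folders = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for folder in folders:
        yield folder.name, sorted(folder.glob(pattern))


def process_all(root, load_file):
    """Outer loop over folders, inner loop over files.

    Every file of one country is handed to load_file before the
    next country's folder is touched.
    """
    for country, files in files_by_country(root):
        for path in files:
            load_file(country, path)
```

In the package itself, the inner loop would typically map the current file path to a variable, and the flat file connection manager's ConnectionString would be driven by an expression on that variable.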

Related

SSIS How to create an expression that runs a task if a specific predecessor task has run

I currently have an ETL package that downloads files and then loads them into a database.
When I have downloaded the files and processed them, I place the files into an archive folder. However, as I truncate the staging table each time I run the process if any new files are downloaded, I need to reload the archive files in a separate process.
I have 2 Foreach containers: one loads the new files, whilst the other loads the archived files.
What I want to do is create an expression on the archive Foreach loop so that it only runs if the SQL task that truncates the table has completed.
Do I create a Boolean variable that is set to True when the task runs, and use that in my expression? If so, how could I achieve this?

Load thousand files dynamically using SSIS

One folder has a thousand files, and each file will be loaded into a different SQL Server table. How do I design an SSIS package to do this?
For ex:
File name: Location_12345.xlsx will be loaded into Location table
Employee_1233.txt will be loaded into Employee table
Department_123456.csv will be loaded into Department table
The answer is yes, you can. Loop through the files in the folder and pick up whatever you need. A simple Google search would give you everything you need.
You will need a package for each flow. You defined a minimum of 3 in your question.
For example, if all Locations follow the same flow then:
Add a foreach and choose files to loop through.
Define the folder
Define the criteria (Location*.xlsx)
Map the full file path to a variable
Add an Excel connection manager
Set an expression on the connection's ExcelFilePath property that uses that variable
Design your data flow.
Set DelayValidation to True on the Excel source before running.
This is a sample. You will have to do this for each file type.
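
To show how the file masks decide which flow a file belongs to, here is a small Python sketch (not SSIS). The masks follow the three examples from the question; the `FLOWS` mapping and the `route` function are hypothetical names used only for illustration.

```python
from fnmatch import fnmatchcase

# One entry per flow: file mask -> target table (taken from the question's examples).
FLOWS = {
    "Location*.xlsx": "Location",
    "Employee*.txt": "Employee",
    "Department*.csv": "Department",
}


def route(file_name, flows=FLOWS):
    """Return the target table for a file name, or None if no mask matches."""
    for mask, table in flows.items():
        if fnmatchcase(file_name, mask):
            return table
    return None
```

Each mask corresponds to the "criteria" of one Foreach loop, so a file that matches no mask is simply never picked up by any flow.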

SSIS empty Excel columns causing error

Using Microsoft Visual Studio Community 2015.
Goal of project
-create "*\temp\email" directory
-start program to extract all emails that include xls attachments to the previously created folder
-use for each loop to cycle through each file in the folder, process, and shift to sql table.
The problem I am running into is caused by either a blank excel document (which is occasionally sent from a remote location) or some of the original xls reports only contain 5 columns instead of 6 that I have mapped now. Is there any way to separate files that include the correct columns from those that do not match?
** As long as these two problems do not exist, I can run the SSIS package and everything runs without issue.
Control flow:
File System Task (creates directory) ---> Execute Process Task (xls extraction) --> Foreach Loop (Data Flow Task "email2Sql")
Data flow:
Excel Source (uses expression ExcelFilePath, @[User::filepath]), DelayValidation = True
(Columns are initially set to F1-F6 and are mapped to, for example, a, b, c, d, e, f. The older files that get mixed in only include a, b, c, d, e.) This is where I want to be able to separate the xls files.
Conditional Split transformation (column names are not in row 1; this helps remove "null" values)
OLE DB Destination (SQL table)
Sorry for the amount of reading, but for the first post I tried to include anything that I thought may be relevant.
There are some tools out there which would allow you to open the excel doc and read it. However, I think the simplest thing to do would be to use SSIS out of the box:
1 - add a file system task after the data flow which reads the file.
2 - Make the precedence constraint from the data flow to the file system task "failure." This will cause that to only fire when the data flow task fails.
3 - set the file task to move the "bad" files to another folder
This will allow you to loop through all the files and move the failed ones. Ultimately, the package will end in failure. If you don't want that behavior you can change the ForceExecutionResult property to be success. However, it might be good to know that there were problems with some files so that they can be addressed.
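
The "move the file when the load fails" pattern can be sketched outside SSIS like this. It is a minimal Python illustration, assuming the load step raises an exception for a blank or mismatched file; `process_folder`, `load_file` and `bad_folder` are hypothetical names.

```python
import shutil
from pathlib import Path


def process_folder(folder, load_file, bad_folder):
    """Try to load every file; move the ones that fail into bad_folder.

    Returns the names of the files that were moved, so the caller
    still knows that some files had problems.
    """
    bad = Path(bad_folder)
    bad.mkdir(parents=True, exist_ok=True)
    failed = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        try:
            load_file(path)
        except Exception:
            # Equivalent of the "failure" precedence constraint firing.
            shutil.move(str(path), str(bad / path.name))
            failed.append(path.name)
    return failed
```

Returning the list of failed names plays the same role as letting the package report failure: the loop finishes, but the problem files are still visible afterwards.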

How to load data from csv file in ssis using for each loop container

A daily SQL job starts at 12:00. It runs a package that fetches a CSV file from a folder (using a Foreach Loop container in SSIS).
Suppose there are no files in that folder. The package should not run until the CSV files have arrived in that folder. How can we do this using SSIS?
Please help me with this.
Have the job run on a schedule. If there are no files in the folder, it won't do anything. The next time it runs, if the files are there, it will process them.
Using a Script Task, you can check whether a file with that extension exists in that location, and then build an expression in the Precedence Constraint Editor. Set the evaluation operation to "Expression and Constraint" and the value to "Success".
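
The check itself is small. Here is a Python sketch of the logic only; in SSIS the Script Task would normally be written in C# or VB and would store the result in a Boolean package variable that the precedence constraint expression tests. The function name `has_files` is hypothetical.

```python
from pathlib import Path


def has_files(folder, pattern="*.csv"):
    """Return True if at least one file matching the pattern exists in folder."""
    return any(p.is_file() for p in Path(folder).glob(pattern))
```

With a check like this in front of the loop, a run that finds no files simply skips the load instead of failing.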

Execute SSIS task when a given list of files exist

I have a working SSIS task that executes once a month and runs through a given path, iterating through various XML files and inserting their contents into a SQL Server database.
At the last meeting with the department that determines the rules for that SSIS task, it was stated that the task can only run after all the expected files are present in the path. There are 35 files in question. They are named TUPSTATUS_SXX.xml, where XX stands for a number from 01 to 35 (35 telecommunication sectors here in Brazil). Those files are generated by the telecom companies that operate each one of those sectors.
So, my question is: how do I run an SSIS task only after all 35 files are present in our directory?
Instead of doing a busy wait on the folders with your SSIS process running continuously, why not set up a FileSystemWatcher, which will trigger on file system events? Use it to invoke your SSIS package. You can implement this watcher in either a service (if it is required for a longer time) or a simple application.
Another way is to have a simple application or script check the count, invoked from Task Scheduler.
You could use a Script Task that will count the files in the folder, and only execute further down the pipeline when 35 files exist. Although if the files are large and being transferred by FTP, the final file may exist, but not have fully transferred at that point.
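
A slightly stricter variant of the count check is to test for the expected names, so that 35 unrelated files cannot satisfy it. This is a Python sketch of that logic, assuming the TUPSTATUS_SXX.xml naming from the question; `EXPECTED` and `missing_files` are hypothetical names, and it does not address the partially transferred file problem mentioned above.

```python
from pathlib import Path

# TUPSTATUS_S01.xml ... TUPSTATUS_S35.xml
EXPECTED = [f"TUPSTATUS_S{n:02d}.xml" for n in range(1, 36)]


def missing_files(folder, expected=EXPECTED):
    """Return the expected file names that are not yet present in folder."""
    present = {p.name for p in Path(folder).iterdir() if p.is_file()}
    return [name for name in expected if name not in present]
```

The load would only go ahead when `missing_files(...)` returns an empty list; otherwise the returned names show which sectors are still outstanding.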