Load thousands of files dynamically using SSIS

One folder contains a thousand files, and each file must be loaded into a different SQL Server table. How do I design an SSIS package to do this?
For example:
Location_12345.xlsx will be loaded into the Location table
Employee_1233.txt will be loaded into the Employee table
Department_123456.csv will be loaded into the Department table

The answer is yes, you can: loop through the files in the folder and get whatever you need. A simple Google search will turn up plenty of worked examples.

You will need a flow for each file type; you defined a minimum of three in your question.
For example, if all Location files follow the same layout, then:
Add a Foreach Loop container and choose the file enumerator.
Define the folder.
Define the file criteria (Location*.xlsx).
Map the fully qualified file path to a variable.
Add an Excel connection manager.
Put an expression on that connection that uses the variable (a sketch follows below).
Design your data flow.
Set DelayValidation to True on the Excel connection before running.
This is one sample flow; you will have to repeat it for each file type.
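For the expression step, a hedged sketch: you can either put @[User::FilePath] on the connection manager's ExcelFilePath property, or build the whole ConnectionString (the variable name User::FilePath is an assumption):

"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + @[User::FilePath]
+ ";Extended Properties=\"Excel 12.0 XML;HDR=YES\";"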

Related

SSIS Reuse Dynamic Flat File Destination Filename

I have an SSIS package that creates a flat text file from data in a database table. Everything works perfectly except that I need to capture the dynamic filename for use in another process. I've searched everywhere but haven't found anything close to what I need, other than using a ForEach Loop to loop through the directory the file is stored in; I can't do that because there are too many things that could go wrong. I'm currently creating the dynamic filename through variables, and it contains a datetime stamp.
Is there a way to capture the file name when the file is created in the data flow task, so I can use it in another process within the control flow?
Thank you in advance!
John
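For reference, a hedged sketch of the kind of variable expression John describes (the folder and name parts are assumptions). Anchoring the timestamp to @[System::StartTime] rather than GETDATE() keeps the value identical everywhere it is read during one run, so a later control flow task can simply reference the same variable:

@[User::OutputFolder] + "\\Export_"
+ (DT_WSTR, 4) YEAR( @[System::StartTime] )
+ RIGHT("0" + (DT_WSTR, 2) MONTH( @[System::StartTime] ), 2)
+ RIGHT("0" + (DT_WSTR, 2) DAY( @[System::StartTime] ), 2)
+ ".txt"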

SSIS empty Excel columns causing error

Using Microsoft Visual Studio Community 2015.
Goal of project:
- create a "*\temp\email" directory
- start a program that extracts all emails with .xls attachments into the previously created folder
- use a Foreach Loop to cycle through each file in the folder, process it, and load it into a SQL table
The problem I am running into is caused either by a blank Excel document (which is occasionally sent from a remote location) or by some of the original .xls reports containing only 5 columns instead of the 6 I have mapped. Is there any way to separate the files that have the correct columns from those that do not match?
As long as these two problems do not occur, the SSIS package runs without issue.
Control flow:
File System Task (creates directory) ---> Execute Process Task (xls extraction) --> ForEach Loop (Data Flow Task "email2Sql")
Data flow:
Excel Source (ExcelFilePath set by an expression to @[User::FilePath]; DelayValidation == true)
(Columns are initially set to F1-F6 and are mapped to, for example, a, b, c, d, e, f. The older files that get mixed in only include a, b, c, d, e.) This is where I want to be able to separate the .xls files.
Conditional Split transformation (column names are not in row 1; this helps remove NULL rows)
OLE DB Destination (SQL table)
Sorry for the amount of reading, but for the first post I tried to include anything that I thought may be relevant.
There are some tools out there that would allow you to open the Excel doc and read it (see the sketch below). However, I think the simplest thing to do is to use SSIS out of the box:
1 - Add a File System Task after the data flow that reads the file.
2 - Make the precedence constraint from the data flow to the File System Task "Failure". This causes it to fire only when the data flow task fails.
3 - Set the File System Task to move the "bad" files to another folder.
This will allow you to loop through all the files and move the failed ones. Ultimately, the package will end in failure. If you don't want that behavior, you can change the ForceExecutionResult property to Success. However, it might be good to know that there were problems with some files so that they can be addressed.
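As a sketch of the first option (opening the Excel doc and reading it yourself), a Script Task could count the columns before the data flow runs. This assumes the ACE OLE DB provider is installed; the method name is hypothetical:

using System.Data;
using System.Data.OleDb;

// Count the columns on the first worksheet so files with the wrong shape
// (5 columns instead of 6) can be routed aside before the data flow.
static int GetColumnCount(string xlsPath)
{
    string connStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + xlsPath
                   + ";Extended Properties=\"Excel 8.0;HDR=NO\"";
    using (var conn = new OleDbConnection(connStr))
    {
        conn.Open();
        DataTable tables = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
        string sheet = tables.Rows[0]["TABLE_NAME"].ToString();  // e.g. "Sheet1$"
        DataTable cols = conn.GetOleDbSchemaTable(
            OleDbSchemaGuid.Columns, new object[] { null, null, sheet, null });
        return cols.Rows.Count;
    }
}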

How to read multiple dynamically named flat files in one go in SSIS

I have a folder which contains files like:
A_ddmmyyyy, B_ddmmyyyy, and C_ddmmyyyy.
I need to read all of these files for one date, and then all of these files for the next date present in the same folder. The number of files present in the folder varies (it may contain data for three dates or five), but the folder remains the same.
Is it possible?
You can create a Foreach Loop Container and, inside the container, a Data Flow Task that processes all the files in a specific folder. Create a flat file connection with the needed delimiter, and a variable that holds the folder path for your files (see the expression sketch below). Then, as a last step, add a File System Task to move each file to a processed or completed folder, so that your main folder empties out once the files are processed.
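For instance, the Flat File connection manager's ConnectionString property can take an expression like this (the variable names are assumptions; if the Foreach enumerator returns fully qualified names, a single variable is enough):

@[User::FolderPath] + "\\" + @[User::FileName]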
File names don't matter, but you have to know how many kinds of schemas (number of columns, column names, types) exist among the files in the folder. Say you have 3 schemas; then you will need to define 3 flat file connections. There are many ways to do the job; the easiest I can think of is to use PowerShell to separate files of different schemas into different folders. You have to know which file names map to which schema; there may be a pattern or business rule. You then put your PowerShell script in an Execute Process Task to run it. The rest is simple: for each folder, create a package with a Foreach Loop Container that loops through the folder to load each file. Or you can have one package with three Foreach Loop Containers to do the job.
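If you would rather stay inside SSIS than shell out to PowerShell, a Script Task can do the same grouping. A minimal sketch, assuming file-name prefixes map to schemas (the prefixes and paths here are hypothetical):

using System.Collections.Generic;
using System.IO;

// Inside the Script Task's Main(): move each file into a per-schema folder
// based on its name prefix, so each folder can be loaded by one Foreach loop.
string root = @"C:\incoming";                        // hypothetical source folder
var prefixToFolder = new Dictionary<string, string>  // hypothetical name-to-schema rule
{
    { "A_", @"C:\incoming\schemaA" },
    { "B_", @"C:\incoming\schemaB" },
    { "C_", @"C:\incoming\schemaC" }
};
foreach (string file in Directory.GetFiles(root))
{
    string name = Path.GetFileName(file);
    foreach (var map in prefixToFolder)
    {
        if (name.StartsWith(map.Key))
        {
            Directory.CreateDirectory(map.Value);    // no-op if it already exists
            File.Move(file, Path.Combine(map.Value, name));
            break;
        }
    }
}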

Recursively navigate a directory, generating dynamic XML files according to the currently visited folder, with SSIS

I need to visit a folder and all of its children with SSIS (SQL Server Integration Services). At the moment, by reading the folder path into a variable, I am able to loop through all the .txt files of the current folder and fill a pre-generated XML file (with header info).
What I need now is to create a new XML file per accessed folder (the beginning content will always be the same). Once I can create it as the first action when a new folder is accessed, I can simply apply the logic I have developed so far.
However, I am blocked at the moment: within the loop where I read the files (with their full paths), I cannot find a way to express "create the XML file if the accessed folder is new".
Assuming I understand the problem, you need to walk the entirety of a directory structure, and for each folder you find, create a base XML file. Then, for each text file you find in that folder, you will perform some operation on the XML file. The trick is how to create the XML file only once.
I would envision a process like this.
A Script Task that makes use of System.IO.Directory.GetDirectories to populate a variable (directoryXML) that contains the folder structure, something like
<Dir>
<D>C:\ssisdata</D>
<D>C:\ssisdata\a</D>
<D>C:\ssisdata\a\b</D>
</Dir>
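A minimal Script Task sketch (C#) of that step, assuming the variable names shown here (they are assumptions), with directoryXML listed under the task's ReadWriteVariables:

using System.IO;
using System.Text;

// Build the <Dir> XML above from the directory tree under the root folder.
string root = Dts.Variables["User::RootFolder"].Value.ToString();  // hypothetical variable
var sb = new StringBuilder("<Dir>");
sb.Append("<D>").Append(root).Append("</D>");
foreach (string dir in Directory.GetDirectories(root, "*", SearchOption.AllDirectories))
{
    sb.Append("<D>").Append(dir).Append("</D>");
}
sb.Append("</Dir>");
Dts.Variables["User::directoryXML"].Value = sb.ToString();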
Use a Foreach NodeList Enumerator to shred that XML into a variable (currentDirectory).
You'd perform your one-time task of creating the XML file in currentDirectory.
Then, using the currentDirectory variable as an expression on the Foreach File Enumerator (assigned to Directory, with a FileSpec of *.txt), you can perform your task on all the files meeting that specification. Do not check the traverse-subfolder option, as that will not give the desired results.
This is a fairly high level approach to the problem as I'm assuming you have some familiarity with SSIS but the approach should be sound. Let me know if you have any particular sticking points.

How do I load Excel files selectively?

I have an SSIS package that needs to pick up two different types of Excel files, type A and type B, and load the data within them into two different staging tables, tableA and tableB. The formats of these Excel sheets are different, and they match their respective tables.
I have thought of putting typeA.xls and typeB.xls in two different folders for simplicity (folder paths to be configurable). The required Excel files will then be put there by some other application or manually.
What I want is for my dtsx package to scan the folder, pick the latest unprocessed file, load it while ignoring the others, and then postfix the file name with '-loaded' (typeAxxxxxx-loaded.xls). The '-loaded' in the filename is how I plan to differentiate between the already loaded files and the ones yet to be loaded.
I need advice on:
a) How do I check the configured folder for the latest file, i.e. one without '-loaded' in the filename, and load it? And then, after loading it, rename the same file in that configured folder with '-loaded' postfixed?
b) Is this the best approach to doing this or is there a better way?
Thanks.
You can do it this way, but it might require several complex string expressions.
E.g. create a ForEach loop over .xls files; inside the loop add an empty Script Task, then a data flow to load the file. Connect them with a precedence constraint and make it conditional: the precedence constraint expression will check that the file name does not end with -loaded.xls (a sketch follows below). You may do this either in the Script Task or purely with an SSIS expression on the precedence constraint. Finally, add a File System Task to rename the file; you may need to build the new file name with another expression.
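A hedged sketch of that precedence constraint expression (the variable name is an assumption); FINDSTRING returns 0 when the substring is absent, so only unprocessed files pass through:

FINDSTRING( @[User::FileName], "-loaded.xls", 1 ) == 0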
It might be easier to create two folders: Incoming for new unprocessed files and Loaded for the files you've processed, and simply move each .xls to the Loaded folder after processing, without renaming. This avoids the first conditional expression (and the dummy Script Task) and simplifies the configuration of the File System Task.
You can get the SQL File Watcher Task and add it to your SSIS toolbox. I think this is a cleaner way to do what you want.
SQL File Watcher