How do I load Excel files selectively? - ssis

I have an SSIS package that needs to look up two different types of Excel files, type A and type B, and load their data into two different staging tables, tableA and tableB. The two Excel formats differ, and each matches its respective table.
I have thought of putting typeA.xls and typeB.xls in two different folders for simplicity (folder paths to be configurable). The required Excel files will then be placed there by some other application or manually.
What I want is for my dtsx package to scan the folder, pick the latest unprocessed file, load it while ignoring the others, and then append '-loaded' to the file name (typeAxxxxxx-loaded.xls). The "-loaded" in the file name is how I plan to differentiate between files that are already loaded and those yet to be loaded.
I need advice on:
a) How do I check the configured folder for the latest file, i.e. one without '-loaded' in the file name, and load it? And after loading it, how do I rename that same file in the configured folder with '-loaded' appended?
b) Is this the best approach to doing this or is there a better way?
Thanks.

You can do it this way, but it might require several complex string expressions.
For example, create a ForEach loop over the .xls files. Inside the loop, add an empty script task, then a data flow to load the file. Connect them with a precedence constraint and make it conditional: the precedence constraint expression checks that the file name does not end with -loaded.xls. You can do the check either in the script task or purely with an SSIS expression on the precedence constraint. Finally, add a File System Task to rename the file. You may need to build the new file name with another expression.
It might be easier to create two folders: Incoming for new unprocessed files and Loaded for the files you've processed, and just move each .xls to Loaded after processing, without renaming. This avoids the first conditional expression (and the dummy script task) and simplifies the configuration of the File System Task.
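
To make the selection rule from the question concrete, here is a minimal sketch of the logic in Python rather than in SSIS itself (in a package, the equivalent would live in a Script Task or in expressions). The function names and the use of the last-modified time to decide which file is "latest" are assumptions for illustration; only the '-loaded' marker comes from the question.

```python
from pathlib import Path
from typing import Optional

LOADED_SUFFIX = "-loaded"  # marker taken from the question


def latest_unprocessed(folder: Path, pattern: str = "*.xls") -> Optional[Path]:
    """Return the most recently modified file whose name lacks the marker.

    Assumption: "latest" means newest last-modified time.
    Returns None when every matching file is already marked as loaded.
    """
    candidates = [
        p for p in folder.glob(pattern)
        if p.is_file() and not p.stem.endswith(LOADED_SUFFIX)
    ]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def mark_loaded(path: Path) -> Path:
    """Rename e.g. typeA123.xls to typeA123-loaded.xls and return the new path."""
    target = path.with_name(f"{path.stem}{LOADED_SUFFIX}{path.suffix}")
    path.rename(target)
    return target
```

With the two-folder variant, `mark_loaded` would be replaced by a plain move into the Loaded folder, and the suffix check would no longer be needed.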

You can get the SQL File Watcher task and add it to your SSIS package. I think this is a cleaner way to do what you want.

Related

Load thousand files dynamically using SSIS

One folder contains a thousand files, and each file has to be loaded into a different SQL Server table. How do I design an SSIS package to do this?
For ex:
File name: Location_12345.xlsx will be loaded into Location table
Employee_1233.txt will be loaded into Employee table
Department_123456.csv will be loaded into Department table
Yes, you can do this. Loop through the files in the folder and pick up the ones you need. A simple Google search will turn up worked examples.
You will need a package for each flow. You defined a minimum of 3 in your question.
For example, if all Locations follow the same flow then:
Add a Foreach Loop and choose the file enumerator to loop through files.
Define the folder.
Define the criteria (Location*.xlsx).
Store the full file path in a variable.
Add an Excel connection.
Set an expression on the connection that uses that variable.
Design your data flow.
Set DelayValidation on the Excel Source before running.
This is a sample. You will have to do this for each file type.
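
The routing rule implied by the question (the file-name prefix decides the target table) can be sketched outside SSIS as follows. This is only an illustration in Python: the table set and the name pattern are assumptions derived from the three example file names, and in the package itself the same split is done by giving each Foreach Loop its own file mask.

```python
import re
from typing import Optional

# Assumption: the only tables are the three named in the question.
KNOWN_TABLES = {"Location", "Employee", "Department"}

# Assumption: names look like <Table>_<digits>.<xlsx|txt|csv>.
NAME_PATTERN = re.compile(r"^(?P<table>[A-Za-z]+)_\d+\.(?:xlsx|txt|csv)$")


def target_table(file_name: str) -> Optional[str]:
    """Return the table a file should be loaded into, or None if unrecognised."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        return None
    table = match.group("table")
    return table if table in KNOWN_TABLES else None
```

Files for which `target_table` returns None would be left alone or moved aside rather than guessed at.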

SSIS empty Excel columns causing error

Using Microsoft Visual Studio Community 2015.
Goal of project
-create the "*\temp\email" directory
-start a program that extracts all emails with xls attachments to the previously created folder
-use a ForEach loop to cycle through each file in the folder, process it, and load it into a SQL table.
The problem I am running into is caused either by a blank Excel document (which is occasionally sent from a remote location) or by some of the original xls reports that contain only 5 columns instead of the 6 I have mapped now. Is there any way to separate files that have the correct columns from those that do not?
** As long as these two problems do not occur, the SSIS package runs without issue.
Control flow:
File System Task (creates directory) --> Execute Process Task (xls extraction) --> ForEach Loop (Data Flow Task "email2Sql")
Data flow:
Excel Source (uses an expression on ExcelFilePath, @[User::filepath]), DelayValidation = True
(Columns are initially set to F1-F6 and are mapped to, for example, a,b,c,d,e,f. The older files that get mixed in only include a,b,c,d,e.) This is where I want to be able to separate the xls files.
Conditional Split transformation (column names are not in row 1; this helps remove "null" values)
OLE DB Destination (SQL table)
Sorry for the amount of reading, but for the first post I tried to include anything that I thought may be relevant.
There are some tools out there that would let you open the Excel document and inspect it. However, I think the simplest thing to do is to use SSIS out of the box:
1 - Add a File System Task after the data flow that reads the file.
2 - Set the precedence constraint from the data flow to the File System Task to "Failure". The File System Task will then fire only when the Data Flow Task fails.
3 - Configure the File System Task to move the "bad" files to another folder.
This lets you loop through all the files and move the failed ones. Ultimately, the package will still end in failure. If you don't want that behavior, you can set the ForceExecutionResult property to Success. However, it may be useful to know that some files had problems so that they can be addressed.
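
The "try to load, move the file aside on failure" pattern from the steps above can be sketched like this. It is a plain-Python illustration of the control flow, not SSIS code: `load_file` is a hypothetical stand-in for the Data Flow Task, and the folder names are assumptions.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def load_all(incoming: Path, rejected: Path,
             load_file: Callable[[Path], None]) -> List[Path]:
    """Load every .xls in `incoming`; move files that fail into `rejected`.

    `load_file` is a hypothetical loader that raises on a bad file
    (blank workbook, missing column, and so on).
    Returns the new paths of the rejected files so they can be reported.
    """
    rejected.mkdir(parents=True, exist_ok=True)
    failed: List[Path] = []
    for path in sorted(incoming.glob("*.xls")):
        try:
            load_file(path)
        except Exception:
            # Plays the role of the "Failure" precedence constraint.
            destination = rejected / path.name
            shutil.move(str(path), str(destination))
            failed.append(destination)
    return failed
```

Returning the list of rejected files keeps the problem visible, which matches the answer's point that silent success may hide files that need attention.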

How to read multiple flat files dynamic in nature at one go in SSIS

I have a folder which contains files like:
A_ddmmyyyy, b_ddmmyyyy and c_ddmmyyyy.
I need to read all of these files for one date, and then all of the files for the next date present in the same folder. The number of files in the folder varies (it may contain data for three dates or for five), but the date on the folder remains the same.
Is it possible?
You can create a Foreach Loop Container and, inside it, a data flow task that processes each file in a specific folder. Create a flat file connection with the required delimiter and a variable that holds the folder path for your files. As the last step, add a File System Task that moves each file to a processed or completed folder, so that the main folder empties out as files are processed.
The file name doesn't matter; what you have to be certain of is how many kinds of schemas (number of columns, column names, types) the files in the folder have. Say you have 3 schemas: then you will need to define 3 types of flat file connections. There are many ways to do the job. The easiest I can think of is to use PowerShell to separate files of different schemas into different folders. You have to know which file names map to which schema; there may be a pattern or a business rule. You then run the PowerShell script from an Execute Process Task. The rest is simple: for each folder you create a package containing a Foreach Loop Container that loops through the folder and loads each file. Alternatively, you can have one package with three Foreach Loop Containers.
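
If the files do need to be handled one date at a time, the grouping step could look like the sketch below. It is written in Python purely to show the logic (the answer suggests PowerShell for the pre-sorting step); the name pattern is an assumption based on the A_ddmmyyyy examples, with an optional file extension.

```python
import re
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, List

# Assumption: names look like <kind>_<ddmmyyyy>, optionally with an extension.
NAME_PATTERN = re.compile(r"^(?P<kind>[A-Za-z]+)_(?P<stamp>\d{8})(?:\.\w+)?$")


def group_by_date(file_names: Iterable[str]) -> Dict[date, List[str]]:
    """Group file names by the date in their suffix, oldest date first."""
    groups: Dict[date, List[str]] = defaultdict(list)
    for name in file_names:
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # not one of the expected files
        try:
            day = datetime.strptime(match.group("stamp"), "%d%m%Y").date()
        except ValueError:
            continue  # eight digits that do not form a real date
        groups[day].append(name)
    return {day: sorted(groups[day]) for day in sorted(groups)}
```

Each group can then be processed as a unit before moving on to the next date.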

how to prevent SSIS package failure when no file(s) exist to import

I have an SSIS package with several data flow tasks. Each one imports a flat file into a table in my DB. I have created a connection manager for each underlying flat file. The package works just fine if all of the files exist. However, even if one of the files is missing, the entire package fails. I don't want this behavior. For whatever files that exist, I want my package to import them. For those that don't exist, I want SSIS to simply ignore them. At least one of the files will always exist. How do I achieve this behavior? I have seen some solutions that involve either scripts or file control tasks, but I'm not sure which is appropriate for my situation.
My solution is:
1. Add a Script Task that checks whether the file exists at its path:
SSIS Script task to check if file exists in folder or not
2. Set ValidateExternalMetadata to False in the source properties.
3. Link the Script Task to the next step with a precedence constraint whose expression tests a variable holding the result of the file-exists check, so the step is skipped when the file is missing.
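
The check-then-skip logic in these steps amounts to the following. This is a hedged Python sketch of the control flow only; in the package itself the check would be written in the Script Task's language, and `import_file` and the table-to-path mapping are hypothetical placeholders for the individual data flows.

```python
from pathlib import Path
from typing import Callable, Dict, List


def import_existing(sources: Dict[str, Path],
                    import_file: Callable[[str, Path], None]) -> List[str]:
    """Import each file that exists; return the tables that were skipped.

    `sources` maps a target table name to its expected flat file.
    `import_file` is a hypothetical stand-in for one data flow task.
    """
    skipped: List[str] = []
    for table, path in sources.items():
        if path.is_file():
            import_file(table, path)
        else:
            skipped.append(table)  # missing file: ignore instead of failing
    return skipped
```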

Access WMI query results in SSIS

I have a situation whereby I want to process files in an SSIS package but only files that are new and only files that match specific filename patterns.
Is it possible to use WMI to achieve this by somehow looping through the result set of a WMI query?
The WMI Data Reader task seems to be the closest contender but it can only write its results to a file (rather than to say a database table or in-memory recordset).
Has anyone had success doing this?
If you want to use the WMI Data Reader Task then the easiest solution would be to save the result to a file. Add a Data Flow Task that reads the file and inserts the data into the database.
However, another solution would be something like:
Add a Foreach Loop with a Foreach File Enumerator; you can use an expression for the file name patterns.
Process the files in a Data Flow Task
If you are allowed to move the files then use a File System Task to move the file to a different folder so it won't be processed again.
If you can't move the files then you need some other way to determine if the file is already processed. If you only need to watch for new files and not modified ones then you could keep a record of which file has been processed in the database, or add a script task to check the modified date of the file and compare it to the last processed date from the database.
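
For the case where files cannot be moved, the "keep a record in the database" idea can be sketched as below. It uses Python with an SQLite table purely for illustration; the table name `processed_files`, its columns, and the choice to compare last-modified timestamps are all assumptions, and a real package would use its own database and a Script Task or Execute SQL Task instead.

```python
import sqlite3
from pathlib import Path
from typing import List


def ensure_log(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) tracking table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_files ("
        "file_name TEXT PRIMARY KEY, modified_ns INTEGER NOT NULL)"
    )


def new_or_changed(conn: sqlite3.Connection, folder: Path, pattern: str) -> List[Path]:
    """Return files matching `pattern` that are unrecorded or modified since."""
    seen = dict(conn.execute("SELECT file_name, modified_ns FROM processed_files"))
    return [
        p for p in sorted(folder.glob(pattern))
        if p.is_file() and seen.get(p.name) != p.stat().st_mtime_ns
    ]


def record_processed(conn: sqlite3.Connection, path: Path) -> None:
    """Remember a file and its last-modified time after a successful load."""
    conn.execute(
        "INSERT OR REPLACE INTO processed_files (file_name, modified_ns) VALUES (?, ?)",
        (path.name, path.stat().st_mtime_ns),
    )
    conn.commit()
```

If only new files matter and modified ones do not, the timestamp column can be dropped and the check reduced to a lookup by file name.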