How do you get flat file name and perform a row count from multiple flat files with different columns in SSIS? - ssis

I'm trying to get all the file names from a folder directory along with their row counts. (Also file size in bytes if possible) I am using Microsoft Visual Studio 2010 Shell. Here's what I've done so far:
I have created a Foreach Loop Container, set the Enumerator to Foreach File Enumerator and Expressions to a variable to the folder I want to loop over. I left the Files section with *.* and asked to retrieve Name Only. I have changed the Variable Mappings to a New Variable called FullFilePath, Container is Package, Value type is String and Value: is blank.
I then added a Data Flow to the Loop. Added a flat file source, row count, and OLE DB Destination. I changed the Flat file Source properties expression to the same Folder Variable in the Foreach Loop Container Expression. I added the Variable RecordCount to the Row Count function (Int32, value 0). The OLE DB Destination creates a new table with the name OLE DB Destination.
The next step is an Execute SQL Task that does an Insert Into DBO.FileData (FileName,RowCount) Values (?,?). I set 2 parameter mappings: 1) the variable from the Foreach Loop Container, FullFilePath, with Data Type VarChar, and 2) the variable from Row Count, RecordCount, with Data Type Long.
I then have another Execute SQL Task that drops the table created by the data flow task. The problem is that even with all of these steps the package still does not complete. It actually gets hung up and fails on the pre-execute. It says:
Warning: Access is denied.
Error: Cannot open the datafile 'FullFilePath'
Error: Flat File Source failed the pre-execute phase and returned error code 0xC020200E.
Anything you see I could be doing wrong? Let me know if pictures would help.

So I finally figured this out. To loop over all of the files with varying headers and column counts, I decided to unselect the "File contains headers" option in the Flat File Source. Doing this gave all the files the same first column, which by default is Column 0 (the first column in all of my files is some sort of numeric field or ID). I was able to map this through the row count and insert it into a SQL table. Then I was able to finish the Foreach Loop and write the file name and row count into another SQL table to record the counts. It is, however, taking a really long time: it has been running for over 14 hours and has only counted 13 files. Granted, some files are 250K+ rows, but I wouldn't think it would take this long.
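For comparison, here is a minimal sketch of the same bookkeeping (file name, row count, size in bytes) done outside SSIS in Python. It is not the package described above; the function name count_rows_and_size is made up for illustration, and it assumes every line in a file counts as one row, header or not.

```python
import os


def count_rows_and_size(folder):
    """Return (file name, row count, size in bytes) for each file in folder."""
    results = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip sub-folders
        # Read in binary mode so differing column layouts and encodings
        # do not matter; each line (header included) counts as one row.
        with open(path, "rb") as handle:
            rows = sum(1 for _ in handle)
        results.append((name, rows, os.path.getsize(path)))
    return results
```

Because the files are only scanned for line breaks and never parsed into columns, the differing headers and column counts stop being a concern for the counting step.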

Related

SSIS Delete files from folder based on a selection from SQL Database

I have an issue regarding a deletion of files based on a selection.
So, I have an SQL table in which there is a column with the file name.
What I have to do is take the file name from there and delete the physical file with that name from a specified path.
What I have tried is to execute an sql task with the selection of full file paths (including file names) which need to be deleted and store them in an object variable and then try to do a ForEach Loop Container with a 'from variable enumerator' to take those results and delete the files using a file system task, but I don't think that's the right approach.
The execution of the package ends in success, but it ends at the ForEach Loop Container without reaching (executing) the file system task, so the files are not being deleted.
Select statement (also added the full path of the file as a column in order for the file system task to know where the physical file is stored) :
select file_full_path from t1
where flag = 'YES'
Do you guys know a solution to this?
What you can do is set the loop to Foreach ADO Enumerator, so your process would be:
1: SQL Task - map file_full_path to an object variable, say 'filesdel'
2: Foreach Loop set to Foreach ADO Enumerator
3: Map the loop items to a string variable, say 'filename'
4: Use the string variable in the File System Task
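The four steps above can be sketched roughly like this in Python, just to show the flow of data. This is an illustration and not SSIS itself: it assumes the table t1 from the question lives in a SQLite database, and delete_flagged_files is a hypothetical helper name.

```python
import os
import sqlite3


def delete_flagged_files(conn):
    """Delete every file whose full path is returned by the flag query."""
    # Step 1: run the query and keep the full result set
    # (this plays the role of the object variable 'filesdel').
    rows = conn.execute(
        "SELECT file_full_path FROM t1 WHERE flag = 'YES'"
    ).fetchall()
    deleted = []
    # Steps 2 and 3: one iteration per row, with the single column
    # mapped to a string (the 'filename' variable).
    for (filename,) in rows:
        # Step 4: the delete operation of the File System Task.
        if os.path.isfile(filename):
            os.remove(filename)
            deleted.append(filename)
    return deleted
```

The point to notice is that the loop receives one row at a time and the path has to be unpacked into a plain string before the delete step can use it, which is what the variable mapping in step 3 does.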

How to get row count from flat file destination?

I am redirecting error rows from an OLE DB destination table to a flat file destination. I need the count of error rows that get redirected to the flat file, and if count(error rows) > 50 then my SSIS package should fail.
The data loaded into the table should also be rolled back if count(error rows) > 50.
How can I achieve this?
Use the Row Count transformation in the data flow and chain it to the error output of the OLE DB destination. Configure the Row Count to write to a variable, since you'll be using the count in an expression (create a variable called User::RowCount for this). Finally, you can evaluate the condition Count(User::RowCount) > 0 by using a conditional split, all within the data flow task.
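To make the requirement concrete (count the redirected rows, fail and roll back above 50), here is a minimal sketch of the logic in Python with SQLite. It is not an SSIS configuration; the table target, its columns, and the function load_with_error_limit are assumptions made up for the example.

```python
import sqlite3


def load_with_error_limit(conn, rows, error_path, max_errors=50):
    """Insert rows in one transaction; undo everything if too many fail."""
    error_count = 0
    with open(error_path, "w") as error_file:
        for row in rows:
            try:
                conn.execute(
                    "INSERT INTO target (id, name) VALUES (?, ?)", row
                )
            except sqlite3.IntegrityError:
                # Redirect the failing row to the flat file and count it.
                error_file.write(",".join(str(v) for v in row) + "\n")
                error_count += 1
    if error_count > max_errors:
        # Too many rejected rows: discard the rows already inserted
        # and fail the load.
        conn.rollback()
        raise RuntimeError(
            f"{error_count} error rows exceed the limit of {max_errors}"
        )
    conn.commit()
    return error_count
```

Note that the count is only checked after the last row has been processed, and the rollback only works because all inserts share one transaction.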

Passing a variable as Parameter

I have a scenario where I need to pass a text file or Excel file column as a parameter to my SQL query in an SSIS package.
My text or Excel file has a column called Policy_no, and it has more than 1000 policy numbers (e.g. 12358685). I have a SQL script: select * from main_table where policy_no = ?. The '?' has to come from my package variable (populated from the text or Excel file).
Instead of manually writing a script for each and every policy, how can we achieve this through SSIS?
Thanks
Assuming you want to loop through each row in your file and run the query against each individual value, you can use a Data Flow task to read your text file and load the policy numbers in an ADO Recordset (declared as a package variable). Next, you'd use a Foreach Loop Container to iterate through the recordset, loading each policy number in turn into a second variable and then executing your query and doing whatever other work is needed.
See Use a Recordset Destination in MSDN for an overview and example.
You can use an Execute SQL Task (connect to Excel with an OLE DB connection) to get the "Policy_no" data from Excel, then store the result in a variable, say policyNoGroup, whose data type should be Object. Then use a Foreach Loop to loop through the variable policyNoGroup. See the example: http://www.codeproject.com/Articles/14341/Using-the-Foreach-ADO-Enumerator-in-SSIS
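Both answers boil down to "read the policy numbers into a list, then run the parameterized query once per value". A minimal sketch of that pattern in Python, assuming the file is a CSV with a Policy_no header and main_table sits in a SQLite database (rows_for_policies is a hypothetical name):

```python
import csv
import sqlite3


def rows_for_policies(conn, policy_file):
    """Run the parameterized query once for each Policy_no in the file."""
    # Load the whole column first, like filling the recordset variable.
    reader = csv.DictReader(policy_file)
    policy_numbers = [line["Policy_no"] for line in reader]
    matches = []
    # One iteration per policy number, like the Foreach Loop Container.
    for policy_no in policy_numbers:
        cursor = conn.execute(
            "SELECT * FROM main_table WHERE policy_no = ?", (policy_no,)
        )
        matches.extend(cursor.fetchall())
    return matches
```

With 1000+ values this issues 1000+ queries, so if the per-row work is only a lookup it may be worth loading the policy numbers into a table and joining instead.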

SSIS - Load flat files, save file names to SQL Table

I have a complex task that I need to complete. It worked well before since there was only one file, but this is now changing. Each file has one long row that is first bulk inserted into a staging table. From here I'm supposed to save the file name into another table and then insert the broken-up parts of the staging table data. This is not the problem. We might have just one file or multiple files to load at once. What needs to happen is this:
The first SSIS task is a script task that does some checks. The second task prepares the file list.
The staging table is truncated.
The third task is currently a Foreach loop container task that uses the files from the file list and processes it:
File is loaded into table using Bulk Insert task.
The file name needs to be passed as a variable to the next process. This was done with a C# task before but it is now a bit more complex since there could be more than one file and each file name needs to be saved separately.
The last task is a SQL task that executes a stored procedure with the file name as input variable.
My problem is that before it was only one file. This was easy enough. What would the best way be to go about it now?
In the Data Flow Task which imports your file, create a derived column. Populate it with the value of the variable that holds the file name. Load the file name into the same table.
Use an Execute SQL task to retrieve a distinct list of file names into a recordset (Object type variable).
Use a For Each Loop container to loop through the recordset. Place your code inside the container. The code will receive the file name from the loop as the value of a variable and process the file.
Use an Execute SQL task in the For Each Loop container to call the SP. Pass the file name as a parameter like:
Exec sp_MyCode param1, param2, ?
Where ? will pass filename INPUT as a string
EDIT
To make the Flat File Connection pick up the file specified by a variable, use the Connection String property of the Flat File Connection:
Select the FF Connection, right click and select Properties.
Click on the empty field for Expressions and then click the ellipsis that appears. With Expressions you can define every property of the object listed there using variables. Many objects in SSIS can have Expressions specified.
Add an Expression, select the Connection String property and define an expression with the absolute path to the file (just to be on the safe side; it can be a UNC path too).
All of the above can be accomplished using C# code in the script task itself. You can loop through all the files one by one and for each file:
1. Bulk Copy the data to the staging
2. Insert the filename to the other table
You can modify the logic as per your requirement and desired execution flow.
Add a column to your staging table - FileName
Capture the file name in an SSIS variable (using expressions), then run something like this in each loop:
UPDATE StagingTable SET FileName=? WHERE FileName IS NULL
Why are you messing about with C#? From your description it's totally unnecessary.
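Here is a rough sketch of that per-file pattern (load the file, then stamp the new rows with its name) in Python against SQLite, only to show the order of operations. The StagingTable and FileName names come from the statement above; the RawLine column and the load_files function are assumptions for the example.

```python
import os
import sqlite3


def load_files(conn, paths):
    """Load each file into the staging table and tag its rows with the file name."""
    for path in paths:
        # Stand-in for the Bulk Insert task: one staging row per line.
        with open(path, "r") as handle:
            lines = [(line.rstrip("\n"),) for line in handle]
        conn.executemany(
            "INSERT INTO StagingTable (RawLine) VALUES (?)", lines
        )
        # Rows just loaded are the only ones with no file name yet.
        conn.execute(
            "UPDATE StagingTable SET FileName = ? WHERE FileName IS NULL",
            (os.path.basename(path),),
        )
    conn.commit()
```

This only works if the UPDATE runs inside the loop, after each file is loaded and before the next one; otherwise rows from several files would all be NULL at once and get the same name.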

How to create an SSIS package which creates three text files using the same variables, where each text file is only created when the correct data is found?

There are only 3 files that can be created: "File_1", "File_2" and "File_3". The same variable names are used in each instance (User::FileDirectory and User::File_name), but because the actual value of the variable changes, a new file is created. However, the files are only created if there is data to go into them, i.e. if there are no records to populate a file, it will not be created at all. When a file is created, the date it was created should also be added to the filename, e.g. File1_22102011.txt
Ok if the above was a little confusing, the following is how it works,
All the files use the same variable, but it is reset before each file is created.
• So it populates a result set in memory with the first sql selection (ID number, First_Name and Main_Name). It sets the file variable to “File_1”. If there are records in the result set, it creates and writes to this filename.
• Then it creates a new result set with the second selection(Contract No). It sets the variable to "File_2". If there are records in this new result set, a new file will be created from the variable(which now has a new value)
• Finally a third result set is created (Contract_no, ExperianNo, Entity_ID_Number, First_Name, Main_Name), and the file variable is set to "File_3". Again if there are records in the result set, then this file will be created and written to.
I have worked on a few methods to achieve this but they have all failed, so a little help would be greatly appreciated.
While what you have works, I think it'd be rather painful to maintain.
I would approach it as 3 sequence containers running in parallel. Each container would have a data flow and two file tasks hanging off it based on success of the parent and the value of row count variable. If the row count variable is 0, delete the file. If it's greater than 0, rename it to File_n
As you can see, I have a container for the first file. The data flow creates an output a.txt file. Based on the value of the variable #RowCount1, it will either delete the empty file or rename it to File_1.
Each data flow would look like a source query, a row count transformation and a file destination with a temporary name (a.txt, b.txt, c.txt). As a file is always created, even if it's empty, we will need to delete or rename it afterwards which will be accomplished based on the file operation tasks.
In my opinion, this approach will be cleaner as it will allow you to test and debug each item in a cleaner manner rather than dealing with an in-memory dataset.
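The write-then-delete-or-rename idea for one container can be sketched like this in Python. It is an illustration rather than the package itself; write_if_rows is a hypothetical name, and the ddmmyyyy date format is an assumption based on the File1_22102011.txt example in the question.

```python
import os
from datetime import date


def write_if_rows(directory, file_name, rows, temp_name, today=None):
    """Write rows to a temporary file, then delete it or rename it."""
    today = today or date.today()
    temp_path = os.path.join(directory, temp_name)
    row_count = 0
    # The destination always creates the file, even when there are no rows.
    with open(temp_path, "w") as handle:
        for row in rows:
            handle.write(",".join(str(value) for value in row) + "\n")
            row_count += 1
    if row_count == 0:
        os.remove(temp_path)  # empty result set: no file should remain
        return None
    # Rows were written: rename to the final name with the creation date.
    final_path = os.path.join(directory, f"{file_name}_{today:%d%m%Y}.txt")
    os.replace(temp_path, final_path)
    return final_path
```

Calling it three times with ("File_1", "a.txt"), ("File_2", "b.txt") and ("File_3", "c.txt") mirrors the three containers, each with its own result set and its own row count.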