I have a complex task that I need to complete. It worked well before since there was only one file but this is now changing. Each file has one long row that is first bulk inserted into a staging table. From here I'm supposed to save the file name into another table and then insert the the broken up parts of the staging table data. This is not the problem. We might have just one file or even multiple files to load at once. What needs to happen is this:
The first SSIS task is a script task that does some checks. The second task prepares the file list.
The staging table is truncated.
The third task is currently a Foreach loop container task that uses the files from the file list and processes it:
File is loaded into table using Bulk Insert task.
The file name needs to be passed as a variable to the next process. This was done with a C# task before but it is now a bit more complex since there could be more than one file and each file name needs to be saved separately.
The last task is a SQL task that executes a stored procedure with the file name as input variable.
My problem is that before it was only one file. This was easy enough. What would the best way be to go about it now?
In Data Flow Task which imports your file create a derrived column. Populate it with system variable value of filename. Load filename into the same table.
Use a Execute SQL task to retrieve distinc list of filenames into a recordset (Object type variable).
Use For Each Loop container to loop through the recordset. Place your code inside the container. Code will recieve filename from the loop as a value of a variable and process the file.
Use Execute SQL task in For Each Loop container to call SP. Pass filename as a parameter like:
Exec sp_MyCode param1, param2, ?
Where ? will pass filename INPUT as a string
EDIT
To make Flat File Connection to pick up the file specified by a variable - use Connection String property of the Flat File Connection
Select FF Connection, right click and select Properties
Click on empty field for Expressions and then click ellipsis that appears. With Expressions you can define every property of the object listed there using variables. Many objects in SSIS can have Expressions specified.
Add an Expression, select Connection String Property and define an expression with absolute path to the file (just to be on a safe side, it can be a UNC path too).
All the above can be accomplished using C# code in the script task itself. You can loop through all the files one by one and for each file :
1. Bulk Copy the data to the staging
2. Insert the filename to the other table
You can modify the logic as per your requirement and desired execution flow.
Add a colunm to your staging table - FileName
Capture the filename in a SSIS Variable (using expressions) then run something like this each loop:
UPDATE StagingTable SET FileName=? WHERE FileName IS NULL
Why are you messing about with C#? From your description it's totally unnecessary.
Related
I have two files that are named like this:
CustomerReport(08022021-08032021)
ComparingReport(08022021-08032021)
I need to load the CustomerReport to a table and the ComparingReport to another table.
I tried for each loop container but I cant think of how the expression will be to pickup the file.
Im thinking of something like Customer*.csv where the * acts like a wild card but that didnt work. What can I do in this case?
Here key of the answer is to use Foreach Loop and Conditional Split.
Don't pay attention on errors, because I don't have your CSV files and tables in DB!
Create a new variable FileName - string data type
Add foreach loop and set like on screenshot
Collection Tab:
Variable Mappings Tab:
Add Data Flow Task into Foreach Loop container
Drag elements from toolbox like on a image
Flat File Source connect to one of your CSV file using Flat File Connection Manager and on connection manager, in properties > expressions, set for ConnectionString a variable FileName
Set Conditional Split like on a image
Expression for Customer is:
LEFT(
SUBSTRING(#[User::FileName],FINDSTRING((#[User::FileName]),"Customer",1),100),
FINDSTRING(SUBSTRING(#[User::FileName],FINDSTRING((#[User::FileName]),"Customer",1),100),"Report",1) - 1
) == "Customer"
Connect Conditional Split to OLE DB Destination's
NOTE: I can't run package as I said on top of this answer, Pay attention to Conditional Split, but this is the way how you need to find if part of a string is into whole string.
Any help is much appreciated. I am trying to create an SSIS package to loop through files in the folder and get the Path + filename and finally execute the stored proc with parameter as path+filename. I am not sure how to get the path+filename and insert the into the Stored proc as parameter. I have attached the screenshot for your reference:
Looks like you have the right idea in general and the link #Speedbirt186 provided has some good details but it sounds like there are a couple of nuances that I thought I might point out in regards to flow and variables.
The foreach loop can assign the entire path or the file name or file name & extension to a variable. The latter will be the most help in your case if you don't want to add a script task to split the Filename from the path. If you start by adding 5 variables to your project it will make it a little easier. 1 will be the Source Directory Path, another the Destination (Archive) Directory Path, and then 1 to hold the File Name and Extension assigned by the for each loop. Then 2 additional dynamic variables that simply combine the source directory and file name to get the source full path and the destination with file name to get the destination full path.
Next make sure you set up your database and Excel file connections. In your Excel file connection after setting it up go to Expressions in the properties window and set the "Connection String" property to SourceFullPath. This will tell the connection to change the file path at every iteration of your loop.
Now you just need to setup your loop etc. Add the fore each loop container setting a directory, filter, and choose File Name and Extension.
Now in the expression box on the collection page set the directory property to be that of your Source Directory variable.
The last part of the Fore each loop is to set your variable mappings to store the file name in your variable. so go to that tab choose your file name variable and set index to 0.
At this point you can add your data flow and setup your import just like you would with a normal file (note your default value for your file name parameter should be to an actual file with the structure you will want to import).
After your data flow drop in your Execute SQL task and set it up how you need. here is an example of direct input and you can see an easy way to reference a parameter is simply a question mark (?).
Next in your sql task setup your parameter mapping by adding in the details you need such as:
Now you are on to your file task. Drop your file task and setup as you desire, but choose your destination and source full path variables to tell the task which file to move.
that's it your are done. there is 1 more thing to note though. The way you have your precedence set in the image you posted you show going from your data flow to your sql and to your file task simultaneously. If your stored procedure relies on your file you may want to put it after your sql task. You can always change the constraint options to "completion" if you want to move the file even if your stored proc fails.
What you want to do is to create a variable in your package, call it something like Filename. In the Edit window of the Foreach you can configure that variable to be set (on the Variable Mappings page- set index to 0).
To create a variable, you will need to have the Variables window showing. Use the View menu to show it if it's not currently open.
Then when calling your stored procedure you can pass the then current value of the variable as a parameter.
This link might help: https://www.simple-talk.com/sql/ssis/ssis-basics-introducing-the-foreach-loop-container/
I have a scenario where i need to pass an text file or excel file column as an parameter to my Sql Query in SSIS Package.
My Text or excel file has a column called Policy_no and it has more than 1000+ policy_no(EX: 12358685). i have an Sql script *select * from main_table where policy_no = ?*. And that that '?' has to be come from my package variable(txt or excel ).
Instead of manually writing script for each and every policy, how can we achieve this through SSIS.
Thanks
Assuming you want to loop through each row in your file and run the query against each individual value, you can use a Data Flow task to read your text file and load the policy numbers in an ADO Recordset (declared as a package variable). Next, you'd use a Foreach Loop Container to iterate through the recordset, loading each policy number in turn into a second variable and then executing your query and doing whatever other work is needed.
See Use a Recordset Destination in MSDN for an overview and example.
You can use EXECUTE SQL TASK (Connect Excel with OLE DB Connection) to get "Policy_no" data from Excel, then store the result into a variable, say:policyNoGroup, whose data type should be Object, then use For Each Loop to loop though variable policyNoGroup, see the example: http://www.codeproject.com/Articles/14341/Using-the-Foreach-ADO-Enumerator-in-SSIS
There are only 3 files that can be created : "File_1", "File_2" and "File_3". The same variable name is used in each instance (User::FileDirectory) and (User::File_name), but because the actual value of the variable changes, a new file is created.However the files are only created if there is data to go into the file. i.e. if there are no records to populate the file, it will not be created at all. When the files are created, the date the file was created should also be added to the filename. eg: File1_22102011.txt
Ok if the above was a little confusing, the following is how it works,
All the files use the same variable, but it is reset before each file is created.
• So it populates a result set in memory with the first sql selection (ID number, First_Name and Main_Name). It sets the file variable to “File_1”. If there are records in the result set, it creates and writes to this filename.
• Then it creates a new result set with the second selection(Contract No). It sets the variable to "File_2". If there are records in this new result set, a new file will be created from the variable(which now has a new value)
• Finally a third result set is created (Contract_no, ExperianNo, Entity_ID_Number, First_Name, Main_Name), and the file variable is set to "File_3". Again if there are records in the result set, then this file will be created and written to.
I have worked on a few methods to achieve this but they all have failed, So little help will be greatly appreciated.
While what you have works, I think it'd be rather painful to maintain.
I would approach it as 3 sequence containers running in parallel. Each container would have a data flow and two file tasks hanging off it based on success of the parent and the value of row count variable. If the row count variable is 0, delete the file. If it's greater than 0, rename it to File_n
As you can see, I have a container for the first file. The data flow creates an output a.txt file. Based on the value of the variable #RowCount1, it will either delete the empty file or rename it to File_1.
Each data flow would look like a source query, a row count transformation and a file destination with a temporary name (a.txt, b.txt, c.txt). As a file is always created, even if it's empty, we will need to delete or rename it afterwards which will be accomplished based on the file operation tasks.
In my opinion, this approach will be cleaner as it will allow you to test and debug each item in a cleaner manner rather than dealing with an in-memory dataset.
I will be creating flatfiles and based on the data in the batch, it might be necessary to split the data into an undetermined number of files.
I can make the connection string dynamic with an expression, but that is only evaluated when the package starts. I'd like to change that expression to include a '-a' or '-b' in the filename.
Alternately, if I have to create new connection manager objects at run time on demand, how do I go about that?
First determine your naming scheme for the output files and come up with an expression formula in your head
Put the Data Flow Task in a loop.
Within this Data Flow Task, define the source and destination. Destination being the Flat File Destination. Read the source and add some derived column that sets a value to another variable that you'll later use in the Filename expression.
Connect the Flat File Destination to a Connection Manager. First define some path but then add an Expression to define a Connection String based on your File Name scheme (Path + Filename + extension). Now this Filename is tricky. You'll have to put IIF statements based on the values you've got from Source
1) create grobal variable(a variable is created within the scope of a package) and assign it to the file name property.
2) change the variable during the looping.
EDITED
see for more details...
You can access the data set in a script (in the script component) and write out to a set of files based on your criteria.