Fetching data from multiple files and loading it into a Raw File destination (the raw file name should be dynamic) in SSIS

I have a source folder containing 4 CSV files, each with a different number of columns. I need to fetch only 3 columns (these 3 columns have the same metadata in all 4 files) from each CSV and load them into a Raw File Destination, for every file available in the source folder. The Raw File Destination output file name has to be the input file name plus a timestamp.
At the next level, I need to read this output raw file as a Raw File Source and insert its records into an OLE DB Destination, and the destination table also has to be dynamic.
For example, I have 4 CSV files called test1.csv (10 columns), test2.csv (8), test3.csv (6) and test4.csv (10), along with timestamps.
All 4 files have the columns position_id, asofdate and sumassured in common, and I want to load only these 3 columns into the raw destination. If I load test1.csv, then my raw destination output file name has to be RW_test1_20120119_222222.RW. Similarly, if I load the second file, its file name should be used for the raw destination output.
Thanks
Satish

As always, decompose your problem until you've got it into something you can manage.
Processing CSVs via queries
Following the two questions and answers below will result in a package with an OLE DB Connection Manager configured to operate on CSVs in the folder @[User::InputFolder]. Three variables, CurrentFileName, InputFolder and Query, have been defined, with an expression set on Query.
The expression for your @[User::Query] would look like "SELECT position_id, asofdate, sumassured FROM " + @[User::CurrentFileName]
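To make the effect of that query concrete, here is a minimal Python sketch. It is not SSIS code, only an illustration of the logic. It builds the same query string per file and shows how only the three shared columns survive when a file carries extra ones. The function names and the sample data are hypothetical; only the column names and the query text come from the thread.

```python
import csv
import io

# Column names taken from the question; everything else here is illustrative.
COMMON_COLUMNS = ["position_id", "asofdate", "sumassured"]


def build_query(current_file_name):
    """Mirror the SSIS expression: fixed column list plus the current file name."""
    return "SELECT " + ", ".join(COMMON_COLUMNS) + " FROM " + current_file_name


def pick_common_columns(csv_text):
    """Return only the three shared columns from CSV text, ignoring any extras."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: row[name] for name in COMMON_COLUMNS} for row in reader]


# Hypothetical sample resembling a file that has one extra column.
sample = "position_id,asofdate,sumassured,extra1\n1,2012-01-19,5000,x\n"
print(build_query("test1.csv"))
print(pick_common_columns(sample))
```

Because the column list is fixed and only the file name changes, the same data flow metadata can be reused for every file in the loop.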
Reference answers
SSIS FlatFile Acces via Jet
SSIS Task for inconsistent column count import?
At this point, your package should resemble the center piece below. Verify you can correctly enumerate all of the CSVs in the folder and the OLEDB query piece works.
RAW files
I'm not an expert on RAW file usage, so there may be better ways of interacting with them. This will use the fourth variable, RawFileName. Set an expression on it like @[User::InputFolder] + "RawFile.raw", which would result in the file being written to C:\ssisdata\so\satishkumar\RawFile.raw
My general approach is to have a data flow with a script component that sends no rows into a RAW File Destination.
Configure your destination as
Access mode: File name from variable
Variable name: User::RawFileName
Write option: Create Always
Process CSVs
The concept here is to append all the data into the RAW file that was created in the initial step.
Your source should already be configured as
OLE DB connection manager: FlatFile
Data access mode: SQL command from variable
Variable name: User::Query
Configure your destination as
Access mode: File name from variable
Variable name: User::RawFileName
Write option: Append
Extract from RAW
At this point, the foreach enumerator has completed and all the data has been loaded into the staging file. Now it is time to consume that and send data on to the destination.
Drag a Raw File Source onto your data flow. Unsurprisingly, you will configure it as
Access mode: File name from variable
Variable name: User::RawFileName
Instead of Simulate destination, wire it up to the correct data destination.
Caveat
Be careful when using an expression with GETDATE/GETUTCDATE to define filenames, as they are constantly re-evaluated. In 2005, we had used FileName_HHMMSS and had issues because the creation of a file and the next task that consumed it didn't complete within the same second. Instead, I have had better success using a dynamic but fixed starting point; generally, that is the system variable StartTime, @[System::StartTime]
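The naming rule from the question, combined with a timestamp captured once, can be sketched in Python as follows. This is an illustration of the logic only, not an SSIS expression. The function name raw_file_name and the hard-coded start time are hypothetical, and the fixed value stands in for System::StartTime.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def raw_file_name(input_path, start_time):
    """Build RW_<input name>_<yyyymmdd>_<hhmmss>.RW from a fixed start time."""
    stem = PureWindowsPath(input_path).stem  # "test1" from "...\\test1.csv"
    return "RW_" + stem + "_" + start_time.strftime("%Y%m%d_%H%M%S") + ".RW"


# Capture the timestamp once, the way System::StartTime is fixed for a run.
package_start = datetime(2012, 1, 19, 22, 22, 22)  # hypothetical start time
for name in ["test1.csv", "test2.csv"]:
    print(raw_file_name("C:\\ssisdata\\so\\satishkumar\\" + name, package_start))
```

Since the timestamp is read once and passed around, the step that writes the file and the step that later reads it always agree on the name, no matter how long the processing in between takes.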

You can use a Foreach Loop Container in the Control Flow to iterate over txt and csv files.

Related

Using a variable as the input for a Conditional Split control

Might be going about this completely the wrong way - happy to be shown the error of my ways.
In a nutshell, I've got 50-odd files of mixed types (csv and excel) that I want to import (each file to its own table) to an SQL database.
In the control flow I've got an sql task that returns:
The source data filename
The source data filetype (csv / xlsx)
What I want to name the table to import to.
This object gets passed to a Foreach loop that loops through this object and puts these 3 fields into variables.
I want to then say "if the filetype variable is csv, go and do a flat file import. If it's .xlsx, go and do an excel import"
So inside my for each container I've got a dataflow task.
I want the first thing the dataflow task does to check the filetype variable, and then do the appropriate import.
I think it's got to be in the dataflow, because there isn't an "If" style control I can see in the control flow?
But I'm at a loss as to how I pass a variable into the conditional split.
Any thoughts welcome.
OR! - just had a thought. Is the best way to do this to get a list of all the csv file types, process them in a dataflow, then get a list of all the .xlsx ones and process them - so I'd have:
Get csv filenames & tablenames
for each to loop through these
dataflow to import data from csv
get xlsx filenames and tablenames
for each through these
dataflow to import data from xlsx.
Just doesn't seem as elegant?
Cheers

How to Write the File and File Path to table

I have a SSIS package - which within a FOR LOOP CONTAINER I look in a particular location, for a particular file format and import it into a database.
This is working fine - when I have two files the contents of both files are being imported.
I have a Variable Mapping under my ForLoop which records the fully qualified file name. When I import a file, I also want to record the file path it came from.
I'm unsure where in my data flow task I would put that. Under the data flow I have my source file and a destination.
I tried having an SQL task after the data flow that updated the field in the database with the variable (via Parameter Mapping), but that set the field to the same value for every row (the last file path found), which is not what I'm after.
Any advice would be welcome
In your dataflow task, in between your source and destination add a Derived Column transformation. This will add columns to your dataset with a name and value that you specify. If you reference variables in which you are storing the file name for your loop container, the name of the file being accessed will be appended to an additional column in your dataset. Obviously you need to make sure that this column is present in your destination table.
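As a rough illustration of what the Derived Column step achieves, the Python sketch below tags each row with the file it came from while that file is being processed, so rows from different files keep different paths. It is not SSIS code. The column name SourceFile, the function name and the sample paths are all hypothetical.

```python
def add_source_file(rows, source_file):
    """Return copies of the rows with an extra SourceFile column appended."""
    return [dict(row, SourceFile=source_file) for row in rows]


# Hypothetical per-file rows, standing in for what the loop reads each time.
files = {
    "C:\\in\\file1.csv": [{"id": 1}, {"id": 2}],
    "C:\\in\\file2.csv": [{"id": 3}],
}

loaded = []
for path, rows in files.items():
    # The path is attached inside the loop, while it still refers to this file.
    loaded.extend(add_source_file(rows, path))

print(loaded)
```

Attaching the value inside the loop is what avoids the problem described in the question, where a single update after the data flow stamped every row with the last file path.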

How do you get flat file name and perform a row count from multiple flat files with different columns in SSIS?

I'm trying to get all the file names from a folder directory along with their row counts. (Also file size in bytes if possible) I am using Microsoft Visual Studio 2010 Shell. Here's what I've done so far:
I have created a Foreach Loop Container, set the Enumerator to Foreach File Enumerator and Expressions to a variable to the folder I want to loop over. I left the Files section with *.* and asked to retrieve Name Only. I have changed the Variable Mappings to a New Variable called FullFilePath, Container is Package, Value type is String and Value: is blank.
I then added a Data Flow to the Loop. Added a flat file source, row count, and OLE DB Destination. I changed the Flat file Source properties expression to the same Folder Variable in the Foreach Loop Container Expression. I added the Variable RecordCount to the Row Count function (Int32, value 0). The OLE DB Destination creates a new table with the name OLE DB Destination.
The next step is an Execute SQL Task that does an INSERT INTO DBO.FileData (FileName,RowCount) VALUES (?,?). I set 2 parameter mappings - 1) Variable Name from the Foreach Loop Container, FullFilePath and Data Type VarChar, 2) Variable from Row Count, RecordCount and Data Type Long.
I then have another Execute SQL Task that drops the table created by the data flow task. The problem is that with all of these steps the package still does not complete. It actually gets hung up and fails on the pre-execute. It says:
Warning: Access is denied. Error: Cannot open the datafile 'FullFilePath' Error: Flat File Source failed the pre-execute phase and returned error code 0xC020200E.
Anything you see I could be doing wrong? Let me know if pictures would help.
So I figured this out finally. In order to loop over all of the files with varying headers and column counts, I decided to unselect the "File contains headers" option in the Flat File Source. Doing this allowed all the files to have the same first column, which by default is Column 0 (the first column in all of my files is some sort of numeric field or ID). I was able to map this through the row count and insert into a SQL table. Then I was able to finish the Foreach Loop and write the file name and row count into another SQL table to record the counts. It is, however, taking a really long time: it has been running for over 14 hours and has only counted through 13 files. Granted, some files are 250K+ rows, but I wouldn't think it would take this long.
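If only the file name, row count and size are needed, the rows do not have to pass through a destination table at all. The Python sketch below shows that idea outside of SSIS, as an assumption about what would be sufficient rather than as the approach used in the post. The function name file_stats and the demo file are hypothetical, and the count includes any header line.

```python
import os
import tempfile


def file_stats(folder):
    """Return (file name, row count, size in bytes) for each file in folder."""
    stats = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            row_count = sum(1 for _ in handle)  # counts lines, header included
        stats.append((name, row_count, os.path.getsize(path)))
    return stats


# Demo on a throwaway folder with one hypothetical file.
demo_folder = tempfile.mkdtemp()
with open(os.path.join(demo_folder, "sample.csv"), "w", newline="") as handle:
    handle.write("id,value\n1,a\n2,b\n")
print(file_stats(demo_folder))
```

Counting lines ignores column layout entirely, so differing headers and column counts across files stop being an issue for this particular task.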

SSIS - Load flat files, save file names to SQL Table

I have a complex task that I need to complete. It worked well before since there was only one file, but this is now changing. Each file has one long row that is first bulk inserted into a staging table. From here I'm supposed to save the file name into another table and then insert the broken-up parts of the staging table data. This is not the problem. We might have just one file or even multiple files to load at once. What needs to happen is this:
The first SSIS task is a script task that does some checks. The second task prepares the file list.
The staging table is truncated.
The third task is currently a Foreach loop container task that uses the files from the file list and processes it:
File is loaded into table using Bulk Insert task.
The file name needs to be passed as a variable to the next process. This was done with a C# task before but it is now a bit more complex since there could be more than one file and each file name needs to be saved separately.
The last task is a SQL task that executes a stored procedure with the file name as input variable.
My problem is that before it was only one file. This was easy enough. What would the best way be to go about it now?
In the Data Flow Task which imports your file, create a derived column. Populate it with the variable holding the file name. Load the file name into the same table.
Use an Execute SQL task to retrieve a distinct list of file names into a recordset (Object type variable).
Use a For Each Loop container to loop through the recordset. Place your code inside the container. The code will receive the file name from the loop as the value of a variable and process the file.
Use Execute SQL task in For Each Loop container to call SP. Pass filename as a parameter like:
Exec sp_MyCode param1, param2, ?
Where ? will pass filename INPUT as a string
EDIT
To make Flat File Connection to pick up the file specified by a variable - use Connection String property of the Flat File Connection
Select FF Connection, right click and select Properties
Click on empty field for Expressions and then click ellipsis that appears. With Expressions you can define every property of the object listed there using variables. Many objects in SSIS can have Expressions specified.
Add an Expression, select Connection String Property and define an expression with absolute path to the file (just to be on a safe side, it can be a UNC path too).
All the above can be accomplished using C# code in the script task itself. You can loop through all the files one by one and for each file :
1. Bulk Copy the data to the staging
2. Insert the filename to the other table
You can modify the logic as per your requirement and desired execution flow.
Add a column to your staging table - FileName
Capture the filename in a SSIS Variable (using expressions) then run something like this each loop:
UPDATE StagingTable SET FileName=? WHERE FileName IS NULL
Why are you messing about with C#? From your description it's totally unnecessary.
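The per-loop UPDATE can be tried out with an in-memory SQLite database. The Python sketch below is only a stand-in for the SSIS loop and the real staging table. The Data column, the load_file function and the file names are hypothetical; the UPDATE statement is the one from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StagingTable (Data TEXT, FileName TEXT)")


def load_file(conn, file_name, rows):
    """Insert rows without a file name, then tag only the untagged rows."""
    conn.executemany(
        "INSERT INTO StagingTable (Data) VALUES (?)", [(r,) for r in rows]
    )
    conn.execute(
        "UPDATE StagingTable SET FileName = ? WHERE FileName IS NULL",
        (file_name,),
    )


# Two loop iterations, one per hypothetical input file.
load_file(conn, "file1.txt", ["row a", "row b"])
load_file(conn, "file2.txt", ["row c"])
result = conn.execute(
    "SELECT Data, FileName FROM StagingTable ORDER BY Data"
).fetchall()
print(result)
```

The WHERE FileName IS NULL filter is what keeps earlier files from being overwritten: on each iteration only the rows that were just loaded are still untagged.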

How to create dynamic number of output files with SSIS?

I will be creating flatfiles and based on the data in the batch, it might be necessary to split the data into an undetermined number of files.
I can make the connection string dynamic with an expression, but that is only evaluated when the package starts. I'd like to change that expression to include a '-a' or '-b' in the filename.
Alternately, if I have to create new connection manager objects at run time on demand, how do I go about that?
First determine your naming scheme for the output files and come up with an expression formula in your head
Put the Data Flow Task in a loop.
Within this Data Flow Task, define the source and destination. Destination being the Flat File Destination. Read the source and add some derived column that sets a value to another variable that you'll later use in the Filename expression.
Connect the Flat File Destination to a Connection Manager. First define some path but then add an Expression to define a Connection String based on your File Name scheme (Path + Filename + extension). Now this Filename is tricky. You'll have to put IIF statements based on the values you've got from Source
1) Create a global variable (a variable created within the scope of the package) and assign it to the file name property.
2) change the variable during the looping.
EDITED
see for more details...
You can access the data set in a script (in the script component) and write out to a set of files based on your criteria.
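A script-based split could look roughly like the Python sketch below, which writes rows to as many files as needed and suffixes each name with -a, -b and so on. It is not SSIS script component code. The splitting rule (a maximum number of rows per file), the function name and the base name are assumptions made for the example, since the question does not say what decides the split.

```python
import os
import string
import tempfile


def write_in_chunks(rows, folder, base_name, max_rows):
    """Write rows to base-a.txt, base-b.txt, ... with at most max_rows each."""
    paths = []
    for index in range(0, len(rows), max_rows):
        # Letter suffix per chunk; this simple scheme supports up to 26 files.
        suffix = string.ascii_lowercase[index // max_rows]
        path = os.path.join(folder, f"{base_name}-{suffix}.txt")
        with open(path, "w", newline="") as handle:
            handle.writelines(row + "\n" for row in rows[index:index + max_rows])
        paths.append(path)
    return paths


# Demo: five hypothetical rows, two per file, written to a throwaway folder.
out_folder = tempfile.mkdtemp()
written = write_in_chunks(["r1", "r2", "r3", "r4", "r5"], out_folder, "batch", 2)
print([os.path.basename(p) for p in written])
```

The number of files falls out of the data at run time, which is the part that a single connection string evaluated at package start cannot express.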