I have an SSIS package in which, within a For Loop container, I look in a particular location for files of a particular format and import them into a database.
This is working fine - when I have two files the contents of both files are being imported.
I have a Variable Mapping on my loop container that records the fully qualified file name. What I want is, when I import a file, to also record the file path it came from.
I'm unsure where in my data flow task I would put that. In the data flow I have my source file and a destination.
I tried adding an Execute SQL task after the data flow that updated the field in the database with the variable (via Parameter Mapping), but that set the field to the same value for every row (the last file path found), which is not what I'm after.
Any advice would be welcome.
In your data flow task, add a Derived Column transformation between your source and destination. This adds columns to your dataset with a name and value that you specify. If you reference the variable in which your loop container stores the file name, the name of the file being accessed is written to the additional column for every row in that file. Obviously you need to make sure that this column is present in your destination table.
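To make the idea concrete, here is a minimal sketch of the same logic outside SSIS, written in Python purely for illustration. The file paths, the column name SourceFilePath, and the in-memory file contents are all made up for the example; in the package itself this is done by the loop variable plus the Derived Column, not by code.

```python
import csv
import io

def rows_with_source(files):
    """Yield every data row with its source path added as an extra column.

    `files` maps a file path to that file's CSV text. It is a hypothetical
    in-memory stand-in for the files the loop container enumerates.
    """
    for path, text in files.items():           # one iteration per file, like the loop
        for row in csv.DictReader(io.StringIO(text)):
            row["SourceFilePath"] = path        # the "derived column": current loop variable
            yield row

# Hypothetical input files.
files = {
    r"C:\in\a.csv": "id,name\n1,alpha\n2,beta\n",
    r"C:\in\b.csv": "id,name\n3,gamma\n",
}
rows = list(rows_with_source(files))
```

Because the path is attached while each file is being read, every row keeps the path of its own file, instead of all rows ending up with the last path as happens with an update after the loop.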
I am trying to load a big file which basically is a json format flat file from my local drive to SQL Server by using SSIS. It's a one line file and I don't need to specify columns and rows as I am going to parse it as soon as it's in SQL Server by OPENJSON.
But when I tried to create a Flat File Source in Visual Studio SSIS, I was not able to do that, even when I used the 'fixed width' format according to the solution here: import large flat file with very long string as SSIS package. The max width seems to be 32000, while the JSON file could be much bigger.
here are my settings:
There are other options for loading the data with T-SQL, like OPENROWSET, but our SQL Server instance is installed on a different server from the one where we do our dev work, so there are some security limits between them.
So I'm just wondering whether this is a limitation of the Flat File Source in SSIS or whether I didn't do it right.
You're likely looking for the Import Column transformation. https://learn.microsoft.com/en-us/sql/integration-services/data-flow/transformations/import-column-transformation?view=sql-server-ver15
Define a Data Flow as OLE DB Source -> Import Column -> OLE DB Destination.
OLE DB Source
Really, any source but this is the easiest to reproduce
SELECT 'C:\curl\output\source_data.txt' AS SourceFilePath;
That produces a single row with a column named SourceFilePath.
Import Column
See the article on the Import Column transformation, but the summary is:
Check the column that will provide the path.
Add a column to the Import Column Collection to hold the file content. Change the data type to DT_TEXT/DT_NTEXT depending on your unicode-ness and note the LineageID value.
Click back to Import Column Input and find the column name. Scroll down to the Custom Properties and use the LineageID above for FileDataColumnID where it says 0. Otherwise, you get this error:
The "Import Column.Outputs[Import Column Output].Columns[FileContent]" is not referenced by any input column. Each output column must be referenced by exactly one input column.
OLE DB Destination
Any data sink will do, but the important thing is to map our column from the previous step to an n/varchar(max) column in the database.
When I create the SSIS package it requires a file to be referenced to pick up the file's metadata. For example, the column headers will be ColumnA, ColumnB.
I have always assumed that these column names need to be present in the file for it to be loaded. Recently the business, for whatever reason, changed one of the column names in the file, so the file now contains ColumnA, NotColumnB. When the SSIS package runs it ignores this and loads the file; I assumed it would fail. Is my assumption correct and something weird is going on, or is my assumption incorrect? If so, please let me know why.
I have changed the column names in a few other packages that load data from a file and they also don't care what the column names are.
Click on the flat file source and press F4 to show the Properties tab. There is a property called ValidateExternalMetadata; change it to True.
For more information check the following answer:
Detect new column in source not mapped to destination and fail in SSIS
Update 1
It looks like the flat file connection manager has no validation engine; the defined metadata is only used at configuration time to set up the mappings between the data file and the database.
Why Does't SSIS Flat File Data Check If Columns Names or Order Have Changed? What is best way to check?
Flat file destination columns data types validation
I have a complex task that I need to complete. It worked well before since there was only one file, but this is now changing. Each file has one long row that is first bulk inserted into a staging table. From here I'm supposed to save the file name into another table and then insert the broken-up parts of the staging table data. This is not the problem. We might have just one file or multiple files to load at once. What needs to happen is this:
The first SSIS task is a script task that does some checks. The second task prepares the file list.
The staging table is truncated.
The third task is currently a Foreach loop container task that uses the files from the file list and processes it:
File is loaded into table using Bulk Insert task.
The file name needs to be passed as a variable to the next process. This was done with a C# task before but it is now a bit more complex since there could be more than one file and each file name needs to be saved separately.
The last task is a SQL task that executes a stored procedure with the file name as input variable.
My problem is that before it was only one file. This was easy enough. What would the best way be to go about it now?
In the Data Flow Task that imports your file, create a derived column. Populate it with the value of the variable holding the file name. Load the file name into the same table.
Use an Execute SQL task to retrieve the distinct list of file names into a recordset (an Object type variable).
Use a For Each Loop container to loop through the recordset. Place your code inside the container. The code will receive the file name from the loop as the value of a variable and process the file.
Use Execute SQL task in For Each Loop container to call SP. Pass filename as a parameter like:
Exec sp_MyCode param1, param2, ?
where ? passes the file name as a string input parameter.
EDIT
To make the Flat File Connection pick up the file specified by a variable, use the Connection String property of the Flat File Connection.
Select the Flat File Connection, right-click and select Properties.
Click on the empty field for Expressions and then click the ellipsis that appears. With Expressions you can define every property of the object listed there using variables. Many objects in SSIS can have Expressions specified.
Add an Expression, select the Connection String property and define an expression with the absolute path to the file (just to be on the safe side; it can be a UNC path too).
All of the above can be accomplished using C# code in the script task itself. You can loop through all the files one by one and for each file:
1. Bulk Copy the data to the staging
2. Insert the filename to the other table
You can modify the logic as per your requirement and desired execution flow.
Add a column to your staging table - FileName
Capture the filename in a SSIS Variable (using expressions) then run something like this each loop:
UPDATE StagingTable SET FileName=? WHERE FileName IS NULL
Why are you messing about with C#? From your description it's totally unnecessary.
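For what it's worth, here is a small sketch of how that UPDATE behaves when it runs once per loop iteration. It uses Python with an in-memory SQLite database only so it can be run anywhere; the file names, the Line column and the row contents are invented for the example, and the INSERT merely stands in for the Bulk Insert task. It also assumes the staging table is not truncated between files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StagingTable (Line TEXT, FileName TEXT)")

# Hypothetical file names and contents standing in for the loop's file list.
files = {"first.txt": ["row 1", "row 2"], "second.txt": ["row 3"]}

for file_name, lines in files.items():
    # Stand-in for the Bulk Insert task: rows arrive with FileName left NULL.
    conn.executemany(
        "INSERT INTO StagingTable (Line) VALUES (?)",
        [(line,) for line in lines],
    )
    # Same statement as above; ? is bound to the current loop variable.
    conn.execute(
        "UPDATE StagingTable SET FileName=? WHERE FileName IS NULL",
        (file_name,),
    )

result = conn.execute(
    "SELECT Line, FileName FROM StagingTable ORDER BY Line"
).fetchall()
```

Only the rows loaded in the current iteration still have a NULL FileName, so each pass stamps just the new rows and leaves earlier files alone.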
I need to add a footer containing the row count to a flat file destination. I am importing data from OLE DB to a flat file, and I need the count of the records placed as a footer in the flat file.
I would solve this similar to this:
ssis package format in excel
But in this case you can:
Create variable of type int (package scope).
Add a Row Count component to your data flow and put your variable in its Variable Name property.
Create a script task that connects to your txt file and adds an additional row with the row count.
Hope it will help.
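The steps above boil down to "count rows while writing, then append one more line". A rough sketch of that logic, in Python just for illustration (in the package the last step would live in the script task), with an invented footer format ROWCOUNT=n and a temporary output path:

```python
import os
import tempfile

def write_with_footer(path, rows):
    """Write rows as comma-separated lines, then append a row-count footer."""
    row_count = 0                                  # plays the role of the int variable
    with open(path, "w", newline="") as f:         # stand-in for the flat file destination
        for row in rows:
            f.write(",".join(row) + "\n")
            row_count += 1                         # what the Row Count component does
    with open(path, "a", newline="") as f:         # stand-in for the script task
        f.write(f"ROWCOUNT={row_count}\n")         # footer format is an assumption
    return row_count

# Hypothetical data and output location.
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
count = write_with_footer(out_path, [["1", "a"], ["2", "b"]])
with open(out_path) as f:
    written_lines = f.read().splitlines()
```

The footer is appended only after the data has been fully written, which mirrors running the script task after the data flow completes.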
I will be creating flatfiles and based on the data in the batch, it might be necessary to split the data into an undetermined number of files.
I can make the connection string dynamic with an expression, but that is only evaluated when the package starts. I'd like to change that expression to include a '-a' or '-b' in the filename.
Alternately, if I have to create new connection manager objects at run time on demand, how do I go about that?
First determine your naming scheme for the output files and work out the expression formula you will need.
Put the Data Flow Task in a loop.
Within this Data Flow Task, define the source and destination. Destination being the Flat File Destination. Read the source and add some derived column that sets a value to another variable that you'll later use in the Filename expression.
Connect the Flat File Destination to a Connection Manager. First define some path but then add an Expression to define a Connection String based on your File Name scheme (Path + Filename + extension). Now this Filename is tricky. You'll have to put IIF statements based on the values you've got from Source
1) Create a global variable (a variable created at package scope) and assign it to the file name property.
2) Change the variable during the looping.
EDITED
see for more details...
You can access the data set in a script (in the script component) and write out to a set of files based on your criteria.
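As a rough idea of what such a script might do, here is a sketch in Python (a real script component would be C# or VB). The splitting rule used here, a maximum number of rows per file, is only an assumption, since the actual criteria depend on the data in the batch; the base name, the .txt extension and the rows are made up as well.

```python
import os
import string
import tempfile

def split_to_files(rows, directory, base_name, max_rows):
    """Write rows to base_name-a.txt, base_name-b.txt, ... (max_rows per file)."""
    paths = []
    for chunk_index, start in enumerate(range(0, len(rows), max_rows)):
        # 'a', 'b', 'c', ...; this sketch only supports up to 26 output files.
        suffix = string.ascii_lowercase[chunk_index]
        path = os.path.join(directory, f"{base_name}-{suffix}.txt")
        with open(path, "w") as f:
            for row in rows[start:start + max_rows]:
                f.write(row + "\n")
        paths.append(path)
    return paths

# Hypothetical batch of five rows, split into files of at most two rows.
out_dir = tempfile.mkdtemp()
paths = split_to_files(["r1", "r2", "r3", "r4", "r5"], out_dir, "batch", 2)
names = [os.path.basename(p) for p in paths]
```

The number of output files falls out of the data at run time, so nothing about the file names has to be fixed when the package starts.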