Can we compare columns of multiple inputs file to derive a new column in SSIS - csv

I am trying to create a derived column based on columns provided in different input file but unfortunately I keep getting error when I tried to map my Raw_File_1 with Derived Column.The error looks like this:
Cannot create a connector.
The destination component does not have any available inputs for use in creating a path.
My goal is to able to connect both Raw_File_1 and Map_File_1 into Derived Column and generate a new column.
If anyone can provide me any suggestion that would be great!!
I have source file and reference file both are flat file. My source file has column a, column b and column c and my reference file has column d, column e and column f.
If column a=column d and column b=column f then I want to populate column c as the same value as column f. How can I do this kind of analysis or lookup in SSIS

Based on your comments that I patched into the question, you're looking to augment the existing data based on matching data from your reference file.
The core of your SSIS package will look like this
In the first data flow, we will source from map_file_1 and load into a "raw" file.
I configure my raw file destination like this
When the package runs, it'll fill that special format file with the reference data. It's important, because you can either use a database or a raw file as your lookup source.
Finally, we get to work! A flat file source to a Lookup component. In the first tab of that lookup, be sure to change the Connection type from the default of "OLE DB connection manager" to "Cache connection manager"
In Connection tab, click to create a new CCM and use the raw file generated in the preceding step.
Map columns A to D and B to E (assuming data types match). Click the check box on column F and in the Lookup Operation part, Replace C with that value.
Final thoughts
This will be a case sensitive lookup. If things don't have a match in the reference file, it's going to blow up. That's probably not what you want so configure the Lookup transformation to not do that ;)
I blogged about using Excel to populate the cache if you want more words http://billfellows.blogspot.com/2011/11/using-excel-in-ssis-lookup.html

Your question is not clear, i will try to give some suggestions:
If you are looking to perform a lookup with a derived column:
You can use Cache Transform component and Cache connection manager to achieve that:
SSIS - How To Use Flat File Or Excel File In Lookup Transformation [Cache Transformation]
If you are looking to Merge both input:
Then you need to use Merge Join or Union All components:
SSIS Union All Transformation
Learn SSIS : MERGE, MERGE JOIN and UNION ALL
SSIS Basics: Using the Merge Join Transformation

Related

Load big one line flat json file in ssis

I am trying to load a big file which basically is a json format flat file from my local drive to SQL Server by using SSIS. It's a one line file and I don't need to specify columns and rows as I am going to parse it as soon as it's in SQL Server by OPENJSON.
but when I tried to create Flat File Source in Visual Studio SSIS, I was not able to do that as even I used 'fixed width' format according to the solution here: import large flat file with very long string as SSIS package, as the max width seems to be 32000, while the json file could be much bigger.
here are my settings:
There are other options of loading the data by t-sql like OPENROWSET but we have SQL Server instance installed on another server rather than the same one we are doing our dev work. So there are some security limits between them.
So just wondering if this is the limitation of Flat File Source in SSIS or I didn't do it right?
You're likely looking for the Import Column transformation. https://learn.microsoft.com/en-us/sql/integration-services/data-flow/transformations/import-column-transformation?view=sql-server-ver15
Define a Data Flow as OLE Source -> Import Column -> OLE Destination.
OLE Source
Really, any source but this is the easiest to reproduce
SELECT 'C:\curl\output\source_data.txt' AS SourceFilePath;
That will add a column named SourceFilePath with a single row.
Import Column
Reference the article on Import Column Transformation but the summary is
Check the column that will provide the path
Add a column to the Import Column Collection to hold the file content. Change the data type to DT_TEXT/DT_NTEXT depending on your unicode-ness and note the LineageID value
Click back to Import Column Input and find the column name. Scroll down to the Custom Properties and use the LineageID above for FileDataColumnID where it says 0. Otherwise, you have an error of
The "Import Column.Outputs[Import Column Output].Columns[FileContent]" is not referenced by any input column. Each output column must be referenced by exactly one input column.
OLE DB Destination
Any data sink will do but the important thing will be to map our column from the previous step to a n/varchar(max) in the database.

Source File Connection (Flat File) - Not reading column metadata

When I create the SSIS package it requires a file to be referenced to pick up the files metadata. For example the column headers will be ColumnA, ColumnB.
I have always assumed that these column names need to be present in the file for it to be loaded. Recently business, for whatever reason, changed one of the column names in the file to something else so the file contains ColumnA, NotColumnB. When the SSIS package runs it ignores this and loads the file. I assumed that it would fail. Is my assumption correct and there is something weird going on or is my assumption incorrect, if so please let me know why.
I have changed the column names in a few other packages that load data from a file and they also dont care what the column names are
Click on the flat file source, and press F4 to show the properties tab. There are a property called ValidateExternalMetadata change it to True.
For more information check the following answer:
Detect new column in source not mapped to destination and fail in SSIS
Update 1
It looks like that flat file connection manager has no validation engine and the metadata defined is used at configuration time to configure the mappings between the data file and the database.
Why Does't SSIS Flat File Data Check If Columns Names or Order Have Changed? What is best way to check?
Flat file destination columns data types validation

Importing flat file which has changing column order using SSIS [duplicate]

Problem.
I regularly receive a feed files from different suppliers. Although the column names are consistent the problem comes when some suppliers send text files with more or less columns in there feed file.
Furthermore the arrangement of these files are inconsistent.
Other than the Dynamic data flow task provided by Cozy Roc is there another way I could import these files. I am not a C# guru but i am driven torwards using a "Script Task" control flow or "Script Component" Data flow task.
Any suggestion, samples or direction will greatly be appreciated.
http://www.cozyroc.com/ssis/data-flow-task
Some forums
http://www.sqlservercentral.com/Forums/Topic525799-148-1.aspx#bm526400
http://www.bidn.com/forums/microsoft-business-intelligence/integration-services/26/dynamic-data-flow
Off the top of my head, I have a 50% solution for you.
The problem
SSIS really cares about meta data so variations in it tend to result in exceptions. DTS was far more forgiving in this sense. That strong need for consistent meta data makes use of the Flat File Source troublesome.
Query based solution
If the problem is the component, let's not use it. What I like about this approach is that conceptually, it's the same as querying a table-the order of columns does not matter nor does the presence of extra columns matter.
Variables
I created 3 variables, all of type string: CurrentFileName, InputFolder and Query.
InputFolder is hard wired to the source folder. In my example, it's C:\ssisdata\Kipreal
CurrentFileName is the name of a file. During design time, it was input5columns.csv but that will change at run time.
Query is an expression "SELECT col1, col2, col3, col4, col5 FROM " + #[User::CurrentFilename]
Connection manager
Set up a connection to the input file using the JET OLEDB driver. After creating it as described in the linked article, I renamed it to FileOLEDB and set an expression on the ConnectionManager of "Data Source=" + #[User::InputFolder] + ";Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=\"text;HDR=Yes;FMT=CSVDelimited;\";"
Control Flow
My Control Flow looks like a Data flow task nested in a Foreach file enumerator
Foreach File Enumerator
My Foreach File enumerator is configured to operate on files. I put an expression on the Directory for #[User::InputFolder] Notice that at this point, if the value of that folder needs to change, it'll correctly be updated in both the Connection Manager and the file enumerator. In "Retrieve file name", instead of the default "Fully Qualified", choose "Name and Extension"
In the Variable Mappings tab, assign the value to our #[User::CurrentFileName] variable
At this point, each iteration of the loop will change the value of the #[User::Query to reflect the current file name.
Data Flow
This is actually the easiest piece. Use an OLE DB source and wire it as indicated.
Use the FileOLEDB connection manager and change the Data Access mode to "SQL Command from variable." Use the #[User::Query] variable in there, click OK and you're ready to work.
Sample data
I created two sample files input5columns.csv and input7columns.csv All of the columns of 5 are in 7 but 7 has them in a different order (col2 is ordinal position 2 and 6). I negated all the values in 7 to make it readily apparent which file is being operated on.
col1,col3,col2,col5,col4
1,3,2,5,4
1111,3333,2222,5555,4444
11,33,22,55,44
111,333,222,555,444
and
col1,col3,col7,col5,col4,col6,col2
-1111,-3333,-7777,-5555,-4444,-6666,-2222
-111,-333,-777,-555,-444,-666,-222
-1,-3,-7,-5,-4,-6,-2
-11,-33,-77,-55,-44,-666,-222
Running the package results in these two screen shots
What's missing
I don't know of a way to tell the query based approach that it's OK if a column doesn't exist. If there's a unique key, I suppose you could define your query to have only the columns that must be there and then perform lookups against the file to try and obtain the columns that ought to be there and not fail the lookup if the column doesn't exist. Pretty kludgey though.
Our solution. We use parent child packages. In the parent pacakge we take the individual client files and transform them to our standard format files then call the child package to process the standard import using the file we created. This only works if the client is consistent in what they send though, if they try to change their format from what they agreed to send us, we return the file.

how to load extracted file name into sql server table in SSIS

i have 3 csv files in a folder which contains eid, ename, country fields, and my 5 csv files names are test1_20120116_034512, test1_20120116_035512,test1_20120116_035812 etc.. my requirement is I want to take lastest file based on timne stamp and modified date, which i have done. Now i want to import the extracted file name into destination table..
my destination tables contains fields like,
filepath, filename, eid, ename, country
I have posted regarding this before in the same site i got an answer for extracting filename, now i want to load the extracted FileName into destination table
Import most recent csv file to sql server in ssis
my destination tables should have output as
C:/source test1_20120116_035812 1234 tester USA
In your DataFlow task, add a Derived Column Transformation. The value of CurrentFile will be the fully qualified path to the file. As you only want the file name, I would look to use a replace function on that with the base folder and then strip the remaining slash. This does not strip the file extension but you can add yet another call to REPLACE and substitute an empty string
Derived Column Name: filename
Derived Column:
Expression: REPLACE(REPLACE(#[User::CurrentFile], #[User::RootFolder], ""), "\\", "")
The above expects it to look like
CurrentFile = "C:\source\test1_20120116_035812.csv"
RootFolder = "C:\source"
Edit
I believe you've done something in your approach that I did not do. You should see a warning about possible truncation but given the values discussed in this and the preceding question, I don't believe the 4k limit on expressions will be of concern.
Displaying the derived column
Demonstrating the derived column does work
I will give you a +1 for providing an approach I wasn't aware of, but you'll still need to add a derived column to match your provided format (base path name)
Full path is provided from the custom properties. Use the above REPLACE section to remove the path info except use the column [FileName] instead of #[User::CurrentFile]
I tried to get the filename through the procedure which Billinkc has given, but its throwing me error stating that filename column failed becaue of truncation error..
Any how i tried different approach to load file name into table.
steps i have used
1. right click on flat file Source and click on show advanced edito for Flat file
2. select component Properties tab
3. Inside that Custom Properties section ---> it has a property FileNameColumnName
I have assigned Filename to that column property like
FileNameColumnName----> FileName thats it, am able to get the filename into my destination table..

SSIS - Is there a Data Flow Source component that will handle CSV files where the column order may change?

We have written a number of SSIS packages that import data from CSV files using the Flat File Source.
It now seems that after these packages are deployed into production, the providers of these files may deliver files where the column order of the files changes (Don't ask!). Currently if this happens, our packages will fail.
For example, an additional column is inserted at the beginning of each row. In this case, the flat file source continues to use the existing column order, which obviously has a detrimental effect on the transformation!
Eg. Using a trivial example, the original file has the following content :
OurReference,Client,Amount
235,MFI,20000.00
236,MS,30000.00
The output from the flat file source is :
OurReference Client Amount
235 ClientA 20000.00
236 ClientB 30000.00
Subsequently, the file delivered changes to :
OurReference,ClientReference,Client,Amount
235,A244,ClientA,20000.00
236,B222,ClientB,30000.00
When the existing unchanged package is run against this file, the output from the flat file source is :
OurReference Client Amount
235 A244 ClientA,20000.00
236 B222 ClientB,30000.00
Ideally, we would like to use a data source that will cope with this problem - ie which produces output based on the column names, instead of the column order.
Any suggestions would be welcomed!
Not that I know of.
A possibility to check for the problem in advance is to set up two different connection managers, one with a single flat row. This one can read the first row and tell if it's OK or not and abort.
If you want to do the work, you can take it a step further and make that flat one-field row the only connection manager, and use a script component in your flow to parse the row and assign to the columns you need later in the flow.
As far as I know, there is no way to dynamically add columns to the flow at runtime - so all the columns you need will need to be added to the script task output. Whether they can be found and get parsed from the each line is up to you. Any "new" (i.e. unanticipated) columns cannot be used. Columns which are missing you could default or throw an exception.
A final possibility is to use the SSIS object model to modify the package before running to alter the connection manager - or even to write the entire package dynamically using the object model based on an inspection of the input file. I have done quite a bit of package generation in C# using templates and then adding information based on metadata I obtained from master files describing the mainframe files.
Best approach would be to run a check before the SSIS package imports the CSV data. This may have to be an external script/application, because I don't think you can manipulate data in the MS Business Intelligence Studio.
Here is a rough approach. I will write down the limitations at the end.
Create a flat file source. Put the entire row in one column.
Do not check Column names in first data row.
Create a Script Component
Code:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
string sRow = Row.Column0;
string sManipulated = string.Empty;
string temp = string.Empty;
string[] columns = sRow.Split(',');
foreach (string column in columns)
{
sManipulated = string.Format("{0}{1}", sManipulated, column.PadRight(15, ' '));
}
/* Note: For sake of demonstration I am padding to 15 chars.*/
Row.Column0 = sManipulated;
}
Create a flat file destination
Map Column0 to Column0
Limitation: I have arbitrarily padded each field to 15 characters. Points to consider:
1. Do we need to have each field of same size?
2. If yes, what is that size?
A generic way to handle that would be to create a table to store the file name, fields, and field sizes.
Use the file name to dynamically create the source and destination connection manager.
Use the field name and corresponding field size to decide the padding. Not sure, if you need this much flexibility. If you have any question, please respond.