Is there any way to reorder fields in an SSIS flat file source? - ssis

I have an SSIS package using a tab delimited flat file source with a TON of fields. Recently the provider of the tab delimited flat file has decided to change the format of the flat file by sprinkling a couple dozen new fields at random into the file. Needless to say, this hosed the package.
Rather than rebuild another flat file source and redefine all the fields, types, and lengths all over again, is there a way to reorder the fields in the flat file source? Sure would have been nice if Microsoft allowed you to move the fields around in the Advanced Columns pane, but noooooo.
Any help is appreciated.

If you only need to add columns to your file, you can do that in the Flat File connection editor. In the Advanced window, select the field next to the new one and click the chevron next to the New button. It will give you the choice of Insert Before or Insert After.
If you truly have to move things around, you'll need to edit the XML source. If you use the existing file definition as a guide, you can build the new one in Excel or T-SQL relatively easily. Easier than typing everything in all over again at least.
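If you do go the XML route, scripting the reorder may be less error-prone than hand-editing. Below is a minimal Python sketch of the idea. It uses a simplified, hypothetical `<Columns>`/`<Column Name="...">` layout as a stand-in; the column elements in a real .dtsx file are namespaced and carry more attributes, so the element and attribute names here are assumptions to adapt, not the actual schema.

```python
import xml.etree.ElementTree as ET


def reorder_children(parent, name_attr, new_order):
    """Reorder parent's children: those listed in new_order first, in that order."""
    by_name = {child.get(name_attr): child for child in parent}
    missing = [name for name in new_order if name not in by_name]
    if missing:
        raise KeyError(f"columns not defined: {missing}")
    ordered = [by_name[name] for name in new_order]
    # Any column not mentioned keeps its relative position at the end.
    leftovers = [c for c in parent if c.get(name_attr) not in new_order]
    parent[:] = ordered + leftovers
    return parent


# Hypothetical, simplified column list (NOT the real .dtsx schema).
SAMPLE = """<Columns>
  <Column Name="A" Width="10"/>
  <Column Name="B" Width="20"/>
  <Column Name="C" Width="5"/>
</Columns>"""

root = ET.fromstring(SAMPLE)
reorder_children(root, "Name", ["C", "A", "B"])
names = [c.get("Name") for c in root]
```

Each column keeps its own type and length because the whole element moves. Each flat file column also carries its own delimiter setting, with the last column normally holding the row delimiter, so those values would likely need a review after a reorder. Work on a copy of the package.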

I had a similar issue: I needed to change the order of columns in my flat file destination. The time-saving approach I settled on:
Delete the FF destination and FF connection manager (note down file name/location!),
Clear the check boxes that enable output columns in the source component
Re-enable the columns in the order you want
Add a new FF destination and FF connection right from the FF destination's connection manager drop-down.
Review/sanity check column sizes in FF connection, as usual
Not a direct answer to the question, but I came here looking for advice on "how to rearrange flat file destination columns", perhaps this will help someone.

I haven't seen a solution for that problem. SSIS isn't very strong at changing metadata. You could try to do it in Notepad, but that is very tricky and error-prone. I would not recommend it.

In the Connection Managers pane at the bottom of your IDE, you can double-click your file name and edit everything you want.

This is still a "feature" of SSIS. To work around this I create a flat file connection called "NULL" with a single column named "NULL". Use the "New" button to add the column. I change the default column name from "Column 0" to "NULL". This column name must not match any column name in the list to be re-populated. If you have a real column named "NULL", pick something else for the column name that's not in use. You can keep the "NULL" flat file connection in the project for later use. (I expect to need it a few more times in this project.)
For this example, I use a flat file destination. Change the Flat File Destination to use the NULL connection.
Check the mapping to see there are no columns mapped. Saving this resets the metadata stored for the mapping.
Finally, change the Flat File Destination back to the correct connection to get a new mapping without metadata interference.
My example is a flat file destination. It should work for a flat file source for resetting the metadata. It is similar to the trick of changing a query to "select 1 as [NULL]" and back to purge metadata when using an ODBC source or such.

You could probably try something like this, though I haven't tested it: use expressions to set everything for your flat file source, and turn design-time validation off.

Related

SSIS Flat File Connection - How does it determine string column DataType?

I am creating a new Flat File Connection Manager SSIS component which is based on a CSV file. I am keen to have the columns (all 547 of them) to be of type Unicode string [DT_WSTR] rather than string [DT_STR].
I am not sure how to trigger this component to do this automatically.
I guess I could go through and manually change each and every one of the 547 columns to the Unicode string [DT_WSTR] data type.
Any comments or answers much appreciated !
I have tried using the Unicode checkbox but the wizard then doesn't find the columns. I get message "The specified header or data row delimiter "{LF}" is not found after scanning 2097152 bytes .."
I was hoping there would be some way of changing all the column data types in one action without having to make 547 column type changes.
You can simply open the Flat File Connection Manager, Go To Advanced Tab, Click on one Column, Hold Ctrl key and select all columns then change the data type to DT_WSTR.
Additional information can be found in the following link:
SSIS: Flat File default length
I found an answer to this question.
https://social.msdn.microsoft.com/Forums/en-US/747ad564-1add-422e-af3c-9375b130ec83/easy-way-to-set-all-data-types-in-a-connection-manager?forum=sqlintegrationservices
i.e. In the Flat File Connection Manager Editor it is possible to select multiple (or all) the columns and then the DataType choice made is applied to all the selected columns.
Phew !

Source File Connection (Flat File) - Not reading column metadata

When I create the SSIS package it requires a file to be referenced to pick up the file's metadata. For example the column headers will be ColumnA, ColumnB.
I have always assumed that these column names need to be present in the file for it to be loaded. Recently the business, for whatever reason, changed one of the column names in the file to something else, so the file now contains ColumnA, NotColumnB. When the SSIS package runs, it ignores this and loads the file. I assumed that it would fail. Is my assumption correct and something weird is going on, or is my assumption incorrect? If so, please let me know why.
I have changed the column names in a few other packages that load data from a file, and they also don't care what the column names are.
Click on the flat file source and press F4 to show the Properties tab. There is a property called ValidateExternalMetadata; change it to True.
For more information check the following answer:
Detect new column in source not mapped to destination and fail in SSIS
Update 1
It looks like the flat file connection manager has no validation engine, and the metadata defined is used at configuration time to configure the mappings between the data file and the database.
Why Does't SSIS Flat File Data Check If Columns Names or Order Have Changed? What is best way to check?
Flat file destination columns data types validation
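Since the connection manager does not appear to check header names itself, one option is to validate the header row before the load runs. The following is a minimal Python sketch of such a pre-check, not something built into SSIS; the function name `header_mismatches` is hypothetical, and the same logic could be ported to a Script Task.

```python
import csv
import io


def header_mismatches(stream, expected, delimiter=","):
    """Return (position, expected_name, actual_name) for every header that differs."""
    actual = next(csv.reader(stream, delimiter=delimiter), [])
    width = max(len(expected), len(actual))

    def pad(cols):
        # Pad the shorter list so added or missing columns are reported too.
        return list(cols) + [None] * (width - len(cols))

    return [
        (i, e, a)
        for i, (e, a) in enumerate(zip(pad(expected), pad(actual)))
        if e != a
    ]


# The renamed header from the question above.
sample = io.StringIO("ColumnA,NotColumnB\n1,2\n")
problems = header_mismatches(sample, ["ColumnA", "ColumnB"])
```

An empty list means the header matches the expected layout; anything else can be used to fail the load before any rows are written.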

SSIS Errors for simple CSV Data Flow

Sorry to darken your day with my troubles, but SSIS has broken me! I am new to SSIS and I just seem to be misunderstanding it.
For background: I have a few versions of a basic package that includes a Foreach Loop container and a Data Flow with a few Derived Columns that imports CSV files into a SQL Server Staging table. It is very straightforward and does include an Execute SQL task and a File Move but those work fine. The issues are with the Foreach loop and the Data Flow.
I have one version of this package (let’s call it “A”) that seemed to be working fine. It would process multiple files in a folder, insert records into the staging table, properly execute the SQL Statements, and move the files to Archive. Everything seemed fine until I carefully QA’d the process. Turns out it was duplicating the data from one file, and never importing the data from a second Source File! Yet, the second/dupe round of data included the Source Filename (via a derived column) of the second file (but the data from the first). So it looked like I had successfully processed BOTH files until I looked at the actual data and saw that none of the values from the second source file were ever written to the Staging table.
Once I discovered this, I figured that the problem was in the Foreach loop and how I setup the different file path & name variables. So, I decided to try to make a new version of the package. I started by copying package A and created package B. In B, I deleted the Source Connection manager and created a new Connection Manager along with all new file & path variables. I then tried to cleanup/fix/replace various elements in my Data Flow and Foreach loop. In the process, I discovered that the Advanced Mappings from A – which DID work – were virtually all setup as String (even the Currency and Date columns). That did not seem right, so I modified each source money column by changing to data type Currency, and changed each date-related column to data type Date.
What followed has been dozens and dozens of Errors and I cannot get Package B to run. I have even changed all of the B data types back to String (mirroring the setup in Package A which DID work). But, still no joy.
This leads me to ask a few questions to those of you smarter than I:
1) Why can’t SSIS interpret Source CSV data using the proper data type? I.e. why do I need to set every Input column as a STRING when some columns are clearly & completely Numeric, Currency or Dates? (Yes, the Source CSV files are VERY clean – most don’t even have NULLS)
a. When I do change the Advanced mapping for a date-related Source column to Date, I get the ever present error message: [Flat File Source [30]] Error: Data conversion failed. The data conversion for column "Settle Date" returned status value 2 and status text "The value could not be converted because of a potential loss of data.".
2) When I reset the data types back to String in package B, I still get errors – usually Truncation errors (and Yes – I have adjusted the length to 250 in one of these columns).
a. Error Message: "The value could not be converted because of a potential loss of data.".
b. When I reset the Mappings to ignore the column (as a test), it throws a similar error at the next column.
3) Any ideas why Package A would dupe a file’s data and not process the second file, yet throw no errors and move both to Archive?
4) Why does the Data Viewer appear to have parsing errors (it shows data in the wrong columns) but when you use the Copy data feature in the data viewer and paste it into Excel, all of the data lines up perfectly?
5) Are there any tips & tricks that a rookie SSIS user needs to understand and which might not be apparent through the documentation and searching web articles as well as this site?
I can provide further details if they will help, but these packages are really very simple and should not be causing me this much frustration.
THANKS for any insights.
DGP
Wow, seems like you have a lot of SSIS issues... I think the reason for the same file being extracted is the way your 'variable mappings' are defined.
Have you had a look and followed this guide:
https://www.simple-talk.com/sql/ssis/ssis-basics-introducing-the-foreach-loop-container/
Hope this helps.
Shaheen
Thanks Tab & Shaheen,
To all SSIS rookies - please learn from my mistakes!
It appears that my issue was actually in how I identified the TEXT QUALIFIER in the Connection Manager. I had entered "" and that was causing problems with how my columns were being parsed. The parsing issues caused unexpected values to appear in some of the columns and that was causing the errors in the package.
When I tried changing the Text Qualifier to only ONE double quote - " - the whole thing worked!
As I mentioned - and as Shaheen suspected - my initial issue with the duplicate processing was probably due to how I set up the Foreach loop. I had already fixed that, but was still getting errors until I fixed the Text Qualifier.
I have only tested it a few times but it looks like that was the issue.
Thanks for the contributions.
DGP
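To illustrate why the text qualifier matters so much, here is a small Python sketch using the standard `csv` module. It does not reproduce SSIS's exact behavior with a "" qualifier; it only shows, on a made-up sample line, how the same row splits into different columns depending on whether the single double quote is honored as the qualifier.

```python
import csv
import io

# Made-up sample row: two quoted fields contain commas.
line = '"Smith, John","1,250.00",2015-03-01\n'

# With " as the text qualifier, commas inside quotes stay within their field.
qualified = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))

# With no qualifier handling, every comma starts a new column,
# so values shift to the right and land in the wrong columns.
unqualified = next(
    csv.reader(io.StringIO(line), delimiter=",", quoting=csv.QUOTE_NONE)
)
```

Here `qualified` has three fields while `unqualified` has five, which is the kind of column shift that then surfaces as conversion and truncation errors further down the data flow.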

SSIS Package Stubborn Flat File Connection Manager

I receive a pipe-delimited flat file each week that has 50 columns. I am trying to use SSIS to take that file, delete the last 3 columns, then insert the remaining data into a new pipe-delimited flat file. At first I thought this would be very simple, but I've got a stubborn flat file connection manager. It keeps reverting back to the inbound file layout with the extra columns, and also keeps going back to a comma delimited file when the outbound file needs to be pipe delimited.
The way I'm "deleting" the unneeded columns is by just removing them from the inbound flat file connection manager, so they aren't listed in the output columns from the flat file source, and they don't show to be on the input columns of the flat file destination.
The file name of both files is dynamic...not sure if that's having something to do with it.
I have delay validation set to true for both, but I'm not sure what else to try. I've also tried deleting all of it and adding back in the connection managers and the files.
Is there some issue with having 2 flat file connection managers, one for the source and one for the destination? Is there a setting I'm missing?
Delete your connection managers and Source/Destination (effectively start over)
Add Flat File Source (FFSRC) with new FF Connection
Set up FFSRC as needed (pipe delimited, headers, etc) - don't delete any rows
After clicking "OK", you're back in the Data Flow. Right click on your Flat File Source, and click "Show Advanced Editor"
Go to "Input and Output Properties" tab, then expand the FFSRC Output/Output Columns. Click a column, then click "Remove Column".
Add your Flat File Destination (FFDST) with a new connection manager, and map the inputs.
Your destination shouldn't have those columns now.
If the Flat File connection seems to be resetting because of dynamic names, look into supplying them as expressions/variables.
To do that, click on the Connection Manager node for your source/destination (not the data flow node), then in Properties, expand Expressions. You will want to make the ConnectionString dynamic via a variable.
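If the package keeps fighting you, the transformation itself is simple enough to do outside SSIS as a fallback. Here is a minimal Python sketch, assuming the trailing columns are always the ones to drop; the function name `drop_trailing_columns` is hypothetical.

```python
import csv
import io


def drop_trailing_columns(src, dst, n, delimiter="|"):
    """Copy delimited rows from src to dst, omitting the last n columns."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        writer.writerow(row[:-n] if n else row)


# In-memory stand-ins for the inbound and outbound files.
src = io.StringIO("a|b|c|d|e\n1|2|3|4|5\n")
dst = io.StringIO()
drop_trailing_columns(src, dst, 3)
result = dst.getvalue()
```

With real files, open both with `newline=""` as the `csv` module documentation recommends, and pass the file objects in place of the `StringIO` stand-ins.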
What do you mean by "It keeps reverting back to the inbound file layout with the extra columns, and also keeps going back to a comma delimited file when the outbound file needs to be pipe delimited."?
Have you tried directly referencing the file, then setting up the type of file (ragged right, delimited, or fixed width), and then applying the expression to the ConnectionString property of the connection manager?
I suggest ragged right, since it lets you specify whatever 'column' and width you want.

Importing flat file which has changing column order using SSIS [duplicate]

Problem.
I regularly receive feed files from different suppliers. Although the column names are consistent, the problem comes when some suppliers send text files with more or fewer columns in their feed files.
Furthermore, the column order within these files is inconsistent.
Other than the Dynamic Data Flow Task provided by CozyRoc, is there another way I could import these files? I am not a C# guru, but I am leaning towards using a "Script Task" in the control flow or a "Script Component" in the data flow.
Any suggestions, samples, or direction will be greatly appreciated.
http://www.cozyroc.com/ssis/data-flow-task
Some forums
http://www.sqlservercentral.com/Forums/Topic525799-148-1.aspx#bm526400
http://www.bidn.com/forums/microsoft-business-intelligence/integration-services/26/dynamic-data-flow
Off the top of my head, I have a 50% solution for you.
The problem
SSIS really cares about meta data so variations in it tend to result in exceptions. DTS was far more forgiving in this sense. That strong need for consistent meta data makes use of the Flat File Source troublesome.
Query based solution
If the problem is the component, let's not use it. What I like about this approach is that conceptually, it's the same as querying a table: the order of columns does not matter, nor does the presence of extra columns.
Variables
I created 3 variables, all of type string: CurrentFileName, InputFolder and Query.
InputFolder is hard wired to the source folder. In my example, it's C:\ssisdata\Kipreal
CurrentFileName is the name of a file. During design time, it was input5columns.csv but that will change at run time.
Query is an expression "SELECT col1, col2, col3, col4, col5 FROM " + @[User::CurrentFileName]
Connection manager
Set up a connection to the input file using the JET OLEDB driver. After creating it as described in the linked article, I renamed it to FileOLEDB and set an expression on the ConnectionString of "Data Source=" + @[User::InputFolder] + ";Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=\"text;HDR=Yes;FMT=CSVDelimited;\";"
Control Flow
My Control Flow looks like a Data flow task nested in a Foreach file enumerator
Foreach File Enumerator
My Foreach File enumerator is configured to operate on files. I put an expression on the Directory for @[User::InputFolder]. Notice that at this point, if the value of that folder needs to change, it'll correctly be updated in both the Connection Manager and the file enumerator. In "Retrieve file name", instead of the default "Fully Qualified", choose "Name and Extension".
In the Variable Mappings tab, assign the value to our @[User::CurrentFileName] variable.
At this point, each iteration of the loop will change the value of @[User::Query] to reflect the current file name.
Data Flow
This is actually the easiest piece. Use an OLE DB source and wire it as indicated.
Use the FileOLEDB connection manager and change the Data Access mode to "SQL command from variable". Use the @[User::Query] variable in there, click OK and you're ready to work.
Sample data
I created two sample files, input5columns.csv and input7columns.csv. All of the columns of 5 are in 7, but 7 has them in a different order (col2 is at zero-based ordinal position 2 in the first file and 6 in the second). I negated all the values in 7 to make it readily apparent which file is being operated on.
col1,col3,col2,col5,col4
1,3,2,5,4
1111,3333,2222,5555,4444
11,33,22,55,44
111,333,222,555,444
and
col1,col3,col7,col5,col4,col6,col2
-1111,-3333,-7777,-5555,-4444,-6666,-2222
-111,-333,-777,-555,-444,-666,-222
-1,-3,-7,-5,-4,-6,-2
-11,-33,-77,-55,-44,-666,-222
Running the package results in these two screen shots
What's missing
I don't know of a way to tell the query based approach that it's OK if a column doesn't exist. If there's a unique key, I suppose you could define your query to have only the columns that must be there and then perform lookups against the file to try and obtain the columns that ought to be there and not fail the lookup if the column doesn't exist. Pretty kludgey though.
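The select-by-name idea behind the query approach can also be sketched outside SSIS. The Python below is an illustration only, not part of the package described above: it reads columns by header name, so order and extra columns do not matter, and a missing column yields a default instead of an error. The function name `read_by_name` is hypothetical; the sample rows are taken from the two files above.

```python
import csv
import io

WANTED = ["col1", "col2", "col3", "col4", "col5"]


def read_by_name(stream, wanted, default=None):
    """Yield each row's values in the order given by wanted, looked up by header name."""
    for row in csv.DictReader(stream):
        # row.get() returns the default when a wanted column is absent.
        yield [row.get(name, default) for name in wanted]


five = io.StringIO("col1,col3,col2,col5,col4\n1,3,2,5,4\n")
seven = io.StringIO("col1,col3,col7,col5,col4,col6,col2\n-1,-3,-7,-5,-4,-6,-2\n")

rows_five = list(read_by_name(five, WANTED))
rows_seven = list(read_by_name(seven, WANTED))
```

Both files come out in col1 to col5 order regardless of how the supplier arranged them. A Script Component source could apply the same lookup-by-header logic inside the data flow.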
Our solution: we use parent-child packages. In the parent package we take the individual client files and transform them to our standard format files, then call the child package to process the standard import using the file we created. This only works if the client is consistent in what they send, though; if they try to change their format from what they agreed to send us, we return the file.