I have the setup shown in the attached illustration; I hope it is clear enough. I load all the xls files in a loop from a given folder.
While implementing a new Derived Column transformation to record the FileName from my loop, I got a truncation error. My variable was initially set to CCS.xls (Len = 7, the shortest name).
I tried to increase the Length in the Derived Column Editor but could not, because the field is not active and I cannot type anything into it. I then tracked down that the original Length came from the variable's value. In the Variables window I have DataType = String and no option to set a length.
So for now I have made a dummy empty file with the long name CCS____1.xls to avoid the problem, and it works OK. But I would like to learn a better way to avoid this; it looks like in this setup the data connection needs to use the file with the longest name (?)
You can change the Length property to 50 or larger manually in the Advanced Editor.
Right-click the Derived Column -> Show Advanced Editor -> Input and Output Properties -> Derived Column Output -> Output Columns -> the new column -> Data Type Properties -> Length
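Another common way to control the length, as an alternative to the Advanced Editor, is to cast the variable inside the Derived Column expression itself, for example (DT_WSTR, 100)@[User::FileName] (the variable name here is only illustrative). The cast makes the output column DT_WSTR with a length of 100, regardless of how long the variable's current value happens to be.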
I have a flat file where the rows have IDs in the form of a GUID. What I need is to redirect the error output to a table that will hold the error row ID, error column and error code. The problem is that I can map only the "Flat File Source Error Output Column", which is a sort of concatenation of the other columns. Is there a way I can get the ID column value of the error row? The best solution I could find is to add a counter that gives the row number, but that is not exactly what I need, as the IDs are strings in my case.
Nope. You get 3* columns from the Flat File Source Component's error output: Flat File Source Error Output Column, ErrorCode and ErrorColumn.
A Source component defines the columns that all row buffers "downstream" of that point will contain. It is responsible for adding rows and then filling the columns in that new row buffer.
The Flat File Source component has a contract that describes how it should consume the source data - this many columns, this delimiter (or this many characters) etc.
What happens though when something overflows a length, or the data type is incompatible or not all of the delimiters are present? The design decision is to either put incomplete rows into the pipeline (but then how do you determine which columns get populated - fill left to right? what about type mismatch?) or treat it as an error. Normally, this blows up the data flow but if you add an Error output path, then you can see what row failed.
And the row is the atomic unit the flat file is using as input.
Read line -> Parse -> Write to Output [or Error] buffer {loop}
You could then use a Script Component to try to parse the GUID out of the "Flat File Source Error Output Column", but then you have to hope that the value actually made it into the row. It could be that a column has an embedded delimiter that wasn't escaped, someone transferred the file with the wrong encoding/line endings, etc.
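For illustration only, a Script Component (transformation) attached to that error output could attempt the parse. Everything named below (the generated buffer column name, the ErrorRowId output column I pretend was added, the comma delimiter and the GUID-in-the-first-field assumption) is a guess to show the shape of the idea, not something guaranteed by the source component:
// Inside the Script Component's ScriptMain (C#).
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // The error column is a text blob (DT_TEXT or DT_NTEXT), so it shows up as a
    // BlobColumn rather than a string. Encoding.UTF8 is a guess; use Encoding.Unicode
    // if the column is DT_NTEXT.
    byte[] blob = Row.FlatFileSourceErrorOutputColumn.GetBlobData(
        0, (int)Row.FlatFileSourceErrorOutputColumn.Length);
    string rawRow = System.Text.Encoding.UTF8.GetString(blob);

    // Assume comma-delimited data with the GUID as the first field.
    string firstField = rawRow.Split(',')[0].Trim();

    Guid id;
    if (Guid.TryParse(firstField, out id))
    {
        Row.ErrorRowId = id.ToString();   // ErrorRowId: a string output column added by hand
    }
    // If the row is too mangled to contain a recognisable GUID, ErrorRowId stays NULL.
}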
*The Flat File Source Component does expose a property in the Advanced Editor, on the Component Properties tab, for FileNameColumnName, and that too will show up in the Error output path, but it is the only source component I am aware of with this behaviour.
I am creating a new Flat File Connection Manager SSIS component based on a CSV file. I am keen to have the columns (all 547 of them) be of type Unicode string [DT_WSTR] rather than string [DT_STR].
I am not sure how to get this component to do this automatically.
I guess I could go through and manually change each and every one of the 547 columns to the Unicode string [DT_WSTR] data type.
Any comments or answers much appreciated!
I have tried using the Unicode checkbox, but the wizard then doesn't find the columns. I get the message "The specified header or data row delimiter "{LF}" is not found after scanning 2097152 bytes .."
I was hoping there would be some way of changing all the column data types in one action, without having to make 547 individual column type changes.
You can simply open the Flat File Connection Manager, go to the Advanced tab, click on one column, hold the Ctrl key and select all the columns, then change the DataType to Unicode string [DT_WSTR].
Additional information can be found in the following link:
SSIS: Flat File default length
I found an answer to this question.
https://social.msdn.microsoft.com/Forums/en-US/747ad564-1add-422e-af3c-9375b130ec83/easy-way-to-set-all-data-types-in-a-connection-manager?forum=sqlintegrationservices
That is, in the Flat File Connection Manager Editor it is possible to select multiple (or all) columns at once, and the DataType choice is then applied to all of the selected columns.
Phew!
I have a requirement where the output file needs to be saved (dynamically) with the naming convention FileName_YYYY-MM-DD_FileNumber, where the file number is a sequence number. For example:
ABC_2009-01-01_001
ABC_2009-01-01_002
ABC_2009-01-01_003 and so on
I am able to get the name part and the date part using an expression on the .TXT connection, but I am unable to get the sequence number part. I would appreciate it if anyone could help me out with a solution.
Thanks in advance!
Use a package variable that starts at 1 and add one to it for each new file.
EDIT: To populate the variable, one way is to use a Script Task that opens a file system object, gets all the file names in the folder, parses them and figures out the highest sequence number. Then just add one to that number and set the variable to that value.
And no, I don't have any code handy that does that. You'll need to write it yourself.
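As a starting point, a rough sketch of such a Script Task could look something like this; the variable names (User::OutputFolder, User::FilePrefix, User::FileNumber) and the .txt extension are placeholders, not anything taken from the question:
// Script Task body (C#). User::FileNumber must be listed in the task's ReadWriteVariables.
public void Main()
{
    string folder = Dts.Variables["User::OutputFolder"].Value.ToString();
    string prefix = Dts.Variables["User::FilePrefix"].Value.ToString();   // e.g. "ABC_2009-01-01_"
    int highest = 0;

    // Look at every file that matches the prefix and keep the largest sequence number seen.
    foreach (string path in System.IO.Directory.GetFiles(folder, prefix + "*.txt"))
    {
        string name = System.IO.Path.GetFileNameWithoutExtension(path);
        string tail = name.Substring(prefix.Length);   // the "001" part

        int sequence;
        if (int.TryParse(tail, out sequence) && sequence > highest)
        {
            highest = sequence;
        }
    }

    // The next file gets highest + 1. Pad it back to three digits in the connection's
    // file name expression, e.g. RIGHT("00" + (DT_WSTR, 3)@[User::FileNumber], 3).
    Dts.Variables["User::FileNumber"].Value = highest + 1;
    Dts.TaskResult = (int)ScriptResults.Success;
}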
Problem.
I regularly receive feed files from different suppliers. Although the column names are consistent, the problem comes when some suppliers send text files with more or fewer columns in their feed file.
Furthermore, the arrangement of the columns in these files is inconsistent.
Other than the Dynamic Data Flow Task provided by CozyRoc, is there another way I could import these files? I am not a C# guru, but I am drawn towards using a "Script Task" in the control flow or a "Script Component" data flow task.
Any suggestions, samples or direction will be greatly appreciated.
http://www.cozyroc.com/ssis/data-flow-task
Some forums
http://www.sqlservercentral.com/Forums/Topic525799-148-1.aspx#bm526400
http://www.bidn.com/forums/microsoft-business-intelligence/integration-services/26/dynamic-data-flow
Off the top of my head, I have a 50% solution for you.
The problem
SSIS really cares about metadata, so variations in it tend to result in exceptions. DTS was far more forgiving in this sense. That strong need for consistent metadata makes using the Flat File Source troublesome.
Query-based solution
If the problem is the component, let's not use it. What I like about this approach is that, conceptually, it's the same as querying a table: the order of the columns does not matter, nor does the presence of extra columns.
Variables
I created 3 variables, all of type string: CurrentFileName, InputFolder and Query.
InputFolder is hard wired to the source folder. In my example, it's C:\ssisdata\Kipreal
CurrentFileName is the name of a file. During design time, it was input5columns.csv but that will change at run time.
Query is an expression: "SELECT col1, col2, col3, col4, col5 FROM " + @[User::CurrentFileName]. With the design-time value above, that evaluates to SELECT col1, col2, col3, col4, col5 FROM input5columns.csv; the file name portion changes with each iteration of the loop.
Connection manager
Set up a connection to the input file using the JET OLEDB driver. After creating it as described in the linked article, I renamed it to FileOLEDB and set an expression on the connection manager of "Data Source=" + @[User::InputFolder] + ";Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=\"text;HDR=Yes;FMT=CSVDelimited;\";"
Control Flow
My Control Flow looks like a Data flow task nested in a Foreach file enumerator
Foreach File Enumerator
My Foreach File Enumerator is configured to operate on files. I put an expression on the Directory property of @[User::InputFolder]. Notice that at this point, if the value of that folder needs to change, it will correctly be updated in both the connection manager and the file enumerator. In "Retrieve file name", instead of the default "Fully qualified", choose "Name and extension".
In the Variable Mappings tab, assign the value to our @[User::CurrentFileName] variable.
At this point, each iteration of the loop will change the value of @[User::Query] to reflect the current file name.
Data Flow
This is actually the easiest piece. Use an OLE DB source and wire it as indicated.
Use the FileOLEDB connection manager and change the Data access mode to "SQL command from variable". Use the @[User::Query] variable there, click OK and you're ready to work.
Sample data
I created two sample files, input5columns.csv and input7columns.csv. All of the columns of 5 are in 7, but 7 has them in a different order (col2 sits at zero-based ordinal position 2 in one file and 6 in the other). I negated all the values in 7 to make it readily apparent which file is being operated on.
col1,col3,col2,col5,col4
1,3,2,5,4
1111,3333,2222,5555,4444
11,33,22,55,44
111,333,222,555,444
and
col1,col3,col7,col5,col4,col6,col2
-1111,-3333,-7777,-5555,-4444,-6666,-2222
-111,-333,-777,-555,-444,-666,-222
-1,-3,-7,-5,-4,-6,-2
-11,-33,-77,-55,-44,-666,-222
Running the package results in these two screen shots
What's missing
I don't know of a way to tell the query-based approach that it's OK if a column doesn't exist. If there's a unique key, I suppose you could define your query to contain only the columns that must be there, then perform lookups against the file to try to obtain the columns that ought to be there, without failing the lookup if a column doesn't exist. Pretty kludgey, though.
Our solution: we use parent-child packages. In the parent package we take the individual client files and transform them into our standard format files, then call the child package to process the standard import using the file we created. This only works if the client is consistent in what they send, though; if they try to change their format from what they agreed to send us, we return the file.
We have written a number of SSIS packages that import data from CSV files using the Flat File Source.
It now seems that after these packages are deployed into production, the providers of these files may deliver files where the column order changes (don't ask!). Currently, if this happens, our packages fail.
For example, an additional column is inserted at the beginning of each row. In this case, the flat file source continues to use the existing column order, which obviously has a detrimental effect on the transformation!
Eg. Using a trivial example, the original file has the following content :
OurReference,Client,Amount
235,ClientA,20000.00
236,ClientB,30000.00
The output from the flat file source is :
OurReference Client Amount
235 ClientA 20000.00
236 ClientB 30000.00
Subsequently, the file delivered changes to :
OurReference,ClientReference,Client,Amount
235,A244,ClientA,20000.00
236,B222,ClientB,30000.00
When the existing unchanged package is run against this file, the output from the flat file source is :
OurReference Client Amount
235 A244 ClientA,20000.00
236 B222 ClientB,30000.00
Ideally, we would like to use a data source that copes with this problem, i.e. one that produces output based on the column names instead of the column order.
Any suggestions would be welcomed!
Not that I know of.
A possibility to check for the problem in advance is to set up two different connection managers, one of which reads each row as a single column. That one can read the first row, tell whether it's OK or not, and abort if it isn't.
If you want to do the work, you can take it a step further and make that single-column connection manager the only one, and use a Script Component in your data flow to parse each row and assign the values to the columns you need later in the flow.
As far as I know, there is no way to dynamically add columns to the flow at runtime, so all the columns you need will have to be added to the Script Component's output. Whether they can be found and parsed from each line is up to you. Any "new" (i.e. unanticipated) columns cannot be used. For columns that are missing you could supply a default or throw an exception.
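To make the idea concrete, a rough sketch of that Script Component could look like this. It assumes a single input column named Line holding the raw row, output columns named OurReference, Client and Amount, a comma delimiter, and the header arriving as the first data row; all of those are assumptions for illustration:
// Members of the Script Component's ScriptMain (C#, synchronous output).
// Needs: using System; using System.Collections.Generic; using System.Globalization;
private Dictionary<string, int> ordinalByName;

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string[] fields = Row.Line.Split(',');

    if (ordinalByName == null)
    {
        // First row is the header: record where each named column sits in this delivery.
        ordinalByName = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < fields.Length; i++)
        {
            ordinalByName[fields[i].Trim()] = i;
        }
        return;   // the header row passes through empty; filter it out downstream
    }

    // Look fields up by name, so extra columns or a changed order no longer break the load.
    Row.OurReference = fields[ordinalByName["OurReference"]];
    Row.Client = fields[ordinalByName["Client"]];
    Row.Amount = decimal.Parse(fields[ordinalByName["Amount"]], CultureInfo.InvariantCulture);
}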
A final possibility is to use the SSIS object model to modify the package before running it, to alter the connection manager, or even to write the entire package dynamically using the object model based on an inspection of the input file. I have done quite a bit of package generation in C# using templates and then adding information based on metadata I obtained from master files describing the mainframe files.
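A minimal sketch of that object model route, just to show its shape (the package path, connection manager name and file path below are placeholders):
// C# console application referencing Microsoft.SqlServer.ManagedDTS.
using Microsoft.SqlServer.Dts.Runtime;

class Runner
{
    static void Main()
    {
        Application app = new Application();
        Package pkg = app.LoadPackage(@"C:\packages\ImportFeed.dtsx", null);

        // Point the flat file connection manager at the file that was inspected beforehand.
        ConnectionManager cm = pkg.Connections["FeedFile"];
        cm.ConnectionString = @"C:\feeds\incoming\today.csv";

        DTSExecResult result = pkg.Execute();
        System.Console.WriteLine(result);
    }
}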
The best approach would be to run a check before the SSIS package imports the CSV data. This may have to be an external script/application, because I don't think you can manipulate the data in the MS Business Intelligence Studio.
Here is a rough approach. I will write down the limitations at the end.
Create a flat file source. Put the entire row in one column.
Do not check Column names in first data row.
Create a Script Component
Code:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string sRow = Row.Column0;
    string sManipulated = string.Empty;

    // Split the raw line on commas and pad every field out to a fixed width.
    string[] columns = sRow.Split(',');
    foreach (string column in columns)
    {
        sManipulated = string.Format("{0}{1}", sManipulated, column.PadRight(15, ' '));
    }

    /* Note: for the sake of demonstration I am padding to 15 chars. */
    Row.Column0 = sManipulated;
}
Create a flat file destination
Map Column0 to Column0
Limitation: I have arbitrarily padded each field to 15 characters. Points to consider:
1. Do we need each field to be the same size?
2. If yes, what is that size?
A generic way to handle that would be to create a table that stores the file name, the fields, and the field sizes.
Use the file name to dynamically create the source and destination connection managers.
Use the field name and the corresponding field size to decide the padding. I'm not sure if you need this much flexibility. If you have any questions, please respond.
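For example, the padding loop from the script above could be driven by per-field widths instead of a flat 15 characters; the widths here are hard-coded purely as a sketch, but in practice they would be loaded from that metadata table:
// Pad each field to its own configured width (C#, same Script Component as above).
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    int[] fieldWidths = { 10, 25, 12 };   // illustrative sizes for a three-field file

    string[] columns = Row.Column0.Split(',');
    var padded = new System.Text.StringBuilder();

    for (int i = 0; i < columns.Length; i++)
    {
        // Fall back to 15 characters if no width was configured for this position.
        int width = i < fieldWidths.Length ? fieldWidths[i] : 15;
        padded.Append(columns[i].PadRight(width, ' '));
    }

    Row.Column0 = padded.ToString();
}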