Is it possible to pass row id to error output in SSIS? - ssis

I have a flat file whose rows have IDs in the form of GUIDs. I need to redirect the error output to a table that stores the error row ID, the error column and the error code. The problem is that I can map only "Flat File Source Error Output Column", which is some sort of concatenation of the other columns. Is there a way to get the ID column value of the error row? The best solution I could find is to add a counter that gives the row number, but that's not exactly what I need, as the IDs are strings in my case.

Nope. You get 3* columns from the Flat File Source component's error output: Flat File Source Error Output Column, ErrorCode, ErrorColumn.
A Source component defines the columns that all row buffers "downstream" of that point will contain. It is responsible for adding rows and then filling the columns in that new row buffer.
The Flat File Source component has a contract that describes how it should consume the source data - this many columns, this delimiter (or this many characters) etc.
What happens though when something overflows a length, or the data type is incompatible or not all of the delimiters are present? The design decision is to either put incomplete rows into the pipeline (but then how do you determine which columns get populated - fill left to right? what about type mismatch?) or treat it as an error. Normally, this blows up the data flow but if you add an Error output path, then you can see what row failed.
And the row is the atomic unit the flat file is using as input.
Read line -> Parse -> Write to Output [or Error] buffer {loop}
You could then use a Script Component to try to parse the GUID out of "Flat File Source Error Output Column", but then you have to hope that the value is actually in the row. It could be that a column has an embedded delimiter that wasn't escaped, someone transferred the file using the wrong encoding/line endings, etc.
*The Flat File Source Component does expose a property in the advanced editor, Component Properties tab for FileNameColumnName and that too will show up in the Error output path but that is the only source component I am aware of with this behaviour.
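The parsing idea above can be sketched roughly as follows. This is only a minimal illustration, not a built-in SSIS feature: ErrorRowParser and TryExtractGuid are hypothetical names, and it assumes the raw error line has already been decoded into a string (inside a Script Component the error column typically arrives as a text/blob column, and that decoding step is left out here). It also assumes the ID is written in the usual 8-4-4-4-12 hexadecimal layout.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ErrorRowParser
{
    // Matches the canonical 8-4-4-4-12 hexadecimal GUID layout anywhere in the line.
    private static readonly Regex GuidPattern = new Regex(
        "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}",
        RegexOptions.Compiled);

    // Returns true and sets id when a GUID-shaped token is found in the raw line.
    // Returns false (id = Guid.Empty) when the line is empty or holds no such token.
    public static bool TryExtractGuid(string rawLine, out Guid id)
    {
        id = Guid.Empty;
        if (string.IsNullOrEmpty(rawLine))
        {
            return false;
        }

        Match match = GuidPattern.Match(rawLine);
        return match.Success && Guid.TryParse(match.Value, out id);
    }
}
```

Searching for the pattern rather than splitting on the delimiter means a broken delimiter elsewhere in the row does not matter, but a row whose ID itself is damaged will still come back empty, so a fallback such as a row counter is probably still worth keeping.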

Related

SSIS Need Flat File output with 2 column headers the same

I am trying to use the SSIS Flat File destination, but cannot come up with a workaround for getting the output file to have two columns with the same name.
I have a requirement for the output file to have the column headers:
first1, last1, email, shortname, email
Whenever I try to map the source data, I get error messages saying things like "This column name already exists" and "There is more than one data source column with the name "email"".
What's the best work around?
Thanks
Assuming I understand the problem correctly, you need to have the same column name in the output file twice. Doesn't matter whether it's same data or not, just the header needs to be repeated.
It's a little hokey, but in your connection manager, uncheck "Column Names in the first data row" and redefine the columns as email and email1. This will allow you to connect the columns to the right places in the file.
In your flat file destination, you have the ability to define Header row(s). It's very limited, you can't put useful things in there like dynamic checksums and such but in your case, paste in first1, last1, email, shortname, email and run the package. Data will be extracted to the correct columns and a header row will be prepended to the file with all the "right" field names.
Two downsides to this approach. First is the connection manager becomes output only as it would attempt to read in the header row from the file. Second is that any changes to the layout will not be kept in sync with the manual header row.

ssis - capturing the bad rows

Hi, can you help me figure this out? Is there a way to get the row in which an error occurred in SSIS? I have a flat file with some 10k+ records which is being read via a 'flatfilesource'.
Right now the error output defaults to error-column, error-code, and 'flatfilesourceerroroutputcolumn', and I use a script component to handle it. But none of these three inputs (to the script component) are user-friendly enough. So I want to get an output like the first column value (this is a unique identifier) of the row in which the error occurred. How can I add that?
While debugging this in SSIS, you can add a Data Viewer on the path to where your script handles error. This path has all the columns of the original row where your error is.
If you want to handle your SSIS errors and also do something else with it, you can direct the error output from your flat file source to a Multicast and then send one stream down to a file, a table, or something else ( a Recordset destination and a subsequent foreach loop on the object used to store the Recordset will let you do stuff on a row-by-row basis on the errored row(s)).

Erroneous row numbers in an SSIS task

I am importing a text file into a SQL Server table which has a number of constraints. I have created one package and the associated tasks.
At the end of an SSIS package execution, I want to know the erroneous row numbers which were not successfully exported to the DB. Is there any direct API or variable available in the dts namespace that gives this information?
Kindly share with me any knowledge to get this information.
Thanks,
Rahul
The error (red line) output of your import step inside the data flow lets you redirect to an error table. This should list the information you are after.
http://msdn.microsoft.com/en-us/library/ms140083.aspx
Error Outputs ( http://msdn.microsoft.com/en-us/library/ms140080.aspx )
Sources, destinations, and transformations can include error outputs. You can specify how the data flow component responds to errors in each input or column by using the Configure Error Output dialog box. If an error or data truncation occurs at run time and the data flow component is configured to redirect rows, the data rows with the error are sent to the error output. By default, an error output contains the output columns and two error columns: ErrorCode and ErrorColumn. The output columns contain the data from the row that failed, ErrorCode provides the error code, and ErrorColumn identifies the failing column.
For more information, see Handling Errors in the Data Flow.
Redirect the error rows on the destination component, pipe them through a count operation and then log that to a log table or whatever.

SSIS - read a single header record from a flat file or an excel file prior to processing

Is there a method by which one can read just the first record of a file, i.e., to read header information so that a decision can be made whether or not to process the remainder of the file?
I know that with the split transformation component one can write an expression that will ignore all of the rows besides the header based on a key word in the header. I would rather not go that route as that is inefficiently reading every record in the file.
Specifically, is there script component logic that I can implement to close the file
and end the dataflow after the first record has been read?
See this post from Todd McDermid:
Basically, you would set up a Foreach Container to loop over the files in your directory. Inside the Foreach, you would determine the "file type" - perhaps by creating a variable with a long-winded expression on it that pulls apart your file name and assumes a "file type" value - then passes control on to one of five Data Flows via conditional connectors. (Double-click on the standard green connector, change its Evaluation Operation to Expression and Constraint, and set the expression to be "file_type_variable = ".) Then each Data Flow picks apart one "file type".
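The quoted post decides from the file name. If the decision really has to come from the header record, a different technique is to read just the first line in a Script Task before the Data Flow runs. The sketch below is only a rough illustration under some assumptions: HeaderCheck, ReadFirstLine and ShouldProcess are hypothetical names, the "starts with a keyword" rule is made up for the example, and it applies to text files only (an Excel file would need a different reader).

```csharp
using System;
using System.IO;

public static class HeaderCheck
{
    // Reads the first line and closes the file without scanning the remaining rows.
    // Returns null when the file is empty.
    public static string ReadFirstLine(string path)
    {
        using (var reader = new StreamReader(path))
        {
            return reader.ReadLine();
        }
    }

    // Hypothetical rule: process the file only when the header starts with the expected keyword.
    public static bool ShouldProcess(string path, string expectedKeyword)
    {
        string header = ReadFirstLine(path);
        return header != null
            && header.StartsWith(expectedKeyword, StringComparison.OrdinalIgnoreCase);
    }
}
```

The result could then be stored in a package variable and used in the same kind of "Expression and Constraint" connector described above, so the Data Flow is skipped entirely when the header does not qualify.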

SSIS - Is there a Data Flow Source component that will handle CSV files where the column order may change?

We have written a number of SSIS packages that import data from CSV files using the Flat File Source.
It now seems that after these packages are deployed into production, the providers of these files may deliver files where the column order of the files changes (Don't ask!). Currently if this happens, our packages will fail.
For example, an additional column is inserted at the beginning of each row. In this case, the flat file source continues to use the existing column order, which obviously has a detrimental effect on the transformation!
Eg. Using a trivial example, the original file has the following content :
OurReference,Client,Amount
235,ClientA,20000.00
236,ClientB,30000.00
The output from the flat file source is :
OurReference Client Amount
235 ClientA 20000.00
236 ClientB 30000.00
Subsequently, the file delivered changes to :
OurReference,ClientReference,Client,Amount
235,A244,ClientA,20000.00
236,B222,ClientB,30000.00
When the existing unchanged package is run against this file, the output from the flat file source is :
OurReference Client Amount
235 A244 ClientA,20000.00
236 B222 ClientB,30000.00
Ideally, we would like to use a data source that will cope with this problem - ie which produces output based on the column names, instead of the column order.
Any suggestions would be welcomed!
Not that I know of.
A possibility to check for the problem in advance is to set up two different connection managers, one with a single flat row. This one can read the first row and tell if it's OK or not and abort.
If you want to do the work, you can take it a step further and make that flat one-field row the only connection manager, and use a script component in your flow to parse the row and assign to the columns you need later in the flow.
As far as I know, there is no way to dynamically add columns to the flow at runtime, so all the columns you need will have to be added to the Script Component output. Whether they can be found in and parsed from each line is up to you. Any "new" (i.e. unanticipated) columns cannot be used. For columns which are missing, you could supply a default or throw an exception.
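To make the "parse the row yourself" idea a bit more concrete, here is a minimal sketch of looking fields up by header name instead of by position. HeaderMap and GetField are hypothetical names, and the code assumes a plain comma-delimited file with no quoted fields or embedded commas; inside a Script Component you would build the map from the first row and then assign the looked-up values to the fixed output columns.

```csharp
using System;
using System.Collections.Generic;

public sealed class HeaderMap
{
    // Column name -> zero-based position, taken from the header line of the current file.
    private readonly Dictionary<string, int> positions =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    public HeaderMap(string headerLine)
    {
        string[] names = headerLine.Split(',');
        for (int i = 0; i < names.Length; i++)
        {
            positions[names[i].Trim()] = i;
        }
    }

    // Returns the named field from a data line, or null when the column
    // is not in the header or the line has too few fields.
    public string GetField(string dataLine, string columnName)
    {
        int index;
        if (!positions.TryGetValue(columnName, out index))
        {
            return null;
        }

        string[] fields = dataLine.Split(',');
        return index < fields.Length ? fields[index] : null;
    }
}
```

With the changed file from the question, new HeaderMap("OurReference,ClientReference,Client,Amount").GetField("235,A244,ClientA,20000.00", "Client") picks up ClientA even though the column has moved. A null result is where you would apply a default or raise an error.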
A final possibility is to use the SSIS object model to modify the package before running to alter the connection manager - or even to write the entire package dynamically using the object model based on an inspection of the input file. I have done quite a bit of package generation in C# using templates and then adding information based on metadata I obtained from master files describing the mainframe files.
Best approach would be to run a check before the SSIS package imports the CSV data. This may have to be an external script/application, because I don't think you can manipulate data in the MS Business Intelligence Studio.
Here is a rough approach. I will write down the limitations at the end.
Create a flat file source. Put the entire row in one column.
Do not check Column names in first data row.
Create a Script Component
Code:
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Column0 holds the entire line, because the source defines a single column.
        string[] columns = Row.Column0.Split(',');
        var padded = new System.Text.StringBuilder();

        foreach (string column in columns)
        {
            // Note: for the sake of demonstration, every field is padded to 15 characters.
            padded.Append(column.PadRight(15, ' '));
        }

        // Write the fixed-width line back to the same column.
        Row.Column0 = padded.ToString();
    }
Create a flat file destination
Map Column0 to Column0
Limitation: I have arbitrarily padded each field to 15 characters. Points to consider:
1. Does every field need to be the same size?
2. If yes, what is that size?
A generic way to handle that would be to create a table to store the file name, fields, and field sizes.
Use the file name to dynamically create the source and destination connection managers.
Use the field name and corresponding field size to decide the padding. I'm not sure whether you need this much flexibility. If you have any questions, please respond.
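As a rough sketch of the per-field padding, assuming the sizes have already been loaded from such a table into an array (FixedWidthFormatter and Format are hypothetical names, and truncating over-long values is my own assumption rather than something the approach above requires):

```csharp
using System;
using System.Text;

public static class FixedWidthFormatter
{
    // Pads each comma-separated field to its configured width.
    // widths[i] is the size of the i-th field, e.g. loaded from a metadata table.
    public static string Format(string delimitedLine, int[] widths)
    {
        string[] fields = delimitedLine.Split(',');
        if (fields.Length != widths.Length)
        {
            throw new ArgumentException("Field count does not match the number of configured widths.");
        }

        var result = new StringBuilder();
        for (int i = 0; i < fields.Length; i++)
        {
            string field = fields[i];

            // Assumption: over-long values are truncated so later fields keep their positions.
            if (field.Length > widths[i])
            {
                field = field.Substring(0, widths[i]);
            }

            result.Append(field.PadRight(widths[i], ' '));
        }

        return result.ToString();
    }
}
```

For example, Format("235,ClientA,20000.00", new[] { 5, 10, 10 }) gives a 25-character line with each field starting at a fixed offset. Throwing on a field-count mismatch is one option; redirecting such rows to an error output would be another.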