SSIS handling NULL and blank spaces

Hello, I am new to SSIS and I am receiving a text file created by SSIS. I am using the wizard to load it into an Oracle table, but in the text file some columns contain the string "NULL" and others contain a blank string instead of a zero-length column. Is there an automatic way to make these values become actual NULL values in the table, or do I have to create a derived column for each one of these cases?
Thank you.

Within an SSIS project in SQL Server Data Tools for Visual Studio 2015/SQL Server 2016, the way to address the handling of empty columns seems to be via a property of the Flat File Source component (I'm not certain whether space-only columns qualify):
Right-click the Flat File Source and choose Show Advanced Editor....
Select the Component Properties tab.
Set the RetainNulls property to True (the default is False).

If you want to convert the value to NULL when your input value is empty/blank, then you can try the following (assuming the data type is string/varchar):
LEN(TRIM([ColumnName]))==0 ? NULL(DT_WSTR, 10) : [ColumnName]
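The expression above only catches blank values. The question also mentions columns containing the literal string "NULL"; as a sketch (assuming a DT_WSTR column and that the text is exactly "NULL"), the same pattern can be extended:
(LEN(TRIM([ColumnName])) == 0 || TRIM([ColumnName]) == "NULL") ? NULL(DT_WSTR, 10) : [ColumnName]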

I faced the same issue. You can use a Script Component and add the code below to loop through all the columns and replace each text "NULL" with an actual NULL value:
// Requires: using System.Reflection;
foreach (PropertyInfo dataColumn in Row.GetType().GetProperties())
{
    // Skip the generated *_IsNull flag properties and only touch string columns
    if (dataColumn.Name.ToLower().EndsWith("_isnull") == false && dataColumn.PropertyType == typeof(string))
    {
        object objValue = dataColumn.GetValue(Row, null);
        if (objValue != null && objValue.ToString() == "NULL")
        {
            dataColumn.SetValue(Row, null, null);
        }
    }
}

If you're using SSIS 2008, there's also the Null Manager component from Tactek Data Systems. It isn't free, but it's pretty cheap, around $10 (www.tactek.com). You can convert empty strings to nulls, nulls to empty strings, and nulls to "filler" values like "Unknown" or "NA".

I don't think there is any way to do this using the standard Flat File Source SSIS provides. To do this I make use of a custom component called Delimited File Source, which can be downloaded here: http://ssisdfs.codeplex.com/. As its name indicates, it's also much better at handling delimited files, plus it has the option of treating empty strings as NULL.

Related

SSIS Flat File Connection - How does it determine string column DataType?

I am creating a new Flat File Connection Manager SSIS component which is based on a CSV file. I am keen to have the columns (all 547 of them) to be of type Unicode string [DT_WSTR] rather than string [DT_STR].
I am not sure how to trigger this component to do this automatically.
I guess I could go through and manually change each and every one of the 547 columns to the Unicode string [DT_WSTR] data type.
Any comments or answers are much appreciated!
I have tried using the Unicode checkbox, but the wizard then doesn't find the columns. I get the message "The specified header or data row delimiter "{LF}" is not found after scanning 2097152 bytes ..".
I was hoping there would be some way of changing all the column data types in one action without having to make 547 column type changes.
You can simply open the Flat File Connection Manager, go to the Advanced tab, click on one column, hold the Ctrl key and select all the columns, then change the data type to DT_WSTR.
Additional information can be found in the following link:
SSIS: Flat File default length
I found an answer to this question.
https://social.msdn.microsoft.com/Forums/en-US/747ad564-1add-422e-af3c-9375b130ec83/easy-way-to-set-all-data-types-in-a-connection-manager?forum=sqlintegrationservices
i.e. in the Flat File Connection Manager Editor it is possible to select multiple (or all) of the columns, and the DataType choice made is then applied to all the selected columns.
Phew !
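If you would rather not click through the UI at all, the SSIS object model can flip every flat-file column in one pass. This is only a sketch, assuming a package saved at a hypothetical path with a connection manager named "Flat File Connection Manager"; note that any Flat File Source using the connection would still need its metadata refreshed afterwards:
using Microsoft.SqlServer.Dts.Runtime;
using wrap = Microsoft.SqlServer.Dts.Runtime.Wrapper;

class FixFlatFileColumns
{
    static void Main()
    {
        Application app = new Application();
        // Hypothetical path and connection manager name - adjust for your package
        Package pkg = app.LoadPackage(@"C:\packages\MyPackage.dtsx", null);
        ConnectionManager cm = pkg.Connections["Flat File Connection Manager"];
        wrap.IDTSConnectionManagerFlatFile100 ff =
            (wrap.IDTSConnectionManagerFlatFile100)cm.InnerObject;
        foreach (wrap.IDTSConnectionManagerFlatFileColumn100 col in ff.Columns)
        {
            col.DataType = wrap.DataType.DT_WSTR; // Unicode string
            col.MaximumWidth = 255;               // pick a width that fits your data
        }
        app.SaveToXml(@"C:\packages\MyPackage.dtsx", pkg, null);
    }
}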

Access 2010: How to convert erroneous value to null or blank?

I am doing a simple select query in Access 2010 using the design view. My source file is an excel worksheet that I have linked to a table in Access. I can view the source, but am not able to modify it in any way.
One of the columns in the source file can have the following error: "#NAME?"
When I see an error, I just want to read the value in as an empty string "", but when there isn't an error, I just want the value. I've researched different variations of using Nz, IIF, Switch, and IsError but haven't had any success yet.
I am trying the following code in the "Field" parameter in design view.
Program_Temp: IIf(IsError([Program]), "", [Program])
This fails because IIf evaluates both the "truepart" and the "falsepart", so even if the error is properly detected, the function will still fail because [Program] is still evaluated.
Is there a way to achieve this?
Excel will pass the values to Access as strings.
Now you just have to replace the error text with an empty string:
Replace ( string1, find, replacement, [start, [count, [compare]]] )
Replace ( Program_Temp, "#NAME?", "" )
or
Replace ( Program_Temp, "#NAME?", vbNullString )
I will also suggest the same thing as @serakfalcon: import the table instead of linking it, and when new info is needed, delete the data and import the new data. A nice side effect of this is that errors will be replaced by null.

Prevent Duplicate headers in flat file destination - SSIS

I need some help.
I am exporting some data to a .csv file from an OLE DB source. I don't want the headers to appear twice in the destination. If I uncheck the "Column names in the first data row" property, the headers don't get populated on the first execution either.
Output as of now:
Col1,Col2
A,B
Col1,Col2
C,D
How can I make the package run in such a way that if the file is empty, the headers get inserted, but if the execution happens again, the headers are not included, just the data?
There was a similar thread, but I wasn't able to apply the solution, as I don't know how to use expressions to get the number of rows of the destination itself. It was long ago, so I created a new one.
Your help is deeply appreciated.
-Akshay
Perhaps I'm missing something, but this works for me. I am not having the read-only trouble with ColumnNamesInFirstDataRow.
I created a package-level variable named AddHeader, type Boolean, and set it to True. I added a Flat File Connection Manager, named FFCM, and configured it to use a CSV output of 2 columns: HeadCount (int), AddHeader (boolean). In the properties for the Connection Manager, I added an Expression for the property ColumnNamesInFirstDataRow and assigned it a value of @[User::AddHeader].
I added a script task to test the size of the file. It has read/write access to the Variable AddHeader. I then used this script to determine whether the file was empty. If your definition of "empty" is that it has a header row, then I'd adjust the logic in the if check to match that length.
public void Main()
{
    string path = Dts.Connections["FFCM"].ConnectionString;
    System.IO.FileInfo stats = null;
    try
    {
        stats = new System.IO.FileInfo(path);
        // Checking length isn't bulletproof based on how the disk is configured,
        // but should be good enough
        // http://stackoverflow.com/questions/3750590/get-size-of-file-on-disk
        if (stats != null && stats.Length != 0)
        {
            this.Dts.Variables["AddHeader"].Value = false;
        }
    }
    catch
    {
        // no harm, no foul
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}
I looped through twice to ensure I'd generate the append scenario.
I deleted my file and ran the package, and only had a header once.
The property that controls whether the column names will be included in the output file or not is ColumnNamesInFirstDataRow. This is a read-only property.
One way to achieve what you are trying to do would be to have two data flow tasks on the control flow surface, preceded by a script task. These two data flow tasks will be identical, except that they will refer to two different flat file connection managers. Again, the only difference between these two would be the different values for ColumnNamesInFirstDataRow: one true, the other false.
Use this script task to decide whether this is the first run or a subsequent run. Persist this information and check it within the script; you can either have a separate table for this information, or use some log table to infer it, as in the sketch below.
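A minimal sketch of that script task, assuming (all names made up for illustration) an ADO.NET connection manager named "LogDb" using the SqlClient provider, a log table dbo.ExportRunLog that gets one row per run, and a Boolean package variable User::AddHeader with read/write access:
public void Main()
{
    // Count previous runs recorded in the log table; write the header only on the very first run
    var conn = (System.Data.SqlClient.SqlConnection)
        Dts.Connections["LogDb"].AcquireConnection(Dts.Transaction);
    try
    {
        using (var cmd = new System.Data.SqlClient.SqlCommand(
            "SELECT COUNT(*) FROM dbo.ExportRunLog", conn))
        {
            int previousRuns = (int)cmd.ExecuteScalar();
            Dts.Variables["User::AddHeader"].Value = (previousRuns == 0);
        }
    }
    finally
    {
        Dts.Connections["LogDb"].ReleaseConnection(conn);
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}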
The following solution worked for me; you can also try it.
Create three variables:
IsHeaderRequired
RowCount
TargetFilePath
Get the source row count using an Execute SQL Task and save it in the RowCount variable.
Add a Script Task. Add the read-only variables TargetFilePath and RowCount, and the read/write variable IsHeaderRequired.
Edit the script and add the following lines of code:
string targetFilePath = Dts.Variables["TargetFilePath"].Value.ToString();
int rowCount = (int)Dts.Variables["RowCount"].Value;
System.IO.FileInfo targetFileInfo = new System.IO.FileInfo(targetFilePath);

if (rowCount > 0)
{
    if (targetFileInfo.Length == 0)
    {
        Dts.Variables["IsHeaderRequired"].Value = true;
    }
    else
    {
        Dts.Variables["IsHeaderRequired"].Value = false;
    }
}
Dts.TaskResult = (int)ScriptResults.Success;
Connect your Script Task to your data flow task.
Click the flat file connection manager (i.e. your target file) and go to its properties. In the Expressions property, add the following:
Map ConnectionString to the variable "TargetFilePath".
Map ColumnNamesInFirstDataRow to "IsHeaderRequired".
[Screenshot: expression for the Flat File Connection Manager]
[Screenshot: final package]
Hope this helps
A solution:
First, add an SSIS integer variable in the scope of the Foreach Loop or higher - I'll call this RowCount - and make its default value negative (this is important!). Next, add a Row Count to your Data Flow, and assign the result to the RowCount SSIS variable we just made. Third, select your Connection Manager (don't double-click) and open the Properties window (F4). Find the Expressions property, select it, and hit the ellipsis (...) button. Select the ColumnNamesInFirstDataRow property, and use an expression like this:
@[User::RowCount] < 0
Now, when your package starts, RowCount has the static value of -1 or another negative number. When the data flow starts for the first time in your loop, the ColumnNamesInFirstDataRow property will have a value of TRUE. When the first data flow completes, the row count (even if it's zero) is written to the RowCount variable. On the second iteration of the loop, the Connection Manager is then reconfigured NOT to write column names.

SSIS transactional data (different record types, one file)

An interesting one: we're evaluating ETL tools for pre-processing statement data (e.g. utility bills, bank statements) for printing.
Some of the data comes through in a single flat file, with different record types.
e.g. a record type with "01" as the first field will be address data. This will have name and address fields. A record type with "02" will be summary data, with balances and totals. Record type "03" will be a line item on the statement.
Each statement will have one 01 record, one 02 record, and multiple 03 records. I could pre-parse the file and split it into 3 files for loading into a table, but this is less than ideal.
We take the file and do a few manipulations on it (e.g. add in a couple more fields to the address record, and maybe do some totalling / validation), and then send the file in pretty much the same format (But with the extra fields added) to our print composition program.
How would you do this in SSIS?
The big problem with variant records in SSIS is that you don't get any of the benefits of the connection manager helping with the layout, since the connection manager can only handle a single layout.
So typically, you end up with a CRLF terminated flat file with two columns: recordtype and recorddata. Then you put the conditional split in and parse each type of row on different paths. The parsing will have to split up the remaining record data and put it in columns and convert as normal, either with a derived column transform or a script transform and potentially conversion transforms.
If you had a lot of packages to do, I would seriously consider writing a custom component which produced 3 outputs already converted to your destination types.
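As a rough illustration of the split-then-parse idea, here is a sketch of a Script Component with three asynchronous outputs (SynchronousInputID set to None). The input columns (RecordType, RecordData), the output names (AddressOutput, SummaryOutput, DetailOutput), their columns, and the pipe delimiter are all assumptions for the example:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Each record type carries a different layout inside RecordData
    string[] f = Row.RecordData.Split('|');
    switch (Row.RecordType)
    {
        case "01": // address record
            AddressOutputBuffer.AddRow();
            AddressOutputBuffer.Name = f[0];
            AddressOutputBuffer.Street = f[1];
            break;
        case "02": // summary record
            SummaryOutputBuffer.AddRow();
            SummaryOutputBuffer.Balance = decimal.Parse(f[0]);
            break;
        case "03": // statement line item
            DetailOutputBuffer.AddRow();
            DetailOutputBuffer.Description = f[0];
            DetailOutputBuffer.Amount = decimal.Parse(f[1]);
            break;
    }
}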
I answered my own question - see the script below. AcctNum comes in from a derived column from the flat file source and will be correctly populated for 02 record types; save it in a local static variable and put it back on the row for the other record types that do not contain the account number.
/* Microsoft SQL Server Integration Services Script Component
 * Write scripts using Microsoft Visual C# 2008.
 * ScriptMain is the entry point class of the script. */
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    static String AccountNumber = null;

    public override void PreExecute()
    {
        base.PreExecute();
        // Add your code here for preprocessing or remove if not needed
    }

    public override void PostExecute()
    {
        base.PostExecute();
        // Add your code here for postprocessing or remove if not needed
        // You can set read/write variables here, for example:
        // Variables.MyIntVar = 100;
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (Row.RecordType == "02")
            AccountNumber = Row.AcctNum; // store the incoming account number in the local script variable
        else if (Row.RecordType == "06" || Row.RecordType == "07" || Row.RecordType == "08" ||
                 Row.RecordType == "09" || Row.RecordType == "10")
            Row.AcctNum = AccountNumber; // put the stored account number on this row
    }
}
This is possible, but you will have to write custom logic. I did this once with DTS.
If the file is delimited, SSIS will import the fields correctly. You can write a script that examines the record type field and then branches into different inserts depending on the record type. If the file has records that are not delimited, but each type has its own fixed widths, this becomes a lot more complicated, since you'd have to parse and split each imported line, with the record types and their widths hardcoded in the script.
There are a few ways to do it, but I think the easiest one to understand would be to add a conditional split after the source task, and then push it through a bunch of data conversion tasks to get the right format of data.
Make sure that your source is set up with the correct data types, so nothing falls through (e.g. all strings). Then just check the "Record Type" field in that conditional split to send it to the right branch.

SSIS - Is there a Data Flow Source component that will handle CSV files where the column order may change?

We have written a number of SSIS packages that import data from CSV files using the Flat File Source.
It now seems that after these packages are deployed into production, the providers of these files may deliver files where the column order of the files changes (Don't ask!). Currently if this happens, our packages will fail.
For example, an additional column is inserted at the beginning of each row. In this case, the flat file source continues to use the existing column order, which obviously has a detrimental effect on the transformation!
Eg. Using a trivial example, the original file has the following content :
OurReference,Client,Amount
235,MFI,20000.00
236,MS,30000.00
The output from the flat file source is :
OurReference Client Amount
235 ClientA 20000.00
236 ClientB 30000.00
Subsequently, the file delivered changes to :
OurReference,ClientReference,Client,Amount
235,A244,ClientA,20000.00
236,B222,ClientB,30000.00
When the existing unchanged package is run against this file, the output from the flat file source is :
OurReference Client Amount
235 A244 ClientA,20000.00
236 B222 ClientB,30000.00
Ideally, we would like to use a data source that will cope with this problem - ie which produces output based on the column names, instead of the column order.
Any suggestions would be welcomed!
Not that I know of.
A possibility to check for the problem in advance is to set up two different connection managers, one of which reads the whole row into a single column. That one can read the first row, tell whether it's OK or not, and abort if it isn't.
If you want to do the work, you can take it a step further and make that single-column connection manager the only one, and use a script component in your flow to parse the row and assign values to the columns you need later in the flow, as sketched below.
As far as I know, there is no way to dynamically add columns to the flow at runtime, so all the columns you need will have to be added to the script component's output. Whether they can be found and parsed from each line is up to you. Any "new" (i.e. unanticipated) columns cannot be used. For columns which are missing, you could supply a default or throw an exception.
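A minimal sketch of that script component, assuming the connection manager delivers each line in a single column named Line (with "column names in first data row" unchecked, so the header arrives as the first data row) and an asynchronous output holding the three columns from the example; all names here are illustrative:
using System.Collections.Generic;

public class ScriptMain : UserComponent
{
    // Column name -> position, learned from the header row at runtime
    private Dictionary<string, int> ordinals;

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        string[] fields = Row.Line.Split(',');
        if (ordinals == null)
        {
            // First row is the header: remember where each named column lives
            ordinals = new Dictionary<string, int>();
            for (int i = 0; i < fields.Length; i++)
                ordinals[fields[i].Trim()] = i;
            return; // don't emit the header row itself
        }
        // Assign the columns we care about by name, not by position,
        // so an inserted column can't shift the data
        Output0Buffer.AddRow();
        Output0Buffer.OurReference = fields[ordinals["OurReference"]];
        Output0Buffer.Client = fields[ordinals["Client"]];
        Output0Buffer.Amount = fields[ordinals["Amount"]];
    }
}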
A final possibility is to use the SSIS object model to modify the package before running to alter the connection manager - or even to write the entire package dynamically using the object model based on an inspection of the input file. I have done quite a bit of package generation in C# using templates and then adding information based on metadata I obtained from master files describing the mainframe files.
The best approach would be to run a check before the SSIS package imports the CSV data. This may have to be an external script/application, because I don't think you can manipulate the data in MS Business Intelligence Studio.
Here is a rough approach. I will write down the limitations at the end.
Create a flat file source. Put the entire row in one column.
Do not check Column names in first data row.
Create a Script Component
Code:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string sRow = Row.Column0;
    string sManipulated = string.Empty;
    string[] columns = sRow.Split(',');
    foreach (string column in columns)
    {
        sManipulated = string.Format("{0}{1}", sManipulated, column.PadRight(15, ' '));
    }
    /* Note: for the sake of demonstration I am padding to 15 chars. */
    Row.Column0 = sManipulated;
}
Create a flat file destination
Map Column0 to Column0
Limitation: I have arbitrarily padded each field to 15 characters. Points to consider:
1. Do we need each field to be the same size?
2. If yes, what is that size?
A generic way to handle that would be to create a table to store the file name, fields, and field sizes.
Use the file name to dynamically create the source and destination connection managers.
Use the field name and corresponding field size to decide the padding, as in the sketch below. Not sure if you need this much flexibility. If you have any questions, please respond.
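A sketch of the metadata-driven padding inside the Script Component; the widths are hard-coded here so the example is self-contained, but in practice they would be loaded in PreExecute from the metadata table described above:
private int[] widths;

public override void PreExecute()
{
    base.PreExecute();
    // Assumed widths for the example; in a real package, load these
    // from your metadata table, keyed by file name
    widths = new int[] { 12, 20, 15 };
}

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Pad each delimited field to its configured width
    // (assumes the row has no more fields than the widths table)
    string[] columns = Row.Column0.Split(',');
    var padded = new System.Text.StringBuilder();
    for (int i = 0; i < columns.Length; i++)
    {
        padded.Append(columns[i].PadRight(widths[i], ' '));
    }
    Row.Column0 = padded.ToString();
}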