SSIS flat file loading by checking conditions - ssis

I have 10 flat files (.dat format) in a folder that need to be loaded into a database every day at a scheduled time.
All file-related information is stored in a database: file name, file path, table name, column names and delimiter.
We need to check whether each file exists; if it does not, we need to log an entry, "File Not Found".
If the file exists, we need to check for a trailer record (the last record in the file, which looks like Count=00001000 and must hold the number of records in that file).
If the trailer record does not exist, we need to log "No trailer record found". If the trailer record says the count is zero, we need to log "Zero count", and if the trailer count does not match the number of records in the file, we need to log "Count mismatch".
If all the conditions are satisfied, the data needs to be loaded into the database for each file.
Please suggest ideas for implementing this scenario. Thanks!

The following solution may help you resolve the issue.
Use the Foreach Loop container with the "Item" enumerator. Since you have 10 known files and need to raise an error when one is missing, this is the enumerator to use. The File enumerator just iterates through the files that exist; it does not raise any error for a missing one.
The steps are as follows.
Create an SSIS package with the following variables:
FileFullPath
IsValidated
Configure the Foreach Loop enumerator as shown in the following screenshots.
Configuration in the Collection section:
Configuration in the Variable Mappings section:
Inside the container, add a Script Task. Set FileFullPath as a read-only variable and IsValidated as a read/write variable, as in the following screen.
Click Edit Script and insert the following code.
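// Requires: using System.IO; using System.Linq;
public void Main()
{
    // Assume the file is invalid until every check passes.
    Dts.Variables["IsValidated"].Value = false;
    string fileFullPath = Dts.Variables["FileFullPath"].Value.ToString();
    if (!File.Exists(fileFullPath))
    {
        var msg = String.Format("File is not available in location : {0}", fileFullPath);
        Dts.Events.FireError(0, "Dat file loading", msg, string.Empty, 0);
        Dts.TaskResult = (int)ScriptResults.Failure;
        return; // nothing more to check if the file is missing
    }
    // Read the last line (the trailer record); empty string if the file has no lines
    string lstLine = File.ReadLines(fileFullPath).LastOrDefault() ?? string.Empty;
    // Accept either a bare number or the "Count=00001000" form described in the question
    string countText = lstLine.StartsWith("Count=") ? lstLine.Substring(6) : lstLine;
    int totalCount = 0;
    bool trailerExists = int.TryParse(countText, out totalCount);
    if (!trailerExists)
    {
        var msg = String.Format("No trailer row found and last line is : {0}", lstLine);
        Dts.Events.FireError(0, "Dat file loading", msg, string.Empty, 0);
        Dts.TaskResult = (int)ScriptResults.Failure;
        return;
    }
    // Total number of lines in the file. This includes the trailer line itself;
    // subtract 1 if the trailer counts data rows only.
    int fullCount = File.ReadLines(fileFullPath).Count();
    if (fullCount != totalCount)
    {
        var msg = String.Format("Count is not matching, trailer count = {0} and full count = {1}", totalCount, fullCount);
        Dts.Events.FireError(0, "Dat file loading", msg, string.Empty, 0);
        Dts.TaskResult = (int)ScriptResults.Failure;
        return;
    }
    Dts.Variables["IsValidated"].Value = true;
    Dts.TaskResult = (int)ScriptResults.Success;
}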
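The question also asks for distinct log messages ("No trailer record found", "Zero count", "Count mismatch"). As a rough sketch only, the checks could be pulled into a small helper that can be tried outside SSIS. The class name TrailerCheck and the method Validate are hypothetical, and the sketch assumes the trailer looks like Count=00001000 and counts data rows only (excluding the trailer line itself):

```csharp
using System;
using System.Collections.Generic;

public static class TrailerCheck
{
    // Returns null when every check passes, otherwise the log message from the question.
    // Assumption: the trailer is the last line, has the form "Count=<digits>",
    // and its count excludes the trailer line itself.
    public static string Validate(IEnumerable<string> lines)
    {
        const string prefix = "Count=";
        int dataRows = 0;
        string last = null;
        foreach (string line in lines)
        {
            // Every line except the final one is treated as a data row.
            if (last != null) dataRows++;
            last = line;
        }

        int expected = 0;
        if (last == null
            || !last.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
            || !int.TryParse(last.Substring(prefix.Length), out expected))
        {
            return "No trailer record found";
        }
        if (expected == 0) return "Zero count";
        if (expected != dataRows) return "Count mismatch";
        return null;
    }
}
```

Inside the Script Task this could be called as TrailerCheck.Validate(File.ReadLines(fileFullPath)), logging the returned message when it is not null. The file is read in a single pass, so it is not enumerated twice.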
After that, add your Data Flow Task. Connect the Script Task to the Data Flow Task, right-click the connector, choose Edit, and configure it as follows.
Your SSIS package will look like the following.
Hope this helps!

Related

SSIS flat file with values containing text qualifier

I received a flat file that cannot be generated any other way. The delimiter is a comma and the text qualifier is a double quote. The problem is that sometimes I have a double quote inside a value. For example:
"0","12345", "Centre d"edu et de recherche", "B8E7"
Because of the double quote in the value, I received this error:
[Flat File Source [58]] Error: The column delimiter for column "XYZ" was not found.
[Flat File Source [58]] Error: An error occurred while processing file "C:\somefile.csv" on data row 296.
What can I do to process this file?
I use SSIS 2016 with Visual Studio 2015
You can use the Flat File Source error output to redirect bad rows to another flat file and correct values manually while all valid rows will be processed.
There are many links online to learn more about Flat File Source Error Output:
Flat File source Error Output connection in SSIS
How to Avoid Package Design Flaws When Sourcing Data From Flat Files
Flat File Source Editor (Error Output Page)
Update 1 - Workaround using a Script Component and a Conditional Split
Since the Flat File error output is not working, you can use a Script Component with a Conditional Split to filter bad rows. The following is a step-by-step guide to implementing that:
Add a Flat File connection manager, go to the Advanced tab, delete all columns except one, and change its length to 4000.
Add a Script Component, go to the Inputs and Outputs tab, add the desired output columns (4 columns in this example) and add a Flag column of type DT_BOOL.
Inside the Script Component, write the following script. It checks whether the number of columns is 4: if so, it sets Flag = True, meaning the row is valid; otherwise it sets Flag = False, meaning the row is bad:
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (!Row.Column0_IsNull && !String.IsNullOrWhiteSpace(Row.Column0))
        {
            // Split on the exact sequence quote-comma-quote
            string[] cells = Row.Column0.Split(new string[] { "\",\"" }, StringSplitOptions.None);
            if (cells.Length == 4)
            {
                // Strip the opening quote of the first cell and the closing quote of the last
                Row.Col1 = cells[0].TrimStart('\"');
                Row.Col2 = cells[1];
                Row.Col3 = cells[2];
                Row.Col4 = cells[3].TrimEnd('\"');
                Row.Flag = true;
            }
            else
            {
                // Wrong number of columns: mark the row as bad
                Row.Flag = false;
            }
        }
        else
        {
            // Empty input line: emit NULLs and treat the row as valid
            Row.Col1_IsNull = true;
            Row.Col2_IsNull = true;
            Row.Col3_IsNull = true;
            Row.Col4_IsNull = true;
            Row.Flag = true;
        }
    }
}
Add a Conditional Split to split rows based on the Flag column.
Map the valid rows output to the OLE DB Destination, and the bad rows output to another flat file where you map only Column0.
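To try the splitting rule outside the data flow, the same idea can be written as a standalone helper. This is only a sketch: the names QuotedRow and TrySplit are hypothetical, and it assumes every value is wrapped in double quotes with no spaces around the separating commas.

```csharp
using System;

public static class QuotedRow
{
    // Splits a raw line on the exact separator "," (quote, comma, quote).
    // Returns false when the line is empty, is not wrapped in quotes,
    // or does not yield the expected number of columns.
    public static bool TrySplit(string line, int expectedColumns, out string[] cells)
    {
        cells = new string[0];
        if (string.IsNullOrWhiteSpace(line)) return false;

        string trimmed = line.Trim();
        if (trimmed.Length < 2 || trimmed[0] != '"' || trimmed[trimmed.Length - 1] != '"')
        {
            return false;
        }

        // Remove the outer quotes, then split on quote-comma-quote.
        string inner = trimmed.Substring(1, trimmed.Length - 2);
        string[] parts = inner.Split(new[] { "\",\"" }, StringSplitOptions.None);
        if (parts.Length != expectedColumns) return false;

        cells = parts;
        return true;
    }
}
```

A stray double quote inside a value (as in Centre d"edu et de recherche) survives, because only the full quote-comma-quote sequence is treated as a separator. A value that itself contains that sequence would still produce too many cells and be routed to the bad rows output.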

SSIS 2016 - ErrorColumn is 0 (zero)

I have a package with a bunch of OLE DB Destinations, using SSIS 2016, which is supposed to show the exact column that generated the error. The ErrorColumn shows 0 (zero), so I am unable to trap the column that generated the error.
I am using the script below (it contains code that assigns "Unknown column", but that does not help; it only keeps the script from failing):
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
//IDTSComponentMetaData130 componentMetaData = ComponentMetaData as IDTSComponentMetaData130;
//var component130 = this.ComponentMetaData as IDTSComponentMetaData130;
//if (component130 != null)
//{
// System.Windows.Forms.MessageBox.Show(component130.GetIdentificationStringByID(Row.ErrorColumn));
// Row.ErrorColumnName = component130.GetIdentificationStringByID(Row.ErrorColumn);
//}
IDTSComponentMetaData130 componentMetaData = ComponentMetaData as IDTSComponentMetaData130;
if (componentMetaData != null)
{
//
if (Row.wkpErrorColumn != 0)
Row.wkpErrorColumnName = componentMetaData.GetIdentificationStringByID(Row.wkpErrorColumn);
else
Row.wkpErrorColumnName = "Unknown column";
}
else
{
Row.wkpErrorColumnName = "Cannot determine";
}
Row.wkpErrorDescription = ComponentMetaData.GetErrorDescription(Row.wkpErrorCode);
}
In SSIS 2008 (and I believe also SSIS 2016), a zero error column identifies an error that affects the entire row. In the example below, I created a package that contains only one Data Flow Task in the control flow and redirects all errors and truncations to the error output. I placed a Row Count component just to have somewhere to send them.
I also placed data viewers on both error flows to see the data coming out.
In a package consuming data from a flat file into an OLE DB Destination:
Using these values as input data
And having name as the PK of the table
We get a PK violation, check the error description and the error column
So when the error affects the entire record, it gets an error column value of 0.
Hope this helps.
I found that:
When the error is generated by the PK, the error affects the entire row, and the error column is 0.
When the error is generated by an FK, the error affects only the column, and you get an error column value other than 0.

How to continue only if records present in SSIS

For this diagram:
The "Get Score Files" script obtains a list of files and puts them into a user variable filelist (data type Object). That list is passed to the "Find Score Files" loop, which processes each item on the list.
I need it to run ONLY if there are files to be had. If the "Get Score Files" script returns NO objects, I want the package to end successfully. How do I tell it to do that?
Thanks
In the "Get Score Files" script, try this code:
if (files.Count == 0)
{
Dts.Variables["files_present"].Value = false;
}
else
{
Dts.Variables["file_list"].Value = files;
Dts.Variables["files_present"].Value = true;
}
In SSIS you should create one more variable (files_present) of type Boolean.
Now, in the precedence constraint expression before the Foreach Loop, use the files_present variable to check whether any files are present (true means files are present, false means no files).

dynamic SQL execution and saving the result in flat file in SSIS

I want to create an SSIS package that writes a file with data generated by executing a SQL statement. This generic package will be invoked by other packages, which pass in the correct SQL as a variable.
Thus, in the generic package:
I want to execute a dynamic SELECT query and fetch a dynamic number of columns from a single database instance (the connection string does not change per call) and store the result in a flat file.
What would be an ideal way to accomplish this in SSIS?
What I tried:
The simplest solution I could find was writing a Script Task that opens a SQL connection, executes the SQL using SqlCommand, populates a DataTable with the fetched data, writes the contents directly to the file system using System.IO.File, and releases the connection.
I tried using an OLE DB Source with the SQL supplied by a variable (with validation set to false) and directing the rows into a Flat File connection. However, due to the dynamic number and names of the columns, I ran into errors.
Is there a more standard way of achieving this without using a script task?
How about this: concatenate all field values into one field, and map AllFields to a field in a text file destination.
SELECT [f1]+',' + [f2] AS AllFields FROM [dbo].[A]
All of the "other" packages will know how to create the correct SQL. Their only contract with the "generic" package would be to eventually have only one field named "AllFields".
To answer your question directly, I do not think there is a "standard" way to do this. I believe the solution from Anoop would work well, and while I have not tested the idea, I wish I had investigated it before writing my own solution. You should not need a Script Task in that solution...
In any case, I did write my own way to generate CSV files from SQL tables. It may run up against edge cases and need polishing, but it works rather well right now. I am looping through multiple tables before this task, so the CurrentTable variable can be replaced with any variable you want.
Here is my code:
public void Main()
{
string datetime = DateTime.Now.ToString("yyyyMMddHHmmss");
try
{
string TableName = Dts.Variables["User::CurrentTable"].Value.ToString();
string FileDelimiter = ",";
string TextQualifier = "\"";
string FileExtension = ".csv";
//Use the ADO.NET connection from the SSIS package to get data from the table
SqlConnection myADONETConnection = (SqlConnection)Dts.Connections["connection manager name"].AcquireConnection(Dts.Transaction);
//Read data from table or view to data table
string query = "Select * From [" + TableName + "]";
SqlCommand cmd = new SqlCommand(query, myADONETConnection);
//myADONETConnection.Open();
DataTable d_table = new DataTable();
d_table.Load(cmd.ExecuteReader());
//myADONETConnection.Close();
string FileFullPath = Dts.Variables["$Project::ExcelToCsvFolder"].Value.ToString() + "\\Output\\" + TableName + FileExtension;
StreamWriter sw = null;
sw = new StreamWriter(FileFullPath, false);
// Write the Header Row to File
int ColumnCount = d_table.Columns.Count;
for (int ic = 0; ic < ColumnCount; ic++)
{
sw.Write(TextQualifier + d_table.Columns[ic] + TextQualifier);
if (ic < ColumnCount - 1)
{
sw.Write(FileDelimiter);
}
}
sw.Write(sw.NewLine);
// Write All Rows to the File
foreach (DataRow dr in d_table.Rows)
{
for (int ir = 0; ir < ColumnCount; ir++)
{
if (!Convert.IsDBNull(dr[ir]))
{
sw.Write(TextQualifier + dr[ir].ToString() + TextQualifier);
}
if (ir < ColumnCount - 1)
{
sw.Write(FileDelimiter);
}
}
sw.Write(sw.NewLine);
}
sw.Close();
Dts.TaskResult = (int)ScriptResults.Success;
}
catch (Exception exception)
{
// Create Log File for Errors
//using (StreamWriter sw = File.CreateText(Dts.Variables["User::LogFolder"].Value.ToString() + "\\" +
// "ErrorLog_" + datetime + ".log"))
//{
// sw.WriteLine(exception.ToString());
//}
Dts.TaskResult = (int)ScriptResults.Failure;
throw;
}
}
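One of the edge cases mentioned is a value that itself contains the text qualifier: the code above wraps each value in quotes without escaping it. A common CSV convention is to double any embedded qualifier. The helper below is a sketch of that convention, not part of the original answer; the names CsvLine, Quote and Join are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvLine
{
    // Wraps a value in the text qualifier and doubles any qualifier found inside it.
    // NULL values (null or DBNull) become an empty, unquoted field, as in the code above.
    public static string Quote(object value, string qualifier = "\"")
    {
        if (value == null || value is DBNull) return string.Empty;
        string text = value.ToString();
        return qualifier + text.Replace(qualifier, qualifier + qualifier) + qualifier;
    }

    // Builds one delimited line from a sequence of values.
    public static string Join(IEnumerable<object> values, string delimiter = ",")
    {
        return string.Join(delimiter, values.Select(v => Quote(v)));
    }
}
```

In the loop over d_table.Rows, a call such as sw.WriteLine(CsvLine.Join(dr.ItemArray)) could replace the inner column loop, assuming the consumer of the file expects doubled qualifiers.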

how to check end of file in a csv file before processing it in ssis

I have created an SSIS package which processes .CSV files using a Foreach Loop container.
All the CSV files contain "END OF FILE" in the last row.
A CSV file should be processed only if it contains "END OF FILE" in the last row.
How can this be done? Please help.
Thanks in advance.
Create a variable named check:
Name DataType Value
check int 0
Let's say you have a package design like the one below
The Script Task checks whether the file has "End of File" in its last row.
In the Script Task, add the variable check to the ReadWriteVariables section and the output variable from the Foreach container (suppose the variable name is LoopFiles) to ReadOnlyVariables.
In the Script Task, add the following code to read the file. There are several ways to read a file.
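// Requires: using System.IO;
public void Main()
{
    // Reset the flag so a result from a previous loop iteration does not carry over
    Dts.Variables["User::check"].Value = 0;
    string file = Dts.Variables["User::LoopFiles"].Value.ToString();
    string line;
    string lastLine = null;
    using (StreamReader files = new StreamReader(file))
    {
        // Remember the most recent line so that only the last row is tested
        while ((line = files.ReadLine()) != null)
        {
            lastLine = line;
        }
    }
    if (lastLine != null && lastLine.ToLower() == "End Of File".ToLower())
    {
        Dts.Variables["User::check"].Value = 1;
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}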
Double-click the green arrow connecting the Script Task and the Data Flow Task. The Precedence Constraint Editor will open; enter an expression that tests the check variable, such as @[User::check] == 1.
There are a number of ways that this could be done. One way would be:
Create the following variables:
EOF_Found Boolean
Row_Count Integer
Bring the data into a dataflow using the Flat File Source
Use a row count component to add the number of rows to Row_Count, to identify the last row later
Use a script component to loop through the rows, adding 1 to a counter for each row
When your counter equals the value in Row_Count (i.e. you are looking at the last row), check the value in the column where you expect "END OF FILE" to appear (this depends on how you set up the Flat File connection manager). If it equals "END OF FILE", change the value of EOF_Found to True
After the script component, add a derived column referencing the value in EOF_Found
Use a conditional split, checking the value of the derived column and only process if True
This solution avoids reading the entire file line by line. I have merged Praveen's code here for sake of completeness.
public void Main()
{
    string line = ReadLastLine(@"c:\temp\EOF.cs");
    if (line.ToUpper() == "END OF FILE")
    {
        Dts.Variables["User::check"].Value = 1;
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}
public static string ReadLastLine(string path)
{
    // Dispose the reader when done so the file handle is released
    using (StreamReader stream = new StreamReader(path))
    {
        // Drop trailing line breaks so a final newline does not yield an empty last line
        string str = stream.ReadToEnd().TrimEnd('\r', '\n');
        int i = str.LastIndexOf('\n');
        return str.Substring(i + 1);
    }
}
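ReadToEnd still loads the whole file into memory, which may matter for large files. As an alternative sketch (not part of the original answer), the last line can be found by scanning backwards from the end of the file. The class name TailReader is hypothetical, and the code assumes an ASCII or UTF-8 encoded file, where the byte 0x0A never occurs inside a multi-byte character.

```csharp
using System;
using System.IO;
using System.Text;

public static class TailReader
{
    // Returns the last non-empty line by scanning backwards from the end of the file.
    // Assumption: the file is ASCII or UTF-8 encoded.
    public static string ReadLastLine(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // Skip any trailing CR/LF bytes.
            long end = fs.Length;
            while (end > 0)
            {
                fs.Seek(end - 1, SeekOrigin.Begin);
                int b = fs.ReadByte();
                if (b != '\n' && b != '\r') break;
                end--;
            }

            // Walk back to the line feed that precedes the last line (or to the file start).
            long start = end;
            while (start > 0)
            {
                fs.Seek(start - 1, SeekOrigin.Begin);
                if (fs.ReadByte() == '\n') break;
                start--;
            }

            // Read only the bytes of the last line.
            var buffer = new byte[end - start];
            fs.Seek(start, SeekOrigin.Begin);
            int read = 0;
            while (read < buffer.Length)
            {
                int n = fs.Read(buffer, read, buffer.Length - read);
                if (n == 0) break;
                read += n;
            }
            return Encoding.UTF8.GetString(buffer, 0, read);
        }
    }
}
```

Only the tail of the file is touched, so the cost depends on the length of the last line rather than on the size of the file. An empty file yields an empty string.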