Unmatched Files Processing through ForEach Loop Container - ssis

I have some processed and unprocessed files in my source folder, and the file names of all the processed files are stored in a table. How can I match the file names in the source folder against the table before the ForEach Loop Container and process only the unmatched files?

The solution below is a bit elaborate, but it's the best I could think of.
STEP 1: Create 2 variables, both strings.
a) CurrentFile: This will be used as the collection value of your Foreach Loop Container.
b) ToProcess: This will be used to map the result set of the Execute SQL Task explained below.
STEP 2: Add an Execute SQL Task into your Foreach Loop Container.
Configure Parameter Mapping as shown below:
Use the script below as your SQL statement:
DECLARE @ToProcess VARCHAR(1)
IF NOT EXISTS (SELECT [FileNames] FROM [YourFilesTable] WHERE FileNames = ?)
    SET @ToProcess = 'Y'
SELECT @ToProcess AS ToProcess
Set ResultSet to Single Row as shown below:
Configure Result Set as shown below:
On the Execute SQL Task, configure the Precedence Constraint as shown below:
Your Foreach Loop Container should look like below:
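The screenshots from the original answer are not reproduced here, so the following configuration is an assumption based on the variables and SQL above: in Parameter Mapping, map User::CurrentFile to parameter name 0 (this feeds the ? in the query); in Result Set, map the result name ToProcess to User::ToProcess; and on the precedence constraint leaving the Execute SQL Task, set the evaluation operation to Expression with something like
@[User::ToProcess] == "Y"
so the downstream tasks in the container only run for files that are not yet in the table.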

Before the Foreach Loop, use a Script Task to store the names of the unprocessed files in an SSIS object variable, then iterate through this variable to load the new files as you already are.
Create an object variable and add it to the ReadWriteVariables field of the Script Task. If you're using an SSIS variable to hold the folder path of the source files, as done below, add that variable to the ReadOnlyVariables field.
The Foreach Loop will need to use the Foreach From Variable Enumerator enumerator type. In the Variable field on the Collection page, add the object variable that is populated in the Script Task. As you're probably already doing, add a string variable at Index 0 of the Variable Mappings pane and set this variable as the expression of the ConnectionString property on the connection manager, assuming this is a flat file connection. If this is Excel, set the ExcelFilePath property to use this variable as the expression instead.
The example code and referenced namespaces for the Script Task are below and use C#.
using System.Linq;
using System.Data.SqlClient;
using System.IO;
using System.Collections.Generic;
using System.Data;

string connString = @"Data Source=YourSQLServer;Initial Catalog=YourDatabase;Integrated Security=SSPI;";
string cmdText = @"SELECT DISTINCT ColumnWithFileNames FROM YourDatabase.YourSchema.YourTable";
string sourceFolder = Dts.Variables["User::SourceFilePath"].Value.ToString();

//create DirectoryInfo object from source folder
DirectoryInfo di = new DirectoryInfo(sourceFolder);
List<string> processedFiles = new List<string>();
List<string> newFiles = new List<string>();

//get names of already processed files stored in table
using (SqlConnection conn = new SqlConnection(connString))
{
    conn.Open();
    //data set name does not need to relate to name of table storing processed files
    DataSet ds = new DataSet("ProcessedFiles");
    SqlDataAdapter da = new SqlDataAdapter(cmdText, conn);
    da.Fill(ds, "ProcessedFiles");
    foreach (DataRow dr in ds.Tables["ProcessedFiles"].Rows)
    {
        processedFiles.Add(dr[0].ToString());
    }
}

//only add files not already processed; this compares full paths, so the table column
//must store full paths as well (use fi.Name instead if it stores file names only)
foreach (FileInfo fi in di.EnumerateFiles())
{
    if (!processedFiles.Contains(fi.FullName))
    {
        newFiles.Add(fi.FullName);
    }
}

//populate SSIS object variable with unprocessed files
Dts.Variables["User::ObjVar"].Value = newFiles.ToList();


How do I check whether all the rows for a column are NULL in SSIS?

I'm working on an SSIS package where I have a text file with 5 columns. I need to check whether all the rows in the 5th column are NULL.
If all the rows in the 5th column are NULL, then all the data should go to the invalid file table.
If any row in the 5th column has a non-NULL value, then all the data should go to the valid table.
You will need to read the entire file before you can decide where to write it, so introduce a third table where you can stage the data first.
The next part is to build the logic that checks the staging table for all NULLs. The query below returns 0 if every row is NULL and more than 0 if any record has a value:
SELECT COUNT(*) FROM dbo.StagingTable ST WHERE ST.Column5 IS NOT NULL
Once you feed the result into a variable, you can use precedence constraints to fire the data flow that copies [staging to active] if the result is more than 0, or [staging to faulty] if the result is 0.
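As a sketch of that last step (the variable name here is an assumption, not from the original answer): map the COUNT(*) result to an Int32 variable such as User::NotNullCount using the Execute SQL Task's single-row result set, then put an expression on each precedence constraint leaving that task:
@[User::NotNullCount] > 0 for the [staging to active] data flow
@[User::NotNullCount] == 0 for the [staging to faulty] data flow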
Personally, if I had to perform this task I would use a Script Task to do it all (a rough sketch follows this list):
Load the file into a data table
Use LINQ to check the column and determine the destination, e.g. .Where(x => x[4] != DBNull.Value).Count()
Load to the destination via SqlBulkCopy
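A rough, untested sketch of that Script Task approach, assuming a simple comma-delimited file with no header row and no embedded commas; the file path, connection string, and table names (ValidTable, InvalidFileTable) are placeholders, not names from the question:
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Linq;

//placeholders - swap in your own path and connection string
string filePath = @"C:\Source\MyFile.txt";
string connString = @"Data Source=YourSQLServer;Initial Catalog=YourDatabase;Integrated Security=SSPI;";

//load the delimited file into a DataTable (assumes every row has the same number of fields)
DataTable dt = new DataTable();
string[] lines = File.ReadAllLines(filePath);
for (int c = 0; c < lines[0].Split(',').Length; c++)
{
    dt.Columns.Add("Column" + c, typeof(string));
}
foreach (string line in lines)
{
    dt.Rows.Add(line.Split(','));
}

//use LINQ to count non-empty values in the 5th column (index 4);
//empty fields stand in for NULL because the source is a text file
int nonNullCount = dt.Rows.Cast<DataRow>()
    .Count(r => !string.IsNullOrWhiteSpace(r[4].ToString()));

//pick the destination table based on the check, then bulk copy the whole file there
string destination = nonNullCount > 0 ? "dbo.ValidTable" : "dbo.InvalidFileTable";
using (SqlBulkCopy bulk = new SqlBulkCopy(connString))
{
    bulk.DestinationTableName = destination;
    bulk.WriteToServer(dt);
}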
You can check for this with C# by using an OleDbDataAdapter to query the file, then determine where to load the file using SSIS Precedence Constraints. This example uses a CSV file without column names. If the columns do have names, add the replacement code noted in the comments below. You will also need the using statements listed.
Add an SSIS Boolean variable; this is IsColumnNull in the following example. Next add a C# Script Task with the IsColumnNull variable in the ReadWriteVariables field and (optionally) a variable holding the file path in the ReadOnlyVariables field.
Next set Precedence Constraints to check for either a true condition (the column is all NULL) or a false condition (the column has non-NULL values). Since the IsColumnNull variable is a Boolean, use just the variable itself as the expression to check for all null rows, but add ! for non-nulls, i.e. !@[User::IsColumnNull].
Connect the Data Flow Task for each destination table to the corresponding Precedence Constraint. For example, add the Data Flow Task with the "invalid file table" as the destination after the Precedence Constraint that checks for a true value in the IsColumnNull variable.
Precedence Constraint For Rows with Nulls:
Precedence Constraint for Rows without Nulls:
Script Task Example:
using System;
using System.Data;
using System.IO;
using System.Data.OleDb;
using System.Linq;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

string fullFilePath = Dts.Variables["User::FilePath"].Value.ToString();
string fileName = Path.GetFileName(fullFilePath);
string filePath = Path.GetDirectoryName(fullFilePath);
string connStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filePath
    + ";Extended Properties=\"text;HDR=No;FMT=Delimited\";";
//add filter for NOT NULL on the given column to only return non-nulls
string sql = "SELECT F2 FROM " + fileName + " WHERE F2 IS NOT NULL";
//if the file has column names, replace "connStr" and "sql" as shown below
/*
string connStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filePath
    + ";Extended Properties=\"text;HDR=Yes;FMT=Delimited\";";
string sql = "SELECT ID FROM " + fileName + " WHERE ID IS NOT NULL";
*/
using (OleDbDataAdapter oleAdpt = new OleDbDataAdapter(sql, connStr))
{
    DataTable dt = new DataTable();
    oleAdpt.Fill(dt);
    //if no non-null rows came back, set the IsColumnNull SSIS variable to true
    if (dt.Select().Count() < 1)
    {
        Dts.Variables["User::IsColumnNull"].Value = true;
    }
    else
    {
        Dts.Variables["User::IsColumnNull"].Value = false;
    }
}

SSIS- retaining null values as null in the csv output file using script task

I am using an Execute SQL Task to run SQL queries and then a Script Task to write the query results into CSV files, using the method mentioned in the following post on this site:
SSIS: Script task to write recordset to file
In addition to this, I need 'null' written to the CSV files wherever the query result is NULL. How can I achieve this using the Script Task? What additional code do I have to write?
There are a few ways to do this, but the answer is in Persist(), within the foreach data row loop:
foreach (System.Data.DataRow row in table.Rows)
{
    // TODO: For string based fields, capture the max length
    IEnumerable<string> fields = (row.ItemArray).Select(field => field.ToString());
    file.WriteLine(string.Join(delimiter, fields));
}
You have to test for DB null in your lambda expression. This should work, though I have not tested it:
IEnumerable<string> fields = (row.ItemArray).Select(field => (field == DBNull.Value) ? "NULL" : field.ToString());
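Putting that back into the loop from the linked answer (just the changed line in context; table, file, and delimiter come from the original code):
foreach (System.Data.DataRow row in table.Rows)
{
    //write the literal NULL for database nulls instead of an empty string
    IEnumerable<string> fields = row.ItemArray.Select(field => field == DBNull.Value ? "NULL" : field.ToString());
    file.WriteLine(string.Join(delimiter, fields));
}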

dynamic SQL execution and saving the result in flat file in SSIS

I want to create an SSIS package which writes a file with data generated by executing a SQL statement. This generic package will be invoked by other packages, passing in the correct SQL as a variable.
Thus, in the generic package:
I want to execute a dynamic SELECT query and fetch a dynamic number of columns from a single database instance (the connection string does not change per call) and store the result in a flat file.
What would be an ideal way to accomplish this in SSIS.
What I tried:
The simplest solution that I could find was writing a Script Task which would open a SQL connection, execute the SQL using SqlCommand, populate a DataTable with the data fetched, write the contents directly to the file system using System.IO.File, and release the connection.
I also tried using an OLE DB Source with the SQL supplied by a variable (with validation set to false) and directing the rows into a flat file destination. However, due to the dynamic number and names of the columns, I ran into errors.
Is there a more standard way of achieving this without using a script task?
How about this ... concatenate all field values into one field, and map AllFields to a field in a text file destination.
SELECT [f1]+',' + [f2] AS AllFields FROM [dbo].[A]
All of the "other" packages will know how to create the correct SQL. Their only contract with the "generic" package would be to eventually have only one field named "AllFields".
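One caveat worth adding (my note, not part of the original answer): if any of the concatenated columns are not character types, the calling package's SQL would need to cast them first, along these lines:
SELECT CAST([f1] AS VARCHAR(50)) + ',' + CAST([f2] AS VARCHAR(50)) AS AllFields FROM [dbo].[A]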
To answer your question directly, I do not think there is a "standard" way to do this. I believe the solution from Anoop would work well, and while I have not tested the idea, I wish I had investigated it before writing my own solution. You should not need a Script Task in that solution.
In any case, I did write my own way to generate CSV files from SQL tables. It may run up against edge cases and need polishing, but it works rather well right now. I am looping through multiple tables before this task, so the CurrentTable variable can be replaced with any variable you want.
Here is my code:
public void Main()
{
    //requires the System.Data, System.Data.SqlClient and System.IO namespaces
    string datetime = DateTime.Now.ToString("yyyyMMddHHmmss");
    try
    {
        string TableName = Dts.Variables["User::CurrentTable"].Value.ToString();
        string FileDelimiter = ",";
        string TextQualifier = "\"";
        string FileExtension = ".csv";

        //use the ADO.NET connection from the SSIS package to get data from the table
        SqlConnection myADONETConnection = new SqlConnection();
        myADONETConnection = (SqlConnection)(Dts.Connections["connection manager name"].AcquireConnection(Dts.Transaction) as SqlConnection);

        //read data from the table or view into a data table
        string query = "Select * From [" + TableName + "]";
        SqlCommand cmd = new SqlCommand(query, myADONETConnection);
        //myADONETConnection.Open();
        DataTable d_table = new DataTable();
        d_table.Load(cmd.ExecuteReader());
        //myADONETConnection.Close();

        string FileFullPath = Dts.Variables["$Project::ExcelToCsvFolder"].Value.ToString() + "\\Output\\" + TableName + FileExtension;
        StreamWriter sw = new StreamWriter(FileFullPath, false);

        //write the header row to the file
        int ColumnCount = d_table.Columns.Count;
        for (int ic = 0; ic < ColumnCount; ic++)
        {
            sw.Write(TextQualifier + d_table.Columns[ic] + TextQualifier);
            if (ic < ColumnCount - 1)
            {
                sw.Write(FileDelimiter);
            }
        }
        sw.Write(sw.NewLine);

        //write all rows to the file
        foreach (DataRow dr in d_table.Rows)
        {
            for (int ir = 0; ir < ColumnCount; ir++)
            {
                if (!Convert.IsDBNull(dr[ir]))
                {
                    sw.Write(TextQualifier + dr[ir].ToString() + TextQualifier);
                }
                if (ir < ColumnCount - 1)
                {
                    sw.Write(FileDelimiter);
                }
            }
            sw.Write(sw.NewLine);
        }
        sw.Close();
        Dts.TaskResult = (int)ScriptResults.Success;
    }
    catch (Exception exception)
    {
        //create a log file for errors
        //using (StreamWriter sw = File.CreateText(Dts.Variables["User::LogFolder"].Value.ToString() + "\\" +
        //    "ErrorLog_" + datetime + ".log"))
        //{
        //    sw.WriteLine(exception.ToString());
        //}
        Dts.TaskResult = (int)ScriptResults.Failure;
        throw;
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}

Pass a variable using Script task in SSIS

I have a C# script in the SSIS package, as mentioned below:
SqlConnection importTab = new SqlConnection(@"Server=ServerName;
    Integrated Security=true;user=;pwd=;database=DBname");
I need to pass the database name (DBName) inside a variable...
Maybe like this:
SqlConnection importTab = new SqlConnection(@"Server=ServerName;
    Integrated Security=true;user=;pwd=;database=" + "User::Variable" + ");"
But I know I am wrong...
To use a variable in a script, first ensure that the variable has been added to
either the list contained in the ReadOnlyVariables property or the list contained in
the ReadWriteVariables property of this script task, according to whether or not your
code needs to write to the variable.
//Example of reading from a variable:
DateTime startTime = (DateTime) Dts.Variables["System::StartTime"].Value;
//Example of writing to a variable:
Dts.Variables["User::myStringVariable"].Value = "new value";
//Example of reading from a package parameter:
int batchId = (int) Dts.Variables["$Package::batchId"].Value;
//Example of reading from a project parameter:
int batchId = (int) Dts.Variables["$Project::batchId"].Value;
//Example of reading from a sensitive project parameter:
int batchId = (int) Dts.Variables["$Project::batchId"].GetSensitiveValue();
I do it like this:
When opening the Script Task properties you have two fields, ReadOnlyVariables and ReadWriteVariables. Write your variable name into the appropriate field based on your needs, in your case User::Variable.
In the code you can use it like this
Dts.Variables["User::Variable"].Value.ToString()
The following code in the Script Task may help you:
var dbServerName = Dts.Variables["yourVariableName"].Value.ToString();
var sqlConnString = string.Format("Server=ServerName;Integrated Security=true;user=;pwd=;database={0}", dbServerName);
SqlConnection sqlConn = new SqlConnection(sqlConnString);

how to check end of file in a csv file before processing it in ssis

I have created an SSIS package which processes .CSV files using a Foreach Loop container.
All the CSV files contain "END OF FILE" in the last row.
Only those CSV files that contain "END OF FILE" in the last row should be processed.
How can this be done? Please help.
Thanks in advance.
Create a variable, check:
Name   DataType   Value
check  int        0
Let's say you have a package design like the one below
The Script Task is there to check whether the file has "END OF FILE" in the last row.
In the Script Task, add the variable check to the ReadWriteVariables section and the output variable from the Foreach container (suppose the variable name is LoopFiles) to ReadOnlyVariables.
In the Script Task, add the following code to read the file. There are several ways you can read the file; see here and here.
public void Main()
{
    //full path of the current file from the Foreach Loop variable
    string file = Dts.Variables["User::LoopFiles"].Value.ToString();
    string line;
    using (StreamReader files = new StreamReader(file))
    {
        while ((line = files.ReadLine()) != null)
        {
            if (line.ToLower() == "End Of File".ToLower())
            {
                Dts.Variables["User::check"].Value = 1;
            }
        }
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}
Double-click the green arrow connecting the Script Task and the Data Flow Task. A precedence constraint dialog box will open; enter the expression as below.
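The screenshot of that dialog isn't included here, but given the check variable set in the script above, the expression is presumably along these lines (with the evaluation operation set to Expression):
@[User::check] == 1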
There are a number of ways that this could be done. One way would be:
Create the following variables:
EOF_Found Boolean
Row_Count Integer
Bring the data into a dataflow using the Flat File Source
Use a row count component to add the number of rows to Row_Count, to identify the last row later
Use a script component to loop through the rows, adding 1 to a counter for each row
When your counter equals the value in Row_Count (i.e. you are looking at the last row), check the value in the column where you expect "END OF FILE" to appear (this depends on how you set up the flat file connection manager). If it equals "END OF FILE", change the value of EOF_Found to True
After the script component, add a derived column referencing the value in EOF_Found
Use a conditional split, checking the value of the derived column and only process if True
This solution avoids reading the entire file line by line. I have merged Praveen's code here for the sake of completeness.
public void Main()
{
    string line = ReadLastLine(@"c:\temp\EOF.cs");
    if (line.ToUpper() == "END OF FILE")
    {
        Dts.Variables["User::check"].Value = 1;
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}

public static string ReadLastLine(string path)
{
    //read the whole file once and take everything after the last line break
    using (StreamReader stream = new StreamReader(path))
    {
        string str = stream.ReadToEnd();
        int i = str.LastIndexOf('\n');
        string lastLine = str.Substring(i + 1);
        return lastLine;
    }
}
}