SSIS Row Count For All Files Within a Folder - ssis

I have a folder which contains multiple Excel files that are replaced daily. I need a total row count that gives me the sum of row counts from each individual Excel file within the folder (i.e. if there are 3 files with 10 records each, I need a result count of 30). I then need to run this package daily to add an individual record to a log table that will provide me with the daily count of records in the folder. I've been trying Foreach Loop Containers and ADO Enumerators but cannot seem to achieve a solution.

There is a good solution that you can apply without any Script Task. All you need is a "Foreach Loop Container", a "Data Flow Task" and an "Execute SQL Task".
Define Variables:
V_FilesPath-> (String) Will hold the path where your files are located
V_FileName-> (String) will be populated with the files name in the Foreach Container
V_RowCount (Int)
V_FileRowCount (int)
V_TotalRecords (int)
Define the Foreach:
In the Data Flow Task, map your source file to a Row Count component and select the variable V_FileRowCount
In the "Execute SQL task" change the result set to "Single Row"
Map the ResultSet to the following variables (the query below returns two columns): result 0 to V_TotalRecords and result 1 to V_RowCount
In the Expressions section of the "Execute SQL Task", choose the property that holds the SQL statement (SqlStatementSource) and enter the following expression:
" SELECT " + (DT_STR,10,1252) @[User::V_TotalRecords] + " + " + (DT_STR,10,1252) @[User::V_FileRowCount] + ", 1 + " + (DT_STR,10,1252) @[User::V_RowCount]
Once you've completed the aforementioned you're done!
If you wish to see the result, add a Script Task (just to display the results) and paste the following code in place of the part that starts with "public void Main":
public void Main()
{
    try
    {
        // MessageBox requires "using System.Windows.Forms;" at the top of the script.
        string message = "Loop Counter: " + Dts.Variables["User::V_RowCount"].Value.ToString()
            + " Total Records in all files: " + Dts.Variables["User::V_TotalRecords"].Value.ToString();
        MessageBox.Show(message);
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}
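To see what the Execute SQL Task expression does on each pass of the loop, here is a minimal C# sketch that builds the same statement text and applies the same arithmetic. The class name, method name and per-file counts are hypothetical and only illustrate the accumulation; in the package itself the database engine evaluates the SELECT and the result set is mapped back to the variables.

```csharp
using System;

public static class RowCountStatement
{
    // Mirrors the SSIS expression: column 0 is total + file count, column 1 is 1 + loop counter.
    public static string Build(int totalRecords, int fileRowCount, int loopCounter)
    {
        return " SELECT " + totalRecords + " + " + fileRowCount + ", 1 + " + loopCounter;
    }

    public static void Main()
    {
        int total = 0;
        int loops = 0;
        foreach (int fileRows in new[] { 10, 10, 10 }) // hypothetical per-file row counts
        {
            Console.WriteLine(Build(total, fileRows, loops));
            total += fileRows; // what result column 0 writes back to V_TotalRecords
            loops += 1;        // what result column 1 writes back to V_RowCount
        }
        Console.WriteLine("Total: " + total + ", files: " + loops);
    }
}
```

With three files of 10 rows each, the last line printed is "Total: 30, files: 3", which matches the count asked for in the question.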

You can use the Row Count component.
Create a Data Flow Task in the Control Flow. Then, inside Data Flow, use an Excel Source component connected to a Row Count component. Create an integer variable, double click the Row Count component and assign the variable to it.
If you configured your Excel Source correctly (with an Excel Connection Manager), the variable you created will hold the number of rows in the Excel file you're passing.

Related

How do I check whether all the rows for a column are NULL in SSIS?

I'm working on an SSIS package where I have a text file with 5 columns. I need to check whether all the rows in the 5th column are NULL.
If all the rows in the 5th column are NULL, then all the data should go to the invalid file table.
If any row in the 5th column has a non-NULL value, then all the data should go to the valid table.
You will need to read the entire file before you can decide where to write it, so introduce a third table where you can stage the data first.
The next part is to build the logic that checks the staging table for all NULLs. The query below returns 0 if every value is NULL and more than 0 if any record has a value:
SELECT COUNT(*) FROM dbo.StagingTable ST WHERE ST.Column5 IS NOT NULL
Once you feed the answer into a variable, you can use precedence constraints to fire the data flow that copies [staging to active] if the result is more than 0, or [staging to faulty] if the result is 0.
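Personally, if I had to perform this task, I would use a Script Task to do it all:
Load into a DataTable
Use LINQ to check the column and determine the destination, e.g. .Count(x => !x.IsNull(4)) (a DataRow returns DBNull.Value rather than null, so comparing with null would count every row)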
Load to destination via bulk Copy
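The check in the middle step could look roughly like the sketch below. It assumes the file has already been loaded into a DataTable with NULLs stored as DBNull; the class and method names and the sample rows are made up for illustration, and the bulk copy step is left out.

```csharp
using System;
using System.Data;
using System.Linq;

public static class NullColumnCheck
{
    // True when every row holds DBNull in the given zero-based column.
    // Note: an empty table also returns true, because there is no non-null value.
    public static bool IsColumnAllNull(DataTable table, int columnIndex)
    {
        return table.Rows.Cast<DataRow>().All(row => row.IsNull(columnIndex));
    }

    public static void Main()
    {
        var table = new DataTable();
        for (int i = 1; i <= 5; i++)
        {
            table.Columns.Add("Col" + i, typeof(string));
        }
        // Hypothetical sample data: the 5th column is NULL in every row.
        table.Rows.Add("a", "b", "c", "d", DBNull.Value);
        table.Rows.Add("e", "f", "g", "h", DBNull.Value);

        Console.WriteLine(IsColumnAllNull(table, 4) ? "invalid table" : "valid table");
    }
}
```

Rows.Cast<DataRow>() is used here instead of AsEnumerable() so that the sketch only depends on System.Data and System.Linq.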
You can check whether the column contains only NULLs with C#, using an OleDbDataAdapter to query the file, and then determine where to load the file using SSIS Precedence Constraints. This example uses a CSV file without column names. If the columns do have names, use the replacement code noted in the comments below. You will also need the using statements listed.
Add an SSIS Boolean variable. This is IsColumnNull in the following example. Next, add a C# Script Task with the IsColumnNull variable in the ReadWriteVariables field and (optionally) a variable holding the file path in the ReadOnlyVariables field.
Next, set Precedence Constraints to check for both a true condition (all rows are NULL) and a false condition (at least one row is not NULL). Since the IsColumnNull variable is a Boolean, use just the variable itself as the expression to check for all-NULL rows, and add ! for non-NULLs, i.e. !@[User::IsColumnNull].
Connect the appropriate Data Flow Task for each destination table to the corresponding Precedence Constraint. For example, add the Data Flow Task with the "invalid file table" as the destination after the Precedence Constraint checking for a true value in the IsColumnNull variable.
Precedence Constraint for rows with NULLs (expression): @[User::IsColumnNull]
Precedence Constraint for rows without NULLs (expression): !@[User::IsColumnNull]
Script Task Example:
using System;
using System.Data;
using System.IO;
using System.Data.OleDb;
using System.Linq;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

// Body of the Script Task's Main() method:
string fullFilePath = Dts.Variables["User::FilePath"].Value.ToString();
string fileName = Path.GetFileName(fullFilePath);
string filePath = Path.GetDirectoryName(fullFilePath);
// For the text driver, the data source is the folder and the "table" is the file name
string connStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filePath
    + ";Extended Properties=\"text;HDR=No;FMT=Delimited\";";
// Filter for NOT NULL on the given column so only non-NULL values are returned.
// With HDR=No the columns are named F1, F2, ...; use F5 for the fifth column.
string sql = "SELECT F2 FROM " + fileName + " WHERE F2 IS NOT NULL";
// If the file has column names, replace "connStr" and "sql" as shown below
/*
string connStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filePath
    + ";Extended Properties=\"text;HDR=Yes;FMT=Delimited\";";
string sql = "SELECT ID FROM " + fileName + " WHERE ID IS NOT NULL";
*/
using (OleDbDataAdapter oleAdpt = new OleDbDataAdapter(sql, connStr))
{
    DataTable dt = new DataTable();
    oleAdpt.Fill(dt);
    // If no non-NULL values came back, the column is entirely NULL:
    // set the IsColumnNull SSIS variable to true, otherwise to false
    Dts.Variables["User::IsColumnNull"].Value = dt.Rows.Count < 1;
}

SSIS flat file with values containing text qualifier

I received a flat file that cannot be generated in any other way. The delimiter is a comma and the text qualifier is a double quote. The problem is that sometimes I have a double quote inside a value. For example:
"0","12345", "Centre d"edu et de recherche", "B8E7"
Because of the double quote in the value, I received this error:
[Flat File Source [58]] Error: The column delimiter for column "XYZ" was not found.
[Flat File Source [58]] Error: An error occurred while processing file "C:\somefile.csv" on data row 296.
What can I do to process this file?
I use SSIS 2016 with Visual Studio 2015
You can use the Flat File Source error output to redirect bad rows to another flat file and correct values manually while all valid rows will be processed.
There are many links online to learn more about Flat File Source Error Output:
Flat File source Error Output connection in SSIS
How to Avoid Package Design Flaws When Sourcing Data From Flat Files
Flat File Source Editor (Error Output Page)
Update 1 - Workaround using Script Component and conditional split
Since the Flat File error output is not working, you can use a Script Component with a Conditional Split to filter bad rows. The following update is a step-by-step guide to implementing that:
Add a Flat File connection manager, go to the Advanced tab, delete all columns except one and change its length to 4000
Add a Script Component, go to the Inputs and Outputs tab, add the desired output columns (in this example, 4 columns) and add a Flag column of type DT_BOOL
Inside the Script Component, write the following script. It checks whether the number of columns is 4: if so, Flag = True, which means this is a valid row; otherwise Flag = False, which means this is a bad row:
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (!Row.Column0_IsNull && !String.IsNullOrWhiteSpace(Row.Column0))
        {
            // Split on the full "," sequence so that a lone quote inside a value does not break the row
            string[] cells = Row.Column0.Split(new string[] { "\",\"" }, StringSplitOptions.None);
            if (cells.Length == 4)
            {
                // Strip the opening quote of the first value and the closing quote of the last one
                Row.Col1 = cells[0].TrimStart('\"');
                Row.Col2 = cells[1];
                Row.Col3 = cells[2];
                Row.Col4 = cells[3].TrimEnd('\"');
                Row.Flag = true;
            }
            else
            {
                // Unexpected number of columns: mark as a bad row
                Row.Flag = false;
            }
        }
        else
        {
            // Empty line: pass it through as a valid row with NULL columns
            Row.Col1_IsNull = true;
            Row.Col2_IsNull = true;
            Row.Col3_IsNull = true;
            Row.Col4_IsNull = true;
            Row.Flag = true;
        }
    }
}
Add a conditional split to split rows based on Flag column
Map the Valid Rows output to the OLEDB Destination, and the Bad Rows output to another flat file where you only map Column0
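The splitting rule used in the Script Component can be tried outside SSIS with a small stand-alone sketch like the one below. The class and method names are hypothetical, and it assumes the delimiter between values is exactly "," with no spaces around the comma; a line that has spaces after the commas would need to be normalized first or it will be flagged as a bad row.

```csharp
using System;

public static class QuotedLineSplitter
{
    // Splits on the full "," sequence; returns false when the column count is unexpected.
    public static bool TrySplit(string line, int expectedColumns, out string[] cells)
    {
        cells = null;
        if (string.IsNullOrWhiteSpace(line))
        {
            return false;
        }
        string[] parts = line.Split(new[] { "\",\"" }, StringSplitOptions.None);
        if (parts.Length != expectedColumns)
        {
            return false;
        }
        // Remove the opening quote of the first value and the closing quote of the last one.
        parts[0] = parts[0].TrimStart('"');
        parts[parts.Length - 1] = parts[parts.Length - 1].TrimEnd('"');
        cells = parts;
        return true;
    }

    public static void Main()
    {
        string line = "\"0\",\"12345\",\"Centre d\"edu et de recherche\",\"B8E7\"";
        string[] cells;
        if (TrySplit(line, 4, out cells))
        {
            Console.WriteLine(string.Join(" | ", cells));
        }
        else
        {
            Console.WriteLine("bad row");
        }
    }
}
```

The embedded quote in the third value survives because a single quote never matches the three-character separator.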

SSIS- retaining null values as null in the csv output file using script task

I am using an Execute SQL Task to read SQL queries and then a Script Task to write the query results into CSV files, using the method mentioned in the following post on this site: "SSIS: Script task to write recordset to file".
In addition to this, I need 'null' to be written in the CSV files wherever the query result is NULL. How can I achieve this using a Script Task? What additional code do I have to write?
There are a few ways to do this, but the answer is in Persist(), within the foreach DataRow loop:
foreach (System.Data.DataRow row in table.Rows)
{
// TODO: For string based fields, capture the max length
IEnumerable<string> fields = (row.ItemArray).Select(field => field.ToString());
file.WriteLine(string.Join(delimiter, fields));
}
You have to test for DBNull in your lambda expression. This should work, though I have not tested it:
IEnumerable<string> fields = (row.ItemArray).Select(field => (field == DBNull.Value) ? "NULL" : field.ToString());
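As a quick way to check the DBNull test in isolation, here is a minimal sketch that formats one DataRow the same way. The class name, method name and the two-column sample table are hypothetical; swap "NULL" for "null" if the lowercase literal is what the file needs.

```csharp
using System;
using System.Data;
using System.Linq;

public static class CsvNullWriter
{
    // Joins the row's values with the delimiter, writing the literal NULL for DBNull values.
    public static string FormatRow(DataRow row, string delimiter)
    {
        return string.Join(delimiter,
            row.ItemArray.Select(field => field is DBNull ? "NULL" : field.ToString()));
    }

    public static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, DBNull.Value); // hypothetical row with a NULL name

        Console.WriteLine(FormatRow(table.Rows[0], ","));
    }
}
```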

SSIS Lookup multiple identical databases

I'm working on a project where I need to do lookups on a data warehouse server in Integration Services.
My problem is that I need to be able to change which database the lookup is performed against. The databases are identical in design.
I have solved this problem with a Script Component before, where, for each row, the connection changes if the database ID has changed. Example below:
try {
    if (databaseNr != Row.DatabaseNr) {
        try {
            databaseNr = Row.DatabaseNr;
            currentCatalog = "db" + Row.DatabaseNr;
            connection.ChangeDatabase(currentCatalog);
        } catch (Exception e) {
            ComponentMetaData.FireWarning(0, ComponentMetaData.Name, e.Message, "", 0);
        }
    }
    string command = "SELECT Id, Name, Surname FROM [" + currentCatalog + "].[TableName] WHERE Id = '" + Row.OrderID + "'";
But it would save me a lot of trouble if this was possible with the lookup component.
So my question is: Is it possible in any way to use column data to change what database to perform a Lookup with the Lookup component?
Grateful for any help!
What you can do is:
Go to the Control Flow
Select your Data Flow Task
Go to Properties and, under Expressions, select the Lookup component
Create an expression for the Lookup; you can reuse a query prepared in a Script Task.

how to check end of file in a csv file before processing it in ssis

I have created an SSIS package which processes .CSV files using a ForEachLoop container.
All the csv files contains "END OF FILE" in the last row.
A CSV file should be processed only if it contains "END OF FILE" in the last row.
How can this be done? Please help.
Thanks in advance.
Create a variable named check:
Name    DataType    Value
check   int         0
Let's say you have a package design like the one below
The Script Task checks whether the file has "END OF FILE" in its last row.
In the Script Task, add the variable check to the ReadWriteVariables section and the output variable from the Foreach container (suppose the variable name is LoopFiles) to ReadOnlyVariables.
In the Script Task, add the following code to read the file:
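public void Main()
{
    // Requires "using System.IO;" at the top of the script.
    string filePath = Dts.Variables["User::LoopFiles"].Value.ToString();
    string line;
    string lastLine = null;
    using (StreamReader reader = new StreamReader(filePath))
    {
        while ((line = reader.ReadLine()) != null)
        {
            // Remember the most recent non-empty line; after the loop it is the last row
            if (line.Trim().Length > 0)
            {
                lastLine = line;
            }
        }
    }
    // Flag the file only when its last row is the end-of-file marker,
    // and reset the flag otherwise so a previous file's result is not carried over
    bool hasMarker = lastLine != null
        && lastLine.Trim().Equals("End Of File", StringComparison.OrdinalIgnoreCase);
    Dts.Variables["User::check"].Value = hasMarker ? 1 : 0;
    Dts.TaskResult = (int)ScriptResults.Success;
}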
Double-click the green arrow connecting the Script Task and the Data Flow Task. The precedence constraint dialog box opens; enter an expression that tests the variable, for example @[User::check] == 1.
There are a number of ways that this could be done. One way would be:
Create the following variables:
EOF_Found Boolean
Row_Count Integer
Bring the data into a dataflow using the Flat File Source
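Use a Row Count component to store the number of rows in Row_Count, to identify the last row later (the variable is only populated once its Data Flow Task has finished, so the count has to be taken in a Data Flow Task that runs before the one doing the check)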
Use a script component to loop through the rows, adding 1 to a counter for each row
When your counter equals the value in Row_Count (i.e. you are looking at the last row), check the value in the column where you expect "END OF FILE" to appear (this depends on how you set up the Flat File connection manager). If it equals "END OF FILE", change the value of EOF_Found to True
After the script component, add a derived column referencing the value in EOF_Found
Use a conditional split, checking the value of the derived column and only process if True
This solution avoids looping over the file line by line, although ReadToEnd still loads the whole file into memory. I have merged Praveen's code here for the sake of completeness.
public void Main()
{
    string line = ReadLastLine(@"c:\temp\EOF.cs");
    if (line.ToUpper() == "END OF FILE")
    {
        Dts.Variables["User::check"].Value = 1;
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}

public static string ReadLastLine(string path)
{
    // Requires "using System.IO;". The using block closes the file when done.
    using (StreamReader stream = new StreamReader(path))
    {
        // Drop trailing line breaks so a final newline does not yield an empty last line
        string str = stream.ReadToEnd().TrimEnd('\r', '\n');
        int i = str.LastIndexOf('\n');
        return str.Substring(i + 1);
    }
}
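If the files are large, one possible variation is to read only the tail of the file instead of all of it. The sketch below is an assumption-laden illustration rather than part of the answers above: the class name, the tailBytes parameter and its 1024-byte default are hypothetical, and it assumes the last line is shorter than tailBytes and that the file is UTF-8 or plain ASCII.

```csharp
using System;
using System.IO;
using System.Text;

public static class TailReader
{
    // Reads at most tailBytes from the end of the file and returns the last non-empty line.
    public static string ReadLastLine(string path, int tailBytes = 1024)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // Jump close to the end of the file instead of reading from the start.
            long start = Math.Max(0, fs.Length - tailBytes);
            fs.Seek(start, SeekOrigin.Begin);
            using (var reader = new StreamReader(fs, Encoding.UTF8))
            {
                // Ignore trailing line breaks, then take everything after the last newline.
                string tail = reader.ReadToEnd().TrimEnd('\r', '\n');
                int i = tail.LastIndexOf('\n');
                return tail.Substring(i + 1);
            }
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName(); // temporary sample file
        File.WriteAllText(path, "a,b\r\nc,d\r\nEND OF FILE\r\n");
        Console.WriteLine(ReadLastLine(path));
        File.Delete(path);
    }
}
```

In a Script Task, the returned string would be compared with "END OF FILE" in the same way as in the code above.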