SSIS flat file with values containing text qualifier - csv

I received a flat file that cannot be generated any other way. The delimiter is a comma and the text qualifier is a double quote. The problem is that sometimes I have a double quote inside a value. For example:
"0","12345", "Centre d"edu et de recherche", "B8E7"
Because of the double quote in the value, I received this error:
[Flat File Source [58]] Error: The column delimiter for column "XYZ" was not found.
[Flat File Source [58]] Error: An error occurred while processing file "C:\somefile.csv" on data row 296.
What can I do to process this file?
I use SSIS 2016 with Visual Studio 2015

You can use the Flat File Source error output to redirect bad rows to another flat file and correct the values manually, while all valid rows are processed normally.
There are many links online to learn more about Flat File Source Error Output:
Flat File source Error Output connection in SSIS
How to Avoid Package Design Flaws When Sourcing Data From Flat Files
Flat File Source Editor (Error Output Page)
Update 1 - Workaround using Script Component and conditional split
Since the Flat File error output is not working, you can use a Script Component with a Conditional Split to filter out bad rows. The following update is a step-by-step guide to implementing that:
Add a Flat File connection manager, go to the Advanced tab, delete all columns except one, and change its length to 4000.
Add a Script Component, go to the Inputs and Outputs tab, add the desired output columns (in this example, 4 columns), and add a Flag column of type DT_BOOL.
Inside the Script Component, write the following script: if the number of columns is 4, set Flag = true, which means this is a valid row; otherwise set Flag = false, which means this is a bad row:
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (!Row.Column0_IsNull && !String.IsNullOrWhiteSpace(Row.Column0))
        {
            // Split the raw line on the "," sequence (qualifier + delimiter + qualifier)
            // so that a lone double quote inside a value does not break the parsing.
            string[] cells = Row.Column0.Split(new string[] { "\",\"" }, StringSplitOptions.None);

            if (cells.Length == 4)
            {
                // Valid row: strip the leading qualifier from the first cell
                // and the trailing qualifier from the last cell.
                Row.Col1 = cells[0].TrimStart('\"');
                Row.Col2 = cells[1];
                Row.Col3 = cells[2];
                Row.Col4 = cells[3].TrimEnd('\"');
                Row.Flag = true;
            }
            else
            {
                // Bad row: the number of cells is not the expected 4.
                Row.Flag = false;
            }
        }
        else
        {
            // Empty line: output NULLs and let it pass through as a valid row.
            Row.Col1_IsNull = true;
            Row.Col2_IsNull = true;
            Row.Col3_IsNull = true;
            Row.Col4_IsNull = true;
            Row.Flag = true;
        }
    }
}
Add a Conditional Split to split rows based on the Flag column (a sketch of the condition follows these steps).
Map the Valid Rows output to the OLE DB Destination, and the Bad Rows output to another flat file where you only map Column0.
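As a rough sketch of that Conditional Split, assuming the two outputs are named Valid Rows and Bad Rows as above: add one output named Valid Rows with the condition Flag == TRUE, and rename the default output to Bad Rows, so every row that fails the check is redirected to the bad-rows file.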

Related

SSIS 2016 - ErrorColumn is 0 (zero)

I have a package with a bunch of OLE DB Destinations, using SSIS 2016, which is supposed to show the exact column that generated the error. The ErrorColumn shows zero (0), so I am unable to identify the column that generated the error.
I am using the script below (the code that assigns "Unknown column" does not help, it just keeps the script from failing):
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    //IDTSComponentMetaData130 componentMetaData = ComponentMetaData as IDTSComponentMetaData130;
    //var component130 = this.ComponentMetaData as IDTSComponentMetaData130;
    //if (component130 != null)
    //{
    //    System.Windows.Forms.MessageBox.Show(component130.GetIdentificationStringByID(Row.ErrorColumn));
    //    Row.ErrorColumnName = component130.GetIdentificationStringByID(Row.ErrorColumn);
    //}

    IDTSComponentMetaData130 componentMetaData = ComponentMetaData as IDTSComponentMetaData130;
    if (componentMetaData != null)
    {
        if (Row.wkpErrorColumn != 0)
            Row.wkpErrorColumnName = componentMetaData.GetIdentificationStringByID(Row.wkpErrorColumn);
        else
            Row.wkpErrorColumnName = "Unknown column";
    }
    else
    {
        Row.wkpErrorColumnName = "Cannot determine";
    }

    Row.wkpErrorDescription = ComponentMetaData.GetErrorDescription(Row.wkpErrorCode);
}
In SSIS 2008 (and I believe also in SSIS 2016), a zero error column identifies an error that affects the entire row. In the example below, I created a package that contains only one Data Flow Task in the control flow and redirected all errors and truncations to the error output. I placed a Row Count task just to have somewhere to send them.
I also placed data viewers in both error flows to see the data coming out of them.
In a package consuming data from a flat file into an OLE DB Destination:
Using these values as input data
And having name as the PK of the table
We get a PK violation; check the error description and the error column:
So when the error affects the entire record, it gets an error column value of 0.
Hope this helps.
I found that:
When the error is generated by the PK, the error affects the entire row, and the error column is 0.
When the error is generated by an FK, the error affects only that column, and you get an error column value different from 0.

SSIS write DT_NTEXT into an UTF-8 csv file

I need to write the result of a SQL query into a CSV file encoded in UTF-8 (I need this encoding as there are French letters). One of the columns is too large (more than 20,000 characters) so I can't use DT_WSTR for it. The input type is DT_TEXT, so I use a Data Conversion to change it to DT_NTEXT. But when I want to write it to the file, I get this error message:
Error 2 Validation error. The data type for "input column" is
DT_NTEXT, which is not supported with ANSI files. Use DT_TEXT instead
and convert the data to DT_NTEXT using the data conversion component
Is there a way I can write the data to my file?
Thank you
I have had this kind of issue too. When working with data larger than 255 characters, SSIS treats it as blob data and will always handle it as such.
I then converted this blob stream data to readable text with a Script Component; after that, other transformations should be possible.
This was the case in the SSIS that came with SQL Server 2008, and I believe it hasn't changed since.
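For illustration, a minimal sketch of that conversion inside a Script Component, assuming an input DT_NTEXT column named LargeText (the column name is a placeholder, not from the original question), with what you do with the decoded string left open:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    if (Row.LargeText_IsNull)
        return;

    // DT_NTEXT is exposed to the script as a BlobColumn; read all of its bytes.
    byte[] bytes = Row.LargeText.GetBlobData(0, (int)Row.LargeText.Length);

    // DT_NTEXT holds UTF-16 (Unicode) data, so decode it accordingly.
    string text = System.Text.Encoding.Unicode.GetString(bytes);

    // From here the text can be written out (e.g. through a UTF-8 StreamWriter)
    // or assigned to an output column, depending on the rest of the package.
}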
I ended up doing just what Samyne says; I used a script.
First I modified my SQL stored procedure: instead of having several columns, I put all the info in one single column, as follows:
Select Column1 + '^' + Column2 + '^' + Column3 ...
Then I used this code in a script:
// Assumes: using System.Data; using System.Data.OleDb; using System.IO; using System.Text;
string fileName = Dts.Variables["SLTemplateFilePath"].Value.ToString();

// FileMode.Truncate expects the file to already exist; use FileMode.Create to create it if needed.
using (var stream = new FileStream(fileName, FileMode.Truncate))
{
    // StreamWriter with Encoding.UTF8 encodes the French characters correctly.
    using (var sw = new StreamWriter(stream, Encoding.UTF8))
    {
        // The "FileData" object variable holds the recordset returned by an Execute SQL Task;
        // OleDbDataAdapter.Fill loads it into a DataTable.
        OleDbDataAdapter oleDA = new OleDbDataAdapter();
        DataTable dt = new DataTable();
        oleDA.Fill(dt, Dts.Variables["FileData"].Value);

        foreach (DataRow row in dt.Rows)
        {
            foreach (DataColumn column in dt.Columns)
            {
                sw.WriteLine(row[column]);
            }
        }
        sw.WriteLine();
    }
}
Putting all the info in one column is optional; I just wanted to avoid handling it in the script. This way, if my SP changes, I don't need to modify the SSIS package.

JMeter - Specify CSV row failure

Within JMeter, I am running a script which uses a .CSV file to enter data as well as verify results. It is working correctly, but I cannot figure out how to tell which row/line of the .CSV caused the individual failures. Is there a way to do this?
Somewhat of an example scenario (not specific to what I'm doing, but similar):
Each row of the .CSV file contains a mathematical equation as well as the expected result.
On page 1, enter the equation (2+2)
Then on Page 2, you get the response: 3.
That test would obviously be a failure.
Say there are 1,000 tests being run, some that pass and some that do not. How can I tell which .CSV row/line didn't pass?
Do you have any columns in your CSV file which help you to uniquely identify a row?
Let me assume you have a column called 'TestCaseNo' which has values such as TC001, TC002, TC003, etc.
Add a Beanshell Post Processor to store the result for each iteration.
Add the below code. I assume that you have the PASS or FAIL result stored in the 'Result' variable.
import java.io.FileOutputStream;
import java.io.PrintStream;

// Open the status file in append mode so every iteration adds one line.
f = new FileOutputStream("somepath/tcstatus.csv", true);
p = new PrintStream(f);

// Write "<test case number>,<PASS|FAIL>" for the current CSV row.
p.println(vars.get("TestCaseNo") + "," + vars.get("Result"));
p.close();
f.close();
The above code appends the result for each test case to a CSV file.
EDIT:
Do the assertion yourself in the Beanshell post processor.
import java.io.FileOutputStream;
import java.io.PrintStream;

Result = "FAIL";
Response = prev.getResponseDataAsString();

// Replace "value" with the text you expect in the response.
if (Response.contains("value"))
    Result = "PASS";

// Append "<test case number>,<PASS|FAIL>" to the status file.
f = new FileOutputStream("somepath/tcstatus.csv", true);
p = new PrintStream(f);
p.println(vars.get("TestCaseNo") + "," + Result);
p.close();
f.close();
I would use the following approach:
__CSVRead() function - to get data from the .csv file.
__counter() function - to track the CSV file position. You can include the counter variable name in the Sampler's label so the current .csv file line is reported. See the image below for an example; a sketch follows this list.
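A rough illustration of that idea, assuming a file named data.csv sitting next to the test plan (the file name and counter variable name are placeholders): read the current row's values with ${__CSVRead(data.csv,0)} and ${__CSVRead(data.csv,1)}, move to the next row with ${__CSVRead(data.csv,next)}, and label the sampler something like "Equation check - line ${__counter(FALSE,csvLine)}" so the line number appears next to any failed sample in the listener.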
For more information on the aforementioned and other useful JMeter functions, see the How to Use JMeter Functions post series.

SSIS flat file loading by checking conditions

I have 10 flat files (.dat format) in a folder which need to be loaded into the database every day at a scheduled time.
All file-related information is stored in a database: file name, file path, table name, column names, and delimiter.
We need to check whether each file exists; if not, we need to log an entry, "File Not Found".
If the file exists, we need to check for a trailer record (the last record in the file, which says something like Count=00001000; it has to be the count of the number of records in that particular file).
If the trailer record does not exist, we need to log an entry, "No trailer record found". If the trailer record says zero count, a log entry has to be made, "Zero count", and if the counts for the file do not match, a log entry is needed, "Count mismatch".
If all the conditions are satisfied, the data needs to be loaded into the database for each file.
Please suggest your ideas to implement the above scenario. Thanks!!!
The following solution may help you resolve the issue.
Use the Foreach Loop container with the "Item" enumerator. Since you have 10 known files and need to raise an error when one is missing, you should use this; the File enumerator just iterates through the files it finds and does not raise any error.
The steps are as follows.
Create the following SSIS package with these variables:
FileFullPath
IsValidated
The Foreach Loop enumerator should be configured as in the following screenshots.
Configuration in the Collection section:
Configuration in the Variable Mappings section:
Inside the container, add a Script Task. You have to specify FileFullPath as a read-only variable and IsValidated as a read/write variable, as in the following screen.
Click Edit Script and insert the following code:
// Requires: using System; using System.IO; using System.Linq;
public void Main()
{
    Dts.Variables["IsValidated"].Value = true;
    string fileFullPath = Dts.Variables["FileFullPath"].Value.ToString();

    // 1. File existence check
    if (!File.Exists(fileFullPath))
    {
        var msg = String.Format("File is not available in location : {0}", fileFullPath);
        Dts.Events.FireError(0, "Dat file loading", msg, string.Empty, 0);
        Dts.Variables["IsValidated"].Value = false;
        Dts.TaskResult = (int)ScriptResults.Failure;
        return;
    }

    // 2. Trailer record check - read the last line, e.g. "Count=00001000"
    //    (assumes the count is the part after "=", or the whole line if there is no "=")
    string lastLine = File.ReadLines(fileFullPath).Last();
    string countPart = lastLine.Contains("=") ? lastLine.Split('=').Last() : lastLine;
    int totalCount = 0;
    bool trailerExists = int.TryParse(countPart, out totalCount);
    if (!trailerExists)
    {
        var msg = String.Format("No trailer row found and last line is : {0}", lastLine);
        Dts.Events.FireError(0, "Dat file loading", msg, string.Empty, 0);
        Dts.Variables["IsValidated"].Value = false;
        Dts.TaskResult = (int)ScriptResults.Failure;
        return;
    }

    // 3. Count check - compare the trailer count with the number of lines in the file
    //    (adjust by -1 here if the trailer count excludes the trailer row itself)
    int fullCount = File.ReadLines(fileFullPath).Count();
    if (fullCount != totalCount)
    {
        var msg = String.Format("Counts do not match, trailer count = {0} and full count = {1}", totalCount, fullCount);
        Dts.Events.FireError(0, "Dat file loading", msg, string.Empty, 0);
        Dts.Variables["IsValidated"].Value = false;
        Dts.TaskResult = (int)ScriptResults.Failure;
        return;
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}
After that, add your Data Flow Task. Connect the Script Task to the Data Flow Task, right-click the connector, choose Edit, and configure the precedence constraint as follows.
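A minimal sketch of that precedence constraint, assuming the variable names used above: set Evaluation operation to "Expression and Constraint", Value to "Success", and the Expression to @[User::IsValidated] == TRUE, so the Data Flow only runs when the script has validated the file.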
Your SSIS package will look as follows.
Hope this helps!

how to check end of file in a csv file before processing it in ssis

I have created an SSIS package which processes .CSV files using a ForEachLoop container.
All the CSV files contain "END OF FILE" in the last row.
Only those CSV files that contain "END OF FILE" in the last row should be processed.
How can this be done? Please help.
Thanks in advance.
Create a variable named check:
Name     DataType   Value
check    int        0
Let's say you have a package design like the one below
The Script Task checks whether the file has "END OF FILE" in the last row.
In the Script Task, add the variable check to the ReadWriteVariables section and the output variable from the Foreach container (suppose the variable name is LoopFiles) to the ReadOnlyVariables section.
In the Script Task, add the following code to read the file. There are several ways you can read files; see here and here.
public void Main()
{
    // Full path of the current file, supplied by the Foreach Loop container.
    string loop = Dts.Variables["User::LoopFiles"].Value.ToString();
    string line;

    using (StreamReader files = new StreamReader(loop))
    {
        while ((line = files.ReadLine()) != null)
        {
            // Flag the file as valid if any line is "END OF FILE" (case-insensitive).
            if (line.ToLower() == "End Of File".ToLower())
            {
                Dts.Variables["User::check"].Value = 1;
            }
        }
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}
Double-click the green arrow connecting the Script Task and the Data Flow Task. A precedence constraint dialog box will open; enter the expression as below.
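For example (a sketch using the variable created above), set Evaluation operation to "Expression and Constraint" and use @[User::check] == 1 as the expression, so the Data Flow Task only runs for files whose last row is "END OF FILE".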
There are a number of ways that this could be done. One way would be:
Create the following variables:
EOF_Found Boolean
Row_Count Integer
Bring the data into a dataflow using the Flat File Source
Use a row count component to add the number of rows to Row_Count, to identify the last row later
Use a script component to loop through the rows, adding 1 to a counter for each row
When your counter equals the value in Row_Count (i.e. you are looking at the last row), check the value in the column where you expect "END OF FILE" to appear (this depends on how you set up the Flat File connection manager). If it equals "END OF FILE", change the value of EOF_Found to True.
After the script component, add a derived column referencing the value in EOF_Found.
Use a conditional split, checking the value of the derived column, and only process the rows when it is True (a sketch follows this list).
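As a rough sketch of those last two steps, with placeholder names: add a Derived Column named, say, EOFFlag with the expression @[User::EOF_Found], then give the Conditional Split a condition such as EOFFlag == TRUE so that rows only continue to the destination when the end-of-file marker was found.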
This solution avoids reading the entire file line by line. I have merged Praveen's code here for the sake of completeness.
public void Main()
{
    // Read only the last line of the file and compare it with the expected trailer text.
    string line = ReadLastLine(@"c:\temp\EOF.cs");
    if (line.ToUpper() == "END OF FILE")
    {
        Dts.Variables["User::check"].Value = 1;
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}

public static string ReadLastLine(string path)
{
    // Read the whole file once and return everything after the last line break.
    using (StreamReader stream = new StreamReader(path))
    {
        string str = stream.ReadToEnd();
        int i = str.LastIndexOf('\n');
        string lastLine = str.Substring(i + 1);
        return lastLine;
    }
}