Data omission while using MS ACE driver to read CSV file - csv

There must be some explanation for this. My csv file is something like this:
CustomerID,FirstName,LastName,EmpID,EmployeeName
1,John,Smith,2,Smith
2,Wilber,Wright,3,Shaney
3,Gloria,Johnathan,4,Dick
Notice that some field names have "ID" in them. I execute the code below and view the resulting DataTable during debugging using the DataTable visualizer in Visual Studio.
using System;
using System.Data.OleDb;
using System.Data;

namespace caOledbFileOpen
{
    class Program
    {
        static void Main(string[] args)
        {
            OleDbConnection cxn = new OleDbConnection();
            cxn.ConnectionString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=c:\tools;Extended Properties='text;HDR=No;Delimiter(,)'";
            cxn.Open();

            OleDbCommand cmd = cxn.CreateCommand();
            cmd.CommandText = "Select * from [OUt.csv]";

            DataTable tbl = new DataTable();
            OleDbDataAdapter adp = new OleDbDataAdapter(cmd);
            adp.Fill(tbl);

            Console.WriteLine("End");
            Console.ReadLine();
        }
    }
}
I observe that the cells that should show CustomerID or EmpID appear blank.

The problem is that the OLEDB text driver infers the data types of your CSV 'columns' from the data in the first rows of the file. It sees the numbers in the first and fourth columns of your data and assumes those columns are numeric, even though the values in the first row of those columns (the header text "CustomerID" and "EmpID") are not. What you're seeing is the really annoying part of all this: values that do not match the data type inferred for their column are simply not imported.
The solution here is to specify the data types of your columns with a schema.ini file alongside your CSV. This is a plain text file that you create in the same folder as your CSV, and it is always named schema.ini. The first line names the CSV file, and the following lines define its columns.
A schema.ini like this should work for you:
[OUt.csv]
ColNameHeader=False
Col1="My Field 1" Text
Col2="My Field 2" Text
Col3="My Field 3" Text
Col4="My Field 4" Text
Col5="My Field 5" Text
Microsoft's documentation has more information on using schema.ini files.

Also, in your schema.ini file add:
MaxScanRows=1
example:
[hdrdtl.txt]
Format=TabDelimited
ColNameHeader=True
MaxScanRows=1
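For reference, here is a minimal sketch of the reading side once schema.ini sits next to the CSV; the folder and file name are taken from the question, and the column types now come from schema.ini rather than from scanning the data:
using System;
using System.Data;
using System.Data.OleDb;

class ReadCsvWithSchemaIni
{
    static void Main()
    {
        // schema.ini must live in the same folder as the CSV (c:\tools here)
        using (var cxn = new OleDbConnection(
            @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=c:\tools;Extended Properties='text'"))
        {
            cxn.Open();
            var cmd = cxn.CreateCommand();
            cmd.CommandText = "Select * from [OUt.csv]";

            var tbl = new DataTable();
            new OleDbDataAdapter(cmd).Fill(tbl);

            // With every column declared as Text, the ID values are no longer dropped
            foreach (DataRow row in tbl.Rows)
                Console.WriteLine(string.Join(",", row.ItemArray));
        }
    }
}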

Related

SSIS flat file with values containing text qualifier

I received a flat file that cannot be generated any other way. The delimiter is a comma and the text qualifier is a double quote. The problem is that sometimes I have a double quote inside a value. For example:
"0","12345", "Centre d"edu et de recherche", "B8E7"
Because of the double quote in the value, I receive these errors:
[Flat File Source [58]] Error: The column delimiter for column "XYZ" was not found.
[Flat File Source [58]] Error: An error occurred while processing file "C:\somefile.csv" on data row 296.
What can I do to process this file?
I use SSIS 2016 with Visual Studio 2015
You can use the Flat File Source error output to redirect bad rows to another flat file and correct their values manually, while all valid rows are processed normally.
There are many links online to learn more about Flat File Source Error Output:
Flat File source Error Output connection in SSIS
How to Avoid Package Design Flaws When Sourcing Data From Flat Files
Flat File Source Editor (Error Output Page)
Update 1 - Workaround using Script Component and conditional split
Since the Flat File error output is not working for you, you can use a Script Component with a Conditional Split to filter out bad rows. The following update is a step-by-step guide to implementing that:
Add a Flat File connection manager, go to the Advanced tab, delete all columns except one, and change its length to 4000.
Add a Script Component, go to the Inputs and Outputs tab, add the desired output columns (in this example 4 columns), and add a Flag column of type DT_BOOL.
Inside the Script Component, write the following script: if the number of columns is 4, set Flag to true (a valid row); otherwise set Flag to false (a bad row):
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (!Row.Column0_IsNull && !String.IsNullOrWhiteSpace(Row.Column0))
        {
            // Split on the "," sequence (qualifier + delimiter + qualifier),
            // so a stray double quote inside a value does not break the split
            string[] cells = Row.Column0.Split(new string[] { "\",\"" }, StringSplitOptions.None);

            if (cells.Length == 4)
            {
                // Valid row: strip the outer qualifiers and map the columns
                Row.Col1 = cells[0].TrimStart('\"');
                Row.Col2 = cells[1];
                Row.Col3 = cells[2];
                Row.Col4 = cells[3].TrimEnd('\"');
                Row.Flag = true;
            }
            else
            {
                // Wrong number of columns: mark as a bad row
                Row.Flag = false;
            }
        }
        else
        {
            // Empty line: output NULLs and let it pass through as valid
            Row.Col1_IsNull = true;
            Row.Col2_IsNull = true;
            Row.Col3_IsNull = true;
            Row.Col4_IsNull = true;
            Row.Flag = true;
        }
    }
}
Add a Conditional Split to separate rows based on the Flag column.
Map the valid-rows output to the OLE DB Destination, and the bad-rows output to another flat file where you map only Column0.

Export Datatables from Spotfire to CSV using IronPython Script

I have an IronPython script that I use to export all the data tables from a Spotfire project.
Currently it works perfectly: it loops through all data tables and exports them as ".xlsx". Now I need to export the files as ".csv", which I thought would be as simple as changing ".xlsx" to ".csv".
The script still exports the files and names them all .csv, but what is inside each file is still xlsx data; I'm not sure how or why. The code just changes the file extension without converting the contents to CSV.
Here is the code I am currently using:
I have posted the full code at the bottom, and the code I believe is relevant to my question in a separate code block at the top.
if(dialogResult == DialogResult.Yes):
    for d in tableList: #cycles through the table list elements defined above
        writer = Document.Data.CreateDataWriter(DataWriterTypeIdentifiers.ExcelXlsDataWriter)
        table = Document.Data.Tables[d[0]] #d[0] is the Data Table name in the Spotfire project (defined above)
        filtered = Document.ActiveFilteringSelectionReference.GetSelection(table).AsIndexSet() #OR pass the filter
        stream = File.OpenWrite(savePath+'\\'+ d[1] +".csv") #d[1] is the Excel alias name. You could also use d.Name to export with the Data Table name
        names = []
        for col in table.Columns:
            names.append(col.Name)
        writer.Write(stream, table, filtered, names)
        stream.Close()
I think it may have to do with the ExcelXlsDataWriter.
I tried with ExcelXlsxDataWriter as well. Is there a csv writer I could use for this? I believe csv and txt files have a different writer.
Any help is appreciated.
Full script shown below:
import System
import clr
import sys
clr.AddReference("System.Windows.Forms")
from sys import exit
from System.Windows.Forms import FolderBrowserDialog, MessageBox, MessageBoxButtons, DialogResult
from Spotfire.Dxp.Data.Export import DataWriterTypeIdentifiers
from System.IO import File, FileStream, FileMode

#This is a list of Data Tables and their Excel file names. You can see each referenced below as d[0] and d[1] respectively.
tableList = [
    ["TestTable1", "TestTable1"],
    ["TestTable2", "TestTable2"],
]

#imports the location of the file so that there is a default place to put the exports.
from Spotfire.Dxp.Application import DocumentMetadata
dmd = Application.DocumentMetadata #Get MetaData
path = str(dmd.LoadedFromFileName) #Get Path
savePath = '\\'.join(path.split('\\')[0:-1]) + "\\DataExports\\"

dialogResult = MessageBox.Show("The files will be saved to "+savePath
    +". Do you want to change location?"
    , "Select the save location", MessageBoxButtons.YesNo)
if(dialogResult == DialogResult.Yes):
    # GETS THE FILE PATH FROM THE USER THROUGH A FILE DIALOG instead of using the file location
    SaveFile = FolderBrowserDialog()
    SaveFile.ShowDialog()
    savePath = SaveFile.SelectedPath

#message making sure that the user wants to export the files.
dialogResult = MessageBox.Show("Export Files."
    +" Export Files","Are you sure?", MessageBoxButtons.YesNo)
if(dialogResult == DialogResult.Yes):
    for d in tableList: #cycles through the table list elements defined above
        writer = Document.Data.CreateDataWriter(DataWriterTypeIdentifiers.ExcelXlsDataWriter)
        table = Document.Data.Tables[d[0]] #d[0] is the Data Table name in the Spotfire project (defined above)
        filtered = Document.ActiveFilteringSelectionReference.GetSelection(table).AsIndexSet() #OR pass the filter
        stream = File.OpenWrite(savePath+'\\'+ d[1] +".csv") #d[1] is the Excel alias name. You could also use d.Name to export with the Data Table name
        names = []
        for col in table.Columns:
            names.append(col.Name)
        writer.Write(stream, table, filtered, names)
        stream.Close()
#if the user doesn't want to export then he just gets a message
else:
    dialogResult = MessageBox.Show("ok.")
For some reason the Spotfire IronPython implementation does not support Python's csv package.
The workaround I found for your implementation is to use StdfDataWriter instead of ExcelXlsDataWriter. STDF is the Spotfire Text Data Format. The DataWriter class in Spotfire supports only Excel and STDF (see the API documentation), and STDF comes closest to CSV.
from System.IO import File
from Spotfire.Dxp.Data.Export import DataWriterTypeIdentifiers

writer = Document.Data.CreateDataWriter(DataWriterTypeIdentifiers.StdfDataWriter)
table = Document.Data.Tables['DropDownSelectors']
filtered = Document.ActiveFilteringSelectionReference.GetSelection(table).AsIndexSet()
stream = File.OpenWrite("C:\Users\KLM68651\Documents\dropdownexport.stdf")
names = []
for col in table.Columns:
    names.append(col.Name)
writer.Write(stream, table, filtered, names)
stream.Close()
Hope this helps

SSIS write DT_NTEXT into a UTF-8 csv file

I need to write the result of an SQL query into a CSV file encoded in UTF-8 (I need this encoding because there are French characters). One of the columns is too large (more than 20000 characters) so I can't use DT_WSTR for it. The input type is DT_TEXT, so I use a Data Conversion to change it to DT_NTEXT. But when I try to write it to the file I get this error message:
Error 2 Validation error. The data type for "input column" is
DT_NTEXT, which is not supported with ANSI files. Use DT_TEXT instead
and convert the data to DT_NTEXT using the data conversion component
Is there a way I can write the data to my file?
Thank you
I have had this kind of issue too. When working with data larger than 255 characters, SSIS treats it as blob data and will always handle it as such.
I then converted this blob stream data to readable text with a Script Component; after that, other transformations become possible.
This was the case in the SSIS that came with SQL Server 2008, but I believe it hasn't changed since.
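As an illustration, here is a minimal sketch of that blob-to-text conversion inside a Script Component; the input column name LongText and the output path are placeholders, and the pipeline column is assumed to be DT_NTEXT (i.e. UTF-16 bytes):
using System.IO;
using System.Text;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    private StreamWriter sw;

    public override void PreExecute()
    {
        base.PreExecute();
        // Write the output file as UTF-8 so the French characters survive
        sw = new StreamWriter(@"C:\temp\export.csv", false, Encoding.UTF8);
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (!Row.LongText_IsNull)
        {
            // DT_NTEXT reaches the script as a blob of UTF-16 bytes
            byte[] bytes = Row.LongText.GetBlobData(0, (int)Row.LongText.Length);
            sw.WriteLine(Encoding.Unicode.GetString(bytes));
        }
    }

    public override void PostExecute()
    {
        sw.Close();
        base.PostExecute();
    }
}
Writing directly from the script avoids pushing the converted text back into a DT_WSTR column, which is capped at 4000 characters.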
I ended up doing just what Samyne said and used a script.
First I modified my SQL stored procedure: instead of returning several columns, I put all the info into one single column, as follows:
Select Column1 + '^' + Column2 + '^' + Column3 ...
Then I used this code in a script
// Inside the SSIS Script Task's Main method
string fileName = Dts.Variables["SLTemplateFilePath"].Value.ToString();

using (var stream = new FileStream(fileName, FileMode.Truncate))
{
    // StreamWriter with UTF-8 encoding so the French characters survive
    using (var sw = new StreamWriter(stream, Encoding.UTF8))
    {
        // Load the recordset stored in the FileData object variable into a DataTable
        OleDbDataAdapter oleDA = new OleDbDataAdapter();
        DataTable dt = new DataTable();
        oleDA.Fill(dt, Dts.Variables["FileData"].Value);

        foreach (DataRow row in dt.Rows)
        {
            foreach (DataColumn column in dt.Columns)
            {
                sw.WriteLine(row[column]);
            }
        }
        sw.WriteLine();
    }
}
Putting all the info into one column is optional; I just wanted to avoid handling it in the script. This way, if my stored procedure changes, I don't need to modify the SSIS package.

How to import comma delimited text file into datawindow (powerbuilder 11.5)

Hi, good day. I'm very new to PowerBuilder and I'm using PB 11.5.
Does anyone know how to import a comma-delimited text file into a DataWindow?
Example text file:
"1234","20141011","Juan, Delacruz","Usa","001992345456"...
"12345","20141011","Arc, Ino","Newyork","005765753256"...
How can I import the third column, which is the full name, and the last column, which is the account number? I want to transfer the name and account number into my external DataWindow. I've tried to use ImportString(), but all the rows end up in one column only. My external DataWindow has three fields, including the Name and Account number.
Here's the code:
ls_File = dw_2.Object.file_name[1]
li_FileHandle = FileOpen(ls_File)
li_FileRead = FileRead(li_FileHandle, ls_Text)
DO WHILE li_FileRead > 0
    li_Count ++
    li_FileRead = FileRead(li_FileHandle, ls_Text)
    ll_row = dw_1.ImportString(ls_Text,1)
LOOP
Please help me with the code! Thank You
It seems that PB expects a tab-separated file by default (even though the 'c' in 'csv' stands for 'comma'...).
Add the csv! enumerated value to the arguments of ImportString() and it should fix the problem (it does on my test box).
Also, the columns defined in your dataobject must match the columns in the csv file (at least for the first columns you are interested in). If there are more columns in the csv file, they will be ignored. But if you want to get the 1st (or 2nd) and 3rd columns, you need to define the first 3 columns. You can always hide column #1 or #2 if you do not need it.
BTW, your code has some issues:
You should always test the return values of function calls like FileOpen() so you can stop processing if the file does not exist or cannot be read.
You are reading the first row of the text file twice: once before the WHILE and again inside the loop. Or maybe that is intended, to skip a first line with column headers?
FWIW, here is working code based on yours:
string ls_file = "c:\dev\powerbuilder\experiment\data.csv"
string ls_text
int li_FileHandle, li_fileread, li_count
long ll_row

li_FileHandle = FileOpen(ls_File)
if li_FileHandle < 1 then
    return
end if

li_FileRead = FileRead(li_FileHandle, ls_Text)
DO WHILE li_FileRead > 0
    li_Count ++
    ll_row = dw_1.ImportString(csv!,ls_Text,1)
    li_FileRead = FileRead(li_FileHandle, ls_Text) //read next line
LOOP
fileclose(li_fileHandle)
Alternatively, use the datawindow_name.ImportFile(CSV!, file_path) method.

Problem using oledb datatypes to write data to excel sheet

I am trying to insert some data obtained from a MySQL DB into an Excel sheet using an OleDb data adapter. This data contains very long texts whose data types in MySQL are defined as Varchar(1023), Text, Longtext, etc. When passing these to the OleDb data adapter I tried to use OleDbType.VarWChar and OleDbType.LongVarWChar with size 5000 and so on, but I get the following exception when I run the da.Update(...) command:
The field is too small to accept the amount of data you attempted to add. Try inserting or pasting less data
I am having trouble understanding which OleDb data types, and what sizes, I should use to map these long text values.
Could someone please help me with this?
Thanks.
I am doing something similar and ran into the same error with varchar(max) data types that come from SQL Server. It doesn't matter where the data is coming from, though. When you get the data from your database, you need to define the schema for the column data types and sizes. I do this by calling FillSchema on the data adapter that I am using:
DataTable dt = new DataTable();
SqlDataAdapter da = new SqlDataAdapter(cmd);
da.Fill(dt);
da.FillSchema(dt, SchemaType.Source);
return dt;
You could also set the column properties individually, if you wanted.
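For example, a minimal sketch of setting those properties by hand instead of calling FillSchema (assuming a string column; the column name "Notes" is just a placeholder):
// Manually declare the size so the export code below picks it up
dt.Columns["Notes"].MaxLength = 8000;
dt.Columns["Notes"].AllowDBNull = true;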
Then I loop through each column in my DataTable and set up my columns for export with OleDb using ADOX.NET. You don't have to use ADOX.NET; the main concept here is to use the sizes that came from the original database.
foreach (DataColumn col in dt.Columns)
{
    columnList.Append(dt.TableName + "." + col.ColumnName + ", ");

    ADOX.Column adoxCol = new Column();
    adoxCol.ParentCatalog = cat;
    adoxCol.Name = col.ColumnName;
    adoxCol.Type = TranslateType(col.DataType, col.MaxLength);

    int size = col.MaxLength > 0 ? col.MaxLength : 0;
    if (col.AllowDBNull)
    {
        adoxCol.Attributes = ColumnAttributesEnum.adColNullable;
    }
    adoxTable.Columns.Append(adoxCol, adoxCol.Type, size);
}
Here is a snippet from my TranslateType method that determines whether to use LongVarWChar or VarWChar. These data types are the ADOX.NET versions of the OleDb data types. I believe that anything over 4000 characters should use the LongVarWChar type, but I'm not sure about that. You didn't mention which version of Excel is your target, but I have this working with both Excel 2003 and Excel 2007.
case "System.String":
if (maxLength > 4000)
{
return DataTypeEnum.adLongVarWChar;
}
return DataTypeEnum.adVarWChar;
LongVarWChar can take large sizes, accommodating up to 2 GB, so don't worry about making the size too big.
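For context, here is a sketch of how that snippet might sit inside a complete TranslateType method; only the System.String branch comes from the code above, and the other mappings are placeholders:
private DataTypeEnum TranslateType(Type dotNetType, int maxLength)
{
    switch (dotNetType.FullName)
    {
        case "System.String":
            // Long text needs the memo-style type; shorter text fits in VarWChar
            if (maxLength > 4000)
            {
                return DataTypeEnum.adLongVarWChar;
            }
            return DataTypeEnum.adVarWChar;
        case "System.Int32":
            return DataTypeEnum.adInteger;   // placeholder mapping
        case "System.DateTime":
            return DataTypeEnum.adDate;      // placeholder mapping
        default:
            return DataTypeEnum.adVarWChar;  // fall back to text for anything else
    }
}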