Unable to read CSV file with double quotes in cell value - csv

I'm trying to read a CSV file with lines like this:
"A text";"Another text";"A text with ""quotes"""
In my Flat File connection, I set the Text qualifier to ".
When I click on the Preview button, the lines are shown properly: A text with ""quotes"" (shouldn't it show only one double quote, by the way?)
But as soon as I try to execute the package, an error occurs saying that the column delimiter cannot be found:
[Flat File Source [1313]] Error: The column delimiter for column "COL3" was not found.
If I remove those double double-quotes within the cell value it works fine.
Is there any way to make SSIS read cells that contain double quotes?
For the same data, you can see how 2008 versus 2012 will preview the data. Observe that Col2 either does, or does not escape the double quote (A text with "quotes" vs A text with ""quotes"")
The result of using the 2008 version is that it will fail with the following error messages
The column delimiter for column "Col2" was not found.
An error occurred while processing file "c:\ssisdata\so\input\so_36033443.txt" on data row 1.
A reproduction of the problem using Biml follows
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<FlatFileConnection
FilePath="c:\ssisdata\so\input\so_36033443.txt"
FileFormat="FFF_36033443"
Name="FFSRC" />
</Connections>
<FileFormats>
<FlatFileFormat
Name="FFF_36033443"
IsUnicode="false"
HeaderRowDelimiter=";"
CodePage="1252"
TextQualifer="&quot;"
>
<Columns>
<Column Name="Col0" DataType="AnsiString" Length="10" Delimiter=";" CodePage="1252"/>
<Column Name="Col1" DataType="AnsiString" Length="20" Delimiter=";" CodePage="1252"/>
<Column Name="Col2" DataType="AnsiString" Length="20" Delimiter="CRLF" CodePage="1252"/>
</Columns>
</FlatFileFormat>
</FileFormats>
<Packages>
<Package Name="so_36033443">
<Tasks>
<Dataflow Name="DFT Demo Delimiter">
<Transformations>
<FlatFileSource
ConnectionName="FFSRC"
Name="FFSRC so_36033443" />
<DerivedColumns Name="DER Placeholder" />
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>
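As a cross-check outside SSIS, here is a minimal Python sketch of how a parser that honours doubled quotes reads the sample line from the question. It is not part of the SSIS fix; the names SAMPLE and parse_semicolon_csv are made up for illustration, and it only assumes the semicolon delimiter and " qualifier described above.

```python
import csv
import io

# The sample line from the question, with a Windows line ending.
SAMPLE = '"A text";"Another text";"A text with ""quotes"""\r\n'


def parse_semicolon_csv(text):
    """Parse semicolon-delimited text where "" inside a quoted field means one literal quote."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=";",
        quotechar='"',
        doublequote=True,  # treat "" inside a qualified field as an escaped "
    )
    return list(reader)


rows = parse_semicolon_csv(SAMPLE)
print(rows[0][2])  # A text with "quotes"
```

A reader that unescapes the doubled quote yields three columns here, which matches the preview that shows A text with "quotes".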

Related

Liquibase CSV loadData fails with quoted string containing a comma

I am trying to load CSV file into SQLserver table using Liquibase change log set.
When I save the XLSX file as a CSV file, any value containing a comma is wrapped in double quotes (see the third value below). This is fine as per the standards, but Liquibase ignores the double quotes and treats the comma inside them as a column separator.
13,OV,"Diabetes outpatient self-management training services individual,per 30 minutes",77.82,1,0,1/4/2016,,G0108
Error message from the command line terminal:
CSV file v2.1/r21/TestData20212021.csv Line 21 has 10 values defined, Header has 9. Numbers MUST be equal (check for unquoted string with embedded commas)
<changeSet author="sprint-developer" id="sprint1-09">
<loadData
file="v2.1/r21/TestData2021.csv"
tableName = "tbl_Votes" encoding="UTF-8" >
<column header="VcenarioID" name="VcenarioID" type="numeric"/>
<column header="venefitCode" name="venefitCode" type="string"/>
<column header="KostDescription" name="KostDescription" type="string"/>
<column header="Kost" name="Kost" type="NUMERIC"/>
<column header="OcKurrences" name="OKcurrences" type="numeric"/>
<column header="KostIsPerIncident" name="KostIsPerIncident" type="boolean"/>
<column header="KostDate" name="KostDate" type="date"/>
<column header="VundleId" name="VundleId" type="NUMERIC"/>
<column header="VillingCode" name="VillingCode" type="string"/>
</loadData>
<rollback>Delete from tbl_Votes where VcenarioID=13 </rollback>
</changeSet>
Try adding quotchar='"' to the loadData element in your changeSet. This should tell Liquibase to treat everything inside "" as one single value.
Check out loadData docs.
So your changeSet could look like this:
<changeSet author="sprint-developer" id="sprint1-09">
<loadData
file="v2.1/r21/TestData2021.csv"
tableName = "tbl_Votes" encoding="UTF-8" quotchar='"'>
<column header="VcenarioID" name="VcenarioID" type="numeric"/>
<column header="venefitCode" name="venefitCode" type="string"/>
<column header="KostDescription" name="KostDescription" type="string"/>
<column header="Kost" name="Kost" type="NUMERIC"/>
<column header="OcKurrences" name="OKcurrences" type="numeric"/>
<column header="KostIsPerIncident" name="KostIsPerIncident" type="boolean"/>
<column header="KostDate" name="KostDate" type="date"/>
<column header="VundleId" name="VundleId" type="NUMERIC"/>
<column header="VillingCode" name="VillingCode" type="string"/>
</loadData>
<rollback>Delete from tbl_Votes where VcenarioID=13 </rollback>
</changeSet>
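To see why the error reports 10 values against a 9-column header, here is a rough Python sketch that counts the fields in the sample row both ways. This only illustrates the quoting rule; it says nothing about how Liquibase parses the file internally, and the variable names are invented for the example.

```python
import csv
import io

# The sample row from the question.
ROW = ('13,OV,"Diabetes outpatient self-management training services '
       'individual,per 30 minutes",77.82,1,0,1/4/2016,,G0108')

# Ignoring the quote character: every comma splits, including the one inside the quotes.
naive_values = ROW.split(",")

# Honouring the quote character: the quoted description stays in one piece.
quoted_values = next(csv.reader(io.StringIO(ROW), delimiter=",", quotechar='"'))

print(len(naive_values), len(quoted_values))  # 10 9
```

The naive split gives the 10 values from the error message, while the quote-aware read gives the 9 that match the header.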

Biml Master-Child package connections

In a 2008 BIDS/SQL Server/SSIS dev environment (along with BIDS Helper v1.70), I'm trying to create a biml master package that executes the child packages already built under Rootnode. Also using a config file to be able to run the entire process on different servers.
Config File
<?xml version="1.0"?>
<DTSConfiguration>
<DTSConfigurationHeading><DTSConfigurationFileInfo GeneratedBy="XXXXXX" GeneratedDate="7/28/2016 1:28:29 PM"/></DTSConfigurationHeading>
<Configuration ConfiguredType="Property" Path="\Package.Connections[dw].Properties[ConnectionString]" ValueType="String">
<ConfiguredValue>Data Source=imsqldv50s\euc;Initial Catalog=CDODW;Provider=SQLNCLI10.1;Integrated Security=SSPI;</ConfiguredValue>
</Configuration>
<Configuration ConfiguredType="Property" Path="\Package.Connections[PkgFile].Properties[ConnectionString]" ValueType="String">
<ConfiguredValue>\\IMSQLDV50s\EUCPACKAGES\CDODW\Load CDODW Tables Biml\</ConfiguredValue>
</Configuration>
<Configuration ConfiguredType="Property" Path="\Package.Variables[User::ChildPackagePath].Properties[Value]" ValueType="String">
<ConfiguredValue>\\IMSQLDV50s\EUCPACKAGES\CDODW\Load CDODW Tables Biml\</ConfiguredValue>
</Configuration>
</DTSConfiguration>
The building of the child packages has been tested and passed. Now we are attempting to build the Master Package.
05-load-edw-master.biml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Packages>
<Package Name ="Master EDW Load" ConstraintMode ="Linear">
<PackageConfigurations>
<PackageConfiguration Name="dw">
<ExternalFileInput ExternalFilePath="C:\SSISConfig\CDODW_ETL_Load.dtsConfig" />
<ConfigurationValues>
<ConfigurationValue DataType="String" Name="dw" PropertyPath="\Package.Connections[dw].Properties[ConnectionString]" Value=""></ConfigurationValue>
</ConfigurationValues>
</PackageConfiguration>
<PackageConfiguration Name="PkgFile">
<ExternalFileInput ExternalFilePath="C:\SSISConfig\CDODW_ETL_Load.dtsConfig" />
<ConfigurationValues>
<ConfigurationValue DataType="String" Name="PkgFile" PropertyPath="\Package.Connections[PkgFile].Properties[ConnectionString]" Value=""></ConfigurationValue>
</ConfigurationValues>
</PackageConfiguration>
</PackageConfigurations>
<#=CallBimlScript("cbs-pkg-params-variables.biml", "No Table", "master-load", "","","")#>
<!--
<Connections>
<Connection ConnectionName="dw" />
<Connection ConnectionName="PkgFile">
<Expressions>
<Expression PropertyName="PkgFile.ConnectionString">@[User::ChildPackagePath]</Expression>
</Expressions>
</Connection>
</Connections>
-->
<Tasks>
<#=CallBimlScript("cbs-sql-audit-begin.biml", "master-load")#>
<Container Name="SEQ Load Dimensions" ConstraintMode="Linear">
<Tasks>
<# foreach (var package in RootNode.Packages.Where(pkg => pkg.GetTag("type")=="load-edw-dim").OrderBy(pkg => pkg.GetTag("LoadOrder"))) { #>
<ExecutePackage Name="EP <#=package.Name#>" DelayValidation="true">
<Package PackageName="<#=package.Name #>" />
</ExecutePackage>
<# } #>
</Tasks>
</Container>
<Container Name="SEQ Load Facts" ConstraintMode="Linear">
<Tasks>
<# foreach (var package in RootNode.Packages.Where(pkg => pkg.GetTag("type")=="load-edw-fact").OrderBy(pkg => pkg.GetTag("LoadOrder"))) { #>
<ExecutePackage Name="EP <#=package.Name #>" DelayValidation="true">
<Package PackageName="<#=package.Name #>" />
</ExecutePackage>
<# } #>
</Tasks>
</Container>
<#=CallBimlScript("cbs-sql-audit-end.biml")#>
</Tasks>
<Annotations>
<Annotation AnnotationType="Tag" Tag="type">master-load</Annotation>
</Annotations>
</Package>
</Packages>
</Biml>
<#@ template language="C#" tier="5" #>
The Connections have already been defined in a tier 1 file
After generating the packages, I note that the Biml engine creates Connection Managers using a convention "_" + Master Package Name.SequenceContainerName.ExecutePackageName, with a connection string pointing to the local file path. And it's doing so "under the covers" as there is no clue in the expanded biml file of how it's done!
Is there a nice simple way to interject a passed-in file path from the config file that can be recognized and used to build each FileConnection's data? I thought it would make sense to store the relevant file location in a variable (fed from the config file) and somehow use that to develop the ConnectionString from the package name garnered from the foreach snippet, but the engine doesn't appear to like that.
Any help is appreciated.
Thanks!

SSIS BIML Derived Column syntax for expressions

I am defining a Derived Column transformation in BIML but I am having trouble referencing the output from the previous Excel Source in my Derived Column transformation.
I receive the error below when opening the package after successfully generating it; it suggests that the Derived Column transformation cannot find the output from the Excel Source.
Error 2 Error loading AFR_ShareTableBIML.dtsx: The object
"/DTS:Executable/DTS:Executables/DTS:Executable/DTS:ObjectData/pipeline/components/component/inputs/input/inputColumns/inputColumn/properties/property"
references ID "#{Package\Data Flow {Import Share Table CSV}\Source
{Flat File Share Table}.Outputs[Output].Columns[Div c per share]}",
but no object in the package has this ID.
Here is a code snippet:
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<FileFormats>
<FlatFileFormat Name="FFF_AFRShareTable" ColumnNamesInFirstDataRow="true"
FlatFileType="Delimited" IsUnicode="false" TextQualifer="None" HeaderRowsToSkip="6">
<Columns>
<Column Name="Quote Buy" ColumnType="Delimited" DataType="AnsiString" Length ="50" Delimiter=","></Column>
<Column Name="Quote Sell" ColumnType="Delimited" DataType="AnsiString" Length ="50" Delimiter=","></Column>
<Column Name="Div c per share" ColumnType="Delimited" DataType="AnsiString" Length ="50" Delimiter=","></Column>
</Columns>
</FlatFileFormat>
</FileFormats>
<Connections>
<FlatFileConnection Name="FF_AFRShareTable" FileFormat="FFF_AFRShareTable"
FilePath="C:\Temp\Stocks.csv"></FlatFileConnection>
<OleDbConnection Name="CMD DB"
ConnectionString="Data Source=Localhost;Initial Catalog=DB;Provider=SQLNCLI11.1;Integrated Security=SSPI;" CreateInProject="true">
</OleDbConnection>
</Connections>
<Packages>
<Package Name="AFR_ShareTableBIML" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive">
<Tasks>
<ExecuteSQL Name="SQLTask {OLE_DB} Truncate Security Share Table" ConnectionName="CMD DB">
<DirectInput>truncate table Staging.SecurityShareTable</DirectInput>
</ExecuteSQL>
<Dataflow Name="Data Flow {Import Share Table CSV}">
<Transformations>
<FlatFileSource Name="Source {Flat File Share Table}" ConnectionName="FF_AFRShareTable"></FlatFileSource>
<DerivedColumns Name="DER_NullifyColumns">
<Columns>
<Column Name ="DER_DPS" DataType = "Decimal" Precision="4">
[Div c per share] == "-" ? NULL(DT_DECIMAL, 4) : (DT_DECIMAL, 4)[Div c per share]
</Column>
</Columns>
</DerivedColumns>
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>
I have already defined the column name via the FlatFileFormat, and I have confirmed that the expression in the DER_DPS column is syntactically correct. I found that by replacing the square brackets "[" and "]" with double quotes, the SSIS package can be opened. For example:
"Div c per share" == "-" ? NULL(DT_DECIMAL, 4) : (DT_DECIMAL, 4) "Div c per share"
However there are derived column transformation errors on incorrect syntax. Are square brackets special characters in BIML that I need to escape?
That was ... interesting.
It appears that your use of curly braces in your component names causes the Biml expansion to go haywire.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<FileFormats>
<FlatFileFormat Name="FFF_AFRShareTable" ColumnNamesInFirstDataRow="true"
FlatFileType="Delimited" IsUnicode="false" TextQualifer="None" HeaderRowsToSkip="6">
<Columns>
<Column Name="Quote Buy" ColumnType="Delimited" DataType="AnsiString" Length ="50" Delimiter=","></Column>
<Column Name="Quote Sell" ColumnType="Delimited" DataType="AnsiString" Length ="50" Delimiter=","></Column>
<!-- Change -->
<Column Name="Div c per share" ColumnType="Delimited" DataType="AnsiString" Length ="50" Delimiter="CRLF"></Column>
</Columns>
</FlatFileFormat>
</FileFormats>
<Connections>
<FlatFileConnection Name="FF_AFRShareTable" FileFormat="FFF_AFRShareTable"
FilePath="C:\ssisdata\so\input\Stocks.csv"></FlatFileConnection>
<OleDbConnection Name="CMD DB"
ConnectionString="Data Source=Localhost\dev2014;Initial Catalog=tempdb;Provider=SQLNCLI11.1;Integrated Security=SSPI;"
CreateInProject="false">
</OleDbConnection>
</Connections>
<Packages>
<Package Name="so_37641290_AFR_ShareTableBIML" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive">
<Tasks>
<ExecuteSQL Name="SQLTask OLE_DB Truncate Security Share Table" ConnectionName="CMD DB">
<DirectInput>truncate table Staging.SecurityShareTable</DirectInput>
</ExecuteSQL>
<Dataflow Name="Data Flow Import Share Table CSV">
<Transformations>
<FlatFileSource Name="Source Flat File Share Table" ConnectionName="FF_AFRShareTable"></FlatFileSource>
<DerivedColumns Name="DER_NullifyColumns">
<Columns>
<Column Name="DER_DPS" DataType="Decimal" Precision="4"><![CDATA[[Div c per share] == "-" ? NULL(DT_DECIMAL, 4) : (DT_DECIMAL, 4)[Div c per share]]]></Column>
</Columns>
</DerivedColumns>
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>
The above biml works for me. Changes I made:
removed { and } from the tasks and component names
updated the last Column definition within your FlatFileFormat Columns collection to have a delimiter of CRLF instead of ,
I used the CDATA tag for the expression. Not needed here, but if you had a > or < in there, then you'd need to either escape them as &gt; and &lt; or use the CDATA approach as I did.
I also cleaned up the Derived Column's attribute assignments. There were spaces around the equals signs and I don't believe those are supposed to be there.
Path updates for flat file + OLE DB to work with my setup.
Source data
0
1
2
3
4
5
Quote Buy,Quote Sell,Div c per share
1,1,1
2,2,2
3,3,-
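For comparison, the logic of the DER_DPS expression can be sketched in Python against the same source data. This is only an illustration of the conditional, not SSIS code: nullify_dash and read_dps are hypothetical names, and Python's Decimal stands in loosely for DT_DECIMAL without enforcing the scale of 4.

```python
import csv
import io
from decimal import Decimal

# Six leading rows to skip, then the header and data rows, as in the source data above.
SAMPLE = (
    "0\n1\n2\n3\n4\n5\n"
    "Quote Buy,Quote Sell,Div c per share\n"
    "1,1,1\n"
    "2,2,2\n"
    "3,3,-\n"
)


def nullify_dash(value):
    """Mirror of: [col] == "-" ? NULL(DT_DECIMAL, 4) : (DT_DECIMAL, 4)[col]."""
    return None if value == "-" else Decimal(value)


def read_dps(text, rows_to_skip=6):
    """Skip the leading rows (like HeaderRowsToSkip="6"), then derive DER_DPS per data row."""
    lines = io.StringIO(text)
    for _ in range(rows_to_skip):
        next(lines)
    reader = csv.DictReader(lines)  # first remaining line supplies the column names
    return [nullify_dash(row["Div c per share"]) for row in reader]


dps = read_dps(SAMPLE)
print(dps)  # [Decimal('1'), Decimal('2'), None]
```

The dash in the last data row becomes a null, and the other two values are converted to decimals.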
Results

FuzzyLookup in BIML

I'm trying to do the following in BIML:
I'm at a bit of a loss on how to do this in BIML. Here is what I've tried:
<FuzzyLookup
Name="Fuzzy Lookup"
ConnectionName="WO7"
Exhaustive="true"
AutoPassThroughInputColumns="true"
>
<ExternalReferenceTableInput Table="map.AgencyWO7" />
<Inputs>
<Column SourceColumn="AgencyName" TargetColumn="AgencyName" />
</Inputs>
<Outputs>
<Column SourceColumn="AgencyId" TargetColumn="AgencyIdWO7" />
<Column SourceColumn="AgencyName" TargetColumn="AgencyNameWO7" />
</Outputs>
</FuzzyLookup>
The result is the following error:
(-1,-1) : Error 5 : The input column for the
Fuzzy Lookup Fuzzy Lookup references external column that cannot be found in the reference table. Verify that the
input mapping references a valid column in the reference table.
Property TargetColumn. EmitSsis. There were errors during compilation.
See compiler output for more information.
I think you are maybe missing a reference to the previous transform which is effectively the joining arrow, had you been using SSDT.
Also the format I use to set passthrough = true is on a per column basis.
<FuzzyLookup Name="Fuzzy Lookup" MatchIndexName="" ConnectionName="WO7">
<InputPath OutputPathName="[Previous Transform Name].Output" />
<ExternalReferenceTableInput Table="map.AgencyWO7" />
<Inputs>
<Column MinSimilarity="85" MatchTypeExact="true" PassThrough="true" SourceColumn="AgencyName" TargetColumn="AgencyName" />
</Inputs>
<Outputs>
<Column SourceColumn="AgencyId" TargetColumn="AgencyIdWO7" />
<Column SourceColumn="AgencyName" TargetColumn="AgencyNameWO7" />
</Outputs>
</FuzzyLookup>
Try the above code, and if all else fails you can design the fuzzy look up in SSDT and then import it into biml using Mist/BimlStudio which is pretty reliable.
https://varigence.com/Mist
Cheers

Bulk Import task with text qualifier

I'm using an SSIS Bulk Import task, with row and column delimiters defined, to import a CSV file into a SQL table. After the import, the values come out as follows:
Col1 col2 col3
"XXX" "BBN" "BBB"
"XXX" "BBN" "BBB"
"XXX" "BBN" "BBB"
The values are still wrapped in double quotes.
How can I fix this?
The connection manager for the CSV file has the text qualifier set to " and
the header row delimiter set to {CR}{LF}.
Unfortunately, bulk import doesn't have a text qualifier option.
Here's a way to set up an XML format file to handle CSV with double quotes:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="0" xsi:type="CharFixed" LENGTH="1"/>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR='","' MAX_LENGTH="12"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR='","' MAX_LENGTH="38"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR='","' MAX_LENGTH="50" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR='","' MAX_LENGTH="11" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="SequenceID" xsi:type="SQLINT"/>
<COLUMN SOURCE="2" NAME="TransactionID" xsi:type="SQLUNIQUEID"/>
<COLUMN SOURCE="3" NAME="Product_Name" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="4" NAME="Product_Code" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
If you want to handle pipe-delimited instead, change '","' to '"|"'
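The idea behind the format file (consume the opening quote as a one-character field, then treat quote + separator + quote as the field terminator) can be sketched in Python as follows. This is only a rough illustration, not how bcp itself works: split_quoted_row is a hypothetical helper, and it assumes every field is quoted and that the closing quote sits right before the line ending.

```python
def split_quoted_row(line, separator=","):
    """Split a fully quoted row by treating quote+separator+quote as the terminator."""
    body = line.rstrip("\r\n")
    if len(body) < 2 or not (body.startswith('"') and body.endswith('"')):
        raise ValueError("expected every field to be wrapped in double quotes")
    inner = body[1:-1]  # drop the opening quote (like FIELD 0) and the closing quote
    return inner.split('"' + separator + '"')


print(split_quoted_row('"XXX","BBN","BBB"\r\n'))   # ['XXX', 'BBN', 'BBB']
print(split_quoted_row('"XXX"|"BBN"|"BBB"', "|"))  # ['XXX', 'BBN', 'BBB']
```

Like the terminator trick itself, this does not cope with unquoted fields or with doubled quotes inside a value.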