How to manage giant fixed-width file in SSIS? - ssis

I have a fixed width file that is about 1200 characters wide and has about 300+ columns. I'm looking for a way to create a fixed-width data source in SSIS without using the UI for the flat file connection manager. Is there a way to modify the column definitions without having to use the UI in SSIS? I can't find a file for the data connection anywhere in the project.
Am I doomed to manually add 300+ columns into the flat-file connection manager one by one?

Two options come to mind. The first is to install BIDS Helper and use its Create Fixed Width Columns feature.
The other, as @ElectricLlama mentioned, is to use Biml. This too requires BIDS Helper, but only to convert a .biml file into a .dtsx package.
Short walkthrough
The Biml below approximates what you want: it creates a package with a flat file connection manager (with a single column), adds a data flow, and inside that data flow consumes the flat file and wires it up to a Row Count. Just fill in the XML in the Columns tag.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<FlatFileConnection
Name="FF dchess"
FileFormat="FFF dchess"
FilePath="C:\ssisdata\SO\Input\dchess.txt"
/>
</Connections>
<FileFormats>
<FlatFileFormat
Name="FFF dchess"
CodePage="1252"
RowDelimiter="CRLF"
IsUnicode="false"
FlatFileType="RaggedRight"
>
<Columns>
<Column Name="MyColumn" Length="08" DataType="AnsiString" ColumnType="FixedWidth" CodePage="1252" />
</Columns>
</FlatFileFormat>
</FileFormats>
<Packages>
<Package Name="dchess" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive">
<Connections >
<Connection ConnectionName="FF dchess" />
</Connections>
<Variables>
<Variable Name="CurrentFileName" DataType="String">C:\ssisdata\so\Input\dchess.txt</Variable>
<Variable Name="RowCountInsert" DataType="Int32">0</Variable>
</Variables>
<Tasks>
<Dataflow Name="DFT Load file" >
<Transformations>
<FlatFileSource
Name="FF_SRC dchess"
ConnectionName="FF dchess"
RetainNulls="true">
</FlatFileSource>
<RowCount Name="CNT Source" VariableName="User.RowCountInsert"></RowCount>
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>
Generated package looks like
Feel free to pick your jaw up off the ground ;)
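With 300+ columns, the Columns tag does not have to be typed by hand. As a minimal sketch, assuming the record layout is available as a list of (name, width) pairs, a short Python script can print the Column lines to paste into the Biml above. The `biml_columns` helper and the sample layout are hypothetical; the attributes simply mirror the single column shown in the answer.

```python
from xml.sax.saxutils import quoteattr


def biml_columns(layout, code_page=1252):
    """Return one Biml <Column> line per (name, width) pair in the layout."""
    lines = []
    for name, width in layout:
        if width <= 0:
            raise ValueError(f"column {name!r} needs a positive width")
        # quoteattr escapes the name and wraps it in quotes for XML.
        lines.append(
            f'<Column Name={quoteattr(name)} Length="{width}" '
            f'DataType="AnsiString" ColumnType="FixedWidth" CodePage="{code_page}" />'
        )
    return lines


# Hypothetical layout; replace with the real 300+ column specification.
layout = [("CustomerId", 8), ("FirstName", 30), ("LastName", 30)]
print("\n".join(biml_columns(layout)))
```

Paste the printed lines between the Columns tags of the FlatFileFormat. The sketch treats every column identically, so any column that needs a different data type or column type still has to be adjusted by hand.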

Related

Referencing Project Parameters in BIML

I've been using Catherine W's post on creating project parameters in BIML with some luck. What I'm having a problem with though is setting the expression of a local parameter equal to the project parameter. It's most likely just an XML formatting issue, but I haven't found any examples of it out on the web and have not figured it out on my own yet. So, any suggestions would be most helpful.
Here's the definition of my project parameters that is in my environment BIML file.
<Projects>
<PackageProject Name="ProjParams">
<Parameters>
<Parameter Name="AgentJobName" DataType="String"></Parameter>
<Parameter Name="LoadType" DataType="String">Full</Parameter>
</Parameters>
</PackageProject>
</Projects>
Then under Packages \ Package I have the Variables. I am defining a user variable named LoadType and setting it to the package parameter LoadType in an expression. (Something in the package wouldn't accept package parameters, so I had to create a user variable.) I know the reference to @[$Package::LoadType] is incorrect, but that's what I'm trying to figure out. What should it be to get Biml to put in a package parameter?
<Variables>
<Variable EvaluateAsExpression="true" DataType="String" IncludeInDebugDump="Exclude" Name="LoadType">@[$Package::LoadType]</Variable>
Thanks everyone!
It's working for me
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Projects>
<PackageProject Name="so">
<Parameters>
<Parameter DataType="String" Name="ProjectParameter" >Demo0</Parameter>
</Parameters>
<Packages>
<Package PackageName="so_43721322" />
</Packages>
</PackageProject>
</Projects>
<Packages>
<Package Name="so_43721322">
<Parameters>
<Parameter DataType="String" Name="PackageParameter">Demo1</Parameter>
</Parameters>
<Variables>
<Variable Name="PackageParameter" DataType="String" EvaluateAsExpression="true">@[$Package::PackageParameter]</Variable>
<Variable Name="ProjectParameter" DataType="String" EvaluateAsExpression="true">@[$Project::ProjectParameter]</Variable>
</Variables>
</Package>
</Packages>
</Biml>
I create a project-level and a package-level parameter, then create two variables within my package, each referencing one of the parameters (@[$Project::ProjectParameter] and @[$Package::PackageParameter]).
Am I missing some nuance?

How to skip specific Execution Plan Steps?

The TaskBlockService has a POST call for skipping one or more steps. There is no good example of how the posted XML (a List of String) should express the paths of the steps to skip.
I tried the following content for the POSTed data:
curl -X POST https://xldeploy.company.com/deployit/tasks/v2/5e917094-d054-4cc7-940e-89d851ca225a/skip
File remove-steps.xml content - sample 1:
<list>
<string>0_1_1</string>
</list>
File remove-steps.xml content - sample 2:
<list>
<string>0-1-1</string>
</list>
The first format you list is right, but you have to make sure you're using a step path and not just the path to a block.
Let's say you get the blocks from your deployment plan with this call:
curl -uadmin:password http://localhost:4516/deployit/tasks/v2/28830810-5104-4ab9-9826-22f66dee265d
This will produce the result:
<task id="28830810-5104-4ab9-9826-22f66dee265d" failures="0" state="PENDING" owner="admin">
<description>Initial deployment of Environments/local/TestApp001</description>
<activeBlocks/>
<metadata>
<environment>local</environment>
<taskType>INITIAL</taskType>
<environment_id>Environments/local</environment_id>
<application>TestApp001</application>
<version>1.0</version>
</metadata>
<block id="0" state="PENDING" description="" root="true">
<block id="0_1" state="PENDING" description="Deploy" phase="true">
<block id="0_1_1" state="PENDING" description="Deploy TestApp001 1.0 on environment local"/>
</block>
</block>
<dependencies/>
If you want to see the steps in block 0_1_1, you can use this REST call:
curl -uadmin:password http://local6/deployit/tasks/v2/28830810-5104-4ab9-9826-22f66dee265d/block/0_1_1/step
<block id="0_1_1" state="PENDING" description="Deploy TestApp001 1.0 on environment local" current="0">
<step failures="0" state="PENDING" description="Execute Command"/>
<step failures="0" state="PENDING" description="Copy File001.txt to Infrastructure/localhost"/>
The steps are numbered within the block, starting from 1. So if you want to skip the step "Copy File001.txt to Infrastructure/localhost", the step path is 0_1_1_2. Your XML will look like:
<list>
<string>0_1_1_2</string>
</list>
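The path arithmetic above can be sketched in Python. This is only an illustration: `step_path` and `skip_body` are hypothetical helper names, and the code builds the request body without sending it.

```python
import xml.etree.ElementTree as ET


def step_path(block_id, step_number):
    """Append a 1-based step number to a block id, e.g. 0_1_1 and 2 give 0_1_1_2."""
    if step_number < 1:
        raise ValueError("steps are numbered from 1 within a block")
    return f"{block_id}_{step_number}"


def skip_body(paths):
    """Build the <list><string>...</string></list> body for the skip call."""
    root = ET.Element("list")
    for path in paths:
        ET.SubElement(root, "string").text = path
    return ET.tostring(root, encoding="unicode")


# Second step of block 0_1_1, as in the answer above.
print(skip_body([step_path("0_1_1", 2)]))
# prints <list><string>0_1_1_2</string></list>
```

Passing several paths produces one string element per step, which matches the "List of String" shape described in the question.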

BIML Flat File Format with VARCHAR(MAX) Column

I have so far successfully used Biml to auto-generate SSIS packages (from CSV to SQL Server), but I run into problems wherever I have VARCHAR(MAX) columns in the flat file format.
If I define a column of type AnsiString with a length of -1 in the flat file format, the generated SSIS package shows the warning below:
The metadata of the following output columns does not match the
metadata of the external columns with which the output columns are
associated.
If I click Yes, the problem is fixed by itself, but that would be my last option as I have 150 packages.
When I checked the Advanced options of the Flat File Source component, I could see a difference in data type for the Comments column: External Columns shows DT_TEXT, whereas Output Columns shows DT_STR. :(
What I don't understand is why the output columns show a different data type only for VARCHAR(MAX) when all the others work fine. Aren't the output columns generated from the external columns?
Please see the Biml code below.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<FileFormats>
<FlatFileFormat Name="MetadataFileFormat" RowDelimiter="LF" ColumnNamesInFirstDataRow="true" IsUnicode="false">
<Columns>
<Column Name="Category" DataType="AnsiString" Length="128" Delimiter="|" CodePage="1252" />
<Column Name="Comments" DataType="AnsiString" Length="-1" Delimiter="|" />
<Column Name="DisplayName" DataType="AnsiString" Length="256" Delimiter="CRLF" />
</Columns>
</FlatFileFormat>
</FileFormats>
<Connections>
<FlatFileConnection Name="FF_Test" FilePath="C:\Data\Sample.csv" FileFormat="MetadataFileFormat">
</FlatFileConnection>
</Connections>
<Packages>
<Package Name="FFTest" ConstraintMode="Linear">
<Tasks>
<Dataflow Name="DFT Load Data">
<Transformations>
<FlatFileSource Name="FF_SRC" ConnectionName="FF_Test">
</FlatFileSource>
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>
Within a dataflow a DT_STR is bounded between lengths of 0 to 8000. The Flat File Connection Manager is happy to let you specify a length greater than 8k.
However, when you try to use that in a data flow, the component is going to report that it's not a valid length.
And it makes sense if you know the concepts of how SSIS gets the performance out of data flow. It preallocates memory and does all the transformations in that memory space. How much memory would you allocate for a MAX type? Exactly...
So, you're going to need to use one of the stream data types: DT_TEXT or DT_NTEXT. Those allow for unlimited length strings.
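The rule just described can be written down as a small decision function. This is a sketch, not anything SSIS or Biml provides: `ssis_string_type` is a hypothetical name, -1 stands for a MAX column as in the question, and the 4000-character bound for DT_WSTR is an addition here that the answer does not mention.

```python
def ssis_string_type(length, is_unicode=False):
    """Pick the data flow string type for a column of the given length.

    length == -1 is treated as a MAX column (an assumption taken from
    the question's Biml).
    """
    # DT_STR is bounded at 8000; DT_WSTR at 4000 (not stated in the answer).
    limit = 4000 if is_unicode else 8000
    if length == -1 or length > limit:
        # Too long for a preallocated buffer: use a stream type.
        return "DT_NTEXT" if is_unicode else "DT_TEXT"
    return "DT_WSTR" if is_unicode else "DT_STR"


for length in (128, 8000, 8001, -1):
    print(length, ssis_string_type(length))
```

For the columns in the question, Category (128) and DisplayName (256) stay DT_STR, while Comments (-1) maps to DT_TEXT, which is the type the editor switches to when it repairs the metadata.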
Biml
I'm actually stumped on this; hopefully Scott can chime in. The emitted DTSX will look like the "before" screenshot, with a data type of DT_STR and a length of zero. It runs fine, it just looks bad. When you double-click to let the editor fix it, it changes to DT_TEXT as it should.
I thought it would just be a matter of providing a data type override, as we can in an Execute SQL Task, but to no avail: it's not a property on the Columns collection in the flat file source.
Perhaps this was a situation where I needed to mess with the Dataflow overrides property...
<DataflowOverrides>
<OutputPath OutputPathName="Output">
<Columns>
<Column
ColumnName="Comments"
DataType="AnsiString"
CodePage="1252"
Length="-1"
></Column>
</Columns>
</OutputPath>
</DataflowOverrides>
But no, that gave me no better result.
Fine, I gave up and "cheated" by using Mist/BimlOnline to reverse engineer the corrected package back into Biml.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<FlatFileConnection Name="FF_Test" FilePath="C:\ssisdata\SO\Input\so_35438946.txt" FileFormat="FF_Test" />
</Connections>
<Packages>
<Package Name="so_35438946_re" Language="None" VersionBuild="1" CreatorName="BillFellows" CreatorComputerName="AVATHAR" CreationDate="2016-02-16T13:02:49">
<Tasks>
<Dataflow Name="DFT Load Data">
<Transformations>
<DerivedColumns Name="DER Placeholder">
<InputPath OutputPathName="FF_SRC.Output" />
</DerivedColumns>
<FlatFileSource Name="FF_SRC" LocaleId="None" FileNameColumnName="" ConnectionName="FF_Test" />
</Transformations>
</Dataflow>
</Tasks>
<Connections>
<Connection ConnectionName="FF_Test" />
</Connections>
</Package>
</Packages>
<FileFormats>
<FlatFileFormat Name="FF_Test" CodePage="1252" TextQualifer="_x003C_none_x003E_" ColumnNamesInFirstDataRow="true" RowDelimiter="LF">
<Columns>
<Column Name="Category" Length="128" DataType="AnsiString" Delimiter="VerticalBar" MaximumWidth="128" />
<Column Name="Comments" Length="-1" DataType="AnsiString" Delimiter="VerticalBar" />
<Column Name="DisplayName" Length="256" DataType="AnsiString" Delimiter="CRLF" MaximumWidth="256" />
</Columns>
</FlatFileFormat>
</FileFormats>
</Biml>
And now I simply Generate SSIS package and... Well, I suppose it's progress. Comments is identified as DT_TEXT but I still get the warning.
Deep dive into the dtsx
In the data flow's flat file source, the external metadata collection for this column is defined as follows:
<externalMetadataColumn
codePage="1252"
dataType="str"
name="Comments"
refId="Package\DFT Load Data\FF_SRC.Outputs[Output].ExternalColumns[Comments]"></externalMetadataColumn>
In the one we let the editor adjust, it is
<externalMetadataColumn
refId="Package\DFT Load Data\FF_SRC.Outputs[Output].ExternalColumns[Comments]"
codePage="1252"
dataType="text"
name="Comments" />
and in the one emitted from VS 2013 using the original code, we get
<externalMetadataColumn
codePage="1252"
dataType="str"
name="Comments"
refId="Package\DFT Load Data\FF_SRC.Outputs[Output].ExternalColumns[Comments]">
</externalMetadataColumn>
It might be distasteful, but perhaps a bit of XSLT could find every instance where you have this named column with a data type of str and transform it to text.
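The same post-processing idea can be sketched in Python, swapping in `xml.etree.ElementTree` for the XSLT the answer suggests. `retype_external_columns` is a hypothetical helper and the fragment is a trimmed, made-up stand-in for the real package XML. It has not been run against a full .dtsx, where round-tripping through ElementTree may rewrite namespace prefixes, so treat it as an outline only.

```python
import xml.etree.ElementTree as ET


def retype_external_columns(root, column_name, old="str", new="text"):
    """Change dataType from old to new on matching externalMetadataColumn
    elements and return how many elements were changed."""
    changed = 0
    for col in root.iter("externalMetadataColumn"):
        if col.get("name") == column_name and col.get("dataType") == old:
            col.set("dataType", new)
            changed += 1
    return changed


# Trimmed stand-in for the flat file source metadata (refId omitted).
fragment = """<externalMetadataColumns>
  <externalMetadataColumn codePage="1252" dataType="str" name="Comments" />
  <externalMetadataColumn codePage="1252" dataType="str" name="Category" />
</externalMetadataColumns>"""

root = ET.fromstring(fragment)
print(retype_external_columns(root, "Comments"))  # prints 1
print(ET.tostring(root, encoding="unicode"))
```

Only the Comments column is touched; Category keeps its str type. This sketch covers only the external metadata column shown above.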
I didn't try it, but I found this in the Varigence documentation:
<!-- A Length of -1 will automatically be converted to nvarchar(max)/varchar(max) -->
<Column Name="LongString" DataType="String" Length="-1" />

Phing pdosqlexec Segmentation Fault

When I try to execute this:
<pdosqlexec
url="${pdo_driver}:host=${host};dbname=${db.name};"
userid="${mysql_user}"
password="${mysql_pwd}"
encoding="utf8"
onerror="abort">
<fileset dir="./cache/mysql/">
<include name="dump.sql"/>
</fileset>
<formatter type="plain" outfile="./cache/mysql/output4.txt"/>
</pdosqlexec>
I get a segmentation fault, and the loading of dump.sql is interrupted.
Any solution?
You can try adding the delimitertype="row" option to pdosqlexec. For me, this helps load files larger than 4 MB.
Full example of use:
<pdosqlexec
url="mysql:host=${DB_HOST};dbname=${DB_NAME}"
userid="${DB_USER}"
password="${DB_PASS}"
delimitertype="row">
<transaction src="${DB_SCHEMA_PATH}/notus-backend.sql"/>
<formatter type="plain" outfile="${LOG_PATH}/phing.CreateTables.log"/>
</pdosqlexec>

How to execute (via ant) a set of sql files with mysql command line

I want to execute all the SQL files that reside in a given directory.
The call I want to make is something like this:
mysql --host=dbbackend --user=stack --password=overflow dbname -e 'source file1.sql'
This can be expressed as:
<apply executable="mysql" dir="." failonerror="true">
<arg value="--host=dbbackend"></arg>
<arg value="--user=stack"></arg>
<arg value="--password=overflow"></arg>
<arg value="mydbname"></arg>
<arg value="--e source dummy.sql"></arg>
<fileset dir="${db.dump.location.data}" casesensitive="no" description="take all sql files">
<patternset>
<include name="**/*.sql" />
</patternset>
</fileset>
</apply>
If I have 3 SQL files in that directory, then dummy.sql is called 3 times. So far so good. Is there a placeholder available to change this line:
<arg value="--e source dummy.sql"></arg>
into:
<arg value="--e source ${unknown.placeholder.name}"></arg>
If there is a placeholder, then I want to use it for the "input" attribute of the "apply" tag (and remove the "-e" argument).
There is a "srcfile" tag available for the "apply" element, but I cannot get it to work like this:
<arg value="--e source"></arg>
<srcfile/>
Do you have suggestions on how to do this with native Ant declarations?
Is it possible to build a workaround using antcall + fileset (+ placeholder)?
The noobish workaround is to iterate over the fileset and create a temporary SQL file with references to the SQL files, then, as a last step, call it statically via "-e". But that is the workaround I want to get rid of (hence this question).
PS: I do not want to use ant-contrib features.
Why not use the Ant sql task, which is designed to do exactly what you want?
Otherwise, you may use the srcfile nested element:
<apply ... parallel="false">
<...>
<arg value="--e source" />
<srcfile/>
<fileset ...></fileset>
</apply>
There is a complete example on the manual page of the apply task.