How to export a flat file with different rows using SSIS?

I have three tables, Customer, Invoice and InvoiceRow, with the standard relations.
I have to export these into one fixed-field-length file, with the first two characters of each row identifying the row type. The row types have different specifications.
I could probably do it with a nested loop in a script block, but this is my first ever SSIS package and that solution feels wrong.
edit:
The output has to have:
Customer
Invoice
Rows
Customer
Invoice
Rows
and so on

Your gut feeling on doing this using a Script Destination component is correct. Unfortunately, this scenario doesn't jibe well with SSIS. I don't consider this a beginner package. If you must use SSIS, then I'd start by inner joining all the data so there is one row for each InvoiceRow, containing the data needed from all three tables.
CustomerCols, InvoiceCols, RowCols
Then, in the script destination component, you'll need to keep track of the current customer and invoice values; as they change, write the extra header rows to the output.
See Creating a Destination with the Script Component for more information on script destination.
My experience shows that script destinations can have good performance.
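To make that concrete, here is a minimal sketch of that break logic in a Script Destination. It assumes the joined input exposes CustomerID, CustomerName, InvoiceID and InvoiceRowPrice columns, arrives sorted by customer and invoice, and that the output path is hard-coded; all of those are placeholders to adapt.
using System.IO;

public class ScriptMain : UserComponent
{
    private StreamWriter writer;
    private string currentCustomer;
    private string currentInvoice;

    public override void PreExecute()
    {
        base.PreExecute();
        writer = new StreamWriter(@"C:\export\invoices.txt");   // placeholder path
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // New customer: write a customer header row (row type 01).
        if (Row.CustomerID.ToString() != currentCustomer)
        {
            currentCustomer = Row.CustomerID.ToString();
            currentInvoice = null;
            writer.WriteLine("01" + currentCustomer.PadRight(10) + Row.CustomerName.PadRight(40));
        }

        // New invoice for that customer: write an invoice header row (row type 02).
        if (Row.InvoiceID.ToString() != currentInvoice)
        {
            currentInvoice = Row.InvoiceID.ToString();
            writer.WriteLine("02" + currentInvoice.PadRight(10));
        }

        // Every input row is an invoice row (row type 03).
        writer.WriteLine("03" + Row.InvoiceRowPrice.ToString().PadLeft(12));
    }

    public override void PostExecute()
    {
        base.PostExecute();
        writer.Close();
    }
}
The PadRight/PadLeft widths above are stand-ins for whatever the fixed-length specification of each row type actually is.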

I would avoid writing a Script Destination, and use just a Script Transform + Flat File Destination. This way, you concentrate on the logical output (strings of data) while SSIS does the actual writing to the file (it might be a bit more efficient, and you focus on your business logic rather than on file handling).
First, you'll need to get denormalized data. You can do the joins and sorts in the DBMS, but if you don't want to put too much pressure on the DBMS, just get sorted data out of it and merge it using two SSIS Merge Join transformations.
Then do the script: keep running values of current Customer and Invoice, output them when they change, output InvoiceRow on every input. Something like this:
// Member variable holding the last customer seen (assuming CustomerID is a string).
string currentCustomerID;

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    if (Row.CustomerID != currentCustomerID)
    {
        currentCustomerID = Row.CustomerID;
        Output0Buffer.AddRow();
        Output0Buffer.OutputColumn = "Customer: " + Row.CustomerID + " " + Row.CustomerName;
    }
    // Repeat the same pattern for Invoice.
    Output0Buffer.AddRow();
    Output0Buffer.OutputColumn = "InvoiceRow: " + Row.InvoiceRowPrice;
}
Finally, add a Flat File Destination with a single column (OutputColumn created by the script) to write this to the file.

Process your three tables so that the outputs are all appropriate for your output file (including the row-type designator). You'll have to do this in three separate flow paths in your data flow, then bring the rows together with a Union All. From there, sort and process them as needed to create your output file (you'll need a common sort key so that each customer's header, invoice and row lines stay together and in the right order).

Related

SSIS consolidate and concatenate multiple rows into single rows without using SQL

I am trying to accomplish something that is pretty easy to do in SQL, but seemingly very challenging to do in SSIS without using SQL. Basically, I need to consolidate and concatenate a field of a many-to-one relationship.
Given entities: [Contract Item] (many) to (one) [Account]
There is a field [ari_productsummary] that contains the product listed on the Contract Item entity. We want to write that value to the Account as [ari_activecontractitems]. However, an Account may have more than one Contract Item record associated to it, in which case, we want to concatenate those values. We also only want the distinct values to be concatenated (distinct rows already solved within my data flow).
This can be accomplished by writing to a temporary table and then using a query or view to obtain the summarized results, as follows. I created a SQL table called TESTTABLE that contains the [ari_productsummary] from the Contract Item entity along with the referring [accountid] to map it back to Account. I then wrote the following query as a view:
SELECT DISTINCT accountid,
       (SELECT TT2.ari_productsummary + '; '
        FROM TESTTABLE TT2
        WHERE TT2.accountid = TT.accountid
        FOR XML PATH ('')
       ) AS 'ari_activecontractitems'
FROM TESTTABLE TT
Executing that query gives me the results I want, which I can then use for importing into the Account entity.
But how do I do this in an SSIS data flow without writing to a SQL table as a temporary placeholder for the data? I want to do the entire process inside one data flow container, without using a temporary SQL table/view. The whole summarization process needs to be done on the fly.
Does anyone have a solution that doesn't require a temporary SQL table/view/query, but is contained entirely within the data flow?
I am using VS 2017 and the KingswaySoft Dynamic CRM 365 ETL toolset to develop my solution/package.
Spitballing here, as I don't have Dynamics nor the custom components.
Data Flow 1 - Contract aggregation
The purpose of this data flow is to replicate the logic of the elegant query you provided and shove the result into a Cache Connection Manager.
KingswaySoft Dynamics Source -> Script Component -> Cache Transform
If you want to keep the sort in there, do it before the Script Component. The approach I'll take with the Script Component is fully blocking - that is, all the rows must arrive before it can send any on. Transformations like the Merge Join are only partially blocking, because the requirement of sorted data means that once you no longer have a match for the current item, you can send it on down the pipeline.
The Script Component is going to be an asynchronous transformation. You'll have two output columns: your key, accountid, and your new derived column, ari_activecontractitems. That column might need to be big - you know your data best, but if it's a blob type in Dynamics (> 4k Unicode or > 8k ASCII characters) then you'll have to define the data type as DT_NTEXT/DT_TEXT.
As inputs, you'll select accountid and ari_productsummary from your source.
The code should be pretty easy. We're going to accumulate the inbound data into a Dictionary.
// member variable
Dictionary<string, List<string>> accumulator;
In the PreExecute method, we'll tack this in to initialize our variable
// initialize in the PreExecute method
accumulator = new Dictionary<string, List<string>>();
In Input0_ProcessInputRow (the per-row method)
// simulate the inbound queue
// row_id would be something like Row.accountid,
// invoice something like Row.ari_productsummary
if (!accumulator.ContainsKey(row_id))
{
    // create an empty list for this key
    accumulator.Add(row_id, new List<string>());
}
// add the value if we don't already have it
if (!accumulator[row_id].Contains(invoice))
{
    accumulator[row_id].Add(invoice);
}
Once you get the signal that no more data is available, that's when you start writing the output rows. The auto-generated code will have placeholders for all this.
// This is how we shove data out the pipe
foreach (var kvp in accumulator)
{
    // approximately thus
    OutputBuffer1.AddRow();
    OutputBuffer1.row_id = kvp.Key;
    OutputBuffer1.ari_productsummary = string.Join("; ", kvp.Value);
}
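Pulled together, a rough sketch of the whole component could look like the following. The buffer property names (Row.accountid, Row.ari_productsummary, Output0Buffer.accountid, Output0Buffer.ari_activecontractitems) are assumptions - use whatever names SSIS generates for your input and output columns.
using System.Collections.Generic;

public class ScriptMain : UserComponent
{
    // key = account, value = the distinct product summaries seen for it
    private Dictionary<string, List<string>> accumulator;

    public override void PreExecute()
    {
        base.PreExecute();
        accumulator = new Dictionary<string, List<string>>();
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        string key = Row.accountid.ToString();
        if (!accumulator.ContainsKey(key))
        {
            accumulator.Add(key, new List<string>());
        }
        if (!accumulator[key].Contains(Row.ari_productsummary))
        {
            accumulator[key].Add(Row.ari_productsummary);
        }
    }

    public override void Input0_ProcessInput(Input0Buffer Buffer)
    {
        // Run every row through Input0_ProcessInputRow first.
        while (Buffer.NextRow())
        {
            Input0_ProcessInputRow(Buffer);
        }

        // Once the last buffer has arrived, emit one output row per account.
        if (Buffer.EndOfRowset())
        {
            foreach (KeyValuePair<string, List<string>> kvp in accumulator)
            {
                Output0Buffer.AddRow();
                Output0Buffer.accountid = kvp.Key;
                Output0Buffer.ari_activecontractitems = string.Join("; ", kvp.Value);
            }
            Output0Buffer.SetEndOfRowset();
        }
    }
}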
We have an upcoming release that comes with a component that does exactly what you are trying to achieve without the need to write custom code. The feature is currently in preview; please reach out to us for private access. You can find our contact information on our website.
UPDATE - June 5, 2020: we have made the components available for public access at https://www.kingswaysoft.com/products/ssis-productivity-pack/ as a result of our 2020 Release Wave 1. We have two components available that serve this kind of purpose. The Composition component takes input values and transforms them into a composite value in an SSIS column. The Decomposition component does the opposite: it takes an input value and splits it into multiple rows using either delimiter-based text splitting or XML/JSON array splitting.

Can we compare columns of multiple input files to derive a new column in SSIS?

I am trying to create a derived column based on columns provided in different input files, but unfortunately I keep getting an error when I try to map my Raw_File_1 to the Derived Column. The error looks like this:
Cannot create a connector.
The destination component does not have any available inputs for use in creating a path.
My goal is to be able to connect both Raw_File_1 and Map_File_1 to the Derived Column and generate a new column.
If anyone can provide me any suggestion that would be great!!
I have a source file and a reference file; both are flat files. My source file has column a, column b and column c, and my reference file has column d, column e and column f.
If column a = column d and column b = column e, then I want to populate column c with the same value as column f. How can I do this kind of analysis or lookup in SSIS?
Based on your comments that I patched into the question, you're looking to augment the existing data based on matching data from your reference file.
The core of your SSIS package will look like this
In the first data flow, we will source from map_file_1 and load the reference data into a cache (.caw) file.
I configure my cache destination (Cache Transform and its cache connection manager) like this
When the package runs, it'll fill that special-format file with the reference data. That matters because the Lookup can use either a database or a cache file as its source.
Finally, we get to work! A flat file source to a Lookup component. In the first tab of that lookup, be sure to change the Connection type from the default of "OLE DB connection manager" to "Cache connection manager"
In the Connection tab, click to create a new CCM and point it at the cache file generated in the preceding step.
Map columns A to D and B to E (assuming data types match). Click the check box on column F and in the Lookup Operation part, Replace C with that value.
Final thoughts
This will be a case-sensitive lookup. If something doesn't have a match in the reference file, the package is going to blow up. That's probably not what you want, so configure the Lookup transformation's no-match behaviour (redirect or ignore) instead ;)
I blogged about using Excel to populate the cache if you want more words http://billfellows.blogspot.com/2011/11/using-excel-in-ssis-lookup.html
Your question is not clear; I will try to give some suggestions:
If you are looking to perform a lookup with a derived column:
You can use Cache Transform component and Cache connection manager to achieve that:
SSIS - How To Use Flat File Or Excel File In Lookup Transformation [Cache Transformation]
If you are looking to merge both inputs:
Then you need to use the Merge Join or Union All components:
SSIS Union All Transformation
Learn SSIS : MERGE, MERGE JOIN and UNION ALL
SSIS Basics: Using the Merge Join Transformation

SSIS - Process a flat file with varying data

I have to process a flat file whose syntax is as follows, one record per line.
<header>|<datagroup_1>|...|<datagroup_n>|[CR][LF]
The header has a fixed-length field format that never changes (ID, timestamp etc). However, there are different types of data groups and, even though fixed-length, the number of their fields varies depending on the data group type. The first three numbers of a data group define its type. The number of data groups in each record also varies.
My idea is to have a staging table into which I would insert all the data groups. So two records like this,
12320160101|12323456KKSD3467|456SSGFED43520160101173802|
98720160102|456GGLWSD45960160108854802|
Would produce three records in the staging table.
ID Timestamp Data
123 01/01/2016 12323456KKSD3467
123 01/01/2016 456SSGFED43520160101173802
987 02/01/2016 456GGLWSD45960160108854802
This would allow me to preprocess the staged records for further processing (some would be discarded, some have their data broken down further). My question is how to break down the flat file into the staging table. I can split the entire record with pipe (|) and then use a Derived Column Transformation to break down the header with SUBSTRING. After that it gets trickier because of the varying number of data groups.
The solution I came up with myself doesn't try to split at the flat file source, but rather in a script. My Data Flow looks like this.
So the Flat File Source output is just a single column containing the entire line. The Script Component contains output columns for each column in the Staging table. The script looks like this.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // The whole line arrives as a single column; split it on the pipes.
    var splits = Row.Line.Split('|');

    // splits[0] is the fixed-length header; every following element is a data group.
    for (int i = 1; i < splits.Length; i++)
    {
        if (string.IsNullOrEmpty(splits[i]))
            continue; // skip the empty element produced by the trailing pipe

        Output0Buffer.AddRow();
        Output0Buffer.ID = splits[0].Substring(0, 11);
        Output0Buffer.Time = DateTime.ParseExact(splits[0].Substring(14, 14), "yyyyMMddHHmmssFFF", CultureInfo.InvariantCulture);
        Output0Buffer.Datagroup = splits[i];
    }
}
Note that the SynchronousInputID property (Script Transformation Editor > Inputs and Outputs > Output0) must be set to None. Otherwise you won't have Output0Buffer available in your script. Finally, the OLE DB Destination just maps the script output columns to the staging table columns. This solves the problem I had with creating multiple output records from a single input record.

How to create an SSIS package which creates three text files, using the same variables, where each text file is only created when the correct data is found?

There are only 3 files that can be created: "File_1", "File_2" and "File_3". The same variable names are used in each instance (User::FileDirectory and User::File_name), but because the actual value of the variable changes, a new file is created. However, the files are only created if there is data to go into the file, i.e. if there are no records to populate the file, it will not be created at all. When the files are created, the date the file was created should also be added to the filename, e.g. File1_22102011.txt
OK, if the above was a little confusing, the following is how it works:
All the files use the same variable, but it is reset before each file is created.
• So it populates a result set in memory with the first SQL selection (ID number, First_Name and Main_Name). It sets the file variable to "File_1". If there are records in the result set, it creates and writes to this filename.
• Then it creates a new result set with the second selection(Contract No). It sets the variable to "File_2". If there are records in this new result set, a new file will be created from the variable(which now has a new value)
• Finally a third result set is created (Contract_no, ExperianNo, Entity_ID_Number, First_Name, Main_Name), and the file variable is set to "File_3". Again if there are records in the result set, then this file will be created and written to.
I have worked on a few methods to achieve this but they have all failed, so a little help would be greatly appreciated.
While what you have works, I think it'd be rather painful to maintain.
I would approach it as 3 sequence containers running in parallel. Each container would have a data flow and two File System Tasks hanging off it, based on success of the parent and the value of a row count variable. If the row count variable is 0, delete the file. If it's greater than 0, rename it to File_n.
As you can see, I have a container for the first file. The data flow creates an output a.txt file. Based on the value of the variable #RowCount1, it will either delete the empty file or rename it to File_1.
Each data flow would look like a source query, a Row Count transformation and a flat file destination with a temporary name (a.txt, b.txt, c.txt). As a file is always created, even if it's empty, we need to delete or rename it afterwards, which is what the file operation tasks do.
In my opinion, this approach will be cleaner as it will allow you to test and debug each item in a cleaner manner rather than dealing with an in-memory dataset.
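If you'd rather not have two File System Tasks per container, the same delete-or-rename decision can be made in a single Script Task. A rough sketch, assuming the row count lands in User::RowCount1 and the directory sits in User::FileDirectory (both mapped into the task as read-only variables), with the temporary name hard-coded:
using System;
using System.IO;

public void Main()
{
    int rowCount = Convert.ToInt32(Dts.Variables["User::RowCount1"].Value);
    string directory = Dts.Variables["User::FileDirectory"].Value.ToString();
    string tempFile = Path.Combine(directory, "a.txt");

    if (rowCount == 0)
    {
        // Nothing was written for this selection: throw the empty file away.
        File.Delete(tempFile);
    }
    else
    {
        // Rename to the final name, with the date stamp the requirement asks for.
        string finalName = "File1_" + DateTime.Now.ToString("ddMMyyyy") + ".txt";
        File.Move(tempFile, Path.Combine(directory, finalName));
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}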

SSIS - Is there a Data Flow Source component that will handle CSV files where the column order may change?

We have written a number of SSIS packages that import data from CSV files using the Flat File Source.
It now seems that after these packages are deployed into production, the providers of these files may deliver files where the column order of the files changes (Don't ask!). Currently if this happens, our packages will fail.
For example, an additional column is inserted at the beginning of each row. In this case, the flat file source continues to use the existing column order, which obviously has a detrimental effect on the transformation!
E.g. using a trivial example, the original file has the following content:
OurReference,Client,Amount
235,ClientA,20000.00
236,ClientB,30000.00
The output from the flat file source is:
OurReference Client Amount
235 ClientA 20000.00
236 ClientB 30000.00
Subsequently, the file delivered changes to:
OurReference,ClientReference,Client,Amount
235,A244,ClientA,20000.00
236,B222,ClientB,30000.00
When the existing, unchanged package is run against this file, the output from the flat file source is:
OurReference Client Amount
235 A244 ClientA,20000.00
236 B222 ClientB,30000.00
Ideally, we would like to use a data source that will cope with this problem - ie which produces output based on the column names, instead of the column order.
Any suggestions would be welcomed!
Not that I know of.
A possibility to check for the problem in advance is to set up two different connection managers, one of which reads each row as a single column. That one can read the first row, tell whether the layout is OK, and abort if not.
If you want to do the work, you can take it a step further and make that single-column layout the only connection manager, and use a Script Component in your data flow to parse each row and assign the values to the columns you need later in the flow.
As far as I know, there is no way to dynamically add columns to the flow at runtime, so all the columns you need will have to be added to the Script Component's output. Whether they can be found and parsed from each line is up to you. Any "new" (i.e. unanticipated) columns cannot be used. For columns which are missing, you could supply a default or throw an exception.
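To make that idea concrete, here is a rough sketch of such a Script Component. It assumes the single input column is named Line, the output is asynchronous (SynchronousInputID = None) so the header row can be skipped, and the output columns mirror the example file (OurReference, Client, Amount); a missing expected column would throw here, which you could swap for a default.
using System;
using System.Collections.Generic;
using System.Globalization;

public class ScriptMain : UserComponent
{
    // Built from the header row: column name -> position in the current file.
    private Dictionary<string, int> columnIndex;

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        string[] fields = Row.Line.Split(',');

        if (columnIndex == null)
        {
            // First row is the header: remember where each named column sits.
            columnIndex = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
            for (int i = 0; i < fields.Length; i++)
            {
                columnIndex[fields[i].Trim()] = i;
            }
            return; // the header row produces no output row
        }

        // Look fields up by name, so extra or reordered columns don't matter.
        Output0Buffer.AddRow();
        Output0Buffer.OurReference = fields[columnIndex["OurReference"]];
        Output0Buffer.Client = fields[columnIndex["Client"]];
        Output0Buffer.Amount = decimal.Parse(fields[columnIndex["Amount"]], CultureInfo.InvariantCulture);
    }
}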
A final possibility is to use the SSIS object model to modify the package before running to alter the connection manager - or even to write the entire package dynamically using the object model based on an inspection of the input file. I have done quite a bit of package generation in C# using templates and then adding information based on metadata I obtained from master files describing the mainframe files.
The best approach would be to run a check before the SSIS package imports the CSV data. This may have to be an external script/application, because I don't think you can manipulate the data in Business Intelligence Development Studio.
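As a sketch of such a pre-check (the file path and expected header below are just placeholders), something along these lines in a Script Task or a small console app would let you fail fast when the layout changes:
using System;
using System.IO;

public static class CsvLayoutCheck
{
    // Returns true when the file's first line matches the column order the package expects.
    public static bool HeaderMatches(string filePath, string expectedHeader)
    {
        using (StreamReader reader = new StreamReader(filePath))
        {
            string header = reader.ReadLine();
            if (header == null)
            {
                return false;
            }
            return string.Equals(header.Trim(), expectedHeader, StringComparison.OrdinalIgnoreCase);
        }
    }
}

// Usage (placeholder values):
// bool ok = CsvLayoutCheck.HeaderMatches(@"C:\import\invoices.csv", "OurReference,Client,Amount");
// if (!ok) { /* fail the task and skip the import */ }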
Here is a rough approach. I will write down the limitations at the end.
Create a flat file source. Put the entire row in one column.
Do not tick "Column names in the first data row".
Create a Script Component
Code:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string sRow = Row.Column0;
    string sManipulated = string.Empty;
    string[] columns = sRow.Split(',');

    // Note: for the sake of demonstration I am padding each field to 15 chars.
    foreach (string column in columns)
    {
        sManipulated = string.Format("{0}{1}", sManipulated, column.PadRight(15, ' '));
    }
    Row.Column0 = sManipulated;
}
Create a flat file destination
Map Column0 to Column0
Limitation: I have arbitrarily padded each field to 15 characters. Points to consider:
1. Do all fields need to be the same size?
2. If yes, what is that size?
A generic way to handle that would be to create a table to store the file name, fields, and field sizes.
Use the file name to dynamically create the source and destination connection managers.
Use the field name and corresponding field size to decide the padding. Not sure if you need this much flexibility. If you have any questions, please respond.
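As a sketch of that metadata-driven padding (the field names and sizes below are illustrative only, standing in for rows from the metadata table), the padding line in the script above could become something like:
using System.Collections.Generic;
using System.Text;

public static class FixedWidthFormatter
{
    // Pads each delimited field to its own configured width, in metadata order.
    public static string PadFields(string rawLine, IList<KeyValuePair<string, int>> fieldSizes)
    {
        string[] fields = rawLine.Split(',');
        StringBuilder result = new StringBuilder();

        for (int i = 0; i < fields.Length && i < fieldSizes.Count; i++)
        {
            result.Append(fields[i].PadRight(fieldSizes[i].Value));
        }
        return result.ToString();
    }
}

// Example metadata for the sample file (illustrative sizes):
// var sizes = new List<KeyValuePair<string, int>>
// {
//     new KeyValuePair<string, int>("OurReference", 12),
//     new KeyValuePair<string, int>("Client", 20),
//     new KeyValuePair<string, int>("Amount", 12)
// };
// Row.Column0 = FixedWidthFormatter.PadFields(Row.Column0, sizes);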