Flat File Destination has multiple inputs from custom data flow component - SSIS

I have an SSIS package set up with the following components in its Data Flow Task, in order:
Flat File Source
Derived Column
Custom Task
Flat File Destination
The Flat File source contains fixed-width rows of data (282 characters per row).
The Derived Column transformation splits each row into columns using the SUBSTRING() function.
The Custom Task performs some Regular Expression validation and creates two new output columns: RowIsValid (a DT_BOOL) and InvalidReason (a DT_WSTR of 200). There is no Custom UI for this Task.
The Flat File Destination holds the validated data in delimited column format. Eventually, this will be a database destination.
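The per-row logic of the pipeline can be sketched outside SSIS (the component itself would be C#/.NET; the field layout, widths, and regex below are assumptions for illustration, since the real 282-character layout isn't shown in the question):

```python
import re

# Hypothetical fixed-width layout: (name, start, length).
# The actual 282-character layout is not given in the question.
LAYOUT = [("CustomerId", 0, 10), ("Email", 10, 50), ("Amount", 60, 12)]

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def split_row(line):
    """Mimic the Derived Column SUBSTRING() splitting."""
    return {name: line[start:start + length].strip()
            for name, start, length in LAYOUT}

def validate(cols):
    """Mimic the custom validator: append RowIsValid / InvalidReason."""
    if not EMAIL_RE.match(cols["Email"]):
        return {**cols, "RowIsValid": False, "InvalidReason": "Bad email"}
    return {**cols, "RowIsValid": True, "InvalidReason": ""}

row = validate(split_row("CUST000001john@example.com" + " " * 34 + "000000099.99"))
```

In the real component this runs once per buffer row, with RowIsValid as DT_BOOL and InvalidReason as DT_WSTR(200) output columns.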
I know that this can be done using a Script Component. In fact, I am currently doing so in my solution. However, what I am trying to accomplish is building a Custom Task so that code changes are made in a single spot instead of having to change multiple Script Components.
I have a couple of issues I'm trying to overcome and am hoping for some help/guidance:
(Major) Currently, when I review the mappings of the Flat File Destination, the Available Input columns come from the Flat File Source, the Derived Column transformation, and the Custom Task. Only one column comes from the Flat File Source (because there is only one column), while the Derived Column and the Custom Task each expose all of the columns created in the Derived Column.
My expectation is that the Available Input Columns would display only the Custom Validator.[column name] columns (with only the column name) from the Custom Validator. While debugging, I don't see where I can manipulate or suppress the Derived Column.[column name] columns.
(Minor) Getting the input columns from the Derived Column transformation to be selected or used automatically when the input is attached.
Currently, after hooking up the input and output of the Custom Validator, I have to go to the Inputs tab of the Advanced Editor and select the columns I want. I'm selecting all of them, because I want every column to pass through the task, even though only some will be validated by it.

Related

SSIS import from excel files with multiple sheets and different amount of columns

I have an SSIS package with two loops: one over Excel files and one over the sheets in each file. Inside the for-each-sheet loop is a Data Flow Task that uses a variable for the sheet name; the source is Excel and the destination is ODBC.
The table in the db has all the columns I need such as userid, username, productname, supportname.
However, some sheets have the columns username and productname, while others have userid, username, productname, and supportname.
How can I load the Excel files? Can I add a Derived Column transformation that checks whether a column exists and, if not, adds it with a default value and then maps it to the destination?
thanks
SSIS is not an any-format-goes-at-run-time data loading engine. There was a conscious design decision to make the fastest possible ETL tool, and one of the requirements was a defined contract between the shape of the data source and the destination. That's why you'll inevitably run into the VS_NEEDSNEWMETADATA error: something has altered the shape, and the package needs to be edited in designer mode to update the columns and sizes.
If you want to write the C# to make a generic Excel ingest engine, more power to you.
An alternative approach would be to have multiple data flows defined within your file and worksheet looping construct. The trick would be to conditionally enable them based on the available column set.
If the columns username and productname are detected, enable DFT UserName and ProductName. That DFT will supply default values, or a lookup, for UserId, SupportName, etc.
If all columns are present, enable DFT All.
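The branching logic above can be sketched as follows (in Python for illustration; the column names are from the question, but the DFT names and default values are assumptions):

```python
# Decide which data flow to enable based on the columns detected in the
# sheet, and supply assumed defaults for whatever is missing.
REQUIRED = ["userid", "username", "productname", "supportname"]
DEFAULTS = {"userid": -1, "supportname": "N/A"}  # assumed values

def choose_dft(detected_columns):
    """Return (data flow to enable, defaults for the missing columns)."""
    missing = [c for c in REQUIRED if c not in detected_columns]
    if not missing:
        return "DFT All", {}
    # Partial sheet: enable the reduced data flow and fill in defaults.
    return "DFT UserName and ProductName", {c: DEFAULTS.get(c) for c in missing}

dft, defaults = choose_dft(["username", "productname"])
```

In the package itself, the same decision would drive the Disable property (or an expression on it) of each Data Flow Task inside the loop.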
Finally, Azure Data Factory can "slurp and burp" whatever source to whatever destination. Perhaps that might be a better fit for your problem.

Validate .CSV output in ADF

I have a Data Flow that searches for information in each of my databases and writes the results to several .CSV files. When the search returns data, the .CSV contains the headers and the data that was found. When it does not, the .CSV contains only the headers. After that, all the .CSV files are moved into a SharePoint folder through a Logic App.
My question is: I need to put those .CSVs into two folders, "with data" and "no data", to make it easier to check which of them has data. I have tried to use a Conditional Split in my Data Flow, but it does not work. Does anyone have any suggestions?
As per your scenario, since your incoming files either have rows or are empty, you can add an additional input stream containing only a header (similar to your CSV with only the header and no rows). You can compare this with the earlier input stream to decide whether a file is empty. You could use a Lookup activity and then an If activity to decide whether to run the Copy activity.
In the Lookup activity, you could set firstRowOnly to true, since you only want to check whether there is any data. Check whether the first row is empty: if yes, copy that file to the "no data" folder; otherwise, copy it to the "with data" folder. You can use a Conditional Split here to direct rows to different streams for the sink (copy), or use separate Copy activities in the pipeline.
From my repro:
1. Consider inputs with and without data (CSV files)
2. Use lookup activity to compare input with a predefined empty source file.
3. Use conditional split activity with suitable condition expression depending on your data schema.
4. Route the streams to the appropriate folders using sinks.
Refer to: Lookup transformation in mapping data flow, Conditional split transformation in mapping data flow, and Data transformation expressions in mapping data flow.
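The routing decision the Lookup/If combination makes can be sketched outside ADF (a Python illustration of the check, not ADF syntax):

```python
import csv, io

def has_data_rows(csv_text):
    """True if the CSV has at least one row after the header --
    the same check the Lookup (firstRowOnly) / If pair performs."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return len(rows) > 1

def target_folder(csv_text):
    """Pick the destination folder based on whether data rows exist."""
    return "with data" if has_data_rows(csv_text) else "no data"
```

Each file is then copied to `target_folder(...)`, which mirrors routing the two pipeline branches to their respective Copy activities.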

Remove Column from CSV file using SSIS

I have a CSV file that I am using as a source in SSIS. The file contains additional blank columns at positions S, U, and V. Is there a way I can remove these columns through an SSIS Script Task before using it as a source file?
Perhaps I misunderstand the problem, but you have a CSV with blank columns that you'd like to import into SQL Server but you do not want to import the blank columns into your table.
So, don't create the target columns in your table. That's it. There's no problem with having "extra" data in your data pipeline, just don't land it.
The Flat File Connection Manager must* have a definition for all the columns. As it appears you do not have headers for the blank columns, and a column name is required, you will need to set up the connection manager as if the file has no header row and then skip one row (it might be the "data starts on line" setting; I'm doing this from memory) to avoid the header row. By specifying no header row, you need to provide the column names manually. I favor naming them something obvious like IgnoreBlankColumn_S, IgnoreBlankColumn_U, IgnoreBlankColumn_V, so future maintainers will know this is an intentional design decision since the source has no data there.
*You can write a query against a text file, which would allow you to pull in only specific columns, but this is probably not worth the effort.
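The footnote's "pull in only specific columns" idea amounts to projecting a column list and never landing the rest, sketched here in Python (the column names are assumptions):

```python
import csv, io

# Keep only the named columns; everything else (the blank S/U/V
# columns, for example) is simply never read into the pipeline.
KEEP = ["Name", "Amount"]  # assumed column names

def read_selected(csv_text, keep=KEEP):
    """Project a CSV down to the listed columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{k: row[k] for k in keep} for row in reader]

rows = read_selected("Name,Blank,Amount\nBob,,10\n")
```

In SSIS the equivalent is even simpler, as the answer says: define the blank columns in the connection manager but don't map them to the destination.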

How to pass Multiple Input for SSIS Script Component

I have a custom source data flow component whose output differs every time, and I need to insert those records into a destination table.
Problem:
I can't specify the input columns at design time for the destination component, because on every call to the Data Flow Task the source component returns different output columns based on the table schema.
Solution needed:
How can the destination data flow component accept whatever inputs are available, without any mapping (either with an existing component or a custom component)?
The data flow's fixed structure is there for data validation and optimization purposes. All of its components have fixed input and output columns. I would suggest the following possibilities:
Write a data flow for every possible schema. There are probably a finite number of possibilities, and you could reduce the effort by using BIML to generate the package structure for you. This may also introduce the possibility of parallel loading.
Use a Script Task instead of a data flow. In the Script Task, write the rows of each input to a table.
If you need to pass multiple inputs to a single Script Component, the only way I know to do this is by passing the multiple inputs to a UNION ALL component and then passing its single output to the script.
You'll have to account for any differences between the two column structures in the UNION ALL, and you may want to use a Derived Column if you need an easy way to identify which original input a row came from.
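The UNION ALL plus origin-tag idea can be sketched as follows (Python for illustration; the `Source` column name is an assumption, standing in for the Derived Column that tags each input):

```python
def union_all(input_a, input_b, columns):
    """Align both inputs on a shared column list and tag each row with
    its origin, as a Derived Column before the UNION ALL would."""
    out = []
    for origin, rows in (("A", input_a), ("B", input_b)):
        for row in rows:
            # Columns missing from the narrower input become None (NULL).
            aligned = {col: row.get(col) for col in columns}
            aligned["Source"] = origin
            out.append(aligned)
    return out

merged = union_all([{"id": 1}], [{"id": 2, "x": 9}], ["id", "x"])
```

The downstream script then reads a single, fixed column structure and can branch on `Source` where the inputs need different handling.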
I know this is way late, but I keep seeing this UNION ALL approach and don't like it.
How about this approach instead:
Run both data flows into their own Recordset Destination and save each into a variable of type Object (an ADO recordset).
Create a new data flow, use a script source, and bring in both recordset variables.
Fill DataTables using a data adapter and then do whatever you want with them.

process csv File with multiple tables in SSIS

I'm trying to figure out if it's possible to pre-process a CSV file in SSIS before importing the data into SQL.
I currently receive a file that contains 8 tables with different structures in one flat file.
The tables are identified by a row with the table name in it, encapsulated by square brackets, e.g. [DOL_PROD].
The data is underneath in standard CSV format: headers first, then the data.
The tables are separated by a blank line, and the pattern repeats for the next 7 tables.
[DOL_CONSUME]
TP Ref,Item Code,Description,Qty,Serial,Consume_Ref
12345,abc,xxxxxxxxx,4,123456789,abc

[DOL_ENGPD]
TP Ref,EquipLoc,BackClyLoc,EngineerCom,Changed,NewName
Is it possible to split it out into separate CSV files, or to process it in a loop?
I would really like to be able to perform all of this automatically with SSIS.
Kind regards,
Adam
You can't do that with the Flat File Source and connection manager alone.
There are two ways to achieve your goal:
You can use a Script Component as the source of the rows and process the file there; then you can do whatever you want with the file programmatically.
The other way is to read your flat file treating every row as a single column (i.e. without specifying a delimiter) and then, via data flow transformations, split rows, recognize table names, split flows, and so on.
I'd strongly advise you to use the Script Component, even if you have to learn .NET first, because the second option will be a nightmare :). I'd use a Flat File Source to extract lines from the file as a single column and then work on them in a Script Component, rather than reading a "raw" file directly.
Here's a resource that should get you started: http://furrukhbaig.wordpress.com/2012/02/28/processing-large-poorly-formatted-text-file-with-ssis-9/
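The splitting logic the Script Component would implement can be sketched like this (Python for brevity; in SSIS it would be C# against the input buffer):

```python
import re

# A [TableName] marker row, as described in the question.
SECTION_RE = re.compile(r"^\[(?P<name>[^\]]+)\]\s*$")

def split_tables(lines):
    """Split the combined flat file into {table_name: [csv lines]},
    using the [TableName] marker rows; blank separator lines are skipped."""
    tables, current = {}, None
    for line in lines:
        m = SECTION_RE.match(line)
        if m:
            current = m.group("name")
            tables[current] = []
        elif line.strip() and current:
            tables[current].append(line.rstrip("\n"))
    return tables

sample = [
    "[DOL_CONSUME]",
    "TP Ref,Item Code,Description,Qty,Serial,Consume_Ref",
    "12345,abc,xxxxxxxxx,4,123456789,abc",
    "",
    "[DOL_ENGPD]",
    "TP Ref,EquipLoc,BackClyLoc,EngineerCom,Changed,NewName",
]
tables = split_tables(sample)
```

Each entry can then be written to its own CSV file (or sent to its own output of the Script Component, one per table structure).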