SSIS import from Excel files with multiple sheets and different numbers of columns

I have an SSIS package with two nested loops: one over the Excel files and one over the sheets in each file. Inside the per-sheet loop is a Data Flow Task whose Excel source takes the sheet name from a variable and writes to an ODBC destination.
The table in the database has all the columns I need, such as userid, username, productname, supportname.
However, some sheets only have the columns username and productname, while others have userid, username, productname, supportname.
How can I load the Excel files? Can I add a Derived Column task that checks whether a column exists, adds it with a default value if it doesn't, and then maps it to the destination?
thanks

SSIS is not an "any format goes at run time" data-loading engine. There was a conscious design decision to make the fastest possible ETL tool, and one of the requirements for that was a defined contract between the shape of the data source and the destination. That's why you'll inevitably run into the VS_NEEDSNEWMETADATA error: something has altered the shape, and the package needs to be edited in designer mode to update the columns and sizes.
If you want to write the C# to make a generic Excel ingest engine, more power to you.
An alternative approach would be to have multiple data flows defined within your file and worksheet looping construct. The trick would be to conditionally enable them based on the available column set, detected by a Script Task that runs ahead of the data flows, as sketched below.
Columns username and productname detected: enable DFT UserName and ProductName. That DFT will have default values, or a lookup, for UserId, SupportName, etc.
All columns present: enable DFT All.
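For the detection step, a minimal sketch in C# of such a Script Task: it reads the worksheet's column list through the ACE OLE DB provider and sets a Boolean package variable that the data flow tasks' Disable expressions or precedence constraints can consume. The connection string, the variable names (User::FilePath, User::SheetName, User::HasAllColumns), and the column names checked are assumptions to adapt.

// Inside the SSIS Script Task's Main() method; the variables below must be
// listed in the task's ReadOnlyVariables/ReadWriteVariables.
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.OleDb;
using System.Linq;

public void Main()
{
    string filePath = (string)Dts.Variables["User::FilePath"].Value;
    string sheetName = (string)Dts.Variables["User::SheetName"].Value; // e.g. "Sheet1$"
    string connStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + filePath +
                     ";Extended Properties=\"Excel 12.0;HDR=YES\"";

    using (var conn = new OleDbConnection(connStr))
    {
        conn.Open();
        // Ask the provider for this worksheet's column metadata.
        DataTable schema = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Columns,
            new object[] { null, null, sheetName, null });
        var names = new HashSet<string>(
            schema.Rows.Cast<DataRow>().Select(r => r["COLUMN_NAME"].ToString()),
            StringComparer.OrdinalIgnoreCase);

        // True when the wide layout is present; drives which DFT is enabled.
        Dts.Variables["User::HasAllColumns"].Value =
            names.Contains("userid") && names.Contains("supportname");
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}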
Finally, Azure Data Factory can "slurp and burp" whatever source to whatever destination. Perhaps that might be a better fit for your problem.

Related

SSIS - How can I use the value of a variable to determine the ForEachLoop Container?

I'm working on an SSIS package. The goal of the package is to take a spreadsheet that has several columns (we need PartNum, PartType, and Qty) and, for each row in the spreadsheet, run a query to calculate consumption and dump that into a separate sheet.
I've got a few problems, but my initial problem is that I have two part types, Manufactured and Purchased, and I only want to run the query against Manufactured pieces. How can I do that in SSIS? I'm trying to set it up in the expression builder for the variable to equal "M", but this always evaluates to false.
Ideally, I want to filter on both Part Type = M and Qty > 0.
Here is a picture of the SSIS package. Basically, I'm using a data flow to bring a spreadsheet into a Recordset, and then, in a Foreach loop, an OLE DB Source to pass query parameters (the part and qty variables) and export to a .csv.
In the initial Data Flow Task from the Excel Source into the Recordset Destination, instead of loading the entire Excel file, select only the records that satisfy the given criteria. Unless you need the other records for another purpose in the package, this also avoids adding unused rows to the Recordset Destination and processing them in subsequent components.
You can do this in the Excel Source by changing the Data Access Mode to SQL Command and adding the necessary filters; Excel can be queried much like a SQL table, with the worksheet referenced by its name followed by a $, e.g. [Sheet1$]. The query you want should look similar to the following, with the sheet and column names substituted appropriately. If column names contain spaces, they will need to be enclosed in square brackets; for example, Part Type would be [Part Type].
SELECT
PartNum,
PartType,
Qty
FROM [Sheet1$]
WHERE PartType = 'M' AND Qty > 0

Flat File Destination has multiple Inputs from custom data flow task

I have an SSIS Package setup with the following Data Flow Tasks in order:
Flat File Source
Derived Column
Custom Task
Flat File Destination
The Flat File source contains fixed-width rows of data (282 characters per row).
The Derived Column splits each row into columns using the SUBSTRING() function.
The Custom Task performs some Regular Expression validation and creates two new output columns: RowIsValid (a DT_BOOL) and InvalidReason (a DT_WSTR of 200). There is no Custom UI for this Task.
The Flat File Destination is the validated data in delimited column format. Eventually, this would be a database destination.
I know that this can be done using a Script Task. In fact, I am currently doing so in my solution. However, what I am trying to accomplish is building a Custom Task so that code changes are made in a single spot instead of having to change multiple Script Tasks.
I have a couple of issues I'm trying to overcome and am hoping for some help/guidance:
(Major) Currently, when I review the mappings of the Flat File Destination, the Available Input columns are coming from the Flat File Source, the Derived Column Task, and the Custom Task. Only one column is coming from the Flat File Source (because there is only one column), while the Derived Column and Custom Task each have all of the columns created in the Derived Column.
My expectation is that the Available Input Columns would/should only display the Custom Validator.[column name] columns (with only the column name) from the Custom Validator. While debugging, I don't see where I can manipulate and suppress the Derived Column.[column name] columns.
(Minor) Getting the input columns from the Derived Column Task to automatically be selected or used when the Input is attached.
Currently, after hooking up the input and output of the Custom Validator, I have to go to the Inputs tab on the Advanced Edit and select the columns I want. I'm selecting all, because I want all columns to go through the task, even though only some will be validated by the task.
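For reference, a minimal sketch (not the poster's actual code) of how such a synchronous transform component can declare the two extra columns in ProvideComponentProperties. Note that a synchronous output passes the upstream columns through in place, which is also why the Derived Column's columns stay visible to downstream components; the display and input names here are assumptions.

using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[DtsPipelineComponent(DisplayName = "Custom Validator",
    ComponentType = ComponentType.Transform)]
public class CustomValidator : PipelineComponent
{
    public override void ProvideComponentProperties()
    {
        base.RemoveAllInputsOutputsAndCustomProperties();

        IDTSInput100 input = ComponentMetaData.InputCollection.New();
        input.Name = "Validator Input";

        IDTSOutput100 output = ComponentMetaData.OutputCollection.New();
        output.Name = "Validator Output";
        // Synchronous output: rows pass through in place rather than being
        // copied to a new buffer, so upstream columns remain available.
        output.SynchronousInputID = input.ID;

        // The two columns the validator adds to every row.
        IDTSOutputColumn100 isValid = output.OutputColumnCollection.New();
        isValid.Name = "RowIsValid";
        isValid.SetDataTypeProperties(DataType.DT_BOOL, 0, 0, 0, 0);

        IDTSOutputColumn100 reason = output.OutputColumnCollection.New();
        reason.Name = "InvalidReason";
        reason.SetDataTypeProperties(DataType.DT_WSTR, 200, 0, 0, 0);
    }
}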

How to pass Multiple Inputs to an SSIS Script Component

I have a custom source data flow component whose output will differ every time, and I need to insert those records into a destination table.
Problem:
I can't specify the input columns at design time for the destination component, because on every call to the data flow task the source component returns different output columns based on the table schema.
Solution needed:
How can a destination data flow component accept whatever inputs are available without any mapping (either with an existing component or a custom component)?
The data flow's fixed structure is there for data validation and optimization purposes. All of its components are going to have fixed input and output columns. I would suggest the following possibilities:
Write a data flow for every possible schema. There are probably a finite number of possibilities. You could reduce the effort of this task by using BIML which could generate the package structure for you. This may also introduce the possibility of parallel loading.
Use a script task instead of a data flow. In the script task, write the rows for each input into a table.
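If you go the script route, you can avoid design-time column mappings entirely by landing the rows in a DataTable and letting SqlBulkCopy map columns by name at run time. A minimal sketch in C#, where the connection string, destination table name, and the GetSourceRows helper are assumptions:

using System.Data;
using System.Data.SqlClient;

// GetSourceRows() is a hypothetical stand-in for however your custom
// source exposes its rows as a DataTable.
DataTable source = GetSourceRows();

using (var bulk = new SqlBulkCopy("Data Source=.;Initial Catalog=MyDb;Integrated Security=SSPI"))
{
    bulk.DestinationTableName = "dbo.DestinationTable"; // assumed name
    // Map each incoming column to the destination column with the same
    // name, so the mapping follows whatever schema arrives at run time.
    foreach (DataColumn col in source.Columns)
        bulk.ColumnMappings.Add(col.ColumnName, col.ColumnName);
    bulk.WriteToServer(source);
}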
If you need to pass multiple inputs to a single script component, the only way I know to do this is by passing the multiple inputs to a UNION ALL component, and then passing the single output from the UNION ALL to the Script.
You'll have to account for any differences between the two column structures in the UNION ALL, and maybe use derived columns if you need an easy way to identify which original input a row came from.
I know this is way late but I keep seeing this UNION ALL approach and don't like it.
How about this approach instead:
Run both data flows into their own Recordset Destination, saving each into a variable of type Object (an ADO recordset).
Create a new data flow and use a script source to bring in both ADO objects.
Fill DataTables using a data adapter and then do whatever you want with them, as sketched below.
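A minimal sketch of steps 2 and 3 in C#, inside the Script Component source's CreateNewOutputRows; the recordset variable names and the output column are assumptions:

using System.Data;
using System.Data.OleDb;

public override void CreateNewOutputRows()
{
    // OleDbDataAdapter.Fill has an overload that accepts the ADO recordset
    // a Recordset Destination stores in an Object-typed variable.
    var adapter = new OleDbDataAdapter();
    var first = new DataTable();
    var second = new DataTable();
    adapter.Fill(first, Variables.FirstRecordset);   // User::FirstRecordset (assumed)
    adapter.Fill(second, Variables.SecondRecordset); // User::SecondRecordset (assumed)

    // With both DataTables in hand, merge, join, or compare them, then
    // push rows to the output buffer(s), e.g.:
    foreach (DataRow row in first.Rows)
    {
        Output0Buffer.AddRow();
        // Output0Buffer.SomeColumn = (string)row["SomeColumn"]; // assumed column
    }
}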

SSIS parameters/variables in destination mapping

I am working on a large SSIS data migration project in which all of the output tables have fields for the creation date & user as well as the last update date and user. The values will be the same for all of the records in all of the output tables.
Is there a way to define parameters or variables that will appear in the destination mapping window, and can be used to populate the output table?
If I use a sql statement in the source, I could, of course, include extra fields for this, but then I also have to add a Data Conversion task for translating the string fields from varchar to nvarchar.
You cannot do this in the destination mapping.
As you've already considered, you could do this by including the extra fields in the source, but then you are passing all that uniform data through the entire dataflow and perhaps having to convert it as well.
A third option would be to run your data flow without those columns at all (let them be NULL in the destination), and then follow the data flow with an Execute SQL Task that runs an UPDATE to set those columns from package variable values.

SSIS Excel formatting won't change from text field in destination editor **workaround in place**

I have created an SSIS package in Visual Studio 2008 that takes a SQL SELECT statement and populates an Excel sheet; the Excel sheet is duplicated from a template file with all the formatting and cells set up.
The issue I am having is that no matter what I do I cannot change the Excel destination formatting to anything other than General; it overrides the template's formatting and writes decimal numbers as '1.50, always adding the ' to fields.
I have tried inserting a row, as per some suggestions, since that is apparently where SSIS scans for formatting types. However, the fields always come up as Unicode string [DT_WSTR] in the Advanced Editor and always default back if I change them.
Please can someone help! Happy to provide any additional info if I've missed anything. I've seen some posts with the same issue, but none of the solutions seem to be working, or I'm missing something else.
**Update**
Figured out the reason none of the recommended fixes worked: it was due to using a SELECT statement in the Excel destination instead of selecting the table, which essentially wipes out any formatting change.
So what I decided on in the end was to create a data-only sheet (which is hidden) using the basic table data access mode, then reference it from a front-end sheet with all the formatting already in place, using a =VALUE(C1) formula to return just the value. I protected the cells to hide the formulas.
I have found that, when I change a Data Flow Task in SSIS that exports to (or imports from) Excel, I often have to "start over", or SSIS will somehow retain some of the properties of the old Data Flow Task: data types, column positions... For me, that often means:
1) Deleting the Source and Destination objects within the Data Flow Task, AND ALSO deleting/recreating the Connection Object for the Excel spreadsheet. I've done this enough times that I now save myself time by copy/pasting my Source and Destination names to-and-from a Notepad window, and I choose names that remind me of the objects they referred to (the table and file, respectively).
2) Remembering to rebuild the ARROW's metadata, too: after you change and/or recreate the Source object, remember to DOUBLE-CLICK THE ARROW next, before re-creating the Destination. That shows the arrow's metadata, and it also creates/updates that metadata.
3) When recreating the destination, DELETE THE SPREADSHEET from prior runs (or rename or move, etc.), and have SSIS recreate it. (In your new destination object, there's a button to create that spreadsheet, using the metadata.)
If you still have problems after the above, take a look at your data types... make sure you've picked SQL datatypes that SSIS supports.
At the link below, about 2/3rds of the way down the page, you'll find a table "Mapping of Integration Services Data Types to Database Data Types", with SSIS data types in the 1st column ("Data Type"), and your T-SQL equivalent data types in the 3rd column ("SQL Server (SqlClient)"):
Integration Services Data Types
Hope that helps...