Variable in recordset destination replaced when the dataflow is in For Each loop container? - ssis

I have an Excel spreadsheet with multiple sheets. Inside a For Each Loop container, a script reads the sheets and saves them to a variable. A data flow, still inside the For Each Loop container, ends in a Recordset Destination that saves all the columns to another variable. Outside the For Each Loop container, another data flow has to read all rows from that variable, check for duplicates (the second and third sheets contain duplicate product IDs), remove the duplicates, and upload the data into the database. I have searched everywhere and cannot find how to set up the Recordset Destination so that it appends to the variable instead of replacing it; as it stands, I end up with only the last sheet's data.
I cannot change the settings of the For Each Loop container, because it is needed to loop through the sheets.
Thank you in advance for any advice.

Hopefully someone wiser in SSIS will chime in here, but I don't think your current approach will work.
Generally speaking, you can use Expressions within SSIS to get dynamic behaviour. However, the VariableName of the Recordset Destination does not support that.
You might be able to have a Script Task after the Data Flow that copies from rsCurrent into rsIteration1, rsIteration2, etc., based on the current loop iteration, but at that point you're double copying data for no real value.
Since you're doing a duplicate check, perhaps the read of sheet 1 goes into a Cache Connection Manager (CCM),
and then the reads from the subsequent sheets use the CCM as the lookup. For rows that have matches, you know you have duplicates (or maybe you only import what doesn't match, I don't quite get your logic).
Debugging some of this is going to be quite challenging. If at all possible, I would stage the data to tables. There you could load all the data + the tab name and then you can test your deduplication and refer back to your inputs and outputs.
The tooling for SSIS variables of type Object is pretty limited, which is a pity.
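To make the staging idea concrete, here is a minimal, language-neutral sketch in Python (not SSIS code). It appends the rows of every sheet to one staged list together with the tab name, then deduplicates on the product ID while keeping the dropped rows for inspection. The column names `product_id` and `sheet_name`, the helper names, and the "first occurrence wins" rule are all assumptions for illustration; the question does not say which duplicate should survive.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def stage_rows(sheets: Dict[str, List[Row]]) -> List[Row]:
    """Append every sheet's rows to one list, tagging each row with its tab name."""
    staged: List[Row] = []
    for sheet_name, rows in sheets.items():
        for row in rows:
            # Append rather than replace, so earlier sheets are not lost.
            staged.append({**row, "sheet_name": sheet_name})
    return staged


def deduplicate(rows: List[Row], key: str = "product_id") -> Tuple[List[Row], List[Row]]:
    """Keep the first row seen for each key; return (kept, dropped)."""
    seen = set()
    kept: List[Row] = []
    dropped: List[Row] = []
    for row in rows:
        if row[key] in seen:
            dropped.append(row)  # duplicate: keep it aside so it can be traced back
        else:
            seen.add(row[key])
            kept.append(row)
    return kept, dropped
```

Because every staged row carries its tab name, a dropped duplicate can be traced back to the sheet it came from, which is the debugging benefit of staging to tables.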

Related

SSIS import from excel files with multiple sheets and different amount of columns

I have an SSIS package with two loops: one over the Excel files and one over the sheets in each file. Inside the per-sheet loop I have a data flow task that uses a variable for the sheet name, with an Excel source and an ODBC destination.
The table in the db has all the columns I need such as userid, username, productname, supportname.
However, some sheets can have columns username, productname and others have userid, username, productname, supportname.
How can I load the excel files? Can I add columns to a derived column task that checks if a column exists and if not add it with a default value and then map it to the destination?
thanks
SSIS is not an "any format goes at run-time" data loading engine. There was a conscious design decision to make the fastest possible ETL tool, and one of the requirements was a defined contract between the data source's shape and the destination. That's why you'll inevitably run into the VS_NEEDSNEWMETADATA error: something has altered the shape, and the package needs to be edited in designer mode to update the columns and sizes.
If you want to write the C# to make a generic Excel ingest engine, more power to you.
An alternative approach would be to have multiple data flows defined within your file and worksheet looping construct. The trick would be to conditionally enable them based on the available column set.
Columns "username and productname" detected, enable DFT UserName and ProductName. That DFT will have default values, or a lookup, for UserId, SupportName, etc.
All columns present, enable DFT All.
Finally, Azure Data Factory can "slurp and burp" whatever source to whatever destination. Perhaps that might be a better fit for your problem.
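The "conditionally enable a data flow based on the available column set" idea comes down to a small decision function. The sketch below is plain Python, not SSIS; in a package this decision would typically set a variable that drives the Disable property or a precedence constraint. The function names and the default values are hypothetical; the column names and DFT names are taken from the question and answer above.

```python
from typing import Dict, Iterable

EXPECTED_COLUMNS = ("userid", "username", "productname", "supportname")


def choose_data_flow(sheet_columns: Iterable[str]) -> str:
    """Return the name of the data flow to enable for a sheet's column set."""
    present = {name.strip().lower() for name in sheet_columns}
    if present == set(EXPECTED_COLUMNS):
        return "DFT All"
    if present == {"username", "productname"}:
        return "DFT UserName and ProductName"
    # Unknown shape: fail loudly rather than load misaligned data.
    raise ValueError(f"No data flow defined for columns: {sorted(present)}")


def fill_missing(row: Dict[str, object], defaults: Dict[str, object]) -> Dict[str, object]:
    """Return a row with every expected column, using defaults where absent."""
    normalized = {name.strip().lower(): value for name, value in row.items()}
    return {name: normalized.get(name, defaults.get(name)) for name in EXPECTED_COLUMNS}
```

For example, `fill_missing({"UserName": "ann", "ProductName": "widget"}, {"userid": -1, "supportname": "unknown"})` yields a row with all four destination columns.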

How to pass Multiple Input for SSIS Script Component

I have a custom source Data Flow component whose output will differ every time. I need to insert those records into a destination table.
Problem:
I can't specify the input columns at design time for the destination component, because on every call to the Data Flow task the source component returns different output columns based on the table schema.
Solution needed for:
How do I accept whatever inputs are available, without any mapping, in the destination Data Flow component (either with an existing component or with a custom component)?
The data flow's fixed structure is there for data validation and optimization purposes. All of its components are going to have fixed input and output columns. I would suggest the following possibilities:
Write a data flow for every possible schema. There are probably a finite number of possibilities. You could reduce the effort of this task by using BIML which could generate the package structure for you. This may also introduce the possibility of parallel loading.
Use a script task instead of a data flow. In the script task, write the rows for each input into a table.
If you need to pass multiple inputs to a single script component, the only way I know to do this is by passing the multiple inputs to a UNION ALL component, and then passing the single output from the UNION ALL to the Script.
You'll have to account for any differences between the two column structures in the UNION ALL, and maybe use derived columns if you need an easy way to identify which original input a row came from.
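As a rough illustration of what the UNION ALL plus derived column combination achieves, here is a small Python sketch (not SSIS). It merges inputs with different column sets into one row shape and tags each row with the input it came from. The `source_input` column name and the `fill` placeholder are assumptions made for this example.

```python
from typing import Dict, List

Row = Dict[str, object]


def union_all(inputs: Dict[str, List[Row]], fill: object = None) -> List[Row]:
    """Merge several inputs into one list with a common set of columns."""
    # Collect the union of column names, preserving first-seen order.
    columns: List[str] = []
    for rows in inputs.values():
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)

    merged: List[Row] = []
    for source, rows in inputs.items():
        for row in rows:
            # Columns missing from this input get the fill value.
            out = {name: row.get(name, fill) for name in columns}
            out["source_input"] = source  # plays the role of the derived column
            merged.append(out)
    return merged
```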
I know this is way late but I keep seeing this UNION ALL approach and don't like it.
How about this approach:
Run both data flows into their own Recordset Destination and save each into a variable of type Object (an ADO recordset).
Create a new data flow, use a script source, and bring in both ADO objects.
Fill DataTables using an adapter and then do whatever you want with them.

Sorting files with the same header names using SSIS

I have a folder with a lot of data files in it. I want to be able to loop through the files, look at the headers, and sort them into folders if they have the same headers. Is that possible to do in SSIS? If so, would anyone be able to point me in the direction of how to do this?
I am going to try to explain this as best I can without writing a book, as this is a multi-step process that isn't too complex but might be hard to explain with just text. My apologies, but I do not have access to SSDT at the moment, so I cannot provide images to aid here.
I would use the TextFieldParser class from Microsoft.VisualBasic.dll in a Script Task. This will allow you to read the header from each file into a string array. You can then join the string array into a single delimited value and load an Object variable with a DataTable that has two columns: the first being the file name and the second being the delimited headers.
Once you have this variable, you can load a SQL table with this information. (This is optional; skip it if you want to load the columns directly into SQL as you read them. Your call.)
Once you have your SQL table, you can create an enumerator for that data set based on the unique headers column.
Then use a Foreach Loop with a Script Task to enumerate through the unique header sets. Use an Execute SQL Task to fetch the file names that belong to each unique header set.
Within the script, loop through the returned file names and apply the necessary logic to move the files to their respective folders.
This is sort of a high level overview as I am assuming you are familiar enough with SSIS to understand the steps necessary to complete each step. If not then I would be able to elaborate later in the day when I am able to get to my SSIS rig.
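The steps above can be sketched outside SSIS to show the logic: read each file's header, group files by identical header, then move each group into its own folder. This is a minimal Python sketch rather than the Script Task code described; it assumes comma-delimited UTF-8 text files, and the function names and the `header_set_NN` folder naming are hypothetical.

```python
import csv
import shutil
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple

Header = Tuple[str, ...]


def read_header(path: Path, delimiter: str = ",") -> Header:
    """Return the first row of a delimited file as a tuple (empty if the file is empty)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return tuple(next(csv.reader(handle, delimiter=delimiter), ()))


def group_by_header(folder: Path) -> Dict[Header, List[Path]]:
    """Map each distinct header to the files in the folder that share it."""
    groups: Dict[Header, List[Path]] = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            groups[read_header(path)].append(path)
    return dict(groups)


def move_groups(groups: Dict[Header, List[Path]], target_root: Path) -> Dict[Header, Path]:
    """Move each group of files into its own numbered folder under target_root."""
    moved: Dict[Header, Path] = {}
    for index, (header, paths) in enumerate(sorted(groups.items()), start=1):
        destination = Path(target_root) / f"header_set_{index:02d}"
        destination.mkdir(parents=True, exist_ok=True)
        for path in paths:
            shutil.move(str(path), str(destination / path.name))
        moved[header] = destination
    return moved
```

The header tuple plays the role of the "delimited headers" column: files with an identical tuple land in the same folder.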

SSIS Excel formatting won't change from text field in destination editor **workaround in place**

I have created an SSIS package in Visual Studio 2008 that takes a SQL SELECT statement and populates an Excel sheet. The Excel sheet is duplicated from a template file with all the formatting and cells set up.
The issue I am having is that no matter what I do, I cannot change the Excel destination formatting to anything other than General. It overwrites the destination formatting and writes decimal numbers as '1.50, always adding the ' to the fields.
I have tried inserting a row, as per some suggestions, since people think this is where SSIS scans for formatting types. However, the field always comes up as Unicode string [DT_WSTR] in the Advanced Editor and always defaults back if I change it.
Please can someone help! Happy to provide any additional info if I've missed anything. I've seen some posts with the same issue, but none of the solutions seem to be working, or I'm missing something else.
****Update****
I figured out why none of the recommended fixes were working: I was using a SELECT statement in the Excel destination instead of selecting the table.
This essentially wipes out any formatting changes.
So what I decided in the end was to create a data-only sheet (which is hidden) using the basic table data access mode, then reference that in a front-end sheet with all the formatting already in place, using a =value(C1) formula to return just the value. I protected the cells to hide the formulas.
I have found that when I change a Data Flow Task in SSIS that exports to (or imports from) Excel, I often have to "start over", or SSIS will somehow retain some of the properties of the old Data Flow Task: data types, column positions... For me, that often means:
1) Deleting the Source and Destination objects within the Data Flow Task, AND ALSO deleting/recreating the Connection Object for the Excel spreadsheet. I've done this enough times that I now save myself time by copy/pasting my Source and Destination names to-and-from a Notepad window, and I choose names that remind me of the objects they referred to (the table and file, respectively).
2) Remembering to rebuild the ARROW's metadata, too: after you change and/or recreate the Source object, you have to remember to DOUBLE-CLICK THE ARROW NEXT, before re-creating the Destination. That shows the arrow's metadata, but it also creates/updates the arrow's metadata.
3) When recreating the destination, DELETE THE SPREADSHEET from prior runs (or rename or move, etc.), and have SSIS recreate it. (In your new destination object, there's a button to create that spreadsheet, using the metadata.)
If you still have problems after the above, take a look at your data types... make sure you've picked SQL datatypes that SSIS supports.
At the link below, about 2/3rds of the way down the page, you'll find a table "Mapping of Integration Services Data Types to Database Data Types", with SSIS data types in the 1st column ("Data Type"), and your T-SQL equivalent data types in the 3rd column ("SQL Server (SqlClient)"):
Integration Services Data Types
Hope that helps...

Controlling the flow in SSIS package based on a condition

Is there a way to conditionally (through a script task or anything else), control the flow of program in SSIS?
Currently I have a package that creates 5 different Excel sheets (through an Execute SQL Task) dynamically. There may be times when all 5 will have data, or only 1 may have data. When it's just 1 that has data, it is fine. But the real problem arises when there are 5 DFTs trying to write data simultaneously to the same workbook (albeit to different sheets inside it). The package fails with an OLEDB error.
After a lot of head breaking, I finally figured out that it was a concurrency control issue that wasn't allowing me to write to the excel file simultaneously. To further my solution, I used expressions on precedence constraints to control if the sheets get created or not.
But the real trouble is that after creating the sheets, the package would fail trying to write data to 2 different sheets simultaneously.
Is there a way I can assign an 'Execution Order' for the DFTs? This is the reason I am looking for a script task, so that when a particular sheet's count is 0 it does no work and control moves to another branch.
I hope I have not confused you here. But if I have, I'll be glad to provide more details on this question. Thanks for reading.
My first thought is to have a bunch of sequence containers, one per possible Excel sheet, each of which holds three tasks:
1) A script task to figure out whether or not to create the sheet, and set a boolean package variable accordingly
2) An SQL task to create the worksheet
3) A data flow task to populate the worksheet
The precedence constraint between tasks 1 and 2 would be an expression of the boolean being true:
The precedence constraint between tasks 2 and 3 would be a success constraint, as would the precedence constraints between the sequence containers. Overall, it would look like this: