Is there a way to conditionally (through a Script Task or anything else) control the flow of a program in SSIS?
Currently I have a package that creates 5 different Excel sheets (through Execute SQL Tasks) dynamically. There may be times when all 5 will have data, or only 1 may have data. When it's just 1 that has data, it is fine. But the real problem arises when there are 5 DFTs trying to write data simultaneously to the same workbook (albeit to different sheets inside it). The package fails with an OLE DB error.
After a lot of head-scratching, I finally figured out that it was a concurrency control issue that wasn't allowing me to write to the Excel file simultaneously. To further my solution, I used expressions on precedence constraints to control whether the sheets get created or not.
But the real trouble is that after creating the sheets, the package would fail trying to write data to 2 different sheets simultaneously.
Is there a way I can assign an 'execution order' to the DFTs? This is why I am looking for a script task, so that when a particular sheet's count is 0 it does no work and control moves to another branch.
I hope I have not confused you here. But if I have, I'll be glad to provide more details on this question. Thanks for reading.
My first thought is to have a bunch of sequence containers, one per possible Excel sheet, each of which holds three tasks:
1. A script task to figure out whether or not to create the sheet, and set a Boolean package variable accordingly (see the sketch after this list)
2. An SQL task to create the worksheet
3. A data flow task to populate the worksheet
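For task 1, a minimal Script Task sketch might look like the following. The variable names are hypothetical, and I'm assuming the row count was captured earlier in the package (e.g. by an Execute SQL Task running a SELECT COUNT(*)):

```csharp
// Decide whether this sheet should be created, based on a row count
// captured earlier in the package. Both variables must be listed in
// the task's ReadOnlyVariables/ReadWriteVariables.
public void Main()
{
    int rowCount = Convert.ToInt32(Dts.Variables["User::Sheet1RowCount"].Value);

    // This Boolean drives the precedence constraint to the SQL task.
    Dts.Variables["User::CreateSheet1"].Value = rowCount > 0;

    Dts.TaskResult = (int)ScriptResults.Success;
}
```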
The precedence constraint between tasks 1 and 2 would be an expression testing that the Boolean is true, e.g. `@[User::CreateSheet1] == true`.
The precedence constraint between tasks 2 and 3 would be a success constraint, as would the precedence constraints between the sequence containers. Overall, the containers chain one after another, so only one DFT ever writes to the workbook at a time.
I have an Excel spreadsheet with multiple sheets. So inside a For Each Loop container I have a script that reads the sheet names and saves them to a variable. A data flow, still inside the For Each Loop container, leads to a Recordset destination that saves all the columns to another variable. Then, outside the For Each Loop container, another data flow has to read all the rows from the variable, check for duplicates (the second and third sheets contain duplicate product IDs), remove the duplicates, and upload the data into a database. I have searched everywhere and cannot find how to set up the Recordset destination so that it appends values to the variable instead of replacing it, because I end up with only the last sheet of data.
I can't change the For Each Loop container's settings, because of the looping through the sheets.
Thank you in advance for any advice.
Hopefully someone wiser in SSIS will chime in here, but I don't think your current approach will work.
Generally speaking, you can use Expressions within SSIS to get dynamic behaviour. However, the VariableName of the Recordset Destination does not support that.
You might be able to have a Script Task after the data flow that copies from rsCurrent into rsIteration1, rsIteration2, etc. based on the current loop, but at that point you're double-copying data for no real value.
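If you do try that, a rough Script Task sketch (all variable names are hypothetical) that appends each iteration's recordset into one accumulating DataTable would be:

```csharp
using System.Data;
using System.Data.OleDb;

public void Main()
{
    // Load the current iteration's ADO recordset into a DataTable.
    DataTable current = new DataTable();
    new OleDbDataAdapter().Fill(current, Dts.Variables["User::rsCurrent"].Value);

    // Append it to an accumulator held in a second Object variable.
    DataTable all = Dts.Variables["User::rsAll"].Value as DataTable;
    if (all == null)
        Dts.Variables["User::rsAll"].Value = current; // first sheet
    else
        all.Merge(current); // subsequent sheets (schemas must match)

    Dts.TaskResult = (int)ScriptResults.Success;
}
```

The data flow after the loop would then read User::rsAll instead of the per-iteration variable.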
Since you're doing a duplicate check, perhaps a read of sheet 1 could go into a Cache Connection Manager, and then the reads of the subsequent sheets could use the CCM as the Lookup. For rows that have matches, you know you have duplicates (or maybe you only import what doesn't match; I don't quite get your logic).
Debugging some of this is going to be quite challenging. If at all possible, I would stage the data to tables. There you could load all the data + the tab name and then you can test your deduplication and refer back to your inputs and outputs.
The tooling for SSIS variables of type Object is pretty limited, which is a pity.
I have a custom source Data Flow component whose output will differ every time. I need to insert those records into a destination table.
Problem:
I can't specify the input columns at design time for the destination component, because on every call to the Data Flow Task, the source component will return different output columns based on the table schema.
Solution needed:
How can a destination data flow component accept whatever inputs are available, without any mapping (either via an existing component or a custom component)?
The data flow's fixed structure is there for data validation and optimization purposes. All of its components are going to have fixed input and output columns. I would suggest the following possibilities:
Write a data flow for every possible schema. There are probably a finite number of possibilities. You could reduce the effort of this task by using BIML, which could generate the package structure for you. This may also introduce the possibility of parallel loading.
Use a script task instead of a data flow. In the script task, write the rows for each input into a table.
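As a rough illustration of that second option (the connection strings, query variable, and table name below are placeholders), SqlBulkCopy can stream whatever columns the source query returns into the destination table, since it does not need design-time column metadata:

```csharp
using System.Data.SqlClient;

public void Main()
{
    // The query that produces the varying column set.
    string query = Dts.Variables["User::SourceQuery"].Value.ToString();

    using (SqlConnection src = new SqlConnection("<source connection string>"))
    using (SqlConnection dest = new SqlConnection("<destination connection string>"))
    {
        src.Open();
        dest.Open();

        using (SqlCommand cmd = new SqlCommand(query, src))
        using (SqlDataReader reader = cmd.ExecuteReader())
        using (SqlBulkCopy bulk = new SqlBulkCopy(dest))
        {
            bulk.DestinationTableName = "dbo.StagingTable"; // placeholder
            // Columns are resolved at run time (by ordinal, unless you
            // add explicit ColumnMappings).
            bulk.WriteToServer(reader);
        }
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}
```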
If you need to pass multiple inputs to a single script component, the only way I know to do this is by passing the multiple inputs to a UNION ALL component, and then passing the single output from the UNION ALL to the Script.
You'll have to account for any differences between the two column structures in the UNION ALL, and maybe use derived columns if you need an easy way to identify which original input a row came from.
I know this is way late but I keep seeing this UNION ALL approach and don't like it.
How about this approach:
1. Run both data flows into their own Recordset destination and save each into a variable of type Object (holding the ADO recordset).
2. Create a new data flow and use a script source to bring in both ADO objects.
3. Fill DataTables using an adapter and then do whatever you want with them (see the sketch below).
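For step 2, a script source sketch along these lines might work (the variable names and output column are hypothetical, and the two Object variables must be listed in the component's ReadOnlyVariables):

```csharp
using System.Data;
using System.Data.OleDb;

public override void CreateNewOutputRows()
{
    // Load both ADO recordsets into DataTables.
    DataTable first = new DataTable();
    DataTable second = new DataTable();
    new OleDbDataAdapter().Fill(first, Variables.rsFlow1);
    new OleDbDataAdapter().Fill(second, Variables.rsFlow2);

    // Do whatever you want with the two tables; as an example, emit
    // one column from the first table's rows to Output 0.
    foreach (DataRow row in first.Rows)
    {
        Output0Buffer.AddRow();
        Output0Buffer.SomeColumn = row["SomeColumn"].ToString(); // hypothetical
    }
}
```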
I have an SSIS package with a Data Flow Task and an FTP Task. I have to use two expression variables (these create dynamic file names using date parts).
Otherwise, if I have just one variable, one task steps on the variable while the other task is trying to use it, and gives me the 'cannot lock variable for readonly' error.
Is it possible to have one variable work in two places? It would seem intuitive... This is sloppy: should someone change one variable without changing the other to match, it would bomb.
I added an Expression Task before the Data Flow... bingo
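The Expression Task assigns the file name once, up front, so both downstream tasks only ever read the variable. The assignment would be something along these lines (the variable name and file-name pattern are made up for illustration):

```
@[User::FileName] = "Export_"
    + (DT_WSTR, 4)YEAR(GETDATE())
    + RIGHT("0" + (DT_WSTR, 2)MONTH(GETDATE()), 2)
    + RIGHT("0" + (DT_WSTR, 2)DAY(GETDATE()), 2)
    + ".csv"
```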
If you do not require those two tasks to run in parallel, then force one task to complete before the other begins (precedence constraints etc.) - that should prevent race conditions on the single variable.
I have created an SSIS package in Visual Studio 2008 that takes a SQL SELECT statement and populates an Excel sheet; the Excel sheet is duplicated from a template file with all the formatting and cells set up.
The issue I am having is that no matter what I do, I cannot change the Excel destination formatting to anything other than General. It overwrites the destination's formatting and writes decimal numbers as '1.50, always adding the ' to fields.
I have tried inserting a row, as per some suggestions, since people think this is where SSIS scans for formatting types. However, the field always comes up as Unicode string [DT_WSTR] in the Advanced Editor and always defaults back if I change it.
Please can someone help! Happy to provide any additional info if I've missed anything. I've seen some posts with the same issue, but none of the solutions seem to be working, or I'm missing something else.
**Update**
Figured out the reason none of the recommended fixes were working: it was due to using a SELECT statement in the Excel destination instead of selecting the table. That essentially wipes out any formatting change.
So what I decided on in the end was to create a data-only sheet (which is hidden) using the basic table data access mode, then reference that in a front-end sheet with all the formatting already in place, using a =VALUE(C1) formula to return just the value. I protected the cells to hide the formulas.
I have found that, when I change a Data Flow Task in SSIS that exports to (or imports from) Excel, I often have to "start over", or SSIS will somehow retain some of the properties of the old Data Flow Task: data types, column positions... For me, that often means:
1) Deleting the Source and Destination objects within the Data Flow Task, AND ALSO deleting/recreating the Connection Object for the Excel spreadsheet. I've done this enough times that I now save myself time by copy/pasting my Source and Destination names to-and-from a Notepad window, and I choose names that remind me of the objects they referred to (the table and file, respectively).
2) Remembering to rebuild the ARROW's metadata, too: after you change and/or recreate the Source object, you have to remember to DOUBLE-CLICK THE ARROW next, before re-creating the Destination. That shows the arrow's metadata, and it also creates/updates that metadata.
3) When recreating the destination, DELETE THE SPREADSHEET from prior runs (or rename or move, etc.), and have SSIS recreate it. (In your new destination object, there's a button to create that spreadsheet, using the metadata.)
If you still have problems after the above, take a look at your data types... make sure you've picked SQL datatypes that SSIS supports.
At the link below, about 2/3rds of the way down the page, you'll find a table "Mapping of Integration Services Data Types to Database Data Types", with SSIS data types in the 1st column ("Data Type"), and your T-SQL equivalent data types in the 3rd column ("SQL Server (SqlClient)"):
Integration Services Data Types
Hope that helps...
I'm looking for some pointers on creating an SSIS-based workflow that reads a list of tables at run time from a database, then uses each of these as an ADO input, selects specific columns from each table, and adds these to a staging area. I've had a quick play with the Union task but was looking for some pointers in terms of direction to take.
I can't seem to find anything on the net that does what I need and am not sure if SSIS can bend to suit my needs.
Many thanks in advance.
You can do this but the only method I can think of is a little convoluted.
You would need to use a "For Each Loop container" to loop through your list of tables and read each table name into an SSIS variable.
Within the "foreach":
add a script task to build your actual query into another SSIS variable.
add a data flow
within the Data Flow use a source of "SQL Command from variable".
do data flow "stuff"
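A minimal sketch of that Script Task step (the variable and column names are made up; adjust to your schema):

```csharp
// Build the per-table query into User::SourceQuery. User::TableName is
// set by the For Each Loop; the column list is purely illustrative.
public void Main()
{
    string tableName = Dts.Variables["User::TableName"].Value.ToString();

    Dts.Variables["User::SourceQuery"].Value =
        "SELECT ColA, ColB FROM " + tableName;

    Dts.TaskResult = (int)ScriptResults.Success;
}
```

Note that the data flow's metadata is fixed, so every table's query must return the same column names and types; that's why you select specific columns rather than SELECT *.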
I hope this makes some kind of sense? :-)