Feedback requested for SSIS Master package design - Running a bunch of Sub-Packages

Overall, I am looking for feedback regarding two different design options of running a master package.
I have one package, called by SQL Server Agent, that runs a bunch of packages that process data (I think we are up to about 50 now).
The original design was to group packages into smaller chunks called directorates, which call the actual packages.
A few problems I have seen (and experienced) with this approach:
1. Every package has to open (even if there is no need to run it, i.e., no file is present)
2. #1 adds a lot of time before the whole process completes
3. It definitely runs in parallel
So I developed a new approach that only runs the packages that have the necessary files, and logs the attempt if they don't. It is so much cleaner, and you don't need all the file connections for each package, since you are iterating through the files.
I am not sure it runs in parallel (I actually doubt it).
I am including the data flow that populates the ADO object iterated by the Foreach Loop, to demonstrate the files being processed.
Note: In the DEV environment there are usually not many files to be processed; however, in TEST and PROD most of the files will be present to be processed.
Can I get some feedback on these two different approaches?
Anyone who provides productive feedback will receive upvotes!

I would go with a modified first approach, i.e., inside each package, use a Script Task to check whether the files are present.
For instance:
Create a Script Task and a variable.
Inside the Script Task, write code similar to the sketch below (the logic: if the file is found, flag it as true; otherwise false):
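A minimal sketch, assuming a Boolean package variable User::FileExists and a string variable User::FilePath (both names are illustrative):

    // Script Task (C#) - this goes in the Main method of the
    // ScriptMain class that SSIS generates.
    // ReadOnlyVariables : User::FilePath
    // ReadWriteVariables: User::FileExists
    public void Main()
    {
        string filePath = Dts.Variables["User::FilePath"].Value.ToString();

        // Flag is true if the expected file is present, false otherwise.
        Dts.Variables["User::FileExists"].Value = System.IO.File.Exists(filePath);

        Dts.TaskResult = (int)ScriptResults.Success;
    }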
Now constrain the execution of the DFT by using this flag: set the precedence constraint into the Data Flow Task to Expression and Constraint, with an expression such as @[User::FileExists].
The only con is that you'll have to make changes in all 50 packages, but this is a one-time activity. Your parallel execution will remain intact.

I would go with the 2nd approach, as it is cleaner and easier to debug.
Here are some suggestions to improve the 2nd approach:
Create a control table with all package names, an Enable/Disable flag, and a FileAvailable flag
Create a poll package that goes through the files and sets each package's FileAvailable flag accordingly
Loop through this control table and run only the packages that are enabled and have a file available (see the sketch after this list)
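A minimal sketch of the poll step as a Script Task, assuming a control table named dbo.PackageControl with columns PackageName, IsEnabled, FileAvailable, and ExpectedFilePath (all of these object names are assumptions for illustration):

    // Script Task (C#) sketch for the poll package.
    // All table and column names are illustrative.
    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.IO;

    public static class FilePoller
    {
        public static void PollFiles(string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();

                // Read the expected file path for every enabled package.
                var expected = new Dictionary<string, string>();
                using (var cmd = new SqlCommand(
                    "SELECT PackageName, ExpectedFilePath FROM dbo.PackageControl WHERE IsEnabled = 1",
                    conn))
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        expected[reader.GetString(0)] = reader.GetString(1);
                }

                // Set FileAvailable according to whether the file is really there.
                foreach (var pair in expected)
                {
                    using (var update = new SqlCommand(
                        "UPDATE dbo.PackageControl SET FileAvailable = @available WHERE PackageName = @name",
                        conn))
                    {
                        update.Parameters.AddWithValue("@available", File.Exists(pair.Value));
                        update.Parameters.AddWithValue("@name", pair.Key);
                        update.ExecuteNonQuery();
                    }
                }
            }
        }
    }

The master package's Foreach Loop then only has to select the rows where both flags are set and call Execute Package for each one.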

How to run an SSIS package multiple times?

After much work, I managed to create an SSIS package that checks a table for files to be produced; if there are any, it retrieves the data for the first file in the list (with the help of variables that I pass as parameters), produces the file, and sends it to the correct folder.
However, the list of files to be produced has 10 files, and the package only produces the first one on the list before it ends. Because the package updated my database table, the table now shows only 9 files remaining to be produced (down from 10). If I run the package again, it will once again retrieve the data for the first file in the list (now 9 files to be produced), produce it, and end.
Because the package updates the table, it will then contain 8 files to be produced. You get the idea by now: I have to click "Start" 10 times for all the files to be produced. I was wondering if there is a way to do this by running the SSIS package a single time.
If so, how? It seems quite easy and complicated at the same time. Thanks in advance for any help!
I don't have the details or specifics of your package, but generally, when you want to repeat a process until its inputs are exhausted, you wrap the task in a Foreach Loop container (a sketch of a typical setup follows the link). Check out this documentation:
https://learn.microsoft.com/en-us/sql/integration-services/control-flow/foreach-loop-container?view=sql-server-ver15
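For instance, a typical setup (all names are illustrative):
1. An Execute SQL Task with ResultSet = Full result set loads the list of pending files into an Object variable, e.g. User::FileList.
2. A Foreach Loop Container enumerates User::FileList with the Foreach ADO enumerator, mapping each column to a package variable.
3. The existing retrieve/produce/send tasks move inside the loop and are driven by those variables, so each iteration handles one file until the list is exhausted.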

BIML script fails package generation on second attempt with no changes to files

First time posting a question, please forgive if I don't have enough information.
I have a tiered BIML Script that has the following tiers:
10-Connection – create the connection nodes
20- Model – loop through the connections to build database, table nodes
30-Create/Drop Staging Tables – this is included because the package(s) need to be run prior to the remainder of the creation process.
30- Flat File – loop through the table object to create flat file formats and connections
40-Packages – loop through the table objects and create extract and load packages
45-Project Params & Connections – attaches the project params and connections (using named connections and GUIDs from 10-Connection). Project params are manually created in SSIS.
The process successfully connects to the source SQL Server database, generates the Create/Drop Staging Tables packages with correct metadata, and will create the extract packages successfully the first time.
Upon a second attempt to process the same BIML scripts, with no changes made to the files, the process fails with "Object reference not set to an instance of an object." and "Unable to Query on Connection" on the OleDbSource node.
The BIML files generated in preview and output debugging have valid queries and source metadata, which indicates a positive connection and a proper model. I have run the emitted queries in SSMS without error. When I move the BIML files to a new project, the process succeeds the first time and fails on subsequent attempts.
I have tried the following:
Connection Managers
Delete project connection managers prior to package re-generation
GUIDs annotated and used in PackageProject and Packages Nodes.
Delay Validation/Validate External Metadata – I have tried with both true and false on Package, DFT and OleDBSource
Project
Delete .proj files from directory
Direct PackageProject to new ProjectSubpath
I also tried simply hard-coding the BimlScript to simplify it and remove any variables, with the same result.
The most maddening point is that the metadata and queries all indicate the process can connect to and query this exact table, and it works, but only on initial creation. Adding packages or re-generating during testing fails. Has anyone ever come across this before?
Great thanks and a shout-out to cathrine-wilhelmsen and billinkc, whose posts and tutorials have been very helpful. Any and all help would be greatly appreciated.
I changed the provider from SQLNCLI11 to SQLOLEDB, with no changes to the code. I tested different providers after seeing a few example connection strings that used them.
I wish I could explain why this fixed it.
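For reference, the change amounts to swapping the Provider keyword in the connection string; an illustrative pair (server and database names are placeholders):

    Before: Provider=SQLNCLI11;Data Source=MyServer;Initial Catalog=MyDb;Integrated Security=SSPI;
    After:  Provider=SQLOLEDB;Data Source=MyServer;Initial Catalog=MyDb;Integrated Security=SSPI;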

SSIS temp table exec proc

SSIS newbie here.
I have an SSIS package I created with the wizard. I added an Execute SQL Task to run the script I had previously been running separately, in order to reduce the process to one step. The script uses lots of temp tables, and one global ##temp at the end to make the result accessible outside the process.
When I try to execute the package, I get a complex "Package Validation Error" (error code 0x80040E14). I think the operative part of the error message is "Invalid object name '##roster5'."
I just realized it was the Data Flow task that was throwing the error, so I tried putting another Execute SQL Task before everything else to create the table, so that the Data Flow task would see that the table is there; but it still gives me the error: "Invalid object name '##ROSTER_MEMBER_NEW5'."
What am I missing/doing wrong? I don't know what I don't know. It seems like this shouldn't be that complicated. (As a newbie, I know this is probably a duplicate of...something, but I don't know how else to ask the question.)
Based on your responses, another option would be to add a T-SQL step to a SQL Agent job that executes stand-alone T-SQL. You would need to rethink the flow control of your original SSIS package and split it into 2 separate packages: the first SSIS package would execute everything needed before the T-SQL step, the next job step would execute the actual T-SQL needed to aggregate, and the last step would call the second package, which would complete the process.
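For instance, the Agent job might look like this (the step breakdown is illustrative):
1. Execute package 1 (everything before the aggregation)
2. T-SQL step (the stand-alone aggregation script)
3. Execute package 2 (the rest of the process)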
I'm offering this advice with the caveat that it isn't advisable. What would work best is to communicate with your DBA, who will be able to offer you a service account to execute your SSIS package with the elevated privileges needed to truncate the staging table that will need to exist for your process.
I actually want to post a non-answer. I tried to follow the advice above as well as I could, but nothing worked. My script was supposed to run, and then the data flow was supposed to essentially copy the contents of a global temp table to another server/table. I had been doing this as two steps, and tried to use SSIS to do it all in one step; there wasn't really a need to pass values from component to component within SSIS. It doesn't seem like this should be that hard.
In any event, as I said, nothing worked. OK, let me tell you what I think happened. After making a lot of mistakes, a lot of undos, and a lot of unsuccessful attempts, something started working. One of the things I think contributed is that I had set the ResultSetType to ResultSetType_None, since I wouldn't be using any results from that step. If anyone thinks that's not what happened, I'm happy to hear the actual explanation, since I want to learn.
I consider this a non-answer, because I have little confidence that I'm right, or that I got it by anything other than an accident.

Drop and Restore database on package failure using SSIS

Say for example I have an SSIS package with more than 20 steps doing an assortment of tasks and I wish to do the following when the package fails:
1.) Drop the database
2.) Restore the backup taken at the beginning
3.) Send an email containing the log file
At the moment I have added these steps to the OnError event handler at package level, and this works, apart from the fact that it generally does all of this twice each time the package fails. I understand that OnError may fire multiple times before the whole package terminates, but I don't understand how else to do what I want.
I essentially want to run the said steps on package termination, i.e., run them once, not several times depending on the number of errors that caused the package to fail. I don't mind receiving two emails whose only difference is an extra error in one, but I don't think it is right to drop/restore the database twice for no reason. I cannot see a suitable event for this.
One solution is to put all the steps of your package in a container, change the OnError handler so that it only increments an ErrorCount variable, and add another container that runs on completion of the main container, checks ErrorCount, and performs the actions from your current OnError handler if the count > 0 (a sketch follows).
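A minimal sketch of the counting piece, assuming an Int32 package variable named User::ErrorCount (the name is illustrative). A Script Task in the package's OnError event handler does the counting:

    // Script Task (C#) placed inside the OnError event handler.
    // ReadWriteVariables: User::ErrorCount
    public void Main()
    {
        // Each OnError event bumps the counter; the cleanup path later
        // only cares whether it ended up greater than zero.
        int errors = (int)Dts.Variables["User::ErrorCount"].Value;
        Dts.Variables["User::ErrorCount"].Value = errors + 1;

        Dts.TaskResult = (int)ScriptResults.Success;
    }

Then connect the main container to the drop/restore/email container with a precedence constraint set to Expression and Constraint, constraint value Completion, and expression @[User::ErrorCount] > 0, so the cleanup runs once, and only after a failure.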

How to pass a Data Flow from one package to another

I'd like to pass a Data Flow from one package to another, for the following reasons:
It would help in refactoring common logic in SSIS packages.
It would enable concurrent development of larger SSIS packages.
At first glance, the Execute Package Task sounded promising, but it looks like I can only pass fairly simple variables in and out of the package.
Is there a way to do this using SSIS?
CozyRoc (cozyroc.com) is a third-party tool that I believe can do this.
A bit of clarity, Paul: are you talking about 1) code reusability, or 2) allowing the results from one DFT to be used in another DFT?
The first you can't do in "native" SSIS, I believe - there is no set of DFT modules that you can call from other packages. I would instead approach it by building a set of child packages that are each quite simple:
initialisation routines
DFT
cleanup
Then pass variables to the child package for (e.g.) the table to be processed and the column(s) to be selected from the source table (see the sketch at the end of this answer).
It would require a very clever schema and some clever thinking about what the common DFT would do, but I think it would be possible.
The second is not possible without jumping through a few hoops - like saving result sets to temporary tables and then re-reading the tables into later DFTs - but then you would lose the data actually flowing through the task.
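As for the variable passing in the first option, a minimal sketch (project deployment model; all names are illustrative): define parameters such as TableName and ColumnList on the child package, and map parent variables to them on the Parameter Bindings page of the Execute Package Task; the child's initialisation routine, DFT, and cleanup then all key off those values.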