Foreach Loop Container with more than one Input Constraint - ssis

I have a weird problem with Foreach Loop Container.
I have a package to take a backup of our SSAS cubes. We have both UDM and Tabular cubes. Considering the figure below, the flow should go to either Find UDM Cubes or Find TAB Cubes based on a variable, so I used expressions on the precedence constraints (connections).
With one specific parameter the flow should go through Find UDM Cubes, and with a different parameter it should go through Find TAB Cubes.
When testing, I noticed that the package is not behaving as expected and the Script Task is not executing. If I remove one of the highlighted constraints (connections), the Script Task gets hit and works. So as long as I have ONE input for the Script Task it works; otherwise it just does nothing.
Appreciate if anybody can help.

Multiple Precedence Constraints
Both of your Data Flow Tasks would have to succeed in order for the Script Task to run. As you state, both data flows may not even execute, so they never both succeed.
Here is a nice article on it: https://msdn.microsoft.com/en-us/library/ms139895.aspx
One way to get your desired behavior would be to add a Sequence Container, move your clean-up and find tasks into it, and then create the precedence constraint from the Sequence Container to your Script Task. That way, even if only one of them runs, the container is still considered successful and your Script Task should execute.
This precedence suggestion has been tested and works.

Related

ssis temp table exec proc

SSIS newbie here.
I have an SSIS package I created based on the wizard. I added a SQL task to run the script I was running previously separately, in order to reduce the process to one step. The script uses lots of temp tables, and one global ##temp at the end to make the result accessible outside the process.
When I try to execute the package, I get a complex "Package Validation Error" (error code 0x80040E14). I think the operative part of the error message is "Invalid object name '##roster5'."
I just realized it was the Data Flow task that was throwing the error, so I tried to put another SQL Task before everything else to create the table so that the Data Flow task would see that the table is there; but it still gives me the error: "Invalid object name '##ROSTER_MEMBER_NEW5'."
What am I missing/doing wrong? I don't know what I don't know. It seems like this shouldn't be that complicated (As a newbie, I know that this is probably a duplicate of...something, but I don't know how else to ask the question.)
Based on your responses, another option would be to add a T-SQL step in a SQL Agent job that executes stand-alone T-SQL. You would need to rethink the flow control of your original SSIS package and split that into 2 separate packages. The first SSIS package would execute all that is needed before the T-SQL step, the next step would execute the actual T-SQL needed to aggregate, then the last step would call the second package, which would complete the process.
I'm offering this advice with the caveat that it isn't advisable. What would work best is to communicate with your DBA, who will be able to offer you a service account to execute your SSIS package with the elevated privileges needed to truncate the staging table that will need to exist for your process to manage.
I actually want to post a non-answer. I tried to follow the advice above as well as I could, but nothing worked. My script was supposed to run, and then the data pump was supposed to essentially copy the content of a global temp table to another server/table. I was doing this as two steps, and tried to use SSIS to do it all in one step; there wasn't really a need to pass values within SSIS from component to component. It doesn't seem like this should be that hard.
In any event, as I said, nothing worked. OK, let me tell you what I think happened. After making a lot of mistakes, a lot of undos, and a lot of unsuccessful attempts, something started working. One of the things I think contributed is that I had set the ResultSetType to ResultSetType_None, since I wouldn't be using any results from that step. If anyone thinks that's not what happened, I'm happy to hear the actuality, since I want to learn.
I consider this a non-answer, because I have little confidence that I'm right, or that I got it by anything other than an accident.

Feedback requested for SSIS Master package design - Running a bunch of Sub-Packages

Overall, I am looking for feedback regarding two different design options for running a master package.
I have one package that Agent calls that runs a bunch of packages that process data (I think we are up to about 50 now).
The original design was to group packages into smaller chunks called directorates which call the actual packages. Sample below:
A few issues I see (and have experienced) with this approach are:
1. Every package has to open (even if it is unnecessary to run, i.e. no file present)
2. #1 adds a lot of time for the process to complete
3. It runs in parallel for sure
So I developed a new approach which only runs the packages that have the necessary files, and logs the attempt if not. It is so much cleaner, and you don't need all the file connections for each package since you are iterating through them.
I am not sure it runs in parallel (I actually doubt it).
I am adding the data flow that populates the ADO object being iterated in the foreach loop, to demonstrate the files being processed.
Note: Usually in the DEV environment there are not many files to be processed; however, when deploying to TEST and PROD, most of the files will be present and need processing.
Can I get some feedback on these two different approaches?
Anyone that provides productive feedback will receive upvotes!!!
I would go with a modified first approach, i.e. something like: inside each package, use a Script Task to check whether the files are present in the destination or not.
For instance:
Create a Script Task and a variable.
Inside the Script Task, write code along the lines of the sketch below (the logic is: if the file is found, set the flag to true, else set it to false):
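A minimal sketch of that Script Task body, assuming the flag is a Boolean variable called User::FileExists and the expected path comes from a variable called User::FilePath (both names are made up here):

    // Add "using System.IO;" at the top of the SSIS-generated ScriptMain.cs.
    // User::FilePath must be listed in the task's ReadOnlyVariables and
    // User::FileExists (Boolean) in its ReadWriteVariables.
    public void Main()
    {
        string path = Dts.Variables["User::FilePath"].Value.ToString();

        // Flag is true when the expected file is present, false otherwise.
        Dts.Variables["User::FileExists"].Value = File.Exists(path);

        Dts.TaskResult = (int)ScriptResults.Success;
    }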
Now constrain the execution of the DFT by using this flag, e.g. with a precedence constraint whose expression is @[User::FileExists] == true and whose evaluation operation is set to Expression and Constraint.
The only con is that you'll have to make changes in 50 packages, but this is a one-time activity. Your parallel execution will remain intact.
I would go with the 2nd approach as it's cleaner and easier to debug.
Here are some suggestions to improve the 2nd approach:
Create a control table with all package names, an Enable/Disable flag, and a FileAvailable flag.
Create a poll package which goes through the files and sets the FileAvailable flag for each package accordingly (see the sketch after this list).
Loop through this control table and run only those packages that are enabled and have a file available.
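As a rough illustration of what the poll step could do (not a definitive implementation: the table, column, and connection names below are placeholders, including a FilePath column added here so there is something to check; the same logic could equally live in a Script Task inside the poll package):

    // Placeholder poll logic: check whether each enabled package's expected file
    // exists and record the result in the control table. Assumes a table
    // dbo.PackageControl(PackageName, Enabled, FileAvailable, FilePath).
    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.IO;

    class PollPackages
    {
        static void Main()
        {
            const string connStr = "Server=.;Database=ETL;Integrated Security=SSPI;";

            using (var conn = new SqlConnection(connStr))
            {
                conn.Open();

                // Read the expected file path for every enabled package.
                var expectedFiles = new Dictionary<string, string>();
                using (var cmd = new SqlCommand(
                    "SELECT PackageName, FilePath FROM dbo.PackageControl WHERE Enabled = 1", conn))
                using (var rdr = cmd.ExecuteReader())
                {
                    while (rdr.Read())
                        expectedFiles[rdr.GetString(0)] = rdr.GetString(1);
                }

                // Flag each package according to whether its file is present.
                foreach (var entry in expectedFiles)
                {
                    using (var upd = new SqlCommand(
                        "UPDATE dbo.PackageControl SET FileAvailable = @avail WHERE PackageName = @name", conn))
                    {
                        upd.Parameters.AddWithValue("@avail", File.Exists(entry.Value));
                        upd.Parameters.AddWithValue("@name", entry.Key);
                        upd.ExecuteNonQuery();
                    }
                }
            }
        }
    }

The master package's Foreach Loop can then iterate over something like SELECT PackageName FROM dbo.PackageControl WHERE Enabled = 1 AND FileAvailable = 1 and call an Execute Package Task for each row.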

Load the same file using different data flow task with different script component

I have encountered a weird situation now.
Basically I have two data flow tasks in my SSIS package, and these two tasks load data into the same staging table. Each one uses a Script Component as the data source, with a StreamReader reading two different files.
If I enable both tasks at the same time, the 2nd data flow task loads the same data as the first one. But if I disable the 1st one and leave just the 2nd one enabled, it loads the correct file as expected.
I am not very sure what I did wrong, since the StreamReaders are defined on files with different names; the only common part is that they load to the same destination.
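For reference, each of the two source components is shaped roughly like the sketch below (the file path, delimiter, and output column names are simplified placeholders; only the StreamReader pattern matters):

    // Inside the SSIS-generated ScriptMain of the source Script Component
    // (which derives from UserComponent). Add "using System.IO;" at the top.
    // The path, delimiter, and Output0 columns below are simplified placeholders.
    public override void CreateNewOutputRows()
    {
        using (var reader = new StreamReader(@"C:\staging\file_one.txt"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                var fields = line.Split('|');

                // One output row per line in the file.
                Output0Buffer.AddRow();
                Output0Buffer.Column1 = fields[0];
                Output0Buffer.Column2 = fields[1];
            }
        }
    }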
And these two tasks are not in parallel, they are being constrained in sequence.
Any suggestion or advice? Thanks in advance.
Thanks for the comments. I actually found out why they try to load the same data when both are enabled at the same time: I copied the 1st script component to create the 2nd one, so they share the same task ID within the container. I tested multiple times, and it turns out that if the task IDs are the same, the 2nd component does whatever the 1st one does; it is basically a replica of the 1st task, even though the code is different.
Please correct me if my findings are wrong, but that is what my tests have consistently shown.

Drop and Restore database on package failure using SSIS

Say for example I have an SSIS package with more than 20 steps doing an assortment of tasks and I wish to do the following when the package fails:
1.) Drop the database
2.) Restore the backup taken at the beginning
3.) Send an email containing the log file
At the moment I have added these steps to the OnError event handler at package level, and this works apart from the fact that it generally does this twice each time the package fails. I understand that OnError may fire multiple times before the whole package terminates, but I don't understand how I can achieve what I want any other way.
I essentially want to run the said steps on package termination, i.e. run them once, not several times depending on the number of errors that caused the package to fail. I don't mind receiving two emails with the only difference being an extra error in one, but I don't think it is right to drop/restore the database twice for no reason. I cannot see a suitable event for this.
One solution is to put all the steps of your package in a container, change the OnError handler to increment an ErrorCount variable, and add another container that runs on completion of the main container, checks the ErrorCount, and performs the actions in your current OnError handler if the count > 0.
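A minimal sketch of the counting piece, assuming the package has an Int32 variable named User::ErrorCount (the name is made up here) and the OnError handler contains a Script Task with that variable in its ReadWriteVariables:

    // Body of the Script Task placed in the package's OnError event handler.
    // OnError can fire several times while the package unwinds, so this task
    // only tallies the errors instead of reacting to each one.
    public void Main()
    {
        int count = (int)Dts.Variables["User::ErrorCount"].Value;
        Dts.Variables["User::ErrorCount"].Value = count + 1;

        Dts.TaskResult = (int)ScriptResults.Success;
    }

The drop/restore/email container then hangs off the main container with a completion precedence constraint whose expression is something like @[User::ErrorCount] > 0, so it runs exactly once no matter how many errors fired.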

SSIS Control Flow vs Data Flow

I don't entirely understand the purpose of control flow in an SSIS package. In all of the packages I've created, I simply add a data flow component to control flow and then the rest of the logic is located within the data flow.
I've seen examples of more complicated control flows (e.g. a Foreach Loop Container that iterates over the lines in an Excel file), but I am looking for an example where it could not also be implemented in the data flow. I could just as easily create a connection to the Excel file within the data flow.
I'm trying to get a better understanding of when I would need to (or should) implement logic in control flow vs using the data flow to do it all.
What prompted me to start looking into control flow and its purpose is that I'd like to refactor SSIS data flows as well as break packages down into smaller packages in order to make it easier to support concurrent development.
I'm trying to wrap my mind around how I might use control flow for these purposes.
A data flow defines a flow of data from a source to a destination. You do not start on one data flow task and move to the next. Data flows between your selected entities (sources, transformations, destinations).
Moreover, within a data flow task you cannot perform tasks such as iteration, component execution, etc.
A control flow defines a workflow of tasks to be executed, often in a particular order (assuming you included precedence constraints). The looping example is a good example of a control-flow requirement, but you can also execute standalone SQL scripts, call into COM interfaces, execute .NET components, or send an email. A control flow task may not actually have anything whatsoever to do with a database or a file.
A control flow task does nothing in itself TO the data. It executes something that itself may (or may not) act upon data somewhere. The data flow task IS doing something with data: it defines the data's movement and transformation.
It should be obvious when to execute control flow logic and data flow logic, as it will be the only way to do it. In your example, you cite the foreach container, and state that you could connect to the spreadsheet in the data flow. Sure, for one spreadsheet, but how would you do it for multiple ones in a folder? In the data flow logic, you simply can't!
Hope this helps.
Data flow - is just for moving data from one source to another.
Control flow - provides the logic for when data flow components are run and how they are run. A control flow can also perform looping, call stored procedures, move files, manage error handling, check a condition and call different tasks (including data flows) depending on the result, process a cube, trigger another process, etc.
If you're moving data from one location to another and it's the same each time, not based on any other condition, then you can get away with a package that is just a data flow task, but in most cases packages are more complex than that.
We use the control flow for many things. First, all our data concerning the data import is stored in tables, so we run procs to start the data flow and to end it so that our logging works correctly. We loop through sets of files; we move files to archive locations, rename them with the date, and delete them from processing locations. We have a separate program that does file movement and validates the files for the correct columns and size, and we run a proc to make sure a file has been validated before it goes into the data flow. Sometimes we have a requirement to send an email when a file is processed, or to send a report of records which could not be processed; those emails are put into the control flow. Sometimes we have clean-up steps that are more easily accomplished using a stored proc, so we put those steps in the control flow.
Trying to give a basic answer - a control flow performs operations, such as executing a SQL statement or sending an email. When a control flow is complete, it has either failed or succeeded.
A data flow, on the other hand, lives inside a control flow item (the Data Flow Task) and offers the ability to move, modify, and manipulate data.