I have an SSIS package scheduled to run every X minutes in SQL Agent that subsequently executes a plethora of child packages if certain conditions are met. The problem I am having is that sometimes some of the child packages take a lot longer than X minutes to run, which in turn means that nothing else can run until all the child packages complete.
This also means that during that run time the conditions to run a child package may have come and gone, which means it does not run even when the original package completes.
Is there a way to allow concurrent instances of a parent package to run even while a previous instance is still running?
For example: ParentA is scheduled to run every 10 minutes. At 10:00 it kicks off; ChildA's criteria are met, so ChildA runs (and takes 3 hours to complete), while ChildB's criteria are not met, so ChildB does not run. ChildB's criteria will be met at 10:20.
I need a new instance of ParentA to kick off at 10:10, and again at 10:20, so that ChildB can run.
How can I go about this without scheduling two or more copies of ParentA and writing some fancy code so that multiple instances of the same child package aren't kicked off?
Thanks
I would change the ParentA package so that it starts once and does not terminate, but instead contains a series of loops that launch the child packages.
Within ParentA I would add a For Loop Container; within that, an Execute SQL Task that uses the SQL WAITFOR command to pause for 10 minutes, followed by an Execute Package Task that runs the ChildA package.
I would repeat that pattern for ChildB, etc. Since each For Loop Container is independent, they will loop independently. Each individual loop container will wait for its child package to complete before looping.
You need to add some mechanism to stop the looping, e.g. a check on the system time, a count of the number of loops, or similar. This test goes in each For Loop's EvalExpression property.
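As a variation, if you'd rather not hold a database connection open for the pause, a Script Task inside each For Loop Container can sleep instead of running WAITFOR. A minimal sketch (Main and ScriptResults are the members SSIS generates for a Script Task; the 10-minute interval is from the example above):

    // SSIS Script Task body: pause this loop iteration for 10 minutes
    // before the loop goes on to launch the child package. Equivalent to
    // an Execute SQL Task running WAITFOR DELAY '00:10:00'.
    public void Main()
    {
        System.Threading.Thread.Sleep(System.TimeSpan.FromMinutes(10));
        Dts.TaskResult = (int)ScriptResults.Success;
    }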
Good day!
I have an SSIS package that retrieves data from a database and exports it to a flat file (a simple process). The issue I am having is that the data my package retrieves each morning depends on a separate process having loaded it into a table beforehand.
Now, the process which initially loads the data also inserts metadata into a table showing its start and end date/time. I would like to set something up in my package that checks the metadata table for an end date/time for the current date. If one exists, the process continues... if no date/time exists, the process stops, BUT (here is the kicker) the package should re-trigger itself automatically an hour later to check whether the initial data load is complete.
I have done research on checkpoints, etc., but all they seem to cover is a failed package picking up where it left off when it is restarted. I don't want to manually re-trigger the process; I'd like it to check the metadata and restart itself if possible. I could even add logic so that after checking the metadata 3 times it stops completely.
Thanks so much for your help
What you want isn't possible exactly the way you describe it. When a package finishes running, it's inert; it can't re-trigger itself, something has to re-trigger it.
That doesn't mean you have to do it manually. The way I would handle this is to have an agent job scheduled to run every hour for X hours a day. The job would call the package every time, and the metadata would tell the package whether it needs to do anything or just do nothing.
There would be a couple of ways to handle this.
They all start by setting up the initial check, just as you've outlined above: see if the data you need exists and, based on that, set a Boolean variable (I'll call it DataExists) to TRUE if your data is there or FALSE if it isn't. Or 1 or 0, or what have you.
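One way to implement that initial check is a Script Task that queries the metadata table and sets the variable. A rough sketch, where the connection manager name (MetaDb) and the table/column names (dbo.LoadMetadata, EndDateTime) are assumptions for illustration; DataExists must be listed in the task's ReadWriteVariables:

    public void Main()
    {
        // Assumes an ADO.NET connection manager named "MetaDb".
        string connStr = Dts.Connections["MetaDb"].ConnectionString;
        const string sql =
            @"SELECT COUNT(*) FROM dbo.LoadMetadata
              WHERE CAST(EndDateTime AS date) = CAST(GETDATE() AS date);";

        using (var conn = new System.Data.SqlClient.SqlConnection(connStr))
        using (var cmd = new System.Data.SqlClient.SqlCommand(sql, conn))
        {
            conn.Open();
            // TRUE if today's load has logged an end date/time, FALSE otherwise.
            Dts.Variables["User::DataExists"].Value = (int)cmd.ExecuteScalar() > 0;
        }
        Dts.TaskResult = (int)ScriptResults.Success;
    }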
Create two precedence constraints coming off that task: one that requires DataExists == TRUE and, obviously enough, another that requires DataExists == FALSE (as constraint expressions: @[User::DataExists] == TRUE and @[User::DataExists] == FALSE).
The TRUE path is your happy path. The package executes your code.
On the FALSE path, you have options.
Personally, I'd have the FALSE path lead to a forced failure of the package, and set up the job scheduler to wait an hour and try again; SQL Agent job steps have built-in Retry attempts and Retry interval settings for exactly this. BUT I'd also set a limit on the retries: after X retries, go ahead and raise an actual error, so you get a heads-up if your data never lands in the table.
If you don't want to (or can't) get that level of assistance from your scheduler, you could mimic the functionality in SSIS, but it's not without risk.
On your FALSE path, trigger an Execute SQL Task with a simple WAITFOR DELAY '01:00:00.00' command in it, then have that task loop back to the initial check when it's done waiting. Be aware that this consumes a thread on your SQL Server for the duration and could end up getting dropped by the SQL engine if it becomes thread-starved.
Going the second route, I'd also set up an Iteration variable, increment it with each try, and put a limit in the precedence constraint (e.g. @[User::Iteration] < 3) so that, again, an actual error is raised if your data doesn't show up within a reasonable number of attempts.
Thanks so much for your help! With some additional research I found the following article, which I was able to use to build a solution for my needs. Although my process doesn't require a failure to trigger a retry, I set the process to force a failure after 3 attempts.
http://microsoft-ssis.blogspot.com/2014/06/retry-task-on-failure.html
Much appreciated
Best wishes
I have a job in SSIS that has 5 steps. The way it currently works, it goes through the steps in order, waiting for the previous one to complete before starting the next. However, with this job, steps 1-4 could all run at the same time without impacting each other's results. So I was curious whether it is possible to have steps 1-4 all run at the same time and, once all are complete, then start step 5.
I am open to doing this in other ways, such as having several different jobs, using triggers, or anything else that will get the end result.
The main goal here is to have step 5 start as soon as possible, but step 5 cannot start until all 4 steps are done.
All of these steps are merely running a stored procedure to update a table.
I am using SQL 2012. I am very new to SSIS.
This is what the Sequence Container tool is for.
You can put steps 1-4 in a Sequence Container and let them run in parallel in the container, and then have a Precedence Constraint from the container to Step 5.
In your package, set MaxConcurrentExecutables to, say, 6 and make sure there are no precedence constraints between your tasks.
Then they should run in parallel.
See here for more detail: https://blogs.msdn.microsoft.com/sqlperf/2007/05/11/implement-parallel-execution-in-ssis
I'm curious - did you try googling this?
My ETL package finished successfully without executing several of the last tasks.
I then tried running a task of the same type and skipping the others, with the same result.
After that I created a separate package with just the last five tasks, and they ran as expected!
Question:
What is happening with the flow? Why does the package skip several tasks without any warnings, errors, etc.?
Thanks a lot for any answers and ideas about this strange behavior!
[UPDATE] Answered by #Peter_R:
I changed both sp_updatestats inputs from AND to OR and everything is OK. The arrows changed to dotted ones (the designer draws logical OR constraints as dotted lines).
The Logical AND constraint requires all incoming tasks to complete before running, so your SP_UpdateStats will not run until both ProcessFull and MeasureGroupSet Loop have completed.
I am guessing that after Deploy Data the expression is designed to split the workflow depending on a condition you have set. Because of this, ProcessFull and MeasureGroupSet Loop will never both run, meaning that the SP_UpdateStats task will never run.
If you change both constraints connecting to SP_UpdateStats to Logical OR, it will run after either ProcessFull or MeasureGroupSet Loop has completed.
This is still the case if one of the preceding tasks is disabled; slightly odd, but still the case.
I have an SSIS package which is executed from an application. Can the same package be called multiple times simultaneously?
Yes, you can run the package simultaneously, but keep in mind that this can sometimes lead to deadlocks on the processed data (depending on the flow).
The package property MaxConcurrentExecutables defines how many executables (tasks) can run at once within a package. The default is -1, which means the number of available cores (logical processors) plus 2, but you can change it to whatever you want.
Yes, that can be done.
You can use packages like classes and call multiple instances of the same package simultaneously from a parent package, each with a different set of parameters: put several Execute Package Tasks, all using the same package connection, inside a sequence container with no precedence constraints between them.
The maximum number of tasks that can run simultaneously depends on the number of cores in your machine; with the default setting:
Number of concurrent tasks = number of cores + 2
I have a million rows in a database table. For each row I have to run a custom exe, parse its output, and update another database table.
How can I process multiple rows in parallel?
I currently have a simple data flow task: GetData -> Run Script (run process, parse output) -> Store Data.
For 6,000 rows it took 3 hours. Way too much.
There is a single bottleneck here: running the process for each row. Increasing EngineThreads would not help at all, as there will be only one thread running this particular script transform anyway. The time spent in the other transforms probably does not matter. Processes are heavyweight objects, and running thousands of them will never be cheap.
I can think of the following ideas to make it better:
1) The best way to fix it is to convert your custom EXE into an assembly and call it from the script transform, avoiding the overhead of creating processes, parsing the output, etc. (see the first sketch after this list).
2) If you have to use separate processes, you can try running them in parallel. This will help if the process mostly waits for input/output (i.e. it is I/O-bound); if the processes are memory-bound or CPU-bound, you won't win much by running them in parallel.
2A) Complex script, simple package.
To run them in parallel, modify the ProcessInput method in your script to start each process asynchronously and not wait for its completion: move on to the next row and create the next process. Subscribe to the process output and the process Exited event so you know when it has finished. Limit the number of processes running in parallel, otherwise you'll run out of memory, and wait until all the processes are done before returning from the ProcessInput call (see the second sketch after this list).
2B) Simple script, complex package.
Keep the current sequential script, but partition the data using SSIS. Add a Conditional Split transform and split the input stream into multiple streams based on some hash expression, e.g. RowID % 4, something that makes each output receive approximately the same amount of data. The number of streams equals the number of process instances you want to run in parallel. Attach a copy of your script transform to each output of the Conditional Split. Now you should also increase the EngineThreads property :) and these transforms will run in parallel. (Note: based on the tag, I assume you use SSIS 2008. In SSIS 2005 you'd need to insert additional Union All transforms to make it work.)
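To make idea 1) concrete, here is a rough sketch of the per-row call once the EXE's logic lives in a referenced assembly. MyCustomLib.Transformer and the Argument/Result columns are hypothetical names; Input0Buffer is the buffer class SSIS generates for the script component's input:

    // One in-process method call per row - no process creation and no
    // stdout parsing, just a return value written to an output column.
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        Row.Result = MyCustomLib.Transformer.Process(Row.Argument);
    }

And for idea 2A), a sketch of starting the processes asynchronously from the script component, with a cap on how many run at once. The executable name, the Argument column, and the MaxParallel value are all assumptions, and here the final wait happens in PostExecute rather than at the end of ProcessInput:

    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    // Fields on the script component's ScriptMain class.
    private const int MaxParallel = 8;  // limit to avoid running out of memory
    private readonly List<Process> running = new List<Process>();

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Throttle: don't start another process until a slot frees up.
        while (running.Count(p => !p.HasExited) >= MaxParallel)
            System.Threading.Thread.Sleep(50);

        var psi = new ProcessStartInfo("MyCustom.exe", Row.Argument)
        {
            UseShellExecute = false,
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };
        var proc = Process.Start(psi);
        // Collect output asynchronously instead of blocking on the process;
        // parse e.Data here and queue the result somewhere thread-safe.
        proc.OutputDataReceived += (s, e) => { /* parse e.Data */ };
        proc.BeginOutputReadLine();
        running.Add(proc);
    }

    public override void PostExecute()
    {
        // Wait for every outstanding process before the component finishes.
        foreach (var p in running)
            p.WaitForExit();
        base.PostExecute();
    }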
This should make it perform better, but a million processes is still a lot. You'll hardly get really good performance here.
If you are executing this process using a Data Flow task, there is a property on it called EngineThreads, which defaults to 5. You can set it to a higher number, like 20, to devote more threads to processing those rows.
That is just a performance tweak or optimisation; if your SSIS package still runs really slowly, I would look at the architecture and design of the package.