Executing multiple instances of the same package simultaneously from an application? - ssis

I have an SSIS package that is executed from an application. Can the same package be called simultaneously?

Yes, you can run the package simultaneously, but keep in mind that this can sometimes lead to deadlocks on the data being processed (depending on the flow).
The package property MaxConcurrentExecutables limits how many executables (tasks and containers) run at once within a single package; it does not cap how many instances of the package can run. The default is -1, which means the number of logical processors plus 2, but you can set it to any positive value.

Yes, that can be done.
You can treat packages like classes and run multiple instances simultaneously from a parent package, passing each instance a different set of parameters. Place several Execute Package tasks in a sequence container and point them all at the same package through the same package connection.
By default, the maximum number of tasks that can run simultaneously depends on the number of cores in your machine:
Number of concurrent tasks = Number of cores + 2
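
As a rough illustration of that rule, here is a minimal Python sketch of how a MaxConcurrentExecutables-style setting resolves to an actual limit. The function name effective_max_concurrent is hypothetical, and rejecting 0 and other negative values is an assumption of this sketch, not SSIS code.

```python
import os


def effective_max_concurrent(setting: int, logical_processors: int) -> int:
    """Resolve a MaxConcurrentExecutables-style setting to a task limit.

    -1 means "logical processors + 2"; any positive value is used as-is.
    Other values are rejected here (an assumption of this sketch).
    """
    if setting == -1:
        return logical_processors + 2
    if setting < 1:
        raise ValueError("setting must be -1 or a positive integer")
    return setting


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # cpu_count() can return None
    print("default limit on this machine:", effective_max_concurrent(-1, cores))
```

On a 4-core machine the default therefore works out to 6 concurrent tasks.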

Related

How does SSIS choose which data flow tasks to execute in parallel?

I have an SSIS package with many (more than 50) data flow tasks. When I execute the package, I notice that at most 6 data flow tasks run in parallel. What I want to know is how SSIS chooses which task to execute first. Note that the data flow tasks are not connected. If anyone knows of any documentation on this, please share.
There's no official/supported flag anywhere that enforces the order.
You can define precedence constraints to force an order, but then you will lose some of the flexibility of parallelism.
You can increase the number of concurrent tasks (6 in your case; the default is the number of cores plus 2) by setting MaxConcurrentExecutables.
You can refer to the following link for more information:
https://www.sqlservercentral.com/blogs/parallel-execution-in-ssis-with-maxconcurrentexecutables

Getting the maximum concurrent executables at runtime in SSIS

I have an SSIS package that executes a SQL task against a big list of servers. Since the list is quite big, I am trying to split the workload and process it in parallel. The problem is that I need to know exactly how many parts I can split it into, depending on the number of logical processors of the machine that runs it.
Is there any way to get the number of logical processors in SSIS so the work can be organized based on that?
Use a C# Script Task that returns System.Environment.ProcessorCount: https://msdn.microsoft.com/en-us/library/system.environment.processorcount.aspx
Or, if you want more specific details, it looks like you need to execute WMI queries; see "How to find the Number of CPU Cores via .NET/C#?".
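
To illustrate the idea outside SSIS, here is a minimal Python sketch that reads the logical processor count and splits a server list into that many near-equal parts. os.cpu_count() stands in for System.Environment.ProcessorCount; inside the package itself this logic would live in the C# Script Task. The function name split_into_parts and the server names are hypothetical.

```python
import os


def split_into_parts(items, parts):
    """Split items into at most `parts` contiguous chunks of near-equal size."""
    if not items:
        return []
    parts = max(1, min(parts, len(items)))  # never create empty chunks
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append(items[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    servers = [f"server{n:02d}" for n in range(1, 11)]  # hypothetical names
    processors = os.cpu_count() or 1
    for chunk in split_into_parts(servers, processors):
        print(chunk)
```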

Run 100+ SSIS packages in parallel from a parent package

I have 100+ child packages and I need to run them in parallel from a parent package. For this I will have to create 100+ Execute Package tasks and then 100+ file connections. This doesn't look appealing to me, and it is repetitive and error-prone. Is there any other way to do this? Keep two things in mind:
Child package execution should be in parallel (so no For loop and the like).
I am using checkpoint-based restartability and hence need the control flow items at compile time (so no script-component-based solutions either).
UPDATE: Even if you have massive hardware, Windows limits the number of concurrent tasks you can start simultaneously due to an inherent design issue. Though I achieved parallel execution using jobs, I had to limit it to 25 parallel packages at a time to avoid random failures due to the Windows issue.
Does it have to be file connections? Have you looked at the option of storing the packages in the SSIS package store and referencing them from there?
You would still have your 100+ components, but not your 100+ file connections.
I give up. There is no way, AFAIK. I decided to create 100+ jobs, one job per package, all using the same schedule. Creating the jobs was easier using dynamic SQL.
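
For illustration only, here is a minimal sketch of the one-job-per-package approach. The poster used dynamic SQL; this sketch instead builds the T-SQL text in Python. It relies on the msdb procedures sp_add_job, sp_add_jobstep, sp_attach_schedule and sp_add_jobserver. The job naming scheme, package paths, schedule name and the /FILE command string are assumptions, and the schedule is assumed to exist already.

```python
def quote(value: str) -> str:
    """Render a T-SQL Unicode string literal, doubling embedded quotes."""
    return "N'" + value.replace("'", "''") + "'"


def job_script(package_path: str, schedule_name: str) -> str:
    """Build T-SQL that creates one Agent job running one package file."""
    # Hypothetical naming scheme: "Run <file name>"
    job_name = "Run " + package_path.rsplit("\\", 1)[-1]
    command = f'/FILE "{package_path}"'  # assumed dtexec-style arguments
    job = quote(job_name)
    return "\n".join([
        f"EXEC msdb.dbo.sp_add_job @job_name = {job};",
        f"EXEC msdb.dbo.sp_add_jobstep @job_name = {job}, "
        f"@step_name = N'Run package', @subsystem = N'SSIS', "
        f"@command = {quote(command)};",
        f"EXEC msdb.dbo.sp_attach_schedule @job_name = {job}, "
        f"@schedule_name = {quote(schedule_name)};",
        f"EXEC msdb.dbo.sp_add_jobserver @job_name = {job};",
    ])


if __name__ == "__main__":
    # Hypothetical package paths and schedule name
    for path in [r"C:\packages\LoadA.dtsx", r"C:\packages\LoadB.dtsx"]:
        print(job_script(path, "NightlyLoad"))
        print("GO")
```

Review the generated script before running it against a server.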
You could create the package dynamically with EzAPI.
http://blogs.msdn.com/b/mattm/archive/2008/12/30/ezapi-alternative-package-creation-api.aspx

Scheduling identical SSIS packages to run in parallel

We've got an architecture where we intend to use SSIS as a data-loading engine for incoming batches. The intent is to reduce the need for manual intervention & configuration and automate the function as much as possible so we're looking at setting up our "batch monitoring" package to run as scheduled SQL Server Agent jobs.
Is it possible to schedule several SQL Server Agent jobs that use the same package, possibly looking at different folders or working on different data chunks (grouped by batch IDs)?
We might also have 3 or 4 “jobs” all running the same package and all monitoring the same folder for incoming files, but at slightly different intervals to avoid file contention issues.
I don't know of any reason you couldn't do this. You could launch the packages each with a different configuration (or configurations) pointing to different working directories, input folders, etc.

How to make this SSIS scenario more parallel

I have a million rows in a database table. For each row I have to run a custom exe, parse the output and update another database table.
How can I process multiple rows in parallel?
I now have a simple data flow task: GetData -> Run Script (Run Process, Parse Output) -> Store Data
For 6000 rows it took 3 hours. Way too much.
There is a single bottleneck here: running a process for each row. Increasing "EngineThreads" would not help at all, as there will be only one thread running this particular script transform anyway. The time spent in other transforms probably does not matter at all. Processes are heavyweight objects, and running thousands of them will never be cheap.
I can think of the following ideas to make it better:
1) The best way to fix it is to convert your custom EXE into an assembly and call it from the script transform - to avoid the overhead of creating processes, parsing the output etc.
2) If you have to use the separate processes, you can try to run these processes in parallel. It will help if the process mostly waits for some input/output (i.e. it is I/O bound). If the processes are memory bound or CPU bound, you would not win much by running them in parallel.
2A) Complex script, simple package.
To run them in parallel, modify the ProcessInput method in your script to start the process asynchronously, and don't wait for it to complete: move to the next row and create the next process. Subscribe to the process output and the process's Exited event, so you know when it has finished. Limit the number of processes running in parallel, otherwise you'll run out of memory. Wait until all the processes are done before returning from the ProcessInput call.
2B) Simple script, complex package.
Keep the current sequential script, but partition the data using SSIS. Add a Conditional Split transform and split the input stream into multiple streams based on some hash expression, something that makes each output receive approximately the same amount of data. The number of streams equals the number of process instances you want to run in parallel. Add your script transform to each output of the Conditional Split. Now you should also increase the "EngineThreads" property :) and these transforms will run in parallel. (Note: based on the tag, I assume you use SSIS 2008. You'll need to insert additional Union All transforms to make it work in SSIS 2005.)
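
The partitioning in 2B can be sketched in Python as follows. Inside SSIS, the equivalent would be Conditional Split conditions on a numeric key modulo the stream count. The function name partition_rows and the "id" column are hypothetical, and this assumes the key is an integer that is spread fairly evenly.

```python
def partition_rows(rows, key, streams):
    """Distribute rows across `streams` outputs using key modulo stream count."""
    outputs = [[] for _ in range(streams)]
    for row in rows:
        # Rows with the same remainder always land in the same output.
        outputs[row[key] % streams].append(row)
    return outputs


if __name__ == "__main__":
    rows = [{"id": i} for i in range(10)]  # hypothetical input rows
    for index, output in enumerate(partition_rows(rows, "id", 3)):
        print(index, [row["id"] for row in output])
```

Each output would then feed its own copy of the sequential script transform.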
This should make it perform better, but millions of processes is a lot. You'll hardly get really good performance here.
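
Option 2A above (bounded parallel process execution) can be sketched in Python like this. It is not the script transform itself: instead of subscribing to output and Exited events, it swaps in a thread pool whose workers each block on one child process, which gives the same cap on concurrency. A Python one-liner stands in for the custom exe, and the names run_one and run_all are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(row_id):
    """Run one child process for a row and parse its output."""
    # Stand-in for the custom exe: a child Python process that prints a number.
    result = subprocess.run(
        [sys.executable, "-c", f"print({row_id} * 2)"],
        capture_output=True, text=True, check=True,
    )
    return row_id, int(result.stdout.strip())


def run_all(row_ids, max_parallel=4):
    """Process all rows with at most `max_parallel` child processes at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # Leaving the with-block waits for every process to finish.
        return dict(pool.map(run_one, row_ids))


if __name__ == "__main__":
    print(run_all(range(1, 9), max_parallel=4))
```

As the answer notes, this mainly helps when each process spends most of its time waiting on I/O.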
If you are executing this process in a data flow task, then there is a property on it called "EngineThreads", which defaults to 5 in SSIS 2005 and 10 in later versions. You can set it to a higher number like 20, which allows the engine to devote more threads to processing those rows.
That is just a performance tweak or optimisation; if your SSIS package is still running really slowly, I would address the architecture and design of the package instead.