My SSIS package is running very slowly

My SSIS package is running very slowly. The package works like this: an FTP task collects files from a server, and then the package loads that data into a SQL Server table. It is scheduled to run every night as a job. When I run it in the IDE it is very fast, and when it runs under SQL Server Agent as a job it is fast for some days. But as the days progress, the package takes much longer to execute. What should I do to overcome this issue? Please explain in detail.

Implementing logging makes sense. Then you can find out which part(s) of your ETL slow down the entire package.
Some possibilities:
- Other tasks running on the SQL or FTP servers at the same time as the package execution. Other scheduled tasks (backups, defrags, etc.) may take server resources from time to time. Is there any repeating pattern to the performance decrease?
- The amount of processed data. Let's say the files represent sales, which have a drastically different volume on weekends.
- Something in the code. For example, manual execution lets the server truncate some temporary tables, but the automatic version reuses the same tables without truncation, which slows things down a little more every day.
But first of all: logging may help, as a starting point for the fix.
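A quick first check, even before package-level logging, is the Agent history itself: charting run durations over time shows when the slowdown started and whether it is gradual or step-wise. A minimal sketch against msdb (the job name is a placeholder):

    -- List run durations for the nightly job, oldest first.
    -- run_duration is stored as an HHMMSS integer, so convert it to seconds.
    SELECT j.name AS job_name,
           h.step_name,
           msdb.dbo.agent_datetime(h.run_date, h.run_time) AS run_started,
           (h.run_duration / 10000) * 3600
             + ((h.run_duration / 100) % 100) * 60
             + (h.run_duration % 100) AS duration_seconds
    FROM msdb.dbo.sysjobhistory AS h
    JOIN msdb.dbo.sysjobs AS j ON j.job_id = h.job_id
    WHERE j.name = N'MyNightlyFtpLoad'   -- placeholder job name
    ORDER BY run_started;

If the duration grows steadily day over day, that points at something cumulative (like the untruncated temporary tables above); if it jumps only on certain days, look at competing scheduled tasks or data volume.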

Related

SSIS ETL solution needs to import 600,000 small simple files every hour. What would be the optimal Agent scheduling interval?

The hardware, infrastructure, and redundancy are not in the scope of this question.
I am building an SSIS ETL solution that needs to import ~600,000 small, simple files per hour. With my current design, SQL Agent runs the SSIS package, which takes "n" files and processes them.
The number of files per batch, "n", is configurable.
The SQL Agent SSIS package execution schedule is configurable.
I wonder if the above approach is the right choice? Or, alternatively, must I have an infinite loop in the SSIS package that keeps taking and processing the files?
So the question boils down to a choice between infinite loop vs. batch+schedule. Is there any other better option?
Thank you
In a similar situation, I run an agent job every minute and process all files present. If the job takes 5 minutes to run because there are a lot of files, the agent skips the scheduled runs until the first one finishes, so there is no worry that two processes will conflict with each other.
Is SSIS the right tool?
Maybe. Let's start with the numbers:
600,000 files / 60 minutes = 10,000 files per minute
600,000 files / (60 minutes * 60 seconds) ≈ 167 files per second
Regardless of what technology you use, you're looking at some extremes here. Windows NTFS starts to choke around 10k files in a folder, so you'll need to employ some folder strategy to keep that count down, in addition to regular maintenance.
In 2008, the SSIS team managed to load 1TB in 30 minutes which was all sourced from disk so SSIS can perform very well. It can also perform really poorly which is how I've managed to gain ~36k SO Unicorn points.
6 years is a lifetime in the world of computing, so you may not need to take such drastic measures as the SSIS team did to set their benchmark, but you will need to look at their approach. I know you've stated that the hardware is outside the scope of discussion, but it very much is relevant: if the file system (SAN, NAS, local disk, flash, or whatever) can't serve 600k files an hour, then you'll never be able to clear your work queue.
Your goal is to get as many workers as possible engaged in processing these files. The Work Pile Pattern can be pretty effective to this end. Basically, a process asks: Is there work to be done? If so, I'll take a bit and go work on it. And then you scale up the number of workers asking and doing work. The challenge here is to ensure you have some mechanism to prevent workers from processing the same file. Maybe that's as simple as filtering by directory or file name or some other mechanism that is right for your situation.
I think you're headed down this approach based on your problem definition with the agent jobs that handle N files but wanted to give your pattern a name for further research.
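As a minimal sketch of the claim step, assuming a hypothetical dbo.FileQueue table that lists pending files, the READPAST/UPDLOCK hint combination lets multiple workers each grab a distinct batch without blocking each other or double-processing:

    -- Claim up to 100 pending files for this worker.
    -- READPAST skips rows locked by other workers; UPDLOCK holds the claim.
    UPDATE TOP (100) q
    SET    q.status     = 'InProgress',
           q.claimed_by = @@SPID
    OUTPUT inserted.file_path          -- hand the claimed paths to the worker
    FROM   dbo.FileQueue AS q WITH (ROWLOCK, READPAST, UPDLOCK)
    WHERE  q.status = 'Pending';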
I would agree with Joe C's answer - schedule the SQL Agent job to run as frequently as needed. If it's already running, it won't spawn a second process. Perhaps you're going to have multiple agents that all start every minute - AgentFolderA, AgentFolderB... AgentFolderZZH and they are each launching a master package that then has subprocesses looking for work.
Use a WMI Event Watcher task to detect whether a new file has arrived; as the next step, you can call the job scheduler or execute the SSIS package directly.
More details on the WMI Event Watcher task:
https://msdn.microsoft.com/en-us/library/ms141130%28v=sql.105%29.aspx

Data driven SSRS subscription delayed execution on the first run

I have a weird issue with one of the data driven subscriptions on SSRS.
The subscription is a timed subscription that generates invoices (pdf/excel) and gets triggered by a stored procedure.
The issue we are facing is that the first run always takes 30-60 minutes regardless of how many invoices are being generated. Once the first run has completed the subsequent runs are completed under a minute throughout the day.
There is a second version of the same report that is run manually, and it runs fine (ruling out any delays with the data extraction bit).
I have looked at some other questions here, but they didn't help identify the problem:
SQL Reporting services: First call is very slow
SSRS report subscription not working sometime
Without knowing more about the query, data, database setup, other processes, etc., it will be quite difficult to say for sure. But if I had to guess, based on your description, it sounds like the query plan cache is lost and is rebuilt on the first run of the day. Without the plan, the query can be less efficient. Each subsequent run will use the plan created on the first run, and will therefore run more quickly. There are a number of reasons the query plan could be wiped from cache: a recompile, other queries using too much memory, not enough system memory to begin with, etc.
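One way to test this theory is to check whether a plan for the subscription's query is still in cache right before the first run. A minimal sketch (the procedure name is a placeholder):

    -- If this returns no rows just before the first morning run,
    -- the plan was evicted overnight and must be recompiled.
    SELECT cp.objtype, cp.usecounts, st.text
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE st.text LIKE N'%usp_GenerateInvoices%';  -- placeholder proc name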
Hope that helps!

SQL Server continuous job

I have a requirement to run a job continuously which includes a stored procedure. This stored procedure performs a critical task: it processes a huge load of data as it comes in. As I understand it, SQL Server itself does not allow two or more instances of a job to run at the same time. So, my questions are:
Is there a way to run a SQL Server job continuously?
Do continuously running jobs hurt the performance of the server?
There are continuous replication jobs; however, those are continuous because of an inline switch used in the command line and not due to the job being scheduled as continuous.
The only way to emulate a continuous job is to simply have it run often. There is an option under scheduling to run the job as frequently as every 10 seconds (the Agent minimum), 24/7/365. With that said, you will need to be careful that the job isn't overrunning itself and that it is efficient enough not to cause issues with your server.
Whether it will affect performance depends on what the job does. If it only selects the current date/time (not a very useful thing to do, but an example), I would not expect an issue; however, if it runs complicated algorithms, then it is almost certainly going to cause issues.
I would recommend running this on a test server before putting it into production.
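For illustration, attaching such a high-frequency schedule to an existing job can be done in T-SQL as well as through the UI; a sketch, with the job name assumed:

    -- Fire the job every 10 seconds, all day, every day.
    EXEC msdb.dbo.sp_add_jobschedule
         @job_name             = N'ProcessIncomingData',  -- placeholder job
         @name                 = N'Every10Seconds',
         @freq_type            = 4,   -- daily
         @freq_interval        = 1,   -- every day
         @freq_subday_type     = 2,   -- sub-day unit: seconds
         @freq_subday_interval = 10,  -- every 10 seconds
         @active_start_time    = 000000;  -- starting at midnight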

Advantage of SSIS package over windows scheduled exe

I have an exe configured under windows scheduler to perform timely operations on a set of data.
The exe calls stored procs to retrieve data, performs some calculations, and updates the data back to a different database.
I would like to know, what are the pros and cons of using SSIS package over scheduled exe.
Do you mean the pros and cons of using SQL Server Agent Jobs for scheduling the running of SSIS packages and command-shell executions? I don't really know the pros of Windows Scheduler, so I'll stick to listing the pros of SQL Server Agent Jobs.
If you are already using SQL Server Agent Jobs on your server, then running SSIS packages from the agent consolidates the places that you need to monitor to one location.
SQL Server Agent Jobs have built in logging and notification features. I don't know how Windows Scheduler performs in this area.
SQL Server Agent Jobs can run more than just SSIS packages. So you may want to run a T-SQL command as step 1, retry if it fails, eventually move to step 2 if step 1 succeeds, or stop the job and send an error if the step 1 condition is never met. This is really useful for ETL processes where you are trying to monitor another server for some condition before running your ETL.
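As a hypothetical sketch of that monitoring pattern (all object names are made up), a first step that polls for a readiness condition and retries on failure could be added like this:

    -- Step 1 fails (and is retried) until the source flag appears.
    EXEC msdb.dbo.sp_add_jobstep
         @job_name          = N'Nightly ETL',              -- placeholder job
         @step_name         = N'Wait for source data',
         @subsystem         = N'TSQL',
         @command           = N'IF NOT EXISTS (SELECT 1 FROM dbo.SourceReady)
                                    RAISERROR(''Source not ready'', 16, 1);',
         @retry_attempts    = 5,   -- retry this step up to 5 times
         @retry_interval    = 10,  -- minutes between retries
         @on_success_action = 3,   -- go to the next step
         @on_fail_action    = 2;   -- quit the job, reporting failure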
SQL Server Agent Jobs are easy to report on since their data is stored in the msdb database. We have regularly scheduled subscriptions for SSRS reports that provide us with data about our jobs. This means I can get an email each morning, before I come into the office, that tells me if everything is going well or if there are any problems that need to be tackled ASAP.
SQL Server Agent Jobs are used by SSRS subscriptions for scheduling purposes. I commonly need to start SSRS reports by calling their job schedules, so I already have to work with SQL Server Agent Jobs.
SQL Server Agent Jobs can be chained together. A common scenario for my ETL is to have several jobs run on a schedule in the morning. Once all the jobs succeed, another job is called that triggers several SQL Server Agent Jobs. Some jobs run in parallel and some run serially.
SQL Server Agent Jobs are easy to script out and load into our source control system. This allows us to roll back to earlier versions of jobs if necessary. We've done this on a few occasions, particularly when someone deleted a job by accident.
On one occasion we found a situation where Windows Scheduler was able to do something we couldn't do with a SQL Server Agent Job. During the early days after a SAN migration, we had some scripts for snapshotting and cloning drives that didn't work in a SQL Server Agent Job. So we used a Windows Scheduler task to run the code for a while. After about a month, we figured out what we were missing and were able to move the step back to the SQL Server Agent Job.
Regarding SSIS versus exe stored procedure calls:
If all you are doing is running stored procedures, then SSIS may not add much for you. Both approaches work, so it really comes down to the differences between what you get from a .exe approach and SSIS as well as how many stored procedures that are being called.
I prefer SSIS because we do so much on my team where we have to download data from other servers, import/export files, or do some crazy https posts. If we only had to run one set of processes and they were all stored procedure calls, then SSIS may have been overkill. For my environment, SSIS is the best tool for moving data because we move all kinds of types of data to and from the server. If you ever expect to move beyond running stored procedures, then it may make sense to adopt SSIS now.
If you are just running a few stored procedures, then you could get away with doing this from the SQL Server Agent Job without SSIS. You can even parallelize jobs by making a master job start several jobs via msdb.dbo.sp_start_job 'Job Name'.
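For example, a master job's single T-SQL step might fan out like this (the job names are placeholders). sp_start_job returns as soon as the job is requested, so all three run in parallel:

    -- Kick off three loads concurrently; each runs in its own Agent session.
    EXEC msdb.dbo.sp_start_job @job_name = N'Load Customers';
    EXEC msdb.dbo.sp_start_job @job_name = N'Load Orders';
    EXEC msdb.dbo.sp_start_job @job_name = N'Load Inventory';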
If you want to parallelize a lot of stored procedure calls, then SSIS will probably beat out chaining SQL Server Agent Job calls. Although chaining is possible in code, there's no visual surface and it is harder to understand complex chaining scenarios that are easy to implement in SSIS with sequence containers and precedence constraints.
From a code maintainability perspective, SSIS beats out any exe solution for my team since everyone on my team can understand SSIS and few of us can actually code outside of SSIS. If you are planning to transfer this to someone down the line, then you need to determine what is more maintainable for your environment. If you are building in an environment where your future replacement will be a .NET programmer and not a SQL DBA or Business Intelligence specialist, then SSIS may not be the appropriate code-base to pass on to a future programmer.
SSIS gives you out of the box logging. Although you can certainly implement logging in code, you probably need to wrap everything in try-catch blocks and figure out some strategy for centralizing logging between executables. With SSIS, you can centralize logging to a SQL Server table, log files in some centralized folder, or use another log provider. Personally, I always log to the database and I have SSRS reports setup to help make sense of the data. We usually troubleshoot individual job failures based on the SQL Server Agent Job history step details. Logging from SSIS is more about understanding long-term failure patterns or monitoring warnings that don't result in failures like removing data flow columns that are unused (early indicator for us of changes in the underlying source data structure) or performance metrics (although stored procedures also have a separate form of logging in our systems).
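As a sketch of what that monitoring looks like with the SQL Server log provider (which writes to dbo.sysssislog by default), a report query for recent warnings and errors might be:

    -- Most recent warnings/errors across all packages logging to this table.
    SELECT starttime, source, event, message
    FROM dbo.sysssislog
    WHERE event IN (N'OnWarning', N'OnError')
    ORDER BY starttime DESC;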
SSIS gives you a visual design surface. I mentioned this before briefly, but it is a point worth expanding upon on its own. BIDS is a decent design surface for understanding what's running in what order. You won't get this from writing do-while loops in code. Maybe you have some form of visualizer that I've never used, but my experience with coding stored procedure calls has always happened in a text editor, not in a visual design layer. SSIS makes it relatively easy to understand precedence and order of operations in the control flow, which is where you would be working if you are using Execute SQL Tasks.
The deployment story for SSIS is pretty decent. We use BIDS Helper (a free add-in for BIDS), so deploying changes to packages is a right click away on the Solution Explorer. We only have to deploy one package at a time. If you are writing a master executable that runs all the ETL, then you probably have to compile the code and deploy it when none of the ETL is running. SSIS packages are modular code containers, so if you have 50 packages on your server and you make a change in one package, then you only have to deploy the one changed package. If you setup your executable to run code from configuration files and don't have to recompile the whole application, then this may not be a major win.
Testing changes to an individual package is probably generally easier than testing changes in an application. Meaning, if you change one ETL process in one part of your code, you may have to regression test (or unit test) your entire application. If you change one SSIS package, you can generally test it by running it in BIDS and then deploying it when you are comfortable with the changes.
If you have to deploy all your changes through a release process and there are pre-release testing processes that you must pass, then an executable approach may be easier. I've never found an effective way to automatically unit test a SSIS package. I know there are frameworks and test harnesses for doing this, but I don't have any experience with them so I can't speak for the efficacy or ease of use. In all of my work with SSIS, I've always pushed the changes to our production server within minutes or seconds of writing the changes.
Let me know if you need me to elaborate on any points. Good luck!
If you have dependencies on Windows features, like logging, eventing, or access to Windows resources, go the Windows Scheduler/Windows Services route. If it is just database-to-database movement, or you need heavy use of database functions, go the SSIS route.

How many tasks can task scheduler run simultaneously

How many tasks can task scheduler run at the same time?
I have set up three backup tasks from within SQLyog, all starting at 12:00 am and running every 4 hours thereafter until midnight. Each task will back up all tables from three different databases to network attached storage.
Will there be any impact on the MySQL Server performance or is there any chance for a task to be missed?
Thank you for any input.
It's usually considered proper to space out the scheduled tasks, even if only by 1 minute.
Since I don't know whether your tasks can be consolidated or optimized, and I don't know how long they'll take to run, I'll recommend you space them out by an hour or so.
There is some performance impact from a backup, which is part of the reason they're usually done at night (and, of course, there are fewer transactions being run on the database since people usually aren't working), and three running at the same time ... Well, it's not something I would wish on my database or my users.
To answer the original question: the scheduler can run a lot of things at the same time :)