SSIS Job Monitoring and Reporting

Our shop relies heavily on SSIS to run our back end processes and database tasks. Overall, we have hundreds of jobs, and for the most part, everything runs efficiently and smoothly.
Most of the time, a job failure is due to an external dependency failing (data not available, files not delivered, etc.). Right now, our process is set up to email us every time a job fails: SSIS generates an email with the name of the job and the step it failed on.
I'm looking at creating a dashboard of sorts to monitor this process more efficiently. I know that the same information shown in the Job History window is also available by querying the msdb database. I want to set up a central location to report failures (probably using SQL Reporting Services), and also a more intelligent email alert system.
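For reference, the kind of msdb query I'm starting from looks roughly like this (just a sketch; it assumes the standard msdb.dbo.sysjobs and msdb.dbo.sysjobhistory tables and pulls today's failed steps):

SELECT j.name AS job_name,
       h.step_id,
       h.step_name,
       h.run_date,
       h.run_time,
       h.message
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs AS j ON j.job_id = h.job_id
WHERE h.run_status = 0  -- 0 = failed
  AND h.step_id > 0     -- skip the job outcome row
  AND h.run_date >= CONVERT(int, CONVERT(char(8), GETDATE(), 112))  -- today (run_date is an int in yyyymmdd form)
ORDER BY h.instance_id DESC;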
Has anyone else dealt with this issue? If so, what kind of processes/reporting did you create around the SSIS procedures to streamline notification of job failures or alerts?

We have a similar setup at our company. We primarily rely on letting the jobs notify us when there is a problem and we have employees who check job statuses at specific times to ensure that everything is working properly and nothing was overlooked.
My team receives a SQL Server Agent Job Activity Report HTML email every day at 6am and 4pm that lists all failed jobs at the top, running jobs below that, and all other jobs below that, grouped into daily, weekly, monthly, quarterly, and other categories. We essentially monitor SQL Server Agent jobs, not the SSIS packages themselves. We rely on job categories and job schedule naming conventions to automate grouping in the report.
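There is nothing exotic behind that report; the grouping largely comes from queries along these lines (a simplified sketch against msdb.dbo.sysjobs, msdb.dbo.syscategories, and msdb.dbo.sysjobactivity; the real report layers the schedule naming conventions on top):

SELECT j.name AS job_name,
       c.name AS category_name,
       a.start_execution_date,
       a.stop_execution_date,
       CASE WHEN a.start_execution_date IS NOT NULL
             AND a.stop_execution_date IS NULL
            THEN 'Running' ELSE 'Idle' END AS current_state
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.syscategories AS c ON c.category_id = j.category_id
LEFT JOIN msdb.dbo.sysjobactivity AS a
       ON a.job_id = j.job_id
      AND a.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)  -- latest Agent session only
ORDER BY category_name, job_name;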
We have a similar setup for monitoring our SSRS subscriptions. However, we only monitor this once a day, since most of our subscriptions are triggered around 3am-4am. The SSRS Subscription Activity Report goes one step further than the SQL Server Agent Job Activity Report in that it has links to the subscription screen for the report and has more exception handling built into it.
Aside from relying on reports, we also have a few jobs that are set to notify the operator via email upon job completion instead of upon job failure. This makes it easy to quickly check whether all the major ETL processes have run successfully. It's sort of an early indicator of the health of the system. If we haven't received this email by the time the first team member gets into the office, then we know something is wrong. We also have a series of jobs that will fail with a job error if certain data sources haven't been loaded by a specific time. Before I had someone working an early shift, I used to check my iPhone for the email anytime I woke up in the middle of the night (which happened a lot back then since I had a newborn baby). On the rare occasion that I didn't receive an email indicating everything had completed, or I received an error regarding a job step, I logged onto my machine via remote desktop to check the status of the jobs.
I considered having our data center guys check on the status of the servers by running a report each morning around 4am, but in the end I determined this wouldn't be necessary since we have a person who starts work at 6am. The main concern I had about implementing this process is that our ETL changes over time and it would have been necessary for me to maintain documentation on how to check the jobs properly and how to escalate the notifications to my team when a problem was detected. I would be willing to do this if the processes had to run in the middle of the night. However, our ETL runs every hour of the day, so if we had to kick off all our major ETL processes in the early morning, we would still complete loading our data warehouse and publishing reports before anyone made it into the office. Also, our office starts REALLY late for some reason, so people don't normally run our reports interactively until 9am onward.

If you're not looking to do a total custom build out, you can use https://cronitor.io to monitor etl jobs.

Current SSRS Job monitoring Process:
There is currently no SSRS job monitoring process. When an SSRS job fails, a user raises an incident, and the TOPS Reporting and SSRS developer teams start working from that incident. As a result, the turnaround time to resolve these issues is very long.
Proposed SSRS Job monitoring Process:
An SSRS subscription monitoring job will help the TOPS Reporting and SSRS developer teams monitor SSRS jobs proactively. The job will produce a report listing the failed reports along with their general execution log status and subscription error log status. From this report, a developer can see why a report failed and start working on the issue proactively.
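A reasonable starting point for that report is a query against the report server catalog tables along these lines (a sketch only; it assumes the default ReportServer database name and the standard dbo.Subscriptions and dbo.Catalog tables, and treats anything whose last status is not a normal delivery message as a failure):

SELECT c.Path,
       c.Name        AS report_name,
       s.Description AS subscription_description,
       s.LastRunTime,
       s.LastStatus
FROM ReportServer.dbo.Subscriptions AS s
JOIN ReportServer.dbo.Catalog AS c ON c.ItemID = s.Report_OID
WHERE s.LastStatus NOT LIKE 'Mail sent%'       -- adjust the filters to the delivery
  AND s.LastStatus NOT LIKE 'File written%'    -- extensions actually in use
  AND s.LastStatus NOT LIKE 'New Subscription%'
ORDER BY s.LastRunTime DESC;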

Related

Why does SSRS need to recycle the application domain

I'm working with MS Reporting Services 2016. I noticed that the application domain is set by default to recycle every 12 hours. Now the impact on users after a recycle is either slow response from reporting services or a failed report. Both disappear after a refresh of the report, but this is not ideal.
I have come across a SO answer where people suggest that you can turn off the scheduled recycle by setting the configuration attribute RecycleTime to zero.
I have also read about writing a script to manually restart Reporting Services, which also recycles the app domain, followed by a script that simply loads a report at a controlled time to remove the first-time load issues. However, this all seems like a workaround to me and I would rather not have to do it.
My concern is that there must be a logical reason for having the scheduled recycle time, but I cannot find any information explaining this. Does anyone know if there is a negative impact from turning off the scheduled application domain recycle?
The RecycleTime is a function aimed at making sure SSRS isn't consuming RAM it doesn't need and potentially starving the rest of the machine. Disabling the refresh essentially removes the ability to claw back any memory used for a brief period of intensive processing.
If you are confident your machine is suitably resourced, you can turn the refresh off; if not, schedule the refresh for an out-of-hours time and define a Cache Refresh Plan to cache any super important reports immediately afterwards to minimise any user impact.
Further reading here: https://www.mssqltips.com/sqlservertip/2735/prevent-sql-server-reporting-services-slow-startup/
I guess I'm possibly oversimplifying this, but SSRS was designed to recycle every 12 hours (default) for a reason. If it ain't broke, don't fix it. In my case, I wanted to control when the recycle occurred. I execute a 1-line PowerShell script from a SQL Agent job at 6:50 am, then generate a subscription report at 7 am, which kick-starts SSRS, and the users do not see any performance degradation.
# restart the SSRS Windows service, which also recycles the report server application domain
restart-service 'ReportServer'
Leaving the SSRS config file setting at 720 minutes lets the recycle occur again at 6:50 pm. Subscription reports generate throughout the night, so if a human gets on SSRS after hours there should be no performance issue because the system is already running.
Are we possibly overthinking it?

Nightly data integration using NServiceBus

In the investment firm I work for, we have several daily data integration tasks that run every night/early morning.
Most if not all are done using SSIS, and they are all scheduled to start at certain times. They work (as in the SSIS packages do their job).
However, we have serious problems when it comes to handling dependent processes. For example, if the SSIS package scheduled to run at 3AM does not get the FTP file from a vendor, then the dependent processes set to run at 3:30AM and 4AM all fail.
Currently we have about 8 different in-house applications/endpoints, with data coming from around 6 external vendors. The vendor list is beginning to grow, and we expect it to reach around 20 or so within a year.
We don't want to go the BizTalk route because of cost, complexity, and implementation difficulties.
I want to create an event-driven approach to this, where the SSIS/nightly processes remain oblivious to each other, as they are now, yet subscribe to the parent processes they depend on in order to start or stop.
At a glance, I feel NServiceBus would give us that flexibility. It would let us keep what we already have working, while adding an event-driven mechanism.
I need some input here.

SSRS Data Driven Subscription Multiple reports running concurrently

We have a SharePoint 2010 BI (SSRS 2012) site that has links to several databases:
Database A will be available at 12:00am
Database B will be available at 1:00am
Database C will be available at 2:00am
So I have a shared schedule set up for each database for the above times. How many reports should I have running in each shared schedule? For now I only have 10 sample reports, and they all kicked off and ran within the same second (maybe some kicked off a couple of seconds later). From that I infer that they don't run in order, but rather asynchronously.
So, what is the limit, and will it kill my server's performance if I have hundreds of reports running? Or should I make multiple schedules and limit each schedule to run about 30-40 reports?
I cannot tell from your question whether you are creating a snapshot or an email service... In any case, there are tools that you can use to fine-tune scheduling outside of SSRS. Within SSRS, I would recommend that you stagger your report requests if it makes sense for huge reports. In any case, you should stagger your schedules to some degree.

SSRS Subscriptions Duplicating email

Had an interesting error today and couldn't find anything online about it so wondered if any of you guys had seen this behavior before.
We had an out-of-memory error and the CPU usage was spiking this morning on our report server. A clean reboot seemed to rectify the issue; however, since then all the email subscriptions have been sending multiple times. What do I mean by this? As far as SSRS is concerned, the subscription ran once at its normal time (10am). This has been proven by scrutinizing the logs to see if another execution occurred (it didn't), and by renaming the SPROC that the report references to ensure that it would fail, yet it didn't fail and the mail resent. I then checked the Exchange queues, turned on logging for the connection, and could see a new mail being resubmitted every 30 minutes to the Exchange mail queue.
The question is: what process is causing that mail to be resubmitted to the Exchange server, and how, other than another reboot, do we stop the emails resending?
Thanks in advance
-- Further --
Having done more digging, we have noticed that the [ReportServer].[dbo].Notifications table is populated with all of the reports that are sending out multiple times, with the Attempts column incrementing every time the duplicate email is sent out.
We still don't know why these are resending.
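For anyone hitting the same thing, this is roughly the query we used to watch the stuck deliveries (a sketch; it assumes the default ReportServer database name and simply ties the Notifications rows back to their subscription and report):

SELECT c.Name        AS report_name,
       s.Description AS subscription_description,
       n.*
FROM ReportServer.dbo.Notifications AS n
JOIN ReportServer.dbo.Subscriptions AS s ON s.SubscriptionID = n.SubscriptionID
JOIN ReportServer.dbo.Catalog       AS c ON c.ItemID = s.Report_OID
ORDER BY c.Name;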
It seems to be down to the logging level... If you switch the Report Server service logging level down to level 2 (exceptions, restarts and warnings), this error seems to manifest itself; however, when the logging level is switched back up to 3 or above, the error seems to disappear. Some similar behavior is noted here: http://social.msdn.microsoft.com/Forums/en-NZ/sqlreportingservices/thread/b78bb6e2-0810-4afd-ba6b-8b09a243f349
Check the SQL Agent jobs (named with GUIDs) for the subscriptions. Maybe the schedules on those got messed up somehow.
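If it helps, a query along these lines maps those GUID-named jobs back to their subscriptions (a sketch; it assumes the default ReportServer database name and may need a COLLATE clause if msdb and ReportServer use different collations):

SELECT j.name        AS agent_job_name,
       c.Name        AS report_name,
       s.Description AS subscription_description
FROM ReportServer.dbo.ReportSchedule AS rs
JOIN ReportServer.dbo.Subscriptions  AS s ON s.SubscriptionID = rs.SubscriptionID
JOIN ReportServer.dbo.Catalog        AS c ON c.ItemID = s.Report_OID
JOIN msdb.dbo.sysjobs                AS j ON j.name = CONVERT(nvarchar(36), rs.ScheduleID);  -- the Agent job name is the ScheduleID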

Advantage of SSIS package over windows scheduled exe

I have an exe configured under windows scheduler to perform timely operations on a set of data.
The exe calls stored procs to retrieve data, performs some calculations, and updates the data back to a different database.
I would like to know, what are the pros and cons of using SSIS package over scheduled exe.
Do you mean the pros and cons of using SQL Server Agent Jobs for scheduling and running SSIS packages and command shell executions? I don't really know the pros of Windows Scheduler, so I'll stick to listing the pros of SQL Server Agent Jobs.
If you are already using SQL Server Agent Jobs on your server, then running SSIS packages from the agent consolidates the places that you need to monitor to one location.
SQL Server Agent Jobs have built in logging and notification features. I don't know how Windows Scheduler performs in this area.
SQL Server Agent Jobs can run more than just SSIS packages. So you may want to run a T-SQL command as step 1, retry if it fails, eventually move to step 2 if step 1 succeeds, or stop the job and send an error if the step 1 condition is never met. This is really useful for ETL processes where you are trying to monitor another server for some condition before running your ETL.
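As a rough illustration of that kind of gate step, something like the following could be added to a job (a sketch only; the job name 'Nightly ETL' and the dbo.SourceReady check are placeholders, not anything from your environment):

EXEC msdb.dbo.sp_add_jobstep
     @job_name          = N'Nightly ETL',    -- placeholder job name
     @step_name         = N'Wait for source data',
     @subsystem         = N'TSQL',
     @command           = N'IF NOT EXISTS (SELECT 1 FROM dbo.SourceReady) RAISERROR(''Source not ready'', 16, 1);',
     @retry_attempts    = 12,                -- re-check every 5 minutes
     @retry_interval    = 5,                 -- for up to an hour
     @on_success_action = 3,                 -- go to the next step
     @on_fail_action    = 2;                 -- quit the job reporting failure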
SQL Server Agent Jobs are easy to report on since their data is stored in the msdb database. We have regularly scheduled subscriptions for SSRS reports that provide us with data about our jobs. This means I can get an email each morning before I come into the office that tells me if everything is going well or if there are any problems that need to be tackled ASAP.
SQL Server Agent Jobs are used by SSRS subscriptions for scheduling purposes. I commonly need to start SSRS reports by calling their job schedules, so I already have to work with SQL Server Agent Jobs.
SQL Server Agent Jobs can be chained together. A common scenario for my ETL is to have several jobs run on a schedule in the morning. Once all the jobs succeed, another job is called that triggers several SQL Server Agent Jobs. Some jobs run in parallel and some run serially.
SQL Server Agent Jobs are easy to script out and load into our source control system. This allows us to roll back to earlier versions of jobs if necessary. We've done this on a few occasions, particularly when someone deleted a job by accident.
On one occasion we found a situation where Windows Scheduler was able to do something we couldn't do with a SQL Server Agent Job. During the early days after a SAN migration we had some scripts for snapshotting and cloning drives that didn't work in a SQL Server Agent Job. So we used a Windows Scheduler task to run the code for a while. After about a month, we figured out what we were missing and were able to move the step back to the SQL Server Agent Job.
Regarding SSIS versus an exe that calls stored procedures:
If all you are doing is running stored procedures, then SSIS may not add much for you. Both approaches work, so it really comes down to the differences between what you get from a .exe approach and SSIS as well as how many stored procedures that are being called.
I prefer SSIS because my team does so much where we have to download data from other servers, import/export files, or do some crazy HTTPS posts. If we only had to run one set of processes and they were all stored procedure calls, then SSIS may have been overkill. For my environment, SSIS is the best tool for moving data because we move all kinds of data to and from the server. If you ever expect to move beyond running stored procedures, then it may make sense to adopt SSIS now.
If you are just running a few stored procedures, then you could get away with doing this from the SQL Server Agent Job without SSIS. You can even parallelize jobs by making a master job start several jobs via msdb.dbo.sp_start_job 'Job Name'.
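For example, the master job's only step can be little more than this (a sketch; the job names are placeholders):

EXEC msdb.dbo.sp_start_job @job_name = N'Load Job A';  -- returns immediately after the job is requested to start
EXEC msdb.dbo.sp_start_job @job_name = N'Load Job B';

Since sp_start_job returns as soon as the target job has been asked to start, both child jobs end up running side by side.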
If you want to parallelize a lot of stored procedure calls, then SSIS will probably beat out chaining SQL Server Agent Job calls. Although chaining is possible in code, there's no visual surface and it is harder to understand complex chaining scenarios that are easy to implement in SSIS with sequence containers and precedence constraints.
From a code maintainability perspective, SSIS beats out any exe solution for my team since everyone on my team can understand SSIS and few of us can actually code outside of SSIS. If you are planning to transfer this to someone down the line, then you need to determine what is more maintainable for your environment. If you are building in an environment where your future replacement will be a .NET programmer and not a SQL DBA or Business Intelligence specialist, then SSIS may not be the appropriate code-base to pass on to a future programmer.
SSIS gives you out of the box logging. Although you can certainly implement logging in code, you would probably need to wrap everything in try-catch blocks and figure out some strategy for centralizing logging between executables. With SSIS, you can centralize logging to a SQL Server table, to log files in some centralized folder, or use another log provider. Personally, I always log to the database and I have SSRS reports set up to help make sense of the data. We usually troubleshoot individual job failures based on the SQL Server Agent Job history step details. Logging from SSIS is more about understanding long-term failure patterns or monitoring warnings that don't result in failures, such as warnings about unused data flow columns (an early indicator for us of changes in the underlying source data structure) or performance metrics (although stored procedures also have a separate form of logging in our systems).
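As a concrete example, when the SQL Server log provider is pointed at a database, the package events land in a dbo.sysssislog table (dbo.sysdtslog90 on SQL Server 2005), and the failure-pattern reports are essentially queries over it, roughly like this sketch:

SELECT source    AS package_or_task,
       event,
       starttime,
       message
FROM dbo.sysssislog
WHERE event IN ('OnError', 'OnWarning')
  AND starttime >= DATEADD(day, -30, GETDATE())  -- last 30 days
ORDER BY starttime DESC;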
SSIS gives you a visual design surface. I mentioned this before briefly, but it is a point worth expanding upon on its own. BIDS is a decent design surface for understanding what's running in what order. You won't get this from writing do-while loops in code. Maybe you have some form of a visualizer that I've never used, but my experience with coding stored procedure calls always happened in a text editor, not in a visual design layer. SSIS makes it relatively easy to understand precedence and order of operations in the control flow, which is where you would be working if you are using Execute SQL Tasks.
The deployment story for SSIS is pretty decent. We use BIDS Helper (a free add-in for BIDS), so deploying changes to packages is a right click away on the Solution Explorer. We only have to deploy one package at a time. If you are writing a master executable that runs all the ETL, then you probably have to compile the code and deploy it when none of the ETL is running. SSIS packages are modular code containers, so if you have 50 packages on your server and you make a change in one package, then you only have to deploy the one changed package. If you setup your executable to run code from configuration files and don't have to recompile the whole application, then this may not be a major win.
Testing changes to an individual package is probably generally easier than testing changes in an application. Meaning, if you change one ETL process in one part of your code, you may have to regression test (or unit test) your entire application. If you change one SSIS package, you can generally test it by running it in BIDS and then deploying it when you are comfortable with the changes.
If you have to deploy all your changes through a release process and there are pre-release testing processes that you must pass, then an executable approach may be easier. I've never found an effective way to automatically unit test a SSIS package. I know there are frameworks and test harnesses for doing this, but I don't have any experience with them so I can't speak for the efficacy or ease of use. In all of my work with SSIS, I've always pushed the changes to our production server within minutes or seconds of writing the changes.
Let me know if you need me to elaborate on any points. Good luck!
If you have dependencies on Windows features, like logging, eventing, or access to Windows resources, go the Windows Scheduler/Windows Services route. If it is just DB-to-DB movement, or if you need some kind of heavy DB function usage, go the SSIS route.