I have a set of independent SSIS packages, say A, B, and C. I run them manually and in parallel using the DTEXEC command. Their finishing times are unpredictable, i.e., there is no guarantee that any particular package finishes last each time.
Now I want to send a notification mail when all the packages have completed. How can I accomplish this without modifying the packages? Also, I may not be able to use Task Scheduler or SQL Agent.
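(For illustration: one way to do this without touching the packages, Task Scheduler, or SQL Agent is a wrapper script that launches the dtexec processes, waits for all of them to exit, and then sends the mail. A rough sketch only; the package paths and SMTP details are placeholders.)

```powershell
# Hypothetical wrapper: package paths, SMTP server and addresses are placeholders.
$packages = 'C:\SSIS\A.dtsx', 'C:\SSIS\B.dtsx', 'C:\SSIS\C.dtsx'

# Launch each package in parallel and keep the process objects.
$procs = foreach ($pkg in $packages) {
    Start-Process -FilePath 'dtexec.exe' -ArgumentList "/f `"$pkg`"" -PassThru
}

# Blocks until every dtexec process has exited, whatever order they finish in.
$procs | Wait-Process

Send-MailMessage -SmtpServer 'smtp.example.com' `
    -From 'etl@example.com' -To 'team@example.com' `
    -Subject 'SSIS packages complete' `
    -Body 'Packages A, B and C have all finished.'
```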
Related
I have written different ETL pipelines using Apache Beam in Python, running on GCP Dataflow. How can I schedule them when one depends on another, using Cloud Functions and Cloud Scheduler, or Airflow?
You can use Cloud Workflows to achieve this.
In principle, here is the flow to perform (a sketch of the loop follows below):
Make an HTTP call to launch your Dataflow job.
The response gives you a job_id.
Then loop:
Sleep 1 minute (for example).
Get the job status using the job_id.
If it is still running, continue the loop; if not, exit it.
Go to the next ETL job.
You can use a subworkflow to factor out the loop that waits for a Dataflow pipeline to finish, so it can be reused for each job.
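Cloud Workflows expresses these steps in its own YAML-based syntax; purely to make the poll-and-wait loop concrete, here is the same logic sketched against the Dataflow REST API. An illustration only: the project, region, and job id are placeholders, and the access token is taken from the gcloud CLI.

```powershell
# Illustration of the poll-and-wait loop; project, region and job id are placeholders.
$project = 'my-project'
$region  = 'us-central1'
$jobId   = '2023-01-01_00_00_00-1234567890'   # returned by the launch call

$headers = @{ Authorization = "Bearer $(gcloud auth print-access-token)" }
$jobUrl  = "https://dataflow.googleapis.com/v1b3/projects/$project/locations/$region/jobs/$jobId"

# Terminal Dataflow job states; anything else means "keep waiting".
$terminal = 'JOB_STATE_DONE', 'JOB_STATE_FAILED', 'JOB_STATE_CANCELLED'

do {
    Start-Sleep -Seconds 60                                   # sleep 1 minute
    $job = Invoke-RestMethod -Uri $jobUrl -Headers $headers   # get the job status
} until ($terminal -contains $job.currentState)

# Once the job reaches a terminal state, move on to the next ETL job.
```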
Let me know if you need more guidance to implement this.
I have an older machine, running Windows Server 2003 and SQL Server 2005. It is nearing its end of life, but until all the users have migrated to the new server, we need to keep an SSAS cube on that machine up to date.
The problem is the network connection is sometimes flaky. The cube on this server points to a SQL database on another server, and if the connection experiences a problem while the cube is being rebuilt, the process stops with an error message.
Since 99 times out of 100 a processing error is caused by the network, I'm looking for a way to repeat the processing of the cube if it fails. The perfect solution would stop after x tries (just in case this is the 100th time and it's not a network issue).
I've looked at SQL Server Agent Jobs and SSIS, and don't see a way to loop until it succeeds or reaches the x number of tries. Isn't there a command-line option to run the cube processing, so that I can include it in a PowerShell loop or something? Or is there an option in SQL or SSIS that I have missed?
We ran into a similar situation with our processes and were able to have the Agent jobs retry 3 times after a 10 minute wait period. Only if all 3 tries failed did our process alert us that "this is a real failure and not a transient issue."
In any SQL Agent job step type, the Advanced tab is where this lives: you can specify the number of Retry attempts and the Retry interval in minutes.
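If you prefer to script the setting instead of clicking through the dialog, the same retry configuration can be applied with msdb.dbo.sp_update_jobstep; a sketch, with the job name, step id, and instance name as placeholders:

```powershell
# Sets 3 retry attempts with a 10-minute wait on step 1 of the job.
# Job name, step id and instance name are placeholders.
$tsql = @"
EXEC msdb.dbo.sp_update_jobstep
     @job_name       = N'Process SSAS Cube',
     @step_id        = 1,
     @retry_attempts = 3,
     @retry_interval = 10;   -- minutes between retries
"@

Invoke-Sqlcmd -ServerInstance 'MYSERVER' -Database 'msdb' -Query $tsql
```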
You could build a loop with three objects in your package:
An Analysis Services Processing Task
A Script Task that increments a counter variable
An Execute Package Task that calls the current package again if the counter is less than the allowed number of tries for your processing task
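If the command-line route mentioned in the question appeals more, another option is a retry loop around an XMLA processing command. A sketch, assuming the SqlServer PowerShell module's Invoke-ASCmd is available on whatever machine runs the script (server name and XMLA file are placeholders; note that some processing errors are reported inside the returned XML rather than as thrown exceptions, so you may also need to inspect the output):

```powershell
Import-Module SqlServer

$maxTries = 5
$xmla = Get-Content 'C:\Scripts\ProcessCube.xmla' -Raw   # a <Process> command for the cube

for ($try = 1; $try -le $maxTries; $try++) {
    try {
        Invoke-ASCmd -Server 'OLDSERVER' -Query $xmla -ErrorAction Stop
        Write-Host "Processing succeeded on attempt $try."
        break
    }
    catch {
        Write-Warning "Attempt $try failed: $($_.Exception.Message)"
        if ($try -eq $maxTries) { throw }   # give up: probably not a network blip
        Start-Sleep -Seconds 600            # wait 10 minutes before the next attempt
    }
}
```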
I have 1 upstream job and 2 parallel downstream jobs. When the upstream job succeeds, 2 downstream jobs will be triggered.
Currently, I send a mail notification for each job separately. Now the recipients are complaining about too many mails.
I need to find a way to gather the build results of those 3 jobs and send a single mail notification.
Use the parameterized trigger plugin as a build step (not as a post-build action). I believe it can wait for downstream projects to finish, examine their status, and set the current project's status accordingly.
I have an exe configured under Windows Scheduler to perform periodic operations on a set of data.
The exe calls stored procs to retrieve data, perform some calculations, and update the data back to a different database.
I would like to know the pros and cons of using an SSIS package over a scheduled exe.
Do you mean the pros and cons of using SQL Server Agent Jobs for scheduling SSIS package runs and command-shell executions? I don't really know the pros of Windows Scheduler, so I'll stick to listing the pros of SQL Server Agent Jobs.
If you are already using SQL Server Agent Jobs on your server, then running SSIS packages from the agent consolidates the places that you need to monitor to one location.
SQL Server Agent Jobs have built in logging and notification features. I don't know how Windows Scheduler performs in this area.
SQL Server Agent Jobs can run more than just SSIS packages. So you may want to run a T-SQL command as step 1, retry if it fails, eventually move to step 2 if step 1 succeeds, or stop the job and send an error if the step 1 condition is never met. This is really useful for ETL processes where you are trying to monitor another server for some condition before running your ETL.
SQL Server Agent Jobs are easy to report on since their data is stored in the msdb database. We have regularly scheduled subscriptions for SSRS reports that provide us with data about our jobs. This means I can get an email each morning before I come into the office that tells me if everything is going well or if there are any problems that need to be tackled ASAP.
SQL Server Agent Jobs are used by SSRS subscriptions for scheduling purposes. I commonly need to start SSRS reports by calling their job schedules, so I already have to work with SQL Server Agent Jobs.
SQL Server Agent Jobs can be chained together. A common scenario for my ETL is to have several jobs run on a schedule in the morning. Once all the jobs succeed, another job is called that triggers several SQL Server Agent Jobs. Some jobs run in parallel and some run serially.
SQL Server Agent Jobs are easy to script out and load into our source control system. This allows us to roll back to earlier versions of jobs if necessary. We've done this on a few occasions, particularly when someone deleted a job by accident.
On one occasion we found a situation where Windows Scheduler was able to do something we couldn't do with a SQL Server Agent Job. During the early days after a SAN migration we had some scripts for snapshotting and cloning drives that didn't work in a SQL Server Agent Job. So we used a Windows Scheduler task to run the code for a while. After about a month, we figured out what we were missing and were able to move the step back to the SQL Server Agent Job.
Regarding SSIS versus an exe making stored procedure calls:
If all you are doing is running stored procedures, then SSIS may not add much for you. Both approaches work, so it really comes down to the differences between what you get from a .exe approach and SSIS, as well as how many stored procedures are being called.
I prefer SSIS because we do so much on my team where we have to download data from other servers, import/export files, or do some crazy HTTPS posts. If we only had to run one set of processes and they were all stored procedure calls, then SSIS may have been overkill. For my environment, SSIS is the best tool for moving data because we move all kinds of data to and from the server. If you ever expect to move beyond running stored procedures, then it may make sense to adopt SSIS now.
If you are just running a few stored procedures, then you could get away with doing this from the SQL Server Agent Job without SSIS. You can even parallelize jobs by making a master job start several jobs via msdb.dbo.sp_start_job 'Job Name'.
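For example, a master job can fire several other jobs that then run concurrently, because sp_start_job returns as soon as the target job has been requested. A sketch (job names and the instance name are placeholders; in an Agent job you would put just the T-SQL into a T-SQL step, the Invoke-Sqlcmd wrapper here is only for running it ad hoc):

```powershell
# Each sp_start_job call returns immediately, so the three jobs run in parallel.
$tsql = @"
EXEC msdb.dbo.sp_start_job @job_name = N'Load Customers';
EXEC msdb.dbo.sp_start_job @job_name = N'Load Orders';
EXEC msdb.dbo.sp_start_job @job_name = N'Load Products';
"@

Invoke-Sqlcmd -ServerInstance 'MYSERVER' -Database 'msdb' -Query $tsql
```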
If you want to parallelize a lot of stored procedure calls, then SSIS will probably beat out chaining SQL Server Agent Job calls. Although chaining is possible in code, there's no visual surface and it is harder to understand complex chaining scenarios that are easy to implement in SSIS with sequence containers and precedence constraints.
From a code maintainability perspective, SSIS beats out any exe solution for my team since everyone on my team can understand SSIS and few of us can actually code outside of SSIS. If you are planning to transfer this to someone down the line, then you need to determine what is more maintainable for your environment. If you are building in an environment where your future replacement will be a .NET programmer and not a SQL DBA or Business Intelligence specialist, then SSIS may not be the appropriate code-base to pass on to a future programmer.
SSIS gives you out of the box logging. Although you can certainly implement logging in code, you probably need to wrap everything in try-catch blocks and figure out some strategy for centralizing logging between executables. With SSIS, you can centralize logging to a SQL Server table, log files in some centralized folder, or use another log provider. Personally, I always log to the database and I have SSRS reports setup to help make sense of the data. We usually troubleshoot individual job failures based on the SQL Server Agent Job history step details. Logging from SSIS is more about understanding long-term failure patterns or monitoring warnings that don't result in failures like removing data flow columns that are unused (early indicator for us of changes in the underlying source data structure) or performance metrics (although stored procedures also have a separate form of logging in our systems).
SSIS gives you a visual design surface. I mentioned this before briefly, but it is a point worth expanding upon on its own. BIDS is a decent design surface for understanding what's running in what order. You won't get this from writing do-while loops in code. Maybe you have some form of a visualizer that I've never used, but my experience with coding stored procedure calls always happened in a text editor, not in a visual design layer. SSIS makes it relatively easy to understand precedence and order of operations in the control flow, which is where you would be working if you are using Execute SQL Tasks.
The deployment story for SSIS is pretty decent. We use BIDS Helper (a free add-in for BIDS), so deploying changes to packages is a right click away on the Solution Explorer. We only have to deploy one package at a time. If you are writing a master executable that runs all the ETL, then you probably have to compile the code and deploy it when none of the ETL is running. SSIS packages are modular code containers, so if you have 50 packages on your server and you make a change in one package, then you only have to deploy the one changed package. If you setup your executable to run code from configuration files and don't have to recompile the whole application, then this may not be a major win.
Testing changes to an individual package is probably generally easier than testing changes in an application. Meaning, if you change one ETL process in one part of your code, you may have to regression test (or unit test) your entire application. If you change one SSIS package, you can generally test it by running it in BIDS and then deploying it when you are comfortable with the changes.
If you have to deploy all your changes through a release process and there are pre-release testing processes that you must pass, then an executable approach may be easier. I've never found an effective way to automatically unit test a SSIS package. I know there are frameworks and test harnesses for doing this, but I don't have any experience with them so I can't speak for the efficacy or ease of use. In all of my work with SSIS, I've always pushed the changes to our production server within minutes or seconds of writing the changes.
Let me know if you need me to elaborate on any points. Good luck!
If you depend on Windows features such as logging, eventing, or access to Windows resources, go the Windows Scheduler/Windows Services route. If it is just database-to-database movement, or you need heavy use of database functions, go the SSIS route.
Our shop relies heavily on SSIS to run our back end processes and database tasks. Overall, we have hundreds of jobs, and for the most part, everything runs efficiently and smoothly.
Most of the time, we have a job failure due to an external dependency failing (data not available, files not delivered, etc). Right now, our process is set up to email us every time a job fails. SSIS will generate an email sending us the name of the job and the step it failed on.
I'm looking at creating a dashboard of sorts to monitor this process more efficiently. I know that the same information available in the Job History window from SSIS is also available by querying the msdb database. I want to set up a central location to report failures (probably using SQL Reporting Services), and also a more intelligent email alert system.
Has anyone else dealt with this issue? If so, what kind of processes/reporting did you create around the SSIS procedures to streamline notification of job failures or alerts?
We have a similar setup at our company. We primarily rely on letting the jobs notify us when there is a problem and we have employees who check job statuses at specific times to ensure that everything is working properly and nothing was overlooked.
My team receives a SQL Server Agent Job Activity Report HTML email every morning at 6am and 4pm that lists all failed jobs at the top, running jobs below that, and all other jobs below that grouped into daily, weekly, monthly, quarterly, and other categories. We essentially monitor SQL Server Agent jobs, not the SSIS packages themselves. We rely on job categories and job schedule naming conventions to automate grouping in the report.
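A minimal sketch of the kind of msdb query such a report can be built on, listing jobs that failed in roughly the last day (standard msdb tables only; the instance name and the time window are placeholders):

```powershell
# Sketch: jobs whose outcome was a failure in roughly the last 24 hours.
$query = @"
SELECT  j.name   AS job_name,
        h.run_date,
        h.run_time,
        h.message
FROM    msdb.dbo.sysjobs        AS j
JOIN    msdb.dbo.sysjobhistory  AS h
          ON h.job_id = j.job_id
WHERE   h.step_id = 0          -- job outcome rows, not individual steps
  AND   h.run_status = 0       -- 0 = failed
  AND   h.run_date >= CONVERT(char(8), DATEADD(DAY, -1, GETDATE()), 112)
ORDER BY h.run_date DESC, h.run_time DESC;
"@

Invoke-Sqlcmd -ServerInstance 'MYSERVER' -Database 'msdb' -Query $query |
    Format-Table -AutoSize
```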
We have a similar setup for monitoring our SSRS subscriptions. However, we only monitor this one once a day since most of our subscriptions are triggered between 3 am and 4 am. The SSRS Subscription Activity Report goes one step further than the SQL Server Agent Job Activity Report in that it has links to the subscription screen for the report and has more exception handling built into it.
Aside from relying on reports, we also have a few jobs that are set to notify the operator via email upon job completion instead of upon job failure. This makes it easy to quickly check if all the major ETL processes have run successfully. It's sort of an early indicator of the health of the system. If we haven't received this email by the time the first team member gets into the office, then we know something is wrong. We also have a series of jobs that will fail with a job error if certain data sources haven't been loaded by a specific time. Before I had someone working an early shift, I used to check my iPhone for the email anytime I woke up in the middle of the night (which happened a lot back then since I had a newborn baby). On the rare occasion that I didn't receive an email indicating everything completed, or I received an error regarding a job step, I logged onto my machine via remote desktop to check the status of the jobs.
I considered having our data center guys check on the status of the servers by running a report each morning around 4am, but in the end I determined this wouldn't be necessary since we have a person who starts work at 6am. The main concern I had about implementing this process is that our ETL changes over time and it would have been necessary for me to maintain documentation on how to check the jobs properly and how to escalate the notifications to my team when a problem was detected. I would be willing to do this if the processes had to run in the middle of the night. However, our ETL runs every hour of the day, so if we had to kick off all our major ETL processes in the early morning, we would still complete loading our data warehouse and publishing reports before anyone made it into the office. Also, our office starts REALLY late for some reason, so people don't normally run our reports interactively until 9am onward.
If you're not looking to do a totally custom build-out, you can use https://cronitor.io to monitor ETL jobs.
Current SSRS Job monitoring Process:
There is currently no SSRS job monitoring process. If an SSRS job fails, a user creates an incident, and the TOPS Reporting and SSRS developer teams start working from that incident. As a result, the turnaround time to resolve such issues is long.
Proposed SSRS Job monitoring Process:
An SSRS subscription monitoring job will help the TOPS Reporting and SSRS developer teams monitor SSRS jobs proactively. The job will produce a report listing the failed reports along with their execution log status and subscription error status. From this report, a developer can understand why a report failed and start working on the issue proactively.
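A sketch of the kind of query the monitoring report could start from, using standard ReportServer catalog tables (the LastStatus success text varies by delivery extension, so the filters below are examples to adjust; the instance name is a placeholder):

```powershell
# Sketch: subscriptions whose last run did not report success.
$query = @"
SELECT  c.Path,
        c.Name          AS ReportName,
        s.LastRunTime,
        s.LastStatus
FROM    ReportServer.dbo.Subscriptions AS s
JOIN    ReportServer.dbo.Catalog       AS c
          ON c.ItemID = s.Report_OID
WHERE   s.LastStatus NOT LIKE 'Mail sent%'          -- adjust per delivery extension
  AND   s.LastStatus NOT LIKE '%has been saved%'
ORDER BY s.LastRunTime DESC;
"@

Invoke-Sqlcmd -ServerInstance 'MYSERVER' -Query $query | Format-Table -AutoSize
```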