I have a job in SSIS that has 5 steps. Currently it goes through the steps in order, waiting for the previous one to complete before starting the next. However, steps 1-4 could all run at the same time without impacting each other's results. So I was curious whether it is possible to have steps 1-4 all run at the same time and, once all are complete, start step 5.
I am open to doing this in other ways, such as having several different jobs, using triggers, or anything else that will get the end result.
The main goal here is to have step 5 start as soon as possible, but step 5 cannot start until all 4 steps are done.
All of these steps are merely running a stored procedure to update a table.
I am using SQL Server 2012. I am very new to SSIS.
This is what the Sequence Container tool is for.
You can put steps 1-4 in a Sequence Container and let them run in parallel in the container, and then have a Precedence Constraint from the container to Step 5.
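If you would rather split steps 1-4 into separate SQL Server Agent jobs (one of the alternatives you mentioned), a controller step could start them asynchronously and then poll until all four have finished before running step 5. A rough T-SQL sketch; the job names Step1 through Step4 and the procedure dbo.Step5 are hypothetical, and it does not distinguish success from failure:

    -- sp_start_job is asynchronous, so these four calls return immediately
    EXEC msdb.dbo.sp_start_job @job_name = N'Step1';
    EXEC msdb.dbo.sp_start_job @job_name = N'Step2';
    EXEC msdb.dbo.sp_start_job @job_name = N'Step3';
    EXEC msdb.dbo.sp_start_job @job_name = N'Step4';

    -- Poll until none of the four jobs is still executing
    WHILE EXISTS (
        SELECT 1
        FROM msdb.dbo.sysjobactivity AS ja
        JOIN msdb.dbo.sysjobs AS j ON j.job_id = ja.job_id
        WHERE j.name IN (N'Step1', N'Step2', N'Step3', N'Step4')
          AND ja.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)
          AND ja.start_execution_date IS NOT NULL
          AND ja.stop_execution_date IS NULL
    )
        WAITFOR DELAY '00:00:10';  -- check every 10 seconds

    -- All four jobs have stopped; run step 5
    EXEC dbo.Step5;

That said, the Sequence Container approach is simpler if everything can stay in one package.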
In your package, set MaxConcurrentExecutables to, say, 6 and make sure there are no precedence constraints between your tasks.
Then they should run in parallel.
See here for more detail: https://blogs.msdn.microsoft.com/sqlperf/2007/05/11/implement-parallel-execution-in-ssis.
I'm curious - did you try googling this?
I would like to delay deletion of data from the database. I am using MySQL and Nest.js. I want to delete an entry one week after it is created. I heard that cron is what I need. Can you help me with this? Is cron what I need, or do I need to use something else?
A cron job (or the at command on Windows) or a MySQL EVENT can be created to periodically check for something and take action. The resolution is only 1 minute.
If you need a very precise resolution, another technique would be required. For example, if you don't want to show a user something that is more than 1 week old to the second, then simply exclude that from the SELECT. That is, add something like this to the WHERE clause: AND created_date >= NOW() - INTERVAL 7 DAY.
Doing the above gives you the freedom to schedule the actual DELETE for only, say, once a day -- rather than pounding on the database only to usually find nothing to do.
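For example, the once-a-day purge could be a MySQL EVENT. A minimal sketch, assuming a hypothetical table entries with a created_date column:

    SET GLOBAL event_scheduler = ON;  -- events do not run unless the scheduler is enabled

    CREATE EVENT purge_old_entries
    ON SCHEDULE EVERY 1 DAY
    DO
      DELETE FROM entries
      WHERE created_date < NOW() - INTERVAL 7 DAY;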
If you do choose to "pound on the database", be aware of the following problem. If one instance of the deleter script runs for a long time (for any of a number of reasons), it might not be finished before the next copy comes along. In some situations these scripts can stumble over each other to the extent of effectively "crashing" the server.
That leads to another solution -- a single script that runs forever. It has a simple loop:
Do the actions needed (deleting old rows)
Sleep 1 -- or 10 or 60 or whatever -- this is to be a "nice guy" and not "pound on the system".
The only tricky part is making sure that the script starts up again after any server restart or a crash of the script itself.
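In practice that loop is usually a shell or application script supervised by something like systemd so it restarts automatically; purely as a sketch of the logic, here it is as a MySQL stored procedure using the same hypothetical entries table as above:

    DELIMITER //
    CREATE PROCEDURE purge_forever()
    BEGIN
      WHILE TRUE DO
        -- delete in modest chunks so each pass stays quick
        DELETE FROM entries
        WHERE created_date < NOW() - INTERVAL 7 DAY
        LIMIT 1000;
        DO SLEEP(60);  -- be a "nice guy"; don't pound on the system
      END WHILE;
    END //
    DELIMITER ;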
You can configure a cron job to delete the data periodically.
There are several ways to set one up.
You can write a shell script that periodically deletes entries in the DB using Linux crontab, or you can use an application that provides cron-style jobs, such as Jenkins or Airflow.
AWS Lambda also provides cron-style scheduling.
Using the cron support provided by Nest.js seems to be the simplest way to solve the problem.
See this link:
https://docs.nestjs.com/techniques/task-scheduling
My ETL package finished successfully without executing the last several tasks:
Then I tried to run a task of the same type and skip the others:
After that I created a separate package with the last five tasks, and they ran as expected!
Question:
What happened to the flow in the first two figures? Why does the package skip several tasks without any warnings/errors, etc.?
Thanks a lot for any answers and ideas about this strange behavior!
[UPDATE] Answered by @Peter_R:
I changed both sp_updatestats input constraints from AND to OR and everything is OK. The arrows changed to dotted ones:
A Logical AND constraint requires all incoming tasks to complete before the next task runs, so your SP_UpdateStats will not run until both ProcessFull and MeasureGroupSet Loop have completed.
I am guessing that after Deploy Data the expression is designed to split the workflow depending on a condition you have set. In doing this you will never have both ProcessFull and MeasureGroupSet Loop running in parallel, meaning that the SP_UpdateStats task will never run.
If you change both the connecting constraints to the SP_UpdateStats to Logical OR it will run after either the ProcessFull OR MeasureGroupSet Loop has completed.
This is still the case if one of the tasks is disabled as well; slightly odd, but still the case.
I have an SSIS package that is processing a queue.
I currently have a single package that is broken into 3 containers:
1. gather some metadata
2. do the work
3. re-examine the metadata and update the queue with what we think happened (success, or some flavor of failure)
I am not super happy with the speed; part of it is that I am running on a hamster-powered server, but that is out of my control.
The middle piece may offer an opportunity for an improvement...
There are 20 tables that may need to be updated.
Each queue item will update 1 table.
I currently have a sequence that contains 20 sequence containers.
They all do essentially the same thing, but I couldn't figure out a way to abstract them.
The first box in each is an empty script task. There is a conditional flow to 'the guts' if there is a match on table name.
So I open 20 sequence containers, hit 20 empty script tasks, and do 20 true/false checks.
Watching the yellow/green light show, this seems to be slow.
Is there a more efficient way? The only improvement I can think of is to move the 20 empty scripts outside the sequence containers; what that would save is opening the container. I can't believe opening a sequence container is all that expensive. Does it possibly re-verify every task in the container every time?
Just fishing, if anyone has any thoughts I would be very happy to hear them.
Thanks
Greg
Your main issue right now is that you are running this in BIDS. BIDS is designed to make development and debugging of packages easy, so yes, to your point, it validates all of the objects as it runs. Plus, the "yellow/green light show" is more overhead to show you what is happening in the package as it runs. You will get much better performance when you run the package with DTExec or as a scheduled job from SQL Server. Are you logging your packages? If so, run from the server and look at the logs to verify how long the process actually takes there. If it is still taking too long at that point, then you can implement some of @registered user's ideas.
Are you running each of the tasks in parallel? If it has to cycle through all 60 objects serially, then your major room for improvement is running them in parallel. If you want to parallelize the processes, there are a few solutions:
1. Create all 60 objects, as 20 chains of 3 objects each. This is labor-intensive to set up, but it is the easiest to troubleshoot and allows you to customize it when necessary. Obviously this does not abstract away anything!
2. Create a parent package and a child package. The child package would contain the structure of what you want to execute. The parent package contains 20 Execute Package tasks. This is similar to 1, but it offers the advantage that you only have one set of code to maintain for the 3-task sequence container. This likely means you will move to a table-driven metadata model (a sketch of such a table follows this list). This works well in SSIS with the CozyRoc Data Flow Plus task if you are transferring data from one server to another. If you are doing everything on the same server, then you are really just orchestrating stored procedure executions, which would be easy to do with this model.
3. Create a package that uses the CozyRoc Parallel Task and Data Flow Plus. This can allow you to encapsulate all the logic in one package and execute all of them in parallel. WARNING: I tried this approach in SQL Server 2008 R2 with great success. However, when SQL Server 2012 was released, the CozyRoc Parallel Task did not behave the way it did in previous versions for me, due to some under-the-cover changes in SSIS. I logged this as a bug with CozyRoc, but as best I know this issue has not been resolved (as of 4/1/2013). Also, this model may abstract away too much of the ETL and make initial loads and troubleshooting individual table loads in the future more difficult.
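For the same-server case in solution 2, the metadata table can be as simple as a mapping from table name to procedure name. A hypothetical T-SQL sketch (all names invented):

    -- Metadata table: which stored procedure updates which table
    CREATE TABLE dbo.EtlTaskMetadata (
        TableName sysname NOT NULL PRIMARY KEY,
        ProcName  sysname NOT NULL
    );

    -- The child package (or a plain T-SQL caller) looks up and runs the proc
    DECLARE @TableName sysname = N'TableA';  -- in practice, supplied per queue item
    DECLARE @proc sysname;

    SELECT @proc = ProcName
    FROM dbo.EtlTaskMetadata
    WHERE TableName = @TableName;

    EXEC @proc;  -- safe here only because @proc comes from your own metadata table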
Personally, I use solution 1, since any of my team members can implement this code successfully. Metadata-driven solutions are sexy, but much harder to code correctly.
May I suggest wrapping your 20 updates in a single stored procedure? Not knowing how variable your input data is, I don't know how suitable this is, but it is my first reaction.
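Something along these lines, as a sketch with invented names (one branch per table):

    CREATE PROCEDURE dbo.ProcessQueueItem
        @TableName sysname
    AS
    BEGIN
        IF @TableName = N'TableA'
            EXEC dbo.UpdateTableA;
        ELSE IF @TableName = N'TableB'
            EXEC dbo.UpdateTableB;
        -- ...and so on for the remaining 18 tables
    END;

Each queue item would then be a single EXEC dbo.ProcessQueueItem call instead of a trip through 20 containers.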
Well, here is what I did...
I added a dummy task at the 'top' of the parent sequence container. From that I added 20 flow links, one to each of the child sequence containers (CSCs). Now each CSC gets opened only if necessary.
My throughput did increase by about 30% (26 rpm -> 34 rpm on minimal sampling).
I could go with either zman's answer or registeredUser's. Both were helpful. I chose zman's because the real answer always starts with looking at the log to see exactly how long something takes (green/yellow is not very reliable in my experience).
thanks
I have two different jobs.
These jobs could be started independently of each other.
But they work with the same object.
And I would like only one of these jobs to be running at a time.
Is it possible to configure a job in Hudson/Jenkins so that it does not run while certain other jobs are running?
Have a look at the Locks and Latches plugin. If I understood your question correctly, it does exactly what you want.
Use Build Triggers, specifically "Build after other projects are built".
Use this trigger option to achieve your task.
Point the jobs to run on a specific environment that limits them from running simultaneously (for example, a node with a single executor). With this, the second job will sit in the queue rather than running at the same time.
How many tasks can task scheduler run at the same time?
I have set up three backup tasks from within SQLyog, all set to start at 12:00 am and repeat every 4 hours until midnight. Each task will back up all tables from one of three different databases to network-attached storage.
Will there be any impact on MySQL Server performance, or is there any chance a task will be missed?
Thank you for any input.
It's usually considered proper to space out the scheduled tasks, even if only by 1 minute.
Since I don't know whether your tasks can be consolidated or optimized, and I don't know how long they'll take to run, I'll recommend you space them out by an hour or so.
There is some performance impact from a backup, which is part of the reason they're usually done at night (and, of course, there are fewer transactions being run on the database since people usually aren't working), and three running at the same time ... Well, it's not something I would wish on my database or my users.
To answer the original question: the scheduler can run a lot of things at the same time :)