I have a problem with multiple executions of the same SSIS package.
I would like to allow parallel executions each of which handles a subset of data.
So far, I am thinking of using some state variable, but I don't know where to store it.
One option is to keep the connection open and use temp tables to coordinate the task load. However, temp tables cause a lot of compilation issues, and they are not maintainable.
Are there any other ways to identify the current execution id of the package or scope of execution? I have found no state (either in memory or stored elsewhere) in SSIS so far which I can use to partition/isolate each execution.
Based on my comments above, you can try this. I don't know whether it is quite what you're looking for, but maybe it can give you a hint to get further.
I am calling the example with workflowid 1. This is the value I mean you can change in your SQL Server Agent job steps: set the parameter on each step, so for example you could add two steps executing workflowid 1 and workflowid 4. Each run will then execute only the sequence container whose precedence constraint is satisfied for that id.
Create a package variable
Create your package flow
Edit your SQL Task Get WorkflowID
Add Parametermapping to your package variable
Get the resultset into local variable called WorkflowIDrun
Make your precedence constraints so it only allows one id to pass through
Note: you could add parent workflow IDs so that the flow can branch inside the sequence container if several workflows need some of the same logic
End result when package is run with workflowid 1
Create a new SQL job in your Agent and add the needed steps. Note: I created two steps, for workflowid 1 and 2 (truncate and delete).
I then edit each step and set the variable to the right value: workflowid 1 for truncate and workflowid 2 for delete.
This could of course also be done in another job; that depends on your needs.
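As a rough sketch of the "Get WorkflowID" step described above: the table dbo.Workflow and its column are hypothetical (the answer does not show the schema), and the ? marker assumes an OLE DB connection with the package variable mapped as the first parameter. The expressions in the trailing comments are SSIS expressions for the precedence constraints, not T-SQL.

```sql
-- Hypothetical lookup table: the original answer does not show its schema.
-- The ? marker (OLE DB syntax) is bound to the package variable on the
-- Parameter Mapping page of the Execute SQL Task.
SELECT WorkflowID
FROM dbo.Workflow
WHERE WorkflowID = ?;

-- With ResultSet set to "Single row", map the WorkflowID column to
-- User::WorkflowIDrun, then put one expression on each precedence
-- constraint so that only one path can be taken:
--   @[User::WorkflowIDrun] == 1   -- path to the truncate container
--   @[User::WorkflowIDrun] == 2   -- path to the delete container
```

Setting each constraint to evaluate "Expression and Constraint" keeps the success requirement while letting only the matching id through.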
Imagine that you want to save in a variable the number of rows that were updated or deleted in a table.
These are the steps I took:
First, in the Control Flow I created a Data Flow Task.
Then, in the Data Flow, I created a source (in my case an Excel file), created two variables to count those rows (countDeleted and countUpdated), connected the variables to two Row Count transformations, and then connected my destination (OLE DB).
Now, in the Control Flow, what do I do?
Create an Execute SQL Task? Or a Script Task? What is the best way to do it? What is the piece of code to use?
Thanks for your help.
PS: I only have 4 weeks of SSIS experience, sorry for my noobiness :)
An OLE DB destination only inserts. It can't UPDATE or DELETE.
What's your logic for updating or deleting?
If you're just starting out and reading about doing things in SSIS, you will eventually find advice to use the OLE DB Command to perform row-by-row deletes and inserts.
In my opinion this is to be avoided. It does not scale (it works fine for small recordsets, then fails for large ones), and it is difficult to maintain parameter mappings in the OLE DB Command. You should still try it once to familiarise yourself with it.
My advice is to load the Excel data into a staging table, perform batch DELETE and UPDATE statements to load the data, and use @@ROWCOUNT to capture the number of records affected.
For example;
Your existing described dataflow can be used to load into a table called StagingTable
Before your dataflow you should run an Execute SQL Task (This is in the Control Flow pane, not the Data Flow pane) that clears the staging table:
TRUNCATE TABLE StagingTable;
So first get that working - repeatedly running your package clears the staging table then loads Excel into it without creating duplicates
This in itself is a challenge as Excel is a terrible data interchange format.
Once you have that working, you add an execute SQL task to the end that runs some SQL that deletes the records you want and captures the count. For example:
DELETE FROM MyFinalTable WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable);
SELECT @@ROWCOUNT;
Then you follow the instructions here to load that back to your SSIS variable
http://microsoft-ssis.blogspot.com/2011/03/rowcount-for-execute-sql-statement.html
What are you doing with this row count? Are you writing it to a logging table? Save
yourself the bother of pulling it back into an SSIS variable and just write it directly:
DELETE FROM MyFinalTable WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable);
INSERT INTO LogTable([Table],Operation,Type)
SELECT 'MyFinalTable','Delete', @@ROWCOUNT;
In my experience it is not a good idea to build convoluted logic into SSIS packages if you can do it in the database instead, although that does depend on who eventually has to maintain it. Hopefully you can appreciate that this T-SQL approach is more straightforward and code-based, as opposed to digging around in property pages, events and other places inside SSIS packages.
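If both counts are needed from a single Execute SQL Task, a sketch along these lines may help. The IsDeleted and SomeValue columns are hypothetical, since the actual update and delete logic is not given in the question; countDeleted and countUpdated are the variable names from the question. Each count is copied immediately after its statement because the next statement overwrites @@ROWCOUNT.

```sql
DECLARE @countDeleted int, @countUpdated int;

-- Hypothetical rule: staging rows flagged IsDeleted = 1 remove the target row.
DELETE f
FROM MyFinalTable AS f
JOIN StagingTable AS s ON s.PrimaryKey = f.PrimaryKey
WHERE s.IsDeleted = 1;
SET @countDeleted = @@ROWCOUNT;   -- capture before any other statement runs

-- Hypothetical rule: remaining staging rows overwrite SomeValue in the target.
UPDATE f
SET f.SomeValue = s.SomeValue
FROM MyFinalTable AS f
JOIN StagingTable AS s ON s.PrimaryKey = f.PrimaryKey
WHERE s.IsDeleted = 0;
SET @countUpdated = @@ROWCOUNT;

-- Single-row result set that can be mapped to two SSIS variables.
SELECT @countDeleted AS countDeleted, @countUpdated AS countUpdated;
```

With ResultSet set to "Single row", the two columns can be mapped to User::countDeleted and User::countUpdated on the Result Set page.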
I assume that you're using an Execute SQL Task for the updates and deletes? As @Nick.McDermaid mentioned, using an OLE DB Command within a Data Flow presents various issues when performing DML. You can find the number of rows updated, inserted, or deleted in a table through an Execute SQL Task by using the ExecValueVariable property of this task. Set this property to the variable that will hold the row count, and the task will return the number of affected rows in it. Note that it will only return the number of rows affected by the last statement in the Execute SQL Task, regardless of whether the task contains multiple batches (i.e. GO separators).
I'm really new to SSIS and I would like to automate some of my workflow. However, most of my tasks require that other jobs/packages in SQL Server Agent have previously been run for the day/week by other team members. My question is: how can I direct my workflow based on the status of these other packages? Control Flow doesn't have a Conditional Split tool, and I can't kick off a separate job/package from Data Flow. I can easily write a SQL statement to determine whether any or all of the related packages have been run, and even have it return a boolean value if needed, but I'm at a loss for how to make this direct the workflow to either proceed to the next step or fail the entire package if any of the others haven't been run (the desired result).
I really appreciate any help in this. Thank you!
In SSIS:
You can set conditions in the Control Flow by double-clicking on the arrow that links one task to another. If you set up an Execute SQL Task to run a query to check whether a package has run and then return a result into a variable, you can then set up a condition to check that variable, and only continue to the next task if it is (or isn't) a certain value.
Here's an example of how you might set up a precedence constraint so that the next step would only happen if the previous one was successful, and also a particular variable did not have a value of C:
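A minimal sketch of the checking query and the matching constraint expressions, assuming the prerequisite is a SQL Server Agent job and that msdb is readable from the package's connection. The job name 'Nightly Load' and the variable names User::HasRun and User::Status are hypothetical. The trailing comments are SSIS expressions, not T-SQL.

```sql
-- Returns 1 if the (hypothetical) job 'Nightly Load' succeeded today, else 0.
SELECT CASE WHEN EXISTS (
           SELECT 1
           FROM msdb.dbo.sysjobs AS j
           JOIN msdb.dbo.sysjobhistory AS h ON h.job_id = j.job_id
           WHERE j.name = N'Nightly Load'
             AND h.step_id = 0        -- step 0 holds the overall job outcome
             AND h.run_status = 1     -- 1 = succeeded
             AND h.run_date = CONVERT(int, CONVERT(char(8), GETDATE(), 112))  -- yyyymmdd
       ) THEN 1 ELSE 0 END AS HasRun;

-- Map HasRun to User::HasRun (ResultSet = "Single row"), then on the
-- precedence constraint choose "Expression and Constraint" with Success and:
--   @[User::HasRun] == 1
-- The "not C" condition described above would look like:
--   @[User::Status] != "C"
```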
In SQL Server Agent:
You could also - if you want to avoid running an entire package - set something up in the SQL Server Agent job instead. You could add an initial step which checks on the status of a package, then quits the job reporting success. This does mean that if the step fails for another reason (say the database it's trying to run the select statement on is down), it will still quit and report success, so be careful - you might want to set up some other specific steps first which check for such situations.
Here's an article I used a while back when I wanted to understand how to set up a SQL Server Agent job in this way:
http://sqlactions.com/2012/08/05/how-to-create-custom-schedule-for-sql-server-agent-job/
I would like to make a package that copies data from a table only if the table is not empty. I know how to do the count and how to make a package for copying data, but the problem is that a Source can't have any inputs, so I don't know how to do it. Any suggestions?
I don't understand your comment about dragging a "green line from a package to a source" but instead of trying to determine in advance if the table is empty, just do your copy anyway and then see how many rows were copied:
Create a package variable for the rowcount
Populate the variable using the rowcount transformation
Use an expression in the precedence constraint to check the variable: if it's greater than zero then continue executing the rest of your package
@Pondlife I don't think you can use precedence constraint on the data flow task, can you?
I believe you can use it only on the control flow.
I would add a "Execute SQL Task" with the count, sending the result to a variable and from this task, I would drag the green arrow to the Data Flow task that makes the copy and on this arrow I would add the expression on the precedence constraint.
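For illustration, the count step could look like the following. The table dbo.SourceTable and the variable User::SourceRowCount are hypothetical names; the comment at the end is an SSIS expression for the precedence constraint, not T-SQL.

```sql
-- Run in the Execute SQL Task with ResultSet = "Single row" and map the
-- SourceRowCount column to the package variable User::SourceRowCount.
SELECT COUNT(*) AS SourceRowCount
FROM dbo.SourceTable;   -- hypothetical source table

-- Expression on the green arrow leading to the Data Flow Task
-- (evaluation operation "Expression and Constraint", value Success):
--   @[User::SourceRowCount] > 0
```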
As you have correctly noted, a data flow source does not accept input so one cannot perform logic in the dataflow to determine whether this task should run.
Cannot create connector.
The destination component does not have any available inputs for use in creating a path.
However, there's nothing stopping you from setting up this logic in your control flow. I would use a query that hits the DMVs for a fast rowcount on the destination system, filtered to only the tables I wished to replicate.
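One way such a DMV query might look, as a sketch: it reads sys.dm_db_partition_stats rather than scanning the tables, so the counts are approximate while other writes are in flight. The table names in the filter are hypothetical.

```sql
-- Lists the tables in the filter that currently hold no rows.
SELECT s.name AS SchemaName,
       t.name AS TableName
FROM sys.dm_db_partition_stats AS ps
JOIN sys.tables  AS t ON t.object_id = ps.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE ps.index_id IN (0, 1)                    -- heap or clustered index only, so rows count once
  AND t.name IN (N'Customers', N'Orders')      -- hypothetical tables to replicate
GROUP BY s.name, t.name
HAVING SUM(ps.row_count) = 0;                  -- sum across partitions
```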
Armed with the list of empty tables, it'd probably depend how I'd handle it. For a small number of tables, I'd define N dataflows all with a do nothing script task as a precedent and then use an expression on table name to enable a path, much like I did on this question.
If there are many tables, I'd define a package per table and then invoke execute package task with the package name built dynamically based on the empty table name.
I am in the process of creating an SSIS package that needs to do the following in the specified order:
process some data
move that data to some other tables
Get some data and push it in a plain text file.
I have created 3 stored procedures for these. I have 2 "Execute SQL tasks" for steps 1 and 2 and a "Data Flow task" for step 3.
Now when I run the package I can see all 3 steps complete (no errors), but they are not running in the correct order.
I see step 3 run first, then steps 1 and 2, and I think step 3 then runs again. Normally I could ignore it, but as the data in the text file can be 700 MB, I need to find a way to get SSIS to run these tasks in sequence.
I have tried a "Sequence Container" but no luck.
Can someone help me with this please?
KA
You need to use precedence constraints to tell SSIS what order your tasks need to be executed in.
Drag the green arrow from task one to task two, and from task two to task three.
You could connect as
first SQL execute task
precedence constraint on success
second SQL execute task
precedence constraint on success
data flow
SSIS will follow the sequence as we required.
thanks
prav
I had exactly this problem. Tasks were being executed in something like the order I'd created them rather than the sequence I specified later. It turned out that I'd managed to get a task that belonged to the first sequence container to appear in the last sequence container without losing its allegiance to the first. I discovered this by taking a backup and deleting sequence containers: the rogue task disappeared when I deleted the first sequence container.
The fix was to cut and paste the task into the desired sequence container.
I encountered an issue on SQL Server Denali where individual components were running out of sequence even though they were joined by success constraints. The problem seemed to occur when I had cut and pasted the components and the constraint. After I deleted and reapplied the constraints, the package ran in the correct order.
In my case, if I want to decide execute order in sequence containers, I will use [sub sequence containers] between execute sql task and data flow task. Hope useful for you.
The best is to use Sequence Containers... basically they help in creating a Sequence.
But since it does not work in your case, create Child Packages for all your different process
and then create the Master Package which will have a link to those child packages, USE "Execute Package task"
Unfortunately I don't have a repro for my issue, but I thought I would try to describe it in case it sounds familiar to someone... I am using SSIS 2005, SP2.
My package has a package-scope user variable - let's call it user_var
first step in the control flow is an Execute SQL task which runs a stored procedure. All that SP does is insert a record in a SQL table (with an identity column) and then go back and get the max ID value. The Execute SQL task saves this output into user_var
the control flow then has a Data Flow Task - it goes and gets some source data, has a derived column which sets a column called run_id to user_var - and saves the data to a SQL destination
In most cases (this template is used for many packages, running every day) this all works great. All of the destination records created get set with a correct run_id.
However, in some cases, there is a set of the destination data that does not get run_id equal to user_var, but instead gets a value of 0 (0 is the default value for user_var).
I have 2 instances where this has happened, but I can't make it happen. In both cases, it was just less than 10,000 records that have run_id = 0. Since SSIS writes data out in 10,000 record blocks, this really makes me think that, for the first set of data written out, user_var was not yet set. Then, after that first block, for the rest of the data, run_id is set to a correct value.
But control passed on to my data flow from the Execute SQL task - it would have seemed reasonable to me that it wouldn't go on until the SP has completed and user_var is set. Maybe it just runs the SP, but doesn't wait for it to complete?
In both cases where this has happened there seemed to be a few packages hitting the table to get a new user_var at about the same time. And in both cases lots of data was written (40 million rows, 60 million rows) - my thinking is that that means the writes were happening for a while.
Sorry to be both long-winded AND vague. A winning combination! Does this sound familiar to anyone? Thanks.
Updating to show the SP I use to get the user_var:
CREATE PROCEDURE [dbo].[sp_GetRunIDForPackage] (@pkg varchar(50)) AS
-- add a new entry for this run of this package - the RUN_ID is an IDENTITY column and so
-- will get created for us
INSERT INTO shared.STAGE_LOAD_JOB( EFFECTIVE_TS, EXECUTED_BY )
VALUES( getdate(), @pkg )
-- now go back into the table and get the new RUN_ID for this package
SELECT MAX( RUN_ID )
FROM shared.STAGE_LOAD_JOB
WHERE EXECUTED_BY = @pkg
Is this variable being accessed lots of times, from lots of places? Do you have a bunch of parallel data flows using the same variable?
We've encountered a bug in both SQL 2005 and 2008 whereby a "race condition" causes the variable to be inaccessible from some threads, and the default value is used. In our case, the variable was our "base folder" location for packages, causing our overall execution control package to not find its sub-packages.
More detail here: SSIS Intermittent variable error: The system cannot find the file specified
Unfortunately, the work-around is to hard-code a default value into the variable that will work when the race condition happens. That was easy for us (set the base folder to be correct for our prod environment), but it looks a lot harder for your issue.
Perhaps you could use multiple variables (one for each data flow), and a bunch of Execute SQL tasks to populate those variables? REALLY ugly, but it should help.
Did you check the value of user_var before getting to the Derived Column Component? It sounds like user_var may be 0 so you are doing run_id = user_var; run_id = 0. I may be naive to think it is that simple but that's the first thing I would check.
Given the procedure code, you might want to replace this:
SELECT MAX( RUN_ID )
FROM shared.STAGE_LOAD_JOB
WHERE EXECUTED_BY = @pkg
with this:
select scope_identity()
The scope_identity() function returns the identity that was entered in the current scope, which is the procedure. Not sure if this will solve the problem, but I find it best to work through them all as they might have unrelated consequences.
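Put together, the procedure with that change might look like the sketch below. ALTER is used on the assumption that the procedure already exists; SET NOCOUNT ON and the cast to int are additions of mine (the cast assumes RUN_ID is an int column, since SCOPE_IDENTITY() returns a numeric type).

```sql
ALTER PROCEDURE [dbo].[sp_GetRunIDForPackage] (@pkg varchar(50)) AS
BEGIN
    SET NOCOUNT ON;  -- assumption: suppress the "rows affected" message

    -- add a new entry for this run; RUN_ID is an IDENTITY column
    INSERT INTO shared.STAGE_LOAD_JOB (EFFECTIVE_TS, EXECUTED_BY)
    VALUES (GETDATE(), @pkg);

    -- return the identity generated by the INSERT above, in this scope only,
    -- instead of re-reading the table with MAX(RUN_ID)
    SELECT CAST(SCOPE_IDENTITY() AS int) AS RUN_ID;  -- assumes RUN_ID is int
END
```

Unlike the MAX( RUN_ID ) lookup, this does not depend on what other sessions have inserted for the same package name in the meantime.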