I am in process of creating an ssis package that need to do following in specified order:
process some data
move that data to some other tables
Get some data and push it in a plain text file.
I have created 3 store procedure for these, I have 2 "Execute SQL tasks" for 1 and 2 and a "Data Flow task" for 3rd.
Now when i run the package i can see all 3 step are completed (no errors) but they are not running in correct order.
I see step 3 is run first then step 1 and 2, i think then step 3 runs again. Normally i can ignore it but as the data in the text file can be 700 mb, i need to find a way to get SSIS to run these task in sequence.
I have tried "Sequence Container" but no luck.
Can some one help me with this please?
KA
You need to use precedence constraints to tell SSIS what order your tasks need to be executed in.
Drag the green arrow from task one to task two, and from task two to task three.
You could connect as
first SQL execute task
precedence constraint on success
second SQL execute task
precedence constraint on success
data flow
SSIS will follow the sequence as we required.
thanks
prav
I had exactly this problem. Tasks were being executed in something like the order I'd created them rather than the sequence I specified later. It turned out that I'd managed to get a task that belonged to the first sequence container to appear in the last sequence container without loosing it's allegiance to the first. I discovered this by taking a backup and deleting sequence containers - the rogue task disappeared when I deleted the first sequence container.
The fix was to cut and past the task into the desired sequence container.
I encounterd an issue on SQL Server Denali when individual components were running out of sequence even though they were joined by success constraints. The problem seemed to occur when I had cut and pasted the components and the constraint. By deleting and reapplying the constraints, the package then ran in the correct order.
In my case, if I want to decide execute order in sequence containers, I will use [sub sequence containers] between execute sql task and data flow task. Hope useful for you.
The best is to use Sequence Containers... basically they help in creating a Sequence.
But since it does not work in your case, create Child Packages for all your different process
and then create the Master Package which will have a link to those child packages, USE "Execute Package task"
Related
Imagine that you want to save in a variable the number of rows the were updated or deleted in a table.
This is the steps that i did:
First, in the Control flow i created a Data Flow Task.
Them, in the Data Flow, i created a source(in my case is a excel file), then i proceeded to create two variables to count those rows- countDeleted and countUpdated, then connected the variables to two row count transformations, and them connected my destination (OLE DB).
Now in the control flow, what do i do??
Create a SQL execute task?? or a Script task?? What is the best way to do it?? What is the piece of code to use??
Thanks for youy help.
PS: i only have 4 weeks off SSIS, sorry for my noobieness :)
An OLD DB destination only inserts. It can't UPDATE or DELETE
What's your logic for updating or deleting?
If you're just starting out and reading about doing things in SSIS you will eventually find advice to use the OLE DB Command to perform row by row delete and inserts.
In my opinion this is to be avoided. It does not scale (works fine for small recorsets then fails for large recordsets), and it is difficult to maintain parameter mappings in the OLE DB Command. Although you should try it anyway to familiarise yourself with it.
My advice is to load the Excel data into a staging table, perform batch DELETE and UPDATE statements to load the data and use ##ROWCOUNT to capture the records updated.
For example;
Your existing described dataflow can be used to load into a table called StagingTable
Before your dataflow you should run an Execute SQL Task (This is in the Control Flow pane, not the Data Flow pane) that clears the staging table:
TRUNCATE TABLE StagingTable;
So first get that working - repeatedly running your package clears the staging table then loads Excel into it without creating duplicates
This in itself is a challenge as Excel is a terrible data interchange format.
Once you have that working, you add an execute SQL task to the end that runs some SQL that deletes the records you want and captures the count. For example:
DELETE FROM MyFinalTable WHERE PriamryKey IN (SELECT PrimaryKey FROM StagingTable);
SELECT ##ROWCOUNT;
Then you follow the instructions here to load that back to your SSIS variable
http://microsoft-ssis.blogspot.com/2011/03/rowcount-for-execute-sql-statement.html
What are you doing with this row count? Are you writing it to a logging table? Save
yourself the bother of pulling it back into an SSIS variable and just write it directly:
DELETE FROM MyFinalTable WHERE PriamryKey IN (SELECT PrimaryKey FROM StagingTable);
INSERT INTO LogTable(Table,Operation,Type)
SELECT 'MyFinalTable','Delete', ##ROWCOUNT;
In my experience it is not a good idea to build convoluted logic into SSIS packages if you can instead do in a database. Although it does depend on the person who has to eventually maintain it. Hopefully you can appreciate that this T-SQL approach is a more straightforward code based approach as opposed to having to dig around in property pages and events and other places inside SSIS packages.
I assume that you're using an Execute SQL Task for the updates and deletes? As #Nick.McDermaid mentioned, using an OLE DB Command within a Data Flow presents various issues when performing DML. You can find the number of rows updated, inserted, or deleted in a table through an Execute SQL Task by using the ExecValueVariable property of this task. Set the variable that will hold the row count to this property and it will return the number of affected rows. Note that is will only return the number of rows impacted by the last statement in the Execute SQL Task, regardless of batches (i.e. GO separators) are in the component.
I have a problem with multiple executions of the same SSIS package.
I would like to allow parallel executions each of which handles a subset of data.
So far, I am thinking of using some state variable, but I don't know where to store it.
One option is to use keep the connection open and use temp tables to coordinate the task load. However, temptables cause lots of compilation issues, and they are not maintable.
Are there any other ways to identify the current execution id of the package or scope of execution? I have found no state (either in memory or stored elsewhere) in SSIS so far which I can use to partition/isolate each execution.
So based on my comments above you can try this. I dont know it is quite what your looking for, but maybe it can give you a hint to get further.
I am calling the example with workflowid 1. This is what i mean you can change in your SQL Job agent steps, and then change the parameter on each step, so fx you could add 2 steps executing workflowid 1 and workflowid 4. Then it will only run that sequence container where the constraint is success.
Create a package variable
Create your package flow
Edit your SQL Task Get WorkflowID
Add Parametermapping to your package variable
Get the resultset into local variable called WorkflowIDrun
Make your precedence constraints so it only allows one id to pass through
Notice: You could add parentworkflowid's so that you can diverse your flow inside the sequence container if you need some of the same logic
End result when package is run with workflowid 1
Create a new SQL Job in your agent. Add the needed steps Notice; I Created two steps for workflowid 1 and 2. Truncate and delete
I then edit my step and correct the variable with the right value. This will be workflowid 1 for truncate and workflowid 2 for delete
This could of cause also be in another job you do it, that depends on your needs.
I'm really new to SSIS and I would like to automate some of my workflow. However, most of my tasks require that other jobs/packages in SQL Server Job Agent have previously been run for the day/week by other team members. My question is: how can I direct my workflow based on the status of these other packages? Control Flow doesn't have a Conditional Split tool, and I can't kick off a separate job/package from Data Flow. I can easily write a SQL statement to determine whether any or all of the related packages have been run, and even have it return a boolean value if needed, but I'm at a loss for how to make this direct the workflow to either proceed to the next step or fail the entire package if any the others haven't been run (desired result).
I really appreciate any help in this. Thank you!
In SSIS:
You can set conditions in the Control Flow by double-clicking on the arrow that links one task to another. If you set up an Execute SQL Task to run a query to check whether a package has run and then return a result into a variable, you can then set up a condition to check that variable, and only continue to the next task if it is (or isn't) a certain value.
Here's an example of how you might set up a precedence constraint so that the next step would only happen if the previous one was successful, and also a particular variable did not have a value of C:
In SQL Server Agent:
You could also - if you want to avoid running an entire package - set something up in the SQL Server Agent job instead. You could add an initial step which checks on the status of a package, then quits the job reporting success. This does mean that if the step fails for another reason (say the database it's trying to run the select statement on is down), it will still quit and report success, so be careful - you might want to set up some other specific steps first which check for such situations.
Here's an article I used a while back when I wanted to understand how to set up a SQL Server Agent job in this way:
http://sqlactions.com/2012/08/05/how-to-create-custom-schedule-for-sql-server-agent-job/
I have 2 questions about ssis buffer.
1) In my ssis package i am using 1 data flow task contain 2 oledb source,2 oledb destination.Both are independent.All setting are keeping default,We know the DefaultBufferMaxRows is 10000 rows.So,if i run this package,whether the number of record per buffer is 10000 per each or 5000 per each?
2) I have tried to use ssis logging(xml and database),but i dont know,it is not showing any thing.It is creating xml file,but there is no usefull information(other that some xml tags).also it is not creating any table.
logging event window also not displaying any thing.Can u please help me...
Addressing the second question as I'd need to do research on the first.
I find logging to SQL Server to be my preferred destination. The other options (file, event viewer, trace file, etc) are fine and dandy but to sieve through that data, a query is my tool.
You state it isn't showing anything useful, what did you expect it to show and what have you selected?
I generally select the following events: OnInformation, OnError, OnWarning, OnPreExecute, OnPostExecute. The first three provide information about what's wrong, potentially wrong or could be improved with my packages. The last two I use to establish duration of various tasks.
Check them only at the top level. I had a coworker that had checked the above events at the package level but on each sub task they checked one event. They had anticipated it would inherit the logging established at the root and add in the single event selected at that level. The reverse was true: only the innermost checked items were logged.
Where does everything get logged? In 2005, it'll be found in dbo.sysdtslog90. For 2008 forward, it can be found in dbo.sysssislog The master copy of this table exists in msdb but if you point your OLEDB connection to a different catalog (SALES, Adventureworks, etc) the first invocation of the package will result in that table being copied down into the target catalog.
I would like to make a package that would copy data from a table only if table is not empty. I know how to do count and how to make a package for copying data but problem is that Source can't have any inputs so I don't know how to do it. Any suggestions?
I don't understand your comment about dragging a "green line from a package to a source" but instead of trying to determine in advance if the table is empty, just do your copy anyway and then see how many rows were copied:
Create a package variable for the rowcount
Populate the variable using the rowcount transformation
Use an expression in the precedence constraint to check the variable: if it's greater than zero then continue executing the rest of your package
#Pondlife I don't think you can use precedence constraint on the data flow task, can you?
I believe you can use it only on the control flow.
I would add a "Execute SQL Task" with the count, sending the result to a variable and from this task, I would drag the green arrow to the Data Flow task that makes the copy and on this arrow I would add the expression on the precedence constraint.
As you have correctly noted, a data flow source does not accept input so one cannot perform logic in the dataflow to determine whether this task should run.
Cannot create connector.
The destination component does not have any available inputs for use in creating a path.
However, there's nothing stopping you from setting up this logic in your control flow. I would use a query that hits the DMVs for a fast rowcount on the destination system, filtered to only the tables I wished to replicate.
Armed with the list of empty tables, it'd probably depend how I'd handle it. For a small number of tables, I'd define N dataflows all with a do nothing script task as a precedent and then use an expression on table name to enable a path, much like I did on this question.
If there are many tables, I'd define a package per table and then invoke execute package task with the package name built dynamically based on the empty table name.