Regarding executing data flow tasks in a specific order in SSIS - csv

I created 4 Data Flow Tasks as shown in the image in which order I want them to run:
Customers
Locations
Orders
Items
Each data task flow consists of:
Flat File data source
Data Conversion
OLE DB Destination
When I run, according to the Execution Results tab in the image, not in the order I wanted but:
Customers
Items
Locations
Orders
The execution completes successfully though. I was able to check all data successfully inserted into DB.
Questions:
Does sequence of tables migration matter when tables have foreign keys?
If yes, then why the above execution didn't throw any errors?
How to specify the order of execution?
Thanks.

The order you've specified is the order in which it has actually run. The Execution Results tab is showing them alphabetically.
If you'd like to see the order and prove it to yourself, add some tasks in the sequence to write to a table with datetimes in, or make or inspect a log (using the SSIS Catalog if you're using a Project deployment, or a Log Provider if using Package deployment).

The order is exactly the way you set. but If its important to you to be sure its in specific order, you can set the Constraint to Completion.
i this way the second will not start till the first complete.

Related

SSIS: How to get the number of updated and deleted rows in an audit?

Imagine that you want to save in a variable the number of rows the were updated or deleted in a table.
‌
This is the steps that i did:
First, in the Control flow i created a Data Flow Task.
Them, in the Data Flow, i created a source(in my case is a excel file), then i proceeded to create two variables to count those rows- countDeleted and countUpdated, then connected the variables to two row count transformations, and them connected my destination (OLE DB).
Now in the control flow, what do i do??
Create a SQL execute task?? or a Script task?? What is the best way to do it?? What is the piece of code to use??
Thanks for youy help.
P‌S: i only have 4 weeks off SSIS, sorry for my noobieness :)
An OLD DB destination only inserts. It can't UPDATE or DELETE
What's your logic for updating or deleting?
If you're just starting out and reading about doing things in SSIS you will eventually find advice to use the OLE DB Command to perform row by row delete and inserts.
In my opinion this is to be avoided. It does not scale (works fine for small recorsets then fails for large recordsets), and it is difficult to maintain parameter mappings in the OLE DB Command. Although you should try it anyway to familiarise yourself with it.
My advice is to load the Excel data into a staging table, perform batch DELETE and UPDATE statements to load the data and use ##ROWCOUNT to capture the records updated.
For example;
Your existing described dataflow can be used to load into a table called StagingTable
Before your dataflow you should run an Execute SQL Task (This is in the Control Flow pane, not the Data Flow pane) that clears the staging table:
TRUNCATE TABLE StagingTable;
So first get that working - repeatedly running your package clears the staging table then loads Excel into it without creating duplicates
This in itself is a challenge as Excel is a terrible data interchange format.
Once you have that working, you add an execute SQL task to the end that runs some SQL that deletes the records you want and captures the count. For example:
DELETE FROM MyFinalTable WHERE PriamryKey IN (SELECT PrimaryKey FROM StagingTable);
SELECT ##ROWCOUNT;
Then you follow the instructions here to load that back to your SSIS variable
http://microsoft-ssis.blogspot.com/2011/03/rowcount-for-execute-sql-statement.html
What are you doing with this row count? Are you writing it to a logging table? Save
yourself the bother of pulling it back into an SSIS variable and just write it directly:
DELETE FROM MyFinalTable WHERE PriamryKey IN (SELECT PrimaryKey FROM StagingTable);
INSERT INTO LogTable(Table,Operation,Type)
SELECT 'MyFinalTable','Delete', ##ROWCOUNT;
In my experience it is not a good idea to build convoluted logic into SSIS packages if you can instead do in a database. Although it does depend on the person who has to eventually maintain it. Hopefully you can appreciate that this T-SQL approach is a more straightforward code based approach as opposed to having to dig around in property pages and events and other places inside SSIS packages.
I assume that you're using an Execute SQL Task for the updates and deletes? As #Nick.McDermaid mentioned, using an OLE DB Command within a Data Flow presents various issues when performing DML. You can find the number of rows updated, inserted, or deleted in a table through an Execute SQL Task by using the ExecValueVariable property of this task. Set the variable that will hold the row count to this property and it will return the number of affected rows. Note that is will only return the number of rows impacted by the last statement in the Execute SQL Task, regardless of batches (i.e. GO separators) are in the component.

Package is struck at "Execute phase is beginning" at Lookup task

I have used a Lookup in my data flow task. When I use Full Cache mode, the data flow task runs fine. But when I use Partial Cache or no Cache in my lookup, the records do not go past the lookup task and it keeps running for hours. I have checked for errors but there aren't any errors displayed. Could anyone please help me on this?
A Lookup is not appropriate for your task. Instead:
Add an OLE DB Source to pull in the data
Sort the records from the incoming source and the OLE DB Source
Perform a merge join (Full outer).
Add a Derived Column Transformation to check for ISNULL on the two joining columns. Create a new output column Called Action. For the NULLs in the target then you will tag that as an INSERT record.
Add a conditional split to send the INSERT record to an OLE DB Destination to insert the new records.
You can also check to see if there are matches between the two populations and perform updates, or look for NULLs in the source and DELETE in the destination.

SSIS Incremental Load Error Handling

I have SSIS packages to extract fact tables into the staging tables. I have a control table which contains the last extract date for each table. So, the package extract rows where > control table date. The problem I have is I want to redirect rows with error to an error file in the data flow task of the package. If I do that the package will not fail (so I can't rollback) and some rows might actually go through which if I coninue with the process will ultimately get to my fact table. Now, next time when I run the package if I had updated the control table, I will miss the rows which had erros. If I had not updated the control table with the date, I will re-extract the rows which went through. What is the best practice for this?
How about adding a Row Count Transformation onto the error branch? It sounds like you are using the transaction option in SSIS so put the Data Flow in a sequence container and post Data Flow, evaluate the value of your row count variable. If it's greater than zero, rollback/abort processing.

What is the best tool to use to transfer Data from Reporting Database to another?

I have a reporting database and have to transfer data from that to another server where we run some other reports or functions on Data. What is the best way to transfer data periodically like months or by-weekly. I can use SSIS but is there anyway I can put some where clause on what rows should be extracted from the source database? like i only want to extract data for a current month. Please do let me know.
Thanks,
Vivek
For scheduling periodic extractions, I'd leave to that SQL Agent.
As for restricting the results by some condition, that's an easy thing. Instead of this (and you should always use SQL Command or SQL Command From Variable over Table Name/Table Name From Variable as they are faster)
Add a parameter. If you're use OLE DB connection manager, your indicator for a variable is ?. ADO.NET will be #parameterName
Now, wire the filter up by clicking the Parameters... button. With OLE DB, it's ordinal position starting at 0. If you wanted to use the same parameter twice, you will have to list it each time or use the ADO.NET connection manager.
The biggest question you will have to answer is how do I identify what row(s) need to go. Possibilities are endless: query into the target database and find most recent modified date for a table or highest key value. You could create a local table that tracks what's been sent and query that. You could perform an incremental load / ETL Instrumentation to identify new/updated/unchanged rows, etc.

Can SSIS execute tasks in specific order?

I am in process of creating an ssis package that need to do following in specified order:
process some data
move that data to some other tables
Get some data and push it in a plain text file.
I have created 3 store procedure for these, I have 2 "Execute SQL tasks" for 1 and 2 and a "Data Flow task" for 3rd.
Now when i run the package i can see all 3 step are completed (no errors) but they are not running in correct order.
I see step 3 is run first then step 1 and 2, i think then step 3 runs again. Normally i can ignore it but as the data in the text file can be 700 mb, i need to find a way to get SSIS to run these task in sequence.
I have tried "Sequence Container" but no luck.
Can some one help me with this please?
KA
You need to use precedence constraints to tell SSIS what order your tasks need to be executed in.
Drag the green arrow from task one to task two, and from task two to task three.
You could connect as
first SQL execute task
precedence constraint on success
second SQL execute task
precedence constraint on success
data flow
SSIS will follow the sequence as we required.
thanks
prav
I had exactly this problem. Tasks were being executed in something like the order I'd created them rather than the sequence I specified later. It turned out that I'd managed to get a task that belonged to the first sequence container to appear in the last sequence container without loosing it's allegiance to the first. I discovered this by taking a backup and deleting sequence containers - the rogue task disappeared when I deleted the first sequence container.
The fix was to cut and past the task into the desired sequence container.
I encounterd an issue on SQL Server Denali when individual components were running out of sequence even though they were joined by success constraints. The problem seemed to occur when I had cut and pasted the components and the constraint. By deleting and reapplying the constraints, the package then ran in the correct order.
In my case, if I want to decide execute order in sequence containers, I will use [sub sequence containers] between execute sql task and data flow task. Hope useful for you.
The best is to use Sequence Containers... basically they help in creating a Sequence.
But since it does not work in your case, create Child Packages for all your different process
and then create the Master Package which will have a link to those child packages, USE "Execute Package task"