How to sync data insert-update-delete between source/destination in SSIS - ssis

I'm a beginner in SSIS and I want to know a simple way to sync data between source/destination like:
Insert - Update - Delete data.

You can achieve the Insert-Update-Delete with the Merge Join and Conditional Split components.
Let's say you have a table Employees in SQL Server as your data source and a table EmployeesSTG holding the existing data in your staging database:
Your data in the DB Application Source:
You need to sort both datasets, because the Merge Join component requires its inputs to be sorted in either ascending or descending order. Here we will sort the data in ascending order with the Sort component available in SSIS:
This is how the Sort component is configured:
This is how the Merge Join is configured:
After merging the data, you need to split it. Since you did a left outer join, matched rows have data on both sides, while unmatched rows have NULL on the right side. Rows with a NULL on the right side are considered NEW data, so you insert them. For the remaining (matched) rows, you compare the data column by column; if, for example, Position does not match the left side, you update the row.
Since you have split the data into 3 parts, you need 3 destinations.
1- As per the first condition, data will be inserted, so we will use an OLE DB Destination component.
2- As per the second condition, data will be updated, so we will use an OLE DB Command component to write the update command:
UPDATE Employees
SET FullName=?,
Position=?
WHERE EmployeeID=?
And this is the mapping overview:
3- As per the third condition, you will delete the records in the staging table, using an OLE DB Command component to write the delete command:
DELETE FROM Employees
WHERE EmployeeID=?
And this is the mapping overview:
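The whole pattern above can be sketched in plain Python (a minimal illustration only; the table and column names Employees, EmployeeID, FullName and Position are taken from the example, and in SSIS this logic is spread across the Sort, Merge Join, Conditional Split, OLE DB Destination and OLE DB Command components):

```python
# Sketch of the Merge Join + Conditional Split sync pattern.
# Source rows drive inserts/updates; staging rows absent from the
# source become deletes.

def sync_plan(source_rows, staging_rows):
    """Classify rows into insert, update and delete sets."""
    # Sorting mirrors the Sort components feeding the Merge Join.
    source = sorted(source_rows, key=lambda r: r["EmployeeID"])
    staging = {r["EmployeeID"]: r for r in staging_rows}

    inserts, updates = [], []
    for row in source:
        match = staging.get(row["EmployeeID"])
        if match is None:
            inserts.append(row)  # right side NULL -> new row
        elif (row["FullName"], row["Position"]) != (match["FullName"], match["Position"]):
            updates.append(row)  # matched, but a column differs
    source_ids = {r["EmployeeID"] for r in source}
    deletes = [r for r in staging_rows if r["EmployeeID"] not in source_ids]
    return inserts, updates, deletes

src = [{"EmployeeID": 1, "FullName": "Ann", "Position": "Dev"},
       {"EmployeeID": 2, "FullName": "Bob", "Position": "QA"}]
stg = [{"EmployeeID": 2, "FullName": "Bob", "Position": "Dev"},
       {"EmployeeID": 3, "FullName": "Cid", "Position": "Ops"}]
ins, upd, dels = sync_plan(src, stg)
print(len(ins), len(upd), len(dels))  # -> 1 1 1
```

In the package, each of the three lists corresponds to one output of the Conditional Split and one destination component.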

Related

Copying LookupId into all rows in SSIS

I have two tables, one is LookUpTable where I have an id for AttendanceType = Remote. And I have another table where there is a column called AttendanceType.
I want to fetch the id from LookupTable and copy it into the AttendanceType column of all rows of the other table, in an SSIS package.
You can use a Merge Join in SSIS to join both sources on the attendance type field and fetch the required fields from both tables.
You can use a MERGE JOIN query depending on your RDBMS.
You can also use the MERGE JOIN component and the SSIS OLE DB Destination. The data coming out of the sources must be sorted. To tell SSIS that the data is sorted, right-click on your source component, click on Show Advanced Editor, then on Input and Output Properties, and set the IsSorted property to True.
Then you must indicate to SSIS which column acts as the SortKey, i.e. on which column the data is sorted.
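As a rough illustration, this is the logic the Merge Join performs here (the table contents and the id value are hypothetical, based on the question):

```python
# Sketch of joining on AttendanceType to carry the LookupTable id
# onto every matching row. Both inputs are sorted on the join key,
# mirroring the IsSorted/SortKey requirement described above.
lookup = sorted([{"AttendanceType": "Remote", "Id": 7}],
                key=lambda r: r["AttendanceType"])
rows = sorted([{"Name": "a", "AttendanceType": "Remote"},
               {"Name": "b", "AttendanceType": "Remote"}],
              key=lambda r: r["AttendanceType"])

ids = {r["AttendanceType"]: r["Id"] for r in lookup}
joined = [{**r, "AttendanceTypeId": ids[r["AttendanceType"]]} for r in rows]
print([r["AttendanceTypeId"] for r in joined])  # -> [7, 7]
```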

SSIS Getting the most up to date record using datetime columns

I am building an SSIS package for SQL Server 2014 and am currently trying to get the most recent record from two different sources, using the datetime columns shared between them. So far I am using a Lookup Task on thirdpartyid to match the records that I need to compare, and a Merge Join to bring them together, eventually producing a staging table that holds the most recent record. A previous data flow task (not shown) already inserts records that are not in AD1 into a staging table, so at this point the records are a one-to-one match. Both sources look like this, with exactly the same datetime columns, just different dates, and some values are NULL because there is no history for them.
Sample output
This is my data flow task so far. I am really new to SSIS so any ideas or suggestions would be greatly appreciated.
Given that there is a 1:1 match between your two sources, I would structure this as a Source (V1) -> Lookup (AD1).
Define your lookup based on thirdpartyid and retrieve all the AD1 columns. You'll likely end up with data flow columns like name and name_ad1, etc.
I would then add a Derived Column that identifies whether the dates are different (assuming that in that situation you need to take action):
IsNull(LastUpdated_AD1) || LastUpdated > LastUpdated_AD1
That expression evaluates to true if the AD1 column is null or if the V1 LastUpdated column is greater than the AD1 version.
You'd likely then add a Conditional Split to your Data Flow, base it on the value of the new column, and route the changed data into your mechanism for handling updates (an OLE DB Command or, preferably, an OLE DB Destination plus an Execute SQL Task after the Data Flow to perform a batch update).
The comment asks
should it all be one expression? like IsNull(AssignmentLastUpdated_AD1) || AssignmentLastUpdated > AssignmentLastUpdated_AD1 || IsNull(RoomLastUpdated_AD1) || RoomLastUpdated > RoomLastUpdated_AD1
You can do it like that, but when you get a weird result and someone asks how you got that value, long expressions make it hard to debug. I'd likely have two Derived Column components in the data flow. The first would create a "has changed" column for each set of conditions:
HasChangedAssignment
(IsNull(AssignmentLastUpdated_AD1) || AssignmentLastUpdated > AssignmentLastUpdated_AD1)
HasChangedRoom
IsNull(RoomLastUpdated_AD1) || RoomLastUpdated > RoomLastUpdated_AD1
etc
And then in the final derived column, you create the HasChanged column
HasChangedAssignment || HasChangedRoom || HasChangedAdNauseum
Using a pattern-based approach like this makes it much easier to build, troubleshoot, and make small changes that can have a big impact on the correctness, maintainability and performance of your packages.
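The two-step pattern could be mimicked in plain Python like this (a sketch only; column names come from the comment above, and dates are simplified to ISO strings, which compare correctly as text):

```python
# Sketch of the SSIS expression IsNull(x_AD1) || x > x_AD1,
# applied once per column pair and then combined into HasChanged.
def changed(v1_value, ad1_value):
    """True when AD1 has no value or V1 is more recent."""
    return ad1_value is None or v1_value > ad1_value

row = {"AssignmentLastUpdated": "2014-06-01",
       "AssignmentLastUpdated_AD1": "2014-05-01",
       "RoomLastUpdated": "2014-04-01",
       "RoomLastUpdated_AD1": None}

# First Derived Column component: one flag per set of conditions.
has_changed_assignment = changed(row["AssignmentLastUpdated"],
                                 row["AssignmentLastUpdated_AD1"])
has_changed_room = changed(row["RoomLastUpdated"],
                           row["RoomLastUpdated_AD1"])

# Second Derived Column component: combine the flags.
has_changed = has_changed_assignment or has_changed_room
print(has_changed_assignment, has_changed_room, has_changed)  # -> True True True
```

Each intermediate flag stays visible in the data flow, which is what makes the weird results debuggable.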

column mapping trouble within ADFv2

I have a source .csv with 21 columns and a destination table with 25 columns.
Not ALL columns within the source have a home in the destination table and not all columns in the destination table come from the source.
I cannot get my CopyData task to let me pick and choose how I want the mapping to be. The only way I can get it to work so far is to load the source data to a "holding" table that has a 1:1 mapping and then execute a stored procedure to insert data from that table into the final destination.
I've tried altering the schemas on both the source and destination to match but it still errors out because the ACTUAL source has more columns than the destination or vice versa.
This can't possibly be the most efficient way to accomplish this but I'm at a loss as to how to make it work.
The error code that is returned is some variation on:
"errorCode": "2200",
"message": "ErrorCode=UserErrorInvalidColumnMappingColumnCountMismatch,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Invalid column mapping provided to copy activity: '{LONG LIST OF COLUMN MAPPING HERE}', Detailed message: Different column count between target structure and column mapping. Target column count:25, Column mapping count:16. Check column mapping in table definition.,Source=Microsoft.DataTransfer.Common,'",
"failureType": "UserError",
"target": "LoadPrimaryOwner"
Have you tried mapping the columns in the graphical editor? Just click on the copy activity, then mapping and click the blue button "Import Schemas". This will import both schemas and let you pick which column from source maps with which column from sink.
Hope this helped!
In the sink dataset, delete the columns that you don't want mapped: select the columns that are not required in the sink and click the Delete button.
In order for the copy to work smoothly:
1. The source dataset should have all the columns in the same sequence.
2. All the columns selected in the sink dataset have to be mapped.
It seems that you are trying to extract 16 of the columns from the source table into the target table. If your target is SQL Server or Azure SQL DB, you can try the following settings:
Set the source structure as the 21 columns in the csv file.
Set the column mapping with the 16 columns you want.
Set the target structure as 16 columns, with the same names and order as in the column mapping definition.
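The constraint behind the error message can be sketched as a simple consistency check (hypothetical column names; this only illustrates the rule that every sink column must appear in the column mapping, which is why the 25-column target rejected a 16-column mapping):

```python
# Sketch of the validation that produces
# "Different column count between target structure and column mapping".
def unmapped_sink_columns(mapping, sink_columns):
    """mapping: {source_col: sink_col}. Returns sink columns left unmapped."""
    mapped = set(mapping.values())
    return [c for c in sink_columns if c not in mapped]

mapping = {"src_a": "a", "src_b": "b"}   # 2 mappings
sink = ["a", "b", "c"]                   # 3 sink columns -> mismatch
print(unmapped_sink_columns(mapping, sink))  # -> ['c']
```

Trimming the sink dataset (or widening the mapping) until this list is empty is what the answers above achieve.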

SSIS to import data from excel into multiple tables

I have an Excel sheet (input) where each row needs to be saved in one of three SQL server tables based on the Record type (column 1) of the row.
Example:
If the Record type is EMP, the whole row should go to the Employee table.
If the Record type is CUS, the whole row should go to the Customer table
I am trying to use a multicast and not sure how to split the data from multicast to the destination table. Do I need any other control in between?
Any idea would be appreciated.
A Conditional Split component sounds like just what you need. A Conditional Split uses expressions you define to route each input row to one output. In your case, your Conditional Split would define three outputs, each of which would be attached to a SQL destination.
In comparison, the Multicast Component you're currently using sends each input row to all outputs. This component would be useful if you were trying to save a copy of each row to all three SQL destinations.
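The Conditional Split's routing can be sketched in a few lines (record types EMP and CUS come from the question; VEN is a made-up third type):

```python
# Sketch of a three-output Conditional Split: each row is routed to
# exactly one output based on the record type in column 1.
def split_rows(rows):
    outputs = {"EMP": [], "CUS": [], "VEN": []}
    for row in rows:
        outputs[row[0]].append(row)  # column 1 holds the record type
    return outputs

rows = [("EMP", "Alice"), ("CUS", "Acme"), ("EMP", "Bob")]
out = split_rows(rows)
print(len(out["EMP"]), len(out["CUS"]), len(out["VEN"]))  # -> 2 1 0
```

A Multicast, by contrast, would have appended every row to all three lists.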

SSIS 2008. Transferring data from one table to another ONLY if the data is not duplicated

I'm going to do my best to try to explain this. I currently have a data flow task that has an OLE DB Source transferring data from a table from a different database to a table to another database. It works fine but the issue I'm having is the fact that I keep adding duplicate data to the destination table.
So a CustomerID of '13029' with an amount of '$56.82' on Date '11/30/2012' is seen in that table multiple times. How do I make it so I can only have unique data transferring over to that destination table?
In the data flow task where you transfer the data, you can insert a Lookup transformation. In the Lookup, you can specify a data source (a table or a query, whatever serves you best). Once you choose the data source, you can go to the Columns view and create a mapping connecting the CustomerID, Date and Amount columns of both tables.
In the General view, you can configure what happens with matched/non-matched rows. Simply take the no match output and direct it to the DB destination.
You will need to identify what makes that data unique in the table. If it's a customer table, then it's probably the customerid of 13029. However if it's a customer order table, then maybe it's the combination of CustomerId and OrderDate (and maybe not, I have placed two unique orders on the same date). You will know the answer to that based on your table's design.
Armed with that knowledge, you will want to write a query to pull back the keys from the target table:
SELECT CO.CustomerId, CO.OrderId
FROM dbo.CustomerOrder CO
If you know the process only transfers data from the current year, add a filter to the above query to restrict the number of rows returned. The reason for this is memory conservation: you want SSIS to run fast, so don't bring back extraneous columns or rows it will never need.
Inside your dataflow, add a Lookup Transformation with that query. You don't specify 2005, 2008 or 2012 as your SSIS version and they have different behaviours associated with the Lookup Transformation. Generally speaking, what you are looking to do is identify the unmatched rows. By definition, unmatched means they don't exist in the target database so those are the rows that are new. 2005 assumes every row is going to match or it errors. You will need to click the Configure Error Output... button and select "Redirect Rows". 2008+ has an option under "Specify how to handle rows with no matching entries" and there you'll want "Redirect rows to no match output."
Now take the No match output branch (2008+) or the error output branch (2005) and plumb that into your destination.
What this approach doesn't cover is detecting and handling when the source system reports $56.82 and the target system has $22.38 (updates). If you need to handle that, then you need to look at some change detection system. Look at Andy Leonard's Stairway to Integration Services series of articles to learn about options for detecting and handling changes.
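The Lookup's no-match routing boils down to this (a sketch using the sample values from the question; which columns form the key depends on your table design, as discussed above):

```python
# Sketch of the Lookup "redirect rows to no match output" pattern:
# keys already present in the target are dropped, and only unmatched
# (new) rows flow on to the destination.
existing_keys = {(13029, "2012-11-30")}  # pulled from the target table
incoming = [(13029, "2012-11-30", 56.82),
            (13030, "2012-11-30", 19.99)]

new_rows = [r for r in incoming if (r[0], r[1]) not in existing_keys]
print(new_rows)  # -> [(13030, '2012-11-30', 19.99)]
```

As noted, this only prevents re-inserting existing keys; it does not detect changed amounts for keys that already exist.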
Have you considered using the T-SQL MERGE statement? http://technet.microsoft.com/en-us/library/bb510625.aspx
It will compare both tables on defined fields, and take an action if matched or not.