Which SSIS transform can I use to join and update table?

I have a data flow task that reads data from SQL Server table A (id, order_no, amount).
I want to join this result to table B (order_no, amount), located on another SQL Server, ON tableA.order_no = tableB.order_no, add the two amounts together, and store the sum back into table B.
I have a connection manager set up for each of the two SQL Server databases.
Which transform can I use to perform this operation?

This package (.dtsx) design may help:
1. Create a temp table in tempdb on server B via an Execute SQL Task.
2. Load the table A data into that temp table via a Data Flow Task.
3. Add another Execute SQL Task that joins table B to the newly created temp table on order_no and computes tableB.amount + temptable.amount as an [Added Amounts] column.
You did not specify whether the sum is needed as a new column in table B or whether table B's existing amounts should be updated. The code will need adjusting depending on which you want.
Overall, this approach does the join on server B in a single set-based statement, which should keep database overhead low.
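The three steps can be sketched end to end in a small runnable form. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for server B, the names `stage_a` and `add_staged_amounts` are hypothetical, and the "update table B's amounts" variant is assumed. On SQL Server the final statement would be written in T-SQL instead.

```python
import sqlite3

def add_staged_amounts(table_a_rows, table_b_rows):
    """Stage table A's rows next to table B, then add matching amounts into table B."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for server B
    conn.execute("CREATE TABLE table_b (order_no INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO table_b VALUES (?, ?)", table_b_rows)

    # Step 1: create the staging table (the first Execute SQL Task).
    conn.execute("CREATE TEMP TABLE stage_a (id INTEGER, order_no INTEGER, amount REAL)")
    # Step 2: load table A's rows into it (the Data Flow Task).
    conn.executemany("INSERT INTO stage_a VALUES (?, ?, ?)", table_a_rows)
    # Step 3: one set-based statement that joins on order_no and adds the amounts.
    conn.execute(
        """
        UPDATE table_b
        SET amount = amount + (SELECT SUM(s.amount) FROM stage_a AS s
                               WHERE s.order_no = table_b.order_no)
        WHERE EXISTS (SELECT 1 FROM stage_a AS s
                      WHERE s.order_no = table_b.order_no)
        """
    )
    result = conn.execute("SELECT order_no, amount FROM table_b ORDER BY order_no").fetchall()
    conn.close()
    return result

print(add_staged_amounts([(1, 10, 25.0), (2, 30, 5.0)], [(10, 100.0), (20, 50.0)]))
```

Rows in table B without a match in the staging table are left untouched, and staged orders that do not exist in table B are ignored.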

Related

SQL Server tables : It takes a long time to join temporary table variable ( @table ) in SQL Server 2014

I create a temporary table variable (@table). Then I inner join it with another table. It takes a long time to display the result. I try again with a temp table (#table) and the speed is normal. What's wrong with it?

If you store too much data in a temp table or a table variable, performance suffers. Table variables cannot have indexes added with CREATE INDEX, and with temp tables developers often forget to add proper indexes, so the join falls back to a full table scan, which slows the query down.
Another important point: avoid joining on varchar columns where you can.
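The effect of a missing index on a temp table can be seen in a small experiment. The sketch below uses SQLite rather than SQL Server, so the plan text comes from SQLite's planner and only illustrates the general idea; the table and index names are hypothetical.

```python
import sqlite3

def plan_for_lookup(conn):
    """Return SQLite's plan text for a lookup on the temp table's join column."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT amount FROM stage WHERE order_no = 42"
    ).fetchall()
    # The human-readable plan step is the last column of each row.
    return " | ".join(row[-1] for row in rows)

def compare_plans():
    """Capture the plan before and after indexing the temp table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TEMP TABLE stage (order_no INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO stage VALUES (?, ?)", [(n, float(n)) for n in range(1000)]
    )
    before = plan_for_lookup(conn)  # no index yet: the whole table is scanned
    conn.execute("CREATE INDEX ix_stage_order_no ON stage (order_no)")
    after = plan_for_lookup(conn)   # the planner can now seek through the index
    conn.close()
    return before, after

before, after = compare_plans()
print(before)
print(after)
```

On SQL Server the comparable step would be a CREATE INDEX on the #table after loading it, followed by a look at the actual execution plan.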

How to only sync the new records from table A to table B in SSIS?

Does anyone know how I can sync only the new records from table A to table B in SSIS?
I have two tables, table A and table B. Table A is updated from time to time by users, and table B is synced from table A every 30 minutes. I have created an SSIS job to do the sync.
My issue is that every time I run the job, it copies all the data from table A and inserts it into table B, which causes duplicate records to be inserted. Is there any way to set up the job so that it only syncs the new records into table B?

I would use a MERGE statement in an Execute SQL Task.
Please review some examples about how to use the merge command:
Using MERGE in SQL Server to insert, update and delete at the same time
The MERGE Statement in SQL Server 2008
Kind Regards,
Paul
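As a rough illustration of the "insert only what is missing" idea, here is a runnable sketch. It is not MERGE itself: SQLite, used here as a stand-in, has no MERGE statement, so the sketch uses INSERT ... SELECT ... WHERE NOT EXISTS, which corresponds only to the "when not matched, insert" branch. The column names `id` and `payload` and the helper names are assumptions.

```python
import sqlite3

def make_demo_db():
    """Build a small in-memory database with a source and a target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("CREATE TABLE table_b (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO table_a VALUES (?, ?)", [(1, "first"), (2, "second")])
    return conn

def sync_new_rows(conn):
    """Copy rows from table_a that are not yet in table_b; return how many were inserted."""
    cur = conn.execute(
        """
        INSERT INTO table_b (id, payload)
        SELECT a.id, a.payload
        FROM table_a AS a
        WHERE NOT EXISTS (SELECT 1 FROM table_b AS b WHERE b.id = a.id)
        """
    )
    conn.commit()
    return cur.rowcount

conn = make_demo_db()
print(sync_new_rows(conn))  # first run copies everything
print(sync_new_rows(conn))  # second run finds nothing new
```

Running the sync repeatedly is safe because rows already present in table B are filtered out by key. A real MERGE can additionally update changed rows and delete removed ones.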

How can I store the output of a query into a temporary table and use the table in a new query?

I have a MySQL query which uses 3 tables with 2 inner joins. Then, I have to find the maximum of a group from this query output. Combining them both is beyond me. Can I break down the problem by storing the output of the first complicated query into some sort of temporary table, give it a name and then use this table in a new query? This will make the code more manageable. Thank you for your help.

This is very straightforward:
CREATE TEMPORARY TABLE tempname AS (
SELECT whatever, whatever
FROM rawtable
JOIN othertable ON this = that
)
The temporary table will vanish when your connection closes. A temp table contains the data that was captured at the time it was created.
You can also create a view, like so.
CREATE VIEW viewname AS (
SELECT whatever, whatever
FROM rawtable
JOIN othertable ON this = that
)
Views are permanent objects (they don't vanish when your connection closes) but they retrieve data from the underlying tables at the time you invoke them.
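The difference between the two can be checked with a short experiment. The sketch below uses SQLite instead of MySQL (an assumption made so the example is self-contained) and adds the "maximum per group" query from the question on top of each object; the columns `grp` and `val` are hypothetical.

```python
import sqlite3

def snapshot_vs_view():
    """Show that a temp table is a snapshot while a view reads live data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rawtable (grp TEXT, val INTEGER)")
    conn.executemany(
        "INSERT INTO rawtable VALUES (?, ?)", [("a", 1), ("a", 5), ("b", 3)]
    )
    # Both objects are defined by the same query.
    conn.execute("CREATE TEMP TABLE tempname AS SELECT grp, val FROM rawtable")
    conn.execute("CREATE VIEW viewname AS SELECT grp, val FROM rawtable")
    # The base table changes after both objects were created.
    conn.execute("INSERT INTO rawtable VALUES ('b', 9)")

    query = "SELECT grp, MAX(val) FROM {} GROUP BY grp ORDER BY grp"
    from_temp = conn.execute(query.format("tempname")).fetchall()
    from_view = conn.execute(query.format("viewname")).fetchall()
    conn.close()
    return from_temp, from_view

from_temp, from_view = snapshot_vs_view()
print(from_temp)  # [('a', 5), ('b', 3)]
print(from_view)  # [('a', 5), ('b', 9)]
```

The temp table still reports the old maximum for group b, while the view picks up the row inserted afterwards.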

SSIS Data Flow performance slow vs select into

I have a script:
select *
into my_archive_05302013
from archive_A
where my_Date = '05/18/2013'
and:
insert into archive_B (ID,my_date,field_n )
select ID, my_Date,field_n from my_archive_05302013
where the n in field_n goes up to about 100; in other words, the table I am loading has more than 100 columns.
These run pretty fast; the query inserts about 200,000 records. my_date has a non-clustered index in table archive_A.
Now, when I create a data flow using SSIS 2008, it takes HOURS to complete.
I have the following in my OLE DB source:
SELECT * FROM Archive_A
WHERE My_Date = (SELECT MAX(My_Date) from Archive_A)
and for OLE DB Destination:
Data access mode of: "Table or view - fast load"
Name of the table: archive_B
Table lock and Check constraints are checked
Does anyone know what the problem could be?
Thanks in advance

The problem is that, because you are using a data source and a data destination, you are pulling all of the data out of the database only to put it all back in again, whereas your INSERT statement keeps everything inside the database. Use an Execute SQL Task with your INSERT statement instead.
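A minimal sketch of that set-based alternative follows, assuming both archive tables live in the same database. SQLite stands in for SQL Server, a single `field_1` column stands in for the hundred or so real columns, and the helper names are hypothetical.

```python
import sqlite3

def make_archive_db():
    """Build a small in-memory database with a source and a target archive table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE archive_a (id INTEGER, my_date TEXT, field_1 TEXT)")
    conn.execute("CREATE TABLE archive_b (id INTEGER, my_date TEXT, field_1 TEXT)")
    conn.executemany(
        "INSERT INTO archive_a VALUES (?, ?, ?)",
        [(1, "2013-05-17", "x"), (2, "2013-05-18", "y"), (3, "2013-05-18", "z")],
    )
    return conn

def copy_latest_day(conn):
    """Copy the most recent day's rows with one statement; no rows leave the database."""
    cur = conn.execute(
        """
        INSERT INTO archive_b (id, my_date, field_1)
        SELECT id, my_date, field_1
        FROM archive_a
        WHERE my_date = (SELECT MAX(my_date) FROM archive_a)
        """
    )
    conn.commit()
    return cur.rowcount

conn = make_archive_db()
print(copy_latest_day(conn))
```

In the SSIS package, the statement inside `copy_latest_day` is what would go into the Execute SQL Task, written in T-SQL against the real tables.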

SSIS prefilter records before extraction

I have to migrate records from a table in Oracle to SQL Server 2008 R2. I already designed the solution that allows me to move the data and save a copy of the migrated IDs into a stage table.
Thanks to a Lookup component and the stage table, I can avoid duplicates, since the migration is done at several different times and the migrated objects do not follow a specific sequential order.
Below my SSIS schema:
I use an expression and two variables to gather data from Oracle in slots:
SELECT *
FROM ORDERS
WHERE OrderID > @[User::Start] AND OrderID <= @[User::End]
AND STATUS <> 'Open'
When all the orders that are not in status "Open" have been migrated, we will migrate the remaining delta. To do this, I need to look up the already migrated data in Stage. So the query for the data source will be:
SELECT *
FROM ORDERS
WHERE OrderID NOT IN (@[User::AlreadyMigratedIDs])
I need to store in the variable "AlreadyMigratedIDs" all the IDs present in the Stage table.
How can I use the information in the stage table (on SQL Server) as a condition for the query that the ADO.NET component uses to gather the source data from Oracle? Can I use another SSIS component, such as a Lookup, before the ADO.NET source?

Use an Execute SQL Task before your Data Flow to populate two variables (the equivalents of User::Start and User::End) from the SQL Server staging table. Create the variables:
Name       Data Type
StartID    int
EndID      int
Set ResultSet to Single row.
As a sample, I have taken a query from AdventureWorks 2008 R2:
Select max([BusinessEntityID]) as StartID
,max([DepartmentID]) as EndID
FROM [AdventureWorks2008R2].[HumanResources].[EmployeeDepartmentHistory]
Change the above query to fetch the ID values you need from your staging table.
On the Result Set page, map the result columns to the variables you created.
Now use the two variables in your Oracle query:
SELECT *
FROM ORDERS
WHERE OrderID > @[User::StartID] AND OrderID <= @[User::EndID]
AND STATUS <> 'Open'
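The two-stage flow above (read the bounds from staging, then parameterize the source query) can be sketched outside SSIS as follows. Everything here is an assumption made for illustration: two in-memory SQLite connections stand in for the SQL Server staging database and the Oracle source, the window is taken as "highest migrated ID plus a batch size", and the function names are hypothetical.

```python
import sqlite3

def read_id_window(stage_conn, batch_size):
    """Single-row query on staging: returns (start_id, end_id) for the next batch."""
    row = stage_conn.execute(
        "SELECT COALESCE(MAX(order_id), 0) FROM stage"
    ).fetchone()
    start_id = row[0]
    return start_id, start_id + batch_size

def fetch_batch(source_conn, start_id, end_id):
    """Source query with the two bounds passed as parameters."""
    rows = source_conn.execute(
        "SELECT order_id FROM orders "
        "WHERE order_id > ? AND order_id <= ? AND status <> 'Open' "
        "ORDER BY order_id",
        (start_id, end_id),
    ).fetchall()
    return [r[0] for r in rows]

# Staging side: IDs 1 to 3 have already been migrated.
stage = sqlite3.connect(":memory:")
stage.execute("CREATE TABLE stage (order_id INTEGER)")
stage.executemany("INSERT INTO stage VALUES (?)", [(1,), (2,), (3,)])

# Source side: orders 1 to 8, with order 5 still open.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, status TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(n, "Open" if n == 5 else "Closed") for n in range(1, 9)],
)

start_id, end_id = read_id_window(stage, batch_size=4)
print(fetch_batch(source, start_id, end_id))
```

Passing two scalar bounds avoids having to ship the full list of migrated IDs to the source system, which is what the NOT IN variant in the question would require.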
