SSIS - Delete rows

I'm new to SSIS and need help with this one. I found an article that describes how to detect rows that already exist and rows that have changed. The part I'm missing is how to update the rows that changed. Some articles say it's also a good solution to delete the changed records and insert the new record set. The thing is, I don't know how to do the deleting step (red box).
Any suggestions?

If you have to delete the rows within the Data Flow Task, you need to use the OLE DB Command transformation with a DELETE statement such as DELETE FROM dbo.Table WHERE ColumnName = ?. Then, in the column mappings of the OLE DB Command transformation, map the parameter represented by the question mark to the column that comes from the previous transformation; in your case, the output of Union All 2.
However, I wouldn't recommend that option, because OLE DB Command executes the statement once for every row, which can slow down your package considerably when there are many rows.
I would recommend something like this:
1. Redirect the output from Union All 2 to a staging table (say dbo.Staging) using an OLE DB Destination.
2. Let us assume that your final destination table is dbo.Destination. Your Staging table now holds all the records that should be deleted from Destination.
3. On the Control Flow tab, place an Execute SQL Task after the Data Flow Task. In it, write a SQL statement (or call a stored procedure) that joins Staging to Destination and deletes all the matching rows from Destination.
4. Also, place another Execute SQL Task before the Data Flow Task. In this task, delete or truncate the rows in the Staging table.
Something like this might work to delete the rows:
DELETE D
FROM dbo.Destination D
INNER JOIN dbo.Staging S
ON D.DestinationId = S.StagingId
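The two Execute SQL Tasks around the Data Flow Task could be sketched as follows. This is only an illustration: the single-column layout of dbo.Staging and the INT key type are assumptions, not something stated in the question.

```sql
-- One-time setup (assumed layout): the staging table only needs the key
-- of each row that has to be removed from dbo.Destination.
CREATE TABLE dbo.Staging
(
    StagingId INT NOT NULL  -- assumed to match the type of Destination.DestinationId
);

-- Execute SQL Task placed BEFORE the Data Flow Task:
-- clear out keys left over from the previous run.
TRUNCATE TABLE dbo.Staging;

-- Execute SQL Task placed AFTER the Data Flow Task:
-- one set-based DELETE instead of one statement per row.
DELETE D
FROM dbo.Destination AS D
WHERE EXISTS (SELECT 1
              FROM dbo.Staging AS S
              WHERE S.StagingId = D.DestinationId);
```

The last statement is written with EXISTS, which removes the same rows as the INNER JOIN form; either should be fine here.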
Hope that helps.

In addition to user756519's answer: if you have millions of records to delete, the DELETE statement in the Execute SQL Task can be run in batches with something like this:
WHILE (1=1)
BEGIN
    DELETE D
    FROM dbo.Destination D
    INNER JOIN
    (
        -- select ids that should be removed from table
        -- (as written: Destination rows that have no matching row in Staging)
        SELECT TOP(10000) DestinationId
        FROM
        (
            SELECT
                D1.DestinationId,
                S.StagingId
            FROM
                dbo.Destination AS D1
            LEFT JOIN
                dbo.Staging AS S
            ON
                D1.DestinationId = S.StagingId
        ) AS G
        WHERE
            StagingId IS NULL
    ) AS R
    ON D.DestinationId = R.DestinationId;
    -- stop once a pass deletes nothing
    IF @@ROWCOUNT < 1 BREAK
    -- info message
    DECLARE @timestamp VARCHAR(50)
    SELECT @timestamp = CAST(getdate() AS VARCHAR)
    RAISERROR ('Chunk deleted %s', 10, 1, @timestamp) WITH NOWAIT
END

Related

Delete records from MySql Using SSIS

I'm trying to delete (and update, but if I can delete then I'll be able to update) product data from a MySQL website database using SSIS when those products have been marked as discontinued in our ERP (and in the SQL Server database used for reporting). I've tried the following:
First attempt: Saving the rows to be deleted to a recordset and using a For Each loop with an Execute SQL Task to delete them, as described here.
Result: Partially works, but it is extremely slow and fails after about 500 deletes each time. That makes me wonder whether the MySQL database has some kind of hacker-protection feature.
Second attempt: Converting the primary keys of all rows to be deleted into a comma-separated string variable using FOR XML PATH, as described here (or rather a series of variables, because of the 4000-character limit).
SQL Select Code (works fine)
WITH CTE (Product_sku,rownumber) AS
(
SELECT product_sku
, row_number() over(order by product_sku)
FROM product_updates
WHERE action = 'delete'
)
SELECT
Delete1= cast(
(SELECT TOP 1
STUFF(
(SELECT ',''' + product_sku+'''' FROM CTE
WHERE cte.RowNumber BETWEEN 1 and 700
FOR XML PATH (''))
, 1, 1, '') )
AS varchar(8000))
... and nine more of these select statements into additional variables to allow for larger delete operations.
And then using this result to delete records from MySql using an Execute SQL command with the following code:
DELETE FROM datarepo.product
WHERE product_sku in (?)
Result: The package executed but failed to delete anything. When viewing the MySql query log file I saw the following, which tells me why it failed to delete anything.
DELETE FROM datarepo.product
WHERE product_sku in ('\'')
Note that this same SSIS Execute SQL statement, when using hardcoded values (like the following), deletes just fine.
DELETE FROM datarepo.product
WHERE product_sku in ('1234','5678','abcd', etc...)
I haven't been able to find anything else online. As Reza Rad said in the first linked post, it's hard to find material about using SSIS to perform operations on MySql.

SSIS Data flow task - trying to execute a stored procedure as part of the OLE DB SQL command

Question: Can a stored procedure be run as part of the SQLCommand in the OLE DB provider?
I'm new to SSIS and have gotten simple SQL commands to work, but now I'm trying to use a stored procedure inside the SQL command. The results of the main SQL are pulled from the pivot table created by the stored procedure (which is used as a temporary table).
The stored procedure is one of the joins in the main SQL (actually it's an 'OUTER APPLY xxx (...) as Alias', where xxx is the stored procedure name). The purpose of the stored procedure is to create a single row of results per account listing all DX values, where normally there would have been one row per DX line.
---------------------------------------------------------
Sample source table
account  line  DX
acct1    1     abc123
acct1    2     cdf123
acct1    3     xxx12
acct2    1     bcv12
acct2    2     xul35
Note: the account is passed to the stored procedure from the main table.
---------------------------------------------------------
Sample result generated by the stored procedure:
account  ICD01   ICD02   ICD03
acct1    abc123  cdf123  xxx12
acct2    bcv12   xul35
Note: the stored procedure builds each column name from ColName + Line#, e.g. ColName = DX and Line = 1 gives DX1 (the column header), instead of returning 3 rows for acct1 and 2 rows for acct2.
---------------------------------------------------------
sample sql
select table1.account_id 'Account'
,ICD.[ICD01] 'ICD01'
,ICD.[ICD50] 'IC02'
from Table1
left outer join Table2 on table2 = table1
join table3 on table1= table3
OUTER APPLY StoredProcedure(table1.ACCOUNT_ID) as [ICD]
where table1.date_time between @Date_From and @Date_To
---------------------------------------------------------
When I run the SSIS package from the Data Flow, the headers are created in the file and the correct number of rows is returned, but no data is pulled for the columns created by the stored procedure. I did see posts about adding an Execute SQL Task in the Control Flow, but my problem is that the Account_ID is passed to the stored procedure from Table1 (to pull data only for those accounts), so I can't run the stored procedure outside the Data Flow before the Data Flow SQL is run (from what I can tell; as I said, I'm new to SSIS).
When I run the SQL via SQL Server Management Studio, I get what I'm expecting. So I know the Stored procedure works.
Any help with this is greatly appreciated.

How to neglect "Invalid column name" error in SQL Server 2008 R2

I am using SQL Server 2008 R2. I have created some SQL statements for some migration:
IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME='TableA' AND COLUMN_NAME='Status')
BEGIN
UPDATE TableA
SET Status = 'Active'
WHERE Status IS NULL
END
Now, I have dropped the column Status from database table TableA.
When I execute the above block again, it gives me the following error, even though I have added a check so that the UPDATE statement should only run if the column exists:
Invalid column name 'Status'
How to get rid of this error?
Thanks
You need to put the code to run in a separate scope/batch:
IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME='TableA' AND COLUMN_NAME='Status')
BEGIN
EXEC('UPDATE TableA SET Status=''Active'' WHERE Status IS NULL')
END
The problem you currently have is that the system wants to compile your batch of code before it executes any part of it. It can't compile the UPDATE statement since there's a column missing, so it never even has a chance to start executing the code and considering whether the EXISTS predicate returns true or false.
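As a hedged variation on the same idea, the dynamic statement can also be run through sp_executesql, which accepts parameters. The dbo schema and the varchar(20) parameter type below are assumptions made for illustration.

```sql
-- COL_LENGTH returns NULL when the column (or the table) does not exist.
IF COL_LENGTH(N'dbo.TableA', N'Status') IS NOT NULL
BEGIN
    -- The UPDATE lives in a string, so it is compiled only if this branch runs.
    EXEC sys.sp_executesql
        N'UPDATE dbo.TableA SET Status = @NewStatus WHERE Status IS NULL;',
        N'@NewStatus varchar(20)',
        @NewStatus = 'Active';
END
```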
Your current SQL block might also fail at times because INFORMATION_SCHEMA consists of views, not tables. Also, according to MSDN:
Some changes have been made to the information schema views that break backward compatibility.
Hence we can't rely on the information schema views.
Instead, use sys.columns:
IF EXISTS(SELECT 1 FROM SYS.COLUMNS
WHERE NAME = N'Status' AND OBJECT_ID = OBJECT_ID(N'TableA'))
BEGIN
-- Run the UPDATE as dynamic SQL so it is only compiled when the column exists
EXEC('UPDATE TableA SET Status=''Active'' WHERE Status IS NULL')
END

SQL Merge Statement

I am trying to use the new "MERGE" statement in SQL Server 2008. The statement should take records from a temporary table and update the matching records in another table. The statement is as follows:
create table #TempTable(ProcPOAmdDel_ProcessAmendmentId bigint,ProcPOAmdDel_SemiFinProdId bigint,ProcPOAmdDel_ChallanQty int)
MERGE PurProcessPOAmendmentDelivery AS pod
USING (SELECT ProcPOAmdDel_ProcessAmendmentId,
ProcPOAmdDel_SemiFinProdId FROM #TempTable ) AS temp
ON pod.ProcPOAmdDel_ProcessAmendmentId = temp.ProcPOAmdDel_ProcessAmendmentId AND
pod.ProcPOAmdDel_SemiFinProdId=temp.ProcPOAmdDel_SemiFinProdId
WHEN MATCHED THEN UPDATE
SET pod.ProcPOAmdDel_ChallanQty = temp.ProcPOAmdDel_ChallanQty;
While running the statement I encountered the error Invalid column name 'ProcPOAmdDel_ChallanQty'.
Could anybody help me in resolving the issue?
Include the column ProcPOAmdDel_ChallanQty in the source query, i.e. temp:
MERGE PurProcessPOAmendmentDelivery AS pod
USING (SELECT ProcPOAmdDel_ProcessAmendmentId,
ProcPOAmdDel_SemiFinProdId,
ProcPOAmdDel_ChallanQty
FROM #TempTable ) AS temp
ON pod.ProcPOAmdDel_ProcessAmendmentId = temp.ProcPOAmdDel_ProcessAmendmentId AND
pod.ProcPOAmdDel_SemiFinProdId=temp.ProcPOAmdDel_SemiFinProdId
WHEN MATCHED THEN
UPDATE SET pod.ProcPOAmdDel_ChallanQty = temp.ProcPOAmdDel_ChallanQty;

SSIS Update rows by using a flat-file containing the ID and update values

I'm new to SSIS and trying to create a dataflow task that will accomplish this type of thing:
UPDATE dbo.table1
SET lastname = t2.lastname
FROM table1 t1
JOIN table2 t2
ON t1.Id = t2.Id
Except I want to do it with the values for table2 being in a tab-delimited file like this:
ID lastname
1 Carroll
2 Patel
3 Smith
And I don't want to have to ETL table 2 into the database.
I have tried using a flat-file to pull in the values and then adding an OLE DB Data Destination, however this causes SSIS to INSERT the values rather than joining on the ID and UPDATING the field listed.
What is the correct way to approach an update of this kind with SSIS?
TIA,
Trey Carroll
This is how I'd do it:
Set up a Data Flow Task with the flat file as the source.
Add a Lookup transformation, and set it up so that it looks up table1 by Id and returns lastname.
Add an OLE DB Command transformation to the "on success" (match) output of the lookup, and execute the appropriate SQL code to update table1.
The downside of this approach is that it executes the SQL command once for every matching row, which can be inefficient if that number is high. It would be much more efficient to load the flat file into a temporary table and then perform the update.
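A minimal sketch of that staging-table variant, assuming the Data Flow loads the flat file into a hypothetical table dbo.table1_staging with the same Id and lastname columns as the file (the table name and column types are assumptions):

```sql
-- Hypothetical staging table, filled by a Flat File Source -> OLE DB Destination.
CREATE TABLE dbo.table1_staging
(
    Id       INT          NOT NULL,
    lastname NVARCHAR(50) NOT NULL
);

-- Execute SQL Task after the Data Flow Task: one set-based UPDATE for all rows.
UPDATE t1
SET    t1.lastname = s.lastname
FROM   dbo.table1 AS t1
INNER JOIN dbo.table1_staging AS s
        ON s.Id = t1.Id;

-- Empty the staging table so the next run starts clean.
TRUNCATE TABLE dbo.table1_staging;
```

The CREATE TABLE would only be run once; the UPDATE and TRUNCATE are the statements that belong in the package.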