I am transferring data from a Finacle database to my SQL Server database using SSIS.
I work at a bank, and we use Finacle for data storage. For my project I have to transfer only customer and account information, but because the data volume is huge, it is taking too much time.
For example: I started my query to fetch data from Finacle on the 18th for 3 regions, and it completed on the 19th. Then I ran the same query on the 19th for another 3 regions. We are proceeding this way.
Finally, we will run the query to upload all the new account and customer data created since our first day.
My problem is that I do not want duplicate data. Because I am uploading data from the server again and again, it will lead to duplication. Is there any way to add a check so that a row is skipped if it already exists in my destination table and inserted only if it does not? Please help me with this.
Import your data to a staging table on SQL Server. Then run a stored procedure that copies only the data that doesn't exist in the destination table from the staging table to the destination table.
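As a rough sketch of the staging-table idea above, the snippet below uses Python's built-in sqlite3 module as a stand-in for SQL Server, so it can be run anywhere. The table names (customer, customer_staging), the key column (customer_id) and the helper load_batch are assumptions made up for illustration; on SQL Server the INSERT ... WHERE NOT EXISTS statement would live in the stored procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_staging (customer_id INTEGER, name TEXT);
""")

def load_batch(rows):
    """Load one extract into staging, then copy only unseen rows to the destination."""
    # Start every run with an empty staging table.
    conn.execute("DELETE FROM customer_staging")
    conn.executemany("INSERT INTO customer_staging VALUES (?, ?)", rows)
    # Copy only the rows whose key is not already in the destination table.
    cur = conn.execute("""
        INSERT INTO customer (customer_id, name)
        SELECT s.customer_id, s.name
        FROM customer_staging AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM customer AS c WHERE c.customer_id = s.customer_id
        )
    """)
    conn.commit()
    return cur.rowcount  # number of rows actually inserted

first = load_batch([(1, "Asha"), (2, "Ravi")])
second = load_batch([(2, "Ravi"), (3, "Meena")])  # customer 2 arrives again
total = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

Running the same extract twice is then harmless: the second run simply inserts nothing for rows that are already present.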
I needed this previously and did a little research. I happened to find this really simple workable solution here:
https://finaclestack.com/questions/how-to-prevent-data-duplication-when-running-an-import-from-tabley-to-tablex-in-finacle/
It says:
1. If tableY does not contain much data, just run select * from tableY, export the result as insert statements, and run those statements against tableX.
2. If tableY contains a lot of data, select it in batches using a where clause. Then export each result as insert statements and run those statements against tableX.
I would like to delete MySQL database records using ADF.
I have created a pipeline in ADF that copies data from a MySQL database to a Storage Account using the copy activity. Once that has completed, I would like to delete the copied records from the MySQL database.
I cannot find any activity that allows deleting records from a MySQL database.
The Script activity doesn't accept a MySQL linked service; only SQL DB is allowed.
I would appreciate your suggestions on how to complete this.
You can use the Lookup activity, which supports both SQL and MySQL, with a query placed after the copy activity to delete the records once they have been copied.
Connect the Lookup activity to the output of the copy activity and give it your source dataset.
Choose the Query option and enter the truncate query to delete the records in the table.
truncate table [dbo].[output];
I added the select script only to avoid the Lookup error that is raised when the query does not return any data. Even if the Lookup reports that error, the records in the table are still truncated.
If you want to remove the table entirely, you can use a drop query.
drop table <tablename>;
Data copied to blob storage after copy activity:
Table after copy activity:
Here I did it using Azure SQL Database. You can do the same with Azure MySQL Database, as Lookup supports both.
You need to create a stored procedure in your Database and add the stored procedure activity as a final step in your Azure Data Factory pipeline. If you'd like to truncate the whole data once the copy is finished, here's how you would create your Stored Procedure:
GO
CREATE PROCEDURE SP_Truncate
AS
BEGIN
TRUNCATE TABLE mytable
END
Once you've created this, add a stored procedure activity as a last step in your Azure Data Factory. It'll delete the copied data. Read a bit more about this in the documentation; you can also add parameters in your stored procedure, which you can refer to using lookup activity. Let me know if you need more help.
Imagine that you want to save in a variable the number of rows that were updated or deleted in a table.
These are the steps I took:
First, in the Control Flow I created a Data Flow Task.
Then, in the Data Flow, I created a source (in my case an Excel file). I created two variables to count those rows, countDeleted and countUpdated, connected the variables to two Row Count transformations, and then connected my destination (OLE DB).
Now, in the Control Flow, what do I do?
Create an Execute SQL Task? Or a Script Task? What is the best way to do it? What code should I use?
Thanks for your help.
PS: I only have 4 weeks of SSIS experience, sorry for my noobiness :)
An OLE DB destination only inserts. It can't UPDATE or DELETE.
What's your logic for updating or deleting?
If you're just starting out and reading about doing things in SSIS you will eventually find advice to use the OLE DB Command to perform row by row delete and inserts.
In my opinion this is to be avoided. It does not scale (it works fine for small recordsets, then fails for large recordsets), and it is difficult to maintain parameter mappings in the OLE DB Command. You should still try it anyway to familiarise yourself with it.
My advice is to load the Excel data into a staging table, perform batch DELETE and UPDATE statements to load the data, and use @@ROWCOUNT to capture the number of records affected.
For example;
Your existing described dataflow can be used to load into a table called StagingTable
Before your dataflow you should run an Execute SQL Task (This is in the Control Flow pane, not the Data Flow pane) that clears the staging table:
TRUNCATE TABLE StagingTable;
So first get that working - repeatedly running your package clears the staging table then loads Excel into it without creating duplicates
This in itself is a challenge as Excel is a terrible data interchange format.
Once you have that working, you add an execute SQL task to the end that runs some SQL that deletes the records you want and captures the count. For example:
DELETE FROM MyFinalTable WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable);
SELECT @@ROWCOUNT;
Then you follow the instructions here to load that back to your SSIS variable
http://microsoft-ssis.blogspot.com/2011/03/rowcount-for-execute-sql-statement.html
What are you doing with this row count? Are you writing it to a logging table? Save yourself the bother of pulling it back into an SSIS variable and just write it directly:
DELETE FROM MyFinalTable WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable);
INSERT INTO LogTable([Table],Operation,Type)
SELECT 'MyFinalTable','Delete', @@ROWCOUNT;
In my experience it is not a good idea to build convoluted logic into SSIS packages if you can do it in the database instead, although it does depend on the person who eventually has to maintain it. Hopefully you can appreciate that this T-SQL approach is more straightforward and code based, as opposed to having to dig around in property pages, events and other places inside SSIS packages.
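To make the delete-then-log pattern concrete, here is a small runnable sketch. It uses Python's sqlite3 module rather than SQL Server, so cursor.rowcount plays the role that @@ROWCOUNT plays in T-SQL. The LogTable column names (TableName, Operation, RowsAffected) and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyFinalTable (PrimaryKey INTEGER PRIMARY KEY, Val TEXT);
    CREATE TABLE StagingTable (PrimaryKey INTEGER, Val TEXT);
    CREATE TABLE LogTable (TableName TEXT, Operation TEXT, RowsAffected INTEGER);
    INSERT INTO MyFinalTable VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO StagingTable VALUES (2, 'b2'), (3, 'c2'), (4, 'd');
""")

# Batch delete: remove every final row that also appears in staging.
cur = conn.execute(
    "DELETE FROM MyFinalTable WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable)"
)
deleted = cur.rowcount  # stand-in for @@ROWCOUNT

# Write the count straight to the log table instead of carrying it around.
conn.execute(
    "INSERT INTO LogTable VALUES (?, ?, ?)", ("MyFinalTable", "Delete", deleted)
)
conn.commit()

log_rows = conn.execute("SELECT * FROM LogTable").fetchall()
```

The count has to be read immediately after the DELETE, before any other statement runs, which is the same caveat that applies to @@ROWCOUNT.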
I assume that you're using an Execute SQL Task for the updates and deletes? As @Nick.McDermaid mentioned, using an OLE DB Command within a Data Flow presents various issues when performing DML. You can find the number of rows updated, inserted, or deleted in a table through an Execute SQL Task by using the ExecValueVariable property of this task. Set this property to the variable that will hold the row count, and it will return the number of affected rows. Note that it will only return the number of rows impacted by the last statement in the Execute SQL Task, regardless of whether batches (i.e. GO separators) are in the component.
I have one Excel sheet containing information that has to be saved in 4 tables. I have to create unique ids for each table. I also have to insert some data into table 1, and the unique id created there will then be used for inserting data into the second table (referential integrity). Moreover, one table will always get new records inserted, but for the other 3 tables, if the data already exists it has to be updated rather than inserted. I am new to SSIS, so please guide me on how to proceed.
loads of requirements :)
First, here is an example of a package that loads an excel sheet to a sql database.
You can easily follow it to build your package.
Differences:
You say you need to insert the same data into 4 tables, so between your Excel source and your destination you will add a Multicast component, and then instead of 1 destination you will have 4. The Multicast will create 4 copies of your data, so you can insert into your 4 tables.
The IDs may be a problem: since the 4 destinations execute separately, you can't get the ID inserted into the first table to update the second. I suggest you do it using T-SQL in an Execute SQL Task after everything is imported.
If that is not possible, you will need 4 separate data flows, where in each one you do the inserts by reading from your Excel file and joining with the result of the previous insert using a lookup.
Import it into a temp table on SQL Server. Then you will be able to write a query that reads from the temp table and writes to multiple tables.
Hope this solves your problem as per your requirement.
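The "import to a temp table, then distribute with SQL" suggestion could look roughly like the sketch below. It runs on Python's sqlite3 module as a stand-in for SQL Server, and the schema (staging, customer, orders) and the natural key (customer_name) are invented for illustration. The generated parent id is picked up by joining back on the natural key, and existing parents are updated rather than inserted again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (customer_name TEXT, city TEXT, order_ref TEXT);
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_name TEXT UNIQUE,
        city TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id INTEGER REFERENCES customer(customer_id),
        order_ref TEXT
    );
    INSERT INTO customer (customer_name, city) VALUES ('Acme', 'Pune');
    INSERT INTO staging VALUES
        ('Acme', 'Mumbai', 'A-1'),
        ('Birla', 'Delhi', 'B-1'),
        ('Birla', 'Delhi', 'B-2');
""")

# 1. Update parent rows that already exist.
conn.execute("""
    UPDATE customer
    SET city = (SELECT s.city FROM staging AS s
                WHERE s.customer_name = customer.customer_name)
    WHERE customer_name IN (SELECT customer_name FROM staging)
""")

# 2. Insert parent rows that are new; the database generates their ids.
conn.execute("""
    INSERT INTO customer (customer_name, city)
    SELECT DISTINCT s.customer_name, s.city
    FROM staging AS s
    WHERE NOT EXISTS (
        SELECT 1 FROM customer AS c WHERE c.customer_name = s.customer_name
    )
""")

# 3. Insert child rows, resolving the generated parent id through a join.
conn.execute("""
    INSERT INTO orders (customer_id, order_ref)
    SELECT c.customer_id, s.order_ref
    FROM staging AS s
    JOIN customer AS c ON c.customer_name = s.customer_name
""")
conn.commit()

customers = conn.execute(
    "SELECT customer_id, customer_name, city FROM customer ORDER BY customer_id"
).fetchall()
orders = conn.execute(
    "SELECT customer_id, order_ref FROM orders ORDER BY order_ref"
).fetchall()
```

The order of the three steps matters: parents must exist before the child insert can look up their ids.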
I have a reporting database and have to transfer data from it to another server, where we run some other reports or functions on the data. What is the best way to transfer data periodically, for example monthly or biweekly? I can use SSIS, but is there any way I can put a where clause on which rows should be extracted from the source database? For example, I only want to extract data for the current month. Please let me know.
Thanks,
Vivek
For scheduling periodic extractions, I'd leave that to SQL Agent.
As for restricting the results by some condition, that's an easy thing. Instead of this (and you should always use SQL Command or SQL Command From Variable over Table Name/Table Name From Variable as they are faster)
Add a parameter. If you're using an OLE DB connection manager, your indicator for a variable is ?. For ADO.NET it will be @parameterName.
Now, wire the filter up by clicking the Parameters... button. With OLE DB, parameters are mapped by ordinal position starting at 0. If you want to use the same parameter twice, you will have to list it each time or use the ADO.NET connection manager.
The biggest question you will have to answer is how do I identify what row(s) need to go. Possibilities are endless: query into the target database and find most recent modified date for a table or highest key value. You could create a local table that tracks what's been sent and query that. You could perform an incremental load / ETL Instrumentation to identify new/updated/unchanged rows, etc.
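One of the options above, using the most recent modified date in the target as a high-water mark, can be sketched as follows. This is only an illustration: it uses Python's sqlite3 module with two in-memory databases standing in for the two servers, and the report_rows table and modified_date column are assumed names. The ? marker in the source query is the same positional placeholder style that OLE DB uses.

```python
import sqlite3

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE report_rows (id INTEGER PRIMARY KEY, modified_date TEXT, amount REAL);
    INSERT INTO report_rows VALUES
        (1, '2024-01-15', 10.0),
        (2, '2024-02-03', 20.0),
        (3, '2024-02-20', 30.0);
""")
target.executescript("""
    CREATE TABLE report_rows (id INTEGER PRIMARY KEY, modified_date TEXT, amount REAL);
    INSERT INTO report_rows VALUES (1, '2024-01-15', 10.0);
""")

# High-water mark: the newest modified date that already reached the target.
(last_loaded,) = target.execute(
    "SELECT COALESCE(MAX(modified_date), '0001-01-01') FROM report_rows"
).fetchone()

# Parameterised extract: only rows modified after the high-water mark.
new_rows = source.execute(
    "SELECT id, modified_date, amount FROM report_rows "
    "WHERE modified_date > ? ORDER BY id",
    (last_loaded,),
).fetchall()

target.executemany("INSERT INTO report_rows VALUES (?, ?, ?)", new_rows)
target.commit()

loaded_ids = [row[0] for row in new_rows]
```

A date-based mark like this only works if every insert or update on the source reliably sets the modified date; otherwise a tracking table or key comparison is the safer choice.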
So here is my situation: I have a vendor-supplied DB we cannot modify, and a custom db that imports data from the vendor app and acts on it. Once records are imported from the vendor app, they must no longer appear on the list of records to be imported. Also, we only want to display the 250 most recent records that have not been imported.
What I originally started with was select the list of ids that have been imported from the custom db, and then query the vendor db, using the list of ids in a .Where(x => !idList.Contains(x.Id)) clause on the remote query.
This worked up until we broke 2100 records imported into the custom db, as 2100 is the limit on the number of parameters that can be passed into SQL. After finding out this was the actual problem and not the 'invalid buffer'/'severe error' ADO.Net reported, my solution was to remove the first 2000 ids in the remote query, and then remove the remaining records in the local query.
Having to pull back a large number of irrelevant records, just to exclude them, so I can get the correct 250 records seems very inelegant. Is there a better way to do this, short of doing a cross db stored procedure?
Thanks in advance.
This might not be the best answer, depending on how many records you're dealing with, but you could force the SQL to execute and then deal with the results as in-memory objects. Calling the ToList() method will execute the SQL and materialise the results as an in-memory list (an IEnumerable<T>).
What I would suggest is to start by querying the vendor database first, ordering the results by some kind of criteria (perhaps a date field, oldest to most recent).
You could do a Skip().Take() to "skim" the results and then take each bulk set and insert them into the custom db where the ID doesn't already exist. That way you avoid the problem you have now.
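The paging idea can be illustrated outside of LINQ as well. The sketch below is Python with sqlite3, not the asker's C#, and the records table, the imported_ids set and the helper newest_unimported are all hypothetical. LIMIT/OFFSET plays the role of Skip().Take(), and the filtering against already imported ids happens in memory, so no long parameter list is ever sent to the database.

```python
import sqlite3

vendor = sqlite3.connect(":memory:")
vendor.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, created TEXT)")
vendor.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(i, f"2024-01-{i:02d}") for i in range(1, 21)],
)

# In the real system these ids would be read from the custom db.
imported_ids = {20, 19, 17, 12}

def newest_unimported(wanted, page_size):
    """Page through vendor rows, newest first, until enough unimported ids are found."""
    found, offset = [], 0
    while len(found) < wanted:
        page = vendor.execute(
            "SELECT id FROM records ORDER BY created DESC LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not page:
            break  # ran out of vendor rows
        found.extend(i for (i,) in page if i not in imported_ids)
        offset += page_size
    return found[:wanted]

result = newest_unimported(wanted=5, page_size=4)
```

This still pulls back some rows only to discard them, but the waste is bounded by the page size rather than by the total number of imported records.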
If you have db-create access to the SQL Server that the vendor's db is running on (or if your custom db is on the same server), you could create a "has been imported" table in a different database on that same server, and then write a stored proc that does a cross-database join of that table against the vendor db, e.g.:
-- assumes both tables live in the dbo schema
select top 250 *
from vendordb.dbo.to_be_imported
where not exists
(select 1 from customdb.dbo.has_been_imported where idWasImported = idToBeImported)
order by whatever;
You might even be able to do this in Linq 2 SQL -- I've never tried adding objects from different databases into a single DataContext...
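For anyone who wants to try the cross-database anti-join without a SQL Server instance, here is a hedged sketch using Python's sqlite3 module, where ATTACH DATABASE stands in for two databases on one server. The column names (id, created) are assumptions, and LIMIT 250 replaces T-SQL's TOP 250.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two attached in-memory databases play the role of vendordb and customdb.
conn.execute("ATTACH DATABASE ':memory:' AS vendordb")
conn.execute("ATTACH DATABASE ':memory:' AS customdb")
conn.executescript("""
    CREATE TABLE vendordb.to_be_imported (id INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE customdb.has_been_imported (id INTEGER PRIMARY KEY);
    INSERT INTO vendordb.to_be_imported VALUES
        (1, '2024-01-01'), (2, '2024-01-02'), (3, '2024-01-03'), (4, '2024-01-04');
    INSERT INTO customdb.has_been_imported VALUES (2), (4);
""")

# Anti-join: newest vendor rows that have no match in the imported table.
pending = conn.execute("""
    SELECT v.id
    FROM vendordb.to_be_imported AS v
    WHERE NOT EXISTS (
        SELECT 1 FROM customdb.has_been_imported AS h WHERE h.id = v.id
    )
    ORDER BY v.created DESC
    LIMIT 250
""").fetchall()
```

Because the exclusion happens inside the database, no id list is passed as parameters, so the 2100-parameter limit never comes into play.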