How to prevent duplicate records from being inserted into a database using SSIS

I have created an SSIS job for inserting records from a CSV file into a SQL Server database.
If I run the job the first time, the records are inserted into the database successfully, but if I run the job a second time, it stores the same records again (duplicates).
So if I run my job multiple times, the records are inserted multiple times.
Is there any way to avoid duplicate records being inserted into the database?

Use a Lookup transformation in SSIS to find a match against the existing records, and insert a record only if no match is found. Alternatively, you can load the new data into a staging area and use CDC (change data capture) to load only the unmatched output through an Execute SQL Task.
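As a minimal sketch of the staging approach (the table and column names here are hypothetical), the Execute SQL Task could insert only the rows that do not already exist in the target:

-- Insert only rows from staging that are missing from the target
INSERT INTO TargetTable (Id, Name)
SELECT s.Id, s.Name
FROM StagingTable AS s
WHERE NOT EXISTS (SELECT 1 FROM TargetTable AS t WHERE t.Id = s.Id);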

Related

SSIS: How to get the number of updated and deleted rows in an audit?

Imagine that you want to save in a variable the number of rows that were updated or deleted in a table.
These are the steps I did:
First, in the Control Flow I created a Data Flow Task.
Then, in the Data Flow, I created a source (in my case an Excel file), created two variables to count those rows (countDeleted and countUpdated), connected the variables to two Row Count transformations, and then connected my destination (OLE DB).
Now, in the Control Flow, what do I do?
Create an Execute SQL Task? Or a Script Task? What is the best way to do it? What is the piece of code to use?
Thanks for your help.
PS: I only have 4 weeks of SSIS experience, sorry for my noobieness :)
An OLE DB Destination only inserts. It can't UPDATE or DELETE.
What's your logic for updating or deleting?
If you're just starting out and reading about doing things in SSIS, you will eventually find advice to use the OLE DB Command to perform row-by-row deletes and inserts.
In my opinion this is to be avoided. It does not scale (it works fine for small recordsets, then fails for large recordsets), and it is difficult to maintain parameter mappings in the OLE DB Command. Although you should try it anyway to familiarise yourself with it.
My advice is to load the Excel data into a staging table, perform batch DELETE and UPDATE statements to load the data, and use @@ROWCOUNT to capture the records updated.
For example:
The dataflow you described can be used to load into a table called StagingTable.
Before your dataflow you should run an Execute SQL Task (This is in the Control Flow pane, not the Data Flow pane) that clears the staging table:
TRUNCATE TABLE StagingTable;
So first get that working: repeatedly running your package should clear the staging table and then load Excel into it without creating duplicates.
This in itself is a challenge as Excel is a terrible data interchange format.
Once you have that working, add an Execute SQL Task at the end that runs some SQL to delete the records you want and capture the count. For example:
DELETE FROM MyFinalTable WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable);
SELECT @@ROWCOUNT;
Then follow the instructions here to load that count back into your SSIS variable:
http://microsoft-ssis.blogspot.com/2011/03/rowcount-for-execute-sql-statement.html
What are you doing with this row count? Are you writing it to a logging table? Save yourself the bother of pulling it back into an SSIS variable and just write it directly:
DELETE FROM MyFinalTable WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable);
INSERT INTO LogTable([Table], [Operation], [RowCount])
SELECT 'MyFinalTable', 'Delete', @@ROWCOUNT;
In my experience it is not a good idea to build convoluted logic into SSIS packages if you can instead do it in the database. Although it does depend on the person who has to eventually maintain it. Hopefully you can appreciate that this T-SQL approach is a more straightforward, code-based approach, as opposed to having to dig around in property pages and events and other places inside SSIS packages.
I assume that you're using an Execute SQL Task for the updates and deletes? As @Nick.McDermaid mentioned, using an OLE DB Command within a Data Flow presents various issues when performing DML. You can find the number of rows updated, inserted, or deleted in a table through an Execute SQL Task by using the ExecValueVariable property of the task. Set the variable that will hold the row count to this property and it will return the number of affected rows. Note that it will only return the number of rows impacted by the last statement in the Execute SQL Task, regardless of how many batches (i.e. GO separators) are in the component.
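To illustrate that last point, here is a sketch of a batch inside an Execute SQL Task (the table and column names are made up); a variable assigned via ExecValueVariable would receive only the count from the final statement:

UPDATE MyFinalTable SET ColumnA = 1 WHERE ColumnB = 'x'; -- this count is not returned
DELETE FROM MyFinalTable WHERE ColumnB = 'y';            -- only this count is returned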

Package is stuck at "Execute phase is beginning" at Lookup task

I have used a Lookup in my data flow task. When I use Full Cache mode, the data flow task runs fine. But when I use Partial Cache or No Cache in my Lookup, the records do not get past the Lookup and it keeps running for hours. I have checked for errors but none are displayed. Could anyone please help me with this?
A Lookup is not appropriate for your task. Instead:
Add an OLE DB Source to pull in the data.
Sort the records from the incoming source and the OLE DB Source.
Perform a Merge Join (full outer).
Add a Derived Column transformation to check for ISNULL on the two joining columns, and create a new output column called Action. Rows with NULLs on the target side are tagged as INSERT records.
Add a Conditional Split to send the INSERT records to an OLE DB Destination that inserts the new records.
You can also check whether there are matches between the two populations and perform updates, or look for NULLs in the source and DELETE from the destination (as sketched below).
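As a rough T-SQL analogue of what the Merge Join and Derived Column combination computes (the table and key names here are hypothetical), the Action column corresponds to:

SELECT s.Id AS SourceId,
       t.Id AS TargetId,
       CASE WHEN t.Id IS NULL THEN 'INSERT' -- exists only in the source
            WHEN s.Id IS NULL THEN 'DELETE' -- exists only in the target
            ELSE 'UPDATE'                   -- exists in both
       END AS Action
FROM SourceTable AS s
FULL OUTER JOIN TargetTable AS t ON s.Id = t.Id;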

How to insert/update rows from MySQL to SQL Server by using SSIS

I'm looking for the best practice to insert or update rows from a MySQL connection to a SQL Server connection.
First of all, I added an ADO.NET data source to grab the MySQL content (a simple table Supplier with two fields, id and name). Then I added a Lookup transformation to split new rows from updated rows. It works well when I need to insert new rows. However, I would like to use an OLE DB Command to update existing rows, but it doesn't work due to an incompatibility between my connection manager and the component (ADO.NET vs OLE DB).
Any idea how to update the modified rows? Should I use a Cache component?
Thanks in advance!
Just get rid of the Lookup and Conditional Split altogether.
Outside of your SSIS package, build a staging table that contains the fields you need for inserts/updates.
In your SSIS package, create a control flow that does the following (steps 3 and 4 are sketched after this list):
Execute SQL Task to truncate the staging table.
Data Flow Task to load the MySQL data from the source system into the staging table. If you can do this based on a "changes-only" type process, such as checking a timestamp, it will be faster.
Execute SQL Task to perform an UPDATE statement on your target table using the staging table joined to the target table.
Execute SQL Task to perform an INSERT statement on your target table using a query based on the target table and your staging table (with a WHERE NOT EXISTS or some such on a key field).
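A minimal sketch of those two statements, assuming a target table Supplier keyed on id (as in the question) and a hypothetical staging table SupplierStaging:

-- Step 3: update existing rows from the staging table
UPDATE t
SET t.name = s.name
FROM Supplier AS t
JOIN SupplierStaging AS s ON s.id = t.id;

-- Step 4: insert rows that do not yet exist in the target
INSERT INTO Supplier (id, name)
SELECT s.id, s.name
FROM SupplierStaging AS s
WHERE NOT EXISTS (SELECT 1 FROM Supplier AS t WHERE t.id = s.id);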
I would change the SQL connection to use OLE DB. As well as allowing the OLE DB Command to work, you may also find the OLE DB Destination is faster.

SSIS DataFlowTask using Record Sets instead of Records

I am using SSIS 2012 with a data flow task having a data source and an OLE DB SQL task.
The data source creates a set of IDs { 1, 2, 3, etc. }, and the OLE DB SQL task deletes a record in another database table. What I am seeing in SQL Profiler is a delete command for each ID, which is expected as it works on a record-by-record basis. I can get up to 10,000 records.
Is there any way I can work with the output of the data source as a set and say:
delete from Table1 where Id in { set of Id's }
You cannot do that in SSIS.
In fact, you can build an expression and execute that expression in SSIS, but you don't WANT to do that. Expressions are limited in the number of characters they can have, and they are a mess at maintenance time.
Some things are better done directly in a stored procedure, while others are better done in SSIS. The art of SSIS is knowing when to do it in SSIS and when to do it in a procedure.
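For example (a sketch with a hypothetical staging table StagingIds), landing the IDs in a table with the data flow lets one set-based statement replace the row-by-row deletes:

-- After the data flow loads the IDs into StagingIds:
DELETE FROM Table1
WHERE Id IN (SELECT Id FROM StagingIds);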
Good luck!

SSIS: want to update a SQL table based on a flat file

Rather new to SSIS so not sure how to handle this.
I have a flat file which I managed to read from successfully. So right now my data flow consists of just a flat file source.
What I want to do is something like this:
Update SqlTable S
set s.columnA = f.columna
from FlatFile f
where s.columnID = f.columnID
Right now the only way I can see of doing this would be to insert the contents of the flat file into a SQL table, then do my update. This seems wasteful considering I don't need to save the data from the flat file. I just need to update an existing SQL table based on the data in the flat file. So is there some way to run the query directly in the SSIS package instead of having to insert a bunch of data into a SQL table that I will just wind up dropping?
Thanks
Update SqlTable S set s.columnA = f.columna from FlatFile f where s.columnID = f.columnID
That statement above is a SQL statement. You cannot connect a SQL table to a flat file; you need to work in SQL to do an update, since that is where the table lives.
You have 2 choices:
Use an OLE DB Command component within the data flow. The downside is that this calls the statement for each record, so if you have thousands of records it is very inefficient.
Push the records to a table using an OLE DB Destination and then call your update from an Execute SQL Task (sketched below). You can then truncate the table if you like.
A possible 3rd option is to roll your own OLE DB destination to do an update on record sets rather than individual records.
While it might sound wasteful to create a table in the database just to store update records, it is done very often. You just drop the work table or truncate it when complete.
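A minimal sketch of choice 2, using the column names from the question and a hypothetical staging table FlatFileStaging loaded by the data flow:

-- Run from an Execute SQL Task after the data flow completes
UPDATE s
SET s.columnA = f.columnA
FROM SqlTable AS s
JOIN FlatFileStaging AS f ON s.columnID = f.columnID;

-- Clear the work table when complete
TRUNCATE TABLE FlatFileStaging;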
You could add an OLE DB Command component to the data flow that retrieves data from the flat file. The OLE DB Command would do a single-row update for each record retrieved from the flat file. This might be okay if there are few rows in the flat file, but you can imagine how bad performance will be if there are many.
I think you'll find that sending the flat file rows to a database table and running a single UPDATE is going to be the best performer for lots of data.
I haven't tried this, but have you tried sending the rows to a Recordset Destination and then running the update using that?
The bulk load into a temporary table is the way to go, and then do your updates from the temp table. As a previous poster said, it is quite a common approach to stuff data into a staging area prior to doing more work with it, and then dropping or truncating the table.