I am using SSIS 2012 with a data flow task that has a data source and an OLE DB SQL task.
The data source produces a set of Ids { 1, 2, 3, etc. } and the OLE DB SQL task deletes the matching record from a table in another database. In SQL Profiler I see one delete command per Id, which is expected because the task works record by record. There can be up to 10,000 records.
Is there any way I can work with the output of the data source as a set and say:
delete from Table1 where Id in { set of Id's }
You cannot do that directly in an SSIS data flow.
You could build the statement as an expression and execute that expression in SSIS, but you don't WANT to do that. Expressions are limited in the number of characters they can have, and they are a mess at maintenance time.
Some things are better done directly in a stored procedure, while others are better done in SSIS. The art of SSIS is knowing when to do something in SSIS and when to do it in a procedure.
Good luck!
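One set-based alternative, given here as a minimal sketch rather than a definitive implementation: have the data flow write the Ids to a staging table through an OLE DB Destination, then run a single DELETE from an Execute SQL Task. Table1 and Id come from the question; dbo.IdStaging, the INT type, and the assumption that the staging table lives in the same database as Table1 are mine.

```sql
-- Hypothetical staging table; created once, outside the package,
-- in the same database as Table1 (an assumption).
CREATE TABLE dbo.IdStaging (Id INT NOT NULL PRIMARY KEY);

-- Execute SQL Task before the data flow: empty the staging table.
TRUNCATE TABLE dbo.IdStaging;

-- (The data flow loads the Ids into dbo.IdStaging here.)

-- Execute SQL Task after the data flow: one set-based delete
-- instead of one DELETE per Id.
DELETE t
FROM dbo.Table1 AS t
INNER JOIN dbo.IdStaging AS s
    ON s.Id = t.Id;
```

With this layout SQL Profiler should show one DELETE statement per package run, whatever the number of Ids.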
Imagine that you want to save in a variable the number of rows that were updated or deleted in a table.
These are the steps I took:
First, in the Control Flow I created a Data Flow Task.
Then, in the Data Flow, I created a source (in my case an Excel file). I then created two variables to count those rows, countDeleted and countUpdated, connected the variables to two Row Count transformations, and then connected my destination (OLE DB).
Now, in the Control Flow, what do I do?
Create an Execute SQL Task? Or a Script Task? What is the best way to do it? What piece of code should I use?
Thanks for your help.
PS: I only have 4 weeks of SSIS experience, sorry for my noobieness :)
An OLE DB Destination only inserts. It can't UPDATE or DELETE.
What's your logic for updating or deleting?
If you're just starting out and reading about doing things in SSIS, you will eventually find advice to use the OLE DB Command to perform row-by-row deletes and inserts.
In my opinion this is to be avoided. It does not scale (it works fine for small recordsets, then fails for large ones), and it is difficult to maintain parameter mappings in the OLE DB Command. You should still try it anyway to familiarise yourself with it.
My advice is to load the Excel data into a staging table, perform batch DELETE and UPDATE statements to load the data, and use @@ROWCOUNT to capture the number of records affected.
For example:
Your existing described dataflow can be used to load into a table called StagingTable
Before your dataflow you should run an Execute SQL Task (This is in the Control Flow pane, not the Data Flow pane) that clears the staging table:
TRUNCATE TABLE StagingTable;
So first get that working: repeatedly running your package should clear the staging table and then load Excel into it without creating duplicates.
This in itself is a challenge, as Excel is a terrible data interchange format.
Once you have that working, you add an execute SQL task to the end that runs some SQL that deletes the records you want and captures the count. For example:
DELETE FROM MyFinalTable WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable);
SELECT @@ROWCOUNT;
Then you follow the instructions here to load that back to your SSIS variable
http://microsoft-ssis.blogspot.com/2011/03/rowcount-for-execute-sql-statement.html
What are you doing with this row count? Are you writing it to a logging table? Save yourself the bother of pulling it back into an SSIS variable and just write it directly:
DELETE FROM MyFinalTable WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable);
INSERT INTO LogTable([Table],Operation,[Type])
SELECT 'MyFinalTable','Delete', @@ROWCOUNT;
In my experience it is not a good idea to build convoluted logic into SSIS packages if you can do it in the database instead, although it does depend on who eventually has to maintain it. Hopefully you can appreciate that this T-SQL approach is more straightforward and code based, as opposed to having to dig around in property pages, events and other places inside SSIS packages.
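If the count is needed more than once (say, logged and also returned to the package), it helps to copy @@ROWCOUNT into a local variable straight away, because each subsequent statement resets it. The following is only a sketch under that assumption; @DeletedCount is a name made up for illustration, and the table and column names are reused from the example above.

```sql
DECLARE @DeletedCount INT;

DELETE FROM MyFinalTable
WHERE PrimaryKey IN (SELECT PrimaryKey FROM StagingTable);

-- Capture the count immediately: every later statement,
-- including this SET, resets @@ROWCOUNT.
SET @DeletedCount = @@ROWCOUNT;

-- The saved count can now be used more than once.
INSERT INTO LogTable ([Table], Operation, [Type])
VALUES ('MyFinalTable', 'Delete', @DeletedCount);

-- Single-row result set, e.g. for an Execute SQL Task
-- whose ResultSet property is set to "Single row".
SELECT @DeletedCount AS DeletedCount;
```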
I assume that you're using an Execute SQL Task for the updates and deletes? As @Nick.McDermaid mentioned, using an OLE DB Command within a Data Flow presents various issues when performing DML. You can find the number of rows updated, inserted, or deleted in a table through an Execute SQL Task by using the ExecValueVariable property of this task. Set this property to the variable that will hold the row count and it will receive the number of affected rows. Note that it will only return the number of rows impacted by the last statement in the Execute SQL Task, regardless of whether there are batches (i.e. GO separators) in the task.
I have created an OLE DB connection to execute different SQL tasks across an SSIS package. It's working fine.
In one task, where I need to insert data into a table, I used SqlBulkCopy because I have dynamic tables and columns that depend on files coming from different sources.
SqlBulkCopy only works with SqlConnection, so I opened a SqlConnection and executed SqlBulkCopy. This is also working fine.
After SqlBulkCopy finishes, I have an Execute SQL Task that updates metadata about the inserted rows (e.g. count, min and max date and so on) in a different table.
This table is not getting updated, yet if I execute the stored procedure from SQL Server Management Studio it works as expected.
So my assumption is that the OLE DB connection is not able to see the latest data inserted through the SqlConnection.
I may be wrong, but I'm not sure why the Execute SQL Task reports success while the table is still not updated.
Am I missing anything here?
My bad.
Instead of passing the parameter with data type long (int in SQL), I was passing it as varchar.
I had been looking for the last few hours, and as soon as I posted the question here it struck me to check the data type.
Hope it helps somebody.
I have an SSIS package with only two tasks: an Execute SQL Task and a Data Flow Task. The first task truncates a table (Table_1) and the second loads data into that truncated table. An SSRS report gets its data from Table_1. The package runs every hour, and while it is running users complain that they cannot view data in the report. How can I make sure users can still view data in the report while the package is running?
Off the top of my head: have a table that stores whether the package is currently running, for example a single value set to 1 when running and 0 when not running.
Then in your SSRS report, instead of a 'SELECT * FROM Table_1', call a stored procedure that looks up the value in that table. If it is 0, return the set normally; if it is 1, return some kind of friendly message saying that data is currently being loaded and the report is not available.
I'm looking for the best practice to insert or update rows from a MySQL connection to a SQL Server connection.
First of all, I added an ADO.NET data source to grab the MySQL content (a simple table Supplier with two fields, id and name). Then I added a Lookup transformation to split new rows from updated rows. It works well when I need to insert new rows. However, I would like to use an OLE DB Command to update existing rows, but it doesn't work due to an incompatibility between my connection manager and the component (ADO.NET vs OLE DB).
Any idea how to update modified rows? Should I use a cache component?
Thanks in advance !
Just get rid of the lookup and conditional split altogether.
Outside of your SSIS package, build a staging table that contains the fields you need for inserts/updates.
In your SSIS Package, create a control flow that does the following:
Execute SQL Task to truncate the staging table.
Data Flow task to load the MySQL data from the source system to the staging table. If you can do this based on a "changes-only" type process, such as using a timestamp that you check, it would be faster.
Execute SQL Task to perform an UPDATE statement on your target table using the staging table joined to the target table.
Execute SQL Task to perform an INSERT statement on your target table using a query based on the target table and your staging table (with a WHERE NOT EXISTS or some such on a key field).
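The SQL for those steps might look roughly like this. It is a sketch only: dbo.Supplier and dbo.SupplierStaging are assumed table names, the id and name columns come from the question, and name is assumed to be NOT NULL (the <> comparison would skip rows where either side is NULL).

```sql
-- Step 1: empty the staging table.
TRUNCATE TABLE dbo.SupplierStaging;

-- Step 2 is the Data Flow Task that loads the MySQL rows
-- into dbo.SupplierStaging.

-- Step 3: update rows that already exist in the target
-- and whose name has changed.
UPDATE t
SET t.name = s.name
FROM dbo.Supplier AS t
INNER JOIN dbo.SupplierStaging AS s
    ON s.id = t.id
WHERE t.name <> s.name;

-- Step 4: insert rows that are not in the target yet.
INSERT INTO dbo.Supplier (id, name)
SELECT s.id, s.name
FROM dbo.SupplierStaging AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Supplier AS t WHERE t.id = s.id);
```

Running the UPDATE before the INSERT keeps freshly inserted rows out of the update join, so each staged row is touched once.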
I would change the SQL connection to use OLE DB. As well as allowing the OLE DB Command to work, you may also find the OLE DB Destination is faster.
I have a reporting database and have to transfer data from it to another server, where we run other reports or functions on the data. What is the best way to transfer data periodically, e.g. monthly or bi-weekly? I can use SSIS, but is there any way I can put a WHERE clause on which rows are extracted from the source database? For example, I only want to extract data for the current month. Please let me know.
Thanks,
Vivek
For scheduling periodic extractions, I'd leave that to SQL Agent.
As for restricting the results by some condition, that's an easy thing. Instead of picking the table by name, write a query (you should always use SQL Command or SQL Command From Variable over Table Name/Table Name From Variable, as they are faster).
Add a parameter. If you're using an OLE DB connection manager, your placeholder for a parameter is ?. With ADO.NET it will be @parameterName.
Now, wire the filter up by clicking the Parameters... button. With OLE DB, mapping is by ordinal position, starting at 0. If you want to use the same parameter twice, you will have to list it each time or use the ADO.NET connection manager.
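As an illustration, a source query restricted to one month could look like the sketch below. dbo.ReportData and its columns are hypothetical names, and the two SSIS variables holding the month boundaries are assumed to be populated elsewhere in the package.

```sql
-- Source query for an OLE DB Source in "SQL command" mode.
-- Each ? is mapped by position in the Parameters... dialog:
-- position 0 -> first day of the month to extract,
-- position 1 -> first day of the following month.
SELECT Id, ModifiedDate, Amount
FROM dbo.ReportData
WHERE ModifiedDate >= ?
  AND ModifiedDate <  ?;
```

A half-open range (>= start, < next start) avoids having to work out the last instant of the month.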
The biggest question you will have to answer is how do I identify what row(s) need to go. Possibilities are endless: query into the target database and find most recent modified date for a table or highest key value. You could create a local table that tracks what's been sent and query that. You could perform an incremental load / ETL Instrumentation to identify new/updated/unchanged rows, etc.
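For the "most recent modified date" option, one possible shape is a small query against the target, run from an Execute SQL Task whose single-row result is stored in an SSIS variable and then passed to the source query as a parameter. dbo.ReportDataCopy and ModifiedDate are assumed names, and the fallback date is an arbitrary choice for an empty target.

```sql
-- Run against the target database.
-- Returns the high-water mark of what has already been loaded,
-- or a fallback date when the target table is still empty.
SELECT COALESCE(MAX(ModifiedDate), '19000101') AS LastLoadedDate
FROM dbo.ReportDataCopy;
```

The source query would then filter on ModifiedDate > ? using that value.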