Rather new to SSIS so not sure how to handle this.
I have a flat file which i managed to successfully read from. So right now my data flow consists of just a flat file source.
What i want to do is something like this:
Update SqlTable S
set s.columnA = f.columna
from FlatFile f
where s.columnID = f.columnID
Right now the only way i can see of doing this would be to insert the contents of the flat file into a sql table, then doing my update. This seems wasteful considering i don't need to save the data of the flat file. I just need to update an existing sql table based on the data in the flat file. So is there some way to run the query directly in the SSIS package instead of having to insert a bunch of data into a sql table that i will just wind up dropping?
thanks
Update SqlTable S set s.columnA = f.columna from FlatFile f where s.columnID = f.columnID
That statement above is a SQL statement. You cannot connect a sql table to a flat file. You need to work in SQL to do an update, since that is where the table lives
You have 2 choices:
Use an OLEDB Command component within the data flow. The downside is this calls the statement for each record, so if you have 1,000s of records it is very inefficient.
Push the records to a table using an OLE DB Destination and then you can call your update using an Execute SQL Task. You can then truncate the table if you like
A possible 3rd option is to roll your own OLE DB destination to do an update on record sets vs records.
While this might sound wasteful, to create a table in the database to store update records, it is done very often. You just drop the worktable or truncate when complete.
You could add an OLE DB Command component to the Data Flow that retrieves data from the flat file. The OLE DB Command would do a single row update for each record retrieved from the flat file. This might be okay if there are few rows in the flat file; but, you can imagine how bad performance will be if there are many rows in the flat file.
I think you'll find that sending the flat file rows to a database table and running a single UPDATE is going to be the best performer for lots of data.
I haven't tried this but have you tried sending to a recordset destination and then running the update using that?
The bulk load into a temporary table is the way to go and then do your updates from the temp table. As a previous poster says it is quite a common aproach to stuff data into a staging area prior to doing some more work with the data and then dropping or truncating the table
Related
I have a table in database 1 with columns x and y. I have another table in database 2 with columns x and y. I want to update all the y columns in database 1 to the y columns in database 2 where the x columns in database 1 match the x columns in database 2.
This seems like an unbelievably trivial task, but I can't figure out how to do it in SSIS. I have an OLE DB Source and Destination in my data flow task and I have the 2 columns mapped, but it keeps trying to insert instead of update, and it fails because there are a bunch of other non-nullable columns in the destination that I don't have mapped.
The problem with using SSIS to do data transformation is that both the source and target data sets need to be pulled up into memory on the ETL server, the transformation needs to happen there, and then the results have to be written back down to the destination server.
It's network intensive. It's memory intensive. It's just less than ideal. That's also why you're having trouble figuring it out. On a server, it's just an UPDATE statement, but getting it up into SSIS requires many more steps than just that, and absent third party tools, there's no out of the box method to do anything other than row by row updates.
In your situation, where your source data is comparatively lightweight, I would suggest that the most efficient approach would be to use SSIS to move the source data from the source server to the target server and drop it into a working/holding/intermediate table. SSIS is absolutely awesome at moving data from point A to point B. Then, after the Data Flow, use an Execute SQL task to either call an UPDATE stored procedure, or go ahead and write the UPDATE statement in the package.
Doing it that way off-loads the DML from the ETL server to the SQL Server, which is designed for exactly that kind of work. Sort of a "let everybody do what they're good at" approach, if you will.
OK so rather than trying to directly map the data from DB2 into DB1, it's probably a better idea to stage the data from DB2 into DB1 and then update the table of interest in DB1.
The best way to do this is to create a new table in DB1 that stores all of the data from the table in DB2. Let's call this table 'staging'. Use SSIS to do a flat insert from the table in DB2 to your new 'staging' table in DB1 and then create an UPDATE stored procedure in your DB1 database to update existing entries in your endpoint table based on the entries you now have in your 'staging' table. You can trigger the SP from SSIS to run after your staging table is populated. You can cut out the 'staging' table here if you have a synonym from DB1 that references the table in DB2.
SSIS is more about bulk movement of data than it is updating what already exists. Use stored procedures for anything in between.
Imagine that you want to save in a variable the number of rows the were updated or deleted in a table.
This is the steps that i did:
First, in the Control flow i created a Data Flow Task.
Them, in the Data Flow, i created a source(in my case is a excel file), then i proceeded to create two variables to count those rows- countDeleted and countUpdated, then connected the variables to two row count transformations, and them connected my destination (OLE DB).
Now in the control flow, what do i do??
Create a SQL execute task?? or a Script task?? What is the best way to do it?? What is the piece of code to use??
Thanks for youy help.
PS: i only have 4 weeks off SSIS, sorry for my noobieness :)
An OLD DB destination only inserts. It can't UPDATE or DELETE
What's your logic for updating or deleting?
If you're just starting out and reading about doing things in SSIS you will eventually find advice to use the OLE DB Command to perform row by row delete and inserts.
In my opinion this is to be avoided. It does not scale (works fine for small recorsets then fails for large recordsets), and it is difficult to maintain parameter mappings in the OLE DB Command. Although you should try it anyway to familiarise yourself with it.
My advice is to load the Excel data into a staging table, perform batch DELETE and UPDATE statements to load the data and use ##ROWCOUNT to capture the records updated.
For example;
Your existing described dataflow can be used to load into a table called StagingTable
Before your dataflow you should run an Execute SQL Task (This is in the Control Flow pane, not the Data Flow pane) that clears the staging table:
TRUNCATE TABLE StagingTable;
So first get that working - repeatedly running your package clears the staging table then loads Excel into it without creating duplicates
This in itself is a challenge as Excel is a terrible data interchange format.
Once you have that working, you add an execute SQL task to the end that runs some SQL that deletes the records you want and captures the count. For example:
DELETE FROM MyFinalTable WHERE PriamryKey IN (SELECT PrimaryKey FROM StagingTable);
SELECT ##ROWCOUNT;
Then you follow the instructions here to load that back to your SSIS variable
http://microsoft-ssis.blogspot.com/2011/03/rowcount-for-execute-sql-statement.html
What are you doing with this row count? Are you writing it to a logging table? Save
yourself the bother of pulling it back into an SSIS variable and just write it directly:
DELETE FROM MyFinalTable WHERE PriamryKey IN (SELECT PrimaryKey FROM StagingTable);
INSERT INTO LogTable(Table,Operation,Type)
SELECT 'MyFinalTable','Delete', ##ROWCOUNT;
In my experience it is not a good idea to build convoluted logic into SSIS packages if you can instead do in a database. Although it does depend on the person who has to eventually maintain it. Hopefully you can appreciate that this T-SQL approach is a more straightforward code based approach as opposed to having to dig around in property pages and events and other places inside SSIS packages.
I assume that you're using an Execute SQL Task for the updates and deletes? As #Nick.McDermaid mentioned, using an OLE DB Command within a Data Flow presents various issues when performing DML. You can find the number of rows updated, inserted, or deleted in a table through an Execute SQL Task by using the ExecValueVariable property of this task. Set the variable that will hold the row count to this property and it will return the number of affected rows. Note that is will only return the number of rows impacted by the last statement in the Execute SQL Task, regardless of batches (i.e. GO separators) are in the component.
I'm looking for the best practice to insert or update rows from a MySQL connection to a SQL Server connection.
First of all, I added a ADO.NET data source to grab MySQL content (a simple table Supplier with two fields id and name). Then, I added a Lookup transformation to split new rows / updated rows. It works well when I need to insert new rows. However, I would like to use a Command OLE DB to update existing rows but It doesn't work due to a incompatibility between my connection manager and the component (ADO.NET vs OLE DB).
Any idea to update modified rows ?! Should I use a cache component ?!
Thanks in advance !
Just get rid of the lookup and conditional split all together.
Outside of your SSIS package, build a staging table that contains the fields you need for inserts/updates.
In your SSIS Package, create a control flow that does the following:
Execute SQL Task to truncate the staging table.
Data Flow task to load the MySQL data from the source system to the staging table. If you can do this based on a "changes-only" type process, such as using a timestamp that you check, it would be faster.
Execute SQL Task to perform an UPDATE statement on your target table using the staging table joined to the target table.
Execute SQL Task to perform an INSERT statement on your target table using a query based on the target table and your staging table (with a WHERE NOT EXISTS or some such on a key fied)
I would change the SQL connection to use OLE DB. As well as allowing the OLE DB Command to work, you may also find the OLE DB Destination is faster.
Can someone please advise:
I have got SOURCE (which is OLD SQL Server) and we need to move data to new DESTINATION (which is NEW SERVER). So moving data between different instances.
I'm struggling how to write the package which looks up in destination first and check if row exists then do nothing else INSERT.
Regards
Here are the steps:
Take an OLEBD Source, connect it with a Lookup task.
Select the column that can be looked up. There should be some kind of ID for you to do this. Also select all the columns that need to be passed(SSIS has provisions of applying check boxes).
Connect the lookup no match rows to an OLEDB destination, do the mapping, and you are done.
If you want to redirect all those matching rows to somewhere, say a notepad file, you can do that too...
I would use an Lockup transformation and redirect match output to something else like a OLEDB command in there you can write a IF exist statement or create that in a stored procedure that way it will either insert data or update data not insert duplicates
I am trying to update one of my SQL tables with new columns in my source CSV file. The CSV records in this file are already in this SQL table, but this SQL table is lacking some of the new columns from this CSV file.
I already added the new columns to my SQL table structure via ALTER TABLE. But now I just need to import the data from this CSV file into the new columns. How can I do this? I am trying to use SSIS and SQL Server to accomplish this, but am pretty new to Excel.
This is probably too late to solve salvationishere's problem; though I'm posting this for future readers!
You could just generate the SQL INSERT/UPDATE/etc command by parsing the csv file (a simple python script will do).
You could alternatively use this online parser:
http://www.convertcsv.com/csv-to-sql.htm
(Hoping that it'd still be available when you click!)
to generate your SQL command. The interface is extremely straight forward and it does the entire job in an awesome way.
You have several options:
If you are loading the data into a non-production system where you can edit the target tables, you could load the data into a new table, rename the old table to obsolete, and rename the new table to the old table name.
You can load the data into a staging table and then write a SQL statement to update the target table from the staging table.
You can open the CSV file in Excel and write a formula to generate an update script, drag the formula down across all rows so that you get a separate update statement for each row, and then run the separate update statements in management studio.
You can truncate the target table and update your existing ssis package that imports the file to use the new columns if you have the full history in your CSV file.
There are more options, but any of the above would probably be more than adequate solutions.