I have a table in database 1 with columns x and y. I have another table in database 2 with columns x and y. I want to update all the y columns in database 1 to the y columns in database 2 where the x columns in database 1 match the x columns in database 2.
This seems like an unbelievably trivial task, but I can't figure out how to do it in SSIS. I have an OLE DB Source and Destination in my data flow task and I have the 2 columns mapped, but it keeps trying to insert instead of update, and it fails because there are a bunch of other non-nullable columns in the destination that I don't have mapped.
The problem with using SSIS to do data transformation is that both the source and target data sets need to be pulled up into memory on the ETL server, the transformation needs to happen there, and then the results have to be written back down to the destination server.
It's network intensive. It's memory intensive. It's just less than ideal. That's also why you're having trouble figuring it out. On a server, it's just an UPDATE statement, but getting it up into SSIS requires many more steps than just that, and absent third party tools, there's no out of the box method to do anything other than row by row updates.
In your situation, where your source data is comparatively lightweight, I would suggest that the most efficient approach would be to use SSIS to move the source data from the source server to the target server and drop it into a working/holding/intermediate table. SSIS is absolutely awesome at moving data from point A to point B. Then, after the Data Flow, use an Execute SQL task to either call an UPDATE stored procedure, or go ahead and write the UPDATE statement in the package.
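For reference, a minimal sketch of the kind of UPDATE that Execute SQL task could run on the destination server (dbo.TargetTable and dbo.Staging are assumed names for the table in database 1 and the working table that received the database 2 data):

UPDATE t
SET    t.y = s.y
FROM   dbo.TargetTable AS t
INNER JOIN dbo.Staging AS s
        ON s.x = t.x;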
Doing it that way off-loads the DML from the ETL server to the SQL Server, which is designed for exactly that kind of work. Sort of a "let everybody do what they're good at" approach, if you will.
OK so rather than trying to directly map the data from DB2 into DB1, it's probably a better idea to stage the data from DB2 into DB1 and then update the table of interest in DB1.
The best way to do this is to create a new table in DB1 that stores all of the data from the table in DB2. Let's call this table 'staging'. Use SSIS to do a flat insert from the table in DB2 to your new 'staging' table in DB1 and then create an UPDATE stored procedure in your DB1 database to update existing entries in your endpoint table based on the entries you now have in your 'staging' table. You can trigger the SP from SSIS to run after your staging table is populated. You can cut out the 'staging' table here if you have a synonym from DB1 that references the table in DB2.
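If you go the synonym route, a minimal sketch might look like the following, assuming both databases live on the same instance so a cross-database synonym works (object names are placeholders):

CREATE SYNONYM dbo.Db2Source FOR DB2.dbo.SourceTable;

UPDATE t
SET    t.y = s.y
FROM   dbo.TargetTable AS t
INNER JOIN dbo.Db2Source AS s
        ON s.x = t.x;

If the databases are on different servers, the synonym would have to point at a linked server object instead.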
SSIS is more about bulk movement of data than it is updating what already exists. Use stored procedures for anything in between.
I have an SSIS job which is getting data from an Excel file and updating a database table "SampleTable".
For instance, consider that I am updating the following fields in the table "SampleTable":
field1
field2
LastUpdated
The field LastUpdated needs to be changed only if any change happens to the values of either of the other fields (i.e. field1 or field2).
The update logic is written in a Script Component.
Fetching the record value from the database and comparing it with the value from the Excel file, row by row, would be a huge performance hit.
Hence I am looking for a performance-friendly solution where I fetch the data from the database table only once (perhaps before the script executes) and store it temporarily somewhere.
One option is to redirect updates to a staging/temp table and only insert new records into the original table, then synchronize the original and staging tables using a stored procedure or an Execute SQL Task. This simply isolates the update and insert operations, but you will see a gain in performance.
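A sketch of what that synchronization statement could look like, assuming the Excel rows are landed in a staging table and the rows join on a hypothetical RecordID key (names other than field1, field2, LastUpdated and SampleTable are assumptions):

UPDATE t
SET    t.field1      = s.field1,
       t.field2      = s.field2,
       t.LastUpdated = GETDATE()   -- only touched when field1 or field2 actually changed
FROM   dbo.SampleTable AS t
INNER JOIN dbo.SampleTable_Staging AS s
        ON s.RecordID = t.RecordID
WHERE  t.field1 <> s.field1
   OR  t.field2 <> s.field2;

Note that a plain <> comparison skips rows where one side is NULL, so add ISNULL/COALESCE handling if those columns are nullable.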
You can achieve the same thing with a Lookup transformation, which lets you take advantage of the Cache Connection Manager.
This article provides guidance on how to use the Lookup transformation for update/insert, and this MSDN article covers the Cache Connection Manager.
Using the cache connection manager
Checking if a row exists and if it does, has it changed?
Have a look at this article as well for performance best practices.
I'm looking for the best practice to insert or update rows from a MySQL connection to a SQL Server connection.
First of all, I added an ADO.NET data source to grab the MySQL content (a simple table Supplier with two fields, id and name). Then I added a Lookup transformation to split new rows from updated rows. It works well when I need to insert new rows. However, I would like to use an OLE DB Command to update existing rows, but it doesn't work due to an incompatibility between my connection manager and the component (ADO.NET vs OLE DB).
Any idea how to update the modified rows? Should I use a cache component?
Thanks in advance!
Just get rid of the lookup and conditional split all together.
Outside of your SSIS package, build a staging table that contains the fields you need for inserts/updates.
In your SSIS Package, create a control flow that does the following:
Execute SQL Task to truncate the staging table.
Data Flow task to load the MySQL data from the source system to the staging table. If you can do this based on a "changes-only" type process, such as using a timestamp that you check, it would be faster.
Execute SQL Task to perform an UPDATE statement on your target table using the staging table joined to the target table.
Execute SQL Task to perform an INSERT statement on your target table using a query based on the target table and your staging table (with a WHERE NOT EXISTS or some such on a key field)
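A sketch of the statements for the last two tasks, assuming the target mirrors the Supplier table from the question and the staging table is called Supplier_Staging (both names are assumptions):

UPDATE t
SET    t.name = s.name
FROM   dbo.Supplier AS t
INNER JOIN dbo.Supplier_Staging AS s
        ON s.id = t.id;

INSERT INTO dbo.Supplier (id, name)
SELECT s.id, s.name
FROM   dbo.Supplier_Staging AS s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.Supplier AS t WHERE t.id = s.id);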
I would change the SQL connection to use OLE DB. As well as allowing the OLE DB Command to work, you may also find the OLE DB Destination is faster.
I have an SSIS application that needs to get data from 2 databases on different servers (not linked). I need to get the matching names and DOB records between the 2 databases, then use the results to insert/update a table.
My initial approach is to use OLE DB sources, then a Merge Join, and put the results into a recordset. Then, in the control flow, use the results of the recordset to insert/update a table. But I can't see the recordset from the control flow.
An alternative solution is to create temp tables. But the temp tables are not visible since they reside in the tempdb database of each server.
What is a better approach for this problem?
What do you mean by "put the results to recordset"?
If you join two sources in the data flow using a join, that "recordset" from the join is only available during the current data flow. You can't use it in the control flow after the data flow has finished.
Why can't you just insert the result set into the destination DB? You can perform any other transform operation in the same data flow and insert the result into the destination database.
Or, if you really need to do something that can only be done in the control flow before inserting the data, you can indeed insert the recordset into a temp table on the destination using an OLE DB Destination and access it in another data flow (not a very good approach, though).
In this case, I would keep a database around for work table or create a schema for those work tables.
Next, add a SQL control flow task that truncates the table that will hold the intermediate result. After this, load the intermediate result set into the table, do the operation and optionally, truncate the table again.
The Recordset Destination is fine for smaller datasets, but if you plan to use it for larger datasets that don't fit in memory, it will be very slow.
If you don't have a database/schema that can serve as a workspace, you could use RAW files to hold the intermediate result. Those are very fast too.
I'm getting data from an MSSQL DB ("A") and inserting into a MySQL DB ("B") using the date created in the MSSQL DB. I'm doing it with simple logics, but there's got to be a faster and more efficient way of doing this. Below is the sequence of logics involved:
Create one connection for MSSQL DB and one connection for MySQL DB.
Grab all of data from A that meet the date range criterion provided.
Check to see which of the data obtained are not present in B.
Insert these new data into B.
As you can imagine, step 2 is basically a loop, which can easily max out the time limit on the server, and I feel like there must be a way of doing this much faster, ideally as part of the first query. Can anyone point me in the right direction to achieve this? Can you make "one" connection to both of the DBs and do something like the below?
SELECT * FROM A.some_table_in_A.some_column WHERE
"it doesn't exist in" B.some_table_in_B.some_column
A linked server might suit this
A linked server allows for access to distributed, heterogeneous queries against OLE DB data sources. After a linked server is created, distributed queries can be run against this server, and queries can join tables from more than one data source. If the linked server is defined as an instance of SQL Server, remote stored procedures can be executed.
Check out this HOWTO as well
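To make that concrete, here is a rough sketch of a distributed query run on the MSSQL side, assuming a linked server named MYSQL_B has been created against the MySQL instance (the server, table, and column names are all placeholders based on the question):

INSERT INTO OPENQUERY(MYSQL_B, 'SELECT some_column FROM some_table_in_B')
SELECT a.some_column
FROM   some_table_in_A AS a
WHERE  a.date_created >= '2024-01-01'   -- your date range criterion
  AND  NOT EXISTS (SELECT 1
                   FROM OPENQUERY(MYSQL_B, 'SELECT some_column FROM some_table_in_B') AS b
                   WHERE b.some_column = a.some_column);

Be aware that the NOT EXISTS against OPENQUERY pulls the remote table over the wire, so this only stays fast while the MySQL table is reasonably small.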
If I understand your question right, you're just trying to move things in the MSSQL DB into the MySQL DB. I'm also assuming there is some sort of filter criteria you're using to do the migration. If this is correct, you might try using a stored procedure in MSSQL that can do the querying of the MySQL database with a distributed query. You can then use that stored procedure to do the loops or checks on the database side and the front end server will only need to make one connection.
If the MySQL database has a primary key defined, you can at least skip step 3 ("Check to see which of the data obtained are not present in B"). Use INSERT IGNORE INTO... and it will attempt to insert all the records, silently skipping over ones where a record with the primary key already exists.
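For example, a minimal sketch on the MySQL side (table and column names are placeholders):

INSERT IGNORE INTO some_table_in_B (some_column, other_column)
VALUES ('example-key', 'example-value');   -- silently skipped if 'example-key' already exists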
Rather new to SSIS so not sure how to handle this.
I have a flat file which I managed to successfully read from. So right now my data flow consists of just a flat file source.
What I want to do is something like this:
Update SqlTable S
set s.columnA = f.columna
from FlatFile f
where s.columnID = f.columnID
Right now the only way I can see of doing this would be to insert the contents of the flat file into a SQL table and then do my update. This seems wasteful considering I don't need to save the data from the flat file. I just need to update an existing SQL table based on the data in the flat file. So is there some way to run the query directly in the SSIS package instead of having to insert a bunch of data into a SQL table that I will just wind up dropping?
Thanks
Update SqlTable S set s.columnA = f.columna from FlatFile f where s.columnID = f.columnID
That statement above is a SQL statement. You cannot connect a SQL table to a flat file. You need to work in SQL to do an update, since that is where the table lives.
You have 2 choices:
Use an OLEDB Command component within the data flow. The downside is this calls the statement for each record, so if you have 1,000s of records it is very inefficient.
Push the records to a table using an OLE DB Destination and then you can call your update using an Execute SQL Task. You can then truncate the table if you like
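As a rough sketch of choice 2, the Execute SQL Task could run something like this, assuming the flat file rows were loaded into a work table named dbo.FlatFileStaging (that name is an assumption; SqlTable and the columns come from your pseudo-query):

UPDATE s
SET    s.columnA = f.columnA
FROM   SqlTable AS s
INNER JOIN dbo.FlatFileStaging AS f
        ON f.columnID = s.columnID;

TRUNCATE TABLE dbo.FlatFileStaging;   -- clear the work table for the next run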
A possible 3rd option is to roll your own OLE DB destination to do an update on record sets vs records.
While it might sound wasteful to create a table in the database just to store update records, it is done very often. You just drop the work table or truncate it when complete.
You could add an OLE DB Command component to the Data Flow that retrieves data from the flat file. The OLE DB Command would do a single row update for each record retrieved from the flat file. This might be okay if there are few rows in the flat file; but, you can imagine how bad performance will be if there are many rows in the flat file.
I think you'll find that sending the flat file rows to a database table and running a single UPDATE is going to be the best performer for lots of data.
I haven't tried this but have you tried sending to a recordset destination and then running the update using that?
The bulk load into a temporary table is the way to go, and then do your updates from the temp table. As a previous poster says, it is quite a common approach to stage data in a holding area before doing more work with it and then dropping or truncating the table.