SSIS - Load destination table with new column

We have one table (S1) in database1 and are loading its data into another database (database2) and table (D1). We implemented this in SSIS using an OLE DB Source (database1.S1) and an OLE DB Destination (database2.D1).
We have to add a new column, 'Addeddate', to the destination table. For this we have used a Derived Column between the source and destination.
Now my thought is: instead of using a Derived Column, can we create the added column in the source itself? We only need the date the record was loaded.

Yes, if you use a SQL query in your Source (and you should), all you have to do is add the column to the query.
Something like:
SELECT
    S1.Column1, S1.Column2, ... S1.ColumnN,
    GETDATE() AS RecordLoadedDate
FROM S1
WHERE ...

Related

Update a table (that has relationships) using another table in SSIS

I want to be able to update a specific column of a table using data from another table. Here's what the two tables look like, along with the DB type and the SSIS components used to get each table's data (by the way, both ID and Code are unique).
Table1(ID, Code, Description) [T-SQL DB accessed using ADO NET Source component]
Table2(..., Code, Description,...) [MySQL DB accessed using ODBC Source component]
I want to update the column Table1.Description using Table2.Description, matching the rows on Code first (Table1.Code is the same as Table2.Code).
What I tried:
Doing a Merge Join transformation using the Code column, but I couldn't figure out how to reinsert the result: since Table1 has relationships, I can't simply drop the table and replace it with the new one.
Using a Lookup transformation, but since the two tables are not in the same kind of database, it didn't let me create the lookup table's connection manager (which in my case would be for MySQL).
I'm still new to SSIS, but any ideas or help would be greatly appreciated.
My solution is based on @Akina's comments. Although using a linked server would definitely have fit, my requirement was to make an SSIS package to take care of migrating some old data.
The first and last are SQL tasks, while the Migrate ICDDx is the DFT that transfers the data to a staging table created during the first SQL task.
Here are the SQL commands that get executed during Create Staging Table:
DROP TABLE IF EXISTS ##stagedICDDx;
CREATE TABLE ##stagedICDDx (
    ID INT NOT NULL,
    Code VARCHAR(15) NOT NULL,
    Description NVARCHAR(500) NOT NULL,
    ........
);
And here's the SQL command (based on @Akina's comment) for transferring from staged to final (inside Transfer Staged):
UPDATE [MyDB].[dbo].[ICDDx]
SET [ICDDx].[Description] = [##stagedICDDx].[Description]
FROM [##stagedICDDx]
WHERE [ICDDx].[Code] = [##stagedICDDx].[Code]
GO
Here's the DFT used (both the T-SQL and MySQL sources return sorted output using ORDER BY Code, so I didn't have to insert Sort components before the Merge Join):
Note: you have to set up the connection manager to retain/reuse the same connection (RetainSameConnection = True) so that the temporary table doesn't get dropped before we transfer data into it. If all goes well, then after the Transfer Staged SQL Task the connection is closed and the global temporary table is dropped.

How to upsert data from MySQL to SQL Server using SSIS

How can we insert new data or update existing data from one table to another, going from MySQL to SQL Server, using SSIS and without using a Lookup?
A common way to do this is to insert the new data into an empty temporary table and then run a SQL MERGE command (using a separate Execute SQL Task).
The MERGE command is super powerful and can do updates, inserts, or even deletes. See the full description of MERGE here:
https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-2017
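For illustration, here is a minimal sketch of that pattern, assuming a staging table StagingTable (loaded by the data flow) and a destination TargetTable matched on a key column Code, with a single Description payload column - all of these names are illustrative, not from the question:
MERGE INTO dbo.TargetTable AS tgt
USING dbo.StagingTable AS src
    ON tgt.Code = src.Code                        -- match on the business key
WHEN MATCHED THEN
    UPDATE SET tgt.Description = src.Description  -- update existing rows
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Code, Description)
    VALUES (src.Code, src.Description)            -- insert new rows
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;                                       -- optional: remove rows that disappeared from the source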
The design for this will look like below:
You will have 4 tables and 1 view: Source, TMP_Dest (exactly like the source, with no PK), CHG_Dest (for changes; exactly like the destination, with no PK), Dest (which has the PK), and FV_TMP_Dest (a view, for the case where the destination looks different from the source - different field types).
SSIS package:
1. Use an Execute SQL Task to truncate TMP_Dest, since it only holds the extracted data temporarily.
2. Use an Execute SQL Task to truncate CHG_Dest, since it only holds the changes temporarily.
3. Use one Data Flow Task to load the data from Source into TMP_Dest.
4. Define two variables, OperationIDInsert=1 and OperationIDUpdate=2 (the values are not important, you can set them as you want) - you will use them in step 5 below.
5. Use another Data Flow Task in which you will have:
- on the left side, an OLE DB Source that extracts the data from the view, ordered by PK (do not forget to set the SortKeyPosition in the Advanced Editor for the PK fields)
- on the right side, an OLE DB Source that extracts the data from Dest, ordered by PK (again, set the SortKeyPosition in the Advanced Editor for the PK fields)
- a Merge Join (left outer join) between the two
- on the left side (the "insert side"): a Derived Column whose Expression assigns the OperationIDInsert variable, and an OLE DB Destination that inserts the data into the CHG_Dest table. This way you capture the rows that have to be inserted into the destination table, and you know which ones they are because of the OperationIDInsert column.
- on the right side, the same thing, but using the OperationIDUpdate column
6. Use an Execute SQL Task in the Control Flow with a SQL MERGE. Based on the PK fields and the OperationIDInsert/OperationIDUpdate values, it will either insert the data or update it (a hedged sketch follows below).
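As an illustration of step 6, here is a hedged sketch that assumes a single key column PK, one payload column Col1, and that the derived column holding the variable values is named OperationID - the real statement must list your actual columns:
MERGE INTO dbo.Dest AS d
USING dbo.CHG_Dest AS c
    ON d.PK = c.PK
WHEN MATCHED AND c.OperationID = 2 THEN                 -- OperationIDUpdate
    UPDATE SET d.Col1 = c.Col1
WHEN NOT MATCHED BY TARGET AND c.OperationID = 1 THEN   -- OperationIDInsert
    INSERT (PK, Col1) VALUES (c.PK, c.Col1);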
Hope this will help you. Let me know if you need additional info.

How to specify column mapping in AWS Data pipeline?

I am using AWS Data Pipeline to copy data from Redshift to MySQL in RDS. The data is copied to MySQL. In the pipeline, the insert query is specified as below:
insert into test_Employee(firstname,lastname,email,salary) values(?,?,?,?);
Is there any way for me to specify the source table's column names in place of the ? in the above query? I tried adding the column names for the source table, but that does not seem to work. Currently the column names in the source and destination tables are the same.
Thanks for your time. Let me know if any other information is required.
Specifying column names instead of ? wouldn't work, because the insert SQL query knows nothing about your source datasource. The AWS Copy activity just passes the parameters to this query in the same order you selected them from the source dataset.
However, the column names of the destination table (test_Employee) in the insert query don't have to match the order of the columns in its DDL, so you can change the query to match the column order of the source table.
E.g. if your source dataset has the following columns:
email,first_name,last_name,salary
Insert query:
insert into test_Employee(email,firstname,lastname,salary) values(?,?,?,?);
Note: as you can see, the column names of the source and destination tables don't have to match.

Truncate table in SSIS

I have a simple data flow in SSIS (defined in Visual Studio 2013) which uses SQL to extract data from a table A on one SQL Server instance and then add it to a table B on another SQL Server instance.
What is the best-practice pattern for truncating the data in table B? A truncate statement like this:
TRUNCATE TABLE B
after the select statement for table A - especially when you have a fairly big table to 'transmit'?
One thing I have done in cases like this is to create two copies of the same table, plus a view that points to one or the other and has the name of the current table.
The SSIS package then determines which table is in use and sets the connection so that the data flow populates the other table.
Then an Execute SQL Task truncates the table not currently in use. You may also want to drop any indexes at this point.
Then a data flow populates the table not currently in use.
Then recreate any indexes you dropped.
Finally, an Execute SQL Task drops and recreates the view so that it points to the table you just populated instead of the other one.
Total downtime of the table being referenced? Generally less than a second for the drop and create of the view, no matter how long it takes to populate the table.
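As a rough sketch of this pattern, assuming the two copies are named TableB_1 and TableB_2 and the view is named TableB (all names illustrative; the package swaps 1 and 2 on each run):
-- Execute SQL Task before the data flow: empty the copy not in use.
TRUNCATE TABLE dbo.TableB_2;

-- ...the data flow populates dbo.TableB_2 and the indexes are rebuilt...

-- Final Execute SQL Task: repoint the view at the freshly loaded copy.
DROP VIEW dbo.TableB;
GO
CREATE VIEW dbo.TableB AS
SELECT * FROM dbo.TableB_2;
GO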

Creating centralized DB

I have two databases with the same name (A) on different servers (B and C). Both databases have the same schema (SQL Server 2008 R2).
Task 1: Copy (transfer) both databases to a third server (D), under the names A_B and A_C.
Task 2: Merge both databases into one database (A_D). (I don't know how I will handle keys.)
Task 3: On a daily basis, get the data from servers B and C and put it into the centralized server D.
Any help would be appreciated.
Thanks.
Ritesh
Here are a few ideas:
Task 1: Transfer the databases by doing a backup and restore to server D (a minimal sketch follows after this list).
Task 2: I think this will involve an ETL process and creating new surrogate keys in database A_D. Keep the keys from the original source in a data-source ID column. A MERGE statement would be helpful here.
Task 3: Leverage the logic from Task 2.
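For Task 1, a minimal sketch (the paths, share names, and logical file names are assumptions; check the logical names with RESTORE FILELISTONLY):
-- On server B: back up database A to a location server D can read.
BACKUP DATABASE A TO DISK = N'\\ServerD\Backup\A_from_B.bak';

-- On server D: restore it under the new name A_B.
RESTORE DATABASE A_B
FROM DISK = N'\\ServerD\Backup\A_from_B.bak'
WITH MOVE 'A' TO N'D:\Data\A_B.mdf',          -- assumed logical data file name
     MOVE 'A_log' TO N'D:\Data\A_B_log.ldf';  -- assumed logical log file name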
Update for Task 2:
Say a source table Table1 in databases A_B and A_C has a key column named Table1_ID. In database A_D, add the columns Table1_SourceID and Table1_Source. Populate Table1_SourceID with the key from the source database, and use Table1_Source to indicate which source database the row came from.
Use Table1_ID as the key for Table1; it is unique within database A_D. This accounts for key-column collisions between the source databases, and it also lets you trace each row back to its source database.
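A minimal sketch of what Table1 in A_D might look like under this scheme (the column types and the IDENTITY choice are assumptions):
CREATE TABLE dbo.Table1 (
    Table1_ID       INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- new surrogate key, unique within A_D
    Table1_SourceID INT NOT NULL,                           -- original key from the source database
    Table1_Source   VARCHAR(3) NOT NULL                     -- source indicator, e.g. 'A_B' or 'A_C'
    -- ...remaining business columns...
);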
Task 1: Create the destination databases with no structures. I'd use the Tasks -> Export function on the source databases in SSMS, with the option to create structures. After the export you will have exact copies in the destination.
Task 2: In each table of A_D, create a new key column (SurKey). It has to be a combination of values that is unique across the whole table, e.g. source-table abbreviation + PK value + date.
For each table, create two Data Flows in the SSIS package, which will load the data from A_B and A_C. Put in a Derived Column component, which will add the new column, SurKey (a T-SQL sketch of the same idea follows below).
In the A_B Data Flow use A_B as the abbreviation, and A_C in the second one.
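If you would rather build SurKey in the source query than in a Derived Column, a hedged T-SQL equivalent (assuming a PK column named ID; all names illustrative) would be:
SELECT
    'A_B_' + CAST(t.ID AS VARCHAR(12)) + '_'
           + CONVERT(CHAR(8), GETDATE(), 112) AS SurKey, -- abbreviation + PK + date (YYYYMMDD)
    t.*
FROM dbo.Table1 AS t;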
Task 3: Use the Data Flows you created. Script a SQL Server Agent job in SSMS and add it to the daily schedule.