Inserting rows into other tables while importing in SSIS - sql-server-2008

I have a transactions table in a flat file, like:
ItemID, ItemName, CustomerID, CustomerName, Qty, Price, TotalValue
and the target transactions table will have:
ItemID, CustomerID, Qty, Price, TotalValue
Now I have to import it into the transactions table using an SSIS package.
But before importing ItemID and CustomerID, I should check the lookup tables ItemMaster and CustomerMaster; if a value is not there, I have to insert a new tuple into the table, take the new ItemID or CustomerID, and import the transaction into the transactions table. This can be done using Lookup transformations in SSIS.
Or is it better to import the transactions into a temporary table using an SSIS package, update the new ItemIDs and CustomerIDs in the temporary table, and then insert the transactions from the temp table into the main transactions table?
Which option is better performance-wise?

There are several ways of doing it:
1. Using a staging table
2. Using Lookup
3. Transforming the stored procedure logic in SSIS
1. Using a Staging Table
Dump all the flat file data into a staging table; let's name it StgTransaction. Then create a procedure to perform the tasks:
MERGE ItemMaster AS target
USING (SELECT DISTINCT ItemName FROM StgTransaction) AS src
ON target.ItemName = src.ItemName
WHEN NOT MATCHED THEN
    INSERT (ItemName)
    VALUES (src.ItemName);

MERGE CustomerMaster AS target
USING (SELECT DISTINCT CustomerName FROM StgTransaction) AS src
ON target.CustomerName = src.CustomerName
WHEN NOT MATCHED THEN
    INSERT (CustomerName)
    VALUES (src.CustomerName);
WITH cte (ItemID, CustomerID, Qty, Price, TotalValue) AS
(
    SELECT I.ItemID, C.CustomerID,
           f.Qty, f.Price, f.TotalValue
    FROM ItemMaster I
    INNER JOIN StgTransaction f ON I.ItemName = f.ItemName
    INNER JOIN CustomerMaster C ON C.CustomerName = f.CustomerName
)
INSERT INTO Transactions (ItemID, CustomerID, Qty, Price, TotalValue)
SELECT ItemID, CustomerID, Qty, Price, TotalValue
FROM cte;
Basically I'm inserting all the missing values into the 2 master tables using MERGE syntax. Instead of MERGE you can use NOT EXISTS:
INSERT INTO ItemMaster (ItemName)
SELECT DISTINCT s.ItemName
FROM StgTransaction s
WHERE NOT EXISTS
    (SELECT 1 FROM ItemMaster im
     WHERE im.ItemName = s.ItemName);
Once the missing values are inserted, just join the staging table with the 2 master tables and insert the result into the target.
Wrap the above queries into a procedure and call the procedure after the Data Flow Task (which loads the data from the flat file into the staging table).
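A minimal sketch of that wrapper procedure, using the NOT EXISTS variant and the names above (the procedure name is illustrative; adjust to your schema):

CREATE PROCEDURE usp_LoadTransactionsFromStaging
AS
BEGIN
    SET NOCOUNT ON;

    -- Add any item names missing from the master table
    INSERT INTO ItemMaster (ItemName)
    SELECT DISTINCT s.ItemName
    FROM StgTransaction s
    WHERE NOT EXISTS (SELECT 1 FROM ItemMaster im WHERE im.ItemName = s.ItemName);

    -- Add any customer names missing from the master table
    INSERT INTO CustomerMaster (CustomerName)
    SELECT DISTINCT s.CustomerName
    FROM StgTransaction s
    WHERE NOT EXISTS (SELECT 1 FROM CustomerMaster cm WHERE cm.CustomerName = s.CustomerName);

    -- Resolve the (possibly new) IDs and load the target table
    INSERT INTO Transactions (ItemID, CustomerID, Qty, Price, TotalValue)
    SELECT i.ItemID, c.CustomerID, s.Qty, s.Price, s.TotalValue
    FROM StgTransaction s
    INNER JOIN ItemMaster i ON i.ItemName = s.ItemName
    INNER JOIN CustomerMaster c ON c.CustomerName = s.CustomerName;
END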
2. Using Lookup
The package design looks like the screenshot in the original post (image omitted here).
You should go with this approach if you are not allowed to create a staging table in your database. It will be slow because of the blocking component (Union All) and the OLE DB Command, which suffers from the RBAR (row by agonizing row) problem.
Steps:
1. Use a Lookup against the ItemMaster table.
2. Create an ItemID column (name it NewItemID) using a Derived Column transformation; it will hold the new ItemID generated from the ItemMaster table when the data is loaded. Connect the Lookup to the Derived Column transformation using the No Match Output.
3. The unmatched values should be inserted into the ItemMaster table. For this, create a procedure which inserts the data and returns the ItemID value as an output parameter:
ALTER PROCEDURE usp_InsertMaster
    @ItemName varchar(20),
    @id int OUTPUT
AS
    INSERT INTO ItemMaster (ItemName)
    VALUES (@ItemName);
    SET @id = SCOPE_IDENTITY();
    -- This works if your ID is an identity column; otherwise use the OUTPUT clause to retrieve the ID.
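A hedged sketch of the OUTPUT-clause variant mentioned in the comment, to use inside usp_InsertMaster in place of the SCOPE_IDENTITY() line when ItemID is not an identity column (for example when it comes from a default or sequence):

    DECLARE @new TABLE (ItemID int);

    INSERT INTO ItemMaster (ItemName)
    OUTPUT inserted.ItemID INTO @new    -- capture the generated ID
    VALUES (@ItemName);

    SELECT @id = ItemID FROM @new;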
4. Call this procedure in an OLE DB Command and map the output parameter to the column created in the Derived Column transformation.
After the OLE DB Command, use a Union All to combine the rows from the Match and No Match outputs, and then follow the same steps for the CustomerMaster table.
3. Transforming the Stored Procedure Logic in SSIS
The package design is:
1. Load the data into staging.
2. Use MERGE or NOT EXISTS in an Execute SQL Task to load the missing values into the 2 master tables.
3. Use a Data Flow Task with the staging table as source and 2 Lookups against the master tables. Since all the missing values were already inserted into the master tables, there won't be any Lookup No Match Output; just connect the Lookup Match Output to the OLE DB Destination (the Transactions table).
IMHO the 1st approach will be fastest. The problem arises only because there are 2 master tables which need to be updated, and the newly generated IDs have to be captured and loaded into the target table, so doing it all synchronously is difficult.

Related

Sync result of SQL Server view and table MySQL

I have a table with many columns in SQL Server and I have to move part of the data into MySQL. I made a view (or function) on the table in SQL Server, and these two databases must be synced once a day through a job, because the data in this view may change every day.
The view returns a table with 3 columns (char, varchar, varchar), none of which are unique or a primary key.
My solution is:
create a job
execute the view on SQL Server
return the result of the view
create a temp table with 3 columns in MySQL
move the view result from SQL Server to the temp table
move records from the temp table to the new table one by one, if they do not already exist
delete the temp table
To transfer without using the temp table, I wanted to use the kind of query below, but could not find the correct syntax; that's why I used the temp table:
insert into new_table
values (array of records) where record if not exist in new table.
And for the solution I mentioned above, I used the following query:
insert into new_table
select *
from temp_table
where not exist new_table.column = temp_table.column
Do you have a better suggestion, so that new records can be fetched and added to the previous records?
It should look more like this:
insert into new_table
select *
from temp_table
where not exists (
select 1
from new_table
where new_table.column = temp_table.column
)
or maybe this:
insert into new_table
select *
from temp_table
where not exists (
select 1
from new_table
where new_table.column = temp_table.column
and new_table.column2 = temp_table.column2
and new_table.column3 = temp_table.column3
)
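If you are able to add a unique key over the three columns on the MySQL side, INSERT IGNORE is a set-based alternative to the NOT EXISTS check: rows that would violate the key are silently skipped. A sketch, with hypothetical column names c1, c2, c3:

-- One-time: a unique key defining what counts as a duplicate
ALTER TABLE new_table ADD UNIQUE KEY uq_new_table (c1, c2, c3);

-- Daily sync: duplicates are skipped, new rows inserted
INSERT IGNORE INTO new_table
SELECT * FROM temp_table;

Keep in mind that IGNORE downgrades other errors (such as truncation) to warnings as well, so the unique key must cover exactly what you consider a duplicate.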

Matlab Bulk Update MySQL Table

I want to update a MySQL table from Matlab in bulk. The current logic that I use iterates over the arrays and issues the updates one by one, which takes way too long.
Here is my current implementation:
function update_table(customer_id_list, cluster_id_list, write_conn)
    num_customers = size(customer_id_list, 1);
    for idx = 1:num_customers
        customer_id = customer_id_list(idx);
        cluster_id = cluster_id_list(idx);
        sql = strcat('UPDATE table SET cluster_id = ', num2str(cluster_id), ' WHERE customer_id = ', num2str(customer_id));
        exec(write_conn, sql);
    end
end
I tried to look for documentation on doing a bulk update/insert, but haven't found anything yet.
Do an "upjoin" using a temporary table.
Build your update specification as a Matlab table array with all the cluster_id and customer_id pairs that specify the new values.
Create a SQL temporary table that contains columns for the key columns you'll be matching on and the columns to update.
CREATE TEMPORARY TABLE my_temp_table SELECT customer_id, cluster_id FROM `table` WHERE 1 = 0;
Batch-insert your update specification data from Matlab into the temporary table using Matlab Database Toolbox's datainsert or sqlwrite.
Update the target table en masse by joining it to the temp table, using MySQL's multi-table UPDATE syntax: UPDATE `table` targ INNER JOIN my_temp_table upd ON targ.customer_id = upd.customer_id SET targ.cluster_id = upd.cluster_id.
Drop the temp table.
Boom. If you're going to do this a lot, wrap it up in a generic upjoin() function.
See the Matlab documentation for datainsert and sqlwrite. Do not use fastinsert; despite its name, it is much slower than datainsert and sqlwrite.
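Put together, the SQL side of the sequence looks like this sketch (using the question's placeholder name `table`, which has to be backtick-quoted because TABLE is a reserved word in MySQL):

-- 1. Empty copy of the key and update columns
CREATE TEMPORARY TABLE my_temp_table
    SELECT customer_id, cluster_id FROM `table` WHERE 1 = 0;

-- 2. (Batch-insert the update specification from Matlab into my_temp_table
--    here, e.g. with datainsert or sqlwrite.)

-- 3. Apply all updates in one statement
UPDATE `table` targ
INNER JOIN my_temp_table upd ON targ.customer_id = upd.customer_id
SET targ.cluster_id = upd.cluster_id;

-- 4. Clean up
DROP TEMPORARY TABLE my_temp_table;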

SSIS amount of insert operations based on record value

I'm migrating data from an old database to a new one in SSIS (2008 R2 Enterprise Edition). In the old database I have a table called [Financial] with a column named [Installments]. This column holds a numeric value (1, 2, 3 or 4): the number of payment installments. The old database stores only this number and provides no further information about the individual installments. The new database, however, describes each installment, with columns like [InstallmentPaid] (whether the customer paid the installment), [DateInstallmentPaid] (when the customer paid it), [InstallmentNumber] (which installment it is; if the customer wants to pay in 4 installments, 4 records are created, with InstallmentNumber 1 through 4) and, of course, [InstallmentPrice].
So the old database has the table [Financial] with the column [Installments]. The new database has the same [Financial] table, but instead of an [Installments] column it has a related table [CustInstallments] ([CustInstallments] has the FK FinancialID; a 1-to-many relationship).
So now that I'm migrating the data from the old database to the new one, I don't want to lose the information about the number of installments. The following logic should be executed in SSIS in order to prevent information loss:
For each [Installments] in [Financial] from the old database, insert a
new [CustInstallment] referencing the corresponding [FinancialID]
within the new database
So if in the old database the numeric value in [Installments] is 3, I need to INSERT INTO CustInstallments (FinancialID, InstallmentNumber) VALUES (?, ?), where the second ? should be 1 on the first insert, 2 on the 2nd and 3 on the 3rd. So I need some kind of loop here? Is that even possible within the data flow of SSIS?
Below is a visualization and description of my flow so far (figure omitted).
I select the old database source [Financial].
I convert the data so it matches the new database's data types.
Since I already migrated the old [Financial] data to the new database, I can use a Lookup on the FinancialIDs in the new database, so the first ? variable of the INSERT query can be linked to the Lookup output.
I split on all the possibilities, like when Installments contains NULL, 1, 2, 3 or 4.
The 5th step is what I'm looking for: some clue, some direction towards something useful. When NumberOfInstallments is 1, I need to INSERT INTO CustInstallments (FinancialID, InstallmentNumber) VALUES (?, ?) with the second ? variable set to 1. When NumberOfInstallments is 2, I need to do two inserts, one with InstallmentNumber 1 and one with InstallmentNumber 2. When it is 3, then 3 inserts with an incrementing InstallmentNumber. When 4, then four.
Is there any smart way to achieve this? Is there a built-in SSIS function that I am not aware of that could be used here?
I appreciate any input here!
Thank you.
EDIT 10/02/2014
I have tried the following code:
INSERT INTO CustInstallments (FinancialID, InstallmentNumber) values (?, X);
WITH nums AS(select top 4 row_number() over (order by (select 1)) AS id
from sys.columns
) SELECT f.* FROM CustInstallments f
JOIN nums n on f.InstallmentNumber>= n.id
But this query doesn't create X amount of new records; instead, the JOIN to nums just replicates the existing row X times, so I still can't track every installment individually.
I have written my own code (it took me a while, since I had never worked with T-SQL before) and it works like a charm in SQL Server:
DECLARE @MyCounter tinyint;
SET @MyCounter = 1;
WHILE (SELECT COUNT(*) FROM CustInstallments WHERE FinancialID = @ID) < 4
BEGIN
    INSERT INTO CustInstallments (FinancialID, InstallmentNumber) VALUES (@ID, @MyCounter)
    IF (SELECT COUNT(*) FROM CustInstallments WHERE FinancialID = @ID) > 4
        BREAK
    ELSE
        SET @MyCounter = @MyCounter + 1;
    CONTINUE
END
Now in SSIS, I cannot change the @ID to a ?-variable and use the looked-up FinancialID, because as soon as I do, I get an error (screenshot omitted from the original post).
Could anyone explain to me why SSIS doesn't like this?
EDIT 10/02/2014
My last and least preferable option would be to use a Multicast to issue the insert query X times, where each branch ends in an OLE DB Command. For example, when there are 3 [Installments] in the old column, I would create a Multicast with 3 OLE DB Commands, with these SqlCommands:
OLE DB Command 1: INSERT INTO CustInstallments (FinancialID, InstallmentNumber) values (?, 1);
OLE DB Command 2: INSERT INTO CustInstallments (FinancialID, InstallmentNumber) values (?, 2);
OLE DB Command 3: INSERT INTO CustInstallments (FinancialID, InstallmentNumber) values (?, 3);
This is an ugly approach, but with the small amount of data I am using, perhaps it's not a big deal.
I would try to resolve this with T-SQL in your source query. Do a join to some kind of numbers table, like this:
create table #financial (id int not null identity(1,1), investments int);
go
insert into #financial (investments) values (1),(2);
GO
with nums as (select top 5 row_number() over (order by (select 1)) as id
              from sys.columns
)
select f.* from #financial f
JOIN nums n on f.investments >= n.id
EDIT:
The above example is unclear, sorry about that. I was only presenting the concept of replicating the rows, not completing the thought of how you would apply it. Try this out:
create table #financial (financialid int not null, investments int);
go
insert into #financial (financialid, investments) values (123, 1),(456, 2);
GO
with nums as (select top 5 row_number() over (order by (select 1)) as id
              from sys.columns
)
select f.financialid, n.id as investments from #financial f
JOIN nums n on n.id <= f.investments
So for each financialid you will get multiple rows, one per installment number. This is a set-based way to handle the operation, which will perform better than a procedural method and will require less effort in SSIS. Does that make more sense?
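Applied to the actual migration, the source query feeding CustInstallments might look like this sketch (assuming the old [Installments] value is reachable alongside the new FinancialID, for example via the lookup described in the question; table and column names follow the question):

WITH nums AS (
    SELECT TOP 4 ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS id
    FROM sys.columns
)
INSERT INTO CustInstallments (FinancialID, InstallmentNumber)
SELECT f.FinancialID, n.id
FROM Financial f
JOIN nums n ON n.id <= f.Installments;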

Flagging records on large mysql file

We are currently importing very large CSV files into a MySQL data warehouse. A key part of the processing is to flag whether a record in the CSV file matches an existing record in the warehouse. The "match" is done by comparing specific fields in the new data against the previous version of the table. If the record is "new", or if there have been updates, we want to add it to the warehouse.
At the moment the processing plan is as follows:
~ read the CSV file into MySQL table A
~ is the primary key of A in old-A? If it isn't, set the record status to "NEW"
~ if the key is in old-A, issue an UPDATE statement joining old-A to A
~ if A.field1 <> old-A.field1 OR A.field2 <> old-A.field2 OR A.field3 <> old-A.field3, flag the record status as "UPDATE"
~ process NEW or UPDATEd records according to record status
Table sizes for A and old-A are currently on the order of 50M records. We would expect new records to be about 1M, and updates 5-10M.
Although we are currently using MySQL for this processing, I am wondering whether it would simply be better to do this using a scripting language. We are finding in particular that the step to flag the updates is very time-consuming; essentially we have an UPDATE statement that is unable to use any indexes.
so
CREATE TABLE A (key1 bigint,
    field1 varchar(50),
    field2 varchar(50),
    field3 varchar(50) );
LOAD DATA ...
... add field rec_status to table A
... then
UPDATE A
LEFT JOIN `old-A` ON A.key1 = `old-A`.key1
SET A.rec_status = 'NEW'
WHERE `old-A`.key1 IS NULL;
UPDATE A
JOIN `old-A` ON A.key1 = `old-A`.key1
SET A.rec_status = 'UPDATED'
WHERE A.field1 <> `old-A`.field1
   OR A.field2 <> `old-A`.field2
   OR A.field3 <> `old-A`.field3;
...
I would consider skipping the "flag" step. Process the CSV file with a script (or process MySQL table A with SQL statements): select a record from the old-A table based on whatever criteria apply, such as field1 and/or field2 of table A. If a match is found, lock and update the old-A record and delete the processed record from the CSV or from table A; if not found, create the record in old-A with the data.
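A set-based variant of that idea in MySQL is INSERT ... ON DUPLICATE KEY UPDATE, which inserts new rows and updates existing ones in a single pass. A sketch, assuming old-A has a primary or unique key on key1 (this replaces the per-record lock-and-update loop, so it only fits if key1 is the match criterion):

INSERT INTO `old-A` (key1, field1, field2, field3)
SELECT key1, field1, field2, field3
FROM A
ON DUPLICATE KEY UPDATE
    field1 = VALUES(field1),
    field2 = VALUES(field2),
    field3 = VALUES(field3);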

How do I check a condition while mapping in SSIS?

I want to create an SSIS package which takes values from a flat file and inserts them into a database table depending on their company name.
For example, I have these table fields:
Date SecurityId SecurityType EntryPrice Price CompanyName
2011-08-31 5033048 Bond 1.05 NULL ABC Corp
Now I want to insert Price into this table, but I need to match on CompanyName,
and in the file CompanyName is given like 'ABC', so how can I check that and insert only the matching data?
Like this, I have 20 records in my file with different company names.
WHAT I DID
In the Lookup I did the following (screenshots omitted).
Now my problem is that I need to check the company name from the flat file and insert that company's price into the table, but in the flat file the company name is given like 'AK STL' and in the table it is 'AK STEEL CORPORATION'. For this I have used a column transformation, but what expression do I write to find the match? It's the same with the other company names: only the first 2-3 characters are in the flat file. Please help.
Basically, you are looking to "upsert" your data into the database. Here is a simple Lookup-based upsert example. With as few records in your dataset as you have said, this method will suffice. With larger datasets, you probably want to look into using temp tables and SQL logic similar to this:
--Insert Portion
INSERT INTO FinalTable
( Columns )
SELECT T.TempColumns
FROM TempTable T
WHERE
(
SELECT 'Bam'
FROM FinalTable F
WHERE F.Key(s) = T.Key(s)
) IS NULL
--Update Portion
UPDATE FinalTable
SET NonKeyColumn(s) = T.TempNonKeyColumn(s)
FROM TempTable T
WHERE FinalTable.Key(s) = T.Key(s)
AND CHECKSUM(FinalTable.NonKeyColumn(s)) <> CHECKSUM(T.NonKeyColumn(s))
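A concrete sketch of that pattern using the question's columns (CompanyName as the matching key and Price as the value to update; FinalTable and TempTable are the placeholders from above):

--Insert companies that are not yet in the final table
INSERT INTO FinalTable (CompanyName, Price)
SELECT T.CompanyName, T.Price
FROM TempTable T
WHERE NOT EXISTS
    (SELECT 1 FROM FinalTable F WHERE F.CompanyName = T.CompanyName);

--Update prices that have changed
UPDATE F
SET F.Price = T.Price
FROM FinalTable F
INNER JOIN TempTable T ON F.CompanyName = T.CompanyName
WHERE F.Price <> T.Price;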