SSIS: number of insert operations based on record value

I'm migrating data from an old database to a new one in SSIS (2008 R2 Enterprise Edition). In the old database, I have a table called [Financial] and a column named [Installments]. This column holds a numeric value: 1, 2, 3 or 4. These are payments in installments. The old database only stores this numeric value and does not provide any more information about the individual installments. The new database, however, provides more information about each installment, with columns like: [InstallmentPaid] (whether the customer paid the installment), [DateInstallmentPaid] (when the customer paid the installment), [InstallmentNumber] (this is important, as it specifies which installment number a record represents. If the customer wants to pay in 4 installments, then 4 records will be created: one with InstallmentNumber 1, a second with InstallmentNumber 2, a third with InstallmentNumber 3 and a fourth with InstallmentNumber 4.) and of course the [InstallmentPrice].
So the old database has the table [Financial] with the column [Installments]. The new database has the same [Financial] table, but instead of a column called [Installments] it has a new related table, [CustInstallments], which has the FK FinancialID (a 1-to-many relationship).
So now that I'm migrating the data from the old database to the new one, I don't want to lose the information about the amount of installments. The following logic should be executed in SSIS in order to prevent information loss:
Foreach [Installments] in [Financial] from the old database, insert a
new [CustInstallment] referencing to the corresponding [FinancialID]
within the new database
So if in the old database the numeric value within [Installments] is 3, then I need to INSERT INTO CustInstallments (FinancialID, InstallmentNumber) VALUES (?, ?), where the second ? should be 1 at the first insert, 2 at the 2nd and 3 at the 3rd. So I need some kind of a loop here? Is that even possible within the data flow of SSIS?
Below are the visualization (figure) and a description of my flow so far.
I select the old database source [Financial]
I convert the data so it matches the current database data types
Since I already migrated the old [Financial] data to the new database, I can use a lookup on the FinancialIDs in the new database, so the first ? variable of the INSERT query can be linked to the lookup output.
I split all the possibilities, like when [Installments] contains NULL, 1, 2, 3 or 4.
The 5th step is what I'm looking for: some clue, some direction towards something useful. When NumberOfInstallments is 1, I need to INSERT INTO CustInstallments (FinancialID, InstallmentNumber) VALUES (?, ?) with the second ? variable as 1. When NumberOfInstallments is 2, I need to do two inserts, one with InstallmentNumber 1 and one with InstallmentNumber 2. When NumberOfInstallments is 3, then 3 inserts with a counting InstallmentNumber. When 4, then four.
Is there any smart way to achieve this? Is there any built-in function available of SSIS that I am not aware of, and could be used here?
I appreciate any input here!
Thank you.
EDIT 10/02/2014
I have tried the following code:
INSERT INTO CustInstallments (FinancialID, InstallmentNumber) values (?, X);
WITH nums AS (
    SELECT TOP 4 ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS id
    FROM sys.columns
)
SELECT f.* FROM CustInstallments f
JOIN nums n ON f.InstallmentNumber >= n.id
But this query doesn't create X amount of records; instead, the JOIN to nums just replicates each row X times, so I still can't track every installment individually.
I have written my own code - it took me a while since I had never worked with T-SQL before - and this works like a charm in SQL Server:
DECLARE @MyCounter tinyint;
SET @MyCounter = 1;
WHILE (SELECT COUNT(*) FROM CustInstallments WHERE FinancialID = @ID) < 4
BEGIN
    INSERT INTO CustInstallments (FinancialID, InstallmentNumber) VALUES (@ID, @MyCounter)
    IF (SELECT COUNT(*) FROM CustInstallments WHERE FinancialID = @ID) > 4
        BREAK
    ELSE
        SET @MyCounter = @MyCounter + 1;
    CONTINUE
END
Now in SSIS, I cannot change the @ID to a ?-variable and use the lookup FinancialID, because as soon as I do, I get the following error:
Could anyone explain to me why SSIS doesn't like this?
EDIT 10/02/2014
My last and least preferable option would be to use a multicast to run an insert query X times, where each X is an OLE DB Command. For example, when there are 3 [Installments] in the old column, I would create a multicast with 3 OLE DB Commands, with these SqlCommands:
OLE DB Command 1: INSERT INTO CustInstallments (FinancialID, InstallmentNumber) values (?, 1);
OLE DB Command 2: INSERT INTO CustInstallments (FinancialID, InstallmentNumber) values (?, 2);
OLE DB Command 3: INSERT INTO CustInstallments (FinancialID, InstallmentNumber) values (?, 3);
This is an ugly approach, but with the small amount of data I am using, perhaps it's not a big deal.

I would try to resolve this with T-SQL in your source query. Do a join to some kind of numbers table, like this:
create table #financial (id int not null identity(1,1), investments int);
GO
insert into #financial (investments) values (1),(2);
GO
with nums as (
    select top 5 row_number() over (order by (select 1)) as id
    from sys.columns
)
select f.* from #financial f
join nums n on f.investments >= n.id
EDIT:
The above example is unclear - sorry about that. I was only presenting the concept of replicating the rows, without completing the thought of how you would apply it. Try this out:
create table #financial (financialid int not null, investments int);
GO
insert into #financial (financialid, investments) values (123, 1),(456, 2);
GO
with nums as (
    select top 5 row_number() over (order by (select 1)) as id
    from sys.columns
)
select f.financialid, n.id as investments from #financial f
join nums n on n.id <= f.investments
So for each financialid you will get multiple investment rows with different investment ids. This is a set-based way to handle the operation, which will perform better than a procedural method and will require less effort in SSIS. Does that make more sense?
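Applied to the question's schema, a minimal sketch of what the OLE DB Source query could look like (assuming the old table is [Financial] with a key column FinancialID plus the [Installments] count; the names are illustrative):

-- Hypothetical source query: one output row per installment.
WITH nums AS (
    SELECT TOP 4 ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS id
    FROM sys.columns
)
SELECT f.FinancialID, n.id AS InstallmentNumber
FROM Financial f
JOIN nums n ON n.id <= f.Installments;

Rows where [Installments] is NULL simply produce no output rows, and the data flow then only needs the lookup on the new FinancialID plus an OLE DB Destination into CustInstallments - no loop required.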

Related

MySQL - how to insert multiple records and delete remaining ones

Let's start with an example. There's a table:
CREATE TABLE X (
    id INT(32) NOT NULL auto_increment,
    account INT(32),
    value INT(32),
    PRIMARY KEY (id)
)
What I want to achieve is to ensure that a given account has a specified set of values (and only these values). Basically, how do I do the below efficiently:
delete from X where account = 5;
insert into X (account, value) values
(5, 0),
(5, 2),
(5, 3)
...
(5, 99);
The caveat is that usually just one value changes (appears or disappears).
Pretty much, I have a set of values which changes, but instead of receiving deltas I'm getting the whole set, and I need to efficiently reflect the difference. There will not be more than 100 values for a given account at any time, and usually there will be just 2-3. The changes happen thousands of times per second. Changes to the same account are rare (usually just several, but could be more occasionally).
What I thought about is to set id = account*1000 + sequence_id to increase data locality for rows with the same account. Also, instead of delete+reinsert I can do (pseudocode):
$current_values = select value from X where account = 5;
$to_add = $new_values not in $current_values
$to_remove = $current_values not in $new_values
delete from X where value in $to_remove
insert into X (account, value) values (5, $to_add[0]), (5, $to_add[1])...
How can I do it better?
Add INDEX(account) to X.
Collect the incoming information in a temporary table, tmp.
DELETE X FROM X, ( SELECT DISTINCT account FROM tmp ) y WHERE X.account = y.account; [Check the syntax, and test before going into production.]
INSERT INTO X (account, value) SELECT account, value FROM tmp;
If you are using InnoDB, then I recommend BEGIN; DELETE...; INSERT...; COMMIT;. This will keep other connections from finding rows deleted that are about to be re-inserted.
If that is too slow; let's talk further.
An aside: INT(32) -- the "(32)" means nothing. An INT is a 4-byte integer regardless of the value after it.
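Putting those steps together, a minimal sketch (assuming tmp(account, value) is already populated with the full incoming set):

-- Replace each touched account's rows atomically.
BEGIN;
DELETE X FROM X
JOIN ( SELECT DISTINCT account FROM tmp ) y ON X.account = y.account;
INSERT INTO X (account, value)
SELECT account, value FROM tmp;
COMMIT;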
Use a delete with a NOT IN list, thus leaving the existing records alone and only removing those you didn't provide in your list:
DELETE FROM X
WHERE account = 5
  AND value NOT IN (0, 2, 3, ..., 99);
Now, if you have a way of identifying 0,2,3,...,99 without having to pass the list in, all the better, as you may run into a limit using an IN clause.
This is all contained in one statement, making it less vulnerable to problems with multiple users playing with the same account.

How to ignore duplicate keys when extracting data using OPENQUERY while joining two tables?

I am trying to insert records into a MySQL database from MS SQL Server using OPENQUERY, but I want to ignore the duplicate key messages: when the query runs into a duplicate, ignore it and keep going.
What ideas can I use to ignore the duplicates?
Here is what I am doing:
Pulling records from MySQL using OPENQUERY to define the MySQL A.record_id.
Joining those records to records in MS SQL Server with specific criteria (not a direct id); from here I find a new related B.new_id record identifier in SQL Server.
I want to insert the found results into a new table in MySQL, like so: A.record_id, B.new_id. In the new table I have A.record_id set as the primary key.
The problem is that when joining table A to table B, sometimes I find 2+ records in table B matching the criteria I am looking for, which causes the value A.record_id to appear 2+ times in my data set before inserting into the new table, which causes the problem. Note: I know I can use an aggregate function to eliminate the records.
I don't think there is a specific option. But it is easy enough to do:
insert into oldtable(. . .)
select . . .
from newtable
where not exists (select 1 from oldtable where oldtable.id = newtable.id)
If there is more than one set of unique keys, you can add additional not exists conditions, for example:
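A sketch with a second unique key (the altkey column here is hypothetical, purely for illustration):

insert into oldtable(. . .)
select . . .
from newtable
where not exists (select 1 from oldtable where oldtable.id = newtable.id)
  and not exists (select 1 from oldtable where oldtable.altkey = newtable.altkey)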
EDIT:
For the revised problem:
insert into oldtable(. . .)
select . . .
from (select nt.*, row_number() over (partition by id order by (select null)) as seqnum
from newtable nt
) nt
where seqnum = 1 and
not exists (select 1 from oldtable where oldtable.id = nt.id);
The row_number() function assigns a sequential number to each row within a group of rows. The group is defined by the partition by clause. The numbers start at 1 and increment from there. The order by (select null) clause says that you don't care about the order. Exactly one row with each id will have a value of 1; duplicate rows will have a value larger than one. The seqnum = 1 condition chooses exactly one row per id.
If you are on SQL Server 2008+, you can use MERGE to do an INSERT if the row does not exist, or an UPDATE otherwise.
Example:
MERGE
INTO dataValue dv
USING tmp_holding_DataValue t
ON t.dateStamp = dv.dateStamp
AND t.itemId = dv.itemId
WHEN NOT MATCHED THEN
INSERT (dateStamp, itemId, value)
VALUES (t.dateStamp, t.itemId, t.value)
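If you also want the UPDATE half of the upsert, a sketch extending the same statement (assuming value is the only column that needs refreshing):

MERGE
INTO dataValue dv
USING tmp_holding_DataValue t
ON t.dateStamp = dv.dateStamp
AND t.itemId = dv.itemId
WHEN MATCHED THEN
UPDATE SET dv.value = t.value
WHEN NOT MATCHED THEN
INSERT (dateStamp, itemId, value)
VALUES (t.dateStamp, t.itemId, t.value);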

Why would a SQL MERGE have a duplicate key error, even with HOLDLOCK declared?

There is a lot of information that I could find on SQL Merge, but I can't seem to get this working for me. Here's what's happening.
Each day I'll be getting an Excel file uploaded to a web server with a few thousand records, each record containing 180 columns. These records contain both new information which would have to use INSERT, and updated information which will have to use UPDATE. To get the information to the database, I'm using C# to do a Bulk Copy to a temp SQL 2008 table. My plan was to then perform a Merge to get the information into the live table. The temp table doesn't have a Primary Key set, but the live table does. In the end, this is how my Merge statement would look:
MERGE Table1 WITH (HOLDLOCK) AS t1
USING (SELECT * FROM Table2) AS t2
ON t1.id = t2.id
WHEN MATCHED THEN
UPDATE SET t1.col1 = t2.col1, t1.col2 = t2.col2, ... t1.colx = t2.colx
WHEN NOT MATCHED BY TARGET THEN
INSERT (col1,col2,...colx)
VALUES(t2.col1,t2.col2,...t2.colx);
Even when including the HOLDLOCK, I still get the error "Cannot insert duplicate key in object". From what I've read online, HOLDLOCK should allow SQL to read primary keys, but not perform any insert or update until after the task has been executed. I'm basically learning how to use MERGE on the fly, but is there something I have to enable for SQL 2008 to pick up on MERGE locks?
I found a way around the problem and wanted to post the answer here, in case it helps anyone else. It looks like MERGE wouldn't work for what I needed, since the temporary table being used had duplicate records in the column used as the primary key in the live table. The solution I came up with was to create the stored procedure below.
-- Start with the insert
INSERT INTO LiveTable (A, B, C, D, id)
-- Filter rows to get a unique id
SELECT A, B, C, D, id FROM (
    SELECT A, B, C, D, id,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS row_number
    FROM TempTable
    WHERE NOT EXISTS (
        SELECT id FROM LiveTable WHERE LiveTable.id = TempTable.id)
) AS ROWS
WHERE row_number = 1

-- Continue with the update
-- Covers ids skipped during the insert
UPDATE L
SET
    L.A = T.A,
    L.B = T.B,
    L.C = T.C,
    L.D = T.D
FROM LiveTable L
INNER JOIN TempTable T
    ON L.id = T.id
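For reference, a single-statement alternative sketch is to deduplicate inside the MERGE source itself, so the duplicate ids never reach the target (this keeps an arbitrary row per id, just like the ROW_NUMBER filter above):

MERGE LiveTable WITH (HOLDLOCK) AS L
USING (
    SELECT A, B, C, D, id
    FROM ( SELECT A, B, C, D, id,
                  ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS rn
           FROM TempTable ) s
    WHERE rn = 1
) AS T
ON L.id = T.id
WHEN MATCHED THEN
    UPDATE SET L.A = T.A, L.B = T.B, L.C = T.C, L.D = T.D
WHEN NOT MATCHED THEN
    INSERT (A, B, C, D, id)
    VALUES (T.A, T.B, T.C, T.D, T.id);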

Inserting rows to other tables while importing in SSIS

I have a transactions table in a flat file like
ItemID ,ItemName ,CustomerID ,CustomerName ,Qty ,Price ,TotalValue
and the target transactions table will have
ItemID,CustomerID,Qty,Price,TotalValue
Now I have to import it into the transactions table using an SSIS package.
But before importing, I should look up each ItemID and CustomerID in the lookup tables ItemMaster and CustomerMaster; if they are not there, I have to insert new tuples into those tables, take the new ItemID or CustomerID, and import the transaction into the transactions table. It can be done using lookup transformations in SSIS.
Or is it better to import the transactions into a temporary table using an SSIS package, update the new ItemIDs and CustomerIDs in the temporary table, and then insert the transactions from the temp table into the main transactions table?
Which option will be better performance-wise?
There are several ways of doing it:
1. Using a staging table
2. Using Lookup
3. Transforming the stored procedure logic in SSIS
1. Using a staging table
Dump all the flat file data into a staging table. Let's name it StgTransaction. Create a procedure to perform the tasks:
Merge ItemMaster target
using StgTransaction src
on target.ItemID = src.ItemID
WHEN NOT MATCHED THEN
INSERT (ItemName)
values (src.ItemName);

Merge CustomerMaster target
using StgTransaction src
on target.CustomerID = src.CustomerID
WHEN NOT MATCHED THEN
INSERT (CustomerName)
values (src.CustomerName);
with cte (ItemID, ItemName, CustomerID, CustomerName, Qty, Price, TotalValue) as
(
    Select I.ItemID, I.ItemName,
           C.CustomerID, C.CustomerName,
           f.Qty, f.Price, f.TotalValue
    from ItemMaster I inner join StgTransaction f
        on I.ItemName = f.ItemName
    inner join CustomerMaster C
        on C.CustomerName = f.CustomerName
)
Insert into Transactions (ItemID, CustomerID, Qty, Price, TotalValue)
Select ItemID, CustomerID, Qty, Price, TotalValue
from cte
Basically I'm inserting all the missing values into the 2 master tables using the MERGE syntax. Instead of MERGE you can use NOT EXISTS:
Insert into ItemMaster (ItemName)
Select s.ItemName from StgTransaction s
where not exists
    (Select 1 from ItemMaster im
     where im.ItemName = s.ItemName);
Once the missing values are inserted, just join the staging table with the 2 master tables and insert into the target.
Wrap the above query into a procedure and call the procedure after the Data Flow Task (which loads the data from the flat file into the staging table).
2. Using Lookup
The package design will look like this:
You should go with this approach if you are not allowed to create a staging table in your database. It will be slow because of the blocking components (Union All) and the OLE DB Command (which suffers from the RBAR - row by agonizing row - problem).
Steps:
1. Use a Lookup against the ItemMaster table.
2. Create an ItemID column (name it NewItemID) using a Derived Column transformation; it will store the new ItemID generated from the ItemMaster table when the data is loaded. Connect the Lookup to the Derived Column transformation using the No Match Output.
3. The no-match values should be inserted into the ItemMaster table. For this, let's create a procedure which inserts the data and retrieves the ItemID value as an output:
ALTER PROCEDURE usp_InsertMaster
    @ItemName AS varchar(20),
    @id AS INT OUTPUT
AS
INSERT INTO ItemMaster (ItemName)
VALUES (@ItemName)
SET @id = SCOPE_IDENTITY()
-- This works if your ID is an Identity value; otherwise use the OUTPUT clause to retrieve the ID
4. Call this procedure in an OLE DB Command and map the output parameter to the column created in the Derived Column transformation, as sketched below.
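The SqlCommand property of the OLE DB Command would then look something like this (a sketch; the two ? parameters are mapped to the ItemName input column and the NewItemID column in the component's mappings):

EXEC usp_InsertMaster ?, ? OUTPUT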
After the OLE DB Command, use a Union All to combine the rows from the matched and no-match paths, and then follow the same procedure for the CustomerMaster table.
3. Transforming the procedure logic in SSIS
The package design is:
1. Load the data into staging.
2. Use Merge or Not Exists in an Execute SQL Task to load the missing values into the 2 master tables.
3. Use a Data Flow Task with staging as the source and 2 Lookups against the master tables. Since all the missing values are already inserted into the master tables, there won't be any Lookup no-match output; just connect the Lookup match output to the OLE DB Destination (the transactions table).
IMHO I think the 1st approach will be the fastest. The problem arises only because there are 2 master tables which need to be updated, and along with that you have to get the inserted IDs and load them into the target table, so doing it synchronously is difficult.

MySQL INSERT IF (custom if statements)

First, here's the concise summary of the question:
Is it possible to run an INSERT statement conditionally?
Something akin to this:
IF(expression) INSERT...
Now, I know I can do this with a stored procedure.
My question is: can I do this in my query?
Now, why would I want to do that?
Let's assume we have the following 2 tables:
products: id, qty_on_hand
orders: id, product_id, qty
Now, let's say an order for 20 Voodoo Dolls (product id 2) comes in.
We first check if there's enough Quantity On Hand:
SELECT IF(
( SELECT SUM(qty) FROM orders WHERE product_id = 2 ) + 20
<=
( SELECT qty_on_hand FROM products WHERE id = 2)
, 'true', 'false');
Then, if it evaluates to true, we run an INSERT query.
So far so good.
However, there's a problem with concurrency.
If 2 orders come in at the exact same time, they might both read the quantity-on-hand before any one of them has entered the order.
They'll then both place the order, thus exceeding the qty_on_hand.
So, back to the root of the question:
Is it possible to run an INSERT statement conditionally, so that we can combine both these queries into one?
I searched around a lot, and the only type of conditional INSERT statement that I could find was ON DUPLICATE KEY, which obviously does not apply here.
INSERT INTO TABLE
SELECT value_for_column1, value_for_column2, ...
FROM wherever
WHERE your_special_condition
If no rows are returned from the select (because your special condition is false) no insert happens.
Using your schema from the question (assuming your id column is auto_increment):
insert into orders (product_id, qty)
select 2, 20 from dual
where (SELECT qty_on_hand FROM products WHERE id = 2) >= 20;
This will insert no rows if there's not enough stock on hand, otherwise it will create the order row.
Nice idea btw!
Try:
INSERT INTO orders(product_id, qty)
SELECT 2, 20 FROM products WHERE id = 2 AND qty_on_hand >= 20
If a product with id equal to 2 exists and the qty_on_hand is greater or equal to 20 for this product, then an insert will occur with the values product_id = 2, and qty = 20. Otherwise, no insert will occur.
Note: If your product ids are not unique, you might want to add a LIMIT clause at the end of the SELECT statement, as shown below.
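For example (LIMIT 1 guards against inserting multiple order rows when several product rows match):

INSERT INTO orders (product_id, qty)
SELECT 2, 20 FROM products
WHERE id = 2 AND qty_on_hand >= 20
LIMIT 1;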
I'm not sure about concurrency - you'll need to read up on locking in MySQL - but this will let you be sure that you only take 20 items if 20 items are available:
update products
set qty_on_hand = qty_on_hand - 20
where qty_on_hand >= 20
and id=2
You can then check how many rows were affected. If none were affected, you did not have enough stock. If 1 row was affected, you have effectively consumed the stock.
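In MySQL the affected-row check can be done right after the UPDATE, for example:

-- 1 means the stock was consumed; 0 means there was not enough on hand.
SELECT ROW_COUNT();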
You're probably solving the problem the wrong way.
If you're afraid two read-operations will occur at the same time and thus one will work with stale data, the solution is to use locks or transactions.
Have the query do this:
lock table for read
read table
update table
release lock
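With InnoDB, a sketch of that idea using a transaction and a row lock (table and column names taken from the question; the stock check itself happens in application code):

START TRANSACTION;
-- Lock the product row so concurrent orders serialize on it.
SELECT qty_on_hand FROM products WHERE id = 2 FOR UPDATE;
-- If the application decides there is enough stock:
INSERT INTO orders (product_id, qty) VALUES (2, 20);
COMMIT;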
I wanted to insert into a table using explicit values, so I found this solution to insert the values using an IF condition:
DELIMITER $$
CREATE PROCEDURE insertIssue()
BEGIN
IF (1 NOT IN (select I.issue_number from issue as I where I.series_id = 1)) THEN
INSERT IGNORE INTO issue ( issue_number, month_published, year_published, series_id, mcs_issue_id) VALUES (1, 1, 1990, 1, 1);
END IF;
END$$
DELIMITER ;
If you later on want to call the procedure, it's as simple as:
CALL insertIssue();
You can find more information about procedures and IF conditions on this site.