I have a table with many columns in SQL Server, and I have to move part of its data into MySQL. I made a view (or function) on the table in SQL Server, and these two databases must be synced once a day through a job, because the data in this view may change every day.
The view returns a table with 3 columns (char, varchar, varchar), none of which is unique or a primary key.
My solution is:
create a job
execute the view on SQL Server
return the result of the view
create a temp table with 3 columns in MySQL
move the view result from SQL Server into the temp table
move records from the temp table to the new table one by one, if they do not already exist there
delete the temp table
To transfer without using the temp table, I wanted to use the type of query below, but could not find the correct syntax; that's why I used the temp table:
insert into new_table
values (array of records) where record does not exist in new_table.
And for the solution I mentioned above, I used the following query:
insert into new_table
select *
from temp_table
where not exist new_table.column = temp_table.column
Do you have a better suggestion for how new records can be fetched and added to the existing records?
It should look more like this:
insert into new_table
select *
from temp_table
where not exists (
    select 1
    from new_table
    where new_table.column = temp_table.column
)
or maybe this:
insert into new_table
select *
from temp_table
where not exists (
    select 1
    from new_table
    where new_table.column = temp_table.column
    and new_table.column2 = temp_table.column2
    and new_table.column3 = temp_table.column3
)
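If you can put a unique index over the three columns (subject to index key-length limits on the varchars), MySQL can also do the duplicate check for you, and the whole transfer becomes a single statement. A minimal sketch, using placeholder column names col1, col2, col3 since the question never names them:

ALTER TABLE new_table ADD UNIQUE KEY uq_new_table_cols (col1, col2, col3);

-- rows that would violate the unique key are silently skipped
INSERT IGNORE INTO new_table (col1, col2, col3)
SELECT col1, col2, col3
FROM temp_table;

Note that INSERT IGNORE also downgrades other errors (such as truncation) to warnings, so the explicit NOT EXISTS form above is the safer choice if that matters.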
I want to update a MySQL table from MATLAB in bulk. The current logic I use iterates over the array and updates rows one by one, which takes way too long.
Here is my current implementation:
function update_table(customer_id_list, cluster_id_list, write_conn)
    % One UPDATE round trip per customer -- works, but slow for large arrays.
    num_customers = size(customer_id_list, 1);
    for idx = 1:num_customers
        customer_id = customer_id_list(idx);
        cluster_id = cluster_id_list(idx);
        sql = sprintf('UPDATE table SET cluster_id = %d WHERE customer_id = %d', ...
            cluster_id, customer_id);
        exec(write_conn, sql);
    end
end
I tried to look for documentation on doing a bulk update/insert, but haven't found anything yet.
Do an "upjoin" using a temporary table.
Build your update specification as a Matlab table array with all the cluster_id and customer_id pairs that specify the new values.
Create a SQL temporary table that contains columns for the key columns you'll be matching on and the columns to update.
CREATE TEMPORARY TABLE my_temp_table SELECT customer_id, cluster_id FROM table WHERE 1 = 0
Batch-insert your update specification data from Matlab into the temporary table using Matlab Database Toolbox's datainsert or sqlwrite.
Update the target table en masse by joining it to the temp table: UPDATE table AS targ INNER JOIN my_temp_table AS upd ON targ.customer_id = upd.customer_id SET targ.cluster_id = upd.cluster_id. (In MySQL's multi-table UPDATE syntax the JOIN comes before SET and there is no FROM clause; the full sequence is sketched below.)
Drop the temp table.
Boom. If you're going to do this a lot, wrap it up in a generic upjoin() function.
See the Matlab documentation for datainsert and sqlwrite. Do not use fastinsert; despite its name, it is much slower than datainsert and sqlwrite.
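Putting the SQL side together, the whole round trip looks something like this; a sketch reusing the table and column names from the steps above, where spec_table stands for the Matlab table built in the first step, and `table` is backticked since TABLE is a reserved word in MySQL (your real target table presumably has another name):

-- 1) empty temp table with the same shape as the key and update columns
CREATE TEMPORARY TABLE my_temp_table
SELECT customer_id, cluster_id FROM `table` WHERE 1 = 0;

-- 2) from Matlab: sqlwrite(write_conn, 'my_temp_table', spec_table)
--    batch-inserts all (customer_id, cluster_id) pairs in one round trip

-- 3) apply every update in a single joined statement
UPDATE `table` AS targ
INNER JOIN my_temp_table AS upd ON targ.customer_id = upd.customer_id
SET targ.cluster_id = upd.cluster_id;

-- 4) clean up
DROP TEMPORARY TABLE my_temp_table;

One statement per step instead of one per row, so the per-query round-trip overhead disappears.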
I'm migrating data from an old database to a new one in SSIS (2008 R2 Enterprise Edition). In the old database, I have a table called [Financial] with a column named [Installments]. This column holds a numeric value: 1, 2, 3 or 4. These are payments in installments. The old database only stores this numeric value and does not provide any more information about the individual installments. The new database, however, provides more information about each installment, with columns like [InstallmentPaid] (whether the customer paid the installment), [DateInstallmentPaid] (when the customer paid the installment), [InstallmentNumber] (this is important, to specify which installment number it is: if the customer wants to pay in 4 installments, then 4 records will be created, one with InstallmentNumber 1, a second with InstallmentNumber 2, a third with InstallmentNumber 3 and a fourth with InstallmentNumber 4) and of course the [InstallmentPrice].
So the old database has the table [Financial] with the column [Installments]. The new database has the same [Financial] table, but instead of a column called [Installments] it has a new related table, [CustInstallments], which has a FK FinancialID (a 1-to-many relationship).
So now that I'm migrating the data from the old database to the new one, I don't want to lose the information about the number of installments. The following logic should be executed in SSIS in order to prevent information loss:
For each [Installments] value in [Financial] in the old database, insert
new [CustInstallments] records referencing the corresponding [FinancialID]
within the new database
So if the numeric value in [Installments] in the old database is 3, then I need to INSERT INTO CustInstallments (FinancialID, InstallmentNumber) VALUES (?, ?) three times. The second ? should be 1 at the first insert, 2 at the second and 3 at the third. So I need some kind of a loop here? Is that even possible within the data flow of SSIS?
Below is a description of my flow so far:
I select the old database source [Financial]
I convert the data so it matches the current database data types
Since I have already migrated the old [Financial] data to the new database, I can use a Lookup on the FinancialIDs in the new database, so the first ? variable of the INSERT query can be linked to the lookup output.
I split out all the possibilities, i.e. when [Installments] contains NULL, 1, 2, 3 or 4.
The 5th step is what I'm looking for: some clue, some direction towards something useful. When [Installments] is 1, I need to INSERT INTO CustInstallments (FinancialID, InstallmentNumber) VALUES (?, ?) with the second ? variable as 1. When [Installments] is 2, I need to do two inserts, one with InstallmentNumber 1 and one with InstallmentNumber 2. When it is 3, then three inserts with an incrementing InstallmentNumber. When 4, then four.
Is there any smart way to achieve this? Is there a built-in SSIS feature that I am not aware of that could be used here?
I appreciate any input here!
Thank you.
EDIT 10/02/2014
I have tried the following code:
INSERT INTO CustInstallments (FinancialID, InstallmentNumber) values (?, X);
WITH nums AS(select top 4 row_number() over (order by (select 1)) AS id
from sys.columns
) SELECT f.* FROM CustInstallments f
JOIN nums n on f.InstallmentNumber>= n.id
But this query doesn't create X records with incrementing installment numbers; the JOIN to nums just replicates the same row X times, so I still can't track every installment individually.
I have written my own code - it took me a while since I had never worked with T-SQL before - and this works like a charm in SQL Server:
DECLARE @MyCounter tinyint;
SET @MyCounter = 1;
WHILE (SELECT COUNT(*) FROM CustInstallments WHERE FinancialID = @ID) < 4
BEGIN
    INSERT INTO CustInstallments (FinancialID, InstallmentNumber) VALUES (@ID, @MyCounter)
    IF (SELECT COUNT(*) FROM CustInstallments WHERE FinancialID = @ID) > 4
        BREAK
    ELSE
        SET @MyCounter = @MyCounter + 1;
    CONTINUE
END
Now in SSIS, I cannot change the @ID to a ?-variable and use the lookup FinancialID, because as soon as I do, I get an error.
Could anyone explain to me why SSIS doesn't like this?
EDIT 10/02/2014
My last and least preferable option would be to use a Multicast to issue the insert query X times, where each branch is an OLE DB Command. For example, when there are 3 [Installments] in the old column, I would create a Multicast with 3 OLE DB Commands, with these SqlCommands:
OLE DB Command 1: INSERT INTO CustInstallments (FinancialID, InstallmentNumber) values (?, 1);
OLE DB Command 2: INSERT INTO CustInstallments (FinancialID, InstallmentNumber) values (?, 2);
OLE DB Command 3: INSERT INTO CustInstallments (FinancialID, InstallmentNumber) values (?, 3);
This is an ugly approach, but with the small amount of data I am using, perhaps it's not a big deal.
I would try to resolve this with T-SQL in your source query. Do a join to some kind of numbers table, like this:
create table #fininancial (id int not null identity(1,1), investments int);
go
insert into #fininancial (investments) values (1),(2);
GO
with nums as (select top 5 row_number() over (order by (select 1)) as id
from sys.columns
)
select f.* from #fininancial f
JOIN nums n on f.investments >= n.id
EDIT:
The above example is unclear - sorry about that. I was only presenting the concept of replicating the rows, but not completing the thought of how you will apply it. Try this out:
create table #fininancial (financialid int not null, investments int);
go
insert into #fininancial (financialid, investments) values (123, 1),(456, 2);
GO
with nums as (select top 5 row_number() over (order by (select 1)) as id
from sys.columns
)
select f.financialid, n.id as investments from #fininancial f
JOIN nums n on n.id <= f.investments
So for each financialid you will get multiple investment rows with different investment ids. This is a set-based way to handle the operation, which will perform better than a procedural method and will require less effort in SSIS. Does that make more sense?
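Applied to the migration itself, the OLE DB Source query could generate the installment rows directly, so the data flow needs no loop, Script Component or Multicast. A sketch, where OldDb.dbo.Financial and its column names are assumptions based on the question:

WITH nums AS (
    SELECT TOP 4 ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS id
    FROM sys.columns
)
SELECT f.FinancialID, n.id AS InstallmentNumber
FROM OldDb.dbo.Financial f
JOIN nums n ON n.id <= f.Installments;

Each output row then maps straight onto one INSERT INTO CustInstallments (FinancialID, InstallmentNumber) VALUES (?, ?); rows where [Installments] is NULL simply drop out of the inner join, which matches the NULL branch of the conditional split.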
We are currently importing very large CSV files into a MySQL data warehouse. A key part of the processing is to flag whether a record in the CSV file matches an existing record in the warehouse. The "match" is done by comparing specific fields in the new data against the previous version of the table. If the record is "new", or if there have been updates, we want to add it to the warehouse.
At the moment the processing plan is as follows :
~ read the CSV file into MySQL table A
~ is the primary key of A present in old-A? If it isn't, set the record status to "NEW"
~ if the key is in old-A, issue an update statement, JOINing old-A to A
~ if A.field1 <> old-A.field1 OR A.field2 <> old-A.field2 OR A.field3 <> old-A.field3 THEN flag the record status as "UPDATE"
~ process NEW or UPDATEd records according to record status
Table sizes on A and old-A are currently on the order of 50M records. We would expect new records to be about 1M, and updates 5-10M.
Although we are currently using MySQL for this processing, I am wondering whether it would simply be better to do this using a scripting language? We are finding in particular that the step to flag the updates is very time consuming. Essentially we have an UPDATE statement that is unable to use any indexes.
so
CREATE TABLE A (key1 bigint,
field1 varchar(50),
field2 varchar(50),
field3 varchar(50) );
LOAD DATA ...
... add field rec_status to table A
... then
UPDATE A
LEFT JOIN old-A ON A.key1 = old-A.key1
SET rec_status = 'NEW'
WHERE old-A.key1 IS NULL;
UPDATE A
JOIN old-A ON A.key1 = old-A.key1
SET rec_status = 'UPDATED'
WHERE A.field1 <> old-A.field1
OR A.field2 <> old-A.field2
OR A.field3 <> old-A.field3;
...
I would consider skipping the "flag" step. Process the CSV file using a script (or MySQL table A using MySQL statements): select a record from the old-A table based on whatever criteria apply, such as field1 and/or field2 of table A. If found, lock and update the old-A record, and delete the processed record from the CSV (or from table A). If not found, create the record in old-A with the new data.
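If the flag step is kept instead, one way to let it avoid the three varchar comparisons (my own suggestion, not from the thread) is to compare a single precomputed hash per row. A sketch, where row_hash is an added column and old-A is written old_A so it is a valid MySQL identifier:

ALTER TABLE A ADD COLUMN row_hash char(32);
ALTER TABLE old_A ADD COLUMN row_hash char(32);

-- populate after LOAD DATA; old_A keeps its hash maintained as rows change
-- (CONCAT_WS skips NULLs, so wrap nullable fields in IFNULL if that distinction matters)
UPDATE A SET row_hash = MD5(CONCAT_WS('|', field1, field2, field3));

-- the flag step now compares one fixed-width column per row
UPDATE A
JOIN old_A ON A.key1 = old_A.key1
SET A.rec_status = 'UPDATED'
WHERE A.row_hash <> old_A.row_hash;

The join still visits every matching row, but each comparison becomes a single short equality test instead of three variable-length ones.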
I want to create an SSIS package which takes values from a flat file and inserts them into a database table depending on their CompanyName.
For example, I have these table fields:
Date        SecurityId  SecurityType  EntryPrice  Price  CompanyName
2011-08-31  5033048     Bond          1.05        NULL   ABC Corp
Now I want to insert the Price into this table, but I need to match on CompanyName,
and in the file the CompanyName is given just as something like 'ABC', so how can I check that and insert only that particular data?
Like this, I have 20 records in my file with different company names.
I did it like this:
In the Lookup I set up the match.
Now my problem is that I need to check the company name from the flat file and insert that company's price into the table, but in the flat file the company name is given like 'AK STL' and in the table it is like 'AK STEEL CORPORATION'. For this I have used a column transformation, but what expression do I write to find the match? It is the same with the other company names; only the first 2-3 characters are in the flat file. Please help.
Basically, you are looking to "upsert" your data into the database. Here is a simple lookup upsert example. With as few records in your dataset as you have said, this method will suffice. With larger datasets, you probably want to look into using temp tables and SQL logic similar to this:
--Insert Portion
INSERT INTO FinalTable
( Columns )
SELECT T.TempColumns
FROM TempTable T
WHERE
(
SELECT 'Bam'
FROM FinalTable F
WHERE F.Key(s) = T.Key(s)
) IS NULL
--Update Portion
UPDATE FinalTable
SET NonKeyColumn(s) = T.TempNonKeyColumn(s)
FROM TempTable T
WHERE FinalTable.Key(s) = T.Key(s)
AND CHECKSUM(FinalTable.NonKeyColumn(s)) <> CHECKSUM(T.NonKeyColumn(s))
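Filled in with the question's columns, that schematic could look like the sketch below; TempPrices and Securities are assumed names for the staging and final tables, and the match key is an exact SecurityId. The fuzzy 'AK STL' vs 'AK STEEL CORPORATION' company-name problem still has to be solved upstream, for example with a hand-maintained mapping table or SSIS's Fuzzy Lookup transformation:

--Insert Portion: rows whose SecurityId is not in the final table yet
INSERT INTO Securities (Date, SecurityId, SecurityType, EntryPrice, Price, CompanyName)
SELECT T.Date, T.SecurityId, T.SecurityType, T.EntryPrice, T.Price, T.CompanyName
FROM TempPrices T
WHERE NOT EXISTS (
    SELECT 1
    FROM Securities F
    WHERE F.SecurityId = T.SecurityId
);

--Update Portion: refresh prices that changed, using the same CHECKSUM pattern as above
UPDATE F
SET F.Price = T.Price
FROM Securities F
JOIN TempPrices T ON F.SecurityId = T.SecurityId
WHERE CHECKSUM(F.Price) <> CHECKSUM(T.Price);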