Merge two databases, without duplicates, with FKs references [duplicate] - mysql

I have two mdb files.
I can also convert them to a MySQL database, if necessary.
How can I merge these two different dbs into a single one?
The idea is to get all the info from both dbs and merge it into one, without duplicating any client.
The problem is that both dbs have some of the same clients, plus different ones, but the PKs of the shared clients aren't the same in each.
Every row has a unique field; I guess it can help somehow.
Any idea how I can do that?

Selecting a UNION of all columns except the PKs will give you only distinct rows:
insert into new_table (<non-pk columns>)
select <non-pk columns> from tableA
union
select <non-pk columns> from tableB
Note: union removes duplicates.
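To make the effect concrete, here is a minimal sketch of the same statement, run against SQLite through Python's sqlite3 module purely so it is self-contained. The name and tel columns and the sample rows are assumptions for illustration, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, name TEXT, tel TEXT);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, name TEXT, tel TEXT);
    CREATE TABLE new_table (id INTEGER PRIMARY KEY, name TEXT, tel TEXT);
    INSERT INTO tableA VALUES (1, 'Ann', '111'), (2, 'Bob', '222');
    -- Ann exists in both sources under a different primary key.
    INSERT INTO tableB VALUES (7, 'Ann', '111'), (8, 'Cid', '333');
    -- The PK is left out, so UNION can collapse the two Ann rows.
    INSERT INTO new_table (name, tel)
        SELECT name, tel FROM tableA
        UNION
        SELECT name, tel FROM tableB;
""")
merged = conn.execute("SELECT name, tel FROM new_table ORDER BY name").fetchall()
print(merged)
```

Keep in mind that UNION only collapses rows that are identical in every selected column, so a client whose phone number is filled in one database and NULL in the other would still end up twice.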

I would run an UPDATE to populate one of the tables with all the info available.
Assuming the first table has every name that the second table has (that is, there are no name values in table 2 that are not in table 1), you should be able to run the following update to make the first table complete:
update tclient1 t join (select name,
max(tel) as tel_filled,
max(address) as add_filled
from (select name, tel, address
from tclient1
union all
select name, tel, address
from tclient2) u
group by name) x on t.name = x.name
set t.tel = x.tel_filled, t.address = x.add_filled;
See fiddle: http://sqlfiddle.com/#!2/3e7dc/1/0
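The update relies on MAX() skipping NULLs, so grouping by name picks up whichever table has the value filled in. The following sketch shows only that inner query, run on SQLite via Python for convenience; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tclient1 (name TEXT, tel TEXT, address TEXT);
    CREATE TABLE tclient2 (name TEXT, tel TEXT, address TEXT);
    INSERT INTO tclient1 VALUES ('Ann', '111', NULL), ('Bob', NULL, NULL);
    INSERT INTO tclient2 VALUES ('Ann', NULL, '1 Main St'), ('Bob', '222', '2 Oak Ave');
""")
# Aggregates ignore NULL, so each column ends up with the non-NULL value.
filled = conn.execute("""
    SELECT name, MAX(tel) AS tel_filled, MAX(address) AS add_filled
    FROM (SELECT name, tel, address FROM tclient1
          UNION ALL
          SELECT name, tel, address FROM tclient2) AS u
    GROUP BY name
    ORDER BY name
""").fetchall()
print(filled)
```

If both tables hold different non-NULL values for the same client, MAX() simply keeps the larger one, which may not be the one you want.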

Disable foreign key checks (in MySQL: SET FOREIGN_KEY_CHECKS = 0)
Update the keys in the 2nd DB to make them unique across both databases, for instance:
update Client
set id_client = id_client + 100000000;
update History
set id_client = id_client + 100000000,
id_history = id_history + 10000000;
Enable FKs to check integrity
Export 2nd DB as SQL-inserts and execute it in the 1st DB.
Use backups, please.
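Shifting the keys keeps the FKs consistent, but clients present in both databases would still be copied twice. Since the question mentions a unique field on every row, one possible alternative is to match clients on that field and translate the FKs while copying. The sketch below is only an illustration under assumptions: the tax_no column stands in for the unique field, the db1_/db2_ prefixes stand in for the two databases, and it runs on SQLite via Python so that it is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- db1_* is the target database, db2_* the one being merged in.
    CREATE TABLE db1_client (id_client INTEGER PRIMARY KEY, tax_no TEXT UNIQUE, name TEXT);
    CREATE TABLE db1_history (id_history INTEGER PRIMARY KEY, id_client INTEGER, note TEXT);
    CREATE TABLE db2_client (id_client INTEGER PRIMARY KEY, tax_no TEXT UNIQUE, name TEXT);
    CREATE TABLE db2_history (id_history INTEGER PRIMARY KEY, id_client INTEGER, note TEXT);
    INSERT INTO db1_client VALUES (1, 'T-1', 'Ann'), (2, 'T-2', 'Bob');
    INSERT INTO db1_history VALUES (1, 1, 'ann order');
    INSERT INTO db2_client VALUES (1, 'T-2', 'Bob'), (2, 'T-3', 'Cid');
    INSERT INTO db2_history VALUES (1, 1, 'bob call'), (2, 2, 'cid call');

    -- 1. Add only the clients the target does not know yet; new PKs are generated.
    INSERT INTO db1_client (tax_no, name)
        SELECT c2.tax_no, c2.name FROM db2_client c2
        WHERE NOT EXISTS (SELECT 1 FROM db1_client c1 WHERE c1.tax_no = c2.tax_no);

    -- 2. Copy child rows, translating the old FK to the target PK via tax_no.
    INSERT INTO db1_history (id_client, note)
        SELECT c1.id_client, h2.note
        FROM db2_history h2
        JOIN db2_client c2 ON c2.id_client = h2.id_client
        JOIN db1_client c1 ON c1.tax_no = c2.tax_no;
""")
rows = conn.execute("""
    SELECT c.name, h.note FROM db1_history h
    JOIN db1_client c ON c.id_client = h.id_client
    ORDER BY h.note
""").fetchall()
client_count = conn.execute("SELECT COUNT(*) FROM db1_client").fetchone()[0]
print(rows, client_count)
```

Bob exists in both sources but ends up once in the target, and his history row from the second database points at his existing key.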

Here is one approach that assumes that name is the match between the two rows. It just counts the number of columns that are filled in and chooses the appropriate source. This version uses union all with a comparison in the where using >= or >:
insert into client(id, name, tel, address)
select c1.id, c1.name, c1.tel, c1.address
from db1.client c1
where ((c1.id is not null) + (c1.tel is not null) + (c1.address is not null)) >=
(select (c2.id is not null) + (c2.tel is not null) + (c2.address is not null)
from db2.client c2
where c1.name = c2.name
)
union all
select c2.id, c2.name, c2.tel, c2.address
from db2.client c2
where ((c2.id is not null) + (c2.tel is not null) + (c2.address is not null)) >
(select (c1.id is not null) + (c1.tel is not null) + (c1.address is not null)
from db1.client c1
where c2.name = c1.name
);
Note: the above version assumes that name is in both tables (as in the example in your question) and there are no duplicates. It can be easily modified if this isn't the case.

Related

MySQL - INSERT multiple values conditionally

Ok, so I have a table which holds bets on games.
The table holds the columns: user_id, event_id, bet.
A user can send his/her (multiple) bets to the server in one request.
I need to insert multiple bets using one query, while checking that none of the bets
are on an event that already started/finished.
In case of at least 1 started/finished event, I don't really care if the whole query cancels, or just ignores the 'unqualified' bets.
Question
How can I insert multiple bets (rows) with one query, while conditioning the insert on a select statement (which checks for each of the events' statuses)?
Here is the query I would've used if it worked (and it doesn't of course):
INSERT INTO bet_on_event (user_id, event_id, bet)
VALUES (1,5,1), (1,6,2)
IF (SELECT COUNT(*) FROM events WHERE _id IN(5,6) AND status=0) = ?;
Explanation
1. As mentioned, the values are pre-made - requested by the user.
2. Games/events have status. 0 means a game hasn't started, so it's ok to bet.
3. The select statement just counts how many of the requested events have status 0.
4. The 'IF' should check if the count from (3) equals the number of events the user requested to bet on, thus confirming that all the events are ok to bet on.
The 'IF' should be replaced with something that works, and the whole statement can be replaced if you have a better idea for what I'm trying to achieve.
A simpler query (which isn't enough for my case, but works with 1 row) is:
INSERT INTO bet_on_event (user_id, event_id, bet)
SELECT 1,5,1 FROM dual
WHERE (SELECT COUNT(*) FROM events WHERE _id IN(5,6) AND status=0) = ?;
Any idea? Is this even possible? Betting is gonna be used a lot, so I want to do it as quick as possible - with 1 query.
Thank you so much.
EDIT
That is what I ended up doing, taken from Thorsten's answer (I changed it to a dynamically built query, as that is what I need):
var query='INSERT INTO bet_on_event (user_id, event_id, bet)';
for(var i=0; i<eventIds.length; i++){
query+= ' SELECT ' + userId + ',' + eventIds[i] + ',' + bets[i]
+ ' FROM dual WHERE EXISTS (SELECT * FROM events WHERE id = ' + eventIds[i]
+ ' AND Status = 0)';
if(i < eventIds.length - 1){
query += ' UNION ALL';
}
}
Where eventIds and bets are in a corresponding order (like a map)
EDIT 2
I also wanted to UPDATE the bets which already exist (in case the user wanted to...).
So there's a need to update each row with the relevant bet in the array. This is the solution:
ON DUPLICATE KEY UPDATE bet=VALUES(bet)
Just added (concatenated) to the end of the query...
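Concatenating userId, eventIds and bets straight into the SQL text leaves the query open to injection if any of those values come from the request. A possible variant is to build the same UNION ALL shape with placeholders and pass the values as bound parameters. The sketch below uses Python with SQLite only to stay runnable; the placeholder style depends on the driver, and older MySQL versions need FROM dual before each WHERE as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (_id INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE bet_on_event (user_id INTEGER, event_id INTEGER, bet INTEGER,
                               PRIMARY KEY (user_id, event_id));
    INSERT INTO events VALUES (5, 0), (6, 1);  -- event 6 has already started
""")

user_id, event_ids, bets = 1, [5, 6], [1, 2]

# One SELECT branch per bet; every value travels as a bound parameter.
branch = ("SELECT ?, ?, ? WHERE EXISTS "
          "(SELECT 1 FROM events WHERE _id = ? AND status = 0)")
sql = ("INSERT INTO bet_on_event (user_id, event_id, bet) "
       + " UNION ALL ".join([branch] * len(event_ids)))
# Flatten to (user_id, event_id, bet, event_id) per branch, in order.
params = [v for e, b in zip(event_ids, bets) for v in (user_id, e, b, e)]
conn.execute(sql, params)

placed = conn.execute("SELECT user_id, event_id, bet FROM bet_on_event").fetchall()
print(placed)
```

Only the bet on event 5 is stored, because event 6 no longer has status 0.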
Does this work for you? It inserts 1,5,1 if there is no event for id 5 that has started. Same for 1,6,1 and id 6.
INSERT INTO bet_on_event (user_id, event_id, bet)
SELECT 1,5,1 FROM dual WHERE NOT EXISTS
(SELECT * FROM events WHERE _id = 5 AND Status <> 0)
UNION ALL
SELECT 1,6,1 FROM dual WHERE NOT EXISTS
(SELECT * FROM events WHERE _id = 6 AND Status <> 0);
EDIT: If you don't want to insert anything in case one or more of the games have started, you can simply replace WHERE _id = 5 and WHERE _id = 6 with WHERE _id IN (5,6). Or have just one exists clause:
INSERT INTO bet_on_event (user_id, event_id, bet)
SELECT *
FROM
(
SELECT 1 AS user_id, 5 AS event_id, 1 AS bet FROM dual
UNION ALL
SELECT 1,6,1 FROM dual
) tmp
WHERE NOT EXISTS (SELECT * FROM events WHERE _id IN (5,6) AND Status <> 0);
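As a rough illustration of the all-or-nothing variant, the sketch below wraps it in a hypothetical helper, place_all_or_nothing, and runs it on SQLite through Python. The helper name, the sample events and the dict of bets are assumptions made for the example.

```python
import sqlite3

def place_all_or_nothing(conn, user_id, bets):
    """Insert every event_id -> bet pair, or nothing if any event has started.

    Returns the number of rows in bet_on_event afterwards.
    """
    ids = list(bets)
    rows = " UNION ALL ".join(
        ["SELECT ? AS user_id, ? AS event_id, ? AS bet"] * len(ids))
    marks = ", ".join(["?"] * len(ids))
    sql = ("INSERT INTO bet_on_event (user_id, event_id, bet) "
           f"SELECT user_id, event_id, bet FROM ({rows}) AS tmp "
           "WHERE NOT EXISTS (SELECT 1 FROM events "
           f"WHERE _id IN ({marks}) AND status <> 0)")
    # Row values come first in the SQL text, then the ids for the IN list.
    params = [v for e, b in bets.items() for v in (user_id, e, b)] + ids
    conn.execute(sql, params)
    return conn.execute("SELECT COUNT(*) FROM bet_on_event").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (_id INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE bet_on_event (user_id INTEGER, event_id INTEGER, bet INTEGER);
    INSERT INTO events VALUES (5, 0), (6, 1), (7, 0);
""")
blocked = place_all_or_nothing(conn, 1, {5: 1, 6: 2})   # event 6 has started
accepted = place_all_or_nothing(conn, 1, {5: 1, 7: 2})  # both still open
print(blocked, accepted)
```

The first call stores nothing because one of the two events has started; the second call stores both bets.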
Have you tried using UNION?
INSERT INTO bet_on_event (user_id, event_id, bet)
(SELECT 1,5,1 FROM dual
WHERE (SELECT COUNT(*) FROM events WHERE _id IN(5,6) AND status=0) = ?
UNION
SELECT 1,6,2 FROM dual
WHERE (SELECT COUNT(*) FROM events WHERE _id IN(5,6) AND status=0) = ? );

Using MERGE in SQL Server 2008

I just found out about this nifty little feature. I have a couple of questions. Consider the statement below.
This is how I interpret how it works: the USING clause is what gets compared to see if there is a match, correct? I want to use it as it is now, but I also want to use 2 other columns from the source table in the MATCHED portion, and I can't do that. So is there a way that I can use the 2 columns (decesed (I know it's spelled wrong :) ) and hicno_enc)?
Another thing I would like to do, and I don't know if it is possible: if the row exists in target but not source, then mark it inactive.
SELECT FIRST_NAME, LAST_NAME, SEX1, BIRTH_DATE
FROM
aco.tmpimport i
INNER JOIN aco.patients p
ON p.hicnoenc = i.hicno_enc
MERGE aco.patients AS target
USING (
SELECT FIRST_NAME, LAST_NAME, SEX1, BIRTH_DATE
FROM aco.tmpimport
) AS source
ON target.hicnoenc = source.hicno_enc
WHEN MATCHED AND target.isdeceased <> CONVERT(BIT,source.decesed) THEN
UPDATE
SET
target.isdeceased = source.decesed,
updatedat = getdate(),
updatedby = 0
WHEN NOT MATCHED THEN
INSERT (firstname, lastname, gender, dob, isdeceased, hicnoenc)
VALUES (source.FIRST_NAME,
source.LAST_NAME,
source.sex1,
source.BIRTH_DATE,
source.decesed,
source.hicno_enc);
So is there a way that I can use the 2 columns (decesed (I know its spelled wrong :) ) and hicno_enc)?
Add the columns you need in the select statement in the using clause.
USING (
SELECT FIRST_NAME, LAST_NAME, SEX1, BIRTH_DATE, decesed, hicno_enc
FROM aco.tmpimport
) AS source
if the row exists in target but not source, then mark it inactive.
Add a when not matched by source clause and do the update.
WHEN NOT MATCHED BY SOURCE THEN
UPDATE
SET active = 0
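For readers who want to see what the three branches do to the data, here is a small sketch that emulates them with separate statements. It is not SQL Server's MERGE: it runs on SQLite via Python, uses a cut-down version of the tables from the question, and assumes an active column on the target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (hicnoenc TEXT PRIMARY KEY, isdeceased INTEGER,
                           active INTEGER DEFAULT 1);
    CREATE TABLE tmpimport (hicno_enc TEXT PRIMARY KEY, decesed INTEGER);
    INSERT INTO patients (hicnoenc, isdeceased) VALUES ('A', 0), ('B', 0);
    INSERT INTO tmpimport VALUES ('A', 1), ('C', 0);

    -- WHEN MATCHED AND the flag differs: update the target row.
    UPDATE patients
    SET isdeceased = (SELECT s.decesed FROM tmpimport s
                      WHERE s.hicno_enc = patients.hicnoenc)
    WHERE EXISTS (SELECT 1 FROM tmpimport s
                  WHERE s.hicno_enc = patients.hicnoenc
                    AND s.decesed <> patients.isdeceased);

    -- WHEN NOT MATCHED BY SOURCE: the row is missing from the import.
    UPDATE patients SET active = 0
    WHERE NOT EXISTS (SELECT 1 FROM tmpimport s
                      WHERE s.hicno_enc = patients.hicnoenc);

    -- WHEN NOT MATCHED (by target): the import has a row the target lacks.
    INSERT INTO patients (hicnoenc, isdeceased)
    SELECT s.hicno_enc, s.decesed FROM tmpimport s
    WHERE NOT EXISTS (SELECT 1 FROM patients p WHERE p.hicnoenc = s.hicno_enc);
""")
result = conn.execute(
    "SELECT hicnoenc, isdeceased, active FROM patients ORDER BY hicnoenc").fetchall()
print(result)
```

Patient A is updated, B is marked inactive because it is absent from the import, and C is inserted.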

finding duplicate rows on more than one field

I am using this query to find duplicates based on two fields:
SELECT
last_name,
first_name,
middle_initial,
COUNT(last_name) AS Duplicates,
IF(rec_id = '', 1, 0) AS has_REC_ID
FROM files
GROUP BY last_name, first_name
HAVING COUNT(last_name) > 1 AND COUNT(first_name) > 1;
Okay, what this returns is a set of rows with first, last, and middle names, a column called 'Duplicates' with a lot of 2s, and a column called has_REC_ID with mixed 1s and 0s.
Ultimately, what I'm trying to do is find which rows have matching first and last names--and then for each of those pairs, find the one that has ('') as a value for rec_id, assign the rec_id value from the one that DOES have a rec_id, and then delete the record that had a rec_id in the first place.
So for starters I thought I would create a new column and do something like this:
UPDATE files a
SET a.has_dup -- new column
= if(a.last_name IN (
SELECT b.last_name
FROM files b
GROUP BY b.last_name
HAVING COUNT(b.last_name) > 1
)
, 1, null);
But MySQL returns: "You can't specify target table 'a' for update in from clause"
I'll bet there's something much less ridiculous than the method I'm trying here. Can someone please help me figure out what that is?
UPDATE: I also tried:
UPDATE files a
SET a.has_dup = 1
WHERE a.last_name IN (
SELECT b.last_name
FROM files b
GROUP BY b.last_name
HAVING COUNT(b.last_name) > 1
);
...and got the same error message.
You could:
1) Create a holding table
2) Populate the holding table with those rows that have a matching first and last name and have rec_id != ""
3) Delete the rows from the original table (files) that have a matching first and last name and have rec_id != ""
4) Update the rows in the original table that have a matching first and last name and have rec_id = "".
5) Drop the holding table
So something like:
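create table temp
(
first_name varchar(100) not null,
last_name varchar(100) not null,
rec_id varchar(100) not null -- assumed: use the same type as files.rec_id
);
-- save the rec_id of every name pair that occurs more than once
insert into temp (first_name, last_name, rec_id)
select f.first_name, f.last_name, f.rec_id
from files f
join (select first_name, last_name from files
group by first_name, last_name having count(*) > 1) d
on d.first_name = f.first_name and d.last_name = f.last_name
where f.rec_id != '';
-- delete the duplicated rows that carried a rec_id
delete f from files f
join temp t on t.first_name = f.first_name and t.last_name = f.last_name
where f.rec_id != '';
-- copy the saved rec_id onto the surviving rows
update files f
join temp t on t.first_name = f.first_name and t.last_name = f.last_name
set f.rec_id = t.rec_id
where f.rec_id = '';
drop table temp;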
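The five steps can be tried end to end with a tiny data set. The sketch below is one possible rendering on SQLite through Python, so the syntax differs from MySQL (no DELETE ... JOIN); the holding table name, the id column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, first_name TEXT,
                        last_name TEXT, rec_id TEXT);
    INSERT INTO files (first_name, last_name, rec_id) VALUES
        ('Ann', 'Lee', ''), ('Ann', 'Lee', 'R1'), ('Bob', 'Ray', 'R2');

    -- Steps 1-2: holding table with the rec_id of each duplicated name pair.
    CREATE TEMP TABLE holding AS
        SELECT first_name, last_name, rec_id FROM files f
        WHERE rec_id <> ''
          AND (SELECT COUNT(*) FROM files d
               WHERE d.first_name = f.first_name
                 AND d.last_name = f.last_name) > 1;

    -- Step 3: delete the duplicated rows that carried a rec_id.
    DELETE FROM files
    WHERE rec_id <> ''
      AND EXISTS (SELECT 1 FROM holding h
                  WHERE h.first_name = files.first_name
                    AND h.last_name = files.last_name);

    -- Step 4: give the surviving rows the saved rec_id.
    UPDATE files
    SET rec_id = (SELECT h.rec_id FROM holding h
                  WHERE h.first_name = files.first_name
                    AND h.last_name = files.last_name)
    WHERE rec_id = ''
      AND EXISTS (SELECT 1 FROM holding h
                  WHERE h.first_name = files.first_name
                    AND h.last_name = files.last_name);

    -- Step 5: drop the holding table.
    DROP TABLE holding;
""")
survivors = conn.execute(
    "SELECT first_name, last_name, rec_id FROM files ORDER BY first_name").fetchall()
print(survivors)
```

The two Ann Lee rows collapse into one that carries R1, while the unduplicated Bob Ray row is left alone.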
From the documentation:
Currently, you cannot update a table and select from the same table in a subquery.
I can't think of a quick workaround to that.
Update
Apparently, there is a "quick" workaround, but whether or not it's performant is another issue. It's all about adding a new layer of indirection by introducing a temporary table:
UPDATE files a
SET a.has_dup -- new column
= if(a.last_name IN (
SELECT b.last_name
FROM
(SELECT * FROM files) -- new table target
b
GROUP BY b.last_name
HAVING COUNT(b.last_name) > 1
),
1, null);
I don't have MySQL available to test, but I think this should work (EDITED: it fails):
UPDATE files
SET has_dup
= if(last_name IN (
SELECT b.last_name
FROM files b
GROUP BY b.last_name
HAVING COUNT(b.last_name) > 1
)
, 1, null);
EDITED: Another try:
UPDATE files f, (SELECT b.last_name
FROM files b
GROUP BY b.last_name
HAVING COUNT(b.last_name) > 1
) as duplicates
SET f.has_dup = 1
WHERE f.last_name = duplicates.last_name

sql query for deleting rows with NOT IN using 2 columns

I have a table with a composite key composed of 2 columns, say Name and ID. I have some service that gets me the keys (name, id combination) of the rows to keep; the rest I need to delete. If the key were only 1 column, I could use
delete from table_name where name not in (list_of_valid_names)
but how do I make the query so that I can say something like
name not in (valid_names) and id not in(valid_ids)
// this won't work, since separately they don't identify a unique record. Or will it?
Use MySQL's "multiple value" IN syntax:
delete from table_name
where (name, id) not in (select name, id from some_table where some_condition);
If your list is a literal list, you can still use this approach:
delete from table_name
where (name, id) not in (select 'john', 1 union select 'sally', 2);
Actually, no I retract my comment about needing special juice or being stuck with (AND OR'ing all your options).
Since you have a list of values of what you want to retain, dump that into a temporary table. Then do a delete against the base table for what does not exist in the temporary table (left outer join). I suck at MySQL syntax or I'd cobble together your query. The pseudocode is approximate:
DELETE
B
FROM
BASE B
LEFT OUTER JOIN
#RETAIN R
ON R.key1 = B.key1
AND R.key2 = B.key2
WHERE
R.key1 IS NULL
The NOT EXISTS version:
DELETE
b
FROM
BaseTable b
WHERE
NOT EXISTS
( SELECT
*
FROM
RetainTable r
WHERE
(r.key1, r.key2) = (b.key1, b.key2)
)
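To see why the pair has to be compared as a unit, the sketch below loads the keys to keep into a temporary table and deletes everything else with NOT EXISTS. It runs on SQLite via Python for convenience; the retain table and the sample keys are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_name (name TEXT, id INTEGER, PRIMARY KEY (name, id));
    INSERT INTO table_name VALUES ('john', 1), ('john', 2), ('sally', 1), ('sally', 2);
    CREATE TEMP TABLE retain (name TEXT, id INTEGER);
""")
keep = [("john", 1), ("sally", 2)]
conn.executemany("INSERT INTO retain (name, id) VALUES (?, ?)", keep)

# Testing the columns separately matches nothing here: every name and every
# id appears somewhere in the keep list, so no row would be deleted.
naive_hits = conn.execute("""
    SELECT COUNT(*) FROM table_name
    WHERE name NOT IN ('john', 'sally') AND id NOT IN (1, 2)
""").fetchone()[0]

# Comparing the pair as a unit removes exactly the unwanted combinations.
conn.execute("""
    DELETE FROM table_name
    WHERE NOT EXISTS (SELECT 1 FROM retain r
                      WHERE r.name = table_name.name AND r.id = table_name.id)
""")
remaining = conn.execute(
    "SELECT name, id FROM table_name ORDER BY name, id").fetchall()
print(naive_hits, remaining)
```

The separate NOT IN tests would have left ('john', 2) and ('sally', 1) in place, while the pairwise check removes them.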