How do I merge and delete existing primary key on new database? - mysql

I have 2 tables with the same structure.
current_db = 521,892 rows and the primary key is email
new_db = 575,992 rows and the primary key is email
In new_db I want to delete every email address that already exists in current_db.
Question:
How do I compare the tables and delete the existing emails from new_db, so that new_db ends up with only 54,100 rows?

Check the rows you want to delete -
SELECT n.* FROM new_db n
JOIN current_db c
ON n.email = c.email;
Delete them -
DELETE n
FROM new_db n
JOIN current_db c
ON n.email = c.email;

If the ids match in both tables, you can do it in one shot with something like:
delete from new_table where id in (select id from old_table)
Otherwise you have to modify the query according to your matching fields:
delete from new_table where id in (select id from old_table where old_table.field = new_table.field)
Of course you have to pay attention to what you actually delete. My suggestion is to delete in two passes (create a temporary table and change the delete into an insert into temp_table) and then check that everything is correct.
(Oracle has the MINUS operator, which shows the difference between two record sets; hopefully there is something similar in your database environment.)
The MINUS operator works like this:
select * from table_a
minus
select * from table_b
Starting with two tables of identical structure, it returns only the rows of table_a that have no identical row in table_b.
From there it is straightforward to cross-check with count(*) whether everything is fine with the data.
I don't know whether your RDBMS has something similar, but a quick search should tell you.
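As a rough illustration of the set-difference idea, the sketch below uses SQLite's EXCEPT operator (the standard-SQL counterpart of Oracle's MINUS) through Python's sqlite3 module. SQLite is used only so that the example is self-contained; the rows are made up, and whether your MySQL version offers an equivalent operator should be checked in its manual.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, field TEXT);
    CREATE TABLE table_b (id INTEGER, field TEXT);
    INSERT INTO table_a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO table_b VALUES (1, 'x'), (2, 'changed');
""")

# Rows of table_a that have no identical row in table_b.
difference = conn.execute("""
    SELECT * FROM table_a
    EXCEPT
    SELECT * FROM table_b
    ORDER BY id
""").fetchall()

# A plain count gives a second number to cross-check against.
count_a = conn.execute("SELECT COUNT(*) FROM table_a").fetchone()[0]
print(difference, count_a)
```

Row 1 is identical in both tables and drops out; row 2 differs in one field and row 3 is missing from table_b, so both are reported.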

DELETE n
FROM new_db n
WHERE EXISTS
( SELECT *
FROM current_db c
WHERE c.email = n.email
)
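The preview-then-delete pattern can be tried on a toy data set before touching the real tables. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, made so the example is self-contained), and the email addresses are invented.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE current_db (email TEXT PRIMARY KEY);
    CREATE TABLE new_db     (email TEXT PRIMARY KEY);
""")
# Invented sample data: two addresses exist in both tables.
conn.executemany("INSERT INTO current_db VALUES (?)",
                 [("a@example.com",), ("b@example.com",)])
conn.executemany("INSERT INTO new_db VALUES (?)",
                 [("a@example.com",), ("b@example.com",), ("c@example.com",)])

# Step 1: count the rows that would be deleted.
to_delete = conn.execute("""
    SELECT COUNT(*) FROM new_db n
    JOIN current_db c ON n.email = c.email
""").fetchone()[0]

# Step 2: delete them with the EXISTS form, which SQLite also accepts.
cur = conn.execute("""
    DELETE FROM new_db
    WHERE EXISTS (SELECT 1 FROM current_db c WHERE c.email = new_db.email)
""")
deleted = cur.rowcount
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT email FROM new_db")]
print(to_delete, deleted, remaining)
```

If the preview count and the deleted count agree, new_db keeps only the addresses that are not in current_db. With the row counts from the question that would be 575,992 - 521,892 = 54,100 rows, provided every address in current_db also appears in new_db.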

Related

Remove duplicate records in mysql

I have a table called leads with duplicate records
Leads:
*account_id
*campaign_id
I want to remove all the duplicate account_id rows where campaign_id is equal to "51".
For example, if account_id = 1991 appears two times in the table, then remove the row with campaign_id = "51" and keep the other one.
You could use a delete join:
DELETE t1
FROM yourTable t1
INNER JOIN yourTable t2
ON t2.account_id = t1.account_id AND
t2.campaign_id <> 51
WHERE
t1.campaign_id = 51;
There's no problem to delete from a table provided that:
You use the correct syntax.
You have done a backup of the table BEFORE you do any deleting.
However, I would suggest a different method:
Create a new table based on the existing table:
CREATE TABLE mytable_new LIKE mytable;
Add unique constraint (or PRIMARY KEY) on column(s) you don't want to have duplicates:
ALTER TABLE mytable_new ADD UNIQUE(column1,[column2]);
Note: if you want to identify a combination of two (or more) columns as unique, place all the column names in the UNIQUE() separated by comma. Maybe in your case, the constraint would be UNIQUE(account_id, campaign_id).
Insert data from original table to new table:
INSERT IGNORE INTO mytable_new SELECT * FROM mytable;
Note: with IGNORE, rows that would violate the UNIQUE() constraint are skipped instead of raising an error, so only non-duplicate rows are inserted. If you have an app that runs a MySQL INSERT query against the table, you have to update that query by adding IGNORE.
Check data consistency and once you're satisfied, rename both tables:
RENAME TABLE mytable TO mytable_old;
RENAME TABLE mytable_new TO mytable;
The best thing about this approach is that if you see anything wrong with the new table, you still have the original table.
Renaming the tables takes less than a second; the likely issue is that the INSERT IGNORE might take a while if you have a lot of data.
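The copy, deduplicate and rename steps can be rehearsed on a small scale. The sketch below is an illustration under assumptions: it runs against SQLite via Python's sqlite3 module, where MySQL's INSERT IGNORE is spelled INSERT OR IGNORE and the rename uses ALTER TABLE ... RENAME TO, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (account_id INTEGER, campaign_id INTEGER);
    INSERT INTO mytable VALUES (1991, 51), (1991, 51), (2000, 7);

    -- New table with the same columns plus the unique constraint.
    CREATE TABLE mytable_new (
        account_id  INTEGER,
        campaign_id INTEGER,
        UNIQUE (account_id, campaign_id)
    );

    -- Rows that would violate the constraint are silently skipped.
    INSERT OR IGNORE INTO mytable_new SELECT * FROM mytable;

    -- Swap the tables, keeping the original as a fallback.
    ALTER TABLE mytable RENAME TO mytable_old;
    ALTER TABLE mytable_new RENAME TO mytable;
""")

deduplicated = conn.execute(
    "SELECT account_id, campaign_id FROM mytable ORDER BY account_id"
).fetchall()
original_count = conn.execute("SELECT COUNT(*) FROM mytable_old").fetchone()[0]
print(deduplicated, original_count)
```

The repeated (1991, 51) row is inserted only once, while mytable_old still holds all three original rows in case something needs to be restored.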

Delete Duplicates from large mysql Address DB

I know that deleting duplicates from MySQL is often discussed here, but none of the solutions works well in my case.
So, I have a DB with address data, roughly like this:
ID; Anrede; Vorname; Nachname; Strasse; Hausnummer; PLZ; Ort; Nummer_Art; Vorwahl; Rufnummer
ID is the primary key and unique.
And I have entries like these, for example:
1;Herr;Michael;Müller;Testweg;1;55555;Testhausen;Mobile;012345;67890
2;Herr;Michael;Müller;Testweg;1;55555;Testhausen;Fixed;045678;877656
The different phone numbers are not the problem, because they are not relevant to me. I just want to delete the rows that are duplicates in Lastname, Street and Zipcode; in this case either ID 1 or ID 2, and it doesn't matter which one.
I tried it like this with DELETE:
DELETE db
FROM Import_Daten db,
Import_Daten dbl
WHERE db.id > dbl.id AND
db.Lastname = dbl.Lastname AND
db.Strasse = dbl.Strasse AND
db.PLZ = dbl.PLZ;
And insert into a copy table:
INSERT INTO Import_Daten_1
SELECT MIN(db.id),
db.Anrede,
db.Firstname,
db.Lastname,
db.Branche,
db.Strasse,
db.Hausnummer,
db.Ortsteil,
db.Land,
db.PLZ,
db.Ort,
db.Kontaktart,
db.Vorwahl,
db.Durchwahl
FROM Import_Daten db,
Import_Daten dbl
WHERE db.lastname = dbl.lastname AND
db.Strasse = dbl.Strasse And
db.PLZ = dbl.PLZ;
The complete table contains over 10 million rows, and the size is my actual problem. MySQL runs on a MAMP server on a MacBook with 1.5 GHz and 4 GB RAM, so it is not really fast. The SQL statements are run in phpMyAdmin. I have no other system available.
You can write a stored procedure that each time selects a different chunk of data (for example, rows whose row number lies between two values) and deletes only from that range. This way you delete your duplicates bit by bit.
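The chunking idea can also be sketched outside a stored procedure. The example below is an illustration under assumptions: it drives the batches from Python, uses SQLite in place of MySQL, and the reduced table layout, the batch size and the helper name delete_duplicates_in_chunks are invented for the demo.

```python
import sqlite3

def delete_duplicates_in_chunks(conn, chunk_size):
    """Delete duplicate rows one id range at a time, keeping the lowest id per group."""
    max_id = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM Import_Daten"
    ).fetchone()[0]
    total = 0
    for start in range(1, max_id + 1, chunk_size):
        cur = conn.execute("""
            DELETE FROM Import_Daten
            WHERE id BETWEEN ? AND ?
              AND EXISTS (
                  SELECT 1 FROM Import_Daten AS keep
                  WHERE keep.id < Import_Daten.id
                    AND keep.Lastname = Import_Daten.Lastname
                    AND keep.Strasse  = Import_Daten.Strasse
                    AND keep.PLZ      = Import_Daten.PLZ
              )
        """, (start, start + chunk_size - 1))
        conn.commit()  # one short transaction per chunk
        total += cur.rowcount
    return total

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Import_Daten "
    "(id INTEGER PRIMARY KEY, Lastname TEXT, Strasse TEXT, PLZ TEXT)"
)
# Invented sample rows: ids 2 and 4 repeat id 1, id 5 repeats id 3.
conn.executemany("INSERT INTO Import_Daten VALUES (?, ?, ?, ?)", [
    (1, "Müller", "Testweg", "55555"),
    (2, "Müller", "Testweg", "55555"),
    (3, "Meier", "Hauptweg", "11111"),
    (4, "Müller", "Testweg", "55555"),
    (5, "Meier", "Hauptweg", "11111"),
])

removed = delete_duplicates_in_chunks(conn, chunk_size=2)
remaining_ids = [r[0] for r in conn.execute("SELECT id FROM Import_Daten ORDER BY id")]
print(removed, remaining_ids)
```

Because the lowest id of every group is never deleted, each chunk can be processed independently and the loop can be interrupted and resumed. In MySQL the per-chunk statement would probably have to use the multi-table DELETE form from the question with an added id range condition, since MySQL rejects a subquery that selects from the delete target.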
A more efficient two-table solution can look like the following.
We store only the data we really need to delete, and only the fields that contain duplicate information.
Let's assume we are looking for duplicate data in the Lastname, Branche and Hausnummer fields.
Drop the helper table if it is left over from an earlier run:
DROP TABLE IF EXISTS data_to_delete;
Create the table and populate it with the data we need to delete (I assume all fields have the VARCHAR(255) type):
CREATE TABLE data_to_delete (
id BIGINT COMMENT 'this field will contain ID of row that we will not delete',
cnt INT,
Lastname VARCHAR(255),
Branche VARCHAR(255),
Hausnummer VARCHAR(255)
) AS SELECT
min(t1.id) AS id,
count(*) AS cnt,
t1.Lastname,
t1.Branche,
t1.Hausnummer
FROM Import_Daten AS t1
GROUP BY t1.Lastname, t1.Branche, t1.Hausnummer
HAVING count(*)>1 ;
Now let's delete the duplicate data and leave only one record of each duplicate set:
DELETE Import_Daten
FROM Import_Daten LEFT JOIN data_to_delete
ON Import_Daten.Lastname=data_to_delete.Lastname
AND Import_Daten.Branche=data_to_delete.Branche
AND Import_Daten.Hausnummer = data_to_delete.Hausnummer
WHERE Import_Daten.id != data_to_delete.id;
DROP TABLE data_to_delete;
You can add a new column e.g. uq and make it UNIQUE.
ALTER TABLE Import_Daten
ADD COLUMN `uq` BINARY(16) NULL,
ADD UNIQUE INDEX `uq_UNIQUE` (`uq` ASC);
When this is done you can execute an UPDATE query like this
UPDATE IGNORE Import_Daten
SET
uq = UNHEX(
MD5(
CONCAT(
Import_Daten.Lastname,
Import_Daten.Street,
Import_Daten.Zipcode
)
)
)
WHERE
uq IS NULL;
Once all entries are updated and the query is executed again, all duplicates will have a uq value of NULL and can be removed.
The result then is:
0 row(s) affected, 1 warning(s): 1062 Duplicate entry...
For newly added rows, always create the uq hash, and consider using it as the primary key once all entries are unique.

Updating multiple rows with same value in mysql

I have a table t with columns id (primary key), a, b, c, d. Assume that the columns id, a, b and c are already populated. I want to set column d = md5(concat(b,c)). The issue is that this table contains millions of records, while there are only a few thousand unique combinations of b and c. I want to save the time required for computing the md5 of the same values over and over. Is there a way to update multiple rows of this table with the same value without computing the md5 again, something like this:
update t set d=md5(concat(b,c)) group by b,c;
However, GROUP BY does not work with an UPDATE statement.
One method is a join:
update t join
(select b, c, md5(concat(b, c)) as val
from t
group by b, c
) tt
on t.b = tt.b and t.c = tt.c
set t.d = tt.val;
However, it is quite possible that reading and joining the data takes longer than the md5() function itself, so doing the update directly could be feasible.
EDIT:
Actually, updating the entire table is likely to take time, just for the updates and logging. I would suggest that you create another table entirely for the b/c/d values and join in the values when you need them.
Create a temp table:
CREATE TEMPORARY TABLE IF NOT EXISTS tmpTable
AS (SELECT b, c, md5(concat(b, c)) as d FROM t group by b, c)
Update initial table:
UPDATE t orig
JOIN tmpTable tmp ON orig.b = tmp.b AND orig.c = tmp.c
SET orig.d = tmp.d
Drop the temp table:
DROP TABLE tmpTable
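The same "hash once per distinct pair" idea can be moved to the client side. The following is only a sketch under assumptions: it uses Python's hashlib and sqlite3 modules instead of MySQL's md5() and a temporary table, the sample rows are invented, and it assumes b and c are non-NULL strings encoded as UTF-8.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, d TEXT);
    INSERT INTO t (a, b, c) VALUES
        ('r1', 'foo', 'bar'), ('r2', 'foo', 'bar'), ('r3', 'baz', 'qux');
""")

# Compute the hash once per distinct (b, c) pair ...
pairs = conn.execute("SELECT DISTINCT b, c FROM t").fetchall()
hashes = [
    (hashlib.md5((b + c).encode("utf-8")).hexdigest(), b, c)
    for b, c in pairs
]

# ... then write it to every row that shares that pair.
conn.executemany("UPDATE t SET d = ? WHERE b = ? AND c = ?", hashes)
conn.commit()

md5_calls = len(hashes)
rows = conn.execute("SELECT b, c, d FROM t ORDER BY id").fetchall()
print(md5_calls, rows)
```

Three rows are updated with only two hash computations. On a large table the UPDATE lookups would want an index on (b, c), and the cost of writing every row remains, as the answer above points out.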

Why would a SQL MERGE have a duplicate key error, even with HOLDLOCK declared?

There is a lot of information that I could find on SQL Merge, but I can't seem to get this working for me. Here's what's happening.
Each day I'll be getting an Excel file uploaded to a web server with a few thousand records, each record containing 180 columns. These records contain both new information which would have to use INSERT, and updated information which will have to use UPDATE. To get the information to the database, I'm using C# to do a Bulk Copy to a temp SQL 2008 table. My plan was to then perform a Merge to get the information into the live table. The temp table doesn't have a Primary Key set, but the live table does. In the end, this is how my Merge statement would look:
MERGE Table1 WITH (HOLDLOCK) AS t1
USING (SELECT * FROM Table2) AS t2
ON t1.id = t2.id
WHEN MATCHED THEN
UPDATE SET (t1.col1=t2.col1,t1.col2=t2.col2,...t1.colx=t2.colx)
WHEN NOT MATCHED BY TARGET THEN
INSERT (col1,col2,...colx)
VALUES(t2.col1,t2.col2,...t2.colx);
Even when including the HOLDLOCK, I still get the error Cannot insert duplicate key in object. From what I've read online, HOLDLOCK should allow SQL to read primary keys, but not perform any insert or update until after the task has been executed. I'm basically learning how to use MERGE on the fly, but is there something I have to enable for SQL 2008 to pick up on MERGE Locks?
I found a way around the problem and wanted to post the answer here, in case it helps anyone else. It looks like MERGE wouldn't work for what I needed since the temporary table being used had duplicate records that would be used as a Primary Key in the live table. The solution I came up with was to create the below stored procedure.
-- Start with insert
INSERT INTO LiveTable(A, B, C, D, id)
(
-- Filter rows to get unique id
SELECT A, B, C, D, id FROM(
SELECT A, B, C, D, id,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS row_number
FROM TempTable
WHERE NOT EXISTS(
SELECT id FROM LiveTable WHERE LiveTable.id = TempTable.id)
) AS ROWS
WHERE row_number = 1
)
-- Continue with Update
-- Covers skipped id's during insert
UPDATE L
SET
A = T.A,
B = T.B,
C = T.C,
D = T.D
FROM LiveTable L
INNER JOIN
TempTable T
ON
L.id = T.id
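The root cause described above, repeated keys in the staging table, can be checked for before any MERGE or INSERT runs. A minimal sketch of that check, using SQLite through Python's sqlite3 module as a self-contained stand-in for SQL Server, with an invented reduced column set and sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Staging table without a primary key, as in the question.
    CREATE TABLE TempTable (id INTEGER, A TEXT);
    INSERT INTO TempTable VALUES (1, 'first'), (2, 'only'), (1, 'second');
""")

# Keys that occur more than once in the staging table.
duplicate_ids = conn.execute("""
    SELECT id, COUNT(*) AS copies
    FROM TempTable
    GROUP BY id
    HAVING COUNT(*) > 1
    ORDER BY id
""").fetchall()
print(duplicate_ids)
```

Any id reported here would be inserted twice by a single MERGE pass, which is consistent with the duplicate key error in the question. The GROUP BY ... HAVING query itself is plain SQL and should carry over to SQL Server unchanged.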

Delete statement in a same table

I need to write a delete statement whose condition is based on columns of the same table, using a correlated subquery.
In MySQL I can't directly run a delete statement that checks a condition on the same table in a correlated subquery.
I want to know whether using a temp table will affect MySQL's memory usage or performance.
Any help will be highly appreciated.
Thanks.
You can make MySQL create the temp table for you by wrapping your "where" query in an inline derived table in the FROM clause.
This original query will give you the dreaded "You can't specify target table for update in FROM clause":
DELETE FROM sametable
WHERE id IN (
SELECT id FROM sametable WHERE stuff=true
)
Rewriting it to use inline temp becomes...
DELETE FROM sametable
WHERE id IN (
SELECT implicitTemp.id from (SELECT id FROM sametable WHERE stuff=true) implicitTemp
)
Your question is really not clear, but I would guess you have a correlated subquery and you're having trouble doing a SELECT from the same table that is locked by the DELETE. For instance to delete all but the most recent revision of a document:
DELETE FROM document_revisions d1 WHERE edit_date <
(SELECT MAX(edit_date) FROM document_revisions d2
WHERE d2.document_id = d1.document_id);
This is a problem for MySQL.
Many examples of these types of problems can be solved using MySQL multi-table delete syntax:
DELETE d1 FROM document_revisions d1 JOIN document_revisions d2
ON d1.document_id = d2.document_id AND d1.edit_date < d2.edit_date;
But these solutions are best designed on a case-by-case basis, so if you edit your question and be more specific about the problem you're trying to solve, perhaps we can help you.
In other cases you may be right, using a temp table is the simplest solution.
"can't directly run a delete statement and check a condition for the same table"
Sure you can. If you want to delete from table1 while checking the condition that col1 = 'somevalue', you could do this:
DELETE
FROM table1
WHERE col1 = 'somevalue'
EDIT
To delete using a correlated subquery, please see the following example:
create table project (id int);
create table emp_project (id int, project_id int);
insert into project values (1);
insert into project values (2);
insert into emp_project values (100, 1);
insert into emp_project values (200, 1);
/* Delete any project record that doesn't have associated emp_project records */
DELETE
FROM project
WHERE NOT EXISTS
(SELECT *
FROM emp_project e
WHERE e.project_id = project.id);
/* project 2 doesn't have any emp_project records, so it was deleted, now
we have 1 project record remaining */
SELECT * FROM project;
Result:
id
1
Create a temp table with the values you want to delete, then join it to the table while deleting. In this example (SQL Server syntax) I have a table "Games" with an ID column. I will delete ids greater than 3. I gather the targets in a table variable first so I can report on them later.
DECLARE @DeletedRows TABLE (ID int)
insert
@DeletedRows
(ID)
select
ID
from
Games
where
ID > 3
DELETE
g
from
Games g
join
@DeletedRows x
on x.ID = g.ID
I have used GROUP BY with an aggregate and a HAVING clause on the same table, where the query was like:
DELETE
FROM TableName
WHERE id in
(select implicitTable.id
FROM (
SELECT id
FROM `TableName`
GROUP by id
HAVING count(id)>1
) as implicitTable
)
You mean something like:
DELETE FROM table WHERE someColumn = "someValue";
?
This is definitely possible, read about the DELETE syntax in the reference manual.
You can delete from same table. Delete statement is as follows
DELETE FROM table_name
WHERE some_column=some_value