Deleting duplicates from a database using MySQL - mysql

I am working in MySQL trying to insert a bunch of entries into a table. There are many semi-duplicates that I need to get rid of. I want to delete all of the semi-duplicates except for one. When I run the code below, it seems to be deleting everything that I don't want to be deleted by keeping only entries that have multiple semi-duplicates. How can I delete all but one of the semi-duplicates while also keeping the entries of everything else?
delete from t where (ColA, ColB, ColC) in
(select * from (select ColA, ColB, ColC from t
group by ColA, ColB, ColC having count(*) > 1) as t);

Well, i don't think you can solve this without using a where
But you can not operate a SELECT WHERE on a DELETE using the same table
Exemple from my database
delete from tb_test where idtb_test not in (
select idtb_test from tb_test T1
WHERE T1.title in (select title from tb_test T2 group by T2.title having count(*) =1)
)
This query search for non-duplicates rows on col title and delete all the rows except the selected rows
Mysql Error :
Error Code: 1093. You can't specify target table 'tb_test' for update
in FROM clause

If I understand correctly, you want to delete almost all rows which have a duplicate, and you don't care which one remains undeleted.
Also, as I understand - this table has primary key.
In that case, it can be done with this query:
delete test_table t3
where t3.id in (
select t1.id
from test_table t1,
(
select col_a, col_b, col_c
from test_table
group by col_a, col_b, col_c
having count(*) > 1
) s1
where t1.col_a = s1.col_a
and t1.col_b = s1.col_b
and t1.col_c = s1.col_c
and t1.id != (
select min(t2.id)
from test_table t2
where t1.col_a = t2.col_a
and t1.col_b = t2.col_b
and t1.col_c = t2.col_c
group by col_a, col_b, col_c
)
)
Basically, this query deletes all row duplicates except for those with minimal ID, which is PK in my case.
PS: This works in Oracle DB, can't test in MySQL right now.

Related

Update/Insert column values with another table's group by result in mysql

I have an empty table (t1) and I want to insert or update the t1.uid column from another table's (t2) GROUP BY uid values.
So far I have tried like this:
UPDATE table1 t1 JOIN
(SELECT uid FROM table2 GROUP BY uid) t2
SET t1.uid = t2.uid;
but it's not working for me.
N.B. I've got a massive data set for which group by (uid from table-t2) results giving me total 1114732 results which I have to insert/update in t1 table's uid column.
Please try this:
Insert into table1(uid)
select distinct uid from table2
If table1 is empty, then UPDATE is not the correct verb. Would this suit your needs?
INSERT into table1 SELECT distinct uid from table2;
INSERT ... SELECT docs

Merging two SQL Server tables conditionally into a third table

Clearly, I am not a SQL guy, so I have to ask for help on the following rather simple task.
I have two SQL Server 2008 tables: t1 and t2 with many identical columns and a key column (entry_ID). T2 has rows that do not exist in t1 but should.
I want to merge those rows from t2 that do not exist in t1 but I also do not want any rows from t2 that already exist in t1. I would like the result set to fill a new t3.
I have looked at many solutions online but can't find the solution to the above scenario.
Thank you.
There are a number of ways to do it you could use UNION ALL or OUTER JOIN.
Assuming you are using Entry_ID to find identical records, and Entry_ID is unique within each table, here is a OUTER JOIN method:
This gets you your recordset: T1 and T2 merged:
SELECT
CASE
WHEN T1.Entry_ID IS NULL THEN 'T2'
WHEN T2.Entry_ID IS NULL THEN 'T1'
ELSE 'Both'
END SourceTable,
COALESCE(T1.Entry_ID,T2.Entry_ID) As Entry_ID,
COALESCE(T1.Col1, T2.Col1) As Col1,
COALESCE(T1.Col2, T2.Col2) As Col2,
COALESCE(T1.Col3, T2.Col3) As Col3,
COALESCE(T1.Col4, T2.Col4) As Col4
FROM T1 FULL OUTER JOIN T2
ON T1.Entry_DI = T2.Entry_ID
ORDER BY COALESCE(T1.Entry_DI,T2.Entry_ID)
This inserts it into T3:
INSERT INTO T3 (Entry_ID,Col1, COl2,Col3,Col4)
SELECT
COALESCE(T1.Entry_DI,T2.Entry_ID) As Entry_ID,
COALESCE(T1.Col1, T2.Col1) As Col1,
COALESCE(T1.Col2, T2.Col2) As Col2,
COALESCE(T1.Col3, T2.Col3) As Col3,
COALESCE(T1.Col4, T2.Col4) As Col4
FROM T1 FULL OUTER JOIN T2
ON T1.Entry_DI = T2.Entry_ID
Again you must note that Entry_ID needs to be unique within their tables, and it uses this to match between the tables.
Also note the columns from the select line up with the column list in the insert statement - the order of the columns in the physical table doesn't matter, the INSERT and SELECT just have to line up.

Delete records from a table where < max number for a field and keep highest number

I know this sounds rather confusing but I'm at a loss how to explain it better. I have a table simplified below:
DB Type ID
================
Table1 1
Table1 2
Table1 3
Table1 4
Table1 5
Table2 6
Table2 7
Table2 8
Table2 9
Table2 10
what i am trying to achieve is to basically clean out this table but keep the record with the highest ID for each DB Type if that makes sense - so in this case it would be (Table1,5) and (Table2,10) with all other records being deleted. Is it possible to do this exclusively through MySQL?
*EDIT***
Answer thanks to tips from Yogendra Singh
DELETE FROM MyTable WHERE ID NOT IN (SELECT * FROM (SELECT MAX(ID) from MyTable GROUP BY DB Type) AS tb1 ) ORDER BY ID ASC
TRY selecting the max ID group by db_type first and then use it as sub query with not in.
DELETE FROM MyTable
WHERE ID NOT IN
(SELECT ID FROM
(SELECT MAX(ID) AS ID from MyTable GROUP BY DB Type) AS tb1
)
EDIT:
DELETE FROM MyTable
HAVING MAX(ID) > ID;
delete your_table
from
your_table left join
(select max(id) max_id from your_table group by type) mx
on your_table.id=mx.max_id
where mx.max_id is null
Subquery returns the maximum id for every type, and those are the values to keep. With an left join i'm selecting all the rows from your table that don't have an in in max_ids, and those are the rows to delete. This will work only if id is primary key, otherwise we have to join also the type.
Is the combination DB Type - ID unique?
If so, you can attack this in two stages:
Get only the rows you want
SELECT [DB Type], Max(ID) AS MaxID
FROM YourTable
GROUP BY [DB Type]
Delete the rest (Wrapping the previous statement into a more complicated statement; don't mean that)
DELETE FROM YourTable
FROM
YourTable
LEFT JOIN
(SELECT [DB Type], Max(ID) AS MaxID
FROM YourTable GROUP BY [DB Type]) DontDelete
ON
YourTable.[DB Type]=DontDelete.[DB Type] AND
YourTable.ID=DontDelete.MaxID
WHERE
DontDelete.[DB Type] IS NULL
DELETE FROM MyTable del
WHERE EXISTS (
(SELECT *
FROM MyTable xx
WHERE xx."db Type" = del."db Type"
AND xx.id > del.id
);
delete from my_Table
where Day in (select MAX(day) d from my_Table where id='id')

Copy rows from one table to another, ignoring duplicates

Im trying to copy row from table to another using 2 coluom only as the tow table schema is not identical ,
am getting this error
Operand should contain 1 column(s)
Any tips whats wrong with my statement ?
Insert table1 ( screenname,list_id )
Select screenname,list_id
From table2 As T1
Where Not Exists (
Select 1
From table1 As T2
Where
(T2.screenname = T1.screenname,T2.list_id = T1.list_id)
)
try to change where condition from (T2.screenname = T1.screenname,T2.list_id = T1.list_id) to (T2.screenname = T1.screenname AND T2.list_id = T1.list_id)
(note AND keyword instead of comma)
Did you try INSERT INTO...ON DUPLICATE KEY syntax?
See MySQL manual here
You can create a unique index in table1 on the columns screenname and list_id
Then use the following statement
Insert ignore into table1 ( screenname,list_id )
Select screenname,list_id
From table2 As T1
Also try this query -
INSERT INTO table1 (screenname, list_id)
SELECT screenname, list_id FROM table2 t2
LEFT JOIN table1 t1
ON t1.screenname = t2.screenname AND t1.list_id = t2.list_id
WHERE
t1.screenname IS NULL AND t1.list_id IS NULL;
Use simple INSERT IGNORE
INSERT table1 (screenname, list_id) SELECT screenname, list_id FROM table2

How to delete duplicate records in mysql database?

What's the best way to delete duplicate records in a mysql database using rails or mysql queries?
What you can do is copy the distinct records into a new table by:
select distinct * into NewTable from MyTable
Here's another idea in no particular language:
rs = `select a, b, count(*) as c from entries group by 1, 2 having c > 1`
rs.each do |a, b, c|
`delete from entries where a=#{a} and b=#{b} limit #{c - 1}`
end
Edit:
Kudos to Olaf for that "having" hint :)
well, if it's a small table, from rails console you can do
class ActiveRecord::Base
def non_id_attributes
atts = self.attributes
atts.delete('id')
atts
end
end
duplicate_groups = YourClass.find(:all).group_by { |element| element.non_id_attributes }.select{ |gr| gr.last.size > 1 }
redundant_elements = duplicate_groups.map { |group| group.last - [group.last.first] }.flatten
redundant_elements.each(&:destroy)
Check for Duplicate entries :
SELECT DISTINCT(req_field) AS field, COUNT(req_field) AS fieldCount FROM
table_name GROUP BY req_field HAVING fieldCount > 1
Remove Duplicate Queries :
DELETE FROM table_name
USING table_name, table_name AS vtable
WHERE
(table_name.id > vtable.id)
AND (table_name.req_field=req_field)
Replace req_field and table_name - should work without any issues.
New to SQL :-)
This is a classic question - often asked in interviews:-)
I don't know whether it'll work in MYSQL but it works in most databases -
> create table t(
> a char(2),
> b char(2),
> c smallint )
> select a,b,c,count(*) from t
> group by a,b,c
> having count(*) > 1
a b c
-- -- ------ -----------
(0 rows affected)
> insert into t values ("aa","bb",1)
(1 row affected)
> insert into t values ("aa","bb",1)
(1 row affected)
> insert into t values ("aa","bc",1)
(1 row affected)
> select a,b,c,count(*) from t group by a,b,c having count(*) > 1
a b c
-- -- ------ -----------
aa bb 1 2
(1 row affected)
If you have PK (id) in table (EMP) and want to older delete duplicate records with name column. For large data following query may be good approach.
DELETE t3
FROM (
SELECT t1.name, t1.id
FROM (
SELECT name
FROM EMP
GROUP BY name
HAVING COUNT(name) > 1
) AS t0 INNER JOIN EMP t1 ON t0.name = t1.name
) AS t2 INNER JOIN EMP t3 ON t3.name = t2.name
WHERE t2.id < t3.id;
suppose we have a table name tbl_product and there is duplicacy in the field p_pi_code and p_nats_id in maximum no of count then
first create a new table insert the data from existing table ...
ie from tbl_product to newtable1 if anything else then newtable1 to newtable2
CREATE TABLE `newtable2` (
`p_id` int(10) unsigned NOT NULL auto_increment,
`p_status` varchar(45) NOT NULL,
`p_pi_code` varchar(45) NOT NULL,
`p_nats_id` mediumint(8) unsigned NOT NULL,
`p_is_special` tinyint(4) NOT NULL,
PRIMARY KEY (`p_id`)
) ENGINE=InnoDB;
INSERT INTO newtable1 (p_status, p_pi_code, p_nats_id, p_is_special) SELECT
p_status, p_pi_code, p_nats_id, p_is_special FROM tbl_product group by p_pi_code;
INSERT INTO newtable2 (p_status, p_pi_code, p_nats_id, p_is_special) SELECT
p_status, p_pi_code, p_nats_id, p_is_special FROM newtable1 group by p_nats_id;
after that we see all the duplicacy in the field is removed
I had to do this recently on Oracle, but the steps would have been the same on MySQL. It was a lot of data, at least compared to what I'm used to working with, so my process to de-dup was comparatively heavyweight. I'm including it here in case someone else comes along with a similar problem.
My duplicate records had different IDs, different updated_at times, possibly different updated_by IDs, but all other columns the same. I wanted to keep the most recently updated of any duplicate set.
I used a combination of Rails logic and SQL to get it done.
Step one: run a rake script to identify the IDs of the duplicate records, using model logic. IDs go in a text file.
Step two: create a temporary table with one column, the IDs to delete, loaded from the text file.
Step three: create another temporary table with all the records I'm going to delete (just in case!).
CREATE TABLE temp_duplicate_models
AS (SELECT * FROM models
WHERE id IN (SELECT * FROM temp_duplicate_ids));
Step four: actual deleting.
DELETE FROM models WHERE id IN (SELECT * FROM temp_duplicate_ids);
You can use:
http://lenniedevilliers.blogspot.com/2008/10/weekly-code-find-duplicates-in-sql.html
to get the duplicates and then just delete them via Ruby code or SQL code (I would do it in SQL code but thats up to you :-)
If your table has a PK (or you can easily give it one), you can specify any number of columns in the table to be equal (to qualify is as a duplicate) with the following query (may be a bit messy looking but it works):
DELETE FROM table WHERE pk_id IN(
SELECT DISTINCT t3.pk_id FROM (
SELECT t1.* FROM table AS t1 INNER JOIN (
SELECT col1, col2, col3, col4, COUNT(*) FROM table
GROUP BY col1, col2, col3, col4 HAVING COUNT(*)>1) AS t2
ON t1.col1 = t2.col1 AND t1.col2 = t2.col2 AND t1.col3 = t2.col3 AND
t1.col4 = t2.col4)
AS t3, (
SELECT t1.* FROM table AS t1 INNER JOIN (
SELECT col1, col2, col3, col4, COUNT(*) FROM table
GROUP BY col1, col2, col3, col4 HAVING COUNT(*)>1) AS t2
ON t1.col1 = t2.col1 AND t1.col2 = t2.col2 AND t1.col3 = t2.col3 AND
t1.col4 = t2.col4)
AS t4
WHERE t3.col1 = t4.col1 AND t3.pk_id > t4.pk_id
)
This will leave the first record entered into the database, deleting the 'newest' duplicates. If you want to keep the last record, switch the > to <.
In MySql when I put something like
delete from A where IDA in (select IDA from A )
mySql said something like "you can't use the same table in the select part of the delete operation."
I've just have to delete some duplicate records, and I have succeeded with a .php program like that
<?php
...
$res = hacer_sql("SELECT MIN(IDESTUDIANTE) as IDTODELETE
FROM `estudiante` group by `LASTNAME`,`FIRSTNAME`,`CI`,`PHONE`
HAVING COUNT(*) > 1 )");
while ( $reg = mysql_fetch_assoc($res) ) {
hacer_sql("delete from estudiante where IDESTUDIANTE = {$reg['IDTODELETE']}");
}
?>
I am using Alter Table
ALTER IGNORE TABLE jos_city ADD UNIQUE INDEX(`city`);
I used #krukid's answer above to do the following on a table with around 70,000 entries:
rs = 'select a, b, count(*) as c from table group by 1, 2 having c > 1'
# get a hashmap
dups = MyModel.connection.select_all(rs)
# convert to array
dupsarr = dups.map { |i| [i.a, i.b, i.c] }
# delete dups
dupsarr.each do |a,b,c|
ActiveRecord::Base.connection.execute("delete from table_name where a=#{MyModel.sanitize(a)} and b=#{MyModel.sanitize(b)} limit #{c-1}")
end
Here is the rails solution I came up with. May not be the most efficient, but not a big deal if its a one time migration.
distinct_records = MyTable.all.group(:distinct_column_1, :distinct_column_2).map {|mt| mt.id}
duplicates = MyTable.all.to_a.reject!{|mt| distinct_records.include? mt.id}
duplicates.each(&:destroy)
First, groups by all columns that determine uniqueness, the example shows 2 but you could have more or less
Second, selects the inverse of that group...all other records
Third, Deletes all those records.
Firstly do group by column on which you want to delete duplicate.But I am not doing it with group by.I am writing self join.
You don't need to create the temporary table.
Delete duplicate except one record:
In this table it should have auto increment column.
The possible solution that I've just come across:
DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name
if you want to keep the row with the lowest auto increment id value OR
DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name
if you want to keep the row with the highest auto increment id value.
You can cross check your solution, find duplicate again:
SELECT * FROM `names` GROUP BY name, id having count(name) > 1;
If it return 0 result, then you query is successful.