Deleting duplicate records in MySQL?

I have this MySQL query that finds duplicates and the number of occurrences for each topic:
SELECT name,
COUNT(name) AS NumOccurrences
FROM topics
GROUP BY name
HAVING ( COUNT(name) > 1 )
But what I want to do is delete all the duplicates that are found. I only want one unique name for each topic, and no duplicates! Thanks

-- Keeps the row with the highest id for each name
DELETE t2
FROM topics t1
JOIN topics t2
ON t2.name = t1.name
AND t2.id < t1.id;
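To see which row the self-join DELETE keeps, here it is on a scratch table with hypothetical demo rows (only id and name are modeled). Because the condition is t2.id < t1.id, any row that has a same-name row with a larger id is deleted, so the highest id per name survives.

```sql
-- Demo setup on a scratch database (hypothetical rows)
CREATE TABLE topics (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
);
INSERT INTO topics (name) VALUES ('mysql'), ('php'), ('mysql'), ('mysql'), ('css');

-- Delete every row that has a same-name twin with a larger id
DELETE t2
FROM topics t1
JOIN topics t2
  ON t2.name = t1.name
 AND t2.id < t1.id;

-- Remaining rows: (2, 'php'), (4, 'mysql'), (5, 'css')
SELECT id, name FROM topics ORDER BY id;
```

Flipping the comparison to t2.id > t1.id keeps the lowest id per name instead.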

I would copy all the unique entries to a new table:
CREATE TABLE new_table as
SELECT * FROM old_table WHERE 1 GROUP BY unique_column_name;
Check the data, then delete your old table when you're sure everything's good and rename the new table to the old one.
Then make the name column unique so you won't have to do this again.
Cheers
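The copy, check, swap and constrain sequence above can be sketched as follows. It aggregates explicitly instead of using SELECT * with GROUP BY, because the ONLY_FULL_GROUP_BY SQL mode (enabled by default since MySQL 5.7.5) rejects that form. It assumes topics has only id and name columns; topics_new, topics_old and uq_topics_name are hypothetical names.

```sql
-- Demo setup on a scratch database (hypothetical rows)
CREATE TABLE topics (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
);
INSERT INTO topics (name) VALUES ('mysql'), ('php'), ('mysql'), ('mysql'), ('css');

-- 1. Copy one row per name, keeping the lowest id (assumes only id and name columns)
CREATE TABLE topics_new AS
SELECT MIN(id) AS id, name
FROM topics
GROUP BY name;

-- 2. After checking topics_new, swap the tables in one statement,
--    keeping the original as topics_old
RENAME TABLE topics TO topics_old, topics_new TO topics;

-- 3. Stop new duplicates from being inserted
ALTER TABLE topics ADD UNIQUE KEY uq_topics_name (name);
```

CREATE TABLE ... AS SELECT does not carry over the primary key or the AUTO_INCREMENT attribute, so you may want to re-add them on the new table, and drop topics_old only once you're satisfied.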

Related

Delete with join not completing due to high number of rows

I have a table with approximately 550k rows. I'm trying to execute this query:
DELETE t1 FROM categories t1
INNER JOIN categories t2
WHERE t1.id < t2.id
AND t1.name = t2.name
AND t1.book_id = t2.book_id
Unfortunately the shell freezes up and I can tell by counting the rows in another shell that nothing is happening.
Is there any way to buffer this query, or solve this issue in another way?
Any help is appreciated.
If you need to delete a large number of rows, it is usually more efficient to move the rows you want to retain to another table, truncate the original table, then insert back into it:
-- move the rows we want to keep
-- (assumes categories has only id, name and book_id columns)
create table categories_tmp as
select name, book_id, min(id) id from categories group by name, book_id;
-- empty the table - back it up first!
truncate table categories;
-- insert back into the table
insert into categories (id, name, book_id)
select id, name, book_id from categories_tmp;
-- drop the temporary table
drop table categories_tmp;
This index should help:
ALTER TABLE categories ADD INDEX (name, book_id, id);
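Deleting in fixed-size batches is another way to keep each statement short enough to finish. A sketch, assuming the question's categories(id, name, book_id) layout with hypothetical demo rows; the batch size of 10000 is an arbitrary choice. Like the question's query, it keeps the highest id per (name, book_id). The ids are collected in a nested derived table because MySQL rejects both LIMIT directly inside an IN subquery and a subquery that reads the table being deleted from (error 1093); DISTINCT and LIMIT make MySQL materialize the derived table first.

```sql
-- Demo setup on a scratch database (hypothetical rows)
CREATE TABLE categories (
  id      INT AUTO_INCREMENT PRIMARY KEY,
  name    VARCHAR(100) NOT NULL,
  book_id INT NOT NULL,
  INDEX (name, book_id, id)
);
INSERT INTO categories (name, book_id) VALUES
  ('fiction', 1), ('fiction', 1), ('poetry', 1), ('fiction', 2), ('fiction', 1);

-- One batch: delete up to 10000 rows that have a same (name, book_id) twin with a larger id
DELETE FROM categories
WHERE id IN (
  SELECT id FROM (
    SELECT DISTINCT c.id
    FROM categories c
    JOIN categories k
      ON k.name = c.name
     AND k.book_id = c.book_id
     AND k.id > c.id
    LIMIT 10000
  ) AS batch
);

-- Number of rows the batch removed; repeat the DELETE until this is 0
SELECT ROW_COUNT() AS rows_deleted;
```

On the real table the DELETE would be repeated (from a client-side loop, for example) until rows_deleted is 0; with autocommit on, each batch commits on its own.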

extracting extra records from mysql tables when there are no primary keys

I am trying to find all the records that are in t1 but not in t2. I know there are more records in t1 than in t2 because when I run
select count(*)
from t1;
select count(*)
from t2;
I get 21,500 records and 21,000 records respectively. But the problem is that these tables are not normalized and have no primary keys, so I cannot do something like this:
SELECT id FROM t1
where t1.id
not in (
SELECT t2.id
FROM t2
where t2.id is not null);
or this
SELECT t1.id, t2.id
FROM t1
LEFT JOIN t2
ON t1.id = t2.id
where t2.id is null
Both return no rows, because the id numbers match perfectly; there seem to be exactly the same ids in both tables. There must be another field that is not matching.
UPDATE
I ended up doing this:
select id, count(id)
from t1
group by id;
select id, count(id)
from t2
group by id
It gave the same set of claim numbers, along with the number of times each one shows up. I copied the results into Excel, subtracted one count from the other, and used conditional formatting to show only the non-zero differences; this gave me all the ids that show up more often in one table than in the other. (Sloppy solution, but it resolved the issue.)
You have two problems: a bad database design, and bogus data somehow being inserted into your tables.
I don't know if this will work without indexes.
A left outer join should get you started (look up the syntax).
You should end up with something like:
t1.id t2.id
1 1
2 2
3 null
4 4
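Since the ids match but the row counts differ, the mismatch is in how often each id repeats, so the left join can be run on per-id counts instead of raw rows. This is a sketch of the Excel subtraction from the question's update done in SQL; the claim column and demo rows are hypothetical. It only lists ids present in t1; swapping t1 and t2 would catch ids that occur only in t2.

```sql
-- Demo setup on a scratch database: no primary keys, repeated ids
CREATE TABLE t1 (id INT, claim VARCHAR(20));
CREATE TABLE t2 (id INT, claim VARCHAR(20));
INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (2, 'b'), (3, 'c'), (3, 'c'), (3, 'c');
INSERT INTO t2 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (3, 'c');

-- Ids whose occurrence count differs between t1 and t2
SELECT a.id,
       a.cnt AS t1_count,
       COALESCE(b.cnt, 0) AS t2_count,
       a.cnt - COALESCE(b.cnt, 0) AS extra_in_t1
FROM (SELECT id, COUNT(*) AS cnt FROM t1 GROUP BY id) AS a
LEFT JOIN (SELECT id, COUNT(*) AS cnt FROM t2 GROUP BY id) AS b
  ON a.id = b.id
WHERE a.cnt <> COALESCE(b.cnt, 0)
ORDER BY a.id;
-- Returns (2, 2, 1, 1) and (3, 3, 2, 1)
```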
Try to fix the tables by adding a primary key on 'id' to both tables using MySQL:
ALTER TABLE t1 ADD PRIMARY KEY (id);
ALTER TABLE t2 ADD PRIMARY KEY (id);
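ADD PRIMARY KEY (id) fails with a duplicate-entry error if any id already repeats, which the row counts in the question suggest may be the case. A quick check to run first, shown on hypothetical demo rows:

```sql
-- Demo setup on a scratch database (hypothetical rows)
CREATE TABLE t1 (id INT, claim VARCHAR(20));
INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (2, 'b'), (3, 'c');

-- Any row returned here would make ADD PRIMARY KEY (id) fail
SELECT id, COUNT(*) AS occurrences
FROM t1
GROUP BY id
HAVING COUNT(*) > 1;
-- Returns (2, 2)
```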

SQL query to delete multiple rows

How can we delete multiple rows that have duplicate column values from a database?
Suppose the table has the columns (id, list_name, user) and list_name has these values:
Owner-aaa
coowner-aaa
owner-aaa
subowner-aaa
How can we delete the rows with "Owner-aaa" and "owner-aaa", which are duplicates?
Can we add something to this query:
delete from <table_name> where list_name = 'owner-aaa'
But it deletes only the lower-case row. I want something general that finds duplicates regardless of upper or lower case and deletes both of them.
Thanks in advance
Amy
DELETE FROM mytable WHERE LOWER(list_name) IN
(SELECT n FROM
(SELECT LOWER(list_name) AS n FROM mytable
GROUP BY LOWER(list_name)
HAVING COUNT(*) > 1) AS dups)
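Before running a delete like the one above, it can help to list which values collide once case is ignored. A sketch using the answer's mytable name, the question's list_name column, and the question's four values as demo rows:

```sql
-- Demo setup on a scratch database (values from the question)
CREATE TABLE mytable (
  id        INT AUTO_INCREMENT PRIMARY KEY,
  list_name VARCHAR(50) NOT NULL
);
INSERT INTO mytable (list_name) VALUES
  ('Owner-aaa'), ('coowner-aaa'), ('owner-aaa'), ('subowner-aaa');

-- Preview the case-insensitive duplicate groups before deleting anything
SELECT LOWER(list_name) AS normalized,
       COUNT(*) AS occurrences,
       GROUP_CONCAT(list_name ORDER BY id) AS variants
FROM mytable
GROUP BY LOWER(list_name)
HAVING COUNT(*) > 1;
-- Returns one row: ('owner-aaa', 2, 'Owner-aaa,owner-aaa')
```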
delete from tableName where LOWER(list_name) = 'owner-aaa'
Maybe you can use the LOWER/UPPER SQL functions.
But are you sure your model is correct? It seems really weird to have a name list like that. That should be a separate NAMES table with ID and NAME fields; it's a 1-N relation.
I'm not entirely sure from your question whether you want to delete all rows where duplicates occur, or leave one, and remove only the true duplicates. So here's a shot at each:
To remove only the true duplicates:
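DELETE FROM MyTable WHERE id IN
(
SELECT id FROM
(
SELECT DISTINCT T1.id
FROM MyTable T1
INNER JOIN MyTable T2
ON UPPER(T1.list_name) = UPPER(T2.list_name)
AND T2.id <> T1.id
AND T1.id <> (SELECT MAX(id) FROM MyTable WHERE UPPER(list_name) = UPPER(T1.list_name))
) AS DUPS
)
This presumes that the id field is unique to each record. The inner query is wrapped in the derived table DUPS because MySQL does not allow a subquery that reads the table being deleted from (error 1093); DISTINCT keeps the derived table from being merged back into the outer query.
To remove all records that have duplicates, drop the T1.id <> (SELECT MAX(id) ...) condition. Keep T2.id <> T1.id, otherwise every row matches itself and the whole table is deleted.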
DELETE a
FROM list a
INNER JOIN list b ON LOWER(a.list_name)=LOWER(b.list_name)
WHERE a.id <> b.id
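To see what the self-join version removes, here it is on the four values from the question. owner_lists is a hypothetical stand-in for the answer's list table, and only id and list_name are modeled. Because the condition is a.id <> b.id, both "Owner-aaa" and "owner-aaa" are deleted.

```sql
-- Demo setup on a scratch database (owner_lists is a hypothetical name)
CREATE TABLE owner_lists (
  id        INT AUTO_INCREMENT PRIMARY KEY,
  list_name VARCHAR(50) NOT NULL
);
INSERT INTO owner_lists (list_name) VALUES
  ('Owner-aaa'), ('coowner-aaa'), ('owner-aaa'), ('subowner-aaa');

-- Delete every row whose name matches another row when case is ignored
DELETE a
FROM owner_lists a
INNER JOIN owner_lists b
  ON LOWER(a.list_name) = LOWER(b.list_name)
WHERE a.id <> b.id;

-- Remaining: 'coowner-aaa', 'subowner-aaa'
SELECT list_name FROM owner_lists ORDER BY id;
```

Changing a.id <> b.id to a.id > b.id would keep one row of each pair instead of deleting both.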

How to remove duplicate entries from a mysql db?

I have a table with some ids + titles. I want to make the title column unique, but it has over 600k records already, some of which are duplicates (sometimes several dozen times over).
How do I remove all duplicates, except one, so I can add a UNIQUE key to the title column after?
This command adds a unique key, and drops all rows that generate errors (due to the unique key). This removes duplicates.
ALTER IGNORE TABLE mytable ADD UNIQUE KEY idx1(title);
Edit: Note that this command may not work for InnoDB tables for some versions of MySQL, and ALTER IGNORE TABLE was deprecated in MySQL 5.6.17 and removed in 5.7.4, so it fails on newer versions. See this post for a workaround. (Thanks to "an anonymous user" for this information.)
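Where ALTER IGNORE TABLE is not available, the duplicates have to be deleted before the unique key is added. A sketch, assuming a table with id and title columns; mytable and the demo rows are hypothetical, and idx1 is the key name from the answer. It keeps the lowest id per title.

```sql
-- Demo setup on a scratch database (hypothetical rows)
CREATE TABLE mytable (
  id    INT AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(200) NOT NULL
);
INSERT INTO mytable (title) VALUES ('bob'), ('alice'), ('bob'), ('bob');

-- Delete every row that has a same-title row with a smaller id
DELETE t
FROM mytable t
JOIN mytable k
  ON k.title = t.title
 AND k.id < t.id;

-- With the duplicates gone, the unique key can be added
ALTER TABLE mytable ADD UNIQUE KEY idx1 (title);
```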
Create a new table with just the distinct rows of the original table. There may be other ways but I find this the cleanest.
CREATE TABLE tmp_table AS SELECT DISTINCT [....] FROM main_table
More specifically:
The faster way is to insert distinct rows into a temporary table. Using delete, it took me a few hours to remove duplicates from a table of 8 million rows. Using insert and distinct, it took just 13 minutes.
CREATE TABLE tempTableName LIKE tableName;
-- index the columns compared by DISTINCT on the source table
CREATE INDEX ix_all_id ON tableName(cellId,attributeId,entityRowId,value);
INSERT INTO tempTableName(cellId,attributeId,entityRowId,value) SELECT DISTINCT cellId,attributeId,entityRowId,value FROM tableName;
-- empty the original table (dropping it would leave nothing to insert into)
TRUNCATE TABLE tableName;
INSERT INTO tableName SELECT * FROM tempTableName;
DROP TABLE tempTableName;
Since MySQL's ALTER IGNORE TABLE has been deprecated (and was removed in MySQL 5.7.4), you need to actually delete the duplicate data before adding a unique index.
First write a query that finds all the duplicates. Here I'm assuming that email is the field that contains duplicates.
SELECT
s1.email,
s1.id,
s1.created,
s2.id,
s2.created
FROM
student AS s1
INNER JOIN
student AS s2
WHERE
/* Emails are the same */
s1.email = s2.email AND
/* DON'T select both accounts,
only select the one created later.
The serial id could also be used here */
s2.created > s1.created
;
Next select only the unique duplicate ids:
SELECT
DISTINCT s2.id
FROM
student AS s1
INNER JOIN
student AS s2
WHERE
s1.email = s2.email AND
s2.created > s1.created
;
Once you are sure that this returns only the duplicate ids you want to delete, run the delete. You have to use (SELECT * FROM tblname) instead of the bare table name so that MySQL doesn't complain about selecting from the table you are deleting from.
DELETE FROM
student
WHERE
id
IN (
SELECT
DISTINCT s2.id
FROM
(SELECT * FROM student) AS s1
INNER JOIN
(SELECT * FROM student) AS s2
WHERE
s1.email = s2.email AND
s2.created > s1.created
);
Then create the unique index:
ALTER TABLE
student
ADD UNIQUE INDEX
idx_student_unique_email(email)
;
The query below deletes all the duplicates except the one row with the lowest "id" value:
DELETE t1 FROM table_name t1, table_name t2 WHERE t1.id > t2.id AND t1.name = t2.name
Similarly, we can keep the row with the highest 'id' value as follows:
DELETE t1 FROM table_name t1, table_name t2 WHERE t1.id < t2.id AND t1.name = t2.name
This shows how to do it in SQL Server 2000. I'm not completely familiar with MySQL syntax, but I'm sure there's something comparable.
create table #titles (iid int identity (1, 1), title varchar(200))
-- Repeat this step many times to create duplicates
insert into #titles(title) values ('bob')
insert into #titles(title) values ('bob1')
insert into #titles(title) values ('bob2')
insert into #titles(title) values ('bob3')
insert into #titles(title) values ('bob4')
DELETE T FROM
#titles T left join
(
select title, min(iid) as minid from #titles group by title
) D on T.title = D.title and T.iid = D.minid
WHERE D.minid is null
Select * FROM #titles
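A possible MySQL translation of the SQL Server example, as a sketch with hypothetical demo rows: AUTO_INCREMENT replaces IDENTITY, and titles is a regular table rather than a #temp table because MySQL cannot refer to a TEMPORARY table more than once in the same statement. The grouped derived table D should be materialized before the delete runs, which is what lets MySQL accept the self-reference.

```sql
-- Demo setup on a scratch database (regular table, not TEMPORARY)
CREATE TABLE titles (
  iid   INT AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(200)
);
INSERT INTO titles (title) VALUES
  ('bob'), ('bob1'), ('bob2'), ('bob'), ('bob3'), ('bob1'), ('bob4');

-- Delete every row that is not the lowest iid for its title
DELETE T
FROM titles T
LEFT JOIN (
  SELECT title, MIN(iid) AS minid FROM titles GROUP BY title
) AS D ON T.title = D.title AND T.iid = D.minid
WHERE D.minid IS NULL;

-- Remaining iids: 1, 2, 3, 5, 7
SELECT * FROM titles ORDER BY iid;
```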
delete from student where id in (
SELECT distinct(s1.`student_id`) from student as s1 inner join student as s2
where s1.`sex` = s2.`sex` and
s1.`student_id` > s2.`student_id` and
s1.`sex` = 'M'
ORDER BY `s1`.`student_id` ASC
)
The solution posted by Nitin seems to be the most elegant / logical one.
However it has one issue:
ERROR 1093 (HY000): You can't specify target table 'student' for update in FROM clause
This can however be resolved by using (SELECT * FROM student) instead of student:
DELETE FROM student WHERE id IN (
SELECT distinct(s1.`student_id`) FROM (SELECT * FROM student) AS s1 INNER JOIN (SELECT * FROM student) AS s2
WHERE s1.`sex` = s2.`sex` AND
s1.`student_id` > s2.`student_id` AND
s1.`sex` = 'M'
ORDER BY `s1`.`student_id` ASC
)
Give your +1's to Nitin for coming up with the original solution.
Deleting duplicates in MySQL tables is a common issue that usually comes with specific needs. In case anyone is interested, here (Remove duplicate rows in MySQL) I explain how to use a temporary table to delete MySQL duplicates in a reliable and fast way (with examples for different use cases).
In this case, something like this should work:
-- create a new temporary table
CREATE TABLE tmp_table1 LIKE table1;
-- add a unique constraint on the column that must not repeat
ALTER TABLE tmp_table1 ADD UNIQUE(title);
-- scan over the table to insert entries; with ORDER BY id, the lowest id per title is kept
INSERT IGNORE INTO tmp_table1 SELECT * FROM table1 ORDER BY id;
-- rename tables
RENAME TABLE table1 TO backup_table1, tmp_table1 TO table1;

Select a record that has a duplicate

I'd like to select all records from a table (names) where lastname is not unique. Preferably I would like to delete all records that are duplicates.
How would this be done? Assume that I don't want to rerun one query multiple times until it quits.
To find which lastnames have duplicates:
SELECT lastname, COUNT(lastname) AS rowcount
FROM names
GROUP BY lastname
HAVING rowcount > 1
To delete one duplicate for each last name, run this repeatedly until it doesn't delete anything. Not very graceful.
DELETE FROM names
WHERE id IN (SELECT id
FROM (SELECT MAX(id) AS id FROM names
GROUP BY lastname
HAVING COUNT(lastname) > 1) AS t)
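Since the question asks for something that doesn't need re-running, a single-pass variant is possible: keep the lowest id per lastname and delete everything else. A sketch with hypothetical demo rows; like the answer above, it assumes names has a unique id column. The ids to keep come from a GROUP BY derived table, which MySQL materializes, so it avoids error 1093.

```sql
-- Demo setup on a scratch database (hypothetical rows)
CREATE TABLE names (
  id       INT AUTO_INCREMENT PRIMARY KEY,
  lastname VARCHAR(100)
);
INSERT INTO names (lastname) VALUES
  ('Smith'), ('Jones'), ('Smith'), ('Smith'), ('Brown'), ('Jones');

-- One pass: delete every row that is not the lowest id for its lastname
DELETE FROM names
WHERE id NOT IN (
  SELECT keep_id FROM (
    SELECT MIN(id) AS keep_id FROM names GROUP BY lastname
  ) AS k
);

-- Remaining: (1, 'Smith'), (2, 'Jones'), (5, 'Brown')
SELECT id, lastname FROM names ORDER BY id;
```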
The fastest and easiest way to delete duplicate records is by issuing a very simple command (on MySQL versions before 5.7.4, which removed ALTER IGNORE TABLE):
ALTER IGNORE TABLE [TABLENAME] ADD UNIQUE INDEX UNIQUE_INDEX ([FIELDNAME])
This will lock the table; if that is an issue, try:
delete t1 from table1 t1, table1 t2
where t1.duplicate_field = t2.duplicate_field
-- add more conditions if needed, e.g.: and t1.duplicate_field2 = t2.duplicate_field2
and t1.unique_field > t2.unique_field
Break it up into ranges (of unique_field, for example) to make it run faster.
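A sketch of the range idea: the same self-join delete, restricted to one window of unique_field at a time. It uses the answer's placeholder names with hypothetical demo rows and a window of 3 ids; on a real table the windows would be much larger. Because the lowest unique_field in each group is never deleted, later windows still find it to compare against.

```sql
-- Demo setup on a scratch database (placeholder names from the answer)
CREATE TABLE table1 (
  unique_field    INT AUTO_INCREMENT PRIMARY KEY,
  duplicate_field VARCHAR(100) NOT NULL,
  INDEX (duplicate_field, unique_field)
);
INSERT INTO table1 (duplicate_field) VALUES ('a'), ('b'), ('a'), ('c'), ('b'), ('a');

-- Window 1: unique_field 1..3
DELETE t1
FROM table1 t1
JOIN table1 t2
  ON t1.duplicate_field = t2.duplicate_field
 AND t1.unique_field > t2.unique_field
WHERE t1.unique_field BETWEEN 1 AND 3;

-- Window 2: unique_field 4..6 (continue until MAX(unique_field) is covered)
DELETE t1
FROM table1 t1
JOIN table1 t2
  ON t1.duplicate_field = t2.duplicate_field
 AND t1.unique_field > t2.unique_field
WHERE t1.unique_field BETWEEN 4 AND 6;

-- Remaining unique_field values: 1, 2, 4
SELECT * FROM table1 ORDER BY unique_field;
```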
Possible duplicate: How can I remove duplicate rows?
DELETE names
FROM names
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, lastname
FROM names
GROUP BY lastname
) as KeepRows ON
names.lastname = KeepRows.lastname AND names.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
Assumption: you have a RowId column that uniquely identifies each row.
SELECT lastname, COUNT(*) AS mycountvar FROM names GROUP BY lastname HAVING mycountvar > 1;
and then, for each row returned (with $mylimit set to mycountvar - 1 in your application code, because LIMIT only accepts a constant, not an expression):
DELETE FROM names WHERE lastname = '$mylastnamevar' LIMIT $mylimit
But: why don't you just flag the field "lastname" as unique, so that duplicates can't get in?
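Flagging the column as unique, as suggested, is a one-line ALTER TABLE once the duplicates are gone. A sketch with hypothetical demo rows and a hypothetical index name uq_names_lastname:

```sql
-- Demo setup on a scratch database (duplicates already removed)
CREATE TABLE names (
  id       INT AUTO_INCREMENT PRIMARY KEY,
  lastname VARCHAR(100)
);
INSERT INTO names (lastname) VALUES ('Smith'), ('Jones');

-- Let MySQL reject new duplicates
ALTER TABLE names ADD UNIQUE KEY uq_names_lastname (lastname);

-- A plain INSERT of 'Smith' would now fail with a duplicate-entry error;
-- INSERT IGNORE turns that into a warning and skips the row
INSERT IGNORE INTO names (lastname) VALUES ('Smith');

SELECT COUNT(*) FROM names;
```

Note that a UNIQUE index in MySQL still allows several rows with a NULL lastname.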