How to remove duplicate rows in MySQL?

I tried to remove duplicate rows from a table TT. Here is my query:
delete t1
from TT t1
, TT t2
where t1.id < t2.id
and t1.url = t2.url
Here id is the primary key and url has a unique key in the table TT. You may be wondering how there can be duplicate rows with a unique index.
I don't know how it happened, but right now I want to remove the duplicate rows first. The query runs in phpMyAdmin, but no rows are deleted at all, even though there are duplicate rows in TT.
What could be the reason? Thanks!

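On MySQL 8.0 or later you can use ROW_NUMBER() to remove duplicates. MySQL cannot delete from a CTE directly, so join the CTE back to the table. This keeps the row with the highest id for each url, as your query does:
WITH cte AS (
SELECT id
, ROW_NUMBER() OVER(PARTITION BY url ORDER BY id DESC) AS rn
FROM TT
)
DELETE t
FROM TT t
JOIN cte ON cte.id = t.id
WHERE cte.rn > 1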
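A possible explanation for the original query deleting nothing, offered as a guess because the table definition is not shown: a unique index in MySQL permits any number of NULL values in a nullable column, and `NULL = NULL` does not evaluate to true, so rows whose url is NULL never satisfy `t1.url = t2.url`. The sketch below first checks what MySQL itself considers duplicated, then uses the null-safe operator `<=>`. Whether the NULL rows should be collapsed at all is an assumption to confirm before running the DELETE.

```sql
-- 1) List the url values MySQL treats as duplicated.
--    GROUP BY puts all NULLs into one group, so a NULL group shows up here.
SELECT url, COUNT(*) AS n
FROM TT
GROUP BY url
HAVING COUNT(*) > 1;

-- 2) If the duplicated group is url IS NULL, the `=` comparison can never match.
--    The null-safe operator <=> treats two NULLs as equal.
--    This keeps the row with the highest id per url, as in the question.
DELETE t1
FROM TT t1
JOIN TT t2
  ON t1.url <=> t2.url
 AND t1.id < t2.id;
```

If the first query returns no rows at all, the values that look identical differ in some way MySQL can see, and no equality-based DELETE will match them.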

Related

Deleting duplicate values from a mysql table but keep one

I'm trying to delete duplicate rows from a mysql table, but still keep one.
However, the following query seemingly deletes every duplicate row and I'm not sure why. Basically, I want to delete the row if the outputID, title and type all match.
DELETE DupRows.*
FROM output AS DupRows
INNER JOIN (
SELECT MIN(Output_ID) AS Output_ID, Title, Type
FROM output
GROUP BY Title, Type
HAVING COUNT(*) > 1
) AS SaveRows
ON SaveRows.Title = DupRows.Title
AND SaveRows.Type = DupRows.Type
AND SaveRows.Output_ID = DupRows.Output_ID;
Just use:
DELETE DupRows
FROM output AS DupRows
INNER JOIN output AS SaveRows
ON SaveRows.Title = DupRows.Title
AND SaveRows.Type = DupRows.Type
AND DupRows.Output_ID > SaveRows.Output_ID
This will delete all duplicates on Title and Type while keeping the record with the lowest Output_ID.
If you are running MySQL 8.0, you can use the window function ROW_NUMBER() to number the records within each Title/Type group, ordered by Output_ID. Then you can delete all records whose row number is not 1.
DELETE FROM output
WHERE Output_ID IN (
SELECT Output_ID
FROM (
SELECT Output_ID, ROW_NUMBER() OVER(PARTITION BY Title, Type ORDER BY Output_ID) rn
FROM output
) x
WHERE rn > 1
)
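Before running either DELETE above, it can help to preview the rows that would be removed. A minimal sketch that reuses the self-join condition as a SELECT; DISTINCT is there because a row with several lower-numbered duplicates would otherwise be listed once per match:

```sql
-- Preview: every row that has a duplicate with a lower Output_ID.
-- These are the rows the join-based DELETE would remove.
SELECT DISTINCT DupRows.*
FROM output AS DupRows
INNER JOIN output AS SaveRows
   ON SaveRows.Title = DupRows.Title
  AND SaveRows.Type = DupRows.Type
  AND DupRows.Output_ID > SaveRows.Output_ID
ORDER BY DupRows.Title, DupRows.Type, DupRows.Output_ID;
```

If the lowest Output_ID of any Title/Type group appears in the result, the join condition is wrong and the DELETE should not be run.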
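-- Keep the lowest Output_ID per Title/Type; the derived table avoids MySQL error 1093
Delete From output Where Output_ID NOT IN (
Select Output_ID From (
Select MIN(Output_ID) AS Output_ID from output Group By Title, Type
) AS KeepRows
)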
The query below deletes the duplicate rows that match the condition and keeps the oldest row.
NOTE: the query assumes id is an auto-increment column.
DELETE t1
FROM output t1, output t2
WHERE t1.Title = t2.Title
AND t1.Type = t2.Type
AND t1.Output_ID = t2.Output_ID
AND t1.id>t2.id
To keep the most recently inserted row instead, change the last condition to:
DELETE t1
FROM output t1, output t2
WHERE t1.Title = t2.Title
AND t1.Type = t2.Type
AND t1.Output_ID = t2.Output_ID
AND t1.id<t2.id

Delete duplicates in mySQL table

I am trying to write my first MySQL query. I need to delete rows if they have the same article_number field. I wrote this query:
SELECT
article_number, COUNT(*)
FROM
article_numbers
GROUP BY
article_number
HAVING
COUNT(*) > 1
It shows me all the article numbers that are duplicated. But how can I delete all but one row for each duplicate?
Thanks
EDIT:
I tried this query:
delete article_numbers from article_numbers inner join
(select article_number
from article_numbers
group by article_number
having count(1) > 1) as duplicates
on (duplicates.article_number = article_numbers.article_number)
but it gives me this error:
Cannot delete or update a parent row: a foreign key constraint fails (api.products, CONSTRAINT products_article_number_id_foreign FOREIGN KEY (article_number_id) REFERENCES article_numbers (id))
EDIT 2:
I disabled the foreign key temporarily, and now my delete query works. But how can I modify it so that one row of each set of duplicates is kept?
Use a CROSS JOIN.
Query
delete t1
from article_numbers t1,
article_numbers t2
where t1.id > t2.id
and t1.article_number = t2.article_number;
I use a rather simple query to remove dupes (this is SQL Server syntax; MySQL does not allow deleting through a CTE):
;WITH DEDUPE AS (
SELECT ROW_NUMBER() OVER(
PARTITION BY article_number
ORDER BY (SELECT 1)) AS RN
FROM article_numbers)
DELETE FROM DEDUPE
WHERE RN != 1
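-- MySQL 8.0+: number the rows within each article_number, then delete all but the first
Delete a
from article_numbers a
join (select id, row_number() over(partition by article_number order by id) as r from article_numbers) c
on c.id = a.id
where c.r != 1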
Delete a row if another row with the same article_number but a higher id exists:
delete from article_numbers t1
where exists (select 1 from article_numbers t2
where t2.article_number = t1.article_number
and t2.id > t1.id)
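This is core ANSI SQL, but MySQL rejects it with error 1093 because the subquery selects from the table being deleted from. In MySQL, use the self-join form of DELETE instead.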
I think this would help:
WITH tblTemp as
(
SELECT ROW_NUMBER() Over(PARTITION BY Name,Department ORDER BY Name)
As RowNumber,* FROM <table_name>
)
DELETE FROM tblTemp where RowNumber >1
I modified my query and I think it works now:
SET FOREIGN_KEY_CHECKS=0;
delete article_numbers from article_numbers inner join
(select min(id) minid, article_number
from article_numbers
group by article_number
having count(1) > 1) as duplicates
on (duplicates.article_number = article_numbers.article_number and duplicates.minid <> article_numbers.id);
SET FOREIGN_KEY_CHECKS=1;
But it seems very complex. I will check @Ullas's method to see if it works, too.
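Disabling FOREIGN_KEY_CHECKS lets the delete run, but any products rows that referenced a deleted id are left pointing at a row that no longer exists. A sketch of an alternative that repoints those rows first, so the checks can stay on. It assumes products is the only table referencing article_numbers, since that is the only constraint named in the error message; the alias keep_rows is made up for this example.

```sql
-- Step 1: repoint child rows at the surviving (lowest-id) row of each group.
UPDATE products AS p
JOIN article_numbers AS a
  ON a.id = p.article_number_id
JOIN (
  SELECT article_number, MIN(id) AS minid
  FROM article_numbers
  GROUP BY article_number
) AS keep_rows
  ON keep_rows.article_number = a.article_number
SET p.article_number_id = keep_rows.minid
WHERE p.article_number_id <> keep_rows.minid;

-- Step 2: delete the extra rows; no child row references them any more.
DELETE a
FROM article_numbers AS a
JOIN (
  SELECT article_number, MIN(id) AS minid
  FROM article_numbers
  GROUP BY article_number
) AS keep_rows
  ON keep_rows.article_number = a.article_number
WHERE a.id <> keep_rows.minid;
```

Running both steps inside one transaction would keep the two tables consistent if the second statement fails.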

MySQL - Deleting all but newest rows with composite key

I have a table with columns like this:
id timestamp content
where ID is a string, and timestamp is DEFAULT CURRENT_TIMESTAMP.
id and timestamp together make a composite key, so you can select the newest row for an id with something like:
select * from table where id = 'text-here' order by timestamp desc limit 1
I now have a problem where I want to delete all but the newest entry for each id, but I have no idea how to do this. If it had an auto-incrementing primary key I could use a sub-query to select the ones to keep and use NOT IN, as is demonstrated on numerous questions here, but I don't know how to do this with a composite key.
It is possible without a subquery too:
DELETE t
FROM t
JOIN t AS t2 ON t.timestamp < t2.timestamp AND t.id = t2.id;
http://sqlfiddle.com/#!9/1ff88/1
The following query:
DELETE mytable
FROM mytable
INNER JOIN (SELECT id, MAX(`timestamp`) AS `timestamp`
FROM mytable
GROUP BY id) AS t
ON mytable.id = t.id AND mytable.`timestamp` < t.`timestamp`
deletes all but the newest record per id from mytable.
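On MySQL 8.0 or later the same result can also be reached with a window function. This is a sketch only: it reuses the mytable name from the answer above and joins back on both columns of the composite key; the alias ranked is made up for this example.

```sql
-- Number the rows of each id from newest to oldest, then delete everything
-- except row number 1. The derived table is matched back to the base table
-- on both parts of the composite key (id, timestamp).
DELETE m
FROM mytable AS m
JOIN (
  SELECT id,
         `timestamp`,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY `timestamp` DESC) AS rn
  FROM mytable
) AS ranked
  ON ranked.id = m.id
 AND ranked.`timestamp` = m.`timestamp`
WHERE ranked.rn > 1;
```

Changing `rn > 1` to `rn > 3` would keep the three newest rows per id, which the plain join version cannot express as easily.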

how to Delete Duplicate Rows but keeping 1 based on two columns

I have a table called scheduler. It contains the following columns:
ID
sequence_id
schedule_time (timestamp)
processed
source_order
I need to delete duplicate rows from the table, keeping one row from each set of rows that have the same schedule_time and source_order for a particular sequence_id, where processed=0.
DELETE yourTable FROM yourTable LEFT OUTER JOIN (
SELECT MIN(ID) AS minID FROM yourTable WHERE processed = 0 GROUP BY schedule_time, source_order
) AS keepRowTable ON yourTable.ID = keepRowTable.minID
WHERE keepRowTable.ID IS NULL AND processed = 0
I adapted this from the post "How can I remove duplicate rows?"
Have you seen it?
Fixed version (the WHERE clause must test keepRowTable.minID, the only column the subquery returns):
DELETE yourTable FROM yourTable LEFT OUTER JOIN (
SELECT MIN(ID) AS minID FROM yourTable WHERE processed = 0 GROUP BY schedule_time, source_order
) AS keepRowTable ON yourTable.ID = keepRowTable.minID
WHERE keepRowTable.minID IS NULL AND processed = 0
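The question asks for duplicates "for a particular sequence_id", while the queries above group by schedule_time and source_order only. A sketch that also includes sequence_id in the grouping, assuming the table is named scheduler with the columns listed in the question; the alias keep_rows is made up for this example.

```sql
-- Keep the lowest ID in each (sequence_id, schedule_time, source_order) group
-- of unprocessed rows and delete the other unprocessed rows of that group.
DELETE s
FROM scheduler AS s
JOIN (
  SELECT sequence_id, schedule_time, source_order, MIN(ID) AS minID
  FROM scheduler
  WHERE processed = 0
  GROUP BY sequence_id, schedule_time, source_order
) AS keep_rows
  ON keep_rows.sequence_id = s.sequence_id
 AND keep_rows.schedule_time = s.schedule_time
 AND keep_rows.source_order = s.source_order
WHERE s.processed = 0
  AND s.ID > keep_rows.minID;
```

Rows with processed <> 0 are never touched, because both the subquery and the outer WHERE filter on processed = 0.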
For MySQL:
DELETE a FROM tbl a, tbl b
WHERE a.Id > b.Id AND a.sequence_id = b.sequence_id
AND a.schedule_time = b.schedule_time AND a.source_order = b.source_order
AND a.processed = 0 AND b.processed = 0;
The fastest way to remove duplicates is to force them out by adding a unique index with IGNORE, leaving only one copy of each in the table. Note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this only works on older servers:
ALTER IGNORE TABLE dates ADD UNIQUE KEY (
sequence_id,
schedule_time,
processed,
source_order
)
The key lists only the columns that define a duplicate. ID is left out because it is different for every row, so a key that included it would never see two rows as duplicates.
If the table already has such a key you might need to drop it first, but the point is that when you add a unique key with IGNORE to a table with duplicates, the behavior is to delete all the extra records. After you have added this key, you just need to drop it again if the table should accept new duplicates :-)
If you need more complex filtering to decide which one of the duplicates to keep, something that cannot be expressed through an index (although that is unlikely), you can create a new table from a SELECT that picks exactly the rows you want, then swap it in:
CREATE TABLE tmp SELECT ..fields.. FROM original_table GROUP BY ( ..what you need..);
DROP TABLE original_table;
ALTER TABLE tmp RENAME TO original_table;

deleting duplicate records on mysql?

I have this MySQL query that finds duplicates and the number of occurrences for each topic:
SELECT name,
COUNT(name) AS NumOccurrences
FROM topics
GROUP BY name
HAVING ( COUNT(name) > 1 )
What I want to do is delete all the duplicates that are found. I only want one unique name for each topic, and no duplicates. Thanks!
DELETE t2
FROM topics t1
JOIN topics t2
ON t2.name = t1.name
AND t2.id < t1.id
I would copy all the unique entries to a new table:
CREATE TABLE new_table as
SELECT * FROM old_table WHERE 1 GROUP BY unique_column_name;
Check the data, then delete your old table when you're sure everything's good and rename the new table to the old one.
Then make the name column unique so you won't have to do this again.
Cheers
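A note on the copy approach: `SELECT * ... GROUP BY unique_column_name` is rejected when the ONLY_FULL_GROUP_BY SQL mode is enabled, which is the default from MySQL 5.7.5. The sketch below does the same copy, check and swap without relying on that behavior. It assumes topics has an id column, as the join-based answer above does; topics_new, topics_old and uq_topics_name are made-up names for this example.

```sql
-- 1) Empty copy with the same columns and indexes.
CREATE TABLE topics_new LIKE topics;

-- 2) Copy one row per name: the one with the lowest id.
INSERT INTO topics_new
SELECT t.*
FROM topics AS t
JOIN (
  SELECT MIN(id) AS id
  FROM topics
  GROUP BY name
) AS keep_rows
  ON keep_rows.id = t.id;

-- 3) After checking topics_new, swap the tables in one statement.
RENAME TABLE topics TO topics_old, topics_new TO topics;

-- 4) Prevent new duplicates.
ALTER TABLE topics ADD UNIQUE KEY uq_topics_name (name);
```

topics_old stays around as a backup until it is dropped by hand.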