Removing Duplicates in MySQL - mysql

I have inherited a database that has a "Duplicate problem".
When I run:
SELECT myFIELD, COUNT(*) c FROM myTABLE GROUP BY myFIELD HAVING c > 1;
I get ~600 records that are duplicated.
None are tripled or any other multiple.
I wish to kill off one of the records, leaving me with no duplicates.
What does the query look like?

You can use a query like this. It removes one row per duplicated value, so you should repeat it depending on how many times the same row is duplicated:
delete from myTABLE
where (myFIELD, id) in
(select a.myFIELD, max(a.id)
FROM myTABLE as a GROUP BY a.myFIELD HAVING count(*) > 1)
Otherwise you can use
delete from myTABLE
where (myFIELD, id) not in
(select a.myFIELD, min(a.id)
FROM myTABLE as a GROUP BY a.myFIELD)
This should delete all the duplicated rows in one shot, keeping the row with the lowest id for each value.
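If MySQL complains about the table name (error 1093: the target table of the delete cannot also be read in the subquery's FROM clause), wrap the subquery in a derived table:
delete from myTABLE
where (myFIELD, id) in (select field, id from
(select a.myFIELD as field, max(a.id) as id
FROM myTABLE as a GROUP BY a.myFIELD HAVING count(*) > 1) as t)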
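To see the difference between the "repeat it" query and the one-shot query, here is a minimal sketch that runs both against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, the sample rows are made up, and the sketch assumes id is a unique primary key, so it filters on id alone instead of the (myFIELD, id) pair.

```python
import sqlite3

# Made-up sample data: "a" is tripled, "b" is unique, "c" is doubled.
ROWS = [(1, "a"), (2, "a"), (3, "a"), (4, "b"), (5, "c"), (6, "c")]

# Removes only the highest id of each duplicated value (one row per pass).
DELETE_MAX_OF_DUPLICATED = """
DELETE FROM myTABLE
WHERE id IN (SELECT MAX(id) FROM myTABLE GROUP BY myFIELD HAVING COUNT(*) > 1)
"""

# Removes everything except the lowest id of each value (single pass).
DELETE_ALL_BUT_MIN = """
DELETE FROM myTABLE
WHERE id NOT IN (SELECT MIN(id) FROM myTABLE GROUP BY myFIELD)
"""


def make_table(rows):
    """Create a fresh in-memory table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTABLE (id INTEGER PRIMARY KEY, myFIELD TEXT)")
    conn.executemany("INSERT INTO myTABLE (id, myFIELD) VALUES (?, ?)", rows)
    return conn


def remaining(conn):
    """Return the rows still in the table, ordered by id."""
    return conn.execute("SELECT id, myFIELD FROM myTABLE ORDER BY id").fetchall()


repeated = make_table(ROWS)
repeated.execute(DELETE_MAX_OF_DUPLICATED)
after_first_pass = remaining(repeated)   # "a" is still duplicated here
repeated.execute(DELETE_MAX_OF_DUPLICATED)
after_second_pass = remaining(repeated)

single_shot = make_table(ROWS)
single_shot.execute(DELETE_ALL_BUT_MIN)
after_single_shot = remaining(single_shot)
```

With the tripled value in the sample data, the first query needs two passes to reach the state the second query reaches in one.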

Related

How to delete the row that has least value in a specific column if group by count of the column is greater than 1

I have a table like the one given below, named recomendation.
I want to delete all the rows where cnt has the minimum value and there are multiple records for the same ID_recipient.
If there is a single record for an ID_recipient, it shouldn't be deleted, whatever the cnt value may be.
The ones highlighted in blue are the records that must stay.
I tried:
DELETE from table where(
SELECT DISTINCT(A.ID_recipient), DISTINCT(A.cnt) FROM (
SELECT MIN(cnt) as cnt FROM recomendation_table_ID_recipient GROUP BY ID_recipient HAVING COUNT(*) > 1 ) as A);
which is not working.
If you want to match on two columns at once, you have to use an IN clause with a column pair.
Your subqueries don't make much sense, so you should test this first, or post sample data with the wanted result:
DELETE from recomendation_table_ID_recipient where (ID_recipient,cnt) IN (
SELECT DISTINCT A.ID_recipient, A.cnt FROM (
SELECT ID_recipient, MIN(cnt) as cnt FROM recomendation_table_ID_recipient GROUP BY ID_recipient HAVING COUNT(*) > 1 ) as A);
delete t1 from recomendation_table_ID_recipient t1 join (
select ID_recipient, min(cnt) as cnt from recomendation_table_ID_recipient
group by ID_recipient
having count(*) > 1
) t2 on t1.ID_recipient = t2.ID_recipient and t1.cnt = t2.cnt;
See db-fiddle
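As a rough check of what these statements do, the sketch below runs the column-pair IN version against an in-memory SQLite database from Python. SQLite stands in for MySQL (it has no multi-table DELETE ... JOIN, so only the IN form is shown), and the sample rows are invented since the original data is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recomendation_table_ID_recipient (ID_recipient INTEGER, cnt INTEGER)"
)
# Invented rows: recipients 1 and 3 have several records, recipient 2 has one.
conn.executemany(
    "INSERT INTO recomendation_table_ID_recipient VALUES (?, ?)",
    [(1, 5), (1, 2), (2, 7), (3, 1), (3, 4), (3, 9)],
)

# Delete the minimum-cnt row of every recipient that has more than one record.
conn.execute(
    """
    DELETE FROM recomendation_table_ID_recipient
    WHERE (ID_recipient, cnt) IN (
        SELECT ID_recipient, MIN(cnt)
        FROM recomendation_table_ID_recipient
        GROUP BY ID_recipient
        HAVING COUNT(*) > 1
    )
    """
)

remaining = conn.execute(
    "SELECT ID_recipient, cnt FROM recomendation_table_ID_recipient "
    "ORDER BY ID_recipient, cnt"
).fetchall()
```

Only the lowest cnt of each multi-record recipient goes away; recipient 2, with its single record, is left alone.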

MySQL Select works fine but Delete hangs indefinitely based on the position of GROUP BY

select * from table1 where ID in (
select min(a.ID) from (select * from table1) a group by id_x, id_y, col_z having count(*) > 1)
The above query ran in 2.2 seconds, returning four results. Now when I change the select * to delete, it hangs indefinitely.
delete from table1 where ID in (
select min(a.ID) from (select * from table1) a group by id_x, id_y, col_z having count(*) > 1)
If I move the position of group by clause inside the alias select query, it will no longer hang.
delete from table1 where ID in (
select a.ID from (select min(ID) as ID from table1 group by id_x, id_y, col_z having count(*) > 1) a)
Why does it hang? Granted, (select * from table1) pulls millions of records, but the delete query keeps executing for hours without finishing. Can anybody explain what is holding the query up? It puzzles me because the select query works fine whereas the delete query hangs.
EDIT:
My focus here is on why it hangs. I have already posted a workaround that works fine, but in order to build a prevention system I need to get to the root cause of this.
Use a JOIN instead of WHERE ID IN (SELECT ...).
DELETE t1
FROM table1 AS t1
JOIN (
SELECT MIN(id) AS minId
FROM table1
GROUP BY id_x, id_y, col_z
HAVING COUNT(*) > 1) AS t2
ON t1.id = t2.minId
I think your query is not being optimized because it has to recalculate the subquery after each deletion, since deleting a row could change the MIN(id) for that group. Using a JOIN requires the grouping and aggregation to be done just once.
Try this:
delete t
from table1 t join
(select min(id) as min_id
from table1
group by id_x, id_y, col_z
having count(*) >= 2
) tt
on tt.min_id = t.id;
That said, you probably don't want to delete just the minimum id. I'm guessing you want to keep the most recent id. If so:
delete t
from table1 t left join
(select max(id) as max_id
from table1
group by id_x, id_y, col_z
-- no having clause here: single-row groups must also contribute the id to keep
) tt
on tt.max_id = t.id
where tt.max_id is null;
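To check which rows a "keep the newest id" anti-join would remove before running it for real, something like the following Python sketch can help. It uses an in-memory SQLite database as a stand-in for MySQL, invented sample rows, and a hypothetical helper named ids_not_kept; it also shows how a HAVING filter inside the derived table changes the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY, id_x INTEGER, id_y INTEGER, col_z TEXT)"
)
# Invented rows: ids 1-3 form one group, id 4 is alone, ids 5-6 form another group.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(1, 1, 1, "a"), (2, 1, 1, "a"), (3, 1, 1, "a"),
     (4, 2, 2, "b"),
     (5, 3, 3, "c"), (6, 3, 3, "c")],
)


def ids_not_kept(having_clause=""):
    """Hypothetical helper: ids with no match among the per-group MAX(id) values."""
    sql = f"""
        SELECT t.id
        FROM table1 t LEFT JOIN
             (SELECT MAX(id) AS max_id
              FROM table1
              GROUP BY id_x, id_y, col_z
              {having_clause}
             ) tt
             ON tt.max_id = t.id
        WHERE tt.max_id IS NULL
        ORDER BY t.id
    """
    return [row[0] for row in conn.execute(sql)]


without_having = ids_not_kept()
with_having = ids_not_kept("HAVING COUNT(*) >= 2")

# Delete using the list computed without the HAVING filter.
conn.executemany("DELETE FROM table1 WHERE id = ?", [(i,) for i in without_having])
survivors = [row[0] for row in conn.execute("SELECT id FROM table1 ORDER BY id")]
```

On this data, the version with the HAVING filter also selects id 4, the only row of its group, because single-row groups never make it into the derived table and so look "unmatched" to the left join.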

Delete records based on another query in mysql

I have a query in MySQL based on which I am finding duplicate records of some columns.
select max(id), count(*) as cnt
from table group by start_id, end_id, mysqltable
having cnt>1;
The above query gives me the max(id) and the number of records that share the same start_id, end_id and mysqltable column values.
I want to delete all the records that match the max(id) column of the above query.
How can I do that?
I have tried the following:
delete from table
where (select max(id), count(*) as cnt
from table group by start_id,end_id,mysqltable
having cnt>1)
But I am unable to delete the records.
You can remove duplicate records using JOIN.
DELETE t1 FROM table t1
INNER JOIN
table t2
WHERE
t1.id > t2.id AND t1.start_id = t2.start_id AND t1.end_id = t2.end_id AND t1.mysqltable = t2.mysqltable;
This query keeps the row with the lowest id in each group of duplicates and removes all the rows with higher ids.
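I think this command should work (the inner select is wrapped in a derived table because MySQL does not allow a delete to read directly from its own target table in a subquery):
delete from table
where id in
( select max_id from
( select max(id) as max_id from table
group by start_id, end_id, mysqltable
having count(*) > 1
) as t
);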
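The two answers do not remove the same rows when a group has more than two copies. The Python sketch below previews both selections on an in-memory SQLite database (a stand-in for MySQL). The table name jobs and the sample rows are hypothetical, chosen because table itself is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, start_id INTEGER, "
    "end_id INTEGER, mysqltable TEXT)"
)
# Hypothetical rows: ids 1-3 are triplicates, id 4 is unique, ids 5-6 are duplicates.
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?)",
    [(1, 10, 20, "orders"), (2, 10, 20, "orders"), (3, 10, 20, "orders"),
     (4, 10, 20, "users"),
     (5, 30, 40, "users"), (6, 30, 40, "users")],
)

# Self-join approach: every row that has a lower-id twin is selected.
self_join_ids = [row[0] for row in conn.execute(
    """
    SELECT DISTINCT t1.id
    FROM jobs t1 JOIN jobs t2
      ON t1.start_id = t2.start_id
     AND t1.end_id = t2.end_id
     AND t1.mysqltable = t2.mysqltable
     AND t1.id > t2.id
    ORDER BY t1.id
    """
)]

# MAX(id) approach: only the highest id of each duplicated group is selected.
max_only_ids = [row[0] for row in conn.execute(
    """
    SELECT MAX(id) AS max_id
    FROM jobs
    GROUP BY start_id, end_id, mysqltable
    HAVING COUNT(*) > 1
    ORDER BY max_id
    """
)]

# Apply the self-join selection and look at what is left.
conn.executemany("DELETE FROM jobs WHERE id = ?", [(i,) for i in self_join_ids])
survivors = [row[0] for row in conn.execute("SELECT id FROM jobs ORDER BY id")]
```

The MAX(id) version matches what the question literally asks for, while the self-join version removes every extra copy in one go.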

Display the orders in which more than one article is ordered

I tried to display the entries in the table where the order has more than one article, but it's not working the way I tried it. Can somebody show me what's wrong?
Here's what I tried:
SELECT *
FROM TableX
WHERE (SELECT COUNT(Ordernumber) FROM TableX AS a WHERE a>1);
One option is to use a subquery to identify the order numbers having more than one article, then join this subquery to your original table to obtain the full records for these matching orders.
SELECT t1.*
FROM TableX t1
INNER JOIN
(
SELECT Ordernumber
FROM TableX
GROUP BY Ordernumber
HAVING COUNT(*) > 1
) t2
ON t1.Ordernumber = t2.Ordernumber
This query assumes that all articles within a given order are unique. If duplicate articles could occur, and you would not count duplicates, then you can use the following HAVING clause instead:
HAVING COUNT(DISTINCT article) > 1
Another option:
SELECT *
FROM TableX
WHERE Ordernumber IN
(
SELECT Ordernumber
FROM TableX
GROUP BY Ordernumber
HAVING COUNT(*) > 1
)
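The difference between the two HAVING clauses is easiest to see on a small example. This Python sketch uses an in-memory SQLite database with made-up orders; the helper orders_having is hypothetical and only exists to swap the HAVING condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableX (Ordernumber INTEGER, article TEXT)")
# Made-up data: order 100 has two different articles, order 101 has one,
# and order 102 lists the same article twice.
conn.executemany(
    "INSERT INTO TableX VALUES (?, ?)",
    [(100, "A"), (100, "B"), (101, "A"), (102, "C"), (102, "C")],
)


def orders_having(condition):
    """Hypothetical helper: order numbers whose group satisfies the HAVING condition."""
    sql = (
        "SELECT Ordernumber FROM TableX "
        "GROUP BY Ordernumber HAVING " + condition + " ORDER BY Ordernumber"
    )
    return [row[0] for row in conn.execute(sql)]


by_row_count = orders_having("COUNT(*) > 1")
by_distinct_articles = orders_having("COUNT(DISTINCT article) > 1")
```

Order 102 passes the COUNT(*) test because it has two rows, but not the COUNT(DISTINCT article) test because both rows are the same article.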

Delete records from a table where < max number for a field and keep highest number

I know this sounds rather confusing but I'm at a loss how to explain it better. I have a table simplified below:
DB Type ID
================
Table1 1
Table1 2
Table1 3
Table1 4
Table1 5
Table2 6
Table2 7
Table2 8
Table2 9
Table2 10
What I am trying to achieve is basically to clean out this table but keep the record with the highest ID for each DB Type, if that makes sense. So in this case it would be (Table1,5) and (Table2,10), with all other records being deleted. Is it possible to do this exclusively through MySQL?
EDIT:
Answer thanks to tips from Yogendra Singh
DELETE FROM MyTable WHERE ID NOT IN (SELECT * FROM (SELECT MAX(ID) from MyTable GROUP BY `DB Type`) AS tb1 ) ORDER BY ID ASC
Try selecting the max ID grouped by DB Type first, and then use it as a subquery with NOT IN.
DELETE FROM MyTable
WHERE ID NOT IN
(SELECT ID FROM
(SELECT MAX(ID) AS ID from MyTable GROUP BY `DB Type`) AS tb1
)
EDIT:
DELETE t FROM MyTable t
JOIN MyTable m ON m.`DB Type` = t.`DB Type` AND m.ID > t.ID;
delete your_table
from
your_table left join
(select max(id) max_id from your_table group by type) mx
on your_table.id=mx.max_id
where mx.max_id is null
The subquery returns the maximum id for every type, and those are the values to keep. With a left join I'm selecting all the rows from your table that don't have an id in max_ids, and those are the rows to delete. This will work only if id is the primary key; otherwise we also have to join on the type.
Is the combination DB Type - ID unique?
If so, you can attack this in two stages:
Get only the rows you want
SELECT [DB Type], Max(ID) AS MaxID
FROM YourTable
GROUP BY [DB Type]
Delete the rest (by wrapping the previous statement in a more complicated statement; don't mind that)
DELETE FROM YourTable
FROM
YourTable
LEFT JOIN
(SELECT [DB Type], Max(ID) AS MaxID
FROM YourTable GROUP BY [DB Type]) DontDelete
ON
YourTable.[DB Type]=DontDelete.[DB Type] AND
YourTable.ID=DontDelete.MaxID
WHERE
DontDelete.[DB Type] IS NULL
DELETE FROM MyTable del
WHERE EXISTS (
SELECT *
FROM MyTable xx
WHERE xx."db Type" = del."db Type"
AND xx.id > del.id
);
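The EXISTS form can be tried on the sample table from the question. The Python sketch below does that with an in-memory SQLite database standing in for the real server; it drops the alias on the delete target and refers to the outer table by name instead, which is an adaptation for this sketch and not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE MyTable ("DB Type" TEXT, id INTEGER PRIMARY KEY)')
# Sample data from the question: ids 1-5 for Table1, ids 6-10 for Table2.
rows = [("Table1", i) for i in range(1, 6)] + [("Table2", i) for i in range(6, 11)]
conn.executemany('INSERT INTO MyTable ("DB Type", id) VALUES (?, ?)', rows)

# Delete every row for which a row of the same type with a higher id exists.
# Inside the subquery the inner table is only visible as xx, so MyTable.id
# refers to the row being considered for deletion.
conn.execute(
    """
    DELETE FROM MyTable
    WHERE EXISTS (
        SELECT 1
        FROM MyTable xx
        WHERE xx."DB Type" = MyTable."DB Type"
          AND xx.id > MyTable.id
    )
    """
)

survivors = conn.execute('SELECT "DB Type", id FROM MyTable ORDER BY id').fetchall()
```

The row with the highest id of each type never satisfies the EXISTS test, so it is the only one left per type.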
delete from my_Table
where Day in (select MAX(day) d from my_Table where id='id')