I have a MySQL query that finds duplicate records based on a few columns.
select max(id), count(*) as cnt
from table group by start_id, end_id, mysqltable
having cnt>1;
The query above returns the max(id) and the number of records that share the same start_id, end_id and mysqltable values.
I want to delete all the records whose id matches the max(id) returned by the query above.
How can I do that?
I have tried the following:
delete from table
where (select max(id), count(*) as cnt
from table group by start_id,end_id,mysqltable
having cnt>1)
But it does not delete any records.
You can remove duplicate records using JOIN.
DELETE t1 FROM table t1
INNER JOIN
table t2
WHERE
t1.id > t2.id AND t1.start_id = t2.start_id AND t1.end_id = t2.end_id AND t1.mysqltable = t2.mysqltable;
This query keeps the row with the lowest id in each group of duplicates and removes the rest.
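To check the keep-the-lowest-id rule on throwaway data, here is a minimal sketch using Python's built-in sqlite3 module. It is only an approximation of the MySQL statement above: SQLite has no DELETE ... JOIN, so the same condition is written with EXISTS, and the table name records and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical table "records"; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    "id INTEGER PRIMARY KEY, start_id INT, end_id INT, mysqltable TEXT)"
)
conn.executemany(
    "INSERT INTO records (id, start_id, end_id, mysqltable) VALUES (?, ?, ?, ?)",
    [
        (1, 10, 20, "a"), (2, 10, 20, "a"), (3, 10, 20, "a"),  # tripled
        (4, 11, 21, "b"),                                      # unique
        (5, 12, 22, "c"), (6, 12, 22, "c"),                    # doubled
    ],
)

# Delete a row when another row of the same group has a smaller id.
# This mirrors the t1.id > t2.id condition of the self-join.
conn.execute(
    """
    DELETE FROM records
    WHERE EXISTS (
        SELECT 1 FROM records AS keep
        WHERE keep.start_id = records.start_id
          AND keep.end_id = records.end_id
          AND keep.mysqltable = records.mysqltable
          AND keep.id < records.id
    )
    """
)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM records ORDER BY id")]
print(remaining_ids)
```

One row per group survives, even for the tripled group, because every row other than the lowest has some smaller id to compare against.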
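I think this command should work. The subquery is wrapped in a derived table because MySQL does not let a DELETE select directly from the table it is deleting from (error 1093):
delete from table
where id in
( select id from
( select max(id) as id from table
group by start_id, end_id, mysqltable
having count(*) > 1
) as dup
);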
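Note that deleting only the max(id) of each group removes one row per group per run. The sketch below, again using Python's sqlite3 as a stand-in for MySQL with a made-up records table and sample rows, repeats the delete until nothing is left to remove and records how many rows each pass deleted.

```python
import sqlite3

# Hypothetical table "records"; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    "id INTEGER PRIMARY KEY, start_id INT, end_id INT, mysqltable TEXT)"
)
conn.executemany(
    "INSERT INTO records (id, start_id, end_id, mysqltable) VALUES (?, ?, ?, ?)",
    [
        (1, 10, 20, "a"), (2, 10, 20, "a"), (3, 10, 20, "a"),  # tripled
        (4, 11, 21, "b"),                                      # unique
        (5, 12, 22, "c"), (6, 12, 22, "c"),                    # doubled
    ],
)

DELETE_NEWEST_DUPLICATE = """
    DELETE FROM records
    WHERE id IN (
        SELECT MAX(id) FROM records
        GROUP BY start_id, end_id, mysqltable
        HAVING COUNT(*) > 1
    )
"""

# Each pass removes at most one row per duplicated group,
# so a tripled group needs two passes.
deleted_per_pass = []
while True:
    deleted = conn.execute(DELETE_NEWEST_DUPLICATE).rowcount
    if deleted == 0:
        break
    deleted_per_pass.append(deleted)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM records ORDER BY id")]
print(deleted_per_pass, remaining_ids)
```

If the data only ever contains pairs, a single pass is enough; the loop matters when some rows are tripled or worse.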
I'm trying to delete duplicate rows from a MySQL table, but still keep one.
However, the following query seemingly deletes every duplicate row and I'm not sure why. Basically I want to delete the row if the outputID, title and type all match.
DELETE DupRows.*
FROM output AS DupRows
INNER JOIN (
SELECT MIN(Output_ID) AS Output_ID, Title, Type
FROM output
GROUP BY Title, Type
HAVING COUNT(*) > 1
) AS SaveRows
ON SaveRows.Title = DupRows.Title
AND SaveRows.Type = DupRows.Type
AND SaveRows.Output_ID = DupRows.Output_ID;
Just use a self-join:
DELETE DupRows
FROM output AS DupRows
INNER JOIN output AS SaveRows
ON SaveRows.Title = DupRows.Title
AND SaveRows.Type = DupRows.Type
AND DupRows.Output_ID > SaveRows.Output_ID
This will delete all duplicates on Title and Type while keeping the record with the lowest Output_ID.
If you are running MySQL 8.0, you can use the window function ROW_NUMBER() to number the records within each Title/Type group, ordered by Output_ID. Then you can delete all records whose row number is not 1.
DELETE FROM output
WHERE Output_ID IN (
SELECT Output_ID
FROM (
SELECT Output_ID, ROW_NUMBER() OVER(PARTITION BY Title, Type ORDER BY Output_ID) rn
FROM output
) x
WHERE rn > 1
)
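For anyone who wants to try the ROW_NUMBER() approach without a MySQL 8.0 server, here is a small sketch with Python's sqlite3 module. It assumes the bundled SQLite is 3.25 or newer (the first version with window functions), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE output (Output_ID INTEGER PRIMARY KEY, Title TEXT, Type TEXT)")
conn.executemany(
    "INSERT INTO output (Output_ID, Title, Type) VALUES (?, ?, ?)",
    [
        (1, "A", "x"),
        (2, "A", "x"),  # duplicate of row 1
        (3, "B", "y"),
        (4, "A", "x"),  # duplicate of row 1
        (5, "B", "z"),  # same Title as row 3 but a different Type
    ],
)

# Number the rows inside each (Title, Type) group by Output_ID
# and delete everything that is not the first row of its group.
conn.execute(
    """
    DELETE FROM output
    WHERE Output_ID IN (
        SELECT Output_ID
        FROM (
            SELECT Output_ID,
                   ROW_NUMBER() OVER (PARTITION BY Title, Type ORDER BY Output_ID) AS rn
            FROM output
        ) AS x
        WHERE rn > 1
    )
    """
)

kept = conn.execute("SELECT Output_ID, Title, Type FROM output ORDER BY Output_ID").fetchall()
print(kept)
```

Changing ORDER BY Output_ID to ORDER BY Output_ID DESC inside the OVER clause would keep the highest id of each group instead.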
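-- Keep the lowest Output_ID of every (Title, Type) group, including groups of one.
Delete From output Where Output_ID NOT IN (
Select keep_id From (
Select MIN(Output_ID) AS keep_id from output Group By Title, Type
) AS keepers
)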
The query below deletes the duplicate rows that match the condition and keeps the oldest row of each group.
NOTE: the query assumes an id column that is auto-increment.
DELETE t1
FROM output t1, output t2
WHERE t1.Title = t2.Title
AND t1.Type = t2.Type
AND t1.Output_ID = t2.Output_ID
AND t1.id>t2.id
If you want to keep the most recently inserted row instead, just flip the last condition:
DELETE t1
FROM output t1, output t2
WHERE t1.Title = t2.Title
AND t1.Type = t2.Type
AND t1.Output_ID = t2.Output_ID
AND t1.id<t2.id
The following SQL comes from the MySQL documentation:
SELECT * FROM t1 AS t
WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id);
The documentation says it finds all rows in table t1 containing a value that occurs twice in a given column, but it does not explain the SQL.
t1 and t are the same table, so the
count(*) in subquery == select count(*) from t
, isn't it?
No. Saying that
count(*) in subquery == select count(*) from t
is wrong. The subquery is filtered by t1.id = t.id, so for each outer row it counts only the rows that share that row's id, which is how the query finds ids that occur in exactly two rows.
If you want the number of occurrences of each repeated id:
SELECT id, count(*) AS all_count FROM t1 GROUP BY id HAVING all_count > 1 ORDER BY all_count DESC
You can also fetch the full rows, much like your query does:
select * from t1 where id in ( select id from t1 group by id having count(*) > 1 )
The query contains a correlated subquery in WHERE clause:
SELECT COUNT(*) FROM t1 WHERE t1.id = t.id
It is called correlated because it is related to the main query via t.id. So, this subquery counts the number of records having an id value that is equal to the current id value of the record returned by the main query.
Thus, the predicate
(SELECT COUNT(*) FROM t1 WHERE t1.id = t.id) = 2
evaluates to true for any row with an id value that occurs twice in the table.
SELECT * FROM t1 AS t
WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id);
This query goes through each record in t1 and then in the subquery looks into t1 again to see if in this case id is found 2 times (and only 2 times). You can do the same for any other column in t1 (or any table for that matter).
If you would like to see all values that occur multiple times in the table, change WHERE 2 = to WHERE 1 <. This will also return the values that occur 3 times, 4 times, etc.
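To see the difference between the two predicates on concrete rows, here is a small sketch using Python's sqlite3 module in place of MySQL. The name column and the sample rows are invented; the two SELECT statements are the ones discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is deliberately not a key here, so values can repeat.
conn.execute("CREATE TABLE t1 (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t1 (id, name) VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")],
)

# The correlated subquery is re-evaluated for every outer row t:
# it counts the rows of t1 whose id equals t.id.
exactly_twice = conn.execute(
    "SELECT * FROM t1 AS t "
    "WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id) "
    "ORDER BY id, name"
).fetchall()

more_than_once = conn.execute(
    "SELECT * FROM t1 AS t "
    "WHERE 1 < (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id) "
    "ORDER BY id, name"
).fetchall()

exactly_twice_ids = sorted({row[0] for row in exactly_twice})
more_than_once_ids = sorted({row[0] for row in more_than_once})
print(exactly_twice_ids, more_than_once_ids)
```

With this data, id 3 occurs three times, so it is skipped by the 2 = version and picked up by the 1 < version.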
SELECT id, count(*)
FROM
MyTable
group by id
having count(*) > 1
With this query you can see the ids that are repeated more than once, and you can adapt it to your own needs.
How about using GROUP BY and HAVING:
SELECT id, count(1) as Total FROM MyTable AS t1
GROUP BY t1.id
HAVING Total = 2
I have inherited a database that has a "Duplicate problem".
When I run:
SELECT myFIELD, COUNT(*) c FROM myTABLE GROUP BY myFIELD HAVING c > 1;
I get ~600 records that are duplicated.
None are tripled or any other multiple.
I wish to kill off one of the records in each pair, leaving me with no duplicates.
What does the query look like?
You can use a query like this. It removes one row per duplicated value on each run, so you should repeat it as many times as the most-duplicated row is repeated:
delete from myTABLE
where (myFIELD, id) in
(select a.myFIELD, max(a.id)
FROM myTABLE as a GROUP BY a.myFIELD HAVING count(*) > 1)
Otherwise you can use:
delete from myTABLE
where (myFIELD, id) not in
(select a.myFIELD, min(a.id)
FROM myTABLE as a GROUP BY a.myFIELD )
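This should delete all the duplicated rows in one shot.
If MySQL rejects the statement because the subquery reads from the table being modified (error 1093), wrap the subquery in a derived table:
delete from myTABLE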
where (myFIELD, id) in (select field, id from
(select a.myFIELD as field, max(a.id) as id
FROM myTABLE as a GROUP BY a.myFIELD HAVING count(*) > 1) as t)
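As a quick check of the "not in ... min(id)" variant, here is a sketch using Python's sqlite3 module instead of MySQL. It assumes SQLite 3.15 or newer, which accepts the same (myFIELD, id) row-value syntax; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTABLE (id INTEGER PRIMARY KEY, myFIELD TEXT)")
conn.executemany(
    "INSERT INTO myTABLE (id, myFIELD) VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b"), (4, "c"), (5, "c")],
)

# The subquery lists one (myFIELD, id) pair to keep per value of myFIELD.
# Every row whose pair is not in that list is a surplus duplicate.
conn.execute(
    """
    DELETE FROM myTABLE
    WHERE (myFIELD, id) NOT IN (
        SELECT a.myFIELD, MIN(a.id)
        FROM myTABLE AS a
        GROUP BY a.myFIELD
    )
    """
)

remaining = conn.execute("SELECT id, myFIELD FROM myTABLE ORDER BY id").fetchall()
print(remaining)
```

Because the keep-list has no HAVING filter, values that were never duplicated (like "b" here) stay in the list and are not deleted.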
I am updating my table setting a field named "status" based on the condition that the total number of distinct rows should be more than 10 and less than 13. The query is as follows:
update myTable set status='Established'
where id IN(select id, count(*) as c
from myTable
where year>=1996 and year<=2008
group by id
having count(distinct year)>=10 and count(distinct year)<=13)
The problem is that I'm getting error 1241, "Operand should contain 1 column(s)". Could you please advise how I can solve this? Thanks!
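The subquery must return only one column. In MySQL it also has to be wrapped in a derived table, because an UPDATE cannot select directly from the table it modifies:
update myTable set status='Established'
where id IN(select id from
(select id
from myTable
where year>=1996 and year<=2008
group by id
having count(distinct year)>=10 and count(distinct year)<=13) as t)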
In MySQL, an update with a join often performs better than an update with a subquery in the where clause.
This version might have better performance:
update myTable join
(select id, count(*) as c
from myTable
where year >= 1996 and year <= 2008
group by id
having count(distinct year) >= 10 and count(distinct year) <= 13
) filter
on myTable.id = filter.id
set status = 'Established';
I will also note that you have a table where a column called id is not unique among the rows. Typically, such a column would be a primary key, so the having clause would always fail (there would only be one row).
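To see the filter at work on a table where id does repeat, here is a small sketch using Python's sqlite3 module. It is not MySQL: SQLite accepts the plain IN subquery on the table being updated, whereas MySQL needs the join form above or a derived table. The sample rows and the default status 'New' are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, year INTEGER, status TEXT DEFAULT 'New')")

rows = (
    [(1, year) for year in range(1996, 2007)]      # 11 distinct years, all in range
    + [(2, year) for year in (2000, 2001, 2002)]   # 3 distinct years
    + [(3, year) for year in range(1985, 2005)]    # 20 years, but only 9 fall in 1996-2008
)
conn.executemany("INSERT INTO myTable (id, year) VALUES (?, ?)", rows)

# The subquery returns a single column (id), which is what IN expects.
conn.execute(
    """
    UPDATE myTable SET status = 'Established'
    WHERE id IN (
        SELECT id
        FROM myTable
        WHERE year >= 1996 AND year <= 2008
        GROUP BY id
        HAVING COUNT(DISTINCT year) >= 10 AND COUNT(DISTINCT year) <= 13
    )
    """
)

established_ids = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT id FROM myTable WHERE status = 'Established' ORDER BY id"
    )
]
print(established_ids)
```

Id 3 shows why the year filter belongs inside the subquery: it has plenty of distinct years overall, but too few inside the 1996 to 2008 window.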
update myTable
set status='Established'
where id IN(select id from
(select id from myTable
where year>=1996 and year<=2008
group by id
having count(distinct year)>=10
and count(distinct year)<=13) as t)
You are using the IN operator, but your inner query returns two columns, id and count(*); it should return only one.
I know this sounds rather confusing but I'm at a loss how to explain it better. I have a table simplified below:
DB Type ID
================
Table1 1
Table1 2
Table1 3
Table1 4
Table1 5
Table2 6
Table2 7
Table2 8
Table2 9
Table2 10
What I am trying to achieve is to clean out this table but keep the record with the highest ID for each DB Type, so in this case (Table1, 5) and (Table2, 10) would remain and all other records would be deleted. Is it possible to do this exclusively through MySQL?
EDIT:
Answer thanks to tips from Yogendra Singh
DELETE FROM MyTable WHERE ID NOT IN (SELECT * FROM (SELECT MAX(ID) from MyTable GROUP BY `DB Type`) AS tb1 ) ORDER BY ID ASC
Try selecting the max ID grouped by DB Type first, then use that as a subquery with NOT IN:
DELETE FROM MyTable
WHERE ID NOT IN
(SELECT ID FROM
(SELECT MAX(ID) AS ID from MyTable GROUP BY `DB Type`) AS tb1
)
EDIT:
DELETE t FROM MyTable AS t
JOIN MyTable AS newer
ON newer.`DB Type` = t.`DB Type` AND newer.ID > t.ID;
delete your_table
from
your_table left join
(select max(id) max_id from your_table group by type) mx
on your_table.id=mx.max_id
where mx.max_id is null
The subquery returns the maximum id for every type, and those are the rows to keep. With a left join I'm selecting all the rows from your table that don't have an id in max_ids, and those are the rows to delete. This works only if id is the primary key; otherwise we also have to join on type.
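The left-join idea can be tried on the sample data from the question with the sketch below. It uses Python's sqlite3 module rather than MySQL, and since SQLite has no multi-table DELETE, the anti-join is placed in a subquery that selects the ids to delete. The column is named type here, following the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (type TEXT, id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO your_table (type, id) VALUES (?, ?)",
    [("Table1", i) for i in range(1, 6)] + [("Table2", i) for i in range(6, 11)],
)

# mx holds the highest id of every type (the rows to keep).
# Rows of your_table with no match in mx come back with mx.max_id NULL,
# and those are the rows to delete.
conn.execute(
    """
    DELETE FROM your_table
    WHERE id IN (
        SELECT t.id
        FROM your_table AS t
        LEFT JOIN (
            SELECT MAX(id) AS max_id FROM your_table GROUP BY type
        ) AS mx ON t.id = mx.max_id
        WHERE mx.max_id IS NULL
    )
    """
)

remaining = conn.execute("SELECT type, id FROM your_table ORDER BY id").fetchall()
print(remaining)
```

Joining on id alone is safe here only because id is unique across the whole table, which is the same caveat stated above.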
Is the combination DB Type - ID unique?
If so, you can attack this in two stages:
Get only the rows you want:
SELECT `DB Type`, MAX(ID) AS MaxID
FROM YourTable
GROUP BY `DB Type`
Delete the rest by wrapping the previous statement in a larger one:
DELETE YourTable
FROM
YourTable
LEFT JOIN
(SELECT `DB Type`, MAX(ID) AS MaxID
FROM YourTable GROUP BY `DB Type`) DontDelete
ON
YourTable.`DB Type`=DontDelete.`DB Type` AND
YourTable.ID=DontDelete.MaxID
WHERE
DontDelete.`DB Type` IS NULL
DELETE FROM MyTable del
WHERE EXISTS (
SELECT *
FROM MyTable xx
WHERE xx."db Type" = del."db Type"
AND xx.id > del.id
);
delete from my_Table
where Day in (select d from
(select MAX(day) as d from my_Table where id='id') as t)