I have the query below, which shows me the duplicates in my table. I would like to know how I can turn it into a delete query that removes these duplicate rows but leaves just one of each. My table does have an auto-increment id column.
SELECT * FROM tbl_user_tmp AS t1
INNER JOIN (
SELECT name, activity, class, COUNT(1) AS cnt FROM tbl_user_tmp
WHERE user = 'test' AND disregard = 0
GROUP BY name, activity, class
HAVING cnt > 1
) AS t2
ON t1.name = t2.name AND t1.activity = t2.activity AND t1.class = t2.class
WHERE user = 'test' AND disregard = 0
GROUP BY t1.name, t1.activity, t1.class
I have tried the query below and it seems to work, but I'm afraid I'm missing something. Does it look correct?
delete from tbl_user_tmp
where user='test' AND id not in
(
select minid from
(select min(id) as minid from tbl_user_tmp where user='test' group by name, activity, class) as newtable
)
You can use LIMIT.
Example:
DELETE FROM users
LIMIT 2;
Now you just need to set the limit to COUNT - 1 for each group of duplicates, and add a WHERE clause so the delete only touches that group ;)
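For checking a "keep the lowest id" delete like the one in the question, here is a minimal sketch that runs it against a throwaway in-memory SQLite database from Python. The sample rows and the helper names (make_sample_db, delete_duplicates_keep_lowest) are made up for illustration, and applying the disregard = 0 filter inside the delete is an assumption based on the duplicate-finding query. SQLite accepts the subquery directly; MySQL needs the extra derived-table wrapper used in the question.

```python
import sqlite3

def make_sample_db():
    """Build a small in-memory table with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl_user_tmp ("
        "id INTEGER PRIMARY KEY, user TEXT, name TEXT, "
        "activity TEXT, class TEXT, disregard INTEGER)"
    )
    rows = [
        (1, "test", "a", "act", "cls", 0),
        (2, "test", "a", "act", "cls", 0),   # duplicate of id 1
        (3, "test", "a", "act", "cls", 0),   # duplicate of id 1
        (4, "test", "b", "act", "cls", 0),   # unique
        (5, "other", "a", "act", "cls", 0),  # another user, must stay
        (6, "other", "a", "act", "cls", 0),  # another user, must stay
        (7, "test", "a", "act", "cls", 1),   # disregarded row, must stay
    ]
    conn.executemany("INSERT INTO tbl_user_tmp VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn

def delete_duplicates_keep_lowest(conn, user):
    """Delete duplicate (name, activity, class) rows of one user, keeping MIN(id)."""
    conn.execute(
        """
        DELETE FROM tbl_user_tmp
        WHERE user = ? AND disregard = 0
          AND id NOT IN (
              SELECT MIN(id) FROM tbl_user_tmp
              WHERE user = ? AND disregard = 0
              GROUP BY name, activity, class
          )
        """,
        (user, user),
    )
    # Return the ids that survive, so the effect is easy to inspect.
    return [row[0] for row in conn.execute("SELECT id FROM tbl_user_tmp ORDER BY id")]
```

Running the delete on a copy like this first and looking at the surviving ids is a cheap way to confirm that rows of other users and disregarded rows are left alone.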
I have this script running to check for duplicates in my table:
select s.id, t.*
from [stuff] s
join (
select name, city, count(*) as qty
from [stuff]
group by name, city
having count(*) > 1
) t on s.name = t.name and s.city = t.city
This works fine and returns the IDs of the duplicate rows:
myresult = cur.fetchall()
print(myresult)
Example output:
[(84,), (85,), (339,), (340,), (351,), (352,), (416,), (417,), (511,), (512,), (532,), (533,),
(815,), (816,), (978,), (979,), (1075,), (1076,), (1385,), (1386,), (1512,)]
Now I want to delete records 84, 339, 351, 416, etc.
What would be the most convenient way to do so?
MySQL provides you with the DELETE JOIN statement that allows you to remove duplicate rows quickly.
The following statement deletes duplicate rows and keeps the highest id:
DELETE t1 FROM table_name t1
INNER JOIN table_name t2
WHERE
t1.id < t2.id AND
t1.unique_col = t2.unique_col;
In case you want to delete duplicate rows and keep the lowest id, you can use the following statement:
DELETE t1 FROM table_name t1
INNER JOIN table_name t2
WHERE
t1.id > t2.id AND
t1.unique_col = t2.unique_col;
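The self-join idea can be tried from Python as well. The sketch below is only an approximation: it uses SQLite, which does not support MySQL's DELETE ... JOIN form, so a correlated EXISTS is swapped in to express the same condition ("delete a row if a matching row with a higher id exists"). The table layout follows the name/city duplicate check above; the sample values and the function names are hypothetical.

```python
import sqlite3

def make_stuff_db():
    """In-memory table with two duplicate pairs and one unique row (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO stuff VALUES (?, ?, ?)",
        [
            (84, "Ann", "Oslo"),
            (85, "Ann", "Oslo"),
            (339, "Bob", "Rome"),
            (340, "Bob", "Rome"),
            (500, "Cy", "Lima"),
        ],
    )
    return conn

def delete_duplicates_keep_highest(conn):
    """Delete every row for which a row with the same name/city and a higher id exists."""
    conn.execute(
        """
        DELETE FROM stuff
        WHERE EXISTS (
            SELECT 1 FROM stuff AS t2
            WHERE t2.name = stuff.name
              AND t2.city = stuff.city
              AND t2.id > stuff.id
        )
        """
    )
    return [row[0] for row in conn.execute("SELECT id FROM stuff ORDER BY id")]
```

With this data the rows 84 and 339 are removed and 85, 340 and 500 remain. Flipping the comparison to t2.id < stuff.id keeps the lowest id instead.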
You can remove duplicate rows in MySQL this way:
DELETE FROM CUSTOMERS
WHERE customer_id NOT IN
(
SELECT
customer_id
FROM
(
SELECT MIN(customer_id) as customer_id
FROM CUSTOMERS
GROUP BY first_name, last_name, phone
) AS duplicate_customer_ids
);
I have a table named recomendation, like the one given below.
I want to delete all the rows where cnt has the minimum value for an ID_recipient that has multiple records.
If there is only a single record for an ID_recipient, it shouldn't be deleted, whatever its cnt value may be.
The ones highlighted in blue are the records that must stay.
I tried:
DELETE from table where(
SELECT DISTINCT(A.ID_recipient), DISTINCT(A.cnt) FROM (
SELECT MIN(cnt) as cnt FROM recomendation_table_ID_recipient GROUP BY ID_recipient HAVING COUNT(*) > 1 ) as A);
which is not working.
If you want to match on two columns at once, you have to use a row (tuple) IN clause.
Your subqueries don't make much sense as written, so you should test this first, or post sample data with the wanted result.
DELETE from recomendation_table_ID_recipient where (ID_recipient,cnt) IN (
SELECT DISTINCT A.ID_recipient, A.cnt FROM (
SELECT ID_recipient, MIN(cnt) as cnt FROM recomendation_table_ID_recipient GROUP BY ID_recipient HAVING COUNT(*) > 1 ) as A);
delete t1 from recomendation_table_ID_recipient t1 join (
select ID_recipient, min(cnt) as cnt from recomendation_table_ID_recipient
group by ID_recipient
having count(*) > 1
) t2 on t1.ID_recipient = t2.ID_recipient and t1.cnt = t2.cnt;
See db-fiddle
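To see what the join-based delete does on concrete data, here is a small sketch using Python's sqlite3 module. SQLite has no DELETE ... JOIN, so the same join is moved into a rowid IN (...) subquery; that rewrite, the sample rows and the function names are assumptions made for the example.

```python
import sqlite3

def make_recommendation_db():
    """In-memory copy of the table with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recomendation_table_ID_recipient (ID_recipient INTEGER, cnt INTEGER)"
    )
    conn.executemany(
        "INSERT INTO recomendation_table_ID_recipient VALUES (?, ?)",
        [(1, 3), (1, 5), (2, 7), (3, 2), (3, 2), (3, 9)],
    )
    return conn

def delete_min_cnt_rows(conn):
    """For recipients with several rows, delete the rows holding the minimum cnt."""
    conn.execute(
        """
        DELETE FROM recomendation_table_ID_recipient
        WHERE rowid IN (
            SELECT t1.rowid
            FROM recomendation_table_ID_recipient AS t1
            JOIN (
                SELECT ID_recipient, MIN(cnt) AS cnt
                FROM recomendation_table_ID_recipient
                GROUP BY ID_recipient
                HAVING COUNT(*) > 1
            ) AS t2
              ON t1.ID_recipient = t2.ID_recipient AND t1.cnt = t2.cnt
        )
        """
    )
    return conn.execute(
        "SELECT ID_recipient, cnt FROM recomendation_table_ID_recipient "
        "ORDER BY ID_recipient, cnt"
    ).fetchall()
```

Recipient 2 has a single row and is untouched, while recipient 3 loses both rows that tie for the minimum. Note that if every row of a recipient has the same cnt, all of them match the minimum and all of them are deleted.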
select * from table1 where ID in (
select min(a.ID) from (select * from table1) a group by id_x, id_y, col_z having count(*) > 1)
The above query ran in 2.2 seconds and returned four results. Now when I change the select * to delete, it hangs indefinitely.
delete from table1 where ID in (
select min(a.ID) from (select * from table1) a group by id_x, id_y, col_z having count(*) > 1)
If I move the group by clause inside the aliased select query, it no longer hangs.
delete from table1 where ID in (
select a.ID from (select min(ID) as ID from table1 group by id_x, id_y, col_z having count(*) > 1) a)
Why does it hang? Even though (select * from table1) pulls millions of records, the query keeps executing for hours without finishing. Can anybody explain what holds the query up? It puzzles me because the select query works fine whereas the delete query hangs.
EDIT:
My focus here is on why it hangs. I already posted a workaround that works fine, but in order to prevent this in the future I need to get to the root cause.
Use a JOIN instead of WHERE ID IN (SELECT ...).
DELETE t1
FROM table1 AS t1
JOIN (
SELECT MIN(id) AS minId
FROM table1
GROUP BY id_x, id_y, col_z
HAVING COUNT(*) > 1) AS t2
ON t1.id = t2.minId
I think your query is not being optimized because it has to recalculate the subquery after each deletion, since deleting a row could change the MIN(id) for that group. Using a JOIN requires the grouping and aggregation to be done just once.
Try this:
delete t
from table1 t join
(select min(id) as min_id
from table1
group by id_x, id_y, col_z
having count(*) >= 2
) tt
on tt.min_id = t.id;
That said, you probably don't want to delete just the minimum id. I'm guessing you want to keep the most recent id. If so, take the max id of every group (including groups of one, so that unique rows are kept) and delete everything else:
delete t
from table1 t left join
(select max(id) as max_id
from table1
group by id_x, id_y, col_z
) tt
on tt.max_id = t.id
where tt.max_id is null;
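The difference between the two deletes shows up as soon as a group has three or more copies. Here is a rough sketch in Python with an in-memory SQLite table; SQLite lacks DELETE ... JOIN, so IN / NOT IN subqueries stand in for the joins, and the sample rows and function names are invented for the example.

```python
import sqlite3

# Hypothetical sample data: (id, id_x, id_y, col_z)
ROWS = [
    (1, 1, 1, "a"), (2, 1, 1, "a"), (3, 1, 1, "a"),  # three copies
    (4, 2, 2, "b"), (5, 2, 2, "b"),                  # two copies
    (6, 3, 3, "c"),                                  # unique row
]

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (id INTEGER PRIMARY KEY, id_x INTEGER, id_y INTEGER, col_z TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
    return conn

def remaining_ids(conn):
    return [row[0] for row in conn.execute("SELECT id FROM table1 ORDER BY id")]

def delete_min_id_once(conn):
    """One pass of 'delete the minimum id of every duplicated group'."""
    conn.execute(
        """
        DELETE FROM table1
        WHERE id IN (
            SELECT MIN(id) FROM table1
            GROUP BY id_x, id_y, col_z
            HAVING COUNT(*) > 1
        )
        """
    )
    return remaining_ids(conn)

def keep_only_max_id(conn):
    """Delete every row that is not the maximum id of its group."""
    conn.execute(
        """
        DELETE FROM table1
        WHERE id NOT IN (
            SELECT MAX(id) FROM table1
            GROUP BY id_x, id_y, col_z
        )
        """
    )
    return remaining_ids(conn)
```

On this data, delete_min_id_once leaves ids 2, 3, 5 and 6, so ids 2 and 3 are still duplicates of each other and another pass would be needed. keep_only_max_id leaves 3, 5 and 6 in a single statement.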
The following SQL comes from the MySQL documentation:
SELECT * FROM t1 AS t
WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id);
The documentation says that it finds all rows in table t1 containing a value that occurs twice in a given column, but it does not explain the SQL.
t1 and t are the same table, so the
count(*) in subquery == select count(*) from t
, isn't it?
count(*) in subquery == select count(*) from t
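is wrong, because the subquery is correlated: its WHERE t1.id = t.id condition restricts the count to the rows whose id matches the current outer row, so it is not a count of the whole table. That is how the query returns the rows whose id appears in exactly two rows.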
If you want to get the count for each id that occurs more than once:
SELECT id, count(*) AS all_count FROM t1 GROUP BY id HAVING all_count > 1 ORDER BY all_count DESC
You can also get the full rows, as your query does, like this:
select * from t1 where id in ( select id from t1 group by id having count(*) > 1 )
The query contains a correlated subquery in WHERE clause:
SELECT COUNT(*) FROM t1 WHERE t1.id = t.id
It is called correlated because it is related to the main query via t.id. So, this subquery counts the number of records having an id value that is equal to the current id value of the record returned by the main query.
Thus, predicate
(SELECT COUNT(*) FROM t1 WHERE t1.id = t.id) = 2
evaluates to true for any row with an id value that occurs twice in the table.
SELECT * FROM t1 AS t
WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id);
This query goes through each record in t1 and then in the subquery looks into t1 again to see if in this case id is found 2 times (and only 2 times). You can do the same for any other column in t1 (or any table for that matter).
If you would like to see all values that occur multiple times in the table, replace WHERE 2 = with WHERE 1 <. This will also give you the values that occur 3 times, 4 times, etc. in the table.
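The effect of the two predicates can be checked on a tiny table. This is a sketch using Python's sqlite3 module with made-up rows; the label column and the function names are assumptions added so the returned rows can be told apart.

```python
import sqlite3

def make_t1():
    """In-memory t1 where id 1 occurs twice, id 2 once and id 3 three times."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (id INTEGER, label TEXT)")
    conn.executemany(
        "INSERT INTO t1 VALUES (?, ?)",
        [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")],
    )
    return conn

def rows_by_id_count(conn, exactly_twice=True):
    """Run the correlated-subquery filter with either '2 =' or '1 <'."""
    comparison = "2 =" if exactly_twice else "1 <"
    sql = (
        "SELECT id, label FROM t1 AS t "
        f"WHERE {comparison} (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id) "
        "ORDER BY id, label"
    )
    return conn.execute(sql).fetchall()
```

With exactly_twice=True only the two rows with id 1 come back. With exactly_twice=False the three rows with id 3 are included as well, and the single row with id 2 is excluded in both cases.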
SELECT id, count(*)
FROM
MyTable
GROUP BY id
HAVING count(*) > 1
With this query you can see the ids that are repeated more than once, and you can adapt it to your needs.
How about using GROUP BY and HAVING:
SELECT id, count(1) as Total FROM MyTable AS t1
GROUP BY t1.id
HAVING Total = 2
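Unlike the correlated subquery, the GROUP BY / HAVING form returns one row per id together with its count rather than the original rows. Here is a minimal sketch in Python with sqlite3, using hypothetical data and a hypothetical helper name; the threshold parameter is an addition for the example.

```python
import sqlite3

def make_mytable():
    """In-memory MyTable where id 1 occurs twice, id 2 once and id 3 three times."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (id INTEGER)")
    conn.executemany("INSERT INTO MyTable VALUES (?)", [(1,), (1,), (2,), (3,), (3,), (3,)])
    return conn

def id_counts(conn, min_count=2):
    """Return (id, count) for every id that occurs at least min_count times."""
    return conn.execute(
        "SELECT id, COUNT(*) AS Total FROM MyTable "
        "GROUP BY id HAVING COUNT(*) >= ? ORDER BY id",
        (min_count,),
    ).fetchall()
```

Using HAVING Total = 2 as in the query above would return only id 1 here, because id 3 occurs three times.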
My working query is below. However, its results include both duplicates and non-duplicates on the name column. I want to show only the results where the name columns from the two select queries are different.
select t.*
from tbl_user_tmp t JOIN
(select activity, class, count(*) as NumDuplicates
from tbl_user_tmp
where user = 'bignadad2'
group by activity, class
having NumDuplicates > 1)
tsum ON t.activity = tsum.activity and t.class = tsum.class
The columns are in this order:
id, name, activity, class, activity_id
I only want to show these results where activity, class match and name does not.
2059 lg_lmk com.lge.lmk com.lge.lmk.activities.LmkMainActivity 48255
3668 task_manager com.lge.lmk com.lge.lmk.activities.LmkMainActivity 48255
These are the other results, which I do not want to see:
2690 phone com.modoohut.dialer com.modoohut.dialer.DialActivity 54700
2694 phone com.modoohut.dialer com.modoohut.dialer.DialActivity 54700
I forgot that you need only some of the results:
SELECT * FROM tbl_user_tmp AS t1
INNER JOIN (
SELECT activity, class, COUNT(1) AS cnt FROM tbl_user_tmp
WHERE user = 'first'
GROUP BY activity, class
HAVING cnt > 1
) AS t2
ON t1.activity = t2.activity AND t1.class = t2.class
WHERE user = 'first' -- remove records of the other users
GROUP BY t1.name, t1.activity, t1.class -- select distinct records
SQLFiddle
If class is unique in the activity then you can remove activity from the GROUP BY-statement.
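If the goal is to list only the activity/class pairs whose rows carry different names, one possible variation is to count distinct names in the grouped subquery. This is a different technique from the query above, not a restatement of it. The sketch below tries it in Python with sqlite3 on the four sample rows from the question; the user value is taken from the question's query and the function names are hypothetical.

```python
import sqlite3

def make_user_tmp():
    """In-memory tbl_user_tmp holding the four rows shown in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl_user_tmp ("
        "id INTEGER PRIMARY KEY, user TEXT, name TEXT, "
        "activity TEXT, class TEXT, activity_id INTEGER)"
    )
    lmk = ("com.lge.lmk", "com.lge.lmk.activities.LmkMainActivity", 48255)
    dial = ("com.modoohut.dialer", "com.modoohut.dialer.DialActivity", 54700)
    conn.executemany(
        "INSERT INTO tbl_user_tmp VALUES (?, ?, ?, ?, ?, ?)",
        [
            (2059, "bignadad2", "lg_lmk") + lmk,
            (3668, "bignadad2", "task_manager") + lmk,
            (2690, "bignadad2", "phone") + dial,
            (2694, "bignadad2", "phone") + dial,
        ],
    )
    return conn

def rows_with_conflicting_names(conn, user):
    """Rows whose (activity, class) pair appears with more than one distinct name."""
    return conn.execute(
        """
        SELECT t.id, t.name
        FROM tbl_user_tmp AS t
        JOIN (
            SELECT activity, class
            FROM tbl_user_tmp
            WHERE user = ?
            GROUP BY activity, class
            HAVING COUNT(DISTINCT name) > 1
        ) AS d
          ON t.activity = d.activity AND t.class = d.class
        WHERE t.user = ?
        ORDER BY t.id
        """,
        (user, user),
    ).fetchall()
```

On the sample rows this returns 2059 and 3668 and leaves out the two phone rows, since their names are identical.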