Delete duplicate rows in MySQL in the same table

I have this script running to check for duplicates in my table:
select s.id, t.*
from stuff s
join (
select name, city, count(*) as qty
from stuff
group by name, city
having count(*) > 1
) t on s.name = t.name and s.city = t.city
This works fine and returns the IDs of the duplicate rows:
myresult = cur.fetchall()
print(myresult)
Example output:
[(84,), (85,), (339,), (340,), (351,), (352,), (416,), (417,), (511,), (512,), (532,), (533,),
(815,), (816,), (978,), (979,), (1075,), (1076,), (1385,), (1386,), (1512,)]
Now I want to delete records 84, 339, 351, 416, etc.
What would be the most convenient way to do so?
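If you would rather do it from Python, one option is to fetch the grouping columns along with each id, keep the highest id in each (name, city) group, and delete the rest with a parameterized executemany. This is a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module and made-up rows as a stand-in for the real MySQL connection, and the helper ids_to_delete is hypothetical.

```python
import sqlite3
from collections import defaultdict


def ids_to_delete(rows):
    """rows: iterable of (id, name, city) tuples.
    Returns every id except the highest one in each (name, city) group."""
    groups = defaultdict(list)
    for row_id, name, city in rows:
        groups[(name, city)].append(row_id)
    doomed = []
    for ids in groups.values():
        ids.sort()
        doomed.extend(ids[:-1])  # everything but the highest id
    return sorted(doomed)


# In-memory stand-in for the real table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?, ?)",
    [(84, "Ann", "Oslo"), (85, "Ann", "Oslo"),
     (339, "Bob", "Rome"), (340, "Bob", "Rome"),
     (400, "Cid", "Kyiv")],
)

cur = conn.cursor()
# Same duplicate-finding query, but also returning the grouping columns.
cur.execute(
    """
    SELECT s.id, s.name, s.city
    FROM stuff s
    JOIN (SELECT name, city FROM stuff
          GROUP BY name, city HAVING COUNT(*) > 1) t
      ON s.name = t.name AND s.city = t.city
    """
)
doomed = ids_to_delete(cur.fetchall())
print(doomed)  # → [84, 339]

# sqlite3 uses "?" placeholders; MySQL drivers such as PyMySQL use "%s".
cur.executemany("DELETE FROM stuff WHERE id = ?", [(i,) for i in doomed])
conn.commit()
```

Fetching the group columns instead of relying on the order of the id list means the code does not assume duplicates always come in adjacent pairs.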

MySQL's multiple-table DELETE syntax (often called DELETE JOIN) lets you remove duplicate rows by joining a table to itself.
The following statement deletes duplicate rows and keeps the highest id:
DELETE t1 FROM table_name t1
INNER JOIN table_name t2
WHERE
t1.id < t2.id AND
t1.unique_col = t2.unique_col;
In case you want to delete duplicate rows and keep the lowest id, you can use the following statement:
DELETE t1 FROM table_name t1
INNER JOIN table_name t2
WHERE
t1.id > t2.id AND
t1.unique_col = t2.unique_col;
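To check the keep-highest / keep-lowest logic without a MySQL server, here is a small sketch using Python's sqlite3 module as an in-memory stand-in. SQLite does not accept MySQL's `DELETE t1 FROM ... JOIN` form, so the same rule is written as a correlated EXISTS: a row is deleted when another row with the same value and a higher (or lower) id exists. The table name items and its rows are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical table "items" whose unique_col contains duplicates.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, unique_col TEXT)")
    conn.executemany(
        "INSERT INTO items (id, unique_col) VALUES (?, ?)",
        [(84, "a"), (85, "a"), (339, "b"), (340, "b"), (500, "c")],
    )
    return conn


def delete_keep_highest(conn: sqlite3.Connection) -> None:
    # Same rule as "t1.id < t2.id AND t1.unique_col = t2.unique_col":
    # a row goes if another row with the same value has a higher id.
    conn.execute(
        """
        DELETE FROM items
        WHERE EXISTS (
            SELECT 1 FROM items AS t2
            WHERE t2.unique_col = items.unique_col AND t2.id > items.id
        )
        """
    )


def delete_keep_lowest(conn: sqlite3.Connection) -> None:
    # Mirror image: a row goes if a lower id with the same value exists.
    conn.execute(
        """
        DELETE FROM items
        WHERE EXISTS (
            SELECT 1 FROM items AS t2
            WHERE t2.unique_col = items.unique_col AND t2.id < items.id
        )
        """
    )


def ids(conn: sqlite3.Connection) -> list:
    return [row[0] for row in conn.execute("SELECT id FROM items ORDER BY id")]


if __name__ == "__main__":
    conn = make_db()
    delete_keep_highest(conn)
    print(ids(conn))  # → [85, 340, 500]
```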

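You can also remove duplicate rows in MySQL this way; it keeps the row with the lowest customer_id in each group of duplicates:
DELETE FROM CUSTOMERS
WHERE customer_id NOT IN
(
SELECT
customer_id
FROM
(
SELECT MIN(customer_id) AS customer_id
FROM CUSTOMERS
GROUP BY first_name, last_name, phone
) AS duplicate_customer_ids
);
The extra derived table is required because MySQL does not allow a DELETE to select from the same table in a direct subquery (error 1093). Grouping on the columns themselves rather than on CONCAT(first_name, last_name, phone) avoids false matches: CONCAT returns NULL when any argument is NULL, and different values can collide (for example 'Ann','Elee' and 'AnnE','lee' both give 'AnnElee').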
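The NOT IN / MIN pattern, and the reason to group on the columns rather than on a concatenation, can be tried with Python's sqlite3 module as a stand-in. In this sketch SQLite's `||` operator plays the role of MySQL's CONCAT (both yield NULL when any operand is NULL); the customers rows are made up for illustration.

```python
import sqlite3

ROWS = [
    (1, "Ann", "Lee", "555-0100"),
    (2, "Ann", "Lee", "555-0100"),  # true duplicate of 1
    (3, "Bob", "Ray", None),
    (4, "Cid", "Orr", None),        # different person, also no phone
]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,"
        " first_name TEXT, last_name TEXT, phone TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", ROWS)
    return conn


def dedupe(conn: sqlite3.Connection, group_by: str) -> list:
    # Keep MIN(customer_id) of each group and delete everything else.
    # group_by is fixed SQL text written in this script, never user input.
    conn.execute(
        f"""
        DELETE FROM customers
        WHERE customer_id NOT IN (
            SELECT customer_id FROM (
                SELECT MIN(customer_id) AS customer_id
                FROM customers
                GROUP BY {group_by}
            ) AS duplicate_customer_ids
        )
        """
    )
    return [r[0] for r in conn.execute(
        "SELECT customer_id FROM customers ORDER BY customer_id")]


if __name__ == "__main__":
    print(dedupe(make_db(), "first_name, last_name, phone"))      # → [1, 3, 4]
    # A NULL phone makes the concatenated key NULL, so Bob and Cid land
    # in one group and Cid is wrongly deleted.
    print(dedupe(make_db(), "first_name || last_name || phone"))  # → [1, 3]
```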

Related

How to get DISTINCT of a SUBSTRING in MySQL

I have this query:
SELECT t2.id, t1.image, SUBSTRING(t2.start_time,1,10) AS mytime,
t2.user
FROM post_table t1
INNER JOIN watchUserList t2 ON t1.id = t2.movie_id
WHERE user = 'john#gmail.com'
ORDER BY id DESC;
In this query I want to fetch the DISTINCT values of mytime. I tried DISTINCT(SUBSTRING(t2.start_time,1,10)) AS mytime and SUBSTRING(t2.start_time,1,10) AS DISTINCT(mytime), but neither works. How can I get the distinct values of a substring in MySQL? Is there any way?
The correct syntax is
SELECT DISTINCT t2.id, t1.image, SUBSTRING(t2.start_time,1,10) AS mytime,
t2.user
FROM post_table t1
INNER JOIN watchUserList t2 ON t1.id = t2.movie_id
WHERE user = 'john#gmail.com'
ORDER BY id DESC;
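Note, however, that DISTINCT applies to all the fields in the field list together: a row is dropped only when every selected column matches another row, so with a unique column such as t2.id in the list, every row is likely to stay distinct.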
Distinct on Multiple Columns
When DISTINCT is used on multiple columns, the SELECT statement returns each unique combination of those columns rather than rows that are unique in any single column.
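A quick way to see the difference is this sketch with Python's sqlite3 module standing in for MySQL (SQLite's substr is used in place of MySQL's SUBSTRING; the watch table and its rows are hypothetical). With an id in the select list nothing collapses; selecting only the date part collapses rows from the same day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watch (id INTEGER, start_time TEXT)")
conn.executemany(
    "INSERT INTO watch VALUES (?, ?)",
    [
        (1, "2020-05-01 10:00:00"),
        (2, "2020-05-01 18:30:00"),
        (3, "2020-05-02 09:15:00"),
    ],
)

# DISTINCT over (id, day): every id differs, so all three rows survive.
with_id = conn.execute(
    "SELECT DISTINCT id, substr(start_time, 1, 10) FROM watch"
).fetchall()

# DISTINCT over the day alone: the two 2020-05-01 rows become one.
day_only = conn.execute(
    "SELECT DISTINCT substr(start_time, 1, 10) AS mytime FROM watch ORDER BY mytime"
).fetchall()

print(len(with_id))  # → 3
print(day_only)      # → [('2020-05-01',), ('2020-05-02',)]
```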

Delete records based on another query in mysql

I have a MySQL query that finds records duplicated across some columns:
select max(id), count(*) as cnt
from table group by start_id, end_id, mysqltable
having cnt>1;
This query gives me max(id) and the number of records for each group of rows that share the same start_id, end_id and mysqltable values.
I want to delete all the records whose id matches the max(id) column of the query above.
How can I do that?
I tried the following:
delete from table
where (select max(id), count(*) as cnt
from table group by start_id,end_id,mysqltable
having cnt>1)
But I am unable to delete the records with it.
You can remove duplicate records with a self-join (table is a reserved word in MySQL, so it needs backticks):
DELETE t1 FROM `table` t1
INNER JOIN
`table` t2
WHERE
t1.id > t2.id AND t1.start_id = t2.start_id AND t1.end_id = t2.end_id AND t1.mysqltable = t2.mysqltable;
This query keeps the row with the lowest id in each group and removes all the others.
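This command should also work. MySQL does not let a DELETE select from its own table in a direct subquery (error 1093), so the grouped ids are wrapped in a derived table:
delete from `table`
where id in
( select max_id from
( select max(id) as max_id from `table`
group by start_id, end_id, mysqltable
having count(*) > 1
) as dup
);
Note that it removes only the row with the highest id from each duplicate group per run; if a group has more than two copies, run it again until it deletes nothing.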
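Because deleting MAX(id) removes only one row per group at a time, a group with three copies still has a duplicate after one pass. This sketch repeats the statement until it deletes nothing. It uses Python's sqlite3 module as a stand-in and a hypothetical table named links in place of the original table name; the loop and helper names are illustrative, not part of any answer above.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE links (id INTEGER PRIMARY KEY,"
        " start_id INT, end_id INT, mysqltable TEXT)"
    )
    conn.executemany(
        "INSERT INTO links VALUES (?, ?, ?, ?)",
        [
            (1, 10, 20, "a"),
            (2, 10, 20, "a"),
            (3, 10, 20, "a"),  # three copies of the same group
            (4, 30, 40, "b"),
            (5, 30, 40, "b"),
            (6, 50, 60, "c"),  # unique row
        ],
    )
    return conn


# Delete the highest id of every group that still has duplicates.
DELETE_MAX = """
DELETE FROM links
WHERE id IN (
    SELECT max_id FROM (
        SELECT MAX(id) AS max_id FROM links
        GROUP BY start_id, end_id, mysqltable
        HAVING COUNT(*) > 1
    ) AS dup
)
"""


def dedupe(conn: sqlite3.Connection) -> int:
    """Run DELETE_MAX until no rows are removed; return passes that deleted rows."""
    passes = 0
    while conn.execute(DELETE_MAX).rowcount > 0:
        passes += 1
    return passes


if __name__ == "__main__":
    conn = make_db()
    print(dedupe(conn))  # → 2
    print([r[0] for r in conn.execute("SELECT id FROM links ORDER BY id")])  # → [1, 4, 6]
```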

MySQL take rows and override ones without user_id

I have a table like this one:
I would like to get all rows, but where a country has a row with a user_id (5 in this case), that row should override the rows for the same country that have no user_id.
I tried MAX(user_id) with GROUP BY country_name, but it still returns the wrong results.
Final result I'm expecting:
Try this;)
select t1.*
from yourtable t1
inner join (
select max(user_id) as user_id, country_name from yourtable group by country_name
) t2 on t1.country_name = t2.country_name and t1.user_id <=> t2.user_id
This is just a solution based on your sample data; if your real data has a wider variety of user_id values, it will need to be adapted. The null-safe <=> comparison matters here: MAX(user_id) is NULL for a country whose rows all lack a user_id, and a plain = would drop that country entirely.
As described in "SQL Select only rows with Max Value on a Column", combining MAX(column) with GROUP BY other_column in one statement gives you the maximum value per group.
But if you want to select the other columns of that row too, you have to do this in a subquery, as in the following example:
SELECT a.*
FROM YourTable a
INNER JOIN (
SELECT country_name, MAX(user_id) user_id
FROM YourTable
GROUP BY country_name
) b ON a.country_name = b.country_name AND a.user_id = b.user_id
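The groupwise-max join can be tried with Python's sqlite3 module as a stand-in. In this sketch SQLite's IS operator plays the role of MySQL's null-safe <=>; the prices table and its rows are hypothetical. It shows that with a plain = a country whose only rows have a NULL user_id disappears, while the null-safe comparison keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (country_name TEXT, user_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        ("Germany", None, 10),
        ("Germany", 5, 12),   # user-specific row should win
        ("France", None, 8),  # no user-specific row at all
    ],
)

# {op} is filled in with either "=" or the null-safe "IS".
QUERY = """
SELECT a.country_name, a.user_id, a.price
FROM prices a
INNER JOIN (
    SELECT country_name, MAX(user_id) AS user_id
    FROM prices
    GROUP BY country_name
) b ON a.country_name = b.country_name AND a.user_id {op} b.user_id
ORDER BY a.country_name
"""

strict = conn.execute(QUERY.format(op="=")).fetchall()
null_safe = conn.execute(QUERY.format(op="IS")).fetchall()
print(strict)     # → [('Germany', 5, 12)]
print(null_safe)  # → [('France', None, 8), ('Germany', 5, 12)]
```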

Delete duplicates in mysql when based on values from 3 columns matching

I have the query below, which shows me the duplicates in my table. How can I turn it into a DELETE query that removes these duplicate rows but leaves just one of each? My table does have an auto-increment id column.
SELECT * FROM tbl_user_tmp AS t1
INNER JOIN (
SELECT name, activity, class, COUNT(1) AS cnt FROM tbl_user_tmp
WHERE user = 'test' AND disregard = 0
GROUP BY name, activity, class
HAVING cnt > 1
) AS t2
ON t1.name = t2.name AND t1.activity = t2.activity AND t1.class = t2.class
WHERE user = 'test' AND disregard = 0
GROUP BY t1.name, t1.activity, t1.class
I have tried the query below and it seems to work, but I'm afraid I'm missing something. Does it look correct?
delete from tbl_user_tmp
where user='test' AND id not in
(
select minid from
(select min(id) as minid from tbl_user_tmp where user='test' group by name, activity, class) as newtable
)
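One way to sanity-check that DELETE is to run it on a few made-up rows. This sketch uses Python's sqlite3 module as a stand-in for MySQL, with hypothetical data. It shows that the query keeps the lowest id per (name, activity, class) group for user 'test' and leaves other users alone; it also shows that, because the DELETE does not repeat the disregard = 0 filter from the SELECT, rows with disregard = 1 are deduplicated as well. Whether that is intended depends on your data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_user_tmp (id INTEGER PRIMARY KEY, user TEXT,"
    " name TEXT, activity TEXT, class TEXT, disregard INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_user_tmp VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "test", "n1", "run", "A", 0),
        (2, "test", "n1", "run", "A", 0),   # duplicate of 1
        (3, "test", "n1", "run", "A", 1),   # duplicate of 1, but disregard = 1
        (4, "other", "n1", "run", "A", 0),  # another user: must survive
        (5, "test", "n2", "swim", "B", 0),  # unique row
    ],
)

# The query from the question, unchanged.
conn.execute(
    """
    delete from tbl_user_tmp
    where user='test' AND id not in
    (
    select minid from
    (select min(id) as minid from tbl_user_tmp where user='test'
     group by name, activity, class) as newtable
    )
    """
)
remaining = [r[0] for r in conn.execute("SELECT id FROM tbl_user_tmp ORDER BY id")]
print(remaining)  # → [1, 4, 5]
```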
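You can use LIMIT, one duplicate group at a time; without a WHERE clause, DELETE ... LIMIT removes arbitrary rows from the whole table.
Example:
DELETE FROM users
WHERE name = 'some_name' AND activity = 'some_activity' AND class = 'some_class'
ORDER BY id DESC
LIMIT 2;
Now you just need to set COUNT - 1 as your limit; with ORDER BY id DESC, the row with the lowest id is the one kept ;)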

Mysql delete older duplicates

I've got a table with this data:
id, archive_id, ean, index, date, (...)
I've got some items with the same archive_id and the same ean, but a different index.
In this case I want to delete the older item (based on date), so that for each archive_id/index combination there is no more than one row left.
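The following (untested) should work in databases that allow a DELETE to read its own table in a subquery; MySQL rejects this form with error 1093, so there use one of the join-based versions below:
DELETE FROM someTable WHERE EXISTS (
SELECT id FROM someTable AS subqTable WHERE
subqTable.archive_id = someTable.archive_id
AND subqTable.ean = someTable.ean
-- and other equality comparisons
AND subqTable.date > someTable.date)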
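The correlated EXISTS version can be tried in SQLite, which, unlike MySQL (error 1093), accepts a DELETE whose subquery reads the same table. This minimal sketch uses Python's sqlite3 module with a hypothetical items table; it groups by archive_id and ean and leaves out the index column (a keyword in SQLite) for brevity. ISO-formatted date strings compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, archive_id INT, ean TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        (1, 7, "400", "2021-01-01"),
        (2, 7, "400", "2021-03-01"),  # newest in its group: kept
        (3, 7, "400", "2021-02-01"),
        (4, 8, "400", "2021-01-01"),  # different archive_id: kept
    ],
)

# Delete a row whenever a newer row with the same archive_id and ean exists.
conn.execute(
    """
    DELETE FROM items
    WHERE EXISTS (
        SELECT 1 FROM items AS newer
        WHERE newer.archive_id = items.archive_id
          AND newer.ean = items.ean
          AND newer.date > items.date
    )
    """
)
remaining = [r[0] for r in conn.execute("SELECT id FROM items ORDER BY id")]
print(remaining)  # → [2, 4]
```

If two rows in a group share the same newest date, neither has a strictly newer partner, so both are kept.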
DELETE duplicates.*
FROM _table
JOIN _table AS duplicates
ON (_table.archive_id = duplicates.archive_id AND _table.index = duplicates.index)
WHERE duplicates.date < _table.date;
delete t1
from your_table t1
left join
(
select archive_id, ean, max(date) as mdate -- newest date per group; every other row is deleted
from your_table
group by archive_id, ean
) t2 on t1.archive_id = t2.archive_id
and t1.ean = t2.ean
and t1.date = t2.mdate
where t2.mdate is null