Remove duplicate rows from mysql table result - mysql

I have a table named consignment which has some duplicate rows against column "service" where service='CLRC'.
select * from consignment where service='CLRC'
When I select the rows, I get 2023 rows in total, including duplicates.
I wrote the query below to delete the rows, but I want to select them first to make sure it deletes the correct records.
When the select runs, it returns 64431 records. Is that correct?
select t1.hawb FROM consignment t1
INNER JOIN consignment t2
WHERE
t1.id < t2.id AND
t1.hawb = t2.hawb
and t1.service='CLRC'

If you expect your query to return the number of duplicates, then no, it is not correct.
The condition t1.id < t2.id joins every row of t1 with all rows of t2 that have a greater id (and the same hawb), resulting in more rows, or fewer rows when there are only 2 duplicates, and rarely in the expected number.
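To see why the count balloons: a group of n rows sharing the same hawb yields n(n-1)/2 pairs with t1.id < t2.id. Below is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL; the sample data is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE consignment (id INTEGER PRIMARY KEY, hawb TEXT, service TEXT)")
# Hypothetical data: hawb 'A' appears 4 times, 'B' twice, 'C' once.
rows = [("A", "CLRC")] * 4 + [("B", "CLRC")] * 2 + [("C", "CLRC")]
conn.executemany("INSERT INTO consignment (hawb, service) VALUES (?, ?)", rows)

# The join from the question: one output row per (lower id, higher id) pair.
pair_count = conn.execute(
    """SELECT COUNT(*) FROM consignment t1
       INNER JOIN consignment t2
       WHERE t1.id < t2.id AND t1.hawb = t2.hawb AND t1.service = 'CLRC'"""
).fetchone()[0]

# Rows that would actually have to go: everything except one row per hawb.
surplus = conn.execute(
    """SELECT COUNT(*) - COUNT(DISTINCT hawb) FROM consignment
       WHERE service = 'CLRC'"""
).fetchone()[0]

print(pair_count, surplus)  # 7 pairs (6 for 'A' + 1 for 'B') versus 4 surplus rows
```

With larger groups the gap grows quadratically, which would be consistent with a join returning far more rows than the table contains.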
If you want to see all the duplicates:
select * from consignment t
where t.service = 'CLRC'
and exists (
select 1 from consignment
where service = t.service and id <> t.id and hawb = t.hawb
)
If you want to delete the duplicates and keep only the rows with the max id for each hawb, then:
delete from consignment
where service='CLRC'
and id not in (
select id from (
select max(id) id from consignment
where service='CLRC'
group by hawb
) t
);
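As a quick sanity check of the keep-the-max-id approach, here is a small sketch run against SQLite through Python's sqlite3 module (a stand-in for MySQL, with invented rows). Only the CLRC rows are deduplicated; rows of other services are left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE consignment (id INTEGER PRIMARY KEY, hawb TEXT, service TEXT)")
# Hypothetical rows; ids are assigned 1..6 in insertion order.
conn.executemany(
    "INSERT INTO consignment (hawb, service) VALUES (?, ?)",
    [("A", "CLRC"), ("A", "CLRC"), ("B", "CLRC"),
     ("A", "CLRC"), ("B", "OTHER"), ("B", "OTHER")],
)

# SQLite accepts the subquery directly; the extra derived table in the
# MySQL version is there to avoid MySQL error 1093 (selecting from the
# table that is being modified).
conn.execute(
    """DELETE FROM consignment
       WHERE service = 'CLRC'
         AND id NOT IN (SELECT MAX(id) FROM consignment
                        WHERE service = 'CLRC' GROUP BY hawb)"""
)
remaining = conn.execute(
    "SELECT id, hawb, service FROM consignment ORDER BY id"
).fetchall()
print(remaining)
# [(3, 'B', 'CLRC'), (4, 'A', 'CLRC'), (5, 'B', 'OTHER'), (6, 'B', 'OTHER')]
```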

Include all the columns in the matching condition except the id column, since it is the primary key:
delete t1
from consignment t1
join consignment t2
where t1.id < t2.id
and t1.hawb = t2.hawb
and t1.col1=t2.col1
and t1.col2=t2.col2
......
and t1.service='CLRC';
You can check the number of rows that should remain with
select count(*) from
(
select distinct hawb, col1, col2, service -- (all columns except `id`)
from consignment
) q
Before committing the changes, check that the total row count minus this number equals the number of deleted records.
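The check-before-commit idea can be scripted roughly as follows. This is a sketch against SQLite via Python's sqlite3 module, not MySQL; the column col1 and the sample rows are placeholders, and the self-join is rewritten with EXISTS because SQLite has no multi-table DELETE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE consignment (id INTEGER PRIMARY KEY, hawb TEXT, col1 TEXT, service TEXT)"
)
conn.executemany(
    "INSERT INTO consignment (hawb, col1, service) VALUES (?, ?, ?)",
    [("A", "x", "CLRC"), ("A", "x", "CLRC"), ("A", "y", "CLRC"),
     ("B", "x", "CLRC"), ("B", "x", "CLRC")],
)
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM consignment").fetchone()[0]
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT hawb, col1, service FROM consignment) q"
).fetchone()[0]

# Delete every row that has a later row with identical non-id columns.
cur = conn.execute(
    """DELETE FROM consignment
       WHERE EXISTS (SELECT 1 FROM consignment t2
                     WHERE consignment.id < t2.id
                       AND consignment.hawb = t2.hawb
                       AND consignment.col1 = t2.col1
                       AND consignment.service = t2.service)"""
)
deleted = cur.rowcount

# Commit only if the delete removed exactly the surplus rows.
if deleted == total - distinct_rows:
    conn.commit()
else:
    conn.rollback()
left = conn.execute("SELECT COUNT(*) FROM consignment").fetchone()[0]
print(total, distinct_rows, deleted, left)  # 5 3 2 3
```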

Related

Deleting duplicate values from a mysql table but keep one

I'm trying to delete duplicate rows from a mysql table, but still keep one.
However, the following query seemingly deletes every duplicate row and I'm not sure why. Basically, I want to delete the row if the Output_ID, Title and Type all match.
DELETE DupRows.*
FROM output AS DupRows
INNER JOIN (
SELECT MIN(Output_ID) AS Output_ID, Title, Type
FROM output
GROUP BY Title, Type
HAVING COUNT(*) > 1
) AS SaveRows
ON SaveRows.Title = DupRows.Title
AND SaveRows.Type = DupRows.Type
AND SaveRows.Output_ID = DupRows.Output_ID;
Just:
DELETE DupRows
FROM output AS DupRows
INNER JOIN output AS SaveRows
ON SaveRows.Title = DupRows.Title
AND SaveRows.Type = DupRows.Type
AND DupRows.Output_ID > SaveRows.Output_ID
This will delete all duplicates on Title and Type while keeping the record with the lowest Output_ID.
If you are running MySQL 8.0, you can use window function ROW_NUMBER() to assign a rank to each record in Title/Type groups, ordered by id. Then you can delete all records whose row number is not 1.
DELETE FROM output
WHERE Output_ID IN (
SELECT Output_ID
FROM (
SELECT Output_ID, ROW_NUMBER() OVER(PARTITION BY Title, Type ORDER BY Output_ID) rn
FROM output
) x
WHERE rn > 1
)
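To make the numbering visible, the sketch below runs the inner ROW_NUMBER() query on a few invented rows. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL 8.0 (window functions need SQLite 3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE output (Output_ID INTEGER PRIMARY KEY, Title TEXT, Type TEXT)")
# Hypothetical rows; Output_ID is assigned 1..5 in insertion order.
conn.executemany(
    "INSERT INTO output (Title, Type) VALUES (?, ?)",
    [("t1", "a"), ("t1", "a"), ("t2", "a"), ("t1", "a"), ("t1", "b")],
)

# Number the rows inside each (Title, Type) group, lowest Output_ID first.
numbered = conn.execute(
    """SELECT Output_ID, Title, Type,
              ROW_NUMBER() OVER (PARTITION BY Title, Type ORDER BY Output_ID) AS rn
       FROM output
       ORDER BY Output_ID"""
).fetchall()

# Rows with rn > 1 are the surplus copies that the DELETE would remove.
to_delete = [row[0] for row in numbered if row[3] > 1]
print(to_delete)  # [2, 4]
```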
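-- No HAVING here: the lowest Output_ID of every group must be kept, including groups without duplicates.
DELETE FROM output WHERE Output_ID NOT IN (
SELECT Output_ID FROM (
SELECT MIN(Output_ID) AS Output_ID FROM output GROUP BY Title, Type
) x
)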
The query below deletes the duplicate rows that match the condition and keeps the oldest row of each group.
Note: the query assumes that id is an auto-increment column.
DELETE t1
FROM output t1, output t2
WHERE t1.Title = t2.Title
AND t1.Type = t2.Type
AND t1.Output_ID = t2.Output_ID
AND t1.id>t2.id
If you want to keep the most recently inserted row instead, just change the last condition to:
DELETE t1
FROM output t1, output t2
WHERE t1.Title = t2.Title
AND t1.Type = t2.Type
AND t1.Output_ID = t2.Output_ID
AND t1.id<t2.id

Why is this SQL correct, and how does it work internally?

The following SQL comes from the MySQL documentation:
SELECT * FROM t1 AS t
WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id);
The documentation says that it finds all rows in table t1 containing a value that occurs twice in a given column, but it does not explain the SQL.
t1 and t are the same table, so isn't the count(*) in the subquery the same as select count(*) from t?
No, the count(*) in the subquery is not the same as select count(*) from t. The subquery is filtered by t1.id = t.id, so it only counts the rows that share the id of the current outer row; that is how the query finds ids that occur in exactly two rows.
If you want the count for each repeated value:
SELECT id, name, count(*) AS all_count FROM t1 GROUP BY id HAVING all_count > 1 ORDER BY all_count DESC
You can also select the full rows whose id occurs more than once like this:
select * from t1 where id in ( select id from t1 group by id having count(*) > 1 )
The query contains a correlated subquery in WHERE clause:
SELECT COUNT(*) FROM t1 WHERE t1.id = t.id
It is called correlated because it is related to the main query via t.id. So, this subquery counts the number of records having an id value that is equal to the current id value of the record returned by the main query.
Thus, predicate
(SELECT COUNT(*) FROM t1 WHERE t1.id = t.id) = 2
evaluates to true for any row with an id value that occurs twice in the table.
SELECT * FROM t1 AS t
WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id);
This query goes through each record in t1 and then in the subquery looks into t1 again to see if in this case id is found 2 times (and only 2 times). You can do the same for any other column in t1 (or any table for that matter).
If you would like to see all values that occur multiple times in the table, replace WHERE 2 = with WHERE 1 <. This will also return the values that occur 3 times, 4 times, etc.
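The row-by-row reading described above can be spelled out in a few lines. The sketch below uses SQLite through Python's sqlite3 module with invented data, and compares the correlated subquery with an explicit nested loop; the loop only illustrates the semantics, since a real optimizer is free to execute the query differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The documentation example needs a non-unique id column, so no primary key here.
conn.execute("CREATE TABLE t1 (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")],
)

exactly_twice = conn.execute(
    "SELECT * FROM t1 AS t WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id)"
).fetchall()

# The same logic by hand: for each outer row, re-count the rows sharing its id.
all_rows = conn.execute("SELECT id, name FROM t1").fetchall()
by_hand = [
    outer for outer in all_rows
    if sum(1 for inner in all_rows if inner[0] == outer[0]) == 2
]

more_than_once = conn.execute(
    "SELECT * FROM t1 AS t WHERE 1 < (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id)"
).fetchall()

print(sorted(exactly_twice))  # [(2, 'b'), (2, 'c')]
print(len(more_than_once))    # 5 (the rows with id 2 and id 3)
```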
SELECT id,count( * )
FROM
MyTable
group by id
having count( * )>1
With this query you can see the id values that are repeated more than once, and you can adapt it to your needs.
How about using GROUP BY and HAVING:
SELECT id, count(1) as Total FROM MyTable AS t1
GROUP BY t1.id
HAVING Total = 2

Delete records based on another query in mysql

I have a query in MySQL based on which I am finding duplicate records of some columns.
select max(id), count(*) as cnt
from table group by start_id, end_id, mysqltable
having cnt>1;
The above query gives me the max(id) and the number of records that share the same start_id, end_id and mysqltable values.
I want to delete all the records that match the max(id) column of the above query.
How can I do that?
I have tried the following:
delete from table
where (select max(id), count(*) as cnt
from table group by start_id,end_id,mysqltable
having cnt>1)
But I am unable to delete the records.
You can remove duplicate records using JOIN.
DELETE t1 FROM table t1
INNER JOIN
table t2
WHERE
t1.id > t2.id AND t1.start_id = t2.start_id AND t1.end_id = t2.end_id AND t1.mysqltable = t2.mysqltable;
This query keeps the row with the lowest id in each group and removes the rest.
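I think this command should work (the inner query is wrapped in a derived table because MySQL does not allow selecting directly from the table being deleted from):
delete from table
where id in
( select id from
( select max(id) as id from table
group by start_id, end_id, mysqltable
having count(*) > 1
) t
);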
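Note that deleting only the max(id) row of each group removes one row per group per run, so groups with three or more copies still contain duplicates after a single pass. A small sketch (SQLite via Python's sqlite3 module as a stand-in for MySQL; "records" is a hypothetical table name with invented rows) repeats the delete until no duplicate groups are left:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "records" is a hypothetical stand-in for the asker's table.
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, start_id INT, end_id INT, mysqltable TEXT)"
)
conn.executemany(
    "INSERT INTO records (start_id, end_id, mysqltable) VALUES (?, ?, ?)",
    [(1, 2, "a"), (1, 2, "a"), (1, 2, "a"), (5, 6, "b"), (5, 6, "b"), (7, 8, "c")],
)

DELETE_MAX = """DELETE FROM records WHERE id IN (
                  SELECT MAX(id) FROM records
                  GROUP BY start_id, end_id, mysqltable
                  HAVING COUNT(*) > 1)"""

def duplicate_groups():
    """Return how many (start_id, end_id, mysqltable) groups still have duplicates."""
    return conn.execute(
        """SELECT COUNT(*) FROM (SELECT 1 FROM records
           GROUP BY start_id, end_id, mysqltable HAVING COUNT(*) > 1) AS g"""
    ).fetchone()[0]

passes = 0
while duplicate_groups() > 0:
    conn.execute(DELETE_MAX)
    passes += 1

remaining_ids = [r[0] for r in conn.execute("SELECT id FROM records ORDER BY id")]
print(passes, remaining_ids)  # 2 [1, 4, 6]
```

The self-join delete on t1.id > t2.id reaches the same end state in one statement.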

How to retain one row and remove duplicates in mysql?

I have a mysql table with each row having like 20 fields. Among others, it has:
table: origin, destination, date, price
Now I want to remove any rows that are duplicate regarding only one set of specific fields: origin, destination, date.
I tried:
delete from mytable where id not in
(select id from (
SELECT MAX(p.id) as id from mytable p group by p.origin, p.destination, p.date
) x)
Problem: this retains the rows with the highest id (means: last added).
Instead I'd like to retain only the row that has the lowest price. But how?
Sidenote: I cannot add a unique index, as the table is used for mass inserts via LOAD DATA, which should not throw errors. At load time I don't know which row is the "bestprice" one.
Also, I do not want to introduce any additional or temp tables and copy one to another. I just want to modify the existing table.
Self-join solution:
delete t1
from yourtable t1
join yourtable t2
on t1.origin = t2.origin
and t1.destination = t2.destination
and t1.date = t2.date
and t1.price > t2.price
delete t1
from mytable t1
left join
(
SELECT origin, destination, date, min(price) as price
from mytable
group by origin, destination, date
) t2 on t1.origin = t2.origin
and t1.destination = t2.destination
and t1.date = t2.date
and t1.price = t2.price
where t2.origin is null
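Both statements above keep every row that ties for the lowest price, so a group can still hold more than one row afterwards. If exactly one row per group is wanted, one option is to break ties on id. The sketch below shows this with SQLite through Python's sqlite3 module (a stand-in for MySQL; the sample routes and the id tie-break are assumptions, and EXISTS replaces the multi-table DELETE, which SQLite lacks).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, origin TEXT, "
    "destination TEXT, date TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO mytable (origin, destination, date, price) VALUES (?, ?, ?, ?)",
    [
        ("BER", "LIS", "2020-01-01", 120.0),
        ("BER", "LIS", "2020-01-01", 99.0),
        ("BER", "LIS", "2020-01-01", 99.0),  # ties with the row above
        ("BER", "ROM", "2020-01-01", 80.0),
    ],
)

# Delete a row if the same origin/destination/date has a cheaper row, or an
# equally cheap row with a lower id (the id tie-break is an added assumption).
conn.execute(
    """DELETE FROM mytable
       WHERE EXISTS (SELECT 1 FROM mytable t2
                     WHERE t2.origin = mytable.origin
                       AND t2.destination = mytable.destination
                       AND t2.date = mytable.date
                       AND (t2.price < mytable.price
                            OR (t2.price = mytable.price AND t2.id < mytable.id)))"""
)
kept = conn.execute("SELECT id, destination, price FROM mytable ORDER BY id").fetchall()
print(kept)  # [(2, 'LIS', 99.0), (4, 'ROM', 80.0)]
```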

mysql Select row with most recent date per user - make it faster

The records in my table look like this:
id |sensor_id|val |audit_date
255245| 1|22.12|2017-02-18 08:26:47
and I want to get the latest records using this:
SELECT `sensor_id`, `val`, `audit_date`
FROM `tests` t1
JOIN (SELECT max(`audit_date`) as audit_date, `sensor_id`
from `tests` group by `sensor_id`) t2
USING (`audit_date`, `sensor_id`)
where `id` > (select max(`id`)-1000 from `tests`)
It takes more than one second; without the last "where" it takes a second and a half.
"id" is primary key and now indexes.
What can I do to make this query faster?
This query returns the latest inserted record per sensor using the max() function:
SELECT t1.sensor_id,val,t1.audit_date
FROM `tests` t1
JOIN (SELECT max(`audit_date`) as audit_date, max(`sensor_id`) as max_sensor_id
FROM `tests` group by `sensor_id`) t2
ON t2.max_sensor_id = t1.sensor_id
AND t2.audit_date =t1.audit_date
You can check whether a self-exclusion join is faster:
SELECT t1.sensor_id, t1.val, t1.audit_date
FROM tests t1
LEFT JOIN tests t2
ON t1.sensor_id = t2.sensor_id
AND t2.audit_date > t1.audit_date
where
t2.id is null
Basically, this returns the records for which no greater audit_date exists for the same sensor_id.
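A minimal sketch of the self-exclusion join, run on SQLite through Python's sqlite3 module with the table name tests from the question and a few invented readings. The composite index on (sensor_id, audit_date) is an assumption added here, not something the question mentions, and whether it helps on the real data would need to be measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tests (id INTEGER PRIMARY KEY, sensor_id INT, val REAL, audit_date TEXT)"
)
# Assumed composite index (not part of the original question).
conn.execute("CREATE INDEX idx_sensor_date ON tests (sensor_id, audit_date)")
conn.executemany(
    "INSERT INTO tests (sensor_id, val, audit_date) VALUES (?, ?, ?)",
    [
        (1, 22.12, "2017-02-18 08:26:47"),
        (1, 22.50, "2017-02-18 08:27:47"),
        (2, 19.00, "2017-02-18 08:26:50"),
        (2, 18.75, "2017-02-18 08:25:50"),
    ],
)

# Keep a row only if no row of the same sensor has a later audit_date.
latest = conn.execute(
    """SELECT t1.sensor_id, t1.val, t1.audit_date
       FROM tests t1
       LEFT JOIN tests t2
         ON t1.sensor_id = t2.sensor_id AND t2.audit_date > t1.audit_date
       WHERE t2.id IS NULL
       ORDER BY t1.sensor_id"""
).fetchall()
print(latest)
# [(1, 22.5, '2017-02-18 08:27:47'), (2, 19.0, '2017-02-18 08:26:50')]
```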