Removing duplicate records from relational db table - mysql

I have a database table with three columns: id, user_id, and book_id. The table contains some duplicates: a user_id should have only one record per book_id, but in some cases the same (user_id, book_id) pair appears several times. There are already a couple of million records, and I'm wondering how to remove the duplicates.

Try the following.
SQL SERVER
WITH ORDERED AS
(
SELECT id,
ROW_NUMBER() OVER (PARTITION BY [user_id] , [book_id] ORDER BY id ASC) AS rn
FROM
tableName
)
delete from tableName
where id in ( select id from ORDERED where rn != 1)
MYSQL
delete from tableName
where id not in(
select MIN(id)from tableName
group by user_id, book_id
)
Edited as per comments: in MySQL, a DELETE cannot select from its own target table in a subquery (error 1093), so the statement above fails.
Wrapping the subquery in a derived table solves the issue:
delete from tableName
where id not in(
select temp.temp_id from (
select MIN(id) as temp_id from tableName
group by user_id, book_id
) as temp
)
This keeps only one row, the one with the lowest id, for each (user_id, book_id) combination.
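To see what the derived-table DELETE leaves behind, here is a minimal sketch that runs the same statement on a few sample rows. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL (SQLite does not have MySQL's restriction on reading the target table, but it accepts the same statement). The table name book_log, the aliases, and the sample data are assumptions made for illustration.

```python
import sqlite3


def dedupe_keep_first(conn):
    """Delete every row except the lowest id per (user_id, book_id)."""
    conn.execute(
        """
        DELETE FROM book_log
        WHERE id NOT IN (
            SELECT keep_id FROM (
                SELECT MIN(id) AS keep_id
                FROM book_log
                GROUP BY user_id, book_id
            ) AS keepers
        )
        """
    )
    conn.commit()


# Hypothetical sample data: (10, 100) appears three times, (11, 100) twice.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book_log (id INTEGER PRIMARY KEY, user_id INTEGER, book_id INTEGER)"
)
conn.executemany(
    "INSERT INTO book_log (id, user_id, book_id) VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 100), (3, 10, 101), (4, 11, 100), (5, 11, 100), (6, 10, 100)],
)

dedupe_keep_first(conn)
remaining = conn.execute(
    "SELECT id, user_id, book_id FROM book_log ORDER BY id"
).fetchall()
print(remaining)  # -> [(1, 10, 100), (3, 10, 101), (4, 11, 100)]
```

On a table with millions of rows it is probably worth trying the statement on a copy first, since the behaviour on a live MySQL server (locking, run time) is not something this sketch can show.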

The statement below deletes all duplicate (user_ID, book_ID) records, leaving only the row with the greatest ID for each pair:
DELETE a
FROM tableName a
LEFT JOIN
(
SELECT user_ID, book_ID, MAX(ID) max_ID
FROM tableName
GROUP BY user_ID, book_ID
) b ON a.user_ID = b.user_ID AND
a.book_ID = b.book_ID AND
a.ID = b.max_ID
WHERE b.max_ID IS NULL

This query removes the duplicates by deleting every row for which an earlier row (lower id) with the same user_id and book_id exists:
DELETE bl1 FROM book_log bl1
JOIN book_log bl2
ON (
bl1.id > bl2.id AND
bl1.user_id = bl2.user_id AND
bl1.book_id = bl2.book_id
);

Related

Delete all duplicates except first one mysql

I have a table in which the same value of the serial_number column is repeated a few times. How would I delete every duplicate row except the first one?
With the following query I can select all the duplicates, but I can't delete them.
SELECT serial_number, COUNT(*) FROM trademark_merge GROUP BY serial_number HAVING COUNT(*) > 1
Assuming that the primary key of your table is id, you could phrase this as a delete/join query, like:
delete tm
from trademark_merge tm
inner join (
select serial_number, min(id) id
from trademark_merge
group by serial_number
) tm1 on tm.serial_number = tm1.serial_number and tm.id > tm1.id

how to delete all duplicate rows in mysql

This is my table role_users for a particular id.
I want to delete all duplicate rows for role_id and user_id: if there are 2 or more records for the same pair, only the latest record should remain and the others should be deleted. How can I write this?
You can use the delete ... from ... join... syntax:
delete r
from role_users r
inner join (
select role_id, user_id, max(id) max_id
from role_users
group by role_id, user_id
) r1
on r.role_id = r1.role_id
and r.user_id = r1.user_id
and r.id < r1.max_id
Try the following approach:
The following query would give you IDs of all Unique records grouped by role_id & user_id
SELECT MAX(id) FROM role_users GROUP BY role_id, user_id
Notice the MAX function here to get the latest created record assuming id is auto_increment
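Use it in a nested query to delete all the other duplicate records. MySQL does not let a DELETE read its own target table directly in a subquery, so wrap the inner query in a derived table:
DELETE FROM role_users WHERE id NOT IN (SELECT max_id FROM (SELECT MAX(id) AS max_id FROM role_users GROUP BY role_id, user_id) AS latest);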
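A quick way to check that the nested DELETE keeps the latest row is to run it on sample data and then re-run a duplicate check. The sketch below does that with Python's sqlite3 module standing in for MySQL. The sample rows and the helper names are made up for illustration, and id is assumed to grow with insertion order.

```python
import sqlite3


def delete_older_duplicates(conn):
    """Keep the highest id per (role_id, user_id); return the number of rows removed."""
    cur = conn.execute(
        """
        DELETE FROM role_users
        WHERE id NOT IN (
            SELECT max_id FROM (
                SELECT MAX(id) AS max_id
                FROM role_users
                GROUP BY role_id, user_id
            ) AS latest
        )
        """
    )
    conn.commit()
    return cur.rowcount


def remaining_duplicates(conn):
    """Return every (role_id, user_id) pair that still occurs more than once."""
    return conn.execute(
        """
        SELECT role_id, user_id, COUNT(*)
        FROM role_users
        GROUP BY role_id, user_id
        HAVING COUNT(*) > 1
        """
    ).fetchall()


# Hypothetical sample data: the pair (1, 7) is stored three times.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE role_users (id INTEGER PRIMARY KEY, role_id INTEGER, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO role_users (id, role_id, user_id) VALUES (?, ?, ?)",
    [(1, 1, 7), (2, 1, 7), (3, 2, 7), (4, 1, 7), (5, 2, 8)],
)

deleted = delete_older_duplicates(conn)
kept_ids = [row[0] for row in conn.execute("SELECT id FROM role_users ORDER BY id")]
print(deleted, kept_ids, remaining_duplicates(conn))
```

With this data the two older copies of (1, 7) are removed and the duplicate check comes back empty.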

what should be the mysql query to fetch recently added message by a group of users which are stored in the same table

I have the following table structure
and I want the result to be
Here is the query which I tried:
select * from table where userid IN(201,202,203,204,205)
group by userid
order by messageid desc
But I didn't get the latest records based on messageid.
I need to write this as a single query, as I must use an ORDER BY clause.
Please explain my mistake and provide a solution.
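Your mistake is that GROUP BY collapses each userid to a single row before ORDER BY is applied, and MySQL is free to pick any row of the group for the non-aggregated columns, so sorting afterwards cannot recover the latest message. If I'm understanding your question correctly, you can instead join the table to itself on the max messageid per user: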
select t.messageid, t.userid, t.data
from yourtable t
join (
select max(messageid) maxmessageid, userid
from yourtable
where userid in (201,202,203,204,205)
group by userid
) t2 on t.userid = t2.userid and t.messageid = t2.maxmessageid
order by t.messageid desc
Edit: Here's an alternative approach using IN:
select messageid, userid, data
from yourtable
where messageid in (
select max(messageid) maxmessageid
from yourtable
where userid in (201,202,203,204,205)
group by userid
)
order by messageid desc
SELECT yourtable.*
FROM
yourtable INNER JOIN (SELECT userid, MAX(messageid) max_messageid
FROM yourtable
WHERE userid IN (201,202,203,204,205)
GROUP BY userid) mx
ON yourtable.messageid=mx.max_messageid
AND yourtable.userid=mx.userid
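The join-on-max pattern used in these answers can be tried out on sample data. The following is a rough sketch using Python's sqlite3 module in place of MySQL. The table name messages, the helper latest_messages, and the rows are assumptions for illustration, since the original table layout is not shown.

```python
import sqlite3


def latest_messages(conn, user_ids):
    """Return the newest (messageid, userid, data) row for each requested user."""
    user_ids = list(user_ids)
    if not user_ids:
        return []
    # One "?" placeholder per user id, so the values are bound rather than interpolated.
    placeholders = ",".join("?" for _ in user_ids)
    sql = f"""
        SELECT t.messageid, t.userid, t.data
        FROM messages AS t
        JOIN (
            SELECT userid, MAX(messageid) AS max_messageid
            FROM messages
            WHERE userid IN ({placeholders})
            GROUP BY userid
        ) AS mx
          ON t.userid = mx.userid AND t.messageid = mx.max_messageid
        ORDER BY t.messageid DESC
    """
    return conn.execute(sql, user_ids).fetchall()


# Hypothetical sample data; user 999 is not part of the requested group.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (messageid INTEGER PRIMARY KEY, userid INTEGER, data TEXT)"
)
conn.executemany(
    "INSERT INTO messages (messageid, userid, data) VALUES (?, ?, ?)",
    [(1, 201, "a"), (2, 202, "b"), (3, 201, "c"), (4, 203, "d"), (5, 202, "e"), (6, 999, "f")],
)

latest = latest_messages(conn, [201, 202, 203])
print(latest)  # -> [(5, 202, 'e'), (4, 203, 'd'), (3, 201, 'c')]
```

The inner query decides which messageid wins for each user, and the outer join only fetches the full row, so the final ORDER BY sorts rows that are already the latest ones.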

The target table ... of DELETE is not updateable

I have such query
SET @n=0;
DELETE t3 FROM (
SELECT id, project_id, task_id, user_id,grouper
FROM (
SELECT id, project_id, task_id, user_id,
@n:=if(status=55,@n+1,@n),
if(status=55,@n-1,@n) as grouper FROM timelog
WHERE user_id='5' ORDER BY id ASC
) as t
where grouper>-1
group by grouper) as t3 WHERE grouper=1
for which I receive the error: The target table t3 of the DELETE is not updatable
Is there any solution for this error?
Basically, I'm trying to delete a group of table rows marked with grouper, using a SELECT inside the DELETE. I'm also happy with other solutions or ideas different from this one.
sql fiddle: http://sqlfiddle.com/#!2/33820/2/0
EDIT: thanks for the answers. Here is the working code (in case anyone needs something similar):
SET @n=0;
delete from timelog where id in ((SELECT id
FROM (
SELECT id, project_id, task_id, user_id,
@n:=if(status=55,@n+1,@n),
if(status=55,@n-1,@n) as grouper FROM timelog
WHERE user_id='5' ORDER BY id ASC
) as t
where grouper>-1 and grouper=1
group by grouper))
Wish I had more time, but here is some quick pseudocode:
delete from timelog where id in (
SELECT id FROM (
SELECT id, grouper
FROM (
SELECT id, project_id, task_id, user_id,
@n:=if(status=55,@n+1,@n),
if(status=55,@n-1,@n) as grouper FROM timelog
WHERE user_id='5' ORDER BY id ASC
) as t
where grouper>-1
group by grouper) as t3 WHERE grouper=1)
All I'm doing is moving the subselect out of the DELETE target and into a WHERE clause that simply returns the ids selected by your original subquery.
Edit: the brackets were a bit off; I think I have them right now. To be honest, this could be cleaned up into a single SELECT statement rather than the nested version here.
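This one appears to target Oracle rather than MySQL, since it relies on ROWID. ROW_NUMBER() numbers the rows within each group of identical values, and every row after the first is deleted:
delete from dept_new
where rowid in (
select rid from (
select rowid rid, row_number() over (partition by deptno, dname, loc order by deptno) rownu
from dept_new
)
where rownu > 1
);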
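The ROW_NUMBER() idea in the statement above can be tried without an Oracle instance. This is a small sketch using Python's sqlite3 module, which needs SQLite 3.25 or newer for window functions. An explicit id column is assumed here in place of ROWID, and the sample rows are invented.

```python
import sqlite3


def delete_duplicates_with_row_number(conn):
    """Number rows inside each (deptno, dname, loc) group and delete all but the first."""
    conn.execute(
        """
        DELETE FROM dept_new
        WHERE id IN (
            SELECT id FROM (
                SELECT id,
                       ROW_NUMBER() OVER (
                           PARTITION BY deptno, dname, loc
                           ORDER BY id
                       ) AS rn
                FROM dept_new
            ) AS numbered
            WHERE rn > 1
        )
        """
    )
    conn.commit()


# Hypothetical sample data: ids 1, 2 and 4 are exact duplicates of each other.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dept_new (id INTEGER PRIMARY KEY, deptno INTEGER, dname TEXT, loc TEXT)"
)
conn.executemany(
    "INSERT INTO dept_new (id, deptno, dname, loc) VALUES (?, ?, ?, ?)",
    [
        (1, 10, "ACCOUNTING", "NEW YORK"),
        (2, 10, "ACCOUNTING", "NEW YORK"),
        (3, 20, "RESEARCH", "DALLAS"),
        (4, 10, "ACCOUNTING", "NEW YORK"),
        (5, 20, "RESEARCH", "BOSTON"),
    ],
)

delete_duplicates_with_row_number(conn)
kept = [row[0] for row in conn.execute("SELECT id FROM dept_new ORDER BY id")]
print(kept)  # -> [1, 3, 5]
```

Ordering the window by id makes "the first row" well defined. Ordering by a partition column, as the one-liner does, leaves the choice of survivor to the database.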

Delete records from a table where < max number for a field and keep highest number

I know this sounds rather confusing but I'm at a loss how to explain it better. I have a table simplified below:
DB Type ID
================
Table1 1
Table1 2
Table1 3
Table1 4
Table1 5
Table2 6
Table2 7
Table2 8
Table2 9
Table2 10
What I am trying to achieve is to clean out this table but keep the record with the highest ID for each DB Type. In this case that would be (Table1, 5) and (Table2, 10), with all other records deleted. Is it possible to do this exclusively through MySQL?
EDIT:
Answer, thanks to tips from Yogendra Singh:
DELETE FROM MyTable WHERE ID NOT IN (SELECT * FROM (SELECT MAX(ID) from MyTable GROUP BY `DB Type`) AS tb1 ) ORDER BY ID ASC
Try selecting the max ID grouped by DB Type first, and then use it as a subquery with NOT IN:
DELETE FROM MyTable
WHERE ID NOT IN
(SELECT ID FROM
(SELECT MAX(ID) AS ID from MyTable GROUP BY `DB Type`) AS tb1
)
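EDIT: a self-join is another option (DELETE has no HAVING clause, so the condition goes in the join):
DELETE t1 FROM MyTable t1
JOIN MyTable t2 ON t2.`DB Type` = t1.`DB Type` AND t2.ID > t1.ID;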
delete your_table
from
your_table left join
(select max(id) max_id from your_table group by type) mx
on your_table.id=mx.max_id
where mx.max_id is null
The subquery returns the maximum id for every type, and those are the rows to keep. With a left join I'm selecting all the rows from your table whose id does not appear among the max_ids, and those are the rows to delete. This works only if id is the primary key; otherwise we also have to join on the type.
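The left-join logic described here (keep the rows that match a per-type maximum, delete the rows that find no match) can be sketched on the sample table from the question. SQLite, used below through Python's sqlite3 module, has no multi-table DELETE, so the anti-join is placed in an IN subquery instead; that substitution, the table name my_table, and the column name db_type are assumptions for illustration. The join also matches on the type, which is the variant the answer recommends when id alone is not unique.

```python
import sqlite3


def delete_all_but_max(conn):
    """Delete every row that is not the highest id within its db_type."""
    conn.execute(
        """
        DELETE FROM my_table
        WHERE id IN (
            SELECT t.id
            FROM my_table AS t
            LEFT JOIN (
                SELECT db_type, MAX(id) AS max_id
                FROM my_table
                GROUP BY db_type
            ) AS mx
              ON t.db_type = mx.db_type AND t.id = mx.max_id
            WHERE mx.max_id IS NULL  -- no match: this row is not a group maximum
        )
        """
    )
    conn.commit()


# Sample data from the question: Table1 has ids 1-5, Table2 has ids 6-10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (db_type TEXT, id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO my_table (db_type, id) VALUES (?, ?)",
    [("Table1", i) for i in range(1, 6)] + [("Table2", i) for i in range(6, 11)],
)

delete_all_but_max(conn)
rows = conn.execute("SELECT db_type, id FROM my_table ORDER BY id").fetchall()
print(rows)  # -> [('Table1', 5), ('Table2', 10)]
```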
Is the combination DB Type - ID unique?
If so, you can attack this in two stages:
Get only the rows you want
SELECT [DB Type], Max(ID) AS MaxID
FROM YourTable
GROUP BY [DB Type]
Delete the rest, by wrapping the previous statement in a larger one:
DELETE FROM YourTable
FROM
YourTable
LEFT JOIN
(SELECT [DB Type], Max(ID) AS MaxID
FROM YourTable GROUP BY [DB Type]) DontDelete
ON
YourTable.[DB Type]=DontDelete.[DB Type] AND
YourTable.ID=DontDelete.MaxID
WHERE
DontDelete.[DB Type] IS NULL
DELETE FROM MyTable del
WHERE EXISTS (
SELECT *
FROM MyTable xx
WHERE xx."db Type" = del."db Type"
AND xx.id > del.id
);
delete from my_Table
where Day in (select MAX(day) d from my_Table where id='id')