remove duplicates in a column - mysql

I have the following query to basically find all duplicates in my username column:
SELECT `username`
FROM `instagram_user`
GROUP BY `username`
HAVING COUNT( * ) >1
How do I remove all the duplicates, such that it will only leave me with one unique username in the table? I don't care which entity it is that is persisted or removed, as long as there's one unique username in the table.

If you don't care which record is kept, just add a unique constraint using IGNORE (note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this only works on older versions):
ALTER IGNORE TABLE instagram_user ADD UNIQUE (username);
MySQL will then do the job for you. You want that unique constraint anyway, to keep duplicates out of the table in the future.
Alternatively, you can do:
DELETE t
FROM instagram_user t JOIN
(
SELECT username, MAX(id) id
FROM instagram_user
GROUP BY username
HAVING COUNT(*) > 1
) q
ON t.username = q.username
AND t.id <> q.id
This keeps only the row with the highest id for each duplicated username.
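As a quick sanity check of the keep-the-highest-id idea, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, and it does not support the multi-table DELETE ... JOIN form, so the sketch uses an equivalent NOT IN (SELECT MAX(id) ...) statement. The sample rows, the index name ux_username and the dedupe_usernames helper are made up for illustration.

```python
import sqlite3


def dedupe_usernames(conn):
    """Delete all but the highest-id row for every username."""
    # SQLite has no DELETE ... JOIN, so list the ids to keep instead.
    conn.execute(
        """
        DELETE FROM instagram_user
        WHERE id NOT IN (
            SELECT MAX(id) FROM instagram_user GROUP BY username
        )
        """
    )
    # Assumption: a unique index is wanted afterwards to block new duplicates.
    conn.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS ux_username "
        "ON instagram_user (username)"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instagram_user (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany(
    "INSERT INTO instagram_user (id, username) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "alice"), (4, "carol"), (5, "bob")],
)
dedupe_usernames(conn)
remaining = conn.execute(
    "SELECT id, username FROM instagram_user ORDER BY id"
).fetchall()
print(remaining)
```

Run it on a copy of the data first; once the unique index exists, inserting an already-present username fails instead of silently creating a duplicate.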

This is written for SQL Server, so I'm not sure it works as-is, but you can try similar code in MySQL.
WITH CteUsers AS (
SELECT PkId, username,
ROW_NUMBER() OVER (PARTITION BY username ORDER BY PkId) AS RowId
FROM instagram_user
)
SELECT * FROM CteUsers;
This produces a result like the following:
PkId username RowId
1 xx 1
2 xx 2
....
Then delete the rows where RowId > 1:
WITH CteUsers AS (
SELECT PkId, username,
ROW_NUMBER() OVER (PARTITION BY username ORDER BY PkId) AS RowId
FROM instagram_user
)
DELETE FROM instagram_user WHERE PkId IN (SELECT PkId FROM CteUsers WHERE RowId > 1);

This will give you the duplicates (i.e. the ones you need to delete) ...
select a.id, a.username from instagram_user a, instagram_user b
where a.username = b.username and a.id <> b.id
and b.id = (select min(id) from instagram_user where username = a.username)
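so the DELETE would be something like this (MySQL normally rejects a subquery that selects from the table being deleted with error 1093, so there it may need to be wrapped in a derived table) ...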
delete from instagram_user where id in
(select a.id from instagram_user a, instagram_user b
where a.username = b.username and a.id <> b.id
and b.id = (select min(c.id) from instagram_user c
where c.username = a.username))

Related

MySQL Delete all records except latest N for each user

I want to keep the latest N records of each user_id and delete the others.
Structure table "tab":
id (auto increment)
user_id
information
If possible, I would like to not delete if a user's number of records is less than N.
Thank you in advance.
You can use a correlated subquery that counts the newer rows of the same user, as follows:
Delete from your_table t
Where N <= (select count(1)
from your_table tt
where tt.user_id = t.user_id
and tt.id > t.id)
You can use a join in a delete:
delete t
from t join
(select t.id, row_number() over (partition by user_id order by id desc) as seqnum
from t
) tt
on tt.id = t.id
where seqnum > N;
This numbers each user's rows from newest to oldest and then deletes those whose number is greater than N.
I should add that this requires MySQL 8+.
EDIT:
In older versions of MySQL, you can use:
delete t
from t join
(select u.user_id,
(select t2.id
from t t2
where t2.user_id = u.user_id
order by t2.id desc
limit 1 offset N
) as nth_id
from (select distinct user_id from t) u
) tt
on tt.user_id = t.user_id
where t.id <= nth_id;
Test the subquery before you run the delete. It should return the (N+1)th newest id for each user, or NULL for users with N or fewer rows.
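To see the "number each user's rows from newest to oldest" idea in action, here is a small sketch using Python's built-in sqlite3 module. SQLite stands in for MySQL 8 here: it needs SQLite 3.25 or newer for ROW_NUMBER(), and because it has no DELETE ... JOIN the numbered rows are fed through an IN subquery instead. The keep_latest helper and the sample rows are hypothetical; the table layout follows the question's "tab".

```python
import sqlite3


def keep_latest(conn, n):
    """Delete every row of tab that is not among its user's n newest ids."""
    conn.execute(
        """
        DELETE FROM tab
        WHERE id IN (
            SELECT id
            FROM (
                SELECT id,
                       ROW_NUMBER() OVER (
                           PARTITION BY user_id ORDER BY id DESC
                       ) AS seqnum
                FROM tab
            )
            WHERE seqnum > ?
        )
        """,
        (n,),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (id INTEGER PRIMARY KEY, user_id INTEGER, information TEXT)"
)
# User 10 has four rows, user 20 has two.
rows = [(1, 10), (2, 10), (3, 20), (4, 10), (5, 10), (6, 20)]
conn.executemany(
    "INSERT INTO tab (id, user_id, information) VALUES (?, ?, '')", rows
)
keep_latest(conn, 2)
kept = conn.execute("SELECT id, user_id FROM tab ORDER BY id").fetchall()
print(kept)
```

With N = 2, user 10 loses its two oldest rows, while user 20, which only has two rows to begin with, is left untouched.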

Deleting duplicate data mysql

We have duplicate rows; if we run
SELECT MAX(id), COUNT(id) AS count
FROM user_status
GROUP BY user_id, user_type
HAVING COUNT(*) > 1
we get the duplicate data.
When I tried to use it in a DELETE via
DELETE
FROM user_status
WHERE id IN (SELECT MAX(id), COUNT(id) AS count
FROM user_status
GROUP BY user_id, user_type
HAVING COUNT(*) > 1
)
I got the following error
Operand should contain 1 column
How can I fix this? TIA
The message is clear: you are selecting two columns in the subquery, which is not allowed in an IN clause. Even with a single column, MySQL still raises an error for this form of query (a subquery on the table being deleted from), so replace it with a join as below:
DELETE us
FROM user_status us
INNER JOIN (SELECT MAX(id) id
FROM user_status
GROUP BY user_id, user_type HAVING COUNT(*) > 1) t ON t.id = us.id
How about keeping only the row with the highest id in each (user_id, user_type) group:
DELETE user_status
FROM user_status LEFT JOIN (
SELECT MAX(id) maxId
FROM user_status
GROUP BY user_id, user_type
) a ON maxId = id
WHERE maxId IS NULL;
In your code, WHERE col1 IN (SELECT col1, col2), the column counts simply don't match. If you compare a row with as many columns as the subquery selects, it works in principle:
DELETE FROM user_status
WHERE (id, user_id, user_type) NOT IN (
SELECT MAX(id) maxId, user_id, user_type
FROM (select * from user_status) a
GROUP BY user_id, user_type);

I want to update table value based on another table max value

I have two tables A and B(oracle database). Table B has two columns Id and mdate, where id is primary key. Table A has two columns Id and mdate where id is foreign key. I want to update table B mdate value which should be max mdate value from table A for matching Id.
Update b
set mdate= (select max(mdate) from a group by Id)
where b.id = a.id;
You're very close. The WHERE clause needs to be moved into the subquery, to make it a correlated subquery. Also, the parameter to UPDATE is a table name, not a column name.
UPDATE b
SET mdate = (SELECT MAX(mdate) FROM a WHERE b.id = a.id)
In MySQL you can also do it with a JOIN:
UPDATE b
JOIN (SELECT id, MAX(mdate) AS mdate
FROM a
GROUP BY id) AS a ON a.id = b.id
SET b.mdate = a.mdate
Update b
set(b.mdate) = (select MAX(a.mdate) from a where b.id = a.id)
where exists ( select 1 from a where b.id = a.id);
Thanks to Mr. Barmar.
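The WHERE EXISTS guard in the last statement matters when some rows in b have no match in a. Here is a minimal sketch of the difference, run from Python against in-memory SQLite as a stand-in for Oracle; the sample dates are made up, and ISO-formatted text is used so that MAX() orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE b (id INTEGER PRIMARY KEY, mdate TEXT);
    CREATE TABLE a (id INTEGER REFERENCES b(id), mdate TEXT);
    INSERT INTO b VALUES (1, '2000-01-01'), (2, '2000-01-01'), (3, '2000-01-01');
    INSERT INTO a VALUES (1, '2020-05-01'), (1, '2021-03-15'), (2, '2019-07-04');
    """
)

# Correlated subquery plus EXISTS guard: b.id = 3 has no rows in a
# and therefore keeps its old value.
conn.execute(
    """
    UPDATE b
    SET mdate = (SELECT MAX(a.mdate) FROM a WHERE a.id = b.id)
    WHERE EXISTS (SELECT 1 FROM a WHERE a.id = b.id)
    """
)
guarded = conn.execute("SELECT id, mdate FROM b ORDER BY id").fetchall()

# Without the guard, the unmatched row is overwritten with NULL.
conn.execute(
    "UPDATE b SET mdate = (SELECT MAX(a.mdate) FROM a WHERE a.id = b.id)"
)
unguarded = conn.execute("SELECT id, mdate FROM b ORDER BY id").fetchall()
print(guarded)
print(unguarded)
```

If every id in b is guaranteed to appear in a, the two forms behave the same and the shorter one is enough.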

MySQL Delete older records

I have thousands of records (including duplicate posts), so now I want to delete the old records (just leave the latest record) based on date.
My code is given below
DELETE a.*
FROM dle_post AS a
INNER JOIN (
SELECT title, MIN( id ) AS min_id
FROM dle_post
GROUP BY title
HAVING COUNT( * ) > 1
) AS b ON b.title = a.title
AND b.min_id <> a.id
The problem is that it picks the record to keep based on ID rather than on date. I'd really appreciate your help!
If you want to base it on date, you should use MAX(date) in the subquery.
DELETE a.*
FROM dle_post AS a
INNER JOIN (
SELECT title, MAX(date) AS maxdate
FROM dle_post
GROUP BY title
HAVING COUNT( * ) > 1
) AS b
ON b.title = a.title
AND a.date < b.maxdate
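One thing worth checking with the MAX(date) approach is what happens on ties: rows that share the latest date for a title are all kept, because only strictly older rows are deleted. A small sketch of that behaviour, using Python's sqlite3 with SQLite as a stand-in for MySQL (SQLite lacks DELETE ... JOIN, so a correlated subquery expresses the same condition); the sample posts are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dle_post (id INTEGER PRIMARY KEY, title TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO dle_post VALUES (?, ?, ?)",
    [
        (1, "hello", "2013-03-01"),
        (2, "hello", "2013-01-01"),
        (3, "world", "2013-02-01"),
        (4, "hello", "2013-03-01"),  # same date as id 1: a tie
    ],
)
# Delete every post that is strictly older than the newest post
# with the same title.
conn.execute(
    """
    DELETE FROM dle_post
    WHERE date < (SELECT MAX(p.date) FROM dle_post AS p
                  WHERE p.title = dle_post.title)
    """
)
left = conn.execute("SELECT id, title FROM dle_post ORDER BY id").fetchall()
print(left)
```

If exactly one row per title must survive, a tie-breaker such as the id would have to be added to the condition.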
Just write the SELECT query for the posts you want to delete and put it in a subselect:
DELETE FROM dle_post WHERE id IN (SELECT id FROM dle_post WHERE ... )
This is more readable and maintainable.

SELECT row with MAX id from each GROUP BY (unique_id) and ORDER BY number

I have table with id, unique_id, and order_number.
I want to GROUP the rows by unique_id.
I want to take the row with the MAX id from each group.
And lastly, I want to sort those rows by order_number.
I also have a few WHERE conditions. This is my attempt, which does not work:
SELECT MAX(id) AS id
, order_number
FROM table
WHERE final = 0
AND username = '$username'
AND active = 1
GROUP
BY unique_id
ORDER
BY order_number
You can use your query as a subquery:
SELECT *
FROM table
WHERE id IN (SELECT MAX(id) AS id
FROM table
WHERE final=0 AND username='$username' AND active=1
GROUP BY unique_id)
ORDER BY order_number
or, if id is not unique, use JOIN:
SELECT t1.*
FROM table AS t1
JOIN (SELECT MAX(id) AS max_id, unique_id
FROM table
WHERE final=0 AND username='$username' AND active=1
GROUP BY unique_id
) AS t2 ON t1.unique_id = t2.unique_id AND t1.id = t2.max_id
ORDER BY order_number
Try this:
SELECT id, order_number
FROM table
WHERE final=0 AND username='$username' AND active=1 AND
id IN (
SELECT MAX(id) FROM table table2
WHERE table.unique_id = table2.unique_id
)
GROUP BY unique_id
ORDER BY order_number;
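For anyone who wants to try the "MAX id per group, then sort" pattern on sample data, here is a minimal sketch with Python's sqlite3 module. The table is called items here, a hypothetical name, because the question's "table" is a reserved word; the rows are made up, and the username is bound as a query parameter instead of being interpolated like '$username'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "items" is a hypothetical name; the columns follow the question.
conn.execute(
    """
    CREATE TABLE items (
        id INTEGER PRIMARY KEY, unique_id INTEGER, order_number INTEGER,
        final INTEGER, username TEXT, active INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, 0, 'sam', 1)",
    [(1, 100, 5), (2, 100, 9), (3, 200, 1), (4, 300, 7), (5, 300, 3)],
)

query = """
    SELECT t1.id, t1.unique_id, t1.order_number
    FROM items AS t1
    JOIN (SELECT MAX(id) AS max_id
          FROM items
          WHERE final = 0 AND username = ? AND active = 1
          GROUP BY unique_id) AS t2 ON t1.id = t2.max_id
    ORDER BY t1.order_number
"""
# One row per unique_id (the one with the highest id), sorted by order_number.
latest = conn.execute(query, ("sam",)).fetchall()
print(latest)
```

The filters live inside the subquery, so the MAX(id) is taken only over rows that pass them; the outer query just fetches those rows and sorts them.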