Find most recent duplicates ID with MySQL - mysql

I used to run
SELECT email, COUNT(email) AS occurences
FROM wineries
GROUP BY email
HAVING (COUNT(email) > 1);
to find duplicates based on their email.
But now I need their IDs so I can decide exactly which row to remove.
The second constraint is: I only want the LAST INSERTED duplicates.
So if there are 2 entries with test@test.com as their email and their IDs are 40 and 12782 respectively, it would delete only the 12782 entry and keep the 40 one.
Any ideas on how I could do this? I've been mashing SQL for about an hour and can't seem to find exactly how to do this.
Thanks and have a nice day!

Well, you sort of answered your own question. You seem to want max(id):
SELECT email, COUNT(email) AS occurences, max(id)
FROM wineries
GROUP BY email
HAVING (COUNT(email) > 1);
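You can delete the others using a delete with join. It has a tricky syntax where you have to list the table name first and then specify the from clause with the join. As written, this removes every row whose id is lower than max(id) for its email, so the newest row survives; to keep the oldest row instead (as in your 40 / 12782 example), select min(id) and flip the comparison to wineries.id > that minimum: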
delete wineries
from wineries join
(select email, max(id) as maxid
from wineries
group by email
having count(*) > 1
) we
on we.email = wineries.email and
wineries.id < we.maxid;
Or writing this as an exists clause:
delete from wineries
where exists (select 1
from (select email, max(id) as maxid
from wineries
group by email
) we
where we.email = wineries.email and wineries.id < we.maxid
)
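
To check which rows such a statement removes before running it on real data, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, made-up sample rows, and a hypothetical helper name (dedupe_keep_oldest). It uses the exists form with min(id) and a flipped comparison, so the oldest row per email is kept, as in the question's example:

```python
import sqlite3

def dedupe_keep_oldest(rows):
    """Load (id, email) rows into an in-memory table, delete every row
    that is newer than the oldest row sharing its email, and return
    what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wineries (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO wineries (id, email) VALUES (?, ?)", rows)
    # Same shape as the exists version above, but with MIN(id) and '>'
    # so that the lowest id of each email survives.
    conn.execute("""
        DELETE FROM wineries
        WHERE EXISTS (SELECT 1
                      FROM (SELECT email, MIN(id) AS minid
                            FROM wineries
                            GROUP BY email) we
                      WHERE we.email = wineries.email
                        AND wineries.id > we.minid)
    """)
    remaining = conn.execute(
        "SELECT id, email FROM wineries ORDER BY id").fetchall()
    conn.close()
    return remaining

# Invented sample data mirroring the 40 / 12782 example.
sample = [(40, "test@test.com"),
          (41, "other@example.com"),
          (12782, "test@test.com")]
print(dedupe_keep_oldest(sample))
```

With this sample, row 12782 is removed and rows 40 and 41 remain. The delete-with-join form is MySQL-specific and is not exercised here.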

select email, max(id), COUNT(email) AS occurences
FROM wineries
GROUP BY email
HAVING (COUNT(email) > 1);

delete from wineries
where id not in
(
select * from
(
select min(id)
from wineries
group by email
) x
)
MySQL does not let a DELETE select from its own target table in a subquery, so you need the extra derived table (x) as a workaround.

DELETE duplicates.*
FROM wineries
JOIN wineries AS duplicates USING (email)
WHERE duplicates.id > wineries.id;
play with it on sqlfiddle.com

This is the simplest option:
DELETE FROM wineries
WHERE id NOT IN
(
SELECT MIN(id) id
FROM wineries
GROUP BY email
);
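This keeps only the first inserted record for each email address; all other records are deleted. Note that MySQL normally rejects this exact form with error 1093 ("You can't specify target table 'wineries' for update in FROM clause"), in which case the subquery has to be wrapped in a derived table. Credit for this answer should go to @juergen d since this is just a revised version of his answer.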

Related

MySQL problem on LeetCode

Here is a question from LeetCode. I don't know why my output is wrong. First I write the SELECT in the parentheses to find the repeated email addresses. Then I use DELETE to filter out the repeated email addresses. Does anyone know what is wrong with my code?
It is very simple. Try this:
-- Solution 1
with cte as
(
select id, email, Rank() OVER (partition by email order by id) ranks
from person where email in(
select email from person
group by email having count(email) >1
)
)
DELETE FROM person where id in
(
SELECT id FROM cte where ranks != 1
)
-- Solution 2
DELETE p from person p
inner join (
select MIN(id) id, email from person
where email in(
select email from person group by email having count(email)>1
) group by email
) a on p.id > a.id and p.email = a.email;
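
A minimal sketch of the window-function idea from Solution 1, assuming Python's built-in sqlite3 module (SQLite 3.25 or newer) as a stand-in for MySQL 8, invented sample rows, and a hypothetical helper name. It numbers the rows of each email by id and deletes everything after the first; the extra "having count(email) > 1" filter is left out here because a unique email only ever gets row number 1:

```python
import sqlite3

def delete_later_duplicates(rows):
    """Keep the lowest id per email in a throwaway person table and
    return the surviving rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO person (id, email) VALUES (?, ?)", rows)
    conn.execute("""
        WITH ranked AS (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
            FROM person
        )
        DELETE FROM person
        WHERE id IN (SELECT id FROM ranked WHERE rn > 1)
    """)
    remaining = conn.execute(
        "SELECT id, email FROM person ORDER BY id").fetchall()
    conn.close()
    return remaining

# Invented sample data: one email appears twice.
people = [(1, "john@example.com"),
          (2, "bob@example.com"),
          (3, "john@example.com")]
print(delete_later_duplicates(people))
```

With this sample only id 3 is deleted. ROW_NUMBER is used instead of RANK; with unique ids both give the same numbering.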

Count the number of occurrences of each email address

I have a MySQL Workbench table called table_contacts, with the fields:
user_id and PrimaryEmail
I want to write a query that, for each row in the table, will return:
User_id, PrimaryEmail and Number of occurrences of that email address in the table. So I want the following table to be returned:
I know I need to use a sub query. So far I have:
select user_id, PrimaryEmail,
(select Count(PrimaryEmail) from table_contacts where PrimaryEmail = table_contacts.PrimaryEmail)
from table_contacts
But this is returning the count of all email addresses in the table.
What am I doing wrong?
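The solutions of Simone and Grażynka group by address, so you will lose rows whenever an email address appears more than once.
Your own subquery fails because both sides of "PrimaryEmail = table_contacts.PrimaryEmail" resolve to the inner table_contacts, so the condition matches every row with a non-NULL email. Give the two references different aliases.
To display all rows with the count for the same email, you can do:
SELECT t1.user_id, t1.PrimaryEmail,
(SELECT COUNT(*) FROM table_contacts t2 WHERE t2.PrimaryEmail = t1.PrimaryEmail)
FROM table_contacts t1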
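
To see the difference the aliases make, here is a small sketch that runs the query from the question and the aliased version side by side. It assumes Python's built-in sqlite3 module as a stand-in for MySQL and three invented rows; the variable names are only for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_contacts (user_id INTEGER PRIMARY KEY, PrimaryEmail TEXT)")
conn.executemany(
    "INSERT INTO table_contacts (user_id, PrimaryEmail) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com")])

# Query from the question: inside the subquery both column references
# resolve to the inner table, so the condition holds for every row.
unaliased = conn.execute("""
    SELECT user_id, PrimaryEmail,
           (SELECT COUNT(PrimaryEmail) FROM table_contacts
            WHERE PrimaryEmail = table_contacts.PrimaryEmail)
    FROM table_contacts
    ORDER BY user_id
""").fetchall()

# Aliased version: t2 is the inner table, t1 the outer row being counted.
aliased = conn.execute("""
    SELECT t1.user_id, t1.PrimaryEmail,
           (SELECT COUNT(*) FROM table_contacts t2
            WHERE t2.PrimaryEmail = t1.PrimaryEmail)
    FROM table_contacts t1
    ORDER BY t1.user_id
""").fetchall()

print(unaliased)  # every row reports 3, the size of the whole table
print(aliased)    # rows 1 and 3 report 2, row 2 reports 1
conn.close()
```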
try this:
select user_id, PrimaryEmail, Count(PrimaryEmail)
from table_contacts
group by PrimaryEmail
In the SQL Tryit editor a similar query would be
SELECT customerid,count(country),country FROM [Customers] group by country
but in this case you receive only the count for each email (one row per email). Other (better) solutions have been proposed if you want to list all the rows with the count added.
Try this one:
Select user_id, primaryemail, count(*)
from table_contacts
group by user_id, primaryemail
You need a GROUP BY, not a subquery.
Something like:
select user_id, PrimaryEmail, Count(PrimaryEmail)
from table_contacts
group by PrimaryEmail
This should do the job:
select t1.user_id, t1.PrimaryEmail, count(*)
from table_contacts t1
join table_contacts t2 on t1.PrimaryEmail = t2.PrimaryEmail
group by t1.user_id, t1.PrimaryEmail
order by t1.user_id;

MySQL look for duplicates on multiple fields

I have a MySQL database with the following fields:
id, email, first_name, last_name
I want to run an SQL query that will display rows where the same id and email combination exists more than once.
Basically, each id and email combination should only have one row, and I would like to run a query to see if there are any duplicates.
If you just want to return the id and email that are duplicated, you can just use a GROUP BY query:
SELECT id, email
FROM yourtable
GROUP BY id, email
HAVING COUNT(*)>1
If you also want to return the full rows, then you have to join the previous query back to the table:
SELECT yourtable.*
FROM
yourtable INNER JOIN (
SELECT id, email
FROM yourtable
GROUP BY id, email
HAVING COUNT(*)>1
) s
ON yourtable.id = s.id AND yourtable.email=s.email
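
A minimal sketch of the two queries above, assuming Python's built-in sqlite3 module as a stand-in for MySQL and invented rows. Here id is deliberately not declared as a key, since the question allows repeated ids; the ORDER BY is only added to make the printed result stable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER, email TEXT, first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [(1, "a@example.com", "Ann", "Lee"),
     (1, "a@example.com", "Ann", "Leigh"),
     (2, "b@example.com", "Bob", "Ray"),
     (2, "c@example.com", "Bob", "Ray")])

# Step 1: the (id, email) pairs that occur more than once.
pairs = conn.execute("""
    SELECT id, email
    FROM yourtable
    GROUP BY id, email
    HAVING COUNT(*) > 1
""").fetchall()

# Step 2: join those pairs back to get the full duplicated rows.
full_rows = conn.execute("""
    SELECT yourtable.*
    FROM yourtable
    INNER JOIN (SELECT id, email
                FROM yourtable
                GROUP BY id, email
                HAVING COUNT(*) > 1) s
            ON yourtable.id = s.id AND yourtable.email = s.email
    ORDER BY yourtable.last_name
""").fetchall()

print(pairs)
print(full_rows)
conn.close()
```

Only the pair (1, a@example.com) is reported: the two rows with id 2 have different emails, so they do not count as duplicates on both fields.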
You'll want something like this:
select field1,field2,field3, count(*)
from table_name
group by field1,field2,field3
having count(*) > 1
See also this question.
You can search for all ids that meet a specific count by grouping them and using a having clause like this:
SELECT id, COUNT(*) AS totalCount
FROM myTable
GROUP BY id
HAVING COUNT(*) > 1;
Anything this query returns has a duplicate. To check for duplicate emails, you can just change the column you're selecting.

How do I remove duplicate rows in SQL? [duplicate]

My table name is emails.
My table structure looks like:
I want to remove all of the duplicated emails. I tried this query:
DELETE FROM emails WHERE email NOT IN (SELECT MIN(email)
FROM emails GROUP BY email)
But with no result. Can someone help me with this?
The query that you are looking for would use id, not email:
DELETE FROM emails
WHERE id NOT IN (SELECT MIN(id) FROM emails GROUP BY email) ;
EDIT:
You are using MySQL, which does not allow a DELETE to select from its own target table in a subquery. You can get around this with the subquery hack:
DELETE FROM emails
WHERE id NOT IN (select minid from (SELECT MIN(id) as minid FROM emails GROUP BY email) e) ;
Or, you can use a join:
delete e
from emails e left outer join
(select min(id) as minid
from emails
group by email
) em
on e.id = em.minid
where em.minid is null;
Try this instead
-- Create a temporary table (despite its name, it holds the IDs to keep)
CREATE TEMPORARY TABLE IDsToRemove (ID INT);
INSERT INTO IDsToRemove
SELECT MIN(id)
FROM emails GROUP BY email;
DELETE FROM emails WHERE id NOT IN (SELECT ID FROM IDsToRemove);
I don't know the exact MySQL syntax, but this should give you the idea.
Maybe you (or someone) want to delete rows that are duplicated across several columns, so I'll just leave this borrowed answer here [0].
DELETE Emails
FROM Emails
LEFT OUTER JOIN (
SELECT MIN(id) as id, email, pwd
FROM Emails
GROUP BY email, pwd
) as KeepRows ON
Emails.id = KeepRows.id
WHERE
KeepRows.id IS NULL
[0] How can I remove duplicate rows?

Using sql to find duplicate records and delete in same operation

I'm using this SQL statement to find duplicate records:
SELECT id,
user_id,
activity_type_id,
source_id,
source_type,
COUNT(*) AS cnt
FROM activities
GROUP BY id, user_id, activity_type_id, source_id, source_type
HAVING COUNT(*) > 1
However, I want to not only find, but delete in the same operation.
delete from activities where id not in (select max(id) from activities group by ....)
Thanks to @OMG Ponies and his other post, here is a revised solution (but not exactly the same). I assumed here that it does not matter which specific rows are left undeleted. I also assume that id is the primary key.
In my example, I set up just one extra column, name, for testing, but it can easily be extended to more columns via the GROUP BY clause.
DELETE a FROM activities a
LEFT JOIN (SELECT MAX(id) AS id FROM activities GROUP BY name) uniqId
ON a.id=uniqId.id WHERE uniqId.id IS NULL;
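
Extended to the columns from the question, the same idea can be sketched as follows. This assumes Python's built-in sqlite3 module as a stand-in for MySQL, invented rows, and hypothetical names (keep_id, keep, dedupe_activities). It uses the NOT IN form wrapped in a derived table instead of the MySQL-only DELETE ... LEFT JOIN, and it groups by every column except id, since grouping by a primary key would never find a duplicate:

```python
import sqlite3

def dedupe_activities(rows):
    """Keep the highest id of each (user_id, activity_type_id, source_id,
    source_type) group and return the ids that survive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE activities (
            id INTEGER PRIMARY KEY,
            user_id INTEGER,
            activity_type_id INTEGER,
            source_id INTEGER,
            source_type TEXT)
    """)
    conn.executemany("INSERT INTO activities VALUES (?, ?, ?, ?, ?)", rows)
    # The derived table (keep) is the usual workaround for deleting from
    # a table that the subquery also reads.
    conn.execute("""
        DELETE FROM activities
        WHERE id NOT IN (
            SELECT keep_id
            FROM (SELECT MAX(id) AS keep_id
                  FROM activities
                  GROUP BY user_id, activity_type_id, source_id, source_type
                 ) AS keep)
    """)
    ids = [r[0] for r in conn.execute("SELECT id FROM activities ORDER BY id")]
    conn.close()
    return ids

# Invented sample data: ids 1, 2 and 5 describe the same activity.
activities = [(1, 10, 1, 100, "Post"),
              (2, 10, 1, 100, "Post"),
              (3, 10, 2, 100, "Post"),
              (4, 11, 1, 100, "Post"),
              (5, 10, 1, 100, "Post")]
print(dedupe_activities(activities))
```

With this sample, ids 1 and 2 are deleted and 3, 4 and 5 remain.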