Find duplicates using MySQL considering multiple columns

I need to find duplicate users based on either the same email OR the same first_name, last_name combination OR the same birth_date. What I could comfortably try was:
SELECT id, first_name, last_name
FROM users
where id IN (SELECT id
from users
GROUP BY email
HAVING count(*) > 1)
GROUP BY email, id;
The above gives only duplicate email details, but I'm a bit confused about how to handle the other conditions (the same first_name, last_name combination OR the same birth_date) as well.
Is it possible to achieve it in a single query?

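Try a UNION of three separate queries, one per duplicate criterion. Each branch finds the values that occur more than once and joins back to users to get the matching ids:
SELECT t1.id
FROM users t1
INNER JOIN
(
SELECT email
FROM users
GROUP BY email
HAVING COUNT(*) > 1
) t2
ON t1.email = t2.email
UNION
SELECT t1.id
FROM users t1
INNER JOIN
(
SELECT first_name, last_name
FROM users
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
) t2
ON t1.first_name = t2.first_name AND
t1.last_name = t2.last_name
UNION
SELECT t1.id
FROM users t1
INNER JOIN
(
SELECT birth_date
FROM users
GROUP BY birth_date
HAVING COUNT(*) > 1
) t2
ON t1.birth_date = t2.birth_date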
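To see a join-based three-way UNION run end to end, here is a minimal sketch that executes it against an in-memory SQLite database through Python's sqlite3 module, used only as a stand-in for MySQL (the GROUP BY/HAVING, JOIN and UNION constructs involved should behave the same in both). The table follows the question's column names; the sample rows are hypothetical. Each branch collects the ids for one criterion, and UNION drops ids that match more than one criterion.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, "
    "first_name TEXT, last_name TEXT, birth_date TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a@example.com", "Ann", "Lee", "1990-01-01"),
        (2, "a@example.com", "Bob", "Ray", "1985-05-05"),  # same email as id 1
        (3, "c@example.com", "Cal", "Ng", "1970-07-07"),
        (4, "d@example.com", "Cal", "Ng", "1971-08-08"),   # same name as id 3
        (5, "e@example.com", "Dee", "Fox", "1999-09-09"),
        (6, "f@example.com", "Eve", "Kim", "1999-09-09"),  # same birth_date as id 5
        (7, "g@example.com", "Gus", "Orr", "1960-06-06"),  # no duplicates
    ],
)

# One branch per criterion: find the duplicated values, join back for the ids.
query = """
SELECT t1.id FROM users t1
JOIN (SELECT email FROM users
      GROUP BY email HAVING COUNT(*) > 1) t2
  ON t1.email = t2.email
UNION
SELECT t1.id FROM users t1
JOIN (SELECT first_name, last_name FROM users
      GROUP BY first_name, last_name HAVING COUNT(*) > 1) t2
  ON t1.first_name = t2.first_name AND t1.last_name = t2.last_name
UNION
SELECT t1.id FROM users t1
JOIN (SELECT birth_date FROM users
      GROUP BY birth_date HAVING COUNT(*) > 1) t2
  ON t1.birth_date = t2.birth_date
"""
# UNION already removes repeated ids; sorting in Python gives a stable order.
dup_ids = sorted(row[0] for row in conn.execute(query))
print(dup_ids)  # [1, 2, 3, 4, 5, 6]
```

Only id 7, which shares nothing with any other row, is left out.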

Related

Eliminate the duplicate rows from the table in SQL

I want to eliminate the rows whose email is duplicated and retrieve only the rows whose email appears once.
I have tried using DISTINCT, but I'm not getting the desired results.
SELECT
DISTINCT Email
FROM
Users
Example table:
Id  Email           Username
--  --------------  --------
1   sam#gmail.com   sam1122
2   john#gmail.com  john1122
3   sam#gmail.com   sam2233
4   lily#gmail.com  lily#as
What I want to retrieve:
Id  Email           Username
--  --------------  --------
1   john#gmail.com  john1122
2   lily#gmail.com  lily#as
We can try using NOT EXISTS logic here:
SELECT Id, Email, Username
FROM Users u1
WHERE NOT EXISTS (
SELECT 1
FROM Users u2
WHERE u2.Email = u1.Email AND
u2.Id <> u1.Id
);
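A minimal sketch of the NOT EXISTS approach, run against an in-memory SQLite database (Python's sqlite3 module, standing in for MySQL) loaded with the example table from the question:

```python
import sqlite3

# Example table from the question, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Id INTEGER PRIMARY KEY, Email TEXT, Username TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        (1, "sam#gmail.com", "sam1122"),
        (2, "john#gmail.com", "john1122"),
        (3, "sam#gmail.com", "sam2233"),
        (4, "lily#gmail.com", "lily#as"),
    ],
)

# Keep a row only if no other row (different Id) shares its Email.
unique_rows = conn.execute(
    """
    SELECT Id, Email, Username
    FROM Users u1
    WHERE NOT EXISTS (
        SELECT 1
        FROM Users u2
        WHERE u2.Email = u1.Email AND u2.Id <> u1.Id
    )
    ORDER BY Id
    """
).fetchall()
print(unique_rows)
# [(2, 'john#gmail.com', 'john1122'), (4, 'lily#gmail.com', 'lily#as')]
```

The rows come back with their original Id values (2 and 4); the question's desired output shows them renumbered as 1 and 2.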
You can also do it using a LEFT JOIN:
select u.*
from Users u
left join (
select email, max(id) as Id
from Users
group by email
having count(1) > 1
) as s on s.email = u.email
where s.email is null;
Yet another option, if you are using MySQL 8:
SELECT Id, Email, Username
FROM (
SELECT *, COUNT(*) OVER (PARTITION BY Email) AS cnt
FROM Users
) t
WHERE t.cnt = 1;
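To show what the window function contributes, this sketch (again SQLite via Python's sqlite3 as a stand-in for MySQL 8; it assumes an SQLite build with window functions, i.e. 3.25 or later) first lists each row with its per-email count and then applies the cnt = 1 filter from the answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Id INTEGER PRIMARY KEY, Email TEXT, Username TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        (1, "sam#gmail.com", "sam1122"),
        (2, "john#gmail.com", "john1122"),
        (3, "sam#gmail.com", "sam2233"),
        (4, "lily#gmail.com", "lily#as"),
    ],
)

# Step 1: every row keeps its columns and gains the size of its Email group.
with_counts = conn.execute(
    "SELECT Id, Email, COUNT(*) OVER (PARTITION BY Email) AS cnt "
    "FROM Users ORDER BY Id"
).fetchall()
print(with_counts)
# [(1, 'sam#gmail.com', 2), (2, 'john#gmail.com', 1),
#  (3, 'sam#gmail.com', 2), (4, 'lily#gmail.com', 1)]

# Step 2: the answer's query keeps only rows whose Email group has one member.
unique_rows = conn.execute(
    """
    SELECT Id, Email, Username
    FROM (
        SELECT *, COUNT(*) OVER (PARTITION BY Email) AS cnt
        FROM Users
    ) t
    WHERE t.cnt = 1
    ORDER BY Id
    """
).fetchall()
print(unique_rows)
```

Unlike GROUP BY, the window function does not collapse rows, which is why the full row is still available for the outer filter.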
SELECT Id, Email, Username
FROM Users
WHERE Email IN (
SELECT Email
FROM Users
GROUP BY Email
HAVING COUNT(*) = 1
)
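-- Each surviving group has exactly one row, so MIN() simply returns that row's values;
-- it keeps the query valid under ONLY_FULL_GROUP_BY (the default since MySQL 5.7.5).
SELECT MIN(id) AS id,
Email,
MIN(Username) AS Username,
COUNT(*) AS duplicate_email_count
FROM Users
GROUP BY Email
HAVING duplicate_email_count = 1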

SQL Query with COUNT, Having Count >1, display full details of duplicates

I have a table like this:
name employment_Status email
---- ---- -----
David E David#email.com
John U John#email.com
Michael E Michael#email.com
Steve E Michael#email.com
James U David#email.com
Mary U Mary#email.com
Beth E Beth#email.com
I started by selecting email and count(email):
SELECT email, COUNT(email) AS emailCount
FROM Table
GROUP BY email
HAVING ( COUNT(email) > 1 );
The problem occurred when I tried to include name as well:
SELECT name, email, COUNT(email) AS emailCount
FROM Table
GROUP BY name, email
HAVING ( COUNT(email) > 1 );
I would like to find all people with duplicate email addresses (only where both people are employed, E). However, it returns zero results.
I'd like to be able to display all information for people who have duplicate emails and an employment_Status of E. If two people have the same email but one or both are unemployed (U), just ignore them.
Could anyone advise?
I think you want exists:
select t.*
from t
where t.employment_Status = 'E' and
exists (select 1
from t t2
where t2.email = t.email and t2.employment_Status = 'E' and
t2.name <> t.name
);
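A minimal sketch of the EXISTS query against the sample data from the question, using an in-memory SQLite database as a stand-in for MySQL. It assumes the status column is named employment_Status, as in the question's table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, employment_Status TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("David", "E", "David#email.com"),
        ("John", "U", "John#email.com"),
        ("Michael", "E", "Michael#email.com"),
        ("Steve", "E", "Michael#email.com"),
        ("James", "U", "David#email.com"),
        ("Mary", "U", "Mary#email.com"),
        ("Beth", "E", "Beth#email.com"),
    ],
)

# An employed person is kept if some *other* employed person shares the email.
rows = conn.execute(
    """
    SELECT t.*
    FROM t
    WHERE t.employment_Status = 'E'
      AND EXISTS (
          SELECT 1
          FROM t t2
          WHERE t2.email = t.email
            AND t2.employment_Status = 'E'
            AND t2.name <> t.name
      )
    ORDER BY t.name
    """
).fetchall()
print(rows)
# [('Michael', 'E', 'Michael#email.com'), ('Steve', 'E', 'Michael#email.com')]
```

David is excluded because the other David#email.com row (James) has status U.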
Note that this assumes that name (or at least name/email) is unique.
In MySQL 8+, you can use window functions:
select t.*
from (select t.*, count(*) over (partition by t.email) as cnt
from t
where t.employment_Status = 'E'
) t
where cnt >= 2;
One way would be to use your query as a subquery in the FROM clause and JOIN the result with the main table.
SELECT t.*, d.emailCount
FROM (
SELECT email, employment_Status, COUNT(*) AS emailCount
FROM my_table
WHERE employment_Status = 'E'
GROUP BY email, employment_Status
HAVING emailCount > 1
) d
JOIN my_table t USING(email, employment_Status)
You could also use GROUP_CONCAT(name) if you are fine getting the names in a comma-separated string:
SELECT email, COUNT(*) AS emailCount, GROUP_CONCAT(name) AS names
FROM my_table
WHERE employment_Status = 'E'
GROUP BY email
HAVING emailCount > 1
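A sketch of the GROUP_CONCAT variant on the question's data, with WHERE placed before GROUP BY as SQL requires, run in SQLite via Python's sqlite3 as a stand-in for MySQL. SQLite also provides GROUP_CONCAT(), but neither SQLite nor MySQL (without an ORDER BY inside GROUP_CONCAT) guarantees the order of the concatenated names, so they may come back in either order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT, employment_Status TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        ("David", "E", "David#email.com"),
        ("John", "U", "John#email.com"),
        ("Michael", "E", "Michael#email.com"),
        ("Steve", "E", "Michael#email.com"),
        ("James", "U", "David#email.com"),
        ("Mary", "U", "Mary#email.com"),
        ("Beth", "E", "Beth#email.com"),
    ],
)

# WHERE filters rows before grouping; HAVING filters the groups afterwards.
rows = conn.execute(
    """
    SELECT email, COUNT(*) AS emailCount, GROUP_CONCAT(name) AS names
    FROM my_table
    WHERE employment_Status = 'E'
    GROUP BY email
    HAVING COUNT(*) > 1
    """
).fetchall()
print(rows)  # one group: Michael#email.com with count 2 and both names
```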
The result for your sample data would be:
email emailCount names
-----------------------------------------------
Michael#email.com 2 Michael,Steve

MySQL look for duplicates on multiple fields

I have a MySQL database with the following fields:
id, email, first_name, last_name
I want to run an SQL query that will display rows where the same id and email combination exists more than once.
Basically, each id/email combination should appear in only one row, and I would like to run a query to see whether there are any duplicates.
If you just want to return the id and email that are duplicated, you can just use a GROUP BY query:
SELECT id, email
FROM yourtable
GROUP BY id, email
HAVING COUNT(*)>1
If you also want to return the full rows, you have to join the previous query back to the table:
SELECT yourtable.*
FROM
yourtable INNER JOIN (
SELECT id, email
FROM yourtable
GROUP BY id, email
HAVING COUNT(*)>1
) s
ON yourtable.id = s.id AND yourtable.email=s.email
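A minimal sketch of the join-back query, using hypothetical rows in an in-memory SQLite database (Python's sqlite3, standing in for MySQL). Id 1 appears twice with the same email, while id 2 appears twice with different emails, so only the id 1 rows should come back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is deliberately not a primary key here, so it can repeat.
conn.execute("CREATE TABLE yourtable (id INTEGER, email TEXT, first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [
        (1, "a@x.com", "Ann", "Lee"),
        (1, "a@x.com", "Ann", "Li"),   # same id and email as the row above
        (2, "b@x.com", "Bob", "Ray"),
        (2, "c@x.com", "Bob", "Ray"),  # same id, different email
    ],
)

# Find duplicated (id, email) pairs, then join back to fetch the full rows.
rows = conn.execute(
    """
    SELECT yourtable.*
    FROM yourtable
    INNER JOIN (
        SELECT id, email
        FROM yourtable
        GROUP BY id, email
        HAVING COUNT(*) > 1
    ) s
    ON yourtable.id = s.id AND yourtable.email = s.email
    ORDER BY yourtable.last_name
    """
).fetchall()
print(rows)  # [(1, 'a@x.com', 'Ann', 'Lee'), (1, 'a@x.com', 'Ann', 'Li')]
```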
You'll want something like this:
select field1,field2,field3, count(*)
from table_name
group by field1,field2,field3
having count(*) > 1
You can search for all ids that meet a specific count by grouping them and using a having clause like this:
SELECT id, COUNT(*) AS totalCount
FROM myTable
GROUP BY id
HAVING COUNT(*) > 1;
Anything this query returns has a duplicate. To check for duplicate emails, select and group by email instead of id.

SQL Query - Group By Query

I have the following query :
SELECT directory_auth_id, first_name, last_name, COUNT(user_info.directory_auth_id) as Duplication
FROM user_info
GROUP BY directory_auth_id, first_name, last_name
HAVING COUNT(*) > 1
ORDER BY directory_auth_id ASC
This gives me the desired results: one row per combination that meets the criteria. What it does not do is show me ALL the individual records. How do I see every record that has been matched?
Thanks,
Boardman.
Assuming the SQL Server tag is correct, the best approach is to use window functions:
select ui.*
from (select ui.*, count(*) over (partition by directory_auth_id, first_name, last_name) as cnt
from user_info ui
) ui
where cnt > 1
order by cnt desc, directory_auth_id, first_name, last_name;
MySQL versions before 8.0 do not support this ANSI-standard functionality (window functions were added in MySQL 8.0), but there are other approaches to solving the problem.
For SQL Server only...
To display all rows whose directory_auth_id occurs more than once, you have to determine which directory_auth_id values are duplicated, then use that result set as a filter on the main table.
This will accomplish that.
;WITH DUPES
AS
(
SELECT directory_auth_id
FROM user_info
GROUP BY directory_auth_id
HAVING COUNT(*) > 1
)
SELECT T1.directory_auth_id, T1.first_name, T1.last_name
FROM user_info T1
JOIN DUPES T2 ON T1.directory_auth_id = T2.directory_auth_id
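The CTE approach can be sketched in SQLite (which supports WITH) through Python's sqlite3, as a stand-in for SQL Server. The rows are hypothetical, and the select list is qualified with T1 because directory_auth_id exists in both T1 and the CTE. Because DUPES groups only on directory_auth_id, id 12 is returned even though its two rows have different first names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_info (directory_auth_id INTEGER, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO user_info VALUES (?, ?, ?)",
    [
        (10, "Ann", "Lee"),
        (10, "Ann", "Lee"),    # exact duplicate
        (11, "Bob", "Ray"),    # appears once
        (12, "Cy", "Orr"),
        (12, "Cyrus", "Orr"),  # same id, different first name
    ],
)

# DUPES holds each directory_auth_id that occurs more than once;
# joining it back returns every row carrying one of those ids.
rows = conn.execute(
    """
    WITH DUPES AS (
        SELECT directory_auth_id
        FROM user_info
        GROUP BY directory_auth_id
        HAVING COUNT(*) > 1
    )
    SELECT T1.directory_auth_id, T1.first_name, T1.last_name
    FROM user_info T1
    JOIN DUPES T2 ON T1.directory_auth_id = T2.directory_auth_id
    ORDER BY T1.directory_auth_id, T1.first_name
    """
).fetchall()
print(rows)
# [(10, 'Ann', 'Lee'), (10, 'Ann', 'Lee'), (12, 'Cy', 'Orr'), (12, 'Cyrus', 'Orr')]
```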
This may work for you.
SELECT * FROM user_info where (directory_auth_id,first_name,last_name) in (
SELECT directory_auth_id, first_name, last_name
FROM user_info
GROUP BY directory_auth_id, first_name, last_name
HAVING COUNT(*) > 1
)
ORDER BY directory_auth_id ASC
Try the following. It puts your query into an inline view and then joins to it on the three relevant fields, so you get all records from user_info that have more than one row for a (directory_auth_id, first_name, last_name) combination.
select x.*
from user_info x
join (select directory_auth_id,
first_name,
last_name,
count(*) as duplication
from user_info
group by directory_auth_id, first_name, last_name
having count(*) > 1) y
on x.directory_auth_id = y.directory_auth_id
and x.first_name = y.first_name
and x.last_name = y.last_name
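A sketch of the three-column join-back on hypothetical rows, run in SQLite via Python's sqlite3 as a stand-in. It compares x.last_name with y.last_name; for contrast it also runs the variant that compares x.last_name with y.first_name, which finds nothing here because no last name equals the matching first name. The hypothetical join_back helper only interpolates one of those two fixed column names into the SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_info (directory_auth_id INTEGER, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO user_info VALUES (?, ?, ?)",
    [(10, "Ann", "Lee"), (10, "Ann", "Lee"), (11, "Bob", "Ray"),
     (12, "Cy", "Orr"), (12, "Cyrus", "Orr")],
)

def join_back(last_name_match):
    # last_name_match is the column of y compared with x.last_name.
    return conn.execute(
        f"""
        SELECT x.*
        FROM user_info x
        JOIN (SELECT directory_auth_id, first_name, last_name,
                     COUNT(*) AS duplication
              FROM user_info
              GROUP BY directory_auth_id, first_name, last_name
              HAVING COUNT(*) > 1) y
          ON x.directory_auth_id = y.directory_auth_id
         AND x.first_name = y.first_name
         AND x.last_name = {last_name_match}
        ORDER BY x.directory_auth_id
        """
    ).fetchall()

fixed_rows = join_back("y.last_name")
mismatched_rows = join_back("y.first_name")
print(fixed_rows)       # [(10, 'Ann', 'Lee'), (10, 'Ann', 'Lee')]
print(mismatched_rows)  # []
```

Only (10, Ann, Lee) occurs more than once across all three columns, so both of its rows are returned; the two id 12 rows differ in first_name and are not.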
This is for MSSQL:
select
directory_auth_id,
first_name,
last_name,
case when count(user_info.directory_auth_id)
over (partition by directory_auth_id, first_name, last_name) > 1
then count(user_info.directory_auth_id)
over (partition by directory_auth_id, first_name, last_name)
end as Duplication
from user_info;

Find duplicate users with firstname/lastname swapped

I need to find users that have been inserted twice in a table, but with their first name & last name swapped.
e.g. Bob Smith is in the database both as
firstname: Bob, lastname: Smith
and as
firstname: Smith, lastname: Bob
What's the best query to find those users?
The server runs mysql.
Select
*
FROM UserTable ut
JOIN UserTable ut2 on ut2.firstname = ut.lastname and ut2.lastname = ut.firstname
SELECT
firstname, lastname
FROM
(
SELECT firstname, lastname FROM MyTable -- WHERE firstname <> lastname
UNION ALL
SELECT lastname, firstname FROM MyTable -- WHERE firstname <> lastname
) foo
GROUP BY
firstname, lastname
HAVING
COUNT(*) > 1
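A sketch of the UNION ALL approach on hypothetical rows, run in SQLite via Python's sqlite3 as a stand-in for MySQL. It includes a row whose first and last names are identical (Lee Lee) to show why the commented-out WHERE firstname <> lastname filter matters: without it, such a row is stacked twice by the UNION ALL and reported as a swap. The find_swapped helper is hypothetical and only inserts that optional WHERE clause:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (firstname TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [("Bob", "Smith"), ("Smith", "Bob"), ("Ann", "Jones"), ("Lee", "Lee")],
)

def find_swapped(where=""):
    # Stack every name in both orders; a swapped pair then shows up twice.
    sql = f"""
    SELECT firstname, lastname
    FROM (
        SELECT firstname, lastname FROM MyTable {where}
        UNION ALL
        SELECT lastname, firstname FROM MyTable {where}
    ) foo
    GROUP BY firstname, lastname
    HAVING COUNT(*) > 1
    ORDER BY firstname, lastname
    """
    return conn.execute(sql).fetchall()

unfiltered = find_swapped()
filtered = find_swapped("WHERE firstname <> lastname")
print(unfiltered)  # [('Bob', 'Smith'), ('Lee', 'Lee'), ('Smith', 'Bob')]
print(filtered)    # [('Bob', 'Smith'), ('Smith', 'Bob')]
```

The same counting also flags a name inserted twice in the same order, which may or may not be wanted.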
SELECT orig.firstname, orig.lastname
FROM yourtable AS orig
INNER JOIN yourtable AS dupes ON orig.firstname = dupes.lastname AND orig.lastname = dupes.firstname
Basically, do a self-join on the user table, matching only the records where the first name/last name swap occurs.
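A minimal sketch of the self-join on hypothetical rows, again in SQLite via Python's sqlite3 as a stand-in for MySQL. Each swapped pair is reported twice, once from each side; the second query adds an extra condition that is not part of the answer, orig.firstname < orig.lastname, to report each pair only once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (firstname TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [("Bob", "Smith"), ("Smith", "Bob"), ("Ann", "Jones")],
)

# Self-join: pair each row with any row whose names are its mirror image.
swap_sql = """
SELECT orig.firstname, orig.lastname
FROM yourtable AS orig
INNER JOIN yourtable AS dupes
  ON orig.firstname = dupes.lastname AND orig.lastname = dupes.firstname
{extra}
ORDER BY orig.firstname
"""
both_sides = conn.execute(swap_sql.format(extra="")).fetchall()
# Extra filter (not in the answer): keep one orientation of each pair.
once = conn.execute(swap_sql.format(extra="WHERE orig.firstname < orig.lastname")).fetchall()
print(both_sides)  # [('Bob', 'Smith'), ('Smith', 'Bob')]
print(once)        # [('Bob', 'Smith')]
```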