Starting from the query in the SO post "Finding Duplicates", how can I delete the duplicates it finds?
SELECT firstname, lastname, list.address FROM list
INNER JOIN (SELECT address FROM list
GROUP BY address HAVING count(id) > 1) dup ON list.address = dup.address
Just use the DISTINCT keyword:
SELECT DISTINCT firstname FROM list;
If any of the output rows are duplicates, MySQL removes them from the result set; the rows stored in the table are not deleted.
For more documentation on DISTINCT, go here:
http://www.cyberciti.biz/faq/howto-removing-eliminating-duplicates-from-a-mysql-table/
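Since the question asks about deleting rather than hiding duplicates, here is a minimal sketch of one common approach: keep the row with the smallest id for each address and delete the others. It runs against an in-memory SQLite database through Python's sqlite3 module so it can be tried without a server; the sample rows are made up, and the id column is assumed to be a unique key as in the question's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE list (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, address TEXT)"
)
# Hypothetical sample data: the first two rows share an address.
conn.executemany(
    "INSERT INTO list (firstname, lastname, address) VALUES (?, ?, ?)",
    [
        ("Ann", "Lee", "1 Main St"),
        ("Bob", "Ray", "1 Main St"),
        ("Cy", "Dow", "9 Oak Ave"),
    ],
)

# Keep the row with the smallest id for each address and delete the rest.
conn.execute(
    "DELETE FROM list WHERE id NOT IN (SELECT MIN(id) FROM list GROUP BY address)"
)
conn.commit()

remaining = conn.execute(
    "SELECT id, firstname, address FROM list ORDER BY id"
).fetchall()
print(remaining)
```

Note that MySQL rejects a DELETE whose subquery reads directly from the table being deleted from (error 1093), so there the inner SELECT would need to be wrapped in a derived table, for example `... NOT IN (SELECT keep_id FROM (SELECT MIN(id) AS keep_id FROM list GROUP BY address) AS keep)`, where `keep_id` and `keep` are names chosen only for this illustration.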
I need to find duplicate users based on either the same email OR the same first_name, last_name combination OR the same birth_date. What I could comfortably try was:
SELECT id, first_name, last_name
FROM users
where id IN (SELECT id
from users
GROUP BY email
HAVING count(*) > 1)
GROUP BY email, id;
The above gives only duplicate email details, but I'm a bit confused about handling the other conditions (the first_name, last_name combination OR the same birth_date) as well.
Is it possible to achieve it in a single query?
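Try a UNION of three separate queries, one for each duplicate criterion. Group by the column being checked, not by id (id is unique, so grouping by it never finds a duplicate):
-- ids whose email appears more than once
SELECT id
FROM users
WHERE email IN
(
SELECT email
FROM users
GROUP BY email
HAVING COUNT(*) > 1
)
UNION
-- ids whose first_name, last_name combination appears more than once
SELECT t1.id
FROM users t1
INNER JOIN
(
SELECT first_name, last_name
FROM users
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
) t2
ON t1.first_name = t2.first_name AND
t1.last_name = t2.last_name
UNION
-- ids whose birth_date appears more than once
SELECT id
FROM users
WHERE birth_date IN
(
SELECT birth_date
FROM users
GROUP BY birth_date
HAVING COUNT(*) > 1
)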
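To see the three criteria working together, here is a small sketch that runs a UNION of three duplicate checks against an in-memory SQLite database via Python's sqlite3 module. The column names (email, first_name, last_name, birth_date) follow the question; the sample rows are invented so that each criterion matches exactly one pair of users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, "
    "first_name TEXT, last_name TEXT, birth_date TEXT)"
)
# Hypothetical sample data; ids are assigned 1..7 in insertion order.
conn.executemany(
    "INSERT INTO users (email, first_name, last_name, birth_date) VALUES (?, ?, ?, ?)",
    [
        ("a@x.org", "Ann", "Lee", "1990-01-01"),  # id 1: shares email with id 2
        ("a@x.org", "Bob", "Ray", "1985-05-05"),  # id 2
        ("c@x.org", "Cy", "Dow", "1970-07-07"),   # id 3: shares name with id 4
        ("d@x.org", "Cy", "Dow", "1971-08-08"),   # id 4
        ("e@x.org", "Eve", "Fox", "1999-09-09"),  # id 5: shares birth date with id 6
        ("f@x.org", "Gus", "Hay", "1999-09-09"),  # id 6
        ("g@x.org", "Ida", "Jay", "2000-02-02"),  # id 7: unique on every criterion
    ],
)

query = """
SELECT id FROM users
WHERE email IN (SELECT email FROM users GROUP BY email HAVING COUNT(*) > 1)
UNION
SELECT t1.id FROM users t1
JOIN (SELECT first_name, last_name FROM users
      GROUP BY first_name, last_name HAVING COUNT(*) > 1) t2
  ON t1.first_name = t2.first_name AND t1.last_name = t2.last_name
UNION
SELECT id FROM users
WHERE birth_date IN (SELECT birth_date FROM users GROUP BY birth_date HAVING COUNT(*) > 1)
"""
# UNION already removes repeated ids; sorting just makes the output stable.
duplicate_ids = sorted(row[0] for row in conn.execute(query))
print(duplicate_ids)
```

Because UNION (as opposed to UNION ALL) discards repeated rows, a user who matches more than one criterion is still listed once.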
I have a table in MySQL Workbench called table_contacts, with the fields:
user_id and PrimaryEmail
I want to write a query that, for each row in the table, will return:
user_id, PrimaryEmail, and the number of occurrences of that email address in the table. So I want the following table to be returned:
I know I need to use a sub query. So far I have:
select user_id, PrimaryEmail,
(select Count(PrimaryEmail) from table_contacts where PrimaryEmail = table_contacts.PrimaryEmail)
from table_contacts
But this is returning the count of all email addresses in the table.
What am I doing wrong?
The solutions from Simone and Grażynka group by email address, so you lose rows whenever an email address appears more than once.
To display every row together with the count for its email address, you can do:
SELECT t1.user_id, t1.PrimaryEmail,
(SELECT COUNT(*) FROM table_contacts t2 WHERE t2.PrimaryEmail = t1.PrimaryEmail)
FROM table_contacts t1
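The difference between the question's subquery and the aliased one comes down to name resolution: without aliases, `table_contacts.PrimaryEmail` inside the subquery refers to the inner table, so the column is compared with itself and every row matches. A small sketch with made-up rows, run on an in-memory SQLite database through Python's sqlite3 module, shows both behaviours side by side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_contacts (user_id INTEGER PRIMARY KEY, PrimaryEmail TEXT)"
)
# Hypothetical sample data: a@x.org appears twice, b@x.org once.
conn.executemany(
    "INSERT INTO table_contacts (user_id, PrimaryEmail) VALUES (?, ?)",
    [(1, "a@x.org"), (2, "b@x.org"), (3, "a@x.org")],
)

# Unaliased inner table: PrimaryEmail is compared with itself, so every row matches.
unaliased = conn.execute(
    "SELECT user_id, PrimaryEmail, "
    "(SELECT COUNT(PrimaryEmail) FROM table_contacts "
    " WHERE PrimaryEmail = table_contacts.PrimaryEmail) "
    "FROM table_contacts ORDER BY user_id"
).fetchall()

# Aliased tables: the inner query is correlated with the current outer row.
correlated = conn.execute(
    "SELECT t1.user_id, t1.PrimaryEmail, "
    "(SELECT COUNT(*) FROM table_contacts t2 WHERE t2.PrimaryEmail = t1.PrimaryEmail) "
    "FROM table_contacts t1 ORDER BY t1.user_id"
).fetchall()

print(unaliased)
print(correlated)
```

The first result repeats the table-wide count on every row, which matches the behaviour described in the question; the second gives the per-email count.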
Try this:
select user_id, PrimaryEmail, Count(PrimaryEmail)
from table_contacts
group by PrimaryEmail
In the SQL Tryit editor, a similar query would be:
SELECT customerid, count(country), country FROM [Customers] group by country
But in this case you receive only the count for each email (one row per email). Other (better) solutions have been proposed if you want to list all the rows with the count added.
Try this one:
Select user_id, primaryemail, count(*)
from table_contacts
group by user_id, primaryemail
You need a GROUP BY, not a subquery.
Something like:
select user_id, PrimaryEmail, Count(PrimaryEmail)
from table_contacts
group by PrimaryEmail
This should do the job:
select t1.user_id, t1.PrimaryEmail, count(*)
from table_contacts t1
join table_contacts t2 on t1.PrimaryEmail = t2.PrimaryEmail
group by t1.user_id, t1.PrimaryEmail
order by t1.user_id;
I have a MySQL database with the following fields:
id, email, first_name, last_name
I want to run an SQL query that will display rows where the same id and email exist more than once.
Basically, each id and email combination should have only one row, and I would like to run a query to see if there are any possible duplicates.
If you just want to return the id and email that are duplicated, you can just use a GROUP BY query:
SELECT id, email
FROM yourtable
GROUP BY id, email
HAVING COUNT(*)>1
If you also want to return the full rows, then you have to join the previous query back to the table:
SELECT yourtable.*
FROM
yourtable INNER JOIN (
SELECT id, email
FROM yourtable
GROUP BY id, email
HAVING COUNT(*)>1
) s
ON yourtable.id = s.id AND yourtable.email=s.email
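As an illustration of the two queries above, the following sketch runs them against an in-memory SQLite database using Python's sqlite3 module. The table deliberately has no primary key on id so that a duplicated (id, email) pair can exist; the rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No PRIMARY KEY on id here, so duplicate (id, email) pairs can be stored.
conn.execute(
    "CREATE TABLE yourtable (id INTEGER, email TEXT, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [
        (1, "a@x.org", "Ann", "Lee"),
        (1, "a@x.org", "Anne", "Lee"),  # same id and email as the row above
        (2, "b@x.org", "Bob", "Ray"),
        (3, "a@x.org", "Cy", "Dow"),    # same email, different id: not a duplicate pair
    ],
)

# Only the duplicated (id, email) pairs.
pairs = conn.execute(
    "SELECT id, email FROM yourtable GROUP BY id, email HAVING COUNT(*) > 1"
).fetchall()

# The full rows, obtained by joining the grouped result back to the table.
full_rows = conn.execute(
    "SELECT yourtable.* FROM yourtable "
    "INNER JOIN (SELECT id, email FROM yourtable "
    "            GROUP BY id, email HAVING COUNT(*) > 1) s "
    "ON yourtable.id = s.id AND yourtable.email = s.email "
    "ORDER BY yourtable.first_name"
).fetchall()

print(pairs)
print(full_rows)
```

The grouped query returns one row per duplicated pair, while the join returns every stored row that belongs to such a pair.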
You'll want something like this:
select field1,field2,field3, count(*)
from table_name
group by field1,field2,field3
having count(*) > 1
See also this question.
You can search for all ids that meet a specific count by grouping them and using a having clause like this:
SELECT id, COUNT(*) AS totalCount
FROM myTable
GROUP BY id
HAVING COUNT(*) > 1;
Any id this query returns has a duplicate. To check for duplicate emails, change the column you select and group by.
I've got a table like this, where I'm looking for unnecessary duplicate rows:
I want to find any rows where the First Name, Last Name, and Occupation columns are identical, in this case rows 1 and 3. I don't want to specify what the identical values should be, as I don't know them.
I've tried the answer to this question, but I don't think it applies to this case.
A simple solution is to group by all three columns and add a HAVING clause that keeps only the groups with duplicates:
SELECT
ID, FirstName, LastName, Occupation, Age
FROM table1
GROUP BY
FirstName,
LastName,
Occupation
HAVING COUNT(*) > 1
Here is a DEMO with two duplicate rows to show that it works.
EDIT:
My first understanding was that you wanted one row returned for each set of duplicates. If you want a query that returns all of the duplicate rows, here it is. This will return rows 1 and 3:
SELECT p1.* FROM people p
JOIN people p1
ON p1.firstname = p.firstname
AND p1.lastname = p.lastname
AND p1.occupation = p.occupation
GROUP BY p1.id
HAVING COUNT(*) > 1;
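A quick way to check the self-join is to run it on a tiny table. The sketch below uses Python's sqlite3 module with an in-memory database; the rows are invented so that rows 1 and 3 agree on first name, last name, and occupation, and the GROUP BY is qualified as p1.id to avoid any ambiguity between the two copies of the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, firstname TEXT, "
    "lastname TEXT, occupation TEXT, age INTEGER)"
)
# Hypothetical sample data: rows 1 and 3 match on the three compared columns.
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "Smith", "Plumber", 30),
        (2, "Jane", "Smith", "Plumber", 28),
        (3, "John", "Smith", "Plumber", 45),
        (4, "Mary", "Jones", "Baker", 52),
    ],
)

# Each row always matches itself once; a count above 1 means another row matches too.
duplicates = conn.execute(
    "SELECT p1.* FROM people p "
    "JOIN people p1 "
    "  ON p1.firstname = p.firstname "
    " AND p1.lastname = p.lastname "
    " AND p1.occupation = p.occupation "
    "GROUP BY p1.id "
    "HAVING COUNT(*) > 1 "
    "ORDER BY p1.id"
).fetchall()

print(duplicates)
```

Every row joins to itself, so the HAVING COUNT(*) > 1 filter is what separates rows with a real partner from unique ones.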
another DEMO
Self join (untested), assuming an id column that distinguishes the rows: SELECT DISTINCT a.* from your_table a, your_table b
where
a.fname = b.fname and a.lname = b.lname and a.occupation = b.occupation and a.id <> b.id
I've been researching the proper way to find duplicate rows based on specific fields for days now. I think I need a little more help -
SELECT *
FROM enrollees
INNER JOIN (SELECT first_name, last_name, address1, city, state, zip, program_instance_id, MIN(id) AS MinId, COUNT(id) AS count FROM enrollees GROUP BY first_name, last_name, address1, city, state, zip, program_instance_id) b
ON enrollees.first_name = b.first_name
AND enrollees.last_name = b.last_name
AND enrollees.address1 = b.address1
AND enrollees.city = b.city
AND enrollees.state = b.state
AND enrollees.zip = b.zip
AND count > 1
AND enrollees.program_instance_id = b.program_instance_id
AND enrollees.id != MinId;
The goal is to take the duplicates and put them in an archive table (enrollees_duplicates), then delete the duplicates from the live table (enrollees). I tried writing one query to find and insert the duplicate rows but it gives me the following error:
"Column count doesn't match value count at row 1"
The query I tried using is:
INSERT INTO enrollees_duplicates (SELECT *
FROM enrollees
INNER JOIN (SELECT first_name, last_name, address1, city, state, zip, program_instance_id, MIN(id) AS MinId, COUNT(id) AS count FROM enrollees GROUP BY first_name, last_name, address1, city, state, zip, program_instance_id) b
ON enrollees.first_name = b.first_name
AND enrollees.last_name = b.last_name
AND enrollees.address1 = b.address1
AND enrollees.city = b.city
AND enrollees.state = b.state
AND enrollees.zip = b.zip
AND count > 1
AND enrollees.program_instance_id = b.program_instance_id
AND enrollees.id != MinId);
I assume it is because I'm not retrieving all of the columns in the INNER JOIN select? If that's the case, wouldn't it still throw the same error if I changed it to SELECT * (with the MinId and count additions) because there would be two extra columns that don't exist in the new table?
Is there any way to do all of the work with an SQL query, without having to SELECT the duplicates, store them in a PHP array, use another SQL query to pull each row and INSERT it into the duplicate table, and then use yet another SQL query to delete the duplicate row?
My intention was to use two queries. One to insert all duplicate rows into the archive table and another to delete the duplicate rows. If it could, somehow, be made into one query that finds the duplicates, inserts them into the archive table, and then deletes them - all in one run, that would be even better.
I'm new to this field, so any help or guidance would be appreciated.
"Column count doesn't match value count at row 1"
The tables enrollees_duplicates and enrollees have different structures.
It might be better to use a DELETE trigger (http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html).
The solution to my problem is that when my first select was just '*', it was adding the two additional columns (MinId, count) to the result which made the column count different. By only grabbing the results of the 'enrollees' table and not the additional parameters of the subquery too, it corrects the column difference.
INSERT INTO enrollees_duplicates (SELECT enrollees.*
FROM enrollees
INNER JOIN (SELECT first_name, last_name, address1, city, state, zip, program_instance_id, MIN(id) AS MinId, COUNT(id) AS count FROM enrollees GROUP BY first_name, last_name, address1, city, state, zip, program_instance_id) b
ON enrollees.first_name = b.first_name
AND enrollees.last_name = b.last_name
AND enrollees.address1 = b.address1
AND enrollees.city = b.city
AND enrollees.state = b.state
AND enrollees.zip = b.zip
AND count > 1
AND enrollees.program_instance_id = b.program_instance_id
AND enrollees.id != MinId);
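For the second half of the goal (removing the archived rows from the live table), one possible sketch is to run the INSERT and a follow-up DELETE inside a single transaction, so that either both take effect or neither does. The example below uses Python's sqlite3 module with an in-memory database and a trimmed-down schema (only first_name, last_name, and program_instance_id are compared); the sample rows and the aliases `e` and `min_id` are assumptions made for illustration, and the DELETE assumes the archive table contains only rows that should leave the live table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down schema: the real tables have more columns (address1, city, ...).
conn.executescript("""
CREATE TABLE enrollees (
    id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, program_instance_id INTEGER);
CREATE TABLE enrollees_duplicates (
    id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, program_instance_id INTEGER);
INSERT INTO enrollees (first_name, last_name, program_instance_id) VALUES
    ('Ann', 'Lee', 7), ('Ann', 'Lee', 7), ('Bob', 'Ray', 7), ('Ann', 'Lee', 7);
""")

with conn:  # commits both statements together, or rolls both back on error
    # Step 1: archive every duplicate except the row with the smallest id.
    conn.execute("""
        INSERT INTO enrollees_duplicates
        SELECT e.* FROM enrollees e
        JOIN (SELECT first_name, last_name, program_instance_id, MIN(id) AS min_id
              FROM enrollees
              GROUP BY first_name, last_name, program_instance_id
              HAVING COUNT(*) > 1) b
          ON e.first_name = b.first_name
         AND e.last_name = b.last_name
         AND e.program_instance_id = b.program_instance_id
         AND e.id <> b.min_id
    """)
    # Step 2: remove the archived rows from the live table.
    conn.execute(
        "DELETE FROM enrollees WHERE id IN (SELECT id FROM enrollees_duplicates)"
    )

kept = [row[0] for row in conn.execute("SELECT id FROM enrollees ORDER BY id")]
archived = [row[0] for row in conn.execute("SELECT id FROM enrollees_duplicates ORDER BY id")]
print(kept, archived)
```

In MySQL the same two statements could be wrapped in START TRANSACTION and COMMIT, provided the tables use a transactional storage engine such as InnoDB.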