I've been researching the proper way to find duplicate rows based on specific fields for days now. I think I need a little more help -
SELECT *
FROM enrollees
INNER JOIN (SELECT first_name, last_name, address1, city, state, zip, program_instance_id, MIN(id) AS MinId, COUNT(id) AS count FROM enrollees GROUP BY first_name, last_name, address1, city, state, zip, program_instance_id) b
ON enrollees.first_name = b.first_name
AND enrollees.last_name = b.last_name
AND enrollees.address1 = b.address1
AND enrollees.city = b.city
AND enrollees.state = b.state
AND enrollees.zip = b.zip
AND count > 1
AND enrollees.program_instance_id = b.program_instance_id
AND enrollees.id != MinId;
The goal is to take the duplicates and put them in an archive table (enrollees_duplicates), then delete the duplicates from the live table (enrollees). I tried writing one query to find and insert the duplicate rows but it gives me the following error:
"Column count doesn't match value count at row 1"
The query I tried using is:
INSERT INTO enrollees_duplicates (SELECT *
FROM enrollees
INNER JOIN (SELECT first_name, last_name, address1, city, state, zip, program_instance_id, MIN(id) AS MinId, COUNT(id) AS count FROM enrollees GROUP BY first_name, last_name, address1, city, state, zip, program_instance_id) b
ON enrollees.first_name = b.first_name
AND enrollees.last_name = b.last_name
AND enrollees.address1 = b.address1
AND enrollees.city = b.city
AND enrollees.state = b.state
AND enrollees.zip = b.zip
AND count > 1
AND enrollees.program_instance_id = b.program_instance_id
AND enrollees.id != MinId);
I assume it is because I'm not retrieving all of the columns in the INNER JOIN select? If that's the case, wouldn't it still throw the same error if I changed it to SELECT * (with the MinId and count additions) because there would be two extra columns that don't exist in the new table?
Is there any way to do all of the work in SQL, without having to SELECT the duplicates, store them in a PHP array, and then run one query per row to INSERT it into the duplicate table and another to delete it from the live table?
My intention was to use two queries. One to insert all duplicate rows into the archive table and another to delete the duplicate rows. If it could, somehow, be made into one query that finds the duplicates, inserts them into the archive table, and then deletes them - all in one run, that would be even better.
Being new to this field, I would appreciate any help or guidance.
"Column count doesn't match value count at row 1"
The tables enrollees_duplicates and enrollees have different structures.
It might be better to use a BEFORE DELETE trigger on enrollees (http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html).
The solution to my problem is that when my first select was just '*', it was adding the two additional columns (MinId, count) to the result which made the column count different. By only grabbing the results of the 'enrollees' table and not the additional parameters of the subquery too, it corrects the column difference.
INSERT INTO enrollees_duplicates (SELECT enrollees.*
FROM enrollees
INNER JOIN (SELECT first_name, last_name, address1, city, state, zip, program_instance_id, MIN(id) AS MinId, COUNT(id) AS count FROM enrollees GROUP BY first_name, last_name, address1, city, state, zip, program_instance_id) b
ON enrollees.first_name = b.first_name
AND enrollees.last_name = b.last_name
AND enrollees.address1 = b.address1
AND enrollees.city = b.city
AND enrollees.state = b.state
AND enrollees.zip = b.zip
AND count > 1
AND enrollees.program_instance_id = b.program_instance_id
AND enrollees.id != MinId);
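For the second step, the delete, a sketch along the same lines might look like the following. It assumes both tables use InnoDB (so the transaction can be rolled back if either statement fails) and that enrollees_duplicates has the same columns in the same order as enrollees; the aliases e and b are only illustrative. As far as I know MySQL has no single statement that archives and deletes rows at once, so running the two statements inside one transaction is probably the closest thing to doing it "all in one run".

```sql
START TRANSACTION;

-- Step 1: archive every duplicate except the lowest id in each group.
INSERT INTO enrollees_duplicates
SELECT e.*
FROM enrollees e
INNER JOIN (
    SELECT first_name, last_name, address1, city, state, zip,
           program_instance_id, MIN(id) AS MinId
    FROM enrollees
    GROUP BY first_name, last_name, address1, city, state, zip,
             program_instance_id
    HAVING COUNT(id) > 1
) b
   ON e.first_name = b.first_name
  AND e.last_name = b.last_name
  AND e.address1 = b.address1
  AND e.city = b.city
  AND e.state = b.state
  AND e.zip = b.zip
  AND e.program_instance_id = b.program_instance_id
  AND e.id != b.MinId;

-- Step 2: delete exactly the same rows from the live table.
DELETE e
FROM enrollees e
INNER JOIN (
    SELECT first_name, last_name, address1, city, state, zip,
           program_instance_id, MIN(id) AS MinId
    FROM enrollees
    GROUP BY first_name, last_name, address1, city, state, zip,
             program_instance_id
    HAVING COUNT(id) > 1
) b
   ON e.first_name = b.first_name
  AND e.last_name = b.last_name
  AND e.address1 = b.address1
  AND e.city = b.city
  AND e.state = b.state
  AND e.zip = b.zip
  AND e.program_instance_id = b.program_instance_id
  AND e.id != b.MinId;

COMMIT;
```

Note that, as in the original query, rows with NULL in any of the compared columns never match on = and are therefore left alone.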
Related
Background: I have an orders table that contains address columns. I would like to update these with randomly picked addresses taken from a temporary table.
Both tables contain address, address1, city and postcode columns.
I was thinking the query would be something like:
UPDATE orders (address, address1, city, postcode)
VALUE
(SELECT address, address1, city, postcode
FROM addresses
ORDER BY RAND()
LIMIT 1)
Edit: Note that it needs to update all rows, each with a different value.
UPDATE orders
JOIN ( SELECT address, address1, city, postcode
FROM addresses
ORDER BY RAND()
LIMIT 1
) AS newdata
SET orders.address = newdata.address,
orders.address1 = newdata.address1,
orders.city = newdata.city,
orders.postcode = newdata.postcode
WHERE orders.id = 72;
I want to retrieve duplicates across columns 1, 2, 3 and 4.
What I have is this query.
But that is not the result I want; I just want this one (query 2).
I want to retrieve rows with the same name and the same dob but a different fname.
This is what I have in my code:
SELECT *
FROM demo WHERE (name, dob) in
(SELECT name, dob
FROM demo
GROUP BY name, dob
HAVING count(*) > 1)
ORDER BY name ASC
This query is based on a self join: each record in the table is compared with every other record. Following the requirement in the question, a record ends up in the result only if another record has the same "name" and "dob" but a different "fname".
-- DISTINCT keeps a row from repeating when it matches several others
select distinct a.* from TestStack a
join TestStack b on a.name = b.name and a.dob = b.dob
where a.fname <> b.fname
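If you would rather keep the shape of the query from the question, one possible variation is to count distinct fname values per group instead of counting rows. This is only a sketch against the demo table from the question; it assumes fname is never NULL, since COUNT(DISTINCT ...) ignores NULLs.

```sql
-- Keep only (name, dob) groups that contain at least two different fname values.
SELECT *
FROM demo
WHERE (name, dob) IN (
    SELECT name, dob
    FROM demo
    GROUP BY name, dob
    HAVING COUNT(DISTINCT fname) > 1
)
ORDER BY name ASC;
```

Unlike the self join, this also returns the exact duplicates (same fname) that sit inside a qualifying group, which may or may not be what you want.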
I want to know how I can find duplicate values in a table across several columns combined.
Suppose my table has the fields id || name || father_name || region || dob.
How can I get a result set such as:
i.e. I want to find all rows where three columns are the same.
select t1.*
from your_table t1
join
(
select name, father_name, region
from your_table
group by name, father_name, region
having count(*) >= 3
) t2 on t1.name = t2.name
and t1.father_name = t2.father_name
and t1.region = t2.region
If you are using MySQL 8.0 or later, you can make use of a window function. The query below uses one and returns the expected output:
select id, name, fatherName, country from (
select id,
name,
fatherName,
country,
count(id) over (partition by name, fatherName, country) cnt
from Tbl
) `a` where cnt > 1;
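A related sketch, in case you want only the "extra" copies and not the first row of each group: ROW_NUMBER() numbers the rows inside each partition, so anything past 1 is a repeat. It assumes MySQL 8.0 or later and the same Tbl columns as above; the aliases rn and ranked are made up for the example.

```sql
-- List every row except the lowest id in each (name, fatherName, country) group.
SELECT id, name, fatherName, country
FROM (
    SELECT id,
           name,
           fatherName,
           country,
           ROW_NUMBER() OVER (PARTITION BY name, fatherName, country
                              ORDER BY id) AS rn
    FROM Tbl
) ranked
WHERE rn > 1;
```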
I need this kind of query often too, where I have to compare all columns for equal values except the auto-incremented primary key id column.
In that case I always use GROUP BY.
Example:
SELECT A.*
FROM YourTable A
INNER JOIN (SELECT name,city,state
FROM YourTable
GROUP BY name,city,state
HAVING COUNT(*) > 1) B
ON A.name = B.name AND A.city = B.city AND A.state = B.state
You can add as many columns as you want to compare, both to the GROUP BY list and to the join condition.
Hope this helps in your case too.
I'm attempting to update a MySQL table to set the 'processed' column to '2' if there are duplicate entries for 'name' and 'address_1', but it's not working - as usual I think I'm just being a bit of a moron..
Here's what I'm trying
UPDATE `records`
SET `processed`='2', `count` = (SELECT COUNT(`user`)
FROM `records`
WHERE `name`<>''
AND `address_1`<>'')
WHERE `count`=> '1';
Basically, if there's more than one 'name' and 'address_1' then the 'processed' field needs updating to '2'..
You could use a query like this one to return duplicated names and addresses:
SELECT name, address_1, COUNT(*) cnt
FROM records
GROUP BY name, address_1
HAVING COUNT(*)>1
and then join this query to the records table, and update the column processed to 2 where the join succeeds:
UPDATE
records INNER JOIN (SELECT name, address_1, COUNT(*) cnt
FROM records
GROUP BY name, address_1
HAVING COUNT(*)>1) duplicates
ON records.name = duplicates.name
AND records.address_1=duplicates.address_1
SET
`processed`='2',
`count` = duplicates.cnt
WHERE
records.`name`<>''
AND records.`address_1`<>''
I have a MySQL TABLE. It contains mailing addresses we get from a data feed. But there are no customer records for the mailing addresses, so I don't have an easy way to match a customer record as a key to see if it exists already in the master TABLE. So I've decided to have the new daily data feed added to the master TABLE and then remove duplicates.
What is the safest way to remove duplicates? Obviously, I want to ignore the ID column field. But how do I do this for the following fields:
company_name
contact_name
address1
address2
address3
city
state
zipcode
phone_number
email_address
What if I rebuild the MySQL TABLE to include ALTER TABLE with UNIQUE KEY, would that be safe? For example:
ALTER TABLE people ADD UNIQUE KEY (company_name,contact_name,address1,address2,address3,city,state,zipcode,phone_number,email_address)
Would the above safely prevent duplicated records from being INSERTed to begin with?
Thanks!
This is the simplest query you can use.
Choose MAX or MIN depending on which row of each group you want to keep.
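DELETE
FROM MyTable
WHERE ID NOT IN
(
-- MySQL cannot select from the table being deleted from (error 1093),
-- so the grouped subquery is wrapped in a derived table.
SELECT KeepId FROM (
SELECT MAX(ID) AS KeepId
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3) AS keepers)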
Thanks
DELETE a FROM test a
LEFT JOIN
(
SELECT MIN(id) AS id, company_name, contact_name, address1, address2, address3, city, state, zipcode, phone_number, email_address
FROM test
GROUP BY company_name, contact_name, address1, address2, address3, city, state, zipcode, phone_number, email_address
-- Joining on id alone is enough; comparing the other columns with = fails on NULL values and would delete the row that should be kept.
) b ON a.id = b.id
WHERE b.id IS NULL
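On the UNIQUE KEY idea from the question, here is a rough sketch of how it could be combined with INSERT IGNORE once the existing duplicates are gone. Assumptions: the table is called people as in the question, the key name uniq_contact and the sample values are made up, and the combined column sizes fit within the storage engine's index key length limit. Two caveats worth checking first: the ALTER TABLE fails with a duplicate-entry error if duplicates are still present, and a MySQL unique index treats NULLs as distinct, so rows with NULL in any keyed column can still be duplicated.

```sql
-- Run only after the existing duplicates have been deleted.
ALTER TABLE people
    ADD UNIQUE KEY uniq_contact (company_name, contact_name, address1, address2,
                                 address3, city, state, zipcode, phone_number,
                                 email_address);

-- With the key in place, INSERT IGNORE skips a row that would duplicate an
-- existing one (raising a warning) instead of aborting the whole feed load.
-- The values below are hypothetical sample data.
INSERT IGNORE INTO people
    (company_name, contact_name, address1, address2, address3,
     city, state, zipcode, phone_number, email_address)
VALUES
    ('Acme Ltd', 'Jane Doe', '1 Main St', '', '',
     'Springfield', 'IL', '62701', '555-0100', 'jane@example.com');
```

Keep in mind that INSERT IGNORE also turns some other errors into warnings, so it may hide problems unrelated to duplicates.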