finding duplicate rows on more than one field

I am using this query to find duplicates based on two fields:
SELECT
last_name,
first_name,
middle_initial,
COUNT(last_name) AS Duplicates,
IF(rec_id = '', 1, 0) AS has_REC_ID
FROM files
GROUP BY last_name, first_name
HAVING COUNT(last_name) > 1 AND COUNT(first_name) > 1;
Okay, what this returns is a set of rows with first, last, and middle names, a column called 'Duplicates' with a lot of 2s, and a column called has_REC_ID with mixed 1s and 0s.
Ultimately, what I'm trying to do is find which rows have matching first and last names, and then, for each of those pairs, find the one that has an empty string ('') as its rec_id, assign it the rec_id value from the one that DOES have a rec_id, and then delete the record that had the rec_id in the first place.
So for starters I thought I would create a new column and do something like this:
UPDATE files a
SET a.has_dup -- new column
= if(a.last_name IN (
SELECT b.last_name
FROM files b
GROUP BY b.last_name
HAVING COUNT(b.last_name) > 1
)
, 1, null);
But MySQL returns: "You can't specify target table 'a' for update in from clause"
I'll bet there's something much less ridiculous than the method I'm trying here. Can someone please help me figure out what that is?
UPDATE: I also tried:
UPDATE files a
SET a.has_dup = 1
WHERE a.last_name IN (
SELECT b.last_name
FROM files b
GROUP BY b.last_name
HAVING COUNT(b.last_name) > 1
);
...and got the same error message.

You could:
1) Create a holding table
2) Populate the holding table with those rows that have a matching first and last name and have rec_id != ""
3) Delete the rows from the original table (files) that have a matching first and last name and have rec_id != ""
4) Update the rows in the original table that have a matching first and last name and have rec_id = "".
5) Drop the holding table
So something like:
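create table temp
(
first_name varchar(100) not null,
last_name varchar(100) not null,
rec_id varchar(100) not null -- use the same type as files.rec_id
);
-- 2) rows with a rec_id whose name pair also has a row with rec_id = ''
insert into temp (first_name, last_name, rec_id)
select f.first_name, f.last_name, f.rec_id
from files f
where f.rec_id != ''
and exists (select 1 from files d
where d.first_name = f.first_name and d.last_name = f.last_name and d.rec_id = '');
-- 3) delete those rows from files
delete f from files f
join temp t on t.first_name = f.first_name and t.last_name = f.last_name and t.rec_id = f.rec_id;
-- 4) copy the saved rec_id onto the remaining row of each pair
update files f
join temp t on t.first_name = f.first_name and t.last_name = f.last_name
set f.rec_id = t.rec_id
where f.rec_id = '';
drop table temp;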
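A minimal sketch of the five steps above, using Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL. SQLite has no multiple-table DELETE/UPDATE ... JOIN, so correlated EXISTS subqueries take their place; the table contents and the holding-table name hold are invented for illustration, and rec_id is assumed to be a text column where '' means "missing":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, rec_id TEXT);
INSERT INTO files (first_name, last_name, rec_id) VALUES
  ('Ann', 'Lee', ''), ('Ann', 'Lee', 'R17'),
  ('Bob', 'Ray', ''),
  ('Cy', 'Dow', 'R42');

-- 1) create the holding table
CREATE TEMP TABLE hold (first_name TEXT, last_name TEXT, rec_id TEXT);

-- 2) save filled-in rows whose name pair also has a row with rec_id = ''
INSERT INTO hold
SELECT f.first_name, f.last_name, f.rec_id
FROM files f
WHERE f.rec_id != ''
  AND EXISTS (SELECT 1 FROM files d
              WHERE d.first_name = f.first_name
                AND d.last_name = f.last_name
                AND d.rec_id = '');

-- 3) delete the saved rows from files
DELETE FROM files
WHERE EXISTS (SELECT 1 FROM hold h
              WHERE h.first_name = files.first_name
                AND h.last_name = files.last_name
                AND h.rec_id = files.rec_id);

-- 4) copy the saved rec_id onto the remaining row of each pair
UPDATE files
SET rec_id = (SELECT h.rec_id FROM hold h
              WHERE h.first_name = files.first_name
                AND h.last_name = files.last_name)
WHERE rec_id = ''
  AND EXISTS (SELECT 1 FROM hold h
              WHERE h.first_name = files.first_name
                AND h.last_name = files.last_name);

-- 5) drop the holding table
DROP TABLE hold;
""")

rows = conn.execute("SELECT first_name, last_name, rec_id FROM files ORDER BY id").fetchall()
print(rows)
```

Only pairs that actually have an empty rec_id are touched: Bob keeps his empty rec_id because there is no second Bob Ray row, and Cy keeps R42.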

From the documentation:
Currently, you cannot update a table and select from the same table in a subquery.
I can't think of a quick workaround to that.
Update
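Apparently, there is a "quick" workaround, but whether or not it's performant is another issue. It's all about adding a layer of indirection: wrap the table in a derived table (a subquery in the FROM clause), which MySQL materializes as a temporary table, so the subquery no longer reads the update target directly: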
UPDATE files a
SET a.has_dup -- new column
= if(a.last_name IN (
SELECT b.last_name
FROM
(SELECT * FROM files) -- new table target
b
GROUP BY b.last_name
HAVING COUNT(b.last_name) > 1
),
1, null);
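The queries above group on last_name only, although the title asks about duplicates on more than one field. A sketch of the pair-wise version, again run through Python's sqlite3 purely to illustrate the grouping (SQLite does not raise MySQL's "can't specify target table" error, so in MySQL you would still keep the extra derived-table layer); the data is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, has_dup INTEGER);
INSERT INTO files (last_name, first_name) VALUES
  ('Lee', 'Ann'), ('Lee', 'Ann'),   -- same last AND first name
  ('Lee', 'Bob'),                   -- same last name only
  ('Ray', 'Cy');
""")

# Group on BOTH name columns, so only full-name duplicates get flagged;
# Bob Lee shares a last name with Ann Lee but is not a duplicate.
conn.execute("""
UPDATE files
SET has_dup = 1
WHERE EXISTS (
  SELECT 1
  FROM (SELECT last_name, first_name
        FROM files
        GROUP BY last_name, first_name
        HAVING COUNT(*) > 1) AS d
  WHERE d.last_name = files.last_name
    AND d.first_name = files.first_name)
""")

flags = conn.execute("SELECT has_dup FROM files ORDER BY id").fetchall()
print(flags)
```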

I don't have MySQL available to test, but I thought this should work (EDIT: it fails with the same error):
UPDATE files
SET has_dup
= if(last_name IN (
SELECT b.last_name
FROM files b
GROUP BY b.last_name
HAVING COUNT(b.last_name) > 1
)
, 1, null);
EDIT: another try, updating through a join to the grouped subquery:
UPDATE files f, (SELECT b.last_name
FROM files b
GROUP BY b.last_name
HAVING COUNT(b.last_name) > 1
) as duplicates
SET f.has_dup = 1
WHERE f.last_name = duplicates.last_name

Related

Merge two databases, without duplicates, with FKs references [duplicate]

I have two mdb files.
I can also convert them to a MySQL database, if necessary.
How can I merge these two different dbs into a single one?
The idea is to get all the info from both dbs and merge it into one, without duplicating any client.
The problem is that both dbs have some of the same clients as well as different ones, but the PKs of the clients aren't the same in each.
Every row has a unique field; I guess that could help somehow.
Any idea how I can do that?

A UNION of all columns except the PKs will give you only distinct rows:
insert into new_table (<non-pk columns>)
select <non-pk columns> from tableA
union
select <non-pk columns> from tableB
Note: union removes duplicates.
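To see what "union removes duplicates" means in practice, here is a small sketch with Python's sqlite3 and two invented tables, tableA and tableB. Note that UNION only collapses rows that are identical in every selected column, so a client whose phone number differs between the two files would still appear twice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (pk INTEGER PRIMARY KEY, name TEXT, tel TEXT);
CREATE TABLE tableB (pk INTEGER PRIMARY KEY, name TEXT, tel TEXT);
-- 'Ann' appears in both tables under different primary keys
INSERT INTO tableA VALUES (1, 'Ann', '555-0101'), (2, 'Bob', '555-0102');
INSERT INTO tableB VALUES (7, 'Ann', '555-0101'), (8, 'Cy', '555-0103');
CREATE TABLE new_table (pk INTEGER PRIMARY KEY, name TEXT, tel TEXT);
""")

# UNION compares whole rows of the non-PK columns and drops exact duplicates;
# new_table then assigns fresh primary keys.
conn.execute("""
INSERT INTO new_table (name, tel)
SELECT name, tel FROM tableA
UNION
SELECT name, tel FROM tableB
""")
merged = conn.execute("SELECT name, tel FROM new_table ORDER BY name").fetchall()

# UNION ALL keeps every row, so Ann would be counted twice.
union_all_count = conn.execute("""
SELECT COUNT(*) FROM (SELECT name, tel FROM tableA
                      UNION ALL
                      SELECT name, tel FROM tableB)
""").fetchone()[0]

print(merged, union_all_count)
```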

I would run an UPDATE to populate one of the tables with all the info available.
Assuming the first table has all names that the second table has (that there are no name values in table 2 that are not in table 1) you should be able to run the following update to make the first table complete:
update tclient1 t join (select name,
max(tel) as tel_filled,
max(address) as add_filled
from (select name, tel, address
from tclient1
union all
select name, tel, address
from tclient2) u
group by name) x on t.name = x.name
set t.tel = x.tel_filled, t.address = x.add_filled;
See fiddle: http://sqlfiddle.com/#!2/3e7dc/1/0
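A sketch of the same fill-in idea with Python's sqlite3, assuming missing values are NULL and the tables are keyed by name. SQLite versions before 3.33 have no UPDATE ... FROM/JOIN, so each column gets its own correlated subquery over the UNION ALL; the data is invented. If both tables hold different non-NULL values, MAX() simply keeps the larger one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tclient1 (name TEXT PRIMARY KEY, tel TEXT, address TEXT);
CREATE TABLE tclient2 (name TEXT PRIMARY KEY, tel TEXT, address TEXT);
INSERT INTO tclient1 VALUES ('Ann', '555-0101', NULL), ('Bob', NULL, NULL);
INSERT INTO tclient2 VALUES ('Ann', NULL, '1 Main St'), ('Bob', '555-0102', '2 Oak Ave');
""")

# MAX() ignores NULLs, so for each name it picks whichever table has a value.
# Both columns are assigned in one comma-separated SET list.
conn.execute("""
UPDATE tclient1
SET tel = (SELECT MAX(tel) FROM (SELECT name, tel FROM tclient1
                                 UNION ALL
                                 SELECT name, tel FROM tclient2) u
           WHERE u.name = tclient1.name),
    address = (SELECT MAX(address) FROM (SELECT name, address FROM tclient1
                                         UNION ALL
                                         SELECT name, address FROM tclient2) u
               WHERE u.name = tclient1.name)
""")

rows = conn.execute("SELECT name, tel, address FROM tclient1 ORDER BY name").fetchall()
print(rows)
```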

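Disable foreign keys (in MySQL, for example, SET FOREIGN_KEY_CHECKS = 0;)
Update the keys in the 2nd DB to make them unique, for instance:
update Client
set id_client = id_client + 100000000;
update History
set id_client = id_client + 100000000,
id_history = id_history + 10000000;
Re-enable foreign keys to check integrity (in MySQL, SET FOREIGN_KEY_CHECKS = 1 does not re-check rows that already exist, so verify the references yourself)
Export the 2nd DB as SQL INSERT statements and execute them in the 1st DB.
Make backups first, please.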
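The key-shifting steps can be sketched with two in-memory SQLite databases standing in for the two files (hypothetical data). SQLite does not enforce foreign keys unless PRAGMA foreign_keys is turned on, so the disable/enable steps are implicit here, and the export/import step is a plain row copy:

```python
import sqlite3

db1 = sqlite3.connect(":memory:")
db2 = sqlite3.connect(":memory:")
schema = """
CREATE TABLE Client (id_client INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE History (id_history INTEGER PRIMARY KEY,
                      id_client INTEGER REFERENCES Client(id_client), note TEXT);
"""
# Both databases use id 1 for different clients.
db1.executescript(schema + """
INSERT INTO Client VALUES (1, 'Ann');
INSERT INTO History VALUES (1, 1, 'first call');
""")
db2.executescript(schema + """
INSERT INTO Client VALUES (1, 'Bob');
INSERT INTO History VALUES (1, 1, 'order');
""")

# Shift every key in the second database by the answer's offsets, in the
# parent and the child table, so the references still line up.
db2.executescript("""
UPDATE Client  SET id_client = id_client + 100000000;
UPDATE History SET id_client = id_client + 100000000,
                   id_history = id_history + 10000000;
""")

# "Export and execute": copy the shifted rows into the first database.
db1.executemany("INSERT INTO Client VALUES (?, ?)",
                db2.execute("SELECT id_client, name FROM Client"))
db1.executemany("INSERT INTO History VALUES (?, ?, ?)",
                db2.execute("SELECT id_history, id_client, note FROM History"))

# Every history row should still point at an existing client.
orphans = db1.execute("""
SELECT COUNT(*) FROM History h
LEFT JOIN Client c ON c.id_client = h.id_client
WHERE c.id_client IS NULL
""").fetchone()[0]
pairs = db1.execute("""
SELECT c.name, h.note FROM History h JOIN Client c USING (id_client)
ORDER BY h.id_history
""").fetchall()
print(orphans, pairs)
```

This only avoids key collisions; a client that exists in both databases still ends up as two rows.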

Here is one approach that assumes that name is the match between the two rows. It counts how many of the fields are filled in on each side and takes the row from the source with more of them. This version uses union all with a comparison in each where clause: >= in the first branch (so ties go to db1) and > in the second:
insert into client(id, name, tel, address)
select id, name, tel, address
from db1.client c1
where ((id is not null) + (tel is not null) + (address is not null)) >=
(select (id is not null) + (tel is not null) + (address is not null)
from db2.client c2
where c1.name = c2.name
)
union all
select id, name, tel, address
from db2.client c2
where ((id is not null) + (tel is not null) + (address is not null)) >
(select (id is not null) + (tel is not null) + (address is not null)
from db1.client c1
where c2.name = c1.name
);
Note: the above version assumes that name is in both tables (as in the example in your question) and there are no duplicates. It can be easily modified if this isn't the case.
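A runnable sketch of the same query with Python's sqlite3, where the hypothetical tables client1 and client2 stand in for db1.client and db2.client. In SQLite, as in MySQL, each IS NOT NULL test evaluates to 0 or 1, so the sums count the filled-in fields:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client1 (id INTEGER, name TEXT, tel TEXT, address TEXT);
CREATE TABLE client2 (id INTEGER, name TEXT, tel TEXT, address TEXT);
CREATE TABLE client  (id INTEGER, name TEXT, tel TEXT, address TEXT);
INSERT INTO client1 VALUES (1, 'Ann', '555-0101', NULL), (2, 'Bob', NULL, NULL);
INSERT INTO client2 VALUES (5, 'Ann', NULL, NULL), (6, 'Bob', '555-0102', '2 Oak Ave');
""")

# Score = number of filled-in fields; keep the higher-scoring copy of each
# name, with ties going to client1 because the first branch uses >=.
filled = "((id IS NOT NULL) + (tel IS NOT NULL) + (address IS NOT NULL))"
conn.execute(f"""
INSERT INTO client (id, name, tel, address)
SELECT id, name, tel, address FROM client1 c1
WHERE {filled} >= (SELECT {filled} FROM client2 c2 WHERE c2.name = c1.name)
UNION ALL
SELECT id, name, tel, address FROM client2 c2
WHERE {filled} > (SELECT {filled} FROM client1 c1 WHERE c1.name = c2.name)
""")

merged = conn.execute("SELECT id, name, tel, address FROM client ORDER BY name").fetchall()
print(merged)
```

A name that exists in only one table makes its subquery return NULL, so the comparison is not true and that row is dropped, which is why the answer assumes every name is present in both tables.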

mysql - How can I correct an auto-increment field that has deleted rows (1,2,3,4,5 - is now 1,3,5) but I want it to be 1,2,3

I have a table in a database that I made using an "auto-increment" primary key. Only now I went back and deleted some of the rows.
The problem is that I would like the table to be numbered 1->X without any holes in the numbered column (in other words, it should be 1,2,3,4,5,6,7,8,9,10, but now it's more like 1,2,3,7,8,10).
Is there a way I can reset the values of this column so they are incremented correctly via MySQL?

Use a user-defined variable, which is incremented for every row, and an ORDER BY to prevent duplicate ids during the update process:
SET @x := 0;
UPDATE mytable SET
id = (@x := @x + 1)
ORDER BY id;
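SQLite (in its default build) has no UPDATE ... ORDER BY, so this sketch applies the same renumbering row by row from Python, in ascending id order, on an invented table. It shows why the ordering matters: each new id is no larger than the old one, and every smaller id has already been renumbered, so the target value is never held by another row. Ids stored in other tables are not updated:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c"), (7, "d"), (8, "e"), (10, "f")])

# Walk the existing ids in ascending order and give them 1, 2, 3, ...
# Processing in this order means the primary key is never violated mid-way.
old_ids = [row[0] for row in conn.execute("SELECT id FROM mytable ORDER BY id")]
for new_id, old_id in enumerate(old_ids, start=1):
    conn.execute("UPDATE mytable SET id = ? WHERE id = ?", (new_id, old_id))

renumbered = conn.execute("SELECT id, val FROM mytable ORDER BY id").fetchall()
print(renumbered)
```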

You can also number the rows in a SELECT, without changing the stored ids:
SELECT @s:=@s+1 AS fake_id, tbl.*
FROM tbl, (SELECT @s:= 0) AS tmp_tbl

UPDATE yourtable t
JOIN (SELECT id oldid, @id := @id + 1 newid
FROM yourtable
JOIN (SELECT @id := 0) var
ORDER BY oldid) newid
ON t.id = oldid
SET t.id = newid
The ORDER BY and ON conditions are needed to prevent it from temporarily creating duplicate keys, which causes an error.

MySQL - tell if column _all_ has same value

I'm trying to write a query like
if (select count(*) from Users where fkId=5000 and status='r') =
(select count(*) from Users where fkId=5000) then ..
in just one query.
What this means is, if all the rows that have fkId=5000 also have status=r, then do something.
There can be any number of rows with fkId=5000, and any fraction of those rows could have status=r, status=k, status=l, status=a etc. I'm interested in the case where ALL the rows that have fkId=5000 also have status=r (and not any other status).
The way I'm doing it now is
how many rows with fkId=5000 and status = 'r'?
how many rows with fkId=5000?
are those numbers equal? then ..
I'm trying to figure out how to rewrite this using only 1 query instead of 2. The ALL keyword didn't seem to let me write such a query (<> ALL is equivalent to NOT IN). I tried a couple of GROUP BY formulations but could not get the correct result.

The most efficient way to do this is:
if not exists (select 1
from users
where fkid = 5000 and (status <> 'r' or status is null)
)
This will stop the query at the first non-matching row.
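A sketch of the NOT EXISTS test wrapped in a Python helper, with sqlite3 standing in for MySQL and made-up rows. SQLite evaluates SELECT NOT EXISTS (...) to 1 or 0; note that an fkId with no rows at all counts as "all r":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, fkId INTEGER, status TEXT)")
conn.executemany("INSERT INTO Users (fkId, status) VALUES (?, ?)",
                 [(5000, "r"), (5000, "r"), (6000, "r"), (6000, "k"), (7000, None)])

def all_status_r(fk_id):
    """True when no row for fk_id has a status other than 'r' (NULL counts as other)."""
    (ok,) = conn.execute("""
        SELECT NOT EXISTS (SELECT 1 FROM Users
                           WHERE fkId = ? AND (status <> 'r' OR status IS NULL))
    """, (fk_id,)).fetchone()
    return ok == 1

results = tuple(all_status_r(fk) for fk in (5000, 6000, 7000, 9999))
print(results)
```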

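I suggest checking for any rows with a status other than 'r':
SELECT count(*)>0 FROM Users WHERE fkId = 5000 AND status != 'r'
This returns 1 if at least one row for fkId 5000 has another status (so not all of them are 'r') and 0 otherwise; rows with a NULL status are not matched by != 'r'.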

In the following case, if the number 1 is "true" (which it is) then you'll get Yes back, and if not you'll get No back:
SELECT IF(1, 'Yes', 'No') AS yesorno
(Go ahead, try it!)
In your case however, the following would be more appropriate:
SELECT IF (
(SELECT COUNT(*) FROM Users WHERE fkId=5000 AND status = 'r') = (SELECT COUNT(*) FROM Users WHERE fkId=5000),
'They are equal.',
'They are not equal.'
)
AS are_they_equal
By adding AS, you can manipulate the name of the "column" that's returned to you.
Hope that helps! :)

EASY!
Simply join back to the same table. Here is the complete code for testing:
CREATE TABLE Users(id int NOT NULL AUTO_INCREMENT, fkID int NOT NULL, status char(1), PRIMARY KEY (id));
INSERT Users (fkID, status) VALUES (5000, 'r');
INSERT Users (fkID, status) VALUES (5000, 'r');
INSERT Users (fkID, status) VALUES (5000, 'r');
-- The next query produces "0" to indicate no mismatches
SELECT COUNT(*) FROM Users u1 LEFT JOIN Users u2 ON u1.id=u2.id AND u2.status='r' WHERE u1.fkID=5000 AND u2.id IS NULL;
-- now change one record to create a mismatch
UPDATE Users SET status='l' WHERE id=3 ;
-- The next query produces "1" to indicate 1 mismatch
SELECT COUNT(*) FROM Users u1 LEFT JOIN Users u2 ON u1.id=u2.id AND u2.status='r' WHERE u1.fkID=5000 AND u2.id IS NULL;
DROP TABLE Users;
So all you need to test is that the result is 0 (zero), meaning every row that has fkID=5000 also has status='r'.
If you properly index your table then joining back to the same table is not an issue and certainly beats having to do a 2nd query.

Besides the NOT EXISTS version, which should be the most efficient because it does no counting at all and stops as soon as it finds a row that doesn't match the conditions, there is one more way. It works if status is not nullable, and it is efficient if there is an index on (fkId, status):
IF EXISTS
( SELECT 1
FROM Users
WHERE fkId = 5000
HAVING MIN(status) = 'r'
AND MAX(status) = 'r'
)
There is one difference though. The above will show false if there are no rows at all with fkId=5000, while the NOT EXISTS version will show true - which is probably what you want anyway.
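A sketch comparing the MIN/MAX idea with NOT EXISTS on invented data, using Python's sqlite3. Because some SQLite versions reject HAVING without GROUP BY, it uses a plain aggregate instead: with no matching rows, MIN and MAX are NULL, so the test is not true, which mirrors the empty-result difference described above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, fkId INTEGER, status TEXT NOT NULL)")
conn.execute("CREATE INDEX ix_users_fk_status ON Users (fkId, status)")
conn.executemany("INSERT INTO Users (fkId, status) VALUES (?, ?)",
                 [(5000, "r"), (5000, "r"), (6000, "r"), (6000, "k")])

def all_r_minmax(fk_id):
    # One aggregate row; MIN = MAX = 'r' means every status is 'r'.
    # With no matching rows MIN/MAX are NULL, so the result is NULL, not 1.
    (ok,) = conn.execute(
        "SELECT MIN(status) = 'r' AND MAX(status) = 'r' FROM Users WHERE fkId = ?",
        (fk_id,)).fetchone()
    return ok == 1

def all_r_not_exists(fk_id):
    # status is NOT NULL here, so no extra IS NULL test is needed.
    (ok,) = conn.execute(
        "SELECT NOT EXISTS (SELECT 1 FROM Users WHERE fkId = ? AND status <> 'r')",
        (fk_id,)).fetchone()
    return ok == 1

for fk in (5000, 6000, 9999):
    print(fk, all_r_minmax(fk), all_r_not_exists(fk))
```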

Insert values in new column in Mysql

I have a table T with 3 rows of data. Now I have added a new column c.
Now I want to insert values into c for the existing rows.
I do it like this:
insert into T (c) values(1),(2),(3);
But instead of updating the existing data, it inserted new rows.
How can I update the existing data?
I don't want to specify a WHERE clause. I just want to add the values one per row, in order, as INSERT does.

You would use the UPDATE statement to assign values to columns of existing rows.
UPDATE t
SET t.c = 1
WHERE t.a = 1 ;
To assign a unique, sequential integer to each existing row, you'd need to make use of the primary key or another unique identifier of each row. In this example, we assume that the id column is unique:
UPDATE t
JOIN ( SELECT r.id
, @i := @i + 1 AS i
FROM t r
JOIN (SELECT @i := 0) n
ORDER BY r.id
) s
ON s.id = t.id
SET t.c = s.i
Actually, you can also do this (as a single-table UPDATE, because MySQL does not allow ORDER BY in a multiple-table UPDATE):
SET @i := 0;
UPDATE t
SET t.c = (@i := @i + 1)
ORDER BY t.id;
It sounds like you might want to investigate the AUTO_INCREMENT attribute.
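If the values you want to assign come from application code, one option is to pair them with the existing rows in id order and issue one UPDATE per row. A minimal sketch with Python's sqlite3, assuming an id primary key as in the answer above; the table contents and the values 10, 20, 30 are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (id INTEGER PRIMARY KEY, a TEXT)")
conn.executemany("INSERT INTO T (a) VALUES (?)", [("x",), ("y",), ("z",)])
conn.execute("ALTER TABLE T ADD COLUMN c INTEGER")  # the new, empty column

# Values meant for the existing rows, in the same order as the rows' ids.
new_values = [10, 20, 30]

ids = [row[0] for row in conn.execute("SELECT id FROM T ORDER BY id")]
if len(ids) != len(new_values):
    raise ValueError("need exactly one value per existing row")
# UPDATE, not INSERT: each value goes into the row with the matching id.
conn.executemany("UPDATE T SET c = ? WHERE id = ?", zip(new_values, ids))

result = conn.execute("SELECT id, a, c FROM T ORDER BY id").fetchall()
print(result)
```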

You can use a query like the one below:
UPDATE YourTable
SET Col1='1', Col2='2'
and so on...
You can also add a WHERE condition if you only want to update rows that match some condition, as below:
(This is just an example of a query used in my own database.)
UPDATE City_Employee
SET City='Banglore' where (City='Delhi')
Best of luck!

Update with SELECT and group without GROUP BY

I have a table like this (MySQL 5.0.x, MyISAM):
response{id, title, status, ...} (status: 1 new, 3 multi)
I would like to update the status from new (status=1) to multi (status=3) of all the responses if at least 20 have the same title.
I have this query, but it does not work:
UPDATE response SET status = 3 WHERE status = 1 AND title IN (
SELECT title FROM (
SELECT DISTINCT(r.title) FROM response r WHERE EXISTS (
SELECT 1 FROM response spam WHERE spam.title = r.title LIMIT 20, 1)
)
as u)
Please note:
I do the nested select to avoid the famous "You can't specify target table 'response' for update in FROM clause" error.
I cannot use GROUP BY for performance reasons. The query cost with a solution using LIMIT is way better (but it is less readable).
EDIT:
It is possible to SELECT from an UPDATE target in MySQL by wrapping the subquery in a derived table, as the query above does.
The problem is that the data selected is totally wrong.
The only solution I found which works is with a GROUP BY:
UPDATE response SET status = 3
WHERE status = 1 AND title IN (SELECT title
FROM (SELECT title
FROM response
GROUP BY title
HAVING COUNT(1) >= 20)
as derived_response)
Thanks for your help! :)

MySQL doesn't like it when you try to UPDATE and SELECT from the same table in one query. It has to do with locking priorities, etc.
Here's how I would solve this problem:
SELECT CONCAT('UPDATE response SET status = 3 ',
'WHERE status = 1 AND title = ', QUOTE(title), ';') AS stmt
FROM response
GROUP BY title
HAVING COUNT(*) >= 20;
This query produces a series of UPDATE statements, with the quoted titles that deserve to be updated embedded. Capture the result and run it as an SQL script.
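The generate-then-run idea can also be driven from application code: read the qualifying titles with GROUP BY, then run one parameterized UPDATE per title. A sketch with Python's sqlite3 and invented rows; the bound parameter does the job that QUOTE() does in the generated script:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE response (id INTEGER PRIMARY KEY, title TEXT, status INTEGER)")
rows = [("spam", 1)] * 20 + [("ham", 1)] * 3 + [("spam", 3)]
conn.executemany("INSERT INTO response (title, status) VALUES (?, ?)", rows)

# Step 1: find the titles that occur at least 20 times.
titles = [t for (t,) in conn.execute(
    "SELECT title FROM response GROUP BY title HAVING COUNT(*) >= 20")]

# Step 2: one UPDATE per title, with the title bound as a parameter.
conn.executemany("UPDATE response SET status = 3 WHERE status = 1 AND title = ?",
                 [(t,) for t in titles])

counts = conn.execute(
    "SELECT title, status, COUNT(*) FROM response "
    "GROUP BY title, status ORDER BY title, status").fetchall()
print(titles, counts)
```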
I understand that GROUP BY in MySQL often incurs a temporary table, and this can be costly. But is that a deal-breaker? How frequently do you need to run this query? Besides, any other solutions are likely to require a temporary table too.
I can think of one way to solve this problem without using GROUP BY:
CREATE TEMPORARY TABLE titlecount (c INTEGER, title VARCHAR(100) PRIMARY KEY);
INSERT INTO titlecount (c, title)
SELECT 1, title FROM response
ON DUPLICATE KEY UPDATE c = c+1;
UPDATE response JOIN titlecount USING (title)
SET response.status = 3
WHERE response.status = 1 AND titlecount.c >= 20;
But this also uses a temporary table, which is what you were trying to avoid by not using GROUP BY in the first place.

I would write something straightforward, like this:
UPDATE `response`, (
SELECT title, count(title) as count from `response`
WHERE status = 1
GROUP BY title
) AS tmp
SET response.status = 3
WHERE status = 1 AND response.title = tmp.title AND count >= 20;
Is GROUP BY really that slow? The solution you tried to implement re-queries the same table for every row, so even if it worked it should be much slower than a single GROUP BY.

This is a funny peculiarity with MySQL - I can't think of a way to do it in a single statement (GROUP BY or no GROUP BY).
You could select the appropriate response rows into a temporary table first then do the update by selecting from that temp table.

You'll have to use a temporary table:
create temporary table r_update (title varchar(10));
insert r_update
select title
from response
group by title
having count(*) < 20;
update response r
left outer join r_update ru
on ru.title = r.title
set r.status = case when ru.title is null then 3 else 1 end
where r.status = 1;
