Show entire row for duplicates - duplicates

I can find the duplicates in the rawHR table, but now I need to show all of the remaining columns for those duplicates. Here's what I've tried, but it gives me all employees, not just the duplicated ones:
select *
from rawHR
where exists
(
select distinct firstname, lastname
from rawHR
group by firstname, lastname
having count(*)> 1
)
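A likely reason the attempt returns every employee: the EXISTS subquery is not correlated with the outer row, so it is true for all rows as soon as any duplicate exists. One possible fix is to join each row to the set of duplicated name pairs. The sketch below runs that query through Python's sqlite3 module purely so it is self-contained; the dept column and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: only firstname/lastname come from the question.
conn.execute(
    "CREATE TABLE rawHR (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, dept TEXT)"
)
conn.executemany(
    "INSERT INTO rawHR (firstname, lastname, dept) VALUES (?, ?, ?)",
    [("Ann", "Lee", "Sales"), ("Ann", "Lee", "IT"), ("Bob", "Ray", "HR")],
)

# Join every row to the (firstname, lastname) pairs that occur more than once,
# so only rows belonging to a duplicated pair survive, with all their columns.
duplicate_rows = conn.execute(
    """
    SELECT r.*
    FROM rawHR AS r
    JOIN (
        SELECT firstname, lastname
        FROM rawHR
        GROUP BY firstname, lastname
        HAVING COUNT(*) > 1
    ) AS d
      ON d.firstname = r.firstname AND d.lastname = r.lastname
    ORDER BY r.id
    """
).fetchall()
print(duplicate_rows)
```

Only the two Ann Lee rows come back; Bob Ray is left out because his name pair occurs once.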

Related

How to select multiple tables in single query mysql? (some tables have no data yet)

I have 3 tables called patients, customers and deliveries. Those tables are in the same database called db.
All the tables equally have id, first_name, last_name, gender and only deliveries table has their own data. (the other 2 tables are currently empty.)
Now, I want to select all of them in 1 query, but MySQL throws this error:
SELECT first_name, last_name, gender FROM paitents, customers,
deliveries GROUP BY people LIMIT 0, 50000 Error Code: 1052. Column
'first_name' in field list is ambiguous 0.047 sec .
This is how I tried:
SELECT first_name, last_name, gender
FROM patients, customers, deliveries
GROUP BY people;
How do I select all of the tables even if some tables currently have no data?
All the tables equally have id, first_name, last_name, gender and only deliveries table has their own data. (the other 2 tables are currently empty.)
Now, I want to select all of them in 1 query
I suspect that you are looking for union all:
SELECT first_name, last_name, gender FROM patients
UNION ALL
SELECT first_name, last_name, gender FROM customers
UNION ALL
SELECT first_name, last_name, gender FROM deliveries
This will combine all records available in the 3 tables in the resultset. On the other hand, using an (implicit) cross join like you do would generate a cartesian product of the 3 tables, with 9 columns (3 * 3) in the resultset (that is, if you fix the ambiguity on column names that you currently have).
If you want to eliminate duplicates across tables, you can use union instead of union all.
If you want to limit the number of records in the resultset, you can do this as follows:
(
SELECT id, first_name, last_name, gender FROM patients
UNION ALL
SELECT id, first_name, last_name, gender FROM customers
UNION ALL
SELECT id, first_name, last_name, gender FROM deliveries
)
ORDER BY id
LIMIT 5000
Note that, functionally, this does require an order by clause, otherwise the ordering of the results is undefined (I assumed id, which therefore has to be part of each select list).
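To see how union all and union behave when some tables are empty, here is a small sketch. It uses Python's sqlite3 module instead of MySQL so that it runs on its own, and the sample rows are invented; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("patients", "customers", "deliveries"):
    conn.execute(
        f"CREATE TABLE {table} "
        "(id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, gender TEXT)"
    )

# Only deliveries has data, as in the question; the values are made up.
conn.executemany(
    "INSERT INTO deliveries (first_name, last_name, gender) VALUES (?, ?, ?)",
    [("Ann", "Lee", "F"), ("Ann", "Lee", "F"), ("Bob", "Ray", "M")],
)

select_all = """
    SELECT first_name, last_name, gender FROM patients
    {op}
    SELECT first_name, last_name, gender FROM customers
    {op}
    SELECT first_name, last_name, gender FROM deliveries
"""

# UNION ALL keeps every row; empty tables simply contribute nothing.
all_rows = conn.execute(select_all.format(op="UNION ALL")).fetchall()
# UNION additionally removes rows that are identical in all selected columns.
distinct_rows = conn.execute(select_all.format(op="UNION")).fetchall()
print(len(all_rows), len(distinct_rows))
```

The empty patients and customers tables cause no error, and union drops the repeated Ann Lee row.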

Need a field in SELECT DISTINCT but I do not want it to be printed

I need to have the ID field in the SELECT DISTINCT in order to tell two cases apart: true duplicates, and namesakes who are not duplicates.
In other words, the same person may be duplicated many times, and different people with the same name and surname may exist in the same db.
If I do not place the ID field in the SELECT, the query returns duplicates and namesakes.
I have to include the ID to eliminate duplicates only. But at the same time, I would like not to print the ID. Is this possible without using GROUP BY ID?
SELECT DISTINCT ID, Name, Surname
FROM (SUBQUERY THAT RETURNS DUPLICATES)
Sure:
Select c.Name, c.Surname
From (
SELECT DISTINCT ID, Name, Surname
FROM (SUBQUERY THAT RETURNS DUPLICATES)
) as c;
A simple way is a select wrapper:
select Name, Surname from (
SELECT DISTINCT
ID
, Name
, Surname
FROM (SUBQUERY THAT RETURNS DUPLICATES) ) T
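To illustrate the wrapper idea, here is a minimal sketch run through Python's sqlite3 module. The dupes table is a hypothetical stand-in for the subquery that returns duplicates, and its rows are invented: ID 1 is one person stored twice, ID 2 is a namesake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for "SUBQUERY THAT RETURNS DUPLICATES".
conn.execute("CREATE TABLE dupes (ID INTEGER, Name TEXT, Surname TEXT)")
conn.executemany(
    "INSERT INTO dupes VALUES (?, ?, ?)",
    [(1, "John", "Smith"), (1, "John", "Smith"), (2, "John", "Smith")],
)

# The inner DISTINCT uses ID to collapse true duplicates only;
# the outer SELECT then leaves ID out of the printed columns.
rows = conn.execute(
    """
    SELECT c.Name, c.Surname
    FROM (
        SELECT DISTINCT ID, Name, Surname
        FROM dupes
    ) AS c
    """
).fetchall()
print(rows)
```

Two rows remain: the duplicated person once, plus the namesake, and neither shows the ID.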

Find duplicates using MySQL considering multiple columns

I need to find duplicate users based on either the same email OR the same first_name, last_name combination OR the same birth_date. What I could comfortably try was:
SELECT id, first_name, last_name
FROM users
where id IN (SELECT id
from users
GROUP BY email
HAVING count(*) > 1)
GROUP BY email, id;
The above gives only duplicate email details, but I'm a bit confused about handling the other conditions based on the first_name, last_name combination OR the same birth_date as well.
Is it possible to achieve it in a single query?
Try doing a UNION of three separate queries, each of which checks one of the three duplicate criteria:
SELECT id
FROM users
WHERE email IN (SELECT email FROM users GROUP BY email HAVING COUNT(*) > 1)
UNION
(
SELECT t1.id
FROM users t1
INNER JOIN
(
SELECT first_name, last_name
FROM users
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
) t2
ON t1.first_name = t2.first_name AND
t1.last_name = t2.last_name
)
UNION
SELECT id
FROM users
WHERE birth_date IN (SELECT birth_date FROM users GROUP BY birth_date HAVING COUNT(*) > 1)
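As a rough check of the three-way UNION idea, the sketch below runs one possible formulation against a handful of invented users. It goes through Python's sqlite3 module rather than MySQL so it is self-contained. Column names follow the question; each branch looks up values that occur more than once and returns the ids of the rows carrying them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, "
    "first_name TEXT, last_name TEXT, birth_date TEXT)"
)
# Made-up rows: 1/2 share an email, 3/4 share a name, 5/6 share a birth date.
conn.executemany(
    "INSERT INTO users (email, first_name, last_name, birth_date) VALUES (?, ?, ?, ?)",
    [
        ("a@x.com", "Ann", "Lee", "1990-01-01"),
        ("a@x.com", "Amy", "Poe", "1985-05-05"),
        ("c@x.com", "Bob", "Ray", "1970-07-07"),
        ("d@x.com", "Bob", "Ray", "1971-08-08"),
        ("e@x.com", "Cat", "Fox", "2000-02-02"),
        ("f@x.com", "Dan", "Elk", "2000-02-02"),
        ("g@x.com", "Eve", "Orr", "1999-09-09"),
    ],
)

# One branch per criterion; UNION merges them and removes repeated ids.
dup_ids = [row[0] for row in conn.execute(
    """
    SELECT id FROM users
    WHERE email IN (SELECT email FROM users GROUP BY email HAVING COUNT(*) > 1)
    UNION
    SELECT t1.id
    FROM users AS t1
    JOIN (
        SELECT first_name, last_name FROM users
        GROUP BY first_name, last_name HAVING COUNT(*) > 1
    ) AS t2 ON t1.first_name = t2.first_name AND t1.last_name = t2.last_name
    UNION
    SELECT id FROM users
    WHERE birth_date IN (SELECT birth_date FROM users GROUP BY birth_date HAVING COUNT(*) > 1)
    ORDER BY id
    """
)]
print(dup_ids)
```

User 7 matches none of the criteria and is the only id left out.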

Group By Two Tables

I have two tables with identical schema. I want to get a count of all the people with a given surname in both tables, and have found I can do it like this:
SELECT surname, count(*) AS cnt
FROM
(
SELECT surname
FROM people.NorthKorea
UNION ALL
SELECT surname
FROM peopleGlobal.NorthKorea
) AS t
GROUP BY surname
ORDER BY cnt DESC
This is fine for small tables, but I have tables with up to 250 million rows, so was wondering if there may be a more efficient way of doing this? Such as INSERTING the result of the COUNT from one table into a table, and then updating / inserting (REPLACE?) the result of the COUNT on the second table.
N.B. I actually want to store the result of the COUNT on both tables in another table.
An index on the surname column should help a lot. I would try this query; if there are a lot more rows than surnames, I expect it to run faster:
SELECT surname, SUM(cnt) AS total
FROM
(
SELECT surname, COUNT(*) as cnt
FROM people.NorthKorea
GROUP BY surname
UNION ALL
SELECT surname, COUNT(*) as cnt
FROM peopleGlobal.NorthKorea
GROUP BY surname
) AS t
GROUP BY surname
ORDER BY total DESC
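Here is a small sketch of the pre-aggregation idea, including storing the combined counts in another table as the question asks. It uses Python's sqlite3 module so it can run standalone. The names local_people, global_people and surname_counts are hypothetical stand-ins for the two source tables and the target table, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the two identically structured source tables.
conn.execute("CREATE TABLE local_people (surname TEXT)")
conn.execute("CREATE TABLE global_people (surname TEXT)")
conn.executemany("INSERT INTO local_people VALUES (?)", [("Kim",), ("Kim",), ("Pak",)])
conn.executemany("INSERT INTO global_people VALUES (?)", [("Kim",), ("Ri",)])

# Hypothetical target table for the combined counts.
conn.execute("CREATE TABLE surname_counts (surname TEXT PRIMARY KEY, cnt INTEGER)")

# Count per table first, then add up the (much smaller) per-surname results.
conn.execute(
    """
    INSERT INTO surname_counts (surname, cnt)
    SELECT surname, SUM(cnt) AS total
    FROM (
        SELECT surname, COUNT(*) AS cnt FROM local_people GROUP BY surname
        UNION ALL
        SELECT surname, COUNT(*) AS cnt FROM global_people GROUP BY surname
    ) AS t
    GROUP BY surname
    """
)
totals = conn.execute(
    "SELECT surname, cnt FROM surname_counts ORDER BY cnt DESC, surname"
).fetchall()
print(totals)
```

The outer SUM only touches one row per surname per table, which is where the expected saving on large tables would come from.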

Select a record that has a duplicate

I'd like to select all records from a table (names) where lastname is not unique. Preferably I would like to delete all records that are duplicates.
How would this be done? Assume that I don't want to rerun one query multiple times until it quits.
To find which lastnames have duplicates:
SELECT lastname, COUNT(lastname) AS rowcount
FROM names
GROUP BY lastname
HAVING rowcount > 1
To delete one of the duplicates for each last name, run the following repeatedly until it no longer deletes anything. Not very graceful.
DELETE FROM names
WHERE id IN (SELECT MAX(id)
FROM (SELECT * FROM names) AS t
GROUP BY lastname
HAVING COUNT(lastname) > 1)
The fastest and easiest way to delete duplicate records is by issuing a very simple command (on MySQL versions before 5.7.4, where ALTER TABLE still accepts the IGNORE clause):
ALTER IGNORE TABLE [TABLENAME] ADD UNIQUE INDEX UNIQUE_INDEX ([FIELDNAME])
This will lock the table. If that is an issue, try:
delete t1 from table1 t1, table1 t2
where t1.duplicate_field = t2.duplicate_field
-- add more conditions if needed, e.g. and t1.duplicate_field2 = t2.duplicate_field2
and t1.unique_field > t2.unique_field
and break it up into ranges to run faster.
Duplicate of: How can I remove duplicate rows?
DELETE names
FROM names
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, lastname
FROM names
GROUP BY lastname
) as KeepRows ON
names.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
Assumption: you have a RowId column.
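The keep-the-lowest-id approach can be tried out with the sketch below. It is not the same statement: SQLite has no DELETE with JOIN, so a NOT IN subquery is swapped in, and the key column is called id rather than RowId. The rows are invented. On MySQL the subquery may need to be wrapped in a derived table, as one of the other answers does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, lastname TEXT)")
conn.executemany(
    "INSERT INTO names (lastname) VALUES (?)",
    [("Smith",), ("Smith",), ("Jones",), ("Smith",)],
)

# Keep the row with the smallest id per lastname and delete all others
# in a single pass, so the statement never has to be rerun.
conn.execute(
    """
    DELETE FROM names
    WHERE id NOT IN (
        SELECT MIN(id)
        FROM names
        GROUP BY lastname
    )
    """
)
remaining = conn.execute("SELECT id, lastname FROM names ORDER BY id").fetchall()
print(remaining)
```

One Smith (the earliest) and the single Jones remain.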
SELECT lastname, COUNT(*) AS mycountvar FROM names GROUP BY lastname HAVING mycountvar > 1;
and then, for each result row,
DELETE FROM names WHERE lastname = '$mylastnamevar' LIMIT $mycountvar-1
But why don't you just flag the field "lastname" as unique, so that duplicates cannot get in in the first place?
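A quick sketch of that last suggestion, using Python's sqlite3 module so it runs on its own: once the column carries a unique index, a second insert of the same lastname is rejected. The index name idx_names_lastname is made up, and the exact error raised will differ on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, lastname TEXT)")
# Hypothetical index name; the unique index is what blocks duplicates.
conn.execute("CREATE UNIQUE INDEX idx_names_lastname ON names (lastname)")

conn.execute("INSERT INTO names (lastname) VALUES ('Smith')")
try:
    # Second insert with the same lastname violates the unique index.
    conn.execute("INSERT INTO names (lastname) VALUES ('Smith')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM names").fetchone()[0]
print(rejected, row_count)
```

Existing duplicates still have to be cleaned up first, since the index cannot be created while they are present.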