Find rows with duplicates in three columns without specifying value? - mysql

I've got a table like this, where I'm looking for unnecessary duplicate rows:
I want to find any rows where the First Name, Last Name, and Occupation columns are identical - in this case rows 1 and 3. I don't want to specify what the identical values should be, as I don't know them.
I've tried the answer to this question, but I don't think it applies to this case.

A simple solution is to group by all three columns and add a HAVING clause that keeps only the groups containing more than one row:
SELECT
ID, FirstName, LastName, Occupation, Age
FROM table1
GROUP BY
FirstName,
LastName,
Occupation
HAVING COUNT(*) > 1
Here is a DEMO with two duplicate rows to ensure it works properly.
EDIT:
My first understanding was that you wanted one row returned per set of duplicates. If you want a query that returns all of the duplicate rows,
then here it is. This will return rows 1 and 3:
SELECT p1.* FROM people p
JOIN people p1
ON p1.firstname = p.firstname
AND p1.lastname = p.lastname
AND p1.occupation = p.occupation
GROUP BY p1.id
HAVING COUNT(*) > 1;
another DEMO
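To check the "return all duplicate rows" behaviour without a MySQL server, the idea can be sketched against an in-memory SQLite database from Python. This is an illustration under assumptions, not the answerer's demo: the sample rows are made up, SQLite stands in for MySQL, and the query joins back to a grouped derived table (a variant of the self join above) so that every selected column is well defined.

```python
import sqlite3

# In-memory database; SQLite is standing in for MySQL here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, firstname TEXT, "
    "lastname TEXT, occupation TEXT, age INTEGER)"
)
# Made-up sample data: rows 1 and 3 agree on all three columns.
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "Smith", "Baker", 30),
        (2, "Jane", "Doe", "Pilot", 41),
        (3, "John", "Smith", "Baker", 52),
        (4, "John", "Smith", "Pilot", 28),
    ],
)

# Find the (firstname, lastname, occupation) combinations that occur more
# than once, then join back to fetch every row belonging to them.
duplicate_rows = conn.execute(
    """
    SELECT p1.*
    FROM people p1
    JOIN (
        SELECT firstname, lastname, occupation
        FROM people
        GROUP BY firstname, lastname, occupation
        HAVING COUNT(*) > 1
    ) dup
      ON  dup.firstname  = p1.firstname
      AND dup.lastname   = p1.lastname
      AND dup.occupation = p1.occupation
    ORDER BY p1.id
    """
).fetchall()

for row in duplicate_rows:
    print(row)
```

One thing to keep in mind: the equality comparisons in the join never match NULL values, so rows with a NULL in any of the three columns would not be reported by this sketch.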

Self join (untested), matching on all three columns at once and excluding each row's match with itself: SELECT DISTINCT a.* from your_table a, your_table b
where
a.fname = b.fname and a.lname = b.lname and a.occupation = b.occupation and a.id <> b.id

Related

Count Distinct on multiple values within same column in SQL Aggregation

Objective:
I want to show the number of distinct IDs for any combination selected.
In the example below, I have data at a granular level: ID-level data.
For each combination I use COUNT(DISTINCT ...), which gives me '1' for the combinations below.
But suppose I want the number of IDs that made both E-commerce and Face to face transactions. If I just use this data, I would show the sum of the E-comm and Face to face counts, and the result would be '2' instead of '1'.
This is not limited to E-comm/Face to face; I want to apply the same logic to all columns.
Please let me know if you have an alternative approach to address this issue.
First aggregate in your table to get the distinct ids for each TranType:
SELECT TranType, COUNT(DISTINCT id) counter_distinct
FROM tablename
GROUP BY TranType
and then join to the table:
SELECT t.*, g.counter_distinct
FROM tablename t
INNER JOIN (
SELECT TranType, COUNT(DISTINCT id) counter_distinct
FROM tablename
GROUP BY TranType
) g ON g.TranType = t.TranType
Or use a correlated subquery:
SELECT t1.*,
(SELECT COUNT(DISTINCT t2.id) FROM tablename t2 WHERE t2.TranType = t1.TranType) counter_distinct
FROM tablename t1
But let's say if I wanted to find the number of IDs who made both E-commerce and Face to face transactions, in
You can get the list of ids using:
select id
from t
where tran_type in ('Ecomm', 'Face to face')
group by id
having count(distinct tran_type) = 2;
You can get the count using a subquery:
select count(*)
from (select id
from t
where tran_type in ('Ecomm', 'Face to face')
group by id
having count(distinct tran_type) = 2
) i;
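The HAVING COUNT(DISTINCT ...) pattern above generalizes to any list of required values: filter to the wanted values, group by id, and require as many distinct values as the list is long. Below is a small sketch of that idea, run from Python against in-memory SQLite. The sample rows, the `wanted` list, and the use of SQLite instead of MySQL are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, tran_type TEXT)")
# Made-up data: only id 1 has both an 'Ecomm' and a 'Face to face' row.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1, "Ecomm"), (1, "Face to face"), (1, "Ecomm"),
        (2, "Ecomm"),
        (3, "Face to face"), (3, "ATM"),
    ],
)

# The transaction types an id must have ALL of (hypothetical selection).
wanted = ["Ecomm", "Face to face"]
placeholders = ", ".join("?" for _ in wanted)

query = f"""
SELECT COUNT(*)
FROM (
    SELECT id
    FROM t
    WHERE tran_type IN ({placeholders})
    GROUP BY id
    HAVING COUNT(DISTINCT tran_type) = ?
) AS i
"""
# Bind the wanted values plus how many distinct values are required.
(both_count,) = conn.execute(query, (*wanted, len(wanted))).fetchone()
print(both_count)
```

Passing `len(wanted)` as the required count means the same query text works for two, three, or more required values.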

Selecting entire Record for Distinct Column in MySql

I have a table:
[myTable]
ID
Name
HairColor
NumberOfPairsOfPants
I can easily get a list of HairColor with
SELECT DISTINCT HairColor FROM myTable
But I want the full contents of the records where the hair color is distinct. (Yes, the database table is denormalized/redundant so I don't get logic errors.)
Pseudo code
SELECT DISTINCT HairColor,* FROM myTable
Syntax help!
Okay, first off, the SELECT DISTINCT haircolor FROM myTable doesn't give you "Where haircolor is distinct." It gives you all the distinct hair colors. Kind of like "Distinct names in the room." If there are two people named Sally, SELECT DISTINCT name would give you one row for Sally. What you're looking for is a bit different.
What you want to do is, first of all, determine which hair colors are distinct, i.e., which ones occur only once. For that you will need
SELECT haircolor, COUNT(*) AS cnt FROM myTable GROUP BY haircolor HAVING cnt = 1;
Once you've done that, you will want to join those results with your original table to get the entire rows associated with those hair colors, e.g.
SELECT a.* FROM myTable AS a
INNER JOIN
(SELECT haircolor, COUNT(*) AS cnt FROM myTable GROUP BY haircolor HAVING cnt = 1) AS b
ON a.haircolor = b.haircolor
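Taking "distinct" to mean a hair color that occurs exactly once (the reading given in the prose of this answer), the two steps can be tried out as follows. This is only a sketch: the rows are invented, SQLite stands in for MySQL, and the HAVING clause repeats COUNT(*) rather than relying on a column alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (ID INTEGER PRIMARY KEY, Name TEXT, "
    "HairColor TEXT, NumberOfPairsOfPants INTEGER)"
)
# Made-up rows: 'Brown' occurs twice, 'Red' and 'Black' once each.
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        (1, "Sally", "Brown", 3),
        (2, "Bob", "Red", 5),
        (3, "Ann", "Brown", 2),
        (4, "Tom", "Black", 7),
    ],
)

# Step 1 (inner query): hair colors that occur exactly once.
# Step 2 (join): the full rows that carry those hair colors.
unique_color_rows = conn.execute(
    """
    SELECT a.*
    FROM myTable AS a
    INNER JOIN (
        SELECT HairColor
        FROM myTable
        GROUP BY HairColor
        HAVING COUNT(*) = 1
    ) AS b ON a.HairColor = b.HairColor
    ORDER BY a.ID
    """
).fetchall()

for row in unique_color_rows:
    print(row)
```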

Group, count, and exclude records with matching values in another similar table

SELECT Gender, COUNT(*) AS TotalGender
FROM `1`
GROUP BY Gender
This works perfectly for me.
However, this table (1) has a column in common with another table (2).
Where that column has matching values in both tables, I need to exclude those rows from the above count.
Use a JOIN in that case, like this:
SELECT Gender,
COUNT(*) AS TotalGender
FROM `1`
LEFT JOIN `2`
ON `1`.`somecolumn` = `2`.`somecolumn`
WHERE `2`.`somecolumn` IS NULL
GROUP BY Gender
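The LEFT JOIN ... IS NULL pattern (an anti-join) keeps only the rows of the first table that have no partner in the second. A small sketch of it follows, run from Python on in-memory SQLite. The table names `table_one` and `table_two` are hypothetical stand-ins for the tables `1` and `2`, and the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the tables called 1 and 2 in the question.
conn.execute("CREATE TABLE table_one (somecolumn INTEGER, Gender TEXT)")
conn.execute("CREATE TABLE table_two (somecolumn INTEGER)")
conn.executemany(
    "INSERT INTO table_one VALUES (?, ?)",
    [(1, "F"), (2, "M"), (3, "F"), (4, "M"), (5, "F")],
)
# Values 2 and 3 also exist in table_two (3 even twice), so the
# corresponding table_one rows must not be counted.
conn.executemany("INSERT INTO table_two VALUES (?)", [(2,), (3,), (3,)])

gender_counts = conn.execute(
    """
    SELECT a.Gender, COUNT(*) AS TotalGender
    FROM table_one AS a
    LEFT JOIN table_two AS b ON a.somecolumn = b.somecolumn
    WHERE b.somecolumn IS NULL      -- keep only rows with no match
    GROUP BY a.Gender
    ORDER BY a.Gender
    """
).fetchall()

print(gender_counts)
```

Because matched rows are discarded by the WHERE clause, duplicate values in the second table do not inflate the counts.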

select A, B , C group by B with A from the row that has the highest C

I have collected information from different sources about certain IDs that should each match a single name. Some sources are more trustworthy than others in giving the correct name for a given ID.
I created a table (name, id, source_trustworthiness) and I want to get the most trustworthy name for each ID.
I tried
SELECT name, id, MAX( source_trustworthiness )
FROM table
GROUP BY id
This returns the highest trustworthiness available for each ID, but with the first name it finds, regardless of that name's trustworthiness.
Is there a way I can get that right?
MySQL has special functionality to help:
SELECT * FROM (
SELECT name, id, source_trustworthiness
FROM table
ORDER BY 3 DESC ) x
GROUP BY id
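Although this wouldn't even execute in other databases (it doesn't name all non-aggregate columns in the GROUP BY clause), MySQL accepts it when ONLY_FULL_GROUP_BY is disabled and returns one row for each unique value of the grouped-by columns, in practice often the first row encountered. By ordering the rows greatest first, the intent is that the first row for each id is the most trustworthy. This is not guaranteed, though: the MySQL manual says the server is free to choose any value from each group, and ONLY_FULL_GROUP_BY is enabled by default since MySQL 5.7.5, in which case the query is rejected.
Since this question is tagged mysql, this query can work there. It is really simple and also quite fast, but check the result on your server version before relying on it.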
SELECT a.*
FROM TableName a
INNER JOIN
(
SELECT id, MAX(source_trustworthiness) max_val
FROM TableName
GROUP BY ID
) b ON a.ID = b.ID AND
a.source_trustworthiness = b.max_val
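The join against a per-id maximum does not depend on MySQL-specific grouping behaviour, so it can be tried on other engines too. Here is a minimal sketch in Python with in-memory SQLite; the table name `name_sources` and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; columns follow the question.
conn.execute(
    "CREATE TABLE name_sources (name TEXT, id INTEGER, "
    "source_trustworthiness INTEGER)"
)
conn.executemany(
    "INSERT INTO name_sources VALUES (?, ?, ?)",
    [
        ("Alice", 1, 2),
        ("Alicia", 1, 9),   # most trustworthy name for id 1
        ("Bob", 2, 5),      # most trustworthy name for id 2
        ("Rob", 2, 3),
    ],
)

# Inner query: highest trustworthiness per id.
# Join: the row(s) that actually carry that highest value.
best_names = conn.execute(
    """
    SELECT a.name, a.id, a.source_trustworthiness
    FROM name_sources a
    INNER JOIN (
        SELECT id, MAX(source_trustworthiness) AS max_val
        FROM name_sources
        GROUP BY id
    ) b ON a.id = b.id AND a.source_trustworthiness = b.max_val
    ORDER BY a.id
    """
).fetchall()

print(best_names)
```

If two sources tie for the highest trustworthiness of an id, this approach returns both rows, so a tie-breaker would be needed where exactly one name per id is required.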

Mysql select distinct

I am trying to select the duplicate rows in a MySQL table. It works, but the query only lets me select the field I used with DISTINCT, not all the fields. Here are the queries, for better understanding:
mysql_query("SELECT DISTINCT ticket_id FROM temp_tickets ORDER BY ticket_id")
mysql_query("SELECT * , DISTINCT ticket_id FROM temp_tickets ORDER BY ticket_id")
The first one works fine.
When I try to select all fields, I end up with errors.
I want to select the latest of the duplicates. Say ticket_id 127 appears 3 times, on rows with id 7, 8, and 9: I want to select it once, with the latest entry (9 in this case), and the same applies to all the other ticket_ids.
Any ideas?
Thanks
DISTINCT is not a function that applies only to some columns. It's a query modifier that applies to all columns in the select-list.
That is, DISTINCT reduces rows only if all columns are identical to the columns of another row.
DISTINCT must follow immediately after SELECT (along with other query modifiers, like SQL_CALC_FOUND_ROWS). Then following the query modifiers, you can list columns.
RIGHT: SELECT DISTINCT foo, ticket_id FROM table...
Output a row for each distinct pairing of values across ticket_id and foo.
WRONG: SELECT foo, DISTINCT ticket_id FROM table...
If there are three distinct values of ticket_id, would this return only three rows? What if there are six distinct values of foo? Which three values of the six possible values of foo should be output?
It's ambiguous as written.
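The whole-row behaviour of DISTINCT is easy to see on a tiny table. The sketch below uses Python with in-memory SQLite (standing in for MySQL); the `foo` column and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp_tickets (id INTEGER PRIMARY KEY, "
    "ticket_id INTEGER, foo TEXT)"
)
# Made-up rows: ticket 127 appears three times with two different foo values.
conn.executemany(
    "INSERT INTO temp_tickets VALUES (?, ?, ?)",
    [(7, 127, "a"), (8, 127, "a"), (9, 127, "b"), (10, 200, "a")],
)

# DISTINCT over a single column: one row per ticket_id.
tickets_only = conn.execute(
    "SELECT DISTINCT ticket_id FROM temp_tickets ORDER BY ticket_id"
).fetchall()

# DISTINCT over two columns: one row per distinct (foo, ticket_id) pair,
# so ticket 127 now appears twice.
pairs = conn.execute(
    "SELECT DISTINCT foo, ticket_id FROM temp_tickets "
    "ORDER BY ticket_id, foo"
).fetchall()

print(tickets_only)
print(pairs)
```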
Are you looking for "SELECT * FROM temp_tickets GROUP BY ticket_id ORDER BY ticket_id"?
UPDATE
SELECT t.*
FROM
(SELECT ticket_id, MAX(id) as id FROM temp_tickets GROUP BY ticket_id) a
INNER JOIN temp_tickets t ON (t.id = a.id)
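The MAX(id) join can be checked against the case described in the question (ticket_id 127 on rows 7, 8, and 9). This is a sketch in Python using in-memory SQLite rather than MySQL; the `note` column and the extra ticket 200 are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp_tickets (id INTEGER PRIMARY KEY, "
    "ticket_id INTEGER, note TEXT)"
)
# ticket_id 127 occupies rows 7, 8 and 9, as in the question.
conn.executemany(
    "INSERT INTO temp_tickets VALUES (?, ?, ?)",
    [
        (7, 127, "first"),
        (8, 127, "second"),
        (9, 127, "third"),
        (10, 200, "only"),
    ],
)

# Derived table: the highest id for each ticket_id.
# Join: the complete row that owns that id.
latest_rows = conn.execute(
    """
    SELECT t.*
    FROM (
        SELECT ticket_id, MAX(id) AS id
        FROM temp_tickets
        GROUP BY ticket_id
    ) a
    INNER JOIN temp_tickets t ON t.id = a.id
    ORDER BY t.ticket_id
    """
).fetchall()

for row in latest_rows:
    print(row)
```

This treats "latest" as "highest id", which matches the question's example but is an assumption if ids are not assigned in insertion order.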
You can use GROUP BY instead of DISTINCT. With DISTINCT it is a struggle to select all the columns of the table, whereas with GROUP BY you can get distinct values along with the other fields.
You can write DISTINCT like this, but note that the parentheses have no effect and DISTINCT still applies to every column in the select list:
mysql_query("SELECT DISTINCT(ticket_id), column1, column2, column3
FROM temp_tickets
ORDER BY ticket_id");
use a subselect:
http://forums.asp.net/t/1470093.aspx