Select duplicate using partition by in MySQL? - mysql

I would like to select all duplicates from a table:
SELECT * FROM people HAVING (count(*) OVER (PARTITION BY name)) > 1;
Unfortunately I get the error:
Error Code: 3593. You cannot use the window function 'count' in this context.
One less elegant solution would be:
SELECT
*
FROM
people
WHERE
code IN (SELECT
name
FROM
people
GROUP BY name
HAVING COUNT(*) > 1);
How can I rewrite my first query to make it work?

If the code column uniquely identifies each row, you can use EXISTS:
select p.*
from people p
where exists (select 1 from people p1 where p1.name = p.name and p.code <> p1.code);
If code is not unique, use an identity (auto-increment) column in its place. If the table has no identity column at all, your second approach works fine once the outer filter compares name instead of code:
SELECT p.*
FROM people p
WHERE name IN (SELECT name FROM people GROUP BY name HAVING COUNT(*) > 1);
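As for the original window-function attempt: error 3593 appears because MySQL permits window functions only in the select list and in ORDER BY, so the count cannot be tested directly in HAVING or WHERE. A minimal sketch for MySQL 8.0+, which computes the count in a derived table and filters outside it (the names counted and cnt are my own, not from the question):

```sql
-- Count rows per name in a derived table, then filter on that count outside it.
SELECT *
FROM (
    SELECT p.*,
           COUNT(*) OVER (PARTITION BY name) AS cnt
    FROM people p
) AS counted
WHERE cnt > 1;
```

Note that the result carries the extra cnt column; list the columns explicitly in the outer SELECT if that is unwanted.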

Related

more efficient way to select duplicate users

I'm trying to select * from all duplicate rows in users, where a duplicate is defined as two users sharing the same first_name and last_name. (I need to process the other columns, which might differ.)
I'm using MySQL 8.0.28.
My first try was to literally translate my requirement:
select * from `users` AS u1 where exists (select 1 from `users` AS u2 WHERE `u2`.`first_name` = `u1`.`first_name` AND `u2`.`last_name` = `u1`.`last_name` AND `u2`.`id` != `u1`.`id`)
Which, obviously, has a horrendous execution time.
My current query is
SELECT * from users where Concat(first_name," ",last_name) IN (select Concat(first_name," ",last_name) from `users` GROUP BY first_name, last_name HAVING COUNT(*)>1)
which is vastly more efficient, but still takes more than 100 ms for 8000 records. I suppose a solution that doesn't use CONCAT could benefit from indexes and would not need to calculate the result for each row.
Also, I couldn't get GROUP BY to work because I need to select all columns of all rows that are duplicates, not just the distinct first_name and last_name values. I also don't want to disable ONLY_FULL_GROUP_BY (not sure if disabling that would help anyway).
Is there a more efficient, proper way to select these duplicate rows?
I would just use an aggregation approach here:
SELECT *
FROM users
WHERE (first_name, last_name) IN (
SELECT first_name, last_name
FROM users
GROUP BY 1, 2
HAVING COUNT(*) > 1
);
On MySQL 8+, we can also use COUNT() as an analytic function here:
WITH cte AS (
SELECT *, COUNT(*) OVER (PARTITION BY first_name, last_name) AS cnt
FROM users
)
SELECT *
FROM cte
WHERE cnt > 1;
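Both versions group on (first_name, last_name), so a composite index on those two columns may help, though whether the optimizer actually uses it depends on the data. A sketch, assuming no such index exists yet (idx_users_name and the alias dup are hypothetical names), with the aggregation written as a join instead of IN:

```sql
-- Hypothetical index covering the grouping columns.
CREATE INDEX idx_users_name ON users (first_name, last_name);

-- Same result as the IN version: join every row to the name pairs that occur more than once.
SELECT u.*
FROM users u
JOIN (
    SELECT first_name, last_name
    FROM users
    GROUP BY first_name, last_name
    HAVING COUNT(*) > 1
) AS dup
  ON dup.first_name = u.first_name
 AND dup.last_name  = u.last_name;
```

Like the IN version, the join never matches rows where first_name or last_name is NULL.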

Select column from selected column subquery [duplicate]

I am running this query on MySQL
SELECT ID FROM (
SELECT ID, msisdn
FROM (
SELECT * FROM TT2
)
);
and it is giving this error:
Every derived table must have its own alias.
What's causing this error?
Every derived table (AKA sub-query) must indeed have an alias. I.e. each query in brackets must be given an alias (AS whatever), which can then be used to refer to it in the rest of the outer query.
SELECT ID FROM (
SELECT ID, msisdn FROM (
SELECT * FROM TT2
) AS T
) AS T
In your case, of course, the entire query could be replaced with:
SELECT ID FROM TT2
I think it's asking you to do this:
SELECT ID
FROM (SELECT ID,
msisdn
FROM (SELECT * FROM TT2) as myalias
) as anotheralias;
But why would you write this query in the first place?
Here's a different example that can't be rewritten without aliases (you can't GROUP BY DISTINCT).
Imagine a table called purchases that records purchases made by customers at stores, i.e. it's a many-to-many table, and the software needs to know which customers have made purchases at more than one store:
SELECT DISTINCT customer_id, SUM(1)
FROM ( SELECT DISTINCT customer_id, store_id FROM purchases)
GROUP BY customer_id HAVING 1 < SUM(1);
...will break with the error Every derived table must have its own alias. To fix:
SELECT DISTINCT customer_id, SUM(1)
FROM ( SELECT DISTINCT customer_id, store_id FROM purchases) AS custom
GROUP BY customer_id HAVING 1 < SUM(1);
(Note the AS custom alias.)
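For this particular question a derived table may not be needed at all, since COUNT(DISTINCT ...) counts each store once per customer. A sketch against the same hypothetical purchases table (store_count is my own alias):

```sql
-- Customers who bought at more than one distinct store, without a derived table.
SELECT customer_id,
       COUNT(DISTINCT store_id) AS store_count
FROM purchases
GROUP BY customer_id
HAVING COUNT(DISTINCT store_id) > 1;
```

One difference to keep in mind: COUNT(DISTINCT store_id) ignores NULL store_id values, whereas the derived-table version counts a NULL store as its own row.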
I arrived here because I thought I should check in SO if there are adequate answers, after a syntax error that gave me this error, or if I could possibly post an answer myself.
OK, the answers here explain what this error is, so not much more to say, but nevertheless I will give my 2 cents, using my own words:
This error is caused by the fact that you basically generate a new table with your subquery for the FROM command.
That's what a derived table is, and as such, it needs to have an alias (actually a name reference to it).
Given the following hypothetical query:
SELECT id, key1
FROM (
SELECT t1.ID id, t2.key1 key1, t2.key2 key2, t2.key3 key3
FROM table1 t1
LEFT JOIN table2 t2 ON t1.id = t2.id
WHERE t2.key3 = 'some-value'
) AS tt
At the end, the whole subquery inside the FROM command will produce the table that is aliased as tt and it will have the following columns id, key1, key2, key3.
Then, with the initial SELECT, we finally select the id and key1 from that generated table (tt).
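On MySQL 8.0 or later, the same derived table can also be written as a common table expression, where the name is given up front instead of as a trailing alias. A sketch using the same hypothetical tables and columns:

```sql
-- The CTE name tt plays the role of the derived-table alias.
WITH tt AS (
    SELECT t1.id AS id, t2.key1 AS key1, t2.key2 AS key2, t2.key3 AS key3
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.id = t2.id
    WHERE t2.key3 = 'some-value'
)
SELECT id, key1
FROM tt;
```

As a side note, the WHERE condition on t2.key3 discards rows with no match in table2, so the LEFT JOIN here effectively behaves like an inner join.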

MySQL GROUP BY functional dependence in subquery

I'm writing a query to find duplicate rows in a table of people (including each duplicate):
SELECT *
FROM Person
WHERE CONCAT(firstName,lastName) IN (
SELECT CONCAT(firstName,lastName) AS name
FROM Person
GROUP BY CONCAT(firstName,lastName)
HAVING COUNT(*) > 1
)
When running this in MySQL 8.0.19 with ONLY_FULL_GROUP_BY enabled, it fails with the following error:
Query 1 ERROR: Expression #1 of HAVING clause is not in GROUP BY clause and contains nonaggregated column 'Person.firstName' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
I can't figure out how to fix this. I tried changing COUNT(*) to COUNT(CONCAT(firstName,lastName)) but that didn't help.
What's odd is that a) it runs fine in MariaDB 10.2, with or without ONLY_FULL_GROUP_BY, and b) running the subquery by itself causes no issue.
What am I doing wrong? It almost seems like a bug in MySQL.
[edit]: I certainly appreciate alternative solutions to my query, however I'm really interested in an answer as to why my error is occurring.
Try the query below; it does the same thing as yours, but compares the two columns directly instead of concatenating them:
SELECT *
FROM Person
WHERE (firstName,lastName) IN (
SELECT firstName,lastName
FROM Person
GROUP BY firstName,lastName
HAVING COUNT(*) > 1
)
Do not merge fields:
SELECT *
FROM Person
WHERE (firstName,lastName) IN (
SELECT firstName,lastName
FROM Person
GROUP BY firstName,lastName
HAVING COUNT(*) > 1
)
Or use ANY_VALUE() function:
SELECT *
FROM Person
WHERE CONCAT(firstName,lastName) IN (
SELECT ANY_VALUE(CONCAT(firstName,lastName)) AS name
FROM Person
GROUP BY CONCAT(firstName,lastName)
HAVING COUNT(*) > 1
)
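One more reason behind the "do not merge fields" advice: concatenating without a separator can make different name pairs look identical. A small illustration with made-up values (same_key is my own alias):

```sql
-- ('ab', 'c') and ('a', 'bc') are different pairs, but both concatenate to 'abc'.
SELECT CONCAT('ab', 'c') = CONCAT('a', 'bc') AS same_key;
```

Because both sides build the same string, the comparison is true, so a CONCAT-based duplicate check could report false positives that the (firstName, lastName) tuple comparison avoids.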
I would write your query with exists logic:
SELECT p1.*
FROM Person p1
WHERE EXISTS (SELECT 1 FROM Person p2
WHERE p2.firstName = p1.firstName AND
p2.lastName = p1.lastName AND
p2.id <> p1.id);
This effectively says to select every person for whom we can find another, different, person (going by the primary key id column, or whatever the PK might be), with same first and last name.
The following index may speed up the above query:
CREATE INDEX idx ON Person (lastName, firstName);
This should allow the exists lookup to evaluate quickly. Note that on InnoDB, MySQL should automatically cover id by adding it to the end of the above two-column index.
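To check whether the index is actually picked up, EXPLAIN can be prefixed to the query. This is only a sketch, assuming the idx index above has been created; the plan you get depends on your data and MySQL version:

```sql
-- Look for idx in the key column of the row for p2.
EXPLAIN
SELECT p1.*
FROM Person p1
WHERE EXISTS (SELECT 1 FROM Person p2
              WHERE p2.firstName = p1.firstName AND
                    p2.lastName = p1.lastName AND
                    p2.id <> p1.id);
```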
Regarding your error, I can't help but wonder if perhaps the problem is that you did not use proper aliases in the subquery, leading MySQL to think that you are referring to the columns in the outer query. Try this version:
SELECT p1.*
FROM Person p1
WHERE CONCAT(p1.firstName, p1.lastName) IN (
SELECT CONCAT(p2.firstName, p2.lastName)
FROM Person p2
GROUP BY CONCAT(p2.firstName, p2.lastName)
HAVING COUNT(*) > 1
);
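If grouping on the expression keeps tripping ONLY_FULL_GROUP_BY, another option on MySQL 8.0+ is to avoid GROUP BY altogether with a window count. This is a sketch, not an explanation of the error itself (the names counted and cnt are my own):

```sql
-- Count rows per (firstName, lastName) pair, then keep the pairs that occur more than once.
SELECT *
FROM (
    SELECT p.*,
           COUNT(*) OVER (PARTITION BY firstName, lastName) AS cnt
    FROM Person p
) AS counted
WHERE cnt > 1;
```

Unlike the IN and EXISTS versions, PARTITION BY puts NULL names in the same partition, so rows with NULL firstName or lastName can show up as duplicates here.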