More efficient way to select duplicate users in MySQL

I'm trying to select * from all duplicate rows in users, where a duplicate is defined as two users sharing the same first_name and last_name. (I need to process the other columns, which might differ.)
I'm using MySQL 8.0.28.
My first try was to literally translate my requirement:
select * from `users` AS u1 where exists (select 1 from `users` AS u2 WHERE `u2`.`first_name` = `u1`.`first_name` AND `u2`.`last_name` = `u1`.`last_name` AND `u2`.`id` != `u1`.`id`)
Which, obviously, has a horrendous execution time.
My current query is
SELECT * from users where Concat(first_name," ",last_name) IN (select Concat(first_name," ",last_name) from `users` GROUP BY first_name, last_name HAVING COUNT(*)>1)
which is vastly more efficient, but still takes more than 100 ms for 8000 records. I suppose a solution that doesn't use CONCAT could benefit from indexes and would not need to calculate the result for each row.
Also, I couldn't get GROUP BY to work because I need to select all columns of all rows that are duplicates, not just the distinct first_name and last_name values. I also don't want to disable ONLY_FULL_GROUP_BY (not sure if disabling that would help anyway).
Is there a more efficient, proper way to select these duplicate rows?

I would just use an aggregation approach here:
SELECT *
FROM users
WHERE (first_name, last_name) IN (
SELECT first_name, last_name
FROM users
GROUP BY 1, 2
HAVING COUNT(*) > 1
);
On MySQL 8+, we can also use COUNT() as an analytic function here:
WITH cte AS (
SELECT *, COUNT(*) OVER (PARTITION BY first_name, last_name) AS cnt
FROM users
)
SELECT *
FROM cte
WHERE cnt > 1;
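As a rough check that the two queries above agree, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows, the email column, and the index name idx_users_name are made up for illustration, and the window-function query assumes SQLite 3.25 or later. Timings on SQLite say nothing about MySQL, so this only demonstrates that both forms return every row of each duplicate group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, email TEXT);
    INSERT INTO users (first_name, last_name, email) VALUES
        ('Ann', 'Lee', 'ann1@example.com'),
        ('Bob', 'Ray', 'bob@example.com'),
        ('Ann', 'Lee', 'ann2@example.com'),
        ('Ann', 'Ray', 'ann3@example.com');
    -- Composite index on the grouping columns (hypothetical name).
    CREATE INDEX idx_users_name ON users (first_name, last_name);
""")

# Row-value IN against the aggregated (first_name, last_name) pairs.
in_query = """
    SELECT id FROM users
    WHERE (first_name, last_name) IN (
        SELECT first_name, last_name FROM users
        GROUP BY first_name, last_name
        HAVING COUNT(*) > 1
    )
    ORDER BY id
"""

# COUNT(*) as a window function, filtered in an outer query.
window_query = """
    WITH cte AS (
        SELECT id, COUNT(*) OVER (PARTITION BY first_name, last_name) AS cnt
        FROM users
    )
    SELECT id FROM cte WHERE cnt > 1 ORDER BY id
"""

dup_ids_in = [row[0] for row in conn.execute(in_query)]
dup_ids_window = [row[0] for row in conn.execute(window_query)]
print(dup_ids_in, dup_ids_window)
```

Only the two 'Ann Lee' rows form a duplicate group here; 'Ann Ray' shares just one of the two columns with them and is not reported.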

Related

MySQL GROUP BY functional dependence in subquery

I'm writing a query to find duplicate rows in a table of people (including each duplicate):
SELECT *
FROM Person
WHERE CONCAT(firstName,lastName) IN (
SELECT CONCAT(firstName,lastName) AS name
FROM Person
GROUP BY CONCAT(firstName,lastName)
HAVING COUNT(*) > 1
)
When running this in MySQL 8.0.19 with ONLY_FULL_GROUP_BY enabled, it fails with the following error:
Query 1 ERROR: Expression #1 of HAVING clause is not in GROUP BY clause and contains nonaggregated column 'Person.firstName' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
I can't figure out how to fix this. I tried changing COUNT(*) to COUNT(CONCAT(firstName,lastName)) but that didn't help.
What's odd is that a) it runs fine in MariaDB 10.2, with or without ONLY_FULL_GROUP_BY, and b) running the subquery by itself causes no issue.
What am I doing wrong? It almost seems like a bug in MySQL.
[edit]: I certainly appreciate alternative solutions to my query, however I'm really interested in an answer as to why my error is occurring.
Try the query below; it does the same thing as yours, but compares the two columns directly instead of concatenating them:
SELECT *
FROM Person
WHERE (firstName,lastName) IN (
SELECT firstName,lastName
FROM Person
GROUP BY firstName,lastName
HAVING COUNT(*) > 1
)
Do not merge fields:
SELECT *
FROM Person
WHERE (firstName,lastName) IN (
SELECT firstName,lastName
FROM Person
GROUP BY firstName,lastName
HAVING COUNT(*) > 1
)
Or use ANY_VALUE() function:
SELECT *
FROM Person
WHERE CONCAT(firstName,lastName) IN (
SELECT ANY_VALUE(CONCAT(firstName,lastName)) AS name
FROM Person
GROUP BY CONCAT(firstName,lastName)
HAVING COUNT(*) > 1
)
I would write your query with exists logic:
SELECT p1.*
FROM Person p1
WHERE EXISTS (SELECT 1 FROM Person p2
WHERE p2.firstName = p1.firstName AND
p2.lastName = p1.lastName AND
p2.id <> p1.id);
This effectively says to select every person for whom we can find another, different, person (going by the primary key id column, or whatever the PK might be), with same first and last name.
The following index may speed up the above query:
CREATE INDEX idx ON Person (lastName, firstName);
This should allow the exists lookup to evaluate quickly. Note that on InnoDB, MySQL should automatically cover id by adding it to the end of the above two-column index.
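A minimal sketch of the EXISTS query together with that index, run through Python's sqlite3 module as a stand-in for MySQL (the sample rows are invented, and SQLite's planner and index layout differ from InnoDB's, so this only checks the result, not the performance claim):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
    INSERT INTO Person (firstName, lastName) VALUES
        ('John', 'Smith'),
        ('Jane', 'Smith'),
        ('John', 'Smith'),
        ('John', 'Jones');
    CREATE INDEX idx ON Person (lastName, firstName);
""")

# Keep each person for whom a different row with the same names exists.
exists_rows = conn.execute("""
    SELECT p1.id, p1.firstName, p1.lastName
    FROM Person p1
    WHERE EXISTS (SELECT 1 FROM Person p2
                  WHERE p2.firstName = p1.firstName
                    AND p2.lastName = p1.lastName
                    AND p2.id <> p1.id)
    ORDER BY p1.id
""").fetchall()
print(exists_rows)
```

Rows that match on only one of the two columns ('Jane Smith', 'John Jones') are left out, because the subquery requires both names to be equal.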
Regarding your error, I can't help but wonder if perhaps the problem is that you did not use proper aliases in the subquery, leading MySQL to think that you are referring to the columns in the outer query. Try this version:
SELECT p1.*
FROM Person p1
WHERE CONCAT(p1.firstName, p1.lastName) IN (
SELECT CONCAT(p2.firstName, p2.lastName)
FROM Person p2
GROUP BY CONCAT(p2.firstName, p2.lastName)
HAVING COUNT(*) > 1
);

MySQL database | querying count() and select at the same time

I am using MySQL Workbench 5.7 to run this.
I am trying to get the result of this query:
SELECT COUNT(Users) FROM UserList.custumers;
and this query:
SELECT Users FROM UserList.custumers;
on the same table, meaning I want a list of users in one column and the total number of users in the other column.
When I try this:
SELECT Users , COUNT(Users) FROM UserList.custumers;
I get a single row with the right count but only the first user in my list.
You can use a cross join, since you know the count query returns exactly one row, whose value you want repeated on every row. Note that MySQL requires an alias on the derived table:
SELECT users, x.UserCount
FROM userlist.custumers
CROSS JOIN (SELECT COUNT(*) AS UserCount FROM userlist.custumers) x
Or you can run the count in the SELECT list. I prefer the first, as the count only has to be done once.
SELECT users, (SELECT count(*) cnt FROM userlist.custumers) as userCount
FROM userlist.custumers
Or, in an environment supporting window functions (MySQL added them in 8.0; earlier versions such as 5.7 lack them), you could use count(*) over (partition by 1) as userCount
The reason you're getting one row is MySQL's extension of GROUP BY: when you use aggregation without a GROUP BY clause and ONLY_FULL_GROUP_BY is not enabled, it picks a single value from each non-aggregated column to display. If you add a GROUP BY to your select, you will not get the count of all users. Thus the need for the inline select or the cross join.
Consider: -- 1 record not all users
SELECT Users , COUNT(Users) FROM UserList.custumers;
vs --all users wrong count
SELECT Users , COUNT(Users) FROM UserList.custumers group by users;
vs -- what I believe you're after
SELECT Users, x.usercount FROM UserList.custumers
CROSS JOIN (Select count(*) UserCount from userlist.custumers) x
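To see the cross join and the window-function variant side by side, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The three user names are invented, the UserList schema prefix is dropped because the in-memory SQLite database has no such schema, and COUNT(*) OVER () assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE custumers (Users TEXT);
    INSERT INTO custumers (Users) VALUES ('alice'), ('bob'), ('carol');
""")

# One-row derived table joined to every row of the list.
cross_join_rows = conn.execute("""
    SELECT c.Users, x.UserCount
    FROM custumers c
    CROSS JOIN (SELECT COUNT(*) AS UserCount FROM custumers) x
    ORDER BY c.Users
""").fetchall()

# Same result with a window over the whole table.
window_rows = conn.execute("""
    SELECT Users, COUNT(*) OVER () AS UserCount
    FROM custumers
    ORDER BY Users
""").fetchall()

print(cross_join_rows)
print(window_rows)
```

Both queries keep one row per user and repeat the total on each of them.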
Use a subquery in SELECT.
Select Users,
(SELECT COUNT(Users) FROM UserList.custumers) as total
FROM UserList.custumers;

SQLite select all records and count

I have the following table:
CREATE TABLE sometable (my_id INTEGER PRIMARY KEY AUTOINCREMENT, name STRING, number STRING);
Running this query:
SELECT * FROM sometable;
Produces the following output:
1|someone|111
2|someone|222
3|monster|333
Along with these three fields I would also like to include a count representing the amount of times the same name exists in the table.
I've obviously tried:
SELECT my_id, name, count(name) FROM sometable GROUP BY name;
though that will not give me an individual result row for every record.
Ideally I would have the following output:
1|someone|111|2
2|someone|222|2
3|monster|333|1
Where the 4th column represents the number of times this name exists.
Thanks for any help.
You can do this with a correlated subquery in the select clause:
Select st.*,
(SELECT count(*) from sometable st2 where st.name = st2.name) as NameCount
from sometable st;
You can also write this as a join to an aggregated subquery:
select st.*, stn.NameCount
from sometable st join
(select name, count(*) as NameCount
from sometable
group by name
) stn
on st.name = stn.name;
EDIT:
As for performance, the best way to find out is to try both and time them. The correlated subquery will work best when there is an index on sometable(name). Although aggregation is reputed to be slow in MySQL, sometimes this type of query gets surprisingly good results. The best answer is to test.
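Both forms can be tried against the sample data from the question with Python's sqlite3 module. This is only a sketch: the window-function query is a third option not mentioned above and assumes SQLite 3.25 or later, and a three-row table says nothing about relative speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sometable (my_id INTEGER PRIMARY KEY AUTOINCREMENT, name STRING, number STRING);
    INSERT INTO sometable (name, number) VALUES
        ('someone', '111'), ('someone', '222'), ('monster', '333');
""")

# Correlated subquery in the select list.
correlated = conn.execute("""
    SELECT st.*,
           (SELECT COUNT(*) FROM sometable st2 WHERE st.name = st2.name) AS NameCount
    FROM sometable st
    ORDER BY st.my_id
""").fetchall()

# Join to an aggregated subquery.
joined = conn.execute("""
    SELECT st.*, stn.NameCount
    FROM sometable st
    JOIN (SELECT name, COUNT(*) AS NameCount
          FROM sometable
          GROUP BY name) stn
      ON st.name = stn.name
    ORDER BY st.my_id
""").fetchall()

# Window function (assumption: SQLite 3.25+).
windowed = conn.execute("""
    SELECT st.*, COUNT(*) OVER (PARTITION BY name) AS NameCount
    FROM sometable st
    ORDER BY st.my_id
""").fetchall()

for row in correlated:
    print("|".join(str(value) for value in row))
```

All three return one row per record, with the per-name count as the fourth column.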
Select *, (SELECT count(my_id) from sometable) as total from sometable

Event Handler to get rows of tables MySql

I'm trying to make an event in phpmyadmin that will get the number of rows in a number of tables and then insert the results into another database. The only issue is I can't seem to be able to use mysql to count the rows and then also put them into the table. I've also tried to set mysql variables with the COUNT(). Here is the current code that I have:
INSERT INTO user_count (users,taps,statues,questions,friendships,expressions)
SELECT COUNT(*) from `users`,COUNT(*) from `taps`,COUNT(*) from `status`,
COUNT(*) from `questions`,COUNT(*) from `friends`,COUNT(*) from `expressions`;
You are almost there. Each count needs to be its own parenthesized subquery, separated by commas. The following query should work for you (the table names are taken from your query; the column names of user_count stay as they are):
INSERT INTO user_count (users, taps, statues, questions, friendships, expressions)
SELECT
(SELECT COUNT(*) from `users`),
(SELECT COUNT(*) from `taps`),
(SELECT COUNT(*) from `status`),
(SELECT COUNT(*) from `questions`),
(SELECT COUNT(*) from `friends`),
(SELECT COUNT(*) from `expressions`)
;
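The pattern of one scalar subquery per column can be checked on a cut-down schema. The sketch below uses Python's sqlite3 module as a stand-in for MySQL and only two of the six tables; the table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE taps (id INTEGER PRIMARY KEY);
    CREATE TABLE user_count (users INTEGER, taps INTEGER);
    INSERT INTO users DEFAULT VALUES;
    INSERT INTO users DEFAULT VALUES;
    INSERT INTO taps DEFAULT VALUES;
""")

# Each COUNT(*) is a separate parenthesized scalar subquery; commas
# separate the subqueries just like ordinary select-list items.
conn.execute("""
    INSERT INTO user_count (users, taps)
    SELECT
        (SELECT COUNT(*) FROM users),
        (SELECT COUNT(*) FROM taps)
""")

snapshot = conn.execute("SELECT users, taps FROM user_count").fetchall()
print(snapshot)
```

Each run of the INSERT adds one row holding the counts at that moment, which is the shape a scheduled event would produce.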

How to count the number of entries in two selects?

Semi-newbyism ahead: I need to do two selects and count the number of items in both of them. Here's a bad example of what I thought would work --
sum(
select count(*) as count1 from users where name = 'John'
union
select count(*) as count1 from users where name = 'Mary'
) as theCount
(This is, as I said, a BAD example, since I could obviously write this as a single select with an appropriate WHERE clause. In what I really have to do, the two things are such that I can't do them as a single select, or at least I haven't yet found a way to do so.)
Anyway, I think what I'm trying to do is clear: the select-union-select bit returns a column containing the counts of the two selects; that part works fine. I thought that wrapping them in a SUM() would get me what I wanted, but it's throwing a syntax error. The right thing is probably trivial, but I just don't see it. Any thoughts out there? Thanks!
For generic selects that you can't necessarily write with one where:
SELECT sum(count1) as totalcount FROM (
select count(*) as count1 from users where name = 'John'
union all
select count(*) as count1 from users where name = 'Mary'
) as theCount
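The switch from UNION to UNION ALL matters: plain UNION removes duplicate rows, so two selects that happen to return the same count collapse into one row before the SUM. A minimal sketch with Python's sqlite3 module (the sample rows are invented, and SQLite stands in for whichever database is actually in use):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT);
    INSERT INTO users (name) VALUES ('John'), ('John'), ('Mary'), ('Mary'), ('Zed');
""")

def total(set_operator):
    """Sum the two counts, combined with the given set operator."""
    sql = f"""
        SELECT SUM(count1) AS totalcount FROM (
            SELECT COUNT(*) AS count1 FROM users WHERE name = 'John'
            {set_operator}
            SELECT COUNT(*) AS count1 FROM users WHERE name = 'Mary'
        ) AS theCount
    """
    return conn.execute(sql).fetchone()[0]

total_union_all = total("UNION ALL")  # both count rows are kept
total_union = total("UNION")          # identical count rows are merged first
print(total_union_all, total_union)
```

With two Johns and two Marys, UNION ALL sums both counts, while UNION sums only the single surviving row.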
select count(*) as count1 from users where name in ('John','Mary')
This is another alternative
select ( select count(*) as count1 from users where name = 'John')
+
( select count(*) as count1 from users where name = 'Mary') as total
Another possible solution:
select
sum(if(name='John',1,0)) as tot_John,
sum(if(name='Mary',1,0)) as tot_Mary,
sum(if(name in ('John','Mary'),1,0)) as total
from users