MySQL: Finding repeated names in my User table

I want to find all users whose name appears at least twice in my User table. 'email' is a unique field, but the combination of 'firstName' and 'lastName' is not necessarily unique.
So far I have come up with the following query, which is very slow, and I am not even sure it is correct. Please let me know a better way to rewrite this.
SELECT CONCAT(u2.firstName, u2.lastName) AS fullName
FROM cpnc_User u2
WHERE CONCAT(u2.firstName, u2.lastName) IN (
SELECT CONCAT(u2.firstName, u2.lastName) AS fullNm
FROM cpnc_User u1
GROUP BY fullNm
HAVING COUNT(*) > 1
)
Also, note that the above returns the list of names that appear at least twice (I think so, anyway), but what I really want is the complete list of all user 'id' fields for these names. So each name, since it appears at least twice, will be associated with at least two primary key 'id' fields.
Thanks for any help!
Jonah

SELECT u.*
FROM cpnc_User u JOIN
(
SELECT firstName, lastName
FROM cpnc_User
GROUP BY firstName, lastName
HAVING COUNT(*) > 1
) x ON x.firstName = u.firstName AND x.lastName = u.lastName
ORDER BY u.firstName, u.lastName
There is no need to build a concatenated field; just compare the two fields separately.
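A minimal sketch of the join-on-grouped-duplicates idea, run against an in-memory SQLite database from Python so it can be tried without a MySQL server. The sample rows are made up for illustration; only the table and column names come from the question. Comparing the two columns separately also avoids a pitfall of concatenation without a separator, where 'ab' + 'c' and 'a' + 'bc' would look like the same name.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cpnc_User ("
    "id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT, email TEXT UNIQUE)"
)
conn.executemany(
    "INSERT INTO cpnc_User (id, firstName, lastName, email) VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "ann1@example.com"),
        (2, "Bob", "Ray", "bob1@example.com"),
        (3, "Ann", "Lee", "ann2@example.com"),
        (4, "Cy", "Orr", "cy@example.com"),
        (5, "Ann", "Lee", "ann3@example.com"),
        (6, "Bob", "Ray", "bob2@example.com"),
    ],
)

# The derived table lists each (firstName, lastName) pair that occurs more
# than once; joining back to the base table returns every matching row.
DUPLICATE_USERS_SQL = """
SELECT u.id, u.firstName, u.lastName
FROM cpnc_User u
JOIN (
    SELECT firstName, lastName
    FROM cpnc_User
    GROUP BY firstName, lastName
    HAVING COUNT(*) > 1
) x ON x.firstName = u.firstName AND x.lastName = u.lastName
ORDER BY u.firstName, u.lastName, u.id
"""

duplicate_rows = conn.execute(DUPLICATE_USERS_SQL).fetchall()
for row in duplicate_rows:
    print(row)
```

On MySQL, an index on (firstName, lastName) would likely help both the grouping and the join, though that is worth confirming with EXPLAIN on the real table.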

SELECT u.id, u.firstName, u.lastName
FROM cpnc_User u, (
SELECT uc.firstName, uc.lastName
FROM cpnc_User uc
GROUP BY uc.firstName, uc.lastName
HAVING count(*) > 1
) u2
WHERE (
u.firstName = u2.firstName
AND u.lastName = u2.lastName
)

To experiment, I created a simple table with two columns: a user id and a name. I inserted a bunch of records, including some duplicates, and then ran this query:
SELECT
count(id) AS count,
group_concat(id) as IDs
FROM
test
GROUP BY
`name`
ORDER BY
count DESC
It should give you results like this:
+-------+----------+
| count | IDs |
+-------+----------+
| 4 | 7,15,4,1 |
| 2 | 2,8 |
| 2 | 6,13 |
| 2 | 14,9 |
| 1 | 11 |
| 1 | 10 |
| 1 | 3 |
| 1 | 5 |
| 1 | 17 |
| 1 | 12 |
| 1 | 16 |
+-------+----------+
You'll need to filter out the rows with a count of 1, for example by adding HAVING COUNT(*) > 1 after the GROUP BY clause.
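Here is a rough sketch of the same GROUP_CONCAT idea with the non-duplicates filtered out in SQL. It uses Python's sqlite3 module and invented sample rows, so treat it as an illustration of the query shape rather than a MySQL benchmark. SQLite's GROUP_CONCAT behaves much like MySQL's here, but the order of ids inside the string is not guaranteed, which is why the code sorts them afterwards.

```python
import sqlite3

# Hypothetical test table with a few repeated names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO test (id, name) VALUES (?, ?)",
    [(1, "ann"), (2, "bob"), (3, "cy"), (4, "ann"),
     (5, "dee"), (6, "bob"), (7, "ann")],
)

# HAVING drops the groups that contain a single row.
rows = conn.execute("""
SELECT name, COUNT(id) AS cnt, GROUP_CONCAT(id) AS ids
FROM test
GROUP BY name
HAVING COUNT(*) > 1
ORDER BY cnt DESC
""").fetchall()

# GROUP_CONCAT returns one comma-separated string per group;
# split it back into a sorted list of integer ids.
duplicates = {
    name: sorted(int(i) for i in ids.split(","))
    for name, cnt, ids in rows
}
print(duplicates)
```

One thing to keep in mind on MySQL: GROUP_CONCAT output is truncated at group_concat_max_len, so for names with very many ids the join-based queries above are the safer choice.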

SELECT u.id
, CONCAT(u.firstName, ' ', u.lastName) AS fullname
FROM cpnc_User u
JOIN
( SELECT min(id) AS minid
, firstName
, lastName
FROM cpnc_User
GROUP BY firstName, lastName
HAVING COUNT(*) > 1
) AS grp
ON u.firstName = grp.firstName
AND u.lastName = grp.lastName
ORDER BY grp.minid
, u.id
The ORDER BY grp.minid ensures that users with same first and last name stay grouped together in the output.

OK, you are concatenating the names and then comparing on the result, which essentially means the DB has to compute something for every single row of the table.
How about a slightly different approach? You are holding surname and first name separately, so first select all the rows whose surname appears more than once in the table. This cuts your population down dramatically.
Then you can compare on the first name to find out where the matches are.

Related

Can this query, which groups users by amount of comments posted, be simplified?

Two tables are used in this query, and all that matters in the result is the number of users which have or haven't posted any comments so far. The table user of course has the column id, which is the foreign key in the table comment, identified by the column user_id.
The first super-simple query groups users by whether or not they have any comments so far. It outputs two rows (a row with the user count who have comments, and a row with the user count who have no comments), with two columns (number of users, and whether or not they have posted any comments).
SELECT
COUNT(id) AS user_count,
IF( id IN ( SELECT user_id FROM `comment` ), 1, 0) AS has_comment
FROM `user`
GROUP BY has_comment
An example of how the output would look like here:
+------------+-------------+
| user_count | has_comment |
+------------+-------------+
| 150 | 0 |
| 140 | 1 |
+------------+-------------+
Now here comes my question. I want slightly more information here, by grouping these users into 3 groups instead:
Users that have posted no comments
Users that have posted fewer than 10 comments
Users that have posted 10 or more comments
And the best query that I know how to write for this purpose is as follows, which works, but unfortunately runs 4 subqueries and has 2 derived tables:
SELECT
COUNT(id) AS user_count,
CASE
WHEN id IN ( SELECT user_id FROM ( SELECT COUNT(user_id) AS comment_count, user_id FROM `comment` GROUP BY user_id HAVING comment_count >= 10 ) AS a) THEN '10 or more'
WHEN id IN ( SELECT user_id FROM ( SELECT COUNT(user_id) AS comment_count, user_id FROM `comment` GROUP BY user_id HAVING comment_count < 10 ) AS b) THEN 'less than 10'
ELSE 'none'
END AS has_comment
FROM `user`
GROUP BY has_comment
An example of the output here would be something like:
+------------+-------------+
| user_count | has_comment |
+------------+-------------+
| 150 | none |
| 130 | less than 10|
| 100 | 10 or more |
+------------+-------------+
This second query; can it be written more simply and efficiently, and still produce the same kind of result? (potentially maybe even be expanded into more of these kinds of "groups")

You can use two levels of aggregation:
select
count(*) no_users,
case
when no_comments = 0 then 'none'
when no_comments < 10 then 'less than 10'
else '10 or more'
end has_comment
from (
select
u.id,
(select count(*) from `comment` c where c.user_id = u.id) no_comments
from `user` u
) t
group by has_comment
order by min(no_comments)
The subquery counts how many comments each user has (you could also express this with a left join and aggregation); then the outer query classifies the users and counts them per comment bucket.
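The left join and aggregation variant mentioned above could look roughly like this. It is a sketch run on SQLite through Python, with made-up users and comment counts; the table and column names follow the question, everything else is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "user" (id INTEGER PRIMARY KEY);
CREATE TABLE "comment" (id INTEGER PRIMARY KEY, user_id INTEGER);
""")

# Hypothetical data: users 1 and 4 have no comments, user 2 has 3,
# user 3 has 12 and user 5 has exactly 10.
conn.executemany('INSERT INTO "user" (id) VALUES (?)', [(i,) for i in range(1, 6)])
conn.executemany(
    'INSERT INTO "comment" (user_id) VALUES (?)',
    [(2,)] * 3 + [(3,)] * 12 + [(5,)] * 10,
)

# Inner level: one row per user with the comment count (0 for users
# without comments, because COUNT(c.id) ignores the NULLs from the left join).
# Outer level: classify each user and count users per bucket.
BUCKETS_SQL = """
SELECT
    CASE
        WHEN t.no_comments = 0 THEN 'none'
        WHEN t.no_comments < 10 THEN 'less than 10'
        ELSE '10 or more'
    END AS has_comment,
    COUNT(*) AS user_count
FROM (
    SELECT u.id, COUNT(c.id) AS no_comments
    FROM "user" u
    LEFT JOIN "comment" c ON c.user_id = u.id
    GROUP BY u.id
) t
GROUP BY has_comment
ORDER BY MIN(t.no_comments)
"""

buckets = conn.execute(BUCKETS_SQL).fetchall()
print(buckets)
```

Adding another bucket is just another WHEN branch in the CASE expression, which is what makes this shape easier to extend than one subquery per group.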

sum of count(*) for all rows in MySQL

I'm stuck with sum() query where I want the sum of count(*) values in all rows with group by.
Here is the query:
select
u.user_type as user,
u.count,
sum(u.count)
FROM
(
select
DISTINCT
user_type,
count(*) as count
FROM
users
where
(user_type = "driver" OR user_type = "passenger")
GROUP BY
user_type
) u;
Current Output:
----------------------------------
| user | count | sum |
----------------------------------
| driver | 58 | 90 |
----------------------------------
Expected Output:
----------------------------------
| user | count | sum |
----------------------------------
| driver | 58 | 90 |
| passenger | 32 | 90 |
----------------------------------
If I remove sum(u.count) from query then output is looks like:
--------------------------
| user | count |
--------------------------
| driver | 58 |
| passenger | 32 |
--------------------------

You need a subquery:
SELECT user_type,
Count(*) AS count,
(SELECT COUNT(*)
FROM users
WHERE user_type IN ("driver","passenger" )) as sum
FROM users
WHERE user_type IN ("driver","passenger" )
GROUP BY user_type ;
Note that you don't need DISTINCT here.
Or:
SELECT user_type,
Count(*) AS count,
c.sum
FROM users
CROSS JOIN (
SELECT COUNT(*) as sum
FROM users
WHERE user_type IN ("driver","passenger" )
) as c
WHERE user_type IN ("driver","passenger" )
GROUP BY user_type ;

You can use WITH ROLLUP modifier:
select coalesce(user_type, 'total') as user, count(*) as count
from users
where user_type in ('driver', 'passenger')
group by user_type with rollup
This will return the same information but in a different format:
user | count
----------|------
driver | 32
passenger | 58
total | 90
In MySQL 8 you can use COUNT() as window function:
select distinct
user_type,
count(*) over (partition by user_type) as count,
count(*) over () as sum
from users
where user_type in ('driver', 'passenger');
Result:
user_type | count | sum
----------|-------|----
driver | 32 | 90
passenger | 58 | 90
or use CTE (Common Table Expressions):
with cte as (
select user_type, count(*) as count
from users
where user_type in ('driver', 'passenger')
group by user_type
)
select user_type, count, (select sum(count) from cte) as sum
from cte

I would be tempted to ask: are you sure you need this at the DB level?
Unless you are working purely in the database layer, any processing of these results will be built into an application layer and will presumably require some form of looping through the results.
It could be easier, simpler, and more readable to run
SELECT user_type,
COUNT(*) AS count
FROM users
WHERE user_type IN ("driver", "passenger")
GROUP BY user_type
... and simply add up the total count in the application layer.
As pointed out by Juan in another answer, the DISTINCT is redundant, as the GROUP BY ensures that each resulting row is different.
Like Juan, I also prefer an IN here rather than an OR condition for the user_type, as I find it more readable. It also reduces the likelihood of confusion if further AND conditions are combined in the future.
As an aside, I would consider moving the names of the user types, "driver" and "passenger", into a separate user_types table and referencing them by an ID column from your users table.
N.B. If you absolutely do need this at the DB level, I would advocate using one of Paul's excellent options, or the CROSS JOIN approach proffered by Tom Mac, and by Juan as his second suggested solution.
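To illustrate the application-layer route, here is a small sketch in Python with sqlite3. The application language and the sample row counts are assumptions (the counts are chosen to match the question's expected output); the query itself is the plain GROUP BY shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, user_type TEXT)")

# Hypothetical data: 58 drivers, 32 passengers and a few rows of another type.
conn.executemany(
    "INSERT INTO users (user_type) VALUES (?)",
    [("driver",)] * 58 + [("passenger",)] * 32 + [("admin",)] * 5,
)

# One simple grouped query; no subquery, window function or join needed.
counts = dict(conn.execute("""
SELECT user_type, COUNT(*) AS count
FROM users
WHERE user_type IN ('driver', 'passenger')
GROUP BY user_type
""").fetchall())

# The total is computed once in the application while handling the rows.
total = sum(counts.values())
for user_type, count in counts.items():
    print(user_type, count, total)
```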

Try this. The inline view gets the overall total:
SELECT a.user_type,
count(*) AS count,
b.sum
FROM users a
JOIN (SELECT COUNT(*) as sum
FROM users
WHERE user_type IN ("driver","passenger" )
) b ON TRUE
WHERE a.user_type IN ("driver","passenger" )
GROUP BY a.user_type;

You could simply combine SUM() OVER() with COUNT(*):
SELECT user_type, COUNT(*) AS cnt, SUM(COUNT(*)) OVER() AS total
FROM users WHERE user_type IN ('driver', 'passenger') GROUP BY user_type;
Output:
+------------+------+-------+
| user_type | cnt | total |
+------------+------+-------+
| passenger | 58 | 90 |
| driver | 32 | 90 |
+------------+------+-------+

Add a GROUP BY clause on user_type at the end, e.g.:
select
u.user_type as user,
u.count,
sum(u.count)
FROM
(
select
DISTINCT
user_type,
count(*) as count
FROM
users
where
(user_type = "driver" OR user_type = "passenger")
GROUP BY
user_type
) u GROUP BY u.user_type;

Tom Mac's answer explains this properly. Here is another way you can do it.
I checked the query performance and found no difference with 1,000 records.
select user_type,Countuser,(SELECT COUNT(*)
FROM users
WHERE user_type IN ('driver','passenger') )as sum from (
select user_type,count(*) as Countuser from users a
where a.user_type='driver'
group by a.user_type
union
select user_type,count(*) as Countuser from users b
where b.user_type='passenger'
group by b.user_type
)c
group by user_type,Countuser

Try this:
WITH SUB_Q AS (
SELECT USER_TYPE, COUNT(*) AS CNT
FROM USERS
WHERE USER_TYPE = 'passenger' OR USER_TYPE = 'driver'
GROUP BY USER_TYPE
),
SUB_Q2 AS (
SELECT SUM(CNT) AS SUM_OF_COUNT
FROM SUB_Q
)
SELECT A.USER_TYPE, A.CNT AS COUNT, B.SUM_OF_COUNT AS SUM
FROM SUB_Q A JOIN SUB_Q2 B ON (TRUE);
I used the PostgreSQL dialect, but you can easily change it to a subquery.

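select
u.user_type as user,
count(*) as count,
(select count(*) from users where user_type in ('driver', 'passenger')) as sum
FROM users u
where u.user_type in ('driver', 'passenger')
group by u.user_type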

Query: I have 4 rows, need to add the results from 3 rows into one, and leave the last row untouched

I have a kind of tricky question for this query. First the code:
SELECT user_type.user_type_description,COUNT(incident.user_id) as Quantity
FROM incident
INNER JOIN user ON incident.user_id=user.user_id
INNER JOIN user_type ON user.user_type=user_type.user_type
WHERE incident.code=2
GROUP BY user.user_type
What am I doing?
For example, I am counting police reports of robbery made by different kinds of users. In my example, "admin" users reported 6 incidents of code 2 (robbery), and so on, as shown in the WHERE clause (the incident must be a robbery, i.e. code 2).
This brings the following result:
+-----------------------+----------+
| user_type_description | Quantity |
+-----------------------+----------+
| Admin | 6 |
| Moderator | 8 |
| Fully_registered_user | 8 |
| anonymous_user | 9 |
+-----------------------+----------+
Basically, Admin, Moderator and Fully_registered_user are properly registered users. I need to add them up in a result that looks like this:
+--------------+------------+
| Proper_users | Anonymous |
+--------------+------------+
| 22 | 9 |
+--------------+------------+
I am not good with sql. Any help is appreciated. Thanks.

You can use a conditional aggregate function on top of your current result set:
SUM with a CASE WHEN expression.
SELECT SUM(CASE WHEN user_type_description IN ('Admin','Moderator','Fully_registered_user') THEN Quantity END) Proper_users,
SUM(CASE WHEN user_type_description = 'anonymous_user' THEN Quantity END) Anonymous
FROM (
SELECT user_type.user_type_description,COUNT(incident.user_id) as Quantity
FROM incident
INNER JOIN user ON incident.user_id=user.user_id
INNER JOIN user_type ON user.user_type=user_type.user_type
WHERE incident.code=2
GROUP BY user.user_type
) t1

You just need conditional aggregation:
SELECT SUM( ut.user_type_description IN ('Admin', 'Moderator', 'Fully_registered_user') ) as Proper_users,
SUM( ut.user_type_description IN ('anonymous_user') ) as anonymous
FROM incident i INNER JOIN
user u
ON i.user_id = u.user_id INNER JOIN
user_type ut
ON u.user_type = ut.user_type
WHERE i.code = 2;
Notes:
Table aliases make the query easier to write and to read.
This uses a MySQL shortcut for adding values: just adding up the boolean expressions, which evaluate to 1 or 0.
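For anyone who wants to try conditional aggregation without the original schema, here is a scaled-down sketch using Python's sqlite3 module. The table and column names follow the question; the rows are invented, and the query uses the portable SUM(CASE ...) spelling rather than the MySQL boolean shortcut.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_type (user_type INTEGER PRIMARY KEY, user_type_description TEXT);
CREATE TABLE "user" (user_id INTEGER PRIMARY KEY, user_type INTEGER);
CREATE TABLE incident (id INTEGER PRIMARY KEY, user_id INTEGER, code INTEGER);
INSERT INTO user_type VALUES (1, 'Admin'), (2, 'Moderator'),
    (3, 'Fully_registered_user'), (4, 'anonymous_user');
INSERT INTO "user" VALUES (1, 1), (2, 2), (3, 3), (4, 4);
""")

# Hypothetical incidents as (user_id, code): six code-2 reports from
# registered users, four from the anonymous user, plus some code-1 noise.
incidents = (
    [(1, 2)] * 2 + [(2, 2)] * 1 + [(3, 2)] * 3 + [(4, 2)] * 4
    + [(4, 1)] * 2 + [(1, 1)]
)
conn.executemany("INSERT INTO incident (user_id, code) VALUES (?, ?)", incidents)

# Each SUM adds 1 for the rows that match its condition and 0 otherwise,
# so both totals come back in a single row.
PIVOT_SQL = """
SELECT
    SUM(CASE WHEN ut.user_type_description
             IN ('Admin', 'Moderator', 'Fully_registered_user')
        THEN 1 ELSE 0 END) AS Proper_users,
    SUM(CASE WHEN ut.user_type_description = 'anonymous_user'
        THEN 1 ELSE 0 END) AS Anonymous
FROM incident i
JOIN "user" u ON i.user_id = u.user_id
JOIN user_type ut ON u.user_type = ut.user_type
WHERE i.code = 2
"""

proper_users, anonymous = conn.execute(PIVOT_SQL).fetchone()
print(proper_users, anonymous)
```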

I would solve it with a CTE (written here in SQL Server syntax, with bracket-quoted identifiers), but it would be better to have this association in a table.
WITH
user_type_categories
AS
(
SELECT 'Admin' AS [user_type_description] , 'Proper_users' AS [user_type_category]
UNION SELECT 'Moderator' AS [user_type_description] , 'Proper_users' AS [user_type_category]
UNION SELECT 'Fully_registered_user' AS [user_type_description] , 'Proper_users' AS [user_type_category]
UNION SELECT 'anonymous_user' AS [user_type_description] , 'Anonymous' AS [user_type_category]
)
SELECT
COUNT(CASE WHEN utc.[user_type_category] = 'Proper_users' THEN 1 END) AS [Proper_Users_Quantity]
, COUNT(CASE WHEN utc.[user_type_category] = 'Anonymous' THEN 1 END) AS [Anonymous_Quantity]
FROM
[incident]
INNER JOIN [user] ON [incident].[user_id] = [user].[user_id]
INNER JOIN [user_type] ON [user].[user_type] = [user_type].[user_type]
LEFT JOIN user_type_categories AS utc ON utc.[user_type_description] = [user_type].[user_type_description]
WHERE
[incident].[code] = 2

How to nest two Queries with Eloquent Laravel

I have a table that contains multiple visits per visitor. In my result list, only the last visit of each visitor should be shown.
The table looks like:
id | created_at | descr | website | source
------------------------------------------
2 | 2017_12_22 | john | john1.com | a
3 | 2017_12_23 | marc | ssdff.com | b
4 | 2017_12_24 | john | john1.com | c
5 | 2017_12_24 | tina | def.com | b
6 | 2017_12_25 | stef | abc.com | a
7 | 2017_12_26 | john | john2.com | c
If I do an
->orderBy('visitors.created_at', 'desc')
->groupBy('visitors.descr')
the result list is correct as only each visitor is shown once. But I need the latest occurrence of a visitor and the result list shows the first occurrence. So instead of ID 2 I would like to see ID 7.
The following query solves that issue:
select *
from visitors as v
INNER JOIN ( select descr , MAX(created_at) as max_created
from visitors as v
group by descr ) AS M
where v.descr = M.descr AND v.created_at = M.max_created
ORDER BY created_at DESC
Could someone help get this into eloquent or get me into another direction?

If you can use the AUTO_INCREMENT id column instead of created_at, you can use another way to get the (same) result:
select *
from visitors
where id in (
select MAX(id)
from visitors
group by descr
)
ORDER BY id DESC
This query is probably less efficient, but it's easier to use with eloquent, since whereIn() supports subqueries.
$subQuery = Visitor::groupBy('descr')->select(DB::raw('max(id)'));
$latestVisitors = Visitor::whereIn('id', $subQuery)->orderBy('id', 'desc')->get();
Note that your original query can return multiple rows per descr, if the value in created_at is the same. This can't happen using id since it is unique.
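To sanity-check the MAX(id) query outside Laravel, here is a quick sketch that loads the question's sample rows into an in-memory SQLite database from Python. It only exercises the SQL; it says nothing about how Eloquent builds the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visitors ("
    "id INTEGER PRIMARY KEY, created_at TEXT, descr TEXT, website TEXT, source TEXT)"
)

# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO visitors VALUES (?, ?, ?, ?, ?)",
    [
        (2, "2017_12_22", "john", "john1.com", "a"),
        (3, "2017_12_23", "marc", "ssdff.com", "b"),
        (4, "2017_12_24", "john", "john1.com", "c"),
        (5, "2017_12_24", "tina", "def.com", "b"),
        (6, "2017_12_25", "stef", "abc.com", "a"),
        (7, "2017_12_26", "john", "john2.com", "c"),
    ],
)

# The subquery picks the highest id per visitor; the outer query
# returns the full row for each of those ids.
latest = conn.execute("""
SELECT id, descr, website
FROM visitors
WHERE id IN (
    SELECT MAX(id)
    FROM visitors
    GROUP BY descr
)
ORDER BY id DESC
""").fetchall()

for row in latest:
    print(row)
```

This relies on the assumption stated above, namely that a higher id always means a later visit.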

Why don't you just run the SQL directly, like this:
DB::select("
select *
from visitors as v
INNER JOIN ( select descr , MAX(created_at) as max_created
from visitors as v
group by descr ) AS M
where v.descr = M.descr AND v.created_at = M.max_created
ORDER BY created_at DESC
");
With the query builder, the query would be something like this:
DB::table('visitors as v')
->join(DB::raw("( select descr , MAX(created_at) as max_created
from visitors as v
group by descr ) AS M "), function ($join) {
// compare column to column; where('v.descr', 'M.descr') would compare against a string
$join->on('v.descr', '=', 'M.descr')
->on('v.created_at', '=', 'M.max_created');
})
->select('*')
->orderBy('v.created_at','desc')
->get();

You can write a helper relationship in your Visitor model
public function lastVisit()
{
return $this->hasOne('VisitModel')->latest();
}
and then just fetch all visitors with their last visit
$visitors = Visitor::with('lastVisit')->get();
That's really all there is to it.

SELECT Only Records With Duplicate (Column A || Column B) But Different (Column C) Values

I apologize for the confusing title, I can't figure out the proper wording for this question. Instead, I'll just give you the background info and the goal:
This is in a table where a person may or may not have multiple rows of data, and those rows may contain the same value for the activity_id, or may not. Each row has an auto-incremented ID. The people do not have a unique identifier attached to their names, so we can only use first_name/last_name to identify a person.
I need to be able to find the people that have multiple rows in this table, but only the ones who have multiple rows that contain more than one different activity_id.
Here's a sample of the data we're looking through:
unique_id | first_name | last_name | activity_id
---------------------------------------------------------------
1 | ted | stevens | 544
2 | ted | stevens | 544
3 | ted | stevens | 545
4 | ted | stevens | 546
5 | rachel | jameson | 633
6 | jennifer | tyler | 644
7 | jennifer | tyler | 655
8 | jennifer | tyler | 655
9 | jack | fillion | 544
10 | mallory | taylor | 633
11 | mallory | taylor | 633
From that small sample, here are the records I would want returned:
unique_id | first_name | last_name | activity_id
---------------------------------------------------------------
dontcare | ted | stevens | 544
dontcare | jennifer | tyler | 655
Note that which value of unique_id gets returned is irrelevant, as long as it's one of the unique_ids belonging to that person, and as long as only one record is returned for that person.
Can anyone figure out how to write a query like this? I don't care what version of SQL you use, I can probably translate it into Oracle if it's somehow different.

I would do:
SELECT first_name, last_name, COUNT(DISTINCT activity_id)
FROM <table_name>
GROUP BY first_name, last_name
HAVING COUNT(DISTINCT activity_id) > 1;

I'll build up the logic with you. First, let's get the unique list of name + activity ID combinations:
select first_name, last_name,activity_id, count(1)
from yourtable
group by first_name, last_name,activity_id
Now we'll turn that into a subquery and look for users with more than one activity_id:
Select first_name, last_name
from
(select first_name, last_name,activity_id, count(1)
from yourtable
group by first_name, last_name,activity_id) a
group by first_name, last_name
having count(1) > 1
That should work. I didn't return an activity_id; adding max(activity_id) to the select statement will grab the highest one.
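Here is a sketch of the two-step query with max(activity_id) added, run on the question's sample data via Python's sqlite3 module. The table name person is an assumption. For comparison it also runs a single-level COUNT(DISTINCT activity_id) version, which should return the same people.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    "unique_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, activity_id INTEGER)"
)

# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [
        (1, "ted", "stevens", 544), (2, "ted", "stevens", 544),
        (3, "ted", "stevens", 545), (4, "ted", "stevens", 546),
        (5, "rachel", "jameson", 633), (6, "jennifer", "tyler", 644),
        (7, "jennifer", "tyler", 655), (8, "jennifer", "tyler", 655),
        (9, "jack", "fillion", 544), (10, "mallory", "taylor", 633),
        (11, "mallory", "taylor", 633),
    ],
)

# Step 1 (inner): one row per distinct name + activity_id combination.
# Step 2 (outer): keep the names that have more than one such row.
two_step = conn.execute("""
SELECT first_name, last_name, MAX(activity_id) AS activity_id
FROM (
    SELECT first_name, last_name, activity_id
    FROM person
    GROUP BY first_name, last_name, activity_id
) a
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
ORDER BY first_name
""").fetchall()

# Equivalent single-level form using COUNT(DISTINCT ...).
one_step = conn.execute("""
SELECT first_name, last_name, MAX(activity_id) AS activity_id
FROM person
GROUP BY first_name, last_name
HAVING COUNT(DISTINCT activity_id) > 1
ORDER BY first_name
""").fetchall()

print(two_step)
```

Mallory Taylor has two rows but only one activity_id, so she is correctly left out by both forms.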

Note that which value of unique_id gets returned is irrelevant, as long as it's one of the unique_ids belonging to that person, and as long as only one record is returned for that person.
These queries should do the trick. There is no need for DISTINCT keywords or a subquery to fetch the results BumbleShrimp needs (if BumbleShrimp also needed the correct unique_id, a subquery would be needed to match the right value).
Below is the simplest query I could think of that should work, but it could be slow on large tables.
SELECT
first_name
, last_name
, activity_id
FROM
person
GROUP BY
first_name
, last_name
, activity_id
HAVING COUNT(*) >= 2
It could be slow because EXPLAIN shows "Using index; Using temporary; Using filesort".
Using temporary could trigger a disk-based temporary table, so we use an inner self join to remove the need for Using temporary.
SELECT
person1.first_name
, person1.last_name
, person1.activity_id
FROM
person person1
INNER JOIN
person person2
ON
person1.unique_id < person2.unique_id
AND
person1.first_name = person2.first_name
AND
person1.last_name = person2.last_name
AND
person1.activity_id = person2.activity_id
ORDER BY
activity_id asc
See demo http://sqlfiddle.com/#!2/fe3ba/29
Side note: the inner join will fail if there are three or more duplicates;
see demo http://sqlfiddle.com/#!2/1ff33/15
New query:
SELECT
first_name
, last_name
, activity_id
FROM
person
GROUP BY
activity_id
, last_name
, first_name
HAVING COUNT(activity_id) >= 2
ORDER BY
activity_id asc
See demo http://sqlfiddle.com/#!2/1e418/3 (this fixes the three-or-more-duplicates problem, orders by activity_id correctly, and can be used on large tables because there is no need for a temporary table, which can slow down the execution).

To get only the names, the simplest is:
SELECT
first_name
, last_name
FROM
person
GROUP BY
first_name
, last_name
HAVING
COUNT(DISTINCT activity_id) >= 2 ;
To get one row for every name, you can use window functions (this works fine in Oracle):
WITH cte AS
( SELECT
unique_id, first_name, last_name, activity_id
, COUNT(DISTINCT activity_id) OVER (PARTITION BY last_name, first_name)
AS cnt
, MIN(unique_id) OVER (PARTITION BY last_name, first_name)
AS min_id
FROM
person
)
SELECT
unique_id, first_name, last_name, activity_id
FROM
cte
WHERE
cnt >= 2
AND
min_id = unique_id ;
Instead of MIN(unique_id) OVER ..., you could use MIN(activity_id) OVER ... (or MAX()) and accordingly min_id = activity_id. Or ROW_NUMBER() function. Since you need the COUNT(DISTINCT activity_id) anyway, let me add this version.
With an index on (last_name, first_name, activity_id, unique_id) it should be quite efficient:
WITH cte AS
( SELECT
unique_id, first_name, last_name, activity_id
, COUNT(DISTINCT activity_id) OVER (PARTITION BY last_name, first_name)
AS cnt
, ROW_NUMBER() OVER (PARTITION BY last_name, first_name
ORDER BY activity_id, unique_id)
AS rown
FROM
person
)
SELECT
unique_id, first_name, last_name, activity_id
FROM
cte
WHERE
cnt >= 2
AND
rown = 1 ;
Tested at SQL-Fiddle
