Merge based on "group by" groups - mysql

I have a table called Activities with the schema (user_id, activity).
There is one row for each user/activity combination.
Here is what it might look like (empty rows added to make things easier to look at; please ignore them):
| user_id | activity |
|---------|-----------|
| 1 | swimming | -- We want to match this
| 1 | running | -- person's activities
| | |
| 2 | swimming |
| 2 | running |
| 2 | rowing |
| | |
| 3 | swimming |
| | |
| 4 | skydiving |
| 4 | running |
| 4 | swimming |
I would like to find all other users who have at least the same activities as a given input id, so that I can recommend users with similar activities.
In the table above, if I want recommended users for user_id=1, the query should return user_id=2 and user_id=4, because they engage in both swimming and running (and more), but not user_id=3, because they only engage in swimming.
So a result with a single column of:
| user_id |
|---------|
| 2 |
| 4 |
is what I would ideally be looking for
As for what I've tried, I am stuck on how to get a solid set of user_id=1's activities to match against. Basically I'm looking for something along the lines of:
SELECT user_id from Activities
GROUP BY user_id
HAVING input_user_activities in user_x_activities
where input_user_activities is just the set of our input user's activities. I can create that set using a WITH input_user_activities AS (...) at the beginning; what I'm stuck on is the user_x_activities part.
Any thoughts?

To get users with the same activities, you can use a self join. Let me assume that the rows are unique:
select a.user_id
from activities a1 join
activities a
on a1.activity = a.activity and
a1.user_id = @user_id
where a.user_id <> @user_id -- leave out the input user
group by a.user_id
having count(*) = (select count(*) from activities a1 where a1.user_id = @user_id);
The having clause answers your question: it keeps only the users who share every one of the given user's activities.
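For anyone who wants to try the self join on the sample table, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs without a server), and the function name users_with_all_activities is made up for illustration.

```python
import sqlite3

def users_with_all_activities(conn, user_id):
    """Return ids of other users whose activities include all of user_id's."""
    sql = """
        SELECT a.user_id
        FROM activities a1
        JOIN activities a ON a1.activity = a.activity
        WHERE a1.user_id = :uid      -- rows of the input user
          AND a.user_id <> :uid      -- matched against everyone else
        GROUP BY a.user_id
        HAVING COUNT(*) = (SELECT COUNT(*) FROM activities WHERE user_id = :uid)
        ORDER BY a.user_id
    """
    return [row[0] for row in conn.execute(sql, {"uid": user_id})]

# Sample data from the question; the UNIQUE constraint encodes the
# "rows are unique" assumption that the COUNT(*) comparison relies on.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activities (user_id INTEGER, activity TEXT, "
    "UNIQUE (user_id, activity))"
)
conn.executemany(
    "INSERT INTO activities VALUES (?, ?)",
    [
        (1, "swimming"), (1, "running"),
        (2, "swimming"), (2, "running"), (2, "rowing"),
        (3, "swimming"),
        (4, "skydiving"), (4, "running"), (4, "swimming"),
    ],
)

print(users_with_all_activities(conn, 1))  # [2, 4]
```

If duplicate (user_id, activity) rows were possible, the counts would no longer line up, so the comparison would need COUNT(DISTINCT ...) on both sides.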

You can easily get all users ordered by similarity using a JOIN (that finds all common rows) and a GROUP BY (to summarize the similarity per user_id) and finally an ORDER BY to return the most similar users first.
SELECT b.user_id, COUNT(*) similarity
FROM activities a
JOIN activities b
ON a.activity = b.activity
WHERE a.user_id = 1 AND b.user_id != 1
GROUP BY b.user_id
ORDER BY COUNT(*) DESC
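A small runnable sketch of the similarity ranking, again using Python's sqlite3 module as a stand-in for MySQL. The function name rank_by_shared_activities and the extra tie-breaker on user_id are my additions, there only to keep the output order predictable.

```python
import sqlite3

def rank_by_shared_activities(conn, user_id):
    """Return (other_user_id, shared_activity_count), most similar first."""
    sql = """
        SELECT b.user_id, COUNT(*) AS similarity
        FROM activities a
        JOIN activities b ON a.activity = b.activity
        WHERE a.user_id = ? AND b.user_id <> ?
        GROUP BY b.user_id
        ORDER BY similarity DESC, b.user_id  -- tie-breaker added for this sketch
    """
    return conn.execute(sql, (user_id, user_id)).fetchall()

# Sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (user_id INTEGER, activity TEXT)")
conn.executemany(
    "INSERT INTO activities VALUES (?, ?)",
    [
        (1, "swimming"), (1, "running"),
        (2, "swimming"), (2, "running"), (2, "rowing"),
        (3, "swimming"),
        (4, "skydiving"), (4, "running"), (4, "swimming"),
    ],
)

print(rank_by_shared_activities(conn, 1))  # [(2, 2), (4, 2), (3, 1)]
```

Unlike the HAVING version, this also returns partial matches such as user 3, which may or may not be what a recommendation feature wants.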

Related

Using parent SQL column in subquery

Good morning. I'm trying to pull the username of the user referenced by the to_id column. It'd be simple if I was just filtering on to_id, but I also need records matched on another column, from_id. I've attempted a UNION to get around this issue, but it only pulls records for user.id 3, of course.
Does anyone happen to know a way around this? I'm somewhat new to writing complex SQL queries. Haven't been able to figure much out from similar questions.
SELECT
users.username, -- Placeholder until username from to_id can be pulled
payment.id,
to_id,
amount,
state,
type,
timedate
FROM
payment
LEFT JOIN users ON users.id = payment.to_id AND users.id = payment.from_id
WHERE to_id = 3 OR from_id = 3
The result of that would be along the lines of:
+----------+----+-------+--------+----------+------+---------------------+
| username | id | to_id | amount | state | type | timedate |
+----------+----+-------+--------+----------+------+---------------------+
| NULL | 1 | 1 | 12.56 | COMPLETE | u2u | 2021-11-12 06:09:21 |
| NULL | 2 | 1 | 43.00 | COMPLETE | u2u | 2021-11-12 06:17:10 |
| NULL | 3 | 3 | 2.25 | COMPLETE | u2u | 2021-11-12 06:22:53 |
+----------+----+-------+--------+----------+------+---------------------+
Username is NULL because the two join conditions are combined with AND. If I use OR, the username shows up, but each row appears twice: once with the to_id username, once with the from_id username.
So you have one users table for all payer and payee accounts, and one transaction table with two ID columns (payer and payee)? You need to join the users table to the transaction table twice: once to get the payer info, once to get the payee info.
select
payment.from_id,
from_user.username as from_username,
payment.to_id,
to_user.username as to_username,
payment.id,
amount,
state,
type,
timedate
from payment
left join users as from_user
on from_user.id = payment.from_id
left join users as to_user
on to_user.id = payment.to_id
where payment.to_id = 3 OR payment.from_id = 3
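Here is a minimal sketch of the double join that can be run locally. It uses Python's sqlite3 module in place of MySQL, and the usernames, the from_id values, and the fourth payment are invented for illustration, since the question only shows to_id.

```python
import sqlite3

def payments_for_user(conn, user_id):
    """Return payments sent or received by user_id with both usernames."""
    sql = """
        SELECT p.id,
               from_user.username AS from_username,
               to_user.username   AS to_username,
               p.amount
        FROM payment p
        LEFT JOIN users AS from_user ON from_user.id = p.from_id
        LEFT JOIN users AS to_user   ON to_user.id   = p.to_id
        WHERE p.to_id = ? OR p.from_id = ?
        ORDER BY p.id
    """
    return conn.execute(sql, (user_id, user_id)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute(
    "CREATE TABLE payment (id INTEGER PRIMARY KEY, from_id INTEGER, "
    "to_id INTEGER, amount REAL)"
)
# Hypothetical users; the question does not show the users table.
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)
# ids, to_id and amounts follow the question; from_id values are assumed.
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?, ?)",
    [(1, 3, 1, 12.56), (2, 3, 1, 43.00), (3, 1, 3, 2.25), (4, 1, 2, 5.00)],
)

for row in payments_for_user(conn, 3):
    print(row)
```

Each payment appears once, with the payer and payee names in separate columns, which is what the single join with AND or OR could not give.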

MySQL selective GROUP BY, using the maximal value

I have the following (simplified) three tables:
user_reservations:
id | user_id |
1 | 3 |
1 | 3 |
user_kar:
id | user_id | szak_id |
1 | 3 | 1 |
2 | 3 | 2 |
szak:
id | name |
1 | A |
2 | B |
Now I would like to count the reservations of the user by the 'szak' name, but I want to have every user counted only for one szak. In this case, user_id 3 has 2 'szak', and if I write a query something like:
SELECT sz.name, COUNT(*) FROM user_reservations r
LEFT JOIN user_kar k ON k.user_id = r.user_id
LEFT JOIN szak sz ON sz.id = k.szak_id
GROUP BY sz.name
It will return two rows:
A | 2 |
B | 2 |
However, I want every reservation counted for only one szak (let's say the one with the highest id). I tried MAX(k.id) with HAVING, but it seems ineffective.
I would like to know if there is a supported method for that in MySQL, or should I first pick all the user IDs on the backend side, check their maximum kar.user_id, and then count only with those, removing them from the id list when the given szak is counted, and then build the data back together on the backend side?
Thanks for the help. I was googling around for about 2 hours, but so far I have found no solution, so maybe you could help me.
Something like this?
SELECT sz.name,
Count(*)
FROM (SELECT r.user_id,
Ifnull(Max(k.szak_id), -1) AS max_szak_id
FROM user_reservations r
LEFT OUTER JOIN user_kar k
ON k.user_id = r.user_id
GROUP BY r.user_id) t
LEFT OUTER JOIN szak sz
ON sz.id = t.max_szak_id
GROUP BY sz.name;
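One thing to watch: the subquery above produces one row per user, so the outer Count(*) counts users per szak rather than reservations. If reservations are what you want counted, one possible variation is to pick each user's highest szak_id first and then join the reservations to it. Below is a sketch that runs both versions side by side, using Python's sqlite3 module as a stand-in for MySQL. I assumed the two reservations have ids 1 and 2, and the inner joins in the variation drop reservations of users with no user_kar row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_reservations (id INTEGER, user_id INTEGER);
    CREATE TABLE user_kar (id INTEGER, user_id INTEGER, szak_id INTEGER);
    CREATE TABLE szak (id INTEGER, name TEXT);
    INSERT INTO user_reservations VALUES (1, 3), (2, 3);  -- ids assumed
    INSERT INTO user_kar VALUES (1, 3, 1), (2, 3, 2);
    INSERT INTO szak VALUES (1, 'A'), (2, 'B');
""")

# The query from the answer: one row per user, then counted per szak.
USERS_PER_SZAK = """
    SELECT sz.name, COUNT(*)
    FROM (SELECT r.user_id, IFNULL(MAX(k.szak_id), -1) AS max_szak_id
          FROM user_reservations r
          LEFT OUTER JOIN user_kar k ON k.user_id = r.user_id
          GROUP BY r.user_id) t
    LEFT OUTER JOIN szak sz ON sz.id = t.max_szak_id
    GROUP BY sz.name
"""

# Variation: resolve each user's highest szak_id, then count reservations.
RESERVATIONS_PER_SZAK = """
    SELECT sz.name, COUNT(*)
    FROM user_reservations r
    JOIN (SELECT user_id, MAX(szak_id) AS max_szak_id
          FROM user_kar
          GROUP BY user_id) k ON k.user_id = r.user_id
    JOIN szak sz ON sz.id = k.max_szak_id
    GROUP BY sz.name
"""

users_per_szak = conn.execute(USERS_PER_SZAK).fetchall()
reservations_per_szak = conn.execute(RESERVATIONS_PER_SZAK).fetchall()
print(users_per_szak)         # [('B', 1)]
print(reservations_per_szak)  # [('B', 2)]
```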

Listing user types with counts and percentages

I have a 'users' table:
user_id | prov_platform | first_name | last_name
--------|-----------------|--------------|-------------------
1 | Facebook | Joe | Bloggs
2 | Facebook | Sue | Barker
3 | | John | Doe
4 | Twitter | John | Terry
5 | Google | Angelina | Jolie
And I originally wanted to return a list of all the different social platform types there were in my users table, with counts beside each one - so I came up with this:
SELECT
IFNULL(prov_platform, 'Other') AS prov_platform,
COUNT(*) AS platform_total
FROM users
GROUP BY prov_platform
ORDER BY platform_total DESC
Which resulted in this:
prov_platform | platform_total
---------------|-----------------
Facebook | 2
Twitter | 1
Google | 1
Other | 1
But I now want to add another couple of fields to this query; 'allround_total' and 'percentage'. So, the above recordset would become:
prov_platform | platform_total | allround_total | percentage
---------------|----------------|----------------|---------------
Facebook | 2 | 5 | 40%
Twitter | 1 | 5 | 20%
Google | 1 | 5 | 20%
Other | 1 | 5 | 20%
This is as far as I got before getting in a muddle:
SELECT
u.prov_platform,
COUNT(*) AS platform_total,
allround_total,
allround_total/platform_total*100 AS percentage
FROM
users AS u
INNER JOIN (
SELECT COUNT(*) AS allround_total FROM users
) AS allround_total
GROUP BY
prov_platform
ORDER BY
platform_total DESC
This returns the 'allround_total' field, which works, but I have no idea how performance-friendly it'll be. What I can't work out is how to get the percentage to work correctly. Currently, the above query returns an error:
Unknown column 'platform_total' in 'field list'
I think I'm close, I just need a much appreciated push over the line.
You cannot use column aliases in the same level as they are defined. I also think you have the calculation for percentage backwards.
SELECT u.prov_platform, COUNT(*) AS platform_total,
const.allround_total,
100*count(*)/const.allround_total AS percentage
FROM users u cross join
(SELECT COUNT(*) as allround_total FROM users
) const
GROUP BY prov_platform
ORDER BY platform_total DESC;
I changed the join from inner join to cross join. Although MySQL allows an inner join to omit the on clause, I find it disconcerting to see an inner join with no on. I also changed the table alias so that it differs from the column alias, which makes the query easier to read.
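To check the numbers against the expected output in the question, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. A few things are my assumptions: the empty platform for user 3 is stored as NULL, IFNULL is kept so that it shows up as 'Other', the percentage uses 100.0 because SQLite would otherwise do integer division, and a tie-breaker on the platform name keeps the order stable.

```python
import sqlite3

PLATFORM_STATS = """
    SELECT IFNULL(u.prov_platform, 'Other') AS platform,
           COUNT(*) AS platform_total,
           const.allround_total,
           100.0 * COUNT(*) / const.allround_total AS percentage
    FROM users u
    CROSS JOIN (SELECT COUNT(*) AS allround_total FROM users) const
    GROUP BY IFNULL(u.prov_platform, 'Other'), const.allround_total
    ORDER BY platform_total DESC, platform
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, prov_platform TEXT, "
    "first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, "Facebook", "Joe", "Bloggs"),
        (2, "Facebook", "Sue", "Barker"),
        (3, None, "John", "Doe"),  # assumed NULL, shown as 'Other'
        (4, "Twitter", "John", "Terry"),
        (5, "Google", "Angelina", "Jolie"),
    ],
)

stats = conn.execute(PLATFORM_STATS).fetchall()
for row in stats:
    print(row)
# ('Facebook', 2, 5, 40.0)
# ('Google', 1, 5, 20.0)
# ('Other', 1, 5, 20.0)
# ('Twitter', 1, 5, 20.0)
```

The one-row derived table is evaluated once and cross joined to every user row, so the total is available to each group without a correlated subquery.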

MySQL SELECT Multiple DISTINCT COUNT

Here is what I'm trying to do. I have a table with user assessments which may contain duplicate rows. I'm looking to only get DISTINCT values for each user.
In the example table below, if only user_id 1 and 50 belong to the specific location, then only the unique video_ids for each user should be counted. User 1 passed videos 1, 2, and 1, so that should only be 2 records, and user 50 passed video 2. So the total for this location would be 3. I think I need two DISTINCTs in the query, but am not sure how to do this.
+-----+----------+----------+
| id | video_id | user_id |
+-----+----------+----------+
| 1 | 1 | 1 |
| 2 | 2 | 50 |
| 3 | 1 | 115 |
| 4 | 2 | 25 |
| 5 | 2 | 1 |
| 6 | 6 | 98 |
| 7 | 1 | 1 |
+-----+----------+----------+
This is what my current query looks like.
$stmt2 = $dbConn->prepare("SELECT COUNT(DISTINCT user_assessment.id)
FROM user_assessment
LEFT JOIN user ON user_assessment.user_id = user.id
WHERE user.location = '$location'");
$stmt2->execute();
$stmt2->bind_result($video_count);
$stmt2->fetch();
$stmt2->close();
So my query returns all of the count for that specific location, but it doesn't omit the non-unique results from each specific user.
Hope this makes sense, thanks for the help.
SELECT COUNT(DISTINCT ua.video_id, ua.user_id)
FROM user_assessment ua
INNER JOIN user ON ua.user_id = user.id
WHERE user.location = '$location'
You can put quite a lot inside a COUNT, so don't hesitate to write exactly what you want in it. This gives the number of distinct (video_id, user_id) pairs, which is what you wanted if I understood correctly.
The query below joins a sub-query that fetches the distinct videos per user. Then, the main query does a sum on those numbers to get the total of videos for the location.
SELECT
SUM(video_count)
FROM
user u
INNER JOIN
( SELECT
ua.user_id,
COUNT(DISTINCT video_id) as video_count
FROM
user_assessment ua
GROUP BY
ua.user_id) uav on uav.user_id = u.id
WHERE
u.location = '$location'
Note that since you already use a prepared statement, you can also pass $location as a bind parameter. I leave this to you, since it's not part of the question. ;-)
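As a rough illustration of both points (counting distinct pairs and binding the location), here is a sketch using Python's sqlite3 module instead of PHP and MySQL. SQLite does not accept COUNT(DISTINCT a, b), so the sketch counts the rows of a SELECT DISTINCT subquery, which should be equivalent here. The location names 'north' and 'south' are invented, and the function name is hypothetical.

```python
import sqlite3

def count_passed_videos(conn, location):
    """Count distinct (video_id, user_id) pairs for users at a location."""
    sql = """
        SELECT COUNT(*)
        FROM (SELECT DISTINCT ua.video_id, ua.user_id
              FROM user_assessment ua
              JOIN user u ON u.id = ua.user_id
              WHERE u.location = ?) AS pairs
    """
    # The location is passed as a bound parameter, never interpolated.
    return conn.execute(sql, (location,)).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, location TEXT)")
conn.execute(
    "CREATE TABLE user_assessment (id INTEGER PRIMARY KEY, "
    "video_id INTEGER, user_id INTEGER)"
)
# Hypothetical locations: users 1 and 50 share one, as in the question.
conn.executemany(
    "INSERT INTO user VALUES (?, ?)",
    [(1, "north"), (50, "north"), (115, "south"), (25, "south"), (98, "south")],
)
# Assessment rows from the question.
conn.executemany(
    "INSERT INTO user_assessment VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 2, 50), (3, 1, 115), (4, 2, 25),
     (5, 2, 1), (6, 6, 98), (7, 1, 1)],
)

print(count_passed_videos(conn, "north"))  # 3
```

User 1's repeated pass of video 1 (rows 1 and 7) collapses into one pair, which gives the total of 3 described in the question.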

mysql select top unique values with inner join

I have 2 tables that look like this:
users (uid, name)
-------------------
| 1 | User 1 |
| 2 | User 2 |
| 3 | User 3 |
| 4 | User 4 |
| 5 | User 5 |
-------------------
highscores (user_id, time)
-------------------
| 3 | 12005 |
| 3 | 29505 |
| 3 | 17505 |
| 5 | 19505 |
-------------------
I want to query only for users that have a highscore and only the top highscore of each user. The result should look like:
------------------------
| User 3 | 29505 |
| User 5 | 19505 |
------------------------
My query looks like this:
SELECT user.name, highscores.time
FROM user
INNER JOIN highscores ON user.uid = highscores.user_id
ORDER BY time ASC
LIMIT 0 , 10
Actually this returns multiple highscores of the same user. I also tried to group them but it did not work since it did not return the best result but a random one (eg: for user id 3 it returned 17505 instead of 29505).
Many thanks!
You should use the aggregate function MAX() together with a GROUP BY clause.
SELECT a.name, MAX(b.`time`) maxTime
FROM users a
INNER JOIN highscores b
on a.uid = b.user_id
GROUP BY a.name
Your effort of grouping users was correct. You just needed to use MAX(time) aggregate function instead of selecting only time.
I think your earlier query looked like this:
SELECT name, time
FROM users
INNER JOIN highscores ON users.uid = highscores.user_id
GROUP BY name,time
But actual query should be:
SELECT users.name, MAX(`time`) AS topScore
FROM users
INNER JOIN highscores ON users.uid = highscores.user_id
GROUP BY users.name
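A quick way to confirm the MAX() plus GROUP BY approach on the sample data, sketched with Python's sqlite3 module as a stand-in for MySQL. Grouping by uid as well as name is my own tweak so that two users who happen to share a name are not merged, and the descending order and LIMIT are assumptions about how the top list should be presented.

```python
import sqlite3

TOP_SCORES = """
    SELECT u.name, MAX(h.time) AS top_time
    FROM users u
    INNER JOIN highscores h ON u.uid = h.user_id
    GROUP BY u.uid, u.name      -- uid added so same-named users stay separate
    ORDER BY top_time DESC
    LIMIT 10
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE highscores (user_id INTEGER, time INTEGER)")
# Sample data from the question.
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, f"User {i}") for i in range(1, 6)],
)
conn.executemany(
    "INSERT INTO highscores VALUES (?, ?)",
    [(3, 12005), (3, 29505), (3, 17505), (5, 19505)],
)

top_scores = conn.execute(TOP_SCORES).fetchall()
print(top_scores)  # [('User 3', 29505), ('User 5', 19505)]
```

The inner join already limits the result to users that have at least one highscore, so no extra filter is needed for that part.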