How to use aggregate functions with joins? - mysql

I have an main dataset(users) as follows.
ID Username Status
1 John Active
2 Mike Active
3 Ann Deactive
4 Leta Active
5 Lena Active
6 Lara Active
7 Mitch Active
Further I have revenue table as follows.
subuser hour Revenue
John_01 2/26/2022 5:00 5
Mike_01 2/26/2022 7:00 8
Mike_02 2/26/2022 7:00 22
Leta_03 2/26/2022 7:00 67
Leta_07 2/26/2022 9:00 56
Mitch_07 2/26/2022 11:00 34
Now I need to get a table as follows.
User Total Usage
John 5
Mike 22
Leta 123
Lena 0
Lara 0
Mitch 0
Here I need to get the sum of all hours of each user substring and match with main user table.Further if same hour is for same substring I need to get the maximum revenue value and other values should be neglect for that particular hour.
Ex:
Mike_01 2/26/2022 7:00 8
Mike_02 2/26/2022 7:00 22
Here Mike_01 2/26/2022 7:00 8 should neglect.
So I tried as below.
SELECT
u.Username,
COALESCE(SUM(Revenue), 0) AS TOTAL USAGE
FROM users u
LEFT JOIN revenuetable e
ON SUBSTRING_INDEX(e.subuser, '_', 1) = u.Username AND
e.Hour BETWEEN 'XXX' and 'XXX'
where u.Status='Active'
GROUP BY
u.Username
order by u.ID.
But this didn't get the maximum value if same hour repeats. Can someone show me where I messed this?
update:
Do we have any method other tan using window functions?

If using MySQL that supports row_number() then join to a derived table that removes the unwanted rows.
SELECT
u.Username,
COALESCE(SUM(Revenue), 0) AS TOTAL USAGE
FROM users u
LEFT JOIN (
Select *
, row_number() OVER(partition by SUBSTRING_INDEX(e.subuser, '_', 1), hour order by revenue DESC) rn
From revenuetable ) e
ON SUBSTRING_INDEX(e.subuser, '_', 1) = u.Username AND rn = 1
e.Hour BETWEEN 'XXX' and 'XXX'
where u.Status='Active'
GROUP BY
u.Username
order by u.ID
Introducing this function and the over clause will give precedence to the highest revenue in each hour per user as the 'rn' column will be 1 for each such row.

Related

Group by multiple fields after SQL join

I have written the following query which correctly joins two tables which shows the number of completed tasks by individuals in a team and the associated cost of those tasks:
SELECT users.id AS user_id,
users.name,
COALESCE(tasks.cost, 0) AS cost,
tasks.assignee,
tasks.completed,
tasks.completed_by
FROM users
JOIN tasks
ON tasks.assignee = users.id
WHERE completed IS NOT NULL AND assignee IS NOT NULL
This provides the following table:
user id
name
asignee
cost
completed
completed_by
18
mike
8
0.25
2022-01-24 19:54:48
8
13
katie
13
0
2022-01-24 19:55:18
8
13
katie
13
0
2022-01-25 11:49:53
8
12
jim
12
0.5
2022-01-25 11:50:02
12
9
ollie
9
0.25
2022-03-03 02:38:41
9
I would now like to further find the SUM of cost, grouped by name and the month completed. However, I can't work out the syntax for the GROUP BY after my current select and WHERE clause. Ultimately, I would like the query to return something like this:
name
cost_sum
month
mike
62
January
katie
20
January
jim
15
January
ollie
45
January
mike
17
February
I have tried various combinations and nesting GROUP BY clauses but I can't seem to get the desired result. Any pointers would be greatly appreciated.
Join users to a query that aggregates in tasks and returns the total cost per month for a specific year:
SELECT u.name,
COALESCE(t.cost, 0) AS cost,
DATE_FORMAT(t.last_day, '%M')
FROM users u
INNER JOIN (
SELECT assignee, LAST_DAY(completed) last_day, SUM(cost) AS cost
FROM tasks
WHERE YEAR(completed) = 2022
GROUP BY assignee, last_day
) t ON t.assignee = u.id
ORDER BY t.last_day;
No need to check if completed is null or assignee is null, because nulls are filtered out here:
WHERE YEAR(completed) = 2022
and here:
ON t.assignee = u.id
Probably something like this:
SELECT users.name, tasks.completed_by month, sum(COALESCE(tasks.cost, 0)) cost_sum
FROM users
JOIN tasks
ON tasks.assignee = users.id
WHERE completed IS NOT NULL AND assignee IS NOT NULL
group by users.name, tasks.completed_by

Get original RANK() value based on row create date

Using MariaDB and trying to see if I can get pull original rankings for each row of a table based on the create date.
For example, imagine a scores table that has different scores for different users and categories (lower score is better in this case)
id
leaderboardId
userId
score
submittedAt ↓
rankAtSubmit
9
15
555
50.5
2022-01-20 01:00:00
2
8
15
999
58.0
2022-01-19 01:00:00
3
7
15
999
59.1
2022-01-15 01:00:00
3
6
15
123
49.0
2022-01-12 01:00:00
1
5
15
222
51.0
2022-01-10 01:00:00
1
4
14
222
87.0
2022-01-09 01:00:00
1
5
15
555
51.0
2022-01-04 01:00:00
1
The "rankAtSubmit" column is what I'm trying to generate here if possible.
I want to take the best/smallest score of each user+leaderboard and determine what the rank of that score was when it was submitted.
My attempt at this failed because in MySQL you cannot reference outer level columns more than 1 level deep in a subquery resulting in an error trying to reference t.submittedAt in the following query:
SELECT *, (
SELECT ranking FROM (
SELECT id, RANK() OVER (PARTITION BY leaderboardId ORDER BY score ASC) ranking
FROM scores x
WHERE x.submittedAt <= t.submittedAt
GROUP BY userId, leaderboardId
) ranks
WHERE ranks.id = t.id
) rankAtSubmit
FROM scores t
Instead of using RANK(), I was able to accomplish this by with a single subquery that counts the number of users that have a score that is lower than and submitted before the given score.
SELECT id, userId, score, leaderboardId, submittedAt,
(
SELECT COUNT(DISTINCT userId) + 1
FROM scores t2
WHERE t2.userId = t.userId AND
t2.leaderboardId = t.leaderboardId AND
t2.score < t.score AND
t2.submittedAt <= t.submittedAt
) AS rankAtSubmit
FROM scores t
What I understand from your question is you want to know the minimum and maximum rank of each user.
Here is the code
SELECT userId, leaderboardId, score, min(rankAtSubmit),max(rankAtSubmit)
FROM scores
group BY userId,
leaderboardId,
scorescode here

SQL subquery in SELECT clause

I'm trying to find admin activity within the last 30 days.
The accounts table stores the user data (username, password, etc.)
At the end of each day, if a user had logged in, it will create a new entry in the player_history table with their updated data. This is so we can track progress over time.
accounts table:
id
username
admin
1
Michael
4
2
Steve
3
3
Louise
3
4
Joe
0
5
Amy
1
player_history table:
id
user_id
created_at
playtime
0
1
2021-04-03
10
1
2
2021-04-04
10
2
3
2021-04-05
15
3
4
2021-04-10
20
4
5
2021-04-11
20
5
1
2021-05-12
40
6
2
2021-05-13
55
7
3
2021-05-17
65
8
4
2021-05-19
75
9
5
2021-05-23
30
10
1
2021-06-01
60
11
2
2021-06-02
65
12
3
2021-06-02
67
13
4
2021-06-03
90
The following query
SELECT a.`username`, SEC_TO_TIME((MAX(h.`playtime`) - MIN(h.`playtime`))*60) as 'time' FROM `player_history` h, `accounts` a WHERE h.`created_at` > '2021-05-06' AND h.`user_id` = a.`id` AND a.`admin` > 0 GROUP BY h.`user_id`
Outputs this table:
Note that this is just admin activity, so Joe is not included in this data.
from 2021-05-06 to present (yy-mm-dd):
username
time
Michael
00:20:00
Steve
00:10:00
Louise
00:02:00
Amy
00:00:00
As you can see this from data, Amy's time is shown as 0 although she has played for 10 minutes in the last month. This is because she only has 1 entry starting from 2021-05-06 so there is no data to compare to. It is 0 because 10-10 = 0.
Another flaw is that it doesn't include all activity in the last month, basically only subtracts the highest value from the lowest.
So I tried fixing this by comparing the highest value after 2021-05-06 to their most previous login before the date. So I modified the query a bit:
SELECT a.`Username`, SEC_TO_TIME((MAX(h.`playtime`) - (SELECT MAX(`playtime`) FROM `player_history` WHERE a.`id` = `user_id` AND `created_at` < '2021-05-06'))*60) as 'Time' FROM `player_history` h, `accounts` a WHERE h.`created_at` >= '2021-05-06' AND h.`user_id` = a.`id` AND a.`admin` > 0 GROUP BY h.`user_id`
So now it will output:
username
time
Michael
00:50:00
Steve
00:50:00
Louise
00:52:00
Amy
00:10:00
But I feel like this whole query is quite inefficient. Is there a better way to do this?
I think you want lag():
SELECT a.username,
SEC_TO_TIME(SUM(h.playtime - COALESCE(h.prev_playtime, 0))) as time
FROM accounts a JOIN
(SELECT h.*,
LAG(playtime) OVER (PARTITION BY u.user_id ORDER BY h.created_at) as prev_playtime
FROM player_history h
) h
ON h.user_id = a.id
WHERE h.created_at > '2021-05-06' AND
a.admin > 0
GROUP BY a.username;
In addition to the LAG() logic, note the other changes to the query:
The use of proper, explicit, standard, readable JOIN syntax.
The use of consistent columns for the SELECT and GROUP BY.
The removal of single quotes around the column alias.
The removal of backticks; they just clutter the query, making it harder to write and to read.

Need help to figure out sorting sql query

I'm trying to get the total number of levels gained or lost from this sort of table:
id name level timestamp
1 Rex 15 10:25
2 Rex 15 10:26
3 Rex 15 10:27
4 Rex 14 10:28
5 Rex 13 10:29
6 Rex 13 10:30
7 Rex 13 10:31
8 Rex 13 10:29
9 Xer 44 10:30
10 Xer 44 10:31
11 Xer 45 10:32
12 Xer 45 10:33
13 Xer 45 10:34
Currently I'm running
SELECT id, name, level, timestamp, MAX(level) - MIN(level) AS gained
FROM log
GROUP BY name
But the problem with this query is that both gained and lost levels will count as gained. It would be perfect if I could get a negative int in the gained column if the user has lost levels
The output I want from the data above is:
id name level timestamp gained
8 Rex 13 10:29 -2
13 Xer 45 10:34 1
If you need to respect the timeline, then try something like this:
SELECT MAX(id) id, name,
( SELECT level FROM log l0 WHERE l.name = l0.name ORDER BY timestamp DESC LIMIT 1 ) level,
MAX(timestamp) timestamp,
-- last entry for the name
( SELECT level FROM log l1 WHERE l.name = l1.name ORDER BY timestamp DESC LIMIT 1 ) -
-- first entry for the name
( SELECT level FROM log l2 WHERE l.name = l2.name ORDER BY timestamp ASC LIMIT 1 ) gained
FROM log l
GROUP BY name
I used LAG in as subquery to get the changes and then summed those changes in an outer sub-query. To get the last row I uses yet another query to find the max time for each name. Maybe not the most efficient query but it works
SELECT l.id, l.name, l.level, l.timestamp, sg.gain
FROM log l
JOIN (SELECT name, SUM(gain) gain
FROM (SELECT name, level - COALESCE(LAG(level) OVER w, level) as gain
FROM log
WINDOW w AS (PARTITION BY name ORDER BY timestamp)) as g
GROUP BY name) as sg ON sg.name = l.name
JOIN (SELECT name, MAX(time) max_t
FROM log
GROUP BY name) mt ON mt.name = l.name AND mt.max_t = l.time

Display distinct users reaching specific count thresholds

Here's my table, showing user names and the timestamp they scored a point:
id user date
1 Aaron 23/02/2012 22:44
2 Betty 23/02/2012 22:47
3 Carlos 24/02/2012 16:01
4 David 28/02/2012 11:40
5 David 28/02/2012 12:32
6 David 28/02/2012 16:59
7 Aaron 2/03/2012 13:46
8 Aaron 30/03/2012 18:37
9 Betty 30/03/2012 19:58
10 Emma 9/04/2012 6:49
11 Emma 9/04/2012 13:19
12 Emma 9/04/2012 18:20
13 Emma 9/04/2012 20:46
14 Aaron 10/04/2012 15:47
15 Betty 10/04/2012 19:15
16 Betty 10/04/2012 20:40
17 Carlos 11/04/2012 9:44
18 Carlos 11/04/2012 20:01
19 David 11/04/2012 23:17
20 David 12/04/2012 17:09
And here is the results table I am trying to achieve, i.e. an x axis showing month-year, and a y axis displaying the number of users who reached a certain points threshold within that month:
date 1 point First time? 2 points First time? 3 points First time? 4 points First time? Total
Feb-12 A,B,C A,B,C D D 4
Mar-12 B A A 3
Apr-12 A,B,C B,C,D B,C,D E E 4
I've only got as far as calculating the total number of points and the total number of distinct scorers within a given month:
SELECT DISTINCT CONCAT (MONTHNAME(date), ' ', YEAR(date)) as 'date', COUNT(id) as total_points, COUNT(distinct referrer_id) as number_of_scorers
from points
group by CONCAT (MONTH(date), ' ', YEAR(date))
order by YEAR(date), MONTH(date)
which is only giving me:
date total_points number_of_scorers
Feb-12 6 4
Mar-12 3 3
etc.
So my questions are:
How can I amend the query to show me which users reached each point threshold within each month?
How can I amend the query to show me which users reached each point threshold for the first time within that month?
Thanks
The basic query you need is this:
select date_format(date, '%Y-%m') as yyyymm, user, count(*) as points
from t
group by date_format(date, '%Y-%m') as yyyymm, user;
This gets the number of points for each user in a month.
The rest is just aggregations, joins, and conditions:
select ymu.yyyymm,
group_concat(case when ymu.points = 1 then user end) as Points1_Users,
group_concat(case when ymu.points = 1 and ymu.yyyymm = u.min_yyyymm then user end) as Points1_Users_First,
group_concat(case when ymu.points = 2 then user end) as Points2_Users,
group_concat(case when ymu.points = 2 and ymu.yyyymm = u.min_yyyymm then user end) as Points2_Users_First
from (select date_format(date, '%Y-%m') as yyyymm, user, count(*) as points
from t
group by date_format(date, '%Y-%m') as yyyymm, user
) ymu join
(select user, min(yyyymm) as min_yyyymm
from (select date_format(date, '%Y-%m') as yyyymm, user, count(*) as points
from t
group by date_format(date, '%Y-%m') as yyyymm, user
) t
group by user
) u
on ymu.user = u.user
group by yyyymm
order by yyyymm;