Database entry streak in MySQL/MariaDB, counted up to today

I asked this question yesterday, but it seems I didn't make it clear enough, so I'm adding some information to make everything clear.
Consider the following 2 tables:
0_12_table
ID userID text timestamp
1 1 bla 2020-08-07 10:30:00
2 1 blub 2020-08-06 11:30:00
3 1 abc 2020-08-05 09:20:00
4 1 def 2020-08-04 06:13:00
5 2 ghi 2020-08-02 08:05:00
6 2 abc 2020-08-05 10:20:00
7 3 def 2020-08-04 07:13:00
8 4 ghi 2020-08-02 09:05:00
9 5 jkl 2020-08-07 06:30:00
10 5 mno 2020-08-08 08:32:00
12_24_table:
ID userID text timestamp
1 1 bla 2020-08-07 19:30:00
2 1 blub 2020-08-06 21:30:00
3 1 abc 2020-08-05 19:20:00
4 2 def 2020-08-04 16:13:00
5 2 ghi 2020-08-02 18:05:00
6 2 abc 2020-08-05 20:20:00
7 3 def 2020-08-04 17:13:00
8 4 ghi 2020-08-02 19:05:00
9 5 jkl 2020-08-07 20:13:00
Users can (and are encouraged to) add one entry to the database between 00:00 and 12:00 and one between 12:01 and 23:59.
I'd like to reward them for adding consecutive entries. Whenever they miss a time frame, however, the counter is reset to 0.
In the data above, the user with userID 1 would have a streak of 3 days right now (it's 9 AM in my time zone). Once it is past 12:00 noon and he still hasn't made another entry, the counter is reset to 0 and the streak is over, because he missed the morning entry.
Users 2, 3 and 4 would have no streak at all. The streak is broken whenever a single morning or evening entry is missing.
User 5 would have a streak of 1, which would increase to 2 as soon as he makes his entry for the 12:01 to 23:59 time frame.
I hope the logic is clear. The important part is that it does NOT matter whether he had a streak of 10 two days ago: whenever an entry is missing, the streak is reset to 0. So if there is no morning entry by 12:00 on a given day, or no evening entry by 23:59, the streak is gone. Today is always the reference point, so it really is "consecutive entries up to today".
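To pin down the rule described above, here is a small Python sketch (an illustration, not MySQL) that computes one user's current streak. It assumes the two tables have already been reduced to sets of dates per user, that the morning deadline is 12:00, and that "now" is 2020-08-08 09:00. The names current_streak and aug are hypothetical.

```python
from datetime import date, datetime, time, timedelta

def current_streak(morning, evening, now):
    """Consecutive complete days (morning AND evening entry) ending today.

    morning, evening: sets of datetime.date on which the user made the
    respective entry. now: the reference datetime ("today").
    """
    today = now.date()
    # After the 12:00 deadline, a missing morning entry breaks the streak.
    if now.time() > time(12, 0) and today not in morning:
        return 0
    complete = morning & evening  # days that have both entries
    # Today only counts once it is complete; otherwise start from yesterday.
    day = today if today in complete else today - timedelta(days=1)
    streak = 0
    while day in complete:
        streak += 1
        day -= timedelta(days=1)
    return streak

def aug(day):
    """Shorthand for a date in August 2020."""
    return date(2020, 8, day)

# Sample data from 0_12_table (morning) and 12_24_table (evening), per userID.
morning = {1: {aug(4), aug(5), aug(6), aug(7)}, 2: {aug(2), aug(5)},
           3: {aug(4)}, 4: {aug(2)}, 5: {aug(7), aug(8)}}
evening = {1: {aug(5), aug(6), aug(7)}, 2: {aug(2), aug(4), aug(5)},
           3: {aug(4)}, 4: {aug(2)}, 5: {aug(7)}}

now = datetime(2020, 8, 8, 9, 0)
for uid in sorted(morning):
    print(uid, current_streak(morning[uid], evening[uid], now))
# prints 1 3, 2 0, 3 0, 4 0, 5 1 (one pair per line)
```

At 13:00 on the same day, user 1 drops to 0. User 5 reaches 2 once an evening entry for 2020-08-08 exists.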
The closest I have got so far is the following answer:
select min(dte), max(dte), count(*)
from (select dte, (@rn := @rn + 1) as seqnum
      from (select dte
            from ((select date(timestamp) as dte, 1 as morning, 0 as evening
                   from morning
                  ) union all
                  (select date(timestamp) as dte, 0 as morning, 1 as evening
                   from evening
                  )
                 ) me
            group by dte
            having sum(morning) > 0 and sum(evening) > 0
            order by dte
           ) d cross join
           (select @rn := 0) params
     ) me
group by dte - interval seqnum day
order by count(*) desc
limit 1;
However, I haven't added userID to it yet. The biggest problem is that it returns the longest streak, no matter whether there is a gap between that streak and today. As mentioned, today must always be the reference point.
I hope someone can help me here.
One last important detail: I'm using MariaDB 10.1.45, so WITH (CTEs) and window functions such as ROW_NUMBER() are not available, and upgrading is not possible right now.
Thanks in advance!

This would be much simpler in a more recent version that supports window functions. But you can adapt the variable approach to get all streaks for each user:
-- userid has to be selected at every level so the outer GROUP BY can use it
select userid, count(*) as length
from (select userid, dte, (@rn := @rn + 1) as seqnum
      from (select userid, dte
            from ((select userid, date(timestamp) as dte, 1 as morning, 0 as evening
                   from morning
                  ) union all
                  (select userid, date(timestamp) as dte, 0 as morning, 1 as evening
                   from evening
                  )
                 ) me
            group by userid, dte
            having sum(morning) > 0 and sum(evening) > 0
            order by userid, dte
           ) d cross join
           (select @rn := 0) params
     ) me
group by userid, dte - interval seqnum day
order by count(*) desc;
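It turns out that a single "global" sequence works as well as per-user sequences for this problem. Within one user's run of consecutive dates, dte and seqnum both increase by 1 per row, so dte - interval seqnum day stays constant, and grouping by userid keeps different users apart. So the variable logic stays simple; the main changes are to the GROUP BY and ORDER BY clauses.
You can then use this as a subquery to get each user's longest streak: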
select userid, max(seq)
from (select userid, count(*) as seq
      from (select userid, dte, (@rn := @rn + 1) as seqnum
            from (select userid, dte
                  from ((select userid, date(timestamp) as dte, 1 as morning, 0 as evening
                         from morning
                        ) union all
                        (select userid, date(timestamp) as dte, 0 as morning, 1 as evening
                         from evening
                        )
                       ) me
                  group by userid, dte
                  having sum(morning) > 0 and sum(evening) > 0
                  order by userid, dte
                 ) d cross join
                 (select @rn := 0) params
           ) me
      group by userid, dte - interval seqnum day
     ) u
group by userid;
Note: users with no streak are filtered out. You can add them back with a LEFT JOIN in the outer query. However, you would really want a table of all users for that, rather than your two separate entry tables, so I haven't bothered.
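The dte - interval seqnum day grouping is the classic gaps-and-islands trick. The Python sketch below (an illustration, not the SQL itself) numbers the complete (userid, date) pairs with one global sequence, like @rn, and groups on (userid, date - seqnum). It then takes the maximum per user, like the outer query. The input is assumed to be the set of days on which each sample user made both entries. The function name streaks is hypothetical.

```python
from datetime import date, timedelta
from itertools import groupby

def streaks(complete_days):
    """Return (userid, first_day, last_day, length) for every streak.

    complete_days: iterable of (userid, date) pairs, one per day on which
    the user made both the morning and the evening entry.
    """
    ordered = sorted(set(complete_days))  # ORDER BY userid, dte
    # One global sequence over all users, like @rn := @rn + 1.
    keyed = [(uid, day - timedelta(days=seq), day)
             for seq, (uid, day) in enumerate(ordered, start=1)]
    result = []
    # GROUP BY userid, dte - INTERVAL seqnum DAY (equal keys are adjacent).
    for (uid, _anchor), rows in groupby(keyed, key=lambda r: (r[0], r[1])):
        days = [r[2] for r in rows]
        result.append((uid, days[0], days[-1], len(days)))
    return result

complete = [(1, date(2020, 8, 5)), (1, date(2020, 8, 6)), (1, date(2020, 8, 7)),
            (2, date(2020, 8, 2)), (2, date(2020, 8, 5)), (3, date(2020, 8, 4)),
            (4, date(2020, 8, 2)), (5, date(2020, 8, 7))]

all_streaks = streaks(complete)
longest = {}  # like SELECT userid, MAX(seq) ... GROUP BY userid
for uid, _first, _last, length in all_streaks:
    longest[uid] = max(longest.get(uid, 0), length)
print(longest)  # {1: 3, 2: 1, 3: 1, 4: 1, 5: 1}
```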


MySQL query for records that existed at any point each week

I have a table with created_at and deleted_at timestamps. I need to know, for each week, how many records existed at any point that week:
week    | records
--------+--------
2022-01 | 4
2022-02 | 5
...     | ...
In other words, I want the records that were created before the end of the week and deleted after the beginning of the week.
I've tried several variations of the following, but it under-reports and I can't work out why:
SELECT
DATE_FORMAT(created_at, '%Y-%U') AS week,
COUNT(*)
FROM records
WHERE
deleted_at > DATE_SUB(deleted_at, INTERVAL (WEEKDAY(deleted_at)+1) DAY)
AND created_at < DATE_ADD(created_at, INTERVAL 7 - WEEKDAY(created_at) DAY)
GROUP BY week
ORDER BY week
Any help would be massively appreciated!
I would create a table wk that looks like this (shown for the last 5 weeks of 2022):
yrweek | wkstart    | wkend
-------+------------+------------
202249 | 2022-11-27 | 2022-12-03
202250 | 2022-12-04 | 2022-12-10
202251 | 2022-12-11 | 2022-12-17
202252 | 2022-12-18 | 2022-12-24
202253 | 2022-12-25 | 2022-12-31
To get there, generate 365 consecutive integers, turn them into all the dates of 2022, and group those dates by year-week.
This is an example:
CREATE TABLE wk AS
WITH units(units) AS (
SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION
SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
)
,tens AS(SELECT units * 10 AS tens FROM units )
,hundreds AS(SELECT tens * 10 AS hundreds FROM tens )
,
i(i) AS (
SELECT hundreds +tens +units
FROM units
CROSS JOIN tens
CROSS JOIN hundreds
)
,
dt(dt) AS (
SELECT
DATE_ADD(DATE '2022-01-01', INTERVAL i DAY)
FROM i
WHERE i < 365
)
SELECT
YEAR(dt)*100 + WEEK(dt) AS yrweek
, MIN(dt) AS wkstart
, MAX(dt) AS wkend
FROM dt
GROUP BY yrweek
ORDER BY yrweek;
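With that table in place, run:
SELECT
  yrweek
, COUNT(*) AS records
FROM wk
JOIN input_table
  ON wk.wkstart < input_table.deleted_at
 -- wkend is the last day of the week (a DATE, i.e. midnight), so compare
 -- against the following midnight to include records created during that day
 AND input_table.created_at < wk.wkend + INTERVAL 1 DAY
GROUP BY
  yrweek
;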
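Both steps of this answer can be sketched in Python (an illustration, not SQL). The first builds Sunday-to-Saturday weeks for 2022 from consecutive days. It uses strftime("%U"), which numbers weeks like MySQL's default WEEK() mode and DATE_FORMAT's %U; with these rules, the week starting 2022-11-27 is labelled 202248. The second counts the records whose lifetime overlaps each week: created before the (exclusive) end of the week and deleted after its start. The three records and the function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta

def weeks_of(year):
    """Map yyyyww -> (first_day, last_day) using Sunday-based weeks (%U)."""
    weeks = {}
    day = date(year, 1, 1)
    while day.year == year:
        key = year * 100 + int(day.strftime("%U"))
        first, _ = weeks.get(key, (day, day))
        weeks[key] = (first, day)
        day += timedelta(days=1)
    return weeks

def records_per_week(records, weeks):
    """Count records whose [created_at, deleted_at] overlaps each week.

    The week end is exclusive: midnight after the week's last day.
    """
    counts = {}
    for key, (first, last) in sorted(weeks.items()):
        start = datetime.combine(first, time())
        end = datetime.combine(last + timedelta(days=1), time())
        counts[key] = sum(1 for created, deleted in records
                          if created < end and deleted > start)
    return counts

# Hypothetical (created_at, deleted_at) pairs.
records = [
    (datetime(2022, 12, 3, 18, 0), datetime(2022, 12, 20, 9, 0)),
    (datetime(2022, 12, 26, 8, 0), datetime(2023, 1, 15, 0, 0)),
    (datetime(2022, 12, 5, 12, 0), datetime(2022, 12, 28, 17, 0)),
]
weeks = weeks_of(2022)
counts = records_per_week(records, weeks)
for key in range(202247, 202253):
    print(key, *weeks[key], counts[key])
# 202247 2022-11-20 2022-11-26 0
# 202248 2022-11-27 2022-12-03 1
# 202249 2022-12-04 2022-12-10 2
# 202250 2022-12-11 2022-12-17 2
# 202251 2022-12-18 2022-12-24 2
# 202252 2022-12-25 2022-12-31 2
```

The first record is created late on the last day of week 202248 and still counts there, because the week end is treated as exclusive.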
I first build a list of the records with two running counts: how many records had been created so far, and how many had been deleted (closed) so far.
SELECT
created_at,
deleted_at,
(SELECT COUNT(*)
from records r2
where r2.created_at <= r1.created_at ) as new,
(SELECT COUNT(*)
from records r2
where r2.deleted_at <= r1.created_at) as closed
FROM records r1
ORDER BY r1.created_at;
After that it's just adding a GROUP BY:
SELECT
date_format(created_at,'%Y-%U') as week,
MAX((SELECT COUNT(*)
from records r2
where r2.created_at <= r1.created_at )) as new,
MAX((SELECT COUNT(*)
from records r2
where r2.deleted_at <= r1.created_at)) as closed
FROM records r1
GROUP BY week
ORDER BY week;
see: DBFIDDLE
NOTE: Because I use random times, the results will change when re-run. A sample output is:
week    | new | closed
--------+-----+-------
2022-00 |  31 |      0
2022-01 | 298 |     64
2022-02 | 570 |    212
2022-03 | 800 |    421

Get original RANK() value based on row create date

I'm using MariaDB and trying to see whether I can pull the original ranking of each row of a table, based on its creation date.
For example, imagine a scores table that holds different scores for different users and categories (a lower score is better in this case):
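| id | leaderboardId | userId | score | submittedAt ↓       | rankAtSubmit |
|----|---------------|--------|-------|---------------------|--------------|
| 9  | 15            | 555    | 50.5  | 2022-01-20 01:00:00 | 2            |
| 8  | 15            | 999    | 58.0  | 2022-01-19 01:00:00 | 3            |
| 7  | 15            | 999    | 59.1  | 2022-01-15 01:00:00 | 3            |
| 6  | 15            | 123    | 49.0  | 2022-01-12 01:00:00 | 1            |
| 5  | 15            | 222    | 51.0  | 2022-01-10 01:00:00 | 1            |
| 4  | 14            | 222    | 87.0  | 2022-01-09 01:00:00 | 1            |
| 5  | 15            | 555    | 51.0  | 2022-01-04 01:00:00 | 1            |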
The "rankAtSubmit" column is what I'm trying to generate here if possible.
I want to take the best/smallest score of each user+leaderboard and determine what the rank of that score was when it was submitted.
My attempt failed because in MySQL you cannot reference outer-query columns more than one level deep in a subquery. Referencing t.submittedAt in the following query therefore raises an error:
SELECT *, (
SELECT ranking FROM (
SELECT id, RANK() OVER (PARTITION BY leaderboardId ORDER BY score ASC) ranking
FROM scores x
WHERE x.submittedAt <= t.submittedAt
GROUP BY userId, leaderboardId
) ranks
WHERE ranks.id = t.id
) rankAtSubmit
FROM scores t
Instead of using RANK(), I was able to accomplish this with a single subquery that counts the number of users that have a lower score submitted before the given score.
SELECT id, userId, score, leaderboardId, submittedAt,
(
  SELECT COUNT(DISTINCT userId) + 1
  FROM scores t2
  WHERE t2.userId <> t.userId AND          -- count other users only
  t2.leaderboardId = t.leaderboardId AND  -- same leaderboard
  t2.score < t.score AND                  -- lower (better) score
  t2.submittedAt <= t.submittedAt         -- submitted no later than this row
) AS rankAtSubmit
FROM scores t
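The counting idea can be checked outside the database with the Python sketch below (an illustration, not a MariaDB query). It counts distinct lower scores rather than distinct users, which is a DENSE_RANK-style variation. On the sample rows this reproduces every rankAtSubmit value. Counting users instead would give 4 for ids 7 and 8, because three other users already had lower scores by then. The function name rank_at_submit is hypothetical, and the duplicated id 5 is kept as in the sample.

```python
from datetime import datetime

# (id, leaderboardId, userId, score, submittedAt) from the sample table
scores = [
    (9, 15, 555, 50.5, datetime(2022, 1, 20, 1)),
    (8, 15, 999, 58.0, datetime(2022, 1, 19, 1)),
    (7, 15, 999, 59.1, datetime(2022, 1, 15, 1)),
    (6, 15, 123, 49.0, datetime(2022, 1, 12, 1)),
    (5, 15, 222, 51.0, datetime(2022, 1, 10, 1)),
    (4, 14, 222, 87.0, datetime(2022, 1, 9, 1)),
    (5, 15, 555, 51.0, datetime(2022, 1, 4, 1)),
]

def rank_at_submit(row, rows):
    """1 + number of distinct lower scores on the same leaderboard
    submitted no later than `row` (a lower score is better)."""
    _id, board, _user, score, at = row
    lower = {s for (_i, b, _u, s, t) in rows
             if b == board and s < score and t <= at}
    return len(lower) + 1

ranks = [rank_at_submit(r, scores) for r in scores]
print(ranks)  # [2, 3, 3, 1, 1, 1, 1]
```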
What I understand from your question is you want to know the minimum and maximum rank of each user.
Here is the code
SELECT userId, leaderboardId, score, MIN(rankAtSubmit), MAX(rankAtSubmit)
FROM scores
GROUP BY userId,
         leaderboardId,
         score;

Creating an overdraft statement

I'm stuck on how to write a query that produces a daily overdraft statement for a particular council.
I have the following tables: councils, users, markets, market_transactions and user_deposits.
Market transactions run daily and reduce the users' account balances. When a balance drops below 0, the user goes into overdraft (a negative balance). When users make a deposit, their account balance increases.
The tables below show how transactions and deposits are stored.
If I reverse today's transactions, I can get the balance a user had yesterday, but I can't work out a query that gives the daily overdraft (OD) amount.
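USERS
| user_id | name  | account_bal |
|---------|-------|-------------|
| 1       | Wells | -5          |
| 2       | James | 100         |
| 3       | Joy   | 10          |
| 4       | Mumbi | -300        |
DEPOSITS
| id | user_id | amount | date       |
|----|---------|--------|------------|
| 1  | 1       | 5      | 2021-04-26 |
| 2  | 3       | 10     | 2021-04-26 |
| 3  | 3       | 5      | 2021-04-25 |
| 4  | 4       | 5      | 2021-04-25 |
TRANSACTIONS
| id | user_id | amount_tendered | date       |
|----|---------|-----------------|------------|
| 1  | 1       | 5               | 2021-04-27 |
| 2  | 2       | 10              | 2021-04-26 |
| 3  | 3       | 15              | 2021-04-26 |
| 4  | 4       | 50              | 2021-04-25 |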
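The relationships are as follows:
COUNCILS
| council_id | name |
|------------|------|
| 1          | a    |
| 2          | b    |
| 3          | c    |
MARKETS
| market_id | name | council_id |
|-----------|------|------------|
| 1         | x    | 3          |
| 2         | y    | 1          |
| 3         | z    | 2          |
MARKET_USER_LINK
| id | market_id | user_id |
|----|-----------|---------|
| 1  | 1         | 3       |
| 2  | 2         | 2       |
| 3  | 3         | 1       |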
I'm running this SQL query to get the total amount each user has spent, which I then subtract from the current account balance.
I don't know whether I can use this to work out the account balance for each day.
SELECT u.user_id, total_spent, total_deposits,m.council_id
FROM users u
JOIN market_user_link ul ON ul.user_id= u.user_id
LEFT JOIN markets m ON ul.market_id =m.market_id
LEFT JOIN councils c ON m.council_id =c.council_id
LEFT JOIN (
SELECT user_id, SUM(amount_tendered) AS total_spent
FROM transactions
WHERE DATE(date) BETWEEN DATE('2021-02-01') AND DATE(NOW())
GROUP BY user_id
) t ON t.user_id= u.user_id
ORDER BY user_id, total_spent ASC
When run, the output looks like this:
| user_id | total_spent | council_id |
|-------------|----------------|------------|
| 1 | 50.00 | 1 |
| 2 | 2.00 | 3 |
I was hoping to reverse the transactions and deposits to get each day's account balance, and then sum the balances of the users below 0, but I couldn't get that to work.
The goal is a query that shows the daily overdraft for a particular council (i.e. only SUM the account balances of users whose balance is below 0).
Expected result
| date       | council_id | o_d_amount |
|------------|------------|------------|
| 2021-04-24 | 1          | -300.00    |
| 2021-04-24 | 2          | -60.00     |
| 2021-04-24 | 3          | -900.00    |
| 2021-04-25 | 1          | -600.00    |
| 2021-04-25 | 2          | -100.00    |
| 2021-04-25 | 3          | -1200.00   |
This is actually not that hard, but the way you asked makes it hard to follow.
Also, your expected result should match the data you provided.
Edit: the previous solution was wrong. It counted withdrawals and deposits more than once when a user had more than one event on the same date.
Start by computing the net amount exchanged by each user on each day:
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d
group by user_id, date
order by user_id, date;
The following query gets the state of each account, but only on days that had a deposit or a withdrawal.
To get results for every day (not just the days with account movement), change the cross join so that it produces a table of all the dates you want (see, for example, "Get all dates between two dates in SQL Server"), but I digress.
select dates.date, c.council_id, u.name username
, u.account_bal - sum(case when e.date >= dates.date then e.exchanged_on_day else 0 end) as amount_on_start_of_day
, u.account_bal - sum(case when e.date > dates.date then e.exchanged_on_day else 0 end) as amount_on_end_of_day
from councils c
inner join markets m on c.council_id=m.council_id
inner join market_user_link mul on m.market_id=mul.market_id
inner join users u on mul.user_id=u.user_id
left join (
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d group by user_id, date
) e on u.user_id=e.user_id -- exchange on each day
cross join (select distinct date from (select date from deposits union select date from transactions) datesInternal) dates -- all days that had a transaction
group by dates.date, c.council_id, u.name, u.account_bal
order by dates.date desc, c.council_id, u.name;
From there you can rearrange to get the result you want.
select date, council_id
, sum(case when amount_on_start_of_day<0 then amount_on_start_of_day else 0 end) o_d_amount_start
, sum(case when amount_on_end_of_day<0 then amount_on_end_of_day else 0 end) o_d_amount_end
from (
select dates.date, c.council_id, u.name username
, u.account_bal - sum(case when e.date >= dates.date then e.exchanged_on_day else 0 end) as amount_on_start_of_day
, u.account_bal - sum(case when e.date > dates.date then e.exchanged_on_day else 0 end) as amount_on_end_of_day
from councils c
inner join markets m on c.council_id=m.council_id
inner join market_user_link mul on m.market_id=mul.market_id
inner join users u on mul.user_id=u.user_id
left join (
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d group by user_id, date
) e on u.user_id=e.user_id -- exchange on each day
cross join (select distinct date from (select date from deposits union select date from transactions) datesInternal) dates -- all days that had a transaction
group by dates.date, c.council_id, u.name, u.account_bal
) result
group by date, council_id
order by date;
You can check it on https://www.db-fiddle.com/f/msScT6B5F7FjU2aQXVr2da/6
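The core of this answer is walking backwards from the current balance. The balance at the start of a day is the current balance minus every net movement dated on or after that day. The end-of-day balance subtracts only the movements after it. The Python sketch below (an illustration, not SQL) applies that arithmetic to the question's sample rows. It then sums the negative end-of-day balances per council. The user-to-council mapping is read off the sample MARKETS and MARKET_USER_LINK tables. User 4 is left out because the sample link table does not map that user to a market. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

balance = {1: -5, 2: 100, 3: 10, 4: -300}          # users.account_bal (current)
deposits = [(1, 5, date(2021, 4, 26)), (3, 10, date(2021, 4, 26)),
            (3, 5, date(2021, 4, 25)), (4, 5, date(2021, 4, 25))]
transactions = [(1, 5, date(2021, 4, 27)), (2, 10, date(2021, 4, 26)),
                (3, 15, date(2021, 4, 26)), (4, 50, date(2021, 4, 25))]
council_of = {3: 3, 2: 1, 1: 2}  # user -> council via market_user_link/markets

# Net amount exchanged per (user, day): deposits positive, spending negative.
net = defaultdict(int)
for user, amount, day in deposits:
    net[user, day] += amount
for user, amount, day in transactions:
    net[user, day] -= amount

def balance_at(user, day, end_of_day=True):
    """Undo the movements after `day` (or from `day` on, for start of day)."""
    later = sum(v for (u, d), v in net.items()
                if u == user and (d > day if end_of_day else d >= day))
    return balance[user] - later

def overdraft_by_council(day):
    """Sum of the negative end-of-day balances per council."""
    totals = defaultdict(int)
    for user, council in council_of.items():
        totals[council] += min(balance_at(user, day), 0)
    return dict(totals)

print(sorted(overdraft_by_council(date(2021, 4, 25)).items()))
# [(1, 0), (2, -5), (3, 0)]
```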
Basically, the query maps users to councils, calculates each user's overdraft periods, and then aggregates over councils. I assume the starting balance is dated at the start of the month, '2021-04-01' (it could be the ending balance as well, see below); change it as needed. I also assume that a negative starting balance counts as an overdraft. For simplicity and easier debugging, the query is divided into a number of steps.
with uc as (
select distinct m.council_id, mul.user_id
from markets m
join market_user_link mul on m.market_id = mul.market_id
),
user_running_total as (
select user_id, date,
coalesce(lead(date) over(partition by user_id order by date) - interval 1 day, date) nxt,
sum(sum(s)) over(partition by user_id order by date) rt
from (
select user_id, date, -amount_tendered s
from transactions
union all
select user_id, date, amount
from deposits
union all
select user_id, se.d, se.s
from users
cross join lateral (
select date(NOW() + interval 1 day) d, 0 s
union all
select '2021-04-01' d, account_bal
) se
) t
group by user_id, date
),
user_overdraft as (
select user_id, date, nxt, least(rt, 0) ovd
from user_running_total
where date <= date(NOW())
),
dates as (
select date
from user_overdraft
union
select nxt
from user_overdraft
),
council__overdraft as (
select uc.council_id, d.date, sum(uo.ovd) total_overdraft, lag(sum(uo.ovd), 1, sum(uo.ovd) - 1) over(partition by uc.council_id order by d.date) prev_ovd
from uc
cross join dates d
join user_overdraft uo on uc.user_id = uo.user_id and d.date between uo.date and uo.nxt
group by uc.council_id, d.date
)
select council_id, date, total_overdraft
from council__overdraft
where total_overdraft <> prev_ovd
order by date, council_id
council__overdraft is already usable on its own; the last step just compacts the output by excluding intermediate dates on which the overdraft did not change.
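The compaction keeps a row only when a council's overdraft differs from the previous date's value. The LAG(..., sum(uo.ovd) - 1) default makes the first row of each council differ, so it is always kept. A Python sketch of the same filter follows. The input rows are hypothetical intermediate values, and the function name compact is illustrative.

```python
from itertools import groupby

def compact(rows):
    """rows: (council_id, date, total_overdraft), sorted by council, date.
    Keep each council's first row and every row whose value changed."""
    kept = []
    for _council, group in groupby(rows, key=lambda r: r[0]):
        previous = None  # stands in for the LAG default on the first row
        for row in group:
            if previous is None or row[2] != previous:
                kept.append(row)
            previous = row[2]
    return kept

rows = [(2, "2021-04-01", -305), (2, "2021-04-24", -305),
        (2, "2021-04-25", -350), (2, "2021-04-26", -345),
        (3, "2021-04-01", 0), (3, "2021-04-26", 0), (3, "2021-04-27", -7)]
for row in compact(rows):
    print(*row)
```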
With the following sample data:
users
user_id name account_bal
1 Wells -5
2 James 100
3 Joy 10
4 Mumbi -300
deposits, ordered by date (an extra row is added for the last date)
id user_id amount date
3 3 5 2021-04-25
4 4 5 2021-04-25
1 1 5 2021-04-26
2 3 10 2021-04-26
5 3 73 2021-05-06
transactions, ordered by date (note the added row, which illustrates the running total in action)
id user_id amount_tendered date
5 4 50 2021-04-25
2 2 10 2021-04-26
3 3 15 2021-04-26
1 1 5 2021-04-27
4 3 17 2021-04-27
councils
council_id name
1 a
2 b
3 c
markets
market_id name council_id
1 x 3
2 y 1
3 z 2
market_user_link
id market_id user_id
1 1 3
2 2 2
3 3 1
4 3 4
the query output is:
council_id date overdraft
1 2021-04-01 0
2 2021-04-01 -305
3 2021-04-01 0
2 2021-04-25 -350
2 2021-04-26 -345
2 2021-04-27 -350
3 2021-04-27 -7
3 2021-05-06 0
Alternatively, if the users table holds the closing balance (as of NOW()), replace the user_running_total CTE with the following code:
user_running_total as (
select user_id, date,
coalesce(lead(date) over(partition by user_id order by date) - interval 1 day, date) nxt,
coalesce(sum(sum(s)) over(partition by user_id order by date desc
rows between unbounded preceding and 1 preceding), sum(s)) rt
from (
select user_id, date, amount_tendered s
from transactions
union all
select user_id, date, -amount
from deposits
union all
select user_id, se.d, se.s
from users
cross join lateral (
select date(NOW() + interval 1 day) d, account_bal s
union all
select '2021-04-01' d, 0
) se
) t
where DATE(date) between date '2021-04-01' and date(NOW() + interval 1 day)
group by user_id, date
),
This way, the query starts from the closing balance, dated the day after NOW(), and rolls the running total out in reverse order back to the starting date, '2021-04-01'.
Output
council_id date overdraft
1 2021-04-01 0
2 2021-04-01 -260
3 2021-04-01 -46
2 2021-04-25 -305
3 2021-04-25 -41
2 2021-04-26 -300
3 2021-04-26 -46
2 2021-04-27 -305
3 2021-04-27 -63
3 2021-05-06 0
db-fiddle both versions

How to include some data and other data in SQL conditioned to date?

I have a query for a management platform that selects users and some of their data, applies a few filters (such as the group they are in), and returns the amount each user produced (in $) last month. That amount comes from last month's latest record, found with MAX(created_date). The platform shows how much each user produced this month and the previous month (us.amount_produced and up.amount_produced AS last_month_amount).
The problem is that it doesn't select new users (those who didn't produce anything last month), and I need those returned too.
Any help is appreciated, thanks.
(I was thinking about doing a JOIN or even two separate queries, but I'm not sure which approach is best.)
Note in the examples below that user #3 has no rows in the User_Performance table before February, because he was created in February. So the query below won't return him, but I need it to.
User table structure:
Users
id email login amount_produced created_date
---------------------------------------------
1 foo@bar.com foo 1000 2019-12-20 22:30:01
2 jack@gmail.com jack 0 2019-12-20 22:30:01
3 john@gmail.com john 2000 2020-02-01 00:00:01
User_Group_Config table structure:
User_Group_Config
user_id group_id
---------------------------------------------
1 4
2 1
3 4
User_Performance table structure. This is a log table: a job inserts a row every hour with each user's calculated productivity.
User_Performance
user_id amount_produced created_date
---------------------------------------------
1 500 2020-01-31 22:30:01
2 0 2020-01-31 22:30:01
1 500 2020-01-31 23:30:01
2 0 2020-01-31 23:30:01
1 1000 2020-02-01 00:30:01
2 0 2020-02-01 00:30:01
3 0 2020-02-01 00:30:01
SELECT
us.id,
us.email,
us.login,
ugc.group_id,
up.user_id,
up.amount_produced last_month_amount
FROM
db.User_Performance AS up,
db.User_Group_Config ugc,
db.User AS us
WHERE
created_date IN (SELECT
MAX(created_date)
FROM
User_Performance
WHERE
/* Here it filters only users that have data last month, I need these AND the ones that have no data to return zero here or null or undefined at this row)*/
MONTH(created_date) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH)
GROUP BY user_id)
AND ugc.group_id = 4
AND up.user_id = ugc.user_id
AND us.id = up.user_id;
Desired results (note that user #2 isn't selected, because his group_id is 1):
Results
(current month) (previous month)
id email login amount_produced last_month_amount
---------------------------------------------
1 foo@bar.com foo 1000 500
3 john@gmail.com john 0 null or 0
Test
SELECT
us.id,
us.contact_phone,
us.email,
us.first_name,
us.last_name,
us.login,
ugc.group_id,
us.create_date,
us.expire_date,
us.profile_photo,
us.dashboard_enabled,
us.general_rating,
us.rework_rating,
us.amount_produced,
us.amount_spent,
up.user_id,
up.amount_produced last_month_amount
FROM db.User_Performance AS up
LEFT JOIN db.User_Group_Config ugc ON up.user_id = ugc.user_id AND ugc.group_id = 4
LEFT JOIN db.User us ON us.id = up.user_id
WHERE
up.created_date IN (SELECT
MAX(created_date)
FROM
User_Performance
WHERE
/* Here it filters only users that have data last month, I need these AND the ones that have no data to return zero here or null or undefined at this row)*/
MONTH(created_date) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH)
GROUP BY user_id);
Solved using this, with subquery and JOIN (not the best solution, but a solution):
SELECT
us.id,
us.email,
us.login,
ugc.group_id,
us.amount_produced,
(
SELECT
perf.amount_produced
FROM
User_Performance perf
WHERE
perf.user_id = us.id AND
perf.created_date BETWEEN DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01 00:00:00') and CONCAT(LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH), " 23:59:59")
ORDER BY
perf.created_date DESC
LIMIT 1
) as amount_produced_last_month
FROM
User AS us
INNER JOIN
User_Group_Config ugc ON ugc.user_id = us.id
WHERE
ugc.group_id = 4;
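The correlated subquery with ORDER BY created_date DESC LIMIT 1 returns NULL when a user has no row in the previous month. Because it sits in the SELECT list instead of the WHERE clause, new users are no longer dropped. The Python sketch below does the same lookup on the question's sample rows. It assumes "today" is 2020-02-15, so the previous month is January 2020. The function name last_month_amount is hypothetical.

```python
from datetime import date, datetime, timedelta

# (user_id, amount_produced, created_date) rows from User_Performance
performance = [
    (1, 500, datetime(2020, 1, 31, 22, 30, 1)), (2, 0, datetime(2020, 1, 31, 22, 30, 1)),
    (1, 500, datetime(2020, 1, 31, 23, 30, 1)), (2, 0, datetime(2020, 1, 31, 23, 30, 1)),
    (1, 1000, datetime(2020, 2, 1, 0, 30, 1)), (2, 0, datetime(2020, 2, 1, 0, 30, 1)),
    (3, 0, datetime(2020, 2, 1, 0, 30, 1)),
]
group_of = {1: 4, 2: 1, 3: 4}  # User_Group_Config: user_id -> group_id

def last_month_amount(user_id, today):
    """Amount from the user's latest row in the previous calendar month, or None."""
    first_this_month = today.replace(day=1)
    first_prev_month = (first_this_month - timedelta(days=1)).replace(day=1)
    rows = [(created, amount) for uid, amount, created in performance
            if uid == user_id and first_prev_month <= created.date() < first_this_month]
    # Like ORDER BY created_date DESC LIMIT 1; None plays the role of NULL.
    return max(rows)[1] if rows else None

today = date(2020, 2, 15)  # assumed current date
result = {uid: last_month_amount(uid, today)
          for uid, group in group_of.items() if group == 4}
print(result)  # {1: 500, 3: None}
```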

Correct query to get average from top 5 of 7 days?

I'm tracking number of steps/day. I want to get the average steps/day using the 5 best days out of a 7 day period. My end goal is going to be to get an average for the best 5 out of 7 days for a total of 16 weeks.
Here's my sqlfiddle - http://sqlfiddle.com/#!9/5e69bdf/2
Here is the query I'm currently using but I've discovered the result is not correct. It's taking the average of 7 days instead of selecting the 5 days that had the most steps. It's outputting 14,122 as an average instead of 11,606 based on my data as posted in the sqlfiddle.
SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
FROM (SELECT * FROM activities
JOIN Courses
WHERE activities.encodedid=? AND activities.activitydate BETWEEN
DATE_ADD(Courses.Startsemester, INTERVAL $y DAY) AND
DATE_ADD(Courses.Startsemester, INTERVAL $x DAY)
ORDER BY activities.steps DESC LIMIT 5
) a
GROUP BY a.encodedid
Here's the same query with the values filled in for testing:
SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
FROM (SELECT * FROM activities
JOIN Courses
WHERE activities.encodedid='42XPC3' AND activities.activitydate BETWEEN
DATE_ADD(Courses.Startsemester, INTERVAL 0 DAY) AND
DATE_ADD(Courses.Startsemester, INTERVAL 6 DAY)
ORDER BY activities.steps DESC LIMIT 5
) a
GROUP BY a.encodedid
As @SloanThrasher pointed out, the query doesn't work because the Courses table contains multiple rows for the same course, and each of them is joined to the activities table. As a result, the subquery returns the top value (16058) three times plus the second-highest value (11218) twice, for a total of 70610 and an average of 14122. You can work around this by modifying the query as follows:
SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
FROM (SELECT * FROM activities
JOIN (SELECT DISTINCT Startsemester FROM Courses) c
WHERE activities.encodedid='42XPC3' AND activities.activitydate BETWEEN
DATE_ADD(c.Startsemester, INTERVAL 0 DAY) AND
DATE_ADD(c.Startsemester, INTERVAL 6 DAY)
ORDER BY CAST(activities.steps AS UNSIGNED) DESC LIMIT 5
) a
GROUP BY a.encodedid
Since there are actually only 3 days with activity (2018-07-16, 2018-07-17 and 2018-07-18) between the start of the semester and 6 days later (2018-07-12 to 2018-07-18), this gives a total of 37553 (16058+11218+10277) and an average of 12517.7.
StepsTotal AVGSteps
37553 12517.666666666666
Ideally, you probably also want to add a constraint on the Course chosen from Courses e.g. change
(SELECT DISTINCT Startsemester FROM Courses)
to
(SELECT DISTINCT Startsemester FROM Courses WHERE CourseNumber='PHED1164')
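The arithmetic behind the 14122 and 12517.7 figures can be reproduced in a few lines of Python. The sketch takes the best five values and averages them, first with each day's steps repeated three times and then with each day counted once. The repetition count of three is an assumption standing in for the duplicated Courses rows. The function name best_n_average is hypothetical.

```python
def best_n_average(steps, n=5):
    """Total and average of the n largest values (fewer if not enough days)."""
    best = sorted(steps, reverse=True)[:n]
    return sum(best), sum(best) / len(best)

days = [16058, 11218, 10277]  # the three days with activity in the window
duplicated = days * 3         # each day joined to three Courses rows (assumed)

for label, data in (("duplicated", duplicated), ("deduplicated", days)):
    total, avg = best_n_average(data)
    print(label, total, round(avg, 2))
# duplicated 70610 14122.0
# deduplicated 37553 12517.67
```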
Try this query:
SELECT @rn := 1, @weekAndYear := 0;
SELECT weekDayAndYear,
       SUM(steps),
       AVG(steps)
FROM (
    SELECT @weekAndYear weekAndYearLag,
           CASE WHEN @weekAndYear = YEAR(activitydate) * 100 + WEEK(activitydate)
                THEN @rn := @rn + 1 ELSE @rn := 1 END rn,
           @weekAndYear := YEAR(activitydate) * 100 + WEEK(activitydate) weekDayAndYear,
           steps,
           lightly_act_min,
           fairly_act_min,
           sed_act_min,
           vact_min,
           encodedid,
           activitydate,
           username
    FROM activities
    ORDER BY YEAR(activitydate) * 100 + WEEK(activitydate), CAST(steps AS UNSIGNED) DESC
) a WHERE rn <= 5
GROUP BY weekDayAndYear
Demo
With additional variables, I imitate SQL Server's ROW_NUMBER() function to number the days from 1 to 7 within each week. This way I can keep the best 5 days and easily get the average by grouping on weekDayAndYear, which has the same yyyyww format as the variable (I used an integer to avoid casting to VARCHAR).
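Numbering the rows within each week and keeping rn <= 5 is what the @rn/@weekAndYear variables imitate. The Python sketch below does the same. The week key is YEAR * 100 plus the Sunday-based week number from strftime("%U"), which matches MySQL's default WEEK() mode. The week of step counts is made up, and best_five_per_week is a hypothetical name.

```python
from collections import defaultdict
from datetime import date

def best_five_per_week(activities, n=5):
    """activities: (activitydate, steps) pairs.
    Returns {yyyyww: (sum, average)} over the n best days of each week."""
    by_week = defaultdict(list)
    for day, steps in activities:
        by_week[day.year * 100 + int(day.strftime("%U"))].append(steps)
    result = {}
    for week, values in sorted(by_week.items()):
        # Sorting descending and slicing plays the role of rn <= 5.
        best = sorted(values, reverse=True)[:n]
        result[week] = (sum(best), sum(best) / len(best))
    return result

# A hypothetical full week, Sunday 2018-07-15 to Saturday 2018-07-21.
week = [(date(2018, 7, 15 + i), s) for i, s in
        enumerate([4000, 16058, 11218, 10277, 9000, 12000, 3000])]
print(best_five_per_week(week))  # {201828: (58553, 11710.6)}
```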
Consider the following:
DROP TABLE IF EXISTS my_table;
CREATE TABLE `my_table`
(id SERIAL PRIMARY KEY
,steps INT NOT NULL
);
insert into my_table (steps) values
(9),(5),(7),(7),(7),(8),(4);
select prev
     , sum(steps) total
from (
  select steps
       , case when @prev = grp
              then @j:=@j+1 else @j:=1 end j
       , @prev:=grp prev
  from (SELECT steps
             , case when mod(@i,3)=0
                    then @grp := @grp+1 else @grp:=@grp end grp -- a 3 day week
             , @i:=@i+1 i
        from my_table
           , (select @i:=0,@grp:=0) vars
        order
           by id) x
     , (select @prev:= null, @j:=0) vars
  order by grp,steps desc,i) a
where j <=2 -- top 2 (out of 3)
group by prev;
+------+-------+
| prev | total |
+------+-------+
| 1 | 16 |
| 2 | 15 |
| 3 | 4 |
+------+-------+
http://sqlfiddle.com/#!9/ee46d7/11