Growing number of months passed - MySQL

I would like to create the MonthCount column described below. I have the ID and Date fields already created; I am just having trouble thinking of a clever way to count the number of dates that have passed. The dates are always the first of the month, but the first month could be any month between Jan and Dec.
ID Date MonthCount
1 1/2016 1
1 2/2016 2
1 3/2016 3
2 5/2015 1
2 6/2015 2
2 7/2015 3
It seems like I remember reading somewhere about joining the table to itself using a > or < operator but I can't completely recall the method.

The best way to handle this in MySQL is to use variables:
select t.*,
       (@rn := if(@id = id, @rn + 1,
                  if(@id := id, 1, 1)
                 )
       ) as rn
from t cross join
     (select @rn := 0, @id := -1) params
order by id, date;
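As a side note, on MySQL 8.0+ the same numbering can be written with a window function instead of session variables; a minimal sketch, assuming a table t with columns id and date:
-- MySQL 8.0+ sketch: number each row within its id, ordered by date
select t.*,
       row_number() over (partition by id order by date) as monthcount
from t
order by id, date;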

It looks like you're looking for:
select a.id, a.date, b.mindate
from table as a
inner join (
    select id, min(date) as mindate
    from table
    group by id
) as b on (a.id = b.id)
This will give you:
ID Date mindate
1 1/1/2016 1/1/2016
1 1/2/2016 1/1/2016
1 1/3/2016 1/1/2016
2 1/5/2015 1/5/2015
2 1/6/2015 1/5/2015
2 1/7/2015 1/5/2015
Now the homework for you is to figure out how to calculate the difference between the two dates.
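For what it's worth, a hedged sketch of that difference calculation, reusing the placeholder names from the query above and assuming Date is a real DATE column holding the first of each month:
-- MonthCount = months elapsed since each ID's first month, plus 1
select a.id, a.date,
       timestampdiff(month, b.mindate, a.date) + 1 as monthcount
from table as a
inner join (
    select id, min(date) as mindate
    from table
    group by id
) as b on (a.id = b.id);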

Related

Database entry streak in mysql/mariadb - until today

I asked this question yesterday, but it seems I didn't make it clear enough, so I'm going to add some information to make everything clear.
Consider the following 2 tables:
0_12_table
ID userID text timestamp
1 1 bla 2020-08-07 10:30:00
2 1 blub 2020-08-06 11:30:00
3 1 abc 2020-08-05 09:20:00
4 1 def 2020-08-04 06:13:00
5 2 ghi 2020-08-02 08:05:00
6 2 abc 2020-08-05 10:20:00
7 3 def 2020-08-04 07:13:00
8 4 ghi 2020-08-02 09:05:00
9 5 jkl 2020-08-07 06:30:00
10 5 mno 2020-08-08 08:32:00
12_24_table:
ID userID text timestamp
1 1 bla 2020-08-07 19:30:00
2 1 blub 2020-08-06 21:30:00
3 1 abc 2020-08-05 19:20:00
4 2 def 2020-08-04 16:13:00
5 2 ghi 2020-08-02 18:05:00
6 2 abc 2020-08-05 20:20:00
7 3 def 2020-08-04 17:13:00
8 4 ghi 2020-08-02 19:05:00
9 5 jkl 2020-08-07 20:13:00
Basically, users can (and are encouraged to) add one entry to the database between 00:00 and 12:00 and one between 12:01 and 23:59.
Now I'd like to reward them for adding consecutive entries. Whenever they miss a timeframe, though, that "counter" is reset to 0...
In the data given above, the user with userID 1 would have a streak of 3 days right now (in my timezone it's 9 AM right now). As soon as it is past 12:00 (noon), though, and he hasn't made another entry, the counter would be set to 0 and the streak is over, because he missed adding an entry for the morning.
The users with userIDs 2, 3 and 4 would have no streak at all. The streak is always cancelled when a morning entry or an evening entry is missing.
The user with userID 5 would have a streak of 1, which would increase to 2 as soon as he makes his entry for the 12:01 to 23:59 timeframe.
I hope you understand the logic. The important part is that it does NOT matter if he had a streak of 10 two days ago. Whenever an entry is missing, the streak is reset to 0. So when there is no entry in the morning table by 12:00 (noon) on a given day, or no entry in the evening table by 23:59, the streak is gone. It always uses today as the reference, so it is really "consecutive entries until today".
The answer that seems to be the closest I have got so far is the following:
select min(dte), max(dte), count(*)
from (select dte, (@rn := @rn + 1) as seqnum
      from (select dte
            from ((select date(timestamp) as dte, 1 as morning, 0 as evening
                   from morning
                  ) union all
                  (select date(timestamp) as dte, 0 as morning, 1 as evening
                   from evening
                  )
                 ) me
            group by dte
            having sum(morning) > 0 and sum(evening) > 0
            order by dte
           ) d cross join
           (select @rn := 0) params
     ) me
group by dte - interval seqnum day
order by count(*) desc
limit 1;
However, I haven't introduced the userID there yet, and the biggest problem is that it just takes the last streak, regardless of whether there is a gap between it and today. But, as mentioned, it should always use today as the reference.
I hope someone can help me here.
One last important piece of information: I'm using MariaDB 10.1.45, so "WITH" or "ROW_NUMBER()" is not available, and updating is not possible right now.
Thanks in advance!
This would really be simpler in a more recent version that uses window functions. But you can adapt the variables to get all streaks for users:
select userid, count(*) as length
from (select userid, dte, (@rn := @rn + 1) as seqnum
      from (select userid, dte
            from ((select userid, date(timestamp) as dte, 1 as morning, 0 as evening
                   from morning
                  ) union all
                  (select userid, date(timestamp) as dte, 0 as morning, 1 as evening
                   from evening
                  )
                 ) me
            group by userid, dte
            having sum(morning) > 0 and sum(evening) > 0
            order by userid, dte
           ) d cross join
           (select @rn := 0) params
     ) me
group by userid, dte - interval seqnum day
order by count(*) desc;
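The group by dte - interval seqnum day clause is the usual gaps-and-islands trick: within an unbroken run of days, each extra day also bumps seqnum by one, so the difference stays constant, while a missing day shifts it. A tiny self-contained illustration with hypothetical dates:
-- Consecutive dates minus a running row number collapse to one constant per streak;
-- a missing day shifts the constant and therefore starts a new group.
select dte, seqnum, dte - interval seqnum day as grp
from (
    select date('2020-08-05') as dte, 1 as seqnum
    union all select date('2020-08-06'), 2
    union all select date('2020-08-07'), 3
    union all select date('2020-08-09'), 4  -- 2020-08-08 is missing
) x;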
It turns out that a "global" sequence works as well as local sequences for this problem, so the variable use is still simple. The changes are to the select lists and to the group by and order by clauses.
You can then use this as a subquery to get the maximum:
select userid, max(seq)
from (select userid, count(*) as seq
      from (select userid, dte, (@rn := @rn + 1) as seqnum
            from (select userid, dte
                  from ((select userid, date(timestamp) as dte, 1 as morning, 0 as evening
                         from morning
                        ) union all
                        (select userid, date(timestamp) as dte, 0 as morning, 1 as evening
                         from evening
                        )
                       ) me
                  group by userid, dte
                  having sum(morning) > 0 and sum(evening) > 0
                  order by userid, dte
                 ) d cross join
                 (select @rn := 0) params
           ) me
      group by userid, dte - interval seqnum day
     ) u
group by userid;
Note: Users with no streaks would be filtered out. You can put them back in using a left join in the outer query. However, you would really want a table of all users for this, rather than your two separate tables, so I haven't bothered.
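A minimal sketch of that left join, assuming there is a users table with a userid column and that the result of the final query above is available as user_streaks (userid, longest), e.g. via a temporary table (both names are hypothetical):
-- Hypothetical: users with no completed streak come back with a streak of 0
select u.userid,
       coalesce(s.longest, 0) as longest_streak
from users u
left join user_streaks s on s.userid = u.userid;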

Correct query to get average from top 5 of 7 days?

I'm tracking number of steps/day. I want to get the average steps/day using the 5 best days out of a 7 day period. My end goal is going to be to get an average for the best 5 out of 7 days for a total of 16 weeks.
Here's my sqlfiddle - http://sqlfiddle.com/#!9/5e69bdf/2
Here is the query I'm currently using, but I've discovered the result is not correct. It's taking the average of 7 days instead of selecting the 5 days that had the most steps. It's outputting 14,122 as the average instead of 11,606, based on my data as posted in the sqlfiddle.
SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
FROM (SELECT * FROM activities
JOIN Courses
WHERE activities.encodedid=? AND activities.activitydate BETWEEN
DATE_ADD(Courses.Startsemester, INTERVAL $y DAY) AND
DATE_ADD(Courses.Startsemester, INTERVAL $x DAY)
ORDER BY activities.steps DESC LIMIT 5
) a
GROUP BY a.encodedid
Here's the same query with the values filled in for testing:
SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
FROM (SELECT * FROM activities
JOIN Courses
WHERE activities.encodedid='42XPC3' AND activities.activitydate BETWEEN
DATE_ADD(Courses.Startsemester, INTERVAL 0 DAY) AND
DATE_ADD(Courses.Startsemester, INTERVAL 6 DAY)
ORDER BY activities.steps DESC LIMIT 5
) a
GROUP BY a.encodedid
As @SloanThrasher pointed out, the reason the query is not working is that you have multiple rows for the same course in the Courses table, which end up being joined to the activities table. Thus the output of the subquery gives the top value (16058) 3 times plus the second highest value (11218) twice, for a total of 70610 and an average of 14122. You can work around this by modifying the query as follows:
SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
FROM (SELECT * FROM activities
JOIN (SELECT DISTINCT Startsemester FROM Courses) c
WHERE activities.encodedid='42XPC3' AND activities.activitydate BETWEEN
DATE_ADD(c.Startsemester, INTERVAL 0 DAY) AND
DATE_ADD(c.Startsemester, INTERVAL 6 DAY)
ORDER BY CAST(activities.steps AS UNSIGNED) DESC LIMIT 5
) a
GROUP BY a.encodedid
Now, since there are actually only 3 days with activity (2018-07-16, 2018-07-17 and 2018-07-18) between the start of the semester and 6 days later (2018-07-12 to 2018-07-18), this gives a total of 37553 (16058+11218+10277) and an average of 12517.7.
StepsTotal AVGSteps
37553 12517.666666666666
Ideally, you probably also want to add a constraint on the course chosen from Courses, e.g. change
(SELECT DISTINCT Startsemester FROM Courses)
to
(SELECT DISTINCT Startsemester FROM Courses WHERE CourseNumber='PHED1164')
Try this query:
SELECT @rn := 1, @weekAndYear := 0;

SELECT weekDayAndYear,
       SUM(steps),
       AVG(steps)
FROM (
    SELECT @weekAndYear weekAndYearLag,
           CASE WHEN @weekAndYear = YEAR(activitydate) * 100 + WEEK(activitydate)
                THEN @rn := @rn + 1 ELSE @rn := 1 END rn,
           @weekAndYear := YEAR(activitydate) * 100 + WEEK(activitydate) weekDayAndYear,
           steps,
           lightly_act_min,
           fairly_act_min,
           sed_act_min,
           vact_min,
           encodedid,
           activitydate,
           username
    FROM activities
    ORDER BY YEAR(activitydate) * 100 + WEEK(activitydate), CAST(steps AS UNSIGNED) DESC
) a WHERE rn <= 5
GROUP BY weekDayAndYear
Demo
With additional variables, I imitate the SQL Server ROW_NUMBER function to number the days from 1 to 7, partitioned by week. This way I can filter the best 5 days and easily get an average by grouping on the column weekDayAndYear, which is in the same format as the variable: yyyyww (I used an integer to avoid casting to varchar).
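On MySQL 8.0+ or MariaDB 10.2+ the same row numbering is available as a real window function; a hedged sketch over the same activities table:
-- Sketch: top 5 step counts per calendar week, then total and average per week
SELECT weekAndYear,
       SUM(steps) AS StepsTotal,
       AVG(steps) AS AVGSteps
FROM (
    SELECT YEAR(activitydate) * 100 + WEEK(activitydate) AS weekAndYear,
           steps,
           ROW_NUMBER() OVER (PARTITION BY YEAR(activitydate), WEEK(activitydate)
                              ORDER BY CAST(steps AS UNSIGNED) DESC) AS rn
    FROM activities
) a
WHERE rn <= 5
GROUP BY weekAndYear;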
Consider the following:
DROP TABLE IF EXISTS my_table;
CREATE TABLE `my_table`
(id SERIAL PRIMARY KEY
,steps INT NOT NULL
);
insert into my_table (steps) values
(9),(5),(7),(7),(7),(8),(4);
select prev
     , sum(steps) total
from (
     select steps
          , case when @prev = grp
                 then @j := @j + 1 else @j := 1 end j
          , @prev := grp prev
     from (select steps
                , case when mod(@i, 3) = 0
                       then @grp := @grp + 1 else @grp := @grp end grp -- a 3 day week
                , @i := @i + 1 i
           from my_table
              , (select @i := 0, @grp := 0) vars
           order
              by id) x
        , (select @prev := null, @j := 0) vars
     order by grp, steps desc, i) a
where j <= 2 -- top 2 (out of 3)
group by prev;
+------+-------+
| prev | total |
+------+-------+
| 1 | 16 |
| 2 | 15 |
| 3 | 4 |
+------+-------+
http://sqlfiddle.com/#!9/ee46d7/11

Mysql limit 2 rows per date per user

Okay, so let's say I have a basic table:
thing
id
user_id
created_at
And some data
id user_id created_at
1 1 2016-09-06
2 1 2016-09-06
3 1 2016-09-06
4 1 2016-09-07
5 1 2016-09-08
6 1 2016-09-08
7 1 2016-09-08
What I want to achieve is selecting at most two rows per USER per DATE of created_at. I'm only displaying data from one user, but I hope you get the point.
So the results of the select should be:
id user_id created_at
1 1 2016-09-06
2 1 2016-09-06
4 1 2016-09-07
5 1 2016-09-08
6 1 2016-09-08
I know I somehow have to use the LIMIT keyword, but I'm not so sure how. I'm also pretty sure I have to use a subquery and group by the date.
I hope you understand the problem and please do ask some questions if there's something difficult to understand.
One way is to use variables:
SELECT id, user_id, created_at
FROM (
    SELECT id, user_id, created_at,
           @rn := IF(@dt = created_at, @rn + 1,
                     IF(@dt := created_at, 1, 1)) AS rn
    FROM mytable
    CROSS JOIN (SELECT @rn := 0, @dt := '1900-01-01') AS var
    ORDER BY created_at) AS t
WHERE t.rn <= 2
Demo here
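For reference, a hedged variable-free alternative: keep a row only while fewer than two earlier rows (by id) exist for the same user and date, assuming id is a reliable tie-breaker:
SELECT t.id, t.user_id, t.created_at
FROM mytable t
WHERE (SELECT COUNT(*)
       FROM mytable t2
       WHERE t2.user_id = t.user_id
         AND t2.created_at = t.created_at
         AND t2.id < t.id) < 2
ORDER BY t.created_at, t.id;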

Query how often an event occurred at a given time

[Aim]
We would like to find out how often an event "A" occurred before time "X". More concretely, given the dataset below, we want to find out the count of prior purchases.
[Context]
DMBS: MySQL 5.6
We have following dataset:
user | date
1 | 2015-06-01 17:00:00
2 | 2015-06-02 18:00:00
1 | 2015-06-03 19:00:00
[Desired output]
user | date | purchase count
1 | 2015-06-01 17:00:00 | 1
2 | 2015-06-02 18:00:00 | 1
1 | 2015-06-03 19:00:00 | 2
[Already tried]
We managed to get the count on a specific day using an inner join on the table itself.
[Problem(s)]
- How to do this in a single query?
This could be done using a user-defined variable, which is faster, as already mentioned in the previous answer.
This needs an incremental variable for each group, depending on some ordering; from the given data set that is user and date.
Here is how you can achieve it:
select
    user,
    date,
    purchase_count
from (
    select *,
           @rn := if(@prev_user = user, @rn + 1, 1) as purchase_count,
           @prev_user := user
    from test, (select @rn := 0, @prev_user := null) x
    order by user, date
) x
order by date;
Change the table name test to your actual table name
http://sqlfiddle.com/#!9/32232/12
Probably the most efficient way is to use variables:
select t.*,
       (@rn := if(@u = user, @rn + 1,
                  if(@u := user, 1, 1)
                 )
       ) as purchase_count
from table t cross join
     (select @rn := 0, @u := '') params
order by user, date;
You can also do this with correlated subqueries, but this is probably faster.
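For comparison, a hedged sketch of that correlated-subquery version, reusing the test table name from the first answer and assuming (user, date) pairs are unique:
-- For each row, count purchases by the same user on or before that date
select t.user, t.date,
       (select count(*)
        from test t2
        where t2.user = t.user
          and t2.date <= t.date) as purchase_count
from test t
order by t.date;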

MySQL query extracting two pieces of information from table

I have a table that keeps track of the scores of people playing my game
userID | game_level | date_of_attempt | score
1 1 2014-02-07 19:29:00 2
1 2 2014-02-08 19:00:00 0
2 1 2014-03-03 11:11:04 4
... ... ... ...
I am trying to write a query that, for a given user, will tell me their cumulative score for each game_level as well as the average of the last 20 scores they have obtained on a particular game_level (by sorting on date_of_attempt).
For example:
userID | game_level | sum of scores on game level | average of last 20 level scores
1 1 26 4.5
1 2 152 13
Is it possible to do such a thing in a single query? I often need to perform the query for multiple game_levels, and I use a long subquery to work out which levels are needed, which makes me think a single query would be better.
MySQL does not support analytic functions, so obtaining the average is trickier than it would be in some other RDBMS. Here I use user-defined variables to obtain the groupwise rank and then test on the result to average only over the 20 most recent records:
SELECT userID, game_level, SUM(score) AS sum_score,
       AVG(CASE WHEN rank < 20 THEN score END) AS avg_last_20
FROM (
    SELECT score,
           @rank := (CASE WHEN userID = @userID AND game_level = @game_level
                          THEN @rank + 1
                          ELSE 0
                     END) AS rank,
           @userID := userID AS userID,
           @game_level := game_level AS game_level
    FROM my_table,
         (SELECT @rank := @userID := @game_level := NULL) init
    ORDER BY userID, game_level, date_of_attempt DESC
) x
GROUP BY userID, game_level
See How to select the first/least/max row per group in SQL for further information.
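In MySQL 8.0+ the same result can be written much more directly with a window function; a hedged sketch over the same my_table:
-- MySQL 8.0+ sketch: total score plus average of the 20 most recent attempts
SELECT userID, game_level,
       SUM(score) AS sum_score,
       AVG(CASE WHEN rn <= 20 THEN score END) AS avg_last_20
FROM (
    SELECT userID, game_level, score,
           ROW_NUMBER() OVER (PARTITION BY userID, game_level
                              ORDER BY date_of_attempt DESC) AS rn
    FROM my_table
) t
GROUP BY userID, game_level;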