How to calculate a moving 4 week average every week in MySQL - mysql

I have a table something like this.
count | date
------------------
1 2012-01-01
4 2012-01-01
5 2012-01-02
12 2012-01-03
7 2012-01-04
4 2012-01-05
19 2012-01-06
1 2012-01-07
etc...
I'm looking for a way to calculate the average count per week over the previous 4 week period for each week.
The results should be something like...
avg | yearweek
------------------
3 201201
5 201202
6 201203
1 201204
11 201205
3 201206
18 201207
12 201208
etc...
...where each yearweek is the weekly average over the past 4 yearweeks.
Getting the weekly averages is simple enough, but how do I then get that over the past 4 yearweeks? And then how do I do that as a rolling average? Am I better off just doing this in code?

While you could certainly do this in the code of your application, if you really need to do it in SQL, you could first create a table of results aggregated by week and then join it to itself to get the 4-week moving average.
In doing so, instead of storing the averages, I would store the sums and the number of days (the first or last week of a year might not have 7 days, to name one edge case). That way you avoid averaging weekly averages whose denominators differ, which would give an unweighted result.
So let's say you have a table "weekly_results", which has fields: yearweek, sumcount, numdays. You can now self-join to the last 4 weeks and get the sums and counts, and then calculate the averages from that:
SELECT yearweek, sum_cnt / sum_dys AS avg_moving_4wk
FROM (
    -- For each week a, add up the sums and day counts of a and the 3 weeks before it
    SELECT a.yearweek, SUM(b.sumcount) AS sum_cnt, SUM(b.numdays) AS sum_dys
    FROM weekly_results a
    JOIN weekly_results b
      -- Caveat: subtracting yearweek values breaks across a year boundary (201301 - 201252 = 49)
      ON a.yearweek - b.yearweek BETWEEN 0 AND 3
    GROUP BY a.yearweek
) t1
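If the year-boundary problem with yearweek arithmetic matters, one option is to key each week by its start date and join on a date range instead. Below is a minimal sketch of that idea, not the method from the answer above. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The table name daily, the column week_start, and the function name moving_4wk_avg are assumptions made for illustration, and numdays here counts only the days that actually have data.

```python
import sqlite3

def moving_4wk_avg(rows):
    """rows: (count, 'YYYY-MM-DD') pairs. Returns [(week_start, avg_per_day), ...]."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE daily (cnt INTEGER, day TEXT)")  # hypothetical source table
    con.executemany("INSERT INTO daily VALUES (?, ?)", rows)
    # Aggregate per week, keyed by the Monday that starts the week:
    # 'weekday 0' moves forward to Sunday, '-6 days' steps back to Monday.
    con.execute("""
        CREATE TABLE weekly_results AS
        SELECT date(day, 'weekday 0', '-6 days') AS week_start,
               SUM(cnt)            AS sumcount,
               COUNT(DISTINCT day) AS numdays  -- assumption: days that have data
        FROM daily
        GROUP BY week_start
    """)
    # Self-join each week to itself and the 3 weeks before it, by date range,
    # then divide the summed counts by the summed days (a weighted average).
    result = con.execute("""
        SELECT a.week_start,
               1.0 * SUM(b.sumcount) / SUM(b.numdays) AS avg_moving_4wk
        FROM weekly_results a
        JOIN weekly_results b
          ON b.week_start BETWEEN date(a.week_start, '-21 days') AND a.week_start
        GROUP BY a.week_start
        ORDER BY a.week_start
    """).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    sample = [(1, "2012-12-17"), (6, "2012-12-17"), (14, "2012-12-24"),
              (21, "2012-12-31"), (28, "2013-01-07"), (35, "2013-01-14")]
    for week_start, avg in moving_4wk_avg(sample):
        print(week_start, avg)
```

In MySQL the same join condition could probably be written with DATE_SUB on a DATE column holding the week start; that translation is untested here.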

Related

Select weekly average of user usage, only for some users (mysql)

I have 2 tables, and I want to show a weekly TOTAL average of data usage for users who started using the application 10 weeks ago. (in that week)
Table 1 is called "users"
user_id user_name user_date
1 a 2020-05-01
2 b 2020-05-03
3 c 2020-06-01
4 d 2020-06-06
5 e 2020-06-09
Table 2 is called "data_tbl"
data_id user_id date_used data_used
1 1 2020-05-09 7
2 1 2020-05-09 12
3 2 2020-05-12 100
4 2 2020-05-20 177
5 1 2020-05-21 78
6 2 2020-05-29 33
7 1 2020-06-01 44
8 2 2020-06-01 123
9 1 2020-06-03 62
Consider 10 weeks ago is between 2020-05-01 and 2020-05-08
So the 2 users we are interested in in that case are user_id 1 and 2 (a and b)
We consider the first week to run from 2020-05-01 to 2020-05-08
Second week from 2020-05-08 to 2020-05-15
Third week from 2020-05-15 to 2020-05-22
Fourth week from 2020-05-22 to 2020-05-29 and so on
For week 1 we would have average usage = 0
For week 2 we would have average usage (7+12+100)/3=39
For week 3 we would have average usage (177+78)/2=127
For week 4 we would have average usage 33
For week 5 we would have average usage (44+123+62)/3=76
I really don't know how to start, if I should do a join, or a select in select with average.
I tested something like: (but no success)
SELECT AVG(data_used),
FROM data_tbl
LEFT JOIN users ON data_tbl.user_id=users.user_id
WHERE users.user_date>= "2020-05-01" AND users.user_date<="2020-05-08"
GROUP BY date
ORDER BY date;
You can achieve this easily with the YEARWEEK() function.
However, what you want to achieve is not entirely clear to me, because the results you want don't really match your data.
Example:
SELECT YEARWEEK(SYSDATE()) AS Actual_Week,
YEARWEEK(user_date) User_Date_Week,
YEARWEEK(SYSDATE()) - YEARWEEK(user_date) AS diff_weeks ,
u.*
FROM users u
Returns
Actual_Week User_Date_Week diff_weeks user_id user_name user_date
202029 202017 12 1 a 2020-05-01
202029 202018 11 2 b 2020-05-03
202029 202022 7 3 c 2020-06-01
202029 202022 7 4 d 2020-06-06
202029 202023 6 5 e 2020-06-09
So you can see that user 1 started 12 weeks ago and user 2 started 11 weeks ago. You assume they started 10 weeks ago, which is incorrect. The same goes for date_used in data_tbl.
So I'll just put you on the right path; it should then be easy to adapt to your needs...
Do something like this:
SELECT YEARWEEK(d.date_used), AVG(data_used)
FROM users u
INNER JOIN data_tbl d ON u.user_id = d.user_id
WHERE (YEARWEEK(SYSDATE()) - YEARWEEK(u.user_date)) BETWEEN 11 AND 12
GROUP BY YEARWEEK(d.date_used)
Returns
YEARWEEK(d.date_used) AVG(data_used)
202018 9.5
202019 100
202020 127.5
202021 33
202022 76.3333
You can see that some of the numbers you expect are there, but others differ. This result seems correct to me; the results in your question were wrong.
Notice that to get the results for user 1 and 2, I specified
WHERE (YEARWEEK(SYSDATE()) - YEARWEEK(u.user_date)) BETWEEN 11 AND 12
If you want the user of 10 weeks ago, just do
WHERE (YEARWEEK(SYSDATE()) - YEARWEEK(u.user_date)) = 10
And to conclude :
you might want to change the mode of YEARWEEK(), if the weeks should start on Monday, Sunday, or other options. Modes are well described here, with plenty of examples
If you also want the weeks without data in your results (so always 0), you have to use a Calendar table. There are plenty of examples on SO.
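To illustrate the calendar-table idea, here is a rough sketch that generates the list of weeks and LEFT JOINs the usage onto it, so weeks without data show up as 0. It is only one possible reading of the question, under these assumptions: weeks are half-open 7-day buckets counted from the cohort start date (not YEARWEEK() weeks), the cohort is every user whose user_date falls in the first bucket, and Python's built-in sqlite3 module stands in for MySQL. The function name weekly_cohort_avg and its parameters are made up for the example.

```python
import sqlite3

def weekly_cohort_avg(users, data, start, num_weeks):
    """Average data_used per 7-day bucket for users who signed up in the first bucket."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE users (user_id INTEGER, user_name TEXT, user_date TEXT)")
    con.execute("CREATE TABLE data_tbl (data_id INTEGER, user_id INTEGER, "
                "date_used TEXT, data_used INTEGER)")
    con.executemany("INSERT INTO users VALUES (?, ?, ?)", users)
    con.executemany("INSERT INTO data_tbl VALUES (?, ?, ?, ?)", data)
    result = con.execute("""
        WITH RECURSIVE weeks(n) AS (          -- the "calendar": bucket numbers 0..num_weeks-1
            SELECT 0
            UNION ALL
            SELECT n + 1 FROM weeks WHERE n + 1 < :num_weeks
        ),
        cohort_usage AS (                     -- usage rows of the cohort, tagged with a bucket
            SELECT CAST((julianday(d.date_used) - julianday(:start)) / 7 AS INTEGER) AS n,
                   d.data_used
            FROM data_tbl d
            JOIN users u ON u.user_id = d.user_id
            WHERE u.user_date >= :start
              AND u.user_date < date(:start, '+7 days')
              AND d.date_used >= :start
        )
        SELECT w.n + 1 AS week_no,
               COALESCE(AVG(c.data_used), 0) AS avg_used  -- 0 for weeks without data
        FROM weeks w
        LEFT JOIN cohort_usage c ON c.n = w.n
        GROUP BY w.n
        ORDER BY w.n
    """, {"start": start, "num_weeks": num_weeks}).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    users = [(1, "a", "2020-05-01"), (2, "b", "2020-05-03"), (3, "c", "2020-06-01"),
             (4, "d", "2020-06-06"), (5, "e", "2020-06-09")]
    data = [(1, 1, "2020-05-09", 7), (2, 1, "2020-05-09", 12), (3, 2, "2020-05-12", 100),
            (4, 2, "2020-05-20", 177), (5, 1, "2020-05-21", 78), (6, 2, "2020-05-29", 33),
            (7, 1, "2020-06-01", 44), (8, 2, "2020-06-01", 123), (9, 1, "2020-06-03", 62)]
    for week_no, avg_used in weekly_cohort_avg(users, data, "2020-05-01", 5):
        print(week_no, avg_used)
```

With half-open buckets, the 33 used on 2020-05-29 lands in week 5 rather than week 4, so week 4 comes out as 0 here. That differs from the figures in the question, which treat the week boundaries loosely.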

how can I calculate a year information by week

I want the query to sum "Total seconds" per person and per week: for example, for "Shalma" in week 2, only the rows from week 2 should be summed. I want this result from the query itself, not from a report. Is that possible?
Example of the result I want:
PIC Total Seconds SortByWeek
Aida 600 2
Arifah 540000 2
Shalma 28000 1
Shalma 72036900 2
Zul 54000000 1
Zul 3000 2
Zul 100000 3
Zul 283500 4
it shows total by week for each name.
Don't include the full date in grouping.
SELECT PIC, Sum([Total seconds]) AS [SumOfTotalSeconds], SortByWeek, Year(date_worked_smt) AS Yr, days
FROM [union]
INNER JOIN [day] ON [union].SortByWeek = [day].ID
GROUP BY PIC, Year(date_worked_smt), SortByWeek, days
ORDER BY PIC, SortByWeek;

Run complicated query on multiple dates

I have the following query to get the monthly amount of users:
SELECT count(user_id) from subs
where (started_at between #start_date and #start_date + interval 1 month
or (expires_at>#start_date + interval 1 month and started_at<#start_date))
If we had the following DB:
user_id started_at expires_at
=============================
1 2015-01-01 2015-12-31
2 2015-01-01 2015-01-03
3 2015-02-01 2015-02-28
4 2015-03-01 2015-03-31
5 2015-04-01 2015-04-31
6 2015-04-01 2016-04-01
7 2015-05-01 2015-05-09
I need a query that will return the following table:
2015-01 - 2
2015-02 - 2 (because one of Jan records doesn't expire till Dec)
2015-03 - 2
2015-04 - 3
2015-05 - 3
etc
So what is the efficient way to get this result in one query?
You probably want something like this:
SELECT YEAR(started_at) as 'Year',
MONTH(started_at) as 'Month',
COUNT(user_id) as 'Users'
FROM subs
GROUP BY YEAR(started_at),MONTH(started_at);
Note that in case a month has no users, this query will not return an entry for that month. If you want to also include months with 0 users you want a more complex query; check this for more info.
You want to GROUP BY the year and month.
Assuming your started_at column is of a DATE type, you can for example use GROUP BY YEAR(started_at), MONTH(started_at), or use DATE_FORMAT to turn the column value into a single string of the form YYYY-MM and GROUP BY that. Select that same value as a column too, to get the identifier you want.
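Grouping by the start month counts each subscription only in the month it began. If a subscription should also count in every later month it is still running, one way is to generate the months and join on an overlap condition. The sketch below is a guess at that reading of the question, using Python's built-in sqlite3 module as a stand-in for MySQL. The function name active_users_per_month and the overlap rule (started before the month ends, expires on or after the month starts) are assumptions, and the invalid sample date 2015-04-31 is entered as 2015-04-30.

```python
import sqlite3

def active_users_per_month(subs, first_month, last_month):
    """subs: (user_id, started_at, expires_at). Months are given as 'YYYY-MM-01'."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE subs (user_id INTEGER, started_at TEXT, expires_at TEXT)")
    con.executemany("INSERT INTO subs VALUES (?, ?, ?)", subs)
    result = con.execute("""
        WITH RECURSIVE months(month_start) AS (   -- one row per month in the range
            SELECT :first
            UNION ALL
            SELECT date(month_start, '+1 month') FROM months WHERE month_start < :last
        )
        SELECT substr(m.month_start, 1, 7) AS month,
               COUNT(s.user_id)            AS users
        FROM months m
        LEFT JOIN subs s                          -- subscription overlaps the month
          ON s.started_at < date(m.month_start, '+1 month')
         AND s.expires_at >= m.month_start
        GROUP BY m.month_start
        ORDER BY m.month_start
    """, {"first": first_month, "last": last_month}).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    subs = [(1, "2015-01-01", "2015-12-31"), (2, "2015-01-01", "2015-01-03"),
            (3, "2015-02-01", "2015-02-28"), (4, "2015-03-01", "2015-03-31"),
            (5, "2015-04-01", "2015-04-30"), (6, "2015-04-01", "2016-04-01"),
            (7, "2015-05-01", "2015-05-09")]
    for month, users in active_users_per_month(subs, "2015-01-01", "2015-05-01"):
        print(month, "-", users)
```

On the sample rows this gives 2, 2, 2, 3, 3 for January through May, which matches the table the question asks for.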

mysql group by day and count then filter only the highest value for each day

I'm stuck on this query. I need to do a group by date, card_id and only show the highest hits. I have this data:
date card_name card_id hits
29/02/2016 Paul Stanley 1345 12
29/02/2016 Phil Anselmo 1347 16
25/02/2016 Dave Mustaine 1349 10
25/02/2016 Ozzy 1351 17
23/02/2016 Jhonny Cash 1353 13
23/02/2016 Elvis 1355 15
20/02/2016 James Hethfield 1357 9
20/02/2016 Max Cavalera 1359 12
My query at the moment
SELECT DATE(card.create_date) `day`, `name`,card_model_id, count(1) hits
FROM card
Join card_model ON card.card_model_id = card_model.id
WHERE DATE(card.create_date) >= DATE(DATE_SUB(NOW(), INTERVAL 1 MONTH)) AND card_model.preview = 0
GROUP BY `day`, card_model_id
;
I want to group by date and card_id, then keep only the row with the highest hits for each date, so there is one row per date. It is as if I ran max(hits) with a group by, but that doesn't work.
Like:
date card_name card_id hits
29/02/2016 Phil Anselmo 1347 16
25/02/2016 Ozzy 1351 17
23/02/2016 Elvis 1355 15
20/02/2016 Max Cavalera 1359 12
Any light on that will be appreciated. Thanks for reading.
Here is one way to do this. Based on your sample data (not the query):
select s.*
from sample s
where s.hits = (select max(s2.hits)
from sample s2
where date(s2.date) = date(s.date)
);
Your attempted query seems to have no relationship to the sample data, so it is unclear how to incorporate those tables (the attempted query has different columns and two tables).
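For a quick check of the correlated-subquery approach against the sample rows, here is a small sketch using Python's built-in sqlite3 module in place of MySQL. The table name sample follows the answer above; the function name top_hits_per_day and the use of ISO date strings are assumptions for the example. If two cards tie for the highest hits on a day, both rows are returned.

```python
import sqlite3

def top_hits_per_day(rows):
    """rows: (date, card_name, card_id, hits). Returns the top row(s) for each date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sample (date TEXT, card_name TEXT, "
                "card_id INTEGER, hits INTEGER)")
    con.executemany("INSERT INTO sample VALUES (?, ?, ?, ?)", rows)
    # Keep a row only if its hits equal the maximum hits recorded for the same date.
    result = con.execute("""
        SELECT s.date, s.card_name, s.card_id, s.hits
        FROM sample s
        WHERE s.hits = (SELECT MAX(s2.hits)
                        FROM sample s2
                        WHERE s2.date = s.date)
        ORDER BY s.date DESC
    """).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    rows = [("2016-02-29", "Paul Stanley", 1345, 12), ("2016-02-29", "Phil Anselmo", 1347, 16),
            ("2016-02-25", "Dave Mustaine", 1349, 10), ("2016-02-25", "Ozzy", 1351, 17),
            ("2016-02-23", "Jhonny Cash", 1353, 13), ("2016-02-23", "Elvis", 1355, 15),
            ("2016-02-20", "James Hethfield", 1357, 9), ("2016-02-20", "Max Cavalera", 1359, 12)]
    for row in top_hits_per_day(rows):
        print(row)
```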

How to select database rows by days ,month,years

I have table like this
id count generatedAt
1 130 2013-01-13 02:21:02
2 120 2013-01-08 04:15:06
3 89 2013-01-08 01:42:57
4 24 2012-11-25 05:31:43
5 3 2012-02-31 09:25:24
I would like to select the rows by day, month, or year, keeping one row per period.
For example, by day:
rows 2 and 3 are on the same day, so I need only
1,2,4,5
By month, rows 1, 2 and 3 are in the same month, so I need only
1,4,5
By year, I need only 1,4
How can I do this?
I am using Doctrine 2.
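You can do something like this.
You can filter on a day, a month, or a specific year to select rows.
-- rows from a given day of the month
SELECT * FROM TableName WHERE DAY(myDate) = 20;
-- rows from a given month
SELECT * FROM TableName WHERE MONTH(myDate) = 12;
-- rows from a given year
SELECT * FROM TableName WHERE YEAR(myDate) = 2008;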