Select weekly average of user usage, only for some users (mysql)

I have 2 tables, and I want to show the weekly TOTAL average of data usage for the users who started using the application 10 weeks ago (i.e. during that week).
Table 1 is called "users"
user_id user_name user_date
1 a 2020-05-01
2 b 2020-05-03
3 c 2020-06-01
4 d 2020-06-06
5 e 2020-06-09
Table 2 is called "data_tbl"
data_id user_id date_used data_used
1 1 2020-05-09 7
2 1 2020-05-09 12
3 2 2020-05-12 100
4 2 2020-05-20 177
5 1 2020-05-21 78
6 2 2020-05-29 33
7 1 2020-06-01 44
8 2 2020-06-01 123
9 1 2020-06-03 62
Consider that 10 weeks ago is the period between 2020-05-01 and 2020-05-08.
So in that case the 2 users we are interested in are user_id 1 and 2 (a and b).
We consider the first week to be from 2020-05-01 to 2020-05-08,
the second week from 2020-05-08 to 2020-05-15,
the third week from 2020-05-15 to 2020-05-22,
the fourth week from 2020-05-22 to 2020-05-29, and so on.
For week 1 we would have average usage = 0
For week 2 we would have average usage (7+12+100)/3 ≈ 39.67
For week 3 we would have average usage (177+78)/2 = 127.5
For week 4 we would have average usage 33
For week 5 we would have average usage (44+123+62)/3 ≈ 76.33
I really don't know how to start: whether I should use a join, or a subquery inside the SELECT with AVG().
I tried something like this (without success):
SELECT AVG(data_used)
FROM data_tbl
LEFT JOIN users ON data_tbl.user_id=users.user_id
WHERE users.user_date>= "2020-05-01" AND users.user_date<="2020-05-08"
GROUP BY date_used
ORDER BY date_used;

You can achieve this easily with the YEARWEEK() function.
However, what you want to achieve is not entirely clear to me, because the results you want don't really match your data.
Example:
SELECT YEARWEEK(SYSDATE()) AS Actual_Week,
YEARWEEK(user_date) User_Date_Week,
YEARWEEK(SYSDATE()) - YEARWEEK(user_date) AS diff_weeks ,
u.*
FROM users u
Returns
Actual_Week User_Date_Week diff_weeks user_id user_name user_date
202029 202017 12 1 a 2020-05-01
202029 202018 11 2 b 2020-05-03
202029 202022 7 3 c 2020-06-01
202029 202022 7 4 d 2020-06-06
202029 202023 6 5 e 2020-06-09
So you can see that user 1 started 12 weeks ago and user 2 started 11 weeks ago, whereas you assume they started 10 weeks ago, which is incorrect. The same goes for your date_used values in data_tbl.
So I'll just put you on the right path; it should then be easy to adapt it to your needs.
Do something like this:
SELECT YEARWEEK(d.date_used), AVG(data_used)
FROM users u
INNER JOIN data_tbl d ON u.user_id = d.user_id
WHERE (YEARWEEK(SYSDATE()) - YEARWEEK(u.user_date)) BETWEEN 11 AND 12
GROUP BY YEARWEEK(d.date_used)
Returns
YEARWEEK(d.date_used) AVG(data_used)
202018 9.5
202019 100
202020 127.5
202021 33
202022 76.3333
You can see that the numbers you expect are there, but they fall into different weeks. This result seems correct to me; the results in your question were wrong.
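To double-check the grouping, here is a small Python sketch (not MySQL) that buckets the sample data_tbl rows for users 1 and 2 into Sunday-start weeks, which is what YEARWEEK() does in its default mode 0, and averages data_used per week. It only reproduces the arithmetic on the sample rows; the helper names are made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows from data_tbl for users 1 and 2: (user_id, date_used, data_used)
ROWS = [
    (1, date(2020, 5, 9), 7),
    (1, date(2020, 5, 9), 12),
    (2, date(2020, 5, 12), 100),
    (2, date(2020, 5, 20), 177),
    (1, date(2020, 5, 21), 78),
    (2, date(2020, 5, 29), 33),
    (1, date(2020, 6, 1), 44),
    (2, date(2020, 6, 1), 123),
    (1, date(2020, 6, 3), 62),
]

def week_start(d: date) -> date:
    """Sunday on or before d (Sunday-start weeks, as in YEARWEEK mode 0)."""
    # date.weekday(): Monday=0 ... Sunday=6, so a Sunday goes back 0 days.
    return d - timedelta(days=(d.weekday() + 1) % 7)

def weekly_averages(rows):
    """Average data_used per Sunday-start week, keyed by that week's Sunday."""
    buckets = defaultdict(list)
    for _user_id, used_on, amount in rows:
        buckets[week_start(used_on)].append(amount)
    return {wk: sum(v) / len(v) for wk, v in sorted(buckets.items())}

for wk, avg in weekly_averages(ROWS).items():
    print(wk, round(avg, 4))
```

It prints one line per week, keyed by the week's Sunday (2020-05-03 through 2020-05-31), with the same averages as the YEARWEEK() result above.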
Notice that to get the results for users 1 and 2, I specified
WHERE (YEARWEEK(SYSDATE()) - YEARWEEK(u.user_date)) BETWEEN 11 AND 12
If you want the users from 10 weeks ago, just use
WHERE (YEARWEEK(SYSDATE()) - YEARWEEK(u.user_date)) = 10
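One caveat with subtracting YEARWEEK() values: the difference only counts weeks while both dates fall in the same year, because across New Year you get, for example, 202101 - 202052 = 49 rather than 1. A minimal Python sketch of the alternative, counting weeks from the dates themselves (Sunday-start weeks assumed; the helper names are hypothetical):

```python
from datetime import date, timedelta

def week_start(d: date) -> date:
    """Sunday on or before d (Sunday-start weeks, like YEARWEEK mode 0)."""
    return d - timedelta(days=(d.weekday() + 1) % 7)

def weeks_between(earlier: date, later: date) -> int:
    """Whole Sunday-start weeks from earlier's week to later's week."""
    return (week_start(later) - week_start(earlier)).days // 7

# Within one year this matches the YEARWEEK subtraction (202029 - 202017 = 12).
print(weeks_between(date(2020, 5, 1), date(2020, 7, 20)))
# Across New Year it still counts 1 week, where 202101 - 202052 would give 49.
print(weeks_between(date(2020, 12, 27), date(2021, 1, 3)))
```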
And to conclude:
You might want to change the mode of YEARWEEK() if the weeks should start on Monday, Sunday, or follow another convention. The modes are described, with plenty of examples, in the MySQL documentation of the WEEK() function.
If you also want the weeks without data in your results (always 0), you have to use a calendar table. There are plenty of examples on Stack Overflow.
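The calendar-table idea boils down to: generate every week in the range first, then attach the averages, defaulting to 0 where there is no data. Here it is sketched in Python, assuming the averages are keyed by each week's Sunday; the function name is hypothetical and the sample values are made up to show a gap.

```python
from datetime import date, timedelta

def fill_missing_weeks(averages: dict, first_week: date, last_week: date) -> dict:
    """Return an entry for every Sunday-start week in [first_week, last_week],
    using 0 where `averages` has no data (the role a calendar table plays in SQL)."""
    filled = {}
    wk = first_week
    while wk <= last_week:
        filled[wk] = averages.get(wk, 0)
        wk += timedelta(weeks=1)
    return filled

# Made-up input with gaps: only two weeks have data.
sample = {date(2020, 5, 3): 9.5, date(2020, 5, 17): 127.5}
for wk, avg in fill_missing_weeks(sample, date(2020, 4, 26), date(2020, 5, 17)).items():
    print(wk, avg)
```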

Related

SQL subquery in SELECT clause

I'm trying to find admin activity within the last 30 days.
The accounts table stores the user data (username, password, etc.).
At the end of each day, if a user has logged in, a new entry is created in the player_history table with their updated data. This is so we can track progress over time.
accounts table:
id username admin
1 Michael 4
2 Steve 3
3 Louise 3
4 Joe 0
5 Amy 1
player_history table:
id user_id created_at playtime
0 1 2021-04-03 10
1 2 2021-04-04 10
2 3 2021-04-05 15
3 4 2021-04-10 20
4 5 2021-04-11 20
5 1 2021-05-12 40
6 2 2021-05-13 55
7 3 2021-05-17 65
8 4 2021-05-19 75
9 5 2021-05-23 30
10 1 2021-06-01 60
11 2 2021-06-02 65
12 3 2021-06-02 67
13 4 2021-06-03 90
The following query
SELECT a.`username`, SEC_TO_TIME((MAX(h.`playtime`) - MIN(h.`playtime`))*60) as 'time' FROM `player_history` h, `accounts` a WHERE h.`created_at` > '2021-05-06' AND h.`user_id` = a.`id` AND a.`admin` > 0 GROUP BY h.`user_id`
Outputs this table:
Note that this is just admin activity, so Joe is not included in this data.
from 2021-05-06 to present (yyyy-mm-dd):
username time
Michael 00:20:00
Steve 00:10:00
Louise 00:02:00
Amy 00:00:00
As you can see from this data, Amy's time is shown as 0 although she has played for 10 minutes in the last month. This is because she only has 1 entry from 2021-05-06 onwards, so there is nothing to compare it to: it is 0 because 30 - 30 = 0.
Another flaw is that it doesn't include all activity in the last month; it only subtracts the lowest value from the highest.
So I tried fixing this by comparing the highest value after 2021-05-06 with the user's most recent entry before that date. So I modified the query a bit:
SELECT a.`Username`, SEC_TO_TIME((MAX(h.`playtime`) - (SELECT MAX(`playtime`) FROM `player_history` WHERE a.`id` = `user_id` AND `created_at` < '2021-05-06'))*60) as 'Time' FROM `player_history` h, `accounts` a WHERE h.`created_at` >= '2021-05-06' AND h.`user_id` = a.`id` AND a.`admin` > 0 GROUP BY h.`user_id`
So now it will output:
username time
Michael 00:50:00
Steve 00:55:00
Louise 00:52:00
Amy 00:10:00
But I feel like this whole query is quite inefficient. Is there a better way to do this?
I think you want LAG() (a window function, available in MySQL 8.0 and later):
SELECT a.username,
SEC_TO_TIME(SUM(h.playtime - COALESCE(h.prev_playtime, 0)) * 60) as time
FROM accounts a JOIN
(SELECT h.*,
LAG(playtime) OVER (PARTITION BY h.user_id ORDER BY h.created_at) as prev_playtime
FROM player_history h
) h
ON h.user_id = a.id
WHERE h.created_at > '2021-05-06' AND
a.admin > 0
GROUP BY a.username;
In addition to the LAG() logic, note the other changes to the query:
The use of proper, explicit, standard, readable JOIN syntax.
The use of consistent columns for the SELECT and GROUP BY.
The removal of single quotes around the column alias.
The removal of backticks; they just clutter the query, making it harder to write and to read.
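The LAG() logic can be checked against the sample data with a short Python sketch: it walks each user's rows in created_at order, takes the difference from that user's previous playtime (0 for the first row, like COALESCE(prev_playtime, 0)), and sums the differences for rows after the cutoff date. The admin names are copied from the accounts table; the function name is hypothetical.

```python
from collections import defaultdict

# player_history rows from the question: (user_id, created_at, playtime in minutes)
HISTORY = [
    (1, "2021-04-03", 10), (2, "2021-04-04", 10), (3, "2021-04-05", 15),
    (4, "2021-04-10", 20), (5, "2021-04-11", 20), (1, "2021-05-12", 40),
    (2, "2021-05-13", 55), (3, "2021-05-17", 65), (4, "2021-05-19", 75),
    (5, "2021-05-23", 30), (1, "2021-06-01", 60), (2, "2021-06-02", 65),
    (3, "2021-06-02", 67), (4, "2021-06-03", 90),
]
ADMINS = {1: "Michael", 2: "Steve", 3: "Louise", 5: "Amy"}  # accounts with admin > 0

def playtime_since(history, cutoff):
    """Sum of playtime increments for rows after `cutoff`; each increment is the
    row's playtime minus the same user's previous playtime (0 if none)."""
    prev = {}
    totals = defaultdict(int)
    # Sorting by (user_id, created_at) mimics PARTITION BY user_id ORDER BY created_at.
    for user_id, created_at, playtime in sorted(history):
        increment = playtime - prev.get(user_id, 0)
        prev[user_id] = playtime
        if created_at > cutoff and user_id in ADMINS:
            totals[ADMINS[user_id]] += increment
    return dict(totals)

print(playtime_since(HISTORY, "2021-05-06"))
```

For the sample data this gives 50 minutes for Michael, 55 for Steve, 52 for Louise and 10 for Amy; SEC_TO_TIME() then needs the minutes multiplied by 60.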

Mysql select result in one currency

I have to create reports in one currency. I need to do it with a MySQL query, without processing the data in PHP, but I am unable to figure it out.
There is a currency exchange-rate table called exchange_rates, as follows (exchange rates are expressed in LKR per unit of each other currency). It gets one new record per currency every month.
exchange_rates
id currency_id start_date exchange_rate
1 5 2017-01-02 155
2 4 2017-01-03 25
3 6 2017-01-03 53
4 5 2017-02-1 156
5 4 2017-02-1 24
6 6 2017-02-1 54
There is a project table as follows
pro_id name value currency_id status_id owner_id date
1 studio1 500 5 1 44 2017-01-20
2 lotus 120 5 1 42 2017-01-21
3 auro 300 4 2 45 2017-01-21
4 studio2 400 6 1 44 2017-01-22
5 holland 450 4 3 46 2017-02-05
6 studio3 120 4 3 47 2017-02-06
7 studio4 400 6 3 48 2017-02-06
How can I generate reports in one currency (DKK, while the exchange rates are in LKR), such as status-wise totals, monthly totals, totals by owner, etc.?
We have to take into account the project's currency_id, the currency to convert to, and that month's exchange rates for those currencies to get the right value for each project row.
I hope my scenario is clear; your help is much appreciated.
I don't need every report. I just want the SQL to convert the values in the project table using the exchange_rates table, or a status-wise report like this:
status_id value_in_one_currency
1 xxxx
2 xxxx
3 xxxx
Try this:
SELECT A.status_id, A.`value` * B.exchange_rate `value_in_one_currency`
FROM project A JOIN exchange_rates B
ON A.currency_id=B.currency_id
AND DATE_FORMAT(A.`date`,'%m-%Y')=DATE_FORMAT(B.`start_date`,'%m-%Y');
See MySQL Join Made Easy for some insight.
This is what I finally went with:
I took currency_id = 5 (DKK) as the target currency to convert to.
SELECT A.*, C.exchange_rate AS DKK, D.exchange_rate AS LKR, (A.`value` * D.exchange_rate / C.exchange_rate) AS `converted_value`
FROM project A
LEFT JOIN exchange_rates C ON (DATE_FORMAT(C.start_date,'%Y-%m')=DATE_FORMAT(A.`date`,'%Y-%m') AND C.currency_id=5)
LEFT JOIN exchange_rates D ON DATE_FORMAT(D.start_date,'%Y-%m')=DATE_FORMAT(A.`date`,'%Y-%m') AND D.currency_id=A.currency_id;
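The arithmetic behind that query, sketched in Python on the sample rows: convert each value to LKR with its own currency's rate for the project's month, divide by the target currency's rate for the same month, and sum per status_id for the status-wise report. As in the final query, currency_id 5 is assumed to be the target (DKK) currency; the helper names are illustrative only.

```python
from collections import defaultdict

# exchange_rates from the question: (currency_id, 'YYYY-MM') -> LKR per unit of that currency
RATES = {
    (5, "2017-01"): 155, (4, "2017-01"): 25, (6, "2017-01"): 53,
    (5, "2017-02"): 156, (4, "2017-02"): 24, (6, "2017-02"): 54,
}
# project rows: (pro_id, value, currency_id, status_id, date)
PROJECTS = [
    (1, 500, 5, 1, "2017-01-20"), (2, 120, 5, 1, "2017-01-21"),
    (3, 300, 4, 2, "2017-01-21"), (4, 400, 6, 1, "2017-01-22"),
    (5, 450, 4, 3, "2017-02-05"), (6, 120, 4, 3, "2017-02-06"),
    (7, 400, 6, 3, "2017-02-06"),
]
TARGET_CURRENCY = 5  # assumption: the currency treated as DKK in the final query

def convert(value, currency_id, day):
    """value -> LKR with its own month's rate, then LKR -> target currency."""
    month = day[:7]  # same month match as DATE_FORMAT(..., '%Y-%m')
    return value * RATES[(currency_id, month)] / RATES[(TARGET_CURRENCY, month)]

def totals_by_status(projects):
    """Sum of converted values per status_id."""
    totals = defaultdict(float)
    for _pro_id, value, currency_id, status_id, day in projects:
        totals[status_id] += convert(value, currency_id, day)
    return dict(sorted(totals.items()))

for status_id, total in totals_by_status(PROJECTS).items():
    print(status_id, round(total, 2))
```

With the sample data this gives roughly 756.77 for status 1, 48.39 for status 2 and 226.15 for status 3.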

Join tables by filling gaps with previous data

I have this table:
DATE ENGINEERS
----------------------
2014-03-06 6
2014-03-10 7
In it I record whenever the number of engineers changes. For example, in this case I had 6 engineers, but on 10 March I hired one more, so I have 7 from then on.
I have another table with the appointments per day
DATE APPOINTMENTS
-------------------------
2014-03-06 4
2014-03-07 5
2014-03-10 5
2014-03-11 6
How can I get a view like this, which combines the appointments and the number of engineers per day?
DATE APPOINTMENTS ENGINEERS
--------------------------------------
2014-03-06 4 6
2014-03-07 5 6
2014-03-10 5 7
2014-03-11 6 7
This is what I came up with:
SELECT t2.at, t2.appointments, (@n := IFNULL(t1.engineers, @n)) AS engineers FROM t2
LEFT JOIN (
SELECT t.at, t1.engineers
FROM t1
JOIN t2 t ON t1.at = t.at
) t1 ON t1.at = t2.at;
I am sure there is something better out there, since the redundant JOIN should not be needed, but I could not find it.
It uses an SQL user variable to carry forward the last value when there is no corresponding entry in the engineers table.
Don't forget to run SET @n = 0; first.
The corresponding sqlfiddle.
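What the user variable is doing is an "as-of" lookup: for each appointment date, take the most recent engineers row dated on or before it. Here is a Python sketch of that lookup with bisect on the sample data (names are hypothetical). In SQL the same idea can be written as a correlated subquery that picks the latest engineers row dated on or before each appointment date, which avoids relying on the evaluation order of user variables; recent MySQL 8.0 releases also deprecate setting user variables inside expressions.

```python
from bisect import bisect_right

# Engineer head-count changes (date, engineers), sorted by date.
ENGINEER_CHANGES = [("2014-03-06", 6), ("2014-03-10", 7)]
APPOINTMENTS = [("2014-03-06", 4), ("2014-03-07", 5), ("2014-03-10", 5), ("2014-03-11", 6)]

def engineers_on(day, changes):
    """Head count in force on `day`: the latest change dated on or before it (None if none)."""
    dates = [d for d, _ in changes]
    i = bisect_right(dates, day)  # number of changes dated <= day
    return changes[i - 1][1] if i else None

for day, appts in APPOINTMENTS:
    print(day, appts, engineers_on(day, ENGINEER_CHANGES))
```

ISO date strings sort in date order, so plain string comparison is enough for bisect here.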

MySQL Select Last n Rows For List of IDs

Fixture Table
uid home_uid away_uid winner date season_division_uid
1 26 6 6 2013-07-30 18
2 8 21 8 2013-06-30 18
3 6 8 8 2013-06-29 18
4 21 26 21 2013-05-20 18
5 6 26 6 2013-04-19 18
This table contains hundreds of rows.
Currently I have a query to select all the teams in a division, i.e.
SELECT team_uid
FROM Season_Division_Team
WHERE season_division_uid='18'
which returns the team uids, i.e. [6,26,8,21,26].
Now, for each of the unique team ids, I would like to return the winner values of the last 3 fixtures they were involved in (as either home_uid or away_uid), ordered by the date column.
So the returned value example would be:
team_id winner date
6 6 2013-07-30
6 8 2013-06-29
6 6 2013-04-19
26 6 2013-07-30
26 21 2013-05-20
26 6 2013-04-19
Any ideas? Thank you
I'm not sure how to get it directly; a query like this:
select * from Fixture where
`date` >= (select min(`date`) from
(select `date` from Fixture where home_uid = 6 or away_uid = 6 order by `date` desc limit 3) as last3)
and (home_uid = 6 or away_uid = 6)
That's not going to be an efficient query, but it's the only way I can think of currently.
It's hard to get the 3rd-largest value in SQL.
The subquery gets the date of the team's 3rd most recent fixture, and the outer query then returns all the fixtures from that date onwards in which the team played.
EDIT:
SELECT * FROM Fixture WHERE winner = 6 ORDER BY `date` DESC LIMIT 3
That sounds more like your latter comment.
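The intended result (each team's 3 most recent fixtures, home or away, with the winner) is easy to state procedurally, so here is a Python sketch on the sample Fixture rows with a hypothetical helper name. In MySQL 8.0 the usual SQL equivalent lists each fixture once per participating team and then keeps ranks 1 to 3 of ROW_NUMBER() OVER (PARTITION BY team ORDER BY date DESC).

```python
# Fixture rows: (uid, home_uid, away_uid, winner, date, season_division_uid)
FIXTURES = [
    (1, 26, 6, 6, "2013-07-30", 18),
    (2, 8, 21, 8, "2013-06-30", 18),
    (3, 6, 8, 8, "2013-06-29", 18),
    (4, 21, 26, 21, "2013-05-20", 18),
    (5, 6, 26, 6, "2013-04-19", 18),
]

def last_n_results(fixtures, team_ids, n=3):
    """For each team, (winner, date) of its n most recent fixtures, home or away."""
    out = {}
    for team in dict.fromkeys(team_ids):  # de-duplicate while keeping order
        played = [f for f in fixtures if team in (f[1], f[2])]
        played.sort(key=lambda f: f[4], reverse=True)  # newest first
        out[team] = [(f[3], f[4]) for f in played[:n]]
    return out

for team, results in last_n_results(FIXTURES, [6, 26, 8, 21, 26]).items():
    for winner, day in results:
        print(team, winner, day)
```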

How to calculate a moving 4 week average every week in MySQL

I have a table something like this.
count | date
------------------
1 2012-01-01
4 2012-01-01
5 2012-01-02
12 2012-01-03
7 2012-01-04
4 2012-01-05
19 2012-01-06
1 2012-01-07
etc...
I'm looking for a way to calculate the average count per week over the previous 4 week period for each week.
The results should be something like...
avg | yearweek
------------------
3 201201
5 201202
6 201203
1 201204
11 201205
3 201206
18 201207
12 201208
etc...
...where each yearweek is the weekly average over the past 4 yearweeks.
Getting the weekly averages is simple enough, but how do I then get that over the past 4 yearweeks? And how do I do that as a rolling average? Am I better off just doing this in code?
While you could certainly do this in the code of your application, if you really need to do it in SQL, you could first create a table of results aggregated by week and then join it to itself to get the 4-week moving average.
In doing so, instead of storing the averages, I would store the sums and the number of days (the first or last week of a year might not have 7 days; thinking of the edge cases). That way, you avoid averaging averages whose denominators differ, which would give an unweighted result.
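A tiny illustration of why storing sums and day counts matters, using made-up numbers for a full 7-day week followed by a 1-day week: averaging the two weekly averages gives a different answer from dividing the total count by the total number of days.

```python
# Made-up weekly figures: (sum of counts, number of days in that week)
weeks = [(70, 7), (6, 1)]  # a full week, then a 1-day week at year end

unweighted = sum(s / d for s, d in weeks) / len(weeks)          # average of averages
weighted = sum(s for s, _ in weeks) / sum(d for _, d in weeks)  # total / total days

print(unweighted, weighted)  # 8.0 9.5
```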
So let's say you have a table "weekly_results", which has fields: yearweek, sumcount, numdays. You can now self-join to the last 4 weeks and get the sums and counts, and then calculate the averages from that:
SELECT yearweek, sum_cnt / sum_dys AS avg_moving_4wk
FROM (
SELECT a.yearweek, SUM(b.sumcount) AS sum_cnt, SUM(b.numdays) AS sum_dys
FROM weekly_results a
JOIN weekly_results b
ON a.yearweek - b.yearweek >= 0 AND a.yearweek - b.yearweek < 4
GROUP BY a.yearweek
) t1
ORDER BY yearweek;
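The same 4-week window can be sketched in Python. Two assumptions differ slightly from the SQL: weeks are identified by their starting Sunday rather than by a yearweek number, so the window is computed from real dates (subtracting yearweek values breaks at year boundaries, e.g. 201201 - 201152 = 49); and numdays is taken to be the number of distinct dates that have rows in that week. The function names are hypothetical.

```python
from datetime import date, timedelta

def week_start(d: date) -> date:
    """Sunday on or before d (Sunday-start weeks)."""
    return d - timedelta(days=(d.weekday() + 1) % 7)

def moving_4wk_average(daily):
    """daily: list of (date, count). Returns {week_start: sum_cnt / sum_dys} over the
    current week and the 3 weeks before it, mirroring the self-join above."""
    weekly = {}  # week_start -> [sumcount, set of dates seen]
    for d, count in daily:
        entry = weekly.setdefault(week_start(d), [0, set()])
        entry[0] += count
        entry[1].add(d)
    result = {}
    for wk in sorted(weekly):
        # Compare real dates instead of subtracting yearweek numbers.
        window = [weekly[w] for w in weekly if timedelta(0) <= wk - w < timedelta(weeks=4)]
        sum_cnt = sum(s for s, _ in window)
        sum_dys = sum(len(days) for _, days in window)
        result[wk] = sum_cnt / sum_dys
    return result

SAMPLE = [
    (date(2012, 1, 1), 1), (date(2012, 1, 1), 4), (date(2012, 1, 2), 5),
    (date(2012, 1, 3), 12), (date(2012, 1, 4), 7), (date(2012, 1, 5), 4),
    (date(2012, 1, 6), 19), (date(2012, 1, 7), 1),
]
print(moving_4wk_average(SAMPLE))
```

The sample rows all fall in the week starting Sunday 2012-01-01, so the single result is 53 / 7; adding later weeks extends the window one week at a time.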