Moving average in MYSQL without dates but grouped by another column - mysql

I have a table in MYSQL(version 5.7.33) which looks like shown below:
Date
SalesRep
Sale
2021-04-01
Jack
10
2021-04-02
Jack
8
2021-03-01
Lisa
10
2021-03-02
Lisa
14
2021-03-03
Lisa
21
2021-03-04
Lisa
7
2021-03-08
Lisa
10
2021-03-09
Lisa
20
2021-03-10
Lisa
15
I want the moving average of Sale column, but don't want that to be based on the dates since the dates have gap, instead I want it based on row numbers and grouped by SalesRep. So something like this:
Date
SalesRep
Sale
MoveAvg
2021-04-01
Jack
10
10
2021-04-02
Jack
8
9
2021-03-01
Lisa
10
10
2021-03-02
Lisa
14
12
2021-03-03
Lisa
21
15
2021-03-04
Lisa
7
13
2021-03-08
Lisa
10
12.4
2021-03-09
Lisa
20
13.6
2021-03-10
Lisa
15
13.8
So the moving average is for all the dates from start to finish for a particular sales rep and then it starts over for another sales rep and so on. Is this possible to do in MYSQL? Thank you in advance!

You could use avg as a window function with a frame clause for this:
SELECT dt, salesrep, sale,
AVG(sale) OVER (PARTITION BY salesrep ORDER BY dt
ROWS UNBOUNDED PRECEDING)
AS moveavg
Without window functions, you simply join all previous rows for each salesrep:
select a.dt, a.salesrep, a.sale, avg(b.sale) as moveavg
from mysterytablename a
join mysterytablename b on b.salesrep=a.salesrep and b.dt <= a.dt
group by a.salesrep, a.dt

Related

how to generate student attendance percentage per course, when they have specific day in a week

hi guys i really newbie in sql, i need help to generate percentage of attendance, here is the table:
Table Schedule
Schedule_ID Course_ID Lecture_ID Start_Date End_Date Course_Days
1 1 11 2019-09-09 2019-12-08 2,4,6
2 3 4 2019-09-09 2019-12-08 3,4
3 4 13 2019-09-09 2019-12-08 2,5
4 5 28 2019-09-09 2019-12-08 3
5 2 56 2020-01-27 2020-04-26 2,4
6 7 1 2020-01-27 2020-04-26 4,5
7 1 11 2020-01-27 2020-04-26 2,4,6
8 7 22 2020-01-27 2020-04-26 2,3
9 8 56 2020-01-27 2020-04-26 5
10 3 37 2020-01-27 2020-04-26 5,6
Reference of days of week used in this data.
1: Sunday, 2:Monday, 3:Tuesday, 4:Wednesday, 5:Thursday, 6:Friday, 7:Saturday
Table course_attendance
ID STUDENT_ID SCHEDULE_ID ATTEND_DT
1 1 2 2019-09-10
2 1 2 2019-09-11
3 1 2 2019-09-17
4 1 2 2019-09-18
......
46 2 1 2019-12-02
47 2 1 2019-09-11
48 2 1 2019-09-18
49 2 1 2019-09-25
50 2 1 2019-10-09
51 2 1 2019-10-16
....
111 6 1 2019-09-23
112 6 1 2019-09-30
113 6 1 2019-10-07
114 6 1 2019-10-14
table student
ID NAME
1 Jonny
2 Cecilia
3 Frank
4 Jones
5 Don
6 Harry
i need to show up like this :
STUDENT_ID NAME Course_ID Attendance rate
1 Jonny 1 82%
2 Cecilia 1 30%
3 Frank 3 100%
4 Jones 2 100%
5 Don 2 25%
6 Harry 4 40%
EDIT this my last step to get percentage:
result:
with main as (
select ca.STUDENT_ID,
ca.SCHEDULE_ID,
s.COURSE_ID,
co.NAME as course_name,
st.NAME,
count(ca.ID) as total_attendance,
((CHAR_LENGTH(s.COURSE_DAYS) - CHAR_LENGTH(REPLACE(s.COURSE_DAYS , ',', '')) + 1) * 13) as attendance_needed
from univ.course_attendance ca
left join univ.schedule s on ca.SCHEDULE_ID = s.ID
left join univ.student st on ca.SCHEDULE_ID = st.ID
left join univ.course co on ca.SCHEDULE_ID = co.ID
group by ca.STUDENT_ID, ca.SCHEDULE_ID
)
select *,total_attendance/attendance_needed as attendance_percentage
from main
order by 1,2;
This can be done following three steps.
Step 1: Calculate the total number of days a particular course of a schedule has. It's a good thing the start_date is always on Monday and the end_date is always on Sunday, which makes the week complete and saves some trouble. By calculating the total number of weeks a course go through and the number of days a week has for that course, we can get the total number of days a particular course of a schedule has.
Step 2:Calculate the total number of days a student for a schedule. This is done fairly easily. Note: As the majority part of the table has been skipped and the OP has yet to provide the complete data set, I could only have 14 existing rows provided.
Step 3: Calculate the percentage for the attendance using the result from the above two steps and get other required columns.
Here is the complete statement I wrote and tested in workbench:
select t2.student_id as student_id,`name`,course_id, (t2.total_attendance/t1.total_course_days)*100 as attendance_rate
from (select schedule_id,course_id,
length(replace(course_days,',',''))*(week(end_date)-week(start_date)) as total_course_days
from Schedule) t1
JOIN
(select count(attend_dt) as total_attendance,student_id,schedule_id
from course_attendance group by student_id, schedule_id) t2
ON t1.schedule_id=t2.schedule_id
JOIN
student s
ON t2.student_id=s.id;
Here is the result set ( the attendance_rate is not nice due to the abridged course_attendance table):
student_id, name, course_id, attendance_rate
2, Cecilia, 1, 15.3846
6, Harry, 1, 10.2564
1, Jonny, 3, 15.3846

Mysql: Calculating running average for each Employee by day Monthly

I have this table that has the name of the employee and their phone time duration in mysql. The table looks like this:
Caller Emplid Calldate Call_Duration
Jack 333 1/1/2016 43
Jack 333 1/2/2016 45
Jack 333 1/3/2016 87
Jack 333 2/4/2016 44
Jack 333 2/5/2016 234
jack 333 2/6/2016 431
Jeff 111 1/1/2016 23
Jeff 111 1/2/2016 54
Jeff 111 1/3/2016 67 48
I am trying to calculate the running Daily average of each employee total_Duration by day each month. Suppose I have daily running average for the month of April, then the running average for the May should start from 1st of may and end on 31st of that month. I have tried doing many ways and mysql does not have pivot and partition function like sql server. The total employee who made the call changes daily, I need something that dynamically takes care of no of employees that makes call.
The output should look like this:
Caller Emplid Calldate Call_Duration Running_avg
Jack 333 1/1/2016 43 43
Jack 333 1/2/2016 45 44
Jack 333 1/3/2016 87 58.33333333
Jack 333 2/4/2016 44 44
Jack 333 2/5/2016 234 139
Jack 333 2/6/2016 431 236.3333333
Jeff 111 1/1/2016 23 23
Jeff 111 1/2/2016 54 38.5
Jeff 111 1/3/2016 67 48
This is the query that I started below:
SELECT row_number,Month_Year,Callername,Calldate,Caller_Emplid,`Sum of Call`,`Sum of Call`/row_number as AvgCall,
#`sum of call`:=#`sum of call`+ `sum of call` overallCall,
#row_number:=row_number overallrow_number,
#RunningTotal:=#`sum of call`/#row_number runningTotal
FROM
(SELECT
#row_number:=
CASE WHEN #CallerName=CallerName and date_format(calldate,'%d') = date_format(calldate,'%d') and
date_format(calldate,'%m') = date_format(calldate,'%m')
THEN #row_number+1
ELSE 1 END AS row_number,#CallerName:=CallerName AS Callername,Calldate,Caller_Emplid,Month_Year,`Sum of Call`
FROM transform_data_2, (SELECT #row_number:=0,#CallerName:='') AS t
ORDER BY callername) a
JOIN (SELECT #`Sum of call`:= 0) t

How to create a SQL query that calculate monthly grow in population

I want to create a SQL query that count the number of babies born in month A, then it should count the babies born in month B but the second record should have the sum of month A plus B. For example;
Month | Number
--------|---------
Jan | 5
Feb | 7 <- Here were 2 babies born but it have the 5 of the previous month added
Mar | 13 <- Here were 6 babies born but it have the 7 of the two previous months added
Can somebody maybe please help me with this, is it possible to do something like this?
I have a straight forward table with babyID, BirthDate, etc.
Thank you very much
Consider using a subquery that calculates a running count. Both inner and outer query would be aggregate group by queries:
Using the following sample data:
babyID Birthdate
1 2015-01-01
2 2015-01-15
3 2015-01-20
4 2015-02-01
5 2015-02-03
6 2015-02-21
7 2015-03-11
8 2015-03-21
9 2015-03-27
10 2015-03-30
11 2015-03-31
SQL Query
SELECT MonthName(BirthDate) As BirthMonth, Count(*) As BabyCount,
(SELECT Count(*) FROM BabyTable t2
WHERE Month(t2.BirthDate) <= Month(BabyTable.BirthDate)) As RunningCount
FROM BabyTable
GROUP BY Month(BirthDate)
Output
BirthMonth BabyCount RunningCount
January 3 3
February 3 6
March 5 11

Select rows from last existing 12 months

There's a DATETIME column called time. How could I select all rows that fall within the last existing 12 months (NOT within the last year from today)? Not every month might have a row, and months may have more than one row.
For example, out of this table (ORDER BY time DESC), rows with ids 2 to 17 would be selected.
id time
-- ----
17 2015-04-01
16 2015-04-01
15 2015-03-01
14 2015-02-01
13 2015-01-01
12 2014-12-01
11 2014-11-01
10 2014-10-01
9 2013-12-01
8 2013-11-01
7 2013-10-01
6 2013-09-01
5 2013-09-01
4 2013-09-01
3 2013-09-01
2 2013-08-01
1 2013-07-01
Another way to put this:
Take the table above and group by month/year, so we get:
2015-04
2015-03
2015-02
2015-01
2014-12
2014-11
2014-10
2013-12
2013-11
2013-10
2013-09
2013-08
2013-07
Now take the 12 most recent months from this list, which is everything except 2013-07.
2015-04
2015-03
2015-02
2015-01
2014-12
2014-11
2014-10
2013-12
2013-11
2013-10
2013-09
2013-08
And select everything from those months.
I guess I could do this with multiple queries or subqueries but is there another way to do this?
If your time field is only month-precision, you could do it with a pretty simple subselect:
SELECT * FROM Table t1
WHERE time IN (
SELECT DISTINCT time FROM Table t2 ORDER BY time DESC LIMIT 12
)
If your timestamps are full-precision, you could do the same thing, but you'd need to do some date manipulation to round the dates to the month for comparison.

MySQL self join to get past averages

I am trying to find an average of past records in the database based on a specific time frame (between 9 and 3 months ago) if there is no value recorded for a recent sale. the reason for this is recent sales on our website sometimes do not immediately collect commissions so i am needing to go back to historic records to find out what a commission rate estimate might be.
Commission rate is calculated as:
total_commission / gross_sales
It is only necessary to find out what an estimate would be if a recent sale has no "total_commission" recorded
here is what i have tried so far but i think this is wrong:
SELECT
cs.*
,SUM(cs2.gross_sales)
,SUM(cs2.total_commission)
FROM
(SELECT
sale_id
, date
, customer_code
, customer_country
, gross_sales
, total_commission
FROM customer_sale cs ) cs
LEFT JOIN customer_sale cs2
ON cs2.customer_code = cs.customer_code
AND cs2.customer_country = cs.customer_country
AND cs2.date > cs.date - interval 9 month
AND cs2.date < cs.date - interval 3 month
GROUP BY cs.sale_id
so that data would be structured as follows:
sale_id date customer_code customer_country gross_sales total_commission
1 2013-12-01 cust1 united states 10000 1500
2 2013-12-01 cust2 france 20000 3000
3 2013-12-01 cust3 united states 15000 2250
4 2013-12-01 cust4 france 14000 2100
5 2013-12-01 cust5 united states 13000 1950
6 2013-12-01 cust6 france 12000 1800
7 2014-04-02 cust1 united states 10000
8 2014-04-02 cust2 france 20000
9 2014-04-02 cust3 united states 15000
10 2014-04-02 cust4 france 14000
11 2014-04-02 cust5 united states 13000
12 2014-04-02 cust6 france 12000
so I would need to output results from the query similar to this: (based on sales between 9 and 3 months ago from the same customer_code in the same customer_country)
sale_id date customer_code customer_country gross_sales total_commission gross_sales_past total_commission_past
1 2013-12-01 cust1 united states 10000 1500
2 2013-12-01 cust2 france 20000 3000
3 2013-12-01 cust3 united states 15000 2250
4 2013-12-01 cust4 france 14000 2100
5 2013-12-01 cust5 united states 13000 1950
6 2013-12-01 cust6 france 12000 1800
7 2014-04-02 cust1 united states 10000 10000 1500
8 2014-04-02 cust2 france 20000 20000 3000
9 2014-04-02 cust3 united states 15000 15000 2250
10 2014-04-02 cust4 france 14000 14000 2100
11 2014-04-02 cust5 united states 13000 13000 1950
12 2014-04-02 cust6 france 12000 12000 1800
Your query looks mostly right, but I think your outer query needs to be GROUP BY cs.sale_id (assuming that sale_id is unique in the customer_sale table, and assuming that the date column is datatype DATE, DATETIME, or TIMESTAMP).
And I think you want to include a join predicate so that you match only match "past" rows to those rows where you don't have a total commission, e.g.
AND cs.total_commission IS NULL
And I don't think you really need an inline view.
Here's what I came up with:
SELECT cs.sale_id
, cs.date
, cs.customer_code
, cs.customer_country
, cs.gross_sales
, cs.total_commission
, SUM(ps.gross_sales) AS gross_sales_past
, SUM(ps.total_commission) AS total_commission_past
FROM customer_sale cs
LEFT
JOIN customer_sale ps
ON ps.customer_code = cs.customer_code
AND ps.customer_country = cs.customer_country
AND ps.date > cs.date - INTERVAL 9 MONTH
AND ps.date < cs.date - INTERVAL 3 MONTH
AND cs.total_commission IS NULL
GROUP
BY cs.sale_id
Appropriate indexes will likely improve performance of the query. Likely, the EXPLAIN output will show "Using temporary; Using filesort", and that can be expensive for large sets.
MySQL will likely be able to make use of a covering index for the JOIN:
... ON customer_sale (customer_code,customer_country,date,gross_sales,total_commission).