Calculating a Moving Average in MySQL?

Good Day,
I am using the following code to calculate the 9-day moving average:
SELECT SUM(close)
FROM tbl
WHERE date <= '2002-07-05'
AND name_id = 2
ORDER BY date DESC
LIMIT 9
But it does not work, because SUM is computed over all matching rows before LIMIT is applied. In other words, it sums every close on or before that date, not just the last 9.
So I need to calculate the SUM over the rows the SELECT returns, rather than directly,
i.e. select the SUM from the SELECT...
How would I go about doing this? Is it very costly, or is there a better way?

If you want the moving average for each date, then try this:
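SELECT t.date, t.close,
(SELECT AVG(t2.close)
FROM tbl t2
WHERE t2.name_id = t.name_id
AND t2.date <= t.date               -- only this date and earlier ones
AND DATEDIFF(t.date, t2.date) < 9   -- within the 9 calendar days ending on t.date
) AS mvgAvg
FROM tbl t
WHERE t.date <= '2002-07-05'
AND t.name_id = 2
ORDER BY t.date DESC
It uses a correlated subquery to average the closes dated within the 9 calendar days ending on each row's date. That is 9 values only if there is a row for every calendar day; with gaps such as weekends, the window holds fewer rows.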
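A plain-Python sketch (not MySQL) of what the correlated subquery computes can help when checking results by hand. The sample rows are made up for illustration; like the subquery, it rescans every row for each date, so the cost grows quadratically with the number of rows.

```python
from datetime import date


def trailing_calendar_avg(rows, days=9):
    """For each (date, close) row, average the closes dated within the
    `days` calendar days ending on that row's date (inclusive), i.e.
    t2.date <= t.date AND DATEDIFF(t.date, t2.date) < days."""
    result = []
    for d, _ in rows:
        window = [c for d2, c in rows if d2 <= d and (d - d2).days < days]
        result.append((d, sum(window) / len(window)))
    return result


# Hypothetical closes for a single name_id.
rows = [(date(2002, 7, 1), 10.0), (date(2002, 7, 2), 20.0),
        (date(2002, 7, 5), 30.0), (date(2002, 7, 12), 40.0)]
for d, avg in trailing_calendar_avg(rows):
    print(d, avg)
```

For 2002-07-12 only the rows from 2002-07-05 and 2002-07-12 fall inside the 9-day span, so the average is 35.0 even though four rows exist.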

Starting from MySQL 8, you should use window functions for this. Using the window RANGE clause, you can create a logical window over an interval, which is very powerful. Something like this:
SELECT
date,
close,
AVG(close) OVER (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING) AS mvg_avg
FROM tbl
WHERE date <= DATE '2002-07-05'
AND name_id = 2
ORDER BY date DESC
For example:
WITH t (date, `close`) AS (
SELECT DATE '2020-01-01', 50 UNION ALL
SELECT DATE '2020-01-03', 54 UNION ALL
SELECT DATE '2020-01-05', 51 UNION ALL
SELECT DATE '2020-01-12', 49 UNION ALL
SELECT DATE '2020-01-13', 59 UNION ALL
SELECT DATE '2020-01-15', 30 UNION ALL
SELECT DATE '2020-01-17', 35 UNION ALL
SELECT DATE '2020-01-18', 39 UNION ALL
SELECT DATE '2020-01-19', 47 UNION ALL
SELECT DATE '2020-01-26', 50
)
SELECT
date,
`close`,
COUNT(*) OVER w AS c,
SUM(`close`) OVER w AS s,
AVG(`close`) OVER w AS a
FROM t
WINDOW w AS (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
ORDER BY date DESC
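Because the window is ordered by date DESC, PRECEDING refers to later dates here, so each frame covers a row's own date plus the 9 calendar days after it (2020-01-26 therefore has c = 1). For a trailing average over a row's date and the 9 days before it, order the window ascending instead, OVER (ORDER BY date RANGE INTERVAL 9 DAY PRECEDING), and keep date DESC only in the outer ORDER BY. The query above leads to: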
date |close|c|s |a |
----------|-----|-|---|-------|
2020-01-26| 50|1| 50|50.0000|
2020-01-19| 47|2| 97|48.5000|
2020-01-18| 39|3|136|45.3333|
2020-01-17| 35|4|171|42.7500|
2020-01-15| 30|4|151|37.7500|
2020-01-13| 59|5|210|42.0000|
2020-01-12| 49|6|259|43.1667|
2020-01-05| 51|3|159|53.0000|
2020-01-03| 54|3|154|51.3333|
2020-01-01| 50|3|155|51.6667|
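The direction of a RANGE frame is easy to get backwards. The plain-Python sketch below (an emulation, not MySQL itself) rebuilds the frame for the sample rows above in both orders: the descending version reproduces the c and s columns of the table, and the ascending version shows what a backward-looking window gives.

```python
from datetime import date, timedelta

# Rows copied from the WITH t (date, close) example above.
rows = [
    (date(2020, 1, 1), 50), (date(2020, 1, 3), 54), (date(2020, 1, 5), 51),
    (date(2020, 1, 12), 49), (date(2020, 1, 13), 59), (date(2020, 1, 15), 30),
    (date(2020, 1, 17), 35), (date(2020, 1, 18), 39), (date(2020, 1, 19), 47),
    (date(2020, 1, 26), 50),
]


def range_frame(rows, days, descending):
    """Emulate COUNT/SUM/AVG OVER (ORDER BY date [DESC] RANGE INTERVAL days DAY PRECEDING).

    Ascending order: PRECEDING means earlier dates, frame = [d - days, d].
    Descending order: PRECEDING means later dates, frame = [d, d + days].
    Returns {date: (count, sum, avg rounded to 4 places)}.
    """
    span = timedelta(days=days)
    out = {}
    for d, _ in rows:
        if descending:
            frame = [c for d2, c in rows if d <= d2 <= d + span]
        else:
            frame = [c for d2, c in rows if d - span <= d2 <= d]
        out[d] = (len(frame), sum(frame), round(sum(frame) / len(frame), 4))
    return out


desc = range_frame(rows, 9, descending=True)   # the window used in the example
asc = range_frame(rows, 9, descending=False)   # a trailing (backward-looking) window
for d, _ in sorted(rows, reverse=True):
    print(d, desc[d], asc[d])
```

With the ascending frame, 2020-01-26 averages the closes from 2020-01-17 through 2020-01-26 instead of only itself.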

Use something like this:
SELECT
sum(close) as sum,
avg(close) as average
FROM (
SELECT
(close)
FROM
tbl
WHERE
date <= '2002-07-05'
AND name_id = 2
ORDER BY
date DESC
LIMIT 9 ) temp
The inner query returns only the 9 most recent rows that match the filter; the outer query then sums and averages just those rows.
Your query doesn't work because the SUM is calculated first, over every matching row, and LIMIT is applied only afterwards to the single aggregated row, so you get the sum of all the rows.
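The difference between aggregating before and after LIMIT can be shown with a short plain-Python sketch. The 20 closes are hypothetical and listed newest first, as ORDER BY date DESC would return them.

```python
# Hypothetical closes for one name_id, newest first: 20.0, 19.0, ..., 1.0
closes_desc = [float(x) for x in range(20, 0, -1)]

# Original query: aggregate every matching row; LIMIT 9 then applies to the
# single aggregated row, so it has no effect.
sum_all = sum(closes_desc)

# Subquery version: take the 9 newest rows first, then aggregate them.
last_9 = closes_desc[:9]
sum_9 = sum(last_9)
avg_9 = sum_9 / len(last_9)  # like AVG, divides by the rows actually present

print(sum_all, sum_9, avg_9)
```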

Another technique is to use a helper table of numbers:
CREATE TABLE `tinyint_asc` (
`value` tinyint(3) unsigned NOT NULL default '0',
PRIMARY KEY (value)
) ;
INSERT INTO `tinyint_asc` VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250),(251),(252),(253),(254),(255);
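Afterwards you can use it like this:
SELECT
DATE_ADD(tbl.date, INTERVAL tinyint_asc.value DAY) AS mydate,
COUNT(*),
SUM(tbl.close)
FROM tbl
INNER JOIN tinyint_asc
ON tinyint_asc.value < 30 -- offsets 0..29, for a 30-day moving average
WHERE DATE(DATE_ADD(tbl.date, INTERVAL tinyint_asc.value DAY)) BETWEEN '2016-01-01' AND CURRENT_DATE()
GROUP BY mydate
Each row is joined to every offset from 0 to 29, so it is counted under its own date and each of the following 29 dates; grouping by the shifted date then gives, for every date, the count and sum of the rows from the 30 days ending on that date.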
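A plain-Python sketch (not MySQL) of the same offset-table idea, using a hypothetical 2-day window and made-up values so the numbers are easy to check by hand. The date filter on 2016-01-01 through CURRENT_DATE() is left out.

```python
from collections import defaultdict
from datetime import date, timedelta


def numbers_table_moving(rows, window_days=30):
    """Emulate joining each row to the offsets 0..window_days-1.

    Each (date, value) row is credited to date + offset for every offset;
    grouping by that shifted date gives, for every date, the count and sum
    of the rows from the window_days days ending on it.
    """
    totals = defaultdict(lambda: [0, 0])  # shifted date -> [count, sum]
    for d, v in rows:
        for offset in range(window_days):  # plays the role of tinyint_asc.value
            bucket = totals[d + timedelta(days=offset)]
            bucket[0] += 1
            bucket[1] += v
    return dict(totals)


rows = [(date(2016, 1, 1), 10), (date(2016, 1, 2), 20), (date(2016, 1, 3), 30)]
res = numbers_table_moving(rows, window_days=2)
for d in sorted(res):
    print(d, res[d])
```

Note that dates after the last data row (here 2016-01-04) also get a bucket, just as the SQL version produces rows for them.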

This query, which keeps the last 9 closes in the user variables @a0 to @a8, is fast:
select date, name_id,
case @i when name_id then @i:=name_id else (@i:=name_id)
+ (@n:=0) -- '+' rather than AND: AND would stop at the first 0 and skip the resets below
+ (@a0:=0) + (@a1:=0) + (@a2:=0) + (@a3:=0) + (@a4:=0) + (@a5:=0) + (@a6:=0) + (@a7:=0) + (@a8:=0)
end as a,
case @n when 9 then @n:=9 else @n:=@n+1 end as n,
@a0:=@a1,@a1:=@a2,@a2:=@a3,@a3:=@a4,@a4:=@a5,@a5:=@a6,@a6:=@a7,@a7:=@a8,@a8:=close,
(@a0+@a1+@a2+@a3+@a4+@a5+@a6+@a7+@a8)/@n as av
from tbl,
(select @i:=0, @n:=0,
@a0:=0, @a1:=0, @a2:=0, @a3:=0, @a4:=0, @a5:=0, @a6:=0, @a7:=0, @a8:=0) a
where name_id=2
order by name_id, date
If you need an average over 50 or 100 values, it's tedious to write, but worth the effort. The speed is close to that of the plain ordered SELECT.
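The user-variable query is a fixed-size buffer in disguise. A plain-Python sketch with collections.deque(maxlen=...), assuming rows already sorted by name_id and date; the sample rows are hypothetical and use a day number in place of a real date.

```python
from collections import deque


def rolling_avg_per_key(rows, size=9):
    """rows: (name_id, date, close) tuples sorted by name_id, date.

    The buffer of the last `size` closes is reset whenever name_id changes,
    and the average divides by the number of values seen so far, capped at
    `size` (the role of @n in the query).
    """
    out = []
    current_key = None
    buf = deque(maxlen=size)  # the oldest close drops out automatically
    for key, d, close in rows:
        if key != current_key:
            current_key = key
            buf.clear()
        buf.append(close)
        out.append((key, d, sum(buf) / len(buf)))
    return out


# Hypothetical data: name_id 2 with closes 1..11, then name_id 3 with one row.
rows = [(2, day, float(day)) for day in range(1, 12)] + [(3, 1, 100.0)]
for row in rolling_avg_per_key(rows):
    print(row)
```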

Related

How to find maximum time range collision occurrences in MySQL

I have a time range entity with start and end datetime columns.
I need to find the maximum number of ranges that overlap the same time slot.
In the example linked below, the count is 4.
https://www.db-fiddle.com/f/pcq1MjQeqSEMDdyGxkFsR5/0
I probably need some kind of recursive query, but I don't know how to start.
For MySQL 5.x:
SELECT SUM(points2.weight) max_weight
FROM (
SELECT start dt FROM slots
UNION DISTINCT
SELECT `end` FROM slots
) points1
JOIN (
SELECT dt, SUM(weight) weight
FROM (
SELECT start dt, 1 weight FROM slots
UNION ALL
SELECT `end`, -1 FROM slots
) points
GROUP BY dt
) points2 ON points1.dt >= points2.dt
GROUP BY points1.dt
ORDER BY max_weight DESC LIMIT 1
https://dbfiddle.uk/f0b56Q4X (step-by-step, with comments)
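The query above is a sweep line: every start contributes +1 and every end -1, and the running total at each distinct timestamp is the number of open ranges. A plain-Python sketch of the same idea, using hypothetical "HH:MM" strings (which sort chronologically) instead of real datetimes:

```python
from collections import defaultdict


def max_overlap(slots):
    """Maximum number of simultaneously open (start, end) ranges.

    +1 at each start and -1 at each end are summed per timestamp before the
    running total is read, so a range ending exactly when another starts is
    not counted as overlapping it (same as the SQL version).
    """
    weight = defaultdict(int)
    for start, end in slots:
        weight[start] += 1
        weight[end] -= 1
    running = best = 0
    for t in sorted(weight):
        running += weight[t]
        best = max(best, running)
    return best


slots = [("09:00", "12:00"), ("10:00", "11:00"), ("10:30", "13:00"), ("12:00", "14:00")]
print(max_overlap(slots))
```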

Getting total average between dates

I have a table named sales with the following format.
sale_id     | user_id       | sale_date           | sale_cost
------------|---------------|---------------------|----------
j847bv-6ggd | bd48ta36-cn5x | 2017-01-10 15:43:12 | 30
vf87x2-15gr | bd48ta36-cn5x | 2017-01-05 13:41:16 | 60
3gfd7f-2cdd | 8g4f5ccf-1fet | 2017-01-15 14:10:12 | 100
4bgfd5-12vn | 8g4f5ccf-1fet | 2017-01-20 19:47:14 | 20
b58e32-bf87 | 8g4f5ccf-1fet | 2017-01-20 17:35:13 | 15
bg87db-127g | gr4gg1f4-3gbb | 2017-01-20 12:26:15 | 80
How could I get the average amount that a user (user_id) spends within the first X days after their first purchase? I don't want an average for every user, but a total average across all users.
I know that I can get the average using select avg(sale_cost), but I'm unsure how to restrict it to a date period.
You can find the average of each user's total spend within 10 days of their initial sale date like this:
select avg(sale_cost)
from (
select sum(t.sale_cost) sale_cost
from your_table t
join (
select user_id, min(sale_date) start_date, date_add(min(sale_date), interval 10 day) end_date
from your_table
group by user_id
) t2 on t.user_id = t2.user_id
and t.sale_date between t2.start_date and t2.end_date
group by t.user_id
) t;
It finds each user's first sale_date and the date 10 days after it, joins that back to the table to get each user's total within that range, and finally averages those totals.
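The same three steps can be checked in plain Python against the sample rows from the question. This is a sketch for verification, not part of the SQL answer; like BETWEEN, the window is inclusive at both ends.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from the question: (user_id, sale_date, sale_cost).
sales = [
    ("bd48ta36-cn5x", "2017-01-10 15:43:12", 30),
    ("bd48ta36-cn5x", "2017-01-05 13:41:16", 60),
    ("8g4f5ccf-1fet", "2017-01-15 14:10:12", 100),
    ("8g4f5ccf-1fet", "2017-01-20 19:47:14", 20),
    ("8g4f5ccf-1fet", "2017-01-20 17:35:13", 15),
    ("gr4gg1f4-3gbb", "2017-01-20 12:26:15", 80),
]


def avg_spend_first_days(sales, days=10):
    parsed = [(u, datetime.strptime(s, "%Y-%m-%d %H:%M:%S"), c) for u, s, c in sales]
    # Step 1: each user's first sale_date (the t2 derived table).
    first = {}
    for u, d, _ in parsed:
        if u not in first or d < first[u]:
            first[u] = d
    # Step 2: per-user total within [first, first + days], inclusive like BETWEEN.
    totals = defaultdict(int)
    for u, d, c in parsed:
        if first[u] <= d <= first[u] + timedelta(days=days):
            totals[u] += c
    # Step 3: average of the per-user totals.
    return sum(totals.values()) / len(totals)


print(avg_spend_first_days(sales))
```

Here the per-user totals are 90, 135 and 80, so the result is 305 / 3, about 101.67.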
If you instead want the window to run from the overall first sale_date (not each user's own) to 10 days after it, use:
select avg(sale_cost)
from (
select sum(t.sale_cost) sale_cost
from your_table t
join (
select min(sale_date) start_date, date_add(min(sale_date), interval 10 day) end_date
from your_table
) t2 on t.sale_date between t2.start_date and t2.end_date
group by t.user_id
) t;
The BETWEEN operator comes in handy whenever you need to check ranges:
SELECT column_name(s)
FROM table_name
WHERE column_name BETWEEN value1 AND value2;
In this case value1 and value2 are replaced by your dates, using either literals:
'2011-01-01 00:00:00' AND '2011-01-31 23:59:59'
or an expression:
sale_date AND DATE_ADD(sale_date, INTERVAL 10 DAY)
The first way is faster. Note that BETWEEN is inclusive at both ends.

MySQL Select where column greater than or equal to closest past date from given date

Table:
Id Date
1 01-10-15
2 01-01-16
3 01-03-16
4 01-06-16
5 01-08-16
Given two dates, startdate 01-02-16 and enddate 01-05-16, I need to get the rows from the table between the closest past date before startdate and the closest future date after enddate, including those two dates. So the result will look like this:
Result:
Id Date
2 01-01-16
3 01-03-16
4 01-06-16
What I am doing
Currently I fetch all the data and remove from the result array the rows earlier than the closest past date and later than the closest future date.
What I want
I want to do this in the query itself, so that I don't have to fetch the whole table each time.
If your column's type is DATE, you can do it with UNION:
(select * from yourtable where `date` <= '2016-01-02' order by `date` desc limit 1)
-- This query will get record which is closest past date from startdate
union
(select * from yourtable where `date` >= '2016-01-05' order by `date` asc limit 1)
-- This query will get record which is closest future date from enddate
union
(select * from yourtable where `date` between '2016-01-02' and '2016-01-05')
Assuming your date is stored as YYYY-mm-dd:
## get rows within the dates
(SELECT * FROM tab WHERE ymd BETWEEN :start_date AND :end_date)
## get one row closest to start date
UNION
(SELECT * FROM tab WHERE ymd < :start_date ORDER BY ymd DESC LIMIT 1)
## get one row closest to end date
UNION
(SELECT * FROM tab WHERE ymd > :end_date ORDER BY ymd LIMIT 1)
Try this:
SELECT *
FROM dTable
WHERE `Date` BETWEEN
(SELECT MAX(t1.`Date`) FROM dTable t1 WHERE t1.`Date` < startdate)
AND
(SELECT MIN(t2.`Date`) FROM dTable t2 WHERE t2.`Date` > enddate)
If Date is stored as a string, STR_TO_DATE and DATEDIFF can be used here:
SELECT id, Date
FROM tab
where
STR_TO_DATE(Date, '%d-%m-%y') BETWEEN '2016-02-01' AND '2016-05-01'
or
id = (SELECT id FROM tab
where STR_TO_DATE(Date, '%d-%m-%y') > '2016-05-01'
ORDER BY DATEDIFF(STR_TO_DATE(Date, '%d-%m-%y'), '2016-05-01') Limit 1)
or
id = (SELECT id FROM tab
where STR_TO_DATE(Date, '%d-%m-%y') < '2016-02-01'
ORDER BY DATEDIFF('2016-02-01', STR_TO_DATE(Date, '%d-%m-%y')) Limit 1)
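All of these answers select the same three pieces: the rows inside the range, the nearest row before it and the nearest row after it. A plain-Python sketch over the sample table, reading Date as dd-mm-yy (the interpretation the STR_TO_DATE answer uses) and using bisect on the sorted dates; like the strict < and > versions, a row dated exactly on startdate counts as inside the range.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

# Sample table from the question: (Id, Date as dd-mm-yy).
table = [(1, "01-10-15"), (2, "01-01-16"), (3, "01-03-16"), (4, "01-06-16"), (5, "01-08-16")]


def rows_with_neighbours(table, start, end, fmt="%d-%m-%y"):
    """Rows in [start, end] plus the closest row before start and after end."""
    rows = sorted(table, key=lambda r: datetime.strptime(r[1], fmt))
    dates = [datetime.strptime(r[1], fmt) for r in rows]
    lo = bisect_left(dates, datetime.strptime(start, fmt))   # first date >= start
    hi = bisect_right(dates, datetime.strptime(end, fmt))    # first date > end
    # Widen by one row on each side, if such a row exists.
    return rows[max(lo - 1, 0):min(hi + 1, len(rows))]


print(rows_with_neighbours(table, "01-02-16", "01-05-16"))
```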

MySQL - get min/max of consecutive events in a series of rows

I have a table that looks like this:
http://sqlfiddle.com/#!9/152d2/1/0
CREATE TABLE Table1 (
id int,
value decimal(10,5),
dt datetime,
threshold_id int
);
Current Query:
SELECT sensors_id, DATE_FORMAT(datetime, '%Y-%m-%d'), MIN(value), MAX(value)
FROM Readings
WHERE datetime < "2015-11-18 00:00:00"
AND datetime > "2015-10-18 00:00:00"
AND sensors_id = 9
GROUP BY DATE_FORMAT(datetime, '%Y-%m-%d')
ORDER BY datetime DESC
What I'm trying to do is to return the min/max value in each group, where threshold_id IS NOT NULL. Therefore, the example should return something like:
min_value | max_value | start_date | end_date
9 | 10.5 | 2015-07-29 10:52:31 | 2015-07-29 10:57:31
8.5 | 9.5 | 2015-07-29 11:03:31 | 2015-07-29 11:05:31
I can't work out how to do this grouping. I need to return the min/max for each group of consecutive rows where the threshold_id IS NOT NULL.
Use user variables to compare each row's threshold_id with the previous row's, incrementing a counter whenever it changes, and then GROUP BY that counter. Tested on my machine.
SELECT MIN(value),MAX(value),MIN(dt),MAX(dt)
FROM (
SELECT id,value,dt,
CASE WHEN COALESCE(threshold_id,'')=@last_ci THEN @n ELSE @n:=@n+1 END AS g,
@last_ci := COALESCE(threshold_id,'') As th
FROM
Table1, (SELECT @n:=0) r
ORDER BY
id
) s
WHERE th!=''
GROUP BY
g
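In MySQL 8 this can be rewritten with window functions. Use a CTE to compute two row numbers, one over all rows and one within each threshold_id, and GROUP BY threshold_id together with the difference between them (consecutive rows with the same threshold_id share the same difference, but different threshold_ids can collide on it).
WITH cte as (
SELECT *,
ROW_NUMBER() OVER (ORDER BY id)as rn,
ROW_NUMBER() OVER (PARTITION BY threshold_id ORDER BY id)as rnn
FROM Table1
ORDER BY id
)
SELECT MIN(value),MAX(value),MIN(dt),MAX(dt) FROM cte WHERE threshold_id IS NOT NULL GROUP BY threshold_id, rn-rnn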
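Both queries are instances of the gaps-and-islands pattern: every unbroken run of equal threshold_id values becomes one group. A plain-Python sketch with itertools.groupby, which merges only adjacent equal keys; the rows are hypothetical, chosen so the output matches the expected result in the question.

```python
from itertools import groupby

# Hypothetical rows (id, value, dt, threshold_id), already ordered by id.
rows = [
    (1, 9.0, "2015-07-29 10:52:31", 1),
    (2, 10.5, "2015-07-29 10:57:31", 1),
    (3, 7.0, "2015-07-29 11:00:31", None),
    (4, 8.5, "2015-07-29 11:03:31", 1),
    (5, 9.5, "2015-07-29 11:05:31", 1),
]


def islands(rows):
    """(min value, max value, min dt, max dt) for each run of consecutive
    rows sharing the same non-NULL threshold_id."""
    result = []
    # groupby starts a new group whenever the key changes between neighbours,
    # which is what the @n counter or the rn - rnn difference achieves in SQL.
    for key, run in groupby(rows, key=lambda r: r[3]):
        if key is None:
            continue
        run = list(run)
        values = [r[1] for r in run]
        dts = [r[2] for r in run]
        result.append((min(values), max(values), min(dts), max(dts)))
    return result


for group in islands(rows):
    print(group)
```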
Your sample data only includes a single day's worth, so you only get a single row back (assuming you want to group by day):
SELECT DAYOFYEAR(dt) `day`, MIN(`value`) min_value, MAX(`value`) max_value
FROM Table1
GROUP BY `day`
ORDER BY `day` ASC

specific status on consecutive days

I have a MySQL table ATT with the columns EMP_ID, ATT_DATE and ATT_STATUS, where ATT_STATUS takes the values 1 (Present), 2 (Absent) and 3 (Weekly off). I want to find the EMP_IDs that have status 2 on 10 consecutive days within a given date range.
Please help
Please try this:
SELECT EMP_ID FROM (
SELECT
IF((@prevDate!=(q.ATT_DATE - INTERVAL 1 DAY)) OR (@prevEmp!=q.EMP_ID) OR (q.ATT_STATUS != 2), @rownum:=@rownum+1, @rownum:=@rownum) AS rownumber, @prevDate:=q.ATT_DATE, @prevEmp:=q.EMP_ID, q.*
FROM (
SELECT
EMP_ID
, ATT_DATE
, ATT_STATUS
FROM
org_tb_dailyattendance, (SELECT @rownum:=0, @prevDate:='', @prevEmp:=0) vars
WHERE ATT_DATE BETWEEN '2013-01-01' AND '2013-02-15'
ORDER BY EMP_ID, ATT_DATE, ATT_STATUS
) q
) sq
WHERE ATT_STATUS = 2 -- the row that starts a group may have another status; don't count it
GROUP BY EMP_ID, rownumber
HAVING COUNT(*) >= 10
The logic is to first sort the table by employee ID and date, and then introduce a rownumber that increases only if
the days are not consecutive, or
the employee ID differs from the previous one, or
the status is not 2.
Then I keep only the status-2 rows, group by this rownumber, and check whether a group has at least 10 rows. Those are the employees who were absent for 10 or more consecutive days.
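The same run-detection logic in a plain-Python sketch, assuming rows of (emp_id, att_date, att_status); the sample data is hypothetical. A run restarts whenever the employee changes, a calendar day is skipped, or the status is not the one being counted.

```python
from datetime import date, timedelta


def long_absences(rows, min_days=10, status=2):
    """Return emp_ids with at least `min_days` consecutive calendar days of `status`."""
    found = set()
    run_len = 0
    prev_emp = prev_date = None
    for emp, d, st in sorted(rows):  # sort by employee, then date
        if st != status:
            run_len = 0  # another status breaks the run
        elif run_len and emp == prev_emp and d == prev_date + timedelta(days=1):
            run_len += 1  # next consecutive day of the same run
        else:
            run_len = 1  # new employee, skipped day, or first absence after a break
        prev_emp, prev_date = emp, d
        if run_len >= min_days:
            found.add(emp)
    return found


start = date(2013, 1, 1)
rows = (
    [(1, start + timedelta(days=i), 2) for i in range(10)]                     # 10 days absent
    + [(2, start + timedelta(days=i), 1 if i == 9 else 2) for i in range(15)]  # broken by day 10
    + [(3, start + timedelta(days=i), 2) for i in range(12) if i != 4]         # missing day 5
)
print(long_absences(rows))
```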
Have you tried something like this? Note that it counts the total number of status-2 days per employee; it does not check that those days are consecutive.
SELECT EMP_ID, COUNT(*) AS absent_days, MIN(ATT_DATE)
FROM ATT
WHERE ATT_STATUS = 2
GROUP BY EMP_ID
HAVING absent_days >= 10