MySQL calculate moving average of N rows - mysql

I'm trying to calculate the moving average of N rows, for all rows in a single query. In the example case, I am attempting to calculate the moving average of 50 rows.
SELECT
h1.date,
h1.security_id,
( SELECT
AVG(last50.close)
FROM (
SELECT h.close
FROM history as h
WHERE h.date <= h1.date AND h.security_id = h1.security_id
ORDER BY h.date DESC
LIMIT 50
) as last50
) as avg50
FROM history as h1
However, MySQL gives me an error when running this query:
Unknown column 'h1.date' in 'where clause'
I'm trying this method because the other solutions listed don't really seem to work for my use case. There are solutions for a moving average of N days, but since not every date is present in my data set, I need the average of N rows.
This solution, shown below, doesn't work because AVG (also SUM and COUNT) doesn't account for LIMIT:
SELECT
t1.data_date,
( SELECT SUM(t2.price) / COUNT(t2.price)
FROM t as t2
WHERE t2.data_date <= t1.data_date
ORDER BY t2.data_date DESC
LIMIT 5
) AS 'five_row_moving_average_price'
FROM t AS t1
ORDER BY t1.data_date;
This question looks promising, but is somewhat indecipherable to me.
Any suggestions? Here's an SQLFiddle to play around in.

plan
self join history on last 50 days
take average grouping by date and security id ( of current )
query
select curr.date, curr.security_id, avg(prev.close)
from history curr
inner join history prev
on prev.`date` between date_sub(curr.`date`, interval 49 day) and curr.`date`
and curr.security_id = prev.security_id
group by 1, 2
order by 2, 1
;
output
+---------------------------+-------------+--------------------+
| date | security_id | avg(prev.close) |
+---------------------------+-------------+--------------------+
| January, 04 2016 00:00:00 | 1 | 10.770000457763672 |
| January, 05 2016 00:00:00 | 1 | 10.800000190734863 |
| January, 06 2016 00:00:00 | 1 | 10.673333485921225 |
| January, 07 2016 00:00:00 | 1 | 10.59250020980835 |
| January, 08 2016 00:00:00 | 1 | 10.432000160217285 |
| January, 11 2016 00:00:00 | 1 | 10.40166680018107 |
| January, 12 2016 00:00:00 | 1 | 10.344285828726631 |
| January, 13 2016 00:00:00 | 1 | 10.297500133514404 |
| January, 14 2016 00:00:00 | 1 | 10.2877779006958 |
| January, 04 2016 00:00:00 | 2 | 56.15999984741211 |
| January, 05 2016 00:00:00 | 2 | 56.18499946594238 |
| .. | .. | .. |
+---------------------------+-------------+--------------------+
sqlfiddle
reference
sql rolling averages
modified to use last 50 rows
select
rnk_curr.`date`, rnk_curr.security_id, avg(rnk_prev50.close)
from
(
select `date`, security_id,
@row_num := if(@lag = security_id, @row_num + 1,
if(@lag := security_id, 1, 1)) as row_num
from history
cross join ( select @row_num := 1, @lag := null ) params
order by security_id, `date`
) rnk_curr
inner join
(
select date, security_id, close,
@row_num := if(@lag = security_id, @row_num + 1,
if(@lag := security_id, 1, 1)) as row_num
from history
cross join ( select @row_num := 1, @lag := null ) params
order by security_id, `date`
) rnk_prev50
on rnk_curr.security_id = rnk_prev50.security_id
and rnk_prev50.row_num between rnk_curr.row_num - 49 and rnk_curr.row_num
group by 1,2
order by 2,1
;
sqlfiddle
note
the if function is to force the correct order of evaluation of variables.

In MySQL 8, a window function frame can be used to obtain the averages.
SELECT date, security_id, AVG(close) OVER (PARTITION BY security_id ORDER BY date ROWS 49 PRECEDING) as ma
FROM history
ORDER BY date DESC
This calculates the average of the current row and 49 preceding rows.
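As a minimal sketch of the same frame on a smaller window, the snippet below runs the query shape through Python's built-in sqlite3 module. That choice is an assumption made only so the example is self-contained; SQLite 3.25+ accepts the same OVER clause as MySQL 8. The sample rows and the 3-row window are invented for illustration.

```python
import sqlite3

# Hypothetical sample data: two securities with gaps between dates.
rows = [
    ("2016-01-04", 1, 10.0),
    ("2016-01-05", 1, 20.0),
    ("2016-01-08", 1, 30.0),
    ("2016-01-11", 1, 40.0),
    ("2016-01-04", 2, 5.0),
    ("2016-01-06", 2, 7.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (date TEXT, security_id INTEGER, close REAL)")
conn.executemany("INSERT INTO history VALUES (?, ?, ?)", rows)

# ROWS 2 PRECEDING = the current row plus the 2 rows before it (3 rows),
# counted per security_id regardless of calendar gaps.
moving_avg = conn.execute(
    """
    SELECT date, security_id,
           AVG(close) OVER (PARTITION BY security_id
                            ORDER BY date
                            ROWS 2 PRECEDING) AS ma
    FROM history
    ORDER BY security_id, date
    """
).fetchall()

for row in moving_avg:
    print(row)
```

The first rows of each partition average over fewer than 3 rows, because the frame simply has fewer preceding rows available.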

Related

MySQL query to return n'th lag from latest observation

I have following MySQL table:
firm | Sales | year
A | 100 | 2018
A | 200 | 2017
A | 300 | 2016
B | 400 | 2017
B | 500 | 2016
B | 600 | 2015
C | 700 | 2016
C | 800 | 2015
C | 900 | 2014
I am trying to write a MySQL query that will return the last observation (Sales), or the n'th lag from the last observation, for every group (firm).
I have found a MySQL query that returns the last observation for every group:
select *
from (select * from mytable order by `Group`, firm, datum desc) x
group by `Group`
but I don't know how to modify the code to return one lag from latest observation, that is:
firm | Sales | year
A | 200 | 2017
B | 500 | 2016
C | 800 | 2015
I think the most general method is to enumerate the values using variables:
select t.*
from (select t.*,
(@rn := if(@f = t.firm, @rn + 1,
if(@f := t.firm, 1, 1)
)
) as rn
from mytable t cross join
(select @f := '', @rn := 0) params
order by t.firm, t.year desc
) t
where rn = 2;
Your version has a fatal flaw: it uses group by with select *. You have unaggregated columns in the select that are not group by keys. This is broken SQL; it will fail in almost any other database, and in recent versions of MySQL under the default settings.
If the intention is that "n" is the number of years before the latest year (well, offset by 1), then joins can work:
select t.*
from mytable t join
(select firm, max(year) as max_year
from mytable
group by firm
) f
on t.firm = f.firm and t.year = f.max_year - (2 - 1);
It would be easier to use LAG with the Windowing Functions introduced in MySQL 8
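One way to sketch the window-function route is with ROW_NUMBER() rather than LAG (a swapped-in function, since the goal is the n'th row counted from the latest one). The example uses the question's sample rows and runs through Python's sqlite3 module, which is an assumption made only to keep it self-contained; the SELECT itself uses only standard window syntax that MySQL 8 also supports. The variable `n` is a name introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (firm TEXT, Sales INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("A", 100, 2018), ("A", 200, 2017), ("A", 300, 2016),
        ("B", 400, 2017), ("B", 500, 2016), ("B", 600, 2015),
        ("C", 700, 2016), ("C", 800, 2015), ("C", 900, 2014),
    ],
)

n = 2  # 1 = latest observation, 2 = one lag behind it, and so on

# Number the rows of each firm from newest to oldest, then keep row n.
nth_latest = conn.execute(
    """
    SELECT firm, Sales, year
    FROM (
        SELECT firm, Sales, year,
               ROW_NUMBER() OVER (PARTITION BY firm ORDER BY year DESC) AS rn
        FROM mytable
    ) ranked
    WHERE rn = ?
    ORDER BY firm
    """,
    (n,),
).fetchall()
print(nth_latest)
```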

Get the count() where created_date is cumulative and date based

I'm aware that there are several answers on SO about cumulative totals. I have experimented and have not found a solution to my problem.
Here is a sqlfiddle.
We have a contacts table with two fields, eid and create_time:
eid create_time
991772 April, 21 2016 11:34:21
989628 April, 17 2016 02:19:57
985557 April, 04 2016 09:56:39
981920 March, 30 2016 11:03:12
981111 March, 30 2016 09:36:48
I would like to select the number of new contacts in each month along with the size of our contacts database at the end of each month. New contacts by year and month is simple enough. For the size of the contacts table at the end of each month I did some research and found what looked to be a straightforward method:
set @csum = 0;
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts,
(@csum + count(c.eid)) as cumulative_contacts
from
contacts c
group by
yr,
mth
That runs but gives me unexpected results.
If I run:
select count(*) from contacts where date(create_time) < current_date
I get the total number of records in the table: 146.
I therefore expected the final row in my query using @csum to have 146 for April 2016. Instead, it has only 3.
What my goal is for field cumulative_contacts:
For the record with e.g. January 2016.
select count(*) from contacts where date(create_time) < '2016-02-01';
And the record for February would have:
select count(*) from contacts where date(create_time) < '2016-03-01';
And so on
Try this, a slight modification of your SQL ;)
CREATE TABLE IF NOT EXISTS `contacts` (
`eid` char(50) DEFAULT NULL,
`create_time` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
INSERT INTO `contacts` (`eid`, `create_time`) VALUES
('991772', '2016-04-21 11:34:21'),
('989628', '2016-04-17 02:19:57'),
('985557', '2016-04-04 09:56:39'),
('981920', '2016-03-30 11:03:12'),
('981111', '2016-03-30 09:36:48');
SET @csum = 0;
SELECT t.*, @csum:=(@csum + new_contacts) AS cumulative_contacts
FROM (
SELECT YEAR(c.create_time) AS yr, MONTH(c.create_time) AS mth, COUNT(c.eid) AS new_contacts
FROM contacts c
GROUP BY yr, mth) t
The output is:
| yr | mth | new_contacts | cumulative_contacts |
|------|-----|--------------|---------------------|
| 2016 | 3 | 2 | 2 |
| 2016 | 4 | 3 | 5 |
This sql will get the cumulative sum and is pretty efficient. It numbers each row first and then uses that as the cumulative sum.
SELECT s1.yr, s1.mth, s1.new_contacts, s2.cummulative_contacts
FROM
(SELECT
YEAR(create_time) AS yr,
MONTH(create_time) AS mth,
COUNT(eid) AS new_contacts,
MAX(eid) AS max_eid
FROM
contacts
GROUP BY
yr,
mth
ORDER BY create_time) s1 INNER JOIN
(SELECT eid, (@sum:=@sum+1) AS cummulative_contacts
FROM
contacts INNER JOIN
(SELECT @sum := 0) r
ORDER BY create_time) s2 ON max_eid=s2.eid;
--Result sample--
| yr | mth | new_contacts | cumulative_contacts |
|------|-----|--------------|---------------------|
| 2016 | 1 | 4 | 132 |
| 2016 | 2 | 4 | 136 |
| 2016 | 3 | 7 | 143 |
| 2016 | 4 | 3 | 146 |
Try this: fiddle
Here you have a "greater than or equal" join, so each group "contains" all previous values. The "times 12" part converts the whole comparison to months. I offer this solution because it is not MySQL-dependent (it can be implemented on many other databases with minimal or no changes).
select dates.yr, dates.mth, dates.new_contacts, sum(NC.new_contacts) as cumulative_new_contacts
from (
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts
from
contacts c
group by
year(c.create_time),
month(c.create_time)
) as dates
left join
(
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts
from
contacts c
group by
year(c.create_time),
month(c.create_time)
) as NC
on dates.yr*12+dates.mth >= NC.yr*12+NC.mth
group by
dates.yr,
dates.mth,
dates.new_contacts -- not needed by MySql, present here for other DBs compatibility
order by 1,2
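To illustrate how the "greater than or equal" join accumulates, here is a small sketch on the five sample rows from the earlier answer, run through Python's sqlite3 module (an assumption made for self-containment). SQLite has no YEAR()/MONTH(), so strftime() stands in for them here; the join condition is the same yr*12+mth comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (eid TEXT, create_time TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [
        ("991772", "2016-04-21 11:34:21"),
        ("989628", "2016-04-17 02:19:57"),
        ("985557", "2016-04-04 09:56:39"),
        ("981920", "2016-03-30 11:03:12"),
        ("981111", "2016-03-30 09:36:48"),
    ],
)

# strftime() stands in for MySQL's YEAR()/MONTH(), which SQLite lacks.
monthly = """
    SELECT CAST(strftime('%Y', create_time) AS INTEGER) AS yr,
           CAST(strftime('%m', create_time) AS INTEGER) AS mth,
           COUNT(eid) AS new_contacts
    FROM contacts
    GROUP BY yr, mth
"""

# Each month is joined to itself and to every earlier month; summing the
# joined counts gives the running total.
cumulative = conn.execute(
    f"""
    SELECT dates.yr, dates.mth, dates.new_contacts,
           SUM(nc.new_contacts) AS cumulative_contacts
    FROM ({monthly}) AS dates
    JOIN ({monthly}) AS nc
      ON dates.yr * 12 + dates.mth >= nc.yr * 12 + nc.mth
    GROUP BY dates.yr, dates.mth, dates.new_contacts
    ORDER BY dates.yr, dates.mth
    """
).fetchall()
print(cumulative)
```

April joins to both March and April, so its cumulative figure is 2 + 3 = 5.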

Never decreasing cumulative figure

CREATE TABLE `test` (
`UniqueID` INT(11) NOT NULL AUTO_INCREMENT,
`Date` date,
`Entry` VARCHAR(20),
PRIMARY KEY (`UniqueID`)
);
INSERT INTO `test` (Date,Entry) VALUES
('2015-09-01','text1'),
('2015-09-01','text1'),
('2015-09-01','text1'),
('2015-09-02','text2'),
('2015-09-02','text2'),
('2015-09-02','text2'),
('2015-09-02','text2'),
('2015-09-03','text3'),
('2015-09-03','text3'),
('2015-09-03','text3'),
('2015-09-04','text4'),
('2015-09-04','text4'),
('2015-09-04','text4'),
('2015-09-04','text4'),
('2015-09-04','text4'),
('2015-09-04','text4');
SET @total:= 0;
SET @prevCount:= 0;
SELECT
@total:= IF (@prevCount <= COUNT(Entry),@total + (COUNT(Entry) - @prevCount),@total) AS total,
@prevCount := COUNT(Entry) AS dayTotal,
`Entry`,
`Date`
FROM test
GROUP BY `Date`
ORDER BY `Date` ASC
| total | dayTotal | Entry | Date |
|-------|----------|-------|-----------------------------|
| 3 | 3 | text1 | September, 01 2015 00:00:00 |
| 4 | 4 | text2 | September, 02 2015 00:00:00 |
| 3 | 3 | text3 | September, 03 2015 00:00:00 |
| 6 | 6 | text4 | September, 04 2015 00:00:00 |
fiddle of same: http://sqlfiddle.com/#!9/d9031/2
I need the total figure to never decrease because it is a cumulative figure over time.
My problem seems to be that MySQL doesn't store @prevCount on the loop - so I can't use it to calculate the total.
What I expect to see is that total will show
3
4
4
7
Note that the 7 is correct because it is the 4 plus the 3 new entries on the 4th.
Doing calculations with variables is tricky. With group by, you need to use a subquery.
Your logic doesn't make full sense to me. The closest reasonable thing I can think of is a cumulative max:
SELECT @max := if(@max > dayTotal, @max, dayTotal)
FROM (SELECT `Date`, COUNT(*) as dayTotal
FROM test
GROUP BY `Date`
) t CROSS JOIN
(SELECT @max := 0) params
ORDER BY `Date` ASC;
Note: I removed Entry because it is not in the GROUP BY.
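If the intended rule is the one the question's expected output suggests (add each day-over-day increase and ignore decreases), it could be sketched with LAG and a running SUM instead of variables. This is an interpretation of the question, not the answer's cumulative max. It runs through Python's sqlite3 module for self-containment (an assumption); the window syntax is the standard form that MySQL 8 also supports. The aliases `per_day`, `diffs` and `increase` are names introduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (Date TEXT, Entry TEXT)")
sample = (
    [("2015-09-01", "text1")] * 3
    + [("2015-09-02", "text2")] * 4
    + [("2015-09-03", "text3")] * 3
    + [("2015-09-04", "text4")] * 6
)
conn.executemany("INSERT INTO test VALUES (?, ?)", sample)

# Inner query: entries per day. Middle query: increase over the previous
# day (0 when the count dropped). Outer query: running sum of increases.
totals = conn.execute(
    """
    SELECT Date, dayTotal,
           SUM(increase) OVER (ORDER BY Date) AS total
    FROM (
        SELECT Date, dayTotal,
               CASE WHEN dayTotal > LAG(dayTotal, 1, 0) OVER (ORDER BY Date)
                    THEN dayTotal - LAG(dayTotal, 1, 0) OVER (ORDER BY Date)
                    ELSE 0
               END AS increase
        FROM (
            SELECT Date, COUNT(*) AS dayTotal
            FROM test
            GROUP BY Date
        ) per_day
    ) diffs
    ORDER BY Date
    """
).fetchall()
print(totals)
```

On the sample data the daily counts are 3, 4, 3, 6, so the increases are 3, 1, 0, 3 and the running total is 3, 4, 4, 7.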

Finding count for a Period in sql

I have a table with :
user_id | order_date
---------+------------
12 | 2014-03-23
12 | 2014-01-24
14 | 2014-01-26
16 | 2014-01-23
15 | 2014-03-21
20 | 2013-10-23
13 | 2014-01-25
16 | 2014-03-23
13 | 2014-01-25
14 | 2014-03-22
An active user is someone who has logged in during the last 12 months.
Need output as
Period | count of Active user
----------------------------
Oct-2013 - 1
Jan-2014 - 5
Mar-2014 - 10
(The Jan 2014 value includes the 1 record from Oct 2013 and the 4 non-duplicate records for Jan 2014.)
You can use a variable to calculate the running total of active users:
SELECT Period,
@total:=@total+cnt AS `Count of Active Users`
FROM (
SELECT CONCAT(MONTHNAME(order_date), '-', YEAR(order_date)) AS Period,
COUNT(DISTINCT user_id) AS cnt
FROM mytable
GROUP BY Period
ORDER BY YEAR(order_date), MONTH(order_date) ) t,
(SELECT @total:=0) AS var
The subquery returns the number of distinct active users per month/year. The outer query uses the @total variable to calculate the running total of active users.
Fiddle Demo here
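The same running total can be sketched without a user variable by using SUM() as a window function over the per-month distinct counts. The example uses the question's sample rows and runs through Python's sqlite3 module (an assumption for self-containment). substr() is used here to build a sortable YYYY-MM period; in MySQL, DATE_FORMAT(order_date, '%Y-%m') would be the more usual choice. Note that this adds up per-month distinct counts, so a user active in two different months is counted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (user_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        (12, "2014-03-23"), (12, "2014-01-24"), (14, "2014-01-26"),
        (16, "2014-01-23"), (15, "2014-03-21"), (20, "2013-10-23"),
        (13, "2014-01-25"), (16, "2014-03-23"), (13, "2014-01-25"),
        (14, "2014-03-22"),
    ],
)

# Inner query: distinct users per month. Outer query: running sum of
# those monthly counts in chronological order.
running = conn.execute(
    """
    SELECT period,
           SUM(cnt) OVER (ORDER BY period) AS active_users
    FROM (
        SELECT substr(order_date, 1, 7) AS period,
               COUNT(DISTINCT user_id) AS cnt
        FROM mytable
        GROUP BY period
    ) per_month
    ORDER BY period
    """
).fetchall()
print(running)
```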
I've got two queries that do the job. I am not sure which one is faster. Check them against your database:
SQL Fiddle
Query 1:
select per.yyyymm,
(select count(DISTINCT o.user_id) from orders o where o.order_date >=
(per.yyyymm - INTERVAL 1 YEAR) and o.order_date < per.yyyymm + INTERVAL 1 MONTH) as `count`
from
(select DISTINCT LAST_DAY(order_date) + INTERVAL 1 DAY - INTERVAL 1 MONTH as yyyymm
from orders) per
order by per.yyyymm
Results:
| yyyymm | count |
|---------------------------|-------|
| October, 01 2013 00:00:00 | 1 |
| January, 01 2014 00:00:00 | 5 |
| March, 01 2014 00:00:00 | 6 |
Query 2:
select DATE_FORMAT(order_date, '%Y-%m'),
(select count(DISTINCT o.user_id) from orders o where o.order_date >=
(LAST_DAY(o1.order_date) + INTERVAL 1 DAY - INTERVAL 13 MONTH) and
o.order_date <= LAST_DAY(o1.order_date)) as `count`
from orders o1
group by DATE_FORMAT(order_date, '%Y-%m')
Results:
| DATE_FORMAT(order_date, '%Y-%m') | count |
|----------------------------------|-------|
| 2013-10 | 1 |
| 2014-01 | 5 |
| 2014-03 | 6 |
The best thing I could do is this:
SELECT Date, COUNT(*) as ActiveUsers
FROM
(
SELECT DISTINCT userId, CONCAT(YEAR(order_date), "-", MONTH(order_date)) as Date
FROM `a`
ORDER BY Date
)
AS `b`
GROUP BY Date
The output is the following:
| Date | ActiveUsers |
|---------|-------------|
| 2013-10 | 1 |
| 2014-1 | 4 |
| 2014-3 | 4 |
Now, for every row you need to sum up the number of active users in previous rows.
For example, here is the code in C#.
int total = 0;
while (reader.Read())
{
    // Accumulate the per-month counts into a running total.
    total += Convert.ToInt32(reader["ActiveUsers"]);
    Console.WriteLine("{0} - {1} active users", reader["Date"], total);
}
By the way, for March 2014 the answer is 9 because one row is duplicated.
Try this, but it doesn't handle the last part: "The Jan 2014 value - includes Oct -2013"
select TO_CHAR(order_dt,'MON-YYYY'), count(distinct User_ID ) cnt from [orders]
where User_ID in
(select User_ID from
(select a.User_ID from [orders] a,
(select a.User_ID,count (a.order_dt) from [orders] a
where a.order_dt > (select max(b.order_dt)-365 from [orders] b where a.User_ID=b.User_ID)
group by a.User_ID
having count(order_dt)>1) b
where a.User_ID=b.User_ID) a
)
group by TO_CHAR(order_dt,'MON-YYYY');
This is what I think you are looking for
SET @cnt = 0;
SELECT Period, @cnt := @cnt + total_active_users AS total_active_users
FROM (
SELECT DATE_FORMAT(order_date, '%b-%Y') AS Period , COUNT( id) AS total_active_users
FROM t
GROUP BY DATE_FORMAT(order_date, '%b-%Y')
ORDER BY order_date
) AS t
This is the output that I get
Period total_active_users
Oct-2013 1
Jan-2014 6
Mar-2014 10
You can also do COUNT(DISTINCT id) to get the unique Ids only
Here is a SQL Fiddle

MySQL get records what have max value

This is my table named period.
id | year | month
222 | 2014 | 2
345 | 2013 | 5
33 | 2014 | 1
224 | 2014 | 2
I want to get only the ids that have the latest month (2014-02). The result should be 222, 224.
I wrote the following query.
SELECT id, MAX(year*100 + month) FROM period
But it returns the following result.
222| 201402
How can I get my result?
SELECT x.*
FROM period x
JOIN
( SELECT year
, month
FROM period
ORDER
BY year DESC
, month DESC
LIMIT 1
) y
ON y.year = x.year
AND y.month = x.month;
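You should use the following query. Taking max(year) and max(month) independently would pair the latest year with the largest month from any year, so compare on a combined key instead:
SELECT id FROM period where year*100 + month = (SELECT max(year*100 + month) from period);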
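A small sketch on the question's sample rows, contrasting a combined year*100+month key with taking MAX(year) and MAX(month) separately. It runs through Python's sqlite3 module (an assumption for self-containment); both SELECT statements are plain SQL that should behave the same way in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE period (id INTEGER, year INTEGER, month INTEGER)")
conn.executemany(
    "INSERT INTO period VALUES (?, ?, ?)",
    [(222, 2014, 2), (345, 2013, 5), (33, 2014, 1), (224, 2014, 2)],
)

# Compare on a single combined key so year and month are maximised together.
latest_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
        FROM period
        WHERE year * 100 + month = (SELECT MAX(year * 100 + month) FROM period)
        ORDER BY id
        """
    )
]

# Taking the two maxima independently looks for year 2014 and month 5,
# a combination that does not exist in the table.
independent_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
        FROM period
        WHERE year = (SELECT MAX(year) FROM period)
          AND month = (SELECT MAX(month) FROM period)
        """
    )
]
print(latest_ids, independent_ids)
```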