mysql USE CASE STATEMENT as variable - mysql

I have the following query with a fairly large CASE expression:
SELECT
DATE(added_on) 'Week Of',
COUNT(*) 'No. Updates',
(CASE WHEN COUNT(*) <= 500 THEN 6.75 WHEN COUNT(*) <= 750
THEN 6.30 WHEN COUNT(*) <= 1000 THEN 6.00 WHEN COUNT(*) <= 1250
THEN 5.50 ELSE 4.60 END
) Rate,
Rate * COUNT(*) -- HOW TO DO THIS??
FROM
Fox_title
GROUP BY
WEEK(added_on)
ORDER BY
added_on
How would I multiply COUNT(*) by the Rate that I get from my CASE statement? Or do I have to write that CASE statement again?

Either repeat the case or use a subquery:
select t.*, t.Rate * `No. Updates`
from (SELECT DATE(min(added_on)) as `Week Of`, COUNT(*) as `No. Updates`,
(CASE WHEN COUNT(*) <= 500 THEN 6.75
WHEN COUNT(*) <= 750 THEN 6.30
WHEN COUNT(*) <= 1000 THEN 6.00
WHEN COUNT(*) <= 1250 THEN 5.50
ELSE 4.60
END) as Rate
FROM Fox_title
GROUP BY WEEK(added_on)
) t
ORDER BY `Week Of`;
I made a few other changes to your query. First, I changed the single quotes around the column aliases to back ticks. Single quotes should be used, in general, only for string constants. Back ticks are the MySQL method for enclosing identifiers.
I also changed date(added_on) to date(min(added_on)). This ensures that you will get the earliest date in the week. Otherwise, you get an arbitrary date.
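To try the subquery approach without a MySQL server, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. SQLite is only a stand-in: `strftime('%Y-%W', ...)` replaces `WEEK()`, plain aliases replace the back-ticked ones, and the sample rows and the reduced thresholds (2 and 3 rows instead of 500 and 750) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fox_title (added_on TEXT)")
# Made-up data: three consecutive weeks with 2, 3 and 4 rows.
rows = [("2017-01-02",), ("2017-01-03",),
        ("2017-01-09",), ("2017-01-10",), ("2017-01-11",),
        ("2017-01-16",), ("2017-01-17",), ("2017-01-18",), ("2017-01-19",)]
conn.executemany("INSERT INTO Fox_title VALUES (?)", rows)

# The inner query computes the count and the rate once per week;
# the outer query can then refer to both by their aliases.
query = """
SELECT t.week_of, t.updates, t.rate, t.rate * t.updates AS total
FROM (SELECT MIN(DATE(added_on)) AS week_of,
             COUNT(*) AS updates,
             CASE WHEN COUNT(*) <= 2 THEN 6.75
                  WHEN COUNT(*) <= 3 THEN 6.30
                  ELSE 4.60
             END AS rate
      FROM Fox_title
      GROUP BY strftime('%Y-%W', added_on)
     ) AS t
ORDER BY t.week_of
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The CASE expression appears only once, and the multiplication happens in the outer SELECT where the alias is already visible.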

Related

SQL Combining Multiple SELECT Statements

I am trying to build an SQLite query that will collect statistics from a single table.
The table holds a log, of sorts, with several entries per day. I need to get a separate row for each day within the search parameters and then compile the totals of rows within those dates with certain boolean values.
Here is the query I have so far:
SELECT DATE(DateTime) AS SearchDate,
(SELECT COUNT() AS Total
FROM CallRecords
WHERE DATE(DateTime)
BETWEEN '2017-08-27' AND '2017-09-02'
GROUP BY DATE(DateTime)
ORDER BY Total DESC) AS Total,
(SELECT COUNT() AS Total
FROM CallRecords
WHERE NoMarket = 1
AND DATE(DateTime)
BETWEEN '2017-08-27' AND '2017-09-02'
GROUP BY DATE(DateTime)
ORDER BY Total DESC) AS NoMarkets,
(SELECT COUNT() AS Total
FROM CallRecords
WHERE Complaint = 1
AND DATE(DateTime)
BETWEEN '2017-08-27' AND '2017-09-02'
GROUP BY DATE(DateTime)
ORDER BY Total DESC) AS Complaints,
(SELECT COUNT() AS Total
FROM CallRecords
WHERE Voicemail = 1
AND DATE(DateTime)
BETWEEN '2017-08-27' AND '2017-09-02'
GROUP BY DATE(DateTime)
ORDER BY Total DESC) AS Voicemails
FROM CallRecords
WHERE DATE(DateTime) BETWEEN '2017-08-27' AND '2017-09-02'
GROUP BY SearchDate
And the output:
8/28/2017 175 27 11
8/29/2017 175 27 11
8/30/2017 175 27 11
8/31/2017 175 27 11
9/1/2017 175 27 11
As you can see, it is properly getting each individual date, but the totals for the columns are incorrect.
Obviously, I am missing something in my query, but I am not sure where. Is there a better way to perform this query?
EDIT: I have looked into several of the other questions with near-identical titles here, but I have not found anything similar to what I'm looking for. Most seem much more complicated than what I'm trying to accomplish.
It looks like you have a mess of columns in your CallRecords table with names like Complaint and Voicemail, each of which classifies a call.
It looks like those columns have the value 1 when relevant.
So this query should probably help you.
SELECT DATE(DateTime) AS SearchDate,
COUNT(*) AS Total,
SUM(NoMarket = 1) AS NoMarkets,
SUM(Complaint = 1) AS Complaints,
SUM(Voicemail = 1) AS Voicemails
FROM CallRecords
WHERE DateTime >= '2017-08-27'
AND DateTime < '2017-09-02' + INTERVAL 1 DAY
GROUP BY DATE(DateTime)
Why does this work? Because in MySQL a Boolean expression like Voicemail = 1 has the value 1 when it's true and 0 when it's false. You can sum those values up quite nicely.
Why is it faster than what you have? Because DATE(DateTime) BETWEEN this AND that can't exploit an index on DateTime.
Why is it correct for the end of your date range? Because DateTime < '2017-09-02' + INTERVAL 1 DAY pulls in all the records up until, but not including, midnight, on the day after your date range.
If you're using Sqlite, you need AND DateTime < date('2017-09-02', '+1 day'). The + INTERVAL 1 DAY stuff is slightly different there.
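Here is a small sketch of the SQLite variant, run from Python against an in-memory database. The sample rows are invented; the point is to show the summed boolean expressions and the half-open date range keeping the last second of 2017-09-02 while dropping midnight of the next day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE CallRecords (
    DateTime TEXT, NoMarket INTEGER, Complaint INTEGER, Voicemail INTEGER)""")
conn.executemany("INSERT INTO CallRecords VALUES (?, ?, ?, ?)", [
    ("2017-08-28 09:00:00", 1, 0, 0),
    ("2017-08-28 10:30:00", 0, 1, 0),
    ("2017-08-28 11:45:00", 0, 0, 0),
    ("2017-09-02 23:59:59", 0, 0, 1),   # last second inside the range
    ("2017-09-03 00:00:00", 1, 1, 1),   # just outside the range
])

query = """
SELECT DATE(DateTime) AS SearchDate,
       COUNT(*) AS Total,
       SUM(NoMarket = 1) AS NoMarkets,
       SUM(Complaint = 1) AS Complaints,
       SUM(Voicemail = 1) AS Voicemails
FROM CallRecords
WHERE DateTime >= '2017-08-27'
  AND DateTime < date('2017-09-02', '+1 day')
GROUP BY DATE(DateTime)
ORDER BY SearchDate
"""
daily = conn.execute(query).fetchall()
for row in daily:
    print(row)
```

Each day gets its own totals because every aggregate is evaluated per group, instead of each column being a separate uncorrelated subquery.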
You can do it like this, although I wrote it for SQL Server:
SELECT DATE(DateTime) AS SearchDate,
COUNT(*) AS TOTAL,
SUM(CASE WHEN NoMarket = 1 THEN 1 ELSE 0 END) AS NoMarkets,
SUM(CASE WHEN Complaint = 1 THEN 1 ELSE 0 END) AS Complaints,
SUM(CASE WHEN Voicemail = 1 THEN 1 ELSE 0 END) AS Voicemails
FROM CallRecords
WHERE DATE(DateTime) BETWEEN '2017-08-27' AND '2017-09-02'
GROUP BY SearchDate
SELECT CR.SearchDate, CR.Total, NM.NoMarkets, C.Complaints, VM.Voicemails FROM
(SELECT DATE(DateTime) AS SearchDate, COUNT(*) AS Total FROM CallRecords GROUP BY DATE(DateTime)) CR
LEFT JOIN
(SELECT DATE(DateTime) AS SearchDate, COUNT(*) AS NoMarkets FROM CallRecords WHERE NoMarket = 1 GROUP BY DATE(DateTime)) NM
ON CR.SearchDate = NM.SearchDate
LEFT JOIN
(SELECT DATE(DateTime) AS SearchDate, COUNT(*) AS Complaints FROM CallRecords WHERE Complaint = 1 GROUP BY DATE(DateTime)) C
ON CR.SearchDate = C.SearchDate
LEFT JOIN
(SELECT DATE(DateTime) AS SearchDate, COUNT(*) AS Voicemails FROM CallRecords WHERE Voicemail = 1 GROUP BY DATE(DateTime)) VM
ON CR.SearchDate = VM.SearchDate
WHERE CR.SearchDate >= '2017-08-27' AND CR.SearchDate <= '2017-09-02';
This should produce the correct output, with NULL where a day has no matching rows.

Combining database queries

How can these SQL-queries to extract statistics from my database be combined for better performance?
$total= mysql_query("SELECT COUNT(*) as number, SUM(order_total) as sum FROM history");
$month = mysql_query("SELECT COUNT(*) as number, SUM(order_total) as sum FROM history WHERE date >= UNIX_TIMESTAMP(DATE_ADD(CURDATE(),INTERVAL -30 DAY))");
$day = mysql_query("SELECT COUNT(*) as number, SUM(order_total) as sum FROM history WHERE date >= UNIX_TIMESTAMP(CURDATE())");
If you want all the data in a single query, you have two choices:
1. Use a UNION query (as suggested by bishop in his answer)
2. Tweak a query to get what you need in a single row
I'll show option 2 (option 1 has already been covered).
Note: I'm using user variables (that stuff in the init subquery) to avoid writing the expressions again and again. Also, to filter the aggregate data, I'm using case ... end expressions.
select
-- Your first query:
count(*) as number, sum(order_total) as `sum`
-- Your second query:
, sum(case when `date` >= @prev_date then 1 else 0 end) as number_prev
, sum(case when `date` >= @prev_date then order_total else 0 end) as sum_prev
-- Your third query:
, sum(case when `date` >= @cur_date then 1 else 0 end) as number_cur
, sum(case when `date` >= @cur_date then order_total else 0 end) as sum_cur
from (
select @cur_date := unix_timestamp(curdate())
, @prev_date := unix_timestamp(date_add(curdate(), interval -30 day))
) as init
, history;
Hope this helps
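For anyone who wants to see the single-row idea in action, here is a minimal sketch using Python and an in-memory SQLite database as a stand-in for MySQL. The cutoffs are passed as bound parameters instead of user variables, the "current time" is a fixed made-up number so the run is repeatable, and "today" is simplified to "the last 24 hours"; all of these are assumptions of the sketch, not part of the original queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (date INTEGER, order_total REAL)")

DAY = 86400
now = 1_700_000_000  # fixed, made-up "current" Unix timestamp
conn.executemany("INSERT INTO history VALUES (?, ?)", [
    (now - 100 * DAY, 10.0),  # older than 30 days
    (now - 10 * DAY, 20.0),   # within the last 30 days
    (now - 3600, 40.0),       # within the last day
])

# One pass over the table; each CASE decides whether a row counts
# towards the 30-day or the 1-day figures.
query = """
SELECT COUNT(*) AS number,
       SUM(order_total) AS sum_all,
       SUM(CASE WHEN date >= :month THEN 1 ELSE 0 END) AS number_month,
       SUM(CASE WHEN date >= :month THEN order_total ELSE 0 END) AS sum_month,
       SUM(CASE WHEN date >= :day THEN 1 ELSE 0 END) AS number_day,
       SUM(CASE WHEN date >= :day THEN order_total ELSE 0 END) AS sum_day
FROM history
"""
stats = conn.execute(query, {"month": now - 30 * DAY, "day": now - DAY}).fetchone()
print(stats)
```

All six figures come back in one row, so the table is scanned once instead of three times.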
Since the queries have the same column structure, you can ask MySQL to combine them with the UNION operation:
(SELECT 'total' AS kind, COUNT(*) as number, SUM(order_total) as sum FROM history)
UNION
(SELECT 'by-month' AS kind, COUNT(*) as number, SUM(order_total) as sum FROM history WHERE date >= UNIX_TIMESTAMP(DATE_ADD(CURDATE(),INTERVAL -30 DAY)))
UNION
(SELECT 'by-day' AS kind, COUNT(*) as number, SUM(order_total) as sum FROM history WHERE date >= UNIX_TIMESTAMP(CURDATE()))

interval by 4 using sql - Mysql

I have a table and I want to return its data at intervals of 4 months. When I use modulo, the result is not what I expected. Please find my query below:
SELECT (DATE_FORMAT(subscribed_from, '%Y-%m')) AS date_ FROM subscription
WHERE operator = 'tim'
AND DATE_FORMAT(subscribed_from, '%Y-%m-%d') BETWEEN '2013-01-01' AND '2014-12-31'
GROUP BY (DATE_FORMAT(subscribed_from, '%Y-%m'));
It shows records like this:
2013-01
2013-02
2013-03
2013-04
2013-05
2013-06
2013-07
2013-08
2013-09
I want to take only every 4th month. Below is the result I expect:
2013-01
2013-05
2013-09
2014-02
And for an interval of 2, below is the result I expect:
2013-01
2013-03
2013-05
2013-07
2013-09
If I use modulo % 2, it starts from 2013-01 and jumps by 2. The problem is that if I want the range in the WHERE clause to start from 2013-02, then 02 itself does not show up in the result. So if the WHERE clause starts from month 2, it should give intervals such as 2, 4, 6, 8, 10, 12.
SELECT date_, SUM(the_metric_you_want_to_aggregate)
FROM (
SELECT 4*FLOOR(
(DATE_FORMAT(subscribed_from, '%Y%m') - 201301)
/4) AS date_,
the_metric_you_want_to_aggregate
FROM subscription
WHERE operator = 'tim'
AND subscribed_from BETWEEN 20130101000000 AND 20141231235959
) AS ilv
GROUP BY date_
(where 201301 is the year/month start of the range you are selecting by - assuming that is the reference for the 4-month aggregation)
Note that enclosing column references in functions (...DATE_FORMAT(subscribed_from, '%Y-%m-%d') BETWEEN...) prevents the use of indexes.
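To illustrate stepping through months relative to the start of the selected range, here is a rough sketch in Python with an in-memory SQLite database standing in for MySQL. The month index `year * 12 + month` is my own substitution for the yyyymm subtraction, chosen so that differences stay contiguous across a year boundary; the function name `every_nth_month` and the sample rows (one subscription per month of 2013 and 2014) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscription (operator TEXT, subscribed_from TEXT)")
# Made-up data: one row in the middle of every month of 2013 and 2014.
months = [f"{y}-{m:02d}-15" for y in (2013, 2014) for m in range(1, 13)]
conn.executemany("INSERT INTO subscription VALUES ('tim', ?)", [(d,) for d in months])

def every_nth_month(start, end, step):
    """Return 'YYYY-MM' labels spaced `step` months apart, counted from `start`."""
    # Number months consecutively so that Dec 2013 -> Jan 2014 is a step of 1.
    start_index = int(start[:4]) * 12 + int(start[5:7])
    query = """
    SELECT strftime('%Y-%m', subscribed_from) AS date_
    FROM subscription
    WHERE operator = 'tim'
      AND subscribed_from >= :start AND subscribed_from <= :end
      AND (CAST(strftime('%Y', subscribed_from) AS INTEGER) * 12
           + CAST(strftime('%m', subscribed_from) AS INTEGER)
           - :start_index) % :step = 0
    GROUP BY date_
    ORDER BY date_
    """
    params = {"start": start, "end": end, "start_index": start_index, "step": step}
    return [row[0] for row in conn.execute(query, params)]

print(every_nth_month("2013-01-01", "2014-12-31", 4))
print(every_nth_month("2013-02-01", "2013-12-31", 2))
```

Because the offset is measured from the first month of the range, starting at 2013-02 with a step of 2 yields 02, 04, 06 and so on, which is the behaviour asked for in the question.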
You have to use variables. Here is a sample for an interval of 4.
SET @row_number:=0;
SELECT date_ from (
SELECT (DATE_FORMAT(subscribed_from, '%Y-%m')) AS date_, @row_number:=@row_number+1 AS rn FROM subscription
WHERE operator = 'tim' AND DATE_FORMAT(subscribed_from, '%Y-%m-%d') BETWEEN '2013-01-01' AND '2014-12-31'
GROUP BY (DATE_FORMAT(subscribed_from, '%Y-%m'))
) as tbl where rn % 4 = 1;
Let's say I'm using this method to generate the intervals, but I want the start number to come from my input. For example, if it starts from 4 and the condition is % 4, the output should be 4, 8, 12, ...
SET @row:=0;
SELECT *
FROM (
SELECT
@row := @row +1 AS rownum
FROM (
SELECT @row) r, subscription
) ranked
WHERE rownum %4 = 1

Group by half hour interval

I was lucky enough to find this awesome piece of code on Stack Overflow. However, I wanted to change it so that it shows each half hour instead of every hour, but messing around with it only caused me to ruin the query haha.
This is the SQL:
SELECT CONCAT(HOUR(created_at), ':00-', HOUR(created_at)+1, ':00') as hours,
COUNT(*)
FROM urls
GROUP BY HOUR(created_at)
ORDER BY HOUR(created_at) ASC
How would I go about getting a result every half an hour? :)
Another thing is that if there is a half hour with no results, I would like it to return 0 instead of just skipping that step. It looks kind of weird when I do statistics over the query and it just skips an hour because there were none :P
If the format isn't too important, you can return two columns for the interval. You might even just need the start of the interval, which can be determined by:
date_format(created_at - interval minute(created_at)%30 minute, '%H:%i') as period_start
The alias can be used in GROUP BY and ORDER BY clauses. If you also need the end of the interval, you will need a small modification:
SELECT
date_format(created_at - interval minute(created_at)%30 minute, '%H:%i') as period_start,
date_format(created_at + interval 30-minute(created_at)%30 minute, '%H:%i') as period_end,
COUNT(*)
FROM urls
GROUP BY period_start
ORDER BY period_start ASC;
Of course you can also concatenate the values:
SELECT concat_ws('-',
date_format(created_at - interval minute(created_at)%30 minute, '%H:%i'),
date_format(created_at + interval 30-minute(created_at)%30 minute, '%H:%i')
) as period,
COUNT(*)
FROM urls
GROUP BY period
ORDER BY period ASC;
Demo: http://rextester.com/RPN50688
Another thing is that if there is a half hour with no results, I
would like it to return 0
If you use the result in a procedural language, you can initialize all 48 rows with zero in a loop and then "inject" the non-zero rows from the result.
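A minimal sketch of that procedural approach in Python. It assumes the query returns rows of `(period_start, count)` with `period_start` formatted as `'%H:%i'`, as in the queries above; the function name and the sample rows are made up.

```python
def fill_missing_half_hours(rows):
    """Return a dict with all 48 half-hour slots, using 0 where no row exists."""
    # Initialise every slot of the day with zero: "00:00", "00:30", ..., "23:30".
    counts = {f"{h:02d}:{m:02d}": 0 for h in range(24) for m in (0, 30)}
    # "Inject" the non-zero rows that came back from the database.
    for period_start, count in rows:
        counts[period_start] = count
    return counts

# Rows as they might come back from the GROUP BY query (made-up values).
rows = [("08:00", 2), ("23:30", 1)]
counts = fill_missing_half_hours(rows)
for slot, count in counts.items():
    print(slot, count)
```

Since Python dicts keep insertion order, the slots print in chronological order without any extra sorting.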
However, if you need it to be done in SQL, you will need a table for a LEFT JOIN with at least 48 rows. That could be done inline with a "huge" UNION ALL statement, but (IMHO) it would be ugly. So I prefer to have a sequence table with one integer column, which can be very useful for reports. To create that table I usually use information_schema.COLUMNS, since it is available on any MySQL server and has at least a couple of hundred rows. If you need more rows, just join it with itself.
Now let's create that table:
drop table if exists helper_seq;
create table helper_seq (seq smallint auto_increment primary key)
select null
from information_schema.COLUMNS c1
, information_schema.COLUMNS c2
limit 100; -- adjust as needed
Now we have a table with integers from 1 to 100 (though right now you only need 48 - but this is for demonstration).
Using that table we can now create all 48 time intervals:
select time(0) + interval 30*(seq-1) minute as period_start,
time(0) + interval 30*(seq) minute as period_end
from helper_seq s
where s.seq <= 48;
We will get the following result:
period_start | period_end
00:00:00 | 00:30:00
00:30:00 | 01:00:00
...
23:30:00 | 24:00:00
Demo: http://rextester.com/ISQSU31450
Now we can use it as a derived table (subquery in FROM clause) and LEFT JOIN your urls table:
select p.period_start, p.period_end, count(u.created_at) as cnt
from (
select time(0) + interval 30*(seq-1) minute as period_start,
time(0) + interval 30*(seq) minute as period_end
from helper_seq s
where s.seq <= 48
) p
left join urls u
on time(u.created_at) >= p.period_start
and time(u.created_at) < p.period_end
group by p.period_start, p.period_end
order by p.period_start
Demo: http://rextester.com/IQYQ32927
Last step (if really needed) is to format the result. We can use CONCAT or CONCAT_WS and TIME_FORMAT in the outer select. The final query would be:
select concat_ws('-',
time_format(p.period_start, '%H:%i'),
time_format(p.period_end, '%H:%i')
) as period,
count(u.created_at) as cnt
from (
select time(0) + interval 30*(seq-1) minute as period_start,
time(0) + interval 30*(seq) minute as period_end
from helper_seq s
where s.seq <= 48
) p
left join urls u
on time(u.created_at) >= p.period_start
and time(u.created_at) < p.period_end
group by p.period_start, p.period_end
order by p.period_start
The result would look like:
period | cnt
00:00-00:30 | 1
00:30-01:00 | 0
...
23:30-24:00 | 3
Demo: http://rextester.com/LLZ41445
1. Switch to seconds.
2. Do arithmetic to get a number for each unit of time (using 30*60 for half-hour, in your case).
3. Have a table of consecutive numbers.
4. Use LEFT JOIN to get even missing units of time.
5. Do the GROUP BY.
6. Convert back from units of time to actual time, for display.
(Steps 3 and 4 are optional. The question says "every", so I assume they are needed.)
Steps 1 and 2 are embodied in something like
FLOOR(UNIX_TIMESTAMP(created_at) / (30*60))
For example:
mysql> SELECT NOW(), FLOOR(UNIX_TIMESTAMP(NOW()) / (30*60));
+---------------------+----------------------------------------+
| NOW() | FLOOR(UNIX_TIMESTAMP(NOW()) / (30*60)) |
+---------------------+----------------------------------------+
| 2018-03-02 08:24:48 | 844448 |
+---------------------+----------------------------------------+
Step 3 needs to be done once and kept in a permanent table. Or, if you have MariaDB, use a "seq" pseudo-table; for example, seq_844448_to_900000 would dynamically give a table that would reach pretty far into the future.
Step 6 example:
mysql> SELECT DATE_FORMAT(FROM_UNIXTIME((844448) * 30*60), "%b %d %h:%i");
+-------------------------------------------------------------+
| DATE_FORMAT(FROM_UNIXTIME((844448) * 30*60), "%b %d %h:%i") |
+-------------------------------------------------------------+
| Mar 02 08:00 |
+-------------------------------------------------------------+
+---------------------------------------------------------------+
| DATE_FORMAT(FROM_UNIXTIME((844448+1) * 30*60), "%b %d %h:%i") |
+---------------------------------------------------------------+
| Mar 02 08:30 |
+---------------------------------------------------------------+
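The arithmetic in steps 1, 2 and 6 can be sketched outside the database as well. The Python below is only an illustration: the helper names are made up, and timestamps are treated as UTC, which is an assumption (the MySQL session above evidently used a local time zone).

```python
from datetime import datetime, timezone

HALF_HOUR = 30 * 60  # bucket width in seconds

def bucket_of(moment):
    """Steps 1 and 2: seconds since the epoch, divided into half-hour units."""
    return int(moment.timestamp()) // HALF_HOUR

def bucket_start(bucket):
    """Step 6: convert a bucket number back to the time at which it starts."""
    return datetime.fromtimestamp(bucket * HALF_HOUR, tz=timezone.utc)

moment = datetime(2018, 3, 2, 16, 24, 48, tzinfo=timezone.utc)
bucket = bucket_of(moment)
print(bucket, bucket_start(bucket), bucket_start(bucket + 1))
```

Consecutive bucket numbers are exactly 30 minutes apart, which is what makes a table of consecutive integers usable for the LEFT JOIN in step 4.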
Well, this could be a bit verbose but it works:
SELECT hours, SUM(count) as count FROM (
SELECT CONCAT(LPAD(HOUR(created_at), 2, '0'), ':', LPAD(30 * FLOOR(MINUTE(created_at)/30), 2, '0'), '-',
LPAD(HOUR(DATE_ADD(created_at, INTERVAL 30 minute)), 2, '0'), ':', LPAD(30 * FLOOR(MINUTE(DATE_ADD(created_at, INTERVAL 30 minute))/30), 2, '0')) as hours,
COUNT(*) as count
FROM urls
GROUP BY HOUR(created_at), FLOOR(MINUTE(created_at)/30)
UNION ALL
SELECT '00:00-00:30'as hours, 0 as count UNION ALL SELECT '00:30-01:00'as hours, 0 as count UNION ALL
SELECT '01:00-01:30'as hours, 0 as count UNION ALL SELECT '01:30-02:00'as hours, 0 as count UNION ALL
SELECT '02:00-02:30'as hours, 0 as count UNION ALL SELECT '02:30-03:00'as hours, 0 as count UNION ALL
SELECT '03:00-03:30'as hours, 0 as count UNION ALL SELECT '03:30-04:00'as hours, 0 as count UNION ALL
SELECT '04:00-04:30'as hours, 0 as count UNION ALL SELECT '04:30-05:00'as hours, 0 as count UNION ALL
SELECT '05:00-05:30'as hours, 0 as count UNION ALL SELECT '05:30-06:00'as hours, 0 as count UNION ALL
SELECT '06:00-06:30'as hours, 0 as count UNION ALL SELECT '06:30-07:00'as hours, 0 as count UNION ALL
SELECT '07:00-07:30'as hours, 0 as count UNION ALL SELECT '07:30-08:00'as hours, 0 as count UNION ALL
SELECT '08:00-08:30'as hours, 0 as count UNION ALL SELECT '08:30-09:00'as hours, 0 as count UNION ALL
SELECT '09:00-09:30'as hours, 0 as count UNION ALL SELECT '09:30-10:00'as hours, 0 as count UNION ALL
SELECT '10:00-10:30'as hours, 0 as count UNION ALL SELECT '10:30-11:00'as hours, 0 as count UNION ALL
SELECT '11:00-11:30'as hours, 0 as count UNION ALL SELECT '11:30-12:00'as hours, 0 as count UNION ALL
SELECT '12:00-12:30'as hours, 0 as count UNION ALL SELECT '12:30-13:00'as hours, 0 as count UNION ALL
SELECT '13:00-13:30'as hours, 0 as count UNION ALL SELECT '13:30-14:00'as hours, 0 as count UNION ALL
SELECT '14:00-14:30'as hours, 0 as count UNION ALL SELECT '14:30-15:00'as hours, 0 as count UNION ALL
SELECT '15:00-15:30'as hours, 0 as count UNION ALL SELECT '15:30-16:00'as hours, 0 as count UNION ALL
SELECT '16:00-16:30'as hours, 0 as count UNION ALL SELECT '16:30-17:00'as hours, 0 as count UNION ALL
SELECT '17:00-17:30'as hours, 0 as count UNION ALL SELECT '17:30-18:00'as hours, 0 as count UNION ALL
SELECT '18:00-18:30'as hours, 0 as count UNION ALL SELECT '18:30-19:00'as hours, 0 as count UNION ALL
SELECT '19:00-19:30'as hours, 0 as count UNION ALL SELECT '19:30-20:00'as hours, 0 as count UNION ALL
SELECT '20:00-20:30'as hours, 0 as count UNION ALL SELECT '20:30-21:00'as hours, 0 as count UNION ALL
SELECT '21:00-21:30'as hours, 0 as count UNION ALL SELECT '21:30-22:00'as hours, 0 as count UNION ALL
SELECT '22:00-22:30'as hours, 0 as count UNION ALL SELECT '22:30-23:00'as hours, 0 as count UNION ALL
SELECT '23:00-23:30'as hours, 0 as count UNION ALL SELECT '23:30-00:00'as hours, 0 as count
) AS T
GROUP BY hours ORDER BY hours;
The most difficult part of your query is producing statistics for intervals that don't have any hits. SQL is all about querying and aggregating existing data; selecting or aggregating data that is missing from the table is an unusual task. That's why, as Wolph stated in the comments, there is no pretty solution for this task.
I solved this problem by explicitly selecting all half-hour intervals of the day. This solution can be used if the number of intervals is limited, as in your case. It will not work, however, if you aggregate over different days across a long period of time.
I'm not a fan of this query but I can't propose anything better. A more elegant solution could be achieved with a stored procedure with a loop, but it seems you want to solve it with a raw SQL query.
You can add some math to calculate 48 intervals instead of 24 and put it into another field by which you're going to group and sort.
SELECT HOUR(created_at)*2+FLOOR(MINUTE(created_at)/30) as interval48,
if((HOUR(created_at)*2+FLOOR(MINUTE(created_at)/30)) % 2 = 0,
CONCAT(HOUR(created_at), ':00-', HOUR(created_at), ':30'),
CONCAT(HOUR(created_at), ':30-', HOUR(created_at)+1, ':00')
) as hours,
count(*)
FROM urls
GROUP BY HOUR(created_at)*2+FLOOR(MINUTE(created_at)/30)
ORDER BY HOUR(created_at)*2+FLOOR(MINUTE(created_at)/30) ASC
Example of result:
0 0:00-0:30 2017
1 0:30-1:00 1959
2 1:00-1:30 1830
3 1:30-2:00 1715
4 2:00-2:30 1679
5 2:30-3:00 1688
The result of original query posted by Jazerix was:
0:00-1:00 3976
1:00-2:00 3545
2:00-3:00 3367
A different approach without creating additional tables. It may look like a hack though :-)
Step 1 : Generate a Time Table Dynamically
Assumption : the INFORMATION_SCHEMA database is available and has a COLLATIONS table, which normally has more than 100 records. You can use any table which has at least 48 records.
Query :
SELECT @time fromTime, ADDTIME(@time, '00:29:00') toTime,
@time := ADDTIME(@time, '00:30:00')
FROM information_schema.COLLATIONS
JOIN (SELECT @time := TIME('00:00:00')) a
WHERE @time < '24:00:00'
The above query gives a table of from and to times at 30-minute intervals.
Step 2 : Use the first query to generate required result joining urls table
Query :
SELECT CONCAT(fromTime, '-', toTime) AS halfHours, COUNT(created_at)
FROM
(SELECT @time fromTime, ADDTIME(@time, '00:29:00') toTime, @time := ADDTIME(@time, '00:30:00')
FROM information_schema.COLLATIONS
JOIN (SELECT @time := TIME('00:00:00')) a
WHERE @time < '24:00:00'
) timeTable
LEFT JOIN urls ON HOUR(created_at) BETWEEN HOUR(fromTime) AND HOUR(toTime)
AND MINUTE(created_at) BETWEEN MINUTE(fromTime) AND MINUTE(toTime)
GROUP BY fromTime
I hope this will work for you:
SELECT
@sTime:= CONCAT(HOUR(created_at),":",
(CASE WHEN MINUTE(created_at) >= 30 THEN 30 ELSE 0 END)) as intVar,
(CONCAT(
AddTime(@sTime, '00:00:00'),
' to ',
AddTime(@sTime, '00:30:00')
)) as timeInterval,
COUNT(*) FROM urls
GROUP BY
(CONCAT(HOUR(created_at),":",(CASE WHEN MINUTE(created_at) >= 30 THEN 30 ELSE 0 END)))
ORDER BY HOUR(created_at) ASC
Simply convert to seconds and divide by 30 minutes (1800 seconds). To verify, I used MIN and MAX on the timestamp.
SELECT concat(TIME_FORMAT(min(created_at),"%H:%i")," - ", TIME_FORMAT(max(created_at),"%H:%i")) as hours,
COUNT(*)
FROM urls
GROUP BY FLOOR(TIME_TO_SEC(created_at)/1800)
ORDER BY HOUR(created_at) ASC

Calculating a Moving Average MySQL?

Good Day,
I am using the following code to calculate the 9 Day Moving average.
SELECT SUM(close)
FROM tbl
WHERE date <= '2002-07-05'
AND name_id = 2
ORDER BY date DESC
LIMIT 9
But it does not work because it first calculates all of the returned fields before the limit is called. In other words it will calculate all the closes before or equal to that date, and not just the last 9.
So I need to calculate the SUM from the returned select, rather than calculate it straight.
IE. Select the SUM from the SELECT...
Now how would I go about doing this and is it very costly or is there a better way?
If you want the moving average for each date, then try this:
SELECT date, SUM(close),
(select avg(close) from tbl t2 where t2.name_id = t.name_id and t2.date between t.date - interval 8 day and t.date
) as mvgAvg
FROM tbl t
WHERE date <= '2002-07-05' and
name_id = 2
GROUP BY date
ORDER BY date DESC
It uses a correlated subquery to calculate the average over the 9 days ending on each date.
Starting from MySQL 8, you should use window functions for this. Using the window RANGE clause, you can create a logical window over an interval, which is very powerful. Something like this:
SELECT
date,
close,
AVG(close) OVER (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
FROM tbl
WHERE date <= DATE '2002-07-05'
AND name_id = 2
ORDER BY date DESC
For example:
WITH t (date, `close`) AS (
SELECT DATE '2020-01-01', 50 UNION ALL
SELECT DATE '2020-01-03', 54 UNION ALL
SELECT DATE '2020-01-05', 51 UNION ALL
SELECT DATE '2020-01-12', 49 UNION ALL
SELECT DATE '2020-01-13', 59 UNION ALL
SELECT DATE '2020-01-15', 30 UNION ALL
SELECT DATE '2020-01-17', 35 UNION ALL
SELECT DATE '2020-01-18', 39 UNION ALL
SELECT DATE '2020-01-19', 47 UNION ALL
SELECT DATE '2020-01-26', 50
)
SELECT
date,
`close`,
COUNT(*) OVER w AS c,
SUM(`close`) OVER w AS s,
AVG(`close`) OVER w AS a
FROM t
WINDOW w AS (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
ORDER BY date DESC
Leading to:
date |close|c|s |a |
----------|-----|-|---|-------|
2020-01-26| 50|1| 50|50.0000|
2020-01-19| 47|2| 97|48.5000|
2020-01-18| 39|3|136|45.3333|
2020-01-17| 35|4|171|42.7500|
2020-01-15| 30|4|151|37.7500|
2020-01-13| 59|5|210|42.0000|
2020-01-12| 49|6|259|43.1667|
2020-01-05| 51|3|159|53.0000|
2020-01-03| 54|3|154|51.3333|
2020-01-01| 50|3|155|51.6667|
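To make the frame easier to read: because the window is ordered by date DESC, the rows that "precede" the current row are the later dates, so each frame in the table above covers the current date plus the following 9 days. The Python sketch below recomputes the c, s and a columns from the same sample data under that reading; the function name `window_stats` is made up.

```python
from datetime import date, timedelta

# Same sample data as in the WITH clause above.
closes = {
    date(2020, 1, 1): 50, date(2020, 1, 3): 54, date(2020, 1, 5): 51,
    date(2020, 1, 12): 49, date(2020, 1, 13): 59, date(2020, 1, 15): 30,
    date(2020, 1, 17): 35, date(2020, 1, 18): 39, date(2020, 1, 19): 47,
    date(2020, 1, 26): 50,
}

def window_stats(day, span_days=9):
    """Count, sum and mean of closes dated from `day` to `day + span_days`, inclusive."""
    upper = day + timedelta(days=span_days)
    frame = [c for d, c in closes.items() if day <= d <= upper]
    return len(frame), sum(frame), sum(frame) / len(frame)

table = {d: window_stats(d) for d in sorted(closes, reverse=True)}
for d, (c, s, a) in table.items():
    print(d, c, s, round(a, 4))
```

If you want each row averaged with the days before it instead, ordering the window by date ascending would be the thing to try.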
Use something like
SELECT
sum(close) as sum,
avg(close) as average
FROM (
SELECT
(close)
FROM
tbl
WHERE
date <= '2002-07-05'
AND name_id = 2
ORDER BY
date DESC
LIMIT 9 ) temp
The inner query returns the 9 most recent filtered rows in descending order, and the outer query then computes the sum and the average over just those rows.
The query you gave doesn't work because the sum is calculated first and the LIMIT clause is applied only after that, so you get the sum of all the matching rows.
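The difference is easy to see on a tiny table. The sketch below uses Python with an in-memory SQLite database as a stand-in for MySQL; the 12 rows with closes 1.0 to 12.0 are made up so the sums are easy to check by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (date TEXT, name_id INTEGER, close REAL)")
# Made-up data: 12 consecutive days with closes 1.0 .. 12.0.
conn.executemany("INSERT INTO tbl VALUES (?, 2, ?)",
                 [(f"2002-06-{day:02d}", float(day)) for day in range(1, 13)])

# LIMIT applies to the single aggregated row, so every close is summed.
naive = conn.execute("""
    SELECT SUM(close) FROM tbl
    WHERE date <= '2002-07-05' AND name_id = 2
    ORDER BY date DESC LIMIT 9
""").fetchone()[0]

# Limiting inside a derived table restricts the input to 9 rows first.
last_nine = conn.execute("""
    SELECT SUM(close), AVG(close)
    FROM (SELECT close FROM tbl
          WHERE date <= '2002-07-05' AND name_id = 2
          ORDER BY date DESC LIMIT 9) AS temp
""").fetchone()

print(naive, last_nine)
```

The first query adds up all 12 closes, while the second only sees the closes 4.0 to 12.0.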
Another technique is to use a helper table:
CREATE TABLE `tinyint_asc` (
`value` tinyint(3) unsigned NOT NULL default '0',
PRIMARY KEY (value)
);
INSERT INTO `tinyint_asc` VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250),(251),(252),(253),(254),(255);
Afterwards you can use it like this:
select
date_add(tbl.date, interval tinyint_asc.value day) as mydate,
count(*),
sum(myvalue)
from tbl
inner join tinyint_asc on tinyint_asc.value <= 30 -- for a 30 day moving average
where date( date_add(tbl.date, interval tinyint_asc.value day ) ) between '2016-01-01' and current_date()
group by mydate
This query is fast:
select date, name_id,
case @i when name_id then @i:=name_id else (@i:=name_id)
and (@n:=0)
and (@a0:=0) and (@a1:=0) and (@a2:=0) and (@a3:=0) and (@a4:=0) and (@a5:=0) and (@a6:=0) and (@a7:=0) and (@a8:=0)
end as a,
case @n when 9 then @n:=9 else @n:=@n+1 end as n,
@a0:=@a1,@a1:=@a2,@a2:=@a3,@a3:=@a4,@a4:=@a5,@a5:=@a6,@a6:=@a7,@a7:=@a8,@a8:=close,
(@a0+@a1+@a2+@a3+@a4+@a5+@a6+@a7+@a8)/@n as av
from tbl,
(select @i:=0, @n:=0,
@a0:=0, @a1:=0, @a2:=0, @a3:=0, @a4:=0, @a5:=0, @a6:=0, @a7:=0, @a8:=0) a
where name_id=2
order by name_id, date
If you need an average over 50 or 100 values, it's tedious to write, but
worth the effort. The speed is close to the ordered select.