Calculating increasing or decreasing trend over time in MySQL

I have a table store_visits with the following structure:
store_visits:
store_name: string
visit_count: integer
visit_date: date
My goal is to create a query that, for each store and a given date range, will calculate:
Average Number of Visits over the date range (currently using AVG(visit_count))
Whether store visits are increasing or decreasing
The relative rate of increase/decrease (1 to 4 scale where 1 = low rate, 4 = high rate)
The relative rate of increase/decrease in visits is for directional purposes only. It will always be a linear scale.
I've spent a day trying to construct the MySQL query to do this, and just can't get my head around it.
Any help would be greatly appreciated.
Thanks,
-Scott

Assuming you just want to compare the store visits in the first half of the date range to the second half, here's an example that spans the last 40 days using 2 sub-queries to get the counts for each range.
select
  ((endVisits + startVisits) / 40) average,
  (endVisits > startVisits) increasing,
  ((endVisits - startVisits) / startVisits * 100) percentChange
from
  -- total visits in the first half of the range (40 to 20 days ago)
  (select sum(visit_count) startVisits
   from store_visits
   where visit_date > current_date - interval 40 day
     and visit_date <= current_date - interval 20 day) startRange,
  -- total visits in the second half of the range (last 20 days)
  (select sum(visit_count) endVisits
   from store_visits
   where visit_date > current_date - interval 20 day) endRange;
Notes
I don't know how you want to calculate your 1-4 increase amount, so I just made it a percentage; you can modify that to whatever logic you want. Also, you'll need to update the date ranges in the sub-queries as needed.
Edit: Just updated the average to ((endVisits + startVisits)/40) instead of ((endVisits + startVisits)/2). You could also use the avg function in your sub-queries and divide the sum of those by 2 to get the average over the whole period.
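If you do want the 1-4 figure instead of the raw percentage, one hypothetical mapping (the 5/15/30 percent cut-offs below are invented for illustration; pick whatever thresholds suit your data) is to wrap the query above and bucket percentChange with a CASE expression:
select
  average,
  increasing,
  case
    when abs(percentChange) < 5  then 1
    when abs(percentChange) < 15 then 2
    when abs(percentChange) < 30 then 3
    else 4
  end as changeRate
from (
  select
    ((endVisits + startVisits) / 40) average,
    (endVisits > startVisits) increasing,
    ((endVisits - startVisits) / startVisits * 100) percentChange
  from
    (select sum(visit_count) startVisits
     from store_visits
     where visit_date > current_date - interval 40 day
       and visit_date <= current_date - interval 20 day) startRange,
    (select sum(visit_count) endVisits
     from store_visits
     where visit_date > current_date - interval 20 day) endRange
) trend;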

Related

Efficient SQL Query to calculate portion of a row in half hourly time series that has occurred

I have a table that looks like this:
id | slot             | total
---|------------------|------
1  | 2022-12-01T12:00 | 100
2  | 2022-12-01T12:30 | 150
3  | 2022-12-01T13:00 | 200
There's an index on slot already. The table has ~100mil rows (and a bunch more columns not shown here)
I want to sum the total up to the current moment in time (EDIT: this wasn't clear initially; I will provide a lower slot bound, so the sum will be over some number of days/weeks, not over the full table). Let's say the time is currently 2022-12-01T12:45. If I run select * from my_table where slot < CURRENT_TIMESTAMP(),
then I get back records 1 and 2.
However, in my data, the records represent forecasted sales within a time slot. I want to find the forecasts as of 2022-12-01T12:45, and so I want to find the proportion of the half hour slot of record 2 that has elapsed, and return that proportion of the total.
As of 2022-12-01T12:45 (assuming minute granularity), 50% of row 2 has elapsed, so I would expect the total to return as 150 / 2 = 75.
My current query works, but is slow. What are some ways I can optimise this, or other approaches I can take?
Also, how can we extend this solution to be generalised to any interval frequency? Maybe tomorrow we change our forecasting model and the data comes in sporadically. The hardcoded 30 would not work in that case.
select sum(fraction * total) as t
from (
    select total,
           least(
               timestampdiff(minute, slot, current_timestamp()),
               30
           ) / 30 as fraction
    from my_table
    where slot <= current_timestamp()
      -- plus the lower slot bound mentioned in the edit above
) as elapsed
Consider computing your sum first, then removing the not-yet-elapsed part of the last row's total. To keep the last row's own total available, I'd prefer applying window functions instead of aggregations, and limiting the output to the last row.
SET @current_time = CURRENT_TIMESTAMP();
WITH cte AS (
    SELECT slot,
           SUM(total) OVER (ORDER BY slot) AS total,
           total AS rowtotal
    FROM my_table
    WHERE slot < @current_time
    ORDER BY slot DESC
    LIMIT 1
)
SELECT slot,
       total - (30 - TIMESTAMPDIFF(MINUTE,
                                   slot,
                                   @current_time))
               / 30 * rowtotal AS total
FROM cte
Check the demo here.
Note1: Adding an index on the slot field is likely to boost this query's performance.
Note2: If your query runs over millions of rows, the current timestamp is likely to change while the query executes. You can store it in a variable before running the query (as done above), or in another CTE.
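To address the follow-up about arbitrary interval frequencies, here is a sketch of one possible generalisation (MySQL 8.0+, not part of the answer above): derive each row's slot width from the next row's slot with LEAD(), so the hard-coded 30 disappears. The lower-bound date is a placeholder for the bound you said you will supply.
SET @current_time = CURRENT_TIMESTAMP();
SELECT SUM(
         LEAST(TIMESTAMPDIFF(MINUTE, slot, @current_time), slot_minutes)
         / slot_minutes * total
       ) AS t
FROM (
    SELECT slot, total,
           -- width of this row's interval = minutes until the next slot
           TIMESTAMPDIFF(MINUTE, slot, LEAD(slot) OVER (ORDER BY slot)) AS slot_minutes
    FROM my_table
    WHERE slot >= '2022-11-01'             -- lower slot bound supplied by the caller
) sized
WHERE slot < @current_time
  AND slot_minutes IS NOT NULL;            -- the newest row has no "next" slot to size it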
Create a B-tree index on the slot column, as it has high selectivity.
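A minimal example of that suggestion (the index name is my own placeholder):
CREATE INDEX idx_my_table_slot ON my_table (slot);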

MySQL Daily Time Coverage Without Gaps

I have a table like the following example:
What I need to do is return the coverage (number of hours an operator/s were onsite) for each day. The challenge is that I need to ignore gaps in coverage and not double count hours where two operators were signed in at the same time. For instance, the image below is a visual representation of the table.
The logic of the image is as follows:
Operator A: Signed in at 10 and signed out at noon for a total of 2 hours
Operator B: Signed in at 1 and signed out at 3 for a total of 2 hours
Operator A: Came back and signed in at 2 and signed out at 5 for a total of 3 hours, but 1 hour overlaps with Operator B, so I cannot count that 1 hour, otherwise I will be double counting coverage
Therefore the total coverage time without overlaps is 6 hours, which is the value I need the query to produce. So far I can avoid some double counting by taking the max and min dates of each day and subtracting the two:
SELECT YEAR, WEEK, SUM(HOURS)
FROM
(SELECT
YEAR(SignedIn) AS YEAR,
WEEK(SignedIn) AS WEEK,
DAY(SignedIn) AS DAY,
time_to_sec(timediff(MAX(SignedOut), MIN(SignedIn)))/ 3600 AS HOURS
FROM OperatorLogs
GROUP BY YEAR, WEEK, DAY) As VirtualTable
GROUP BY YEAR, WEEK
Which produces 7 because it takes the first sign-in (10 AM) and calculates the hours up until the last sign-out (5:00 PM). However, it includes the gap in coverage (12 - 1), which should not be included. I am unsure of how to remove that time from the total hours while also not double counting when there is overlap, i.e. from 2-3 there should only be 1 hour of coverage even though two separate operators are on site, each putting in an hour. Any help is appreciated.
Sorry, work interrupted me.
Here's my working solution, I'm not convinced it's optimal due to the (relatively) expensive nature of the joins, but I've optimised it slightly based on the soft-rule that "shifts" never span multiple days.
SELECT
calendar_date,
SUM(coverage_seconds) / 3600 AS coverage_hours
FROM
(
-- Signins that didn't happen within another operators shift
SELECT DISTINCT
DATE(e.signedin) AS calendar_date,
-(UNIX_TIMESTAMP(e.signedin) MOD 86400) AS coverage_seconds
FROM
OperatorLogs e
LEFT JOIN
OperatorLogs o
ON o.signedin >= DATE(e.signedin)
AND o.signedin < e.signedin
AND o.signedout >= e.signedin
WHERE
o.signedin IS NULL
UNION ALL
-- Signouts that didn't happen within another operators shift
SELECT DISTINCT
DATE(e.signedout) AS calendar_date,
+(UNIX_TIMESTAMP(e.signedout) MOD 86400) AS coverage_seconds
FROM
OperatorLogs e
LEFT JOIN
OperatorLogs o
ON o.signedin >= DATE(e.signedout)
AND o.signedin <= e.signedout
AND o.signedout > e.signedout
WHERE
o.signedin IS NULL
)
AS coverage_markers
GROUP BY
calendar_date
;
Feel free to test it with more rigorous data...
https://www.db-fiddle.com/f/4RgWVhcdNEro21rUksVdXD/0
(As a note, to make your sample data match your Excel image, your first shift should have started at 9am)
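If you are on MySQL 8.0+, here is an alternative sketch (not the answer above) that merges overlapping shifts per day with window functions and then sums the merged intervals. It assumes the columns are named SignedIn/SignedOut and leans on the same soft rule that shifts never span multiple days.
WITH marked AS (
    SELECT SignedIn, SignedOut,
           -- start a new "island" when this shift does not overlap any earlier shift that day
           CASE WHEN SignedIn <= MAX(SignedOut) OVER (
                        PARTITION BY DATE(SignedIn)
                        ORDER BY SignedIn
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                THEN 0 ELSE 1 END AS new_island
    FROM OperatorLogs
),
islands AS (
    SELECT SignedIn, SignedOut,
           SUM(new_island) OVER (PARTITION BY DATE(SignedIn) ORDER BY SignedIn) AS island_id
    FROM marked
)
SELECT calendar_date, SUM(island_hours) AS coverage_hours
FROM (
    SELECT DATE(SignedIn) AS calendar_date, island_id,
           TIME_TO_SEC(TIMEDIFF(MAX(SignedOut), MIN(SignedIn))) / 3600 AS island_hours
    FROM islands
    GROUP BY DATE(SignedIn), island_id
) AS per_island
GROUP BY calendar_date;
On the sample shifts described above (10-12, 1-3, 2-5) the merged intervals are 10-12 and 1-5, giving the expected 6 hours.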

MySQL 10 min average 1 to 10 min

I need to calculate the 10-minute average of my MySQL table, but the default average is not what I want. I have a query for the "default" average:
SELECT Date, convert((min(Time) DIV 1000)*1000,time) as Time, ROUND(AVG(Value),2) FROM RawData
GROUP BY Date, Time DIV 1000
How can I calculate average like this:
http://prntscr.com/8k4slp
And one more thing... I need to prevent the average from being calculated over incomplete intervals. How can I do this?
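Not from the original thread, but a minimal sketch of one way to do both, assuming Time is a TIME column and readings arrive once per minute (so a complete 10-minute bucket has exactly 10 rows):
SELECT
    Date,
    SEC_TO_TIME(FLOOR(TIME_TO_SEC(MIN(Time)) / 600) * 600) AS bucket_start,
    ROUND(AVG(Value), 2) AS avg_value
FROM RawData
GROUP BY Date, FLOOR(TIME_TO_SEC(Time) / 600)
HAVING COUNT(*) = 10;   -- drop incomplete intervals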

SQL - Calculating variable moving average over variable lengths

FIRST: This question is NOT a duplicate. I have asked this on here already and it was closed as a duplicate. While it is similar to other threads on stackoverflow, it is actually far more complex. Please read the post before assuming it is a duplicate.
I am trying to calculate variable moving averages crossover with variable dates.
That is: I want to prompt the user for 3 values and 1 option. The input is through a web front end so I can build/edit the query based on input or have multiple queries if needed.
X = 1st moving average term (N day moving average. Any number 1-N)
Y = 2nd moving average term. (N day moving average. Any number 1-N)
Z = Number of days back from the present to search for the occurrence of:
option = Over/Under: (> or <. X passing over Y, or X passing Under Y)
X day moving average passing over OR under Y day moving average
within the past Z days.
My database is structured:
tbl_daily_data
id
stock_id
date
adj_close
And:
tbl_stocks
stock_id
symbol
I have a btree index on:
daily_data(stock_id, date, adj_close)
stock_id
I am stuck on this query and having a lot of trouble writing it. If the variables were fixed it would seem trivial, but because X, Y, and Z are all 100% independent of each other (it could look, for example, for a 5-day moving average within the past 100 days, or a 100-day moving average within the past 5), I am having a lot of trouble coding it.
Please help! :(
Edit: I've been told some more context might be helpful?
We are creating an open stock analytic system where users can perform trend analysis. I have a database containing 3500 stocks and their price histories going back to 1970.
This query will be running every day in order to find stocks that match certain criteria
for example:
10 day moving average crossing over 20 day moving average within 5
days
20 day crossing UNDER 10 day moving average within 5 days
55 day crossing UNDER 22 day moving average within 100 days
But each user may be interested in a different analysis so I cannot just store the moving average with each row, it must be calculated.
I am not sure if I fully understand the question ... but something like this might help you get where you need to go: sqlfiddle
SET @X:=5;
SET @Y:=3;
SET @Z:=25;
SET @option:='under';
select * from (
SELECT stock_id,
datediff(current_date(), date) days_ago,
adj_close,
(
SELECT
AVG(adj_close) AS moving_average
FROM
tbl_daily_data T2
WHERE
(
SELECT
COUNT(*)
FROM
tbl_daily_data T3
WHERE
date BETWEEN T2.date AND T1.date
) BETWEEN 1 AND @X
) move_av_1,
(
SELECT
AVG(adj_close) AS moving_average
FROM
tbl_daily_data T2
WHERE
(
SELECT
COUNT(*)
FROM
tbl_daily_data T3
WHERE
date BETWEEN T2.date AND T1.date
) BETWEEN 1 AND @Y
) move_av_2
FROM
tbl_daily_data T1
where
datediff(current_date(), date) <= @Z
) x
where
case when @option ='over' and move_av_1 > move_av_2 then 1 else 0 end +
case when @option ='under' and move_av_2 > move_av_1 then 1 else 0 end > 0
order by stock_id, days_ago
Based on the answer by @Tom H here: How do I calculate a moving average using MySQL?
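For MySQL 8.0+ there is also a window-function way to express the same idea. This is a sketch rather than the answer above, with X = 5, Y = 3 and Z = 25 hard-coded (MySQL does not accept variables in window frame bounds, so your web front end would substitute the real values when building the query):
SELECT stock_id, date, ma_x, ma_y
FROM (
    SELECT stock_id, date, ma_x, ma_y,
           LAG(ma_x) OVER (PARTITION BY stock_id ORDER BY date) AS prev_x,
           LAG(ma_y) OVER (PARTITION BY stock_id ORDER BY date) AS prev_y
    FROM (
        SELECT stock_id, date,
               -- X-day and Y-day moving averages (X = 5, Y = 3 here)
               AVG(adj_close) OVER (PARTITION BY stock_id ORDER BY date
                                    ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS ma_x,
               AVG(adj_close) OVER (PARTITION BY stock_id ORDER BY date
                                    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS ma_y
        FROM tbl_daily_data
    ) averaged
) lagged
WHERE date >= CURRENT_DATE() - INTERVAL 25 DAY   -- Z = 25
  AND ma_x > ma_y                                -- X-day MA is above the Y-day MA today...
  AND prev_x <= prev_y;                          -- ...but was not on the previous day: a cross over
For the "under" option, flip the last two comparisons.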

subtract the data for every 5 minutes between two particular times

I have a problem with MySQL. I need to subtract the data between two particular times, for every 5 minutes, and then average the 5-minute data.
What I am doing now is:
select (avg(columnname)),convert((min(datetime) div 500)*500, datetime) + INTERVAL 5 minute as endOfInterval
from Databasename.Tablename
where datetime BETWEEN '2012-09-12 10:50:00' AND '2012-09-12 14:50:00'
group by datetime div 500;
It is the cumulative average.
Suppose I get 500 at 11 o'clock and 700 at 11:05, the average I need is (700-500)/5 = 40.
But now I am getting (500+700)/5 = 240.
I don't need the cumulative average.
Kindly help me.
For the kind of average you're talking about, you don't want to aggregate multiple rows using a GROUP BY clause. Instead, you want to compute your result using exactly two different rows from the same table. This calls for a self-join:
SELECT (b.columnname - a.columnname)/5, a.datetime, b.datetime
FROM Database.Tablename a, Database.Tablename b
WHERE b.datetime = a.datetime + INTERVAL 5 MINUTE
AND a.datetime BETWEEN '2012-09-12 10:50:00' AND '2012-09-12 14:45:00'
a and b refer to two different rows of the same table. The WHERE clause ensures that they are exactly 5 minutes apart.
If there is no second row at exactly that temporal distance, no resulting row will be included in the query result. If your table doesn't have data points exactly every five minutes, and you have to search for the suitable partner instead, then things become much more difficult. This answer might perhaps be adjusted for that use case. Or you might implement this at the application level, instead of on the database server.
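One way to handle unevenly spaced readings, sketched here as an assumption on my part (MySQL 8.0+, not part of the answer above): pair each row with its previous row via LAG() and divide by the actual elapsed minutes, rather than insisting on an exact 5-minute partner.
SELECT datetime,
       (columnname - prev_value)
           / TIMESTAMPDIFF(MINUTE, prev_datetime, datetime) AS rate_per_minute
FROM (
    SELECT datetime, columnname,
           LAG(columnname) OVER (ORDER BY datetime) AS prev_value,
           LAG(datetime)   OVER (ORDER BY datetime) AS prev_datetime
    FROM Databasename.Tablename
    WHERE datetime BETWEEN '2012-09-12 10:50:00' AND '2012-09-12 14:50:00'
) paired
WHERE prev_datetime IS NOT NULL;   -- the first row in the range has no predecessor
On the example above (500 at 11:00, 700 at 11:05) this returns (700-500)/5 = 40, as required.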