subtract the data for every 5 minutes between two particular times - mysql

I have a problem with MySQL: I need to subtract the data between two particular times, for every 5 minutes, and then average the difference over those 5 minutes.
What I am doing now is:
select (avg(columnname)),convert((min(datetime) div 500)*500, datetime) + INTERVAL 5 minute as endOfInterval
from Databasename.Tablename
where datetime BETWEEN '2012-09-12 10:50:00' AND '2012-09-12 14:50:00'
group by datetime div 500;
This gives the cumulative average.
Suppose I get 500 at 11:00 and 700 at 11:05; the average I need is (700-500)/5 = 40.
But now I am getting (500+700)/5 = 240.
I don't need the cumulative average.
Kindly help me.

For the kind of average you're talking about, you don't want to aggregate multiple rows using a GROUP BY clause. Instead, you want to compute your result using exactly two different rows from the same table. This calls for a self-join:
SELECT (b.columnname - a.columnname)/5, a.datetime, b.datetime
FROM Database.Tablename a, Database.Tablename b
WHERE b.datetime = a.datetime + INTERVAL 5 MINUTE
AND a.datetime BETWEEN '2012-09-12 10:50:00' AND '2012-09-12 14:45:00'
a and b refer to two different rows of the same table. The WHERE clause ensures that they are exactly 5 minutes apart.
If there is no second row at exactly that temporal distance, no resulting row will be included in the query result. If your table doesn't have data points exactly every five minutes, and you have to search for a suitable partner row instead, then things become much more difficult. This answer might perhaps be adjusted for that use case. Or you might implement this at the application level, instead of on the database server.
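The pairing logic of the self-join can be checked outside the database. Below is a minimal Python sketch, not the answer's own code: the `readings` values and the `five_minute_rates` helper are hypothetical, chosen only to reproduce the 500/700 example from the question.

```python
from datetime import datetime, timedelta

# Hypothetical cumulative readings keyed by timestamp (values are illustrative).
readings = {
    datetime(2012, 9, 12, 11, 0): 500,
    datetime(2012, 9, 12, 11, 5): 700,
    datetime(2012, 9, 12, 11, 10): 950,
    datetime(2012, 9, 12, 11, 20): 1000,  # no partner at 11:15 or 11:25
}

def five_minute_rates(rows, step_minutes=5):
    """Pair each reading with the one exactly step_minutes later, like the self-join."""
    step = timedelta(minutes=step_minutes)
    result = []
    for start in sorted(rows):
        end = start + step
        if end in rows:  # readings without an exact partner are dropped
            result.append((start, end, (rows[end] - rows[start]) / step_minutes))
    return result

rates = five_minute_rates(readings)
for start, end, rate in rates:
    print(start.time(), end.time(), rate)
```

As with the WHERE clause in the SQL, the 11:10 and 11:20 readings produce no output row because nothing sits exactly five minutes after them.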

Related

Efficient SQL Query to calculate portion of a row in half hourly time series that has occurred

I have a table that looks like this:
id  slot              total
1   2022-12-01T12:00  100
2   2022-12-01T12:30  150
3   2022-12-01T13:00  200
There's an index on slot already. The table has ~100mil rows (and a bunch more columns not shown here)
I want to sum the total up to the current moment in time (EDIT: WASN'T CLEAR INITIALLY, I WILL PROVIDE A LOWER SLOT BOUND, SO THE SUM WILL BE OVER SOME NUMBER OF DAYS/WEEKS, NOT OVER FULL TABLE). Let's say the time is currently 2022-12-01T12:45. If I run select * from my_table where slot < CURRENT_TIMESTAMP(),
then I get back records 1 and 2.
However, in my data, the records represent forecasted sales within a time slot. I want to find the forecasts as of 2022-12-01T12:45, and so I want to find the proportion of the half hour slot of record 2 that has elapsed, and return that proportion of the total.
As of 2022-12-01T12:45 (assuming minute granularity), 50% of row 2 has elapsed, so I would expect the total to return as 150 / 2 = 75.
My current query works, but is slow. What are some ways I can optimise this, or other approaches I can take?
Also, how can we extend this solution to be generalised to any interval frequency? Maybe tomorrow we change our forecasting model and the data comes in sporadically. The hardcoded 30 would not work in that case.
select sum(fraction * total) as t from (
    select total,
           LEAST(
               timestampdiff(
                   minute,
                   slot,
                   current_timestamp()
               ),
               30
           ) / 30 as fraction
    from my_table
    where slot <= current_timestamp()
) as elapsed;
Consider computing the full sum first, then subtracting the part of the last slot that has not yet elapsed. To keep the last row's own total available, I'd use a window function instead of an aggregation and limit the output to the last row.
SET @current_time = CURRENT_TIMESTAMP();
WITH cte AS (
    SELECT slot,
           SUM(total) OVER(ORDER BY slot) AS total,
           total AS rowtotal
    FROM my_table
    WHERE slot < @current_time
    ORDER BY slot DESC
    LIMIT 1
)
SELECT slot,
       total - (30 - TIMESTAMPDIFF(MINUTE, slot, @current_time)) / 30 * rowtotal AS total
FROM cte;
Note 1: An index on the slot field is likely to boost this query's performance.
Note 2: If the query runs over millions of rows, store the timestamp in a variable (or in another CTE) before the query runs, so that every part of the calculation uses the same instant.
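The prorating arithmetic can be sanity-checked with a short Python sketch. It rests on assumptions: the `prorated_total` helper and its `slot_length` parameter are hypothetical, and a fixed 30-minute slot is used here only because the sample data has one. Passing a different `slot_length` is one possible way to avoid the hardcoded 30, provided every slot has the same length.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (slot start, forecast total).
rows = [
    (datetime(2022, 12, 1, 12, 0), 100),
    (datetime(2022, 12, 1, 12, 30), 150),
    (datetime(2022, 12, 1, 13, 0), 200),
]

def prorated_total(rows, now, slot_length=timedelta(minutes=30)):
    """Sum totals of started slots, scaling each by the elapsed share of its slot."""
    total = 0.0
    for slot, amount in rows:
        if slot > now:
            continue  # slot has not started yet
        elapsed = min(now - slot, slot_length)  # same role as LEAST(..., 30)
        total += amount * (elapsed / slot_length)
    return total

now = datetime(2022, 12, 1, 12, 45)
print(prorated_total(rows, now))  # 100 in full plus half of 150
```

At 12:45 the first slot counts in full and the second contributes 150 / 2 = 75, matching the expectation stated in the question for row 2.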
Create a B-tree index on the slot column, since it has high selectivity.

MySQL - group by interval query optimisation

Some background first. We have a MySQL database with a "live currency" table. We use an API to pull the latest currency values for different currencies, every 5 seconds. The table currently has over 8 million rows.
Structure of the table is as follows:
id (INT 11 PK)
currency (VARCHAR 8)
value (DECIMAL
timestamp (TIMESTAMP)
Now we are trying to use this table to plot the data on a graph. We are going to have various different graphs, e.g: Live, Hourly, Daily, Weekly, Monthly.
I'm having a bit of trouble with the query. Using the Weekly graph as an example, I want to output data from the last 7 days, in 15 minute intervals. So here is how I have attempted it:
SELECT *
FROM currency_data
WHERE ((currency = 'GBP')) AND (timestamp > '2017-09-20 12:29:09')
GROUP BY UNIX_TIMESTAMP(timestamp) DIV (15 * 60)
ORDER BY id DESC
This outputs the data I want, but the query is extremely slow. I have a feeling the GROUP BY clause is the cause.
Also BTW I have switched off the sql mode 'ONLY_FULL_GROUP_BY' as it was forcing me to group by id as well, which was returning incorrect results.
Does anyone know of a better way of doing this query which will reduce the time taken to run the query?
You may want to create summary tables for each of the graphs you want to do.
If your data really is coming every 5 seconds, you can attempt something like:
SELECT *
FROM currency_data cd
WHERE currency = 'GBP' AND
timestamp > '2017-09-20 12:29:09' AND
UNIX_TIMESTAMP(timestamp) MOD (15 * 60) BETWEEN 0 AND 4
ORDER BY id DESC;
For both this query and your original query, you want an index on currency_data(currency, timestamp, id).
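To see why the MOD filter returns about one row per 15-minute interval, here is a small Python sketch. The timestamps are hypothetical: one reading every 5 seconds for an hour, deliberately offset by 3 seconds from a 15-minute boundary.

```python
# Hypothetical sample: one reading every 5 seconds for one hour, as Unix timestamps.
boundary = 1505908800          # divisible by 900, so it sits on a 15-minute boundary
start = boundary + 3           # readings are not aligned to the boundary
timestamps = list(range(start, start + 3600, 5))

# Same predicate as UNIX_TIMESTAMP(timestamp) MOD (15 * 60) BETWEEN 0 AND 4.
sampled = [ts for ts in timestamps if 0 <= ts % (15 * 60) <= 4]

print(len(timestamps), len(sampled))
```

Because the readings are 5 seconds apart, exactly one of them falls into each 5-second window at the start of a 15-minute interval. If the feed skips or doubles a reading, an interval can end up with zero or two rows, which is why the answer conditions this on the data really arriving every 5 seconds.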

Optimization of MySQL query

I'm using a MySQL database to store values from some energy measurement system. The problem is that the DB contains millions of rows, and the queries take somewhat long to complete. Are the queries optimal? What should I do to improve them?
The database table consists of rows with 15 columns each (t, UL1, UL2, UL3, PL1, PL2, PL3, P, Q1, Q2, Q3,CosPhi1, CosPhi2, CosPhi3, i), where t is time, P is total power and i is some identifier.
Seeing as I display the data in graphs grouped in different intervals (15 minutes, 1 hour, 1 day, 1 month), I want to group the queries as such.
As an example I have a graph that shows the kWh for every day in the current year. The query to gather the data goes like this:
SELECT t, SUM(P) as P
FROM table
WHERE i = 0 and t >= '2015-01-01 00:00:00'
GROUP BY DAY(t), MONTH(t)
ORDER BY t
The database has been gathering measurements for 13 days, and this query alone is already taking 2-3 seconds to complete. Those 13 days have added about 1-1.3 million rows to the db, as a new row gets added every second.
Is this query optimal?
I would actually create a secondary table that has a column for each DAY, and one for the total. Then, via a trigger, your insert into the detail table can update the secondary aggregate table. This way, you can sum the DAILY table which will be much quicker, and yet still have the per second table if you needed to look at the granular level details.
Having aggregate tables can be a common time-saver for querying, especially for read-only types of data, or data you know won't be changing. Then, if you want more granular detail such as hourly or 15 minute intervals, go directly to the raw data.
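The idea of keeping a daily summary up to date on every insert can be sketched in Python. This is only an analogy for the trigger approach, not MySQL trigger syntax: `daily_totals` stands in for a hypothetical summary table and `insert_measurement` for the insert plus its trigger.

```python
from collections import defaultdict
from datetime import datetime

daily_totals = defaultdict(float)  # stands in for the hypothetical daily summary table

def insert_measurement(detail_rows, t, p):
    """Store the raw row and update the daily total, as an insert trigger would."""
    detail_rows.append((t, p))
    daily_totals[t.date()] += p

detail = []
insert_measurement(detail, datetime(2015, 1, 1, 0, 0, 0), 1.5)
insert_measurement(detail, datetime(2015, 1, 1, 0, 0, 1), 2.0)
insert_measurement(detail, datetime(2015, 1, 2, 0, 0, 0), 4.0)

for day, total in sorted(daily_totals.items()):
    print(day, total)
```

The yearly graph then reads one row per day from the summary, while the per-second rows remain available for finer intervals.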
For this query:
SELECT t, SUM(P) as P
FROM table
WHERE i = 0 and t >= '2015-01-01 00:00:00'
GROUP BY DAY(t), MONTH(t)
ORDER BY t
The optimal index is a covering index: table(i, t, p).
2-3 seconds for 1+ million rows suggests that you already have an index.
You may want to consider DRapp's suggestion and use summary tables. In a few months, you will have so much data that historical queries could be taking a long time.
In the meantime, though, indexes and partitioning might provide sufficient performance for your needs.

Grouping by time ignoring the Date portion

I have a table that has a column called scores and another one called date_time.
I am trying to find out, for each 5 minute time increment, how many rows I have that are above a certain score. I want to ignore the date portion completely and just base this off of the time.
This is kind of like a stats program that displays your peak hours, the only difference being that I want it as detailed as 5 minute time segments.
I am still fairly new at MySQL and Google seems to be my best companion.
What I have found so far is:
SELECT id, score, date_time, COUNT(id)
FROM data
WHERE score >= 500
GROUP BY TIME(date_time) DIV 300;
Would this work, or is there a better way to do this?
I don't think your query would work. You need to do a bit more work to get the time rounded to 5 minute intervals. Something like:
SELECT SEC_TO_TIME(FLOOR(TIME_TO_SEC(time(date_time))/300)*300) as time5, COUNT(id)
FROM data
WHERE score >= 500
GROUP BY SEC_TO_TIME(FLOOR(TIME_TO_SEC(time(date_time))/300)*300)
ORDER BY time5;
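The rounding expression can be mirrored step by step in Python to check what it produces. This is a sketch under assumptions: the `five_minute_bucket` helper and the sample rows are hypothetical, and the 500 threshold is taken from the query above.

```python
from collections import Counter
from datetime import datetime

def five_minute_bucket(dt):
    """Round the time of day down to a multiple of 300 seconds, ignoring the date."""
    seconds = dt.hour * 3600 + dt.minute * 60 + dt.second  # TIME_TO_SEC
    floored = seconds // 300 * 300                          # FLOOR(x / 300) * 300
    return f"{floored // 3600:02d}:{floored % 3600 // 60:02d}:00"  # SEC_TO_TIME

# Hypothetical (date_time, score) rows spread over several days.
rows = [
    (datetime(2014, 1, 1, 9, 2, 10), 650),
    (datetime(2014, 1, 2, 9, 4, 59), 510),
    (datetime(2014, 1, 2, 9, 5, 0), 700),
    (datetime(2014, 1, 3, 9, 3, 0), 120),  # below the threshold, filtered out
]
counts = Counter(five_minute_bucket(dt) for dt, score in rows if score >= 500)

for bucket, n in sorted(counts.items()):
    print(bucket, n)
```

Rows from different days land in the same bucket because only the time of day enters the calculation.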

average rows in a column that are between 5 minutes

I would like to ask how I can take the average of rows in a column that fall within the same 5 minutes.
To be more precise, I have a table like this:
id  link_id  date                   speed
0   123      (24/4/2014 12:03:34)   45
1   123      (24/4/2014 12:04:34)   43
2   127      (24/4/2014 12:04:37)   50
3   123      (28/4/2014 12:03:34)   60
I would like to create a new table that holds the average speed for rows that have the same link_id and fall within the same 5 minutes.
In the example above, only the first two rows meet these requirements,
and I want a new table like this:
id  link_id  date                   speed
0   123      (24/4/2014 12:00:00)   44
2   127      (24/4/2014 12:00:00)   50
3   123      (28/4/2014 12:00:00)   60
Which query do I have to use to create a new table with those requirements?
Thank you in advance.
It is not clear what you mean by 'average of speed for rows that ... are between five minutes.' So I will guess.
I guess you want to compute the averages for each distinct five minute interval. For example, you want averages of all items with timestamps from 2014-04-24 12:00:00 to 2014-04-24 12:04:59, then another average for items with timestamps from 2014-04-24 12:05:00 to 2014-04-24 12:09:59, and so forth.
To do this, you need to start with an expression that will take any DATETIME value and round it down to the beginning of its five-minute interval. How do you get that?
First, this expression will round down a timestamp to the beginning of the minute in which it occurs:
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00')
This expression gives the number of minutes past the hour, modulo 5.
MINUTE(`date`)%5
So, this expression gives you the rounded-down DATETIME you need:
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE
Great. Now we need to use that in an aggregate query to compute the average speeds.
SELECT link_id,
       DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE AS five_min,
       AVG(speed) AS avg_speed
FROM mytable
GROUP BY link_id,
         DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE
ORDER BY link_id,
         DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE
This will do the trick you need done. There will be one row for each distinct link_id and five-minute interval of time. The time interval will be named by giving the time at which it begins. Each row will contain the average speed for observations in that time interval.
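To check the grouping against the sample rows from the question, here is a minimal Python sketch of the same round-down-then-average steps. The `floor_five_minutes` helper is a hypothetical stand-in for the DATE_FORMAT expression, not part of the original answer.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def floor_five_minutes(dt):
    """Drop the seconds, then subtract MINUTE % 5 minutes, as in the SQL expression."""
    return dt.replace(second=0, microsecond=0) - timedelta(minutes=dt.minute % 5)

# The four sample rows from the question: (link_id, date, speed).
rows = [
    (123, datetime(2014, 4, 24, 12, 3, 34), 45),
    (123, datetime(2014, 4, 24, 12, 4, 34), 43),
    (127, datetime(2014, 4, 24, 12, 4, 37), 50),
    (123, datetime(2014, 4, 28, 12, 3, 34), 60),
]

# GROUP BY link_id and the rounded-down timestamp.
groups = defaultdict(list)
for link_id, date, speed in rows:
    groups[(link_id, floor_five_minutes(date))].append(speed)

# AVG(speed) per group, ordered like the ORDER BY clause.
averages = {key: sum(v) / len(v) for key, v in sorted(groups.items())}

for (link_id, five_min), avg_speed in averages.items():
    print(link_id, five_min, avg_speed)
```

The three groups carry the averages 44, 60 and 50, the same values as in the table the question asks for.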
It's helpful when creating your specification for this kind of query to think very carefully about what you want each row of your result set to contain. If you do that, you will probably find that your query flows naturally from your specification.
Here's a more extensive writeup on how to do this sort of thing.
http://www.plumislandmedia.net/mysql/sql-reporting-time-intervals/