How to count entries in a mysql table grouped by time - mysql

I've found lots of not-quite answers to this question, but nothing I can base my rather limited SQL skills on...
I've got a gas meter which gives a pulse for every cm3 of gas used; the time each pulse happens is captured by a Pi and stored in a MySQL database. I'm trying to graph that data. To do so, I want to sum how many pulses are received in every n-length time period, where n may be 5 minutes for a graph covering a day, or up to 24 hours for a graph covering a year.
The data are in a table with two columns: a primary key/auto-increment called "pulse_ref", and "pulse_time", which stores a Unix timestamp of the time a pulse was received.
Can anyone suggest an SQL query to count how many pulses occurred, grouped into, say, 5-minute intervals?
Create table:
CREATE TABLE `gas_pulse` (
  `pulse_ref` int(11) NOT NULL AUTO_INCREMENT,
  `pulse_time` int(11) DEFAULT NULL,
  PRIMARY KEY (`pulse_ref`));
Populate some data:
INSERT INTO `gas_pulse` VALUES (1,1477978978),(2,1477978984),(3,1477978990),(4,1477978993),(5,1477979016),(6,1477979063),(7,1477979111),(8,1477979147),(9,1477979173),(10,1477979195),(11,1477979214),(12,1477979232),(13,1477979249),(14,1477979267),(15,1477979285),(16,1477979302),(17,1477979320),(18,1477979337),(19,1477979355),(20,1477979372),(21,1477979390),(22,1477979408),(23,1477979425),(24,1477979443),(25,1477979461),(26,1477979479),(27,1477979497),(28,1477979515),(29,1477979533),(30,1477979551),(31,1477979568),(32,1477979586),(33,1477980142),(34,1477980166),(35,1477981433),(36,1477981474),(37,1477981526),(38,1477981569),(39,1477981602),(40,1477981641),(41,1477981682),(42,1477981725),(43,1477981770),(44,1477981816),(45,1477981865),(46,1477981915),(47,1477981966),(48,1477982017),(49,1477982070),(50,1477982124),(51,1477982178),(52,1477982233),(53,1477988261),(54,1477988907),(55,1478001784),(56,1478001807),(57,1478002385),(58,1478002408),(59,1478002458),(60,1478002703),(61,1478002734),(62,1478002784),(63,1478002831),(64,1478002863),(65,1478002888),(66,1478002909),(67,1478002928),(68,1478002946),(69,1478002964),(70,1478002982),(71,1478003000),(72,1478003018),(73,1478003036),(74,1478003054),(75,1478003072),(76,1478003090),(77,1478003108),(78,1478003126),(79,1478003145),(80,1478003163),(81,1478003181),(82,1478003199),(83,1478003217),(84,1478003235),(85,1478003254),(86,1478003272),(87,1478003290),(88,1478003309),(89,1478003327),(90,1478003346),(91,1478003366),(92,1478003383),(93,1478003401),(94,1478003420),(95,1478003438),(96,1478003457),(97,1478003476),(98,1478003495),(99,1478003514),(100,1478003533),(101,1478003552),(102,1478003572),(103,1478003592),(104,1478003611),(105,1478003632),(106,1478003652),(107,1478003672),(108,1478003693),(109,1478003714),(110,1478003735),(111,1478003756),(112,1478003778),(113,1478003799),(114,1478003821),(115,1478003844),(116,1478003866),(117,1478003889),(118,1478003912),(119,1478003936),(120,1478003960),(121,1478003984),(122,1478004008),(123,1478004033),(124,1478004058),(125,1478004084),(126,1478004109),(127,1478004135),(128,1478004161),(129,1478004187),(130,1478004214),(131,1478004241),(132,1478004269),(133,1478004296),(134,1478004324),(135,1478004353),(136,1478004381),(137,1478004410),(138,1478004439),(139,1478004469),(140,1478004498),(141,1478004528),(142,1478004558),(143,1478004589),(144,1478004619),(145,1478004651),(146,1478004682),(147,1478004714),(148,1478004746),(149,1478004778),(150,1478004811),(151,1478004844),(152,1478004877),(153,1478004911),(154,1478004945),(155,1478004979),(156,1478005014),(157,1478005049),(158,1478005084),(159,1478005120),(160,1478005156),(161,1478005193),(162,1478005231),(163,1478005268),(164,1478005306),(165,1478005344),(166,1478005383),(167,1478005422),(168,1478005461),(169,1478005501),(170,1478005541),(171,1478005582),(172,1478005622),(173,1478005663),(174,1478005704),(175,1478005746),(176,1478005788),(177,1478005831),(178,1478005873),(179,1478005917),(180,1478005960),(181,1478006004),(182,1478006049),(183,1478006094),(184,1478006139),(185,1478006186),(186,1478006231),(187,1478006277),(188,1478010694),(189,1478010747),(190,1478010799),(191,1478010835),(192,1478010862),(193,1478010884),(194,1478010904),(195,1478010924),(196,1478010942),(197,1478010961),(198,1478010980),(199,1478010999),(200,1478011018),(201,1478011037),(202,1478011056),(203,1478011075),(204,1478011094),(205,1478011113),(206,1478011132),(207,1478011151),(208,1478011170),(209,1478011189),(210,1478011208),(211,1478011227),(212,1478011246),(213,1478011265),(214,147801
1285),(215,1478011304),(216,1478011324),(217,1478011344),(218,1478011363),(219,1478011383),(220,1478011403),(221,1478011423),(222,1478011443),(223,1478011464),(224,1478011485),(225,1478011506),(226,1478011528),(227,1478011549),(228,1478011571),(229,1478011593),(230,1478011616),(231,1478011638),(232,1478011662),(233,1478011685),(234,1478011708),(235,1478011732),(236,1478011757),(237,1478011782),(238,1478011807),(239,1478011832),(240,1478011858),(241,1478011885),(242,1478011912),(243,1478011939),(244,1478011967),(245,1478011996),(246,1478012025),(247,1478012054),(248,1478012086),(249,1478012115),(250,1478012146),(251,1478012178),(252,1478012210),(253,1478012244),(254,1478012277),(255,1478012312),(256,1478012347),(257,1478012382),(258,1478012419),(259,1478012456),(260,1478012494),(261,1478012531),(262,1478012570),(263,1478012609),(264,1478012649),(265,1478012689),(266,1478012730),(267,1478012771),(268,1478012813),(269,1478012855),(270,1478012898),(271,1478012941),(272,1478012984),(273,1478013028),(274,1478013072),(275,1478013117),(276,1478013163),(277,1478013209),(278,1478013255),(279,1478013302),(280,1478013350),(281,1478013399),(282,1478013449),(283,1478013500),(284,1478013551),(285,1478013604),(286,1478013658),(287,1478013714),(288,1478013771),(289,1478013830),(290,1478013891),(291,1478013954),(292,1478014019),(293,1478014086),(294,1478014156),(295,1478014228),(296,1478014301),(297,1478014373),(298,1478014446),(299,1478014518),(300,1478014591),(301,1478014664),(302,1478014736),(303,1478014809),(304,1478014882),(305,1478015377),(306,1478015422),(307,1478015480),(308,1478015543),(309,1478015608),(310,1478015676),(311,1478015740),(312,1478015803),(313,1478015864),(314,1478015921),(315,1478015977),(316,1478016030),(317,1478016081),(318,1478016129),(319,1478016176);

I assume you need to get the pulse count in n-minute (in your case, 5-minute) intervals. To achieve this, please try the following query:
SELECT
    COUNT(*) AS gas_pulse_count,
    FROM_UNIXTIME(pulse_time - MOD(pulse_time, 5 * 60)) AS from_time,
    FROM_UNIXTIME((pulse_time - MOD(pulse_time, 5 * 60)) + 5 * 60) AS to_time
FROM
    gas_pulse
GROUP BY from_time
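If a different bucket size is needed (the question mentions up to 24 hours for a year-long graph), the same pattern works with the interval expressed in seconds. A minimal sketch against the gas_pulse table above; listing to_time in the GROUP BY as well keeps the query valid when ONLY_FULL_GROUP_BY is enabled:

SELECT
    COUNT(*) AS gas_pulse_count,
    -- 86400 seconds = 24-hour buckets; use 300 for 5-minute buckets
    FROM_UNIXTIME(pulse_time - MOD(pulse_time, 86400)) AS from_time,
    FROM_UNIXTIME(pulse_time - MOD(pulse_time, 86400) + 86400) AS to_time
FROM
    gas_pulse
GROUP BY from_time, to_time;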

Related

What data type should I use in MySQL to store daily items and plot 1-year charts?

I am developing a web application and I want to plot a 1-year chart with daily data points.
The x-axis is time (date) and the y-axis is of numeric type.
MySQL version: 8.0 (or higher)
The database must store data points for multiple customers.
For each customer I want to show the last 365 data points (1-year data).
Each data point is a tuple: (date, int). For example: (2022/11/10, 35)
The chart displays data for one single customer at a time.
Every day a new data point is calculated and added to the customer dataset.
Every customer must contain up to 5 years of data points.
The number of customers is 1000.
Assuming customer is a foreign key (FK) to the Customers table, I have considered two options for the dataset.
Option A
| Primary Key | Customer(FK) | Date   | Value  |
|-------------|--------------|--------|--------|
| 1           | Customer 1   | Date 1 | Val1   |
| 2           | Customer 1   | Date 2 | Val2   |
| ...         | ...          | ...    | ...    |
| N           | Customer 1   | Date N | ValN   |
| N+1         | Customer 2   | Date 1 | ValN+1 |
| ...         | ...          | ...    | ...    |
| 2N          | Customer 2   | Date N | Val2N  |
Option B
Use a JSON type for the dataset
| Primary Key | Customer(FK) | Dataset   |
|-------------|--------------|-----------|
| 1           | Customer 1   | Dataset 1 |
| 2           | Customer 2   | Dataset 2 |
Where each dataset looks like:
((2022/01/01, 35), (2022/01/02, 17), ...., (2022/12/31, 42))
Comments:
My main interest is to plot the chart as fast as possible, and since data insert/update operations only happen once a day (for every customer), my question is:
Which option is better for data retrieval?
Right now I have around 50 customers and a 2-year data history, but I don't know how the database will perform when I increase both the number of customers and the number of years.
Additionally, I am using a JavaScript plotting library in the frontend so I was wondering whether the JSON data type approach could fit better for this purpose.
CREATE TABLE datapoints (
    c_id SMALLINT UNSIGNED NOT NULL,
    date DATE NOT NULL,
    datapoint SMALLINT/MEDIUMINT/INT [UNSIGNED]/FLOAT NOT NULL,  -- pick one type
    PRIMARY KEY(c_id, date)
) ENGINE=InnoDB;
Pick the smallest datatype that is appropriate for your values. For example, SMALLINT UNSIGNED takes only 2 bytes and allows non-negative values up to 64K. FLOAT is 4 bytes and has a big range and far more significant digits (about 7) than you can reasonably graph.
The main queries. First various ways to do the daily INSERT:
INSERT INTO datapoints (c_id, date, datapoint)
VALUES(?,?,?);
or
INSERT INTO datapoints (c_id, date, datapoint)
VALUES
(?,?,?),
(?,?,?), ...
(?,?,?); -- 1000 rows batched
or
LOAD DATA ...
Fetching for the graph:
SELECT date, datapoint
FROM datapoints
WHERE c_id = ...
AND date >= CURDATE() - INTERVAL 1 YEAR -- or whatever
ORDER BY date;
1.8M rows (probably under 1GB) is not very big. Still, I recommend the PRIMARY KEY be in that order and not involve an AUTO_INCREMENT. The INSERT(s) will poke into the table at 1000 places once a day. The SELECT (for graphing) will find all the data clustered together -- very fast.
If you will be keeping the data past the year, we can discuss things further. Meanwhile, to purge after 5 years, this will be slow, but it is only once a day:
DELETE FROM datapoints
WHERE date < CURDATE() - INTERVAL 5 YEAR;
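Since the frontend uses a JavaScript plotting library, it may also be worth noting that MySQL 8.0 can hand back the year of points as a single JSON document built from the normalized rows, so Option A does not preclude a JSON payload. A sketch, assuming the datapoints table above (the exact JSON shape is only an illustration, and the client may still want to sort, since JSON_ARRAYAGG does not guarantee ordering):

SELECT JSON_ARRAYAGG(JSON_ARRAY(date, datapoint)) AS dataset
FROM (
    SELECT date, datapoint
    FROM datapoints
    WHERE c_id = ?                               -- one customer at a time
      AND date >= CURDATE() - INTERVAL 1 YEAR
    ORDER BY date
) AS d;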

What is the best way to handle millions of rows inside the Visits table?

According to this question, the answer is correct and made the queries better, but it does not solve the whole problem.
CREATE TABLE `USERS` (
`ID` char(255) COLLATE utf8_unicode_ci NOT NULL,
`NAME` char(255) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
There are only 5 rows inside the USERS table.
| ID                           | NAME |
|------------------------------|------|
| C9XzpOxWtuh893z1GFB2sD4BIko2 | ...  |
| I2I7CZParyMatRKnf8NiByujQ0F3 | ...  |
| EJ12BBKcjAr2I0h0TxKvP7uuHtEg | ...  |
| VgqUQRn3W6FWAutAnHRg2K3RTvVL | ...  |
| M7jwwsuUE156P5J9IAclIkeS4p3L | ...  |
CREATE TABLE `VISITS` (
`USER_ID` char(255) COLLATE utf8_unicode_ci NOT NULL,
`VISITED_IN` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
KEY `USER_ID` (`USER_ID`,`VISITED_IN`),
CONSTRAINT `VISITS_ibfk_1` FOREIGN KEY (`USER_ID`) REFERENCES `USERS` (`ID`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
The indexes inside the VISITS table:
| Keyname | Type  | Unique | Packed | Column     | Cardinality | Collation | Null | Comment |
|---------|-------|--------|--------|------------|-------------|-----------|------|---------|
| USER_ID | BTREE | No     | No     | USER_ID    | 3245        | A         | No   |         |
|         |       |        |        | VISITED_IN | 5283396     | A         | No   |         |
There are 5,740,266 rows inside the VISITS table:
C9XzpOxWtuh893z1GFB2sD4BIko2 = 4,359,264 profile visits
I2I7CZParyMatRKnf8NiByujQ0F3 = 1,237,286 profile visits
EJ12BBKcjAr2I0h0TxKvP7uuHtEg = 143,716 profile visits
VgqUQRn3W6FWAutAnHRg2K3RTvVL = 0 profile visits
M7jwwsuUE156P5J9IAclIkeS4p3L = 0 profile visits
The time taken for the queries (the seconds vary with the number of rows):
SELECT COUNT(*) FROM VISITS WHERE USER_ID = 'C9XzpOxWtuh893z1GFB2sD4BIko2'
Before applying Rick James' answer, the query took between 90 and 105 seconds.
After applying Rick James' answer, the query took between 55 and 65 seconds.
SELECT COUNT(*) FROM VISITS WHERE USER_ID = 'I2I7CZParyMatRKnf8NiByujQ0F3'
Before applying Rick James' answer, the query took between 90 and 105 seconds.
After applying Rick James' answer, the query took between 20 and 30 seconds.
SELECT COUNT(*) FROM VISITS WHERE USER_ID = 'EJ12BBKcjAr2I0h0TxKvP7uuHtEg'
Before applying Rick James' answer, the query took between 90 and 105 seconds.
After applying Rick James' answer, the query took between 4 and 8 seconds.
SELECT COUNT(*) FROM VISITS WHERE USER_ID = 'VgqUQRn3W6FWAutAnHRg2K3RTvVL'
Before applying Rick James' answer, the query took between 90 and 105 seconds.
After applying Rick James' answer, the query took between 1 and 3 seconds.
SELECT COUNT(*) FROM VISITS WHERE USER_ID = 'M7jwwsuUE156P5J9IAclIkeS4p3L'
Before applying Rick James' answer, the query took between 90 and 105 seconds.
After applying Rick James' answer, the query took between 1 and 3 seconds.
As you can see, before applying the index it took between 90 and 105 seconds to count the visits of a specific user, even if the user had only a few rows (visits).
After applying the index things became better, but the problem is:
If I visit the C9XzpOxWtuh893z1GFB2sD4BIko2 profile, it will take between 55 and 65 seconds to get the profile visits.
If I visit the I2I7CZParyMatRKnf8NiByujQ0F3 profile, it will take between 20 and 30 seconds to get the profile visits.
Etc...
The user who has only a few rows (visits) is lucky, because his profile will load faster.
I could ignore all of the above and add a column to the USERS table that counts the user's visits, incrementing it whenever a new visit is caught instead of creating millions of rows, but that will not work for me because I allow the user to filter the visits like this:
Last 60 minutes
Last 24 hours
Last 7 days
Last 30 days
Last 6 months
Last 12 months
All-time
What should I do?
The problem is that you are evaluating, and continually re-evaluating, very large row counts that are actually part of history and can never change. You cannot count these rows every time, because that takes too long. You want to provide counts for:
Last 60 minutes
Last 24 hours
Last 7 days
Last 30 days
Last six months
All-time
You need four tables:
Table 1: A small, fast table holding the records of visits today and yesterday
Table 2: An even smaller, very fast table holding counts for the period 'day before yesterday ("D-2") to "D-7"' in a field 'D2toD7', plus fields for the periods 'D8toD30', 'D31toD183' and 'D184andEarlier'
Table 3: A table holding the visit counts for each user on each day
Table 4: The very large and slow table you already have, with each visit logged against a timestamp
You can then get the 'Last 60 minutes' and 'Last 24 hours' counts by doing a direct query on Table 1, which will be very fast.
‘Last 7 days’ is the count of all records in Table 1 (for your user) plus the D2toD7 value (for your user) in Table 2.
‘Last 30 days’ is the count of all records in Table 1 (for your user) plus D2toD7, plus D8toD30.
‘Last six months’ is Table 1 plus D2toD7, plus D8toD30, plus D31toD183.
‘All-time’ is Table 1 plus D2toD7, plus D8toD30, plus D31toD183, plus D184andEarlier.
I’d be running php scripts to retrieve these values – there’s no need to try and do it all in one complex query. A few, even several, very quick hits on the database, collect up the numbers, return the result. The script will run in very much less than one second.
So, how do you keep the counts in Table 2 updated? This is where you need Table 3, which holds counts of visits by each user on each day. Create Table 3 and populate it with COUNT values for the data in your enormous table of all visits, GROUP BY User and Date, so you have the number of visits by each user on each day. You only need to create and populate Table 3 once.
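A sketch of that one-off build, using illustrative names (daily_visits standing in for "Table 3") and the VISITS columns from the question:

-- "Table 3": visit counts per user per day
CREATE TABLE daily_visits (
    user_id     CHAR(255) COLLATE utf8_unicode_ci NOT NULL,
    visit_date  DATE NOT NULL,
    visit_count INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, visit_date)
) ENGINE=InnoDB;

-- One-off population from the big VISITS table
INSERT INTO daily_visits (user_id, visit_date, visit_count)
SELECT USER_ID, DATE(VISITED_IN), COUNT(*)
FROM VISITS
GROUP BY USER_ID, DATE(VISITED_IN);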
You now need a CRON job/script, or similar, to run once a day and retire the 'day before yesterday' rows from Table 1. This script needs to:
Identify the counts of visits for each user the day before yesterday
Insert those counts in Table 3 with the ‘day before yesterday’ date.
Add the count values to the ‘D2toD7’ values for each user in Table 2.
Delete the 'day before yesterday' rows from Table 1.
Look up the value for (what just became) D8 for each user in Table 3. Subtract this value from the ‘D2toD7’ value for each user.
For each of the ‘D8toD30’, ‘D31toD183’ etc. fields, increment by the day that has now entered the time period and decrement by the day that has dropped out of it, using the values stored in Table 3 (see the sketch below).
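As a sketch of one of those daily adjustments (again with illustrative names: period_counts standing in for "Table 2", daily_visits for "Table 3"), the D2toD7 maintenance could look like this; the other window fields follow the same pattern:

-- Add the day that has just entered the D2..D7 window (D-2)
UPDATE period_counts p
JOIN daily_visits d
  ON d.user_id = p.user_id
 AND d.visit_date = CURDATE() - INTERVAL 2 DAY
SET p.D2toD7 = p.D2toD7 + d.visit_count;

-- Subtract the day that has just left it (now D-8)
UPDATE period_counts p
JOIN daily_visits d
  ON d.user_id = p.user_id
 AND d.visit_date = CURDATE() - INTERVAL 8 DAY
SET p.D2toD7 = p.D2toD7 - d.visit_count;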
Remember to keep a sense of proportion; a period of 183 days approximates to six months well enough for any real-world visit counting purpose.
Overview: you cannot count millions of rows quickly. Use the fact that these are historical figures that will never change. Because you have Table 1 for the up-to-the-minute counts, you only need to update the historic period counts once a day. Multiple (even dozens of) very, very fast queries will get you accurate results very quickly.
This may not be the answer, but a suggestion.
If you do not require real-time data, can't we run a scheduler and insert these into a summary table every x minutes? Then we can access that summary table for your count.
Note: We can add a sync time column to your table if you need a time-wise login count. (Then your summary table also grows dynamically.)
Table column ex:
PK_Column, user ID, Numb of visit, sync_time
We can use an asynchronous (reactive) implementation for your front end. That means the data will load after some time, but the user will never experience that delay in his work.
Create a summary table, and every day at 12.00 AM run a job that puts the user-wise and date-wise last-visited summary into that table.
user_visit_Summary Table:
PK_Column, User ID, Number_of_Visites, VISIT_Date
Note: Create indexes for User ID and the Date fields
When you're retrieving the data, you're going to access it with a DB query like:
SELECT COUNT(*) + (SELECT Number_of_Visites FROM user_visit_Summary
                   WHERE User_ID = xxx AND VISIT_Date <= ['DATE 12:00 AM' - 1]
                   ORDER BY PK_Column DESC LIMIT 1) AS old_visits
FROM VISITS
WHERE USER_ID = xxx AND VISITED_IN > 'DATE 12:00 AM';
For any query of a day or longer, use a Summary table.
That is, build and maintain a Summary table with 3 columns: user_id, date, count; PRIMARY KEY(user_id, date). For "all time" and "last month", the queries will be
SELECT SUM(count) FROM summary WHERE user_id='...';
SELECT SUM(count) FROM summary
WHERE user_id='...'
  AND date >= CURDATE() - INTERVAL 1 MONTH;
At midnight each night, roll your current table up into one row per user in the summary table, then clear the table. This table will continue to be used for shorter timespans.
This achieves speed for every user for every time range.
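A minimal sketch of that nightly roll-up, assuming the raw hits sit in a small table called recent (column names are illustrative) and the summary table has the (user_id, date, count) layout described above:

SET @cutoff = CURDATE();              -- today's midnight

-- Fold completed days into the summary
INSERT INTO summary (user_id, `date`, `count`)
SELECT user_id, DATE(visited_in), COUNT(*)
FROM recent
WHERE visited_in < @cutoff
GROUP BY user_id, DATE(visited_in)
ON DUPLICATE KEY UPDATE `count` = `count` + VALUES(`count`);

-- Then clear what has been rolled up
DELETE FROM recent
WHERE visited_in < @cutoff;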
But, there is a "bug". I am forcing "day"/"week"/etc to be midnight to midnight, and not allowing you to really say "the past 24 hours".
I suggest the following compromise for that "bug":
For long timespans, use the summary table, plus count today's hits from the other table.
For allowing "24 hours" to reach into yesterday, change the other table to reach back to yesterday morning. That is, purge only after 24 hours, not 1 calendar day.
To fetch all counters at once, do all the work in subqueries. There are two approaches, probably equally fast, but the result is either in rows or columns:
-- rows:
SELECT 'hour', COUNT(*) FROM recent ...
UNION ALL
SELECT '24 hr', COUNT(*) FROM recent ...
UNION ALL
SELECT 'month', SUM(count) FROM summary ...
UNION ALL
SELECT 'all', SUM(count) FROM summary ...
;
-- columns:
SELECT
( SELECT COUNT(*) FROM recent ... ) AS 'hour',
( SELECT COUNT(*) FROM recent ... ) AS '24 hr',
( SELECT SUM(count) FROM summary ... ) AS 'last month',
( SELECT SUM(count) FROM summary ... ) AS 'all time'
;
The "..." is
WHERE user_id = '...'
AND datetime >= ... -- except for "all time"
There is an advantage in rolling the several queries into a single query (either way) -- This avoids multiple round trips to the server and multiple invocations of the Optimizer.
forpas provided another approach https://stackoverflow.com/a/72424133/1766831 but it needs to be adjusted to reach into two different tables.

MySQL - group by interval query optimisation

Some background first. We have a MySQL database with a "live currency" table. We use an API to pull the latest currency values for different currencies, every 5 seconds. The table currently has over 8 million rows.
Structure of the table is as follows:
id (INT 11 PK)
currency (VARCHAR 8)
value (DECIMAL)
timestamp (TIMESTAMP)
Now we are trying to use this table to plot the data on a graph. We are going to have various different graphs, e.g: Live, Hourly, Daily, Weekly, Monthly.
I'm having a bit of trouble with the query. Using the Weekly graph as an example, I want to output data from the last 7 days, in 15 minute intervals. So here is how I have attempted it:
SELECT *
FROM currency_data
WHERE ((currency = 'GBP')) AND (timestamp > '2017-09-20 12:29:09')
GROUP BY UNIX_TIMESTAMP(timestamp) DIV (15 * 60)
ORDER BY id DESC
This outputs the data I want, but the query is extremely slow. I have a feeling the GROUP BY clause is the cause.
Also BTW I have switched off the sql mode 'ONLY_FULL_GROUP_BY' as it was forcing me to group by id as well, which was returning incorrect results.
Does anyone know of a better way of doing this query which will reduce the time taken to run the query?
You may want to create summary tables for each of the graphs you want to do.
If your data really is coming every 5 seconds, you can attempt something like:
SELECT *
FROM currency_data cd
WHERE currency = 'GBP' AND
timestamp > '2017-09-20 12:29:09' AND
UNIX_TIMESTAMP(timestamp) MOD (15 * 60) BETWEEN 0 AND 4
ORDER BY id DESC;
For both this query and your original query, you want an index on currency_data(currency, timestamp, id).
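As a sketch of the summary-table idea (all names here are illustrative), a 15-minute rollup for the weekly graph could live in its own table and be refreshed periodically, so the plotting query reads a few hundred rows instead of millions:

-- Hypothetical 15-minute summary table
CREATE TABLE currency_data_15min (
    currency     VARCHAR(8)    NOT NULL,
    bucket_start DATETIME      NOT NULL,
    `value`      DECIMAL(18,6) NOT NULL,   -- match the precision of currency_data.value
    PRIMARY KEY (currency, bucket_start)
) ENGINE=InnoDB;

-- Periodic refresh (here: averaging each 15-minute bucket over the last day)
INSERT INTO currency_data_15min (currency, bucket_start, `value`)
SELECT currency,
       FROM_UNIXTIME(UNIX_TIMESTAMP(timestamp) DIV 900 * 900) AS bucket_start,
       AVG(`value`)
FROM currency_data
WHERE timestamp >= NOW() - INTERVAL 1 DAY
GROUP BY currency, bucket_start
ON DUPLICATE KEY UPDATE `value` = VALUES(`value`);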

Retrieve one out of every n records

I have a table containing thousands of records representing the temperature of a room in a certain moment. Up to now I have been rendering a client side graph of the temperature with JQuery. However, as the amount of records increases, I think it makes no sense to provide so much data to the view, if it is not going to be able to represent them all in a single graph.
I would like to know if there exists a single MySQL query that returns one out of every n records in the table. If so, I think I could get a representative sample of the temperatures measured during a certain lapse of time.
Any ideas? Thanks in advance.
Edit: add table structure.
CREATE TABLE IF NOT EXISTS `temperature` (
`nid` int(10) unsigned NOT NULL COMMENT 'Node identifier',
`temperature` float unsigned NOT NULL COMMENT 'Temperature in Celsius degrees',
`timestamp` int(10) unsigned NOT NULL COMMENT 'Unix timestamp of the temperature record',
PRIMARY KEY (`nid`,`timestamp`)
)
You could do this, where the subquery is your query, and you add a row number to it:
SET @rows = 0;
SELECT * FROM (
    SELECT @rows := @rows + 1 AS rowNumber, nid, temperature, `timestamp`
    FROM temperature
) yourQuery
WHERE MOD(rowNumber, 5) = 0
The MOD picks every 5th row; the 5 here is your n, so the 5th row, then the 10th, 15th, etc.
Not really sure what you're asking, but you have multiple options.
You can limit your results to n (n representing the number of temperatures you want to display),
just a simple query with the limit in the end:
select * from tablename limit 1000
You could use a time/date constraint so you display only the results of the last n days.
Here is an example that uses date functions. The following query selects all rows with a date_col value from within the last 30 days:
mysql> SELECT something FROM tbl_name
-> WHERE DATE_SUB(CURDATE(),INTERVAL 30 DAY) <= date_col;
You could select an average temperature for a certain period; the shorter the period, the more results you'll get. You can group by date, yearweek, month, etc. to "create the periods".
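For that last idea, here is a sketch against the temperature table shown above, averaging per hour (the node id filter and the one-hour bucket are only examples):

SELECT FROM_UNIXTIME(`timestamp` - MOD(`timestamp`, 3600)) AS hour_start,
       AVG(temperature) AS avg_temperature
FROM temperature
WHERE nid = 1                          -- hypothetical node id
GROUP BY hour_start
ORDER BY hour_start;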

How to improve wind data SQL query performance

I'm looking for help on how to optimize (if possible) the performance of a SQL query used for reading wind information (see below) by changing the e.g. the database structure, query or something else?
I use a hosted database to store a table with more than 800,000 rows with wind information (speed and direction). New data is added each minute from an anemometer. The database is accessed using a PHP script which creates a web page for plotting the data using Google's visualization API.
The web page takes approximately 15 seconds to load. I've added some time measurements in both the PHP and Javascript part to profile the code and find possible areas for improvements.
One part where I hope to improve is the following query which takes approximately 4 seconds to execute. The purpose of the query is to group 15 minutes of wind speed (min/max/mean) and calculate the mean value and total min/max during this period of measurements.
SELECT AVG(d_mean) AS group_mean,
       MAX(d_max) AS group_max,
       MIN(d_min) AS group_min,
       dir,
       FROM_UNIXTIME(MAX(dt),'%Y-%m-%d %H:%i') AS group_dt
FROM (
    SELECT @i := @i + 1,
           FLOOR(@i / 15) AS group_id,
           CAST(mean AS DECIMAL(3,1)) AS d_mean,
           CAST(min AS DECIMAL(3,1)) AS d_min,
           CAST(max AS DECIMAL(3,1)) AS d_max,
           dir,
           UNIX_TIMESTAMP(STR_TO_DATE(dt, '%Y-%m-%d %H:%i')) AS dt
    FROM `table`, (SELECT @i := -1) VAR_INIT
    ORDER BY id DESC
) AS T
GROUP BY group_id
LIMIT 0, 360
...
$oResult = mysql_query($sSQL);
The table has the following structure:
1 ID int(11) AUTO_INCREMENT
2 mean varchar(5) utf8_general_ci
3 max varchar(5) utf8_general_ci
4 min varchar(5) utf8_general_ci
5 dt varchar(20) utf8_general_ci // Date and time
6 dir varchar(5) utf8_general_ci
The following setup is used:
Database: MariaDB, 5.5.42-MariaDB-1~wheezy
Database client version: libmysql - 5.1.66
PHP version: 5.6
PHP extension: mysqli
I strongly agree with the comments so far -- Cleanse the data as you put it into the table.
Once you have done the cleansing, let's avoid the subquery by doing...
SELECT MIN(dt) as 'Start of 15 mins',
FORMAT(AVG(mean), 1) as 'Avg wind speed',
...
FROM table
GROUP BY FLOOR(UNIX_TIMESTAMP(dt) / 900)
ORDER BY FLOOR(UNIX_TIMESTAMP(dt) / 900);
I don't understand the purpose of the LIMIT. I'll guess that you want to see a few days at a time. For that, I recommend you add the following (after cleansing) between the FROM and the GROUP BY:
WHERE dt >= '2015-04-10'
AND dt < '2015-04-10' + INTERVAL 7 DAY
That would show 7 days, starting '2015-04-10' morning.
In order to handle a table of 800K, you would decidedly need (again, after cleansing):
INDEX(dt)
To cleanse the 800K rows, there are multiple approaches. I suggest creating a new table, copy the data in, test, and eventually swap over. Something like...
CREATE TABLE new (
dt DATETIME,
mean FLOAT,
...
PRIMARY KEY(dt) -- assuming you have only one row per minute?
) ENGINE=InnoDB;
INSERT INTO new (dt, mean, ...)
SELECT str_to_date(...),
mean, -- I suspect that the CAST is not needed
...;
Write the new select and test it.
By now new is missing the newer rows. You can either rebuild it and hope to finish everything in your one minute window, or play some other game. Let us know if you want help there.
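For the eventual swap-over, one option is an atomic RENAME TABLE once the new table has caught up (a sketch; "table" stands for the original table name used above):

-- Atomically swap the cleansed table into place, keeping the old data as a backup
RENAME TABLE `table` TO `table_old`,
             `new`   TO `table`;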