Computing average values over sections of date/time - mysql

Problem:
I have a database of sensor readings with a timestamp for the time the sensor was read. Basically it looks like this:
Sensor | Timestamp | Value
Now I want to make graphs out of this data, several different ones: say one for the last day, one for the last week and one for the last month. Each graph has a different resolution: 1 minute for the day graph, one hour for the week graph, and one day (or a quarter of a day) for the month graph.
So I would like an output that is the average over each resolution interval (e.g. Day = average over each minute, Week = average over each hour, and so on).
Ex:
Sensor | Start | End | Average
How do I do this easily and quickly in MySQL? I suspect it involves creating a temporary table of sorts and joining the sensor data with it to get the average values of the sensor, but my knowledge of MySQL is limited at best.
Is there a really clever way to do this?

SELECT DAY(Timestamp), HOUR(Timestamp), MINUTE(Timestamp), AVG(value)
FROM mytable
GROUP BY
DAY(Timestamp), HOUR(Timestamp), MINUTE(Timestamp) WITH ROLLUP
The WITH ROLLUP clause produces extra rows with the aggregate for each HOUR and each DAY. For example, with COUNT(*) on four sample timestamps:
SELECT DAY(ts), HOUR(ts), MINUTE(ts), COUNT(*)
FROM (
SELECT CAST('2009-06-02 20:00:00' AS DATETIME) AS ts
UNION ALL
SELECT CAST('2009-06-02 20:30:00' AS DATETIME) AS ts
UNION ALL
SELECT CAST('2009-06-02 21:30:00' AS DATETIME) AS ts
UNION ALL
SELECT CAST('2009-06-03 21:30:00' AS DATETIME) AS ts
) q
GROUP BY
DAY(ts), HOUR(ts), MINUTE(ts) WITH ROLLUP
DAY, HOUR, MINUTE, COUNT(*)
2, 20, 0, 1
2, 20, 30, 1
2, 20, NULL, 2
2, 21, 30, 1
2, 21, NULL, 1
2, NULL, NULL, 3
3, 21, 30, 1
3, 21, NULL, 1
3, NULL, NULL, 1
NULL, NULL, NULL, 4
Here the row 2, 20, NULL, 2 means that COUNT(*) is 2 for DAY = 2, HOUR = 20, summed over all minutes.
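To show where the super-aggregate rows come from, here is a small Python sketch (not MySQL) that rebuilds the COUNT(*) result above for the same four timestamps. None stands in for the NULL that ROLLUP emits, and rollup_counts is an illustrative name. The sketch only models the grouping levels, not the row order MySQL returns.

```python
from collections import Counter
from datetime import datetime

# The four sample timestamps from the query above.
stamps = [
    datetime(2009, 6, 2, 20, 0),
    datetime(2009, 6, 2, 20, 30),
    datetime(2009, 6, 2, 21, 30),
    datetime(2009, 6, 3, 21, 30),
]

def rollup_counts(ts_list):
    """Mimic GROUP BY DAY, HOUR, MINUTE WITH ROLLUP for COUNT(*)."""
    counts = Counter()
    for ts in ts_list:
        counts[(ts.day, ts.hour, ts.minute)] += 1  # detail row
        counts[(ts.day, ts.hour, None)] += 1       # per-hour subtotal
        counts[(ts.day, None, None)] += 1          # per-day subtotal
        counts[(None, None, None)] += 1            # grand total
    return counts

counts = rollup_counts(stamps)
print(counts[(2, 20, None)])        # 2
print(counts[(None, None, None)])   # 4
```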

Not quite the result table you wanted, but here's a starter for a 1-minute resolution:
SELECT sensor, MINUTE(timestamp), AVG(value)
FROM mytable
WHERE timestamp >= '2009-06-02 20:00:00' AND timestamp < '2009-06-02 21:00:00' -- limit to a single clock hour
GROUP BY sensor, MINUTE(timestamp)

I've used code very similar to this (untested, but it's taken from working code).
Set the variables:
$seconds = 3600;
$start = mktime(...); // say 2 hrs ago
$end = .... // 1 hour after $start
Then run the query:
SELECT MAX(`when`) AS top_When, MIN(`when`) AS low_When,
ROUND(AVG(sensor)) AS Avg_S,
TIMESTAMPDIFF(SECOND, MIN(`when`), MAX(`when`)) AS dur, /* the duration in seconds of the actual period */
FLOOR(UNIX_TIMESTAMP(`when`) / $seconds) * $seconds AS Epoch /* start of the bucket, in Unix seconds */
FROM `sensor_stats`
WHERE `when` >= FROM_UNIXTIME($start) AND `when` <= FROM_UNIXTIME($end) AND duration=30
GROUP BY Epoch
The advantage of this is that $seconds can be any bucket length you want (60 for one minute, 3600 for one hour, 21600 for a quarter of a day). You are not limited to the units that HOUR() or MINUTE() return. Buckets are aligned to multiples of $seconds since the Unix epoch.
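The bucketing in the Epoch column can be sketched in Python (not MySQL) to show which readings fall into which period. The readings below are hypothetical (unix timestamp, value) pairs, and bucket_averages is an illustrative name.

```python
from collections import defaultdict

def bucket_averages(readings, seconds):
    """Average (unix_ts, value) pairs per bucket of `seconds` length.

    Mirrors FLOOR(UNIX_TIMESTAMP(`when`) / $seconds) * $seconds: each reading
    goes to the bucket starting at the nearest lower multiple of `seconds`.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in readings:
        epoch = (ts // seconds) * seconds
        sums[epoch] += value
        counts[epoch] += 1
    return {epoch: sums[epoch] / counts[epoch] for epoch in sorted(sums)}

readings = [(0, 10.0), (30, 20.0), (3599, 30.0), (3600, 40.0), (7300, 50.0)]
print(bucket_averages(readings, 3600))   # {0: 20.0, 3600: 40.0, 7200: 50.0}
print(bucket_averages(readings, 21600))  # quarter-day buckets: {0: 30.0}
```

Changing only the bucket length switches between the day, week, and month resolutions the question asks for.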

Related

How to SELECT all rows within a certain date/time range with a certain timestamp step size in MySQL?

I have a table that contains sensor data with a column timestamp that holds the unix timestamp of the time the sensor measurement has been taken.
Now I would like to SELECT all measurements within a certain date/time range with a specific time step.
I figured out the first part myself, as you can see in the code snippet below.
// With $date_start and $date_stop in the format: '2010-10-01 12:00:00'
$result = mysqli_query($connection, "SELECT sensor_1
FROM sensor_table
WHERE timestamp >= UNIX_TIMESTAMP($date_start)
AND timestamp < UNIX_TIMESTAMP($date_stop)
ORDER BY timestamp");
Now is there a convenient way in MySQL to include a time step size into the same SELECT query?
My table contains thousands of measurements over months with one measurement taken every 5 seconds.
Now let's say I would like to SELECT measurements between 2010-10-01 12:00:00 and 2010-10-02 12:00:00, but within this range only one measurement every 10 minutes (my table contains measurements taken every 5 seconds).
Any smart ideas how to solve this in a single query?
(also other ideas are very welcome :))
Since you take one measurement every 5 seconds, the first measurement in each 600-second step can be at most 4 seconds after the start of that step. So we keep the rows whose offset from $date_start, modulo 600, is between 0 and 4. This gives one entry every 600 seconds, allowing for some discrepancy from clock to clock.
SELECT sensor_1
FROM sensor_table
WHERE timestamp >= UNIX_TIMESTAMP($date_start)
AND
timestamp < UNIX_TIMESTAMP($date_stop)
AND
((timestamp - UNIX_TIMESTAMP($date_start)) % 600) BETWEEN 0 AND 4
ORDER BY timestamp;
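The modulo filter can be checked with a Python simulation (not MySQL). The data is hypothetical: one reading every 5 seconds for 24 hours, with the sensor clock 3 seconds off the requested start, which is assumed to be 2010-10-01 12:00:00 UTC. every_step is an illustrative name.

```python
def every_step(timestamps, start, stop, step=600, tolerance=4):
    """Keep timestamps in [start, stop) whose offset from start, modulo step,
    lies in 0..tolerance, like the WHERE clause of the query above."""
    return [
        ts for ts in timestamps
        if start <= ts < stop and 0 <= (ts - start) % step <= tolerance
    ]

start = 1285934400           # 2010-10-01 12:00:00 UTC in unix seconds
stop = start + 24 * 3600
samples = range(start + 3, stop, 5)
picked = every_step(samples, start, stop)
print(len(picked))            # 144, one per 10 minutes
print(picked[1] - picked[0])  # 600
```

If the readings drift so that no sample lands in the 0..4 window, that step is skipped silently. The correlated-subquery answer that picks the first reading per 10-minute period does not have that weakness.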
It is not elegant, but you can do:
SELECT s.sensor_1
FROM sensor_table s
WHERE s.timestamp >= UNIX_TIMESTAMP($date_start) AND
s.timestamp < UNIX_TIMESTAMP($date_stop) AND
s.timestamp = (SELECT MIN(s2.timestamp)
FROM sensor_table s2
-- timestamp already holds Unix seconds, so bucket it directly
WHERE s2.timestamp >= 60 * 10 * FLOOR(s.timestamp / (60 * 10)) AND
s2.timestamp < 60 * 10 * (1 + FLOOR(s.timestamp / (60 * 10)))
)
ORDER BY s.timestamp;
This selects the first measurement in each 10-minute period. The periods are aligned to multiples of 600 seconds since the Unix epoch.
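The "first row per bucket" logic of the correlated subquery looks like this in Python (not MySQL). It uses hypothetical, irregular unix timestamps, and first_per_bucket is an illustrative name. Buckets are aligned to multiples of step since the epoch, as in the SQL.

```python
def first_per_bucket(timestamps, start, stop, step=600):
    """Earliest timestamp in each step-second bucket within [start, stop),
    like the MIN(s2.timestamp) subquery."""
    firsts = {}
    for ts in timestamps:
        if start <= ts < stop:
            bucket = ts // step
            if bucket not in firsts or ts < firsts[bucket]:
                firsts[bucket] = ts
    return [firsts[b] for b in sorted(firsts)]

# The 600..1199 bucket has a gap at its start; its first reading is 610.
irregular = [0, 5, 10, 610, 615, 1200, 1795, 1800]
print(first_per_bucket(irregular, 0, 1800))  # [0, 610, 1200]
```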
I think you could also use a simple loop in a stored procedure. Written as a MySQL procedure, the idea is to walk from date_start to date_stop in 10-minute steps and store the timestamp of the first measurement in each step:
CREATE TABLE StoreValuesId
(
valueId INT PRIMARY KEY
);
DELIMITER //
CREATE PROCEDURE procedure_store(IN date_start DATETIME, IN date_stop DATETIME)
BEGIN
DECLARE step_start DATETIME DEFAULT date_start;
WHILE step_start < date_stop DO
-- first measurement inside [step_start, step_start + 10 minutes)
INSERT INTO StoreValuesId (valueId)
SELECT MIN(timestamp)
FROM sensor_table
WHERE timestamp >= UNIX_TIMESTAMP(step_start)
AND timestamp < UNIX_TIMESTAMP(step_start + INTERVAL 10 MINUTE)
HAVING MIN(timestamp) IS NOT NULL; -- skip steps with no measurement
SET step_start = step_start + INTERVAL 10 MINUTE;
END WHILE;
END //
DELIMITER ;
Then again, I haven't played with SQL in a while, so treat this as a starting point.

Re-selecting from a table based on query results

My SQL knowledge is fairly basic and I would be grateful for some advice. I have a table with columns like:
date, time, readings, .... comments1, comments2
What I would like to do is filter the table to show the rows where comments1 is equal to a given string, which I can achieve. The tricky bit is that I then want to find the readings whose time is between 5 and 7 hours after the times identified by the initial query (comments1 = "string"). Is there a way to do this, and what would be the best strategy?
Thank you.
You should really store the date and time in a single column; otherwise, selecting across midnight boundaries is extremely difficult. My example assumes your "date" column is a datetime type that also stores the time of day.
I believe something like this is what you're looking for:
WITH CommentTime AS (
SELECT TOP 1 date
FROM tblRecords
WHERE comments1 = 'The comment to find'
)
SELECT *
FROM tblRecords
WHERE date >= DATEADD(hour, 5, (SELECT date FROM CommentTime))
AND date < DATEADD(hour, 7, (SELECT date FROM CommentTime))
New Answer:
(Reworking Dan's answer to use a variable instead)
DECLARE @CommentTime AS DateTime = (SELECT TOP 1 [date] FROM tblRecords WHERE comments1 = 'string')
SELECT * FROM tblRecords
WHERE [date] >= DATEADD(HOUR, 5, @CommentTime)
AND [date] < DATEADD(HOUR, 7, @CommentTime)
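Both answers take only one matching row (TOP 1). The question seems to ask for the 5-7 hour window after every row where comments1 matches, which amounts to a self-join. Here is that logic as a Python sketch (plain lists of dicts, not SQL). The rows are hypothetical, date and time are combined into one datetime value so the window can cross midnight, and readings_after_comments is an illustrative name.

```python
from datetime import datetime, timedelta

def readings_after_comments(rows, comment, start_h=5, end_h=7):
    """For every row whose comments1 equals `comment`, return the rows whose
    date falls in [t + start_h hours, t + end_h hours)."""
    marks = [r["date"] for r in rows if r["comments1"] == comment]
    matches = []
    for t in marks:
        lo = t + timedelta(hours=start_h)
        hi = t + timedelta(hours=end_h)
        matches.extend(r for r in rows if lo <= r["date"] < hi)
    return matches

rows = [
    {"date": datetime(2020, 1, 1, 22, 0), "readings": 1.0, "comments1": "string"},
    {"date": datetime(2020, 1, 2, 3, 30), "readings": 2.0, "comments1": ""},  # 5.5 h later, past midnight
    {"date": datetime(2020, 1, 2, 5, 0), "readings": 3.0, "comments1": ""},   # exactly 7 h later: excluded
    {"date": datetime(2020, 1, 2, 8, 0), "readings": 4.0, "comments1": "string"},
    {"date": datetime(2020, 1, 2, 14, 0), "readings": 5.0, "comments1": ""},  # 6 h after the 08:00 comment
]
matches = readings_after_comments(rows, "string")
print([r["readings"] for r in matches])  # [2.0, 5.0]
```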

MySQL - Count only unique instances between specific dates

I've been looking at several other SO questions but I could not make out a solution from these. First, the description, then what I'm missing from the other threads. (Heads up: I'm very well aware of the non-normalised structure of our database, which is something I have addressed in meetings before but this is what we have and what I have to work with.)
Background description
We have a machine that manufactures products in 25 positions. These products' production data is logged in a table that, among other things, records current and voltage for every position. Data is only logged while the machine is actually producing (i.e. has a product in it); when no product is present, nothing is logged.
This machine can run in two different production modes: full production and R&D production. Full production means that products are being inserted continuously so that every instance has a product at all times (i.e. 25 products are present in the machine at all times). The second mode, R&D production, only produces one product at a time (i.e. one product enters the machine, goes through the 25 instances one by one and when this one is finished, the second product enters the machine).
To clarify: every position logs data once every second whenever a product is present, which means 25 instances per second when full production is running. When R&D mode is running, position 1 will have ~20 instances for 20 consecutive seconds, position 2 will have ~20 instances for the next 20 consecutive seconds and so on.
Table structure
Productiondata:
id (autoincrement)
productID
position
time (timestamp for logged data)
current (amperes)
voltage (volts)
Question
We want to calculate the uptime of the machine, but we want to separate the uptime for production mode and R&D mode, and we want to separate this data on a weekly basis.
Guessed solution
Since we log instances every second, I can count the number of distinct time values in the table to find the total uptime for both production and R&D mode. To find R&D mode, I can safely say that whenever a time value has only one entry, the machine is running in R&D mode (production mode would have 25 entries).
Progress so far
I have the following query which sums up all distinct instances to find both production and R&D mode:
SELECT YEARWEEK(time) AS YWeek, COUNT(DISTINCT time) AS Time_Seconds, ROUND(COUNT(DISTINCT time)/3600, 1) AS Time_Hours
FROM Database.productiondata
WHERE YEARWEEK(time) >= YEARWEEK(curdate()) - 21
GROUP BY YWeek;
This query finds out how many DISTINCT time instances there are in the table and counts the number and groups that by the week.
Problem
The above query counts the number of distinct time values in the table, but I want to count ONLY the UNIQUE ones. Basically, I'm trying to do something like: IF count(time) = 1, count that instance; IF count(time) > 1, don't count it at all (DISTINCT still counts it).
I looked at several other SO threads, but almost all explain how to find unique values with DISTINCT, which only accomplishes half of what I'm looking for. The closest I got was this which uses a HAVING clause. I'm currently stuck at the following:
SELECT YEARWEEK(time) as YWeek, COUNT(Distinct time) As Time_Seconds, ROUND(COUNT(Distinct time)/3600, 1) As Time_Hours
FROM
(SELECT * FROM Database.productiondata
WHERE time > '2014-01-01 00:00:00'
GROUP BY time
HAVING count(time) = 1) as temptime
GROUP BY YWeek
ORDER BY YWeek;
The problem here is the GROUP BY time inside the nested SELECT, which takes forever (~5 million rows for this year alone, so I can understand that). Syntactically I think this is correct, but it takes forever to execute. Even EXPLAIN times out on it.
And that is where I am. Is this the correct approach or is there any other way that is smarter/requires less query time/avoids the group by time clause?
EDIT: As a sample, we have this table (apologies for formatting, don't know how to make a table format here on SO)
id position time
1 1 1
2 2 1
3 5 1
4 19 1
... ... ...
25 7 1
26 3 2
27 6 2
... ... ...
This table shows what the data looks like when a production run is going on. As you can see, there is no fixed order in which positions get logged: all 25 positions are logged every second, and the rows are added to the table depending on how fast the PLC sends the data for each position. The following table shows what the data looks like in research mode.
id position time
245 1 1
246 1 2
247 1 3
... ... ...
269 1 25
270 2 26
271 2 27
... ... ...
Since all the data is consolidated into one single table, we want to find out how many instances there are when COUNT(time) is exactly equal to 1, or we could look for every instance when COUNT(time) is strictly larger than 1.
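The target result can be stated precisely with a Python sketch (not MySQL) on a hypothetical log: first count rows per time value, then count per week only the time values that occur exactly once. The sketch uses ISO weeks. MySQL's single-argument YEARWEEK() uses mode 0 (weeks start on Sunday), so week boundaries can differ. rd_seconds_per_week is an illustrative name.

```python
from collections import Counter
from datetime import datetime, timedelta

def rd_seconds_per_week(times):
    """Per (ISO year, ISO week), count the time values that occur exactly once,
    i.e. GROUP BY time HAVING COUNT(*) = 1, followed by a count per week."""
    per_time = Counter(times)
    weeks = Counter()
    for t, n in per_time.items():
        if n == 1:  # a lone row for this second means R&D mode
            iso = t.isocalendar()
            weeks[(iso[0], iso[1])] += 1
    return weeks

# 3 seconds of full production (25 rows per second), then 20 seconds of
# R&D mode (1 row per second), on Monday 2014-01-06.
base = datetime(2014, 1, 6, 8, 0)
times = [base + timedelta(seconds=s) for s in range(3) for _ in range(25)]
times += [base + timedelta(hours=1, seconds=s) for s in range(20)]
weeks = rd_seconds_per_week(times)
print(dict(weeks))  # {(2014, 2): 20}
```

Dividing each weekly count by 3600 gives the Time_Hours column.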
EDIT2: As a reply to Alan, the suggestion gives me
YWeek Time_Seconds Time_Hours
201352 1 0.0
201352 1 0.0
201352 1 0.0
... ... ...
201352 1 0.0 (1000 row limit)
Whereas my desired output is
Yweek Time_Seconds Time_Hours
201352 2146 35.8
201401 5789 96.5
... ... ...
201419 8924 148.7
EDIT3: I have gathered the tries and the results so far here with a description in gray above the queries.
You might achieve better results by eliminating your sub select:
SELECT YEARWEEK(time) as YWeek,
COUNT(time) As Time_Seconds,
ROUND(COUNT(time)/3600, 1) As Time_Hours
FROM Database.productiondata
WHERE time > '2014-01-01 00:00:00'
GROUP BY YWeek
HAVING count(time) = 1
ORDER BY YWeek;
I'm assuming time has an index on it, but if it does not you could expect a significant improvement in performance by adding one.
UPDATE:
Per the recently added sample data, I'm not sure your approach is correct. The time column appears to be an INT representing seconds, while you're treating it as a DATETIME with YEARWEEK. Below is a working example in SQL Server (T-SQL) that groups the rows per week IF time is actually a DATETIME column:
DECLARE @table TABLE
(
id INT ,
[position] INT ,
[time] DATETIME
)
INSERT INTO @table
VALUES ( 1, 1, DATEADD(week, -1, GETDATE()) )
INSERT INTO @table
VALUES ( 1, 1, DATEADD(week, -2, GETDATE()) )
INSERT INTO @table
VALUES ( 1, 1, DATEADD(week, -2, GETDATE()) )
INSERT INTO @table
VALUES ( 1, 1, DATEADD(week, -2, GETDATE()) )
INSERT INTO @table
VALUES ( 1, 1, DATEADD(week, -2, GETDATE()) )
INSERT INTO @table
VALUES ( 1, 1, DATEADD(week, -3, GETDATE()) )
INSERT INTO @table
VALUES ( 1, 1, DATEADD(week, -3, GETDATE()) )
SELECT CAST(DATEPART(year, [time]) AS VARCHAR)
+ CAST(DATEPART(week, [time]) AS VARCHAR) AS YWeek ,
COUNT([time]) AS Time_Seconds ,
ROUND(COUNT([time]) / 3600.0, 1) AS Time_Hours -- 3600.0 avoids integer division
FROM @table
WHERE [time] > '2014-01-01 00:00:00'
GROUP BY DATEPART(year, [time]) ,
DATEPART(week, [time])
HAVING COUNT([time]) > 0
ORDER BY YWeek;
SELECT pd1.*
FROM Database.productiondata pd1
LEFT JOIN Database.productiondata pd2 ON pd1.time = pd2.time AND pd1.id <> pd2.id
WHERE pd1.time > '2014-01-01 00:00:00'
AND pd2.id IS NULL
You can LEFT JOIN the table to itself and keep only the rows that have no related row, i.e. no other row with the same time. Joining on pd1.id <> pd2.id matters. With pd1.id < pd2.id, the query would keep the last row of every time value, which is the same as DISTINCT.
UPDATE: The same query without the schema prefix:
SELECT pd1.* FROM productiondata pd1
LEFT JOIN productiondata pd2
ON pd1.time = pd2.time AND pd1.id <> pd2.id
WHERE pd1.time > '2014-01-01 00:00:00' AND pd2.id IS NULL;
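The difference between the two join conditions, pd1.id < pd2.id and pd1.id <> pd2.id, can be seen with Python's built-in sqlite3 module. This is SQLite, not MySQL, but the LEFT JOIN / IS NULL anti-join pattern behaves the same way here. The rows are hypothetical and modeled on the question's sample, with time as an integer second counter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE productiondata (id INTEGER PRIMARY KEY, position INTEGER, time INTEGER)"
)
conn.executemany(
    "INSERT INTO productiondata VALUES (?, ?, ?)",
    [
        (1, 1, 1), (2, 2, 1), (3, 5, 1),  # production: three rows share time 1
        (4, 1, 2),                        # R&D: a lone row at time 2
        (5, 1, 3),                        # R&D: a lone row at time 3
        (6, 3, 4), (7, 6, 4),             # production again at time 4
    ],
)

def ids(join_condition):
    """Run the self anti-join with the given extra join condition."""
    sql = (
        "SELECT pd1.id FROM productiondata pd1 "
        "LEFT JOIN productiondata pd2 "
        f"ON pd1.time = pd2.time AND {join_condition} "
        "WHERE pd2.id IS NULL ORDER BY pd1.id"
    )
    return [row[0] for row in conn.execute(sql)]

print(ids("pd1.id < pd2.id"))   # [3, 4, 5, 7]: one row per distinct time
print(ids("pd1.id <> pd2.id"))  # [4, 5]: only times that occur exactly once
```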

MySQL Number of Days inside a DateRange, inside a month (Booking Table)

I'm attempting to create a report for an accommodation service with the following information:
Number of Bookings (Easy, use the COUNT function)
Revenue Amount (Kind of easy).
Number of Room nights. (Rather Hard it seems)
Broken down into each month of the year.
Limitations - I'm currently using PHP/MySQL to create this report.
I'm pulling the data out of the booking system 1 month at a time, then using an ETL process to put it into MySQL.
Because of this, I have duplicate records when a booking spans the end of a month (e.g. BookingID = 9216 below). This is because, for revenue purposes, we need to split the revenue by percentage into the corresponding months.
The Question.
How do I write some SQL that will:
Calculate the number of room nights booked into a property, grouped by month, taking into account that if a booking spans the end of a month, the room nights that fall in the check-in month are counted toward that month, and the room nights that fall in the checkout month are counted toward the checkout month.
At first I used this: DATEDIFF(Checkout, Checkin).
But that led to one month having 48 room nights in a 31-day month (because a) it counted 1 booking as 11 nights, even though it was split across the 2 months, and b) because that booking appears twice).
Then once I have the statement I need to integrate it back into my CrossTab SQL for the entire year.
Some resources that I have found, but can't seem to make work (MySql Query- Date Range within a Date Range & php mysql double date range)
Here is a Sample of the Table: (There are ~100,000 rows of similar data).
CREATE TABLE IF NOT EXISTS `bookingdata` (
`idBookingData` int(11) NOT NULL AUTO_INCREMENT,
`PropertyID` int(10) NOT NULL,
`Checkin` date DEFAULT NULL,
`Checkout` date DEFAULT NULL,
`Rent` decimal(10,2) DEFAULT NULL,
`BookingID` int(11) DEFAULT NULL,
PRIMARY KEY (`idBookingData`),
UNIQUE KEY `idBookingData_UNIQUE` (`idBookingData`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=10472 ;
INSERT INTO `bookingdata` (`idBookingData`, `PropertyID`, `Checkin`, `Checkout`, `Rent`, `BookingID`) VALUES
(5148, 2, '2011-07-02', '2011-07-05', 1105.00, 10612),
(5149, 2, '2011-07-05', '2011-07-13', 2155.00, 10184),
(5151, 2, '2011-07-14', '2011-07-17', 1105.00, 11102),
(5153, 2, '2011-07-22', '2011-07-24', 930.00, 14256),
(5154, 2, '2011-07-24', '2011-08-04', 1832.73, 9216),
(5907, 2, '2011-07-24', '2011-08-04', 687.27, 9216),
(5910, 2, '2011-08-11', '2011-08-14', 1140.00, 13633),
(5911, 2, '2011-08-15', '2011-08-16', 380.00, 17770),
(5915, 2, '2011-08-25', '2011-08-29', 1350.00, 17719),
(5916, 2, '2011-08-30', '2011-09-01', 740.00, 16813);
You're on the right lines. You need to join your query with a table of the months for which you want data, which can either be permanent or (as shown in my example below) created dynamically in a UNION subquery:
SELECT YEAR(month.d),
MONTHNAME(month.d),
SUM(DATEDIFF( -- room nights: check-in up to, but not including, checkout, clipped to the month
LEAST(b.Checkout, LAST_DAY(month.d) + INTERVAL 1 DAY), GREATEST(b.Checkin, month.d)
)) AS nights
FROM (
-- one row per booking: drop the copies that exist only for revenue splitting (e.g. BookingID 9216)
SELECT DISTINCT BookingID, PropertyID, Checkin, Checkout FROM bookingdata
) AS b
RIGHT JOIN (
SELECT DATE '2011-01-01' AS d
UNION ALL SELECT DATE '2011-02-01' UNION ALL SELECT DATE '2011-03-01'
UNION ALL SELECT DATE '2011-04-01' UNION ALL SELECT DATE '2011-05-01'
UNION ALL SELECT DATE '2011-06-01' UNION ALL SELECT DATE '2011-07-01'
UNION ALL SELECT DATE '2011-08-01' UNION ALL SELECT DATE '2011-09-01'
UNION ALL SELECT DATE '2011-10-01' UNION ALL SELECT DATE '2011-11-01'
UNION ALL SELECT DATE '2011-12-01'
) AS month ON
b.Checkin <= LAST_DAY(month.d)
AND month.d < b.Checkout
GROUP BY month.d
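The night-splitting rule can be checked in Python (not MySQL) against the sample rows from the INSERT above, with Rent omitted. The sketch assumes that a room night is each date from check-in up to, but not including, checkout. It also assumes that duplicated revenue rows share the same BookingID and dates. nights_per_month is an illustrative name.

```python
from collections import Counter
from datetime import date, timedelta

def nights_per_month(bookings):
    """Count room nights per (year, month), one entry per BookingID."""
    unique = {}
    for booking_id, checkin, checkout in bookings:
        unique[booking_id] = (checkin, checkout)  # collapses revenue-split copies
    nights = Counter()
    for checkin, checkout in unique.values():
        day = checkin
        while day < checkout:  # the checkout date itself is not a night
            nights[(day.year, day.month)] += 1
            day += timedelta(days=1)
    return nights

bookings = [
    (10612, date(2011, 7, 2), date(2011, 7, 5)),
    (10184, date(2011, 7, 5), date(2011, 7, 13)),
    (11102, date(2011, 7, 14), date(2011, 7, 17)),
    (14256, date(2011, 7, 22), date(2011, 7, 24)),
    (9216, date(2011, 7, 24), date(2011, 8, 4)),
    (9216, date(2011, 7, 24), date(2011, 8, 4)),
    (13633, date(2011, 8, 11), date(2011, 8, 14)),
    (17770, date(2011, 8, 15), date(2011, 8, 16)),
    (17719, date(2011, 8, 25), date(2011, 8, 29)),
    (16813, date(2011, 8, 30), date(2011, 9, 1)),
]
print(dict(nights_per_month(bookings)))  # {(2011, 7): 24, (2011, 8): 13}
```

With this definition, BookingID 9216 contributes 8 nights to July and 3 to August, 11 in total, and the booking checking out on 2011-09-01 adds nothing to September.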

Two date columns and one date range: typical query?

I have a table
tbl_charge
id hotel_id start_date end_date charge_per_day (in $)
1 6 2012-02-15 2010-02-15 20
2 6 2012-02-16 2010-02-18 30
4 6 2012-02-20 2010-02-25 50
Note: if a date is not covered by the table, we charge $25 for that day (i.e. the default charge).
Now if someone wants to book the hotel from 2012-02-15 to 2012-02-22, I want to calculate the total charge for these dates:
Date: 15+16+17+18+19+20+21+22
Charge: 20+30+30+30+25+50+50+50 = $285
What I have done so far:
This query returns the relevant rows:
SELECT * FROM `tbl_charge` WHERE
(start_date BETWEEN '2012-02-15' AND '2012-02-22' OR
end_date BETWEEN '2012-02-15' AND '2012-02-22' OR
(start_date < '2012-02-15' AND end_date > '2012-02-22'))
AND hotel_id = 6
It returns all the necessary rows, but how do I sum the charges?
Is there any way to count the days within the given date range? For example, the last row covers the 20th to the 25th, but I only want up to the 22nd, so it should return 3 days and we multiply the charge by 3.
Is it better to create a procedure for this or to use a simple query?
I think this will do the trick:
select sum(DayDifference * charge_per_day) +
(max(RealDayDifference) - sum(DayDifference)) * 25 as TotalPerPeriod
from (
select charge_per_day, datediff(
least(end_date, '2012-02-22'),
greatest(start_date, '2012-02-15')) + 1 as DayDifference,
datediff('2012-02-22', '2012-02-15') + 1 as RealDayDifference
from tbl_charge
where
((start_date between '2012-02-15' and '2012-02-22') or
(end_date between '2012-02-15' and '2012-02-22') or
(start_date < '2012-02-15' and end_date > '2012-02-22'))
and hotel_id=6
) S1
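To check this query's arithmetic, here is the same overlap formula in Python (not MySQL). It assumes the question's end dates were meant to be 2012 rather than 2010, that rate rows never overlap, and a $25 default. total_charge is a hypothetical helper.

```python
from datetime import date

# Rate rows from the question (start_date, end_date, charge_per_day), years fixed to 2012.
rates = [
    (date(2012, 2, 15), date(2012, 2, 15), 20),
    (date(2012, 2, 16), date(2012, 2, 18), 30),
    (date(2012, 2, 20), date(2012, 2, 25), 50),
]

def total_charge(first, last, rates, default):
    """Charge every date from first to last inclusive, like TotalPerPeriod."""
    real_days = (last - first).days + 1            # RealDayDifference
    covered = 0
    total = 0
    for start, end, charge in rates:
        days = (min(end, last) - max(start, first)).days + 1  # DayDifference
        if days > 0:                               # the WHERE clause keeps only overlapping rows
            covered += days
            total += days * charge
    return total + (real_days - covered) * default  # uncovered days at the default

print(total_charge(date(2012, 2, 15), date(2012, 2, 22), rates, 25))  # 285
```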
I've had to solve this same issue previously and it's a fun one, however since then I've learnt some better methods. At the time I believe I created a procedure or function to loop over the requested dates and return a price.
To return the required rows, you can simply select using the upper and lower limits. You can put a DATEDIFF in the SELECT list to return the number of days each rate applies for.
If all you are ultimately looking for is a single price, I would advise wrapping this logic in a function.
I've assumed a second table, tbl_hotel, with id (int PK == hotel_id) and default_charge (int), containing the row (id=6, default_charge=25).
Further assumptions: where your dates say "2010" you meant "2012", and this is for someone checking in on the 15th and checking out on the 22nd (so they need the room for the nights of the 15th, 16th, 17th, 18th, 19th, 20th and 21st, i.e. 7 nights). I will also assume that you have logic in place that prevents the date ranges from overlapping, so that no two rows in tbl_charge match the same date (for example 14 Feb 2012).
So to get this started, here is a query that selects the applicable rows (those overlapping the stay):
SELECT
*
FROM tbl_charge AS c
WHERE
(
c.end_date >= '2012-02-15'
AND
c.start_date < '2012-02-22'
)
This is pretty much what you have already, so now I'll add some more fields to get the number of days each rate applies for.
SET @StartDate = '2012-02-15';
SET @EndDate = SUBDATE('2012-02-22',1);
SELECT
c.id,
c.start_date,
c.end_date,
c.charge_per_day,
DATEDIFF(IF(c.end_date>@EndDate,@EndDate,c.end_date),SUBDATE(IF(c.start_date<@StartDate,@StartDate,c.start_date),1)) AS quantityAtThisRate
FROM tbl_charge AS c
WHERE c.end_date >= @StartDate AND c.start_date <= @EndDate
I am SUBDATE-ing the end date because if you check out on the 22nd, your last night is the 21st. I am also SUBDATE-ing the start date inside each DATEDIFF. Take a stay from the 15th to the 16th: the adjusted end date makes the range 15th-15th, and DATEDIFF of equal dates is 0. Subtracting a day from the start turns it into 14th-15th, which returns the correct value of 1. The output now looks something like this:
id start_date end_date charge_per_day quantityAtThisRate
1 2012-02-10 2012-02-15 20 1
2 2012-02-16 2012-02-18 30 3
3 2012-02-20 2012-02-29 50 2
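The quantityAtThisRate expression can be mirrored in Python (not MySQL) for the three rate rows in the output table, with check-in on the 15th and checkout on the 22nd. nights_at_rate is an illustrative name. The result is clamped at zero here, whereas the SQL relies on its WHERE clause to drop non-overlapping rows.

```python
from datetime import date, timedelta

def nights_at_rate(rate_start, rate_end, checkin, checkout):
    """Nights of the stay [checkin, checkout) covered by the rate row
    [rate_start, rate_end], following the DATEDIFF/SUBDATE expression."""
    last_night = checkout - timedelta(days=1)             # @EndDate = SUBDATE(checkout, 1)
    upper = min(rate_end, last_night)                     # IF(end_date > @EndDate, ...)
    lower = max(rate_start, checkin) - timedelta(days=1)  # SUBDATE(IF(start_date < @StartDate, ...), 1)
    return max((upper - lower).days, 0)

rows = [
    (date(2012, 2, 10), date(2012, 2, 15)),
    (date(2012, 2, 16), date(2012, 2, 18)),
    (date(2012, 2, 20), date(2012, 2, 29)),
]
checkin, checkout = date(2012, 2, 15), date(2012, 2, 22)
print([nights_at_rate(s, e, checkin, checkout) for s, e in rows])  # [1, 3, 2]
```

That covers 6 of the 7 nights; the remaining night (the 19th) falls back to the default charge.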
Moving on, I'll put this into a subquery and join it to tbl_hotel to get the default charge:
SET @StartDate = '2012-02-15';
SET @EndDate = SUBDATE('2012-02-22',1);
SET @NumberOfNights = DATEDIFF(ADDDATE(@EndDate,1),@StartDate);
SET @HotelID = 6;
SELECT
SUM(specificDates.charge_per_day*specificDates.quantityAtThisRate) AS specificCharges,
@NumberOfNights-SUM(specificDates.quantityAtThisRate) AS daysAtDefault,
MAX(h.default_charge) * (@NumberOfNights-SUM(specificDates.quantityAtThisRate)) AS defaultCharges
FROM tbl_hotel AS h
INNER JOIN
(
SELECT
c.charge_per_day,
DATEDIFF(IF(c.end_date>@EndDate,@EndDate,c.end_date),SUBDATE(IF(c.start_date<@StartDate,@StartDate,c.start_date),1)) AS quantityAtThisRate
FROM tbl_charge AS c
WHERE c.end_date >= @StartDate AND c.start_date <= @EndDate AND c.hotel_id = @HotelID
) AS specificDates
WHERE h.id = @HotelID
Realistically, a single query will get rather complex, so I'd settle for a stored procedure built on the logic above (also because, if no specific rates apply, the query above returns NULL due to the inner join).
Hope this is of help