I have some data whose update frequency is very variable (it doesn't change for ages, then changes very often).
I sample it according to a schedule I create and end up with the table below:
periodStart | periodEnd | variable
01/10/2019 00:06 | 01/10/2019 01:00 | 0.61
01/10/2019 01:00 | 01/10/2019 02:00 | 0.61
01/10/2019 02:00 | 01/10/2019 03:00 | 0.61
01/10/2019 03:00 | 01/10/2019 04:00 | 0.61
01/10/2019 04:00 | 01/10/2019 05:00 | 0.61
01/10/2019 05:00 | 01/10/2019 06:00 | 0.61
01/10/2019 06:00 | 01/10/2019 07:00 | 0.61
01/10/2019 07:00 | 01/10/2019 08:00 | 0.61
01/10/2019 08:00 | 01/10/2019 09:00 | 0.59
01/10/2019 09:00 | 01/10/2019 10:00 | 0.59
01/10/2019 10:00 | 01/10/2019 11:00 | 0.59
01/10/2019 11:00 | 01/10/2019 12:00 | 0.58
I am trying to condense the database, so I stored the data in the alternative form below. I also need to retain access to the original sampling timestamps (periodStart and periodEnd), so I created 'samplingInterval'. Using it, you can derive every original periodStart and periodEnd timestamp from the new periodStart and periodEnd.
periodStart | periodEnd | samplingInterval (mins) | variable
01/10/2019 00:06 | 01/10/2019 01:00 | 54 | 0.61
01/10/2019 01:00 | 01/10/2019 08:00 | 60 | 0.61
01/10/2019 08:00 | 01/10/2019 11:00 | 60 | 0.59
01/10/2019 11:00 | 01/10/2019 12:00 | 60 | 0.58
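For reference, the condensing step described above can be modelled outside the database. This is a minimal Python sketch, not the author's actual code; it assumes the timestamps are day/month/year (every row falls on the same day, so this only affects parsing) and that a new condensed row starts whenever the value or the sampling interval changes, which is why the 54-minute first row stays separate.

```python
from datetime import datetime

FMT = "%d/%m/%Y %H:%M"  # assumption: day/month/year timestamps


def parse(text):
    return datetime.strptime(text, FMT)


def condense(rows):
    """Collapse consecutive (start, end, value) rows into
    (start, end, interval_minutes, value) runs.

    A run is extended only when it is contiguous (previous end == this start)
    and both the value and the interval length match.
    """
    runs = []
    for start, end, value in rows:
        interval = int((end - start).total_seconds() // 60)
        if runs and runs[-1][1] == start and runs[-1][2] == interval and runs[-1][3] == value:
            runs[-1] = (runs[-1][0], end, interval, value)  # extend the current run
        else:
            runs.append((start, end, interval, value))  # start a new run
    return runs


# The twelve samples from the original table
starts = ["01/10/2019 00:06"] + [f"01/10/2019 {h:02d}:00" for h in range(1, 12)]
ends = [f"01/10/2019 {h:02d}:00" for h in range(1, 13)]
values = [0.61] * 8 + [0.59] * 3 + [0.58]
original = [(parse(s), parse(e), v) for s, e, v in zip(starts, ends, values)]

condensed = condense(original)
for start, end, interval, value in condensed:
    print(start.strftime(FMT), end.strftime(FMT), interval, value)
```

This prints the four rows of the condensed table.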
The issue I am having is writing a query that SELECTs the table in its original form from the condensed form. I'm comfortable writing a query that uses a SQL variable, and I thought I could use a loop (in a stored procedure, which is less than ideal), but I don't know how to output a row without advancing to the next row. Is it possible? Should I be approaching it differently?
You can use a recursive query as follows:
WITH RECURSIVE cte AS
(
  -- anchor: the first sub-interval of every condensed row
  SELECT periodStart AS st,
         ADDDATE(periodStart, INTERVAL samplingInterval MINUTE) AS en,
         periodEnd, variable, samplingInterval AS si
  FROM condensed
  UNION ALL
  -- recursive step: shift the sub-interval forward by one sampling interval
  SELECT ADDDATE(st, INTERVAL si MINUTE),
         ADDDATE(en, INTERVAL si MINUTE),
         periodEnd, variable, si
  FROM cte
  WHERE ADDDATE(st, INTERVAL si MINUTE) < periodEnd
)
SELECT st AS periodStart, en AS periodEnd, variable
FROM cte
ORDER BY st;
See a demo on db-fiddle.
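To check what the recursive CTE produces without a database, here is a rough Python model of the same expansion. It is a sketch, not part of the answer; it assumes the condensed rows are already parsed into datetimes (with 01/10/2019 read as 1 October 2019) and that samplingInterval is in minutes.

```python
from datetime import datetime, timedelta


def expand(condensed):
    """Rebuild (periodStart, periodEnd, variable) rows from
    (periodStart, periodEnd, samplingInterval_minutes, variable) rows.

    Mirrors the recursive CTE: the anchor emits [st, st + si), and each
    recursive step shifts both ends by si while the new st < periodEnd.
    """
    rows = []
    for period_start, period_end, si, value in condensed:
        step = timedelta(minutes=si)
        st = period_start
        while True:
            rows.append((st, st + step, value))
            st += step
            if st >= period_end:  # the CTE's WHERE ... < periodEnd fails
                break
    rows.sort(key=lambda row: row[0])  # ORDER BY st
    return rows


def at(hour, minute=0):
    return datetime(2019, 10, 1, hour, minute)


condensed = [
    (at(0, 6), at(1), 54, 0.61),
    (at(1), at(8), 60, 0.61),
    (at(8), at(11), 60, 0.59),
    (at(11), at(12), 60, 0.58),
]
rows = expand(condensed)
for st, en, value in rows:
    print(st, en, value)
```

As in the SQL, if a period's length is not a whole multiple of its interval, the last generated periodEnd runs past the stored periodEnd; if that case can occur, selecting LEAST(en, periodEnd) in the CTE's final SELECT would cap it.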
I'm looking for a way to count the hours worked within a given time range.
For example, from the MySQL data below, I want to count the hours worked between 22:00 and 06:00.
Using date_start 2022-04-01 21:00:00 and date_end 2022-04-02 08:00:00, the user worked 11 hours in total, 8 of them night hours.
Of course, the data could also be something like 2022-04-01 05:00:00 to 2022-04-01 16:00:00, which should output 1 night hour, or 2022-04-01 18:00:00 to 2022-04-02 03:00:00, which should output 5 night hours.
MySQL table:
CREATE TABLE `tasks` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`date_start` datetime DEFAULT NULL,
`date_end` datetime DEFAULT NULL,
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `tasks` (`date_start`,`date_end`) VALUES
('2022-04-01 04:00:00', '2022-04-01 16:00:00'), # 2:00 nighthours
('2022-04-02 05:00:00', '2022-04-02 23:30:00'), # 2:30 nighthours
('2022-04-03 06:00:00', '2022-04-03 18:00:00'), # 0:00 nighthours
('2022-04-04 12:00:00', '2022-04-05 00:00:00'), # 2:00 nighthours
('2022-04-05 19:00:00', '2022-04-06 07:00:00'); # 8:00 nighthours
Current MySQL:
# 21600 = 06:00 hours
# 79200 = 22:00 hours
SELECT t.date_start, t.date_end, DATE_FORMAT(TIMEDIFF(
(CASE WHEN DATE(t.date_start) != DATE(t.date_end) AND TIME_TO_SEC(t.date_end) > 21600 THEN DATE_FORMAT(t.date_end, '%Y-%m-%d 06:%i:%s')
WHEN TIME_TO_SEC(t.date_start) < 21600 THEN DATE_FORMAT(t.date_start, '%Y-%m-%d 06:%i:%s')
ELSE t.date_end END),
(CASE WHEN DATE(t.date_start) != DATE(t.date_end) AND TIME_TO_SEC(t.date_start) < 79200 THEN DATE_FORMAT(t.date_start, '%Y-%m-%d 22:%i:%s')
WHEN TIME_TO_SEC(t.date_end) > 79200 THEN DATE_FORMAT(t.date_start, '%Y-%m-%d 22:%i:%s')
WHEN DATE(t.date_start) = DATE(t.date_end) AND TIME_TO_SEC(t.date_end) <= 79200 AND TIME_TO_SEC(t.date_start) >= 21600 THEN t.date_end
ELSE t.date_start END)
), '%H:%i') AS night_time FROM tasks t;
My current query still has a problem when date_start and date_end are on the same day and both fall within night hours, for example 2022-04-02 05:00:00 to 2022-04-02 23:30:00, which has 1:00 night hour at the start and 1:30 at the end (02:30 night hours in total).
I am also not sure whether my current query is the best/fastest way to achieve this.
Calculating Time Overlaps
You can calculate the amount of time two date ranges overlap using:
MIN( EndDate1, EndDate2 ) - MAX( StartDate1, StartDate2 )
For example if the date ranges are:
Date_Start | Date_End | Night_Shift_Start | Night_Shift_End
2022-04-01 21:00:00 | 2022-04-02 08:00:00 | 2022-04-01 22:00:00 ** | 2022-04-02 06:00:00 **
The result would be 8 hours:
Min( EndDate ) - Max( StartDate ) | As Unix Timestamps | Time Overlap
2022-04-02 06:00:00 (minus) 2022-04-01 22:00:00 | 1648875600 - 1648846800 = 28800 seconds | 08:00:00 hours
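The same MIN(end) - MAX(start) calculation can be sketched in Python for quick checking. This is an illustration rather than part of the answer; it uses naive datetimes, so unlike Unix timestamps it does not depend on the session time zone, and it clamps the result at zero so non-overlapping ranges give 0 instead of a negative number.

```python
from datetime import datetime


def overlap_seconds(start1, end1, start2, end2):
    """MIN(end1, end2) - MAX(start1, start2), clamped to 0 when the ranges don't meet."""
    delta = (min(end1, end2) - max(start1, start2)).total_seconds()
    return max(0.0, delta)


shift = (datetime(2022, 4, 1, 21), datetime(2022, 4, 2, 8))
night = (datetime(2022, 4, 1, 22), datetime(2022, 4, 2, 6))
seconds = overlap_seconds(*shift, *night)
print(seconds, seconds / 3600)  # 28800.0 8.0
```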
Checking for Multiple Overlaps
Since a single shift can technically both start and end during "night hours" (22:00 to 06:00), you need to check for overlaps on both sides.
Date_Start | Date_End | Night Hours | Breakdown
2022-04-05 05:00:00 | 2022-04-05 23:30:00 | 2.5 hours | (1 hour) 2022-04-05 05:00 to 2022-04-05 06:00; (1.5 hours) 2022-04-05 22:00 to 2022-04-05 23:30
One approach is using the base start/end times to calculate the previous and upcoming "night hour" periods:
SELECT *
, TIMESTAMP(DATE(date_start) - INTERVAL 1 DAY, '22:00:00') AS current_start
, TIMESTAMP(DATE(date_start), '06:00:00') AS current_end
, TIMESTAMP(DATE(date_start), '22:00:00') AS next_start
, TIMESTAMP(DATE(date_start) + INTERVAL 1 DAY, '06:00:00') AS next_end
FROM tasks
Results:
id | date_start | date_end | current_start | current_end | next_start | next_end
1 | 2022-04-01 04:00:00 | 2022-04-01 16:00:00 | 2022-03-31 22:00:00 | 2022-04-01 06:00:00 | 2022-04-01 22:00:00 | 2022-04-02 06:00:00
2 | 2022-04-02 05:00:00 | 2022-04-02 23:30:00 | 2022-04-01 22:00:00 | 2022-04-02 06:00:00 | 2022-04-02 22:00:00 | 2022-04-03 06:00:00
3 | 2022-04-03 06:00:00 | 2022-04-03 18:00:00 | 2022-04-02 22:00:00 | 2022-04-03 06:00:00 | 2022-04-03 22:00:00 | 2022-04-04 06:00:00
4 | 2022-04-04 12:00:00 | 2022-04-05 00:00:00 | 2022-04-03 22:00:00 | 2022-04-04 06:00:00 | 2022-04-04 22:00:00 | 2022-04-05 06:00:00
5 | 2022-04-05 19:00:00 | 2022-04-06 07:00:00 | 2022-04-04 22:00:00 | 2022-04-05 06:00:00 | 2022-04-05 22:00:00 | 2022-04-06 06:00:00
6 | 2022-04-01 04:00:00 | 2022-04-01 16:00:00 | 2022-03-31 22:00:00 | 2022-04-01 06:00:00 | 2022-04-01 22:00:00 | 2022-04-02 06:00:00
7 | 2022-04-05 19:00:00 | 2022-04-06 07:00:00 | 2022-04-04 22:00:00 | 2022-04-05 06:00:00 | 2022-04-05 22:00:00 | 2022-04-06 06:00:00
8 | 2022-04-05 05:00:00 | 2022-04-05 23:30:00 | 2022-04-04 22:00:00 | 2022-04-05 06:00:00 | 2022-04-05 22:00:00 | 2022-04-06 06:00:00
Total Overlap Time
Once you have the "night hour" ranges, calculate the overlapping time on both sides and add them together to get the total time worked during "night hours":
SELECT id
, date_start
, date_end
, SEC_TO_TIME(
GREATEST(0, start_overlap_in_seconds) -- ignore negative time, which means no overlap
+ GREATEST(0, end_overlap_in_seconds)
) AS time_overall
FROM (
SELECT *
, UNIX_TIMESTAMP(LEAST(date_end, current_end))
- UNIX_TIMESTAMP(GREATEST(date_start, current_start))
AS start_overlap_in_seconds
, UNIX_TIMESTAMP(LEAST(date_end,next_end))
- UNIX_TIMESTAMP(GREATEST(date_start,next_start))
AS end_overlap_in_seconds
FROM (
SELECT *
, TIMESTAMP(DATE(date_start) - INTERVAL 1 DAY, '22:00:00') AS current_start
, TIMESTAMP(DATE(date_start), '06:00:00') AS current_end
, TIMESTAMP(DATE(date_start), '22:00:00') AS next_start
, TIMESTAMP(DATE(date_start) + INTERVAL 1 DAY, '06:00:00') AS next_end
FROM tasks
) tmp
) t
Final Results:
id | date_start | date_end | time_overall
1 | 2022-04-01 04:00:00 | 2022-04-01 16:00:00 | 02:00:00
2 | 2022-04-02 05:00:00 | 2022-04-02 23:30:00 | 02:30:00
3 | 2022-04-03 06:00:00 | 2022-04-03 18:00:00 | 00:00:00
4 | 2022-04-04 12:00:00 | 2022-04-05 00:00:00 | 02:00:00
5 | 2022-04-05 19:00:00 | 2022-04-06 07:00:00 | 08:00:00
6 | 2022-04-01 04:00:00 | 2022-04-01 16:00:00 | 02:00:00
7 | 2022-04-05 19:00:00 | 2022-04-06 07:00:00 | 08:00:00
8 | 2022-04-05 05:00:00 | 2022-04-05 23:30:00 | 02:30:00
db<>fiddle here
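The answer's two-window approach can be mirrored in Python to check expected values before running it in MySQL. This is a sketch under the same assumptions as the SQL: the night window crosses midnight (22:00 to 06:00 here; night_from and night_to are hypothetical parameter names), and only the night ending on the morning of date_start's day and the night starting that evening are checked. A shift that spans more than one night (roughly longer than 24 hours) would therefore be undercounted, by this sketch and by the query alike.

```python
from datetime import datetime, time, timedelta


def overlap_seconds(a_start, a_end, b_start, b_end):
    # LEAST(ends) - GREATEST(starts); a negative result means no overlap, so clamp to 0
    return max(0.0, (min(a_end, b_end) - max(a_start, b_start)).total_seconds())


def night_seconds(date_start, date_end, night_from=time(22), night_to=time(6)):
    """Seconds of [date_start, date_end] that fall inside the night window.

    Assumes night_from > night_to (window crosses midnight). Like the SQL,
    only two windows are checked: current_start..current_end and
    next_start..next_end, both derived from date_start's date.
    """
    day = date_start.date()
    current = (datetime.combine(day - timedelta(days=1), night_from),
               datetime.combine(day, night_to))
    upcoming = (datetime.combine(day, night_from),
                datetime.combine(day + timedelta(days=1), night_to))
    return (overlap_seconds(date_start, date_end, *current)
            + overlap_seconds(date_start, date_end, *upcoming))


FMT = "%Y-%m-%d %H:%M:%S"
tasks = [
    ("2022-04-01 04:00:00", "2022-04-01 16:00:00"),
    ("2022-04-02 05:00:00", "2022-04-02 23:30:00"),
    ("2022-04-03 06:00:00", "2022-04-03 18:00:00"),
    ("2022-04-04 12:00:00", "2022-04-05 00:00:00"),
    ("2022-04-05 19:00:00", "2022-04-06 07:00:00"),
]
hours = [night_seconds(datetime.strptime(s, FMT), datetime.strptime(e, FMT)) / 3600
         for s, e in tasks]
print(hours)  # [2.0, 2.5, 0.0, 2.0, 8.0]
```

The printed values match the night-hour comments in the question's INSERT statement and the time_overall column above.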
I'm trying to count the number of values in a group of matching values. The first query does the count, the second lists the output of the data itself. Please note the count doesn't always match the output of the second query. I'm not sure why this error is occurring or how to fix it. I suspect the problem is with the GROUP BY but after trying many versions of it I don't have a solution.
SELECT CONCAT(date_format(timestamp,'%H'),':',LPAD(MINUTE(
FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(timestamp)/300)*300)),2,0)) AS hrmn,
COUNT(CONCAT(date_format(timestamp,'%H'),':',LPAD(MINUTE(
FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(timestamp)/300)*300)),2,0))) AS hrmncount
FROM TimeLog
WHERE netID = 3646
GROUP BY CONCAT(date_format(timestamp,'%H'),':',LPAD(MINUTE(
FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(timestamp)/300)*300)),2,0))
;
SELECT CONCAT(date_format(timestamp,'%H'),':',LPAD(MINUTE(
FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(timestamp)/300)*300)),2,0)) AS hrmn
FROM TimeLog
WHERE netID = 3646
GROUP BY timestamp
ORDER BY hrmn ASC
;
This is the output of the first query:
hrmn hrmncount
00:50 2
01:00 10
01:05 9
01:10 12
01:15 5
01:20 7
01:25 6
01:30 3
01:35 1
But this is the output from the second, which shows that the first query is counting wrong in the 00:50, 01:10 and 01:20 values.
hrmn
00:50
01:00
01:00
01:00
01:00
01:00
01:00
01:00
01:00
01:00
01:00
01:05
01:05
01:05
01:05
01:05
01:05
01:05
01:05
01:05
01:10
01:10
01:10
01:10
01:10
01:10
01:10
01:10
01:10
01:10
01:10
01:15
01:15
01:15
01:15
01:15
01:20
01:20
01:20
01:20
01:20
01:20
01:20
01:25
01:25
01:25
01:25
01:25
01:25
01:30
01:30
01:30
01:35
I was able to track the root of the problem down to the GROUP BY timestamp in the second query. It wasn't needed; removing it exposed all the records, including those with duplicate timestamps.
I should have seen this earlier but sometimes you get blind to the obvious.
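A small Python model shows why GROUP BY timestamp hid rows. The log below is hypothetical, with two entries sharing an identical timestamp. Counting per 5-minute bucket (like the first query) sees both, while grouping by the raw timestamp first (like the second query) keeps only one of them.

```python
from collections import Counter
from datetime import datetime


def bucket(ts):
    """'HH:MM' label of the 5-minute bucket, i.e. the time floored to a multiple of 5 minutes."""
    floored = ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)
    return floored.strftime("%H:%M")


# Hypothetical log: the first two entries share the exact same timestamp
log = [
    datetime(2020, 1, 1, 0, 52, 10),
    datetime(2020, 1, 1, 0, 52, 10),
    datetime(2020, 1, 1, 1, 3, 0),
]

per_bucket = Counter(bucket(ts) for ts in log)         # like the first query's COUNT
distinct_rows = sorted(bucket(ts) for ts in set(log))  # like GROUP BY timestamp
all_rows = sorted(bucket(ts) for ts in log)            # with the GROUP BY removed
print(per_bucket, distinct_rows, all_rows)
```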
I am looking to estimate a queue length for historical data at the time a record enters the queue. I would like to do this by counting how many of the rows in the data set have an enter time less than the enter time of the record, and an exit time greater than the enter time of the record.
I have created a data set that separates dates and times, which I thought would make it easier to work with, but I am having trouble getting a count of rows for each record in the data set. I have tried doing a simple aggregate count, which works for a single row, but I do not know how to make a query that will do the count for every row in the data set.
For example, I have a data set that looks like this:
RecordID | Enter_Date_Time | Exit_Date_Time
1 2020-09-01 6:00:00 AM 2020-09-02 7:00:00 AM
2 2020-09-01 6:00:00 AM 2020-09-02 8:00:00 AM
3 2020-09-03 4:00:00 AM 2020-09-03 3:00:00 PM
4 2020-09-02 4:00:00 AM 2020-09-04 6:00:00 AM
5 2020-09-02 6:00:00 AM 2020-09-02 8:00:00 AM
6 2020-09-05 6:00:00 AM 2020-09-07 7:00:00 PM
7 2020-09-07 3:00:00 AM 2020-09-07 9:00:00 AM
8 2020-09-07 6:00:00 AM 2020-09-08 8:00:00 AM
9 2020-09-08 6:00:00 AM 2020-09-08 9:00:00 PM
10 2020-09-08 4:00:00 AM 2020-09-09 6:00:00 AM
And I would like it to look like this:
RecordID | Enter_Date_Time | Exit_Date_Time | Queue_Length
1 2020-09-01 1:00:00 AM 2020-09-02 7:00:00 AM 1
2 2020-09-01 6:00:00 AM 2020-09-02 8:00:00 PM 2
3 2020-09-03 4:00:00 AM 2020-09-03 3:00:00 PM 2
4 2020-09-02 4:00:00 AM 2020-09-04 6:00:00 AM 2
5 2020-09-02 6:00:00 AM 2020-09-02 6:00:00 AM 3
6 2020-09-05 6:00:00 AM 2020-09-07 7:00:00 PM 1
7 2020-09-07 3:00:00 AM 2020-09-07 9:00:00 AM 2
8 2020-09-07 6:00:00 AM 2020-09-08 8:00:00 AM 3
9 2020-09-08 6:00:00 AM 2020-09-08 9:00:00 PM 2
10 2020-09-08 4:00:00 AM 2020-09-09 6:00:00 AM 1
My current query works for a single record, with the times for that row entered manually:
SELECT COUNT(*)
FROM tbl
WHERE Enter_Date_Time >= '2020-09-02 6:00:00 AM'
AND Exit_Date_Time <= '2020-09-02 6:00:00 AM'
I need this operation to run for every row in the data set, with the time in the WHERE clause being that row's enter time.
Your expertise is greatly appreciated!
One option uses a correlated subquery:
select
t.*,
(
select 1 + count(*)
from mytable t1
where t1.enter_date_time < t.enter_date_time and t1.exit_date_time > t.enter_date_time
) queue_length
from mytable t
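The correlated subquery's logic can be sketched in Python on the sample input above (times converted to 24-hour datetimes). Like the subquery, it compares every row against every other row (quadratic work). Because both comparisons are strict, records that entered at exactly the same time (records 1 and 2) don't count each other, so some values differ from the expected table in the question.

```python
from datetime import datetime


def queue_lengths(records):
    """1 + number of records that entered strictly before this record's enter
    time and exit strictly after it (the subquery's WHERE condition)."""
    result = {}
    for rid, enter, _exit in records:
        ahead = sum(1 for _, e1, x1 in records if e1 < enter and x1 > enter)
        result[rid] = 1 + ahead
    return result


def dt(month, day, hour):
    return datetime(2020, month, day, hour)


records = [
    (1, dt(9, 1, 6), dt(9, 2, 7)),
    (2, dt(9, 1, 6), dt(9, 2, 8)),
    (3, dt(9, 3, 4), dt(9, 3, 15)),
    (4, dt(9, 2, 4), dt(9, 4, 6)),
    (5, dt(9, 2, 6), dt(9, 2, 8)),
    (6, dt(9, 5, 6), dt(9, 7, 19)),
    (7, dt(9, 7, 3), dt(9, 7, 9)),
    (8, dt(9, 7, 6), dt(9, 8, 8)),
    (9, dt(9, 8, 6), dt(9, 8, 21)),
    (10, dt(9, 8, 4), dt(9, 9, 6)),
]
lengths = queue_lengths(records)
print(lengths)
```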
The data passed to my SSRS report contains the cumulative shift output, broken down by hour. To show the actual output for each hour, I have to subtract the previous column's value:
Current situation:
07:00 08:00 09:00 10:00 11:00 12:00 13:00 14:00 ...
Line 1 25 55 79 105 130 155 185 205 ...
Expectation:
07:00 08:00 09:00 10:00 11:00 12:00 13:00 14:00 ...
Line 1 25 30 24 26 25 25 30 20 ...
In the column value for the Group Hour I have:
=Fields!HourTotal.Value
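The arithmetic being asked for is a running-total difference: each hour's output is its cumulative value minus the previous hour's, with the first hour taken as-is. The Python sketch below shows only that calculation on the Line 1 numbers; it is not an SSRS expression, and hourly_from_cumulative is a hypothetical helper name.

```python
def hourly_from_cumulative(totals):
    """Per-hour output from running totals: first value as-is, then differences."""
    return [cur - prev for prev, cur in zip([0] + totals[:-1], totals)]


line1 = [25, 55, 79, 105, 130, 155, 185, 205]
print(hourly_from_cumulative(line1))  # [25, 30, 24, 26, 25, 25, 30, 20]
```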
I have the following table structure in MySQL:
table_id no_people booking_date bookingend_time bookingstart_time
14 2 2014-10-31 2014-10-31 13:30:00 2014-10-31 11:00:00
5 4 2014-10-31 2014-10-31 16:30:00 2014-10-31 14:30:00
6 2 2014-10-31 2014-10-31 17:00:00 2014-10-31 16:00:00
2 4 2014-11-06 2014-11-06 12:30:00 2014-11-06 10:00:00
2 4 2014-10-31 2014-10-31 16:00:00 2014-10-31 14:00:00
3 4 2014-11-01 2014-11-01 09:00:00 2014-11-01 07:30:00
6 2 2014-11-01 2014-11-01 10:00:00 2014-11-01 07:30:00
2 4 2014-11-03 2014-11-03 10:30:00 2014-11-03 08:30:00
5 4 2014-11-04 2014-11-04 10:30:00 2014-11-04 08:30:00
3 4 2014-11-05 2014-11-05 09:30:00 2014-11-05 07:30:00
14 2 2014-11-05 2014-11-05 09:30:00 2014-11-05 07:30:00
I want to retrieve table_id values using a 30-minute interval between the start and end times.
For example:
If I give a booking start time of 10:30 and an end time of 12:30, I should get 14 as a row.
Similarly, it should check all rows and return those between the two times.
My query so far
SELECT `table_id` FROM `booking` WHERE bookingstart_time>='2014-10-31 10:30:00' AND bookingend_time<='2014-10-31 11:30:00'
Step 1: expand the input time frame by 30 minutes before and 30 minutes after. DATE_ADD() and DATE_SUB() can do that:
DATE_SUB(_input_start_date_here_, INTERVAL 30 MINUTE)
Step 2: rethink your problem in terms of start and end times. Here are the possible cases:
if the booking started during the (expanded) period, then you want this booking in your result
or if the booking started before the period, then you want this booking unless it also ended before the period
on the other hand, if the booking started after the period, then you do not want this booking
The first situation above could be expressed like this:
WHERE bookingstart_time >= DATE_SUB(_input_start_date_here_, INTERVAL 30 MINUTE)
AND bookingstart_time <= DATE_ADD(_input_end_date_here_, INTERVAL 30 MINUTE)
The second condition is left as an exercise. You can also rewrite the above with a more elegant BETWEEN operator.
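Taken together, the three cases reduce to a standard overlap test: a booking qualifies when it starts no later than the expanded end and ends no earlier than the expanded start. In SQL that would read roughly `WHERE bookingstart_time <= DATE_ADD(_input_end_date_here_, INTERVAL 30 MINUTE) AND bookingend_time >= DATE_SUB(_input_start_date_here_, INTERVAL 30 MINUTE)`. Below is a Python sketch of the same condition on the sample rows; tables_near and margin are hypothetical names.

```python
from datetime import datetime, timedelta


def tables_near(bookings, start, end, margin=timedelta(minutes=30)):
    """table_ids whose booking overlaps [start - margin, end + margin].

    Case 1: the booking starts inside the expanded window.
    Case 2: it starts before the window but does not end before it.
    Both together: booking_start <= window_end AND booking_end >= window_start.
    """
    window_start, window_end = start - margin, end + margin
    return [tid for tid, b_start, b_end in bookings
            if b_start <= window_end and b_end >= window_start]


def dt(month, day, hour, minute=0):
    return datetime(2014, month, day, hour, minute)


bookings = [
    (14, dt(10, 31, 11), dt(10, 31, 13, 30)),
    (5, dt(10, 31, 14, 30), dt(10, 31, 16, 30)),
    (6, dt(10, 31, 16), dt(10, 31, 17)),
    (2, dt(11, 6, 10), dt(11, 6, 12, 30)),
    (2, dt(10, 31, 14), dt(10, 31, 16)),
    (3, dt(11, 1, 7, 30), dt(11, 1, 9)),
    (6, dt(11, 1, 7, 30), dt(11, 1, 10)),
    (2, dt(11, 3, 8, 30), dt(11, 3, 10, 30)),
    (5, dt(11, 4, 8, 30), dt(11, 4, 10, 30)),
    (3, dt(11, 5, 7, 30), dt(11, 5, 9, 30)),
    (14, dt(11, 5, 7, 30), dt(11, 5, 9, 30)),
]
print(tables_near(bookings, dt(10, 31, 10, 30), dt(10, 31, 12, 30)))  # [14]
```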
SELECT restaurant_table FROM rest_restaurantbooking WHERE TIMESTAMPDIFF(SECOND, bookingstart_time, bookingend_time) > 1800;