How to calculate time outside of work hours - mysql

This seemed pretty straightforward initially, but it has proved to be a real headache. Below are my table, data, expected output, and an SQL Fiddle showing how far I have got in solving the problem.
Schema & Data:
CREATE TABLE IF NOT EXISTS `meetings` (
`id` int(6) unsigned NOT NULL,
`user_id` int(6) NOT NULL,
`start_time` DATETIME,
`end_time` DATETIME,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `meetings` (`id`, `user_id`, `start_time`, `end_time`) VALUES
('0', '1', '2018-05-09 04:30:00', '2018-05-09 17:30:00'),
('1', '1', '2018-05-10 06:30:00', '2018-05-10 17:30:00'),
('2', '1', '2018-05-10 12:30:00', '2018-05-10 16:00:00'),
('3', '1', '2018-05-11 17:00:00', '2018-05-12 11:00:00'),
('4', '2', '2018-05-11 07:00:00', '2018-05-12 11:00:00'),
('5', '2', '2018-05-11 04:30:00', '2018-05-11 15:00:00');
What I would like to get from the above is total time worked outside of 09:00 to 17:00, grouped by day and user_id. So the result from the above data would look like:
date | user_id | overtime_hours
---------------------------------------
2018-05-09 | 1 | 05:00:00
2018-05-10 | 1 | 03:00:00
2018-05-11 | 1 | 07:00:00
2018-05-12 | 1 | 09:00:00
2018-05-11 | 2 | 13:30:00
2018-05-12 | 2 | 09:00:00
As you can see the expected results are only summing overtime for each day and user for those hours outside of 9 to 5.
Below is the query and SQL Fiddle of where I am. The main issue comes when the start and end times straddle midnight (or multiple midnights).
SELECT
SEC_TO_TIME(SUM(TIME_TO_SEC(TIME(end_time)) - TIME_TO_SEC(TIME(start_time)))), user_id, DATE(start_time)
FROM
(SELECT
start_time, CASE WHEN TIME(end_time) > '09:00:00' THEN DATE_ADD(DATE(end_time), INTERVAL 9 HOUR) ELSE end_time END AS end_time, user_id
FROM
meetings
WHERE
TIME(start_time) < '09:00:00'
UNION
SELECT
CASE WHEN TIME(start_time) < '17:00:00' THEN DATE_ADD(DATE(start_time), INTERVAL 17 HOUR) ELSE start_time END AS start_time, end_time, user_id
FROM
meetings
WHERE
TIME(end_time) > '17:00:00') AS clamped_times
GROUP BY user_id, DATE(start_time)
http://sqlfiddle.com/#!9/77bc85/1
Pastebin for when the fiddle decides to flake: https://pastebin.com/1YvLaKbT
As you can see, the query handles the easy case where a meeting starts and ends on the same day, but it does not work for meetings that span multiple days.

If a meeting can span n days and you want to compute the hours day by day within that meeting, a number generator table is a natural fit.
(SELECT 0 AS gap UNION ALL SELECT 1 UNION ALL SELECT 2) AS ngen
We will use the number generator table to produce a separate row for each individual date from start_time to end_time. For this case, I have assumed that it is unlikely that a meeting will span more than 2 days. If meetings can span more days, you can easily extend the range by adding more UNION ALL SELECT 3 .. to the ngen Derived Table.
Based on this, we will determine "start time" and "end time" to consider for a specific "work date" in an ongoing meeting. This calculation is being done in a Derived Table, for a grouping of user_id and "work date".
Afterwards, we can SUM() up "working hours" per day of a user using some maths. Please find the query below. I have added extensive comments to it; do let me know if anything is still unclear.
Demo on DB Fiddle
Query #1
SELECT
dt.user_id,
dt.wd AS date,
SEC_TO_TIME(SUM(
CASE
/*When both start & end times are less than 9am OR more than 5pm*/
WHEN (st < TIME_TO_SEC('09:00:00') AND et < TIME_TO_SEC('09:00:00')) OR
(st > TIME_TO_SEC('17:00:00') AND et > TIME_TO_SEC('17:00:00'))
THEN et - st /* straightforward difference between the two times */
/* at least one of the times is in the 9am-5pm block, OR,
start < 9 am and end > 5pm.
GREATEST(0, ...) keeps only the positive part of each difference */
ELSE GREATEST(0, TIME_TO_SEC('09:00:00') - st) +
GREATEST(0, et - TIME_TO_SEC('17:00:00'))
END
)) AS working_hours
FROM
(
SELECT
m.user_id,
/* Specific work date */
DATE(m.start_time) + INTERVAL ngen.gap DAY AS wd,
/* Start time to consider for this work date */
/* If the work date is on the same date as the actual start time
we consider this time */
CASE WHEN DATE(m.start_time) + INTERVAL ngen.gap DAY = DATE(m.start_time)
THEN TIME_TO_SEC(TIME(m.start_time))
/* We are on the days after the start day */
ELSE 0 /* 0 seconds (start of the day) */
END AS st,
/* End time to consider for this work date */
/* If the work date is on the same date as the actual end time
we consider this time */
CASE WHEN DATE(m.start_time) + INTERVAL ngen.gap DAY = DATE(m.end_time)
THEN TIME_TO_SEC(TIME(m.end_time))
/* More days to come still for this meeting,
we consider the end of this day as end time */
ELSE 86400 /* 24 hours * 3600 seconds (end of the day) */
END AS et
FROM meetings AS m
JOIN (SELECT 0 AS gap UNION ALL SELECT 1 UNION ALL SELECT 2) AS ngen
ON DATE(start_time) + INTERVAL ngen.gap DAY <= DATE(end_time)
) AS dt
GROUP BY dt.user_id, dt.wd;
Result
| user_id | date | working_hours |
| ------- | ---------- | ------------- |
| 1 | 2018-05-09 | 05:00:00 |
| 1 | 2018-05-10 | 03:00:00 |
| 1 | 2018-05-11 | 07:00:00 |
| 1 | 2018-05-12 | 09:00:00 |
| 2 | 2018-05-11 | 13:30:00 |
| 2 | 2018-05-12 | 09:00:00 |
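As a cross-check outside the database, the same day-splitting idea can be sketched in Python. This is only an illustration under stated assumptions: the function name overtime_by_day, the MEETINGS list and the (user_id, start, end) tuple layout are made up for this sketch and are not part of the query above. The 09:00 to 17:00 window and the sample rows are taken from the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WORK_START = 9 * 3600   # 09:00 as seconds since midnight
WORK_END = 17 * 3600    # 17:00 as seconds since midnight


def overtime_by_day(meetings):
    """meetings: iterable of (user_id, start, end) datetimes (hypothetical layout).
    Returns {(date_string, user_id): overtime_seconds}."""
    totals = defaultdict(int)
    for user_id, start, end in meetings:
        day = start.date()
        while day <= end.date():
            midnight = datetime.combine(day, datetime.min.time())
            # Clamp the meeting to this calendar day (seconds since midnight).
            st = max(0, int((start - midnight).total_seconds()))
            et = min(86400, int((end - midnight).total_seconds()))
            # Overtime is the part before 09:00 plus the part after 17:00.
            before = max(0, min(et, WORK_START) - st)
            after = max(0, et - max(st, WORK_END))
            totals[(day.isoformat(), user_id)] += before + after
            day += timedelta(days=1)
    return dict(totals)


def dt(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# Sample rows copied from the question's INSERT statement.
MEETINGS = [
    (1, dt("2018-05-09 04:30:00"), dt("2018-05-09 17:30:00")),
    (1, dt("2018-05-10 06:30:00"), dt("2018-05-10 17:30:00")),
    (1, dt("2018-05-10 12:30:00"), dt("2018-05-10 16:00:00")),
    (1, dt("2018-05-11 17:00:00"), dt("2018-05-12 11:00:00")),
    (2, dt("2018-05-11 07:00:00"), dt("2018-05-12 11:00:00")),
    (2, dt("2018-05-11 04:30:00"), dt("2018-05-11 15:00:00")),
]

if __name__ == "__main__":
    for (day, user_id), seconds in sorted(overtime_by_day(MEETINGS).items()):
        print(day, user_id, timedelta(seconds=seconds))
```

Unlike the ngen table, the loop here walks as many days as the meeting needs, so there is no fixed upper bound on the span.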
Further Optimization Possibilities:
This query can do away with the usage of subquery (Derived Table) very easily. I just wrote it in this way, to convey the mathematics and process in a followable manner. However, you can easily merge the two SELECT blocks to a single query.
Further optimization may be possible in the usage of Date/Time functions, as well as further simplification of the mathematics. Function details are available at: https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html
Some date calculations are done multiple times, e.g., DATE(m.start_time) + INTERVAL ngen.gap DAY. To avoid recalculation, we can utilize User-defined variables, which will also make the query less verbose.
Make this JOIN condition sargable: JOIN .. ON DATE(start_time) + INTERVAL ngen.gap DAY <= DATE(end_time)

Related

Get the last row that was left outside of a timeframed SQL query

Is there a way to write a SQL query with a time-frame condition to include the latest row that is outside of the time-frame (besides solutions like counting the size of the result set and querying for the size+1, etc.) ?
Let's say I have a table A, which holds timestamped value changes.
I want to query for all the changes in the last 24 hours (assume that the query is run at 2019-08-08 00:00:00). How do I include the last row that isn't included in the 24-hour interval, i.e., row #2 (assuming I don't know when it occurred):
CREATE TABLE A(`timeframe` datetime, `value` int);
INSERT INTO A
(`timeframe`, `value`)
VALUES
('2019-06-08 18:00:00', 10),
('2019-06-09 02:00:00', 20),
('2019-07-08 17:00:00', 50),
('2019-07-08 19:00:00', 10),
('2019-07-09 01:35:00', 30),
('2019-07-09 02:00:00', 40);
| timeframe | value |
|------------------|-------|
| 2019-08-06 15:00 | 10 |
| 2019-08-06 23:00 | 20 |
| 2019-08-07 14:00 | 50 |
| 2019-08-07 16:00 | 10 |
| 2019-08-07 22:35 | 30 |
| 2019-08-07 23:00 | 40 |
SELECT value
, timeframe
FROM A
WHERE timeframe >= DATE_SUB(NOW(), INTERVAL 1 DAY)
The result set should include the value changes in last day (rows #3-#6) and the latest row outside the timeframe - row #2.
I'm looking for a generic solution, as the time-frame can change.
Please try the script below. The first part returns the records from the last 24 hours, and the second part returns the latest row from before that window. The UNION ALL of the two should give your expected output:
SELECT value,timeframe
FROM A
WHERE timeframe >= DATE_SUB(NOW(), INTERVAL 1 DAY)
UNION ALL
(
SELECT value,timeframe
FROM A
WHERE timeframe < DATE_SUB(NOW(), INTERVAL 1 DAY)
ORDER BY timeframe DESC
LIMIT 1
);
Assuming you have older rows you can do:
SELECT a.value, a.timeframe
FROM A a
WHERE a.timeframe >= (SELECT MAX(a2.timeframe)
FROM A a2
WHERE a2.timeframe < DATE_SUB(NOW(), INTERVAL 1 DAY)
);
If there are no older rows, you have a bit of a challenge dealing with NULL. This is one place where ALL is handy:
SELECT a.value, a.timeframe
FROM A a
WHERE a.timeframe >= ALL (SELECT a2.timeframe
FROM A a2
WHERE a2.timeframe < DATE_SUB(NOW(), INTERVAL 1 DAY)
);
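To make the intended result concrete, here is a small Python sketch of the same selection rule: everything at or after the cutoff, plus the single latest row before it. The function name window_with_previous and the ROWS list are assumptions made for this illustration; the values are the ones shown in the question's table, with the cutoff 24 hours before 2019-08-08 00:00:00.

```python
from datetime import datetime


def window_with_previous(rows, cutoff):
    """rows: list of (timeframe, value) tuples (hypothetical layout).
    Returns rows at or after cutoff plus the latest row before it, sorted by time."""
    older = [r for r in rows if r[0] < cutoff]
    recent = [r for r in rows if r[0] >= cutoff]
    # If nothing is older than the cutoff, only the in-window rows come back.
    previous = [max(older, key=lambda r: r[0])] if older else []
    return sorted(previous + recent)


# Values taken from the table in the question.
ROWS = [
    (datetime(2019, 8, 6, 15, 0), 10),
    (datetime(2019, 8, 6, 23, 0), 20),
    (datetime(2019, 8, 7, 14, 0), 50),
    (datetime(2019, 8, 7, 16, 0), 10),
    (datetime(2019, 8, 7, 22, 35), 30),
    (datetime(2019, 8, 7, 23, 0), 40),
]

if __name__ == "__main__":
    for timeframe, value in window_with_previous(ROWS, datetime(2019, 8, 7)):
        print(timeframe, value)
```

The empty-older case mirrors the NULL problem mentioned above: with no earlier rows, the sketch simply returns the in-window rows, which is the behaviour the ALL variant gives.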

Count active users using login timestamp in MySQL

While preparing for an interview, I have come across an SQL question and I hope to get some insight as to how to better answer it.
Given timestamps, userid, how to determine the number of users who are active everyday in a week?
There's very little to it, but that's the question in front of me.
I'm going to demonstrate such an idea based on what makes most sense to me and the way I would reply if the question was presented same as here:
First, let's assume a data set as such, we will name the table logins:
+---------+---------------------+
| user_id | login_timestamp |
+---------+---------------------+
| 1 | 2015-09-29 14:05:05 |
| 2 | 2015-09-29 14:05:08 |
| 1 | 2015-09-29 14:05:12 |
| 4 | 2015-09-22 14:05:18 |
| ... | ... |
+---------+---------------------+
There may be other columns, but we don't mind those.
First of all we should determine the boundaries of that week; for that we can use ADDDATE(). The idea is that today's date minus today's offset from Sunday is Sunday's date. MySQL's DAYOFWEEK() returns 1 for Sunday through 7 for Saturday, so the offset is DAYOFWEEK() - 1.
For instance: if today is Wednesday the 10th, DAYOFWEEK() is 4, the offset is 3, and 10 - 3 = 7, so we can expect Sunday to be the 7th.
We can get WeekStart and WeekEnd timestamps this way:
SELECT
DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 1-DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 00:00:00") WeekStart,
DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 7-DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 23:59:59") WeekEnd;
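The date arithmetic can be checked quickly in Python. This is just a sketch: mysql_dayofweek and week_bounds are names invented here to mimic the DAYOFWEEK() numbering (1 = Sunday through 7 = Saturday), and 2015-06-10 is picked only because it happens to be a Wednesday the 10th.

```python
from datetime import date, timedelta


def mysql_dayofweek(d):
    """Mimic MySQL DAYOFWEEK(): 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    return d.isoweekday() % 7 + 1


def week_bounds(d):
    """Return (sunday, saturday) of the Sunday-based week containing d."""
    dow = mysql_dayofweek(d)
    return d + timedelta(days=1 - dow), d + timedelta(days=7 - dow)


if __name__ == "__main__":
    start, end = week_bounds(date(2015, 6, 10))  # a Wednesday
    print(start, end)
```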
Note: in PostgreSQL there's a DATE_TRUNC() function which returns the beginning of a specified time unit, given a date, such as week start, month, hour, and so on. But that's not available in MySQL.
Next, let's use WeekStart and WeekEnd to slice our data set. In this example I'll just show how to filter, using hard-coded dates:
SELECT *
FROM `logins`
WHERE login_timestamp BETWEEN '2015-09-29 14:05:07' AND '2015-09-29 14:05:13'
This should return our data set sliced, with only relevant results:
+---------+---------------------+
| user_id | login_timestamp |
+---------+---------------------+
| 2 | 2015-09-29 14:05:08 |
| 1 | 2015-09-29 14:05:12 |
+---------+---------------------+
We can then reduce our result set to only the user_ids, filter out duplicates, and count them, this way:
SELECT COUNT(DISTINCT user_id)
FROM `logins`
WHERE login_timestamp BETWEEN '2015-09-29 14:05:07' AND '2015-09-29 14:05:13'
DISTINCT will filter out duplicates, and count will return just the amount.
Combined, this becomes:
SELECT COUNT(DISTINCT user_id)
FROM `logins`
WHERE login_timestamp
BETWEEN DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 1- DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 00:00:00")
AND DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 7- DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 23:59:59")
Replace CURDATE() with any timestamp in order to get that week's user login count.
But I need to break this down to days, I hear you cry. Of course! And this is how:
First, let's translate our over-informative timestamps to just the date. We add DISTINCT because we don't mind the same user logging in twice on the same day. We count users, not logins, right? (Note that we step back here and drop the week filter for now.)
SELECT DISTINCT user_id, DATE_FORMAT(login_timestamp, "%Y-%m-%d")
FROM `logins`
This yields:
+---------+-----------------+
| user_id | login_timestamp |
+---------+-----------------+
| 1 | 2015-09-29 |
| 2 | 2015-09-29 |
| 4 | 2015-09-22 |
| ... | ... |
+---------+-----------------+
This query, we will wrap with a second, in order to count appearances of every date:
SELECT `login_timestamp`, count(*) AS 'count'
FROM (SELECT DISTINCT user_id, DATE_FORMAT(login_timestamp, "%Y-%m-%d") AS `login_timestamp` FROM `logins`) `loginsMod`
GROUP BY `login_timestamp`
We use count and a grouping in order to get the list by date, which returns:
+-----------------+-------+
| login_timestamp | count |
+-----------------+-------+
| 2015-09-29 | 2 |
| 2015-09-22 | 1 |
+-----------------+-------+
And after all the hard work, both combined:
SELECT `login_timestamp`, COUNT(*)
FROM (
SELECT DISTINCT user_id, DATE_FORMAT(login_timestamp, "%Y-%m-%d") AS `login_timestamp`
FROM `logins`
WHERE login_timestamp BETWEEN DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 1- DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 00:00:00") AND DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 7- DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 23:59:59")) `loginsMod`
GROUP BY `login_timestamp`;
This gives you a per-day breakdown of distinct users who logged in during this week. Again, replace CURDATE() to get a different week.
As for the users themselves who logged in, let's combine the same stuff in a different order:
SELECT `user_id`
FROM (
SELECT `user_id`, COUNT(*) AS `login_count`
FROM (
SELECT DISTINCT `user_id`, DATE_FORMAT(`login_timestamp`, "%Y-%m-%d")
FROM `logins`) `logins`
GROUP BY `user_id`) `logincounts`
WHERE `login_count` > 6
I have two inner queries, the first is logins:
SELECT DISTINCT `user_id`, DATE_FORMAT(`login_timestamp`, "%Y-%m-%d")
FROM `logins`
This provides the list of users and the days on which they logged in, without duplicates.
Then we have logincounts:
SELECT `user_id`, COUNT(*) AS `login_count`
FROM `logins` -- See previous subquery.
GROUP BY `user_id`
This returns one row per user, with a count of the distinct days on which that user logged in.
And lastly:
SELECT user_id
FROM logincounts -- See previous subquery.
WHERE login_count > 6
This filters out those who didn't log in on 7 distinct days and keeps only the user_id column.
This kinda got long, but I think it's rife with ideas and I think it may definitely help answering in an interesting way in a work interview. :)
create table fbuser(id integer, date date);
insert into fbuser(id,date)values(1,'2012-01-01');
insert into fbuser(id,date)values(1,'2012-01-02');
insert into fbuser(id,date)values(1,'2012-01-01');
insert into fbuser(id,date)values(1,'2012-01-01');
insert into fbuser(id,date)values(1,'2012-01-01');
insert into fbuser(id,date)values(1,'2012-01-01');
insert into fbuser(id,date)values(1,'2012-01-02');
insert into fbuser(id,date)values(1,'2012-01-03');
insert into fbuser(id,date)values(1,'2012-01-04');
insert into fbuser(id,date)values(1,'2012-01-05');
insert into fbuser(id,date)values(1,'2012-01-06');
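insert into fbuser(id,date)values(1,'2012-01-07');
insert into fbuser(id,date)values(2,'2012-01-07');
insert into fbuser(id,date)values(3,'2012-01-07');
insert into fbuser(id,date)values(4,'2012-01-07');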
insert into fbuser(id,date)values(4,'2012-01-08');
insert into fbuser(id,date)values(4,'2012-01-08');
insert into fbuser(id,date)values(1,'2012-01-08');
insert into fbuser(id,date)values(1,'2012-01-09');
select * from fbuser;
id | date
----+------------
1 | 2012-01-01
1 | 2012-01-02
1 | 2012-01-01
1 | 2012-01-01
1 | 2012-01-01
1 | 2012-01-01
1 | 2012-01-02
1 | 2012-01-03
1 | 2012-01-04
1 | 2012-01-05
1 | 2012-01-06
1 | 2012-01-07
2 | 2012-01-07
3 | 2012-01-07
4 | 2012-01-07
4 | 2012-01-08
4 | 2012-01-08
1 | 2012-01-08
1 | 2012-01-09
select id,count(DISTINCT date) from fbuser
where date BETWEEN '2012-01-01' and '2012-01-07'
group by id having count(DISTINCT date)=7
id | count
----+-------
1 | 7
(1 row)
Query counts unique dates logged in by user for the given period and returns id with 7 occurrences. If you have time also in your date you can use date_format.
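The same "distinct dates per user equals 7" check can be sketched in Python with sets. The function name active_every_day and the LOGINS list are assumptions for this illustration; the rows reproduce the fbuser output shown above.

```python
from datetime import date, timedelta


def active_every_day(logins, first_day, days=7):
    """logins: iterable of (user_id, date) pairs (hypothetical layout).
    Returns the sorted ids of users seen on every day of the period."""
    required = {first_day + timedelta(days=i) for i in range(days)}
    seen = {}
    for user_id, day in logins:
        if day in required:
            # A set per user ignores repeated logins on the same day.
            seen.setdefault(user_id, set()).add(day)
    return sorted(uid for uid, user_days in seen.items() if user_days == required)


# Rows matching the fbuser listing above (duplicates included on purpose).
LOGINS = (
    [(1, date(2012, 1, 1))] * 5
    + [(1, date(2012, 1, 2))] * 2
    + [(1, date(2012, 1, d)) for d in range(3, 10)]
    + [(2, date(2012, 1, 7)), (3, date(2012, 1, 7)), (4, date(2012, 1, 7))]
    + [(4, date(2012, 1, 8))] * 2
)

if __name__ == "__main__":
    print(active_every_day(LOGINS, date(2012, 1, 1)))
```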
With given data of: userid and timestamp; How does one calculate the number of "active users" on each day in a week?
The problem of course is that there might be no logins at all, or none on certain days in a week, so the basic solution to such a requirement is that you must have a series of dates to compare the logins against.
There are a wide variety of ways to generate the dates of a week and the method one chooses would depend on 2 main factors:
How often do I need these (or similar) results?
Which platform am I using? (For example, it is very easy to "generate a series" using Postgres, but MySQL does not offer such a feature, whereas MariaDB has recently introduced series tables to help solve such needs. So knowing your platform's capabilities will affect how you solve this.)
If I need to do this regularly (which I assume will be true), then I would create a "calendar table" of one row per day for a reasonably extensive period (say 10 years), which is only approx 3652 rows, with its primary key as the date column. In this table we can also store the "week_number" using the week() function, which makes week-by-week reporting simpler (and we can add other columns in this table as well).
So, assuming I have built the calendar table containing each date and a week number then we can take the week number from today's date, subtract 1, and gather the needed login data like this:
select
c.caldate, count(distinct l.userid) as user_logins
from calendar_table as c
left join login_table l on l.timestamp >= c.caldate and l.timestamp < date_add(c.caldate,INTERVAL 1 DAY)
where c.week_number = WEEK(curdate())-1
group by c.caldate
How did I create the calendar table?
Well as said earlier there are a variety of methods, and for MySQL there are options available here: How to populate a table with a range of dates?
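To illustrate why the date series matters, here is a small Python sketch in which the week's dates are generated first and logins are then matched against them, so days without logins still appear with a count of 0. The names daily_active_users and LOGINS are assumptions for this sketch; the sample rows come from the logins table shown earlier on this page, and 2015-09-27 is the Sunday of that week.

```python
from datetime import date, datetime, timedelta


def daily_active_users(logins, week_start):
    """logins: iterable of (user_id, datetime) pairs (hypothetical layout).
    Returns {date: distinct user count} for the 7 days from week_start,
    including days on which nobody logged in."""
    users_by_day = {week_start + timedelta(days=i): set() for i in range(7)}
    for user_id, ts in logins:
        day = ts.date()
        if day in users_by_day:  # logins outside the week are ignored
            users_by_day[day].add(user_id)
    return {day: len(users) for day, users in users_by_day.items()}


LOGINS = [
    (1, datetime(2015, 9, 29, 14, 5, 5)),
    (2, datetime(2015, 9, 29, 14, 5, 8)),
    (1, datetime(2015, 9, 29, 14, 5, 12)),
    (4, datetime(2015, 9, 22, 14, 5, 18)),
]

if __name__ == "__main__":
    for day, count in daily_active_users(LOGINS, date(2015, 9, 27)).items():
        print(day, count)
```

The pre-filled dictionary plays the role of the calendar table, and the membership test plays the role of the LEFT JOIN condition.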
I tried this in Teradata and here is the SQL. First, get the User unique to a date, then check, if the user is present for 7 days.
SELECT src.USER_ID
,COUNT(*) CNT
FROM (SELECT USER_ID
,CAST(LOGIN_TIMESTAMP AS DATE FORMAT 'YYYY-MM-DD') AS LOGIN_DT
FROM src_table
WHERE LOGIN_TIMESTAMP BETWEEN '2017-11-12 00:00:00' AND '2017-11-18 23:59:59'
GROUP BY 1,2
)src GROUP BY 1 HAVING CNT = 7;
INSERT INTO src_table VALUES (1,'2017-11-12 10:10:10');
INSERT INTO src_table VALUES (1,'2017-11-13 10:10:10');
INSERT INTO src_table VALUES (1,'2017-11-13 11:10:10');
INSERT INTO src_table VALUES (1,'2017-11-13 12:10:10');
INSERT INTO src_table VALUES (1,'2017-11-14 10:10:10');
INSERT INTO src_table VALUES (1,'2017-11-15 10:10:10');
INSERT INTO src_table VALUES (1,'2017-11-16 10:10:10');
INSERT INTO src_table VALUES (1,'2017-11-17 10:10:10');
INSERT INTO src_table VALUES (1,'2017-11-18 10:10:10');
INSERT INTO src_table VALUES (2,'2017-11-12 01:10:10');
INSERT INTO src_table VALUES (2,'2017-11-13 13:10:10');
INSERT INTO src_table VALUES (2,'2017-11-14 14:10:10');
INSERT INTO src_table VALUES (2,'2017-11-15 12:10:10');
INSERT INTO src_table VALUES (5,'2017-11-12 01:10:10');
INSERT INTO src_table VALUES (5,'2017-11-13 02:10:10');
INSERT INTO src_table VALUES (5,'2017-11-14 03:10:10');
INSERT INTO src_table VALUES (5,'2017-11-15 04:10:10');
INSERT INTO src_table VALUES (5,'2017-11-16 05:10:10');
INSERT INTO src_table VALUES (5,'2017-11-17 06:10:10');
INSERT INTO src_table VALUES (8,'2017-11-12 04:10:10');
INSERT INTO src_table VALUES (8,'2017-11-13 05:10:10');
INSERT INTO src_table VALUES (8,'2017-11-14 06:10:10');
INSERT INTO src_table VALUES (8,'2017-11-15 01:10:10');
INSERT INTO src_table VALUES (8,'2017-11-16 02:10:10');
INSERT INTO src_table VALUES (8,'2017-11-17 03:10:10');
INSERT INTO src_table VALUES (8,'2017-11-18 03:10:10');
This works for me
select a.user_id, count(a.user_id) as active_time_in_days
from
(
select user_id, login_time, lead(login_time) over (partition by user_id order by login_time asc ) as next_day
from dev.login_info
group by 1,2
order by user_id, login_time asc
)a where a.login_time + interval '1 day' = next_day
group by 1;
How about this? I tried it and it works.
select yearweek(ts) as yearwk, user_id,
count(distinct date(ts)) as active_days
from log
group by 1,2
having count(distinct date(ts)) = 7;

MYSQL - Sum Interval Dates

I came across the following problem:
I would like to sum the hours for each name, giving the total time covered between the START and END of their activities.
This would be simple if I could just subtract the start from the end of each record, but activities can overlap. For example, Mary started one activity at 13:00 that ran until 15:00, and started another at 14:00 that ran until 16:00. I would like her result to be 3 (she used 3 hours of her time to perform both activities).
e.g.:
Name | START | END |
-----------------------------------------------------------
KATE | 2014-01-01 13:00:00 | 2014-01-01 14:00:00 |
MARY | 2014-01-01 13:00:00 | 2014-01-01 15:00:00 |
TOM | 2014-01-01 13:00:00 | 2014-01-01 16:00:00 |
KATE | 2014-01-01 12:00:00 | 2014-01-02 04:00:00 |
MARY | 2014-01-01 14:00:00 | 2014-01-01 16:00:00 |
TOM | 2014-01-01 12:00:00 | 2014-01-01 18:00:00 |
TOM | 2014-01-01 22:00:00 | 2014-01-02 02:00:00 |
result:
KATE 15 hours
MARY 3 hours
TOM 9 hours
Have you tried a group by and then an aggregate function?
SELECT Name, SUM(UNIX_TIMESTAMP(End) - UNIX_TIMESTAMP(Start)) FROM myTable
GROUP BY Name
Which will return a cumulative total of seconds from the intervals you have. You can then change the seconds to hours for display.
Also I would highly recommend grouping by a primary key or something instead of a string name, but I understand that this may have been just to simplify the question.
I found this problem interesting, so spent a little more time to develop a solution. What I came up with involves sorting the rows by name and start time, then using MySQL variables to account for overlapping ranges. I begin by sorting the table and supplementing it with columns that carry the name and times from one row to the next
SELECT [expounded below]
FROM (SELECT * FROM tbl ORDER BY Name, START, END) AS u,
(SELECT @x := 0, @gap := 0, @same_name:='',
@beg := (SELECT MIN(START) FROM tbl),
@end := (SELECT MAX(END) FROM tbl)) AS t
This adds the name and the outer bounds of the time range to each row of the table, as well as sorting the table so that names are together in order by starting time. For each row, we will now have @same_name, @beg, and @end carrying values forward from one line to the next, and @x and @gap will accumulate the hours.
Now we have to do some reasoning about the possible overlaps that can occur. For any two intervals, they are either disjoint or have an intersection:
Non-overlapping: beg--------end START-------END
Overlapping: beg-----------end beg---------end
START--------------END START-----------END
Subset: beg---------------------------------end
START-----END
Once the rows are adjacent, we can decide if two ranges overlap by comparing their start and end points. They overlap if the start of one is before the end of the other and vice versa:
IF( @end >= START && @beg <= END,
If they do overlap, then the total interval is the difference between the outer edges of the two intervals:
TIMESTAMPDIFF(HOUR, LEAST(#beg, START), GREATEST(#end, END))
If they don't overlap, then we can just add the new interval to the previous one.
We will also need to know the gap between intervals, which is the difference from the end of the first to the beginning of the second. This will be necessary to calculate the hours for a case of more than two intervals, where only some overlap.
1-----------2 3----------4
3--------------------5
Putting this together gets us a calculation per row, where each row calculates the union of the hours with the one above it. For each variable, we have to reset it if the name changes:
SELECT Name, START, END,
@x := IF(@same_name = Name,
IF( @end >= START && @beg <= END, -- does it overlap?
TIMESTAMPDIFF(HOUR, LEAST(@beg, START), GREATEST(@end, END)),
@x + TIMESTAMPDIFF(HOUR, START, END) ),
TIMESTAMPDIFF(HOUR,START,END) ) AS hr,
@gap := IF(@same_name = Name,
IF(@end >= START && @beg <= END, -- does it overlap?
@gap,
@gap + TIMESTAMPDIFF(HOUR, @end, START)),
0) AS gap,
@beg := IF(@same_name = Name,
CAST(LEAST(@beg, START) AS DATETIME), -- expand interval
START) AS beg, -- reset interval
@end := IF(@same_name = Name,
CAST(GREATEST(@end, END) AS DATETIME),
END) AS finish,
@same_name := Name AS sameName
FROM
(SELECT * FROM xt ORDER BY Name, START, END) AS u,
(SELECT @x := 0, @gap := 0, @same_name:='', @beg := (SELECT MIN(START) FROM xt), @end := (SELECT MAX(END) FROM xt)) AS t
That still gives us as many rows as there were in the original table. The hours and gaps will accumulate for each name, so we have to select the highest values and group by Name:
SELECT Name, MAX(hr) - MAX(gap) AS HOURS
FROM ( [insert above query here] ) AS intermediateCalculation
GROUP BY Name;
Edit
And of course a moment after hitting enter, it occurs to me that (a) there is a bug for names that have no overlapping intervals at all; and (b) all @x is really doing is building up the interval from MIN(START) to MAX(END) for each name, which could be done with a simpler query and join. Um, exercise for the reader? :-)
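For comparison, the overlap reasoning above (sort by start, extend the running interval while rows overlap, bank it when a gap appears) can be sketched in a few lines of Python. This is a swapped-in illustration of a standard interval-merge, not a translation of the variable-based query; covered_hours and MARY are names assumed for the sketch, and the two rows are Mary's from the question.

```python
from datetime import datetime


def covered_hours(intervals):
    """intervals: list of (start, end) datetimes for one name.
    Returns the number of hours covered by the union of the intervals."""
    total_seconds = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the running interval: bank it and start a new one.
            if cur_end is not None:
                total_seconds += (cur_end - cur_start).total_seconds()
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the running interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total_seconds += (cur_end - cur_start).total_seconds()
    return total_seconds / 3600


# Mary's two activities from the question.
MARY = [
    (datetime(2014, 1, 1, 13), datetime(2014, 1, 1, 15)),
    (datetime(2014, 1, 1, 14), datetime(2014, 1, 1, 16)),
]

if __name__ == "__main__":
    print(covered_hours(MARY))
```

Because disjoint intervals are banked separately, names with no overlapping rows are handled by the same loop.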

most active time of day based on start and end time

I'm logging statistics of the gamers in my community. For both their online and in-game states I'm registering when they "begin" and when they "end". In order to show the most active day and hour of the day I'd like to use an SQL statement that measures the most active moments based on the "begin" and "end" datetime values.
Looking at SQL - select most 'active' time from db I can see similarities, but I need to also include the moments between the start and end time.
Perhaps the easiest way is to write a cron that does the calculations, but I hope this question might teach me how to address this issue in SQL instead.
I've been searching for an SQL statement that lets me create a datetime period and use that to subtract single hours and days, but to no avail.
--- update
As I'm thinking more about this, I'm wondering whether it might be wise to run 24 queries based on each hour of the day (for most active hour) and several queries for the most active day. But that seems like a waste of performance. But this solution might make a query possible like:
SELECT COUNT(`userID`), DATE_FORMAT(started, "%H") AS starthour,
DATE_FORMAT(ended, "%H") AS endhour
FROM gameactivity
WHERE starthour >= $hour
AND endhour <= $hour GROUP BY `userID`
($hour is added for example purposes, of course I'm using PDO. Columns are also just for example purposes, whatever you think is easy for you to use in explaining that is identifiable as start and end is ok with me)
Additional information; PHP 5.5+, PDO, MySQL 5+
Table layout for ingame would be: gameactivity: activityid, userid, gameid, started, ended
DDL:
CREATE TABLE IF NOT EXISTS `steamonlineactivity` (
`activityID` int(13) NOT NULL AUTO_INCREMENT,
`userID` varchar(255) NOT NULL,
`online` datetime DEFAULT NULL,
`offline` datetime DEFAULT NULL,
PRIMARY KEY (`activityID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
If I understood your requirements correctly, if this graph represents user activity:
Day
12/1 12/2 12/3 12/4 ...
Hour 0 xx x x xx
1 x xx xx
2 xxx x x xx
3 x x
4 x x
5 x x
6 x
...
You want to know that 02:00 is the time of the day with the highest average activity (a row with 7 x), and 12/4 was most active day (a column with 10 x). Note that this doesn't imply that 02:00 of 12/4 was the most active hour ever, as you can see in the example. If this is not what you want please clarify with concrete examples of input and desired result.
We make a couple assumptions:
An activity record can start on one date and finish on the next one. For instance: online 2013-12-02 23:35, offline 2013-12-03 00:13.
No activity record has a duration longer than 23 hours, or the number of such records is negligible.
And we need to define what 'activity' means. I picked the criteria that were easier to compute in each case. Both can be made more accurate if needed, at the cost of having more complex queries.
The most active time of day will be the hour with which the most activity records overlap. Note that if a user starts and stops more than once during the hour it will be counted more than once.
The most active day will be the one with the most unique users who were active at any time of the day.
For the most active time of day we'll use a small auxiliary table holding the 24 possible hours. It can also be generated and joined on the fly with the techniques described in other answers.
CREATE TABLE hour ( hour tinyint not null, primary key(hour) );
INSERT hour (hour)
VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)
, (11), (12), (13), (14), (15), (16), (17), (18), (19), (20)
, (21), (22), (23);
Then the following queries give the required results:
SELECT hour, count(*) AS activity
FROM steamonlineactivity, hour
WHERE ( hour BETWEEN hour(online) AND hour(offline)
OR hour(online) BETWEEN hour(offline) AND hour
OR hour(offline) BETWEEN hour AND hour(online) )
GROUP BY hour
ORDER BY activity DESC;
SELECT date, count(DISTINCT userID) AS activity
FROM (
SELECT userID, date(online) AS date
FROM steamonlineactivity
UNION
SELECT userID, date(offline) AS date
FROM steamonlineactivity
) AS x
GROUP BY date
ORDER BY activity DESC;
You need a sequence to get values for hours where there was no activity (e.g. hours where nobody started or finished, but there were people on-line who had started but had not yet finished). Unfortunately there is no nice way to create a sequence in MySQL, so you will have to create the sequence manually:
CREATE TABLE `hour_sequence` (
`ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`hour` datetime NOT NULL,
KEY (`hour`),
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
# this is not great
INSERT INTO `hour_sequence` (`hour`) VALUES
("2013-12-01 00:00:00"),
("2013-12-01 01:00:00"),
("2013-12-01 02:00:00"),
("2013-12-01 03:00:00"),
("2013-12-01 04:00:00"),
("2013-12-01 05:00:00"),
("2013-12-01 06:00:00"),
("2013-12-01 07:00:00"),
("2013-12-01 08:00:00"),
("2013-12-01 09:00:00"),
("2013-12-01 10:00:00"),
("2013-12-01 11:00:00"),
("2013-12-01 12:00:00");
Now create some test data
CREATE TABLE `log_table` (
`ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`userID` bigint(20) unsigned NOT NULL,
`started` datetime NOT NULL,
`finished` datetime NOT NULL,
KEY (`started`),
KEY (`finished`),
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET latin1;
INSERT INTO `log_table` (`userID`,`started`,`finished`) VALUES
(1, "2013-12-01 00:00:12", "2013-12-01 02:25:00"),
(2, "2013-12-01 07:25:00", "2013-12-01 08:23:00"),
(1, "2013-12-01 04:25:00", "2013-12-01 07:23:00");
Now the query - for every hour we keep a tally (accumulation/running total/integral etc) of how many people started a session hour-on-hour
SELECT
HS.hour as period_starting,
COUNT(LT.userID) AS starts
FROM `hour_sequence` HS
LEFT JOIN `log_table` LT ON HS.hour > LT.started
GROUP BY
HS.hour
And also how many people went off-line likewise
SELECT
HS.hour as period_starting,
COUNT(LT.userID) AS finishes
FROM `hour_sequence` HS
LEFT JOIN `log_table` LT ON HS.hour > LT.finished
GROUP BY
HS.hour
By subtracting the accumulation of people that had gone off-line at a point in time from the accumulation of people that have come on-line at that point in time we get the number of people who were on-line at that point in time (presuming there were zero people on-line when the data starts, of course).
SELECT
starts.period_starting,
starts.starts as users_started,
finishes.finishes as users_finished,
starts.starts - finishes.finishes as users_online
FROM
(
SELECT
HS.hour as period_starting,
COUNT(LT.userID) AS starts
FROM `hour_sequence` HS
LEFT JOIN `log_table` LT ON HS.hour > LT.started
GROUP BY
HS.hour
) starts
LEFT JOIN (
SELECT
HS.hour as period_starting,
COUNT(LT.userID) AS finishes
FROM `hour_sequence` HS
LEFT JOIN `log_table` LT ON HS.hour > LT.finished
GROUP BY
HS.hour
) finishes ON starts.period_starting = finishes.period_starting;
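The subtraction of the two running totals can be sanity-checked with a short Python sketch. The names users_online, SESSIONS and HOURS are assumptions for this illustration; the data is the test data inserted above, and the strict "<" comparisons mirror the HS.hour > LT.started and HS.hour > LT.finished join conditions.

```python
from datetime import datetime, timedelta


def users_online(sessions, hours):
    """sessions: list of (started, finished) datetimes; hours: list of datetimes.
    For each hour, returns sessions started before it minus sessions finished before it."""
    result = {}
    for hour in hours:
        starts = sum(1 for started, _ in sessions if started < hour)
        finishes = sum(1 for _, finished in sessions if finished < hour)
        result[hour] = starts - finishes
    return result


# Same rows as the log_table test data.
SESSIONS = [
    (datetime(2013, 12, 1, 0, 0, 12), datetime(2013, 12, 1, 2, 25)),
    (datetime(2013, 12, 1, 7, 25), datetime(2013, 12, 1, 8, 23)),
    (datetime(2013, 12, 1, 4, 25), datetime(2013, 12, 1, 7, 23)),
]
# Same 13 hourly timestamps as hour_sequence.
HOURS = [datetime(2013, 12, 1) + timedelta(hours=h) for h in range(13)]

if __name__ == "__main__":
    for hour, online in users_online(SESSIONS, HOURS).items():
        print(hour, online)
```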
Now a few caveats. First of all, you will need a process to keep your sequence table populated with the hourly timestamps as time progresses. Additionally, the accumulators do not scale well with large amounts of log data because of the inequality join; it would be wise to constrain access to the log table by timestamp in both the starts and finishes subqueries, and the sequence table while you are at it.
SELECT
HS.hour as period_starting,
COUNT(LT.userID) AS finishes
FROM `hour_sequence` HS
LEFT JOIN `log_table` LT ON HS.hour > LT.finished
AND LT.finished BETWEEN ? AND ?
WHERE
HS.hour BETWEEN ? AND ?
GROUP BY
HS.hour
If you start constraining your log_table data to specific time ranges bear in mind you will have an offset issue if, at the point you start looking at the log data, there were already people on-line. If there were 1000 people on-line at the point where you start looking at your log data then you threw them all off the server from the query it would look like we went from 0 people on-line to -1000 people on-line!
@rsanchez had an amazing answer, but the query for the most active time of day behaves oddly with sessions that start and end within the same hour (a short session). The query seems to treat them as lasting 24 hours.
Through trial and error I corrected that part of his query to the following:
SELECT hour, count(*) AS activity
FROM steamonlineactivity, hour
WHERE ( hour >= HOUR(online) AND hour <= HOUR(offline)
OR HOUR(online) > HOUR(offline) AND HOUR(online) <= hour
OR HOUR(offline) >= hour AND HOUR(offline) < HOUR(online) )
GROUP BY hour
ORDER BY activity DESC;
So with the following structure:
CREATE TABLE hour ( hour tinyint not null, primary key(hour) );
INSERT hour (hour)
VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)
, (11), (12), (13), (14), (15), (16), (17), (18), (19), (20)
, (21), (22), (23);
CREATE TABLE `steamonlineactivity` (
`activityID` int(13) NOT NULL AUTO_INCREMENT,
`userID` varchar(255) NOT NULL,
`online` datetime DEFAULT NULL,
`offline` datetime DEFAULT NULL,
PRIMARY KEY (`activityID`)
);
INSERT INTO `steamonlineactivity` (`activityID`, `userID`, `online`, `offline`) VALUES
(1, '1', '2014-01-01 16:01:00', '2014-01-01 19:01:00'),
(2, '2', '2014-01-02 16:01:00', '2014-01-02 19:01:00'),
(3, '3', '2014-01-01 22:01:00', '2014-01-02 02:01:00'),
(4, '4', '2014-01-01 16:01:00', '2014-01-01 16:05:00');
The query at the top, which gets the most active times, outputs the following:
+------+----------+
| hour | activity |
+------+----------+
| 16 | 3 |
| 17 | 2 |
| 18 | 2 |
| 19 | 2 |
| 22 | 1 |
| 23 | 1 |
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
+------+----------+
The original query, by contrast, gives the following erroneous result:
+------+----------+
| hour | activity |
+------+----------+
| 16 | 3 |
| 17 | 3 |
| 18 | 3 |
| 19 | 3 |
| 0 | 2 |
| 1 | 2 |
| 2 | 2 |
| 22 | 2 |
| 23 | 2 |
| 11 | 1 |
| 12 | 1 |
| 13 | 1 |
| 14 | 1 |
| 15 | 1 |
| 3 | 1 |
| 4 | 1 |
| 20 | 1 |
| 5 | 1 |
| 21 | 1 |
| 6 | 1 |
| 7 | 1 |
| 8 | 1 |
| 9 | 1 |
| 10 | 1 |
+------+----------+
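To check the corrected WHERE clause by hand, here is a small Python sketch that applies the same three-part condition to the four sample rows. It is only a model of the query, not a replacement for it; `is_active` is a name I made up.

```python
from datetime import datetime

# Sample rows from steamonlineactivity: (userID, online, offline).
sessions = [
    ("1", datetime(2014, 1, 1, 16, 1), datetime(2014, 1, 1, 19, 1)),
    ("2", datetime(2014, 1, 2, 16, 1), datetime(2014, 1, 2, 19, 1)),
    ("3", datetime(2014, 1, 1, 22, 1), datetime(2014, 1, 2, 2, 1)),
    ("4", datetime(2014, 1, 1, 16, 1), datetime(2014, 1, 1, 16, 5)),
]


def is_active(hour, online, offline):
    on, off = online.hour, offline.hour
    return (
        (on <= hour <= off)            # session does not wrap past midnight
        or (on > off and on <= hour)   # wraps midnight: the part before midnight
        or (off >= hour and off < on)  # wraps midnight: the part after midnight
    )


activity = {}
for hour in range(24):
    count = sum(1 for _, on, off in sessions if is_active(hour, on, off))
    if count:  # the SQL join produces no row for hours without a session
        activity[hour] = count

for hour, count in sorted(activity.items(), key=lambda kv: -kv[1]):
    print(hour, count)
```

The short session (user 4) is counted only for hour 16, and the session that crosses midnight (user 3) is counted for hours 22, 23, 0, 1 and 2, which agrees with the corrected output table.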
This query is for Oracle, but you can get the idea from it:
SELECT
H, M,
COUNT(BEGIN)
FROM
-- temporary table that should return numbers from 0 to 1439
-- each number represents minute of the day, for example 0 represents 0:00, 100 represents 1:40, etc.
-- in oracle you can use CONNECT BY clause which is designated to do recursive queries
(SELECT LEVEL - 1 DAYMIN, FLOOR((LEVEL - 1) / 60) H, MOD((LEVEL - 1), 60) M FROM dual CONNECT BY LEVEL <= 1440) T LEFT JOIN
-- join stats to each row from T by discarding the date and converting the time to minute of the day
STATS S ON 60 * TO_NUMBER(TO_CHAR(S.BEGIN, 'HH24')) + TO_NUMBER(TO_CHAR(S.BEGIN, 'MI')) <= T.DAYMIN AND
60 * TO_NUMBER(TO_CHAR(S.END, 'HH24')) + TO_NUMBER(TO_CHAR(S.END, 'MI')) > T.DAYMIN
GROUP BY H, M
HAVING COUNT(BEGIN) > 0
ORDER BY H, M
Fiddle: http://sqlfiddle.com/#!4/e5e31/9
The idea is to have some temp table or view with one row per time point, and left join to it. In my example there is one row for every minute of the day. In MySQL you can use variables to create such a view on the fly.
MySQL version:
SELECT
FLOOR(T.DAYMIN / 60), -- hour
MOD(T.DAYMIN, 60), -- minute
-- T.DAYMIN, -- minute of the day
COUNT(S.BEGIN) -- count not null stats
FROM
-- temporary table that should return numbers from 0 to 1439
-- each number represents minute of the day, for example 0 represents 0:00, 100 represents 1:40, etc.
-- in mysql you must have some table which has at least 1440 rows;
-- I use (INFORMATION_SCHEMA.COLLATIONSxINFORMATION_SCHEMA.COLLATIONS) for that purpose - it should be
-- in every database
(
SELECT
@counter := @counter + 1 AS DAYMIN
FROM
INFORMATION_SCHEMA.COLLATIONS A CROSS JOIN
INFORMATION_SCHEMA.COLLATIONS B CROSS JOIN
(SELECT @counter := -1) C
LIMIT 1440
) T LEFT JOIN
-- join stats to each row from T by discarding the date and converting the time to minute of the day
STATS S ON (
(60 * DATE_FORMAT(S.BEGIN, '%H')) + (1 * DATE_FORMAT(S.BEGIN, '%i')) <= T.DAYMIN AND
(60 * DATE_FORMAT(S.END, '%H')) + (1 * DATE_FORMAT(S.END, '%i')) > T.DAYMIN
)
GROUP BY T.DAYMIN
HAVING COUNT(S.BEGIN) > 0 -- filter empty counters
ORDER BY T.DAYMIN
Fiddle: http://sqlfiddle.com/#!2/de01c/1
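The minute-of-day join used in both versions can be modelled in a few lines of Python. This is a rough sketch with invented STATS rows; like the queries, it throws the date away and keeps only hour and minute, so it assumes BEGIN and END fall on the same day.

```python
from datetime import datetime

# Hypothetical STATS rows: (BEGIN, END).
stats = [
    (datetime(2014, 1, 1, 10, 0), datetime(2014, 1, 1, 10, 3)),
    (datetime(2014, 1, 2, 10, 1), datetime(2014, 1, 2, 10, 4)),
]


def minute_of_day(ts):
    # 0 represents 0:00, 100 represents 1:40, and so on up to 1439.
    return 60 * ts.hour + ts.minute


counts = {}
for daymin in range(1440):
    n = sum(1 for b, e in stats if minute_of_day(b) <= daymin < minute_of_day(e))
    if n > 0:  # same effect as HAVING COUNT(...) > 0
        counts[daymin] = n

for daymin, n in counts.items():
    print(f"{daymin // 60:02d}:{daymin % 60:02d}", n)
```

The two rows are on different dates but both land on the same minutes of the day, so 10:01 and 10:02 each get a count of 2.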
I've been overthinking this question myself, and based on everyone's answers I think the following conclusions are fair:
In general it's probably easy to implement some kind of separate table that has the hours of the day and do inner selects from that separate table. Other examples without a separate table have many sub selects, even with four tiers, which makes me believe they will probably not scale. Cron solutions have come to my mind as well, but the question was asked - out of curiosity - to focus on SQL queries and not other solutions.
In my own case, and completely outside the scope of my own question, I believe the best solution is to create a separate table with three fields (hour [Y-m-d H], onlinecount, playingcount) that counts the number of people online and the number of people playing at a given hour. When a player stops playing or goes offline, we update the count (+1) based on the start and end times. I can then easily derive tables and graphs from this separate table.
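As a rough illustration of that counter table, here is a Python sketch that bumps one bucket per hour a finished session touched. The `Counter` stands in for the table and `record_session` is a hypothetical helper; how boundary hours should be treated is my assumption.

```python
from collections import Counter
from datetime import datetime, timedelta

online_count = Counter()  # stand-in for the counter table, keyed by "Y-m-d H"


def record_session(start, end):
    # Called when a player goes offline: add 1 to every hour the session touched.
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour <= end:
        online_count[hour.strftime("%Y-%m-%d %H")] += 1
        hour += timedelta(hours=1)


record_session(datetime(2014, 1, 1, 16, 1), datetime(2014, 1, 1, 19, 1))
record_session(datetime(2014, 1, 1, 22, 1), datetime(2014, 1, 2, 2, 1))
record_session(datetime(2014, 1, 1, 16, 1), datetime(2014, 1, 1, 16, 5))

for bucket, count in sorted(online_count.items()):
    print(bucket, count)
```

Because the key includes the date, a session that crosses midnight needs no special handling, and reporting becomes a plain read of the table.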
Please let me know whether you come to the same conclusion. My thanks to @lolo, @rsanchez and @abasterfield. I wish I could split the bounty :)
sqlFiddle. This query gives you the period with the highest userCount. The period can start and end at any time; the query simply returns the start time and end time of the period with the most users.
SELECT StartTime,EndTime,COUNT(*)as UserCount FROM
(
SELECT T3.StartTime,T3.EndTime,GA.Started,GA.Ended FROM
(SELECT starttime,(SELECT MIN(endtime) FROM
(SELECT DISTINCT started as endtime FROM gameactivity WHERE started BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
UNION
SELECT DISTINCT ended as endtime FROM gameactivity WHERE ended BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
)T1
WHERE T1.endtime > T2.starttime
)as endtime
FROM
(SELECT DISTINCT started as starttime FROM gameactivity WHERE started BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
UNION
SELECT DISTINCT ended as starttime FROM gameactivity WHERE ended BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
)T2
)T3,
GameActivity GA
WHERE T3.StartTime BETWEEN GA.Started AND GA.Ended
AND T3.EndTime BETWEEN GA.Started AND GA.Ended
)FinalTable
GROUP BY StartTime,EndTime
ORDER BY UserCount DESC
LIMIT 1
Just change the occurrences of the date '1970-01-01' to the date you're trying to get data from.
The query selects all the start and end times in the inner queries and creates intervals out of them, then joins with GameActivity, counts the users within those intervals, and returns the interval with the highest userCount (the most activity).
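The interval-building step may be easier to follow outside SQL. Below is a minimal Python sketch of the same idea with made-up rows: every distinct start or end time becomes a boundary, consecutive boundaries form intervals, and each interval is scored by how many sessions cover it completely.

```python
from datetime import datetime

# Hypothetical gameactivity rows: (started, ended), all on one day.
game_activity = [
    (datetime(1970, 1, 1, 10, 0), datetime(1970, 1, 1, 12, 0)),
    (datetime(1970, 1, 1, 11, 0), datetime(1970, 1, 1, 13, 0)),
    (datetime(1970, 1, 1, 11, 30), datetime(1970, 1, 1, 14, 0)),
]

# Every distinct start or end time is an interval boundary.
boundaries = sorted({t for session in game_activity for t in session})

best = None
for start, end in zip(boundaries, boundaries[1:]):
    # A session counts if it covers the whole interval, like the two BETWEEN tests.
    users = sum(1 for s, e in game_activity if s <= start and end <= e)
    if best is None or users > best[2]:
        best = (start, end, users)

print(best)
```

With these rows the busiest interval is 11:30 to 12:00, where all three sessions overlap.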
Here's an sqlFiddle with one less tier:
SELECT StartTime,EndTime,COUNT(*)as UserCount FROM
(
SELECT T3.StartTime,T3.EndTime,GA.Started,GA.Ended FROM
(SELECT DISTINCT started as starttime,(SELECT MIN(ended)as endtime FROM
gameactivity T1 WHERE ended BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
AND T1.ended > T2.started
)as endtime
FROM
gameactivity T2
WHERE started BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
)T3,
GameActivity GA
WHERE T3.StartTime BETWEEN GA.Started AND GA.Ended
AND T3.EndTime BETWEEN GA.Started AND GA.Ended
)FinalTable
GROUP BY StartTime,EndTime
ORDER BY UserCount DESC
LIMIT 1
Or, going by the query in your question, you don't seem to care about dates, only about hour statistics across all dates (your query just looks at the HOUR of started and ended and ignores users that play longer than 1 hour).
In that case the query below might do it for you (sqlFiddle):
SELECT COUNT(*) as UserCount,
HOURSTABLE.StartHour,
HOURSTABLE.EndHour
FROM
(SELECT @hour as StartHour,
@hour:=@hour + 1 as EndHour
FROM
gameActivity as OrAnyTableWith24RowsOrMore,
(SELECT @hour:=0)as InitialValue
LIMIT 24) as HOURSTABLE,
gameActivity GA
WHERE HOUR(GA.started) >= HOURSTABLE.StartHour
AND HOUR(GA.ended) <= HOURSTABLE.EndHour
GROUP BY HOURSTABLE.StartHour,HOURSTABLE.EndHour
ORDER BY UserCount DESC
LIMIT 1
Just delete the LIMIT 1 if you want to see the userCount for other hours as well.
The easiest solution is to run a cron job at the top of each hour that counts who has a start time but no end time (a null end time, if you reset it when they log in) and logs that count. This gives you a count of users currently logged in at each hour without funky schema changes or wild queries.
When you check at the next hour, anyone who has logged out falls out of your results. This query works if you reset the end time when they log in.
SELECT CONCAT(CURDATE(), ' ', HOUR(NOW()), ' ', COUNT(*)) FROM activity WHERE DATE(start) = CURDATE() AND end IS NULL;
Then you can log this to your heart's content to a file or to another table (of course you might need to adjust the select to suit your log table). For example, you can have a table that gets one entry per day and only gets updated when a new peak is reached.
Assume a log table like:
current_date | peak_hour | peak_count
SELECT IF(peak_count < $peak_count, true, false) FROM log WHERE DATE(`current_date`) = CURDATE();
where $peak_count is a variable coming from your cron job. If you find that you have a new, bigger peak count, you do an update; if no record exists for the day, you insert one into log. Otherwise you have not beaten the peak_hour from earlier in the day, so don't update. This means each day gives you only 1 row in your table. You then don't need any aggregation: the date and peak hour are right there to see over the course of a week, a month, or whatever.
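The insert-or-update rule for the daily peak row can be sketched like this in Python. The dictionary stands in for the log table and `record_hourly_count` is a hypothetical name for whatever your cron job calls each hour.

```python
from datetime import date

peak_log = {}  # stand-in for the log table: date -> (peak_hour, peak_count)


def record_hourly_count(day, hour, count):
    current = peak_log.get(day)
    if current is None or current[1] < count:
        # No row for the day yet, or a new bigger peak: insert or update.
        peak_log[day] = (hour, count)
    # Otherwise the earlier peak stands and nothing is written.


today = date(2014, 1, 1)
for hour, count in [(9, 12), (10, 30), (11, 25), (12, 30)]:
    record_hourly_count(today, hour, count)

print(peak_log[today])
```

A tie at 12:00 does not replace the 10:00 peak here because the comparison is strict, matching the `peak_count < $peak_count` test; use `<=` if you would rather keep the latest hour.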

Find big enough gaps in booking table

A rental system uses a booking table to store all bookings and reservations:
booking | item | startdate | enddate
1 | 42 | 2013-10-25 16:00 | 2013-10-27 12:00
2 | 42 | 2013-10-27 14:00 | 2013-10-28 18:00
3 | 42 | 2013-10-30 09:00 | 2013-11-01 09:00
…
Let’s say a user wants to rent item 42 from 2013-10-27 12:00 until 2013-10-28 12:00 which is a period of one day. The system will tell him, that the item is not available in the given time frame, since booking no. 2 collides.
Now I want to suggest the earliest rental date and time when the selected item is available again. Of course considering the user’s requested period (1 day) beginning with the user’s desired date and time.
So in the case above, I’m looking for an SQL query that returns 2013-10-28 18:00, since the earliest date since 2013-10-27 12:00 at which item 42 will be available for 1 day, is from 2013-10-28 18:00 until 2013-10-29 18:00.
So I need to find a gap between bookings that is big enough to hold the user’s reservation and that is as close as possible to the desired start date.
Or in other words: I need to find the first booking for a given item, after which there’s enough free time to place the user’s booking.
Is this possible in plain SQL without having to iterate over every booking and its successor?
If you can't redesign your database to use something more efficient, this will get the answer. You'll obviously want to parameterize it. It says find either the desired date, or the earliest end date where the hire interval doesn't overlap an existing booking:
Select
min(startdate)
From (
select
cast('2013-10-27 12:00' as datetime) startdate
from
dual
union all
select
enddate
from
booking
where
enddate > cast('2013-10-27 12:00' as datetime) and
item = 42
) b1
Where
not exists (
select
'x'
from
booking b2
where
item = 42 and
b1.startdate < b2.enddate and
b2.startdate < date_add(b1.startdate, interval 24 hour)
);
Example Fiddle
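For anyone who wants to trace the logic of that query by hand, here is a minimal Python sketch using the three bookings from the question. The candidate start times are the desired date plus every later end date, and a candidate is kept only if no booking overlaps it; `earliest_start` is a name I invented.

```python
from datetime import datetime, timedelta

# The bookings from the question: (booking, item, startdate, enddate).
bookings = [
    (1, 42, datetime(2013, 10, 25, 16), datetime(2013, 10, 27, 12)),
    (2, 42, datetime(2013, 10, 27, 14), datetime(2013, 10, 28, 18)),
    (3, 42, datetime(2013, 10, 30, 9), datetime(2013, 11, 1, 9)),
]


def earliest_start(item, desired, duration):
    own = [(s, e) for _, i, s, e in bookings if i == item]
    # Candidates: the desired start, or the end of any booking ending after it.
    candidates = [desired] + [e for _, e in own if e > desired]
    # Keep candidates where no booking overlaps [candidate, candidate + duration).
    free = [
        c for c in candidates
        if not any(c < e and s < c + duration for s, e in own)
    ]
    return min(free) if free else None


print(earliest_start(42, datetime(2013, 10, 27, 12), timedelta(hours=24)))
```

For the example in the question this yields 2013-10-28 18:00, the end of booking 2.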
SELECT startfree,secondsfree FROM (
SELECT
@lastenddate AS startfree,
UNIX_TIMESTAMP(startdate)-UNIX_TIMESTAMP(@lastenddate) AS secondsfree,
@lastenddate:=enddate AS ignoreme
FROM
(SELECT startdate,enddate FROM bookings WHERE item=42) AS schedule,
(SELECT @lastenddate:=NOW()) AS init
ORDER BY startdate
) AS baseview
WHERE startfree>='2013-10-27 12:00:00'
AND secondsfree>=86400
ORDER BY startfree
LIMIT 1
;
Some explanation: The inner query uses a variable to move the iteration into SQL, the outer query finds the needed row.
That said, I would not do this in SQL if the DB structure is as given. You could reduce the iteration count by using a smart WHERE in the inner query to restrict it to a sane timespan, but chances are this won't perform well.
EDIT
A caveat: I did not check, but I assume this won't work if there are no prior reservations in the list. That should not be a problem, since in that case your first reservation attempt (the original time) will work.
EDIT
SQLfiddle
Searching for overlapping date ranges generally yields poor performance in SQL. For that reason having a "Calendar" of available slots often makes things a lot more efficient.
For example, the booking 2013-10-25 16:00 => 2013-10-27 12:00 would actually be represented by 44 records, each one hour long.
The "gap" until the next booking at 2013-10-27 14:00 would then be represented by 2 records, each one hour long.
Then, each record could also have the duration (in time, or number of slots) until the next change.
slot_start_time | booking | item | remaining_duration
------------------+---------+------+--------------------
2013-10-27 10:00 | 1 | 42 | 2
2013-10-27 11:00 | 1 | 42 | 1
2013-10-27 12:00 | NULL | 42 | 2
2013-10-27 13:00 | NULL | 42 | 1
2013-10-27 14:00 | 2 | 42 | 28
2013-10-27 15:00 | 2 | 42 | 27
... | ... | ... | ...
2013-10-28 17:00 | 2 | 42 | 1
2013-10-28 18:00 | NULL | 42 | 39
2013-10-28 19:00 | NULL | 42 | 38
Then your query just becomes:
SELECT
*
FROM
slots
WHERE
slot_start_time >= '2013-10-27 12:00'
AND remaining_duration >= 24
AND booking IS NULL
ORDER BY
slot_start_time ASC
LIMIT
1
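To show how such a slots table could be populated and queried, here is a Python sketch built from the bookings in the question. It is only one possible way to fill `remaining_duration` (walking the slots backwards); `build_slots` and `first_free` are hypothetical helpers, and the calendar horizon is an arbitrary choice.

```python
from datetime import datetime, timedelta

# Bookings for item 42 from the question: (booking, startdate, enddate).
bookings = [
    (1, datetime(2013, 10, 25, 16), datetime(2013, 10, 27, 12)),
    (2, datetime(2013, 10, 27, 14), datetime(2013, 10, 28, 18)),
    (3, datetime(2013, 10, 30, 9), datetime(2013, 11, 1, 9)),
]
HOUR = timedelta(hours=1)


def build_slots(first, last):
    """Return [slot_start_time, booking, remaining_duration] rows, one per hour."""
    slots = []
    t = first
    while t < last:
        booking = next((b for b, s, e in bookings if s <= t < e), None)
        slots.append([t, booking, 0])
        t += HOUR
    # Walk backwards so each slot can count the slots left in its own run.
    for i in range(len(slots) - 1, -1, -1):
        same_run = i + 1 < len(slots) and slots[i + 1][1] == slots[i][1]
        slots[i][2] = slots[i + 1][2] + 1 if same_run else 1
    return slots


slots = build_slots(datetime(2013, 10, 25, 16), datetime(2013, 11, 1, 9))


def first_free(start, hours_needed):
    # Equivalent of the SELECT: first free slot with enough remaining hours.
    for t, booking, remaining in slots:
        if t >= start and booking is None and remaining >= hours_needed:
            return t
    return None


print(first_free(datetime(2013, 10, 27, 12), 24))
```

The lookup itself is a single ordered scan with simple comparisons, which is what makes the approach fast; the price is keeping `remaining_duration` up to date whenever a booking is added or removed.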
OK this isn't pretty in MySQL. That's because we have to fake rownum values in subqueries.
The basic approach is to join the appropriate subset of the booking table to itself offset by one.
Here's the basic list of reservations for item 42, ordered by reservation time. We can't order by booking_id, because those aren't guaranteed to be in order of reservation time. (You're trying to insert a new reservation between two existing ones, eh?) http://sqlfiddle.com/#!2/62383/9/0
SELECT @aserial := @aserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @aserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
Here is that subset joined to itself. The trick is the a.rownum+1 = b.rownum, which joins each row to the one that comes right after it in the booking table subset. http://sqlfiddle.com/#!2/62383/8/0
SELECT a.booking_id, a.startdate asta, a.enddate aend,
b.startdate bsta, b.enddate bend
FROM (
SELECT @aserial := @aserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @aserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS a
JOIN (
SELECT @bserial := @bserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @bserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS b ON a.rownum+1 = b.rownum
Here it is again, showing each reservation (except the last one) and the number of hours following it. http://sqlfiddle.com/#!2/62383/15/0
SELECT a.booking_id, a.startdate, a.enddate,
TIMESTAMPDIFF(HOUR, a.enddate, b.startdate) gaphours
FROM (
SELECT @aserial := @aserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @aserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS a
JOIN (
SELECT @bserial := @bserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @bserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS b ON a.rownum+1 = b.rownum
So, if you're looking for the starting time and ending time of the earliest twelve-hour slot you can use that result set to do this: http://sqlfiddle.com/#!2/62383/18/0
SELECT MIN(enddate) startdate, MIN(enddate) + INTERVAL 12 HOUR as enddate
FROM (
SELECT a.booking_id, a.startdate, a.enddate,
TIMESTAMPDIFF(HOUR, a.enddate, b.startdate) gaphours
FROM (
SELECT @aserial := @aserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @aserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS a
JOIN (
SELECT @bserial := @bserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @bserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS b ON a.rownum+1 = b.rownum
) AS gaps
WHERE gaphours >= 12
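The "join each row to the one after it" trick is easier to see without the rownum plumbing. Here is a small Python sketch of the same steps on the question's bookings, where pairing consecutive rows of the sorted list plays the role of a.rownum+1 = b.rownum. The variable names are mine.

```python
from datetime import datetime, timedelta

# Bookings for item 42 from the question: (booking, startdate, enddate).
bookings = [
    (1, datetime(2013, 10, 25, 16), datetime(2013, 10, 27, 12)),
    (2, datetime(2013, 10, 27, 14), datetime(2013, 10, 28, 18)),
    (3, datetime(2013, 10, 30, 9), datetime(2013, 11, 1, 9)),
]

# Order by reservation time, not by booking id.
ordered = sorted(bookings, key=lambda b: (b[1], b[2]))

# Pair every booking with its successor and measure the gap in hours.
gaps = [
    (a[0], a[2], (b[1] - a[2]) / timedelta(hours=1))
    for a, b in zip(ordered, ordered[1:])
]

needed_hours = 12
starts = [end for _, end, gap in gaps if gap >= needed_hours]
slot_start = min(starts) if starts else None
slot_end = slot_start + timedelta(hours=needed_hours) if slot_start else None

print(gaps)
print(slot_start, slot_end)
```

As in the SQL, the last booking has no successor and therefore never appears as the start of a gap, so the open-ended time after the final booking has to be handled separately.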
Here is the query. It returns the needed date. The obvious condition is that there must be some bookings in the table, but as I see from the question, you already do this check:
SELECT min(enddate)
FROM
(
select a.enddate from table4 as a
where
a.item=42
and
DATE_ADD(a.enddate, INTERVAL 1 day) <= ifnull(
(select min(b.startdate)
from table4 as b where b.startdate>=a.enddate and a.item=b.item),
a.enddate)
and
a.enddate>=now()
union all
select greatest(ifnull(max(enddate), now()),now()) from table4
) as q
You can change INTERVAL 1 day to INTERVAL ### hour.
If I have understood your requirements correctly, you could try self-JOINing book with itself, to get the "empty" spaces, and then fit. This is MySQL only (I believe it can be adapted to others - certainly PostgreSQL):
SELECT book.*, TIMESTAMPDIFF(MINUTE, book.enddate, book.best) AS width FROM
(
SELECT book.*, MIN(book1.startdate) AS best
FROM book
JOIN book AS book1 USING (item)
WHERE item = 42 AND book1.startdate >= book.enddate
GROUP BY book.booking
) AS book HAVING width > 110 ORDER BY startdate LIMIT 1;
In the above example, "110" is the looked-for minimum width in minutes.
Same thing, a bit less readable (for me), a SELECT removed (very fast SELECT, so little advantage):
SELECT book.*, MIN(book1.startdate) AS best
FROM book
JOIN book AS book1 ON (book.item = book1.item AND book.item = 42)
WHERE book1.startdate >= book.enddate
GROUP BY book.booking
HAVING TIMESTAMPDIFF(MINUTE, book.enddate, best) > 110
ORDER BY startdate LIMIT 1;
In your case, one day is 1440 minutes and
SELECT book.*, MIN(book1.startdate) AS best FROM book JOIN book AS book1 ON (book.item = book1.item AND book.item = 42) WHERE book1.startdate >= book.enddate GROUP BY book.booking HAVING TIMESTAMPDIFF(MINUTE, book.enddate, best) >= 1440 ORDER BY startdate LIMIT 1;
+---------+------+---------------------+---------------------+---------------------+
| booking | item | startdate | enddate | best |
+---------+------+---------------------+---------------------+---------------------+
| 2 | 42 | 2013-10-27 14:00:00 | 2013-10-28 18:00:00 | 2013-10-30 09:00:00 |
+---------+------+---------------------+---------------------+---------------------+
1 row in set (0.00 sec)
...the period returned is 2, i.e., at the end of booking 2, and until "best" which is booking 3, a period of at least 1440 minutes is available.
An issue could be that if no periods are available, the query returns nothing -- then you need another query to fetch the farthest enddate. You can do this with an UNION and LIMIT 1 of course, but I think it would be best to only run the 'recovery' query on demand, programmatically (i.e. if empty(query) then new_query...).
Also, in the inner WHERE you should add a check for NOW() to avoid dates in the past. If expired bookings are moved to inactive storage, this could be unnecessary.
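Putting the last two remarks together, the programmatic fallback might look something like the Python sketch below. It assumes bookings for one item that do not overlap each other, so pairing each booking with the next one in start order stands in for MIN(book1.startdate). `next_free_start` and its `now` parameter are hypothetical, not part of the queries above.

```python
from datetime import datetime, timedelta

# Bookings for item 42 from the question: (startdate, enddate).
bookings = [
    (datetime(2013, 10, 25, 16), datetime(2013, 10, 27, 12)),
    (datetime(2013, 10, 27, 14), datetime(2013, 10, 28, 18)),
    (datetime(2013, 10, 30, 9), datetime(2013, 11, 1, 9)),
]


def next_free_start(bookings, width, now):
    # Ignore bookings that have already ended (the NOW() check).
    ordered = sorted(b for b in bookings if b[1] >= now)
    if not ordered:
        return now  # nothing booked ahead, so the item is free right away
    for (_, end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start - end >= width:
            return end  # first gap wide enough
    # 'Recovery' case: no gap fits, so fall back to the farthest enddate.
    return max(end for _, end in ordered)


now = datetime(2013, 10, 26)
print(next_free_start(bookings, timedelta(days=1), now))
print(next_free_start(bookings, timedelta(days=3), now))
```

The first call finds the gap after booking 2; the second finds no gap of three days and falls back to the end of the last booking.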