most active time of day based on start and end time - mysql

I'm logging statistics of the gamers in my community. For both their online and in-game states I'm registering when they "begin" and when they "end". In order to show the most active day and hour of the day I'd like to use an SQL statement that measures the most active moments based on the "begin" and "end" datetime values.
Looking at "SQL - select most 'active' time from db", I can see similarities, but I also need to include the moments between the start and end times.
Perhaps the easiest way is to write a cron that does the calculations, but I hope this question might teach me how to address this issue in SQL instead.
I've been searching for an SQL statement that lets me create a datetime period and use it to subtract single hours and days, but to no avail.
--- update
As I think more about this, I wonder whether it might be wise to run 24 queries, one for each hour of the day (for the most active hour), and several queries for the most active day. That seems like a waste of performance, but this approach might make a query like this possible:
SELECT COUNT(`userID`), HOUR(started) AS starthour,
HOUR(ended) AS endhour
FROM gameactivity
-- column aliases can't be used in WHERE, so the expressions are repeated
WHERE HOUR(started) >= $hour
AND HOUR(ended) <= $hour GROUP BY `userID`
($hour is only there for example purposes; of course I'm using PDO. The columns are also just examples: anything that is clearly identifiable as a start and an end, and is easy for you to explain with, is fine with me.)
Additional information: PHP 5.5+, PDO, MySQL 5+
Table layout for the in-game data would be: gameactivity: activityid, userid, gameid, started, ended
DDL:
CREATE TABLE IF NOT EXISTS `steamonlineactivity` (
`activityID` int(13) NOT NULL AUTO_INCREMENT,
`userID` varchar(255) NOT NULL,
`online` datetime DEFAULT NULL,
`offline` datetime DEFAULT NULL,
PRIMARY KEY (`activityID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;

If I understood your requirements correctly, if this graph represents user activity:
Day
12/1 12/2 12/3 12/4 ...
Hour 0 xx x x xx
1 x xx xx
2 xxx x x xx
3 x x
4 x x
5 x x
6 x
...
You want to know that 02:00 is the time of the day with the highest average activity (a row with 7 x), and 12/4 was the most active day (a column with 10 x). Note that this doesn't imply that 02:00 of 12/4 was the most active hour ever, as you can see in the example. If this is not what you want, please clarify with concrete examples of input and desired result.
We make a couple of assumptions:
An activity record can start on one date and finish on the next one. For instance: online 2013-12-02 23:35, offline 2013-12-03 00:13.
No activity record has a duration longer than 23 hours, or the number of such records is negligible.
And we need to define what 'activity' means. I picked the criteria that were easiest to compute in each case. Both can be made more accurate if needed, at the cost of more complex queries.
The most active time of day will be the hour that the most activity records overlap. Note that if a user starts and stops more than once during the hour, they will be counted more than once.
The most active day will be the one with the most unique users who were active at any time of the day.
For the most active time of day we'll use a small auxiliary table holding the 24 possible hours. It can also be generated and joined on the fly with the techniques described in other answers.
CREATE TABLE hour ( hour tinyint not null, primary key(hour) );
INSERT hour (hour)
VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)
, (11), (12), (13), (14), (15), (16), (17), (18), (19), (20)
, (21), (22), (23);
Then the following queries give the required results:
SELECT hour, count(*) AS activity
FROM steamonlineactivity, hour
WHERE ( hour BETWEEN hour(online) AND hour(offline)
OR hour(online) BETWEEN hour(offline) AND hour
OR hour(offline) BETWEEN hour AND hour(online) )
GROUP BY hour
ORDER BY activity DESC;
SELECT date, count(DISTINCT userID) AS activity
FROM (
SELECT userID, date(online) AS date
FROM steamonlineactivity
UNION
SELECT userID, date(offline) AS date
FROM steamonlineactivity
) AS x
GROUP BY date
ORDER BY activity DESC;
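To make the day criterion concrete, here is a minimal Python sketch of what the UNION query counts (Python rather than MySQL, purely to illustrate the logic; the function name and sample rows are hypothetical). Because sessions are assumed to last less than 24 hours, only the dates of `online` and `offline` need to be considered.

```python
from collections import defaultdict
from datetime import datetime


def users_per_day(sessions):
    """Rank calendar dates by number of distinct active users.

    Mirrors the UNION query: each session counts its user on the date of
    `online` and on the date of `offline` (sessions assumed < 24 hours).
    """
    users_by_day = defaultdict(set)
    for user_id, online, offline in sessions:
        users_by_day[online.date()].add(user_id)
        users_by_day[offline.date()].add(user_id)
    # Most active day first, like ORDER BY activity DESC.
    return sorted(
        ((day, len(users)) for day, users in users_by_day.items()),
        key=lambda item: item[1],
        reverse=True,
    )


# Hypothetical sample rows: (userID, online, offline).
sessions = [
    ("1", datetime(2014, 1, 1, 16, 1), datetime(2014, 1, 1, 19, 1)),
    ("2", datetime(2014, 1, 2, 16, 1), datetime(2014, 1, 2, 19, 1)),
    ("3", datetime(2014, 1, 1, 22, 1), datetime(2014, 1, 2, 2, 1)),
    ("4", datetime(2014, 1, 1, 16, 1), datetime(2014, 1, 1, 16, 5)),
]

for day, activity in users_per_day(sessions):
    print(day, activity)
```

User 3 crosses midnight, so it counts towards both dates, exactly as the UNION of `date(online)` and `date(offline)` does.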

You need a sequence to get values for hours in which no activity was recorded (e.g. hours in which nobody started or finished, but there were people on-line who had started and not yet finished). Unfortunately there is no nice way to create a sequence in MySQL, so you will have to create the sequence manually:
CREATE TABLE `hour_sequence` (
`ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`hour` datetime NOT NULL,
KEY (`hour`),
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
# this is not great
INSERT INTO `hour_sequence` (`hour`) VALUES
("2013-12-01 00:00:00"),
("2013-12-01 01:00:00"),
("2013-12-01 02:00:00"),
("2013-12-01 03:00:00"),
("2013-12-01 04:00:00"),
("2013-12-01 05:00:00"),
("2013-12-01 06:00:00"),
("2013-12-01 07:00:00"),
("2013-12-01 08:00:00"),
("2013-12-01 09:00:00"),
("2013-12-01 10:00:00"),
("2013-12-01 11:00:00"),
("2013-12-01 12:00:00");
Now create some test data:
CREATE TABLE `log_table` (
`ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`userID` bigint(20) unsigned NOT NULL,
`started` datetime NOT NULL,
`finished` datetime NOT NULL,
KEY (`started`),
KEY (`finished`),
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET latin1;
INSERT INTO `log_table` (`userID`,`started`,`finished`) VALUES
(1, "2013-12-01 00:00:12", "2013-12-01 02:25:00"),
(2, "2013-12-01 07:25:00", "2013-12-01 08:23:00"),
(1, "2013-12-01 04:25:00", "2013-12-01 07:23:00");
Now the query: for every hour we keep a tally (accumulation/running total/integral etc.) of how many people have started a session up to that hour:
SELECT
HS.hour as period_starting,
COUNT(LT.userID) AS starts
FROM `hour_sequence` HS
LEFT JOIN `log_table` LT ON HS.hour > LT.started
GROUP BY
HS.hour
And likewise, how many people have gone off-line up to that hour:
SELECT
HS.hour as period_starting,
COUNT(LT.userID) AS finishes
FROM `hour_sequence` HS
LEFT JOIN `log_table` LT ON HS.hour > LT.finished
GROUP BY
HS.hour
By subtracting the accumulation of people that had gone off-line at a point in time from the accumulation of people that have come on-line at that point in time we get the number of people who were on-line at that point in time (presuming there were zero people on-line when the data starts, of course).
SELECT
starts.period_starting,
starts.starts as users_started,
finishes.finishes as users_finished,
starts.starts - finishes.finishes as users_online
FROM
(
SELECT
HS.hour as period_starting,
COUNT(LT.userID) AS starts
FROM `hour_sequence` HS
LEFT JOIN `log_table` LT ON HS.hour > LT.started
GROUP BY
HS.hour
) starts
LEFT JOIN (
SELECT
HS.hour as period_starting,
COUNT(LT.userID) AS finishes
FROM `hour_sequence` HS
LEFT JOIN `log_table` LT ON HS.hour > LT.finished
GROUP BY
HS.hour
) finishes ON starts.period_starting = finishes.period_starting;
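The same running-total arithmetic can be sketched in Python to check what the query should return for the test data above (an illustration only; `users_online` is a hypothetical name). Each hour mark counts sessions with `mark > started` and `mark > finished`, matching the strict `>` in the joins.

```python
from datetime import datetime, timedelta


def users_online(hour_marks, sessions):
    """Running totals per hour mark, like the starts/finishes subqueries.

    starts   = sessions with mark > started   (HS.hour > LT.started)
    finishes = sessions with mark > finished  (HS.hour > LT.finished)
    online   = starts - finishes
    """
    rows = []
    for mark in hour_marks:
        starts = sum(1 for started, _ in sessions if mark > started)
        finishes = sum(1 for _, finished in sessions if mark > finished)
        rows.append((mark, starts, finishes, starts - finishes))
    return rows


# The same hour_sequence (00:00 to 12:00) and log_table rows as above.
day = datetime(2013, 12, 1)
hour_marks = [day + timedelta(hours=h) for h in range(13)]
sessions = [
    (datetime(2013, 12, 1, 0, 0, 12), datetime(2013, 12, 1, 2, 25)),
    (datetime(2013, 12, 1, 7, 25), datetime(2013, 12, 1, 8, 23)),
    (datetime(2013, 12, 1, 4, 25), datetime(2013, 12, 1, 7, 23)),
]

for mark, started, finished, online in users_online(hour_marks, sessions):
    print(mark.strftime("%Y-%m-%d %H:%M"), started, finished, online)
```

At 08:00, for example, three sessions have started and two have finished, so one user is on-line.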
Now a few caveats. First of all, you will need a process to keep your sequence table populated with hourly timestamps as time progresses. Additionally, the accumulators do not scale well with large amounts of log data because of the inequality join (every hour is joined to every earlier log row); it would be wise to constrain access to the log table by timestamp in both the starts and finishes subqueries, and the sequence table while you are at it.
SELECT
HS.hour as period_starting,
COUNT(LT.userID) AS finishes
FROM `hour_sequence` HS
LEFT JOIN `log_table` LT ON HS.hour > LT.finished
-- filter log rows in the ON clause so the LEFT JOIN still keeps hours with no matches
AND LT.finished BETWEEN ? AND ?
WHERE
HS.hour BETWEEN ? AND ?
GROUP BY
HS.hour
If you start constraining your log_table data to specific time ranges, bear in mind you will have an offset issue if there were already people on-line at the point where you start looking at the log data. If there were 1000 people on-line at that point and you then threw them all off the server, the query would make it look like you went from 0 people on-line to -1000 people on-line!

@rsanchez had an amazing answer, but the query for the most active time of day behaves strangely with sessions that start and end within the same hour (a short session): it counts them as lasting 24 hours.
Through trial and error I corrected that part of his query to the following:
SELECT hour, count(*) AS activity
FROM steamonlineactivity, hour
WHERE ( hour >= HOUR(online) AND hour <= HOUR(offline)
OR HOUR(online) > HOUR(offline) AND HOUR(online) <= hour
OR HOUR(offline) >= hour AND HOUR(offline) < HOUR(online) )
GROUP BY hour
ORDER BY activity DESC;
So with the following structure:
CREATE TABLE hour ( hour tinyint not null, primary key(hour) );
INSERT hour (hour)
VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)
, (11), (12), (13), (14), (15), (16), (17), (18), (19), (20)
, (21), (22), (23);
CREATE TABLE `steamonlineactivity` (
`activityID` int(13) NOT NULL AUTO_INCREMENT,
`userID` varchar(255) NOT NULL,
`online` datetime DEFAULT NULL,
`offline` datetime DEFAULT NULL,
PRIMARY KEY (`activityID`)
);
INSERT INTO `steamonlineactivity` (`activityID`, `userID`, `online`, `offline`) VALUES
(1, '1', '2014-01-01 16:01:00', '2014-01-01 19:01:00'),
(2, '2', '2014-01-02 16:01:00', '2014-01-02 19:01:00'),
(3, '3', '2014-01-01 22:01:00', '2014-01-02 02:01:00'),
(4, '4', '2014-01-01 16:01:00', '2014-01-01 16:05:00');
the corrected query above for the most active times outputs the following:
+------+----------+
| hour | activity |
+------+----------+
| 16 | 3 |
| 17 | 2 |
| 18 | 2 |
| 19 | 2 |
| 22 | 1 |
| 23 | 1 |
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
+------+----------+
Instead of the original query, which gives the following erroneous result:
+------+----------+
| hour | activity |
+------+----------+
| 16 | 3 |
| 17 | 3 |
| 18 | 3 |
| 19 | 3 |
| 0 | 2 |
| 1 | 2 |
| 2 | 2 |
| 22 | 2 |
| 23 | 2 |
| 11 | 1 |
| 12 | 1 |
| 13 | 1 |
| 14 | 1 |
| 15 | 1 |
| 3 | 1 |
| 4 | 1 |
| 20 | 1 |
| 5 | 1 |
| 21 | 1 |
| 6 | 1 |
| 7 | 1 |
| 8 | 1 |
| 9 | 1 |
| 10 | 1 |
+------+----------+
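The three cases in the corrected WHERE clause (a same-day span, and the evening and morning halves of a session that crosses midnight) can be checked with a small Python sketch. It is an illustration of the logic, not MySQL; `hour_activity` is a hypothetical name, and like the query it assumes no session lasts 24 hours or more.

```python
from collections import Counter
from datetime import datetime


def hour_activity(sessions):
    """Count how many sessions overlap each hour of the day (0-23).

    Uses the same three cases as the corrected WHERE clause.
    """
    counts = Counter()
    for online, offline in sessions:
        h_on, h_off = online.hour, offline.hour
        for hour in range(24):
            same_day = h_on <= hour <= h_off               # e.g. 16 -> 19
            evening_part = h_on > h_off and h_on <= hour   # 22 -> 2: hours 22, 23
            morning_part = h_off >= hour and h_off < h_on  # 22 -> 2: hours 0, 1, 2
            if same_day or evening_part or morning_part:
                counts[hour] += 1
    return counts


# The four sample rows from the answer above: (online, offline).
sessions = [
    (datetime(2014, 1, 1, 16, 1), datetime(2014, 1, 1, 19, 1)),
    (datetime(2014, 1, 2, 16, 1), datetime(2014, 1, 2, 19, 1)),
    (datetime(2014, 1, 1, 22, 1), datetime(2014, 1, 2, 2, 1)),
    (datetime(2014, 1, 1, 16, 1), datetime(2014, 1, 1, 16, 5)),
]

for hour, activity in hour_activity(sessions).most_common():
    print(hour, activity)
```

The short 16:01 to 16:05 session only satisfies the first case for hour 16, which is why it no longer spreads over all 24 hours.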

This query is for Oracle, but you can get the idea from it:
SELECT
H, M,
COUNT(BEGIN)
FROM
-- temporary table that should return numbers from 0 to 1439
-- each number represents minute of the day, for example 0 represents 0:00, 100 represents 1:40, etc.
-- in Oracle you can use the CONNECT BY clause, which is designed for recursive queries
(SELECT LEVEL - 1 DAYMIN, FLOOR((LEVEL - 1) / 60) H, MOD((LEVEL - 1), 60) M FROM dual CONNECT BY LEVEL <= 1440) T LEFT JOIN
-- join stats to each row from T by discarding the date and converting the time to minute of the day
STATS S ON 60 * TO_NUMBER(TO_CHAR(S.BEGIN, 'HH24')) + TO_NUMBER(TO_CHAR(S.BEGIN, 'MI')) <= T.DAYMIN AND
60 * TO_NUMBER(TO_CHAR(S.END, 'HH24')) + TO_NUMBER(TO_CHAR(S.END, 'MI')) > T.DAYMIN
GROUP BY H, M
HAVING COUNT(BEGIN) > 0
ORDER BY H, M
Fiddle: http://sqlfiddle.com/#!4/e5e31/9
The idea is to have a temporary table or view with one row per time point, and LEFT JOIN the stats to it. In my example there is one row for every minute of the day. In MySQL you can use user variables to create such a view on the fly.
MySQL version:
SELECT
FLOOR(T.DAYMIN / 60), -- hour
MOD(T.DAYMIN, 60), -- minute
-- T.DAYMIN, -- minute of the day
COUNT(S.BEGIN) -- count not null stats
FROM
-- temporary table that should return numbers from 0 to 1439
-- each number represents minute of the day, for example 0 represents 0:00, 100 represents 1:40, etc.
-- in MySQL you must have some table which has at least 1440 rows;
-- I use (INFORMATION_SCHEMA.COLLATIONS x INFORMATION_SCHEMA.COLLATIONS) for that purpose - it should be
-- in every database
(
SELECT
@counter := @counter + 1 AS DAYMIN
FROM
INFORMATION_SCHEMA.COLLATIONS A CROSS JOIN
INFORMATION_SCHEMA.COLLATIONS B CROSS JOIN
(SELECT @counter := -1) C
LIMIT 1440
) T LEFT JOIN
-- join stats to each row from T by discarding the date and converting the time to minute of the day
STATS S ON (
(60 * DATE_FORMAT(S.BEGIN, '%H')) + (1 * DATE_FORMAT(S.BEGIN, '%i')) <= T.DAYMIN AND
(60 * DATE_FORMAT(S.END, '%H')) + (1 * DATE_FORMAT(S.END, '%i')) > T.DAYMIN
)
GROUP BY T.DAYMIN
HAVING COUNT(S.BEGIN) > 0 -- filter empty counters
ORDER BY T.DAYMIN
Fiddle: http://sqlfiddle.com/#!2/de01c/1
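The "one row per minute, then join" idea can be sketched in Python as a loop over the 1440 minutes of the day (an illustration of the logic only; `minute_activity` and the sample rows are hypothetical). Like both queries, it discards the date and uses `begin <= minute < end`, so a session that crosses midnight matches no minute.

```python
from datetime import datetime


def minute_activity(stats):
    """Map minute-of-day (0-1439) to the number of sessions covering it.

    Minutes with no sessions are dropped, like HAVING COUNT(...) > 0.
    """
    counts = {}
    for daymin in range(1440):  # the 0..1439 "temporary table"
        n = sum(
            1
            for begin, end in stats
            if begin.hour * 60 + begin.minute <= daymin < end.hour * 60 + end.minute
        )
        if n > 0:
            counts[daymin] = n
    return counts


# Hypothetical STATS rows: (BEGIN, END), on different dates.
stats = [
    (datetime(2014, 1, 1, 10, 0), datetime(2014, 1, 1, 10, 3)),
    (datetime(2014, 1, 2, 10, 2), datetime(2014, 1, 2, 10, 5)),
]

for daymin, n in minute_activity(stats).items():
    print(f"{daymin // 60}:{daymin % 60:02d}", n)
```

Because the date is ignored, the two sessions on different days still overlap at 10:02.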

I've been thinking this question over myself, and based on everyone's answers I've come to the following conclusion:
In general, it's probably easy to create a separate table holding the hours of the day and do inner selects against it. Other examples without a separate table use many subselects, some nested four levels deep, which makes me think they will not scale. Cron solutions came to mind as well, but I asked the question, out of curiosity, to focus on SQL queries rather than other solutions.
In my own case, and completely outside the scope of my question, I believe the best solution is to create a separate table with three fields (hour [Y-m-d H], onlinecount, playingcount) that counts the number of people online and the number of people playing in a given hour. When a player stops playing or goes offline, we update the counts (+1) based on the start and end times. That way I can easily derive tables and graphs from this separate table.
Please let me know whether you come to the same conclusion. My thanks to @lolo, @rsanchez and @abasterfield. I wish I could split the bounty :)
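The counter-table idea could look roughly like this, sketched in Python under stated assumptions: a `Counter` keyed by `'Y-m-d H'` stands in for the onlinecount column, and `hour_keys` and `on_session_end` are hypothetical names, not code from this thread. When a session ends, every clock hour it touched gets +1.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical stand-in for the onlinecount column, keyed by 'Y-m-d H'.
online_count = Counter()


def hour_keys(start, end):
    """Yield a 'Y-m-d H' key for every clock hour the session touches."""
    bucket = start.replace(minute=0, second=0, microsecond=0)
    while bucket <= end:
        yield bucket.strftime("%Y-%m-%d %H")
        bucket += timedelta(hours=1)


def on_session_end(start, end):
    """Hypothetical hook called when a player goes offline: +1 per hour."""
    for key in hour_keys(start, end):
        online_count[key] += 1


on_session_end(datetime(2014, 1, 1, 22, 1), datetime(2014, 1, 2, 2, 1))
on_session_end(datetime(2014, 1, 1, 22, 30), datetime(2014, 1, 1, 22, 45))
print(online_count.most_common(1))
```

Reading the busiest hour then becomes a simple lookup instead of an interval join; the trade-off is that the increments must be written once per touched hour.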

This query (sqlFiddle) gives you the period with the highest userCount. The period can fall anywhere in the day; the query just returns the start time and end time of the interval with the most users.
SELECT StartTime,EndTime,COUNT(*)as UserCount FROM
(
SELECT T3.StartTime,T3.EndTime,GA.Started,GA.Ended FROM
(SELECT starttime,(SELECT MIN(endtime) FROM
(SELECT DISTINCT started as endtime FROM gameactivity WHERE started BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
UNION
SELECT DISTINCT ended as endtime FROM gameactivity WHERE ended BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
)T1
WHERE T1.endtime > T2.starttime
)as endtime
FROM
(SELECT DISTINCT started as starttime FROM gameactivity WHERE started BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
UNION
SELECT DISTINCT ended as starttime FROM gameactivity WHERE ended BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
)T2
)T3,
gameactivity GA
WHERE T3.StartTime BETWEEN GA.Started AND GA.Ended
AND T3.EndTime BETWEEN GA.Started AND GA.Ended
)FinalTable
GROUP BY StartTime,EndTime
ORDER BY UserCount DESC
LIMIT 1
Just change the '1970-01-01' occurrences to the date you're trying to get data for.
What the query does is select all the start and end times in the inner queries, build intervals out of consecutive times, then join with gameactivity, count the users active within each interval, and return the interval with the highest userCount (the most activity).
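The interval-building step can be sketched in Python (an illustration of the logic, not the SQL; `busiest_interval` and `t` are hypothetical names). Every distinct start or end time is paired with the next larger one, and a session counts for an interval when it contains both endpoints, matching the two `BETWEEN` conditions. The SQL's `LIMIT 1` picks an arbitrary row on ties; this sketch keeps the earliest.

```python
from datetime import datetime


def busiest_interval(sessions):
    """Return (start, end, user_count) for the busiest elementary interval."""
    points = sorted({t for session in sessions for t in session})
    best = None
    for start, end in zip(points, points[1:]):
        users = sum(
            1
            for started, ended in sessions
            if started <= start <= ended and started <= end <= ended
        )
        if users and (best is None or users > best[2]):
            best = (start, end, users)
    return best


def t(hour, minute=0):
    # Hypothetical times on the '1970-01-01' date used in the query.
    return datetime(1970, 1, 1, hour, minute)


sessions = [(t(10), t(12)), (t(11), t(13)), (t(11, 30), t(11, 45))]
print(busiest_interval(sessions))
```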
Here's a sqlFiddle with one fewer level of nesting:
SELECT StartTime,EndTime,COUNT(*)as UserCount FROM
(
SELECT T3.StartTime,T3.EndTime,GA.Started,GA.Ended FROM
(SELECT DISTINCT started as starttime,(SELECT MIN(ended)as endtime FROM
gameactivity T1 WHERE ended BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
AND T1.ended > T2.started
)as endtime
FROM
gameactivity T2
WHERE started BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
)T3,
gameactivity GA
WHERE T3.StartTime BETWEEN GA.Started AND GA.Ended
AND T3.EndTime BETWEEN GA.Started AND GA.Ended
)FinalTable
GROUP BY StartTime,EndTime
ORDER BY UserCount DESC
LIMIT 1
Or, going by the query in your question, you don't seem to care about dates, only about hour statistics across all dates. In that case the query below might do it for you (sqlFiddle). Like your query, it just looks at the HOUR of started and ended, and ignores users who play longer than 1 hour.
SELECT COUNT(*) as UserCount,
HOURSTABLE.StartHour,
HOURSTABLE.EndHour
FROM
(SELECT @hour as StartHour,
@hour:=@hour + 1 as EndHour
FROM
gameactivity as OrAnyTableWith24RowsOrMore,
(SELECT @hour:=0)as InitialValue
LIMIT 24) as HOURSTABLE,
gameactivity GA
WHERE HOUR(GA.started) >= HOURSTABLE.StartHour
AND HOUR(GA.ended) <= HOURSTABLE.EndHour
GROUP BY HOURSTABLE.StartHour,HOURSTABLE.EndHour
ORDER BY UserCount DESC
LIMIT 1
just delete the LIMIT 1 if you want to see userCount for other hours as well.

The easiest solution is to run a cron job at the top of each hour that counts the users who have a start time but no end time (a NULL end time, if you reset it when they log in) and logs that count. This gives you the number of users logged in at each hour without needing funky schema changes or wild queries.
When you check again the next hour, users who have logged out in the meantime drop out of your results. This query works if you reset the end time when they log in:
SELECT CONCAT(CURDATE(), ' ', HOUR(NOW()), ' ', COUNT(*)) FROM activity WHERE DATE(start) = CURDATE() AND end IS NULL;
Then you can log this to your heart's content, to a file or to another table (of course, you might need to adjust the SELECT for your log table). For example, you can have a table that gets one row per day, updated only when a new peak is reached.
Assume a log table like:
current_date | peak_hour | peak_count
SELECT IF(peak_count < $peak_count, true, false) FROM log WHERE DATE(`current_date`) = CURDATE();
where $peak_count is a variable coming from your cron. If you find a new, bigger peak count, do an update; if the record does not exist for the day yet, insert it into log. Otherwise you have not beaten the peak from earlier in the day, so don't update. This means each day gives you only one row in your table. Then you don't need to do any aggregation: the date and peak hour are all right there for you to see over the course of a week, a month or whatever.
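The insert-or-update-if-greater rule can be sketched in Python, with a dict standing in for the log table (an illustration only; `record_peak` is a hypothetical name, and the hourly counts below are made up).

```python
def record_peak(log, day, hour, count):
    """Keep one row per day in `log` (a dict standing in for the log table).

    Insert the row if the day is missing; otherwise update it only when
    `count` beats the stored peak_count.
    """
    row = log.get(day)
    if row is None:
        log[day] = {"peak_hour": hour, "peak_count": count}
    elif count > row["peak_count"]:
        row["peak_hour"] = hour
        row["peak_count"] = count
    return log[day]


# Hypothetical hourly cron results for one day: (hour, users online).
log = {}
for hour, count in [(16, 3), (17, 2), (22, 5), (23, 4)]:
    record_peak(log, "2014-01-01", hour, count)
print(log)
```

Each call touches at most one row, so the log stays at one row per day no matter how often the cron runs.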

Related

Count maximum number of overlapping date ranges in MySQL 5.6

I am creating a vehicle rental application. I was trying to find overlapping bookings within given dates. I came across a similar question, Count maximum number of overlapping date ranges in MySQL, but it was only answered for MySQL 8.0.
I adapted that question to my problem.
I need a solution for MySQL 5.6, without window functions.
create table if not exists BOOKING
(
start datetime null,
end datetime null,
vehicle_id varchar(255),
id int auto_increment
primary key
);
INSERT INTO BOOKING (start, end, vehicle_id)
VALUES
('2020-02-06 10:33:55', '2020-02-07 10:34:41', 111),
('2020-02-08 10:33:14', '2020-02-10 10:33:57', 111),
('2020-02-06 10:32:55', '2020-02-07 10:33:32', 222),
('2020-08-06 10:33:03', '2020-02-11 10:33:12', 111),
('2020-02-12 10:31:38', '2020-02-15 10:32:41', 111),
('2020-02-09 09:48:44', '2020-02-10 09:50:37', 222);
Suppose I give start as 2020-02-05 and end as 2020-02-11: this should return 2 for vehicle 111, because its maximum usage is 2, from 2020-02-06 to 2020-02-10.
5 6 7 8 9 10 11
<--> <------>
<----------------> (Vehicle Id 111, ANSWER should be 2)
for vehicle id 222, (For same query)
5 6 7 8 9 10 11
<--> <---> (Vehicle Id 222, ANSWER should be 1)
So the overall output I expect for the input start (2020-02-05) and end (2020-02-11) is:
+---------+-------+
| vehicle | usage |
+---------+-------+
| 111 | 2 |
| 222 | 1 |
+---------+-------+
I need a solution that covers the following:
When given start_date and end_date, the query returns data only for that range.
If no data is found, it should return vehicle_id 0.
The maximum number of overlaps occurs when a rental starts (although it might persist for a period of time, this is all you care about).
You can calculate this for each start using:
SELECT b.vehicle_id, b.start, COUNT(*)
FROM booking b JOIN
booking b2
ON b2.vehicle_id = b.vehicle_id AND
b2.start <= b.start AND
b2.end > b.start
WHERE b.start <= $end and b.end >= $start
GROUP BY b.vehicle_id, b.start;
Then for the maximum:
SELECT vehicle_id, MAX(overlaps)
FROM (SELECT b.vehicle_id, b.start, COUNT(*) as overlaps
FROM booking b JOIN
booking b2
ON b2.vehicle_id = b.vehicle_id AND
b2.start <= b.start AND b2.end > b.start
WHERE b.start <= $end AND b.end >= $start
GROUP BY b.vehicle_id, b.start
) b
GROUP BY vehicle_id;
Here is a db<>fiddle.
Performance on this type of query is never going to be as good as using window functions. However, an index on (vehicle_id, start, end) would help.
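The "maximum overlap happens at some booking start" reasoning can be sketched in Python (illustration only; `max_usage` is a hypothetical name and the bookings below are made-up whole-day dates). For each candidate start `s` inside the range, it counts bookings of the same vehicle with `start <= s < end`, then keeps the maximum per vehicle.

```python
from collections import defaultdict
from datetime import date


def max_usage(bookings, range_start, range_end):
    """Maximum number of simultaneous bookings per vehicle in the range."""
    by_vehicle = defaultdict(list)
    for vehicle, start, end in bookings:
        by_vehicle[vehicle].append((start, end))
    result = {}
    for vehicle, spans in by_vehicle.items():
        best = 0
        for s, e in spans:
            # Same filter as WHERE b.start <= $end AND b.end >= $start.
            if s <= range_end and e >= range_start:
                best = max(best, sum(1 for s2, e2 in spans if s2 <= s < e2))
        if best:
            result[vehicle] = best
    return result


# Hypothetical bookings with whole-day dates: (vehicle_id, start, end).
bookings = [
    ("111", date(2020, 2, 6), date(2020, 2, 7)),
    ("111", date(2020, 2, 8), date(2020, 2, 10)),
    ("111", date(2020, 2, 6), date(2020, 2, 11)),
    ("111", date(2020, 2, 12), date(2020, 2, 15)),
    ("222", date(2020, 2, 6), date(2020, 2, 7)),
    ("222", date(2020, 2, 9), date(2020, 2, 10)),
]
print(max_usage(bookings, date(2020, 2, 5), date(2020, 2, 11)))
```

This sketch returns an empty dict when nothing matches; the "vehicle_id 0" fallback from the question would still have to be added by the caller.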
SELECT vehicle_id vehicle, MAX(cnt) `usage`
FROM ( SELECT booking.vehicle_id, timepoints.dt, COUNT(*) cnt
FROM booking
JOIN ( SELECT start dt FROM booking
UNION ALL
SELECT `end` FROM booking ) timepoints ON timepoints.dt BETWEEN booking.start AND booking.`end`
GROUP BY booking.vehicle_id, timepoints.dt ) subquery
GROUP BY vehicle_id;
fiddle
PS. Misprint in 4th row is corrected.

How to find datetimes where some conditions hold in MySQL?

We have a MySQL database containing bookings on different courts. Table properties (shortened):
CREATE TABLE `booking` (
`startDate` datetime NOT NULL,
`endDate` datetime NOT NULL,
`courtId` varchar(36),
FOREIGN KEY (`courtId`) REFERENCES `court` (`id`) ON DELETE CASCADE
)
Usually, bookings are paid, but under certain conditions (which I can check in the WHERE-part of a query), bookings can be free.
Given a court and booking duration, I want to query the next datetime at which the booking can be created for free. The conditions are not the problem, the problem is how to query not for entities but for datetime values.
How to realize this efficiently in MySQL?
EDIT: Maybe it helps to outline the conditions under which bookings are free:
The conditions under which bookings are free are dependent on how many courts are offered at the startDate by someone (courts are always offered except if there are special "not-offered"-bookings on that court) and how many other bookings overlapping the startDate are already free. This means bookings can be (and probably are) free even if there are no bookings at all in the database.
Solution
Finding an available slot before the last booking:
Find the difference between each booking and the one that follows it. If the difference is greater than the number of days of the new booking, you can use that slot.
Finding an available slot after the last booking:
If there is no such slot, you can assign a day after the end date of the last booking.
If this query returns null, it means there is no booking for the court. You can handle that in the client side.
Code
SET @c := 1; # Court id
SET @n := 2; # Number of days
/*
Previous booking
*/
SET @i := 0;
CREATE TEMPORARY TABLE bp AS
SELECT @i := @i + 1 AS id, startDate, endDate FROM booking
WHERE courtId = @c
ORDER BY startDate;
/*
Next booking
*/
SET @i := -1;
CREATE TEMPORARY TABLE bn AS
SELECT @i := @i + 1 AS id, startDate, endDate FROM booking
WHERE courtId = @c
ORDER BY startDate;
/*
Finding available slot before the last booking (Intermediate slot).
*/
SELECT DATE_ADD(MIN(bp.endDate), INTERVAL 1 DAY) INTO @si FROM
bp
JOIN
bn
ON bn.id = bp.id
WHERE DATEDIFF(bn.startDate, bp.endDate) > @n;
/*
Finding available slot after the last booking
*/
SELECT DATE_ADD(MAX(endDate), INTERVAL 1 DAY) INTO @sa FROM bn;
SELECT IFNULL(@si, @sa);
Using the code
Just replace the values of the variables @c and @n.
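The two temporary tables exist only to pair each booking with the next one by start date. In Python the same pairing is a `zip` over the sorted list; the sketch below illustrates the logic (`next_free_start` and the sample bookings are hypothetical). It mirrors `MIN(bp.endDate)` over qualifying gaps, the `MAX(endDate)` fallback, and the NULL result when the court has no bookings.

```python
from datetime import date, timedelta


def next_free_start(bookings, n_days):
    """bookings: list of (start_date, end_date) for one court.

    Among consecutive pairs whose gap exceeds n_days, take the earliest
    end date + 1 day (@si); otherwise the latest end date + 1 day (@sa).
    Returns None when there are no bookings.
    """
    if not bookings:
        return None
    ordered = sorted(bookings)
    gap_ends = [
        prev_end
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if (next_start - prev_end).days > n_days  # DATEDIFF(next.start, prev.end) > @n
    ]
    if gap_ends:
        return min(gap_ends) + timedelta(days=1)
    return max(end for _, end in ordered) + timedelta(days=1)


# Hypothetical bookings for one court.
court_bookings = [
    (date(2024, 1, 1), date(2024, 1, 3)),
    (date(2024, 1, 5), date(2024, 1, 6)),
    (date(2024, 1, 10), date(2024, 1, 12)),
]
print(next_free_start(court_bookings, 2))
```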
An idea to solve this is to rephrase it as: for the given :court_id parameter, give me the smallest future end_time for which no other booking starts within the given :duration parameter.
This can be expressed in different ways in SQL.
With a not exists condition and a correlated subquery that ensures that no further booking on the same court starts within :duration minutes.
select min(b.end_date) next_possible_start_date
from bookings b
where
b.court_id = :court_id
and b.end_date > now()
and not exists (
select 1
from bookings b1
where
b1.court_id = b.court_id
and b1.start_date > b.end_date
and b1.start_date < DATE_ADD(b.end_date, interval :duration minute)
)
Note: if you have additional conditions, they must be repeated in the where clause of the query and of the subquery.
The same logic as NOT EXISTS can be implemented with the LEFT JOIN anti-join pattern:
select min(b.end_date) next_possible_start_date
from bookings b
left join bookings b1
on b1.court_id = b.court_id
and b1.start_date > b.end_date
and b1.start_date < DATE_ADD(b.end_date, interval :duration minute)
where
b.court_id = :court_id
and b.end_date > now()
and b1.court_id is null
In MySQL 8.0, it is also possible to use window functions: lead() retrieves the start_date of the next booking, which can then be compared with the end_date of the current booking.
select min(end_date) next_possible_start_date
from (
select
end_date,
lead(start_date) over(partition by court_id order by start_date) next_start_date
from booking b
where court_id = :court_id
) t
where
next_start_date is null
or next_start_date >= DATE_ADD(end_date, interval :duration minute)
Edit
Here is a new version of the query that addresses the case where the court is immediately free at the time the search is performed:
select
court_id,
greatest(min(b.end_date), now()) next_possible_start_date
from bookings b
where
-- b.court_id = :court_id and
not exists (
select 1
from bookings b1
where
b1.court_id = b.court_id
and b1.start_date > b.end_date
and b1.start_date < date_add(greatest(b.end_date, now()), interval :duration minute)
)
group by court_id
Note: this searches for all available courts at once; you can uncomment the where clause to filter on a specific court.
Given this sample data:
court_id | start_date | end_date
-------: | :------------------ | :------------------
1 | 2019-10-29 13:00:00 | 2019-10-29 13:30:00
1 | 2019-10-29 14:00:00 | 2019-10-29 15:00:00
2 | 2019-10-29 23:14:05 | 2019-10-30 00:14:05
2 | 2019-10-30 01:14:05 | 2019-10-30 02:14:05
Court 1 is immediately free. Court 2 is booked for the next hour, then there is a 60-minute vacancy before the next booking.
If we run the query for a duration of 60 minutes, we get:
court_id | next_possible_start_date
-------: | :-----------------------
1 | 2019-10-29 23:14:05 -- available right now
2 | 2019-10-30 00:14:05 -- available in 1 hour
While for 90 minutes, we get:
court_id | next_possible_start_date
-------: | :-----------------------
1 | 2019-10-29 23:14:05 -- available right now
2 | 2019-10-30 02:14:05 -- available in 3 hours
Demo on DB Fiddle
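The edited query's logic can be replayed in Python to reproduce the two result tables (an illustration only; `next_possible_start` is a hypothetical name, and `now` is taken from the "available right now" value in the sample output). A booking qualifies when no booking on the same court starts strictly between its end and `max(end, now) + duration`; the answer is the later of the earliest qualifying end and now.

```python
from datetime import datetime, timedelta


def next_possible_start(bookings, now, duration):
    """bookings: {court_id: [(start, end), ...]} -> {court_id: datetime}."""
    result = {}
    for court, spans in bookings.items():
        candidates = [
            end
            for _, end in spans
            if not any(end < start < max(end, now) + duration for start, _ in spans)
        ]
        if candidates:
            # greatest(min(b.end_date), now())
            result[court] = max(min(candidates), now)
    return result


# Sample data from the answer above.
now = datetime(2019, 10, 29, 23, 14, 5)
bookings = {
    1: [(datetime(2019, 10, 29, 13, 0), datetime(2019, 10, 29, 13, 30)),
        (datetime(2019, 10, 29, 14, 0), datetime(2019, 10, 29, 15, 0))],
    2: [(datetime(2019, 10, 29, 23, 14, 5), datetime(2019, 10, 30, 0, 14, 5)),
        (datetime(2019, 10, 30, 1, 14, 5), datetime(2019, 10, 30, 2, 14, 5))],
}
for minutes in (60, 90):
    print(minutes, next_possible_start(bookings, now, timedelta(minutes=minutes)))
```

For 60 minutes the gap on court 2 is exactly long enough, because the comparison with the next start is strict.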

How to calculate time outside of work hours

This seemed pretty straightforward initially, but has proved to be a real headache. Below are my table, data, expected output and an SQL Fiddle of where I have got to in solving my problem.
Schema & Data:
CREATE TABLE IF NOT EXISTS `meetings` (
`id` int(6) unsigned NOT NULL,
`user_id` int(6) NOT NULL,
`start_time` DATETIME,
`end_time` DATETIME,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `meetings` (`id`, `user_id`, `start_time`, `end_time`) VALUES
('0', '1', '2018-05-09 04:30:00', '2018-05-09 17:30:00'),
('1', '1', '2018-05-10 06:30:00', '2018-05-10 17:30:00'),
('2', '1', '2018-05-10 12:30:00', '2018-05-10 16:00:00'),
('3', '1', '2018-05-11 17:00:00', '2018-05-12 11:00:00'),
('4', '2', '2018-05-11 07:00:00', '2018-05-12 11:00:00'),
('5', '2', '2018-05-11 04:30:00', '2018-05-11 15:00:00');
What I would like to get from the above is total time worked outside of 09:00 to 17:00, grouped by day and user_id. So the result from the above data would look like:
date | user_id | overtime_hours
---------------------------------------
2018-05-09 | 1 | 05:00:00
2018-05-10 | 1 | 03:00:00
2018-05-11 | 1 | 07:00:00
2018-05-12 | 1 | 09:00:00
2018-05-11 | 2 | 13:30:00
2018-05-12 | 2 | 09:00:00
As you can see the expected results are only summing overtime for each day and user for those hours outside of 9 to 5.
Below is the query and SQL Fiddle showing where I am. The main issue arises when the start and end straddle midnight (or multiple midnights).
SELECT
SEC_TO_TIME(SUM(TIME_TO_SEC(TIME(end_time)) - TIME_TO_SEC(TIME(start_time)))), user_id, DATE(start_time)
FROM
(SELECT
start_time, CASE WHEN TIME(end_time) > '09:00:00' THEN DATE_ADD(DATE(end_time), INTERVAL 9 HOUR) ELSE end_time END AS end_time, user_id
FROM
meetings
WHERE
TIME(start_time) < '09:00:00'
UNION
SELECT
CASE WHEN TIME(start_time) < '17:00:00' THEN DATE_ADD(DATE(start_time), INTERVAL 17 HOUR) ELSE start_time END AS start_time, end_time, user_id
FROM
meetings
WHERE
TIME(end_time) > '17:00:00') AS clamped_times
GROUP BY user_id, DATE(start_time)
http://sqlfiddle.com/#!9/77bc85/1
Pastebin for when the fiddle decides to flake: https://pastebin.com/1YvLaKbT
As you can see the query grabs the easy overtime with start and ends on the same day, but does not work with the multiple day ones.
If a meeting can span n days and you want to compute the "work hours" per day within that meeting, that suggests using a number generator table:
(SELECT 0 AS gap UNION ALL SELECT 1 UNION ALL SELECT 2) AS ngen
We will use the number generator table to produce a separate row for each date from start_time to end_time. Here, I have assumed that a meeting is unlikely to span more than three calendar dates (i.e. cross more than two midnights). If one can span more days, you can easily extend the range by adding more UNION ALL SELECT 3 .. to the ngen Derived Table.
Based on this, we will determine "start time" and "end time" to consider for a specific "work date" in an ongoing meeting. This calculation is being done in a Derived Table, for a grouping of user_id and "work date".
Afterwards, we can SUM() up "working hours" per day of a user using some maths. Please find the query below. I have added extensive comments to it; do let me know if anything is still unclear.
Demo on DB Fiddle
Query #1
SELECT
dt.user_id,
dt.wd AS date,
SEC_TO_TIME(SUM(
CASE
/*When both start & end times are less than 9am OR more than 5pm*/
WHEN (st < TIME_TO_SEC('09:00:00') AND et < TIME_TO_SEC('09:00:00')) OR
(st > TIME_TO_SEC('17:00:00') AND et > TIME_TO_SEC('17:00:00'))
THEN et - st /* straightforward difference between the two times */
/* at least one of the times is in 9am-5pm block, OR,
start < 9 am and end > 5pm.
Math of this can be worked out based on signum function */
ELSE GREATEST(0, TIME_TO_SEC('09:00:00') - st) +
GREATEST(0, et - TIME_TO_SEC('17:00:00'))
END
)) AS working_hours
FROM
(
SELECT
m.user_id,
/* Specific work date */
DATE(m.start_time) + INTERVAL ngen.gap DAY AS wd,
/* Start time to consider for this work date */
/* If the work date is on the same date as the actual start time
we consider this time */
CASE WHEN DATE(m.start_time) + INTERVAL ngen.gap DAY = DATE(m.start_time)
THEN TIME_TO_SEC(TIME(m.start_time))
/* We are on the days after the start day */
ELSE 0 /* 0 seconds (start of the day) */
END AS st,
/* End time to consider for this work date */
/* If the work date is on the same date as the actual end time
we consider this time */
CASE WHEN DATE(m.start_time) + INTERVAL ngen.gap DAY = DATE(m.end_time)
THEN TIME_TO_SEC(TIME(m.end_time))
/* More days to come still for this meeting,
we consider the end of this day as end time */
ELSE 86400 /* 24 hours * 3600 seconds (end of the day) */
END AS et
FROM meetings AS m
JOIN (SELECT 0 AS gap UNION ALL SELECT 1 UNION ALL SELECT 2) AS ngen
ON DATE(start_time) + INTERVAL ngen.gap DAY <= DATE(end_time)
) AS dt
GROUP BY dt.user_id, dt.wd;
Result
| user_id | date | working_hours |
| ------- | ---------- | ------------- |
| 1 | 2018-05-09 | 05:00:00 |
| 1 | 2018-05-10 | 03:00:00 |
| 1 | 2018-05-11 | 07:00:00 |
| 1 | 2018-05-12 | 09:00:00 |
| 2 | 2018-05-11 | 13:30:00 |
| 2 | 2018-05-12 | 09:00:00 |
Further Optimization Possibilities:
This query can do away with the usage of subquery (Derived Table) very easily. I just wrote it in this way, to convey the mathematics and process in a followable manner. However, you can easily merge the two SELECT blocks to a single query.
More optimization may be possible in the use of Date/Time functions, and the mathematics could be simplified further. Function details are available at: https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html
Some date calculations are done multiple times, e.g., DATE(m.start_time) + INTERVAL ngen.gap DAY. To avoid recalculation, we can utilize User-defined variables, which will also make the query less verbose.
Make this JOIN condition sargable: JOIN .. ON DATE(start_time) + INTERVAL ngen.gap DAY <= DATE(end_time)
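The per-day split and the CASE arithmetic can be cross-checked with a Python sketch (illustration only; the helper names are hypothetical). One deliberate difference: instead of a fixed number generator, it walks the dates with an unbounded loop, so it has no three-date limit.

```python
from collections import defaultdict
from datetime import datetime, timedelta

DAY_END = 86400  # 24 * 3600 seconds
NINE_AM = 9 * 3600
FIVE_PM = 17 * 3600


def seconds_of_day(moment):
    return moment.hour * 3600 + moment.minute * 60 + moment.second


def overtime_seconds(st, et):
    """Seconds of [st, et] outside 09:00-17:00 (same CASE as the query)."""
    if (st < NINE_AM and et < NINE_AM) or (st > FIVE_PM and et > FIVE_PM):
        return et - st
    return max(0, NINE_AM - st) + max(0, et - FIVE_PM)


def overtime_by_day(meetings):
    """Sum overtime per (user_id, date), splitting meetings at midnight."""
    totals = defaultdict(int)
    for user_id, start, end in meetings:
        day = start.date()
        while day <= end.date():  # one iteration per work date
            st = seconds_of_day(start) if day == start.date() else 0
            et = seconds_of_day(end) if day == end.date() else DAY_END
            totals[(user_id, day.isoformat())] += overtime_seconds(st, et)
            day += timedelta(days=1)
    return dict(totals)


meetings = [  # rows from the question: (user_id, start_time, end_time)
    (1, datetime(2018, 5, 9, 4, 30), datetime(2018, 5, 9, 17, 30)),
    (1, datetime(2018, 5, 10, 6, 30), datetime(2018, 5, 10, 17, 30)),
    (1, datetime(2018, 5, 10, 12, 30), datetime(2018, 5, 10, 16, 0)),
    (1, datetime(2018, 5, 11, 17, 0), datetime(2018, 5, 12, 11, 0)),
    (2, datetime(2018, 5, 11, 7, 0), datetime(2018, 5, 12, 11, 0)),
    (2, datetime(2018, 5, 11, 4, 30), datetime(2018, 5, 11, 15, 0)),
]
for (user_id, day), secs in sorted(overtime_by_day(meetings).items()):
    print(day, user_id, timedelta(seconds=secs))
```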

SQL query to find free rooms in hotel

I'm trying to find free rooms in a hotel database. I wrote a query that selects rooms that are not in the booking table:
SELECT * FROM room WHERE roomId NOT IN
(SELECT roomId FROM booking b WHERE STR_TO_DATE('${endDate}', '%m-%d-%Y') <= b.endDate AND
STR_TO_DATE('${startDate}', '%m-%d-%Y') >= b.startDate);
My booking table looks like:
+-----------+------------+------------+--------+---------+
| bookingId | startDate | endDate | roomId | guestId |
+-----------+------------+------------+--------+---------+
| 1 | 2016-03-12 | 2016-03-22 | 1 | 1 |
+-----------+------------+------------+--------+---------+
But if my startDate is 2016-03-10 and endDate is 2016-03-25, the query still returns room 1, which is already booked from 2016-03-12 to 2016-03-22. How can I fix it? I don't want to show rooms that are booked at any point between my dates.
A general approach to finding rooms that are free in a range ($BOOKING_BEGIN to $BOOKING_END) looks like this:
SELECT
rooms.room_id
FROM
rooms
LEFT JOIN
bookings
ON (
bookings.room_id = rooms.room_id AND
NOT (
(bookings.begin < $BOOKING_BEGIN and bookings.end < $BOOKING_BEGIN)
OR
(bookings.begin > $BOOKING_END and bookings.end > $BOOKING_END)
)
)
WHERE
bookings.room_id IS NULL;
This simply means: take all the rooms in the hotel and join them with the bookings that overlap the requested range. If the joined booking is NULL, the room is free in that range (the join found no conflicting booking).
Here is a query that works and has been tested against every combination I could think of: vacancy before any other booking; vacancy after everything; start date before, on, or after an existing start date; end date before, on, or after an existing end date; a request completely straddling another booking; and a request entirely within another booking.
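To see why the original NOT IN subquery misses the 2016-03-12 to 2016-03-22 booking while the LEFT JOIN condition catches it, here is a small Python sketch of both predicates. The function names and the tuple layout are hypothetical, and dates are modeled as `datetime.date` values. The asker's condition only matches bookings that completely contain the requested range. The general approach's `NOT (entirely before OR entirely after)` is, for bookings whose begin is not after their end, the same as `begin <= END AND end >= BEGIN`.

```python
from datetime import date


def asker_conflicts(booking, start, end):
    """Original subquery condition: booking encloses the whole requested range."""
    b_start, b_end = booking
    return end <= b_end and start >= b_start


def overlap_conflicts(booking, start, end):
    """General approach: conflict unless the booking lies entirely before or after."""
    b_start, b_end = booking
    entirely_before = b_start < start and b_end < start
    entirely_after = b_start > end and b_end > end
    return not (entirely_before or entirely_after)


def free_rooms(rooms, bookings, start, end, conflicts):
    """Anti-join: keep rooms with no conflicting booking (the IS NULL rows)."""
    busy = {room for room, b_start, b_end in bookings
            if conflicts((b_start, b_end), start, end)}
    return [room for room in rooms if room not in busy]


# booking table from the question: room 1 booked 2016-03-12 .. 2016-03-22
bookings = [(1, date(2016, 3, 12), date(2016, 3, 22))]
request = (date(2016, 3, 10), date(2016, 3, 25))
```

With the question's data, `asker_conflicts` leaves room 1 in the free list, while `overlap_conflicts` removes it. Because the comparisons are strict, a booking that ends on the requested begin date also counts as a conflict; if a checkout day can be rebooked the same day, the inequalities need adjusting.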
select
r.RoomID
from
Room r
LEFT JOIN
( select
b.RoomID
from
booking b,
( select @parmStartDate := '2016-01-21',
@parmEndDate := '2016-01-23' ) sqlvars
where
b.EndDate >= @parmStartDate
AND b.StartDate <= @parmEndDate
AND ( timestampdiff( day, b.StartDate, @parmEndDate )
* timestampdiff( day, @parmStartDate, b.EndDate )) > 0 ) Occupied
ON r.RoomID = Occupied.RoomID
where
Occupied.RoomID IS NULL;
The sample booking data I created included:
BookID RoomID StartDate EndDate
1 1 2016-02-03 2016-02-04
2 1 2016-02-04 2016-02-08
3 1 2016-02-12 2016-02-16
4 1 2016-02-20 2016-02-28
I then tested with the following booking dates and got the correct result (available, or in conflict with an existing booking) for each. This test covers just a single room, but it obviously applies to any room in the hotel.
Both dates before anything on file... Room available
2016-01-10 - 2016-01-15
Both dates after anything on file... Room available
2016-03-10 - 2016-03-15
Occupied ID 1 -- Same start date
2016-02-03 - 2016-02-04
Occupied ID 2 -- Same start date, but less than existing occupied end date
2016-02-04 - 2016-02-05
Occupied ID 2 -- Same start, Exceeds end occupancy date
2016-02-04 - 2016-02-09
Occupied ID 3 -- Start before, but end date WITHIN existing booking
2016-02-09 - 2016-02-13
Available. The END Date is the START Date of an existing booking
(between bookings 2 and 3)
2016-02-09 - 2016-02-12
Occupied ID 3 -- Started within date, but end outside existing booking
2016-02-15 - 2016-02-17
Available. End of existing booking and nothing booked on 2/17
2016-02-16 - 2016-02-17
Occupied ID 3 -- Completely encompasses booking entry
2016-02-11 - 2016-02-17
Occupied ID 4 -- totally WITHIN another entry
2016-02-21 - 2016-02-23
Now, to explain what is going on. I used a LEFT JOIN and looked for NULL (i.e., no conflicting booking), which is quite similar to your NOT IN subselect, so I will skip that part.
First, the FROM clause. So that I don't have to "declare" variables as in a stored procedure, I set them inline via @parmStartDate and @parmEndDate, giving the derived table the alias sqlvars just for declaration purposes. Since it returns exactly one row, the Cartesian product with the booking table is no problem.
from
booking b,
( select @parmStartDate := '2016-01-21',
@parmEndDate := '2016-01-23' ) sqlvars
Now, the WHERE clause. If your table accumulates years' worth of bookings across hundreds of rooms, it could get quite large quickly, so I want to start with only those bookings that could possibly overlap the requested dates:
where
b.EndDate >= @parmStartDate
AND b.StartDate <= @parmEndDate
At a minimum, I only care about bookings whose existing checkout date is AT LEAST the check-in date you are checking availability for. Ex: if you are looking for a check-in date of July 4th, why would you even care that someone checked out in Feb, Mar, Apr, etc.? So how far out do you go? You also only care about bookings whose START Date is no later than the day you would be checking out. So, if checking out July 6th, you don't care about any bookings starting July 7th or after. So far, so good.
Now comes the question of how to tell whether a room is occupied. I had difficulty comparing the existing Start Date to the requested dates and was getting false answers, so I resorted to date math: compare the existing start to the requested end, and the requested start to the existing end. If the product of the two differences is positive, there is a conflict.
AND ( timestampdiff( day, b.StartDate, @parmEndDate )
* timestampdiff( day, @parmStartDate, b.EndDate )) > 0 )
Since the WHERE clause already restricts us to records within the POSSIBLE date range, both differences are zero or positive, so the product is positive only when both are: the existing booking starts before the requested end date and ends after the requested start date. That catches a conflict in either direction: fully straddling, fully inside, overlapping on the left, or overlapping on the right.
It is easier to understand once you see it, so here is the query I ran; you can look at the results for yourself. Just plug in the start and end dates you are looking for.
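The product trick can also be checked outside MySQL. Below is a Python sketch, assuming `TIMESTAMPDIFF(DAY, a, b)` on DATE values (no time component) is simply the whole number of days from `a` to `b`, modeled here as `(b - a).days`. After the WHERE prefilter both differences are zero or positive, so the product is positive exactly when the booking starts before the requested end and ends after the requested start. The sketch runs the sample bookings through both forms; the helper names are hypothetical.

```python
from datetime import date

# Sample bookings from the answer: (BookID, StartDate, EndDate), all room 1
BOOKINGS = [
    (1, date(2016, 2, 3), date(2016, 2, 4)),
    (2, date(2016, 2, 4), date(2016, 2, 8)),
    (3, date(2016, 2, 12), date(2016, 2, 16)),
    (4, date(2016, 2, 20), date(2016, 2, 28)),
]


def day_diff(a, b):
    """Model of TIMESTAMPDIFF(DAY, a, b) for DATE values."""
    return (b - a).days


def conflicts_product(start, end):
    """BookIDs flagged by the WHERE prefilter plus the product test."""
    return [bid for bid, b_start, b_end in BOOKINGS
            if b_end >= start and b_start <= end  # prefilter
            and day_diff(b_start, end) * day_diff(start, b_end) > 0]


def conflicts_strict(start, end):
    """Equivalent strict-overlap test without the multiplication."""
    return [bid for bid, b_start, b_end in BOOKINGS
            if b_start < end and start < b_end]
```

Both functions return the same BookIDs for every test range listed above, so the plain comparisons `b.StartDate < @parmEndDate AND b.EndDate > @parmStartDate` would be an alternative way to write the conflict check.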
select
b.BookID,
b.RoomID,
b.StartDate,
b.EndDate,
@parmStartDate as pStart,
@parmEndDate as pEnd,
( timestampdiff( day, b.StartDate, @parmEndDate )
* timestampdiff( day, @parmStartDate, b.EndDate )) <= 0 as Available,
( timestampdiff( day, b.StartDate, @parmEndDate )
* timestampdiff( day, @parmStartDate, b.EndDate )) > 0 as Occupied
from
booking b,
( select @parmStartDate := '2016-01-21',
@parmEndDate := '2016-01-23' ) sqlvars
Good Luck...

MySQL find number of students in attendance broken down by time

I have a table containing arriving and departing times for students attending a class. Given something like this data:
CREATE TABLE `attendance` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`class_id` int(11) DEFAULT NULL,
`student_id` int(11) NOT NULL DEFAULT '0',
`arrival` datetime DEFAULT NULL,
`departure` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `attendance` (`id`, `class_id`, `student_id`, `arrival`, `departure`)
VALUES
(1,1,1,'2013-01-01 16:00:00','2013-01-01 17:00:00'),
(2,1,2,'2013-01-01 16:00:00','2013-01-01 18:00:00'),
(3,1,3,'2013-01-01 17:00:00','2013-01-01 19:00:00'),
(4,1,4,'2013-01-01 17:00:00','2013-01-01 19:00:00'),
(5,1,5,'2013-01-01 17:30:00','2013-01-01 18:30:00');
I'm trying to get a breakdown of time in minutes and how many students are present during each time period. From the above data, I'd expect a result something like this:
Time Students
60 2 (the first hour from 16:00 to 17:00 has students 1 & 2)
30 3 (the next 30 minutes from 17:00 to 17:30 has students 2, 3 & 4)
30 4 (etc...)
30 3
30 2
The select statement I have so far is getting some way towards the answer but I can't quite get it working:
SELECT a.id, a.arrival, b.id, LEAST(a.departure,b.departure) AS departure,
TIMEDIFF((LEAST(a.departure,b.departure)),(a.arrival)) AS subtime
FROM attendance a
JOIN attendance b ON (a.id <> b.id and a.class_id=b.class_id
and a.arrival >= b.arrival and a.arrival < b.departure)
WHERE a.class_id=1
ORDER BY a.arrival, departure, b.id;
Thank you in advance to anyone who can help me get this right.
Using subqueries in the FROM clause (derived tables), you can create virtual tables (not the same as a temporary table, but a similar idea). You can then query against these virtual tables just as if they really existed.
select clocks.clock, count( att.student_id ) as numStudents
from
(
( select arrival as clock from attendance )
union distinct
( select departure as clock from attendance )
)
as clocks
left outer join attendance att on att.arrival <= clocks.clock and clocks.clock < att.departure
group by clocks.clock
order by 1,2
;
This is almost what you are looking for. Rather than grouping by elapsed time, it uses the actual 'event' timestamps (arrivals and departures) and gives you a useful report.
clock numStudents
------------------- -----------
2013-01-01 16:00:00 2
2013-01-01 17:00:00 3
2013-01-01 17:30:00 4
2013-01-01 18:00:00 3
2013-01-01 18:30:00 2
2013-01-01 19:00:00 0
The report shows how many students are still 'here' at each event time.
Hopefully this is useful for you.
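The same event-based idea can be sketched in Python, and pairing each event time with the next one also yields the elapsed-minutes breakdown the question asked for (60 2, 30 3, 30 4, 30 3, 30 2). This is a sketch: `ROWS` mirrors the INSERT statement in the question, and the function names are hypothetical. A student counts as present at a clock time when arrival <= clock < departure, the same condition as the LEFT OUTER JOIN.

```python
from datetime import datetime

# (student_id, arrival, departure) from the INSERT statement in the question
ROWS = [
    (1, "2013-01-01 16:00:00", "2013-01-01 17:00:00"),
    (2, "2013-01-01 16:00:00", "2013-01-01 18:00:00"),
    (3, "2013-01-01 17:00:00", "2013-01-01 19:00:00"),
    (4, "2013-01-01 17:00:00", "2013-01-01 19:00:00"),
    (5, "2013-01-01 17:30:00", "2013-01-01 18:30:00"),
]
FMT = "%Y-%m-%d %H:%M:%S"


def counts_at_events(rows):
    """Like the SQL: distinct arrival/departure times with students present."""
    spans = [(datetime.strptime(a, FMT), datetime.strptime(d, FMT))
             for _, a, d in rows]
    clocks = sorted({t for span in spans for t in span})  # UNION DISTINCT
    return [(clock, sum(1 for a, d in spans if a <= clock < d))
            for clock in clocks]


def minutes_breakdown(rows):
    """Elapsed minutes until the next event, with the head count for that span."""
    events = counts_at_events(rows)
    return [(int((nxt - clock).total_seconds() // 60), n)
            for (clock, n), (nxt, _) in zip(events, events[1:])]


print(minutes_breakdown(ROWS))
```

The last event (19:00, nobody present) has no successor, so it drops out of the breakdown; gaps where nobody is present between two events would appear with a count of 0.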