Find big enough gaps in booking table - mysql

A rental system uses a booking table to store all bookings and reservations:
booking | item | startdate | enddate
1 | 42 | 2013-10-25 16:00 | 2013-10-27 12:00
2 | 42 | 2013-10-27 14:00 | 2013-10-28 18:00
3 | 42 | 2013-10-30 09:00 | 2013-11-01 09:00
…
Let’s say a user wants to rent item 42 from 2013-10-27 12:00 until 2013-10-28 12:00, a period of one day. The system will tell them that the item is not available in the given time frame, since booking no. 2 collides.
Now I want to suggest the earliest rental date and time at which the selected item is available again, taking into account the user’s requested period (1 day) and starting from the user’s desired date and time.
So in the case above, I’m looking for an SQL query that returns 2013-10-28 18:00, since the earliest date from 2013-10-27 12:00 onward at which item 42 is available for 1 day is from 2013-10-28 18:00 until 2013-10-29 18:00.
So I need to find a gap between bookings that is big enough to hold the user’s reservation and that is as close as possible to the desired start date.
Or in other words: I need to find the first booking for a given item, after which there’s enough free time to place the user’s booking.
Is this possible in plain SQL without having to iterate over every booking and its successor?

If you can't redesign your database to use something more efficient, this will get the answer. You'll obviously want to parameterize it. It says find either the desired date, or the earliest end date where the hire interval doesn't overlap an existing booking:
Select
    min(startdate)
From (
    select
        cast('2013-10-27 12:00' as datetime) startdate
    from
        dual
    union all
    select
        enddate
    from
        booking
    where
        enddate > cast('2013-10-27 12:00' as datetime) and
        item = 42
) b1
Where
    not exists (
        select
            'x'
        from
            booking b2
        where
            item = 42 and
            b1.startdate < b2.enddate and
            b2.startdate < date_add(b1.startdate, interval 24 hour)
    );
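To make the logic of that query easier to follow, here is a minimal Python sketch of the same idea, assuming the three sample bookings from the question. The function name `earliest_start` and the tuple layout are made up for illustration. The candidates are the desired start plus every later end date, and a candidate is kept only if no booking overlaps the requested interval.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (booking, item, startdate, enddate).
BOOKINGS = [
    (1, 42, datetime(2013, 10, 25, 16), datetime(2013, 10, 27, 12)),
    (2, 42, datetime(2013, 10, 27, 14), datetime(2013, 10, 28, 18)),
    (3, 42, datetime(2013, 10, 30, 9), datetime(2013, 11, 1, 9)),
]


def earliest_start(bookings, item, desired, duration):
    """Return the earliest start >= desired with no overlapping booking."""
    rows = [(s, e) for _, i, s, e in bookings if i == item]
    # Candidates: the desired start itself, plus every later end date
    # (the two branches of the UNION ALL).
    candidates = [desired] + [e for _, e in rows if e > desired]
    free = [
        c for c in candidates
        # Same overlap test as the NOT EXISTS subquery.
        if not any(c < e and s < c + duration for s, e in rows)
    ]
    # The latest end date is always free, so `free` is never empty.
    return min(free)


result = earliest_start(
    BOOKINGS, 42, datetime(2013, 10, 27, 12), timedelta(days=1)
)
print(result)  # 2013-10-28 18:00:00
```

Like the SQL, this compares every candidate against every booking, so it is quadratic in the number of bookings for the item.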

SELECT startfree, secondsfree FROM (
    SELECT
        @lastenddate AS startfree,
        UNIX_TIMESTAMP(startdate) - UNIX_TIMESTAMP(@lastenddate) AS secondsfree,
        @lastenddate := enddate AS ignoreme
    FROM
        (SELECT startdate, enddate FROM bookings WHERE item = 42) AS schedule,
        (SELECT @lastenddate := NOW()) AS init
    ORDER BY startdate
) AS baseview
WHERE startfree >= '2013-10-27 12:00:00'
  AND secondsfree >= 86400
ORDER BY startfree
LIMIT 1
;
Some explanation: The inner query uses a variable to move the iteration into SQL, the outer query finds the needed row.
That said, I would not do this in SQL if the DB structure is as given. You could reduce the iteration count by restricting the inner query to a sane timespan with a smart WHERE clause, but chances are this won't perform well.
EDIT
A caveat: I did not check, but I assume this won't work if there are no prior reservations in the list. This should not be a problem, since in that case your first reservation attempt (original time) will work.

Searching for overlapping date ranges generally yields poor performance in SQL. For that reason having a "Calendar" of available slots often makes things a lot more efficient.
For example, the booking 2013-10-25 16:00 => 2013-10-27 12:00 would actually be represented by 44 records, each one hour long.
The "gap" until the next booking at 2013-10-27 14:00 would then be represented by 2 records, each one hour long.
Then, each record could also have the duration (in time, or number of slots) until the next change.
slot_start_time | booking | item | remaining_duration
------------------+---------+------+--------------------
2013-10-27 10:00 | 1 | 42 | 2
2013-10-27 11:00 | 1 | 42 | 1
2013-10-27 12:00 | NULL | 42 | 2
2013-10-27 13:00 | NULL | 42 | 1
2013-10-27 14:00 | 2 | 42 | 28
2013-10-27 15:00 | 2 | 42 | 27
... | ... | ... | ...
2013-10-28 17:00 | 2 | 42 | 1
2013-10-28 18:00 | NULL | 42 | 39
2013-10-28 19:00 | NULL | 42 | 38
Then your query just becomes:
SELECT
*
FROM
slots
WHERE
slot_start_time >= '2013-10-27 12:00'
AND remaining_duration >= 24
AND booking IS NULL
ORDER BY
slot_start_time ASC
LIMIT
1
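How such a slots table gets filled is not shown above, so here is a rough Python sketch of one way to derive the rows, assuming hour-aligned, non-overlapping bookings for a single item. The function `build_slots` and the chosen calendar window are hypothetical. It walks the hours backwards so that each slot can reuse the count of the slot after it.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def build_slots(bookings, first, last):
    """Return (slot_start, booking or None, remaining_duration) rows.

    bookings: list of (booking, start, end), hour-aligned, non-overlapping.
    remaining_duration counts the slots until the booked/free state changes.
    """
    owner = {}
    for booking, start, end in bookings:
        t = start
        while t < end:
            owner[t] = booking
            t += HOUR
    slots = []
    t = first
    while t < last:
        slots.append(t)
        t += HOUR
    rows = []
    remaining = 0
    next_owner = object()  # sentinel: nothing follows the last slot
    # Walk backwards so each slot can extend its successor's run.
    for t in reversed(slots):
        current = owner.get(t)
        remaining = remaining + 1 if current == next_owner else 1
        rows.append((t, current, remaining))
        next_owner = current
    rows.reverse()
    return rows


bookings = [
    (1, datetime(2013, 10, 25, 16), datetime(2013, 10, 27, 12)),
    (2, datetime(2013, 10, 27, 14), datetime(2013, 10, 28, 18)),
    (3, datetime(2013, 10, 30, 9), datetime(2013, 11, 1, 9)),
]
# Assumed calendar window; runs at its edge are cut off by the window.
rows = build_slots(bookings, datetime(2013, 10, 25), datetime(2013, 11, 2))

# Same filter as the query: first free slot with at least 24 hours left.
desired = datetime(2013, 10, 27, 12)
match = next(
    r for r in rows if r[0] >= desired and r[1] is None and r[2] >= 24
)
print(match)
```

The price of the cheap lookup is that every insert, change, or cancellation of a booking has to recompute remaining_duration for the affected run of slots.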

OK this isn't pretty in MySQL. That's because we have to fake rownum values in subqueries.
The basic approach is to join the appropriate subset of the booking table to itself offset by one.
Here's the basic list of reservations for item 42, ordered by reservation time. We can't order by booking_id, because those aren't guaranteed to be in order of reservation time. (You're trying to insert a new reservation between two existing ones, eh?) http://sqlfiddle.com/#!2/62383/9/0
SELECT @aserial := @aserial+1 AS rownum,
       booking.*
FROM booking,
     (SELECT @aserial := 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
Here is that subset joined to itself. The trick is the a.rownum+1 = b.rownum, which joins each row to the one that comes right after it in the booking table subset. http://sqlfiddle.com/#!2/62383/8/0
SELECT a.booking_id, a.startdate asta, a.enddate aend,
       b.startdate bsta, b.enddate bend
FROM (
    SELECT @aserial := @aserial+1 AS rownum,
           booking.*
    FROM booking,
         (SELECT @aserial := 0) AS q
    WHERE item = 42
    ORDER BY startdate, enddate
) AS a
JOIN (
    SELECT @bserial := @bserial+1 AS rownum,
           booking.*
    FROM booking,
         (SELECT @bserial := 0) AS q
    WHERE item = 42
    ORDER BY startdate, enddate
) AS b ON a.rownum+1 = b.rownum
Here it is again, showing each reservation (except the last one) and the number of hours following it. http://sqlfiddle.com/#!2/62383/15/0
SELECT a.booking_id, a.startdate, a.enddate,
       TIMESTAMPDIFF(HOUR, a.enddate, b.startdate) gaphours
FROM (
    SELECT @aserial := @aserial+1 AS rownum,
           booking.*
    FROM booking,
         (SELECT @aserial := 0) AS q
    WHERE item = 42
    ORDER BY startdate, enddate
) AS a
JOIN (
    SELECT @bserial := @bserial+1 AS rownum,
           booking.*
    FROM booking,
         (SELECT @bserial := 0) AS q
    WHERE item = 42
    ORDER BY startdate, enddate
) AS b ON a.rownum+1 = b.rownum
So, if you're looking for the starting time and ending time of the earliest twelve-hour slot you can use that result set to do this: http://sqlfiddle.com/#!2/62383/18/0
SELECT MIN(enddate) startdate, MIN(enddate) + INTERVAL 12 HOUR AS enddate
FROM (
    SELECT a.booking_id, a.startdate, a.enddate,
           TIMESTAMPDIFF(HOUR, a.enddate, b.startdate) gaphours
    FROM (
        SELECT @aserial := @aserial+1 AS rownum,
               booking.*
        FROM booking,
             (SELECT @aserial := 0) AS q
        WHERE item = 42
        ORDER BY startdate, enddate
    ) AS a
    JOIN (
        SELECT @bserial := @bserial+1 AS rownum,
               booking.*
        FROM booking,
             (SELECT @bserial := 0) AS q
        WHERE item = 42
        ORDER BY startdate, enddate
    ) AS b ON a.rownum+1 = b.rownum
) AS gaps
WHERE gaphours >= 12
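On engines with window functions (MySQL 8.0 and later has LEAD), the "join each row to the next one" step needs no faked row numbers. The following is only a sketch of that variant, run against an in-memory SQLite database through Python so it is easy to try. It assumes SQLite 3.25 or newer, the sample rows from the question, and a booking_id column as in the queries above. The gap arithmetic uses SQLite's julianday() in place of TIMESTAMPDIFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE booking "
    "(booking_id INTEGER, item INTEGER, startdate TEXT, enddate TEXT)"
)
conn.executemany(
    "INSERT INTO booking VALUES (?, ?, ?, ?)",
    [
        (1, 42, "2013-10-25 16:00", "2013-10-27 12:00"),
        (2, 42, "2013-10-27 14:00", "2013-10-28 18:00"),
        (3, 42, "2013-10-30 09:00", "2013-11-01 09:00"),
    ],
)

# LEAD() pairs each booking with the start of the one that follows it.
GAPS_SQL = """
SELECT booking_id, enddate, next_start,
       CAST(ROUND((julianday(next_start) - julianday(enddate)) * 24)
            AS INTEGER) AS gaphours
FROM (
    SELECT booking_id, enddate,
           LEAD(startdate) OVER (ORDER BY startdate, enddate) AS next_start
    FROM booking
    WHERE item = 42
) AS t
WHERE next_start IS NOT NULL
ORDER BY enddate
"""
gaps = conn.execute(GAPS_SQL).fetchall()
for row in gaps:
    print(row)

# Earliest gap of at least twelve hours starts where that booking ends.
first_fit = min(enddate for _, enddate, _, hours in gaps if hours >= 12)
print(first_fit)  # 2013-10-28 18:00
```

As with the self-join, the last booking has no successor and therefore no row here, so the open-ended time after it has to be handled separately.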

Here is the query; it returns the needed date. One obvious condition: there must be some bookings in the table, but as I see from the question, you already check for that:
SELECT min(enddate)
FROM
(
select a.enddate from table4 as a
where
a.item=42
and
DATE_ADD(a.enddate, INTERVAL 1 day) <= ifnull(
(select min(b.startdate)
from table4 as b where b.startdate>=a.enddate and a.item=b.item),
a.enddate)
and
a.enddate>=now()
union all
select greatest(ifnull(max(enddate), now()),now()) from table4
) as q
You can change INTERVAL 1 day to INTERVAL ### hour.

If I have understood your requirements correctly, you could try self-JOINing book with itself, to get the "empty" spaces, and then fit. This is MySQL only (I believe it can be adapted to others - certainly PostgreSQL):
SELECT book.*, TIMESTAMPDIFF(MINUTE, book.enddate, book.best) AS width FROM
(
SELECT book.*, MIN(book1.startdate) AS best
FROM book
JOIN book AS book1 USING (item)
WHERE item = 42 AND book1.startdate >= book.enddate
GROUP BY book.booking
) AS book HAVING width > 110 ORDER BY startdate LIMIT 1;
In the above example, "110" is the looked-for minimum width in minutes.
Same thing, a bit less readable (for me), a SELECT removed (very fast SELECT, so little advantage):
SELECT book.*, MIN(book1.startdate) AS best
FROM book
JOIN book AS book1 ON (book.item = book1.item AND book.item = 42)
WHERE book1.startdate >= book.enddate
GROUP BY book.booking
HAVING TIMESTAMPDIFF(MINUTE, book.enddate, best) > 110
ORDER BY startdate LIMIT 1;
In your case, one day is 1440 minutes and
SELECT book.*, MIN(book1.startdate) AS best FROM book JOIN book AS book1 ON (book.item = book1.item AND book.item = 42) WHERE book1.startdate >= book.enddate GROUP BY book.booking HAVING TIMESTAMPDIFF(MINUTE, book.enddate, best) >= 1440 ORDER BY startdate LIMIT 1;
+---------+------+---------------------+---------------------+---------------------+
| booking | item | startdate | enddate | best |
+---------+------+---------------------+---------------------+---------------------+
| 2 | 42 | 2013-10-27 14:00:00 | 2013-10-28 18:00:00 | 2013-10-30 09:00:00 |
+---------+------+---------------------+---------------------+---------------------+
1 row in set (0.00 sec)
...the period returned is 2, i.e., at the end of booking 2, and until "best" which is booking 3, a period of at least 1440 minutes is available.
An issue could be that if no periods are available, the query returns nothing -- then you need another query to fetch the farthest enddate. You can do this with an UNION and LIMIT 1 of course, but I think it would be best to only run the 'recovery' query on demand, programmatically (i.e. if empty(query) then new_query...).
Also, in the inner WHERE you should add a check for NOW() to avoid dates in the past. If expired bookings are moved to inactive storage, this could be unnecessary.

Related

How to make time buckets with a start and end time column?

I have 3 columns: employee_id, start_time and end_time. I want to make buckets of 1 hour to show how many employees were working in each hour. For example, employee A worked from 12 pm to 3 pm and employee B worked from 2 pm to 4 pm, so at 12 pm 1 employee was working, at 1 pm 1 employee, at 2 pm 2 employees, at 3 pm 2 employees and at 4 pm 1 employee. How can I do this in SQL? Let me show you a picture of the start and end time columns.
Sample input would be:
Expected outcome would be something like
I want to create a bucket in order to know how many people were working in each hour of the day.
SELECT
Employee_id,
TIME(shift_start_at,timezone) AS shift_start,
TIME(shift_end_at,timezone) AS shift_end,
FROM
`employee_shifts` AS shifts
WHERE
DATE(shifts.shift_start_at_local) >= "2022-05-01"
GROUP BY
1,
2,
3
Assuming you are on MySQL version 8 or above: generate all the buckets, left join them to the shifts to fill in the hours within each start-end range, filter out those that are not applicable, then count. For example:
DROP TABLE IF EXISTS t;
create table t (id int, startts datetime, endts datetime);
insert into t values
(1,'2022-06-19 08:30:00','2022-06-19 10:00:00'),
(2,'2022-06-19 08:30:00','2022-06-19 08:45:00'),
(3,'2022-06-19 07:00:00','2022-06-19 07:59:00');
with cte as
(select 7 as bucket union select 8 union select 9 union select 10 union select 11),
cte1 as
(select bucket,t.*,
floor(hour(startts)) starthour, floor(hour(endts)) endhour
from cte
left join t on cte.bucket between floor(hour(startts)) and floor(hour(endts))
)
select bucket,count(id) nof from cte1 group by bucket
;
+--------+-----+
| bucket | nof |
+--------+-----+
| 7 | 1 |
| 8 | 2 |
| 9 | 1 |
| 10 | 1 |
| 11 | 0 |
+--------+-----+
5 rows in set (0.001 sec)
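The counting rule in that query (a shift counts towards every hour from the hour of its start through the hour of its end, inclusive) can be checked with a few lines of Python. This is only a sketch over the same three sample rows, and it assumes each shift starts and ends on the same day.

```python
from collections import Counter
from datetime import datetime

# Same sample rows as table t above: (id, startts, endts).
shifts = [
    (1, datetime(2022, 6, 19, 8, 30), datetime(2022, 6, 19, 10, 0)),
    (2, datetime(2022, 6, 19, 8, 30), datetime(2022, 6, 19, 8, 45)),
    (3, datetime(2022, 6, 19, 7, 0), datetime(2022, 6, 19, 7, 59)),
]
buckets = range(7, 12)  # the hours listed in the first CTE

counts = Counter()
for _, start, end in shifts:
    # Mirrors: bucket BETWEEN hour(startts) AND hour(endts)
    for hour in range(start.hour, end.hour + 1):
        counts[hour] += 1

# Buckets with no shift keep a count of 0, like the LEFT JOIN.
result = [(bucket, counts[bucket]) for bucket in buckets]
print(result)  # [(7, 1), (8, 2), (9, 1), (10, 1), (11, 0)]
```

Note that a shift ending exactly on the hour (10:00 here) still counts for that hour. If that is not wanted, the end bound would have to be made exclusive in both the SQL and this sketch.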
If you have a limited number of time buckets, maybe you can do it this way:
WITH CTE AS
(SELECT
    COUNTRY,
    MONTH,
    TIMESTAMP_DIFF(time_b, time_a, MINUTE) dt,
    METRIC_a,
    METRIC_b
FROM
    TABLE_NAME)
SELECT
    CASE
        WHEN dt BETWEEN 0 AND 10 THEN "0-10"
        WHEN dt BETWEEN 11 AND 20 THEN "11-20"
        WHEN dt BETWEEN 21 AND 30 THEN "21-30"
        WHEN dt BETWEEN 31 AND 40 THEN "31-40"
        WHEN dt > 40 THEN ">40"
    END AS time_bucket,
    AVG(METRIC_a),
    SUM(METRIC_b)
FROM CTE
GROUP BY time_bucket
Although, I should emphasize that this solution only works if you have a limited number of buckets. If you have a lot of buckets, you can create a base table with your buckets and then LEFT JOIN it to get your results.
Just use a subquery for each column mentioning the required timestamp in between, also make sure your start_time and end_time columns are timestamp types. For more information, please share the table structure, sample data, and expected output
If I understood well, this would be
SELECT HOUR, (SELECT COUNT(*)
FROM employee
WHERE start_time <= HOUR
AND end_time >= HOUR) AS working
FROM schedule HOUR
Where schedule is a table with employee schedules.

How to find datetimes where some conditions hold in MySQL?

We have a MySQL database containing bookings on different courts. Table properties (shortened):
CREATE TABLE `booking` (
`startDate` datetime NOT NULL,
`endDate` datetime NOT NULL,
`courtId` varchar(36),
FOREIGN KEY (`courtId`) REFERENCES `court` (`id`) ON DELETE CASCADE
)
Usually, bookings are paid, but under certain conditions (which I can check in the WHERE-part of a query), bookings can be free.
Given a court and booking duration, I want to query the next datetime at which the booking can be created for free. The conditions are not the problem, the problem is how to query not for entities but for datetime values.
How to realize this efficiently in MySQL?
EDIT: Maybe it helps to outline the conditions under which bookings are free:
The conditions under which bookings are free are dependent on how many courts are offered at the startDate by someone (courts are always offered except if there are special "not-offered"-bookings on that court) and how many other bookings overlapping the startDate are already free. This means bookings can be (and probably are) free even if there are no bookings at all in the database.
Solution
Finding available slot before the last booking :
Find the difference between each booking and its following one. If the difference is greater than the number of days of the new booking, you can use that slot.
Finding available slot after the last booking :
If there is no such slot, you can assign a day after the end date of the last booking.
If this query returns null, it means there is no booking for the court. You can handle that in the client side.
Code
SET @c := 1; # Court id
SET @n := 2; # Number of days
/*
Previous booking
*/
SET @i := 0;
CREATE TEMPORARY TABLE bp AS
SELECT @i := @i + 1 AS id, startDate, endDate FROM booking
WHERE courtId = @c
ORDER BY startDate;
/*
Next booking
*/
SET @i := -1;
CREATE TEMPORARY TABLE bn AS
SELECT @i := @i + 1 AS id, startDate, endDate FROM booking
WHERE courtId = @c
ORDER BY startDate;
/*
Finding available slot before the last booking (Intermediate slot).
*/
SELECT DATE_ADD(MIN(bp.endDate), INTERVAL 1 DAY) INTO @si FROM
bp
JOIN
bn
ON bn.id = bp.id
WHERE DATEDIFF(bn.startDate, bp.endDate) > @n;
/*
Finding available slot after the last booking
*/
SELECT DATE_ADD(MAX(endDate), INTERVAL 1 DAY) INTO @sa FROM bn;
SELECT IFNULL(@si, @sa);
Using the code
Just replace the values of the variables @c and @n.
An idea to solve this is to rephrase it as: for the given :court_id parameter, give me the smallest future end_time for which no other booking starts within the given :duration parameter.
This can be expressed in different ways in SQL.
With a not exists condition and a correlated subquery that ensures that no further booking on the same court starts within :duration minutes.
select min(b.end_date) next_possible_start_date
from bookings b
where
    b.court_id = :court_id
    and b.end_date > now()
    and not exists (
        select 1
        from bookings b1
        where
            b1.court_id = b.court_id
            and b1.start_date > b.end_date
            and b1.start_date < DATE_ADD(b.end_date, interval :duration minute)
    )
Note: if you have additional conditions, they must be repeated in the where clause of the query and of the subquery.
The same logic as not exists can be implemented as an anti-join with a left join:
select min(b.end_date) next_possible_start_date
from bookings b
left join bookings b1
    on b1.court_id = b.court_id
    and b1.start_date > b.end_date
    and b1.start_date < DATE_ADD(b.end_date, interval :duration minute)
where
    b.court_id = :court_id
    and b.end_date > now()
    and b1.court_id is null
In MySQL 8.0, it is also possible to use window functions: lead() retrieves the start_date of the next booking, which can then be compared with the end_date of the current booking.
select min(end_date) next_possible_start_date
from (
    select
        end_date,
        lead(start_date) over(partition by court_id order by start_date) next_start_date
    from bookings b
    where court_id = :court_id
) t
where
    next_start_date is null
    or next_start_date >= DATE_ADD(end_date, interval :duration minute)
Edit
Here is a new version of the query that addresses the use case where the court is immediately free at the time the search is performed:
select
court_id,
greatest(min(b.end_date), now()) next_possible_start_date
from bookings b
where
-- b.court_id = :court_id and
not exists (
select 1
from bookings b1
where
b1.court_id = b.court_id
and b1.start_date > b.end_date
and b1.start_date < date_add(greatest(b.end_date, now()), interval :duration minute)
)
group by court_id
Note: this searches for all available courts at once; you can uncomment the where clause to filter on a specific court.
Given this sample data:
court_id | start_date | end_date
-------: | :------------------ | :------------------
1 | 2019-10-29 13:00:00 | 2019-10-29 13:30:00
1 | 2019-10-29 14:00:00 | 2019-10-29 15:00:00
2 | 2019-10-29 23:14:05 | 2019-10-30 00:14:05
2 | 2019-10-30 01:14:05 | 2019-10-30 02:14:05
Court 1 is immediately free. Court 2 is booked for the next hour, then there is a 60-minute vacancy before the next booking.
If we run the query for a duration of 60 minutes, we get:
court_id | next_possible_start_date
-------: | :-----------------------
1 | 2019-10-29 23:14:05 -- available right now
2 | 2019-10-30 00:14:05 -- available in 1 hour
While for 90 minutes, we get:
court_id | next_possible_start_date
-------: | :-----------------------
1 | 2019-10-29 23:14:05 -- available right now
2 | 2019-10-30 02:14:05 -- available in 3 hours
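For anyone who wants to replay those two result sets locally, here is a small sketch that runs the last query against an in-memory SQLite database from Python. It is an adaptation and not the MySQL statement itself: now() is replaced by a fixed :now parameter set to the time shown in the sample output, greatest() becomes SQLite's two-argument max(), and date_add() becomes datetime() with a '+N minutes' modifier.

```python
import sqlite3

# Fixed stand-in for now(), taken from the sample output above.
NOW = "2019-10-29 23:14:05"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings (court_id INTEGER, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [
        (1, "2019-10-29 13:00:00", "2019-10-29 13:30:00"),
        (1, "2019-10-29 14:00:00", "2019-10-29 15:00:00"),
        (2, "2019-10-29 23:14:05", "2019-10-30 00:14:05"),
        (2, "2019-10-30 01:14:05", "2019-10-30 02:14:05"),
    ],
)

QUERY = """
SELECT b.court_id,
       max(min(b.end_date), :now) AS next_possible_start_date
FROM bookings b
WHERE NOT EXISTS (
    SELECT 1
    FROM bookings b1
    WHERE b1.court_id = b.court_id
      AND b1.start_date > b.end_date
      AND b1.start_date < datetime(max(b.end_date, :now),
                                   '+' || :duration || ' minutes')
)
GROUP BY b.court_id
ORDER BY b.court_id
"""


def next_free(duration_minutes):
    """Run the query for one duration; returns (court_id, start) rows."""
    params = {"now": NOW, "duration": duration_minutes}
    return conn.execute(QUERY, params).fetchall()


free_60 = next_free(60)
free_90 = next_free(90)
print(free_60)
print(free_90)
```

Timestamps are stored as ISO-formatted text here, which compares correctly as strings. That is a property of this sketch, not of the MySQL schema.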

MySQL get count of periods where date in row

I have a MySQL table, similar to this example:
c_id date value
66 2015-07-01 1
66 2015-07-02 777
66 2015-08-01 33
66 2015-08-20 200
66 2015-08-21 11
66 2015-09-14 202
66 2015-09-15 204
66 2015-09-16 23
66 2015-09-17 0
66 2015-09-18 231
What I need is the count of periods in which the dates are consecutive. I don't have a fixed start or end date; there can be any.
For example: 2015-07-01 - 2015-07-02 is one period, 2015-08-01 is a second period, 2015-08-20 - 2015-08-21 is a third period and 2015-09-14 - 2015-09-18 is a fourth period. So in this example there are four periods.
SELECT
SUM(value) as value_sum,
... as period_count
FROM my_table
WHERE cid = 66
I haven't been able to figure this out all day. Thanks.
I don't have enough reputation to comment to the above answer.
If all you need is the NUMBER of splits, then you can simply reword your question: "How many entries have a date D, such that the date D - 1 DAY does not have an entry?"
In which case, this is all you need:
SELECT
COUNT(*) as PeriodCount
FROM
`periods`
WHERE
DATE_ADD(`date`, INTERVAL - 1 DAY) NOT IN (SELECT `date` from `periods`);
In your PHP, just select the "PeriodCount" column from the first row.
You had me working on some crazy stored procedure approach until that clarification :P
I should get deservedly flamed for this, but anyway, consider the following...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(date DATE NOT NULL PRIMARY KEY
,value INT NOT NULL
);
INSERT INTO my_table VALUES
('2015-07-01',1),
('2015-07-02',777),
('2015-08-01',33),
('2015-08-20',200),
('2015-08-21',11),
('2015-09-14',202),
('2015-09-15',204),
('2015-09-16',23),
('2015-09-17',0),
('2015-09-18',231);
SELECT x.*
, SUM(y.value) total
FROM
( SELECT a.date start
, MIN(c.date) end
FROM my_table a
LEFT
JOIN my_table b
ON b.date = a.date - INTERVAL 1 DAY
LEFT
JOIN my_table c
ON c.date >= a.date
LEFT
JOIN my_table d
ON d.date = c.date + INTERVAL 1 DAY
WHERE b.date IS NULL
AND c.date IS NOT NULL
AND d.date IS NULL
GROUP
BY a.date
) x
JOIN my_table y
ON y.date BETWEEN x.start AND x.end
GROUP
BY x.start;
+------------+------------+-------+
| start | end | total |
+------------+------------+-------+
| 2015-07-01 | 2015-07-02 | 778 |
| 2015-08-01 | 2015-08-01 | 33 |
| 2015-08-20 | 2015-08-21 | 211 |
| 2015-09-14 | 2015-09-18 | 660 |
+------------+------------+-------+
4 rows in set (0.00 sec) -- <-- This is the number of periods
There is a simpler way of doing this:
SELECT min(date) start, max(date) end, sum(value) total FROM
    (SELECT @i:=@i+1 i,
            ROUND(Unix_timestamp(date)/(24*60*60))-@i diff,
            date, value
     FROM tbl, (SELECT @i:=0) n WHERE c_id=66 ORDER BY date) t
GROUP BY diff
This select groups over the same difference between sequential number and date value.
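Why that difference works as a grouping key may be easier to see outside SQL. In a run of consecutive dates, the day number and the row number both grow by one per row, so their difference stays constant; after a gap it jumps. A short Python sketch of the same idea over the sample rows, using toordinal() as the day number instead of Unix_timestamp():

```python
from datetime import date
from itertools import groupby

# Sample rows from the question for c_id 66: (date, value).
rows = [
    (date(2015, 7, 1), 1), (date(2015, 7, 2), 777),
    (date(2015, 8, 1), 33),
    (date(2015, 8, 20), 200), (date(2015, 8, 21), 11),
    (date(2015, 9, 14), 202), (date(2015, 9, 15), 204),
    (date(2015, 9, 16), 23), (date(2015, 9, 17), 0),
    (date(2015, 9, 18), 231),
]
rows.sort()  # the trick relies on date order, like ORDER BY date

periods = []
# Key = day number minus row position; constant within a consecutive run.
for _, group in groupby(
    enumerate(rows), key=lambda pair: pair[1][0].toordinal() - pair[0]
):
    members = [row for _, row in group]
    start, end = members[0][0], members[-1][0]
    periods.append((start, end, sum(value for _, value in members)))

for period in periods:
    print(period)
print(len(periods))  # 4
```

This assumes at most one row per date. Duplicate dates would break the constant difference and split a period.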
Edit
As Strawberry remarked quite rightly, there was a flaw in my approach when a period spans a month change or indeed a change into the next year. The unix_timestamp() function can cure this, though: it returns the seconds since 1970-01-01, so dividing this number by 24*60*60 gives the days since that date. The rest is simple ...
If you only need the count, as your last comment stated, you can do it even simpler:
SELECT count(distinct diff) period_count FROM
    (SELECT @i:=@i+1 i,
            ROUND(Unix_timestamp(date)/(24*60*60))-@i diff,
            date, value
     FROM tbl, (SELECT @i:=0) n WHERE c_id=66 ORDER BY date) t
Thanks. @cars10's solution worked in MySQL, but I could not manage to get the period count to echo in PHP; it returned 0. I got it working thanks to @jarkinstall. So my final select looks something like this:
SELECT
sum(coalesce(count_tmp,coalesce(count_reserved,0))) as sum
,(SELECT COUNT(*) FROM my_table WHERE cid='.$cid.' AND DATE_ADD(date, INTERVAL - 1 DAY) NOT IN (SELECT date from my_table WHERE cid='.$cid.' AND coalesce(count_tmp,coalesce(count_reserved,0))>0)) as periods
,count(*) as count
,(min(date)) as min_date
,(max(date)) as max_date
FROM my_table WHERE cid=66
AND coalesce(count_tmp,coalesce(count_reserved,0))>0
ORDER BY date;

MYSQL - Sum Interval Dates

I came across the following problem:
I would like to sum the hours for each name, giving the total time covered between the START and END of their activities.
It would be simple if I could just subtract the start from the end of each record, but the activities can overlap. For example, Mary started one activity at 13:00 that ran until 15:00, and started another that ran from 14:00 to 16:00; I would like her result to be 3 (she used 3 hours of her time to perform both activities).
e.g.:
Name | START | END |
-----------------------------------------------------------
KATE | 2014-01-01 13:00:00 | 2014-01-01 14:00:00 |
MARY | 2014-01-01 13:00:00 | 2014-01-01 15:00:00 |
TOM | 2014-01-01 13:00:00 | 2014-01-01 16:00:00 |
KATE | 2014-01-01 12:00:00 | 2014-01-02 04:00:00 |
MARY | 2014-01-01 14:00:00 | 2014-01-01 16:00:00 |
TOM | 2014-01-01 12:00:00 | 2014-01-01 18:00:00 |
TOM | 2014-01-01 22:00:00 | 2014-01-02 02:00:00 |
result:
KATE 15 hours
MARY 3 hours
TOM 9 hours
Have you tried a group by and then an aggregate function?
SELECT Name, SUM(UNIX_TIMESTAMP(End) - UNIX_TIMESTAMP(Start)) FROM myTable
GROUP BY Name
Which will return a cumulative total of seconds from the intervals you have. You can then change the seconds to hours for display.
Also I would highly recommend grouping by a primary key or something instead of a string name, but I understand that this may have been just to simplify the question.
I found this problem interesting, so spent a little more time to develop a solution. What I came up with involves sorting the rows by name and start time, then using MySQL variables to account for overlapping ranges. I begin by sorting the table and supplementing it with columns that carry the name and times from one row to the next
SELECT [expounded below]
FROM (SELECT * FROM tbl ORDER BY Name, START, END) AS u,
     (SELECT @x := 0, @gap := 0, @same_name := '',
             @beg := (SELECT MIN(START) FROM tbl),
             @end := (SELECT MAX(END) FROM tbl)) AS t
This adds the name and the outer bounds of the time range to each row of the table, as well as sorting the table so that names are together in order by starting time. For each row, we will now have @same_name, @beg, and @end carrying values forward from one line to the next, and @x and @gap will accumulate the hours.
Now we have to do some reasoning about the possible overlaps that can occur. For any two intervals, they are either disjoint or have an intersection:
Non-overlapping: beg--------end START-------END
Overlapping: beg-----------end beg---------end
START--------------END START-----------END
Subset: beg---------------------------------end
START-----END
Once the rows are adjacent, we can decide if two ranges overlap by comparing their start and end points. They overlap if the start of one is before the end of the other and vice versa:
IF( @end >= START && @beg <= END,
If they do overlap, then the total interval is the difference between the outer edges of the two intervals:
TIMESTAMPDIFF(HOUR, LEAST(@beg, START), GREATEST(@end, END))
If they don't overlap, then we can just add the new interval to the previous one.
We will also need to know the gap between intervals, which is the difference from the end of the first to the beginning of the second. This will be necessary to calculate the hours for a case of more than two intervals, where only some overlap.
1-----------2 3----------4
3--------------------5
Putting this together gets us a calculation per row, where each row calculates the union of the hours with the one
above it. For each variable, we have to reset it if the name changes:
SELECT Name, START, END,
    @x := IF(@same_name = Name,
             IF( @end >= START && @beg <= END,     -- does it overlap?
                 TIMESTAMPDIFF(HOUR, LEAST(@beg, START), GREATEST(@end, END)),
                 @x + TIMESTAMPDIFF(HOUR, START, END) ),
             TIMESTAMPDIFF(HOUR, START, END) ) AS hr,
    @gap := IF(@same_name = Name,
               IF(@end >= START && @beg <= END,    -- does it overlap?
                  @gap,
                  @gap + TIMESTAMPDIFF(HOUR, @end, START)),
               0) AS gap,
    @beg := IF(@same_name = Name,
               CAST(LEAST(@beg, START) AS DATETIME),   -- expand interval
               START) AS beg,                          -- reset interval
    @end := IF(@same_name = Name,
               CAST(GREATEST(@end, END) AS DATETIME),
               END) AS finish,
    @same_name := Name AS sameName
FROM
    (SELECT * FROM xt ORDER BY Name, START, END) AS u,
    (SELECT @x := 0, @gap := 0, @same_name := '', @beg := (SELECT MIN(START) FROM xt), @end := (SELECT MAX(END) FROM xt)) AS t
That still gives us as many rows as there were in the original table. The hours and gaps will accumulate for each name, so we have to select the highest values and group by Name:
SELECT Name, MAX(hr) - MAX(gap) AS HOURS
FROM ( [insert above query here] ) AS intermediateCalculcation
GROUP BY Name;
Edit
And of course, a moment after hitting enter, it occurs to me that (a) there is a bug for names that have no overlapping intervals at all; and (b) all @x is really doing is building up the interval from MIN(START) to MAX(END) for each name, which could be done with a simpler query and join. Um, exercise for the reader? :-)
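For comparison, the usual way to get the covered time outside SQL is to sort each name's intervals and merge the ones that overlap. The Python sketch below does that for the sample rows from the question. It is a different technique from the variable-based query above, not a translation of it, and the function name `busy_hours` is made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows from the question: (Name, START, END).
rows = [
    ("KATE", "2014-01-01 13:00:00", "2014-01-01 14:00:00"),
    ("MARY", "2014-01-01 13:00:00", "2014-01-01 15:00:00"),
    ("TOM", "2014-01-01 13:00:00", "2014-01-01 16:00:00"),
    ("KATE", "2014-01-01 12:00:00", "2014-01-02 04:00:00"),
    ("MARY", "2014-01-01 14:00:00", "2014-01-01 16:00:00"),
    ("TOM", "2014-01-01 12:00:00", "2014-01-01 18:00:00"),
    ("TOM", "2014-01-01 22:00:00", "2014-01-02 02:00:00"),
]


def busy_hours(rows):
    """Hours covered by the union of each name's intervals."""
    by_name = defaultdict(list)
    for name, start, end in rows:
        by_name[name].append(
            (datetime.strptime(start, FMT), datetime.strptime(end, FMT))
        )
    totals = {}
    for name, spans in by_name.items():
        spans.sort()
        total = timedelta()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:      # overlaps or touches: extend the span
                cur_end = max(cur_end, end)
            else:                     # gap: close the merged span
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
        totals[name] = total / timedelta(hours=1)
    return totals


totals = busy_hours(rows)
print(totals)  # {'KATE': 16.0, 'MARY': 3.0, 'TOM': 10.0}
```

On the sample rows this gives 16, 3 and 10 hours. Mary's 3 matches the question, while the 15 and 9 listed there for Kate and Tom do not seem to follow from the rows as shown.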

MySQL: change user variable for each selected row

I'm trying to select the first ten empty time slots between appointments in a MySQL database.
The appointment table has basically 3 fields: appointment_id INT, startDateTime DATETIME and endDateTime DATETIME.
We can imagine some data like this (for simplicity's sake, I've left the date part out of the datetime so let's consider these hours are in the same day). Also the data is ordered by startDateTime:
4 | 09:15:00 | 09:30:00
5 | 09:30:00 | 09:45:00
8 | 10:00:00 | 10:15:00
3 | 10:30:00 | 10:45:00
7 | 10:45:00 | 11:00:00
2 | 11:00:00 | 11:15:00
1 | 11:30:00 | 12:00:00
So my goal is to extract:
00:00:00 | 09:15:00
09:45:00 | 10:00:00
10:15:00 | 10:30:00
11:15:00 | 11:30:00
I ended up doing this:
SET @myStart = '2012-10-01 09:15:00';
SET @myEnd = NULL;
SET @prevEnd = NULL;
SELECT a.endDateTime, b.startDateTime, @myStart := a.endDateTime
FROM appointment a, appointment b, (
    SELECT @myEnd := min(c.startDateTime)
    FROM appointment c
    WHERE c.startDateTime >= @myStart
    ORDER BY startDateTime ASC
) as var ,
(SELECT @prevEnd := NULL) v
WHERE a.appointment_id = (
    SELECT appointment_id
    FROM (
        SELECT appointment_id, max(endDateTime), @prevEnd := endDateTime
        FROM appointment d
        WHERE (@prevEnd IS NULL OR @prevEnd = d.startDateTime)
        AND d.startDateTime >= @myEnd
    ) as z
)
AND b.startDateTime > a.endDateTime
ORDER BY b.startDateTime ASC LIMIT 0,10;
This doesn't return any result. I guess it's because of an incorrect initialization of my user-defined variables (I just discovered them and may be using them completely wrong).
If I run only the first subquery, whose goal is to initialize @myEnd at the first appointment after @myStart, I can see that it does in fact return 09:15:00.
The second subquery (SELECT @prevEnd := NULL) v is meant to set @prevEnd back to NULL each time a row is selected in the main query. I'm not quite sure it works like that...
The last subquery is meant, starting with a null @prevEnd and an initialized @myEnd, to select the appointment after which there is a gap. I could verify that it works too when separated from the rest of the query.
Do you have any advice on how I could fix the query, on how I could or should do it otherwise, or on whether it's even possible?
Thanks very much in advance.
Edit: I have edited it like this:
SELECT *
FROM (
SELECT COALESCE( s1.endDateTime, '0000-00-00 00:00:00' ) AS myStart, MIN( s2.startDateTime ) AS minSucc
FROM appointment s1
RIGHT JOIN appointment s2 ON s1.endDateTime < s2.startDateTime
AND s1.radiologyroom_id = s2.radiologyroom_id
WHERE s1.startDateTime >= '2012-10-01 00:00:00'
AND s1.radiologyroom_id =174
AND s1.endDateTime < '2013-01-01 00:00:00'
GROUP BY myStart
ORDER BY s1.startDateTime
)s
WHERE NOT
EXISTS (
SELECT NULL
FROM appointment
WHERE startDateTime >= myStart
AND endDateTime <= minSucc
AND radiologyroom_id =174
ORDER BY startDateTime
)
and it retrieves 369 rows in 14.6 seconds out of 6530 records.
If there are no gaps between ids, and id is always increasing, you could use this:
SELECT coalesce(s1.endDateTime, '0000-00-00 00:00:00'), s2.startDateTime
FROM
slots s1 right join slots s2
on s1.appointment_id=s2.appointment_id-1
WHERE coalesce(s1.endDateTime, '0000-00-00 00:00:00')<s2.startDateTime
LIMIT 10
EDIT: you can also try this:
SELECT * FROM
(SELECT
coalesce(s1.endDateTime, '0000-00-00 00:00:00') as start,
min(s2.startDateTime) minSucc
from slots s1 right join slots s2
on s1.endDateTime<s2.startDateTime
group by start) s
WHERE
not exists (select null
from slots
where startDateTime>=start
and endDateTime<=minSucc)
EDIT2: I admit that I am not very practiced with queries that use variables, but this looks like it could work:
select d1, d2 from (
    select
        @previous_end as d1,
        s.startDateTime as d2,
        @previous_end := s.endDateTime
    from (select startDateTime, endDateTime from slots order by startDateTime) s,
         (select @previous_end := '0000-00-00 00:00:00') t) s
where d1<d2
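The variable-based query is really a single pass over the appointments in start order, remembering where the previous one ended. Here is the same pass written out in Python as a sketch, using the sample times from the question. The function `free_gaps` is hypothetical, and unlike the SQL it keeps the later of the two end times, so an appointment nested inside a longer one would not produce a false gap.

```python
from datetime import time

# Sample data from the question: (appointment_id, start, end).
appointments = [
    (4, time(9, 15), time(9, 30)),
    (5, time(9, 30), time(9, 45)),
    (8, time(10, 0), time(10, 15)),
    (3, time(10, 30), time(10, 45)),
    (7, time(10, 45), time(11, 0)),
    (2, time(11, 0), time(11, 15)),
    (1, time(11, 30), time(12, 0)),
]


def free_gaps(appointments, day_start=time(0, 0), limit=10):
    """Return up to `limit` (gap_start, gap_end) pairs in start order."""
    gaps = []
    previous_end = day_start  # plays the role of @previous_end
    for _, start, end in sorted(appointments, key=lambda a: a[1]):
        if previous_end < start:  # same test as: where d1 < d2
            gaps.append((previous_end, start))
            if len(gaps) == limit:
                break
        previous_end = max(previous_end, end)
    return gaps


gaps = free_gaps(appointments)
for gap_start, gap_end in gaps:
    print(gap_start, gap_end)
```

With the sample data this prints the four gaps the question asks for: 00:00 to 09:15, 09:45 to 10:00, 10:15 to 10:30 and 11:15 to 11:30.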