SQL Count events with duration per hour - mysql

I have data of an event with duration (say, eating a meal at a restaurant) and I want to know for any given hour how many events were taking place. The data looks like this:
Event | Start Time | End Time
-----------------------------------------
1 | 12:03 | 14:20
2 | 12:30 | 12:50
3 | 13:05 | 14:45
4 | 14:01 | 14:49
I also have "Duration" available as an alternative to "End Time". The result I'm looking for would be like this:
Hour | Count
-----------------------
12 | 2
13 | 2
14 | 3
During hour 12, there were two events happening (1 & 2), hour 13 also had two events (1 & 3) and hour 14 had three events (1, 3, & 4).
I can do this programmatically with a loop, and I can count when the events start (or end) in SQL, but I'd really like to bridge the gap and do the whole thing in SQL. I can't think of a way.

One possible solution (works with MySQL v5.6+ and SQLite3):
create table hours(Hour int);
insert into hours values
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),
(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23);
create table log(Event int,StartTime varchar(5),EndTime varchar(5));
insert into log values
(1,'12:03','14:20'),
(2,'12:30','12:50'),
(3,'13:05','14:45'),
(4,'14:01','14:49');
-- ------------------------------------------------------------------------------
select Hour,count(Event) Count
from log join hours
on Hour between substr(StartTime,1,2) and substr(EndTime,1,2)
group by Hour;
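For anyone who wants to try the hours-table join without a MySQL server: since this answer says it also works on SQLite3, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module. The explicit casts and the cnt alias are my additions, not part of the answer above.

```python
import sqlite3

# In-memory SQLite database with the same two tables as the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table hours(Hour int);
    create table log(Event int, StartTime varchar(5), EndTime varchar(5));
    insert into log values
        (1,'12:03','14:20'),
        (2,'12:30','12:50'),
        (3,'13:05','14:45'),
        (4,'14:01','14:49');
""")
conn.executemany("insert into hours values (?)", [(h,) for h in range(24)])

# Each event joins to every hour between its start hour and end hour, inclusive.
# The casts are an assumption added here to make the string-to-int comparison explicit.
rows = conn.execute("""
    select Hour, count(Event) as cnt
    from log join hours
      on Hour between cast(substr(StartTime, 1, 2) as integer)
                  and cast(substr(EndTime, 1, 2) as integer)
    group by Hour
    order by Hour
""").fetchall()

print(rows)  # [(12, 2), (13, 2), (14, 3)]
```

Note that an event ending exactly on the hour (say 15:00) would still be counted in hour 15 with this condition.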

If you are running MySQL 8.0, you could use UNION ALL, window functions and aggregation, like so:
select hr, sum(sum(cnt)) over(order by hr) cnt
from (
select hour(start_time) hr, 1 cnt from mytable
union all select hour(end_time) + 1, -1 from mytable
) t
group by hr
Demo on DB Fiddle:
hr | cnt
-: | --:
12 | 2
13 | 2
14 | 3
15 | 0
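The idea behind that query is a running sum of +1/-1 markers: +1 at the hour an event starts, -1 at the hour after it ends. A rough Python sketch of the same logic, using the sample data from the question (the names here are only illustrative):

```python
from collections import Counter
from itertools import accumulate

events = [("12:03", "14:20"), ("12:30", "12:50"), ("13:05", "14:45"), ("14:01", "14:49")]

def hour_of(hhmm):
    """Extract the hour from an 'HH:MM' string."""
    return int(hhmm.split(":")[0])

# +1 at the start hour, -1 at the hour after the end hour (the UNION ALL part).
deltas = Counter()
for start, end in events:
    deltas[hour_of(start)] += 1
    deltas[hour_of(end) + 1] -= 1

# Running total over the hours in order (the windowed sum part).
hours = sorted(deltas)
result = list(zip(hours, accumulate(deltas[h] for h in hours)))

print(result)  # [(12, 2), (13, 2), (14, 3), (15, 0)]
```

One thing this makes visible: only hours in which some event starts, or which directly follow an event's end, get a row. An hour in the middle of a long event with nothing else going on would be missing from the output, so join to an hours table if you need every hour listed.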

If you do not have MySQL 8, create a table hour:
CREATE TABLE hour (
hr INT PRIMARY KEY
);
INSERT INTO hour(hr) VALUES
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),
(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23);
And then:
select h.hr, count(*) as cnt from hour h
join mytable m on h.hr between hour(m.Start_Time) and hour(m.End_Time)
group by hr
order by hr
;
See Db-Fiddle

Related

How to make time buckets with a start and end time column?

I have 3 columns, employee_id, start_time and end_time. I want to make buckets of 1 hour to show me how many employees were working in each hour. For example, employee A worked from 12 pm to 3 pm and employee B worked from 2 pm to 4 pm, so at 12 pm 1 employee was working, at 1 pm 1 employee, at 2 pm 2 employees, at 3 pm 2 employees and at 4 pm 1 employee. How can I do this in SQL? Let me show you a picture of the start and end time columns.
Sample input would be:
Expected outcome would be something like
I want to create a bucket in order to know how many people were working in each hour of the day.
SELECT
Employee_id,
TIME(shift_start_at,timezone) AS shift_start,
TIME(shift_end_at,timezone) AS shift_end,
FROM
`employee_shifts` AS shifts
WHERE
DATE(shifts.shift_start_at_local) >= "2022-05-01"
GROUP BY
1,
2,
3
Assuming you are on MySQL version 8 or above: generate all the buckets, left join to the shifts to fill in the hours within each start-end range, filter out those that are not applicable, then count, e.g.:
DROP TABLE IF EXISTS t;
create table t (id int, startts datetime, endts datetime);
insert into t values
(1,'2022-06-19 08:30:00','2022-06-19 10:00:00'),
(2,'2022-06-19 08:30:00','2022-06-19 08:45:00'),
(3,'2022-06-19 07:00:00','2022-06-19 07:59:00');
with cte as
(select 7 as bucket union select 8 union select 9 union select 10 union select 11),
cte1 as
(select bucket,t.*,
floor(hour(startts)) starthour, floor(hour(endts)) endhour
from cte
left join t on cte.bucket between floor(hour(startts)) and floor(hour(endts))
)
select bucket,count(id) nof from cte1 group by bucket
;
+--------+-----+
| bucket | nof |
+--------+-----+
| 7 | 1 |
| 8 | 2 |
| 9 | 1 |
| 10 | 1 |
| 11 | 0 |
+--------+-----+
5 rows in set (0.001 sec)
If you have a limited number of time buckets, you could do it this way:
WITH CTE AS
(SELECT
COUNTRY,
MONTH,
TIMESTAMP_DIFF(time_b, time_a, MINUTE) dt,
METRIC_a,
METRIC_b
FROM
TABLE_NAME)
SELECT
CASE
WHEN dt BETWEEN 0 AND 10 THEN "0-10"
WHEN dt BETWEEN 11 AND 20 THEN "11-20"
WHEN dt BETWEEN 21 AND 30 THEN "21-30"
WHEN dt BETWEEN 31 AND 40 THEN "31-40"
WHEN dt > 40 THEN ">40"
END as time_bucket,
AVG(METRIC_a),
SUM(METRIC_b)
FROM CTE
GROUP BY time_bucket
Although, I should emphasize that this solution works if you have a limited number of buckets. If you have a lot of buckets, you can create a base table with your buckets and then LEFT JOIN it to get your results.
Just use a subquery for each column mentioning the required timestamp in between, also make sure your start_time and end_time columns are timestamp types. For more information, please share the table structure, sample data, and expected output
If I understood well, this would be
SELECT HOUR, (SELECT COUNT(*)
FROM employee
WHERE start_time <= HOUR
AND end_time >= HOUR) AS working
FROM schedule HOUR
Where schedule is a table with employee schedules.

MySQL - How to check continuity of data

I have a database containing information about sick days of employees in the following structure ( example ):
date || login
2018-01-02 || TestLogin1
2018-01-03 || TestLogin2
2018-01-04 || TestLogin5
2018-01-05 || TestLogin1
2018-01-06 || TestLogin2
And I want to check whether someone had 23 Sick Days in a row within previous 60 days.
I know how to do this in PHP, using loops , but was wondering whether there is a possibility to create this app in raw MySQL.
This is the output I want to achieve:
login || NumberOfDaysOnSickLeaveWithinPrevious2Month
TestLogin4 || 32
TestLogin7 || 30
TestLogin12 || 20
TestLogin3 || 15
TestLogin1 || 10
Will be thankful for the support,
Thanks in advance,
Your sample data suggests that you just want aggregation:
select login,
count(*) as NumberOfDaysOnSickLeaveWithinPrevious2Month
from t
where date >= curdate() - interval 2 month
group by login;
That has nothing to do with "consecutive days". But your sample data doesn't even show two days in a row with the same login -- nor even any dates within the past two months.
It's a lot easier to develop this if you shrink the numbers, for example to 2 or more continuous days absent in the last 5 days.
drop table if exists t;
create table t(employee_id int, dt date);
insert into t values
(1,'2018-07-10'),(1,'2018-07-11'),(1,'2018-07-12'),
(2,'2018-07-10'),(2,'2018-07-15'),
(3,'2018-07-10'),(3,'2018-07-11'),(3,'2018-07-13'),(3,'2018-07-14')
;
select employee_id, bn, count(*)
from
(
select t.*, concat(employee_id,year(dt) * 10000 + month(dt) * 100 + day(dt))
- @p = 1 diff,
if(
concat(employee_id,year(dt) * 10000 + month(dt) * 100 + day(dt))
- @p = 1 ,@bn:=@bn,@bn:=@bn+1) bn,
@p:=concat(employee_id,year(dt) * 10000 + month(dt) * 100 + day(dt)) p
from t
cross join (select @bn:=0,@p:=0) b
where dt >= date_add(date(now()), interval -5 day)
order by employee_id,dt
) s
group by employee_id,bn having count(*) >= 2 ;
+-------------+------+----------+
| employee_id | bn | count(*) |
+-------------+------+----------+
| 1 | 1 | 3 |
| 3 | 4 | 2 |
| 3 | 5 | 2 |
+-------------+------+----------+
3 rows in set (0.06 sec)
Note the use of variables to work out a block number, and the having clause. Concatenating employee and date creates a pseudo key and simplifies the calculation.
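If the block-number trick is hard to follow, here is a rough Python sketch of the same "consecutive days" grouping on the sample rows above. It is a different formulation, not a translation of the query: within a run of consecutive days, the date minus its position in the sorted list stays constant, so that value can serve as the block key. The 5-day date filter from the query is left out here, and the function name is made up for illustration.

```python
from datetime import date, timedelta
from itertools import groupby

rows = [
    (1, date(2018, 7, 10)), (1, date(2018, 7, 11)), (1, date(2018, 7, 12)),
    (2, date(2018, 7, 10)), (2, date(2018, 7, 15)),
    (3, date(2018, 7, 10)), (3, date(2018, 7, 11)),
    (3, date(2018, 7, 13)), (3, date(2018, 7, 14)),
]

def consecutive_runs(rows, min_len=2):
    """Return (employee_id, first_day, run_length) for each run of at least min_len days."""
    runs = []
    ordered = sorted(set(rows))  # sort by employee, then date; drop duplicates
    for emp, group in groupby(ordered, key=lambda r: r[0]):
        days = [d for _, d in group]
        # date minus list position is constant inside a run of consecutive days
        for _, run in groupby(enumerate(days), key=lambda p: p[1] - timedelta(days=p[0])):
            run_days = [d for _, d in run]
            if len(run_days) >= min_len:
                runs.append((emp, run_days[0], len(run_days)))
    return runs

for emp, first_day, length in consecutive_runs(rows):
    print(emp, first_day, length)
# 1 2018-07-10 3
# 3 2018-07-10 2
# 3 2018-07-13 2
```

The run lengths (3, 2, 2) match the count(*) column in the result above.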

How to group total hours/day for start/finish data (multiple entries per day, some crossing midnight)?

I want to know how to get the exact hours between two dates in current date.
my table;
id | start | s_clock | finish | f_clock
---------------------------------------------------
1 | 2017-11-10 | 22:00 | 2017-11-11 | 03:00
2 | 2017-11-11 | 09:00 | 2017-11-11 | 10:00
Expected result:
day | total_hours
--------------------------
2017-11-10 | 02:00 -- sum of all hours spent on 2017-11-10
2017-11-11 | 04:00 -- sum of all hours spent on 2017-11-11
Thanks for your help.
You should avoid using keywords as table or field names.
start is one of them (a non-reserved keyword in MySQL, so it is legal as a name but easy to trip over).
create table times ( startt date, s_time time, finish date, f_time time);
insert into times
values
( "2017-11-10" , "22:00", "2017-11-11" , "03:00"),
( "2017-11-11" , "09:00", "2017-11-11" , "10:00");
select sec_to_time(sum(time_to_sec(delta_on_day))) as total_hours, on_day -- comment line to see ungrouped results
from -- comment line to see ungrouped results
( -- comment line to see ungrouped results
(
select *
, timediff(time("24:00"), s_time) as delta_on_day
, startt as on_day
from times
where startt < finish
)
UNION ALL
(
select *
, timediff(f_time, time("0:0")) as delta_on_day
, finish as on_day
from times
where startt < finish
)
UNION ALL
(
select *
, timediff(f_time, s_time) as delta_on_day
, startt as on_day
from times
where startt = finish
)
) as tbl -- comment line to see ungrouped results
group by on_day -- comment line to see ungrouped results
see fiddle: http://sqlfiddle.com/#!9/189e21f/38
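The query above amounts to cutting each entry at midnight and summing the pieces per day. A small Python sketch of that idea, using the two rows from the question with date and clock merged into one datetime (my simplification). Unlike the three-way UNION, the loop also handles entries that span more than two days.

```python
from collections import defaultdict
from datetime import datetime, timedelta

entries = [
    (datetime(2017, 11, 10, 22, 0), datetime(2017, 11, 11, 3, 0)),
    (datetime(2017, 11, 11, 9, 0), datetime(2017, 11, 11, 10, 0)),
]

def hours_per_day(entries):
    """Split each (start, finish) interval at midnight and total the pieces per day."""
    totals = defaultdict(timedelta)
    for start, finish in entries:
        cursor = start
        while cursor < finish:
            next_midnight = datetime.combine(
                cursor.date() + timedelta(days=1), datetime.min.time()
            )
            piece_end = min(finish, next_midnight)
            totals[cursor.date()] += piece_end - cursor
            cursor = piece_end
    return dict(totals)

for day, total in sorted(hours_per_day(entries).items()):
    print(day, total)
# 2017-11-10 2:00:00
# 2017-11-11 4:00:00
```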

Get largest values from multiple columns from latest timestamps in MySql

I'm trying to get a list of the *usedpc values across multiple similar columns, ordered descending to find the worst offenders. I also need to select only the values from the most recent timestamp for each sys_id.
Example data:
Sys_id | timestamp | disk0_usedpc | disk1_usedpc | disk2_usedpc
---
1 | 2016-05-06 15:24:10 | 75 | 45 | 35
1 | 2016-04-06 15:24:10 | 70 | 40 | 30
2 | 2016-05-06 15:24:10 | 23 | 28 | 32
3 | 2016-05-06 15:24:10 | 50 | 51 | 55
Desired result (assuming limit 2 for example):
1 | 2016-05-06 15:24:10 | disk0_usedpc | 75
3 | 2016-05-06 15:24:10 | disk2_usedpc | 55
I know I can get the max from each column using greatest, max and group timestamp to get only the latest values, but I can't figure out how to get the whole ordered list (not just max/greatest from each column, but the "5 highest values across all 3 disk columns").
EDIT: I set up a SQLFiddle page:
http://sqlfiddle.com/#!9/82202/1/0
EDIT2: I'm very sorry about the delay. I was able to get all three solutions to work, thank you. If @PetSerAl can put his solution in an answer, I'll mark it as accepted, as this solution allowed me to very smoothly customise further.
You can join the vm_disk table with a three-row table to create a separate row for each of your disks. Then, as you now have one row per disk, you can easily filter or sort them.
select
`sys_id`,
`timestamp`,
concat('disk', `disk`, '_usedpc') as `name`,
case `disk`
when 0 then `disk0_usedpc`
when 1 then `disk1_usedpc`
when 2 then `disk2_usedpc`
end as `usedpc`
from
`vm_disk` join
(
select 0 as `disk`
union all
select 1
union all
select 2
) as `t`
where
(`sys_id`, `timestamp`) in (
select
`sys_id`,
max(`timestamp`)
from `vm_disk`
group by `sys_id`
)
order by `usedpc` desc
limit 5
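To make the "one row per disk" idea concrete, here is a hedged Python sketch of the same three steps (keep the latest row per sys_id, unpivot the three disk columns, sort descending) on the example data from the question. The function name is made up for illustration.

```python
rows = [
    # (sys_id, timestamp, disk0_usedpc, disk1_usedpc, disk2_usedpc)
    (1, "2016-05-06 15:24:10", 75, 45, 35),
    (1, "2016-04-06 15:24:10", 70, 40, 30),
    (2, "2016-05-06 15:24:10", 23, 28, 32),
    (3, "2016-05-06 15:24:10", 50, 51, 55),
]

def worst_disks(rows, limit=5):
    # Step 1: keep only the most recent row per sys_id.
    # Timestamps in 'YYYY-MM-DD HH:MM:SS' form compare correctly as plain strings.
    latest = {}
    for row in rows:
        sys_id, ts = row[0], row[1]
        if sys_id not in latest or ts > latest[sys_id][1]:
            latest[sys_id] = row
    # Step 2: unpivot, giving one (sys_id, timestamp, name, usedpc) row per disk.
    unpivoted = [
        (sys_id, ts, f"disk{i}_usedpc", value)
        for sys_id, ts, *disks in latest.values()
        for i, value in enumerate(disks)
    ]
    # Step 3: highest usage first, then cut to the requested number of rows.
    unpivoted.sort(key=lambda r: r[3], reverse=True)
    return unpivoted[:limit]

for row in worst_disks(rows, limit=2):
    print(row)
# (1, '2016-05-06 15:24:10', 'disk0_usedpc', 75)
# (3, '2016-05-06 15:24:10', 'disk2_usedpc', 55)
```

With limit=2 this reproduces the desired result from the question.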
Maybe something like this would work... I know it may look pretty redundant but it could save overhead caused by doing multiple joins to the same table:
SELECT md.Sys_id,
md.timestamp,
CASE
WHEN
md.disk0_usedpc > md.disk1_usedpc
AND
md.disk0_usedpc > md.disk2_usedpc
THEN 'disk0_usedpc'
WHEN
md.disk1_usedpc > md.disk0_usedpc
AND
md.disk1_usedpc > md.disk2_usedpc
THEN 'disk1_usedpc'
ELSE 'disk2_usedpc'
END AS pcname,
CASE
WHEN
md.disk0_usedpc > md.disk1_usedpc
AND
md.disk0_usedpc > md.disk2_usedpc
THEN md.disk0_usedpc
WHEN
md.disk1_usedpc > md.disk0_usedpc
AND
md.disk1_usedpc > md.disk2_usedpc
THEN md.disk1_usedpc
ELSE md.disk2_usedpc
END AS pcusage
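FROM mydatabase md
-- keep only the most recent row per Sys_id
JOIN (
SELECT Sys_id, MAX(timestamp) AS latest_ts
FROM mydatabase
GROUP BY Sys_id
) latest ON latest.Sys_id = md.Sys_id AND latest.latest_ts = md.timestamp
ORDER BY pcusage DESC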
Try this:
select
t1.sys_id, t1.`timestamp`,
case locate(greatest(disk0_usedpc ,disk1_usedpc ,disk2_usedpc), concat_ws(',' ,disk0_usedpc ,disk1_usedpc ,disk2_usedpc))
when 1 then 'disk0_usedpc'
when 1 + length(concat(disk0_usedpc, ',')) then 'disk1_usedpc'
when 1 + length(concat(disk0_usedpc, ',', disk1_usedpc, ',')) then 'disk2_usedpc'
end as usedpc,
greatest(disk0_usedpc ,disk1_usedpc ,disk2_usedpc) as amount
from yourtable t1
join (
select max(`timestamp`) as `timestamp`, sys_id
from yourtable
group by sys_id
) t2 on t1.sys_id = t2.sys_id and t1.`timestamp` = t2.`timestamp`
order by t1.`timestamp` desc
-- limit 2
SQLFiddle Demo
How it works: the subquery fetches the latest row for each sys_id, which is one of several ways to do it. Next you need the largest of disk0_usedpc, disk1_usedpc and disk2_usedpc. As you wrote in your question, greatest is the function for that, so greatest(disk0_usedpc ,disk1_usedpc ,disk2_usedpc) as amount gives you the amount.
You also want that column's name. For this I used locate together with concat and concat_ws (which avoids repeating the separator, here a comma).
Let's take the row 1 | 2016-05-06 15:24:10 | 75 | 45 | 35 as an example:
concat_ws(',' ,disk0_usedpc ,disk1_usedpc ,disk2_usedpc) gives us "75,45,35". In this string, 75 starts at position 1, 45 at position 4 and 35 at position 7.
As you can see, locate(greatest(disk0_usedpc ,disk1_usedpc ,disk2_usedpc), concat_ws(',' ,disk0_usedpc ,disk1_usedpc ,disk2_usedpc)) returns 1, so the greatest value comes from disk0_usedpc.

Find big enough gaps in booking table

A rental system uses a booking table to store all bookings and reservations:
booking | item | startdate | enddate
1 | 42 | 2013-10-25 16:00 | 2013-10-27 12:00
2 | 42 | 2013-10-27 14:00 | 2013-10-28 18:00
3 | 42 | 2013-10-30 09:00 | 2013-11-01 09:00
…
Let’s say a user wants to rent item 42 from 2013-10-27 12:00 until 2013-10-28 12:00 which is a period of one day. The system will tell him, that the item is not available in the given time frame, since booking no. 2 collides.
Now I want to suggest the earliest rental date and time when the selected item is available again. Of course considering the user’s requested period (1 day) beginning with the user’s desired date and time.
So in the case above, I’m looking for an SQL query that returns 2013-10-28 18:00, since the earliest date since 2013-10-27 12:00 at which item 42 will be available for 1 day, is from 2013-10-28 18:00 until 2013-10-29 18:00.
So I need to find a gap between bookings that is big enough to hold the user’s reservation and that is as close as possible to the desired start date.
Or in other words: I need to find the first booking for a given item, after which there’s enough free time to place the user’s booking.
Is this possible in plain SQL without having to iterate over every booking and its successor?
If you can't redesign your database to use something more efficient, this will get the answer. You'll obviously want to parameterize it. It says find either the desired date, or the earliest end date where the hire interval doesn't overlap an existing booking:
Select
min(startdate)
From (
select
cast('2013-10-27 12:00' as datetime) startdate
from
dual
union all
select
enddate
from
booking
where
enddate > cast('2013-10-27 12:00' as datetime) and
item = 42
) b1
Where
not exists (
select
'x'
from
booking b2
where
item = 42 and
b1.startdate < b2.enddate and
b2.startdate < date_add(b1.startdate, interval 24 hour)
);
Example Fiddle
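In case the NOT EXISTS logic is easier to follow outside SQL, here is a minimal Python sketch of the same approach on the question's three bookings: the candidate start times are the desired date plus every later booking end, and the earliest candidate whose interval overlaps no booking wins. The function name and the tuple layout are assumptions for illustration.

```python
from datetime import datetime, timedelta

bookings = [  # (startdate, enddate) for item 42
    (datetime(2013, 10, 25, 16), datetime(2013, 10, 27, 12)),
    (datetime(2013, 10, 27, 14), datetime(2013, 10, 28, 18)),
    (datetime(2013, 10, 30, 9), datetime(2013, 11, 1, 9)),
]

def earliest_start(bookings, desired, duration):
    """Earliest start >= desired such that [start, start + duration) hits no booking."""
    # Candidates: the desired start itself, or the end of any later-ending booking.
    candidates = [desired] + [end for _, end in bookings if end > desired]
    for start in sorted(candidates):
        overlaps = any(
            start < b_end and b_start < start + duration
            for b_start, b_end in bookings
        )
        if not overlaps:
            return start
    return None

print(earliest_start(bookings, datetime(2013, 10, 27, 12), timedelta(hours=24)))
# 2013-10-28 18:00:00
```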
SELECT startfree,secondsfree FROM (
SELECT
@lastenddate AS startfree,
UNIX_TIMESTAMP(startdate)-UNIX_TIMESTAMP(@lastenddate) AS secondsfree,
@lastenddate:=enddate AS ignoreme
FROM
(SELECT startdate,enddate FROM bookings WHERE item=42) AS schedule,
(SELECT @lastenddate:=NOW()) AS init
ORDER BY startdate
) AS baseview
WHERE startfree>='2013-10-27 12:00:00'
AND secondsfree>=86400
ORDER BY startfree
LIMIT 1
;
Some explanation: The inner query uses a variable to move the iteration into SQL, the outer query finds the needed row.
That said, I would not do this in SQL if the DB structure is as given. You could reduce the iteration count by using some smart WHERE in the inner query to restrict it to a sane timespan, but chances are this won't perform well.
EDIT
A caveat: I did not check, but I assume, this won't work, if there are no prior reservations in the list - this should not be a problem, as in this case your first reservation attempt (original time) will work.
EDIT
SQLfiddle
Searching for overlapping date ranges generally yields poor performance in SQL. For that reason having a "Calendar" of available slots often makes things a lot more efficient.
For example, the booking 2013-10-25 16:00 => 2013-10-27 12:00 would actually be represented by 44 records, each one hour long.
The "gap" until the next booking at 2013-10-27 14:00 would then be represented by 2 records, each one hour long.
Then, each record could also have the duration (in time, or number of slots) until the next change.
slot_start_time | booking | item | remaining_duration
------------------+---------+------+--------------------
2013-10-27 10:00 | 1 | 42 | 2
2013-10-27 11:00 | 1 | 42 | 1
2013-10-27 12:00 | NULL | 42 | 2
2013-10-27 13:00 | NULL | 42 | 1
2013-10-27 14:00 | 2 | 42 | 28
2013-10-27 15:00 | 2 | 42 | 27
... | ... | ... | ...
2013-10-28 17:00 | 2 | 42 | 1
2013-10-28 18:00 | NULL | 42 | 39
2013-10-28 19:00 | NULL | 42 | 38
Then your query just becomes:
SELECT
*
FROM
slots
WHERE
slot_start_time >= '2013-10-27 12:00'
AND remaining_duration >= 24
AND booking IS NULL
ORDER BY
slot_start_time ASC
LIMIT
1
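How the slots table gets populated is not shown above, so here is one possible sketch in Python, under my own assumptions: hourly slots over a fixed horizon, with remaining_duration counted as the number of slots until the booking value changes. The helper names and the list layout are hypothetical.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)
bookings = [  # (booking, startdate, enddate) for item 42
    (1, datetime(2013, 10, 25, 16), datetime(2013, 10, 27, 12)),
    (2, datetime(2013, 10, 27, 14), datetime(2013, 10, 28, 18)),
    (3, datetime(2013, 10, 30, 9), datetime(2013, 11, 1, 9)),
]

def build_slots(bookings, first, last):
    """Return [slot_start_time, booking or None, remaining_duration] per hour in [first, last)."""
    slots = []
    t = first
    while t < last:
        owner = next((b for b, start, end in bookings if start <= t < end), None)
        slots.append([t, owner, 0])
        t += HOUR
    # Walk backwards: remaining_duration counts slots until the booking value changes.
    for i in range(len(slots) - 1, -1, -1):
        same_as_next = i + 1 < len(slots) and slots[i + 1][1] == slots[i][1]
        slots[i][2] = slots[i + 1][2] + 1 if same_as_next else 1
    return slots

def first_free(slots, start, hours):
    """Same filter as the query above: free slot at or after start with enough room."""
    return next(
        (s for s in slots if s[0] >= start and s[1] is None and s[2] >= hours),
        None,
    )

slots = build_slots(bookings, datetime(2013, 10, 25, 16), datetime(2013, 11, 1, 9))
print(first_free(slots, datetime(2013, 10, 27, 12), 24))
# [datetime.datetime(2013, 10, 28, 18, 0), None, 39]
```

The remaining_duration of 39 matches the 2013-10-28 18:00 row in the table above. Free slots at the very end of the horizon get a truncated count, so in practice the horizon would need to extend well past the last booking.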
OK this isn't pretty in MySQL. That's because we have to fake rownum values in subqueries.
The basic approach is to join the appropriate subset of the booking table to itself offset by one.
Here's the basic list of reservations for item 42, ordered by reservation time. We can't order by booking_id, because those aren't guaranteed to be in order of reservation time. (You're trying to insert a new reservation between two existing ones, eh?) http://sqlfiddle.com/#!2/62383/9/0
SELECT @aserial := @aserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @aserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
Here is that subset joined to itself. The trick is the a.rownum+1 = b.rownum, which joins each row to the one that comes right after it in the booking table subset. http://sqlfiddle.com/#!2/62383/8/0
SELECT a.booking_id, a.startdate asta, a.enddate aend,
b.startdate bsta, b.enddate bend
FROM (
SELECT @aserial := @aserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @aserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS a
JOIN (
SELECT @bserial := @bserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @bserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS b ON a.rownum+1 = b.rownum
Here it is again, showing each reservation (except the last one) and the number of hours following it. http://sqlfiddle.com/#!2/62383/15/0
SELECT a.booking_id, a.startdate, a.enddate,
TIMESTAMPDIFF(HOUR, a.enddate, b.startdate) gaphours
FROM (
SELECT @aserial := @aserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @aserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS a
JOIN (
SELECT @bserial := @bserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @bserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS b ON a.rownum+1 = b.rownum
So, if you're looking for the starting time and ending time of the earliest twelve-hour slot you can use that result set to do this: http://sqlfiddle.com/#!2/62383/18/0
SELECT MIN(enddate) startdate, MIN(enddate) + INTERVAL 12 HOUR as enddate
FROM (
SELECT a.booking_id, a.startdate, a.enddate,
TIMESTAMPDIFF(HOUR, a.enddate, b.startdate) gaphours
FROM (
SELECT @aserial := @aserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @aserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS a
JOIN (
SELECT @bserial := @bserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @bserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS b ON a.rownum+1 = b.rownum
) AS gaps
WHERE gaphours >= 12
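The rownum self-join is just "pair each booking with the one that follows it". For comparison, a short Python sketch of that pairing on the question's sample bookings for item 42 (function names are illustrative only):

```python
from datetime import datetime, timedelta

bookings = [  # (startdate, enddate) for item 42
    (datetime(2013, 10, 25, 16), datetime(2013, 10, 27, 12)),
    (datetime(2013, 10, 27, 14), datetime(2013, 10, 28, 18)),
    (datetime(2013, 10, 30, 9), datetime(2013, 11, 1, 9)),
]

def gaps(bookings):
    """(gap_start, gap_length) between each booking and the next one by start time."""
    ordered = sorted(bookings)
    # zip(ordered, ordered[1:]) plays the role of the a.rownum+1 = b.rownum join
    return [
        (a_end, b_start - a_end)
        for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
    ]

def earliest_gap(bookings, min_length):
    """Start of the first gap that is at least min_length long, or None."""
    return next((start for start, length in gaps(bookings) if length >= min_length), None)

print(gaps(bookings))
# [(datetime.datetime(2013, 10, 27, 12, 0), datetime.timedelta(seconds=7200)),
#  (datetime.datetime(2013, 10, 28, 18, 0), datetime.timedelta(days=1, seconds=54000))]
print(earliest_gap(bookings, timedelta(hours=12)))
# 2013-10-28 18:00:00
```

Like the query, this only looks at gaps between existing bookings, so the open time after the last booking is not considered.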
Here is a query that returns the needed date. The obvious precondition is that there are some bookings in the table, but judging from the question, you already do this check:
SELECT min(enddate)
FROM
(
select a.enddate from table4 as a
where
a.item=42
and
DATE_ADD(a.enddate, INTERVAL 1 day) <= ifnull(
(select min(b.startdate)
from table4 as b where b.startdate>=a.enddate and a.item=b.item),
a.enddate)
and
a.enddate>=now()
union all
select greatest(ifnull(max(enddate), now()),now()) from table4
) as q
You can change INTERVAL 1 day to INTERVAL ### hour.
If I have understood your requirements correctly, you could try self-JOINing book with itself, to get the "empty" spaces, and then fit. This is MySQL only (I believe it can be adapted to others - certainly PostgreSQL):
SELECT book.*, TIMESTAMPDIFF(MINUTE, book.enddate, book.best) AS width FROM
(
SELECT book.*, MIN(book1.startdate) AS best
FROM book
JOIN book AS book1 USING (item)
WHERE item = 42 AND book1.startdate >= book.enddate
GROUP BY book.booking
) AS book HAVING width > 110 ORDER BY startdate LIMIT 1;
In the above example, "110" is the looked-for minimum width in minutes.
Same thing, a bit less readable (for me), a SELECT removed (very fast SELECT, so little advantage):
SELECT book.*, MIN(book1.startdate) AS best
FROM book
JOIN book AS book1 ON (book.item = book1.item AND book.item = 42)
WHERE book1.startdate >= book.enddate
GROUP BY book.booking
HAVING TIMESTAMPDIFF(MINUTE, book.enddate, best) > 110
ORDER BY startdate LIMIT 1;
In your case, one day is 1440 minutes and
SELECT book.*, MIN(book1.startdate) AS best FROM book JOIN book AS book1 ON (book.item = book1.item AND book.item = 42) WHERE book1.startdate >= book.enddate GROUP BY book.booking HAVING TIMESTAMPDIFF(MINUTE, book.enddate, best) >= 1440 ORDER BY startdate LIMIT 1;
+---------+------+---------------------+---------------------+---------------------+
| booking | item | startdate | enddate | best |
+---------+------+---------------------+---------------------+---------------------+
| 2 | 42 | 2013-10-27 14:00:00 | 2013-10-28 18:00:00 | 2013-10-30 09:00:00 |
+---------+------+---------------------+---------------------+---------------------+
1 row in set (0.00 sec)
...the period returned is 2, i.e., at the end of booking 2, and until "best" which is booking 3, a period of at least 1440 minutes is available.
An issue could be that if no periods are available, the query returns nothing -- then you need another query to fetch the farthest enddate. You can do this with an UNION and LIMIT 1 of course, but I think it would be best to only run the 'recovery' query on demand, programmatically (i.e. if empty(query) then new_query...).
Also, in the inner WHERE you should add a check for NOW() to avoid dates in the past. If expired bookings are moved to inactive storage, this could be unnecessary.