How to make time buckets with a start and end time column? - mysql

I have 3 columns, employee_id, start_time and end_time I want to make bucks of 1 hour to show me how many employees were working in each hour. For example, employee A worked from 12 pm to 3 pm and employee B worked from 2 pm to 4 pm so, at 12 pm (1 employee was working) 1 pm (1 employee) 2 pm (2 employees were working) 3 pm (2 employees) and 4 pm (1 employee), how can I make this in SQL? Let me show you a picture of the start and end time columns.
Sample input would be:
Expected outcome would be something like
I want to create a bucket in order to know how many people were working in each hour of the day.
SELECT
Employee_id,
TIME(shift_start_at,timezone) AS shift_start,
TIME(shift_end_at,timezone) AS shift_end,
FROM
`employee_shifts` AS shifts
WHERE
DATE(shifts.shift_start_at_local) >= "2022-05-01"
GROUP BY
1,
2,
3

Assuming you are on mysql version 8 or above generate all the buckets , left join to shifts to infill times in start-endtime ranges , filter out those that are not applicable then count eg:-
DROP TABLE IF EXISTS t;
create table t (id int, startts datetime, endts datetime);
insert into t values
(1,'2022-06-19 08:30:00','2022-06-19 10:00:00'),
(2,'2022-06-19 08:30:00','2022-06-19 08:45:00'),
(3,'2022-06-19 07:00:00','2022-06-19 07:59:00');
with cte as
(select 7 as bucket union select 8 union select 9 union select 10 union select 11),
cte1 as
(select bucket,t.*,
floor(hour(startts)) starthour, floor(hour(endts)) endhour
from cte
left join t on cte.bucket between floor(hour(startts)) and floor(hour(endts))
)
select bucket,count(id) nof from cte1 group by bucket
;
+--------+-----+
| bucket | nof |
+--------+-----+
| 7 | 1 |
| 8 | 2 |
| 9 | 1 |
| 10 | 1 |
| 11 | 0 |
+--------+-----+
5 rows in set (0.001 sec)

If you have a limited number of time bucket maybe you can use it this way
WITH CTE AS
(SELECT
COUNTRY,
MONTH,
TIMESTAMP_DIFF(time_b, time_a, MINUTE) dt,
METRIC_a,
METRIC_b
FROM
TABLE_NAME)
SELECT
CASE
WHEN dt BETWEEN 0 AND 10 THEN "0-10"
WHEN dt BETWEEN 10 AND 20 THEN "11-20"
WHEN dt BETWEEN 20 AND 30 THEN "21-30"
WHEN dt BETWEEN 30 AND 40 THEN "31-40"
WHEN dt > 40 THEN ">40"
END as time_bucket,
AVG(METRIC_a),
SUM(METRIC_b)
FROM CTE
Althought, I should emphasize that this solution works if you have a limited bucket. If you have a lot of buckets, you can create a base table with your buckets then LEFT JOIN it to get your results.

Just use a subquery for each column mentioning the required timestamp in between, also make sure your start_time and end_time columns are timestamp types. For more information, please share the table structure, sample data, and expected output

If I understood well, this would be
SELECT HOUR, (SELECT COUNT(*)
FROM employee
WHERE start_time <= HOUR
AND end_time >= HOUR) AS working
FROM schedule HOUR
Where schedule is a table with employee schedules.

Related

MySQL query for records that existed at any point each week

I have a table with created_at and deleted_at timestamps. I need to know, for each week, how many records existed at any point that week:
week
records
2022-01
4
2022-02
5
...
...
Essentially, records that were created before the end of the week and deleted after the beginning of the week.
I've tried various variations of the following but it's under-reporting and I can't work out why:
SELECT
DATE_FORMAT(created_at, '%Y-%U') AS week,
COUNT(*)
FROM records
WHERE
deleted_at > DATE_SUB(deleted_at, INTERVAL (WEEKDAY(deleted_at)+1) DAY)
AND created_at < DATE_ADD(created_at, INTERVAL 7 - WEEKDAY(created_at) DAY)
GROUP BY week
ORDER BY week
Any help would be massively appreciated!
I would create a table wktable that looks like so (for the last 5 weeks of last year):
yrweek | wkstart | wkstart
-------+------------+------------
202249 | 2022-11-27 | 2022-12-03
202250 | 2022-12-04 | 2022-12-10
202251 | 2022-12-11 | 2022-12-17
202252 | 2022-12-18 | 2022-12-24
202253 | 2022-12-25 | 2022-12-31
To get there, find a way to create 365 consecutive integers, make all the dates of 2022 out of that, and group them by year-week.
This is an example:
CREATE TABLE wk AS
WITH units(units) AS (
SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION
SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
)
,tens AS(SELECT units * 10 AS tens FROM units )
,hundreds AS(SELECT tens * 10 AS hundreds FROM tens )
,
i(i) AS (
SELECT hundreds +tens +units
FROM units
CROSS JOIN tens
CROSS JOIN hundreds
)
,
dt(dt) AS (
SELECT
DATE_ADD(DATE '2022-01-01', INTERVAL i DAY)
FROM i
WHERE i < 365
)
SELECT
YEAR(dt)*100 + WEEK(dt) AS yrweek
, MIN(dt) AS wkstart
, MAX(dt) AS wkend
FROM dt
GROUP BY yrweek
ORDER BY yrweek;
With that table, go:
SELECT
yrweek
, COUNT(*) AS records
FROM wk
JOIN input_table ON wk.wkstart < input_table.deleted_at
AND wk.wkend > input_table.created_at
GROUP BY
yrweek
;
I first build a list with the records, their open count, and the closed count
SELECT
created_at,
deleted_at,
(SELECT COUNT(*)
from records r2
where r2.created_at <= r1.created_at ) as new,
(SELECT COUNT(*)
from records r2
where r2.deleted_at <= r1.created_at) as closed
FROM records r1
ORDER BY r1.created_at;
After that it's just adding a GROUP BY:
SELECT
date_format(created_at,'%Y-%U') as week,
MAX((SELECT COUNT(*)
from records r2
where r2.created_at <= r1.created_at )) as new,
MAX((SELECT COUNT(*)
from records r2
where r2.deleted_at <= r1.created_at)) as closed
FROM records r1
GROUP BY week
ORDER BY week;
see: DBFIDDLE
NOTE: Because I use random times, the results will change when re-run. A sample output is:
week
new
closed
2022-00
31
0
2022-01
298
64
2022-02
570
212
2022-03
800
421

SQL Count events with duration per hour

I have data of an event with duration (say, eating a meal at a restaurant) and I want to know for any given hour how many events were taking place. The data looks like this:
Event | Start Time | End Time
-----------------------------------------
1 | 12:03 | 14:20
2 | 12:30 | 12:50
3 | 13:05 | 14:45
4 | 14:01 | 14:49
I also have "Duration" available as an alternative to "End Time". The result I'm looking for would be like this:
Hour | Count
-----------------------
12 | 2
13 | 2
14 | 3
During hour 12, there were two events happening (1 & 2), hour 13 also had two events (1 & 3) and hour 14 had three events (1, 3, & 4).
I can do this programmatically with a loop. I can count when the events start (or end) in SQL. But I'd really like to bridge the gap and do this in SQL, but I can't think of a way.
One possible solution (works with MySQL v5.6+ and SQLite3):
create table hours(Hour int);
insert into hours values
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),
(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23);
create table log(Event int,StartTime varchar(5),EndTime varchar(5));
insert into log values
(1,'12:03','14:20'),
(2,'12:30','12:50'),
(3,'13:05','14:45'),
(4,'14:01','14:49');
-- ------------------------------------------------------------------------------
select Hour,count(Event) Count
from log join hours
on Hour between substr(StartTime,1,2) and substr(EndTime,1,2)
group by Hour;
If you are running MySQL 8.0, you could use UNION ALL, window functions and aggregation, like so:
select hr, sum(sum(cnt)) over(order by hr) cnt
from (
select hour(start_time) hr, 1 cnt from mytable
union all select hour(end_time) + 1, -1 from mytable
) t
group by hr
Demo on DB Fiddle:
hr | cnt
-: | --:
12 | 2
13 | 2
14 | 3
15 | 0
If you do not have MySql 8, then create a table hour:
CREATE TABLE hour (
hr INT PRIMARY KEY
);
INSERT INTO hour(hr) VALUES
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),
(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23);
And then:
select h.hr, count(*) as cnt from hour h
join mytable m on h.hr between hour(m.Start_Time) and hour(m.End_Time)
group by hr
order by hr
;
See Db-Fiddle

Given a table with time periods, query for a list of sum per day

Let's say I have a table that says how many items of something are valid between two dates.
Additionally, there may be multiple such periods.
For example, given a table:
itemtype | count | start | end
A | 10 | 2014-01-01 | 2014-01-10
A | 10 | 2014-01-05 | 2014-01-08
This means that there are 10 items of type A valid 2014-01-01 - 2014-01-10 and additionally, there are 10 valid 2014-01-05 - 2014-01-08.
So for example, the sum of valid items at 2014-01-06 are 20.
How can I query the table to get the sum per day? I would like a result such as
2014-01-01 10
2014-01-02 10
2014-01-03 10
2014-01-04 10
2014-01-05 20
2014-01-06 20
2014-01-07 20
2014-01-08 20
2014-01-09 10
2014-01-10 10
Can this be done with SQL? Either Oracle or MySQL would be fine
The basic syntax you are looking for is as follows:
For my example below I've defined a new table called DateTimePeriods which has a column for StartDate and EndDate both of which are DATE columns.
SELECT
SUM(NumericColumnName)
, DateTimePeriods.StartDate
, DateTimePeriods.EndDate
FROM
TableName
INNER JOIN DateTimePeriods ON TableName.dateColumnName BETWEEN DateTimePeriods.StartDate and DateTimePeriods.EndDate
GROUP BY
DateTimePeriods.StartDate
, DateTimePeriods.EndDate
Obviously the above code won't work on your database but should give you a reasonable place to start. You should look into GROUP BY and Aggregate Functions. I'm also not certain of how universal BETWEEN is for each database type, but you could do it using other comparisons such as <= and >=.
There are several ways to go about this. First, you need a list of dense dates to query. Using a row generator statement can provide that:
select date '2014-01-01' + level -1 d
from dual
connect by level <= 15;
Then for each date, select the sum of inventory:
with
sample_data as
(select 'A' itemtype, 10 item_count, date '2014-01-01' start_date, date '2014-01-10' end_date from dual union all
select 'A', 10, date '2014-01-05', date '2014-01-08' from dual),
periods as (select date '2014-01-01' + level -1 d from dual connect by level <= 15)
select
periods.d,
(select sum(item_count) from sample_data where periods.d between start_date and end_date) available
from periods
where periods.d = date '2014-01-06';
You would need to dynamically set the number of date rows to generate.
If you only needed a single row, then a query like this would work:
with
sample_data as
(select 'A' itemtype, 10 item_count, date '2014-01-01' start_date, date '2014-01-10' end_date from dual union all
select 'A', 10, date '2014-01-05', date '2014-01-08' from dual)
select sum(item_count)
from sample_data
where date '2014-01-06' between start_date and end_date;

Find big enough gaps in booking table

A rental system uses a booking table to store all bookings and reservations:
booking | item | startdate | enddate
1 | 42 | 2013-10-25 16:00 | 2013-10-27 12:00
2 | 42 | 2013-10-27 14:00 | 2013-10-28 18:00
3 | 42 | 2013-10-30 09:00 | 2013-11-01 09:00
…
Let’s say a user wants to rent item 42 from 2013-10-27 12:00 until 2013-10-28 12:00 which is a period of one day. The system will tell him, that the item is not available in the given time frame, since booking no. 2 collides.
Now I want to suggest the earliest rental date and time when the selected item is available again. Of course considering the user’s requested period (1 day) beginning with the user’s desired date and time.
So in the case above, I’m looking for an SQL query that returns 2013-10-28 18:00, since the earliest date since 2013-10-27 12:00 at which item 42 will be available for 1 day, is from 2013-10-28 18:00 until 2013-10-29 18:00.
So I need to to find a gap between bookings, that is big enough to hold the user’s reservation and that is as close a possible to the desired start date.
Or in other words: I need to find the first booking for a given item, after which there’s enough free time to place the user’s booking.
Is this possible in plain SQL without having to iterate over every booking and its successor?
If you can't redesign your database to use something more efficient, this will get the answer. You'll obviously want to parameterize it. It says find either the desired date, or the earliest end date where the hire interval doesn't overlap an existing booking:
Select
min(startdate)
From (
select
cast('2013-10-27 12:00' as datetime) startdate
from
dual
union all
select
enddate
from
booking
where
enddate > cast('2013-10-27 12:00' as datetime) and
item = 42
) b1
Where
not exists (
select
'x'
from
booking b2
where
item = 42 and
b1.startdate < b2.enddate and
b2.startdate < date_add(b1.startdate, interval 24 hour)
);
Example Fiddle
SELECT startfree,secondsfree FROM (
SELECT
#lastenddate AS startfree,
UNIX_TIMESTAMP(startdate)-UNIX_TIMESTAMP(#lastenddate) AS secondsfree,
#lastenddate:=enddate AS ignoreme
FROM
(SELECT startdate,enddate FROM bookings WHERE item=42) AS schedule,
(SELECT #lastenddate:=NOW()) AS init
ORDER BY startdate
) AS baseview
WHERE startfree>='2013-10-27 12:00:00'
AND secondsfree>=86400
ORDER BY startfree
LIMIT 1
;
Some explanation: The inner query uses a variable to move the iteration into SQL, the outer query finds the needed row.
That said, I would not do this in SQL, if the DB structure is like the given. You could reduce the iteration count by using some smort WHERE in the inner query to a sane timespan, but chances are, this won't perform well.
EDIT
A caveat: I did not check, but I assume, this won't work, if there are no prior reservations in the list - this should not be a problem, as in this case your first reservation attempt (original time) will work.
EDIT
SQLfiddle
Searching for overlapping date ranges generally yields poor performance in SQL. For that reason having a "Calendar" of available slots often makes things a lot more efficient.
For example, the booking 2013-10-25 16:00 => 2013-10-27 12:00 would actually be represented by 44 records, each one hour long.
The "gap" until the next booking at 2013-10-27 14:00 would then be represented by 2 records, each one hours long.
Then, each record could also have the duration (in time, or number of slots) until the next change.
slot_start_time | booking | item | remaining_duration
------------------+---------+------+--------------------
2013-10-27 10:00 | 1 | 42 | 2
2013-10-27 11:00 | 1 | 42 | 1
2013-10-27 12:00 | NULL | 42 | 2
2013-10-27 13:00 | NULL | 42 | 1
2013-10-27 14:00 | 2 | 42 | 28
2013-10-27 15:00 | 2 | 42 | 27
... | ... | ... | ...
2013-10-28 17:00 | 2 | 42 | 1
2013-10-28 18:00 | NULL | 42 | 39
2013-10-28 19:00 | NULL | 42 | 38
Then your query just becomes:
SELECT
*
FROM
slots
WHERE
slot_start_time >= '2013-10-27 12:00'
AND remaining_duration >= 24
AND booking IS NULL
ORDER BY
slot_start_time ASC
LIMIT
1
OK this isn't pretty in MySQL. That's because we have to fake rownum values in subqueries.
The basic approach is to join the appropriate subset of the booking table to itself offset by one.
Here's the basic list of reservations for item 42, ordered by reservation time. We can't order by booking_id, because those aren't guaranteed to be in order of reservation time. (You're trying to insert a new reservation between two existing ones, eh?) http://sqlfiddle.com/#!2/62383/9/0
SELECT #aserial := #aserial+1 AS rownum,
booking.*
FROM booking,
(SELECT #aserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
Here is that subset joined to itself. The trick is the a.rownum+1 = b.rownum, which joins each row to the one that comes right after it in the booking table subset. http://sqlfiddle.com/#!2/62383/8/0
SELECT a.booking_id, a.startdate asta, a.enddate aend,
b.startdate bsta, b.enddate bend
FROM (
SELECT #aserial := #aserial+1 AS rownum,
booking.*
FROM booking,
(SELECT #aserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS a
JOIN (
SELECT #bserial := #bserial+1 AS rownum,
booking.*
FROM booking,
(SELECT #bserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS b ON a.rownum+1 = b.rownum
Here it is again, showing each reservation (except the last one) and the number of hours following it. http://sqlfiddle.com/#!2/62383/15/0
SELECT a.booking_id, a.startdate, a.enddate,
TIMESTAMPDIFF(HOUR, a.enddate, b.startdate) gaphours
FROM (
SELECT #aserial := #aserial+1 AS rownum,
booking.*
FROM booking,
(SELECT #aserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS a
JOIN (
SELECT #bserial := #bserial+1 AS rownum,
booking.*
FROM booking,
(SELECT #bserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS b ON a.rownum+1 = b.rownum
So, if you're looking for the starting time and ending time of the earliest twelve-hour slot you can use that result set to do this: http://sqlfiddle.com/#!2/62383/18/0
SELECT MIN(enddate) startdate, MIN(enddate) + INTERVAL 12 HOUR as enddate
FROM (
SELECT a.booking_id, a.startdate, a.enddate,
TIMESTAMPDIFF(HOUR, a.enddate, b.startdate) gaphours
FROM (
SELECT #aserial := #aserial+1 AS rownum,
booking.*
FROM booking,
(SELECT #aserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS a
JOIN (
SELECT #bserial := #bserial+1 AS rownum,
booking.*
FROM booking,
(SELECT #bserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS b ON a.rownum+1 = b.rownum
) AS gaps
WHERE gaphours >= 12
here is the query, it will return needed date, obvious condition - there should be some bookings in table, but as I see from question - you do this check:
SELECT min(enddate)
FROM
(
select a.enddate from table4 as a
where
a.item=42
and
DATE_ADD(a.enddate, INTERVAL 1 day) <= ifnull(
(select min(b.startdate)
from table4 as b where b.startdate>=a.enddate and a.item=b.item),
a.enddate)
and
a.enddate>=now()
union all
select greatest(ifnull(max(enddate), now()),now()) from table4
) as q
you change change INTERVAL 1 day to INTERVAL ### hour
If I have understood your requirements correctly, you could try self-JOINing book with itself, to get the "empty" spaces, and then fit. This is MySQL only (I believe it can be adapted to others - certainly PostgreSQL):
SELECT book.*, TIMESTAMPDIFF(MINUTE, book.enddate, book.best) AS width FROM
(
SELECT book.*, MIN(book1.startdate) AS best
FROM book
JOIN book AS book1 USING (item)
WHERE item = 42 AND book1.startdate >= book.enddate
GROUP BY book.booking
) AS book HAVING width > 110 ORDER BY startdate LIMIT 1;
In the above example, "110" is the looked-for minimum width in minutes.
Same thing, a bit less readable (for me), a SELECT removed (very fast SELECT, so little advantage):
SELECT book.*, MIN(book1.startdate) AS best
FROM book
JOIN book AS book1 ON (book.item = book1.item AND book.item = 42)
WHERE book1.startdate >= book.enddate
GROUP BY book.booking
HAVING TIMESTAMPDIFF(MINUTE, book.enddate, best) > 110
ORDER BY startdate LIMIT 1;
In your case, one day is 1440 minutes and
SELECT book.*, MIN(book1.startdate) AS best FROM book JOIN book AS book1 ON (book.item = book1.item AND book.item = 42) WHERE book1.startdate >= book.enddate GROUP BY book.booking HAVING TIMESTAMPDIFF(MINUTE, book.enddate, best) >= 1440 ORDER BY startdate LIMIT 1;
+---------+------+---------------------+---------------------+---------------------+
| booking | item | startdate | enddate | best |
+---------+------+---------------------+---------------------+---------------------+
| 2 | 42 | 2013-10-27 14:00:00 | 2013-10-28 18:00:00 | 2013-10-30 09:00:00 |
+---------+------+---------------------+---------------------+---------------------+
1 row in set (0.00 sec)
...the period returned is 2, i.e., at the end of booking 2, and until "best" which is booking 3, a period of at least 1440 minutes is available.
An issue could be that if no periods are available, the query returns nothing -- then you need another query to fetch the farthest enddate. You can do this with an UNION and LIMIT 1 of course, but I think it would be best to only run the 'recovery' query on demand, programmatically (i.e. if empty(query) then new_query...).
Also, in the inner WHERE you should add a check for NOW() to avoid dates in the past. If expired bookings are moved to inactive storage, this could be unnecessary.

Compute outstanding amounts in MySQL

I am having an issue with a SELECT command in MySQL. I have a database of securities exchanged daily with maturity from 1 to 1000 days (>1 mio rows). I would like to get the outstanding amount per day (and possibly per category). To give an example, suppose this is my initial dataset:
DATE VALUE MATURITY
1 10 3
1 15 2
2 10 1
3 5 1
I would like to get the following output
DATE OUTSTANDING_AMOUNT
1 25
2 35
3 15
Outstanding amount is calculated as the total of securities exchanged still 'alive'. That means, in day 2 there is a new exchange for 10 and two old exchanges (10 and 15) still outstanding as their maturity is longer than one day, for a total outstanding amount of 35 on day 2. In day 3 instead there is a new exchange for 5 and an old exchange from day 1 of 10. That is, 15 of outstanding amount.
Here's a more visual explanation:
Monday Tuesday Wednesday
10 10 10 (Day 1, Value 10, matures in 3 days)
15 15 (Day 1, 15, 2 days)
10 (Day 2, 10, 1 day)
5 (Day 3, 5, 3 days with remainder not shown)
-------------------------------------
25 35 15 (Outstanding amount on each day)
Is there a simple way to get this result?
First of all in the main subquery we find SUM of all Values for current date. Then add to them values from previous dates according their MATURITY (the second subquery).
SQLFiddle demo
select T1.Date,T1.SumValue+
IFNULL((select SUM(VALUE)
from T
where
T1.Date between
T.Date+1 and T.Date+Maturity-1 )
,0)
FROM
(
select Date,
sum(Value) as SumValue
from T
group by Date
) T1
order by DATE
I'm not sure if this is what you are looking for, perhaps if you give more detail
select
DATE
,sum(VALUE) as OUTSTANDING_AMOUNT
from
NameOfYourTable
group by
DATE
Order by
DATE
I hope this helps
Each date considers each row for inclusion in the summation of value
SELECT d.DATE, SUM(m.VALUE) AS OUTSTANDING_AMOUNT
FROM yourTable AS d JOIN yourtable AS m ON d.DATE >= m.MATURITY
GROUP BY d.DATE
ORDER BY d.DATE
A possible solution with a tally (numbers) table
SELECT date, SUM(value) outstanding_amount
FROM
(
SELECT date + maturity - n.n date, value, maturity
FROM table1 t JOIN
(
SELECT 1 n UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
) n ON n.n <= maturity
) q
GROUP BY date
Output:
| DATE | OUTSTANDING_AMOUNT |
-----------------------------
| 1 | 25 |
| 2 | 35 |
| 3 | 15 |
Here is SQLFiddle demo