How to count concurrent bookings per minute within a time interval in SQL? - mysql

If I have a start and stop time for each booking, how can I calculate the number of bookings in progress during each minute? A simplified version of my database table looks like this:
Start time | End time | booking |
--------------------------------------------------
2020-09-01 10:00 | 2020-09-01 10:10 | Booking 1 |
2020-09-01 10:00 | 2020-09-01 10:05 | Booking 2 |
2020-09-01 10:05 | 2020-09-01 10:10 | Booking 3 |
2020-09-01 10:09 | 2020-09-01 10:10 | Booking 4 |
I want the counts within a given time interval, such as 10:02 - 10:09. The result should look something like this:
Desired result
Time | count
-----------
10:02 | 2 |
10:03 | 2 |
10:04 | 2 |
10:05 | 3 |
10:06 | 2 |
10:07 | 2 |
10:08 | 2 |
10:09 | 3 |
Question
How can this be achieved? Today I export the data to Python, but I think it should be possible to do this directly in SQL.

You can use a recursive CTE directly on your data:
with recursive cte as (
select start_time, end_time
from t
union all
select start_time + interval 1 minute, end_time
from cte
where start_time < end_time
)
select start_time, count(*)
from cte
group by start_time
order by start_time;
Here is a db<>fiddle.
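For readers who want to check the logic outside the database, the expansion performed by the recursive CTE can be sketched in Python. This is only an illustration under assumptions: the `bookings` list hard-codes the sample rows from the question, and `bookings_per_minute` with its `window_start` and `window_end` parameters is a hypothetical helper, not part of the answer above.

```python
from collections import Counter
from datetime import datetime, timedelta

# Sample bookings from the question as (start, end); both ends are inclusive.
bookings = [
    (datetime(2020, 9, 1, 10, 0), datetime(2020, 9, 1, 10, 10)),
    (datetime(2020, 9, 1, 10, 0), datetime(2020, 9, 1, 10, 5)),
    (datetime(2020, 9, 1, 10, 5), datetime(2020, 9, 1, 10, 10)),
    (datetime(2020, 9, 1, 10, 9), datetime(2020, 9, 1, 10, 10)),
]


def bookings_per_minute(rows, window_start, window_end):
    """Expand every booking into one entry per minute, then count per minute."""
    counts = Counter()
    for start, end in rows:
        minute = start
        while minute <= end:  # produces start, start+1min, ..., end
            counts[minute] += 1
            minute += timedelta(minutes=1)
    # Keep only the requested window (hypothetical parameters, inclusive).
    return {m: c for m, c in sorted(counts.items())
            if window_start <= m <= window_end}


result = bookings_per_minute(
    bookings, datetime(2020, 9, 1, 10, 2), datetime(2020, 9, 1, 10, 9)
)
for minute, count in result.items():
    print(minute.strftime("%H:%M"), count)
```

With the sample rows this reproduces the counts in the desired result (2, 2, 2, 3, 2, 2, 2, 3). Note that the SQL above has no filter on the time window; the sketch adds one to match the question.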
EDIT:
In earlier versions of MySQL, it helps to have a tally table. You can create one on the fly, using something like:
(select @rn := @rn + 1 as n
from t cross join
(select @rn := 0) params
) tally
You need enough numbers for your maximum span, but then you can do:
select t.start_time + interval tally.n minute, count(*)
from t join
(select @rn := @rn + 1 as n
from t cross join
(select @rn := -1) params -- so it starts from 0
limit 100
) tally
on t.start_time + interval tally.n minute <= t.end_time
group by t.start_time + interval tally.n minute;

You can use a recursive query to generate the timestamp range, then unpivot the table and join:
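-- one day has 1440 minutes, which exceeds the default recursion limit of 1000
set session cte_max_recursion_depth = 1500;
with recursive dates (ts) as (
-- cast so the column is a datetime rather than a 10-character string
select cast('2020-09-01' as datetime)
union all
select ts + interval 1 minute
from dates
where ts + interval 1 minute < '2020-09-02'
)
-- sum of +1 (start) and -1 (end) events up to each minute; end times are exclusive
select d.ts, coalesce(sum(t.cnt), 0) cnt
from dates d
left join (
select start_time ts, 1 cnt from mytable
union all select end_time, -1 from mytable
) t on t.ts <= d.ts
group by d.ts
order by d.ts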
If you are going to run this repeatedly and/or against large time periods, you would be better off materializing the timestamps in a calendar table rather than using a recursive query. The calendar table has one row per minute over a large period of dates. Assuming a table called date_calendar, you would do:
select d.ts, coalesce(sum(t.cnt), 0) cnt
from date_calendar d
left join (
select start_time ts, 1 cnt from mytable
union all select end_time, -1 from mytable
) t on t.ts <= d.ts
where d.ts >= '2020-09-01' and d.ts < '2020-09-02'
group by d.ts
order by d.ts
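The idea of turning each booking into a +1 event at its start and a -1 event at its end can be sketched in Python as a running sum. This is a minimal illustration, assuming the sample rows from the question; `running_counts` and its parameters are hypothetical names chosen for this example.

```python
from datetime import datetime, timedelta

# Sample bookings from the question as (start, end).
bookings = [
    (datetime(2020, 9, 1, 10, 0), datetime(2020, 9, 1, 10, 10)),
    (datetime(2020, 9, 1, 10, 0), datetime(2020, 9, 1, 10, 5)),
    (datetime(2020, 9, 1, 10, 5), datetime(2020, 9, 1, 10, 10)),
    (datetime(2020, 9, 1, 10, 9), datetime(2020, 9, 1, 10, 10)),
]


def running_counts(rows, range_start, range_end):
    """Running sum of +1 (start) and -1 (end) events, sampled once per minute."""
    events = sorted([(s, 1) for s, _ in rows] + [(e, -1) for _, e in rows])
    out, active, i = {}, 0, 0
    minute = range_start
    while minute < range_end:
        # Apply every event that happened at or before this minute.
        while i < len(events) and events[i][0] <= minute:
            active += events[i][1]
            i += 1
        out[minute] = active
        minute += timedelta(minutes=1)
    return out


counts = running_counts(
    bookings, datetime(2020, 9, 1, 10, 0), datetime(2020, 9, 1, 10, 11)
)
for minute, count in counts.items():
    print(minute.strftime("%H:%M"), count)
```

Because the -1 is applied at the end time itself, a booking no longer counts in the minute in which it ends. The sketch therefore reports 2 at 10:05, where the question's desired result (which treats end times as inclusive) shows 3.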

Related

count by person by month between days in mysql

I have a table of absences with 3 columns: id, begin_dt, end_dt. I need to count how many ids have at least one day of absence in each month. For example, given these rows:
id begin_dt end_dt
1 01/01/2020 02/02/2020
2 02/02/2020 02/02/2020
my result has to be
month count
01-2020 1
02-2020 2
I thought of grouping by DATE_FORMAT(SYSDATE(), '%Y-%m'), but I don't know how to account for the whole period from begin_dt to end_dt.
You can find a working table definition for this example here: https://www.db-fiddle.com/f/rYBsxQzTjjQ9nGBEmeAX6W/0
Schema (MySQL v5.7)
CREATE TABLE absence (
`id` VARCHAR(6),
`begin_dt` DATETIME,
`end_dt` DATETIME
);
INSERT INTO absence
(`id`, `begin_dt`, `end_dt`)
VALUES
('1', DATE('2019-01-01'), DATE('2019-02-02')),
('2', DATE('2019-02-02'), DATE('2019-02-02'));
Query #1
select * from absence;
| id | begin_dt | end_dt |
| --- | ------------------- | ------------------- |
| 1 | 2019-01-01 00:00:00 | 2019-02-02 00:00:00 |
| 2 | 2019-02-02 00:00:00 | 2019-02-02 00:00:00 |
SELECT DATE_FORMAT(startofmonth, '%Y-%m-01') year_and_month,
COUNT(*) absent_person_count
FROM absence
JOIN ( SELECT DATE_FORMAT(dt + INTERVAL n MONTH, '%Y-%m-01') startofmonth,
DATE_FORMAT(dt + INTERVAL n MONTH, '%Y-%m-01') + INTERVAL 1 MONTH - INTERVAL 1 DAY endofmonth
FROM ( SELECT MIN(begin_dt) dt
FROM absence ) startdate,
( SELECT 0 n UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 ) numbers,
( SELECT DATE_FORMAT(MIN(begin_dt), '%Y-%m') mindate,
DATE_FORMAT(MAX(end_dt), '%Y-%m') maxdate
FROM absence ) datesrange
WHERE DATE_FORMAT(dt + INTERVAL n MONTH, '%Y-%m') BETWEEN mindate AND maxdate ) dateslist
ON begin_dt <= endofmonth
AND end_dt >= startofmonth
GROUP BY year_and_month;
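The overlap condition used in the join (the absence begins before the month ends and ends on or after the month starts) can be sketched in Python. This is an illustration only, assuming the two sample rows from the question's schema; `month_starts` and `absent_per_month` are hypothetical helpers.

```python
from datetime import date

# Sample rows from the question's schema: (id, begin_dt, end_dt).
absences = [
    ("1", date(2019, 1, 1), date(2019, 2, 2)),
    ("2", date(2019, 2, 2), date(2019, 2, 2)),
]


def month_starts(first, last):
    """Yield the first day of every month from first's month to last's month."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def absent_per_month(rows):
    first = min(begin for _, begin, _ in rows)
    last = max(end for _, _, end in rows)
    result = {}
    for start in month_starts(first, last):
        if start.month == 12:
            next_start = date(start.year + 1, 1, 1)
        else:
            next_start = date(start.year, start.month + 1, 1)
        # Overlap test: begins before the month ends, ends on/after its start.
        ids = {i for i, begin, end in rows if begin < next_start and end >= start}
        result[start.strftime("%Y-%m")] = len(ids)
    return result


print(absent_per_month(absences))
```

One deliberate difference: the sketch counts distinct ids per month, whereas COUNT(*) in the query counts rows, so an id with two separate absences in the same month would be counted twice by the SQL.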

count of every day (mysql)

I want to get something like this:
Mysql data
(dat_reg)
1.1.2000
1.1.2000
1.1.2000
2.1.2000
2.1.2000
3.1.2000
I want to get:
(dat_reg) (count)
1.1.2000 - 3
2.1.2000 - 5
3.1.2000 - 6
What I tried is this:
SELECT COUNT( * ) as a , DATE_FORMAT( dat_reg, '%d.%m.%Y' ) AS dat
FROM members
WHERE (dat_reg > DATE_SUB(NOW() , INTERVAL 5 DAY))
GROUP BY DATE_FORMAT(dat_reg, '%d.%m.%Y')
ORDER BY dat_reg
but I get:
1.1.2000 - 3 | 2.1.2000 - 2 | 3.1.2000 - 1
Any tips on how to write a query for this?
I would suggest using variables in MySQL:
SELECT d.*, (@sumc := @sumc + cnt) as running_cnt
FROM (SELECT DATE_FORMAT(dat_reg, '%d.%m.%Y') as dat, COUNT(*) as cnt
FROM members
WHERE dat_reg > DATE_SUB(NOW() , INTERVAL 5 DAY)
GROUP BY dat
ORDER BY dat_reg
) d CROSS JOIN
(SELECT @sumc := 0) params;
If you want a cumulative total from the beginning of time, then you need an additional subquery:
SELECT d.*
FROM (SELECT d.*, (@sumc := @sumc + cnt) as running_cnt
FROM (SELECT DATE_FORMAT(dat_reg, '%d.%m.%Y') as dat, dat_reg, COUNT(*) as cnt
FROM members
GROUP BY dat
ORDER BY dat_reg
) d CROSS JOIN
(SELECT @sumc := 0) params
) d
WHERE dat_reg > DATE_SUB(NOW() , INTERVAL 5 DAY)
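The running total that the variable accumulates can be sketched in Python: count the registrations per day, then take the cumulative sum in date order. This is a minimal illustration, assuming the six sample dates from the question; the variable names are made up for this example.

```python
from collections import Counter
from datetime import date
from itertools import accumulate

# The six registration dates from the question.
registrations = (
    [date(2000, 1, 1)] * 3 + [date(2000, 1, 2)] * 2 + [date(2000, 1, 3)]
)

# Step 1: registrations per day, in date order (the inner GROUP BY).
per_day = sorted(Counter(registrations).items())

# Step 2: cumulative sum of the daily counts (the variable in the outer query).
running = list(accumulate(count for _, count in per_day))

for (day, _), total in zip(per_day, running):
    print(f"{day.day}.{day.month}.{day.year} - {total}")
```

With the sample data the daily counts are 3, 2, 1 and the running totals are 3, 5, 6, which is the output the question asks for.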
A subquery counting the rows where the registration date is less than or equal to the current registration date could help you out.
SELECT m2.dat_reg,
(SELECT count(*)
FROM members m3
WHERE m3.dat_reg <= m2.dat_reg) count
FROM (SELECT DISTINCT m1.dat_reg
FROM members m1
WHERE m1.dat_reg > date_sub(now(), INTERVAL 5 DAY)) m2
ORDER BY m2.dat_reg;
(If there are days on which no one registered and you don't want gaps in the result, replace the subquery aliased m2 with a table or subquery that contains all days in the respective range.)
I believe you can use the window functions to do the work:
mysql> SELECT employee, sale, date, SUM(sale) OVER (PARTITION by employee ORDER BY date) AS cum_sales FROM sales;
+----------+------+------------+-----------+
| employee | sale | date | cum_sales |
+----------+------+------------+-----------+
| odin | 200 | 2017-03-01 | 200 |
| odin | 300 | 2017-04-01 | 500 |
| odin | 400 | 2017-05-01 | 900 |
| thor | 400 | 2017-03-01 | 400 |
| thor | 300 | 2017-04-01 | 700 |
| thor | 500 | 2017-05-01 | 1200 |
+----------+------+------------+-----------+
In your case you already have the right groups; it is only a matter of specifying the order in which you want the data to be aggregated.
Source: https://mysqlserverteam.com/mysql-8-0-2-introducing-window-functions/
Cheers
Here is a solution using rank and a continuous count variable:
WITH ranked AS (
SELECT m.*
,ROW_NUMBER() OVER (PARTITION BY m.dat_reg ORDER BY m.id DESC) AS rn
FROM (
select id, dat_reg
,@cnt := @cnt + 1 AS ccount from members
,(SELECT @cnt := 0) var
WHERE (dat_reg > DATE_SUB(NOW(), INTERVAL 5 DAY))
) AS m
)
SELECT DATE_FORMAT(dat_reg, '%d.%m.%Y') as dat, ccount FROM ranked WHERE rn = 1;

Group data by Custom Period Ranges Using a Reference Date and a Set Time Period

If for example, I have a table that looks like this:
+----+--------+---------------------+
| id | volume | createdAt |
+----+--------+---------------------+
| 1 | 0.11 | 2018-01-26 13:56:01 |
| 2 | 0.34 | 2018-01-28 14:22:12 |
| 3 | 0.22 | 2018-03-11 11:01:12 |
| 4 | 0.19 | 2018-04-13 12:12:12 |
| 5 | 0.12 | 2014-04-21 19:12:11 |
+----+--------+---------------------+
I want to perform a query that can accept starting point and then loop through a given number of days, and then group by that date range.
For instance, I'd like the result to look like this:
+------------+------------+--------+
| enddate | startdate | volume |
+------------+------------+--------+
| 2018-04-25 | 2018-04-12 | 0.31 |
| 2018-04-11 | 2018-03-29 | 0.00 |
| 2018-03-28 | 2018-03-15 | 0.00 |
| 2018-03-14 | 2018-03-01 | 0.22 |
| 2018-02-28 | 2018-02-15 | 0.00 |
| 2018-02-14 | 2018-02-01 | 0.00 |
| 2018-01-31 | 2018-01-18 | 0.45 |
| ... | ... | ... |
+------------+------------+--------+
In essence, I want to be able to input a start_date e.g 2018-04-25, a time_interval e.g. 14, like in the illustration above and then the query will sum the volumes in that time range.
I know how to use INTERVAL with the DATE_SUB() and the DATE_ADD() functions but I cannot figure out how to perform the loop I think is necessary.
Please help.
For the given data you can determine time based groupings using the datediff and floor functions:
floor(datediff(createdat, date '2018-04-25')/14) grp
From the group number you can determine the period's startdate and enddate:
date_add(date '2018-04-25', interval (grp*14) day) startdate
date_add(date '2018-04-25', interval ((grp+1)*14) day) enddate
Which represent a half open range with startdate being inclusive and enddate being exclusive.
Putting these together in a usable query:
select startdate, enddate, sum(volume)
from (select t1.*
, date_add(date '2018-04-25', interval (grp*14) day) startdate
, date_add(date '2018-04-25', interval ((grp+1)*14) day) enddate
from (select t.*
, datediff(t.createdat, date '2018-04-25') diff
, floor(datediff(t.createdat, date '2018-04-25')/14) grp
from table1 t) t1) t2
group by startdate, enddate
order by startdate desc;
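The grouping arithmetic can be checked with a short Python sketch. It assumes the reference date 2018-04-25 and the 14-day period from the question; `period_for`, `REFERENCE` and `PERIOD_DAYS` are hypothetical names used only here.

```python
from datetime import date, timedelta

REFERENCE = date(2018, 4, 25)  # the start_date parameter from the question
PERIOD_DAYS = 14               # the time_interval parameter from the question


def period_for(created, reference=REFERENCE, days=PERIOD_DAYS):
    """Return (grp, startdate, enddate) for a row, as a half-open range."""
    # Floor division mirrors FLOOR(DATEDIFF(createdat, reference) / 14).
    grp = (created - reference).days // days
    start = reference + timedelta(days=grp * days)
    end = reference + timedelta(days=(grp + 1) * days)
    return grp, start, end


for created in (date(2018, 4, 13), date(2018, 3, 11), date(2018, 1, 28)):
    grp, start, end = period_for(created)
    print(created, "-> group", grp, "from", start, "to", end, "(exclusive)")
```

For example, 2018-04-13 is 12 days before the reference date, so the difference is -12, the floor of -12/14 is -1, and the row falls in the period from 2018-04-11 (inclusive) to 2018-04-25 (exclusive).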
Unfortunately, this does not return the empty periods. To get them you need a way to generate rows. MySQL has no simple way to do this before version 8, which added common table expressions and recursive queries. However, some database objects already contain a large number of rows, such as the information_schema.columns view, which likely has enough rows for your needs; if it doesn't, a cross join or two will easily multiply the number of rows generated. Pairing that with a variable that increments for each row returned provides the needed groups:
select @rn:=@rn+1 rn
, stop
, date_add(date '2018-04-25', interval (@rn*14) day) startdate
, date_add(date '2018-04-25', interval ((@rn+1)*14) day) enddate
from information_schema.columns c
, (select @rn:=min(floor(datediff(createdat, date '2018-04-25')/14))-1
, max(floor(datediff(createdat, date '2018-04-25')/14)) stop
from table1) limits
where @rn < stop;
Outer joining this with the original data and grouping by the period dates yields:
select startdate
, enddate
, sum(volume) volume
from table1
right join (
select @rn:=@rn+1 rn
, stop
, date_add(date '2018-04-25', interval (@rn*14) day) startdate
, date_add(date '2018-04-25', interval ((@rn+1)*14) day) enddate
from information_schema.columns c
-- , information_schema.columns d -- if needed add another cartesian join
, (select @rn:=min(floor(datediff(createdat, date '2018-04-25')/14))-1
, max(floor(datediff(createdat, date '2018-04-25')/14)) stop
from table1) limits
where @rn < stop) periods
on startdate <= createdat
and createdat < enddate
group by startdate, enddate
order by startdate desc;
Take a look at the SQL Fiddle to see this in action
All you need to do is determine the start date (the parameter you pass) and the end date from your entire table, then loop between them by adding the time interval.
Have a look at below stored routine:
CREATE DEFINER=`root`@`localhost` PROCEDURE `getTotalVolumeByDateRange`(start_time timestamp, time_interval int)
BEGIN
DECLARE max_date date;
DECLARE min_date date;
DECLARE temp_end_date date;
SET min_date = DATE(start_time);
SELECT DATE(MAX(createdAt)) FROM VolumeData INTO max_date;
-- SELECT max_date, min_date;
CREATE TEMPORARY TABLE tempRangedVolumeData(
start_date date,
end_date date,
Volume decimal(5,2)
);
while min_date <= max_date DO
SET temp_end_date = DATE_ADD(min_date, Interval time_interval DAY);
INSERT INTO tempRangedVolumeData(start_date, end_date, Volume)
SELECT min_date, temp_end_date, SUM(Volume)
FROM VolumeData
WHERE DATE(CreatedAt) BETWEEN min_date and temp_end_date;
SET min_date = DATE_ADD(min_date, Interval time_interval+1 DAY);
end while;
select
start_date,
end_date,
coalesce(Volume,0) as Volume
from tempRangedVolumeData;
drop table tempRangedVolumeData;
END
I hope this helps. Please comment if I am missing any edge case.

Get number of events occurring per hour

I have a table that contains events along with Unix timestamps:
event_id | start_time | end_time
----------------------------------
1 | 1485388800 | 1485410400
2 | 1485396000 | 1485403200
3 | 1485406800 | 1485414000
I would like to write a query that takes a start time and an end time and tells me how many events are occurring during each hour. The result for the above table, given the start time 1485385200 and the end time 1485414000 would be:
event_count | time
------------------------
0 | 1485385200
1 | 1485388800
1 | 1485392400
2 | 1485396000
2 | 1485399600
1 | 1485403200
2 | 1485406800
1 | 1485410400
0 | 1485414000
What's the best way to write this query? I am stuck both on generating this range and also on range checking the events, preferably without reading the table more than once since it can be quite large.
As mentioned in the comments, the best approach would be to have some kind of calendar table to join on.
However, here's a (slightly hacky) solution without using such a table. It generates the hours sequence using a variable.
The only thing you need to keep in mind is that you need a table that is large enough for the number of intervals you're trying to create (i.e. a table that has at least as many records as the number of hours between your chosen start and end time). In my example, I've used the built-in mysql.help_topic table, but if your events table itself is large enough, you could use that (or any other table) instead.
SET @x:=1485385200, @y:=1485414000;
SELECT COUNT(event_id) AS event_count, hours.start AS time
FROM (
SELECT @x AS start, @x := @x + 3600 AS end
FROM mysql.help_topic
WHERE @x <= @y
) AS hours
) AS hours
LEFT JOIN events AS e
ON e.start_time < hours.end AND e.end_time > hours.start
GROUP BY hours.start
This gives me the following output for the test data you provided:
+-------------+------------+
| event_count | time |
+-------------+------------+
| 0 | 1485385200 |
| 1 | 1485388800 |
| 1 | 1485392400 |
| 2 | 1485396000 |
| 2 | 1485399600 |
| 1 | 1485403200 |
| 2 | 1485406800 |
| 1 | 1485410400 |
| 0 | 1485414000 |
+-------------+------------+
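The overlap test in the join (an event counts for an hour if it starts before the hour ends and ends after the hour starts) can be sketched in Python. This is an illustration under assumptions: the `events` list hard-codes the three sample rows, and `events_per_hour` is a hypothetical helper.

```python
HOUR = 3600  # seconds

# Sample rows from the question: (event_id, start_time, end_time).
events = [
    (1, 1485388800, 1485410400),
    (2, 1485396000, 1485403200),
    (3, 1485406800, 1485414000),
]


def events_per_hour(rows, range_start, range_end):
    """Count events overlapping each hour slot from range_start to range_end."""
    result = []
    for hour_start in range(range_start, range_end + 1, HOUR):
        hour_end = hour_start + HOUR
        # Same condition as the join: start < slot end AND end > slot start.
        count = sum(1 for _, start, end in rows
                    if start < hour_end and end > hour_start)
        result.append((count, hour_start))
    return result


for count, hour_start in events_per_hour(events, 1485385200, 1485414000):
    print(count, hour_start)
```

For the sample data the counts are 0, 1, 1, 2, 2, 1, 2, 1, 0, matching the output shown above. Unlike the SQL, this sketch scans the event list once per hour slot, so it is meant for checking the logic rather than for large tables.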
It looks a bit messy, but it works. The idea is to build a calendar table using unions and join it with your table. You can pass the start time and end time as required; I have done it for 24 hours.
SELECT count(event_id) as event_count,t
FROM
(
SELECT
DATE_FORMAT(FROM_UNIXTIME(`start_time`), '%Y-%m-%d %H:%i:%s') as start_time,
DATE_FORMAT(FROM_UNIXTIME(`end_time`), '%Y-%m-%d %H:%i:%s') as end_time,
tmp.*,
event_id
from test
right JOIN
(
(select concat(date(now()),' ','00:00') as t)
UNION
(select concat(date(now()),' ','01:00') as t)
UNION
(select concat(date(now()),' ','02:00') as t)
UNION
(select concat(date(now()),' ','03:00') as t)
UNION
(select concat(date(now()),' ','04:00') as t)
UNION
(select concat(date(now()),' ','05:00') as t)
UNION
(select concat(date(now()),' ','06:00') as t)
UNION
(select concat(date(now()),' ','07:00') as t)
UNION
(select concat(date(now()),' ','08:00') as t)
UNION
(select concat(date(now()),' ','09:00') as t)
UNION
(select concat(date(now()),' ','10:00') as t)
UNION
(select concat(date(now()),' ','11:00') as t)
UNION
(select concat(date(now()),' ','12:00') as t)
UNION
(select concat(date(now()),' ','13:00') as t)
UNION
(select concat(date(now()),' ','14:00') as t)
UNION
(select concat(date(now()),' ','15:00') as t)
UNION
(select concat(date(now()),' ','16:00') as t)
UNION
(select concat(date(now()),' ','17:00') as t)
UNION
(select concat(date(now()),' ','18:00') as t)
UNION
(select concat(date(now()),' ','19:00') as t)
UNION
(select concat(date(now()),' ','20:00') as t)
UNION
(select concat(date(now()),' ','21:00') as t)
UNION
(select concat(date(now()),' ','22:00') as t)
UNION
(select concat(date(now()),' ','23:00') as t)
)tmp
on(cast(tmp.t as datetime) between DATE_FORMAT(FROM_UNIXTIME(`start_time`), '%Y-%m-%d %H:%i:%s') and DATE_FORMAT(FROM_UNIXTIME(`end_time`), '%Y-%m-%d %H:%i:%s'))
)xxx
group by t

Finding count for a Period in sql

I have a table with :
user_id | order_date
---------+------------
12 | 2014-03-23
12 | 2014-01-24
14 | 2014-01-26
16 | 2014-01-23
15 | 2014-03-21
20 | 2013-10-23
13 | 2014-01-25
16 | 2014-03-23
13 | 2014-01-25
14 | 2014-03-22
An active user is someone who has logged in within the last 12 months.
Need output as
Period | count of Active user
----------------------------
Oct-2013 - 1
Jan-2014 - 5
Mar-2014 - 10
(The Jan-2014 value includes the 1 record from Oct-2013 and the 4 non-duplicate records for Jan-2014.)
You can use a variable to calculate the running total of active users:
SELECT Period,
@total:=@total+cnt AS `Count of Active Users`
FROM (
SELECT CONCAT(MONTHNAME(order_date), '-', YEAR(order_date)) AS Period,
COUNT(DISTINCT user_id) AS cnt
FROM mytable
GROUP BY Period
ORDER BY YEAR(order_date), MONTH(order_date) ) t,
(SELECT @total:=0) AS var
The subquery returns the number of distinct active users per Month/Year. The outer query uses the @total variable to calculate the running total of active users' count.
Fiddle Demo here
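The two steps (distinct users per month, then a running total) can be sketched in Python. This is a minimal illustration, assuming the ten sample rows from the question; the variable names are made up for this example.

```python
from datetime import date
from itertools import accumulate

# Sample rows from the question: (user_id, order_date).
orders = [
    (12, date(2014, 3, 23)), (12, date(2014, 1, 24)), (14, date(2014, 1, 26)),
    (16, date(2014, 1, 23)), (15, date(2014, 3, 21)), (20, date(2013, 10, 23)),
    (13, date(2014, 1, 25)), (16, date(2014, 3, 23)), (13, date(2014, 1, 25)),
    (14, date(2014, 3, 22)),
]

# Step 1: the set of distinct users seen in each (year, month).
users_by_month = {}
for user_id, order_date in orders:
    key = (order_date.year, order_date.month)
    users_by_month.setdefault(key, set()).add(user_id)

# Step 2: running total of the per-month distinct counts, in date order.
months = sorted(users_by_month)
distinct = [len(users_by_month[m]) for m in months]
running = list(accumulate(distinct))

for (year, month), total in zip(months, running):
    print(f"{year}-{month:02d}", total)
```

On the sample rows the per-month distinct counts are 1, 4, 4, giving running totals of 1, 5, 9 rather than the 1, 5, 10 in the question, because user 13 appears twice in January. The running total also counts a user once for every month in which they ordered (users 12, 14 and 16 are counted in both January and March), so it is not a count of distinct users over the whole period.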
I've got two queries that do the job. I am not sure which one is faster. Check them against your database:
Query 1:
select per.yyyymm,
(select count(DISTINCT o.user_id) from orders o where o.order_date >=
(per.yyyymm - INTERVAL 1 YEAR) and o.order_date < per.yyyymm + INTERVAL 1 MONTH) as `count`
from
(select DISTINCT LAST_DAY(order_date) + INTERVAL 1 DAY - INTERVAL 1 MONTH as yyyymm
from orders) per
order by per.yyyymm
Results:
| yyyymm | count |
|---------------------------|-------|
| October, 01 2013 00:00:00 | 1 |
| January, 01 2014 00:00:00 | 5 |
| March, 01 2014 00:00:00 | 6 |
Query 2:
select DATE_FORMAT(order_date, '%Y-%m'),
(select count(DISTINCT o.user_id) from orders o where o.order_date >=
(LAST_DAY(o1.order_date) + INTERVAL 1 DAY - INTERVAL 13 MONTH) and
o.order_date <= LAST_DAY(o1.order_date)) as `count`
from orders o1
group by DATE_FORMAT(order_date, '%Y-%m')
Results:
| DATE_FORMAT(order_date, '%Y-%m') | count |
|----------------------------------|-------|
| 2013-10 | 1 |
| 2014-01 | 5 |
| 2014-03 | 6 |
The best thing I could do is this:
SELECT Date, COUNT(*) as ActiveUsers
FROM
(
SELECT DISTINCT userId, CONCAT(YEAR(order_date), "-", MONTH(order_date)) as Date
FROM `a`
ORDER BY Date
)
AS `b`
GROUP BY Date
The output is the following:
| Date | ActiveUsers |
|---------|-------------|
| 2013-10 | 1 |
| 2014-1 | 4 |
| 2014-3 | 4 |
Now, for every row you need to sum up the number of active users in previous rows.
For example, here is the code in C#.
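int total = 0;
while (reader.Read())
{
// COUNT(*) arrives as a 64-bit integer, so convert instead of casting to int.
total += Convert.ToInt32(reader["ActiveUsers"]);
// Print the running total, not the per-month count.
Console.WriteLine("{0} - {1} active users", reader["Date"], total);
}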
By the way, for the March of 2014 the answer is 9 because one row is duplicated.
Try this, but this doesn't handle the last part: The Jan 2014 value - includes Oct -2013
select TO_CHAR(order_dt,'MON-YYYY'), count(distinct User_ID ) cnt from [orders]
where User_ID in
(select User_ID from
(select a.User_ID from [orders] a,
(select a.User_ID,count (a.order_dt) from [orders] a
where a.order_dt > (select max(b.order_dt)-365 from [orders] b where a.User_ID=b.User_ID)
group by a.User_ID
having count(order_dt)>1) b
where a.User_ID=b.User_ID) a
)
group by TO_CHAR(order_dt,'MON-YYYY');
This is what I think you are looking for
SET @cnt = 0;
SELECT Period, @cnt := @cnt + total_active_users AS total_active_users
FROM (
SELECT DATE_FORMAT(order_date, '%b-%Y') AS Period , COUNT( id) AS total_active_users
FROM t
GROUP BY DATE_FORMAT(order_date, '%b-%Y')
ORDER BY order_date
) AS t
This is the output that I get
Period total_active_users
Oct-2013 1
Jan-2014 6
Mar-2014 10
You can also do COUNT(DISTINCT id) to get the unique Ids only
Here is a SQL Fiddle