Get number of events occurring per hour - mysql

I have a table that contains events along with Unix timestamps:
event_id | start_time | end_time
----------------------------------
1 | 1485388800 | 1485410400
2 | 1485396000 | 1485403200
3 | 1485406800 | 1485414000
I would like to write a query that takes a start time and an end time and tells me how many events are occurring during each hour. The result for the above table, given the start time 1485385200 and the end time 1485414000 would be:
event_count | time
------------------------
0 | 1485385200
1 | 1485388800
1 | 1485392400
2 | 1485396000
2 | 1485399600
1 | 1485403200
2 | 1485406800
1 | 1485410400
0 | 1485414000
What's the best way to write this query? I am stuck both on generating this range and also on range checking the events, preferably without reading the table more than once since it can be quite large.

As mentioned in the comments, the best approach would be to have some kind of calendar table to join on.
However, here's a (slightly hacky) solution without using such a table. It generates the hours sequence using a variable.
The only thing you need to keep in mind is that you need a table that is large enough for the number of intervals you're trying to create (i.e. a table that has at least as many records as the number of hours between your chosen start and end time). In my example, I've used the built-in mysql.help_topic table, but if your events table itself is large enough, you could use that (or any other table) instead.
SET @x:=1485385200, @y:=1485414000;
SELECT COUNT(event_id) AS event_count, hours.start AS time
FROM (
SELECT @x AS start, @x := @x + 3600 AS end
FROM mysql.help_topic
WHERE @x <= @y
) AS hours
LEFT JOIN events AS e
ON e.start_time < hours.end AND e.end_time > hours.start
GROUP BY hours.start
This gives me the following output for the test data you provided:
+-------------+------------+
| event_count | time |
+-------------+------------+
| 0 | 1485385200 |
| 1 | 1485388800 |
| 1 | 1485392400 |
| 2 | 1485396000 |
| 2 | 1485399600 |
| 1 | 1485403200 |
| 2 | 1485406800 |
| 1 | 1485410400 |
| 0 | 1485414000 |
+-------------+------------+
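For readers on MySQL 8.0 or later, the hour sequence could also come from a recursive CTE instead of a user variable and a helper table. The sketch below is not the method used in this answer; it runs the same overlap join against an in-memory SQLite database through Python's sqlite3 module, purely so that it is self-contained. The function name hourly_event_counts, the step parameter and the hour_start column are made up for illustration, and the SQL has not been run against MySQL.

```python
import sqlite3

# Sample rows from the question: (event_id, start_time, end_time)
EVENTS = [
    (1, 1485388800, 1485410400),
    (2, 1485396000, 1485403200),
    (3, 1485406800, 1485414000),
]


def hourly_event_counts(events, range_start, range_end, step=3600):
    """Return [(event_count, hour_start)] for each slot from range_start to range_end."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (event_id INTEGER PRIMARY KEY,"
        " start_time INTEGER, end_time INTEGER)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    rows = conn.execute(
        """
        WITH RECURSIVE hours(hour_start) AS (
            SELECT :lo
            UNION ALL
            SELECT hour_start + :step FROM hours
            WHERE hour_start + :step <= :hi
        )
        SELECT COUNT(e.event_id) AS event_count, h.hour_start
        FROM hours AS h
        LEFT JOIN events AS e
          -- same overlap test as the answer: the event starts before the
          -- slot ends and ends after the slot starts
          ON e.start_time < h.hour_start + :step AND e.end_time > h.hour_start
        GROUP BY h.hour_start
        ORDER BY h.hour_start
        """,
        {"lo": range_start, "hi": range_end, "step": step},
    ).fetchall()
    conn.close()
    return rows
```

Calling hourly_event_counts(EVENTS, 1485385200, 1485414000) yields one row per hour, and the events table is only read by the single join.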

It looks a bit messy, but it works. The idea is to build a calendar table using unions and join it with your table. You can pass the start time and end time as required; I have done it for 24 hours.
SELECT count(event_id) as event_count,t
FROM
(
SELECT
DATE_FORMAT(FROM_UNIXTIME(`start_time`), '%Y-%m-%d %H:%i:%s') as start_time,
DATE_FORMAT(FROM_UNIXTIME(`end_time`), '%Y-%m-%d %H:%i:%s') as end_time,
tmp.*,
event_id
from test
right JOIN
(
(select concat(date(now()),' ','00:00') as t)
UNION
(select concat(date(now()),' ','01:00') as t)
UNION
(select concat(date(now()),' ','02:00') as t)
UNION
(select concat(date(now()),' ','03:00') as t)
UNION
(select concat(date(now()),' ','04:00') as t)
UNION
(select concat(date(now()),' ','05:00') as t)
UNION
(select concat(date(now()),' ','06:00') as t)
UNION
(select concat(date(now()),' ','07:00') as t)
UNION
(select concat(date(now()),' ','08:00') as t)
UNION
(select concat(date(now()),' ','09:00') as t)
UNION
(select concat(date(now()),' ','10:00') as t)
UNION
(select concat(date(now()),' ','11:00') as t)
UNION
(select concat(date(now()),' ','12:00') as t)
UNION
(select concat(date(now()),' ','13:00') as t)
UNION
(select concat(date(now()),' ','14:00') as t)
UNION
(select concat(date(now()),' ','15:00') as t)
UNION
(select concat(date(now()),' ','16:00') as t)
UNION
(select concat(date(now()),' ','17:00') as t)
UNION
(select concat(date(now()),' ','18:00') as t)
UNION
(select concat(date(now()),' ','19:00') as t)
UNION
(select concat(date(now()),' ','20:00') as t)
UNION
(select concat(date(now()),' ','21:00') as t)
UNION
(select concat(date(now()),' ','22:00') as t)
UNION
(select concat(date(now()),' ','23:00') as t)
)tmp
on(cast(tmp.t as datetime) between DATE_FORMAT(FROM_UNIXTIME(`start_time`), '%Y-%m-%d %H:%i:%s') and DATE_FORMAT(FROM_UNIXTIME(`end_time`), '%Y-%m-%d %H:%i:%s'))
)xxx
group by t

Related

How to count concurrently bookings in sql in time interval per minute?

If I have a start and stop time for a booking, how can I calculate the number of bookings there are each minute? I made a simplified version of my database table looks like here:
Start time | End time | booking |
--------------------------------------------------
2020-09-01 10:00 | 2020-09-01 10:10 | Booking 1 |
2020-09-01 10:00 | 2020-09-01 10:05 | Booking 2 |
2020-09-01 10:05 | 2020-09-01 10:10 | Booking 3 |
2020-09-01 10:09 | 2020-09-01 10:10 | Booking 4 |
I want to have the bookings between a given time interval like 10:02 - 10:09. It should be something like this as result:
Desired result
Time | count
-----------
10:02 | 2 |
10:03 | 2 |
10:04 | 2 |
10:05 | 3 |
10:06 | 2 |
10:07 | 2 |
10:08 | 2 |
10:09 | 3 |
Question
How can this be achieved? Today I export it to Python, but I think it should be possible to do this directly in SQL.
You can use a recursive CTE directly on your data:
with recursive cte as (
select start_time, end_time
from t
union all
select start_time + interval 1 minute, end_time
from cte
where start_time < end_time
)
select start_time, count(*)
from cte
group by start_time
order by start_time;
EDIT:
In earlier versions of MySQL, it helps to have a tally table. You can create one on the fly, using something like:
(select @rn := @rn + 1 as n
from t cross join
(select @rn := 0) params
) tally
You need enough numbers for your maximum span, but then you can do:
select t.start_time + interval tally.n hour, count(*)
from t join
(select @rn := @rn + 1 as n
from t cross join
(select @rn := -1) params -- so it starts from 0
limit 100
) tally
on t.start_time + interval tally.n hour <= t.end_time
group by t.start_time + interval tally.n hour;
You can use a recursive query to generate the timestamp range, then unpivot the table and join:
with recursive dates (ts) as(
select cast('2020-09-01' as datetime)
union all
select ts + interval 1 minute
from dates
where ts + interval 1 minute < '2020-09-02'
)
select d.ts, coalesce(sum(t.cnt), 0) cnt
from dates d
left join (
select start_time ts, 1 cnt from mytable
union all select end_time, -1 from mytable
) t on t.ts <= d.ts
group by d.ts
If you are going to run this repeatedly and/or against large time periods, you would be better off materializing the date ranges in a calendar table rather than using a recursive query. The calendar table has one row per minute over a large period of dates. Assuming a table called date_calendar, you would do:
select d.ts, coalesce(sum(t.cnt), 0) cnt
from date_calendar d
left join (
select start_time ts, 1 cnt from mytable
union all select end_time, -1 from mytable
) t on t.ts <= d.ts
where d.ts >= '2020-09-01' and d.ts < '2020-09-02'
group by d.ts
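The +1/-1 idea behind the unpivoted subquery can be checked outside the database. The following is a minimal Python sketch, not part of any answer: it assumes a booking still counts during its end minute (which is what the desired result in the question implies, and differs from a strict `end_time` cut-off), and the names bookings_per_minute and at are hypothetical helpers.

```python
from collections import Counter
from datetime import datetime, timedelta

MINUTE = timedelta(minutes=1)


def bookings_per_minute(bookings, window_start, window_end):
    """Return [(minute, active_bookings)] for each minute in the window."""
    deltas = Counter()
    for start, end in bookings:
        deltas[start] += 1          # booking becomes active
        deltas[end + MINUTE] -= 1   # assumption: the end minute still counts
    # Bookings already running when the window opens.
    active = sum(delta for ts, delta in deltas.items() if ts < window_start)
    counts = []
    ts = window_start
    while ts <= window_end:
        active += deltas[ts]        # a Counter returns 0 for missing minutes
        counts.append((ts, active))
        ts += MINUTE
    return counts


def at(hhmm):
    """Hypothetical helper: a time on 2020-09-01 given as 'HH:MM'."""
    return datetime.strptime("2020-09-01 " + hhmm, "%Y-%m-%d %H:%M")


# Sample rows from the question: (start_time, end_time)
BOOKINGS = [
    (at("10:00"), at("10:10")),
    (at("10:00"), at("10:05")),
    (at("10:05"), at("10:10")),
    (at("10:09"), at("10:10")),
]
```

For the window 10:02 to 10:09 this gives the counts 2, 2, 2, 3, 2, 2, 2, 3, matching the desired result in the question.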

MySQL Recursive CTE table does not exist

I'm learning about recursive functions, since I need to extract a row for each day in a range of days. This is my current data:
+----+------------+------------+
| id | from | to |
+----+------------+------------+
| 1 | 09-20-2019 | 09-25-2019 |
+----+------------+------------+
The goal is to receive my data as follows
+----+------------+
| id | date |
+----+------------+
| 1 | 09-20-2019 |
| 1 | 09-21-2019 |
| 1 | 09-22-2019 |
| 1 | 09-23-2019 |
| 1 | 09-24-2019 |
| 1 | 09-25-2019 |
+----+------------+
I'm following an example seen here: https://stackoverflow.com/a/54538866/1731057
But for some reason my recursive function is looking for the 'cte' table.
Query 1 ERROR: Table 'project.cte' doesn't exist
WITH cte AS (
SELECT date_from
FROM event_dates
UNION ALL
SELECT DATE_ADD(event_dates.date_from, INTERVAL 1 DAY)
FROM cte
WHERE DATE_ADD(event_dates.date_from, INTERVAL 1 DAY) <= event_dates.date_until
)
select * FROM cte;
The structure of your recursive CTE is off, and the upper half of the union should be a seed base case. Then, the recursive part should add one day to the previous incoming value:
WITH RECURSIVE cte (n, dt) AS (
SELECT 1, '2019-09-20'
UNION ALL
SELECT n + 1, TIMESTAMPADD(DAY, n, '2019-09-20') FROM cte WHERE n <= 5
)
SELECT * FROM cte;
Of note, we use TIMESTAMPADD() here to get around the problem of the INTERVAL expression, which can't really take a variable.
If you want to use this approach to generate a series of dates which matches the from and to values in your table, then you can try a join:
SELECT
t1.dt
FROM cte t1
INNER JOIN yourTable t2
ON t1.dt BETWEEN t2.from_date AND t2.to_date;
When used this way, the recursive CTE is acting as a calendar table.
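As a further illustration, the query from the question can be reshaped so that the seed rows come from the table and the recursive step refers only to the CTE's own columns; the failing query omitted RECURSIVE and referenced event_dates columns inside the recursive member. The sketch below is an assumption-laden stand-in rather than tested MySQL: it runs on an in-memory SQLite database via Python's sqlite3 module, SQLite's date(dt, '+1 day') takes the place of MySQL's DATE_ADD(dt, INTERVAL 1 DAY), dates are stored as ISO strings, and expand_date_ranges is a made-up name.

```python
import sqlite3


def expand_date_ranges(ranges):
    """ranges: iterable of (id, date_from, date_until) as 'YYYY-MM-DD' strings.

    Returns one (id, date) row per day in each range, both ends included.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event_dates (id INTEGER, date_from TEXT, date_until TEXT)"
    )
    conn.executemany("INSERT INTO event_dates VALUES (?, ?, ?)", ranges)
    rows = conn.execute(
        """
        WITH RECURSIVE cte(id, dt, date_until) AS (
            -- seed: the first day of every range
            SELECT id, date_from, date_until FROM event_dates
            UNION ALL
            -- step: add one day to the previous row until the range ends
            SELECT id, date(dt, '+1 day'), date_until
            FROM cte
            WHERE dt < date_until
        )
        SELECT id, dt FROM cte ORDER BY id, dt
        """
    ).fetchall()
    conn.close()
    return rows
```

With the single range from the question, expand_date_ranges([(1, "2019-09-20", "2019-09-25")]) returns six rows, one for each day from the 20th through the 25th.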

mysql avg length of a date sequence

I have a report I'm trying to figure out, but I would like to do it all within a SQL statement instead of needing to iterate over a bunch of data in a script.
I have a table that is structured like:
CREATE TABLE `batch_item` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`record_id` int(11) DEFAULT NULL,
`created` DATE NOT NULL,
PRIMARY KEY (`id`),
KEY `record_id` (`record_id`)
);
The Date field is always YEAR-MONTH-01. Data looks something like:
+------+-----------+------------+
| id | record_id | created |
+------+-----------+------------+
| 1 | 1 | 2019-01-01 |
| 2 | 2 | 2019-01-01 |
| 3 | 3 | 2019-01-01 |
| 4 | 1 | 2019-02-01 |
| 5 | 2 | 2019-02-01 |
| 6 | 1 | 2019-03-01 |
| 7 | 3 | 2019-03-01 |
| 8 | 1 | 2019-04-01 |
| 9 | 2 | 2019-04-01 |
+------+-----------+------------+
So what I'm trying to do, without having to create a looping script, is find the average number of sequential months for each record. Example with the data above would be:
Record_id 1 would have an avg of 4 months.
Record_id 2 would be 1.5
Record_id 3 would be 1
I can write a script to iterate through all the records. I just would rather avoid that.
This is a gaps-and-islands problem. You simply need an enumeration of the rows for this to work. In MySQL 8+, you would use row_number() but you can use a global enumeration here:
select record_id, min(created) as min_created, max(created) as max_created, count(*) as num_months
from (select bi.*, (created - interval n month) as grp
from (select bi.*, (@rn := @rn + 1) as n -- generate some numbers
from batch_item bi cross join
(select @rn := 0) params
order by bi.record_id, bi.created
) bi
) bi
group by record_id, grp;
Note that when using row_number(), you would normally partition by record_id. However that is not necessary, if the numbers are created in the correct sequence.
The above query gets the islands. For your final results, you need one more level of aggregation:
select record_id, avg(num_months)
from (select record_id, min(created) as min_created, max(created) as max_created, count(*) as num_months
from (select bi.*, (created - interval n month) as grp
from (select bi.*, (@rn := @rn + 1) as n -- generate some numbers
from batch_item bi cross join
(select @rn := 0) params
order by bi.record_id, bi.created
) bi
) bi
group by record_id, grp
) bi
group by record_id;
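To see why subtracting a row number from the date identifies an island, here is a small Python sketch of the same trick. It is an illustration only: it numbers rows per record (as row_number() with a partition would) rather than globally, and the function name average_island_length is hypothetical.

```python
from collections import defaultdict
from datetime import date


def average_island_length(rows):
    """rows: iterable of (record_id, created). Returns {record_id: avg run length}."""
    months_by_record = defaultdict(set)
    for record_id, created in rows:
        # Turn each first-of-month date into a running month index.
        months_by_record[record_id].add(created.year * 12 + created.month)

    averages = {}
    for record_id, months in months_by_record.items():
        islands = defaultdict(int)
        for n, month_index in enumerate(sorted(months)):
            # Consecutive months share the same (month_index - n) value.
            islands[month_index - n] += 1
        averages[record_id] = sum(islands.values()) / len(islands)
    return averages


# Sample rows from the question: (record_id, created)
ROWS = [
    (1, date(2019, 1, 1)), (2, date(2019, 1, 1)), (3, date(2019, 1, 1)),
    (1, date(2019, 2, 1)), (2, date(2019, 2, 1)),
    (1, date(2019, 3, 1)), (3, date(2019, 3, 1)),
    (1, date(2019, 4, 1)), (2, date(2019, 4, 1)),
]
```

On the sample data this returns 4.0 for record 1, 1.5 for record 2 and 1.0 for record 3, the figures given in the question.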
This is not a tested solution, but it should work in MySQL 8.x:
with
a as ( -- the last row of each island
select record_id, created
from ( -- window functions are not allowed in WHERE, so compute lead() first
select record_id, created,
lead(created) over(partition by record_id order by created) as next_created
from batch_item
) x
where next_created is null
or next_created > created + interval 1 month
),
e as ( -- each row, now with the last row of its island
select b.id, b.record_id, min(a.created) as end_created
from batch_item b
join a on b.record_id = a.record_id and b.created <= a.created
group by b.id, b.record_id
),
m as ( -- each island with the number of months it has
select
record_id, end_created, count(*) as months
from e
group by record_id, end_created
)
select -- the average length of islands for each record_id
record_id, avg(months) as avg_months
from m
group by record_id

count of every day (mysql)

I want to get something like this
Mysql data
(dat_reg)
1.1.2000
1.1.2000
1.1.2000
2.1.2000
2.1.2000
3.1.2000
I want to get:
(dat_reg) (count)
1.1.2000 - 3
2.1.2000 - 5
3.1.2000 - 6
What I tried is this:
SELECT COUNT( * ) as a , DATE_FORMAT( dat_reg, '%d.%m.%Y' ) AS dat
FROM members
WHERE (dat_reg > DATE_SUB(NOW() , INTERVAL 5 DAY))
GROUP BY DATE_FORMAT(dat_reg, '%d.%m.%Y')
ORDER BY dat_reg
but I get:
1.1.2000 - 3 | 2.1.2000 - 2 | 3.1.2000 - 1
Any tips on how to create a query for this?
I would suggest using variables in MySQL:
SELECT d.*, (@sumc := @sumc + cnt) as running_cnt
FROM (SELECT DATE_FORMAT(dat_reg, '%d.%m.%Y') as dat, COUNT(*) as cnt
FROM members
WHERE dat_reg > DATE_SUB(NOW() , INTERVAL 5 DAY)
GROUP BY dat
ORDER BY dat_reg
) d CROSS JOIN
(SELECT @sumc := 0) params;
If you want a cumulative total from the beginning of time, then you need an additional subquery:
SELECT d.*
FROM (SELECT d.*, (@sumc := @sumc + cnt) as running_cnt
FROM (SELECT DATE_FORMAT(dat_reg, '%d.%m.%Y') as dat, dat_reg, COUNT(*) as cnt
FROM members
GROUP BY dat
ORDER BY dat_reg
) d CROSS JOIN
(SELECT @sumc := 0) params
) d
WHERE dat_reg > DATE_SUB(NOW() , INTERVAL 5 DAY)
A subquery counting the rows where the registration date is less than or equal to the current registration date could help you out.
SELECT m2.dat_reg,
(SELECT count(*)
FROM members m3
WHERE m3.dat_reg <= m2.dat_reg) count
FROM (SELECT DISTINCT m1.dat_reg
FROM members m1
WHERE m1.dat_reg > date_sub(now(), INTERVAL 5 DAY)) m2
ORDER BY m2.dat_reg;
(If there are days on which no one registered and you don't want gaps in the result, you need to replace the subquery aliased m2 with a table or subquery that has all days in the respective range.)
I believe you can use the window functions to do the work:
mysql> SELECT employee, sale, date, SUM(sale) OVER (PARTITION by employee ORDER BY date) AS cum_sales FROM sales;
+----------+------+------------+-----------+
| employee | sale | date | cum_sales |
+----------+------+------------+-----------+
| odin | 200 | 2017-03-01 | 200 |
| odin | 300 | 2017-04-01 | 500 |
| odin | 400 | 2017-05-01 | 900 |
| thor | 400 | 2017-03-01 | 400 |
| thor | 300 | 2017-04-01 | 700 |
| thor | 500 | 2017-05-01 | 1200 |
+----------+------+------------+-----------+
In your case you already have the right groups; it is only a matter of specifying the order in which you want the data to be aggregated.
Source: https://mysqlserverteam.com/mysql-8-0-2-introducing-window-functions/
Cheers
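The running total that the queries here aim for (a per-day count followed by a cumulative sum in date order) can be sketched in a few lines of Python. This is only an illustration of the intended arithmetic, with a made-up function name, and it does not apply the 5-day filter from the question.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def running_registrations(dates):
    """Return [(day, cumulative_count)] in date order."""
    per_day = sorted(Counter(dates).items())             # step 1: GROUP BY day
    totals = accumulate(count for _, count in per_day)   # step 2: running sum
    return [(day, total) for (day, _), total in zip(per_day, totals)]


# Sample data from the question (dat_reg values)
DAT_REG = (
    [date(2000, 1, 1)] * 3
    + [date(2000, 1, 2)] * 2
    + [date(2000, 1, 3)]
)
```

For the sample data, running_registrations(DAT_REG) gives 3, 5 and 6 for the three days, which is the output the question asks for.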
Here is a solution using rank and a continuous count variable:
WITH ranked AS (
SELECT m.*
,ROW_NUMBER() OVER (PARTITION BY m.dat_reg ORDER BY m.id DESC) AS rn
FROM (
select id, dat_reg
,@cnt := @cnt + 1 AS ccount from members
,(SELECT @cnt := 0) var
WHERE (dat_reg > DATE_SUB(NOW(), INTERVAL 5 DAY))
) AS m
)
SELECT DATE_FORMAT(dat_reg, '%d.%m.%Y') as dat, ccount FROM ranked WHERE rn = 1;

MySQL count rows within the same intervals to each other

I have a table where one column is the date:
+----------+---------------------+
| id | date |
+----------+---------------------+
| 5 | 2012-12-10 10:12:37 |
+----------+---------------------+
| 4 | 2012-12-10 09:09:55 |
+----------+---------------------+
| 3 | 2012-12-09 21:12:35 |
+----------+---------------------+
| 2 | 2012-12-09 20:15:07 |
+----------+---------------------+
| 1 | 2012-12-09 20:01:42 |
+----------+---------------------+
What I need is to count the rows which are, for example, within 3 hours of each other. In this example I want to join the upper row with the 2nd row, and the 3rd row with the 4th and 5th rows. So my output should be like this:
+----------+---------------------+---------+
| id | date | count |
+----------+---------------------+---------+
| 5 | 2012-12-10 10:12:37 | 2 |
+----------+---------------------+---------+
| 3 | 2012-12-09 21:12:35 | 3 |
+----------+---------------------+---------+
How could I do this?
I think you need a self-join for this:
select t.id, t.date, COUNT(t2.id)
from t left outer join
t t2
on t.date between t2.date - interval 3 hour and t2.date + interval 3 hour
group by t.id, t.date
(This is untested code so it might have a syntax error.)
If you are trying to divide everything into 3-hour intervals, you can do something like:
select max(t.date), t.id, count(*)
from (select t.*,
(date(date)*100 + floor(hour(date)/3)*3) as `interval`
from t
) t
group by `interval`
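Neither query above reproduces the exact output requested in the question. One reading of that output is "start a new group whenever the gap to the previous row exceeds 3 hours, and report the newest row of each group". The Python sketch below implements that interpretation only; it is an assumption about the intent, not a translation of either query, and group_by_gap is a hypothetical name.

```python
from datetime import datetime, timedelta


def group_by_gap(rows, max_gap=timedelta(hours=3)):
    """rows: iterable of (id, datetime).

    Returns [(id, datetime, count)] with the newest row of each group,
    newest group first.
    """
    groups = []
    for row_id, ts in sorted(rows, key=lambda r: r[1]):
        # Compare with the most recent row of the current group.
        if groups and ts - groups[-1][-1][1] <= max_gap:
            groups[-1].append((row_id, ts))
        else:
            groups.append([(row_id, ts)])
    return [(g[-1][0], g[-1][1], len(g)) for g in reversed(groups)]


# Sample rows from the question: (id, date)
ROWS = [
    (5, datetime(2012, 12, 10, 10, 12, 37)),
    (4, datetime(2012, 12, 10, 9, 9, 55)),
    (3, datetime(2012, 12, 9, 21, 12, 35)),
    (2, datetime(2012, 12, 9, 20, 15, 7)),
    (1, datetime(2012, 12, 9, 20, 1, 42)),
]
```

On the sample rows this yields id 5 with a count of 2 and id 3 with a count of 3, as in the question's expected output. Because each row is compared with its immediate predecessor, a long chain of closely spaced rows ends up in one group even if its first and last rows are more than 3 hours apart.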
I am not sure how to do this with MySQL, but I was able to build a set of queries in SQL Server 2005 that provide the intended results. Here is the working sample; it may be overly complex, but that's how I was able to get the desired result:
WITH BaseData AS
(
SELECT 5 AS ID, '2012-12-10 10:12:37' AS Date
UNION ALL
SELECT 4 AS ID, '2012-12-10 09:09:55' AS Date
UNION ALL
SELECT 3 AS ID, '2012-12-09 21:12:35' AS Date
UNION ALL
SELECT 2 AS ID, '2012-12-09 20:15:07' AS Date
UNION ALL
SELECT 1 AS ID, '2012-12-09 20:01:42' AS Date
),
BaseDataWithRowNum AS
(
SELECT ID,DATE, ROW_NUMBER() OVER (ORDER BY Date DESC) AS RowNum
FROM BaseData
),
InterRelatedDates AS
(
SELECT B1.RowNum AS RowNum1,B2.RowNum AS RowNum2
FROM BaseDataWithRowNum B1
INNER JOIN BaseDataWithRowNum B2
ON B1.Date BETWEEN B2.Date AND DATEADD(hh,3,B2.Date)
AND B1.RowNum < B2.RowNum
AND B1.ID != B2.ID
),
InterRelatedDatesWithinMultipleGroups AS
(
SELECT G1.RowNum1,G2.RowNum2
FROM InterRelatedDates G1
LEFT JOIN InterRelatedDates G2
ON G1.RowNum2 = G2.RowNum2
AND G1.RowNum1 != G2.RowNum1
)
SELECT BN.ID,
BN.Date,
CountExcludingOriginalGrouppingRecord +1 AS C
FROM
(
SELECT RowNum1 AS RowNum,COUNT(1) AS CountExcludingOriginalGrouppingRecord
FROM
(
-- If a row was used in only one group then it is ok. use as it is
SELECT D1.RowNum1
FROM InterRelatedDatesWithinMultipleGroups AS D1
WHERE D1.RowNum2 IS NULL
UNION ALL
-- In case a row was selected in two groups, choose the one with higher date
SELECT Min(D1.RowNum1)
FROM InterRelatedDatesWithinMultipleGroups AS D1
WHERE D1.RowNum2 IS NOT NULL
GROUP BY D1.RowNum2
) T
GROUP BY RowNum1
) T2
INNER JOIN BaseDataWithRowNum BN
ON BN.RowNum = T2.RowNum