Calculating total of days between multiple date ranges - mysql

I'm having trouble working out how to calculate the total number of days covered by several date ranges in MySQL.
I need to count the days across all ranges, counting each day only once even when the ranges overlap.
Data example:
from | to
2021/08/28 | 2021/09/29
2021/08/29 | 2021/09/01
2021/09/01 | 2021/09/01
Date ranges example and output
Dates 2021-08-28 2021-08-29 2021-08-30 2021-08-31 2021-09-01 2021-09-02 2021-09-03 2021-09-04
Range1 |--------------------|
Range2 |--------------------|
Range3 |--------------------|
Total Days: 6
Dates 2021-08-28 2021-08-29 2021-08-30 2021-08-31 2021-09-01 2021-09-02 2021-09-03 2021-09-04
Range1 |--------------------|
Range2 |--------------------------------------------|
Range3 |--------|
Total Days: 5

Possibly the simplest method is a recursive CTE:
with recursive dates as (
select `from`, `to`
from t
union all
select `from` + interval 1 day, `to`
from dates
where `from` < `to`
)
select count(distinct `from`)
from dates;
Note that from and to are really bad names for columns because they are SQL keywords.
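As a cross-check outside the database, the same idea (expand every range into individual days, then count the distinct ones) can be sketched in Python. This is only a minimal sketch and not part of the answer above: the function name total_distinct_days is made up for illustration, and the ranges are the three rows from the question's data example.

```python
from datetime import date, timedelta

def total_distinct_days(ranges):
    """Count each calendar day once, even if several ranges cover it."""
    days = set()
    for start, end in ranges:
        current = start
        while current <= end:          # both endpoints are inclusive
            days.add(current)
            current += timedelta(days=1)
    return len(days)

# The three (from, to) rows from the question's data example.
ranges = [
    (date(2021, 8, 28), date(2021, 9, 29)),
    (date(2021, 8, 29), date(2021, 9, 1)),
    (date(2021, 9, 1), date(2021, 9, 1)),
]
total = total_distinct_days(ranges)
print(total)
```

For the sample rows the second and third ranges fall entirely inside the first, so the total is simply the length of the first range.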
EDIT:
In MySQL 5.7, you can use a tally table -- a table of numbers.
Assuming your original table has enough rows for the widest time span, you can use:
select count(distinct `from` + interval (n - 1) day)
from t cross join
(select (@rn := @rn + 1) as n
from t cross join
(select @rn := 0) params
) n
on `from` + interval (n - 1) day <= `to`;
If your table is really big, you might want a limit for the widest time period.

Related

SQL get consecutive starting and end date with specific period

I have a hotel_availablities table something like this.
date | availability
2021-01-15 | y
2021-01-16 | y
2021-01-17 | y
2021-01-18 | n
2021-01-19 | n
2021-01-20 | y
2021-01-21 | n
2021-01-22 | y
2021-01-23 | y
I want to get the possible available date ranges where the period of stay is 2 days.
date range
2021-01-15 : 2021-01-16
2021-01-16 : 2021-01-17
2021-01-22 : 2021-01-23
If the period of stay were 3 days, I would get the results below.
date range
2021-01-15 : 2021-01-18
How can I achieve this result with sql?
This is a gaps and islands problem. Assuming you are using MySQL 8+, we can use the difference in row numbers method here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY date) rn1,
ROW_NUMBER() OVER (PARTITION BY availability ORDER BY date) rn2
FROM yourTable
)
SELECT MIN(date) AS start_date, MAX(date) AS end_date, COUNT(*) AS cnt
FROM cte
WHERE availability = 'y'
GROUP BY rn1 - rn2
HAVING COUNT(*) >= 2; -- but change to COUNT(*) >= 3, e.g. for three days in a row
Note that my query does not give the exact output you expect, but maybe this would be enough for your requirement. If you wanted to break out each island larger than 2 days in terms of pairs of 2 days at a time, you might have to also bring in a calendar table here.
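To see why the difference in row numbers identifies each island, here is a small Python sketch of the same bookkeeping. It is an illustration only: the function name islands and its parameters are assumptions, and the rows are the sample data from the question.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (date, availability).
rows = [
    (date(2021, 1, 15), "y"), (date(2021, 1, 16), "y"), (date(2021, 1, 17), "y"),
    (date(2021, 1, 18), "n"), (date(2021, 1, 19), "n"), (date(2021, 1, 20), "y"),
    (date(2021, 1, 21), "n"), (date(2021, 1, 22), "y"), (date(2021, 1, 23), "y"),
]

def islands(rows, flag="y", min_len=2):
    """Group consecutive rows with the given flag using rn1 - rn2."""
    per_flag = defaultdict(int)   # running row number per availability value (rn2)
    groups = defaultdict(list)    # rn1 - rn2 -> dates belonging to that island
    for rn1, (day, availability) in enumerate(sorted(rows), start=1):
        per_flag[availability] += 1
        rn2 = per_flag[availability]
        if availability == flag:
            groups[rn1 - rn2].append(day)
    # Equivalent of MIN(date), MAX(date), COUNT(*) ... HAVING COUNT(*) >= min_len
    return [(min(g), max(g), len(g)) for g in groups.values() if len(g) >= min_len]

found = islands(rows)
print(found)
```

Within a run of consecutive 'y' rows both counters grow by one per row, so their difference stays fixed; any 'n' row in between bumps only the overall counter and starts a new group.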
Assuming you have a row for each date, you can use a single window function -- and no aggregation. That window function is a count of 'y' in the current row and the next n - 1 days:
select date, date + interval <n - 1> day
from (select t.*,
sum(availability = 'y') over (order by date
rows between current row and <n - 1> following
) as num_y
from t
) t
where num_y = <n>;
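The sliding-window idea can be checked against the question's sample data with a short Python sketch. As with the query, it assumes there is exactly one row per date; the function name stay_windows is hypothetical.

```python
from datetime import date

# Sample rows from the question: (date, availability), one row per date.
rows = [
    (date(2021, 1, 15), "y"), (date(2021, 1, 16), "y"), (date(2021, 1, 17), "y"),
    (date(2021, 1, 18), "n"), (date(2021, 1, 19), "n"), (date(2021, 1, 20), "y"),
    (date(2021, 1, 21), "n"), (date(2021, 1, 22), "y"), (date(2021, 1, 23), "y"),
]

def stay_windows(rows, n):
    """Return (start, end) for every window of n consecutive available rows."""
    ordered = sorted(rows)
    result = []
    for i in range(len(ordered) - n + 1):
        window = ordered[i:i + n]               # current row and next n - 1 rows
        if all(flag == "y" for _, flag in window):
            result.append((window[0][0], window[-1][0]))
    return result

two_day_stays = stay_windows(rows, 2)
three_day_stays = stay_windows(rows, 3)
print(two_day_stays)
print(three_day_stays)
```

For a 2-day stay this returns the three ranges listed in the question. For a 3-day stay it returns the single window starting on 2021-01-15 and ending on 2021-01-17, the last available day of that run.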
You can achieve that with the query below. First number the available rows with row_number(),
then use lead() to get the next consecutive date within each island. The second argument of lead() is the offset, so it determines how many consecutive dates are considered.
WITH t AS (
SELECT date, ROW_NUMBER() OVER (ORDER BY date) AS rownumber
FROM hotel_availablities WHERE availability = 'y'
),
t2 AS (SELECT date AS StartDate, LEAD(date, 1) OVER (PARTITION BY DATE_ADD(date, INTERVAL -rownumber DAY) ORDER BY date) AS EndDate
FROM t)
SELECT CONCAT(StartDate, ' - ', EndDate) AS daterange FROM t2 WHERE EndDate IS NOT NULL;

SQL/MySQL: split a quantity value into multiple rows by date

I have a table with three columns: planning_start_date - planning_end_date - quantity.
For example I have this data:
planning_start_date | planning_end_date | quantity
2019-03-01 | 2019-03-31 | 1500
I need to split the value 1500 into multiple rows with the average per day, so 1500 / 31 days = 48,38 per day.
The expected result should be:
date | daily_qty
2019-03-01 | 48,38
2019-03-02 | 48,38
2019-03-03 | 48,38
...
2019-03-31 | 48,38
Anyone with some suggestions?
Should you decide to upgrade to MySQL 8.0, here's a recursive CTE that will generate a list of all the days between planning_start_date and planning_end_date along with the required daily quantity:
WITH RECURSIVE cte AS (
SELECT planning_start_date AS date,
planning_end_date,
quantity / (DATEDIFF(planning_end_date, planning_start_date) + 1) AS daily_qty
FROM test
UNION ALL
SELECT date + INTERVAL 1 DAY, planning_end_date, daily_qty
FROM cte
WHERE date < planning_end_date
)
SELECT `date`, daily_qty
FROM cte
ORDER BY `date`
In MySQL 8+, you can use a recursive CTE like this:
with recursive cte(dte, planning_end_date, quantity, days) as (
select planning_start_date as dte, planning_end_date, quantity, datediff(planning_end_date, planning_start_date) + 1 as days
from t
union all
select dte + interval 1 day as dte, planning_end_date, quantity, days
from cte
where dte < planning_end_date
)
select dte, quantity / days
from cte;
In earlier versions, you want a numbers table of some sort. For instance, if your table has enough rows, you can just use it:
select (planning_start_date + interval n.n day),
quantity / (datediff(planning_end_date, planning_start_date) + 1)
from t join
(select (@rn := @rn + 1) - 1 as n -- n starts at 0 so the start date itself is returned
from t cross join
(select @rn := 0) params
) n
on planning_start_date + interval n.n day <= planning_end_date;
You can use any table that is large enough for n.
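The arithmetic behind all of these queries (one row per day, each carrying quantity divided by the inclusive day count) can be sketched in Python for a quick sanity check. The function name split_quantity is made up for this illustration; the input is the single row from the question.

```python
from datetime import date, timedelta

def split_quantity(start, end, quantity):
    """Return one (date, daily_qty) pair per day from start to end inclusive."""
    days = (end - start).days + 1      # same as DATEDIFF(end, start) + 1
    daily_qty = quantity / days
    return [(start + timedelta(days=i), daily_qty) for i in range(days)]

daily_rows = split_quantity(date(2019, 3, 1), date(2019, 3, 31), 1500)
for day, qty in daily_rows[:3]:
    print(day, f"{qty:.2f}")
```

Note that 1500 / 31 is 48.387..., so it rounds to 48.39; the 48,38 shown in the question corresponds to truncating to two decimals instead.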

Find number of "active" rows each month for multiple months in one query

I have a MySQL database with each row containing an activate and a deactivate date. This refers to the period of time when the object the row represents was active.
activate deactivate id
2015-03-01 2015-05-10 1
2013-02-04 2014-08-23 2
I want to find the number of rows that were active at any time during each month. Ex.
Jan: 4
Feb: 2
Mar: 1
etc...
I figured out how to do this for a single month, but I'm struggling with how to do it for all 12 months in a year in a single query. The reason I would like it in a single query is for performance, as information is used immediately and caching wouldn't make sense in this scenario. Here's the code I have for a month at a time. It checks if the activate date comes before the end of the month in question and that the deactivate date was not before the beginning of the period in question.
SELECT * from tblName WHERE activate <= DATE_SUB(NOW(), INTERVAL 1 MONTH)
AND deactivate >= DATE_SUB(NOW(), INTERVAL 2 MONTH)
If anybody has any idea how to change this and do grouping such that I can do this for an indefinite number of months I'd appreciate it. I'm at a loss as to how to group.
If you have a table of months that you care about, you can do:
select m.*,
(select count(*)
from tblName t
where t.activate <= m.month_end and
t.deactivate >= m.month_start
) as Actives
from months m;
If you don't have such a table handy, you can create one on the fly:
select m.*,
(select count(*)
from tblName t
where t.activate <= m.month_end and
t.deactivate >= m.month_start
) as Actives
from (select date('2015-01-01') as month_start, date('2015-01-31') as month_end union all
select date('2015-02-01') as month_start, date('2015-02-28') as month_end union all
select date('2015-03-01') as month_start, date('2015-03-31') as month_end union all
select date('2015-04-01') as month_start, date('2015-04-30') as month_end
) m;
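The overlap test used in both queries (a row is active in a month if it was activated on or before the month's last day and deactivated on or after its first day) can be sketched in Python. The function name active_per_month is an assumption for illustration, and the two rows are the sample data from the question.

```python
import calendar
from datetime import date

# Sample rows from the question: (activate, deactivate).
rows = [
    (date(2015, 3, 1), date(2015, 5, 10)),
    (date(2013, 2, 4), date(2014, 8, 23)),
]

def active_per_month(rows, year, months):
    """Count rows whose active period overlaps each requested month."""
    counts = {}
    for month in months:
        month_start = date(year, month, 1)
        last_day = calendar.monthrange(year, month)[1]   # handles leap years
        month_end = date(year, month, last_day)
        counts[month] = sum(
            1 for activate, deactivate in rows
            if activate <= month_end and deactivate >= month_start
        )
    return counts

counts_2015 = active_per_month(rows, 2015, range(1, 5))
print(counts_2015)
```

Deriving the month boundaries programmatically, as calendar.monthrange does here, avoids hard-coding month lengths such as the 28 days of February.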
EDIT:
A potentially faster way is to calculate a cumulative sum of activations and deactivations and then take the maximum per month:
select year(d), month(d), max(cumes)
from (select d, (@s := @s + inc) as cumes
from (select activate as d, 1 as inc from tblName union all
select deactivate, -1 as inc from tblName
) t cross join
(select @s := 0) param
order by d
) s
group by year(d), month(d);

Calculating a Moving Average MySQL?

Good Day,
I am using the following code to calculate the 9 Day Moving average.
SELECT SUM(close)
FROM tbl
WHERE date <= '2002-07-05'
AND name_id = 2
ORDER BY date DESC
LIMIT 9
But it does not work because it first calculates all of the returned fields before the limit is called. In other words it will calculate all the closes before or equal to that date, and not just the last 9.
So I need to calculate the SUM from the returned select, rather than calculate it straight.
IE. Select the SUM from the SELECT...
Now how would I go about doing this and is it very costly or is there a better way?
If you want the moving average for each date, then try this:
SELECT date, SUM(close),
(select avg(close) from tbl t2 where t2.name_id = t.name_id and t2.date <= t.date and datediff(t.date, t2.date) < 9
) as mvgAvg
FROM tbl t
WHERE date <= '2002-07-05' and
name_id = 2
GROUP BY date
ORDER BY date DESC
It uses a correlated subquery to average the rows from the 9 calendar days ending at each date, which is 9 values only if there is exactly one row per day.
Starting from MySQL 8, you should use window functions for this. Using the window RANGE clause, you can create a logical window over an interval, which is very powerful. Something like this:
SELECT
date,
close,
AVG (close) OVER (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
FROM tbl
WHERE date <= DATE '2002-07-05'
AND name_id = 2
ORDER BY date DESC
For example:
WITH t (date, `close`) AS (
SELECT DATE '2020-01-01', 50 UNION ALL
SELECT DATE '2020-01-03', 54 UNION ALL
SELECT DATE '2020-01-05', 51 UNION ALL
SELECT DATE '2020-01-12', 49 UNION ALL
SELECT DATE '2020-01-13', 59 UNION ALL
SELECT DATE '2020-01-15', 30 UNION ALL
SELECT DATE '2020-01-17', 35 UNION ALL
SELECT DATE '2020-01-18', 39 UNION ALL
SELECT DATE '2020-01-19', 47 UNION ALL
SELECT DATE '2020-01-26', 50
)
SELECT
date,
`close`,
COUNT(*) OVER w AS c,
SUM(`close`) OVER w AS s,
AVG(`close`) OVER w AS a
FROM t
WINDOW w AS (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
ORDER BY date DESC
Leading to:
date |close|c|s |a |
----------|-----|-|---|-------|
2020-01-26| 50|1| 50|50.0000|
2020-01-19| 47|2| 97|48.5000|
2020-01-18| 39|3|136|45.3333|
2020-01-17| 35|4|171|42.7500|
2020-01-15| 30|4|151|37.7500|
2020-01-13| 59|5|210|42.0000|
2020-01-12| 49|6|259|43.1667|
2020-01-05| 51|3|159|53.0000|
2020-01-03| 54|3|154|51.3333|
2020-01-01| 50|3|155|51.6667|
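To make the frame explicit, the Python sketch below recomputes the c and s columns by hand. Because the window is ordered by date DESC, "9 DAY PRECEDING" reaches towards later dates, so each frame covers the current date and the following 9 calendar days. The function name range_window is hypothetical; the rows are the sample data from the query above.

```python
from datetime import date, timedelta

# Sample rows copied from the example above: (date, close).
rows = [
    (date(2020, 1, 1), 50), (date(2020, 1, 3), 54), (date(2020, 1, 5), 51),
    (date(2020, 1, 12), 49), (date(2020, 1, 13), 59), (date(2020, 1, 15), 30),
    (date(2020, 1, 17), 35), (date(2020, 1, 18), 39), (date(2020, 1, 19), 47),
    (date(2020, 1, 26), 50),
]

def range_window(rows, days=9):
    """Return (date, count, sum, avg) per row for the descending RANGE frame."""
    result = []
    for current, _ in sorted(rows, reverse=True):
        upper = current + timedelta(days=days)
        # Frame: every row dated from the current date up to `days` days later.
        frame = [close for other, close in rows if current <= other <= upper]
        result.append((current, len(frame), sum(frame), sum(frame) / len(frame)))
    return result

windowed = range_window(rows)
for row in windowed:
    print(row)
```

If a trailing window (the current date and the 9 days before it) is wanted instead, the same frame clause with ORDER BY date ascending would look backwards in time.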
Use something like
SELECT
sum(close) as sum,
avg(close) as average
FROM (
SELECT
(close)
FROM
tbl
WHERE
date <= '2002-07-05'
AND name_id = 2
ORDER BY
date DESC
LIMIT 9 ) temp
The inner query returns only the 9 most recent rows that match the filter, and the outer query then sums and averages them.
Your query doesn't work because the SUM is calculated over all matching rows first and the LIMIT clause is applied only afterwards, so you get the sum of every row.
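The order of operations (filter, sort descending, keep the first 9 rows, and only then aggregate) is the whole point of the derived table, and it can be sketched in Python. The function name last_n_sum_avg is made up for illustration, the name_id filter is left out for brevity, and the rows reuse the sample closing prices that appear elsewhere in this thread.

```python
from datetime import date

# Sample (date, close) rows; the name_id filter is omitted for brevity.
rows = [
    (date(2020, 1, 1), 50), (date(2020, 1, 3), 54), (date(2020, 1, 5), 51),
    (date(2020, 1, 12), 49), (date(2020, 1, 13), 59), (date(2020, 1, 15), 30),
    (date(2020, 1, 17), 35), (date(2020, 1, 18), 39), (date(2020, 1, 19), 47),
    (date(2020, 1, 26), 50),
]

def last_n_sum_avg(rows, cutoff, n=9):
    """Sum and average the close of the n most recent rows up to cutoff."""
    eligible = [row for row in rows if row[0] <= cutoff]   # WHERE date <= cutoff
    latest = sorted(eligible, reverse=True)[:n]            # ORDER BY date DESC LIMIT n
    closes = [close for _, close in latest]
    if not closes:
        return 0, None
    return sum(closes), sum(closes) / len(closes)          # aggregate last

total, average = last_n_sum_avg(rows, date(2020, 1, 26))
print(total, average)
```

With ten sample rows, only the nine most recent contribute, so the oldest row (2020-01-01) is excluded from the sum.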
Another technique is to create a numbers table:
CREATE TABLE `tinyint_asc` (
`value` tinyint(3) unsigned NOT NULL default '0',
PRIMARY KEY (value)
);
INSERT INTO `tinyint_asc` VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250),(251),(252),(253),(254),(255);
You can then use it like this:
select
date_add(tbl.date, interval tinyint_asc.value day) as mydate,
count(*),
sum(myvalue)
from tbl
inner join tinyint_asc on tinyint_asc.value <= 30 -- for a 30 day moving average
where date_add(tbl.date, interval tinyint_asc.value day) between '2016-01-01' and current_date()
group by mydate
This query is fast:
select date, name_id,
case @i when name_id then @i:=name_id else (@i:=name_id)
and (@n:=0)
and (@a0:=0) and (@a1:=0) and (@a2:=0) and (@a3:=0) and (@a4:=0) and (@a5:=0) and (@a6:=0) and (@a7:=0) and (@a8:=0)
end as a,
case @n when 9 then @n:=9 else @n:=@n+1 end as n,
@a0:=@a1,@a1:=@a2,@a2:=@a3,@a3:=@a4,@a4:=@a5,@a5:=@a6,@a6:=@a7,@a7:=@a8,@a8:=close,
(@a0+@a1+@a2+@a3+@a4+@a5+@a6+@a7+@a8)/@n as av
from tbl,
(select @i:=0, @n:=0,
@a0:=0, @a1:=0, @a2:=0, @a3:=0, @a4:=0, @a5:=0, @a6:=0, @a7:=0, @a8:=0) a
where name_id=2
order by name_id, date
If you need an average over 50 or 100 values, it's tedious to write, but
worth the effort. The speed is close to the ordered select.

Find max of continuous streak and the current streak from datetime

I have the following data of a particular user -
Table temp -
time_stamp
2015-07-19 10:52:00
2015-07-18 10:49:00
2015-07-12 10:43:00
2015-06-08 12:32:00
2015-06-07 11:33:00
2015-06-06 10:05:00
2015-06-05 04:17:00
2015-04-14 04:11:00
2014-04-02 23:19:00
So the output for the query should be -
Maximum streak = 4, Current streak = 2
Max streak = 4 because of these -
2015-06-08 12:32:00
2015-06-07 11:33:00
2015-06-06 10:05:00
2015-06-05 04:17:00
And current streak is 2 because of these (Assuming today's date is 2015-07-19)-
2015-07-19 10:52:00
2015-07-18 10:49:00
EDIT: I want a simple SQL query for MYSQL
For the maximum streak you can use the query below; I have used the same approach to calculate a max streak myself.
SELECT *
FROM (
SELECT t.*, IF(@prev + INTERVAL 1 DAY = t.d, @c := @c + 1, @c := 1) AS streak, @prev := t.d
FROM (
SELECT DATE(time_stamp) AS d, COUNT(*) AS n
FROM temp
GROUP BY DATE(time_stamp)
ORDER BY d
) AS t
INNER JOIN (SELECT @prev := NULL, @c := 1) AS vars
) AS t
ORDER BY streak DESC LIMIT 1;
A general approach to gaps and islands queries is to tag each row with its rank in the data and with its position in the full list of dates. Here days counts back from today while rnk counts forward, so their sum is the same for every row in a cluster of consecutive days.
Caveats: I don't know if this query will be efficient, since the correlated subquery runs once per row. It also assumes at most one row per user per day.
select user_id, max(time_stamp), count(*)
from (
select
t.user_id, t.time_stamp,
(
select count(*)
from T as t2
where t2.user_id = t.user_id and t2.time_stamp <= t.time_stamp
) as rnk,
datediff(current_date, t.time_stamp) as days
from T as t
) as data
group by user_id, days + rnk
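For checking the expected output (maximum streak 4, current streak 2), both values can be computed directly in a few lines of Python. This is a sketch outside MySQL, not a replacement for the queries above: the function name streaks is hypothetical, today's date is passed in explicitly, and the timestamps are the sample data from the question.

```python
from datetime import date, datetime, timedelta

# Sample time_stamp values from the question.
timestamps = [
    datetime(2015, 7, 19, 10, 52), datetime(2015, 7, 18, 10, 49),
    datetime(2015, 7, 12, 10, 43), datetime(2015, 6, 8, 12, 32),
    datetime(2015, 6, 7, 11, 33), datetime(2015, 6, 6, 10, 5),
    datetime(2015, 6, 5, 4, 17), datetime(2015, 4, 14, 4, 11),
    datetime(2014, 4, 2, 23, 19),
]

def streaks(timestamps, today):
    """Return (longest streak, current streak ending today) in days."""
    days = sorted({ts.date() for ts in timestamps})   # one entry per calendar day
    one_day = timedelta(days=1)
    longest = run = 0
    previous = None
    for day in days:
        # Extend the run if this day directly follows the previous one.
        run = run + 1 if previous is not None and day - previous == one_day else 1
        longest = max(longest, run)
        previous = day
    # Walk backwards from today while each day is present.
    present = set(days)
    current = 0
    cursor = today
    while cursor in present:
        current += 1
        cursor -= one_day
    return longest, current

longest, current = streaks(timestamps, date(2015, 7, 19))
print(longest, current)
```

Collapsing the timestamps to distinct calendar days first means several entries on the same day do not inflate a streak.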