SQL get consecutive start and end dates for a specific period - MySQL

I have a hotel_availablities table like this:
date      |availability|
----------|------------|
2021-01-15|y           |
2021-01-16|y           |
2021-01-17|y           |
2021-01-18|n           |
2021-01-19|n           |
2021-01-20|y           |
2021-01-21|n           |
2021-01-22|y           |
2021-01-23|y           |
I want to get all possible available date ranges where the period of stay is 2 days:
date range
-----------------------
2021-01-15 : 2021-01-16
2021-01-16 : 2021-01-17
2021-01-22 : 2021-01-23
If the period of stay were 3 days, I would get this result:
date range
-----------------------
2021-01-15 : 2021-01-17
How can I achieve this result with sql?

This is a gaps and islands problem. Assuming you are using MySQL 8+, we can use the difference in row numbers method here:
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY date) rn1,
           ROW_NUMBER() OVER (PARTITION BY availability ORDER BY date) rn2
    FROM yourTable
)
SELECT MIN(date) AS start_date, MAX(date) AS end_date, COUNT(*) AS cnt
FROM cte
WHERE availability = 'y'
GROUP BY rn1 - rn2
HAVING COUNT(*) >= 2;  -- change to COUNT(*) >= 3, e.g., for three days in a row
Note that my query does not give exactly the output you expect: it returns each available island (start date, end date and length) rather than every 2-day window, but that may be enough for your requirement. If you want to break each island longer than 2 days into overlapping 2-day pairs, you might also need a calendar table here.
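A minimal sketch of the difference-in-row-numbers method, using Python's built-in sqlite3 module as a stand-in for MySQL 8 (SQLite 3.25 or later supports the same ROW_NUMBER() window function). The helper names make_db and available_islands are hypothetical, and the rows are the sample data from the question.

```python
import sqlite3

# Sample data from the question.
ROWS = [
    ("2021-01-15", "y"), ("2021-01-16", "y"), ("2021-01-17", "y"),
    ("2021-01-18", "n"), ("2021-01-19", "n"), ("2021-01-20", "y"),
    ("2021-01-21", "n"), ("2021-01-22", "y"), ("2021-01-23", "y"),
]

ISLANDS_SQL = """
WITH cte AS (
    SELECT date, availability,
           ROW_NUMBER() OVER (ORDER BY date) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY availability ORDER BY date) AS rn2
    FROM hotel_availablities
)
SELECT MIN(date) AS start_date, MAX(date) AS end_date, COUNT(*) AS cnt
FROM cte
WHERE availability = 'y'
GROUP BY rn1 - rn2          -- constant inside each run of consecutive 'y' rows
HAVING COUNT(*) >= ?        -- minimum island length in days
ORDER BY start_date
"""


def make_db(rows=ROWS):
    """Build an in-memory copy of the question's table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hotel_availablities (date TEXT PRIMARY KEY, availability TEXT)")
    conn.executemany("INSERT INTO hotel_availablities VALUES (?, ?)", rows)
    return conn


def available_islands(conn, min_days):
    """Return (start_date, end_date, length) for every 'y' island of at least min_days."""
    return conn.execute(ISLANDS_SQL, (min_days,)).fetchall()
```

With min_days = 2 it returns one row per island, (2021-01-15, 2021-01-17, 3) and (2021-01-22, 2021-01-23, 2), rather than every 2-day window.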

Assuming you have a row for each date, you can use a single window function and no aggregation. That window function counts the 'y' values in the current row and the next n - 1 days:
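-- Example for a stay of n = 2 days: use n - 1 (here 1) in the interval
-- and in the frame, and n (here 2) in the WHERE clause.
select date, date + interval 1 day as end_date
from (select t.*,
             sum(availability = 'y') over (order by date
                                           rows between current row and 1 following
                                          ) as num_y
      from hotel_availablities t
     ) t
where num_y = 2;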
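The same sliding-window count, sketched with Python's sqlite3 in place of MySQL 8 under the same one-row-per-date assumption. SQLite has no `date + interval n day` syntax, so the end date uses SQLite's date(date, '+N days') modifier, and the frame offset is written into the SQL text after n is converted to an int. The functions make_db and stays are hypothetical names.

```python
import sqlite3

# Sample data from the question.
ROWS = [
    ("2021-01-15", "y"), ("2021-01-16", "y"), ("2021-01-17", "y"),
    ("2021-01-18", "n"), ("2021-01-19", "n"), ("2021-01-20", "y"),
    ("2021-01-21", "n"), ("2021-01-22", "y"), ("2021-01-23", "y"),
]


def make_db(rows=ROWS):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hotel_availablities (date TEXT PRIMARY KEY, availability TEXT)")
    conn.executemany("INSERT INTO hotel_availablities VALUES (?, ?)", rows)
    return conn


def stays(conn, n):
    """Return (start, end) for every run of n consecutive available days.

    Assumes exactly one row per calendar date, as the answer does."""
    n = int(n)  # forced to int, so it is safe to put into the SQL text below
    if n < 1:
        raise ValueError("n must be at least 1")
    sql = f"""
        SELECT date, date(date, ?) AS end_date
        FROM (SELECT date,
                     -- number of 'y' rows among this row and the next n - 1 rows
                     SUM(availability = 'y') OVER (
                         ORDER BY date
                         ROWS BETWEEN CURRENT ROW AND {n - 1} FOLLOWING
                     ) AS num_y
              FROM hotel_availablities)
        WHERE num_y = ?
        ORDER BY date
    """
    return conn.execute(sql, (f"+{n - 1} days", n)).fetchall()
```

stays(conn, 2) returns the three ranges from the question, and stays(conn, 3) returns only 2021-01-15 to 2021-01-17. The num_y = n test also discards the last rows, whose frame runs past the end of the table.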

You can achieve that with the query below. First I number the available rows with row_number(), then use lead() to get the next consecutive dates. The second parameter of lead() determines how many consecutive dates are considered: 1 gives pairs of dates (a 2-day stay); for a stay of n days use n - 1.
WITH t AS (
  SELECT date, ROW_NUMBER() OVER (ORDER BY date) rownumber
  FROM hotel_availablities
  WHERE availability = 'y'
),
t2 AS (
  -- consecutive dates share the same value of date minus rownumber
  SELECT date StartDate,
         lead(date, 1) over (partition by date_add(date, INTERVAL -rownumber day)
                             order by date) EndDate
  FROM t
)
select concat(StartDate, ' - ', EndDate) AS daterange
from t2
where EndDate is not null
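The lead() approach can be checked the same way. In this sketch (Python's sqlite3 standing in for MySQL 8; make_db and stays_via_lead are hypothetical names), SQLite's date(date, '-N days') replaces date_add(date, INTERVAL -rownumber day): consecutive available dates share the same date minus row number, so lead() within that partition only looks inside one run.

```python
import sqlite3

# Sample data from the question.
ROWS = [
    ("2021-01-15", "y"), ("2021-01-16", "y"), ("2021-01-17", "y"),
    ("2021-01-18", "n"), ("2021-01-19", "n"), ("2021-01-20", "y"),
    ("2021-01-21", "n"), ("2021-01-22", "y"), ("2021-01-23", "y"),
]


def make_db(rows=ROWS):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hotel_availablities (date TEXT PRIMARY KEY, availability TEXT)")
    conn.executemany("INSERT INTO hotel_availablities VALUES (?, ?)", rows)
    return conn


def stays_via_lead(conn, n):
    """Return (start, end) pairs spanning n consecutive available days."""
    n = int(n)  # forced to int, so it is safe to put into the SQL text below
    if n < 1:
        raise ValueError("n must be at least 1")
    sql = f"""
        WITH t AS (
            SELECT date, ROW_NUMBER() OVER (ORDER BY date) AS rownumber
            FROM hotel_availablities
            WHERE availability = 'y'
        ),
        t2 AS (
            SELECT date AS start_date,
                   -- the row n - 1 places ahead within the same run, if any
                   LEAD(date, {n - 1}) OVER (
                       PARTITION BY date(date, '-' || rownumber || ' days')
                       ORDER BY date
                   ) AS end_date
            FROM t
        )
        SELECT start_date, end_date
        FROM t2
        WHERE end_date IS NOT NULL
        ORDER BY start_date
    """
    return conn.execute(sql).fetchall()
```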

Related

How can I calculate the subtraction between rows for multiple dates and display the percentage?

I have the following information using group by and some calculations
I'm trying to calculate the maximum difference between several dates; in this example there are 3 dates (2022, 2021, 2020). The oldest date should give 0 because there is nothing to subtract from it.
After finding the maximum difference from the previous year, it must calculate the percentage:
After running the query that calculates the maximum difference between date rows, the final result should be this:
Demo with 4 dates: https://dbfiddle.uk/KF-d2KpR?hide=4
The following query returns the differences, but without the percentage:
WITH cte1 AS (
SELECT
a.date_rehearsal,
a.col1_val,
ROW_NUMBER() OVER (PARTITION BY a.date_rehearsal ORDER BY a.date_rehearsal DESC) AS rn
FROM demo a
),
cte2 AS (
SELECT
b.date_rehearsal,
b.col1_val - COALESCE(LEAD(b.col1_val) OVER (PARTITION BY b.rn
ORDER BY b.date_rehearsal DESC), b.col1_val) AS diff
FROM cte1 b)
SELECT
c.date_rehearsal AS 'Dates',
MAX(c.diff) as 'max_col1_val_difference'
FROM cte2 c
GROUP BY c.date_rehearsal
ORDER BY c.date_rehearsal DESC
Can you please help me display this with the percentage?
Thanks in advance.
You can use a CTE to get the ROW_NUMBER by date, then SELECT the MAX difference, grouped by date, from a subquery that takes the current col1_val and subtracts the subsequent row's value from it, using LEAD partitioned by the ROW_NUMBER from the CTE. If there is no subsequent row, COALESCE falls back to the current row's col1_val, so the difference is zero for the earliest year in your table (in your case, 2020). The percentage is that difference divided by the current row's col1_val, times 100.
WITH cte AS (
SELECT
a.date_rehearsal,
a.col1_val,
ROW_NUMBER() OVER (PARTITION BY a.date_rehearsal ORDER BY a.date_rehearsal DESC) AS rn
FROM demo a
)
SELECT
c.date_rehearsal AS 'Dates',
MAX(c.diff) as 'max_col1_val_difference',
ROUND(MAX(c.diffPercent),2) as 'max_col1_val_percent',
CONCAT(MAX(c.diff), ' (', ROUND(MAX(c.diffPercent),2), '%)') as 'max_dif_with_percentage'
FROM (
SELECT
b.date_rehearsal,
b.col1_val - COALESCE(LEAD(b.col1_val) OVER (PARTITION BY b.rn ORDER BY b.date_rehearsal DESC), b.col1_val) AS diff,
(((b.col1_val - COALESCE(LEAD(b.col1_val) OVER (PARTITION BY b.rn ORDER BY b.date_rehearsal DESC), b.col1_val))/b.col1_val)*100) AS diffPercent
FROM cte b) c
GROUP BY c.date_rehearsal
ORDER BY c.date_rehearsal DESC
Result:
Dates     |max_col1_val_difference|max_col1_val_percent|max_dif_with_percentage|
----------|-----------------------|--------------------|-----------------------|
2022-07-01|                      6|                5.08|6 (5.08%)              |
2021-07-01|                     10|               10.00|10 (10.00%)            |
2020-07-01|                      0|                0.00|0 (0.00%)              |
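A minimal sketch of the LEAD/COALESCE calculation plus the percentage, run through Python's sqlite3 instead of MySQL. The question's data isn't shown, so the demo rows below are hypothetical (one row per date), and max_diff_with_percentage is a made-up helper. SQLite, unlike MySQL, does integer division on two integers, which is why the percentage multiplies by 100.0 first.

```python
import sqlite3

SQL = """
WITH cte AS (
    SELECT date_rehearsal, col1_val,
           ROW_NUMBER() OVER (PARTITION BY date_rehearsal ORDER BY date_rehearsal DESC) AS rn
    FROM demo
),
diffs AS (
    SELECT date_rehearsal, col1_val,
           -- current value minus the next (older) value; 0 for the oldest date
           col1_val - COALESCE(LEAD(col1_val) OVER (PARTITION BY rn ORDER BY date_rehearsal DESC),
                               col1_val) AS diff
    FROM cte
)
SELECT date_rehearsal,
       MAX(diff) AS max_diff,
       -- 100.0 forces floating-point division in SQLite
       ROUND(MAX(100.0 * diff / col1_val), 2) AS max_pct
FROM diffs
GROUP BY date_rehearsal
ORDER BY date_rehearsal DESC
"""


def max_diff_with_percentage(rows):
    """rows: (date_rehearsal, col1_val) tuples; returns (date, diff, pct, label)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo (date_rehearsal TEXT, col1_val INTEGER)")
    conn.executemany("INSERT INTO demo VALUES (?, ?)", rows)
    result = [(d, diff, pct, f"{diff} ({pct:.2f}%)") for d, diff, pct in conn.execute(SQL)]
    conn.close()
    return result


# Hypothetical sample values, one per year.
SAMPLE = [("2022-07-01", 120), ("2021-07-01", 100), ("2020-07-01", 90)]
```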

Group a sequence of lines [SQL]

Is there a way to group a sequence of consecutive rows in SQL (MySQL 5.1.73)?
Let me explain. I have a query that returns this:
hour|start_date         |end_date           |
----|-------------------|-------------------|
  10|2022-02-01 10:11:18|2022-02-01 10:50:18|
  11|2022-02-01 11:30:31|2022-02-01 11:38:12|
  13|2022-02-17 13:55:09|2022-02-17 13:58:38|
  14|2022-02-17 14:51:09|2022-02-17 14:57:59|
And I would like to convert it to this:
hour|start_date         |end_date           |
----|-------------------|-------------------|
  10|2022-02-01 10:11:18|2022-02-01 11:38:12|
  13|2022-02-17 13:55:09|2022-02-17 14:57:59|
In other words, I would like to merge all the rows whose hours follow each other.
My query groups by hour, like this:
SELECT hour( date ) as hour, MIN(date) as start_date , MAX(date) as end_date
FROM test_tbl
GROUP BY hour( date ) , date( date )
order by date, hour( date ) ;
After running this query, I would like to merge the rows whose hours follow each other (10, 11 => 10).
EDIT: the following answer only works with MySQL version 8+
with tbl_by_hour as (
  SELECT hour(date) as hour, date(date) as day_date,
         MIN(date) as start_date, MAX(date) as end_date
  FROM test_tbl
  GROUP BY hour(date), date(date)
)
select
  min(hour) as hour,
  min(start_date) as start_date,
  max(end_date) as end_date
from (
  select tab1.*,
         -- start a new group unless the hour directly follows the previous hour on the same day
         sum(case when prev_hour = hour - 1 then 0 else 1 end) over (order by start_date) grp
  from (
    select hour, start_date, end_date,
           lag(hour) over (partition by day_date order by hour) prev_hour
    from tbl_by_hour
  ) as tab1
) as tab2
group by grp
order by min(start_date)
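A sketch of this LAG plus running-SUM pattern, using Python's sqlite3 as a stand-in for MySQL 8 and starting from the per-hour rows the question's first query produces (merge_consecutive_hours is a hypothetical helper). LAG is partitioned by calendar day, so neighbouring hour numbers on different days are never merged.

```python
import sqlite3

MERGE_SQL = """
SELECT MIN(hour) AS hour, MIN(start_date) AS start_date, MAX(end_date) AS end_date
FROM (
    SELECT hour, start_date, end_date,
           -- a new group starts unless this hour directly follows the previous
           -- hour on the same day (prev_hour is NULL on each day's first row)
           SUM(CASE WHEN prev_hour = hour - 1 THEN 0 ELSE 1 END)
               OVER (ORDER BY start_date) AS grp
    FROM (
        SELECT hour, start_date, end_date,
               LAG(hour) OVER (PARTITION BY date(start_date) ORDER BY hour) AS prev_hour
        FROM tbl_by_hour
    )
)
GROUP BY grp
ORDER BY MIN(start_date)
"""


def merge_consecutive_hours(rows):
    """rows: (hour, start_date, end_date) tuples, i.e. the per-hour query output."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_by_hour (hour INTEGER, start_date TEXT, end_date TEXT)")
    conn.executemany("INSERT INTO tbl_by_hour VALUES (?, ?, ?)", rows)
    result = conn.execute(MERGE_SQL).fetchall()
    conn.close()
    return result


# Per-hour rows from the question.
HOURLY = [
    (10, "2022-02-01 10:11:18", "2022-02-01 10:50:18"),
    (11, "2022-02-01 11:30:31", "2022-02-01 11:38:12"),
    (13, "2022-02-17 13:55:09", "2022-02-17 13:58:38"),
    (14, "2022-02-17 14:51:09", "2022-02-17 14:57:59"),
]
```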
You can probably do something like this:
SELECT MIN(hours), dates, MIN(start_date), MAX(end_date), tn
FROM
  (SELECT *,
          CEIL(rownum/5) AS tn
   FROM
     (SELECT *,
             CASE WHEN dates=@dt
                   AND hours=@hr+1
                  THEN @rn := @rn+1
                  WHEN dates=@dt
                   AND hours > @hr+1
                  THEN @rn := @rn+20
                  ELSE @rn := 1
             END AS rownum,
             @dt := dates,
             @hr := hours
      FROM
        (SELECT hour(date) as hours, date(date) dates,
                MIN(date) as start_date, MAX(date) as end_date
         FROM test_tbl t
         GROUP BY dates, hours) v
      CROSS JOIN (SELECT @rn := 0, @dt := NULL, @hr := 0) r
      ORDER BY dates, hours) s
  ) w
GROUP BY dates, tn;
I took your original query as the base and turned it into a subquery.
Then I CROSS JOIN it with a subquery of user variables, which I use to generate a custom row numbering. The row number follows these rules:
If the row is on the same date and its hour is exactly 1 more than the previous hour, continue the numbering (+1).
If the row is on the same date and its hour is more than 1 above the previous hour, take the last number and add 20.
If the date changes, restart the numbering at 1.
After generating the row numbers, I wrap that in another subquery, divide the row number by 5 and apply the ceiling function (CEIL) so that rows in the same run get the same value, effectively treating (assuming) rows with the same CEIL(rownum/5) result as one group. This is the part I find less convincing, but it works.
Finally, I wrap that in a subquery again and select MIN(hours), dates, MIN(start_date), MAX(end_date), tn with GROUP BY dates, tn.
It's not a convincing solution because the final step (generating the tn column) relies on a trick rather than a guarantee: for example, a run of more than five consecutive hours is always split, because at most five consecutive row numbers share the same CEIL(rownum/5) value. I usually prefer a solution that covers all possible scenarios concretely. However, I tested the current query extensively with more varied data, and so far it returns good results. Also, you said your MySQL version is 5.1, so I'm not sure this particular approach will work there; 5.5 is probably the oldest MySQL version available in online fiddles.

A query for getting results separated by a date gap

ID|TIMESTAMP          |
--|-------------------|
 1|2020-01-01 12:00:00|
 2|2020-02-01 12:00:00|
 3|2020-05-01 12:00:00|
 4|2020-06-01 12:00:00|
 5|2020-07-01 12:00:00|
I am looking for a way to get records in a MySQL database that are within a certain range of each other. In the above example, notice that there is a month between the first two records, then a three month gap, before we see another three records with a month between.
How can I group these into two result sets, so that I get IDs 1, 2 and 3, 4, 5? A solution based on days would probably work best, as that's easier to modify.
You can use lag() and then logic to see where a gap is big enough to start a new set of records. A cumulative sum gives you the groups you want:
select t.*,
       sum(case when prev_timestamp >= timestamp - interval 1 month then 0 else 1 end)
           over (order by timestamp) as grp  -- e.g. use interval 31 day to work in days
from (select t.*,
             lag(timestamp) over (order by timestamp) as prev_timestamp
      from t
     ) t;
If you want to summarize this with a start and end date:
select min(timestamp), max(timestamp)
from (select t.*,
             sum(case when prev_timestamp >= timestamp - interval 1 month then 0 else 1 end)
                 over (order by timestamp) as grp
      from (select t.*,
                   lag(timestamp) over (order by timestamp) as prev_timestamp
            from t
           ) t
     ) t
group by grp;
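A sketch of the lag() plus cumulative-sum approach, using Python's sqlite3 as a stand-in for MySQL 8. Since the question prefers a threshold in days, the gap is measured with SQLite's julianday() and compared with a max_gap_days parameter instead of interval 1 month; group_by_gap and max_gap_days are hypothetical names.

```python
import sqlite3

GAP_SQL = """
SELECT id,
       -- start a new group when the gap to the previous row exceeds max_gap_days
       -- (prev_ts is NULL on the first row, so it always starts a group)
       SUM(CASE WHEN julianday(ts) - julianday(prev_ts) <= ? THEN 0 ELSE 1 END)
           OVER (ORDER BY ts) AS grp
FROM (SELECT id, ts, LAG(ts) OVER (ORDER BY ts) AS prev_ts FROM t)
ORDER BY ts
"""


def group_by_gap(rows, max_gap_days):
    """rows: (id, timestamp) tuples; returns lists of ids, one list per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, ts TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    groups = {}
    for row_id, grp in conn.execute(GAP_SQL, (max_gap_days,)):
        groups.setdefault(grp, []).append(row_id)
    conn.close()
    return list(groups.values())


# Sample rows from the question.
ROWS = [
    (1, "2020-01-01 12:00:00"),
    (2, "2020-02-01 12:00:00"),
    (3, "2020-05-01 12:00:00"),
    (4, "2020-06-01 12:00:00"),
    (5, "2020-07-01 12:00:00"),
]
```

With max_gap_days = 31 this yields [[1, 2], [3, 4, 5]]; lowering the threshold to 30 splits the January to February and May to June steps as well.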
Another option is a self-join plus a user variable, for example (table work1 with columns ID and TS):
select group_concat(ID)
from (
  select w1.ID, w1.TS, w2.ID flag
  from work1 w1
  left outer join work1 w2
    on timestampdiff(month, w2.TS, w1.TS) = 1
  order by w1.ID
) w
group by
  case when flag is null then @str := ID else @str end

Find number of "active" rows each month for multiple months in one query

I have a mySQL database with each row containing an activate and a deactivate date. This refers to the period of time when the object the row represents was active.
activate deactivate id
2015-03-01 2015-05-10 1
2013-02-04 2014-08-23 2
I want to find the number of rows that were active at any time during each month. Ex.
Jan: 4
Feb: 2
Mar: 1
etc...
I figured out how to do this for a single month, but I'm struggling with how to do it for all 12 months in a year in a single query. The reason I would like it in a single query is for performance, as information is used immediately and caching wouldn't make sense in this scenario. Here's the code I have for a month at a time. It checks if the activate date comes before the end of the month in question and that the deactivate date was not before the beginning of the period in question.
SELECT * from tblName WHERE activate <= DATE_SUB(NOW(), INTERVAL 1 MONTH)
AND deactivate >= DATE_SUB(NOW(), INTERVAL 2 MONTH)
If anybody has any idea how to change this and do grouping such that I can do this for an indefinite number of months I'd appreciate it. I'm at a loss as to how to group.
If you have a table of months that you care about, you can do:
select m.*,
       (select count(*)
        from tblName t
        where t.activate <= m.month_end and
              t.deactivate >= m.month_start
       ) as Actives
from months m;
If you don't have such a table handy, you can create one on the fly:
select m.*,
       (select count(*)
        from tblName t
        where t.activate <= m.month_end and
              t.deactivate >= m.month_start
       ) as Actives
from (select date('2015-01-01') as month_start, date('2015-01-31') as month_end union all
      select date('2015-02-01') as month_start, date('2015-02-28') as month_end union all
      select date('2015-03-01') as month_start, date('2015-03-31') as month_end union all
      select date('2015-04-01') as month_start, date('2015-04-30') as month_end
     ) m;
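Instead of listing the months with UNION ALL, a recursive CTE can generate them (MySQL 8 supports WITH RECURSIVE too). This is a swapped-in variation of the answer, sketched with Python's sqlite3 in place of MySQL and the two sample rows from the question; active_per_month is a hypothetical name, and the month end is computed as the first of the next month minus one day.

```python
import sqlite3

ACTIVE_SQL = """
WITH RECURSIVE months(i, month_start) AS (
    SELECT 0, date(:start)
    UNION ALL
    SELECT i + 1, date(month_start, '+1 month') FROM months WHERE i + 1 < :n
)
SELECT m.month_start,
       (SELECT COUNT(*)
        FROM tblName t
        WHERE t.activate <= date(m.month_start, '+1 month', '-1 day')  -- month end
          AND t.deactivate >= m.month_start) AS actives
FROM months m
ORDER BY m.month_start
"""


def active_per_month(rows, start, n_months):
    """rows: (activate, deactivate, id) tuples; start must be the first day of a month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblName (activate TEXT, deactivate TEXT, id INTEGER)")
    conn.executemany("INSERT INTO tblName VALUES (?, ?, ?)", rows)
    result = conn.execute(ACTIVE_SQL, {"start": start, "n": n_months}).fetchall()
    conn.close()
    return result


# Sample rows from the question.
ROWS = [("2015-03-01", "2015-05-10", 1), ("2013-02-04", "2014-08-23", 2)]
```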
EDIT:
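A potentially faster way is to calculate a cumulative sum of activations (+1) and deactivations (-1) and then take the maximum per month. Note that this gives the peak number of rows active at the same time in each month, which can be lower than the number of rows active at some point during the month: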
select year(d), month(d), max(cumes)
from (select d, (@s := @s + inc) as cumes
      from (select activate as d, 1 as inc from tblName union all
            select deactivate, -1 as inc from tblName
           ) t cross join
           (select @s := 0) param
      order by d
     ) s
group by year(d), month(d);

Calculating a moving average in MySQL?

Good Day,
I am using the following code to calculate the 9-day moving average.
SELECT SUM(close)
FROM tbl
WHERE date <= '2002-07-05'
AND name_id = 2
ORDER BY date DESC
LIMIT 9
But it does not work because the SUM is computed over all matching rows before LIMIT is applied. In other words, it sums all the closes on or before that date, not just the last 9.
So I need to calculate the SUM over the rows returned by the SELECT, rather than directly.
I.e., select the SUM from the SELECT...
Now how would I go about doing this and is it very costly or is there a better way?
If you want the moving average for each date, then try this:
SELECT date, SUM(close),
       (select avg(close)
        from tbl t2
        where t2.name_id = t.name_id
          and datediff(t.date, t2.date) between 0 and 8
       ) as mvgAvg
FROM tbl t
WHERE date <= '2002-07-05' and
      name_id = 2
GROUP BY date
ORDER BY date DESC
It uses a correlated subquery to calculate the average over the 9 days ending at each date. Note that these are 9 calendar days, so the window holds fewer than 9 values if some dates have no rows.
Starting from MySQL 8, you should use window functions for this. Using the window RANGE clause, you can create a logical window over an interval (here the current date and the 9 days before it), which is very powerful. Order the window by date ascending, so that PRECEDING refers to earlier dates (with ORDER BY date DESC inside the window, PRECEDING would mean later dates); the final ORDER BY date DESC only controls the output order. Something like this:
SELECT
  date,
  close,
  AVG(close) OVER (ORDER BY date RANGE INTERVAL 9 DAY PRECEDING) AS mvg_avg
FROM tbl
WHERE date <= DATE '2002-07-05'
  AND name_id = 2
ORDER BY date DESC
For example:
WITH t (date, `close`) AS (
  SELECT DATE '2020-01-01', 50 UNION ALL
  SELECT DATE '2020-01-03', 54 UNION ALL
  SELECT DATE '2020-01-05', 51 UNION ALL
  SELECT DATE '2020-01-12', 49 UNION ALL
  SELECT DATE '2020-01-13', 59 UNION ALL
  SELECT DATE '2020-01-15', 30 UNION ALL
  SELECT DATE '2020-01-17', 35 UNION ALL
  SELECT DATE '2020-01-18', 39 UNION ALL
  SELECT DATE '2020-01-19', 47 UNION ALL
  SELECT DATE '2020-01-26', 50
)
SELECT
  date,
  `close`,
  COUNT(*) OVER w AS c,
  SUM(`close`) OVER w AS s,
  AVG(`close`) OVER w AS a
FROM t
WINDOW w AS (ORDER BY date RANGE INTERVAL 9 DAY PRECEDING)
ORDER BY date DESC
Leading to:
date      |close|c|s  |a      |
----------|-----|-|---|-------|
2020-01-26|   50|4|171|42.7500|
2020-01-19|   47|6|259|43.1667|
2020-01-18|   39|5|212|42.4000|
2020-01-17|   35|4|173|43.2500|
2020-01-15|   30|3|138|46.0000|
2020-01-13|   59|3|159|53.0000|
2020-01-12|   49|3|154|51.3333|
2020-01-05|   51|3|155|51.6667|
2020-01-03|   54|2|104|52.0000|
2020-01-01|   50|1| 50|50.0000|
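If you want exactly the last 9 values, as the question's LIMIT 9 implies, rather than a calendar interval, a ROWS frame does that. A minimal sketch using Python's sqlite3 as a stand-in for MySQL 8 (ROWS BETWEEN 8 PRECEDING AND CURRENT ROW is written the same way in both) and the sample data from the answer; moving_average is a hypothetical name.

```python
import sqlite3

# Sample data from the answer above.
SAMPLE = [
    ("2020-01-01", 50), ("2020-01-03", 54), ("2020-01-05", 51),
    ("2020-01-12", 49), ("2020-01-13", 59), ("2020-01-15", 30),
    ("2020-01-17", 35), ("2020-01-18", 39), ("2020-01-19", 47),
    ("2020-01-26", 50),
]


def moving_average(rows, k=9):
    """Average of the current row and up to k - 1 earlier rows, ordered by date."""
    k = int(k)  # forced to int, so it is safe to put into the frame clause
    if k < 1:
        raise ValueError("k must be at least 1")
    frame = f"ORDER BY date ROWS BETWEEN {k - 1} PRECEDING AND CURRENT ROW"
    sql = f"""
        SELECT date,
               COUNT(*)   OVER ({frame}) AS c,
               AVG(close) OVER ({frame}) AS a
        FROM t
        ORDER BY date
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (date TEXT PRIMARY KEY, close INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result
```

Unlike the RANGE window, the count here grows to 9 and stays there, no matter how far apart the dates are.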
Use something like this:
SELECT
  SUM(close) AS sum,
  AVG(close) AS average
FROM (
  SELECT close
  FROM tbl
  WHERE date <= '2002-07-05'
    AND name_id = 2
  ORDER BY date DESC
  LIMIT 9
) temp
The inner query returns the 9 most recent matching rows (in descending date order), and the outer query then sums and averages only those rows.
Your query doesn't work because the SUM is calculated first and LIMIT is applied only after the sum has already been computed, so you get the sum of all matching rows.
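A quick way to see the difference between the two queries, sketched with Python's sqlite3 in place of MySQL and the sample closes used in the window-function answer (make_db and last_n_sum_avg are hypothetical helpers).

```python
import sqlite3

# Sample closes from the window-function answer.
SAMPLE = [
    ("2020-01-01", 50), ("2020-01-03", 54), ("2020-01-05", 51),
    ("2020-01-12", 49), ("2020-01-13", 59), ("2020-01-15", 30),
    ("2020-01-17", 35), ("2020-01-18", 39), ("2020-01-19", 47),
    ("2020-01-26", 50),
]


def make_db(rows=SAMPLE):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (date TEXT PRIMARY KEY, close INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return conn


def last_n_sum_avg(conn, cutoff, n=9):
    # The question's shape: SUM collapses every matching row into a single
    # result row first, so LIMIT has nothing left to trim.
    naive = conn.execute(
        "SELECT SUM(close) FROM t WHERE date <= ? ORDER BY date DESC LIMIT ?",
        (cutoff, n),
    ).fetchone()[0]
    # The answer's shape: LIMIT runs inside the derived table, then aggregate.
    fixed = conn.execute(
        "SELECT SUM(close), AVG(close) FROM "
        "(SELECT close FROM t WHERE date <= ? ORDER BY date DESC LIMIT ?)",
        (cutoff, n),
    ).fetchone()
    return naive, fixed
```

With the cutoff 2020-01-26, the first query returns the total of all 10 rows (464), while the derived-table version returns the sum and average of the 9 most recent rows (414 and 46.0).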
Another technique is to create a numbers table:
CREATE TABLE `tinyint_asc` (
`value` tinyint(3) unsigned NOT NULL default '0',
PRIMARY KEY (value)
) ;
INSERT INTO `tinyint_asc` VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250),(251),(252),(253),(254),(255);
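Then you can use it like this:
select
  date(date_add(tbl.date, interval tinyint_asc.value day)) as mydate,
  count(*),
  sum(myvalue)  -- myvalue: the column to average
from tbl
inner join tinyint_asc
  on tinyint_asc.value < 30  -- for a 30-day moving average (values 0..29)
where date(date_add(tbl.date, interval tinyint_asc.value day)) between '2016-01-01' and current_date()
group by mydate
Each row is repeated for the 30 days starting at its own date, so grouping by mydate collects the rows from the 30 days ending at mydate.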
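A sketch of the numbers-table fan-out, with Python's sqlite3 standing in for MySQL: each row is repeated for the window_days days starting at its own date, so grouping by the shifted date collects the rows from the window_days days ending there. The helper name fanout_moving_sum, its parameters and the sample rows are hypothetical; SQLite's date(date, '+N days') replaces date_add.

```python
import sqlite3

FANOUT_SQL = """
SELECT date(tbl.date, '+' || numbers.value || ' days') AS mydate,
       COUNT(*)     AS n_rows,
       SUM(myvalue) AS total
FROM tbl
JOIN numbers ON numbers.value < :window_days   -- each row feeds window_days days
WHERE date(tbl.date, '+' || numbers.value || ' days') BETWEEN :start_date AND :end_date
GROUP BY mydate
ORDER BY mydate
"""


def fanout_moving_sum(rows, window_days, start_date, end_date):
    """rows: (date, myvalue) tuples; window_days must be between 1 and 256."""
    if not 1 <= window_days <= 256:
        raise ValueError("window_days must be between 1 and 256")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (date TEXT, myvalue INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    # Numbers table 0..255, like the answer's tinyint_asc table.
    conn.execute("CREATE TABLE numbers (value INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(256)])
    params = {"window_days": window_days, "start_date": start_date, "end_date": end_date}
    result = conn.execute(FANOUT_SQL, params).fetchall()
    conn.close()
    return result
```

Dividing total by n_rows gives the average of the rows inside each window; dividing by window_days instead treats missing days as zero.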
This query is fast:
select date, name_id,
       case @i when name_id then @i := name_id else (@i := name_id)
         and (@n := 0)
         and (@a0 := 0) and (@a1 := 0) and (@a2 := 0) and (@a3 := 0) and (@a4 := 0) and (@a5 := 0) and (@a6 := 0) and (@a7 := 0) and (@a8 := 0)
       end as a,
       case @n when 9 then @n := 9 else @n := @n + 1 end as n,
       @a0 := @a1, @a1 := @a2, @a2 := @a3, @a3 := @a4, @a4 := @a5, @a5 := @a6, @a6 := @a7, @a7 := @a8, @a8 := close,
       (@a0 + @a1 + @a2 + @a3 + @a4 + @a5 + @a6 + @a7 + @a8) / @n as av
from tbl,
     (select @i := 0, @n := 0,
             @a0 := 0, @a1 := 0, @a2 := 0, @a3 := 0, @a4 := 0, @a5 := 0, @a6 := 0, @a7 := 0, @a8 := 0) a
where name_id = 2
order by name_id, date
If you need an average over 50 or 100 values, it's tedious to write, but worth the effort. The speed is close to that of the plain ordered SELECT.
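The user-variable query is essentially a fixed-size ring buffer per name_id. The plain-Python sketch below (not SQL, only to check the arithmetic; running_moving_averages is a hypothetical name) keeps the same state, with collections.deque(maxlen=9) playing the role of @a0 to @a8 and len(window) the role of @n.

```python
from collections import deque


def running_moving_averages(rows, k=9):
    """rows: (name_id, date, close) tuples sorted by name_id, then date.

    Keeps the last k closes per name_id, resets when name_id changes,
    and divides by how many values are currently held."""
    out = []
    current = object()  # sentinel that never equals a real name_id
    window = deque(maxlen=k)  # the @a0..@a8 slots
    for name_id, date, close in rows:
        if name_id != current:
            current = name_id
            window.clear()
        window.append(close)  # the oldest value drops out automatically
        out.append((name_id, date, sum(window) / len(window)))
    return out
```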