A query for getting results separated by a date gap - mysql

ID | TIMESTAMP
1 | 2020-01-01 12:00:00
2 | 2020-02-01 12:00:00
3 | 2020-05-01 12:00:00
4 | 2020-06-01 12:00:00
5 | 2020-07-01 12:00:00
I am looking for a way to get records in a MySQL database that are within a certain range of each other. In the above example, notice that there is a month between the first two records, then a three-month gap, before we see another three records a month apart.
What is a way to group these into two result sets, so that I get IDs 1, 2 and 3, 4, 5? A solution using days would probably work best, as that's easier to modify.

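You can use lag() (MySQL 8.0+) and then logic to see where a gap is big enough to start a new set of records. A cumulative sum gives you the groups you want:
select t.*,
sum(case when prev_timestamp >= timestamp - interval 1 month then 0 else 1 end) over (order by timestamp) as grp
from (select t.*,
lag(timestamp) over (order by timestamp) as prev_timestamp
from t
) t;
The alias is grp rather than grouping because GROUPING is a reserved word in MySQL 8.0. To work in days instead, replace interval 1 month with something like interval 31 day.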
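The LAG() plus running-SUM() idea can be mimicked outside the database to check which rows land in which group. A minimal Python sketch, assuming rows are (id, datetime) pairs and using a day-based threshold `max_gap_days` (a hypothetical parameter, not part of the query above) in place of MySQL's month interval:

```python
from datetime import datetime, timedelta


def assign_groups(rows, max_gap_days=31):
    """Return (id, timestamp, group) tuples, mimicking LAG() plus a running SUM()."""
    result = []
    group = 0
    prev_ts = None
    for row_id, ts in sorted(rows, key=lambda r: r[1]):  # ORDER BY timestamp
        # CASE WHEN prev_timestamp >= timestamp - gap THEN 0 ELSE 1 END:
        # the first row has no predecessor (LAG() gives NULL), so it opens group 1.
        if prev_ts is None or prev_ts < ts - timedelta(days=max_gap_days):
            group += 1
        result.append((row_id, ts, group))
        prev_ts = ts
    return result


rows = [
    (1, datetime(2020, 1, 1, 12)),
    (2, datetime(2020, 2, 1, 12)),
    (3, datetime(2020, 5, 1, 12)),
    (4, datetime(2020, 6, 1, 12)),
    (5, datetime(2020, 7, 1, 12)),
]

for row_id, ts, group in assign_groups(rows):
    print(row_id, ts, group)
```

With the question's five rows and the default of 31 days, IDs 1 and 2 end up in group 1 and IDs 3, 4 and 5 in group 2.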
If you want to summarize this with a start and end date:
select min(timestamp), max(timestamp)
from (select t.*,
sum(case when prev_timestamp >= timestamp - interval 1 month then 0 else 1 end) over (order by timestamp) as grp
from (select t.*,
lag(timestamp) over (order by timestamp) as prev_timestamp
from t
) t
) t
group by grp;
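Collapsing each group to its first and last date is then a plain group-by. A Python sketch of the same two steps, assuming a list of `date` values and the same kind of hypothetical `max_gap_days` threshold:

```python
from datetime import date, timedelta
from itertools import groupby


def summarize_islands(dates, max_gap_days=31):
    """Return (start, end) per island, like GROUP BY grp with MIN()/MAX()."""
    keyed = []
    group = 0
    prev = None
    for d in sorted(dates):
        # Same rule as the running sum: a big enough gap starts a new group.
        if prev is None or prev < d - timedelta(days=max_gap_days):
            group += 1
        keyed.append((group, d))
        prev = d
    summary = []
    for _, members in groupby(keyed, key=lambda pair: pair[0]):
        days = [d for _, d in members]
        summary.append((min(days), max(days)))
    return summary


dates = [date(2020, 1, 1), date(2020, 2, 1), date(2020, 5, 1),
         date(2020, 6, 1), date(2020, 7, 1)]
print(summarize_islands(dates))
```

For the question's dates this yields two islands: 2020-01-01 to 2020-02-01 and 2020-05-01 to 2020-07-01.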

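Without window functions, you can use a self-join to flag each row that has another row one whole month before it, and carry a group ID forward in a user variable (here the table is work1 and the timestamp column is TS):
select group_concat(ID)
from (
select w1.ID, w1.TS, w2.ID flag
from work1 w1 left outer join work1 w2
on timestampdiff(month, w2.TS, w1.TS) = 1
order by w1.ID
) w
group by
case when flag is null then @str := ID else @str end
This relies on MySQL evaluating the variable assignment row by row in ID order, which is not guaranteed, and setting user variables inside expressions is deprecated as of MySQL 8.0.22.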

Related

SQL get consecutive starting and end date with specific period

I have a hotel_availablities table, something like this:
date | availability
2021-01-15 | y
2021-01-16 | y
2021-01-17 | y
2021-01-18 | n
2021-01-19 | n
2021-01-20 | y
2021-01-21 | n
2021-01-22 | y
2021-01-23 | y
I want to get the possible available date ranges where the period of stay is 2 days:
date range
2021-01-15 : 2021-01-16
2021-01-16 : 2021-01-17
2021-01-22 : 2021-01-23
If the period of stay were 3 days, I would get the results below:
date range
2021-01-15 : 2021-01-18
How can I achieve this result with sql?
This is a gaps and islands problem. Assuming you are using MySQL 8+, we can use the difference in row numbers method here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY date) rn1,
ROW_NUMBER() OVER (PARTITION BY availability ORDER BY date) rn2
FROM yourTable
)
SELECT MIN(date) AS start_date, MAX(date) AS end_date, COUNT(*) AS cnt
FROM cte
WHERE availability = 'y'
GROUP BY rn1 - rn2
HAVING COUNT(*) >= 2; -- but change to COUNT(*) >= 3, e.g. for three days in a row
Note that my query does not give the exact output you expect, but maybe this would be enough for your requirement. If you wanted to break out each island larger than 2 days in terms of pairs of 2 days at a time, you might have to also bring in a calendar table here.
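The difference-of-row-numbers trick works because, inside a run of identical values, both row numbers grow by one per row, so their difference stays constant and changes from run to run. A Python sketch of the same logic over (date, availability) pairs; `available_islands` and `min_len` are illustrative names, not from the answer:

```python
from collections import defaultdict
from datetime import date


def available_islands(rows, min_len=2):
    """Runs of 'y' found via ROW_NUMBER() differences, filtered like HAVING COUNT(*) >= min_len."""
    seen_per_value = defaultdict(int)
    islands = defaultdict(list)
    for rn1, (d, avail) in enumerate(sorted(rows), start=1):  # ROW_NUMBER() OVER (ORDER BY date)
        seen_per_value[avail] += 1
        rn2 = seen_per_value[avail]       # ROW_NUMBER() OVER (PARTITION BY availability ORDER BY date)
        if avail == 'y':                  # WHERE availability = 'y', applied after numbering
            islands[rn1 - rn2].append(d)  # rn1 - rn2 is constant within one run
    return sorted((min(ds), max(ds), len(ds))
                  for ds in islands.values() if len(ds) >= min_len)


availability = [
    (date(2021, 1, 15), 'y'), (date(2021, 1, 16), 'y'), (date(2021, 1, 17), 'y'),
    (date(2021, 1, 18), 'n'), (date(2021, 1, 19), 'n'), (date(2021, 1, 20), 'y'),
    (date(2021, 1, 21), 'n'), (date(2021, 1, 22), 'y'), (date(2021, 1, 23), 'y'),
]
print(available_islands(availability))
```

With the sample data it returns the 3-day run starting 2021-01-15 and the 2-day run starting 2021-01-22, matching HAVING COUNT(*) >= 2.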
Assuming you have a row for each date, you can use a single window function and no aggregation. That window function counts the 'y' values in the current row and the next n - 1 rows. For a stay of n = 2 days:
select date, date + interval 1 day as end_date -- n - 1 days
from (select t.*,
sum(availability = 'y') over (order by date
rows between current row and 1 following -- n - 1 following
) as num_y
from t
) t
where num_y = 2; -- n
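The frame ROWS BETWEEN CURRENT ROW AND n - 1 FOLLOWING is a sliding window over the ordered rows. A Python sketch assuming exactly one row per date, as the answer does (`stay_ranges` is a hypothetical name):

```python
from datetime import date, timedelta


def stay_ranges(rows, n):
    """Dates whose frame 'ROWS BETWEEN CURRENT ROW AND n - 1 FOLLOWING' is all 'y'."""
    ordered = sorted(rows)  # ORDER BY date; assumes exactly one row per date
    result = []
    for i, (d, _) in enumerate(ordered):
        frame = ordered[i:i + n]  # shorter near the end of the table, like the SQL frame
        num_y = sum(1 for _, avail in frame if avail == 'y')
        if num_y == n:
            result.append((d, d + timedelta(days=n - 1)))
    return result


availability = [
    (date(2021, 1, 15), 'y'), (date(2021, 1, 16), 'y'), (date(2021, 1, 17), 'y'),
    (date(2021, 1, 18), 'n'), (date(2021, 1, 19), 'n'), (date(2021, 1, 20), 'y'),
    (date(2021, 1, 21), 'n'), (date(2021, 1, 22), 'y'), (date(2021, 1, 23), 'y'),
]
print(stay_ranges(availability, 2))
```

For n = 2 this reproduces the three ranges the question expects. For n = 3 it returns 2021-01-15 to 2021-01-17 (the last night of the stay); the question's 3-day example ends on 2021-01-18, so adjust the added days to the end-date convention you want.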
You can achieve this with the query below. First, number the available rows with ROW_NUMBER(); consecutive dates then share the same value of date minus row number, which serves as the partition. Then use LEAD() to get the date further along in the same run. The second argument of LEAD() determines how many consecutive dates are considered: 1 for a 2-day stay, 2 for a 3-day stay, and so on.
WITH t AS (
SELECT date, ROW_NUMBER() OVER (ORDER BY date) rownumber
FROM hotel_availablities
WHERE availability = 'y'
),
t2 AS (
SELECT date StartDate,
LEAD(date, 1) OVER (PARTITION BY DATE_SUB(date, INTERVAL rownumber DAY) ORDER BY date) EndDate
FROM t
)
SELECT CONCAT(StartDate, ' - ', EndDate) daterange
FROM t2
WHERE EndDate IS NOT NULL;
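The partition key "date minus row number" is constant across consecutive dates, and LEAD(date, n - 1) looks n - 1 rows ahead inside that run. A Python sketch over the list of available dates only (`ranges_via_lead` is a hypothetical name):

```python
from datetime import date, timedelta
from itertools import groupby


def ranges_via_lead(available_dates, n):
    """Pair each date with the date n - 1 rows later inside the same run of consecutive dates."""
    ordered = sorted(available_dates)
    # DATE_SUB(date, INTERVAL rownumber DAY): constant while the dates are consecutive.
    keyed = [(d - timedelta(days=rn), d) for rn, d in enumerate(ordered, start=1)]
    result = []
    for _, members in groupby(keyed, key=lambda pair: pair[0]):  # PARTITION BY that key
        days = [d for _, d in members]
        # LEAD(date, n - 1) is NULL for the last n - 1 dates of a run; skip those.
        for i in range(len(days) - (n - 1)):
            result.append((days[i], days[i + n - 1]))
    return result


available = [date(2021, 1, d) for d in (15, 16, 17, 20, 22, 23)]
print(ranges_via_lead(available, 2))
```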

MySQL Date Range Multi Column Count and Group By Select Statement

I have a query I have been working on for a while, but I cannot seem to get it right. The other answers here work well for counting rows within a certain date range and then grouping by date to get the count. However, I need two columns counted and grouped by date.
For example here is the query I have tried to get to work:
(SELECT COUNT(*) arrived, DATE(arrived) date, 'arrived' AS source
FROM products
WHERE arrived BETWEEN '2016-01-01' AND '2016-01-31'
GROUP BY DATE(date)
ORDER BY date ASC)
UNION ALL
(SELECT COUNT(*) released, DATE(released) date, 'released' AS source
FROM products
WHERE released BETWEEN '2016-01-01' AND '2016-01-31'
GROUP BY DATE(date)
ORDER BY date ASC)
However this returns the following:
arrived | date | source
3 | 2016-01-12 | arrived
2 | 2016-01-28 | arrived
1 | 2016-01-29 | arrived
1 | 2016-01-05 | released
What I am requiring is something like this:
date | arrived | released
2016-01-05 | 0 | 1
2016-01-12 | 3 | 0
2016-01-28 | 2 | 0
2016-01-29 | 1 | 0
Any suggestions? Thank you.
You can apply conditional aggregation to a derived table obtained by a UNION ALL of the 'arrived' and 'released' dates. DATE() strips any time part so that rows from the same day are grouped together, and the half-open range includes all of January 31 if the columns are DATETIME:
SELECT `date`,
COUNT(CASE WHEN type = 'arrived' THEN 1 END) AS arrived,
COUNT(CASE WHEN type = 'released' THEN 1 END) AS released
FROM (
SELECT DATE(arrived) AS `date`, 'arrived' AS type
FROM products
WHERE arrived >= '2016-01-01' AND arrived < '2016-02-01'
UNION ALL
SELECT DATE(released) AS `date`, 'released' AS type
FROM products
WHERE released >= '2016-01-01' AND released < '2016-02-01') AS t
GROUP BY `date`
ORDER BY `date`;
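The UNION ALL turns each product into up to two (date, type) events, and the conditional COUNT()s pivot them into columns. A Python sketch of that shape; the product rows below are hypothetical, chosen only so that the result matches the question's desired table:

```python
from collections import Counter
from datetime import date


def daily_counts(products, start, end):
    """Per-day arrived/released counts, like UNION ALL plus conditional COUNT()."""
    events = []  # the derived table t: one (date, type) row per event
    for arrived, released in products:
        if arrived is not None and start <= arrived <= end:
            events.append((arrived, 'arrived'))
        if released is not None and start <= released <= end:
            events.append((released, 'released'))
    counts = Counter(events)
    days = sorted({d for d, _ in events})  # GROUP BY date ORDER BY date
    # Counter returns 0 for missing keys, like COUNT(CASE ... END) with no matches.
    return [(d, counts[(d, 'arrived')], counts[(d, 'released')]) for d in days]


# Hypothetical (arrived, released) rows.
products = [
    (date(2015, 12, 20), date(2016, 1, 5)),
    (date(2016, 1, 12), None), (date(2016, 1, 12), None), (date(2016, 1, 12), None),
    (date(2016, 1, 28), None), (date(2016, 1, 28), None),
    (date(2016, 1, 29), None),
]
for row in daily_counts(products, date(2016, 1, 1), date(2016, 1, 31)):
    print(row)
```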

Find number of "active" rows each month for multiple months in one query

I have a mySQL database with each row containing an activate and a deactivate date. This refers to the period of time when the object the row represents was active.
activate | deactivate | id
2015-03-01 | 2015-05-10 | 1
2013-02-04 | 2014-08-23 | 2
I want to find the number of rows that were active at any time during each month. Ex.
Jan: 4
Feb: 2
Mar: 1
etc...
I figured out how to do this for a single month, but I'm struggling with how to do it for all 12 months in a year in a single query. The reason I would like it in a single query is for performance, as information is used immediately and caching wouldn't make sense in this scenario. Here's the code I have for a month at a time. It checks if the activate date comes before the end of the month in question and that the deactivate date was not before the beginning of the period in question.
SELECT * from tblName WHERE activate <= DATE_SUB(NOW(), INTERVAL 1 MONTH)
AND deactivate >= DATE_SUB(NOW(), INTERVAL 2 MONTH)
If anybody has any idea how to change this and do grouping such that I can do this for an indefinite number of months I'd appreciate it. I'm at a loss as to how to group.
If you have a table of months that you care about (here called months, with month_start and month_end columns), you can do:
select m.*,
(select count(*)
from tblName t
where t.activate <= m.month_end and
t.deactivate >= m.month_start
) as Actives
from months m;
If you don't have such a table handy, you can create one on the fly:
select m.*,
(select count(*)
from tblName t
where t.activate <= m.month_end and
t.deactivate >= m.month_start
) as Actives
from (select date('2015-01-01') as month_start, date('2015-01-31') as month_end union all
select date('2015-02-01') as month_start, date('2015-02-28') as month_end union all
select date('2015-03-01') as month_start, date('2015-03-31') as month_end union all
select date('2015-04-01') as month_start, date('2015-04-30') as month_end
) m;
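The correlated subquery is a standard interval-overlap test: a row counts for a month when it starts on or before the month's last day and ends on or after its first day. A Python sketch assuming (activate, deactivate) `date` pairs, generating the month bounds for one calendar year instead of reading them from a months table:

```python
import calendar
from datetime import date


def active_per_month(rows, year):
    """For each month of `year`, count rows whose [activate, deactivate] range overlaps it."""
    result = {}
    for month in range(1, 13):
        month_start = date(year, month, 1)
        month_end = date(year, month, calendar.monthrange(year, month)[1])
        # Same test as the subquery: activate <= month_end AND deactivate >= month_start
        result[month] = sum(1 for activate, deactivate in rows
                            if activate <= month_end and deactivate >= month_start)
    return result


rows = [(date(2015, 3, 1), date(2015, 5, 10)), (date(2013, 2, 4), date(2014, 8, 23))]
print(active_per_month(rows, 2015))
```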
EDIT:
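A potentially faster way is to calculate a cumulative sum of activations (+1) and deactivations (-1) in date order and then take the maximum per month. Note that this gives the peak number of rows active at the same time within each month, which can be lower than the number of rows active at any time during the month, and months without any activation or deactivation produce no row. User-variable assignment order is also not guaranteed; on MySQL 8.0, SUM(inc) OVER (ORDER BY d) is a safer way to get the running total:
select year(d), month(d), max(cumes)
from (select d, (@s := @s + inc) as cumes
from (select activate as d, 1 as inc from tblName union all
select deactivate, -1 as inc from tblName
) t cross join
(select @s := 0) param
order by d
) s
group by year(d), month(d);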
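A small Python sketch of the running-sum variant, assuming (activate, deactivate) `date` pairs. The rows are hypothetical and picked to show that this measures peak concurrency: two rows active at different times in the same month give a maximum of 1, not 2.

```python
from datetime import date
from itertools import accumulate


def max_concurrent_per_month(rows):
    """Running sum of +1 / -1 events in date order, then the maximum per (year, month)."""
    events = [(a, 1) for a, _ in rows] + [(d, -1) for _, d in rows]
    # Sorting tuples puts -1 before +1 on the same day; the SQL version leaves ties unspecified.
    events.sort()
    result = {}
    for (d, _), cume in zip(events, accumulate(inc for _, inc in events)):
        key = (d.year, d.month)
        result[key] = max(result.get(key, cume), cume)
    return result


# Hypothetical rows: both are active in January 2015, but never at the same time.
rows = [(date(2015, 1, 5), date(2015, 1, 10)), (date(2015, 1, 20), date(2015, 1, 25))]
print(max_concurrent_per_month(rows))
```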

SQL sum hits per day and calculate percentage change

I have a single table with a list of hits/downloads; every row has, of course, a date.
I was able to sum all the rows grouped by day.
Do you think it's possible to also calculate the change in percentage of every daily sum compared to the previous day using a single query, starting from the entire list of hits?
I tried to do this
select *, temp1.a-temp2.b/temp1.a*100 as percentage from
(select DATE(date), count(id_update) as a from vas_updates group by DATE(date)) as table1
UNION
(select DATE_ADD(date, INTERVAL 1 DAY), count(id_update) as b from vas_updates group by DATE(date)) as table2, vas_updates
but it won't work (100% CPU + crash).
Of course, I can't JOIN them, because the two temporary tables share nothing in common, being offset by one day.
The table looks like this, nothing fancy.
id_updates | date
1 | 2014-07-06 12:45:21
2 | 2014-07-06 12:46:10
3 | 2014-07-07 10:16:10
and I want
date | sum a | sum b | percentage
2014-07-07 | 2 | 1 | -50%
It can be either positive or negative, obviously.
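A self-join against the previous day's counts can do this, provided both sides are compared as whole dates and the difference is parenthesized:
select DATE(v.date) as dt, count(v.id_update) as a, q2.b,
(count(v.id_update) - q2.b) / q2.b * 100 as percentage
from vas_updates v
left join (select DATE(date) + interval 1 day as d2, count(id_update) as b
from vas_updates
group by DATE(date)) as q2
on DATE(v.date) = q2.d2
group by DATE(v.date), q2.b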
The sum by day is:
select DATE(date), count(id_update) as a
from vas_updates
group by DATE(date);
In MySQL versions before 8.0, which lack LAG(), the easiest way to get the previous value is by using user variables, which looks something like this:
select DATE(u.date), count(u.id_update) as cnt,
@prevcnt as prevcnt, count(u.id_update) / @prevcnt * 100,
@prevcnt := count(u.id_update)
from vas_updates u cross join
(select @prevcnt := 0) vars
group by DATE(u.date)
order by date(u.date);
This will generally work in practice, but MySQL doesn't guarantee the order in which expressions involving variables are evaluated. A more reliable approach looks like:
select dt, cnt, prevcnt, (case when prevcnt > 0 then 100 * cnt / prevcnt end)
from (select DATE(u.date) as dt, count(u.id_update) as cnt,
(case when (@tmp := @prevcnt) is null then null
when (@prevcnt := count(u.id_update)) is null then null
else @tmp
end) as prevcnt
from vas_updates u cross join
(select @prevcnt := 0, @tmp := 0) vars
group by DATE(u.date)
order by date(u.date)
) t;
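The self-join compares each day with the previous calendar day, which is easy to sketch in Python with a counter keyed by date. This sketch assumes a list of hit timestamps and computes (today - yesterday) / yesterday * 100, the definition that gives the question's -50%; a day without a preceding day gets no percentage, like the LEFT JOIN's NULL. Unlike the user-variable queries, which compare with the previous row, it compares with the previous calendar day.

```python
from collections import Counter
from datetime import datetime, timedelta


def daily_change(timestamps):
    """Hits per calendar day and the percentage change versus the previous calendar day."""
    per_day = Counter(ts.date() for ts in timestamps)  # GROUP BY DATE(date)
    result = []
    for day in sorted(per_day):
        cnt = per_day[day]
        prev = per_day.get(day - timedelta(days=1))  # None when yesterday had no hits
        pct = 100.0 * (cnt - prev) / prev if prev else None
        result.append((day, cnt, prev, pct))
    return result


hits = [datetime(2014, 7, 6, 12, 45, 21), datetime(2014, 7, 6, 12, 46, 10),
        datetime(2014, 7, 7, 10, 16, 10)]
for row in daily_change(hits):
    print(row)
```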

Group by half hour interval

I was lucky enough to find this awesome piece of code on Stack Overflow. However, I want to change it so it shows each half hour instead of every hour, but messing around with it only ruined the query, haha.
This is the SQL:
SELECT CONCAT(HOUR(created_at), ':00-', HOUR(created_at)+1, ':00') as hours,
COUNT(*)
FROM urls
GROUP BY HOUR(created_at)
ORDER BY HOUR(created_at) ASC
How would I go about getting a result every half an hour? :)
Another thing: if there is a half hour with no results, I would like it to return 0 instead of just skipping that interval. It looks kind of weird when I do statistics over the query and it just skips an hour because there were no hits :P
If the format isn't too important, you can return two columns for the interval. You might even just need the start of the interval, which can be determined by:
date_format(created_at - interval minute(created_at)%30 minute, '%H:%i') as period_start
The alias can be used in GROUP BY and ORDER BY clauses. If you also need the end of the interval, you will need a small modification:
SELECT
date_format(created_at - interval minute(created_at)%30 minute, '%H:%i') as period_start,
date_format(created_at + interval 30-minute(created_at)%30 minute, '%H:%i') as period_end,
COUNT(*)
FROM urls
GROUP BY period_start
ORDER BY period_start ASC;
Of course you can also concatenate the values:
SELECT concat_ws('-',
date_format(created_at - interval minute(created_at)%30 minute, '%H:%i'),
date_format(created_at + interval 30-minute(created_at)%30 minute, '%H:%i')
) as period,
COUNT(*)
FROM urls
GROUP BY period
ORDER BY period ASC;
Demo: http://rextester.com/RPN50688
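Subtracting MINUTE % 30 minutes snaps a timestamp to the start of its half hour, and adding 30 minus that snaps it to the end. The same arithmetic as a Python sketch (`half_hour_label` and `count_by_half_hour` are illustrative names); like the SQL, it only reports half hours that actually contain hits:

```python
from collections import Counter
from datetime import datetime, timedelta


def half_hour_label(ts):
    """Start and end of the half hour containing ts, as 'HH:MM-HH:MM'."""
    offset = ts.minute % 30                      # MINUTE(created_at) % 30
    start = ts - timedelta(minutes=offset)       # created_at - INTERVAL ... MINUTE
    end = ts + timedelta(minutes=30 - offset)    # created_at + INTERVAL 30 - ... MINUTE
    return f"{start:%H:%M}-{end:%H:%M}"


def count_by_half_hour(timestamps):
    """GROUP BY period: only half hours that have hits appear."""
    return Counter(half_hour_label(ts) for ts in timestamps)


print(count_by_half_hour([datetime(2018, 3, 2, 8, 5), datetime(2018, 3, 2, 8, 30)]))
```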
"Another thing, is that, if it there is half an hour with no results, I would like it to return 0"
If you use the result in a procedural language, you can initialize all 48 rows with zero in a loop and then "inject" the non-zero rows from the result.
However, if you need it done in SQL, you will need a table with at least 48 rows for a LEFT JOIN. That could be done inline with a "huge" UNION ALL statement, but (IMHO) it would be ugly. So I prefer to have a sequence table with one integer column, which can be very useful for reports. To create that table I usually use information_schema.COLUMNS, since it is available on any MySQL server and has at least a couple of hundred rows. If you need more rows, just join it with itself.
Now let's create that table:
drop table if exists helper_seq;
create table helper_seq (seq smallint auto_increment primary key)
select null as seq
from information_schema.COLUMNS c1
, information_schema.COLUMNS c2
limit 100; -- adjust as needed
Now we have a table with the integers 1 to 100 (you only need 48 right now, but this is for demonstration).
Using that table we can now create all 48 time intervals:
select time(0) + interval 30*(seq-1) minute as period_start,
time(0) + interval 30*(seq) minute as period_end
from helper_seq s
where s.seq <= 48;
We will get the following result:
period_start | period_end
00:00:00 | 00:30:00
00:30:00 | 01:00:00
...
23:30:00 | 24:00:00
Demo: http://rextester.com/ISQSU31450
Now we can use it as a derived table (subquery in FROM clause) and LEFT JOIN your urls table:
select p.period_start, p.period_end, count(u.created_at) as cnt
from (
select time(0) + interval 30*(seq-1) minute as period_start,
time(0) + interval 30*(seq) minute as period_end
from helper_seq s
where s.seq <= 48
) p
left join urls u
on time(u.created_at) >= p.period_start
and time(u.created_at) < p.period_end
group by p.period_start, p.period_end
order by p.period_start
Demo: http://rextester.com/IQYQ32927
The last step (if really needed) is to format the result. We can use CONCAT or CONCAT_WS and TIME_FORMAT in the outer select. The final query would be:
select concat_ws('-',
time_format(p.period_start, '%H:%i'),
time_format(p.period_end, '%H:%i')
) as period,
count(u.created_at) as cnt
from (
select time(0) + interval 30*(seq-1) minute as period_start,
time(0) + interval 30*(seq) minute as period_end
from helper_seq s
where s.seq <= 48
) p
left join urls u
on time(u.created_at) >= p.period_start
and time(u.created_at) < p.period_end
group by p.period_start, p.period_end
order by p.period_start
The result would look like:
period | cnt
00:00-00:30 | 1
00:30-01:00 | 0
...
23:30-24:00 | 3
Demo: http://rextester.com/LLZ41445
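The sequence-table approach amounts to enumerating all 48 slots, then attaching counts and defaulting to zero where nothing matched. A Python sketch of that idea, assuming in-memory datetimes; instead of the range comparison in the join, it computes each hit's slot index directly:

```python
from collections import Counter
from datetime import datetime


def half_hour_report(timestamps):
    """All 48 half-hour slots of the day with hit counts, 0 for empty slots."""
    counts = Counter(ts.hour * 2 + ts.minute // 30 for ts in timestamps)
    report = []
    for seq in range(1, 49):                       # helper_seq rows 1..48
        start, end = 30 * (seq - 1), 30 * seq      # minutes since midnight
        label = f"{start // 60:02d}:{start % 60:02d}-{end // 60:02d}:{end % 60:02d}"
        report.append((label, counts.get(seq - 1, 0)))  # empty slot -> 0, like COUNT() after LEFT JOIN
    return report


hits = [datetime(2020, 1, 1, 0, 10), datetime(2020, 1, 1, 23, 31), datetime(2020, 1, 1, 23, 59)]
for label, cnt in half_hour_report(hits)[:2]:
    print(label, cnt)
```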
1. Switch to seconds.
2. Do arithmetic to get a number for each unit of time (using 30*60 for half-hour, in your case).
3. Have a table of consecutive numbers.
4. Use LEFT JOIN to get even missing units of time.
5. Do the GROUP BY.
6. Convert back from units of time to actual time, for display.
(Steps 3 and 4 are optional. The question says "every", so I assume they are needed.)
Steps 1 and 2 are embodied in something like
FLOOR(UNIX_TIMESTAMP(created_at) / (30*60))
For example:
mysql> SELECT NOW(), FLOOR(UNIX_TIMESTAMP(NOW()) / (30*60));
+---------------------+----------------------------------------+
| NOW() | FLOOR(UNIX_TIMESTAMP(NOW()) / (30*60)) |
+---------------------+----------------------------------------+
| 2018-03-02 08:24:48 | 844448 |
+---------------------+----------------------------------------+
Step 3 needs to be done only once, with the numbers kept in a permanent table. Or, if you have MariaDB, use a "seq" pseudo-table; for example, `seq_844448_to_900000` would dynamically give a table that reaches pretty far into the future.
Step 6 example:
mysql> SELECT DATE_FORMAT(FROM_UNIXTIME((844448) * 30*60), "%b %d %h:%i");
+-------------------------------------------------------------+
| DATE_FORMAT(FROM_UNIXTIME((844448) * 30*60), "%b %d %h:%i") |
+-------------------------------------------------------------+
| Mar 02 08:00 |
+-------------------------------------------------------------+
mysql> SELECT DATE_FORMAT(FROM_UNIXTIME((844448+1) * 30*60), "%b %d %h:%i");
+---------------------------------------------------------------+
| DATE_FORMAT(FROM_UNIXTIME((844448+1) * 30*60), "%b %d %h:%i") |
+---------------------------------------------------------------+
| Mar 02 08:30 |
+---------------------------------------------------------------+
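The unit-of-time approach as a Python sketch: integer division of epoch seconds by 1800 gives the unit number, a range of consecutive numbers plays the role of the sequence table, and multiplying back gives the display time. It works in UTC and assumes at least one hit; MySQL's UNIX_TIMESTAMP() and FROM_UNIXTIME() use the session time zone, so wall-clock times in the output above can differ.

```python
from collections import Counter
from datetime import datetime, timezone

SLOT_SECONDS = 30 * 60  # one half hour


def slot_of(ts):
    """Steps 1-2: epoch seconds divided into half-hour units, like FLOOR(UNIX_TIMESTAMP(ts) / 1800)."""
    return int(ts.timestamp()) // SLOT_SECONDS


def slot_start(slot):
    """Step 6: convert a unit number back to a UTC datetime for display."""
    return datetime.fromtimestamp(slot * SLOT_SECONDS, tz=timezone.utc)


def counts_per_slot(timestamps):
    """Steps 3-5: every unit between the first and last hit, with 0 for units without hits."""
    counts = Counter(slot_of(ts) for ts in timestamps)
    return [(slot_start(s), counts.get(s, 0)) for s in range(min(counts), max(counts) + 1)]


hits = [datetime(2018, 3, 2, 16, 24, 48, tzinfo=timezone.utc),
        datetime(2018, 3, 2, 17, 10, tzinfo=timezone.utc)]
for start, cnt in counts_per_slot(hits):
    print(start.isoformat(), cnt)
```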
Well, this could be a bit verbose but it works:
SELECT hours, SUM(count) as count FROM (
-- LPAD the hours so that computed labels match the zero-count labels below
SELECT CONCAT(LPAD(HOUR(created_at), 2, '0'), ':', LPAD(30 * FLOOR(MINUTE(created_at)/30), 2, '0'), '-',
LPAD(HOUR(DATE_ADD(created_at, INTERVAL 30 minute)), 2, '0'), ':', LPAD(30 * FLOOR(MINUTE(DATE_ADD(created_at, INTERVAL 30 minute))/30), 2, '0')) as hours,
COUNT(*) as count
FROM urls
GROUP BY hours
UNION ALL
SELECT '00:00-00:30'as hours, 0 as count UNION ALL SELECT '00:30-01:00'as hours, 0 as count UNION ALL
SELECT '01:00-01:30'as hours, 0 as count UNION ALL SELECT '01:30-02:00'as hours, 0 as count UNION ALL
SELECT '02:00-02:30'as hours, 0 as count UNION ALL SELECT '02:30-03:00'as hours, 0 as count UNION ALL
SELECT '03:00-03:30'as hours, 0 as count UNION ALL SELECT '03:30-04:00'as hours, 0 as count UNION ALL
SELECT '04:00-04:30'as hours, 0 as count UNION ALL SELECT '04:30-05:00'as hours, 0 as count UNION ALL
SELECT '05:00-05:30'as hours, 0 as count UNION ALL SELECT '05:30-06:00'as hours, 0 as count UNION ALL
SELECT '06:00-06:30'as hours, 0 as count UNION ALL SELECT '06:30-07:00'as hours, 0 as count UNION ALL
SELECT '07:00-07:30'as hours, 0 as count UNION ALL SELECT '07:30-08:00'as hours, 0 as count UNION ALL
SELECT '08:00-08:30'as hours, 0 as count UNION ALL SELECT '08:30-09:00'as hours, 0 as count UNION ALL
SELECT '09:00-09:30'as hours, 0 as count UNION ALL SELECT '09:30-10:00'as hours, 0 as count UNION ALL
SELECT '10:00-10:30'as hours, 0 as count UNION ALL SELECT '10:30-11:00'as hours, 0 as count UNION ALL
SELECT '11:00-11:30'as hours, 0 as count UNION ALL SELECT '11:30-12:00'as hours, 0 as count UNION ALL
SELECT '12:00-12:30'as hours, 0 as count UNION ALL SELECT '12:30-13:00'as hours, 0 as count UNION ALL
SELECT '13:00-13:30'as hours, 0 as count UNION ALL SELECT '13:30-14:00'as hours, 0 as count UNION ALL
SELECT '14:00-14:30'as hours, 0 as count UNION ALL SELECT '14:30-15:00'as hours, 0 as count UNION ALL
SELECT '15:00-15:30'as hours, 0 as count UNION ALL SELECT '15:30-16:00'as hours, 0 as count UNION ALL
SELECT '16:00-16:30'as hours, 0 as count UNION ALL SELECT '16:30-17:00'as hours, 0 as count UNION ALL
SELECT '17:00-17:30'as hours, 0 as count UNION ALL SELECT '17:30-18:00'as hours, 0 as count UNION ALL
SELECT '18:00-18:30'as hours, 0 as count UNION ALL SELECT '18:30-19:00'as hours, 0 as count UNION ALL
SELECT '19:00-19:30'as hours, 0 as count UNION ALL SELECT '19:30-20:00'as hours, 0 as count UNION ALL
SELECT '20:00-20:30'as hours, 0 as count UNION ALL SELECT '20:30-21:00'as hours, 0 as count UNION ALL
SELECT '21:00-21:30'as hours, 0 as count UNION ALL SELECT '21:30-22:00'as hours, 0 as count UNION ALL
SELECT '22:00-22:30'as hours, 0 as count UNION ALL SELECT '22:30-23:00'as hours, 0 as count UNION ALL
SELECT '23:00-23:30'as hours, 0 as count UNION ALL SELECT '23:30-00:00'as hours, 0 as count
) AS T
GROUP BY hours ORDER BY hours;
The most difficult part of your query is outputting statistics for intervals that don't have any hits. SQL is all about querying and aggregating existing data; selecting or aggregating data that is missing from the table is quite an unusual task. That's why, as Wolph stated in the comments, there is no pretty solution for this task.
I solved this problem by explicitly selecting all half-hour intervals of the day. This works when the number of intervals is limited, as in your case. It will not work, however, if you aggregate by different days over a long period of time.
I'm not a fan of this query, but I can't propose anything better. A more elegant solution could be achieved with a stored procedure and a loop, but it seems you want to solve it with a raw SQL query.
You can add some math to calculate 48 intervals instead of 24 and put it into another field by which you're going to group and sort.
SELECT HOUR(created_at)*2+FLOOR(MINUTE(created_at)/30) as interval48,
if((HOUR(created_at)*2+FLOOR(MINUTE(created_at)/30)) % 2 = 0,
CONCAT(HOUR(created_at), ':00-', HOUR(created_at), ':30'),
CONCAT(HOUR(created_at), ':30-', HOUR(created_at)+1, ':00')
) as hours,
count(*)
FROM urls
GROUP BY HOUR(created_at)*2+FLOOR(MINUTE(created_at)/30)
ORDER BY HOUR(created_at)*2+FLOOR(MINUTE(created_at)/30) ASC
Example of result:
0 0:00-0:30 2017
1 0:30-1:00 1959
2 1:00-1:30 1830
3 1:30-2:00 1715
4 2:00-2:30 1679
5 2:30-3:00 1688
The result of the original query posted by Jazerix was:
0:00-1:00 3976
1:00-2:00 3545
2:00-3:00 3367
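The index arithmetic is easy to check in Python, where `%` also binds more tightly than `+`; the label has to test the parity of the whole index, not only of its last term. A sketch with illustrative function names:

```python
def interval48(hour, minute):
    """HOUR(created_at)*2 + FLOOR(MINUTE(created_at)/30), giving 0..47."""
    return hour * 2 + minute // 30


def interval_label(idx):
    """Label for a half-hour index; even indexes are the first half of an hour."""
    h = idx // 2
    if idx % 2 == 0:  # parity of the whole index
        return f"{h}:00-{h}:30"
    return f"{h}:30-{h + 1}:00"


# Without parentheses, % applies only to the last term: this is 2 for 01:10, not 0.
unparenthesized = 1 * 2 + 10 // 30 % 2
print(interval_label(interval48(1, 10)), unparenthesized)
```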
A different approach, without creating additional tables. It may look like a hack, though :-)
Step 1: Generate a time table dynamically
Assumption: the INFORMATION_SCHEMA database is available and has a table COLLATIONS, which normally has more than 100 records. You can use any table that has at least 48 records.
Query:
SELECT @time fromTime, ADDTIME(@time, '00:29:00') toTime,
@time := ADDTIME(@time, '00:30:00')
FROM information_schema.COLLATIONS
JOIN (SELECT @time := TIME('00:00:00')) a
WHERE @time < '24:00:00'
The above query gives a table of from and to times at 30-minute intervals.
Step 2: Use the first query to generate the required result by joining the urls table
Query:
SELECT CONCAT(fromTime, '-', toTime) AS halfHours, COUNT(created_at)
FROM
(SELECT @time fromTime, ADDTIME(@time, '00:29:00') toTime, @time := ADDTIME(@time, '00:30:00')
FROM information_schema.COLLATIONS
JOIN (SELECT @time := TIME('00:00:00')) a
WHERE @time < '24:00:00'
) timeTable
LEFT JOIN urls ON HOUR(created_at) BETWEEN HOUR(fromTime) AND HOUR(toTime)
AND MINUTE(created_at) BETWEEN MINUTE(fromTime) AND MINUTE(toTime)
GROUP BY fromTime
I hope this will work for you:
SELECT
@sTime := CONCAT(HOUR(created_at), ":",
(CASE WHEN MINUTE(created_at) >= 30 THEN 30 ELSE 0 END)) as intVar,
(CONCAT(
AddTime(@sTime, '00:00:00'),
' to ',
AddTime(@sTime, '00:30:00')
)) as timeInterval,
COUNT(*) FROM urls
GROUP BY
(CONCAT(HOUR(created_at), ":", (CASE WHEN MINUTE(created_at) >= 30 THEN 30 ELSE 0 END)))
ORDER BY HOUR(created_at) ASC
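Simply convert the time to seconds and divide by 30 minutes (1800 seconds). To verify, I used MIN and MAX on the timestamp, so each label shows the first and last actual hit in that half hour rather than the interval boundaries.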
SELECT concat(TIME_FORMAT(min(created_at),"%H:%i")," - ", TIME_FORMAT(max(created_at),"%H:%i")) as hours,
COUNT(*)
FROM urls
GROUP BY FLOOR(TIME_TO_SEC(created_at)/1800)
ORDER BY FLOOR(TIME_TO_SEC(created_at)/1800) ASC