How to find the maximum number of time range collisions in MySQL - mysql

I have a time range entity with start and end datetime columns.
I need to find the maximum number (count) of ranges that overlap the same time slot.
In my example, the count is 4.
https://www.db-fiddle.com/f/pcq1MjQeqSEMDdyGxkFsR5/0
I probably need a recursive query, but I don't know how to start.

For MySQL 5.x:
SELECT SUM(points2.weight) max_weight
FROM (
    SELECT start dt FROM slots
    UNION DISTINCT
    SELECT `end` FROM slots
) points1
JOIN (
    SELECT dt, SUM(weight) weight
    FROM (
        SELECT start dt, 1 weight FROM slots
        UNION ALL
        SELECT `end`, -1 FROM slots
    ) points
    GROUP BY dt
) points2 ON points1.dt >= points2.dt
GROUP BY points1.dt
ORDER BY max_weight DESC LIMIT 1
https://dbfiddle.uk/f0b56Q4X (step-by-step, with comments)
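The counting idea behind this query can also be sketched outside SQL as a sweep over sorted boundary points. This is only a minimal Python illustration, not part of the original answer; the names `max_overlap`, `at` and `sample_slots` are made up for the example. As in the query, a slot that ends exactly when another one starts is not counted as overlapping it.

```python
from datetime import datetime

def max_overlap(slots):
    """Return the largest number of slots that are open at the same moment."""
    events = []
    for start, end in slots:
        events.append((start, 1))   # a slot opens
        events.append((end, -1))    # a slot closes
    # Tuple sorting puts -1 before +1 at equal timestamps, so a slot that
    # ends exactly when another starts does not count as an overlap.
    events.sort()
    best = running = 0
    for _, delta in events:
        running += delta
        best = max(best, running)
    return best

def at(hour, minute=0):
    # Hypothetical helper: a timestamp on an arbitrary sample day.
    return datetime(2023, 1, 1, hour, minute)

# Made-up sample data, not the data from the fiddle.
sample_slots = [
    (at(9), at(11)),
    (at(10), at(12)),
    (at(10, 30), at(10, 45)),
    (at(11), at(13)),
]
print(max_overlap(sample_slots))
```

The running total here plays the role of `SUM(points2.weight)` over all boundary points up to the current one.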

Related

collect_set() distinct users by day from last 90 days only when user is older than last 90 days

So far I have been able to collect_set() everyone who is active with no problem:
with aux as (
    select date
          ,collect_set(user_id) over(
               partition by feature
               order by cast(timestamp(date) as float)
               range between (-90*60*60*24) following and 0 preceding
           ) as user_id
          ,feature
    --
    from (
        select date
              ,feature
              ,collect_set(user_id) as user_id
        --
        from table
        --
        group by date, feature
    )
)
--
select date
      ,array_distinct(flatten(user_id))
      ,feature
--
from aux
The problem is that I now have to keep only the users who were created more than 90 days earlier.
I tried this and it didn't work:
select date
,collect_set(case when user_created_at < date - interval 90 day
then user_id end) over(
partition by feature
order by cast(timestamp(date) as float)
range between (-90*60*60*24) following and 0 preceding
) as teste
,feature
from table
It didn't work because the filter inside collect_set() is applied to the users of a single day instead of to all the users from the last 90 days,
so the result contains more users than expected.
How can I get this right?
As a reference, I'm using this query to verify whether the result is correct:
select
count(distinct user_id) as total
,count(distinct case when user_created_at < date('2020-04-30') - interval 90 day then user_id end)
,count(distinct case when user_created_at >= date('2020-04-30') - interval 90 day then user_id end)
--
from table
--
where 1=1
and date >= date('2020-04-30') - interval 90 day
and date <= '2020-04-30'
and feature = 'a_feature'
This is a pretty ugly workaround, but:
select data
,feature
,collect_set(cus.client_id) as client
from (
select data
,explode(array_distinct(flatten(cliente))) as client
,feature
from(
select data
,collect_set(client_id) over(
partition by feature
order by cast(timestamp(data) as float)
range between (-90*60*60*24) following and 0 preceding
) as cliente
,feature
from (
select data
,feature
,collect_set(client_id) as cliente
from da_pandora.ds_transaction dtr
--
group by data, feature
)
)
)as dtr
left join costumer as cus
on cus.client_id = dtr.client and date(client_created_at) < data - interval 90 day
group by data, feature
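To make the intent of the workaround easier to check, here is a small Python sketch of the same two steps for a single day and feature: first collect everyone active in the trailing 90 days, then keep only the users created before that window began. The function name, the tuple layout of `events` and the sample data are all assumptions for illustration, not the real tables.

```python
from datetime import date, timedelta

def established_active_users(events, created_at, day, feature, window_days=90):
    """Users active for `feature` in the window ending at `day`,
    keeping only those created before the window started."""
    window_start = day - timedelta(days=window_days)
    # Step 1: everyone active in the trailing window (the windowed collect_set).
    active = {
        user
        for event_day, event_feature, user in events
        if event_feature == feature and window_start <= event_day <= day
    }
    # Step 2: filter the whole set at once (the join against the customer table).
    return {user for user in active if created_at[user] < window_start}

# Hypothetical sample data: (activity date, feature, user id).
events = [
    (date(2020, 4, 1), "a_feature", "u1"),
    (date(2020, 4, 29), "a_feature", "u2"),
    (date(2020, 1, 1), "a_feature", "u3"),   # active, but before the window
    (date(2020, 4, 30), "other", "u4"),      # different feature
]
created_at = {
    "u1": date(2019, 6, 1),
    "u2": date(2020, 3, 15),   # created inside the window, so excluded
    "u3": date(2019, 1, 1),
    "u4": date(2019, 1, 1),
}
print(established_active_users(events, created_at, date(2020, 4, 30), "a_feature"))
```

The point of the sketch is the order of operations: the age filter runs against the end of the window for every collected user, not against the day on which each user happened to be active.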

Function to identify records that are within 5 minute intervals

I have a MySQL table with many records, as shown in the image below.
I need to identify rows that fall within the same 5-minute interval, for example, and flag each such row in a new column. See the example output.
How can I do this through a function?
Test
WITH RECURSIVE
cte AS ( SELECT MIN(`datetime`) AS dt_start,
MIN(`datetime`) + INTERVAL 5 MINUTE AS dt_end,
1 AS group_num
FROM sourcetable
UNION ALL
SELECT dt_end,
dt_end + INTERVAL 5 MINUTE,
group_num + 1
FROM cte
WHERE dt_end <= ( SELECT MAX(`datetime`)
FROM sourcetable )
)
SELECT sourcetable.*, cte.group_num
FROM sourcetable
JOIN cte ON `datetime` >= dt_start
AND `datetime` < dt_end
If your 5-minute intervals are based on calendar time, then the most efficient method would use window functions:
select t.*,
dense_rank() over (order by floor(to_seconds(data) / 300)) as tr
from t;
If it is based on when the first record within each group starts, then you need recursive CTEs as Akina suggests.
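The difference between the two interpretations is easy to see on a few timestamps. The Python sketch below is only an illustration under assumed sample data; `calendar_groups` mimics a dense rank over clock-aligned 5-minute buckets, and `anchored_groups` mimics consecutive 5-minute windows that start at the earliest record, as the recursive CTE does.

```python
from datetime import datetime, timedelta

FIVE_MINUTES = timedelta(minutes=5)

def calendar_groups(timestamps, width=FIVE_MINUTES):
    """Dense rank of clock-aligned buckets (10:00, 10:05, 10:10, ...)."""
    buckets = [(ts - datetime.min) // width for ts in timestamps]
    rank = {b: i for i, b in enumerate(sorted(set(buckets)), start=1)}
    return [rank[b] for b in buckets]

def anchored_groups(timestamps, width=FIVE_MINUTES):
    """Consecutive windows that start at the earliest timestamp."""
    first = min(timestamps)
    return [(ts - first) // width + 1 for ts in timestamps]

# Made-up sample: 10:02, 10:04, 10:06 and 10:12 on the same day.
sample = [datetime(2023, 1, 1, 10, minute) for minute in (2, 4, 6, 12)]
print(calendar_groups(sample))
print(anchored_groups(sample))
```

With calendar buckets, 10:04 and 10:06 land in different groups because 10:05 is a boundary. With windows anchored at 10:02 they share a group, and the group numbers can have gaps when a window contains no rows.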

Retrieve first and last entry of hour with MySQL

I have a database for storing temperatures. There are three to five temperature readings per hour, and I want to get the first and last temperature of each hour, but I can't come up with a suitable subquery.
Here is the main query that aggregates the temperatures by hour:
SELECT HOUR(time) AS h, max(temp_in) as max, min(temp_in) as min, count(temp_in) as count
FROM tbl_temps
GROUP BY h;
You can use a union of two subqueries, one for the first and one for the last minute of each hour:
select * from
my_table where (HOUR(time),MINUTE(time)) in (
select HOUR(time), min(MINUTE(time))
from my_table
group by HOUR(time)
)
union
select * from
my_table where (HOUR(time),MINUTE(time)) in (
select HOUR(time), max(MINUTE(time))
from my_table
group by HOUR(time)
)
order by HOUR(time), MINUTE(time)
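For comparison, the same result can be computed in one pass once the rows are sorted by time. This is a rough Python sketch with made-up readings; unlike the query above it also groups by date, which is an assumption on my part so that the same hour on different days stays separate.

```python
from datetime import datetime

def first_and_last_per_hour(readings):
    """Map each (date, hour) to its earliest and latest (time, temp) reading."""
    result = {}
    for ts, temp in sorted(readings):
        key = (ts.date(), ts.hour)
        # Keep the first reading seen for this hour, overwrite the last one.
        first, _ = result.get(key, ((ts, temp), None))
        result[key] = (first, (ts, temp))
    return result

# Hypothetical sample data: (time, temp_in).
readings = [
    (datetime(2023, 1, 1, 6, 50), 21.4),
    (datetime(2023, 1, 1, 6, 5), 20.5),
    (datetime(2023, 1, 1, 6, 25), 21.0),
    (datetime(2023, 1, 1, 7, 10), 22.0),
]
for key, (first, last) in first_and_last_per_hour(readings).items():
    print(key, first[1], last[1])
```

An hour with a single reading simply reports that reading as both first and last.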

Calculating a Moving Average MySQL?

Good Day,
I am using the following code to calculate the 9-day moving average.
SELECT SUM(close)
FROM tbl
WHERE date <= '2002-07-05'
AND name_id = 2
ORDER BY date DESC
LIMIT 9
But it does not work, because the SUM is calculated over all matching rows before the LIMIT is applied. In other words, it sums every close on or before that date, not just the last 9.
So I need to calculate the SUM over the rows returned by the limited select, rather than calculating it directly.
I.e. select the SUM from the SELECT...
How would I go about doing this? Is it very costly, or is there a better way?
If you want the moving average for each date, then try this:
SELECT date, SUM(close),
(select avg(close) from tbl t2 where t2.name_id = t.name_id and datediff(t.date, t2.date) between 0 and 8
) as mvgAvg
FROM tbl t
WHERE date <= '2002-07-05' and
name_id = 2
GROUP BY date
ORDER BY date DESC
It uses a correlated subquery to average the closes from the 9 calendar days ending at each date.
Starting from MySQL 8, you should use window functions for this. Using the window RANGE clause, you can create a logical window over an interval, which is very powerful. Something like this:
SELECT
date,
close,
AVG (close) OVER (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
FROM tbl
WHERE date <= DATE '2002-07-05'
AND name_id = 2
ORDER BY date DESC
For example:
WITH t (date, `close`) AS (
SELECT DATE '2020-01-01', 50 UNION ALL
SELECT DATE '2020-01-03', 54 UNION ALL
SELECT DATE '2020-01-05', 51 UNION ALL
SELECT DATE '2020-01-12', 49 UNION ALL
SELECT DATE '2020-01-13', 59 UNION ALL
SELECT DATE '2020-01-15', 30 UNION ALL
SELECT DATE '2020-01-17', 35 UNION ALL
SELECT DATE '2020-01-18', 39 UNION ALL
SELECT DATE '2020-01-19', 47 UNION ALL
SELECT DATE '2020-01-26', 50
)
SELECT
date,
`close`,
COUNT(*) OVER w AS c,
SUM(`close`) OVER w AS s,
AVG(`close`) OVER w AS a
FROM t
WINDOW w AS (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
ORDER BY date DESC
Leading to:
date |close|c|s |a |
----------|-----|-|---|-------|
2020-01-26| 50|1| 50|50.0000|
2020-01-19| 47|2| 97|48.5000|
2020-01-18| 39|3|136|45.3333|
2020-01-17| 35|4|171|42.7500|
2020-01-15| 30|4|151|37.7500|
2020-01-13| 59|5|210|42.0000|
2020-01-12| 49|6|259|43.1667|
2020-01-05| 51|3|159|53.0000|
2020-01-03| 54|3|154|51.3333|
2020-01-01| 50|3|155|51.6667|
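To see which rows end up in each frame, the result above can be recomputed by hand. The Python sketch below is just a cross-check of the table, with `range_window` being a made-up helper name. Note that because the window is ordered by date DESC, "9 DAY PRECEDING" reaches toward later dates, which is why 2020-01-26 has a count of 1.

```python
from datetime import date, timedelta

rows = [
    (date(2020, 1, 1), 50), (date(2020, 1, 3), 54), (date(2020, 1, 5), 51),
    (date(2020, 1, 12), 49), (date(2020, 1, 13), 59), (date(2020, 1, 15), 30),
    (date(2020, 1, 17), 35), (date(2020, 1, 18), 39), (date(2020, 1, 19), 47),
    (date(2020, 1, 26), 50),
]

def range_window(rows, days=9):
    """For each date (newest first), aggregate rows dated d .. d + days."""
    out = []
    for d, _ in sorted(rows, reverse=True):
        frame = [close for other, close in rows
                 if d <= other <= d + timedelta(days=days)]
        out.append((d, len(frame), sum(frame), sum(frame) / len(frame)))
    return out

for d, c, s, a in range_window(rows):
    print(d, c, s, round(a, 4))
```

If you want the average of the 9 days before each date instead, the frame condition would have to look backward (`d - timedelta(days=days) <= other <= d`), which in SQL corresponds to ordering the window by date ascending.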
Use something like
SELECT
sum(close) as sum,
avg(close) as average
FROM (
SELECT
(close)
FROM
tbl
WHERE
date <= '2002-07-05'
AND name_id = 2
ORDER BY
date DESC
LIMIT 9 ) temp
The inner query returns only the 9 most recent filtered rows (descending date order plus LIMIT), and the outer query then averages and sums just those rows.
Your original query doesn't work because the SUM is calculated first and the LIMIT clause is applied afterwards, to the single aggregated row, so you get the sum of all matching rows.
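If you want to convince yourself of the difference, here is a small experiment using Python's built-in sqlite3 module. Using SQLite instead of MySQL is an assumption made only so the example runs anywhere, and the 20 rows are made up; the ordering of aggregation and LIMIT is the same for these two queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (date TEXT, name_id INTEGER, close REAL)")
# Made-up data: 20 consecutive days whose close equals the day of the month.
rows = [(f"2002-06-{day:02d}", 2, float(day)) for day in range(1, 21)]
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)

# Aggregate first, LIMIT afterwards: the single result row covers all 20 rows.
naive_sum = conn.execute(
    """SELECT SUM(close) FROM tbl
       WHERE date <= '2002-07-05' AND name_id = 2
       ORDER BY date DESC LIMIT 9"""
).fetchone()[0]

# LIMIT inside the derived table, aggregate outside: only the last 9 rows.
last_nine_sum, last_nine_avg = conn.execute(
    """SELECT SUM(close), AVG(close) FROM (
           SELECT close FROM tbl
           WHERE date <= '2002-07-05' AND name_id = 2
           ORDER BY date DESC LIMIT 9
       ) temp"""
).fetchone()

print(naive_sum, last_nine_sum, last_nine_avg)
```

The first query sums days 1 to 20, the second only days 12 to 20.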
Another technique is to create a helper table of numbers:
CREATE TABLE `tinyint_asc` (
`value` tinyint(3) unsigned NOT NULL default '0',
PRIMARY KEY (value)
) ;
INSERT INTO `tinyint_asc` VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250),(251),(252),(253),(254),(255);
After that, you can use it like this:
select
    date_add(tbl.date, interval tinyint_asc.value day) as mydate,
    count(*),
    sum(myvalue)
from tbl
inner join tinyint_asc on tinyint_asc.value <= 30 -- for a 30 day moving average
where date( date_add(tbl.date, interval tinyint_asc.value day ) ) between '2016-01-01' and current_date()
group by mydate
This query is fast:
select date, name_id,
case @i when name_id then @i:=name_id else (@i:=name_id)
and (@n:=0)
and (@a0:=0) and (@a1:=0) and (@a2:=0) and (@a3:=0) and (@a4:=0) and (@a5:=0) and (@a6:=0) and (@a7:=0) and (@a8:=0)
end as a,
case @n when 9 then @n:=9 else @n:=@n+1 end as n,
@a0:=@a1,@a1:=@a2,@a2:=@a3,@a3:=@a4,@a4:=@a5,@a5:=@a6,@a6:=@a7,@a7:=@a8,@a8:=close,
(@a0+@a1+@a2+@a3+@a4+@a5+@a6+@a7+@a8)/@n as av
from tbl,
(select @i:=0, @n:=0,
@a0:=0, @a1:=0, @a2:=0, @a3:=0, @a4:=0, @a5:=0, @a6:=0, @a7:=0, @a8:=0) a
where name_id=2
order by name_id, date
If you need an average over 50 or 100 values, it's tedious to write, but
worth the effort. The speed is close to the ordered select.
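The shifting of the nine variables is essentially a fixed-size buffer of the most recent closes. As a rough illustration of that idea outside MySQL, here is a Python sketch using a bounded deque; `rolling_average` is a made-up name, and the sketch handles one name_id at a time, assuming the closes are already ordered by date.

```python
from collections import deque

def rolling_average(closes, size=9):
    """Average of the last `size` values seen so far, for each position."""
    window = deque(maxlen=size)   # the oldest value drops out automatically
    averages = []
    for close in closes:
        window.append(close)
        # Until the buffer is full, divide by the number of values seen.
        averages.append(sum(window) / len(window))
    return averages

print(rolling_average([10, 20, 30, 40], size=2))
```

Unlike the hand-written variables, the window size here is just a parameter, so a 50 or 100 value average needs no extra typing.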

MySQL - How to select rows with the min(timestamp) per hour of a given date

I have a table of production readings and need to get a result set containing a row for the min(timestamp) for EACH hour.
The column layout is quite simple:
ID,TIMESTAMP,SOURCE_ID,SOURCE_VALUE
The data sample would look like:
123,'2013-03-01 06:05:24',PMPROD,12345678.99
124,'2013-03-01 06:15:17',PMPROD,88888888.99
125,'2013-03-01 06:25:24',PMPROD,33333333.33
126,'2013-03-01 06:38:14',PMPROD,44444444.44
127,'2013-03-01 07:12:04',PMPROD,55555555.55
128,'2013-03-01 10:38:14',PMPROD,44444444.44
129,'2013-03-01 10:56:14',PMPROD,22222222.22
130,'2013-03-01 15:28:02',PMPROD,66666666.66
Records are added to this table throughout the day and the source_value is already calculated, so no sum is needed.
I can't figure out how to get a row for the min(timestamp) for each hour of the current_date.
select *
from source_readings
use index(ID_And_Time)
where source_id = 'PMPROD'
and date(timestamp)=CURRENT_DATE
and timestamp =
( select min(timestamp)
from source_readings use index(ID_And_Time)
where source_id = 'PMPROD'
)
The above code, of course, gives me one record. I need one record for the min(hour(timestamp)) of the current_date.
My result set should contain the rows for IDs: 123,127,128,130. I've played with it for hours. Who can be my hero? :)
Try the following:
SELECT source_readings.* FROM source_readings
JOIN
(
SELECT DATE_FORMAT(timestamp, '%Y-%m-%d %H') AS current_hour, MIN(timestamp) AS min_timestamp
FROM source_readings
WHERE source_id = 'PMPROD'
GROUP BY current_hour
) AS reading_min
ON source_readings.timestamp = reading_min.min_timestamp
AND source_readings.source_id = 'PMPROD'
SELECT a.*
FROM Table1 a
INNER JOIN
(
SELECT DATE(TIMESTAMP) date,
HOUR(TIMESTAMP) hour,
MIN(TIMESTAMP) min_date
FROM Table1
GROUP BY DATE(TIMESTAMP), HOUR(TIMESTAMP)
) b ON DATE(a.TIMESTAMP) = b.date AND
HOUR(a.TIMESTAMP) = b.hour AND
a.timestamp = b.min_date
With window function:
WITH ranked AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY HOUR(timestamp) ORDER BY timestamp) rn
FROM source_readings -- original table
WHERE date(timestamp)=CURRENT_DATE AND source_id = 'PMPROD' -- your custom filter
)
SELECT * -- this will contain `rn` column. you can select only necessary columns
FROM ranked
WHERE rn=1
I haven't tested it, but the basic idea is:
1) ROW_NUMBER() OVER(PARTITION BY HOUR(timestamp) ORDER BY timestamp)
This will give each row a number, starting from 1 for each hour, increasing by timestamp. The result might look like:
|rest of columns |rn
123,'2013-03-01 06:05:24',PMPROD,12345678.99,1
124,'2013-03-01 06:15:17',PMPROD,88888888.99,2
125,'2013-03-01 06:25:24',PMPROD,33333333.33,3
126,'2013-03-01 06:38:14',PMPROD,44444444.44,4
127,'2013-03-01 07:12:04',PMPROD,55555555.55,1
128,'2013-03-01 10:38:14',PMPROD,44444444.44,1
129,'2013-03-01 10:56:14',PMPROD,22222222.22,2
130,'2013-03-01 15:28:02',PMPROD,66666666.66,1
2) Then, in the main query, we select only the rows with rn=1, in other words the rows that have the lowest timestamp in each hourly partition (the first row in each hour after sorting by timestamp).
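As a sanity check of the expected IDs, the "earliest row per hour" logic can be run over the sample rows from the question. This Python sketch is only an illustration; `first_per_hour` is a made-up helper, and only the ID and timestamp columns are kept.

```python
from datetime import datetime

# ID and TIMESTAMP columns from the sample data in the question.
readings = [
    (123, "2013-03-01 06:05:24"),
    (124, "2013-03-01 06:15:17"),
    (125, "2013-03-01 06:25:24"),
    (126, "2013-03-01 06:38:14"),
    (127, "2013-03-01 07:12:04"),
    (128, "2013-03-01 10:38:14"),
    (129, "2013-03-01 10:56:14"),
    (130, "2013-03-01 15:28:02"),
]

def first_per_hour(rows):
    """Return the IDs of the earliest row in each (date, hour), in time order."""
    earliest = {}
    for row_id, ts_text in rows:
        ts = datetime.strptime(ts_text, "%Y-%m-%d %H:%M:%S")
        key = (ts.date(), ts.hour)
        if key not in earliest or ts < earliest[key][0]:
            earliest[key] = (ts, row_id)
    return [row_id for _, row_id in sorted(earliest.values())]

print(first_per_hour(readings))
```

The dictionary key plays the role of the PARTITION BY expression, and keeping only the smallest timestamp per key corresponds to filtering on rn=1.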