I have a MySQL table with a datetime column. How can I find all groups with at least 5 entries within 10 minutes?
My only idea is to write a program (in whatever language) that loops over the timestamps, always checks 5 (..) successive entries, calculates the time span between the last and the first, and checks whether it is below the limit.
Can this be done using a single SQL query too?
(The scenario is simplified and the numbers are just examples.)
As requested, here comes an example:
id | timestamp | other_column
---|---------------------|-------------
3 | 2017-01-01 11:00:00 | thank
2 | 2017-01-01 11:01:00 | you
1 | 2017-01-01 11:02:00 | for
* 6 | 2017-01-01 11:20:00 | your
* 5 | 2017-01-01 11:21:00 | efforts
* 4 | 2017-01-01 11:22:00 | to
* 7 | 2017-01-01 11:23:00 | help
* 8 | 2017-01-01 11:24:00 | me
9 | 2017-01-01 11:40:00 | :
10 | 2017-01-01 11:41:00 | )
If the count limit is 5 and the timespan limit is 10 minutes, I'd like to get the entries marked with "*". The "id" column is the primary key of the table, but the order is not always the order of the timestamps. The "other_column" is used for a where clause. The table has about 1 million entries.
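For reference, the loop idea described above could be sketched roughly like this in Python. The function name `ids_in_dense_groups` and the (id, timestamp) tuple layout are assumptions made for illustration; the data is the sample from the table above.

```python
from datetime import datetime, timedelta

# Sample rows from the table above as (id, timestamp) pairs.
minute_by_id = {3: 0, 2: 1, 1: 2, 6: 20, 5: 21, 4: 22, 7: 23, 8: 24, 9: 40, 10: 41}
example_rows = [(i, datetime(2017, 1, 1, 11, m)) for i, m in minute_by_id.items()]

def ids_in_dense_groups(rows, min_count=5, max_span=timedelta(minutes=10)):
    """Return the ids of rows that are part of min_count successive rows
    (in timestamp order) whose first-to-last span is below max_span."""
    ordered = sorted(rows, key=lambda row: row[1])
    marked = set()
    # Slide a window of min_count successive rows over the sorted list.
    for start in range(len(ordered) - min_count + 1):
        window = ordered[start:start + min_count]
        if window[-1][1] - window[0][1] < max_span:
            marked.update(row_id for row_id, _ in window)
    return marked

print(sorted(ids_in_dense_groups(example_rows)))  # [4, 5, 6, 7, 8]
```

Sorting first matters because the ids are not in timestamp order.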
Try to break this down logically. Sorry for the pseudo code bits, I'm a little short on time.
select t1.id, t1.timestamp, t2.timestamp
from yourtable t1
inner join yourtable t2 on t2.timestamp >= t1.timestamp and t2.timestamp < t1.timestamp + interval 10 minute
This gives you a relatively giant list of every ID joined to every row in the 10 minutes after it (including one row for itself). At this point I'm only picking out the first row of each group: it is easier to grab the 'header row' by its timestamp plus 10 minutes and worry about the rest in the next step. If we group by the ID and time, we get a count of how many rows fall within those 10 minutes:
select t1.id, t1.timestamp, count(1)
from yourtable t1
inner join yourtable t2 on t2.timestamp >= t1.timestamp and t2.timestamp < t1.timestamp + interval 10 minute
group by t1.id, t1.timestamp
having count(1) > 4
This now gives you every ID and its timestamp that has itself plus 4 or more other records in the 10 minutes after that timestamp. From here it depends on how you want to group. If you want each of the 5 rows, we can use the query above as a subquery and join it back to the main table to get the rows you want returned:
select distinct t3.*
from
(select t1.id, t1.timestamp, count(1)
from yourtable t1
inner join yourtable t2
on t2.timestamp >= t1.timestamp and t2.timestamp < t1.timestamp + interval 10 minute
group by t1.id, t1.timestamp
having count(1) > 4) a
inner join yourtable t3 on t3.timestamp >= a.timestamp and t3.timestamp < a.timestamp + interval 10 minute
And that should give you IDs 4-8 and their data (order as you see fit).
My apologies that I don't have the time to test, but the logic should work.
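Since the query is untested, one way to sanity-check the 'header row' logic is to replay it outside the database. Below is a rough Python sketch, using the question's sample data and its 10-minute limit; the function name `rows_after_busy_headers` is made up for illustration.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Sample rows from the question as (id, timestamp) pairs.
minute_by_id = {3: 0, 2: 1, 1: 2, 6: 20, 5: 21, 4: 22, 7: 23, 8: 24, 9: 40, 10: 41}
rows = [(i, datetime(2017, 1, 1, 11, m)) for i, m in minute_by_id.items()]

def rows_after_busy_headers(rows, min_count=5, span=timedelta(minutes=10)):
    """For every row, count the rows in [ts, ts + span); if there are at
    least min_count, collect all ids inside that interval."""
    ordered = sorted(rows, key=lambda row: row[1])
    times = [ts for _, ts in ordered]
    selected = set()
    for ts in times:
        first = bisect_left(times, ts)       # first row with timestamp >= ts
        end = bisect_left(times, ts + span)  # first row with timestamp >= ts + span
        if end - first >= min_count:         # same test as "having count(1) > 4"
            selected.update(row_id for row_id, _ in ordered[first:end])
    return selected

print(sorted(rows_after_busy_headers(rows)))  # [4, 5, 6, 7, 8]
```

On the sample data only the row with id 6 qualifies as a header, and its interval covers ids 4-8.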
I have 3 columns: employee_id, start_time and end_time. I want to make buckets of 1 hour that show me how many employees were working in each hour. For example, employee A worked from 12 pm to 3 pm and employee B worked from 2 pm to 4 pm, so at 12 pm 1 employee was working, at 1 pm 1 employee, at 2 pm 2 employees, at 3 pm 2 employees and at 4 pm 1 employee. How can I do this in SQL? Let me show you a picture of the start and end time columns.
Sample input would be:
Expected outcome would be something like
I want to create a bucket in order to know how many people were working in each hour of the day.
SELECT
Employee_id,
TIME(shift_start_at,timezone) AS shift_start,
TIME(shift_end_at,timezone) AS shift_end,
FROM
`employee_shifts` AS shifts
WHERE
DATE(shifts.shift_start_at_local) >= "2022-05-01"
GROUP BY
1,
2,
3
Assuming you are on MySQL version 8 or above: generate all the buckets, left join them to the shifts so that each bucket picks up every shift whose start-to-end hour range covers it (shifts that do not apply are left out), then count per bucket, e.g.:
DROP TABLE IF EXISTS t;
create table t (id int, startts datetime, endts datetime);
insert into t values
(1,'2022-06-19 08:30:00','2022-06-19 10:00:00'),
(2,'2022-06-19 08:30:00','2022-06-19 08:45:00'),
(3,'2022-06-19 07:00:00','2022-06-19 07:59:00');
with cte as
(select 7 as bucket union select 8 union select 9 union select 10 union select 11),
cte1 as
(select bucket,t.*,
floor(hour(startts)) starthour, floor(hour(endts)) endhour
from cte
left join t on cte.bucket between floor(hour(startts)) and floor(hour(endts))
)
select bucket,count(id) nof from cte1 group by bucket
;
+--------+-----+
| bucket | nof |
+--------+-----+
| 7 | 1 |
| 8 | 2 |
| 9 | 1 |
| 10 | 1 |
| 11 | 0 |
+--------+-----+
5 rows in set (0.001 sec)
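To double-check the counts, the same bucket logic can be replayed in a few lines of Python. This is only a sketch: the function name `count_per_hour` is made up, and like the query it compares whole hours, so it assumes no shift crosses midnight.

```python
from datetime import datetime

# The three shifts inserted into table t above, as (startts, endts) pairs.
shifts = [
    (datetime(2022, 6, 19, 8, 30), datetime(2022, 6, 19, 10, 0)),
    (datetime(2022, 6, 19, 8, 30), datetime(2022, 6, 19, 8, 45)),
    (datetime(2022, 6, 19, 7, 0), datetime(2022, 6, 19, 7, 59)),
]

def count_per_hour(shifts, buckets):
    """Count, for each hour bucket, the shifts whose start hour..end hour
    range contains it (the BETWEEN condition of the left join)."""
    counts = {bucket: 0 for bucket in buckets}  # empty buckets stay at 0
    for start, end in shifts:
        for bucket in buckets:
            if start.hour <= bucket <= end.hour:
                counts[bucket] += 1
    return counts

print(count_per_hour(shifts, [7, 8, 9, 10, 11]))
# {7: 1, 8: 2, 9: 1, 10: 1, 11: 0}
```

Note that a shift ending at exactly 10:00 is still counted in the 10 o'clock bucket, both here and in the query.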
If you have a limited number of time buckets, maybe you can do it this way:
WITH CTE AS
(SELECT
COUNTRY,
MONTH,
TIMESTAMP_DIFF(time_b, time_a, MINUTE) dt,
METRIC_a,
METRIC_b
FROM
TABLE_NAME)
SELECT
CASE
WHEN dt BETWEEN 0 AND 10 THEN "0-10"
WHEN dt BETWEEN 11 AND 20 THEN "11-20"
WHEN dt BETWEEN 21 AND 30 THEN "21-30"
WHEN dt BETWEEN 31 AND 40 THEN "31-40"
WHEN dt > 40 THEN ">40"
END as time_bucket,
AVG(METRIC_a),
SUM(METRIC_b)
FROM CTE
GROUP BY time_bucket
I should emphasize, though, that this solution only works well if you have a limited number of buckets. If you have a lot of buckets, you can create a base table with your buckets and then LEFT JOIN it to get your results.
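As a rough illustration of what the CASE expression plus aggregation computes, here is the same idea in plain Python. The rows are invented sample values of (dt in whole minutes, METRIC_a, METRIC_b), and the names `time_bucket` and `summarize` are made up for this sketch.

```python
from collections import defaultdict

def time_bucket(dt):
    """Map a duration in whole minutes to its bucket label."""
    if 0 <= dt <= 10:
        return "0-10"
    if 11 <= dt <= 20:
        return "11-20"
    if 21 <= dt <= 30:
        return "21-30"
    if 31 <= dt <= 40:
        return "31-40"
    if dt > 40:
        return ">40"
    return None  # negative durations fall through, like a CASE without ELSE

def summarize(rows):
    """Group rows by bucket and return {bucket: (avg of metric_a, sum of metric_b)}."""
    groups = defaultdict(list)
    for dt, metric_a, metric_b in rows:
        groups[time_bucket(dt)].append((metric_a, metric_b))
    return {
        bucket: (sum(a for a, _ in values) / len(values), sum(b for _, b in values))
        for bucket, values in groups.items()
    }

# Invented sample data: (dt, metric_a, metric_b)
sample = [(5, 1.0, 10), (10, 3.0, 20), (15, 4.0, 5), (45, 2.0, 1)]
print(summarize(sample))
# {'0-10': (2.0, 30), '11-20': (4.0, 5), '>40': (2.0, 1)}
```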
Just use a subquery for each column mentioning the required timestamp in between, also make sure your start_time and end_time columns are timestamp types. For more information, please share the table structure, sample data, and expected output
If I understood well, this would be
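SELECT s.hour, (SELECT COUNT(*)
FROM employee e
WHERE e.start_time <= s.hour
AND e.end_time >= s.hour) AS working
FROM schedule s
Here schedule is taken to be a table with one row per hour to report on (in a column named hour), and employee holds the start_time and end_time of each shift.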
I have two tables, both with a Time column as timestamp type which is filled by default when the row is created: Table1 is updated approximately every 10 seconds:
Time | Val_1a | Val_2a | Val_3a
2021-11-06 13:59:53 | 15 | 10 | 35
2021-11-06 14:00:02 | 12 | 15 | 34
.................
2021-11-06 14:05:25 | 11 | 13 | 35
2021-11-06 14:05:35 | 11 | 17 | 36
Table2 is updated every hour after mathematical operations on table1:
Time | Var_1b | Var_2b | Var_3b
2021-11-06 11:00:00 | 2 | 15 | 30
2021-11-06 12:00:00 | 8 | 12 | 32
2021-11-06 13:00:00 | 12 | 11 | 35
What I would like to get but I'm not able to do in any way, is:
1) Check that the last table1.Val_2a value is greater than the first table1.Val_2a value written at the beginning of the current hour (with the tables above, check if 17 > 15). If this condition is not met, the entire query must return 0; otherwise:
2a) If the last row in table2 refers to the previous day, then the query result is simply the difference of the two table1.Val_2a values (17 - 15 = 2)
2b) Otherwise their difference is calculated as at point 2a (17-15 = 2) and it is added to the table2.Var_1b value (2 + 12 = 14)
I hope I was able to explain it clearly, and that all of it is possible with a single query. Thanks everyone for the support.
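To pin the rules down before turning them into SQL, here is a rough Python sketch of the three points above. It assumes table1 is available as (Time, Val_2a) pairs and table2 as (Time, Var_1b) pairs, that table2 is not empty, and that "now" is passed in; the function name `compute_result` is made up for illustration.

```python
from datetime import datetime

def compute_result(table1, table2, now):
    """Apply rules 1, 2a and 2b to the rows of the current hour."""
    hour_start = now.replace(minute=0, second=0, microsecond=0)
    current = sorted((t, v) for t, v in table1 if t >= hour_start)
    if not current:
        return 0
    first_val, last_val = current[0][1], current[-1][1]
    if last_val <= first_val:              # rule 1 not met
        return 0
    diff = last_val - first_val
    last_time, last_var_1b = max(table2)   # most recent table2 row
    if last_time.date() < now.date():      # rule 2a: last row is from a previous day
        return diff
    return diff + last_var_1b              # rule 2b

# Val_2a values from the table1 sample, Var_1b values from the table2 sample.
table1 = [
    (datetime(2021, 11, 6, 13, 59, 53), 10),
    (datetime(2021, 11, 6, 14, 0, 2), 15),
    (datetime(2021, 11, 6, 14, 5, 25), 13),
    (datetime(2021, 11, 6, 14, 5, 35), 17),
]
table2 = [
    (datetime(2021, 11, 6, 11, 0), 2),
    (datetime(2021, 11, 6, 12, 0), 8),
    (datetime(2021, 11, 6, 13, 0), 12),
]
print(compute_result(table1, table2, datetime(2021, 11, 6, 14, 5, 40)))  # 14
```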
Sorry for adding this as an answer, but I couldn't add the image to a comment.
This is the query I used to test the CASE clause:
SELECT t1.dtm, t1.Val_2a2, t1.Val_2a1,
CASE WHEN Val_2a2 > Val_2a1
THEN Val_2a2-Val_2a1 ELSE 0 END AS ValF
FROM
(SELECT DATE_FORMAT(time, '%Y-%m-%d %H:00:00') dtm,
SUBSTRING_INDEX(GROUP_CONCAT(Val_2a ORDER BY time),',',1) Val_2a1,
SUBSTRING_INDEX(GROUP_CONCAT(Val_2a ORDER BY time DESC),',',1) Val_2a2
FROM table1
GROUP BY dtm) t1
and this is the unexpected result:
[image: query result]
It is possible in a single query, but different people will have different methods of doing it. Whatever the method, I personally think the most important part is to keep the logic intact. From the details in your question, I assume this is the kind of query you're looking for:
SELECT t1.dtm, t1.Val_2a2, t1.Val_2a1, t2.Val_1b2,
CASE WHEN Val_2a2 > Val_2a1
THEN Val_2a2-Val_2a1+Val_1b2 ELSE 0 END AS ValF
FROM
(SELECT DATE_FORMAT(time, '%Y-%m-%d %H:00:00') dtm,
SUBSTRING_INDEX(GROUP_CONCAT(Val_2a ORDER BY time),',',1) Val_2a1 ,
SUBSTRING_INDEX(GROUP_CONCAT(Val_2a ORDER BY time DESC),',',1) Val_2a2
FROM table1
GROUP BY dtm) t1
LEFT JOIN
(SELECT DATE(time) dtm,
SUBSTRING_INDEX(GROUP_CONCAT(Val_1b ORDER BY time DESC),',',1) Val_1b2
FROM table2
GROUP BY dtm) t2
ON DATE(t1.dtm)=t2.dtm;
Hoping it can help someone else: after some more tests, this is the final query I ended up with, considering I just need a value on the fly without storing it.
Of course, any comments from the experts are more than appreciated.
Thanks to all
SELECT
CASE WHEN
(ABS(t1.Val_2a2) - ABS(t1.Val_2a1)) BETWEEN 0 AND 30
THEN t1.Val_2a2-t1.Val_2a1+t2.Val_1b2
ELSE t2.Val_1b2
END AS My_result
FROM
(SELECT DATE_FORMAT(Time, '%Y-%m-%d %H:00:00') dtm,
(SELECT Val_2a FROM table1 WHERE Time >= DATE_FORMAT(NOW(),"%Y-%m-%d %H:00:00") ORDER BY Time LIMIT 1) Val_2a1,
(SELECT Val_2a FROM table1 WHERE Time >= DATE_FORMAT(NOW(),"%Y-%m-%d %H:00:00") ORDER BY Time DESC LIMIT 1) Val_2a2
FROM table1
GROUP BY dtm
ORDER BY Time DESC LIMIT 1) t1
LEFT JOIN
(SELECT (Time) dtm,
(Val_1b) Val_1b2
FROM table2
GROUP BY dtm ORDER BY dtm DESC LIMIT 1) t2
ON DATE(t1.dtm)= DATE(t2.dtm)
I have a table like this:
id month status
1 1997-11-01 A
1 2015-08-01 B
2 2010-01-01 A
2 2010-02-01 B
2 2012-10-01 C
That I would like to format to be:
id month lead_month status
1 1997-11-01 2015-08-01 A
1 2015-08-01 NOW() B
2 2010-01-01 2010-02-01 A
2 2010-02-01 2012-10-01 B
2 2012-10-01 NOW() C
MySQL is new to me, and I have trouble wrapping my head around variables. I would prefer to use a simple LEAD() with a PARTITION but unfortunately, I can't.
Here's my attempt, that doesn't work:
SET @lead = '1995-01-01'; -- arbitrary floor
select id, month, status, @lead, @lead:=month from table
The output looks like this, which also doesn't check if the id's from row to row are the same:
id month lead_month status
1 1997-11-01 1995-01-01 A
1 2015-08-01 1997-11-01 B
2 2010-01-01 2015-08-01 A
2 2010-02-01 2010-01-01 B
2 2012-10-01 2010-02-01 C
Don't muck around with variables in MySQL. That sort of logic would better reside in whatever language you are using for your application. This, however, can be done in SQL.
My first instinct is simply to save that data in an extra column. Don't worry about the size of the db: there aren't enough months in the universe for it to become a problem.
There is also something wrong with your ids: these should almost always be primary keys, i.e. unique.
If you insist on your scheme, you can use a join. Assuming consecutive unique ids:
SELECT a.id, a.month, b.month AS lead_month, a.status FROM `table` AS a LEFT JOIN `table` AS b ON b.id = a.id + 1;
You can use a correlated subquery:
select t.*,
(select t2.month
from t t2
where t.id = t2.id and t2.month > t.month
order by t2.month
limit 1
) as next_month
from t;
If you want to replace the value for the last month for each id, then you can use coalesce():
select t.*,
coalesce((select t2.month
from t t2
where t.id = t2.id and t2.month > t.month
order by t2.month
limit 1
), now()) as next_month
from t;
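If it helps to see the "next month per id" logic outside SQL, here is a small Python sketch on the sample rows from the question. The function name `add_lead_month` and the `fallback` parameter (standing in for NOW()) are assumptions made for illustration.

```python
from datetime import date

# Sample rows from the question as (id, month, status).
rows = [
    (1, date(1997, 11, 1), "A"),
    (1, date(2015, 8, 1), "B"),
    (2, date(2010, 1, 1), "A"),
    (2, date(2010, 2, 1), "B"),
    (2, date(2012, 10, 1), "C"),
]

def add_lead_month(rows, fallback):
    """Return (id, month, lead_month, status) rows, where lead_month is the
    next month of the same id, or fallback for the last row of each id."""
    ordered = sorted(rows, key=lambda row: (row[0], row[1]))
    result = []
    for index, (row_id, month, status) in enumerate(ordered):
        following = ordered[index + 1] if index + 1 < len(ordered) else None
        same_id = following is not None and following[0] == row_id
        lead = following[1] if same_id else fallback
        result.append((row_id, month, lead, status))
    return result

for row in add_lead_month(rows, fallback=date.today()):
    print(row)
```

The check on the id is what the variable-based attempt was missing: the lead value is only taken from the next row when that row belongs to the same id.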
I have a table in my database that contains an ID and DATETIME column, here is some sample data:
ID | DATETIME
1 | 2014-05-06 01:12
1 | 2014-05-06 01:30
1 | 2014-05-06 01:45
1 | 2014-05-06 02:59
2 | 2014-05-06 01:17
2 | 2014-05-06 01:18
2 | 2014-05-06 01:19
2 | 2014-05-06 02:00
I need to produce a query that determines the ID belonging to the object that has the longest time between its DATETIME values, where the time between consecutive DATETIME values does not exceed 20 minutes.
For example, in the sample data, I would want to return 1, as it has DATETIME values spanning 01:12 - 01:45 without a difference of more than 20 minutes between consecutive DATETIME values.
Thanks.
It looks like you will need a self-join. Because if you had 10 entries for an ID, your 20 minute gap might be between entries 3-6 vs 1-4 or even 4-9. So the second instance of the join would be on the same ID and have a date time higher than that of the primary entry, but less than 20 minutes. Then, it could be ordered by the time-gap and limit to the one you want. Something like:
select
YT.ID,
YT.DTColumn,
MAX( YT2.DTColumn ) as MaxDateWithin20Minutes
from
YourTable YT
JOIN YourTable YT2
ON YT.ID = YT2.ID
AND YT.DTColumn < YT2.DTColumn
AND YT2.DTColumn <= date_add( YT.DTColumn, INTERVAL 20 MINUTE )
group by
YT.ID,
YT.DTColumn
order by
timediff(MAX( YT2.DTColumn ), YT.DTColumn) DESC
limit
1
You need to get the next (or previous) value and get the time difference. I think the following does what you want:
select t.*
from (select t.*,
(select t2.datetime
from `table` t2
where t2.id = t.id and t2.datetime < t.datetime
order by t2.datetime desc
limit 1
) prev_datetime
from `table` t
) t
where datetime <= prev_datetime + interval 20 minute
order by timestampdiff(second, prev_datetime, datetime) desc
limit 1;
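For comparison, here is how I read the requirement when written as a plain loop: per ID, chain consecutive values whose gap does not exceed 20 minutes and keep the longest chain. This is a Python sketch on the question's sample data, not a translation of either query above; the function name `id_with_longest_run` is made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample data from the question as (ID, DATETIME) pairs.
rows = [
    (1, datetime(2014, 5, 6, 1, 12)), (1, datetime(2014, 5, 6, 1, 30)),
    (1, datetime(2014, 5, 6, 1, 45)), (1, datetime(2014, 5, 6, 2, 59)),
    (2, datetime(2014, 5, 6, 1, 17)), (2, datetime(2014, 5, 6, 1, 18)),
    (2, datetime(2014, 5, 6, 1, 19)), (2, datetime(2014, 5, 6, 2, 0)),
]

def id_with_longest_run(rows, max_gap=timedelta(minutes=20)):
    """Return (id, span) of the longest run of consecutive timestamps in
    which no gap between neighbours exceeds max_gap."""
    by_id = defaultdict(list)
    for row_id, stamp in rows:
        by_id[row_id].append(stamp)
    best_id, best_span = None, timedelta(0)
    for row_id, stamps in by_id.items():
        stamps.sort()
        run_start = stamps[0]
        for previous, current in zip(stamps, stamps[1:]):
            if current - previous > max_gap:
                run_start = current  # gap too large: start a new run here
            if current - run_start > best_span:
                best_id, best_span = row_id, current - run_start
    return best_id, best_span

print(id_with_longest_run(rows))  # (1, datetime.timedelta(seconds=1980))
```

For ID 1 the run 01:12 - 01:45 lasts 33 minutes (1980 seconds); for ID 2 the longest run is only 2 minutes.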
I am trying to generate a table in the following format.
Proday | 2014-04-01 | 2014-03-01
--------------------------------
1 | 12 | 17
2 | 6 | 0
7 | 0 | 24
13 | 3 | 7
Prodays (the duration between two timestamps) is a calculated value, and the data for each month is a COUNT. I can output the data for a single month, but am having trouble joining queries for additional months. The index (prodays) may not match for each month, e.g. 2014-04-01 may not have any data for Proday 7, whereas 2014-03-01 may not have Proday 2. Those cells should show 0 or NULL.
I suspect FULL OUTER JOIN is what should do the trick, but I have read that it's not possible in MySQL?
This is the query to get data for a single month:
SELECT round((protime - createtime) / 86400) AS prodays, COUNT(id) AS '2014-04-01'
FROM `tbl_users` as t1
WHERE status = 1 AND DATE_FORMAT(FROM_UNIXTIME(createtime),'%Y-%m-%d') >= '2014-04-01'
AND DATE_FORMAT(FROM_UNIXTIME(createtime),'%Y-%m-%d') <= LAST_DAY('2014-04-01')
GROUP BY prodays
ORDER BY `prodays` ASC
How can I join/union an additional query to create a column for 2014-03-01?
You want to use conditional aggregation -- that is, move the filtering logic from the where clause to the select clause:
SELECT round((protime - createtime) / 86400) AS prodays,
sum(DATE_FORMAT(FROM_UNIXTIME(createtime),'%Y-%m-%d') >= '2014-04-01' AND
DATE_FORMAT(FROM_UNIXTIME(createtime),'%Y-%m-%d') <= LAST_DAY('2014-04-01')
) as `2014-04-01`,
sum(DATE_FORMAT(FROM_UNIXTIME(createtime),'%Y-%m-%d') >= '2014-03-01' AND
DATE_FORMAT(FROM_UNIXTIME(createtime),'%Y-%m-%d') <= LAST_DAY('2014-03-01')
) as `2014-03-01`
FROM `tbl_users` as t1
WHERE status = 1
GROUP BY prodays
ORDER BY `prodays` ASC;
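To show what the conditional aggregation produces, here is a rough Python sketch of the same pivot. The user rows are invented (createtime, protime) Unix timestamps that are assumed to have status = 1 already, timestamps are interpreted as UTC (FROM_UNIXTIME uses the session time zone), and the names `unix` and `pivot_counts` are made up for illustration.

```python
import math
from datetime import datetime, timezone

def unix(year, month, day):
    """Unix timestamp for midnight UTC on the given day."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

def pivot_counts(users, months):
    """Return {prodays: {month: count}} with one column per requested month."""
    table = {}
    for createtime, protime in users:
        # Round half up, assuming protime >= createtime.
        prodays = math.floor((protime - createtime) / 86400 + 0.5)
        row = table.setdefault(prodays, {month: 0 for month in months})
        month = datetime.fromtimestamp(createtime, tz=timezone.utc).strftime("%Y-%m-01")
        if month in row:  # rows from other months add 0, as in the query
            row[month] += 1
    return dict(sorted(table.items()))

# Invented sample users: two created in April 2014, one in March 2014.
users = [
    (unix(2014, 4, 10), unix(2014, 4, 10) + 86400),
    (unix(2014, 4, 12), unix(2014, 4, 12) + 86400),
    (unix(2014, 3, 5), unix(2014, 3, 5) + 7 * 86400),
]
print(pivot_counts(users, ["2014-04-01", "2014-03-01"]))
# {1: {'2014-04-01': 2, '2014-03-01': 0}, 7: {'2014-04-01': 0, '2014-03-01': 1}}
```

Every prodays value gets a cell for every month, so combinations with no data come out as 0 without any outer join.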