Group By 3 columns (JobId, StartTime, EndTime) for continuous days in MySQL - mysql

I want to group by JobId, StartTime and EndTime, but only across consecutive days. A row that does not form part of such a range should be discarded. The Ids should also be collected into one column per grouping.
Id  Date        StartTime  EndTime   JobId
1   2021-08-23  08:30:00   19:00:00  1
2   2021-08-24  08:30:00   19:00:00  1
3   2021-08-24  12:30:00   14:30:00  2
4   2021-08-24  15:30:00   19:00:00  1
5   2021-08-25  08:30:00   19:00:00  1
6   2021-08-25  12:30:00   14:30:00  2
7   2021-08-25  15:45:00   19:00:00  1
8   2021-08-26  08:30:00   09:30:00  1
9   2021-08-26  15:30:00   19:00:00  1
10  2021-08-26  10:30:00   11:00:00  1
11  2021-08-26  12:00:00   14:30:00  1
12  2021-08-27  08:30:00   09:30:00  1
13  2021-08-27  11:00:00   11:15:00  1
14  2021-08-27  11:30:00   14:30:00  1
15  2021-08-28  08:30:00   09:30:00  1
Using the sample data above, three groupings form such a continuous range:
Range 1 consists of Ids 1, 2 & 5: 2021-08-23 to 2021-08-25, 08:30:00 to 19:00:00
Range 2 consists of Ids 3 & 6: 2021-08-24 to 2021-08-25, 12:30:00 to 14:30:00
Range 3 consists of Ids 8, 12 & 15: 2021-08-26 to 2021-08-28, 08:30:00 to 09:30:00
The end result should be:
JobId  StartDate   EndDate     StartTime  EndTime   Ids
1      2021-08-23  2021-08-25  08:30:00   19:00:00  1,2,5
2      2021-08-24  2021-08-25  12:30:00   14:30:00  3,6
1      2021-08-26  2021-08-28  08:30:00   09:30:00  8,12,15
MySQL 8.0.23

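Assuming that the combination (JobId, `Date`, StartTime, EndTime) is unique, and that each (JobId, StartTime, EndTime) combination forms at most one run of consecutive days, you may use: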
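SELECT JobId,
       MIN(`Date`) AS StartDate,
       MAX(`Date`) AS EndDate,
       StartTime,
       EndTime,
       GROUP_CONCAT(Id ORDER BY Id) AS Ids              -- explicit order; otherwise unspecified
FROM test
GROUP BY JobId,
         StartTime,
         EndTime
HAVING COUNT(*) > 1                                     -- discard rows that stand alone
   AND DATEDIFF(EndDate, StartDate) = COUNT(*) - 1      -- no gap between first and last day
ORDER BY StartDate, StartTime;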
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=fce8590f72ac1d50cd9e89add3ed01e7
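The query above treats each (JobId, StartTime, EndTime) group as a single candidate range, so a group containing two separate runs of consecutive days would be dropped entirely. The following is a minimal Python sketch of the more general "gaps and islands" logic, not the answerer's method: it splits each group wherever the next date is not exactly one day later. The helper names group_key and consecutive_ranges are hypothetical, and the sketch assumes the same uniqueness of (JobId, Date, StartTime, EndTime).

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (Id, Date, StartTime, EndTime, JobId)
rows = [
    (1, date(2021, 8, 23), "08:30:00", "19:00:00", 1),
    (2, date(2021, 8, 24), "08:30:00", "19:00:00", 1),
    (3, date(2021, 8, 24), "12:30:00", "14:30:00", 2),
    (4, date(2021, 8, 24), "15:30:00", "19:00:00", 1),
    (5, date(2021, 8, 25), "08:30:00", "19:00:00", 1),
    (6, date(2021, 8, 25), "12:30:00", "14:30:00", 2),
    (7, date(2021, 8, 25), "15:45:00", "19:00:00", 1),
    (8, date(2021, 8, 26), "08:30:00", "09:30:00", 1),
    (9, date(2021, 8, 26), "15:30:00", "19:00:00", 1),
    (10, date(2021, 8, 26), "10:30:00", "11:00:00", 1),
    (11, date(2021, 8, 26), "12:00:00", "14:30:00", 1),
    (12, date(2021, 8, 27), "08:30:00", "09:30:00", 1),
    (13, date(2021, 8, 27), "11:00:00", "11:15:00", 1),
    (14, date(2021, 8, 27), "11:30:00", "14:30:00", 1),
    (15, date(2021, 8, 28), "08:30:00", "09:30:00", 1),
]


def group_key(row):
    """Grouping columns: (JobId, StartTime, EndTime)."""
    return (row[4], row[2], row[3])


def consecutive_ranges(rows):
    """Return (JobId, StartDate, EndDate, StartTime, EndTime, Ids) for every
    run of two or more consecutive days within a group."""
    result = []

    def flush(run):
        # Keep only runs spanning at least two days; single rows are discarded.
        if len(run) > 1:
            job, start, end = group_key(run[0])
            ids = ",".join(str(r[0]) for r in run)
            result.append((job, run[0][1], run[-1][1], start, end, ids))

    # Sort by group, then by date, so each group's rows arrive in date order.
    ordered = sorted(rows, key=lambda r: (group_key(r), r[1]))
    for _, group in groupby(ordered, key=group_key):
        run = []
        for row in group:
            # A date that is not exactly one day after the previous one ends the run.
            if run and row[1] - run[-1][1] != timedelta(days=1):
                flush(run)
                run = []
            run.append(row)
        flush(run)
    return sorted(result, key=lambda t: (t[1], t[3]))
```

Calling consecutive_ranges(rows) on the sample data yields the three ranges listed in the question; adding a later, separate run to an existing group produces a fourth range instead of invalidating the group.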

Related

Spark scala joining with subquery with limit

I need to join two tables on fake_id, but table 2 contains more than one matching record per fake_id, so I need to match the record where table2.end_time >= table1.event_time and table2.start_time <= table1.event_time.
If more than one record in table 2 matches this condition, I need to consider only the latest one by updated_time.
Here is what I tried:
spark.sql("select t1.fake_id, t1.attribute_1,t1.event_time,t22.end_time from table1 t1 left outer join (
select fake_id, end_time from table2 t2 where t2.fake_id=t1.fake_id and t2.end_time >= t1.event_time and t2.start_time <= t1.event_time order by t2.updated_time desc limit 1)
as t22 on t1.fake_id=t22.fake_id")
For the above statement, Spark throws an error about the unknown column t1.fake_id.
Table.1 -
---------------------------------------------------------------------------
fake_id attribute_1 event_time
---------------------------------------------------------------------------
1 attr_val_11 2020-08-01 05:00:00
2 attr_val_12 2020-08-01 15:00:00
3 attr_val_31 2020-08-03 07:00:00
4 attr_val_41 2020-08-01 05:00:00
Table.2 -
---------------------------------------------------------------------------
fake_id start_time end_time updated_time
---------------------------------------------------------------------------
1 2020-08-01 02:00:00 2020-08-01 08:00:00 2020-08-01 00:00:00
2 2020-08-01 04:00:00 2020-08-01 23:00:00 2020-08-01 00:00:00
3 2020-08-03 02:00:00 2020-08-03 08:00:00 2020-08-03 08:00:00
3 2020-08-03 05:00:00 2020-08-03 10:00:00 2020-08-03 12:00:00
3 2020-08-04 05:00:00 2020-08-04 10:00:00 2020-08-04 12:00:00
4 2020-08-01 08:00:00 2020-08-01 18:00:00 2020-08-01 18:00:00
4 2020-08-01 02:00:00 2020-08-01 05:00:00 2020-08-01 22:00:00
Result :
----------------------------------------------------------------------------------------------
fake_id attribute_1 event_time start_time end_time
----------------------------------------------------------------------------------------------
1 attr_val_11 2020-08-01 05:00:00 2020-08-01 02:00:00 2020-08-01 08:00:00
2 attr_val_12 2020-08-01 15:00:00 2020-08-01 04:00:00 2020-08-01 23:00:00
3 attr_val_31 2020-08-03 07:00:00 2020-08-03 05:00:00 2020-08-03 10:00:00
4 attr_val_41 2020-08-01 05:00:00 2020-08-01 02:00:00 2020-08-01 05:00:00
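Join with BETWEEN, number the matches for each table1 row with row_number() ordered by updated_time descending, and keep only the first row of each partition (the one with the latest updated_time).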
spark.sql('''
    select
        fake_id,
        attribute_1,
        event_time,
        start_time,
        end_time
    from (
        select
            t1.fake_id,
            t1.attribute_1,
            t1.event_time,
            t2.start_time,
            t2.end_time,
            row_number() OVER (PARTITION BY t1.fake_id, t1.attribute_1 ORDER BY t2.updated_time DESC) as rank
        from
            table1 t1
        left join
            table2 t2
        on
            t1.fake_id = t2.fake_id and
            t1.event_time between t2.start_time and t2.end_time
    ) t
    where
        rank = 1
    order by
        fake_id
''').show()
+-------+-----------+-------------------+-------------------+-------------------+
|fake_id|attribute_1|         event_time|         start_time|           end_time|
+-------+-----------+-------------------+-------------------+-------------------+
|      1|attr_val_11|2020-08-01 05:00:00|2020-08-01 02:00:00|2020-08-01 08:00:00|
|      2|attr_val_12|2020-08-01 15:00:00|2020-08-01 04:00:00|2020-08-01 23:00:00|
|      3|attr_val_31|2020-08-03 07:00:00|2020-08-03 05:00:00|2020-08-03 10:00:00|
|      4|attr_val_41|2020-08-01 05:00:00|2020-08-01 02:00:00|2020-08-01 05:00:00|
+-------+-----------+-------------------+-------------------+-------------------+
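To make the "latest match per row" logic easier to check without a Spark session, here is a small plain-Python sketch of the same idea. It is only an illustration of the selection rule, not a replacement for the query: for each table1 row it keeps the table2 rows whose interval contains event_time (inclusive on both ends, like BETWEEN) and picks the one with the greatest updated_time. The helper names ts and latest_match are hypothetical.

```python
from datetime import datetime


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' timestamp."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# table1: (fake_id, attribute_1, event_time)
table1 = [
    (1, "attr_val_11", ts("2020-08-01 05:00:00")),
    (2, "attr_val_12", ts("2020-08-01 15:00:00")),
    (3, "attr_val_31", ts("2020-08-03 07:00:00")),
    (4, "attr_val_41", ts("2020-08-01 05:00:00")),
]

# table2: (fake_id, start_time, end_time, updated_time)
table2 = [
    (1, ts("2020-08-01 02:00:00"), ts("2020-08-01 08:00:00"), ts("2020-08-01 00:00:00")),
    (2, ts("2020-08-01 04:00:00"), ts("2020-08-01 23:00:00"), ts("2020-08-01 00:00:00")),
    (3, ts("2020-08-03 02:00:00"), ts("2020-08-03 08:00:00"), ts("2020-08-03 08:00:00")),
    (3, ts("2020-08-03 05:00:00"), ts("2020-08-03 10:00:00"), ts("2020-08-03 12:00:00")),
    (3, ts("2020-08-04 05:00:00"), ts("2020-08-04 10:00:00"), ts("2020-08-04 12:00:00")),
    (4, ts("2020-08-01 08:00:00"), ts("2020-08-01 18:00:00"), ts("2020-08-01 18:00:00")),
    (4, ts("2020-08-01 02:00:00"), ts("2020-08-01 05:00:00"), ts("2020-08-01 22:00:00")),
]


def latest_match(table1, table2):
    """Left-join semantics: every table1 row appears once, with the matching
    table2 interval that has the latest updated_time (or None, None)."""
    out = []
    for fake_id, attribute, event_time in table1:
        matches = [
            row for row in table2
            if row[0] == fake_id and row[1] <= event_time <= row[2]
        ]
        best = max(matches, key=lambda row: row[3], default=None)
        start, end = (best[1], best[2]) if best else (None, None)
        out.append((fake_id, attribute, event_time, start, end))
    return out
```

For fake_id 3, two intervals contain the event time and the one updated at 2020-08-03 12:00:00 wins, which agrees with the expected result in the question.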

Match between two row data

On a given date, I want to select only the time windows whose match count across events equals a given integer (max_hits).
date id_event id_timewindows max_hits
2014-12-16 1 1,2,3 2
2014-12-16 2 2,3,4 2
2014-12-16 3 4 2
2014-12-16 4 5,6 2
2014-12-16 5 7,8 2
2014-12-16 6 9 2
The result I want is:
date id_event id_timewindows max_hits
2014-12-16 1 2,3 2
2014-12-16 2 2,3 2
Does anybody have an idea how to do this in MySQL?
UPDATE:
So I have to explain more. id_timewindows is not a string attribute: the first table is the result of a view grouped by id_event, and one id_event has multiple id_timewindow values.
View result before grouping:
date id_event id_timewindow begin end max_rooms
2014-12-16 1 1 06:00:00 07:00:00 2
2014-12-16 1 2 07:00:00 08:00:00 2
2014-12-16 1 3 08:00:00 09:00:00 2
2014-12-16 2 2 07:00:00 08:00:00 2
2014-12-16 2 3 08:00:00 09:00:00 2
2014-12-16 2 4 09:00:00 10:00:00 2
2014-12-16 3 4 09:00:00 10:00:00 2
2014-12-16 4 6 11:00:00 12:00:00 2
2014-12-16 4 5 10:00:00 11:00:00 2
2014-12-16 5 7 12:00:00 13:00:00 2
2014-12-16 5 8 13:00:00 14:00:00 2
2014-12-16 6 9 14:00:00 15:00:00 2
I use GROUP BY id_event and the id_timewindows is group_concat(id_timewindow SEPARATOR ',')
I found a solution:
SELECT
    date,
    id_timewindow,
    max_rooms,
    COUNT(concat(date, id_timewindow)) as counter
FROM `timewindows_reserved`
GROUP BY
    date, id_timewindow
HAVING counter < max_rooms
This returns the inverse of what I want, but I can use it.
date id_timewindow max_hits
2014-12-16 1 2
2014-12-16 4 2
2014-12-16 5 2
2014-12-16 6 2
2014-12-16 7 2
2014-12-16 8 2
2014-12-16 9 2
If I group by date and build a list from id_timewindow, I receive the same result as I wanted, but with inverse logic: the free time windows rather than the reserved ones. If I reverse the condition, I get the result:
SELECT
    date,
    id_event,
    GROUP_CONCAT(id_timewindow SEPARATOR ','),
    max_rooms,
    COUNT(concat(date, id_timewindow)) as counter
FROM `table`
GROUP BY
    date, id_timewindow
HAVING counter >= max_rooms

SQL return last 12 weeks data, grouped by week, starting last Monday

I've been working on a MySQL query that sorts data into weeks but I just can't figure out how to do it.
I would like to sort the data into weeks for the current and last 11 weeks. Each week will run from Monday 00:00:00 to Sunday 23:59:59.
(Taking today's date as 2014-12-04)...
Week 1: 2014-12-01 > 2014-12-07 - (Last Monday 00:00:00 to next Sunday 23:59:59)
Week 2: 2014-11-24 > 2014-11-30 - (Monday before last 00:00:00 to last Sunday 23:59:59)
Week 3: 2014-11-17 > 2014-11-23 - (Monday before before last 00:00:00 to last last Sunday 23:59:59)
And so on...
For each week the value field data will be totalled.
I need the data returned to be in the format:
datetime: The first date (Always a Monday) of that week.
value: The total of all the values in that week.
For example, the returned data:
Week 1: 2014-12-01 : Totalled value=11
Week 2: 2014-11-24 : Totalled value=3
Week 3: 2014-11-17 : Totalled value=9
Week 4: 2014-11-10 : Totalled value=7
Table_1 data:
table1id datetime value
1 2014-09-01 06:00:00 4
2 2014-09-04 17:00:00 6
3 2014-09-09 18:00:00 9
4 2014-09-15 07:00:00 4
5 2014-09-20 10:00:00 2
6 2014-09-25 10:00:00 3
7 2014-09-30 09:00:00 8
8 2014-10-01 14:00:00 5
9 2014-10-05 10:00:00 7
10 2014-10-09 18:00:00 3
11 2014-10-15 05:00:00 4
12 2014-10-20 07:00:00 8
13 2014-10-24 16:00:00 9
14 2014-10-29 15:00:00 5
15 2014-10-31 16:00:00 7
16 2014-11-05 09:00:00 2
17 2014-11-10 08:00:00 4
18 2014-11-15 16:00:00 3
19 2014-11-20 10:00:00 9
20 2014-11-25 10:00:00 2
21 2014-11-30 10:00:00 1
22 2014-12-01 15:00:00 7
23 2014-12-04 18:00:00 2
I 'could' just pull all the data unsorted for the date range using PHP and sort it from there but I'd rather the MySQL server do it.
Any suggestions would be greatly appreciated. :-)
Based on the "generate days from a date range" technique, you can do something like this:
select mondays.week, mondays.day, sum(Table_1.value) as total
from (
    -- one row per week; week 1 starts on the Monday of the current week
    select a.a + 1 as week,
           curdate() - interval weekday(curdate()) day - interval (7 * a.a) day as day
    from (select 0 as a union all select 1 union all select 2 union all select 3
          union all select 4 union all select 5 union all select 6 union all select 7
          union all select 8 union all select 9 union all select 10 union all select 11) as a
) as mondays
join Table_1
  -- half-open range, so a row at exactly Monday 00:00:00 is counted in one week only
  on Table_1.datetime >= mondays.day
 and Table_1.datetime < mondays.day + interval 7 day
group by mondays.week, mondays.day;
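As a cross-check of the week arithmetic, here is a minimal Python sketch of the same bucketing, under the assumption that a week runs from Monday 00:00:00 up to, but not including, the following Monday. The function name weekly_totals and the variable table_1 are hypothetical, and only the last ten rows of the question's sample data are included. Unlike the join, the sketch reports weeks without rows with a total of 0.

```python
from datetime import date, datetime, timedelta

# (datetime, value) pairs: the last ten rows of Table_1 from the question
table_1 = [
    (datetime(2014, 10, 29, 15), 5),
    (datetime(2014, 10, 31, 16), 7),
    (datetime(2014, 11, 5, 9), 2),
    (datetime(2014, 11, 10, 8), 4),
    (datetime(2014, 11, 15, 16), 3),
    (datetime(2014, 11, 20, 10), 9),
    (datetime(2014, 11, 25, 10), 2),
    (datetime(2014, 11, 30, 10), 1),
    (datetime(2014, 12, 1, 15), 7),
    (datetime(2014, 12, 4, 18), 2),
]


def weekly_totals(rows, today, weeks=12):
    """Return (week_number, monday, total) for the current week and the
    preceding weeks - 1 weeks, newest first."""
    # date.weekday() is 0 for Monday, like MySQL's WEEKDAY().
    this_monday = today - timedelta(days=today.weekday())
    mondays = [this_monday - timedelta(weeks=i) for i in range(weeks)]
    totals = {monday: 0 for monday in mondays}
    for when, value in rows:
        day = when.date()
        monday = day - timedelta(days=day.weekday())
        if monday in totals:  # rows outside the 12-week span are ignored
            totals[monday] += value
    return [(i + 1, monday, totals[monday]) for i, monday in enumerate(mondays)]


for week, monday, total in weekly_totals(table_1, date(2014, 12, 4)):
    print(week, monday, total)
```

Mapping each row to its own Monday avoids any range comparison, so a row can never fall into two weeks.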

Get 30 minutes interval data between start and end time in mysql

I have the following table structure in MySQL:
table_id no_people booking_date bookingend_time bookingstart_time
14 2 2014-10-31 2014-10-31 13:30:00 2014-10-31 11:00:00
5 4 2014-10-31 2014-10-31 16:30:00 2014-10-31 14:30:00
6 2 2014-10-31 2014-10-31 17:00:00 2014-10-31 16:00:00
2 4 2014-11-06 2014-11-06 12:30:00 2014-11-06 10:00:00
2 4 2014-10-31 2014-10-31 16:00:00 2014-10-31 14:00:00
3 4 2014-11-01 2014-11-01 09:00:00 2014-11-01 07:30:00
6 2 2014-11-01 2014-11-01 10:00:00 2014-11-01 07:30:00
2 4 2014-11-03 2014-11-03 10:30:00 2014-11-03 08:30:00
5 4 2014-11-04 2014-11-04 10:30:00 2014-11-04 08:30:00
3 4 2014-11-05 2014-11-05 09:30:00 2014-11-05 07:30:00
14 2 2014-11-05 2014-11-05 09:30:00 2014-11-05 07:30:00
I want to retrieve the table_id values whose bookings fall within 30 minutes of a given start and end time.
Example: if I give a booking start time of 10:30 and an end time of 12:30, I should get the row for table 14.
Similarly, it should check all rows and return those between the two times.
My query so far:
SELECT `table_id` FROM `booking` WHERE bookingstart_time>='2014-10-31 10:30:00' AND bookingend_time<='2014-10-31 11:30:00'
Step 1: expand the input time frame by 30 minutes before and 30 minutes after. DATE_ADD() and DATE_SUB() can do that:
DATE_SUB(_input_start_date_here_, INTERVAL 30 MINUTE)
Step 2: rethink your problem in terms of start and end times. Here are the possible cases:
if the booking started during the (expanded) period, then you want this booking in your result
or if the booking started before the period, then you want this booking unless it also ended before the period
on the other hand, if the booking started after the period, then you do not want this booking
The first situation above could be expressed like this:
WHERE bookingstart_time >= DATE_SUB(_input_start_date_here_, INTERVAL 30 MINUTE)
AND bookingstart_time <= DATE_ADD(_input_end_date_here_, INTERVAL 30 MINUTE)
The second condition is left as an exercise. You can also rewrite the above with a more elegant BETWEEN operator.
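One possible reading of the cases above, sketched in Python rather than SQL: the first and second cases together amount to an interval-overlap test against the expanded window, namely that the booking starts no later than the expanded end and ends no earlier than the expanded start. The function name overlaps_expanded and the pad parameter are hypothetical, and the sketch assumes every booking ends at or after its own start.

```python
from datetime import datetime, timedelta


def overlaps_expanded(booking_start, booking_end, window_start, window_end,
                      pad=timedelta(minutes=30)):
    """True if the booking overlaps the window widened by pad on both sides."""
    lo = window_start - pad  # DATE_SUB(start, INTERVAL 30 MINUTE)
    hi = window_end + pad    # DATE_ADD(end, INTERVAL 30 MINUTE)
    # Case 1 (starts inside the window) and case 2 (starts before it but does
    # not end before it) both satisfy this; case 3 (starts after it) does not.
    return booking_start <= hi and booking_end >= lo


# Bookings on 2014-10-31 from the question: (table_id, start, end)
bookings = [
    (14, datetime(2014, 10, 31, 11, 0), datetime(2014, 10, 31, 13, 30)),
    (5, datetime(2014, 10, 31, 14, 30), datetime(2014, 10, 31, 16, 30)),
    (6, datetime(2014, 10, 31, 16, 0), datetime(2014, 10, 31, 17, 0)),
    (2, datetime(2014, 10, 31, 14, 0), datetime(2014, 10, 31, 16, 0)),
]

window_start = datetime(2014, 10, 31, 10, 30)
window_end = datetime(2014, 10, 31, 12, 30)
hits = [table_id for table_id, start, end in bookings
        if overlaps_expanded(start, end, window_start, window_end)]
print(hits)
```

With the 10:30 to 12:30 window from the question, only table 14 overlaps the expanded range of 10:00 to 13:00.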
SELECT restaurant_table FROM rest_restaurantbooking WHERE TIMESTAMPDIFF(SECOND, bookingstart_time, bookingend_time) > 1800;

MySQL Count Numbers Are Off

I am not sure why my numbers are drastically off from each other.
A query with no max id:
SELECT id, DATE_FORMAT(t_stamp, '%Y-%m-%d %H:00:00') as date, COUNT(*) as count
FROM test_ips
WHERE id > 0
AND viewip != ""
GROUP BY HOUR(t_stamp)
ORDER BY t_stamp ASC;
I get:
1 2012-07-18 19:00:00 1313
106 2012-07-18 20:00:00 1567
107 2012-07-19 09:00:00 847
225 2012-07-19 10:00:00 5095
421 2012-07-19 11:00:00 205
423 2012-07-19 12:00:00 900
461 2012-07-19 13:00:00 619
490 2012-07-20 15:00:00 729
575 2012-07-20 16:00:00 1682
1060 2012-07-20 17:00:00 2063
2260 2012-07-20 18:00:00 1417
5859 2012-07-20 21:00:00 1303
7060 2012-07-20 22:00:00 1340
8280 2012-07-20 23:00:00 1211
9149 2012-07-21 00:00:00 1675
10418 2012-07-21 01:00:00 721
11127 2012-07-21 02:00:00 825
But if I add a max id:
AND id <= 8279
I get:
1 2012-07-18 19:00:00 1313
106 2012-07-18 20:00:00 1201
107 2012-07-19 09:00:00 118
225 2012-07-19 10:00:00 196
421 2012-07-19 11:00:00 2
423 2012-07-19 12:00:00 38
461 2012-07-19 13:00:00 20
490 2012-07-20 15:00:00 85
575 2012-07-20 16:00:00 483
1060 2012-07-20 17:00:00 1200
2260 2012-07-20 18:00:00 1200
5859 2012-07-20 21:00:00 1201
7060 2012-07-20 22:00:00 1220
The numbers are WAY off from each other. Something is goofy.
EDIT: Here is my table structure:
id t_stamp bID viewip unique
1 2012-07-18 19:22:20 5 192.168.1.1 1
2 2012-07-18 19:22:21 1 192.168.1.1 1
3 2012-07-18 19:22:22 5 192.168.1.1 0
4 2012-07-18 19:22:22 3 192.168.1.1 1
You are not grouping by ID and I think you intend to.
Try:
SELECT id, DATE_FORMAT(t_stamp, '%Y-%m-%d %H:00:00') as date, COUNT(*) as count
FROM test_ips
WHERE id > 0
AND viewip != ""
GROUP BY id, DATE_FORMAT(t_stamp, '%Y-%m-%d %H:00:00')
ORDER BY t_stamp;
Your query is not consistent.
In your select statement you are displaying the full date.
But you are grouping your data by the hour. So your count statement is taking the count of all the data for each hour of the day.
As an example take your first result:
1 2012-07-18 19:00:00 1313
The count of 1313 includes the records for all of your dates (7/18, 7/19, 7/20, 7/21, 7/22, etc.) that have an hour of 19:00.
But the way your query is set up, it looks like it should be the count of all records for 2012-07-18 19:00:00.
So when you add AND id <= 8279, the rows for 7/21 and some of those for 7/20 are no longer counted, so your count values are now lower.
I'm guessing you mean to group by the date and hour, not just the hour.
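The difference between the two groupings can be reproduced on a handful of rows. The sketch below uses made-up timestamps (only the first two come from the table in the question) and counts them once by hour of day, as GROUP BY HOUR(t_stamp) does, and once by date plus hour.

```python
from collections import Counter
from datetime import datetime

# Hypothetical t_stamp values; the first two appear in the question's table.
stamps = [
    datetime(2012, 7, 18, 19, 22, 20),
    datetime(2012, 7, 18, 19, 22, 21),
    datetime(2012, 7, 19, 19, 5, 0),
    datetime(2012, 7, 20, 19, 40, 0),
    datetime(2012, 7, 19, 9, 0, 0),
]

# Equivalent of GROUP BY HOUR(t_stamp): every date's 19:00 hour shares a bucket.
by_hour = Counter(t.hour for t in stamps)

# Equivalent of GROUP BY DATE_FORMAT(t_stamp, '%Y-%m-%d %H:00:00').
by_date_hour = Counter(t.strftime("%Y-%m-%d %H:00:00") for t in stamps)

print(by_hour[19])                          # rows in the 19:00 hour on any date
print(by_date_hour["2012-07-18 19:00:00"])  # rows in that hour on 2012-07-18 only
```

Grouping by hour alone reports 4 rows for 19:00, while only 2 of them belong to 2012-07-18; this is the same effect that inflates the counts in the original query.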