MySQL group by with start and end times - mysql

I have a table called map_item_group in MySQL that looks like this example:
item_serial | group_code | start_date | end_date
===================================================
item1 | group1 | 2015-01-01 | 2016-01-01
item1 | group2 | 2016-02-01 | 2016-03-15
item2 | group1 | 2015-06-01 | 2016-06-30
item1 | group2 | 2016-05-18 | 2016-06-30
I want to create a MySQL view called group_info that looks like this:
group_code | start_date | end_date | items_string
=======================================================
group1 | 2015-01-01 | 2015-06-01 | item1
group1 | 2015-06-01 | 2016-01-01 | item1,item2
group1 | 2016-01-01 | 2016-06-30 | item2
group2 | 2016-02-01 | 2016-03-15 | item1
group2 | 2016-05-18 | 2016-06-30 | item1
In other words, I want one row for each group showing the items in that group over each time span.
Simply grouping by group_code, start_date and end_date (i.e. SELECT group_code, start_date, end_date, GROUP_CONCAT(item_serial) FROM map_item_group GROUP BY group_code, start_date, end_date) does not give the desired output.
I can imagine ways to do this with subqueries, but subqueries aren't allowed in MySQL views. I can create other views in place of subqueries as a workaround, but I'd rather avoid adding a bunch of extra views to my schema. What's the cleanest way to do this?

First I build a list of all dates (start and end) per group_code using UNION. I called it T1, although a different name would be better because T1 is reused as an alias further out.
Then I use variables to assign a row number to each date (subqueries T1 and T2).
Then I have to duplicate that code to join the result to itself and create the ranges (subquery R).
You could simplify this by making that part a separate view.
Now that I have the ranges, I join back to the original table to see which items belong to each range:
SELECT R.`group_code`, R.`start_date`, R.`end_date`, GROUP_CONCAT(T.item_serial SEPARATOR ', ') items
FROM (
SELECT T1.`group_code`, T1.range_date as start_date, T2.range_date as end_date
FROM (
SELECT `group_code`, range_date,
@rn := IF( @grpCode = `group_code`, @rn + 1 , IF(@grpCode := `group_code`, 1, 1)) as rn
FROM (
SELECT `group_code`, `start_date` as range_date
FROM Table1
UNION
SELECT `group_code`, `end_date` as range_date
FROM Table1
ORDER BY 1, 2
) as T1,
(SELECT @rn := 0, @grpCode := '') r
) T1
JOIN (
SELECT `group_code`, range_date,
@rn := IF( @grpCode = `group_code`, @rn + 1 , IF(@grpCode := `group_code`, 1, 1)) as rn
FROM (
SELECT `group_code`, `start_date` as range_date
FROM Table1
UNION
SELECT `group_code`, `end_date` as range_date
FROM Table1
ORDER BY 1, 2
) as T1,
(SELECT @rn := 0, @grpCode := '') r
) T2
ON T1.rn = T2.rn -1
AND T1.group_code = T2.group_code
) R
JOIN Table1 T
ON R.start_date < T.end_date
AND R.end_date > T.start_date
AND R.group_code = T.group_code
GROUP BY R.`group_code`, R.`start_date`, R.`end_date`
ORDER BY 1,2, 4
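If MySQL 8 or later is an option (an assumption, since the question does not state a version), the duplicated row-numbering subqueries can probably be replaced by LEAD(). The sketch below is not the query above: it runs the same idea against an in-memory SQLite database from Python purely so that it is self-contained, and the CTE names boundaries and ranges are made up for illustration. The SQL sticks to syntax that MySQL 8 also accepts, but it has only been reasoned through against SQLite.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL 8+ (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE map_item_group (
    item_serial TEXT, group_code TEXT, start_date TEXT, end_date TEXT
);
INSERT INTO map_item_group VALUES
    ('item1', 'group1', '2015-01-01', '2016-01-01'),
    ('item1', 'group2', '2016-02-01', '2016-03-15'),
    ('item2', 'group1', '2015-06-01', '2016-06-30'),
    ('item1', 'group2', '2016-05-18', '2016-06-30');
""")

query = """
WITH boundaries AS (
    -- every start or end date is a potential range boundary
    SELECT group_code, start_date AS range_date FROM map_item_group
    UNION
    SELECT group_code, end_date FROM map_item_group
),
ranges AS (
    -- pair each boundary with the next boundary in the same group
    SELECT group_code,
           range_date AS start_date,
           LEAD(range_date) OVER (PARTITION BY group_code
                                  ORDER BY range_date) AS end_date
    FROM boundaries
)
SELECT r.group_code, r.start_date, r.end_date,
       GROUP_CONCAT(t.item_serial) AS items_string
FROM ranges r
JOIN map_item_group t
  ON t.group_code = r.group_code
 AND r.start_date < t.end_date   -- same overlap test as the query above
 AND r.end_date > t.start_date
GROUP BY r.group_code, r.start_date, r.end_date
ORDER BY r.group_code, r.start_date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The last boundary of each group gets a NULL end_date from LEAD(), and ranges that no item covers (such as the gap in group2) have no matching row, so the inner join drops both. The order of items inside items_string is not guaranteed here; in MySQL an ORDER BY inside GROUP_CONCAT would pin it down.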

Related

Get column after group by latest date

Please help me to get table 2 from table 1
A simple summarisation can be achieved using GROUP BY
select
code1
, max(code2) as code2
, count(*) as times
, max(`date`) as max_dt
from table1
group by
code1
However the result is different to the image:
+-------+-------+-------+------------+
| code1 | code2 | times | max_dt |
+-------+-------+-------+------------+
| 1 | D | 4 | 2020-02-21 |
| 2 | NNNN | 2 | 2021-01-21 |
+-------+-------+-------+------------+
Note:
The maximum of code2 may not be what you expect. For example, "D" sorts after "BBBBB" because the comparison is alphabetical, not based on the length of the string.
The maximum date for code1 isn't 16/2.
I would not recommend naming any column "date", as it is a keyword in most SQL dialects and can cause difficulties when developing queries.
For MySQL prior to version 8, a technique to get "the most recent" row may be used as follows:
SELECT code1, code2, `date`
, (select count(*) from mytable t2 where d.code1 = t2.code1) as times
FROM (
SELECT
@row_num :=IF(@prev_value = t.code1, @row_num + 1, 1) AS rn
, t.code1
, t.code2
, t.`date`
, @prev_value := t.code1
FROM mytable t
CROSS JOIN (SELECT @row_num :=1, @prev_value :='') vars
ORDER BY
t.code1
, t.`date` DESC
) as d
WHERE rn = 1
;
Or, for MySQL version 8 or later it is possible to use a much simpler query as there are windowing functions available through the over() clause:
SELECT code1, code2, `date`, times
FROM (
SELECT
row_number() over(partition by t.code1
order by t.`date` DESC) AS rn
, t.code1
, t.code2
, t.`date`
, count(*) over(partition by t.code1) as times
FROM mytable t
) as d
WHERE rn = 1
;
The results of the 2nd and 3rd queries are the same:
+-------+-------+------------+-------+
| code1 | code2 | date | times |
+-------+-------+------------+-------+
| 1 | D | 2020-02-21 | 4 |
| 2 | NNNN | 2021-01-21 | 2 |
+-------+-------+------------+-------+
solution demonstrated at db<>fiddle here

MySQL - Count Rows between two values in a Column repeatedly

I have a table like so
id | status | data | date
----|---------|--------|-------
1 | START | a4c | Jan 1
2 | WORKING | 2w3 | Dec 29
3 | WORKING | 2d3 | Dec 29
4 | WORKING | 3ew | Dec 26
5 | WORKING | 5r5 | Dec 23
6 | START | 2q3 | Dec 22
7 | WORKING | 32w | Dec 20
8 | WORKING | 9k5 | Dec 10
and so on...
What I am trying to do, is to get the number of 'WORKING' rows between two 'START' i.e.
id | status | count | date
----|---------|--------|-------
1 | START | 4 | Jan 1
6 | START | 2 | Dec 22
and so on ...
I am using MySql 5.7.28.
Highly appreciate any help/suggestion!
The date column is unusable for ordering in the example, so try using id as the ordering column instead:
select id, status,
(select count(*)
from mytable t2
where t2.id > t.id and t2.status='WORKING'
and not exists (select 1
from mytable t3
where t3.id > t.id and t3.id < t2.id and t3.status='START')
) count,
date
from mytable t
where status='START';
Assuming id is safe then you can do this by finding the next id for each block (and assigning some dummy values) then grouping by next id
drop table if exists t;
create table t
(id int,status varchar(20), data varchar(3),date varchar(10));
insert into t values
( 1 , 'START' , 'a4c' , 'Jan 1'),
( 2 , 'WORKING' , '2w3' , 'Dec 29'),
( 3 , 'WORKING' , '2d3' , 'Dec 29'),
( 4 , 'WORKING' , '3ew' , 'Dec 26'),
( 5 , 'WORKING' , '5r5' , 'Dec 23'),
( 6 , 'START' , '2q3' , 'Dec 22'),
( 7 , 'WORKING' , '32w' , 'Dec 20'),
( 8 , 'WORKING' , '9k5' , 'Dec 10');
SELECT MIN(ID) ID,
'START' STATUS,
SUM(CASE WHEN STATUS <> 'START' THEN 1 ELSE 0 END) AS OBS,
Max(DATE) DATE
FROM
(
select t.*,
CASE WHEN STATUS = 'START' THEN DATE ELSE '' END AS DT,
COALESCE(
(select t1.id from t t1 where t1.STATUS = 'START' and t1.id > t.id ORDER BY T1.ID limit 1)
,99999) NEXTID
from t
) S
GROUP BY NEXTID;
+------+--------+------+--------+
| ID | STATUS | OBS | DATE |
+------+--------+------+--------+
| 1 | START | 4 | Jan 1 |
| 6 | START | 2 | Dec 22 |
+------+--------+------+--------+
2 rows in set (0.00 sec)
This is a form of gaps-and-islands problem -- which is simpler in MySQL 8+ using window functions.
In older versions, probably the most efficient method is to accumulate a count of starts to define groupings for the rows. You can do this using variables and then aggregate:
select min(id) as id, 'START' as status, sum(status = 'WORKING') as num_working, max(date) as date
from (select t.*, (@s := @s + (t.status = 'START')) as grp
from (select t.* from t order by id asc) t cross join
(select @s := 0) params
) t
group by grp
order by min(id);
Here is a db<>fiddle.
SELECT id, status, `count`, `date`
FROM ( SELECT @count `count`,
id,
status,
`date`,
@count:=(@status=status)*@count+1,
@status:=status
FROM test,
( SELECT @count:=0, @status:='' ) init_vars
ORDER BY id DESC
) calculations
WHERE status='START'
ORDER BY id
> Since I am still in design/development I can move to MySQL 8 if that makes it easier for this logic? Any idea how this could be done with Windows functions? – N0000B
WITH cte AS ( SELECT id,
status,
`date`,
SUM(status='WORKING') OVER (ORDER BY id DESC) workings
FROM test
ORDER BY id )
SELECT id,
status,
workings - COALESCE(LEAD(workings) OVER (ORDER BY id), 0) `count`,
`date`
FROM cte
WHERE status='START'
ORDER BY id

MySQL - Group By Contiguous Blocks

I am struggling to write a GROUP BY over contiguous blocks. I've used the following for reference:
- GROUP BY for continuous rows in SQL
- How can I do a contiguous group by in MySQL?
- https://gcbenison.wordpress.com/2011/09/26/queries-that-group-tables-by-contiguous-blocks/
The primary idea is to capture periods, each with a start and end date, during which a room stays in a given state. A complexity unlike other examples is that I'm using a date per room_id as the indexing field (rather than a sequential id).
My table:
room_id | calendar_date | state
Sample data:
1 | 2016-03-01 | 'a'
1 | 2016-03-02 | 'a'
1 | 2016-03-03 | 'a'
1 | 2016-03-04 | 'b'
1 | 2016-03-05 | 'b'
1 | 2016-03-06 | 'c'
1 | 2016-03-07 | 'c'
1 | 2016-03-08 | 'c'
1 | 2016-03-09 | 'c'
2 | 2016-04-01 | 'b'
2 | 2016-04-02 | 'a'
2 | 2016-04-03 | 'a'
2 | 2016-04-04 | 'a'
The objective:
room_id | date_start | date_end | state
1 | 2016-03-01 | 2016-03-03 | a
1 | 2016-03-04 | 2016-03-05 | b
1 | 2016-03-06 | 2016-03-09 | c
2 | 2016-04-01 | 2016-04-01 | b
2 | 2016-04-02 | 2016-04-04 | a
The two attempts I've made at this:
1)
SELECT
rooms.row_new,
rooms.state_new,
MIN(rooms.room_id) AS room_id,
MIN(rooms.state) AS state,
MIN(rooms.date) AS date_start,
MAX(rooms.date) AS date_end,
FROM
(
SELECT @r := @r + (@state != state) AS row_new,
@state := state AS state_new,
rooms.*
FROM (
SELECT @r := 0,
@state := ''
) AS vars,
rooms_vw
ORDER BY room_id, date
) AS rooms
WHERE room_id = 1
GROUP BY row_new
ORDER BY room_id, date
;
This is very close to working, but when I print out row_new it starts to jump (1, 2, 3, 5, 7, ...)
2)
SELECT
MIN(rooms_final.calendar_date) AS date_start,
MAX(rooms_final.calendar_date) AS date_end,
rooms_final.state,
rooms_final.room_id,
COUNT(*)
FROM (SELECT
rooms.date,
rooms.state,
rooms.room_id,
CASE
WHEN rooms_merge.state IS NULL OR rooms_merge.state != rooms.state THEN
@rownum := @rownum+1
ELSE
@rownum
END AS row_num
FROM rooms
JOIN (SELECT @rownum := 0) AS row
LEFT JOIN (SELECT rooms.date + INTERVAL 1 DAY AS date,
rooms.state,
rooms.room_id
FROM rooms) AS rooms_merge ON rooms_merge.calendar_date = rooms.calendar_date AND rooms_merge.room_id = rooms.room_id
ORDER BY rooms.room_id, rooms.calendar_date
) AS rooms_final
GROUP BY rooms_final.state, rooms_final.row_num
ORDER BY room_id, calendar_date;
For some reason this returns some rows with a null room_id, and the results are generally inaccurate.
Working with variables is a bit tricky. I would go for:
SELECT MIN(r.room_id) AS room_id, MIN(r.state) AS state,
MIN(r.date) AS date_start, MAX(r.date) AS date_end
FROM (SELECT r.*,
(@grp := if(@rs = concat_ws(':', room_id, state), @grp,
if(@rs := concat_ws(':', room_id, state), @grp + 1, @grp + 1)
)
) as grp
FROM (SELECT r.* FROM rooms_vw r ORDER BY room_id, date
) r CROSS JOIN
(SELECT @grp := 0, @rs := '') AS params
) AS r
WHERE room_id = 1
GROUP BY room_id, grp
ORDER BY room_id, date_start;
Notes:
Assigning a variable in one expression and using it in another is unsafe. MySQL does not guarantee the order of evaluation of expressions.
In more recent versions of MySQL, you need to do the ORDER BY in a subquery.
In the most recent versions, you can use row_number(), greatly simplifying the calculation.
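As a rough illustration of the row_number() remark, here is one common way to do it: subtract a per-room-and-state row number from a per-room row number, which stays constant within each contiguous block. This is a sketch rather than the answer's own query. It assumes MySQL 8+ in production but runs against an in-memory SQLite database from Python so that it is self-contained, and it uses the calendar_date column from the question's table description. The alias numbered is made up.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL 8+ (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms (room_id INTEGER, calendar_date TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO rooms VALUES (?, ?, ?)",
    [
        (1, "2016-03-01", "a"), (1, "2016-03-02", "a"), (1, "2016-03-03", "a"),
        (1, "2016-03-04", "b"), (1, "2016-03-05", "b"),
        (1, "2016-03-06", "c"), (1, "2016-03-07", "c"),
        (1, "2016-03-08", "c"), (1, "2016-03-09", "c"),
        (2, "2016-04-01", "b"),
        (2, "2016-04-02", "a"), (2, "2016-04-03", "a"), (2, "2016-04-04", "a"),
    ],
)

query = """
SELECT room_id,
       MIN(calendar_date) AS date_start,
       MAX(calendar_date) AS date_end,
       state
FROM (
    SELECT room_id, calendar_date, state,
           -- the difference of the two row numbers is constant inside one
           -- contiguous run of the same state for the same room
           ROW_NUMBER() OVER (PARTITION BY room_id ORDER BY calendar_date)
         - ROW_NUMBER() OVER (PARTITION BY room_id, state ORDER BY calendar_date) AS grp
    FROM rooms
) AS numbered
GROUP BY room_id, state, grp
ORDER BY room_id, date_start
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Grouping by state as well as grp matters, because two different states in the same room can happen to share the same difference value.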
Thanks to @Gordon Linoff for giving me insights to get to this answer:
SELECT
MIN(room_id) AS room_id,
MIN(state) AS state,
MIN(date) AS date_start,
MAX(date) AS date_end
FROM
(
SELECT
@r := @r + IF(@state <> state OR @room_id <> room_id, 1, 0) AS row_new,
@state := state AS state_new,
@room_id := room_id AS room_id_new,
tmp_rooms.*
FROM (
SELECT @r := 0,
@room_id := 0,
@state := ''
) AS vars,
(SELECT * FROM rooms WHERE room_id IS NOT NULL ORDER BY room_id, date) tmp_rooms
) AS rooms
GROUP BY row_new
order by room_id, date
;

MySQL select rows based on previous and next rows

Suppose I have a table looking like:
mytable
category | begintime
---------|----------------------
cat1 | 2016-09-25 15:00:00
cat2 | 2016-09-25 16:00:00
cat1 | 2016-09-25 17:30:00
cat3 | 2016-09-25 19:00:00
cat1 | 2016-09-25 20:00:00
: :
Note that it doesn't have an ID-number, the begintime column is my primary key.
In the end, I would like to select all rows that are surrounded by a certain category, that is, to select all rows such that the category from the previous row is @catBefore and the category from the next row is @catAfter.
For example, what I would like is something like:
SELECT * FROM mytable WHERE previousRow.category = 'cat1' AND nextRow.category = 'cat1'
resulting in
SELECT * FROM ...
category | begintime
---------|----------------------
cat2 | 2016-09-25 16:00:00
cat3 | 2016-09-25 19:00:00
: :
The previousRow and nextRow in this, don't seem to be definable.
Idea
I have tried some things, but nothing has worked out yet. One of my ideas was to first select the previous and next category as new columns, so something like:
SELECT mytable.*,
previousRow.category AS prevCat,
nextRow.category AS nextCat
FROM mytable, [stuff-I-don't-know]
resulting in
SELECT mytable.* ...
category | begintime | prevCat | nextCat
---------|----------------------|---------|---------
cat1 | 2016-09-25 15:00:00 | null | cat2
cat2 | 2016-09-25 16:00:00 | cat1 | cat1
cat1 | 2016-09-25 17:30:00 | cat2 | cat3
cat3 | 2016-09-25 19:00:00 | cat1 | cat1
cat1 | 2016-09-25 20:00:00 | cat3 | ...
: :
and then filtering using a WHERE clause.
Is this idea possible, or could it be done in some other way?
One method uses correlated subqueries. This should be okay performance-wise, if you do indeed have primary key declarations:
select t.*
from (select t.*,
(select t2.category
from mytable t2
where t2.begintime < t.begintime
order by begintime desc
limit 1
) as prev_category,
(select t2.category
from mytable t2
where t2.begintime > t.begintime
order by begintime asc
limit 1
) as next_category
from mytable t
) t
where prev_category = @cat1 and next_category = @cat2;
EDIT:
You can do this with variables:
select t.*
from (select t.*,
(@pn := (case when (@pcy := @pn) = NULL then -1 -- never gets here
when (@pn := category) = NULL then -1 -- never gets here
else @pcy
end)
) as next_category
from (select t.*,
(@pc := (case when (@pcx := @pc) = NULL then -1 -- never gets here
when (@pc := category) = NULL then -1 -- never gets here
else @pcx
end)
) as prev_category
from mytable t cross join
(select @pc := '') params
order by t.begintime
) t cross join
(select @pn := '') params
order by t.begintime desc
) t
where prev_category = @cat1 and next_category = @cat2;
I did this one... it seems to work, but it seems too slow actually... =\ I'll try to improve it. Please, check if it works for you:
SELECT
t1.category,
t1.begintime
FROM myTable t1 -- current
INNER JOIN myTable t2 ON 1=1 -- prev
AND t2.begintime < t1.begintime
INNER JOIN myTable t3 ON 1=1 -- next
AND t3.begintime > t1.begintime
LEFT JOIN myTable t4 ON 1=1 -- between current and prev
AND t4.begintime < t1.begintime
AND t4.begintime > t2.begintime
LEFT JOIN myTable t5 ON 1=1 -- between current and next
AND t5.begintime > t1.begintime
AND t5.begintime < t3.begintime
WHERE 1=1
AND t2.category = 'cat1' -- prev cat
AND t3.category = 'cat1' -- next cat
AND t4.begintime IS NULL -- nothing between current and prev
AND t5.begintime IS NULL -- nothing between current and next
;
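On MySQL 8 or later (an assumption, since the question does not state a version), LAG() and LEAD() produce the prevCat and nextCat columns from the question's idea directly. The sketch below is not one of the answers above. It runs against an in-memory SQLite database from Python so that it is self-contained, and the `?` placeholders are the Python driver's parameter syntax standing in for @catBefore and @catAfter.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL 8+ (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (category TEXT, begintime TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("cat1", "2016-09-25 15:00:00"),
        ("cat2", "2016-09-25 16:00:00"),
        ("cat1", "2016-09-25 17:30:00"),
        ("cat3", "2016-09-25 19:00:00"),
        ("cat1", "2016-09-25 20:00:00"),
    ],
)

query = """
SELECT category, begintime
FROM (
    SELECT category, begintime,
           LAG(category)  OVER (ORDER BY begintime) AS prev_category,
           LEAD(category) OVER (ORDER BY begintime) AS next_category
    FROM mytable
) AS t
WHERE prev_category = ? AND next_category = ?
ORDER BY begintime
"""

# Rows whose neighbours on both sides are 'cat1'.
rows = conn.execute(query, ("cat1", "cat1")).fetchall()
for row in rows:
    print(row)
```

The first and last rows have a NULL neighbour, so the equality test in the WHERE clause excludes them without any extra handling.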

Get first date from timestamp in SQL

In my Moodle database I have a table that stores a sessid and a timestart for every session. The table looks like this:
+----+--------+------------+
| id | sessid | timestart |
+----+--------+------------+
| 1 | 3 | 1456819200 |
| 2 | 3 | 1465887600 |
| 3 | 3 | 1459839600 |
| 4 | 2 | 1457940600 |
| 5 | 2 | 1460529000 |
+----+--------+------------+
How to get for every session the first date from the timestamps in SQL?
You can simply use this:
select sessid,min(timestart) FROM mytable GROUP by sessid;
And for your second question, something like this:
SELECT
my.id,
my.sessid,
IF(my.timestart = m.timestart, 'yes', 'NO' ) AS First,
my.timestart
FROM mytable my
LEFT JOIN
(
SELECT sessid,min(timestart) AS timestart FROM mytable GROUP BY sessid
) AS m ON m.sessid = my.sessid;
Try this.
SELECT
*
FROM
tbl
WHERE
(sessid, timestart) IN (
SELECT tbl2.sessid, MIN(tbl2.timestart)
FROM tbl tbl2
WHERE tbl.sessid = tbl2.sessid
);
Query
select sessid, min(timestart) as timestart
from your_table_name
group by sessid;
Just another approach, in case you also need the id.
select t.id, t.sessid, t.timestart from
(
select id, sessid, timestart,
(
case sessid when @curA
then @curRow := @curRow + 1
else @curRow := 1 and @curA := sessid end
) as rn
from your_table_name t,
(select @curRow := 0, @curA := '') r
order by sessid, timestart
)t
where t.rn = 1;
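For completeness, a window-function variant that keeps the id of the earliest row per session might look like the sketch below. It assumes MySQL 8+ in production but runs against an in-memory SQLite database from Python so that it is self-contained; the table name mytable and the alias ranked are placeholders. The conversion of the Unix timestamp to a readable date is done in Python here, in UTC, which is also an assumption about how the timestamps should be interpreted.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory SQLite database used as a stand-in for MySQL 8+ (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, sessid INTEGER, timestart INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, 3, 1456819200),
        (2, 3, 1465887600),
        (3, 3, 1459839600),
        (4, 2, 1457940600),
        (5, 2, 1460529000),
    ],
)

query = """
SELECT id, sessid, timestart
FROM (
    SELECT id, sessid, timestart,
           -- rn = 1 marks the earliest timestart within each session
           ROW_NUMBER() OVER (PARTITION BY sessid ORDER BY timestart) AS rn
    FROM mytable
) AS ranked
WHERE rn = 1
ORDER BY sessid
"""

rows = conn.execute(query).fetchall()

# Convert the Unix timestamps to dates, interpreting them as UTC (assumption).
first_dates = {
    sessid: datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
    for _id, sessid, ts in rows
}
print(rows)
print(first_dates)
```

In MySQL itself, FROM_UNIXTIME(timestart) would do the conversion inside the query, using the session time zone rather than UTC.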