MySQL - Group By Contiguous Blocks

I am struggling to GROUP BY contiguous blocks. I've used the following three references:
- GROUP BY for continuous rows in SQL
- How can I do a contiguous group by in MySQL?
- https://gcbenison.wordpress.com/2011/09/26/queries-that-group-tables-by-contiguous-blocks/
The primary idea is to capture the periods of a given state, each with a start and end date. A complication, unlike the other examples, is that I'm using a date per room_id as the ordering field (rather than a sequential id).
My table:
room_id | calendar_date | state
Sample data:
1 | 2016-03-01 | 'a'
1 | 2016-03-02 | 'a'
1 | 2016-03-03 | 'a'
1 | 2016-03-04 | 'b'
1 | 2016-03-05 | 'b'
1 | 2016-03-06 | 'c'
1 | 2016-03-07 | 'c'
1 | 2016-03-08 | 'c'
1 | 2016-03-09 | 'c'
2 | 2016-04-01 | 'b'
2 | 2016-04-02 | 'a'
2 | 2016-04-03 | 'a'
2 | 2016-04-04 | 'a'
The objective:
room_id | date_start | date_end | state
1 | 2016-03-01 | 2016-03-03 | a
1 | 2016-03-04 | 2016-03-05 | b
1 | 2016-03-06 | 2016-03-09 | c
2 | 2016-04-01 | 2016-04-01 | b
2 | 2016-04-02 | 2016-04-04 | a
The two attempts I've made at this:
1)
SELECT
rooms.row_new,
rooms.state_new,
MIN(rooms.room_id) AS room_id,
MIN(rooms.state) AS state,
MIN(rooms.date) AS date_start,
MAX(rooms.date) AS date_end
FROM
(
SELECT @r := @r + (@state != state) AS row_new,
@state := state AS state_new,
rooms.*
FROM (
SELECT @r := 0,
@state := ''
) AS vars,
rooms_vw
ORDER BY room_id, date
) AS rooms
WHERE room_id = 1
GROUP BY row_new
ORDER BY room_id, date
;
This is very close to working, but when I print out row_new it starts to jump (1, 2, 3, 5, 7, ...)
2)
SELECT
MIN(rooms_final.calendar_date) AS date_start,
MAX(rooms_final.calendar_date) AS date_end,
rooms_final.state,
rooms_final.room_id,
COUNT(*)
FROM (SELECT
rooms.date,
rooms.state,
rooms.room_id,
CASE
WHEN rooms_merge.state IS NULL OR rooms_merge.state != rooms.state THEN
@rownum := @rownum+1
ELSE
@rownum
END AS row_num
FROM rooms
JOIN (SELECT @rownum := 0) AS row
LEFT JOIN (SELECT rooms.date + INTERVAL 1 DAY AS date,
rooms.state,
rooms.room_id
FROM rooms) AS rooms_merge ON rooms_merge.calendar_date = rooms.calendar_date AND rooms_merge.room_id = rooms.room_id
ORDER BY rooms.room_id, rooms.calendar_date
) AS rooms_final
GROUP BY rooms_final.state, rooms_final.row_num
ORDER BY room_id, calendar_date;
For some reason, this returns some rows with a NULL room_id, and the results are generally inaccurate.

Working with variables is a bit tricky. I would go for:
SELECT MIN(rooms.room_id) AS room_id, MIN(rooms.state) AS state,
MIN(rooms.date) AS date_start, MAX(rooms.date) AS date_end
FROM (SELECT r.*,
(@grp := if(@rs = concat_ws(':', room_id, state), @grp,
if(@rs := concat_ws(':', room_id, state), @grp + 1, @grp + 1)
)
) as grp
FROM (SELECT r.* FROM rooms_vw r ORDER BY room_id, date
) r CROSS JOIN
(SELECT @grp := 0, @rs := '') AS params
) AS rooms
WHERE room_id = 1
GROUP BY room_id, grp
ORDER BY room_id, date_start;
Notes:
Assigning a variable in one expression and using it in another is unsafe. MySQL does not guarantee the order of evaluation of expressions.
In more recent versions of MySQL, you need to do the ORDER BY in a subquery.
In the most recent versions (MySQL 8.0 and later), you can use window functions such as ROW_NUMBER(), greatly simplifying the calculation.
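For MySQL 8.0+, a common window-function approach (the "gaps and islands" trick) subtracts a ROW_NUMBER() partitioned by (room_id, state) from one partitioned by room_id only; the difference stays constant inside each contiguous block. The sketch below runs that query through Python's built-in sqlite3 module as a stand-in (SQLite 3.25+ also supports ROW_NUMBER()), on the question's sample data. The table name `rooms` and the column types are assumptions; the SQL string should carry over to MySQL 8.0 with little or no change, but verify it against your schema.

```python
import sqlite3

# Sample data from the question: (room_id, calendar_date, state).
ROWS = [
    (1, "2016-03-01", "a"), (1, "2016-03-02", "a"), (1, "2016-03-03", "a"),
    (1, "2016-03-04", "b"), (1, "2016-03-05", "b"),
    (1, "2016-03-06", "c"), (1, "2016-03-07", "c"),
    (1, "2016-03-08", "c"), (1, "2016-03-09", "c"),
    (2, "2016-04-01", "b"),
    (2, "2016-04-02", "a"), (2, "2016-04-03", "a"), (2, "2016-04-04", "a"),
]

# The difference of the two row numbers is constant within each contiguous run.
ISLANDS_SQL = """
SELECT room_id,
       MIN(calendar_date) AS date_start,
       MAX(calendar_date) AS date_end,
       state
FROM (
    SELECT room_id, calendar_date, state,
           ROW_NUMBER() OVER (PARTITION BY room_id ORDER BY calendar_date)
         - ROW_NUMBER() OVER (PARTITION BY room_id, state ORDER BY calendar_date) AS grp
    FROM rooms
) AS t
GROUP BY room_id, state, grp
ORDER BY room_id, date_start
"""


def contiguous_blocks(rows):
    """Return one (room_id, date_start, date_end, state) tuple per block."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE rooms (room_id INTEGER, calendar_date TEXT, state TEXT)")
        conn.executemany("INSERT INTO rooms VALUES (?, ?, ?)", rows)
        return conn.execute(ISLANDS_SQL).fetchall()
    finally:
        conn.close()


for block in contiguous_blocks(ROWS):
    print(block)
```

This treats rows as adjacent in calendar_date order. If a room can skip days and a missing day should also split a block, you would use a day number derived from the date in place of the first ROW_NUMBER().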

Thanks to @Gordon Linoff for giving me the insights to get to this answer:
SELECT
MIN(room_id) AS room_id,
MIN(state) AS state,
MIN(date) AS date_start,
MAX(date) AS date_end
FROM
(
SELECT
@r := @r + IF(@state <> state OR @room_id <> room_id, 1, 0) AS row_new,
@state := state AS state_new,
@room_id := room_id AS room_id_new,
tmp_rooms.*
FROM (
SELECT @r := 0,
@room_id := 0,
@state := ''
) AS vars,
(SELECT * FROM rooms WHERE room_id IS NOT NULL ORDER BY room_id, date) tmp_rooms
) AS rooms
GROUP BY row_new
ORDER BY room_id, date_start
;
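The user-variable queries above try to simulate one ordered pass: walk the rows sorted by room_id and date, bump a counter whenever the room or the state changes, then group by that counter. The plain-Python sketch below (the function name `blocks_by_counter` is hypothetical) spells out that intended logic without any evaluation-order surprises, which can help when checking the row_new values a variable-based query produces. It assumes dates are ISO `YYYY-MM-DD` strings, so sorting them as text also sorts them chronologically.

```python
from itertools import groupby


def blocks_by_counter(rows):
    """Collapse (room_id, calendar_date, state) rows into contiguous blocks."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # ORDER BY room_id, date
    tagged = []
    counter, previous = 0, None
    for room_id, day, state in ordered:
        # Like @r := @r + IF(@state <> state OR @room_id <> room_id, 1, 0)
        if (room_id, state) != previous:
            counter += 1
            previous = (room_id, state)
        tagged.append((counter, room_id, day, state))

    blocks = []
    # Like GROUP BY row_new; rows are date-sorted, so first/last act as MIN/MAX.
    for _, group in groupby(tagged, key=lambda t: t[0]):
        group = list(group)
        first, last = group[0], group[-1]
        blocks.append((first[1], first[2], last[2], first[3]))
    return blocks


print(blocks_by_counter([
    (1, "2016-03-02", "a"), (1, "2016-03-01", "a"),
    (1, "2016-03-03", "b"), (2, "2016-04-01", "b"),
]))
```

Note that a room change alone starts a new block even when the state is unchanged, which is exactly why the final query compares @room_id as well as @state.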

Related

MySQL group by with start and end times

I have a table called map_item_group in MySQL that looks like this example:
item_serial | group_code | start_date | end_date
===================================================
item1 | group1 | 2015-01-01 | 2016-01-01
item1 | group2 | 2016-02-01 | 2016-03-15
item2 | group1 | 2015-06-01 | 2016-06-30
item1 | group2 | 2016-05-18 | 2016-06-30
I want to create a MySQL view called group_info that looks like this:
group_code | start_date | end_date | items_string
=======================================================
group1 | 2015-01-01 | 2015-06-01 | item1
group1 | 2015-06-01 | 2016-01-01 | item1,item2
group1 | 2016-01-01 | 2016-06-30 | item2
group2 | 2016-02-01 | 2016-03-15 | item1
group2 | 2016-05-18 | 2016-06-30 | item1
In other words, I want one row for each group showing the items in that group over each time span.
Simply grouping by group_code, start_date and end_date (i.e. SELECT group_code, start_date, end_date, GROUP_CONCAT(item_serial) FROM map_item_group GROUP BY group_code, start_date, end_date) does not give the desired output.
I can imagine ways to do this with subqueries, but subqueries aren't allowed in MySQL views. I can create other views in place of subqueries as a workaround, but I'd rather avoid adding a bunch of extra views to my schema. What's the cleanest way to do this?
First, I create a list of all dates (start and end) per group_code using UNION. I called it T1, but a different name would be clearer.
Then I use variables to assign a row number to each date (subqueries T1 and T2).
Then I have to duplicate that code to join the result to itself and create the ranges (subquery R). You could simplify this by making it a separate view.
Now that I have the ranges, I join back to the original table to check whether each item belongs to each range.
SELECT R.`group_code`, R.`start_date`, R.`end_date`, GROUP_CONCAT(T.item_serial SEPARATOR ', ') items
FROM (
SELECT T1.`group_code`, T1.range_date as start_date, T2.range_date as end_date
FROM (
SELECT `group_code`, range_date,
@rn := IF( @grpCode = `group_code`, @rn + 1 , IF(@grpCode := `group_code`, 1, 1)) as rn
FROM (
SELECT `group_code`, `start_date` as range_date
FROM map_item_group
UNION
SELECT `group_code`, `end_date` as range_date
FROM map_item_group
ORDER BY 1, 2
) as T1,
(SELECT @rn := 0, @grpCode := '') r
) T1
JOIN (
SELECT `group_code`, range_date,
@rn := IF( @grpCode = `group_code`, @rn + 1 , IF(@grpCode := `group_code`, 1, 1)) as rn
FROM (
SELECT `group_code`, `start_date` as range_date
FROM map_item_group
UNION
SELECT `group_code`, `end_date` as range_date
FROM map_item_group
ORDER BY 1, 2
) as T1,
(SELECT @rn := 0, @grpCode := '') r
) T2
ON T1.rn = T2.rn -1
AND T1.group_code = T2.group_code
) R
JOIN map_item_group T
ON R.start_date < T.end_date
AND R.end_date > T.start_date
AND R.group_code = T.group_code
GROUP BY R.`group_code`, R.`start_date`, R.`end_date`
ORDER BY 1,2, 4;
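On MySQL 8.0+, where window functions and CTEs exist, the duplicated row-numbering subqueries can be replaced by LEAD(), which pairs each boundary date with the next one in the same group. The sketch below is that swapped-in technique, not the answer's own query; it runs through Python's sqlite3 (SQLite 3.25+ supports LEAD()) on the question's rows. Ranges that no item overlaps (group2 from 2016-03-15 to 2016-05-18) and each group's last boundary (whose LEAD() is NULL) fall out of the inner JOIN, matching the desired output.

```python
import sqlite3

# Rows from the question: (item_serial, group_code, start_date, end_date).
ROWS = [
    ("item1", "group1", "2015-01-01", "2016-01-01"),
    ("item1", "group2", "2016-02-01", "2016-03-15"),
    ("item2", "group1", "2015-06-01", "2016-06-30"),
    ("item1", "group2", "2016-05-18", "2016-06-30"),
]

GROUP_INFO_SQL = """
WITH bounds AS (            -- every start or end date, per group
    SELECT group_code, start_date AS d FROM map_item_group
    UNION
    SELECT group_code, end_date FROM map_item_group
),
ranges AS (                 -- pair each boundary with the next one
    SELECT group_code,
           d AS start_date,
           LEAD(d) OVER (PARTITION BY group_code ORDER BY d) AS end_date
    FROM bounds
)
SELECT r.group_code, r.start_date, r.end_date,
       GROUP_CONCAT(m.item_serial) AS items
FROM ranges AS r
JOIN map_item_group AS m    -- keep items whose interval overlaps the range
  ON m.group_code = r.group_code
 AND m.start_date < r.end_date
 AND m.end_date > r.start_date
GROUP BY r.group_code, r.start_date, r.end_date
ORDER BY r.group_code, r.start_date
"""


def group_info(rows):
    """Return (group_code, start_date, end_date, items) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE map_item_group "
            "(item_serial TEXT, group_code TEXT, start_date TEXT, end_date TEXT)"
        )
        conn.executemany("INSERT INTO map_item_group VALUES (?, ?, ?, ?)", rows)
        return conn.execute(GROUP_INFO_SQL).fetchall()
    finally:
        conn.close()


for row in group_info(ROWS):
    print(row)
```

The order of names inside GROUP_CONCAT is not guaranteed here; MySQL accepts `GROUP_CONCAT(m.item_serial ORDER BY m.item_serial)` if you need a stable string.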

How can I group Unix times per day?

I have a table like this:
// requests
+----+----------+-------------+
| id | id_user | unix_time |
+----+----------+-------------+
| 1 | 2353 | 1339412843 |
| 2 | 2353 | 1339412864 |
| 3 | 5462 | 1339412894 |
| 4 | 3422 | 1339412899 |
| 5 | 3422 | 1339412906 |
| 6 | 2353 | 1339412906 |
| 7 | 7785 | 1339412951 |
| 8 | 2353 | 1339413640 |
| 9 | 5462 | 1339413621 |
| 10 | 5462 | 1339414490 |
| 11 | 2353 | 1339414923 |
| 12 | 2353 | 1339419901 |
| 13 | 8007 | 1339424860 |
| 14 | 7785 | 1339424822 |
| 15 | 2353 | 1339424902 |
+----+----------+-------------+
I want to group the unix_time column by day. I'm trying to do this for a specific user.
I need two numbers for a user:
the number of distinct days on which the user has at least one row in the requests table
the length of the longest run of consecutive days
How can I do that?
I can use WHERE id_user = :id to select the user's rows, calculate the number of days with SUM(), and calculate the longest consecutive range with MAX(). I just need to group those Unix times.
Please give it a try:
SELECT
t.id_user,
COUNT(*) totalVisits,
MAX(t.max_cons) maxCons
FROM
(SELECT
id_user,
@lastUnixTime AS lastUnixTimeOfuser,
IF(@uid <> id_user, @currentMax := 1 , @currentMax),
IF(@uid <> id_user, @lastUnixTime := 0, @lastUnixTime := @lastUnixTimeOfLastRecord),
IF(@uid = id_user,
IF((@lastUnixTime + 86400) >= utime, @currentMax := @currentMax + 1, @currentMax := 1), @lastUnixTime := 0),
IF(@currentMax > @max, @max := @currentMax, @max ),
IF(@uid <> id_user , @max := 1 ,@max),
@uid := id_user,
@lastUnixTimeOfLastRecord := utime,
@max AS max_cons
FROM
(
SELECT
id_user,
(unix_time DIV 86400) * 86400 AS utime
FROM requests
GROUP BY id_user, utime ) dayWiseRequestTable ,
(
SELECT
@uid := 0,
@currentMax := 0,
@max := 0,
@lastUnixTime := 0,
@lastUnixTimeOfLastRecord := 0
) vars
ORDER BY id_user, utime) t
GROUP BY t.id_user;
The final output looks like this:
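| id_user | Total_Visits | Maximum_Consecutive_Visits |
|---------|--------------|----------------------------|
| 2353 | 7 | 2 |
| 3422 | 2 | 2 |
| 5462 | 3 | 2 |
| 7785 | 2 | 1 |
| 8007 | 1 | 1 |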
EDIT:
In order to get output for a specific user you need to add a WHERE clause in the inner query.
You can extract the day using from_unixtime(). Then you can number the consecutive days using variables:
select id_user, d,
(@rn := if(@di = concat_ws(':', d - interval 1 day, id_user),
if(@di := concat_ws(':', d, id_user), @rn + 1, @rn + 1),
if(@di := concat_ws(':', d, id_user), 1, 1)
)
) as rn
from (select id_user, date(from_unixtime(unix_time)) as d
from requests
group by id_user, d
) days cross join
(select @di := '', @rn := 0) params
order by id_user, d;
From here, getting to the summary is just an aggregation:
select id_user, count(*) as numdays, max(rn) as maxconsecutive
from (select id_user, d,
(@rn := if(@di = concat_ws(':', d - interval 1 day, id_user),
if(@di := concat_ws(':', d, id_user), @rn + 1, @rn + 1),
if(@di := concat_ws(':', d, id_user), 1, 1)
)
) as rn
from (select id_user, date(from_unixtime(unix_time)) as d
from requests
group by id_user, d
) days cross join
(select @di := '', @rn := 0) params
order by id_user, d
) ud
group by id_user;
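With window functions (MySQL 8.0+), the variables can be dropped: within a run of consecutive days, "day number minus ROW_NUMBER()" is constant, so grouping by that difference yields each streak. The sketch below runs this through Python's sqlite3 on hypothetical data (not the question's table, whose sample rows all fall on one UTC day). In SQLite, `/` on two integers already truncates; in MySQL you would write `unix_time DIV 86400` as the first answer does. Days here are UTC days; add a WHERE id_user = ... filter in the first CTE for a single user.

```python
import sqlite3

DAY = 86400
BASE = 15502 * DAY  # hypothetical: an arbitrary UTC midnight (2012-06-11)

# Hypothetical (id_user, unix_time) rows: user 1 is active on days 0, 1, 2, 5, 6
# (two hits on day 0); user 2 on days 10 and 12.
ROWS = [
    (1, BASE + 100), (1, BASE + 200), (1, BASE + 1 * DAY + 5),
    (1, BASE + 2 * DAY + 50), (1, BASE + 5 * DAY), (1, BASE + 6 * DAY + 86399),
    (2, BASE + 10 * DAY + 1), (2, BASE + 12 * DAY + 1),
]

STREAKS_SQL = """
WITH days AS (          -- one row per user per UTC day
    SELECT DISTINCT id_user, unix_time / 86400 AS day
    FROM requests
),
runs AS (               -- day minus row number is constant within a consecutive run
    SELECT id_user, day,
           day - ROW_NUMBER() OVER (PARTITION BY id_user ORDER BY day) AS grp
    FROM days
),
streaks AS (            -- length of each run
    SELECT id_user, COUNT(*) AS len
    FROM runs
    GROUP BY id_user, grp
)
SELECT id_user, SUM(len) AS num_days, MAX(len) AS max_consecutive
FROM streaks
GROUP BY id_user
ORDER BY id_user
"""


def day_stats(rows):
    """Return (id_user, num_days, max_consecutive) per user."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE requests (id INTEGER PRIMARY KEY, id_user INTEGER, unix_time INTEGER)"
        )
        conn.executemany("INSERT INTO requests (id_user, unix_time) VALUES (?, ?)", rows)
        return conn.execute(STREAKS_SQL).fetchall()
    finally:
        conn.close()


print(day_stats(ROWS))
```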

Get first date from timestamp in SQL

In my Moodle database, a table stores the sessid and timestart of every session. The table looks like this:
+----+--------+------------+
| id | sessid | timestart |
+----+--------+------------+
| 1 | 3 | 1456819200 |
| 2 | 3 | 1465887600 |
| 3 | 3 | 1459839600 |
| 4 | 2 | 1457940600 |
| 5 | 2 | 1460529000 |
+----+--------+------------+
How can I get the first date from the timestamps for every session in SQL?
You can simply use this:
select sessid,min(timestart) FROM mytable GROUP by sessid;
And for your second question, something like this:
SELECT
my.id,
my.sessid,
IF(my.timestart = m.timestart, 'yes', 'NO' ) AS First,
my.timestart
FROM mytable my
LEFT JOIN
(
SELECT sessid,min(timestart) AS timestart FROM mytable GROUP BY sessid
) AS m ON m.sessid = my.sessid;
Try this.
SELECT
*
FROM
tbl
WHERE
(sessid, timestart) IN (
SELECT tbl2.sessid, MIN(tbl2.timestart)
FROM tbl tbl2
WHERE tbl.sessid = tbl2.sessid
);
Query
select sessid, min(timestart) as timestart
from your_table_name
group by sessid;
Just another perspective, if you also need the id:
select t.id, t.sessid, t.timestart from
(
select id, sessid, timestart,
(
case sessid when @curA
then @curRow := @curRow + 1
else @curRow := 1 and @curA := sessid end
) as rn
from your_table_name t,
(select @curRow := 0, @curA := '') r
order by sessid, timestart
)t
where t.rn = 1;
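On MySQL 8.0+, the same "whole first row per session" result is usually written with ROW_NUMBER() partitioned by sessid and ordered by timestart, which avoids relying on variable evaluation order. The sketch below runs it through Python's sqlite3 on the question's sample rows; `mytable` is the placeholder name from the first answer, and `date(timestart, 'unixepoch')` is SQLite-specific (in MySQL, DATE(FROM_UNIXTIME(timestart)) is the rough equivalent, but it uses the session time zone rather than UTC).

```python
import sqlite3

# Sample rows from the question: (id, sessid, timestart).
ROWS = [
    (1, 3, 1456819200), (2, 3, 1465887600), (3, 3, 1459839600),
    (4, 2, 1457940600), (5, 2, 1460529000),
]

FIRST_SQL = """
SELECT id, sessid, timestart,
       date(timestart, 'unixepoch') AS first_date   -- SQLite: UTC calendar date
FROM (
    SELECT id, sessid, timestart,
           ROW_NUMBER() OVER (PARTITION BY sessid ORDER BY timestart, id) AS rn
    FROM mytable
) AS t
WHERE rn = 1
ORDER BY sessid
"""


def first_per_session(rows):
    """Return (id, sessid, timestart, first_date) for each session's earliest row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (id INTEGER, sessid INTEGER, timestart INTEGER)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return conn.execute(FIRST_SQL).fetchall()
    finally:
        conn.close()


print(first_per_session(ROWS))
```

The extra `id` in the ORDER BY is only a tie-breaker, so two rows with the same timestart still give a deterministic pick.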

Assign the same rank if multiple users have same count value

I have a user_points table with two columns, user_values and userid.
Based on the count per userid, I want to assign a rank to each user. I have written this query:
SELECT count_temp.* , @curRank := (@curRank + 1) AS `rank`
FROM (
SELECT userid, COUNT(*) AS totalcount FROM user_points t GROUP BY t.userid
) AS count_temp
, (SELECT @curRank := 0) r
ORDER BY totalcount DESC;
It gives this result:
| userid | totalcount | rank |
|--------|------------|------|
| 2 | 6 | 1 |
| 3 | 2 | 2 |
| 4 | 2 | 3 |
| 1 | 1 | 4 |
but I want to assign rank 2 to both userid 3 and userid 4, because their totalcount values are the same.
To emulate the RANK() function, which returns the rank of each row within the partition of a result set, you can do:
SELECT userid, totalcount, `rank`
FROM
(
SELECT userid, totalcount,
@n := @n + 1, @r := IF(@c = totalcount, @r, @n) AS `rank`, @c := totalcount
FROM
(
SELECT userid, COUNT(*) AS totalcount
FROM user_points t
GROUP BY t.userid
ORDER BY totalcount DESC
) t CROSS JOIN
(
SELECT @r := 0, @n := 0, @c := NULL
) i
) q;
Output:
| USERID | TOTALCOUNT | RANK |
|--------|------------|------|
| 2 | 6 | 1 |
| 3 | 2 | 2 |
| 4 | 2 | 2 |
| 1 | 1 | 4 |
To emulate the DENSE_RANK() function, which returns the rank of rows within the partition of a result set without any gaps in the ranking, you can do:
SELECT userid, totalcount, `rank`
FROM
(
SELECT userid, totalcount,
@r := IF(@c = totalcount, @r, @r + 1) AS `rank`, @c := totalcount
FROM
(
SELECT userid, COUNT(*) AS totalcount
FROM user_points t
GROUP BY t.userid
ORDER BY totalcount DESC
) t CROSS JOIN
(
SELECT @r := 0, @c := NULL
) i
) q;
Output:
| USERID | TOTALCOUNT | RANK |
|--------|------------|------|
| 2 | 6 | 1 |
| 3 | 2 | 2 |
| 4 | 2 | 2 |
| 1 | 1 | 3 |
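On MySQL 8.0+, both emulations can be replaced by the built-in RANK() and DENSE_RANK() window functions. The sketch below runs that query through Python's sqlite3 (SQLite 3.25+ has the same functions). The user_points rows are hypothetical placeholders built so the per-user counts match the question (6, 2, 2, 1). The aliases are `rnk` and `dense_rnk` because RANK is a reserved word in MySQL 8.0.

```python
import sqlite3

# Hypothetical user_points rows: only the per-user counts matter here.
COUNTS = {2: 6, 3: 2, 4: 2, 1: 1}
ROWS = [(n, userid) for userid, count in COUNTS.items() for n in range(count)]

RANK_SQL = """
SELECT userid, totalcount,
       RANK()       OVER (ORDER BY totalcount DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY totalcount DESC) AS dense_rnk
FROM (
    SELECT userid, COUNT(*) AS totalcount
    FROM user_points
    GROUP BY userid
) AS t
ORDER BY totalcount DESC, userid
"""


def ranked_users(rows):
    """Return (userid, totalcount, rank, dense_rank) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE user_points (user_values INTEGER, userid INTEGER)")
        conn.executemany("INSERT INTO user_points VALUES (?, ?)", rows)
        return conn.execute(RANK_SQL).fetchall()
    finally:
        conn.close()


for row in ranked_users(ROWS):
    print(row)
```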

Rank project numbers in a view, MySQL

I created a view with the following statement:
CREATE VIEW
view_projectHour
AS
SELECT pno
, SUM( hours ) AS total_hours
FROM works_on
GROUP BY pno
ORDER BY total_hours DESC
Now, how can I implement ranking in this view? I want the projects to be ranked: the project with the most hours must be ranked 1 and placed at the top, and so on. Also, some projects have the same number of hours.
Unfortunately, MySQL (before 8.0) lacks support for analytic functions, in particular RANK() and DENSE_RANK().
To emulate RANK() you can do
SELECT pno, total_hours, `rank`
FROM
(
SELECT pno, total_hours,
@n := @n + 1 rnum, @r := IF(@h = total_hours, @r, @n) AS `rank`, @h := total_hours
FROM
(
SELECT pno, SUM(hours) total_hours
FROM works_on
GROUP BY pno
) q CROSS JOIN (SELECT @n := 0, @r := 0, @h := NULL) i
ORDER BY total_hours DESC, pno
) t;
Sample output:
| PNO | TOTAL_HOURS | RANK |
|-----|-------------|------|
| 3 | 61 | 1 |
| 1 | 40 | 2 |
| 2 | 40 | 2 |
| 4 | 10 | 4 |
To emulate DENSE_RANK() you can do
SELECT pno, total_hours, `rank`
FROM
(
SELECT pno, total_hours,
@r := IF(@h = total_hours, @r, @r + 1) AS `rank`, @h := total_hours
FROM
(
SELECT pno, SUM(hours) total_hours
FROM works_on
GROUP BY pno
) q CROSS JOIN (SELECT @r := 0, @h := NULL) i
ORDER BY total_hours DESC, pno
) t;
Sample output:
| PNO | TOTAL_HOURS | RANK |
|-----|-------------|------|
| 3 | 61 | 1 |
| 1 | 40 | 2 |
| 2 | 40 | 2 |
| 4 | 10 | 3 |
Note: You can drop the outer SELECTs if you don't mind having one or two extra columns in your result set.
An alternative solution is to use a self-join that counts, for each row, how many rows have strictly more hours; adding 1 gives the rank:
SELECT 1 + COUNT(b.total_hours) AS `rank`, a.pno, a.total_hours
FROM view_projectHour a
LEFT JOIN view_projectHour b
ON a.total_hours < b.total_hours
GROUP BY a.pno, a.total_hours
ORDER BY a.total_hours DESC;
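The self-join can be checked quickly outside MySQL. The sketch below loads the sample output's totals (pno 3: 61, 1: 40, 2: 40, 4: 10) into an SQLite table standing in for view_projectHour and runs the query via Python's sqlite3. The `COUNT(DISTINCT ...)` column is my own variant, not part of the answer: counting distinct higher totals gives a DENSE_RANK()-style rank. Both columns cost a self-join that grows roughly quadratically with the number of projects, so this suits small views.

```python
import sqlite3

# Totals taken from the sample output above: (pno, total_hours).
ROWS = [(3, 61), (1, 40), (2, 40), (4, 10)]

SELF_JOIN_SQL = """
SELECT a.pno, a.total_hours,
       1 + COUNT(b.total_hours)          AS rnk,        -- RANK()-style
       1 + COUNT(DISTINCT b.total_hours) AS dense_rnk   -- DENSE_RANK()-style variant
FROM view_projectHour AS a
LEFT JOIN view_projectHour AS b
       ON a.total_hours < b.total_hours
GROUP BY a.pno, a.total_hours
ORDER BY a.total_hours DESC, a.pno
"""


def self_join_ranks(rows):
    """Return (pno, total_hours, rank, dense_rank) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE view_projectHour (pno INTEGER, total_hours INTEGER)")
        conn.executemany("INSERT INTO view_projectHour VALUES (?, ?)", rows)
        return conn.execute(SELF_JOIN_SQL).fetchall()
    finally:
        conn.close()


for row in self_join_ranks(ROWS):
    print(row)
```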