I have the following table below.
The timeStamp is the moment that the status began.
Some rows add no new information because the status did not change (like the second row), so they can be ignored.
I would like to calculate (using MySQL 5.7) the total amount of time spent in each status.
| timeStamp | status |
|------------------------------|
| 2019-12-10 14:00:00 | 1 |
| 2019-12-10 14:10:00 | 1 | // this row could be ignored
| 2019-12-10 14:00:00 | 2 | // more 24 hours in status 1
| 2019-12-11 14:10:00 | 2 |
| 2019-12-12 14:00:00 | 1 | // more 24 hours in status 2
| 2019-12-14 14:00:00 | 2 | // more 48 hours in status 1
| 2019-12-16 14:10:00 | 2 |
| 2019-12-17 14:20:00 | 2 |
| 2019-12-18 14:00:00 | 3 | // more 96 hours in status 2
| 2019-12-19 14:00:00 | 1 | // more 24 hours in status 3
I would like to get a result table like the one below.
| status | amount_of_time |
|-------------------------|
| 1 | 72 hours |
| 2 | 120 hours |
| 3 | 24 hours |
What complicates this is that the statuses don't occur in order: the sequence is not 1, 2, 3.
In the example above it is: 1, 2, 1, 2, 3, 1, so I can't use the MIN information.
Get the timestamp of the following row in a subquery and calculate the difference to the timestamp of the current row:
select t1.status, timestampdiff(second,
t1.timeStamp,
(
select min(t2.timeStamp)
from mytable t2
where t2.timeStamp > t1.timeStamp
)
) as diff
from mytable t1;
This will return:
| status | diff |
| ------ | ------ |
| 1 | 600 |
| 1 | 86400 |
| 2 | 600 |
| 2 | 85800 |
| 1 | 172800 |
| 2 | 173400 |
| 2 | 87000 |
| 2 | 85200 |
| 3 | 86400 |
| 1 | NULL |
From here it's just a matter of GROUP BY and SUM:
select status, sum(diff) as duratation_in_seconds
from (
select t1.status, timestampdiff(second,
t1.timeStamp,
(
select min(t2.timeStamp)
from mytable t2
where t2.timeStamp > t1.timeStamp
)
) as diff
from mytable t1
) x
group by status;
Result:
| status | duratation_in_seconds |
| ------ | --------------------- |
| 1 | 259800 |
| 2 | 432000 |
| 3 | 86400 |
If you want the time in hours, change the first line to
select status, round(sum(diff)/3600) as duratation_in_hours
and you will get:
| status | duratation_in_hours |
| ------ | ------------------- |
| 1 | 72 |
| 2 | 120 |
| 3 | 24 |
You might want to use floor() instead of round(), though; the question doesn't say which rounding you need.
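As a cross-check outside the database, the same "difference to the next later timestamp, then sum per status" logic can be sketched in Python. This is only an illustration: the function name `seconds_per_status` is made up for this example, and the rows are copied exactly as printed in the question.

```python
from collections import defaultdict
from datetime import datetime

# Rows copied literally from the question's table: (timeStamp, status).
ROWS = [
    ("2019-12-10 14:00:00", 1),
    ("2019-12-10 14:10:00", 1),
    ("2019-12-10 14:00:00", 2),
    ("2019-12-11 14:10:00", 2),
    ("2019-12-12 14:00:00", 1),
    ("2019-12-14 14:00:00", 2),
    ("2019-12-16 14:10:00", 2),
    ("2019-12-17 14:20:00", 2),
    ("2019-12-18 14:00:00", 3),
    ("2019-12-19 14:00:00", 1),
]


def seconds_per_status(rows):
    """Sum, per status, the seconds until the next strictly later timestamp."""
    parsed = [(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), status)
              for ts, status in rows]
    stamps = sorted({ts for ts, _ in parsed})
    totals = defaultdict(int)
    for ts, status in parsed:
        # Same test as "t2.timeStamp > t1.timeStamp" in the subquery.
        later = [s for s in stamps if s > ts]
        # The latest row has no successor (NULL in SQL) and contributes nothing.
        if later:
            totals[status] += int((later[0] - ts).total_seconds())
    return dict(totals)


totals = seconds_per_status(ROWS)
print(totals)  # {1: 259800, 2: 432000, 3: 86400}
```

The third row is dated 2019-12-10 in the question as printed, which is why status 1 sums to 259800 seconds rather than exactly 72 hours (259200 seconds).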
In MySQL 8 you could use the LEAD() window function to get the timestamp of the next row:
select status, sum(diff) as duratation_in_seconds
from (
select
status,
timestampdiff(second, timeStamp, lead(timeStamp) over (order by timeStamp)) as diff
from mytable
) x
group by status;
I have a table that stores facial login data of employees based upon employee id. I need to get the earliest login for each employee on a day and all other logins to be ignored. I know how to get latest or earliest record for each employee but I am unable to figure out how to get earliest entry in each day by each employee.
+----+-----------+--------------------------------------+-------------+-----------------------+
| id | camera_id | image_name | employee_id | created_at |
+----+-----------+--------------------------------------+-------------+-----------------------+
| 10 | 2 | pjcc7vf142pec6li7k8kqxuqvnmhm0tyo8ib | 16 | 2020-07-11 10:40:20 |
| 11 | 2 | 9iizfdtk3m81a745ut7tzqzqh8kf9ipz2u02 | 2 | 2020-07-11 10:40:22 |
| 14 | 2 | 3p74yrq35nfaazwdo8auguvn2h5hpugtfvvw | 2 | 2020-07-11 12:07:24 |
| 15 | 2 | hpa2am40ufke7o7q2y733hh83h7ykxxdgkof | 16 | 2020-07-11 12:09:35 |
| 16 | 2 | g7adgyzloab2t4z7xx2id0a9cjqx8ojfni99 | 2 | 2020-07-11 12:09:41 |
| 17 | 2 | tapufkiuj5toxfdoikjicbe3k7tl32yj5khp | 16 | 2020-07-12 12:09:47 |
| 18 | 2 | pjcc7vf142pec6li7k8kqxuqvnmhm0tyo8ib | 16 | 2020-07-12 14:40:20 |
| 19 | 2 | 9iizfdtk3m81a745ut7tzqzqh8kf9ipz2u02 | 2 | 2020-07-12 15:40:22 |
| 20 | 2 | 3p74yrq35nfaazwdo8auguvn2h5hpugtfvvw | 2 | 2020-07-12 16:07:24 |
| 21 | 2 | hpa2am40ufke7o7q2y733hh83h7ykxxdgkof | 16 | 2020-07-12 17:09:35 |
| 22 | 2 | g7adgyzloab2t4z7xx2id0a9cjqx8ojfni99 | 2 | 2020-07-13 12:09:41 |
+----+-----------+--------------------------------------+-------------+-----------------------+
The result should look like this:
+----+-----------+--------------------------------------+-------------+-----------------------+
| id | camera_id | image_name | employee_id | created_at |
+----+-----------+--------------------------------------+-------------+-----------------------+
| 10 | 2 | pjcc7vf142pec6li7k8kqxuqvnmhm0tyo8ib | 16 | 2020-07-11 10:40:20 |
| 11 | 2 | 9iizfdtk3m81a745ut7tzqzqh8kf9ipz2u02 | 2 | 2020-07-11 10:40:22 |
| 17 | 2 | tapufkiuj5toxfdoikjicbe3k7tl32yj5khp | 16 | 2020-07-12 12:09:47 |
| 19 | 2 | 9iizfdtk3m81a745ut7tzqzqh8kf9ipz2u02 | 2 | 2020-07-12 15:40:22 |
| 22 | 2 | g7adgyzloab2t4z7xx2id0a9cjqx8ojfni99 | 2 | 2020-07-13 12:09:41 |
+----+-----------+--------------------------------------+-------------+-----------------------+
You can do:
select *
from t
where (employee_id, created_at) in (
select employee_id, min(created_at)
from t
group by employee_id, date(created_at)
)
how to get earliest entry in each day by each employee
You can filter with a correlated subquery:
select t.*
from mytable t
where t.created_at = (
select min(t1.created_at)
from mytable t1
where
t1.employee_id = t.employee_id
and t1.created_at >= date(t.created_at)
and t1.created_at < date(t.created_at) + interval 1 day
)
This query would take advantage of an index on (employee_id, created_at).
Or, if you are running MySQL 8.0, you can use window functions:
select *
from (
select
t.*,
row_number() over(
partition by employee_id, date(created_at)
order by created_at
) rn
from mytable t
) t
where rn = 1
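To check any of these queries against the sample data, the intended rule (keep the earliest row per employee and calendar day) can be sketched in Python. The helper name `earliest_per_day` is hypothetical, and only the three columns that matter are copied from the question's table.

```python
from datetime import datetime

# (id, employee_id, created_at) taken from the question's sample table.
LOGINS = [
    (10, 16, "2020-07-11 10:40:20"),
    (11, 2, "2020-07-11 10:40:22"),
    (14, 2, "2020-07-11 12:07:24"),
    (15, 16, "2020-07-11 12:09:35"),
    (16, 2, "2020-07-11 12:09:41"),
    (17, 16, "2020-07-12 12:09:47"),
    (18, 16, "2020-07-12 14:40:20"),
    (19, 2, "2020-07-12 15:40:22"),
    (20, 2, "2020-07-12 16:07:24"),
    (21, 16, "2020-07-12 17:09:35"),
    (22, 2, "2020-07-13 12:09:41"),
]


def earliest_per_day(logins):
    """Return the ids of the earliest login per (employee, calendar day)."""
    first = {}
    for row_id, employee_id, created_at in logins:
        ts = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S")
        key = (employee_id, ts.date())  # plays the role of partition by employee_id, date(created_at)
        if key not in first or ts < first[key][0]:
            first[key] = (ts, row_id)
    return sorted(row_id for _, row_id in first.values())


earliest_ids = earliest_per_day(LOGINS)
print(earliest_ids)  # [10, 11, 17, 19, 22]
```

The ids match the expected result table in the question.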
I have a ratings table, where each user can add one rating a day. But each user might miss several days between ratings.
I'd like to get the average rating for each user_id's first 7 entries of created_at.
My table:
mysql> desc entries;
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| rating | tinyint(4) | NO | | NULL | |
| user_id | int(10) unsigned | NO | MUL | NULL | |
| created_at | timestamp | YES | | NULL | |
+------------+------------------+------+-----+---------+----------------+
Ideally I'd just get something like:
+------------+------------------+
| day | average_rating |
+------------+------------------+
| 1 | 2.53 |
+------------+------------------+
| 2 | 4.30 |
+------------+------------------+
| 3 | 3.67 |
+------------+------------------+
| 4 | 5.50 |
+------------+------------------+
| 5 | 7.23 |
+------------+------------------+
| 6 | 6.98 |
+------------+------------------+
| 7 | 7.22 |
+------------+------------------+
The closest I've been able to get is:
SELECT rating, user_id, created_at FROM entries ORDER BY user_id asc, created_at desc
Which isn't very close at all...
Is it even possible? Will the performance be terrible? It's something that would need to run every time a web page is loaded, so would it be better to just run this once a day and save the results? (to another table!?)
edit - second attempt
Working towards a solution, I think this would get the rating for each user's first day:
select rating from entries where user_id in
(select user_id from entries order by created_at limit 1);
But I get:
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
So now I'm going to play around with JOIN to see if that helps.
edit - third attempt, getting closer
I found this stackoverflow post, which is closer to what I want.
select e1.* from entries e1 left join entries e2
on (e1.user_id = e2.user_id and e1.created_at > e2.created_at)
where e2.id is null;
It gets the rating for the first day for each user.
Next step is to work out how to get days 2 to 7. I can't use e1.created_at > e2.created_at for that, so I'm really confused now.
edit - fourth attempt
Okay, I think it's not possible. Once I worked out how to turn off 'full group by' mode, I realised I'll probably need to use a subquery with limit <user_id>, <day_num>, for which I get:
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
My current method is to just get the entire table, and use PHP to calculate the average for each day.
If I understand correctly you want to take the last 7 ratings the user gave, ordered by the date they gave the rating. The last 7 ratings of one user may fall on different days to another user, however they will be averaged together regardless of date.
First we need to order the data by user and date and give each user their own incrementing row count. I do this by adding two variables, one for the last user id and one for the row number:
select e.created_at,
e.rating,
if(@lastUser=user_id,@row := @row+1, @row:=1) as row,
@lastUser:= e.user_id as user_id
from entries e,
( select @row := 0, @lastUser := 0 ) vars
order by e.user_id asc,
e.created_at desc;
If the previous user_id is different we reset the row counter to 1. The result from this is:
+---------------------+--------+------+---------+
| created_at | rating | row | user_id |
+---------------------+--------+------+---------+
| 2017-01-10 00:00:00 | 1 | 1 | 1 |
| 2017-01-09 00:00:00 | 1 | 2 | 1 |
| 2017-01-08 00:00:00 | 1 | 3 | 1 |
| 2017-01-07 00:00:00 | 1 | 4 | 1 |
| 2017-01-06 00:00:00 | 1 | 5 | 1 |
| 2017-01-05 00:00:00 | 1 | 6 | 1 |
| 2017-01-04 00:00:00 | 1 | 7 | 1 |
| 2017-01-03 00:00:00 | 1 | 8 | 1 |
| 2017-01-02 00:00:00 | 1 | 9 | 1 |
| 2017-01-01 00:00:00 | 1 | 10 | 1 |
| 2017-01-13 00:00:00 | 1 | 1 | 2 |
| 2017-01-11 00:00:00 | 1 | 2 | 2 |
| 2017-01-09 00:00:00 | 1 | 3 | 2 |
| 2017-01-07 00:00:00 | 1 | 4 | 2 |
| 2017-01-05 00:00:00 | 1 | 5 | 2 |
| 2017-01-03 00:00:00 | 1 | 6 | 2 |
| 2017-01-01 00:00:00 | 1 | 7 | 2 |
| 2017-01-13 00:00:00 | 1 | 1 | 3 |
| 2017-01-01 00:00:00 | 1 | 2 | 3 |
| 2017-01-03 00:00:00 | 1 | 1 | 4 |
| 2017-01-01 00:00:00 | 1 | 2 | 4 |
| 2017-01-02 00:00:00 | 1 | 1 | 5 |
+---------------------+--------+------+---------+
We now simply wrap this in another statement to select the avg where the row number is less than or equal to seven.
select e1.row day, avg(e1.rating) avg
from (
select e.created_at,
e.rating,
if(@lastUser=user_id,@row := @row+1, @row:=1) as row,
@lastUser:= e.user_id as user_id
from entries e,
( select @row := 0, @lastUser := 0 ) vars
order by e.user_id asc,
e.created_at desc) e1
where e1.row <=7
group by e1.row;
This outputs:
+------+--------+
| day | avg |
+------+--------+
| 1 | 1.0000 |
| 2 | 1.0000 |
| 3 | 1.0000 |
| 4 | 1.0000 |
| 5 | 1.0000 |
| 6 | 1.0000 |
| 7 | 1.0000 |
+------+--------+
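The numbering-then-averaging idea can also be sketched in Python, which is roughly what the PHP fallback mentioned in the question would do. Everything here is illustrative: the sample rows are invented, and `average_by_entry_number` and its `newest_first` flag are hypothetical names. With `newest_first=True` the ordering mirrors the `created_at desc` in the query above; the default (oldest first) corresponds to the "first 7 entries" wording of the question.

```python
from collections import defaultdict

# Hypothetical sample rows: (user_id, created_at, rating).
ENTRIES = [
    (1, "2017-01-01", 2), (1, "2017-01-03", 4), (1, "2017-01-04", 6),
    (2, "2017-01-02", 4), (2, "2017-01-09", 8),
    (3, "2017-01-05", 6),
]


def average_by_entry_number(entries, limit=7, newest_first=False):
    """Number each user's entries 1..limit by date, then average per number."""
    per_user = defaultdict(list)
    for user_id, created_at, rating in entries:
        per_user[user_id].append((created_at, rating))

    by_rank = defaultdict(list)
    for rows in per_user.values():
        rows.sort(reverse=newest_first)  # ISO dates sort correctly as strings
        for rank, (_, rating) in enumerate(rows[:limit], start=1):
            by_rank[rank].append(rating)

    return {rank: sum(r) / len(r) for rank, r in sorted(by_rank.items())}


averages = average_by_entry_number(ENTRIES)
print(averages)  # {1: 4.0, 2: 6.0, 3: 6.0}
```

Each user contributes at most one rating to each entry number, so a user with fewer than seven entries simply drops out of the later averages.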
I am not sure whether this data structure can produce the result I want.
http://sqlfiddle.com/#!9/84939
This is the data, please ignore the duration column.
+----+---------------------+---------------------+---------------------+----------+--------+------+
| id | created_date | start_date | end_date | duration | status | type |
+----+---------------------+---------------------+---------------------+----------+--------+------+
| 1 | 2016-04-05 15:23:29 | 2016-08-15 10:21:53 | 2016-08-19 00:00:00 | 30 | 1 | 2 |
| 2 | 2016-04-06 15:23:29 | 2016-08-15 10:21:53 | 2016-08-19 00:00:00 | 30 | 1 | 1 |
| 3 | 2016-04-06 15:23:29 | 2016-08-15 10:21:53 | 2016-08-19 00:00:00 | 30 | 1 | 3 |
| 4 | 2016-04-06 15:23:29 | 2016-08-17 10:21:53 | 2016-08-19 00:00:00 | 30 | 1 | 1 |
| 5 | 2016-04-06 15:23:29 | 2016-08-17 09:21:53 | 2016-08-19 00:00:00 | 30 | 1 | 1 |
| 6 | 2016-04-06 15:23:29 | 2016-08-01 09:21:53 | 2016-08-31 00:00:00 | 30 | 1 | 1 |
| 7 | 2016-04-06 15:23:29 | 2016-08-01 09:21:53 | 2016-08-31 00:00:00 | 30 | 0 | 1 |
| 8 | 2016-04-06 15:23:29 | 2016-08-15 09:21:53 | 2016-08-16 00:00:00 | 30 | 1 | 2 |
| 9 | 2016-04-06 15:23:29 | 2016-08-16 09:21:53 | 2016-08-17 00:00:00 | 30 | 1 | 3 |
| 10 | 2016-04-06 15:23:29 | 2016-08-19 09:21:53 | 2016-08-20 00:00:00 | 30 | 1 | 2 |
+----+---------------------+---------------------+---------------------+----------+--------+------+
I want to filter the report from 2016-08-15 until 2016-08-19. For 2016-08-19, even at 00:00:00, I am not sure whether it should be counted, but in my example I count it because it is in the range.
This is the summary I worked out manually:
(type-2)15,16,17,18,19
(type-1)15,16,17,18,19
(type-3)15,16,17,18,19
(type-1)17,18,19
(type-1)17,18,19
(type-1)15,16,17,18,19
(type-1)15,16,17,18,19
(type-2)15,16
(type-3)16,17
(type-2)19,20
This is the result I would like the SQL query to return.
+------------+--------+-----------+-----------+-----------+
| date | ct_all | ct_type_1 | ct_type_2 | ct_type_3 |
+------------+--------+-----------+-----------+-----------+
| 2016-08-15 | 6 | 3 | 2 | 1 |
| 2016-08-16 | 7 | 3 | 2 | 2 |
| 2016-08-17 | 8 | 5 | 1 | 2 |
| 2016-08-18 | 7 | 5 | 1 | 1 |
| 2016-08-19 | 8 | 5 | 2 | 1 |
+------------+--------+-----------+-----------+-----------+
ct_all = count all
ct_type_1 = count total for type 1
A row counts for a day as long as that day falls between its start_date and end_date.
Normally a date search is based on a single column, e.g. created_date, and I can use >= and <= to find the range. But this table has both a start and an end date, and I am not sure whether this can be accomplished.
You have three different things going on here.
an enumeration of days.
a DATETIME range filter.
a so-called pivot, pivoting rows by type into columns.
It's helpful to take these one at a time.
First, I guess you have five days you wish to filter, [15-Aug-2016 - 19-Aug-2016] inclusive. You want to make a list of all those days. This little query will do that. (http://sqlfiddle.com/#!9/84939/21/0)
SELECT CONVERT('2016-08-15' + INTERVAL seq DAY, DATETIME) AS CURDATE
FROM (SELECT 0 AS SEQ UNION ALL SELECT 1 UNION ALL SELECT 2
UNION ALL SELECT 3 UNION ALL SELECT 4
) seq_0_to_4
(Notice something: The MariaDB fork of MySQL has sequence tables like seq_0_to_4 built in so you don't have to do all this UNION ALL stuff.)
Second, you want to get a list of the type values occurring on each day. You can get that to happen with a LEFT JOIN, like so (http://sqlfiddle.com/#!9/84939/26/0).
SELECT seq.curdate, record.type
FROM (
SELECT CONVERT('2016-08-15' + INTERVAL seq DAY, DATETIME) AS CURDATE
FROM (SELECT 0 AS SEQ UNION ALL SELECT 1 UNION ALL SELECT 2
UNION ALL SELECT 3 UNION ALL SELECT 4
) seq_0_to_4
) seq
LEFT JOIN record ON seq.curdate >= DATE(record.start_date)
AND seq.curdate <= DATE(record.end_date)
This gives you a list of curdate and type values.
The ON condition of that join chooses record rows that start on or before each date and end on or after that date.
Finally, you need to do a pivot operation to summarize the counts of type values. That looks something like this. (http://sqlfiddle.com/#!9/84939/28/0)
SELECT curdate,
COUNT(type) ct_all,
SUM(CASE WHEN type = 1 THEN 1 ELSE 0 END) ct_1,
SUM(CASE WHEN type = 2 THEN 1 ELSE 0 END) ct_2,
SUM(CASE WHEN type = 3 THEN 1 ELSE 0 END) ct_3
FROM (the above query) d
GROUP BY curdate
ORDER BY curdate
This is a case where the structured part of Structured Query Language is necessary.
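To confirm that the three steps (enumerate days, range filter, pivot) produce the table the question asks for, here is a small Python sketch over the sample rows. The function name `daily_type_counts` is made up for this illustration, and the dates are the DATE() parts of start_date and end_date from the question's data.

```python
from collections import Counter
from datetime import date, timedelta

# (type, start day, end day) for the ten sample rows, times truncated to dates.
RECORDS = [
    (2, date(2016, 8, 15), date(2016, 8, 19)),
    (1, date(2016, 8, 15), date(2016, 8, 19)),
    (3, date(2016, 8, 15), date(2016, 8, 19)),
    (1, date(2016, 8, 17), date(2016, 8, 19)),
    (1, date(2016, 8, 17), date(2016, 8, 19)),
    (1, date(2016, 8, 1), date(2016, 8, 31)),
    (1, date(2016, 8, 1), date(2016, 8, 31)),
    (2, date(2016, 8, 15), date(2016, 8, 16)),
    (3, date(2016, 8, 16), date(2016, 8, 17)),
    (2, date(2016, 8, 19), date(2016, 8, 20)),
]


def daily_type_counts(records, first_day, num_days):
    """Return (day, ct_all, ct_type_1, ct_type_2, ct_type_3) per day."""
    table = []
    for offset in range(num_days):  # step 1: enumerate the days
        day = first_day + timedelta(days=offset)
        # Step 2: keep rows whose [start, end] range covers this day.
        counts = Counter(t for t, start, end in records if start <= day <= end)
        # Step 3: pivot the per-type counts into columns.
        table.append((day.isoformat(), sum(counts.values()),
                      counts[1], counts[2], counts[3]))
    return table


report = daily_type_counts(RECORDS, date(2016, 8, 15), 5)
for row in report:
    print(row)
```

Like the queries above, this sketch does not filter on the status column; the expected counts in the question include the row with status 0.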
I'm trying to select the most recent rows for every unique userid where pid = 50 and active = 1. I haven't been able to figure it out.
Here is a sample table
+-----+----------+-------+-----------------------+---------+
| id | userid | pid | start_date | active |
+-----+----------+-------+-----------------------+---------+
| 1 | 4 | 50 | 2015-05-15 12:00:00 | 1 |
| 2 | 4 | 50 | 2015-05-16 12:00:00 | 1 |
| 3 | 4 | 50 | 2015-05-17 12:00:00 | 0 |
| 4 | 4 | 51 | 2015-06-29 12:00:00 | 1 |
| 5 | 4 | 51 | 2015-06-30 12:00:00 | 1 |
| 6 | 5 | 50 | 2015-07-05 12:00:00 | 1 |
| 7 | 5 | 50 | 2015-07-06 12:00:00 | 1 |
| 8 | 5 | 51 | 2015-07-08 12:00:00 | 1 |
+-----+----------+-------+-----------------------+---------+
Desired Result
+-----+----------+-------+-----------------------+---------+
| id | userid | pid | start_date | active |
+-----+----------+-------+-----------------------+---------+
| 2 | 4 | 50 | 2015-05-16 12:00:00 | 1 |
| 7 | 5 | 50 | 2015-07-06 12:00:00 | 1 |
+-----+----------+-------+-----------------------+---------+
I've tried a bunch of things, and this is the closest I got, but unfortunately it is not quite there.
SELECT *
FROM mytable t1
WHERE
(
SELECT COUNT(*)
FROM mytable t2
WHERE
t1.userid = t2.userid
AND t1.start_date < t2.start_date
) < 1
AND pid = 50
AND active = 1
ORDER BY start_date DESC
plan
get last record grouping by userid where pid is 50 and is active
inner join to mytable to get the record info associated with last
query
select
my.*
from
(
select userid, pid, active, max(start_date) as lst
from mytable
where pid = 50
and active = 1
group by userid, pid, active
) maxd
inner join mytable my
on maxd.userid = my.userid
and maxd.pid = my.pid
and maxd.active = my.active
and maxd.lst = my.start_date
;
output
+----+--------+-----+------------------------+--------+
| id | userid | pid | start_date | active |
+----+--------+-----+------------------------+--------+
| 2 | 4 | 50 | May, 16 2015 12:00:00 | 1 |
| 7 | 5 | 50 | July, 06 2015 12:00:00 | 1 |
+----+--------+-----+------------------------+--------+
notes
As suggested by @Strawberry, updated to join also on pid and active. This avoids also returning a record that is not active or not pid 50 but has exactly the same date.
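The key point of the plan, filter first and only then pick the latest row per user, can be sketched in Python on the sample table. The helper name `latest_active` is hypothetical; the rows are copied from the question.

```python
# (id, userid, pid, start_date, active) copied from the question's sample table.
ROWS = [
    (1, 4, 50, "2015-05-15 12:00:00", 1),
    (2, 4, 50, "2015-05-16 12:00:00", 1),
    (3, 4, 50, "2015-05-17 12:00:00", 0),
    (4, 4, 51, "2015-06-29 12:00:00", 1),
    (5, 4, 51, "2015-06-30 12:00:00", 1),
    (6, 5, 50, "2015-07-05 12:00:00", 1),
    (7, 5, 50, "2015-07-06 12:00:00", 1),
    (8, 5, 51, "2015-07-08 12:00:00", 1),
]


def latest_active(rows, pid=50):
    """Return the most recent active row with the given pid for each userid."""
    best = {}
    for row in rows:
        _, userid, row_pid, start_date, active = row
        # Filter before comparing dates, like the WHERE inside the derived table.
        if row_pid != pid or active != 1:
            continue
        # ISO-formatted timestamps compare correctly as strings.
        if userid not in best or start_date > best[userid][3]:
            best[userid] = row
    return [best[userid] for userid in sorted(best)]


latest = latest_active(ROWS)
print([row[0] for row in latest])  # [2, 7]
```

The attempt in the question compares each row against all of the user's rows, including pid 51 and inactive ones. On this sample data the latest row of both users has pid 51, so no pid 50 row passes the "no later row exists" test and that query returns nothing.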
Is there any way to count a given run of timestamps that are close to each other, but not necessarily in a fixed time frame?
I.e., not grouped by hour or minute, but rather grouped by how close the current row's timestamp is to the next row's timestamp. If the next row is within "x" seconds/minutes, then add that row to the group; otherwise start a new grouping.
Given this data:
+----+---------+---------------------+
| id | item_id | event_date |
+----+---------+---------------------+
| 1 | 1 | 2013-05-17 11:59:59 |
| 2 | 1 | 2013-05-17 12:00:00 |
| 3 | 1 | 2013-05-17 12:00:02 |
| 4 | 1 | 2013-05-17 12:00:03 |
| 5 | 3 | 2013-05-17 14:05:00 |
| 6 | 3 | 2013-05-17 14:05:01 |
| 7 | 3 | 2013-05-17 15:30:00 |
| 8 | 3 | 2013-05-17 15:30:01 |
| 9 | 3 | 2013-05-17 15:30:02 |
| 10 | 1 | 2013-05-18 09:12:00 |
| 11 | 1 | 2013-05-18 09:13:30 |
| 12 | 1 | 2013-05-18 09:13:45 |
| 13 | 1 | 2013-05-18 09:14:00 |
| 14 | 2 | 2013-05-20 15:45:00 |
| 15 | 2 | 2013-05-20 15:45:03 |
| 16 | 2 | 2013-05-20 15:45:10 |
| 17 | 2 | 2013-05-23 07:36:00 |
| 18 | 2 | 2013-05-23 07:36:10 |
| 19 | 2 | 2013-05-23 07:36:12 |
| 20 | 2 | 2013-05-23 07:36:15 |
| 21 | 1 | 2013-05-24 11:55:00 |
| 22 | 1 | 2013-05-24 11:55:02 |
+----+---------+---------------------+
Desired Results:
+---------+-------+---------------------+
| item_id | total | last_date_in_group |
+---------+-------+---------------------+
| 1 | 4 | 2013-05-17 12:00:03 |
| 3 | 2 | 2013-05-17 14:05:01 |
| 3 | 3 | 2013-05-17 15:30:02 |
| 1 | 4 | 2013-05-18 09:14:00 |
| 2 | 3 | 2013-05-20 15:45:10 |
| 2 | 4 | 2013-05-23 07:36:15 |
| 1 | 2 | 2013-05-24 11:55:02 |
+---------+-------+---------------------+
This is a little complicated. To start, you need the time of the next event for each record. The following subquery adds such a time (nexted), if it is within bounds:
select t.*,
(select event_date
from t t2
where t2.item_id = t.item_id and
t2.event_date > t.event_date and
<date comparison here>
order by event_date limit 1
) as nexted
from t
This uses a correlated subquery. The <date comparison here> is for whatever date comparison you want. When there is no record, the value will be NULL.
Now, with this information (nexted) there is a trick to get the grouping. For any record, it is the first event time afterwards where nexted is NULL. This will be the last event in the series. Unfortunately, this requires two levels of nested correlated subqueries (or joins with aggregations). The result looks a bit unwieldy:
select item_id, grp, MIN(event_date) as start_date, MAX(event_date) as end_date,
       COUNT(*) as num_dates
from (select t.*,
             -- grp: the first event at or after this one that ends a run (nexted is null)
             (select min(t2.event_date)
              from (select t1.*,
                           (select event_date
                            from t t3
                            where t3.item_id = t1.item_id and
                                  t3.event_date > t1.event_date and
                                  <date comparison here>
                            order by event_date limit 1
                           ) as nexted
                    from t t1
                   ) t2
              where t2.nexted is null and
                    t2.item_id = t.item_id and
                    t2.event_date >= t.event_date
             ) as grp
      from t
     ) s
-- grp replaces the alias "grouping", which is a reserved word in MySQL 8.0
group by item_id, grp;
What about finding each individual record's local associations, and then grouping on the max event date from each record's discoveries? This is based on a static time interval (5 minutes in my example).
SELECT item_id, MAX(total), MAX(last_date_in_group) AS last_date_in_group FROM (
SELECT t1.item_id, COUNT(*) AS total, COALESCE(GREATEST(t1.event_date, MAX(t2.event_date)), t1.event_date) AS last_date_in_group
FROM table_name t1
LEFT JOIN table_name t2 ON t2.event_date BETWEEN t1.event_date AND t1.event_date + INTERVAL 5 MINUTE
GROUP BY t1.id
) t
GROUP BY last_date_in_group
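The grouping rule from the question (start a new group whenever the gap to the previous event of the same item exceeds a threshold) is easy to state procedurally, so a Python sketch may help when checking the output of either query. The name `group_runs` is made up, and the 5-minute threshold is an assumption borrowed from the second answer; the question leaves "x" open.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# (item_id, event_date) copied from the question's sample data.
EVENTS = [
    (1, "2013-05-17 11:59:59"), (1, "2013-05-17 12:00:00"),
    (1, "2013-05-17 12:00:02"), (1, "2013-05-17 12:00:03"),
    (3, "2013-05-17 14:05:00"), (3, "2013-05-17 14:05:01"),
    (3, "2013-05-17 15:30:00"), (3, "2013-05-17 15:30:01"),
    (3, "2013-05-17 15:30:02"), (1, "2013-05-18 09:12:00"),
    (1, "2013-05-18 09:13:30"), (1, "2013-05-18 09:13:45"),
    (1, "2013-05-18 09:14:00"), (2, "2013-05-20 15:45:00"),
    (2, "2013-05-20 15:45:03"), (2, "2013-05-20 15:45:10"),
    (2, "2013-05-23 07:36:00"), (2, "2013-05-23 07:36:10"),
    (2, "2013-05-23 07:36:12"), (2, "2013-05-23 07:36:15"),
    (1, "2013-05-24 11:55:00"), (1, "2013-05-24 11:55:02"),
]


def group_runs(events, max_gap=timedelta(minutes=5)):
    """Group each item's events into runs whose consecutive gaps are <= max_gap."""
    parsed = sorted((item_id, datetime.strptime(ts, FMT)) for item_id, ts in events)
    runs = []  # each run is [item_id, total, last_date_in_group]
    for item_id, ts in parsed:
        if runs and runs[-1][0] == item_id and ts - runs[-1][2] <= max_gap:
            runs[-1][1] += 1   # extend the current run
            runs[-1][2] = ts
        else:
            runs.append([item_id, 1, ts])  # gap too large or new item: start a run
    runs.sort(key=lambda run: run[2])
    return [(item_id, total, last.strftime(FMT)) for item_id, total, last in runs]


runs = group_runs(EVENTS)
for run in runs:
    print(run)
```

With a 5-minute threshold this reproduces the seven rows of the desired result. Any threshold of at least 90 seconds and well under an hour gives the same groups on this data, because the 09:12:00 to 09:13:30 gap must stay inside one group.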