MySQL group by interval? - mysql

I need to write a MySQL SELECT query to do the following, possibly using INTERVAL, but I am not sure how to achieve it.
I want to retrieve groups of results in which the datetime values of neighbouring rows are within 5 minutes of each other. In the simplified example below, I need to retrieve Bob, Ted and Fred in one group and Andy, Mike and Bert in another.
I don't know how many groups there will be. I need to be able to establish when each group starts and when it ends, and to collect the data in between.
Once I have these groups, I need to be able to access each value from every row within a group in order to perform further calculations.
**DB Table**
DateTime Value
2018-10-17 12:50 Bob
2018-10-17 12:55 Ted
2018-10-17 13:00 Fred
2018-10-17 15:00 Andy
2018-10-17 15:05 Mike
2018-10-17 15:10 Bert

In MySQL 5.x you can calculate a rank by using variables.
Example:
SELECT `DateTime`, `Value`,
CASE
WHEN @prev_dt >= `DateTime` - INTERVAL 5 MINUTE AND @prev_dt := `DateTime` THEN @rank
WHEN @prev_dt := `DateTime` THEN @rank := @rank + 1 -- this WHEN criteria is always true
END AS `Rank`
FROM YourTable t
CROSS JOIN (SELECT @prev_dt := NULL, @rank := 0) vars
ORDER BY `DateTime`;
Result:
DateTime Value Rank
-------------------- --- ----
2018-10-17T12:50:00Z Bob 1
2018-10-17T12:55:00Z Ted 1
2018-10-17T13:00:00Z Fred 1
2018-10-17T15:00:00Z Andy 2
2018-10-17T15:05:00Z Mike 2
2018-10-17T15:10:00Z Bert 2
In MySQL 8.x you could also use the window function LAG to get the previous DateTime.
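A minimal sketch of the LAG approach, assuming MySQL 8.0 or later and the same YourTable layout as above. The names is_new_group and flagged are made up for this illustration. Each row is flagged when it starts a new group (no previous row, or a gap of more than 5 minutes), and a running sum of those flags gives the group number:

```sql
SELECT `DateTime`, `Value`,
       -- running total of "group start" flags = group number
       SUM(is_new_group) OVER (ORDER BY `DateTime`) AS `Rank`
FROM (
    SELECT `DateTime`, `Value`,
           -- LAG returns NULL for the first row, so the comparison is NULL
           -- and the ELSE branch marks that row as the start of a group
           CASE
               WHEN LAG(`DateTime`) OVER (ORDER BY `DateTime`)
                    >= `DateTime` - INTERVAL 5 MINUTE THEN 0
               ELSE 1
           END AS is_new_group
    FROM YourTable
) flagged
ORDER BY `DateTime`;
```

The subquery is needed because MySQL does not allow one window function to be nested inside another. Once every row carries its group number, the start and end of each group can be obtained with MIN(`DateTime`) and MAX(`DateTime`) grouped by that number.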

Related

Growing number of months passed - MySQL

I would like to create the MonthCount column described below. I already have the ID and Date fields; I am just having trouble thinking of a clever way to count the number of dates that have passed. The dates are always the first of the month, but the first month could be any month between Jan and Dec.
ID Date MonthCount
1 1/2016 1
1 2/2016 2
1 3/2016 3
2 5/2015 1
2 6/2015 2
2 7/2015 3
It seems like I remember reading somewhere about joining the table to itself using a > or < operator but I can't completely recall the method.
The best way to handle this in MySQL is to use variables:
select t.*,
(@rn := if(@id = id, @rn + 1,
if(@id := id, 1, 1)
)
) as rn
from t cross join
(select @rn := 0, @id := -1) params
order by id, date;
It looks like you're looking for:
select a.id, a.date, b.mindate
from table as a
inner join (
select id, min(date) as mindate
from table
group by id
) as b on (a.id=b.id)
This will give you:
ID Date mindate
1 1/1/2016 1/1/2016
1 1/2/2016 1/1/2016
1 1/3/2016 1/1/2016
2 1/5/2015 1/5/2015
2 1/6/2015 1/5/2015
2 1/7/2015 1/5/2015
Now the homework for you is to figure out how to calculate the difference between the two dates.
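One possible way to finish this is sketched below. It assumes the table is named t (a placeholder, as in the first answer) and that date is a DATE column holding the first day of each month. TIMESTAMPDIFF(MONTH, ...) returns the number of whole months between two dates, so adding 1 gives the count starting from 1:

```sql
SELECT a.id,
       a.date,
       -- whole months since the first date for this id, counted from 1
       TIMESTAMPDIFF(MONTH, b.mindate, a.date) + 1 AS MonthCount
FROM t AS a
INNER JOIN (
    SELECT id, MIN(date) AS mindate
    FROM t
    GROUP BY id
) AS b ON a.id = b.id
ORDER BY a.id, a.date;
```

Unlike a row counter, this counts calendar months, so a month that is missing from the data leaves a gap in MonthCount instead of being silently skipped.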

Query how often an event occurred at a given time

[Aim]
We would like to find out how often an event "A" occurred before time "X". More concretely, given the dataset below, we want to find the count of prior purchases.
[Context]
DBMS: MySQL 5.6
We have following dataset:
user | date
1 | 2015-06-01 17:00:00
2 | 2015-06-02 18:00:00
1 | 2015-06-03 19:00:00
[Desired output]
user | date | purchase count
1 | 2015-06-01 17:00:00 | 1
2 | 2015-06-02 18:00:00 | 1
1 | 2015-06-03 19:00:00 | 2
[Already tried]
We managed to get the count on a specific day using an inner join on the table itself.
[Problem(s)]
- How to do this in a single query?
This can be done using user-defined variables, which is faster, as already mentioned in the previous answer.
It requires an incrementing variable for each group, based on some ordering. For the given data set, the grouping column is user and the ordering column is date.
Here is how you can achieve it:
select
user,
date,
purchase_count
from (
select *,
@rn:= if(@prev_user=user,@rn+1,1) as purchase_count,
@prev_user:=user
from test,(select @rn:=0,@prev_user:=null)x
order by user,date
)x
order by date;
Change the table name test to your actual table name
http://sqlfiddle.com/#!9/32232/12
Probably the most efficient way is to use variables:
select t.*,
(@rn := if(@u = user, @rn + 1,
if(@u := user, 1, 1)
)
) as purchase_count
from table t cross join
(select @rn := 0, @u := '') params
order by user, date ;
You can also do this with correlated subqueries, but this is probably faster.
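For comparison, a sketch of the correlated-subquery version, which also runs on MySQL 5.6. It assumes the table is named test, as in the first answer, with the columns user and date. The alias t2 is only illustrative:

```sql
SELECT t.user,
       t.date,
       -- count this user's purchases up to and including the current row
       (SELECT COUNT(*)
        FROM test t2
        WHERE t2.user = t.user
          AND t2.date <= t.date) AS purchase_count
FROM test t
ORDER BY t.date;
```

The subquery is evaluated once per row, so an index on (user, date) would likely matter on larger tables. Two purchases by the same user with an identical timestamp would receive the same count.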

MYSQL - Sum Interval Dates

I came across the following problem:
I would like to sum the hours for each name, giving the total time covered between the START and END of their activities.
This would be simple if I could just subtract the start from the end of each record, but the activities can overlap. For example, Mary started one activity at 13:00 and worked on it until 15:00, and worked on another activity from 14:00 to 16:00. I would like her result to be 3 (she used 3 hours of her time to perform both activities).
e.g.:
Name | START | END |
-----------------------------------------------------------
KATE | 2014-01-01 13:00:00 | 2014-01-01 14:00:00 |
MARY | 2014-01-01 13:00:00 | 2014-01-01 15:00:00 |
TOM | 2014-01-01 13:00:00 | 2014-01-01 16:00:00 |
KATE | 2014-01-01 12:00:00 | 2014-01-02 04:00:00 |
MARY | 2014-01-01 14:00:00 | 2014-01-01 16:00:00 |
TOM | 2014-01-01 12:00:00 | 2014-01-01 18:00:00 |
TOM | 2014-01-01 22:00:00 | 2014-01-02 02:00:00 |
result:
KATE 15 hours
MARY 3 hours
TOM 9 hours
Have you tried a group by and then an aggregate function?
SELECT Name, SUM(UNIX_TIMESTAMP(End) - UNIX_TIMESTAMP(Start)) FROM myTable
GROUP BY Name
This will return the cumulative total of seconds from the intervals you have. You can then convert the seconds to hours for display.
Also I would highly recommend grouping by a primary key or something instead of a string name, but I understand that this may have been just to simplify the question.
I found this problem interesting, so I spent a little more time developing a solution. What I came up with involves sorting the rows by name and start time, then using MySQL variables to account for overlapping ranges. I begin by sorting the table and supplementing it with columns that carry the name and times from one row to the next:
SELECT [expounded below]
FROM (SELECT * FROM tbl ORDER BY Name, START, END) AS u,
(SELECT @x := 0, @gap := 0, @same_name:='',
@beg := (SELECT MIN(START) FROM tbl),
@end := (SELECT MAX(END) FROM tbl)) AS t
This adds the name and the outer bounds of the time range to each row of the table, as well as sorting the table so that names are together in order by starting time. For each row, we will now have @same_name, @beg, and @end carrying values forward from one line to the next, and @x and @gap will accumulate the hours.
Now we have to do some reasoning about the possible overlaps that can occur. For any two intervals, they are either disjoint or have an intersection:
Non-overlapping:  beg--------end    START-------END
Overlapping:      beg-----------end              beg---------end
                       START--------------END  START-----------END
Subset:           beg---------------------------------end
                           START-----END
Once the rows are adjacent, we can decide if two ranges overlap by comparing their start and end points. They overlap if the start of one is before the end of the other and vice versa:
IF( @end >= START && @beg <= END,
If they do overlap, then the total interval is the difference between the outer edges of the two intervals:
TIMESTAMPDIFF(HOUR, LEAST(@beg, START), GREATEST(@end, END))
If they don't overlap, then we can just add the new interval to the previous one.
We will also need to know the gap between intervals, which is the difference from the end of the first to the beginning of the second. This will be necessary to calculate the hours for a case of more than two intervals, where only some overlap.
1-----------2 3----------4
3--------------------5
Putting this together gets us a calculation per row, where each row calculates the union of the hours with the one above it. For each variable, we have to reset it if the name changes:
SELECT Name, START, END,
@x := IF(@same_name = Name,
IF( @end >= START && @beg <= END, -- does it overlap?
TIMESTAMPDIFF(HOUR, LEAST(@beg, START), GREATEST(@end, END)),
@x + TIMESTAMPDIFF(HOUR, START, END) ),
TIMESTAMPDIFF(HOUR,START,END) ) AS hr,
@gap := IF(@same_name = Name,
IF(@end >= START && @beg <= END, -- does it overlap?
@gap,
@gap + TIMESTAMPDIFF(HOUR, @end, START)),
0) AS gap,
@beg := IF(@same_name = Name,
CAST(LEAST(@beg, START) AS DATETIME), -- expand interval
START) AS beg, -- reset interval
@end := IF(@same_name = Name,
CAST(GREATEST(@end, END) AS DATETIME),
END) AS finish,
@same_name := Name AS sameName
FROM
(SELECT * FROM tbl ORDER BY Name, START, END) AS u,
(SELECT @x := 0, @gap := 0, @same_name:='', @beg := (SELECT MIN(START) FROM tbl), @end := (SELECT MAX(END) FROM tbl)) AS t
That still gives us as many rows as there were in the original table. The hours and gaps will accumulate for each name, so we have to select the highest values and group by Name:
SELECT Name, MAX(hr) - MAX(gap) AS HOURS
FROM ( [insert above query here] ) AS intermediateCalculation
GROUP BY Name;
Edit
And of course, a moment after hitting enter, it occurs to me that (a) there is a bug for names that have no overlapping intervals at all; and (b) all @x is really doing is building up the interval from MIN(START) to MAX(END) for each name, which could be done with a simpler query and join. Um, exercise for the reader? :-)
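As a different technique from the variable-based query above, overlapping ranges can also be merged with window functions (often called "gaps and islands"). The sketch below assumes MySQL 8.0 or later and a table named tbl with the columns Name, START and END. The names marked, islands, merged, new_island and island_id are invented for this illustration:

```sql
WITH marked AS (
    SELECT Name, `START`, `END`,
           -- a row starts a new island when it begins after every earlier
           -- interval for the same name has ended (or when it is the first row)
           CASE
               WHEN `START` <= MAX(`END`) OVER (
                        PARTITION BY Name
                        ORDER BY `START`, `END`
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
               THEN 0
               ELSE 1
           END AS new_island
    FROM tbl
),
islands AS (
    SELECT Name, `START`, `END`,
           -- running total of island starts = island number within the name
           SUM(new_island) OVER (PARTITION BY Name
                                 ORDER BY `START`, `END`) AS island_id
    FROM marked
),
merged AS (
    -- collapse each island into a single covering interval
    SELECT Name, island_id,
           MIN(`START`) AS island_start,
           MAX(`END`)   AS island_end
    FROM islands
    GROUP BY Name, island_id
)
SELECT Name,
       SUM(TIMESTAMPDIFF(HOUR, island_start, island_end)) AS hours
FROM merged
GROUP BY Name;
```

Each island is counted once, so overlapping time is not double-counted and names without any overlap are handled by the same path. TIMESTAMPDIFF(HOUR, ...) truncates to whole hours per island; summing seconds and dividing by 3600 would keep fractions.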

MySQL query extracting two pieces of information from table

I have a table that keeps track of the scores of people playing my game
userID | game_level | date_of_attempt | score
1 1 2014-02-07 19:29:00 2
1 2 2014-02-08 19:00:00 0
2 1 2014-03-03 11:11:04 4
... ... ... ...
I am trying to write a query that, for a given user, will tell me their cumulative score for each game_level as well as the average of the last 20 scores they have obtained on a particular game_level (by sorting on date_of_attempt).
For example:
userID | game_level | sum of scores on game level | average of last 20 level scores
1 1 26 4.5
1 2 152 13
Is it possible to do such a thing in a single query? I often need to perform the query for multiple game_levels, and I use a long subquery to work out which levels are needed which makes me think a single query would be better
MySQL versions before 8.0 do not support analytic (window) functions, so obtaining the average is trickier than it would be in some other RDBMS. Here I use user-defined variables to obtain the groupwise rank and then test on the result to average only over the 20 most recent records:
SELECT userID, game_level, SUM(score), x.avg
FROM my_table JOIN (
SELECT AVG(CASE WHEN (@rank := (CASE
WHEN t.userID = @userID
AND t.game_level = @game_level
THEN @rank + 1
ELSE 0
END)) < 20 THEN score END) AS avg,
@userID := userID AS userID,
@game_level := game_level AS game_level
FROM my_table t,
(SELECT @rank := @userID := @game_level := NULL) init
ORDER BY userID, game_level, date_of_attempt DESC
) x USING (userID, game_level)
GROUP BY userID, game_level
See How to select the first/least/max row per group in SQL for further information.
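On MySQL 8.0 or later, where window functions are available, the same result can be sketched without user-defined variables. This is an alternative to the query above, not a rewrite of it. The aliases rn, ranked, total_score and avg_last_20 are illustrative names:

```sql
SELECT userID,
       game_level,
       SUM(score) AS total_score,
       -- only the 20 most recent attempts contribute to the average
       AVG(CASE WHEN rn <= 20 THEN score END) AS avg_last_20
FROM (
    SELECT userID, game_level, score,
           -- 1 = most recent attempt for this user and level
           ROW_NUMBER() OVER (PARTITION BY userID, game_level
                              ORDER BY date_of_attempt DESC) AS rn
    FROM my_table
) ranked
GROUP BY userID, game_level;
```

Adding a WHERE clause on userID (and on the required game_level values) inside the subquery would restrict the result to a given user.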

Finding employees' overall average times for multiple entries

With MySQL, I am trying to write a query to find the average frequency with which employees update their cases. The table name is tgs_doc_his and there are three columns I need to use: EmpID, CaseID and ActualDate. An employee checks out the case, at which point the system makes the first date entry. Then there are several different history updates the employee can make until the case is closed. These statuses are irrelevant, but I include this information to make it easier to see what I am trying to do.
It might look like:
EmpID | CaseID | ActualDate | Status
1 , 1 , 2014-01-01 15:00, Checked Out
1 , 2 , 2014-01-02 08:00, Checked Out
1 , 1 , 2014-01-02 09:00, Attempted
1 , 2 , 2014-01-02 10:30, Delivered
2 , 3 , 2014-01-02 11:00, Checked Out
1 , 1 , 2014-01-02 12:00, Delivered
2 , 3 , 2014-01-02 14:45, Delivered
Here you can see I have two (2) employees and three (3) cases. How would I figure out the average amount of time an employee has between status updates, overall, for every case?
Example: Employee 1's case 1 averages 7.0 hours and case 2 averages 2.5 hours, for a total average of 4.75 hours across all of his cases, while employee 2's overall average is 3.75 hours.
I want this returned:
ID, AveTime
1, 4.75
2, 3.75
Is this too much of a challenge? I've been pulling out my hair here.
SELECT delivered.empID, AVG(delivered.ActualDate - checkout.ActualDate) AS AveTime
FROM tgs_doc_his AS delivered
JOIN tgs_doc_his AS checkout ON delivered.empID = checkout.empID
WHERE delivered.Status = 'Delivered'
AND checkout.Status = 'Checked Out'
GROUP BY delivered.empID
UPDATE:
select empid, avg(datediff)/3600 AS avetime
from (
select empid,
if(@curemp = empid and @curcase = caseid, unix_timestamp(actualdate)-@prevdate, null) as datediff,
@prevdate := unix_timestamp(actualdate),
@curemp := empid, @curcase := caseid
from tgs_doc_his
cross join (select @curemp := null, @curcase := null, @prevdate := null) vars
order by empid, caseid, actualdate) times
group by empid
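If MySQL 8.0 or later is available, the gap between consecutive updates can be sketched with the LAG window function instead of variables. This assumes the tgs_doc_his columns named in the question. The aliases diff_seconds and gaps are illustrative:

```sql
SELECT EmpID,
       AVG(diff_seconds) / 3600 AS AveTime   -- average gap in hours
FROM (
    SELECT EmpID,
           -- seconds since the previous update of the same case by the same
           -- employee; NULL for the first entry, which AVG then ignores
           TIMESTAMPDIFF(
               SECOND,
               LAG(ActualDate) OVER (PARTITION BY EmpID, CaseID
                                     ORDER BY ActualDate),
               ActualDate) AS diff_seconds
    FROM tgs_doc_his
) gaps
GROUP BY EmpID;
```

Like the variable-based query, this averages over all of an employee's gaps together. Averaging per case first and then averaging those case averages would need one more level of grouping.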