MySQL group by time buckets

I have an activity log with the following schema:
visitor_id, metadata, timestamp
The first field is the visitor's ID, the second is some metadata for a given activity, and the last is a Unix timestamp of when the activity occurred.
Now I want to identify individual sessions from this log. That is, I want to group all rows for each visitor where the timestamp is no more than x seconds (e.g. 20*60 for 20 minutes) from either the previous or the following row by the same visitor.
How can that be done?

You can create custom groups with user variables, like this:
SELECT
  t.visitor_id,
  MIN(t.timestamp),
  MAX(t.timestamp)
FROM (
  SELECT
    IF(@lt < l.`timestamp` - 60*20 OR l.visitor_id != @lv, @g := @g + 1, @g) AS g,
    @lv := l.visitor_id,
    @lt := l.`timestamp`,
    l.*
  FROM your_log l
  JOIN (SELECT @g := 1, @lt := 0, @lv := NULL) AS init
  ORDER BY l.visitor_id, l.`timestamp`
) AS t
GROUP BY t.visitor_id, g
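For comparison, here is a minimal sketch of the same gap-based grouping written with window functions instead of user variables. It runs the SQL through Python's built-in sqlite3 module so it can be tried without a server; the same LAG / SUM ... OVER pattern should also work in MySQL 8.0 or later. The table name activity_log, the column name ts (used instead of timestamp), the sample rows and the find_sessions helper are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample data: (visitor_id, metadata, ts as unix seconds).
ROWS = [
    (1, "a", 1000),
    (1, "b", 1600),  # 600 s after the previous row: same session
    (1, "c", 4000),  # 2400 s gap: new session
    (2, "d", 1100),
    (2, "e", 1200),
]

SESSION_SQL = """
SELECT visitor_id, MIN(ts) AS session_start, MAX(ts) AS session_end, COUNT(*) AS n_rows
FROM (
    SELECT visitor_id, ts,
           -- running count of session starts per visitor = session number
           SUM(is_start) OVER (PARTITION BY visitor_id ORDER BY ts) AS session_no
    FROM (
        SELECT visitor_id, ts,
               -- a row starts a session unless the previous row is within the gap
               CASE WHEN ts - LAG(ts) OVER (PARTITION BY visitor_id ORDER BY ts) <= :gap
                    THEN 0 ELSE 1 END AS is_start
        FROM activity_log
    ) AS flagged
) AS numbered
GROUP BY visitor_id, session_no
ORDER BY visitor_id, session_start
"""


def find_sessions(rows, gap_seconds=20 * 60):
    """Load rows into an in-memory table and return one tuple per session."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity_log (visitor_id INTEGER, metadata TEXT, ts INTEGER)")
    conn.executemany("INSERT INTO activity_log VALUES (?, ?, ?)", rows)
    result = conn.execute(SESSION_SQL, {"gap": gap_seconds}).fetchall()
    conn.close()
    return result


sessions = find_sessions(ROWS)
for row in sessions:
    print(row)
```

The first row of each visitor has no previous row, so LAG returns NULL, the comparison is NULL, and the CASE falls through to the ELSE branch and marks a session start.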

Related

Session time calculation in sql

I have an events table in mysql with two columns: UserId,EventTime(datetime).
I need to calculate for each UserId two things:
Number of sessions
Sum of sessions lengths
A session ends when there have been no events for that user for 2 minutes.
How can I write such a query?
So for example, for this user in the attached image, the number of sessions would be 2 and the sum of session lengths would be 2 minutes and 34 seconds.
This is a pain in MySQL. One method uses a correlated subquery to identify the starts and then variables to assign a number to a session:
select e.*,
       (@s := if(@u = userid and eventtime <= prev_et + interval 2 minute,
                 @s,                              -- same user and gap within 2 minutes: same session
                 if(@u := userid, @s + 1, @s + 1) -- new user or longer gap: remember the user, start a new session
                )
       ) as session_id
from (select e.*,
             (select max(e2.eventtime) from events e2 where e2.userid = e.userid and e2.eventtime < e.eventtime
             ) as prev_et
      from events e
      order by userid, eventtime
     ) e cross join
     (select @u := 1, @s := 0) params;
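The query above only assigns session numbers. As a rough sketch of the remaining step (number of sessions and sum of session lengths per user), here is a window-function version run through Python's sqlite3 module; the same SQL shape should carry over to MySQL 8.0 or later. To keep the arithmetic simple it assumes the event time is stored as integer seconds in a column called event_ts (in MySQL, UNIX_TIMESTAMP(EventTime) would give a comparable value). The sample events and the session_stats helper are made up for illustration.

```python
import sqlite3

# Hypothetical events: (user_id, event_ts in seconds).
EVENTS = [
    (1, 0), (1, 60), (1, 100),  # one session, 100 s long
    (1, 500), (1, 554),         # more than 120 s later: second session, 54 s long
    (2, 10),                    # a single event: one session of length 0
]

STATS_SQL = """
SELECT user_id, COUNT(*) AS n_sessions, SUM(session_len) AS total_len
FROM (
    SELECT user_id, session_no, MAX(event_ts) - MIN(event_ts) AS session_len
    FROM (
        SELECT user_id, event_ts,
               SUM(is_start) OVER (PARTITION BY user_id ORDER BY event_ts) AS session_no
        FROM (
            SELECT user_id, event_ts,
                   -- new session unless the previous event is at most 120 s earlier
                   CASE WHEN event_ts - LAG(event_ts) OVER (PARTITION BY user_id ORDER BY event_ts) <= 120
                        THEN 0 ELSE 1 END AS is_start
            FROM events
        ) AS flagged
    ) AS numbered
    GROUP BY user_id, session_no
) AS per_session
GROUP BY user_id
ORDER BY user_id
"""


def session_stats(events):
    """Return (user_id, number_of_sessions, total_length_in_seconds) per user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, event_ts INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", events)
    result = conn.execute(STATS_SQL).fetchall()
    conn.close()
    return result


stats = session_stats(EVENTS)
print(stats)
```

With this sample data, user 1 ends up with 2 sessions totalling 154 seconds, which is 2 minutes and 34 seconds.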

SQL query to select last X entries for a certain non-primary field

I'm having difficulties setting up a slightly more advanced SQL query.
What I'm trying to do is to select the last 24 entries for every zr_miner_id, but I keep getting SQL timeouts (the table has around 40000 entries so far).
So let's say there's 200 entries for zr_miner_id 1 and 200 for zr_miner_id 2, I'd end up with 48 results.
So far, I've come up with the query below.
What this is supposed to do is to select each result in zec_results that has less than 24 newer entries with the same zr_miner_id.
I couldn't think of any better way to perform this task, but then again, I'm not that far advanced at SQL yet.
SELECT results_a.*
FROM zec_results results_a
WHERE (
SELECT COUNT(results_b.zr_id)
FROM zec_results AS results_b
WHERE results_b.zr_miner_id = results_a.zr_miner_id
AND results_b.zr_id >= results_a.zr_id
) <= 24
Use variables!
SELECT r.*
FROM (SELECT r.*,
             (@rn := if(@m = r.zr_miner_id, @rn + 1,
                        if(@m := r.zr_miner_id, 1, 1)
                       )
             ) as rn
      FROM zec_results r CROSS JOIN
           (SELECT @m := -1, @rn := 0) params
      ORDER BY r.zr_miner_id, r.zr_id DESC
     ) r
WHERE rn <= 24;
If you want to put the query into a view, then the above will not work. The performance of your approach might improve with an index on (zr_miner_id, zr_id).

MySQL Query get the last N rows per Group

Suppose that I have a table with the following columns:
VehicleID|timestamp|lat|lon|
The same VehicleId may appear multiple times, each with a different timestamp, so (VehicleId, Timestamp) is the primary key.
Now I would like to get the last N measurements per VehicleId, or the first N measurements per VehicleId.
How can I list the last N tuples per VehicleId according to an ordering column (in our case, timestamp)?
Example:
|VehicleId|Timestamp|
1|1
1|2
1|3
2|1
2|2
2|3
5|5
5|6
5|7
In MySQL, this is most easily done using variables:
select t.*
from (select t.*,
             (@rn := if(@v = VehicleId, @rn + 1,
                        if(@v := VehicleId, 1, 1)
                       )
             ) as rn
      from your_table t cross join
           (select @v := -1, @rn := 0) params
      order by VehicleId, timestamp desc
     ) t
where rn <= 3;
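If window functions are available (MySQL 8.0 or later), ROW_NUMBER() expresses the same per-group numbering without variables. Below is a small sketch that runs such a query through Python's sqlite3 module on the sample rows from the question. The table name measurements, the column names vehicle_id and ts, and the last_n_per_vehicle helper are placeholders chosen for this example.

```python
import sqlite3

# Sample rows copied from the question: (VehicleId, Timestamp).
MEASUREMENTS = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (5, 5), (5, 6), (5, 7)]

LAST_N_SQL = """
SELECT vehicle_id, ts
FROM (
    SELECT vehicle_id, ts,
           -- number the rows of each vehicle from newest (1) to oldest
           ROW_NUMBER() OVER (PARTITION BY vehicle_id ORDER BY ts DESC) AS rn
    FROM measurements
) AS ranked
WHERE rn <= :n
ORDER BY vehicle_id, ts DESC
"""


def last_n_per_vehicle(rows, n):
    """Return the n most recent (vehicle_id, ts) rows for every vehicle."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (vehicle_id INTEGER, ts INTEGER)")
    conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)
    result = conn.execute(LAST_N_SQL, {"n": n}).fetchall()
    conn.close()
    return result


last_two = last_n_per_vehicle(MEASUREMENTS, 2)
print(last_two)
```

Switching ORDER BY ts DESC to ORDER BY ts inside the OVER clause would give the first N measurements per vehicle instead.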

Group rows by time interval

I have a database with a large number of rows from a GPS sender. The GPS sends a new row to the database every second. I want to build a web interface that shows trips, so instead of showing every row I want to group the rows into trips. I need a query that detects a trip boundary by checking whether the gap to the next row is more than 14 minutes: if it is, all rows before the gap form one trip and get a trip number; otherwise the row is added to the current trip.
Try this (example is at http://sqlfiddle.com/#!2/a0c86/39)
SELECT Trip, MIN(Date_Time), MAX(Date_Time)
FROM (
  SELECT @Trip := IF(TIMESTAMPDIFF(MINUTE, @Date_Time, Date_Time) <= 20, @Trip, @Trip+1) AS Trip -- 20 is the gap threshold in minutes; the question asks for 14
       , logid
       , @Date_Time := Date_time AS Date_Time
  FROM gpslog
  JOIN (SELECT @Trip := 1, @Date_Time := null) AS tmp
  ORDER BY Date_Time) AS triplist
GROUP BY Trip

MySql counting instances of event

So I have an event log that logs every 5 minutes so my logs look something like this:
OK
Event1
Event1
Event1
OK
Event1
OK
Event1
Event1
Event1
OK
In this case I'd have 3 instances of "Event1", since it had an "OK" period in between the periods when that status was returned.
Is there a decent way to handle this in MySQL? (Note: statuses other than Event1 / OK come up quite regularly.)
The actual Sql structure looks something like this:
-Historical
--CID //Unique Identifier, INT, AI
--ID //Unique Identifier for LOCATION, INT
--LOCATION //Unique Identifier for Location, this is the site name, VarChar
--STATUS //Pulled from Software event logger, VarChar
--TIME //Pulled from Software event logger, DateTime
Another answer using a totally different way of doing it:-
SELECT MAX(@Counter) AS EventCount -- Get the max counter
FROM (SELECT @Counter:=@Counter + IF(status = 'OK' AND @PrevStatus = 1, 1, 0), -- If it is an OK record and the prev status was not an OK then add 1 to the counter
             @PrevStatus:=CASE
                 WHEN status = 'OK' THEN @PrevStatus := 2 -- An OK status so save as a prev status of 2
                 WHEN status != 'OK' AND @PrevStatus != 0 THEN @PrevStatus := 1 -- A non OK status but when there has been a previous OK status
                 ELSE @PrevStatus:=0 -- Set the prev status to 0, ie, for a record where there is no previous OK status
             END
      FROM (SELECT * FROM historical ORDER BY TimeStamp) a
      CROSS JOIN (SELECT @Counter:=0, @PrevStatus := 0) b -- Initialise counter and store of prev status.
     ) c
This is using user variables. It has a subselect to get the records back in the right order, then uses a user variable to store a code for the previous status. Starts at 0 and when it finds a status of OK it sets the previous status to a 2. If it finds a status other than OK then it sets the prev status to 1, but ONLY if the prev status is not 0 (ie, it has already found a status of OK). Before storing the prev status code, if the current status is OK and the prev status code is a 1 then it adds 1 to the counter, otherwise it adds 0 (ie, adds nothing)
Then it just has a select around the outside to select the max value of the counter.
Seems to work but hardly readable!
EDIT - To cope with multiple ids
SELECT id, MAX(aCounter) AS EventCount -- Get the max counter for each id
FROM (SELECT id,
             @PrevStatus:= IF(@Previd = id, @PrevStatus, 0), -- If the id has changed then set the store of previous status to 0
             status,
             @Counter:=IF(@Previd = id, @Counter + IF(status = 'OK' AND @PrevStatus = 1, 1, 0), 0) AS aCounter, -- If it is an OK record and the prev status was not an OK and was for the same id then add 1 to the counter
             @PrevStatus:=CASE
                 WHEN status = 'OK' THEN @PrevStatus := 2 -- An OK status so save as a prev status of 2
                 WHEN status != 'OK' AND @PrevStatus != 0 THEN @PrevStatus := 1 -- A non OK status but when there has been a previous OK status
                 ELSE @PrevStatus:=0 -- Set the prev status to 0, ie, for a record where there is no previous OK status
             END,
             @Previd := id
      FROM (SELECT * FROM historical ORDER BY id, TimeStamp) a
      CROSS JOIN (SELECT @Counter:=0, @PrevStatus := 0, @Previd := 0) b
     ) c
GROUP BY id -- Group by clause to allow the selection of the max counter per id
Which is even less readable!
Another option, again using user variables to generate a sequence number:-
SELECT Sub1.id, COUNT(DISTINCT Sub1.aCounter) -- Count the number of distinct Sub1 records found for an id (without the DISTINCT it would count all the records between OK status records)
FROM (
    SELECT id,
           `TimeStamp`,
           @Counter1:=IF(@Previd1 = id, @Counter1 + 1, 0) AS aCounter, -- Counter for this status within id
           @Previd1 := id -- Store the id, used to determine if the id has changed and so whether to start the counters at 0 again
    FROM (SELECT * FROM historical WHERE status = 'OK' ORDER BY id, `TimeStamp`) a -- Just get the OK status records, in id / timestamp order
    CROSS JOIN (SELECT @Counter1:=0, @Previd1 := 0) b -- Initialise the user variables.
) Sub1
INNER JOIN (SELECT id,
                   `TimeStamp`,
                   @Counter2:=IF(@Previd2 = id, @Counter2 + 1, 0) AS aCounter, -- Counter for this status within id
                   @Previd2 := id -- Store the id, used to determine if the id has changed and so whether to start the counters at 0 again
            FROM (SELECT * FROM historical WHERE status = 'OK' ORDER BY id, `TimeStamp`) a -- Just get the OK status records, in id / timestamp order
            CROSS JOIN (SELECT @Counter2:=0, @Previd2 := 0) b -- Initialise the user variables.
) Sub2
ON Sub1.id = Sub2.id -- Join the 2 subselects based on the id
AND Sub1.aCounter + 1 = Sub2.aCounter -- and also the counter. So Sub1 is an OK status, while Sub2 is the next OK status for that id
INNER JOIN historical Sub3 -- Join back against historical
ON Sub1.id = Sub3.id -- on the matching id
AND Sub1.`TimeStamp` < Sub3.`TimeStamp` -- and where the timestamp is greater than the timestamp in the Sub1 OK record
AND Sub2.`TimeStamp` > Sub3.`TimeStamp` -- and where the timestamp is less than the timestamp in the Sub2 OK record
GROUP BY Sub1.id -- Group by the Sub1 id
This is grabbing the table twice for just the status OK records, adding a sequence number each time and matching where the id matches and the sequence number on the 2nd copy is 1 greater than the first one (ie, it is finding each OK and the OK immediately following it). Then joins that against the table where the id matches and the timestamp is between the 2 OK records. Then counts the distinct occurrences of the first counter for each id.
This should be a bit more readable.
Quick try, and I have a feeling I am missing a far better way to do this but think this will work.
SELECT COUNT(*)
FROM
(
SELECT DISTINCT a.time, b.time
FROM Historical a
INNER JOIN Historical b
ON a.time < b.time
AND a.status = 'OK'
AND b.status = 'OK'
INNER JOIN Historical c
ON a.time < c.time
AND c.time < b.time
AND c.status = 'Event1'
LEFT OUTER JOIN Historical d
ON a.time < d.time
AND d.time < b.time
AND d.status = 'OK'
WHERE d.cid IS NULL
) Sub1
Joins the table against itself repeatedly. Alias a and b should be for OK events, with c being for any Event1 event between those dates. Alias d is looking for an OK event between a and b, and if any are found then the record is dropped in the WHERE clause.
Then use DISTINCT to get rid of the duplicates. Then count the result.
Possibly it could be simplified to something like the following (although it is probably best to cast the dates to chars in the select if doing this):
SELECT COUNT(DISTINCT CONCAT(a.time, b.time))
FROM Historical a
INNER JOIN Historical b
ON a.time < b.time
AND a.status = 'OK'
AND b.status = 'OK'
INNER JOIN Historical c
ON a.time < c.time
AND c.time < b.time
AND c.status = 'Event1'
LEFT OUTER JOIN Historical d
ON a.time < d.time
AND d.time < b.time
AND d.status = 'OK'
WHERE d.cid IS NULL
What you want to count, it seems, are instances of an event when the previous record is OK. You identify these with a correlated subquery, and then summarize to get the numbers:
select status, count(*)
from (select h.*,
(select h2.status
from historical h2
where h2.time < h.time
order by h2.time desc
limit 1
) as prevStatus
from historical h
) h
where status <> 'OK' and (prevStatus = 'OK' or prevStatus is NULL)
group by status;
It is not clear which column contains the values OK and Event1. I'm guessing it is status. I also don't know what role location plays, but this should at least get you started.
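As a sketch of the same "previous record is OK" idea without the correlated subquery, LAG() can fetch the previous status directly. The example below runs through Python's sqlite3 module on the status sequence from the question; the query should also be usable in MySQL 8.0 or later. It assumes a single location, a table called historical with columns event_time and status, and log entries every 300 seconds; those names, the generated times and the count_instances helper are illustrative only.

```python
import sqlite3

# Status sequence copied from the question, one entry every 5 minutes.
STATUSES = ["OK", "Event1", "Event1", "Event1", "OK", "Event1",
            "OK", "Event1", "Event1", "Event1", "OK"]
LOG_ROWS = [(i * 300, status) for i, status in enumerate(STATUSES)]

INSTANCE_SQL = """
SELECT status, COUNT(*) AS instances
FROM (
    SELECT status,
           -- status of the row immediately before this one in time order
           LAG(status) OVER (ORDER BY event_time) AS prev_status
    FROM historical
) AS h
WHERE status <> 'OK' AND (prev_status = 'OK' OR prev_status IS NULL)
GROUP BY status
ORDER BY status
"""


def count_instances(rows):
    """Count, per non-OK status, the rows that directly follow an OK (or start the log)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE historical (event_time INTEGER, status TEXT)")
    conn.executemany("INSERT INTO historical VALUES (?, ?)", rows)
    result = conn.execute(INSTANCE_SQL).fetchall()
    conn.close()
    return result


instances = count_instances(LOG_ROWS)
print(instances)
```

To handle several locations, adding PARTITION BY on the location id inside the OVER clause and grouping by it as well would be the natural extension.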