I have an events table in MySQL with two columns: UserId and EventTime (datetime).
I need to calculate for each UserId two things:
Number of sessions
Sum of sessions lengths
A session ends when there have been no events for that user for 2 minutes.
How can I write such a query?
For example, for the user in the attached image, the number of sessions would be 2 and the sum of session lengths would be 2 minutes and 34 seconds.
This is a pain in MySQL. One method uses a correlated subquery to identify the starts and then variables to assign a number to a session:
select e.*,
       (@s := if(@u = userid and eventtime <= prev_et + interval 2 minute, @s,
                 if(@u := userid, @s + 1, @s + 1)  -- new user or gap over 2 minutes: remember the user, start a new session
                )
       ) as session_id
from (select e.*,
             (select max(e2.eventtime)
              from events e2
              where e2.userid = e.userid and e2.eventtime < e.eventtime
             ) as prev_et
      from events e
      order by userid, eventtime
     ) e cross join
     (select @u := -1, @s := 0) params;
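The session rule the query encodes can be sketched outside the database, which is handy for checking query output on a small extract. This is a minimal sketch in plain Python rather than SQL. The function name `session_stats` and the sample events are hypothetical; the events are chosen so that the result matches the figures quoted in the question (2 sessions, 2 minutes 34 seconds), and they are not the data from the question's image.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def session_stats(events, gap=timedelta(minutes=2)):
    """Return {user_id: (session_count, total_session_length)}.

    events is an iterable of (user_id, event_time) pairs. A new session
    starts when the previous event of the same user is more than `gap`
    older than the current one (same comparison as the query above).
    """
    by_user = defaultdict(list)
    for user_id, event_time in events:
        by_user[user_id].append(event_time)

    stats = {}
    for user_id, times in by_user.items():
        times.sort()
        count = 1
        total = timedelta(0)
        start = prev = times[0]
        for t in times[1:]:
            if t - prev > gap:
                total += prev - start  # close the session that just ended
                count += 1
                start = t
            prev = t
        total += prev - start          # close the last session
        stats[user_id] = (count, total)
    return stats


# Hypothetical sample data for one user.
base = datetime(2024, 1, 1, 12, 0, 0)
sample_events = [
    (1, base),
    (1, base + timedelta(seconds=90)),
    (1, base + timedelta(seconds=154)),  # first session spans 2 min 34 s
    (1, base + timedelta(minutes=10)),   # lone event: second session, length 0
]
print(session_stats(sample_events))
```

A session made of a single event contributes a length of zero here; whether that matches the intended definition is an assumption.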
I have a message (id, userid, message) table that grows rapidly.
I would like to delete all messages per user except his last 30
For example:
if user1 has 100 messages, we will delete the first 70,
if user2 has 40 messages, we will delete the first 10,
if userN has 10 messages, no action is taken
Is there a way to do it with a single SQL statement?
My idea for now is to loop in PHP and run N SQL statements, which takes very long for N users.
MySQL (pre 8.0) doesn't have a really convenient way to do this. One method uses variables to enumerate the values:
select m.*,
       (@rn := if(@u = userid, @rn + 1,
                  if(@u := userid, 1, 1)
                 )
       ) as seqnum
from (select m.*
      from messages m
      order by userid, id desc
     ) m cross join
     (select @u := -1, @rn := 0) params;
You can turn this into a delete using join:
delete m
from messages m join
     (select m.*,
             (@rn := if(@u = userid, @rn + 1,
                        if(@u := userid, 1, 1)
                       )
             ) as seqnum
      from (select m.*
            from messages m
            order by userid, id desc
           ) m cross join
           (select @u := -1, @rn := 0) params
     ) mm
     on m.id = mm.id
where seqnum > 30;
As I say in a comment, I don't think this is a good solution for a real-world problem. The history of messages is useful and there are probably other ways to achieve the performance you want. The difference between 30 messages for a user and 70 messages for a user should not have that much of an effect on performance, in a tuned system.
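The enumerate-then-filter logic behind the delete above can be sketched in plain Python, for instance to sanity-check which ids a run would remove before executing it. This is a minimal sketch under two assumptions: a larger `id` means a newer message (as the `order by userid, id desc` suggests), and the name `ids_to_delete` and the sample rows are hypothetical.

```python
from collections import defaultdict


def ids_to_delete(messages, keep=30):
    """Return the sorted ids of all messages except each user's newest `keep`.

    messages is an iterable of (id, userid) pairs.
    """
    by_user = defaultdict(list)
    for msg_id, user_id in messages:
        by_user[user_id].append(msg_id)

    doomed = []
    for ids in by_user.values():
        ids.sort(reverse=True)     # newest first, like ORDER BY id DESC
        doomed.extend(ids[keep:])  # rows whose sequence number exceeds `keep`
    return sorted(doomed)


# Hypothetical data: user 1 has 100 messages, user 2 has 40, user 3 has 10.
sample_messages = (
    [(i, 1) for i in range(1, 101)]
    + [(i, 2) for i in range(101, 141)]
    + [(i, 3) for i in range(141, 151)]
)
print(len(ids_to_delete(sample_messages)))
```

With this data the oldest 70 messages of user 1 and the oldest 10 of user 2 are selected, and user 3 is left alone, which matches the three cases listed in the question.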
SET @row_number = 0, @userid = NULL;
DELETE FROM MESSAGE
WHERE ID IN
  ( SELECT ID FROM
      (SELECT ID,
              @row_number := CASE
                               WHEN @userid = userid THEN @row_number + 1
                               ELSE 1
                             END AS num,
              @userid := userid AS userid
       FROM MESSAGE
       ORDER BY userid, ID DESC) A  -- newest first, so num 1..30 are the rows to keep
    WHERE num > 30 );
I am using a modified version of a query similar to the one in another question here: Convert SQL Server query to MySQL
Select *
from
(
SELECT tbl.*, @counter := @counter +1 counter
FROM (select @counter:=0) initvar, tbl
Where client_id = 55
ORDER BY ordcolumn
) X
where counter >= (80/100 * @counter)
ORDER BY ordcolumn;
tbl.* contains the field 'client_id', and I am attempting to get the top 20% of the records for each client_id in a single statement. Right now, if I feed it a single client_id in the WHERE clause, it gives me the correct results; however, if I feed it multiple client_ids, it simply takes the top 20% of the combined recordset instead of handling each client_id individually.
I'm aware of how to do this in most databases, but the logic in MySQL is eluding me. I get the feeling it involves some ranking and partitioning.
Sample data is pretty straightforward.
Client_id rate
1 1
1 2
1 3
(etc to rate = 100)
2 1
2 2
2 3
(etc to rate = 100)
Actual values aren't that clean, but it works.
As an added bonus, there is also a date field associated with these records, and rates 1 to 100 exist for each client for multiple dates. I need to grab the top 20% of records for each client_id, year(date), month(date).
You need to do the enumeration for each client:
SELECT t.*
FROM (SELECT tbl.*,
             (@rn := if(@c = client_id, @rn + 1,
                        if(@c := client_id, 1, 1)
                       )
             ) as rn
      FROM (select @c := -1, @rn := 0) initvar CROSS JOIN tbl
      ORDER BY client_id, ordcolumn
     ) t join
     (SELECT client_id, COUNT(*) as cnt
      FROM tbl
      GROUP BY client_id
     ) tt
     on t.client_id = tt.client_id  -- compare each row only with its own client's count
where t.rn >= (80/100 * tt.cnt)
ORDER BY t.client_id, t.ordcolumn;
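The per-client enumeration compared against a per-client count can be sketched in plain Python to see which rows the comparison keeps. This is a minimal sketch, not the query itself: the name `top_rows_per_client` and the sample rows are hypothetical, the rows are ordered by `rate`, and integer arithmetic is used instead of `80/100` to avoid rounding questions.

```python
from collections import defaultdict


def top_rows_per_client(rows, from_percent=80):
    """Keep, per client, the rows whose position is at or above from_percent.

    rows is an iterable of (client_id, rate) pairs. Positions are numbered
    1..cnt per client in ascending rate order, and a row is kept when
    position >= from_percent/100 * cnt, like the WHERE clause above.
    """
    by_client = defaultdict(list)
    for client_id, rate in rows:
        by_client[client_id].append(rate)

    kept = []
    for client_id in sorted(by_client):
        rates = sorted(by_client[client_id])
        cnt = len(rates)
        for rn, rate in enumerate(rates, start=1):
            if rn * 100 >= from_percent * cnt:
                kept.append((client_id, rate))
    return kept


# Hypothetical data: client 1 has rates 1..10, client 2 has rates 1..5.
sample_rows = [(1, r) for r in range(1, 11)] + [(2, r) for r in range(1, 6)]
print(top_rows_per_client(sample_rows))
```

Note that `>=` includes the boundary row, so a client with 10 rows keeps positions 8, 9 and 10; switching to a strict `>` would keep exactly the top 20% when the count divides evenly.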
Using Gordon's answer as a starting point, I think this might be closer to what you need.
SELECT t.*
     , cc.c -- selected so that the HAVING clause below can refer to it
     , (@counter := @counter+1) AS overallRow
     , (@clientRow := if(@prevClient = t.client_id, @clientRow + 1,
                         if(@prevClient := t.client_id, 1, 1) -- This just updates @prevClient without creating an extra field, though it makes it a little harder to read
                        )
       ) AS clientRow
     -- Alternatively (for everything done in clientRow):
     -- , @clientRow := if(@prevClient = t.client_id, @clientRow + 1, 1) AS clientRow
     -- , @prevClient := t.client_id AS extraField
     -- This may be more reliable as well; I am not sure if the order
     -- of evaluation of IF(,,) is reliable enough to guarantee
     -- no side effects in the non-"alternatively" clientRow calculation.
FROM tbl AS t
INNER JOIN (
    SELECT client_id, COUNT(*) AS c
    FROM tbl
    GROUP BY client_id
) AS cc ON t.client_id = cc.client_id
INNER JOIN (select @prevClient := -1, @clientRow := 0, @counter := 0) AS initvar ON 1 = 1
WHERE t.client_id = 55
HAVING clientRow * 5 < cc.c -- You can use a HAVING without a GROUP BY in MySQL
                            -- (note that clientRow is derived, so you cannot use it in the `WHERE`)
ORDER BY t.client_id, t.ordcolumn
;
I have a MySQL table for a fictional fitness app.
Let's say the app monitors each user's progress on doing pushups day by day.
TrainingDays
id | id_user | date | number_of_pushups
Now, I need to find out whether a user has ever managed to do more than 100 pushups 5 days in a row.
I know this is probably doable by fetching all days and then running some PHP loops, but I wonder if it is possible to do this in plain MySQL...
In MySQL, the easiest way is to use variables. The following gets all sequences of days with 100 or more pushups:
select id_user, grp, count(*) as numdaysinarow
from (select (date - interval rn day) as grp, td.*
      from (select td.*,
                   (@rn := if(@i = id_user, @rn + 1,
                              if(@i := id_user, 1, 1)
                             )
                   ) as rn
            from trainingdays td cross join
                 (select @rn := 0, @i := NULL) vars
            where number_of_pushups >= 100
            order by id_user, date
           ) td
     ) td
group by id_user, grp;  -- grouping by user as well keeps two users' streaks apart
This uses the observation that when you subtract a sequence of numbers from a series of dates that increment, then the resulting value is constant.
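The observation is easy to check on a small example. The following is a minimal sketch in plain Python, assuming at most one qualifying row per user per day; the name `streak_lengths` and the sample dates are hypothetical. Each qualifying day gets a per-user row number, and `day - row number` stays constant exactly while the days are consecutive.

```python
from collections import Counter
from datetime import date, timedelta


def streak_lengths(qualifying_days):
    """Count consecutive-day runs per user.

    qualifying_days is an iterable of (user_id, day) pairs, one per day on
    which the user reached the threshold. Returns a Counter keyed by
    (user_id, grp), where grp = day - row_number is the same for every day
    of one run.
    """
    counts = Counter()
    row_number = {}
    for user_id, day in sorted(qualifying_days):
        row_number[user_id] = row_number.get(user_id, 0) + 1
        grp = day - timedelta(days=row_number[user_id])
        counts[(user_id, grp)] += 1
    return counts


# Hypothetical data: user 7 qualifies on Jan 1-5 and again on Jan 8-9.
sample_days = [(7, date(2024, 1, d)) for d in (1, 2, 3, 4, 5, 8, 9)]
runs = streak_lengths(sample_days)
print(max(runs.values()))
```

For this data the five January days 1 to 5 all map to the same `grp` (December 31), and the two later days map to another one (January 2), so the run lengths are 5 and 2.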
To determine if there are 5 or more days in a row, use max():
select max(numdaysinarow)
from (select id_user, grp, count(*) as numdaysinarow
      from (select (date - interval rn day) as grp, td.*
            from (select td.*,
                         (@rn := if(@i = id_user, @rn + 1,
                                    if(@i := id_user, 1, 1)
                                   )
                         ) as rn
                  from trainingdays td cross join
                       (select @rn := 0, @i := NULL) vars
                  where number_of_pushups >= 100
                  order by id_user, date
                 ) td
           ) td
      group by id_user, grp
     ) td;
Your app can then check the value against whatever minimum you like.
Note: this assumes that there is only one record per day. The above can easily be modified if you are looking for the sum of the number of pushups on each day.
The order of records shouldn't be relied on; for example, an ORDER BY can change the sequence.
However, a database gives you many functions, which also lets you get by with less PHP. What you want is the SUM function. Combined with a WHERE clause, this should get you started:
SELECT SUM(number_of_pushups) AS sum_pushups
FROM TrainingDays
WHERE date >= :start_day
AND id_user = :user_id
I have an activity log with the following schema:
visitor_id, metadata, timestamp
The first field is the visitor's id, the second is some metadata for a given activity, and the last is a Unix timestamp of when the activity occurred.
Now, I want to identify individual sessions from this log. That is, I want to group all rows for each visitor where the timestamp is no more than x seconds (e.g. 20*60 for 20 minutes) apart from either the previous or the following row by the same visitor.
How can that be done?
You can create something like custom groups like this:
SELECT
    t.visitor_id,
    MIN(t.timestamp),
    MAX(t.timestamp)
FROM (
    SELECT
        IF(@lt < l.`timestamp` - 60*20 OR l.visitor_id != @lv, @g := @g + 1, @g) as g,
        @lv := l.visitor_id,
        @lt := l.`timestamp`,
        l.*
    FROM your_log l
    JOIN (SELECT @g := 1, @lt := 0, @lv := NULL) as init  -- := assigns; a plain = would only compare
    ORDER BY l.visitor_id, l.`timestamp`
) as t
GROUP BY t.visitor_id, g
I have a MySQL table with the structure:
beverages_log(id, users_id, beverages_id, timestamp)
I'm trying to compute the maximum streak of consecutive days during which a user (with id 1) logs a beverage (with id 1) at least 5 times each day. I'm pretty sure that this can be done using views as follows:
CREATE or REPLACE VIEW daycounts AS
SELECT count(*) AS n, DATE(timestamp) AS d FROM beverages_log
WHERE users_id = '1' AND beverages_id = 1 GROUP BY d;
CREATE or REPLACE VIEW t AS SELECT * FROM daycounts WHERE n >= 5;
SELECT MAX(streak) AS current FROM ( SELECT DATEDIFF(MIN(c.d), a.d)+1 AS streak
FROM t AS a LEFT JOIN t AS b ON a.d = ADDDATE(b.d,1)
LEFT JOIN t AS c ON a.d <= c.d
LEFT JOIN t AS d ON c.d = ADDDATE(d.d,-1)
WHERE b.d IS NULL AND c.d IS NOT NULL AND d.d IS NULL GROUP BY a.d) allstreaks;
However, repeatedly creating views for different users every time I run this check seems pretty inefficient. Is there a way in MySQL to perform this computation in a single query, without creating views or repeatedly calling the same subqueries a bunch of times?
This solution seems to perform quite well as long as there is a composite index on users_id and beverages_id:
SELECT *
FROM (
    SELECT t.*, IF(@prev + INTERVAL 1 DAY = t.d, @c := @c + 1, @c := 1) AS streak, @prev := t.d
    FROM (
        SELECT DATE(timestamp) AS d, COUNT(*) AS n
        FROM beverages_log
        WHERE users_id = 1
        AND beverages_id = 1
        GROUP BY DATE(timestamp)
        HAVING COUNT(*) >= 5
        ORDER BY d  -- the streak counter needs the days in ascending order
    ) AS t
    INNER JOIN (SELECT @prev := NULL, @c := 1) AS vars
) AS t
ORDER BY streak DESC LIMIT 1;
Why not include user_id in the daycounts view and group by user_id and date?
Also include user_id in view t.
Then, when you are querying against t, add the user_id to the WHERE clause.
That way you don't have to recreate your views for every single user; you just need to remember to include the user_id in your WHERE clause.
That's a little tricky. I'd start with a view to summarize events by day:
CREATE VIEW BView AS
SELECT users_id AS UserID, beverages_id AS BevID, CAST(timestamp AS DATE) AS EventDate, COUNT(*) AS NumEvents
FROM beverages_log
GROUP BY users_id, beverages_id, CAST(timestamp AS DATE)
I'd then use a Dates table (just a table with one row per day; very handy to have) to examine all possible date ranges and throw out any with a gap. This will probably be slow as hell, but it's a start:
SELECT
UserID, BevID, MAX(StreakLength) AS StreakLength
FROM
(
SELECT
B1.UserID, B1.BevID, B1.EventDate AS StreakStart, DATEDIFF(EndDate.Date, StartDate.Date) AS StreakLength -- MySQL's DATEDIFF takes (end, start) and returns days
FROM
BView AS B1
INNER JOIN Dates AS StartDate ON B1.EventDate = StartDate.Date
INNER JOIN Dates AS EndDate ON EndDate.Date > StartDate.Date
WHERE
B1.NumEvents >= 5
-- Exclude this potential streak if there's a day with no activity
AND NOT EXISTS (SELECT * FROM Dates AS MissedDay WHERE MissedDay.Date > StartDate.Date AND MissedDay.Date <= EndDate.Date AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND MissedDay.Date = B2.EventDate))
-- Exclude this potential streak if there's a day with less than five events
AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND B2.EventDate > StartDate.Date AND B2.EventDate <= EndDate.Date AND B2.NumEvents < 5)
) AS X
GROUP BY
UserID, BevID