Checking for maximum length of consecutive days which satisfy specific condition - mysql

I have a MySQL table with the structure:
beverages_log(id, users_id, beverages_id, timestamp)
I'm trying to compute the maximum streak of consecutive days during which a user (with id 1) logs a beverage (with id 1) at least 5 times each day. I'm pretty sure that this can be done using views as follows:
CREATE or REPLACE VIEW daycounts AS
SELECT count(*) AS n, DATE(timestamp) AS d FROM beverages_log
WHERE users_id = '1' AND beverages_id = 1 GROUP BY d;
CREATE or REPLACE VIEW t AS SELECT * FROM daycounts WHERE n >= 5;
SELECT MAX(streak) AS current FROM ( SELECT DATEDIFF(MIN(c.d), a.d)+1 AS streak
FROM t AS a LEFT JOIN t AS b ON a.d = ADDDATE(b.d,1)
LEFT JOIN t AS c ON a.d <= c.d
LEFT JOIN t AS d ON c.d = ADDDATE(d.d,-1)
WHERE b.d IS NULL AND c.d IS NOT NULL AND d.d IS NULL GROUP BY a.d) allstreaks;
However, repeatedly creating views for different users every time I run this check seems pretty inefficient. Is there a way in MySQL to perform this computation in a single query, without creating views or repeatedly calling the same subqueries a bunch of times?

This solution seems to perform quite well as long as there is a composite index on users_id and beverages_id -
SELECT *
FROM (
SELECT t.*, IF(#prev + INTERVAL 1 DAY = t.d, #c := #c + 1, #c := 1) AS streak, #prev := t.d
FROM (
SELECT DATE(timestamp) AS d, COUNT(*) AS n
FROM beverages_log
WHERE users_id = 1
AND beverages_id = 1
GROUP BY DATE(timestamp)
HAVING COUNT(*) >= 5
) AS t
INNER JOIN (SELECT #prev := NULL, #c := 1) AS vars
) AS t
ORDER BY streak DESC LIMIT 1;

Why not include user_id in they daycounts view and group by user_id and date.
Also include user_id in view t.
Then when you are queering against t add the user_id to the where clause.
Then you don't have to recreate your views for every single user you just need to remember to include in your where clause.

That's a little tricky. I'd start with a view to summarize events by day:
CREATE VIEW BView AS
SELECT UserID, BevID, CAST(EventDateTime AS DATE) AS EventDate, COUNT(*) AS NumEvents
FROM beverages_log
GROUP BY UserID, BevID, CAST(EventDateTime AS DATE)
I'd then use a Dates table (just a table with one row per day; very handy to have) to examine all possible date ranges and throw out any with a gap. This will probably be slow as hell, but it's a start:
SELECT
UserID, BevID, MAX(StreakLength) AS StreakLength
FROM
(
SELECT
B1.UserID, B1.BevID, B1.EventDate AS StreakStart, DATEDIFF(DD, StartDate.Date, EndDate.Date) AS StreakLength
FROM
BView AS B1
INNER JOIN Dates AS StartDate ON B1.EventDate = StartDate.Date
INNER JOIN Dates AS EndDate ON EndDate.Date > StartDate.Date
WHERE
B1.NumEvents >= 5
-- Exclude this potential streak if there's a day with no activity
AND NOT EXISTS (SELECT * FROM Dates AS MissedDay WHERE MissedDay.Date > StartDate.Date AND MissedDay.Date <= EndDate.Date AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND MissedDay.Date = B2.EventDate))
-- Exclude this potential streak if there's a day with less than five events
AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND B2.EventDate > StartDate.Date AND B2.EventDate <= EndDate.Date AND B2.NumEvents < 5)
) AS X
GROUP BY
UserID, BevID

Related

Grouping rows via two different columns in MYSQL

I just want to ask if grouping rows with the same value but came from different columns is possible.
I have a scenario that we should sum up the total minutes if the records are found "continuous" transactions by checking if the STARTDATETIME column matches the previous data of ENDDATETIME column if they are the same. See image link below for reference.
Thanks guys.
I modified Gordon Linoff's solution ( see my comment under the question):
SELECT
c.employee_id
,MIN(c.start_date) AS start_date
,MAX(c.end_date) AS end_date
,COUNT(*) AS numcontracts,
TIMESTAMPDIFF(minute,MIN(c.start_date),MAX(c.end_date)) AS timediff
FROM
(
SELECT
c0.*
,(#rn := #rn + COALESCE(startflag, 0)) AS cumestarts
FROM
(SELECT c1.*,
(NOT EXISTS (SELECT 1
FROM contracts c2
WHERE c1.employee_id = c2.employee_id AND
c1.start_date = c2.end_date
)
) AS startflag
FROM contracts c1
ORDER BY employee_id, start_date
) c0 CROSS JOIN (SELECT #rn := 0) params
) c
GROUP BY c.employee_id, c.cumestarts
http://rextester.com/VOGMU19779
timediff contains the minutes passed in the combined interval.

Calculate Date by number of working days from a certain Startdate

I've got two tables, a project table and a calendar table. The first containts a startdate and days required. The calendar table contains the usual date information, like date, dayofweek, and a column is workingday, which shows if the day is a saturday, sunday, or bank holiday (value = 0) or a regular workday (value = 1).
For a certain report I need write a stored procedure that calculates the predicted enddate by adding the number of estimated workddays needed.
Example:
**Projects**
Name Start_Planned Work_days_Required
Project A 02.05.2016 6
Calendar (04.05 is a bank holdiday)
Day Weekday Workingday
01.05.2016 7 0
02.05.2016 1 1
03.05.2016 2 1
04.05.2016 3 0
05.05.2016 4 1
06.05.2016 5 1
07.05.2016 6 0
08.05.2016 7 0
09.05.2016 1 1
10.05.2016 2 1
Let's say, the estimated number of days required is given as 6 (which leads to the predicted enddate of 10.05.2016). Is it possible to join the tables in a way, which allows me to put something like
select date as enddate_predicted
from calendar
join projects
where number_of_days = 6
I would post some more code, but I'm quite stuck on how where to start.
Thanks!
You could get all working days after your first date, then apply ROW_NUMBER() to get the number of days for each date:
SELECT Date, DayNum = ROW_NUMBER() OVER(ORDER BY Date)
FROM Calendar
WHERE IsWorkingDay = 1
AND Date >= #StartPlanned
Then it would just be a case of filtering for the 6th day:
DECLARE #StartPlanned DATE = '20160502',
#Days INT = 6;
SELECT Date
FROM ( SELECT Date, DayNum = ROW_NUMBER() OVER(ORDER BY Date)
FROM Calendar
WHERE WorkingDay = 1
AND Date >= #StartPlanned
) AS c
WHERE c.DayNum = #Days;
It's not part of the question, but for future proofing this is easier to acheive in SQL Server 2012+ with OFFSET/FETCH
DECLARE #StartPlanned DATE = '20160502',
#Days INT = 6;
SELECT Date
FROM dbo.Calendar
WHERE Date >= #StartPlanned
AND WorkingDay = 1
ORDER BY Date
OFFSET (#Days - 1) ROWS FETCH NEXT 1 ROWS ONLY
ADDENDUM
I missed the part earlier about having another table, and the comment about putting it into a cursor has prompted me to amend my answer. I would add a new column to your calendar table called WorkingDayRank:
ALTER TABLE dbo.Calendar ADD WorkingDayRank INT NULL;
GO
UPDATE c
SET WorkingDayRank = wdr
FROM ( SELECT Date, wdr = ROW_NUMBER() OVER(ORDER BY Date)
FROM dbo.Calendar
WHERE WorkingDay = 1
) AS c;
This can be done on the fly, but you will get better performance with it stored as a value, then your query becomes:
SELECT p.Name,
p.Start_Planned,
p.Work_days_Required,
EndDate = c2.Date
FROM Projects AS P
INNER JOIN dbo.Calendar AS c1
ON c1.Date = p.Start_Planned
INNER JOIN dbo.Calendar AS c2
ON c2.WorkingDayRank = c1.WorkingDayRank + p.Work_days_Required - 1;
This simply gets the working day rank of your start date, and finds the number of days ahead specified by the project by joining on WorkingDayRank (-1 because you want the end date inclusive of the range)
This will fail, if you ever plan to start your project on a non working day though, so a more robust solution might be:
SELECT p.Name,
p.Start_Planned,
p.Work_days_Required,
EndDate = c2.Date
FROM Projects AS P
CROSS APPLY
( SELECT TOP 1 c1.Date, c1.WorkingDayRank
FROM dbo.Calendar AS c1
WHERE c1.Date >= p.Start_Planned
AND c1.WorkingDay = 1
ORDER BY c1.Date
) AS c1
INNER JOIN dbo.Calendar AS c2
ON c2.WorkingDayRank = c1.WorkingDayRank + p.Work_days_Required - 1;
This uses CROSS APPLY to get the next working day on or after your project start date, then applies the same join as before.
This query returns a table with a predicted enddate for each project
select name,min(day) as predicted_enddate from (
select c.day,p.name from dbo.Calendar c
join dbo.Calendar c2 on c.day>=c2.day
join dbo.Projects p on p.start_planned<=c.day and p.start_planned<=c2.day
group by c.day,p.work_days_required,p.name
having sum(c2.workingday)=p.work_days_required
) a
group by name
--This gives me info about all projects
select p.projectname,p.Start_Planned ,c.date,
from calendar c
join
projects o
on c.date=dateadd(days,p.Work_days_Required,p.Start_Planned)
and c.isworkingday=1
now you can use CTE like below or wrap this in a procedure
;with cte
as
(
Select
p.projectnam
p.Start_Planned ,
c.date,datediff(days,p.Start_Planned,c.date) as nooffdays
from calendar c
join
projects o
on c.date=dateadd(days,p.Work_days_Required,p.Start_Planned)
and c.isworkingday=1
)
select * from cte where nooffdays=6
use below logic
CREATE TABLE #proj(Name varchar(50),Start_Planned date,
Work_days_Required int)
insert into #proj
values('Project A','02.05.2016',6)
CReATE TABLE #Calendar(Day date,Weekday int,Workingday bit)
insert into #Calendar
values('01.05.2016',7,0),
('02.05.2016',1,1),
('03.05.2016',2,1),
('04.05.2016',3,0),
('05.05.2016',4,1),
('06.05.2016',5,1),
('07.05.2016',6,0),
('08.05.2016',7,0),
('09.05.2016',1,1),
('10.05.2016',2,1)
DECLARE #req_day int = 3
DECLARE #date date = '02.05.2016'
--SELECT #req_day = Work_days_Required FROM #proj where Start_Planned = #date
select *,row_number() over(order by [day] desc) as cnt
from #Calendar
where Workingday = 1
and [Day] > #date
SELECT *
FROM
(
select *,row_number() over(order by [day] desc) as cnt
from #Calendar
where Workingday = 1
and [Day] > #date
)a
where cnt = #req_day

figure out total seconds based on timestamp

I have a table with three columns (user, timestamp, activity). The activity is basically check in or check out. How do I generate a query to view total seconds a user has clocked in by day?
There is an issue with the system that sometimes has two check ins but only one check out...in which case I want to only take the lowest check in timestamp.
I came up with something like this (activity: 1 = check in, 0 = checkout):
select a.user_id, a.d, time_to_sec(TIMEDIFF(b.created_at, a.created_at)) total_secs,
a.created_at check_in, b.created_at check_out
from (select user_id, created_at, date(created_at) d, #rownum := #rownum + 1 AS num from table, (SELECT #rownum := 0) r where activity = 1 order by user_id, created_at) a
join (select user_id, created_at, date(created_at) d, #rownum2 := #rownum2 + 1 AS num from table, (SELECT #rownum2 := 0) r where activity = 0 order by user_id, created_at) b
on a.user_id = b.user_id and a.d = b.d and a.num = b.num
However, I think relying on rownum is not accruate
If I understand your requirement correctly, you want the number of seconds between a checkout and the earliest check-in before that, but after the previous checkout.
So what you would need is a query that joins a checkout with the previous checkout and then with the earliest check-in between these two activities.
select
curr_co.user_id,
curr_co.created_at,
max(prev_co.created_at)
from
table as curr_co
left outer join
table as prev_co
on (curr_co.user_id = prev_co.user_id and curr_co.created_at > prev_co.created_at and curr_co.activity = _checkout_ and prev_co.activity = _checkout_)
group by curr_co.user_id, curr_co.created_at
This query should give you a list of checkouts with previous checkouts per user.
Now select all check-ins in between and from those the minimal and calculate the diff.
select ... min(ci.created_at), time_to_sec(TIMEDIFF(min(ci.created_at), curr_co.created_at))
... join
table as ci
on (ci.user_id = curr_co.user_id and ci.created_at < curr_co.created_at and ci.created_at > prev_co.created_at and ci.activity = _check-in_)

Get a query to list the records that are on and in between the start and the end values of a particular column for the same Id

There is a table with the columns :
USE 'table';
insert into person values
('11','xxx','1976-05-10','p1'),
('11','xxx ','1976-06-11','p1'),
('11','xxx ','1976-07-21','p2'),
('11','xxx ','1976-08-31','p2'),
Can anyone suggest me a query to get the start and the end date of the person with respect to the place he changed chronologically.
The query I wrote
SELECT PId,Name,min(Start_Date) as sdt, max(Start_Date) as edt, place
from **
group by Place;
only gives me the first two rows of my answer. Can anyone suggest the query??
This isn't pretty, and performance might be horrible, but at least it works:
select min(sdt), edt, place
from (
select A.Start_Date sdt, max(B.Start_Date) edt, A.place
from person A
inner join person B on A.place = B.place
and A.Start_Date <= B.Start_Date
left join person C on A.place != C.place
and A.Start_Date < C.Start_Date
and C.Start_Date < B.Start_Date
where C.place is null
group by A.Start_Date, A.place
) X
group by edt, place
The idea is that A and B represent all pairs of rows. C will be any row in between these two which has a different place. So after the C.place is null restriction, we know that A and B belong to the same range, i.e. a group of rows for one place with no other place in between them in chronological order. From all these pairs, we want to identify those with maximal range, those which encompass all others. We do so using two nested group by queries. The inner one will choose the maximal end date for every possible start date, whereas the outer one will choose the minimal start date for every possible end date. The result are maximal ranges of chronologically subsequent rows describing the same place.
This can be achived by:
SELECT Id, PId,
MIN(Start_Date) AS sdt,
MAX(Start_Date) as edt,
IF(`place` <> #var_place_prev, (#var_rank:= #var_rank + 1), #var_rank) AS rank,
(#var_place_prev := `place`) AS `place`
FROM person, (SELECT #var_rank := 0, #var_place_prev := "") dummy
GROUP BY rank, Place;
Example: SQLFiddle
If you want records to be ordered by ID then:
SELECT Id, PId,
MIN(Start_Date) AS sdt,
MAX(Start_Date) as edt,
`place`
FROM(
SELECT Id, PId,
Start_Date
IF(`place` <> #var_place_prev,(#var_rank:= #var_rank + 1),#var_rank) AS rank,
(#var_place_prev := `place`) AS `place`
FROM person, (SELECT #var_rank := 0, #var_place_prev := "") dummy
ORDER BY ID ASC
) a
GROUP BY rank, Place;

mysql find date where no row exists for previous day

I need to select how many days since there is a break in my data. It's easier to show:
Table format:
id (autoincrement), user_id (int), start (datetime), end (datetime)
Example data (times left out as only need days):
1, 5, 2011-12-18, 2011-12-18
2, 5, 2011-12-17, 2011-12-17
3, 5, 2011-12-16, 2011-12-16
4, 5, 2011-12-13, 2011-12-13
As you can see there would be a break between 2011-12-13 and 2011-12-16. Now, I need to be able say:
Using the date 2011-12-18, how many days are there until a break:
2011-12-18: Lowest sequential date = 2011-12-16: Total consecutive days: 3
Probably: DATE_DIFF(2011-12-18, 2011-12-16)
So my problem is, how can I select that 2011-12-16 is the lowest sequential date? Remembering that data applies for particular user_id's.
It's kinda like the example here: http://www.artfulsoftware.com/infotree/queries.php#72 but in the reverse.
I'd like this done in SQL only, no php code
Thanks
SELECT qmin.start, qmax.end, DATE_DIFF( qmax.end, qmin.start ) FROM table AS qmin
LEFT JOIN (
SELECT end FROM table AS t1
LEFT JOIN table AS t2 ON
t2.start > t1.end AND
t2.start < DATE_ADD( t1.end, 1 DAY )
WHERE t1.end >= '2011-12-18' AND t2.start IS NULL
ORDER BY end ASC LIMIT 1
) AS qmax
LEFT JOIN table AS t2 ON
t2.end < qmin.start AND
t2.end > DATE_DIFF( qmin.start, 1 DAY )
WHERE qmin.start <= '2011-12-18' AND t2.start IS NULL
ORDER BY end DESC LIMIT 1
This should work - left joins selects one date which can be in sequence, so max can be fineded out if you take the nearest record without sequential record ( t2.anyfield is null ) , same thing we do with minimal date.
If you can calculate days between in script - do it using unions ( eg 1. row - minimal, 2. row maximal )
Check this,
SELECT DATEDIFF((SELECT MAX(`start`) FROM testtbl WHERE `user_id`=1),
(select a.`start` from testtbl as a
left outer join testtbl as b on a.user_id = b.user_id
AND a.`start` = b.`start` + INTERVAL 1 DAY
where a.user_id=1 AND b.`start` is null
ORDER BY a.`start` desc LIMIT 1))
DATEDIFF() show difference of the Two days, if you want to number of consecutive days add one for that result.
If it's not a beauty contents then you may try something like:
select t.start, t2.start, datediff(t2.start, t.start) + 1 as consecutive_days
from tab t
join tab t2 on t2.start = (select min(start) from (
select c1.*, case when c2.id is null then 1 else 0 end as gap
from tab c1
left join tab c2 on c1.start = adddate(c2.start, -1)
) t4 where t4.start <= t.start and t4.start >= (select max(start) from (
select c1.*, case when c2.id is null then 1 else 0 end as gap
from tab c1
left join tab c2 on c1.start = adddate(c2.start, -1)
) t3 where t3.start <= t.start and t3.gap = 1))
where t.start = '2011-12-18'
Result should be:
start start consecutive_days
2011-12-18 2011-12-16 3