MySQL - Filter consecutive not null dates

Get only the biggest date:
These are employees' check-in and check-out records. Sometimes an employee registers two or more entries of the same kind in a row; in this sample there are two check-outs in a row. Assuming the rows are always ordered, I would like to keep the latest date for consecutive check-outs and the earliest date for consecutive check-ins.
In that case I would like to have this:
The smaller date was excluded:

Try this. The CASE expression increments a counter each time checkin switches from null to not null or the other way around. After that it is enough to group by this counter, taking the min of checkin and the max of checkout:
select @checkinLag := null, @rn := 0;
select max(id),
       functionario,
       loja,
       min(checkin),
       max(checkout)
from (
    select case when (checkinLag is null and checkin is not null) or
                     (checkinLag is not null and checkin is null)
                then @rn := @rn + 1 else @rn end rn,
           checkin,
           checkout,
           loja,
           id,
           functionario
    from (
        select @checkinLag checkinLag,
               @checkinLag := checkin,
               checkin,
               checkout,
               loja,
               id,
               functionario
        from dummyTable
        order by coalesce(checkin, checkout)
    ) a
) a group by functionario, loja, rn
I used subqueries to guarantee the order in which the expressions are evaluated (assigning and then reading @checkinLag), as Gordon Linoff pointed out.
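For comparison, here is a rough sketch of the same "count the switches, then group" idea written with window functions instead of user variables. It assumes window-function support (MySQL 8.0 or later); it is run through Python's sqlite3 module (SQLite 3.25 or later) only so that it is self-contained. The sample rows are made up, and unlike the query above it partitions by functionario and loja before looking at the previous row.

```python
import sqlite3

# Hypothetical sample: two check-ins in a row, two check-outs in a row, one check-in.
ROWS = [
    (1, 1, 1, "2018-01-01 08:00:00", None),
    (2, 1, 1, "2018-01-01 08:00:03", None),
    (3, 1, 1, None, "2018-01-01 12:00:00"),
    (4, 1, 1, None, "2018-01-01 12:00:04"),
    (5, 1, 1, "2018-01-01 13:00:00", None),
]

QUERY = """
WITH flagged AS (
    SELECT functionario, loja, checkin, checkout,
           COALESCE(checkin, checkout) AS ts,
           -- 1 when the row switches between check-in and check-out
           CASE WHEN (checkin IS NULL) = LAG(checkin IS NULL) OVER (
                         PARTITION BY functionario, loja
                         ORDER BY COALESCE(checkin, checkout))
                THEN 0 ELSE 1 END AS switched
    FROM dummyTable
),
grouped AS (
    SELECT flagged.*,
           -- running total of switches numbers each run of equal kind
           SUM(switched) OVER (PARTITION BY functionario, loja
                               ORDER BY ts) AS rn
    FROM flagged
)
SELECT functionario, loja, MIN(checkin), MAX(checkout)
FROM grouped
GROUP BY functionario, loja, rn
ORDER BY functionario, loja, rn
"""


def collapse_runs(rows):
    """Load the rows into an in-memory table and collapse consecutive entries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dummyTable (id INTEGER, functionario INTEGER, "
        "loja INTEGER, checkin TEXT, checkout TEXT)"
    )
    conn.executemany("INSERT INTO dummyTable VALUES (?, ?, ?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()


result = collapse_runs(ROWS)
```

With this sample, the earliest of the two check-ins and the latest of the two check-outs should survive, one output row per run.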

My solution:
select
    *
from dummyTable base
where (base.checkout is null or not exists (
    select
        1
    from dummyTable co
    where co.checkout between base.checkout and DATE_ADD(base.checkout, INTERVAL 5 SECOND)
      and base.id <> co.id
      and base.functionario = co.functionario
      and base.loja = co.loja
)) and (base.checkin is null or not exists (
    select
        1
    from dummyTable ci
    where ci.checkin between DATE_SUB(base.checkin, INTERVAL 5 SECOND) and base.checkin
      and base.id <> ci.id
      and base.functionario = ci.functionario
      and base.loja = ci.loja
));
The rows do not need to be ordered. I chose 5 seconds as the interval within which repeated check-ins or check-outs are ignored.

Related

Grouping rows via two different columns in MYSQL

I just want to ask whether it is possible to group rows that share the same value when that value comes from different columns.
In my scenario we should sum up the total minutes when records are "continuous" transactions, that is, when a row's STARTDATETIME matches the ENDDATETIME of the previous row. See image link below for reference.
Thanks guys.
I modified Gordon Linoff's solution (see my comment under the question):
SELECT
    c.employee_id
    ,MIN(c.start_date) AS start_date
    ,MAX(c.end_date) AS end_date
    ,COUNT(*) AS numcontracts
    ,TIMESTAMPDIFF(minute, MIN(c.start_date), MAX(c.end_date)) AS timediff
FROM
(
    SELECT
        c0.*
        ,(@rn := @rn + COALESCE(startflag, 0)) AS cumestarts
    FROM
        (SELECT c1.*,
                (NOT EXISTS (SELECT 1
                             FROM contracts c2
                             WHERE c1.employee_id = c2.employee_id AND
                                   c1.start_date = c2.end_date
                            )
                ) AS startflag
         FROM contracts c1
         ORDER BY employee_id, start_date
        ) c0 CROSS JOIN (SELECT @rn := 0) params
) c
GROUP BY c.employee_id, c.cumestarts
http://rextester.com/VOGMU19779
timediff contains the minutes passed in the combined interval.
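A possible variant of the same grouping without user variables, assuming window functions are available (MySQL 8.0 or later). It only compares each row with the previous row of the same employee, which is slightly narrower than the NOT EXISTS check above. The sketch runs the SQL through Python's sqlite3 module so it is self-contained; the sample rows are invented, and the minutes are computed in Python here because the date functions differ between engines.

```python
import sqlite3
from datetime import datetime

# Hypothetical sample: two back-to-back intervals, then a separate one.
ROWS = [
    (1, "2017-01-01 08:00:00", "2017-01-01 08:30:00"),
    (1, "2017-01-01 08:30:00", "2017-01-01 09:15:00"),
    (1, "2017-01-01 10:00:00", "2017-01-01 10:20:00"),
]

QUERY = """
WITH flagged AS (
    SELECT employee_id, start_date, end_date,
           -- 0 when this interval starts exactly where the previous one ended
           CASE WHEN start_date = LAG(end_date) OVER (
                         PARTITION BY employee_id ORDER BY start_date)
                THEN 0 ELSE 1 END AS startflag
    FROM contracts
),
grouped AS (
    SELECT flagged.*,
           SUM(startflag) OVER (PARTITION BY employee_id
                                ORDER BY start_date) AS cumestarts
    FROM flagged
)
SELECT employee_id, MIN(start_date), MAX(end_date), COUNT(*)
FROM grouped
GROUP BY employee_id, cumestarts
ORDER BY employee_id, cumestarts
"""


def merge_contracts(rows):
    """Merge continuous intervals and attach their length in minutes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contracts (employee_id INTEGER, start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO contracts VALUES (?, ?, ?)", rows)
    merged = []
    for emp, start, end, n in conn.execute(QUERY):
        span = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        merged.append((emp, start, end, n, int(span.total_seconds() // 60)))
    return merged


merged = merge_contracts(ROWS)
```

In MySQL itself the minutes could stay in the query via TIMESTAMPDIFF, as in the answer above.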

How to display only the second purchase made per account

I have a transactions table which shows various transactions made by several accounts. Some accounts make only one purchase, others more than that. At the moment my SQL prints out the first purchase of each account, but I need it to print out the second purchase made by each account.
SELECT account_id
, purchase_date as second_purchase
, amount as second_purchase_amount
FROM Transactions t
WHERE purchase_date NOT IN (SELECT MIN(purchase_date)
FROM Transactions m
)
GROUP BY account_id
HAVING purchase_date = MIN(purchase_date);
What needs to change so that the second purchase date and amount are chosen? I tried adding a count for the account_id, but it gave me the wrong value.
You can use variables to assign row numbers and get the 2nd purchase.
SELECT account_id, purchase_date, amount
FROM (
    SELECT account_id
         ,purchase_date
         ,amount
         -- , @rn:=IF(account_id=@a_id and @pdate <> purchase_date,@rn+1,1) as rnum
         ,case when account_id=@a_id and @pdate <> purchase_date then @rn:=@rn+1
               when account_id=@a_id and @pdate=purchase_date then @rn:=@rn
               else @rn:=1 end as rnum
         , @pdate:=purchase_date
         , @a_id:=account_id
    FROM Transactions t
    CROSS JOIN (SELECT @rn:=0,@a_id:=-1,@pdate:='') r
    ORDER BY account_id, purchase_date
) x
WHERE rnum=2
Explanation of how it works:
@rn:=0,@a_id:=-1,@pdate:='' - Declare 3 variables and initialize them: @rn for assigning the row numbers, @a_id to hold the account_id and @pdate to hold the purchase_date.
For the first row (ordered by account_id and purchase_date), account_id is compared with @a_id and purchase_date with @pdate. As they are not equal, the when conditions fail and the else part assigns @rn=1. The variable assignments happen after this: @a_id and @pdate are updated to the current row's values. For the second row, if it is the same account on a different date, the first when condition is executed and @rn is incremented by 1. If there are ties, the second when condition is executed and @rn stays the same. You can run the inner query to check how the variables are assigned.
Number the rows and choose RowNumber = 2
select *
from (
    select
        @rn := case when @account_id = account_id then @rn + 1 else 1 end as RowNumber,
        @account_id := account_id as account_id,
        purchase_date
    from
        (select @rn := 0, @account_id := null) x,
        (select account_id, purchase_date
         from Transactions
         order by account_id, purchase_date) y
) z
where RowNumber = 2;
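If window functions are available (MySQL 8.0 or later), numbering the rows could also be done with ROW_NUMBER() instead of variables. Below is a minimal sketch of that idea, run through Python's sqlite3 module so it is self-contained; the sample transactions are invented. Note that ROW_NUMBER() counts two purchases on the same date as first and second, which differs from the tie handling in the first answer.

```python
import sqlite3

# Hypothetical sample: account 2 has only one purchase and should not appear.
ROWS = [
    (1, "2020-01-01", 10),
    (1, "2020-01-05", 20),
    (1, "2020-02-01", 30),
    (2, "2020-01-03", 5),
    (3, "2020-01-02", 7),
    (3, "2020-01-09", 8),
]

QUERY = """
SELECT account_id, purchase_date, amount
FROM (
    SELECT account_id, purchase_date, amount,
           -- number each account's purchases from oldest to newest
           ROW_NUMBER() OVER (PARTITION BY account_id
                              ORDER BY purchase_date) AS rn
    FROM Transactions
) AS numbered
WHERE rn = 2
ORDER BY account_id
"""


def second_purchases(rows):
    """Return (account_id, purchase_date, amount) of each account's 2nd purchase."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Transactions (account_id INTEGER, purchase_date TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()


second = second_purchases(ROWS)
```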

calculate the differences between two rows in SQL

I have a SQL table in which each row holds the revenue for a specific day. I want to add a new column whose value is the incremental revenue (positive or negative) between that day and the previous day. How can I implement this in SQL?
Here is an example,
original table,
...
Day1 100
Day2 200
Day3 150
...
new table (add incremental column at the end, and for first column, could assign zero),
Day1 100 0
Day2 200 100
Day3 150 -50
I am using MySQL/MySQL Workbench.
thanks in advance,
Lin
SELECT a.day, a.revenue, COALESCE(a.revenue - b.revenue, 0) AS incremental
FROM DailyRevenue a
LEFT JOIN DailyRevenue b ON b.day = a.day - 1
The query assumes that each day has exactly one record in the table and that day values are consecutive numbers. If there can be more than one row per day, first create a view that sums the revenue grouped by day.
If you're okay with re-ordering the columns slightly, something like this is pretty simple to understand:
SET @prev := 0;
SELECT day, revenue - @prev AS diff, @prev := revenue AS revenue
FROM revenue ORDER BY day ASC;
The trick is that we calculate the difference to the previous first, then set the previous to the current and display it as the current in one step.
Note, this depends on the order being correct since the calculations are done during the returning of the rows, so you need to make sure you have an ORDER BY clause that returns the days in the correct order.
Try:
select
    t.date_col, t.val_col,
    case when t1.val_col is null then 0
         else t.val_col - t1.val_col end diff
from (
    select t.*, @r := @r + 1 lev
    from tbl t,
         (select @r := 0) r
    order by t.date_col
) t
left join (
    select t.*, @r1 := @r1 + 1 lev
    from tbl t,
         (select @r1 := 1) r
    order by t.date_col
) t1
on t.lev = t1.lev
This will calculate the value difference even if there is a missing date.
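On a server with window functions (MySQL 8.0 or later), the "previous row" lookup that these answers build by hand could be sketched with LAG(). The snippet below is only an illustration: it uses the three sample days from the question, assumes a table named DailyRevenue with integer day values, and runs through Python's sqlite3 module so it is self-contained.

```python
import sqlite3

# Sample values from the question: Day1 100, Day2 200, Day3 150.
ROWS = [(1, 100), (2, 200), (3, 150)]

QUERY = """
SELECT day, revenue,
       -- LAG() is NULL on the first row, so COALESCE turns that diff into 0
       COALESCE(revenue - LAG(revenue) OVER (ORDER BY day), 0) AS incremental
FROM DailyRevenue
ORDER BY day
"""


def daily_increments(rows):
    """Return (day, revenue, difference to the previous row) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE DailyRevenue (day INTEGER, revenue INTEGER)")
    conn.executemany("INSERT INTO DailyRevenue VALUES (?, ?)", rows)
    return conn.execute(QUERY).fetchall()


increments = daily_increments(ROWS)
```

Like the row-number join above, this compares each row with the previous existing row, so a missing date does not break it.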

Find gaps in mysql Time

I have a table "channel_001" with a timestamp column Time, with rows spaced 10 minutes apart.
2013-01-01;00:10:04;
2013-01-01;00:20:00;
2013-01-01;00:30:02;
2013-01-01;00:40:04;
But some data is missing. How can I detect a missing row, and then insert a row there?
For example:
2013-01-01;00:10:04;
2013-01-01;00:20:00;
2013-01-01;00:30:02
2013-01-01;00:40:04;
2013-01-01;01:00:02;
then it would be missing:
2013-01-01;00:50:00;
I was thinking of joining the table to itself, but I'm new to SQL and too much of a novice to find the answer alone.
Any ideas?
You can find rows that don't have a "next" time with something like:
select c.*
from channel_001 c
where not exists (select 1
                  from channel_001 c2
                  where c2.timestamp > c.timestamp + interval 9 minute and
                        c2.timestamp < c.timestamp + interval 11 minute
                 );
If your table is large (tens of thousands of rows), you will probably want to use variables. The following code gets the previous timestamp:
select c.*,
       (case when (@tmp := @prevts) is null then null
             when (@prevts := timestamp) is null then null
             else @tmp
        end) as prev_timestamp
from channel_001 c cross join
     (select @prevts := 0, @tmp := 0) vars
order by timestamp;
You can use this as a subquery to get gaps that are outside your range.
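As a rough illustration of that last step, the previous timestamp can also be fetched with LAG() where window functions exist (MySQL 8.0 or later), and the gap test applied on top. The sketch below uses the sample times from the question, a hypothetical column name ts, and an assumed tolerance of 11 minutes. It runs through Python's sqlite3 module to stay self-contained, with the time arithmetic done in Python because interval syntax differs between engines.

```python
import sqlite3
from datetime import datetime, timedelta

# Sample times from the question; the 00:50 row is missing.
TIMES = [
    "2013-01-01 00:10:04",
    "2013-01-01 00:20:00",
    "2013-01-01 00:30:02",
    "2013-01-01 00:40:04",
    "2013-01-01 01:00:02",
]

QUERY = """
SELECT LAG(ts) OVER (ORDER BY ts) AS prev_ts, ts
FROM channel_001
ORDER BY ts
"""


def find_gaps(times, max_step=timedelta(minutes=11)):
    """Return (previous, current) pairs that are further apart than max_step."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE channel_001 (ts TEXT)")
    conn.executemany("INSERT INTO channel_001 VALUES (?)", [(t,) for t in times])
    gaps = []
    for prev, cur in conn.execute(QUERY):
        if prev is None:
            continue  # first row has no predecessor
        if datetime.fromisoformat(cur) - datetime.fromisoformat(prev) > max_step:
            gaps.append((prev, cur))
    return gaps


gaps = find_gaps(TIMES)
```

Each returned pair brackets a stretch where at least one row is missing, which gives the range in which to insert.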

Checking for maximum length of consecutive days which satisfy specific condition

I have a MySQL table with the structure:
beverages_log(id, users_id, beverages_id, timestamp)
I'm trying to compute the maximum streak of consecutive days during which a user (with id 1) logs a beverage (with id 1) at least 5 times each day. I'm pretty sure that this can be done using views as follows:
CREATE or REPLACE VIEW daycounts AS
SELECT count(*) AS n, DATE(timestamp) AS d FROM beverages_log
WHERE users_id = '1' AND beverages_id = 1 GROUP BY d;
CREATE or REPLACE VIEW t AS SELECT * FROM daycounts WHERE n >= 5;
SELECT MAX(streak) AS current FROM ( SELECT DATEDIFF(MIN(c.d), a.d)+1 AS streak
FROM t AS a LEFT JOIN t AS b ON a.d = ADDDATE(b.d,1)
LEFT JOIN t AS c ON a.d <= c.d
LEFT JOIN t AS d ON c.d = ADDDATE(d.d,-1)
WHERE b.d IS NULL AND c.d IS NOT NULL AND d.d IS NULL GROUP BY a.d) allstreaks;
However, repeatedly creating views for different users every time I run this check seems pretty inefficient. Is there a way in MySQL to perform this computation in a single query, without creating views or repeatedly calling the same subqueries a bunch of times?
This solution seems to perform quite well as long as there is a composite index on users_id and beverages_id -
SELECT *
FROM (
    SELECT t.*, IF(@prev + INTERVAL 1 DAY = t.d, @c := @c + 1, @c := 1) AS streak, @prev := t.d
    FROM (
        SELECT DATE(timestamp) AS d, COUNT(*) AS n
        FROM beverages_log
        WHERE users_id = 1
          AND beverages_id = 1
        GROUP BY DATE(timestamp)
        HAVING COUNT(*) >= 5
    ) AS t
    INNER JOIN (SELECT @prev := NULL, @c := 1) AS vars
) AS t
ORDER BY streak DESC LIMIT 1;
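To make the counter logic easier to follow, here is a small sketch that keeps the same inner aggregation (days with at least 5 logs) and replays the @prev / @c bookkeeping as an ordinary loop. It is an illustration only: the log data is invented, the helper names are hypothetical, and the SQL runs through Python's sqlite3 module so the example is self-contained.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical number of logs per day for user 1 and beverage 1.
COUNTS = {"2020-01-01": 5, "2020-01-02": 5, "2020-01-03": 4,
          "2020-01-04": 5, "2020-01-05": 5, "2020-01-06": 6}

QUERY = """
SELECT DATE(timestamp) AS d
FROM beverages_log
WHERE users_id = ? AND beverages_id = ?
GROUP BY DATE(timestamp)
HAVING COUNT(*) >= ?
ORDER BY d
"""


def make_db(counts):
    """Build an in-memory beverages_log with the given logs per day."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE beverages_log (id INTEGER PRIMARY KEY, "
        "users_id INTEGER, beverages_id INTEGER, timestamp TEXT)"
    )
    for day, n in counts.items():
        conn.executemany(
            "INSERT INTO beverages_log (users_id, beverages_id, timestamp) "
            "VALUES (1, 1, ?)",
            [(f"{day} 12:00:{i:02d}",) for i in range(n)],
        )
    return conn


def max_streak(conn, user_id, beverage_id, min_logs=5):
    """Longest run of consecutive qualifying days."""
    best = current = 0
    prev = None
    for (d,) in conn.execute(QUERY, (user_id, beverage_id, min_logs)):
        day = date.fromisoformat(d)
        # same rule as IF(@prev + INTERVAL 1 DAY = t.d, @c + 1, 1)
        if prev is not None and day - prev == timedelta(days=1):
            current += 1
        else:
            current = 1
        best = max(best, current)
        prev = day
    return best


streak = max_streak(make_db(COUNTS), 1, 1)
```

In this sample the day with only 4 logs breaks the run, so the longest streak is the three days that follow it.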
Why not include user_id in the daycounts view and group by user_id and date?
Also include user_id in view t.
Then, when you are querying against t, add the user_id to the where clause.
That way you don't have to recreate your views for every single user; you just need to remember to include it in your where clause.
That's a little tricky. I'd start with a view to summarize events by day:
CREATE VIEW BView AS
SELECT UserID, BevID, CAST(EventDateTime AS DATE) AS EventDate, COUNT(*) AS NumEvents
FROM beverages_log
GROUP BY UserID, BevID, CAST(EventDateTime AS DATE)
I'd then use a Dates table (just a table with one row per day; very handy to have) to examine all possible date ranges and throw out any with a gap. This will probably be slow as hell, but it's a start:
SELECT
    UserID, BevID, MAX(StreakLength) AS StreakLength
FROM
(
    SELECT
        B1.UserID, B1.BevID, B1.EventDate AS StreakStart, DATEDIFF(EndDate.Date, StartDate.Date) AS StreakLength
    FROM
        BView AS B1
        INNER JOIN Dates AS StartDate ON B1.EventDate = StartDate.Date
        INNER JOIN Dates AS EndDate ON EndDate.Date > StartDate.Date
    WHERE
        B1.NumEvents >= 5
        -- Exclude this potential streak if there's a day with no activity
        AND NOT EXISTS (SELECT * FROM Dates AS MissedDay WHERE MissedDay.Date > StartDate.Date AND MissedDay.Date <= EndDate.Date AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND MissedDay.Date = B2.EventDate))
        -- Exclude this potential streak if there's a day with less than five events
        AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND B2.EventDate > StartDate.Date AND B2.EventDate <= EndDate.Date AND B2.NumEvents < 5)
) AS X
GROUP BY
    UserID, BevID