User Churn - Final outer statement in a cte - mysql

I have the table below:
timestamp | user_id | activity
2021-02-01 03:21:11 | mike12 | read
2021-02-02 03:45:22 | bob55 | like
2021-02-03 04:21:33 | sarah22 | post
2021-02-01 04:11:33 | cindy11 | sign-in
I want to calculate the number of users churned in the last 7 days as:
number of all users - active users (where active users are those who like, read, comment, or post)
with active_users as
(
select count(distinct user_id)
from table
where activity IN ('comment','post','read','like')
and date_diff(timestamp, current_date()) <= 7
)
, inactive_users as
(select count(distinct user_id)
from table
where activity IN ('sign-in')
and date_diff(timestamp, current_date()) <= 7)
What would be the correct way to subtract the two above? I am unsure of how to join the two ctes in the final query, thanks for helping!
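One possible way to combine two single-row CTEs is a CROSS JOIN in the final SELECT. The sketch below is not from the original thread. It uses Python's sqlite3 module so that it runs standalone, renames the table to `events` and the column to `ts` (both hypothetical names), treats every user seen in the window as "all users", and passes a fixed reference date instead of `current_date()`. In MySQL the date filter would more likely be written as `DATEDIFF(CURRENT_DATE(), timestamp) <= 7`.

```python
import sqlite3

def churned_users(rows, today, window_days=7):
    """Count users seen in the window who performed no 'active' activity."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and column names, chosen for this sketch only.
    conn.execute("CREATE TABLE events (ts TEXT, user_id TEXT, activity TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    query = """
        WITH recent AS (
            SELECT user_id, activity
            FROM events
            WHERE julianday(:today) - julianday(date(ts)) BETWEEN 0 AND :days
        ),
        all_users AS (SELECT COUNT(DISTINCT user_id) AS n FROM recent),
        active_users AS (
            SELECT COUNT(DISTINCT user_id) AS n
            FROM recent
            WHERE activity IN ('comment', 'post', 'read', 'like')
        )
        -- Each CTE returns exactly one row, so the CROSS JOIN yields one row.
        SELECT all_users.n - active_users.n
        FROM all_users CROSS JOIN active_users
    """
    (churned,) = conn.execute(query, {"today": today, "days": window_days}).fetchone()
    conn.close()
    return churned

sample = [
    ("2021-02-01 03:21:11", "mike12", "read"),
    ("2021-02-02 03:45:22", "bob55", "like"),
    ("2021-02-03 04:21:33", "sarah22", "post"),
    ("2021-02-01 04:11:33", "cindy11", "sign-in"),
]
print(churned_users(sample, "2021-02-05"))  # cindy11 only signed in, so 1
```

Because both counts come from the same `recent` CTE, the date filter is written once, which avoids the two CTEs drifting apart.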

Related

How to add or subtract minutes from a timediff result in mysql

I'm trying to make a query that will show all worked hours, days and persons.
I have that running.
Assume I have a table called uren:
rec_id | user_id | start (datetime) | eind (datetime)
and I have a table called users:
user_id | name |
With the query below I have nearly all the info I want.
select users.name, sec_to_time(SUM(TIME_TO_SEC(TIMEDIFF(uren.eind, uren.start)))),count(distinct(date(start))) as dagen
from uren, users
where date(uren.start) between CAST('2017-10-04 00:00:00' as Date) and CAST('2017-11-04 00:00:00' as DATE) and
uren.user_id = users.user_id
group by uren.user_id
ORDER BY name
Which shows me this:
Piet (name) 230 (hours total) 24 (days worked)
Now comes the real question:
I want to subtract 30 minutes for each day on which less than 5 hours were worked.
I'm clueless at the moment.
Can someone please help?
Assuming one row per user per day:
select users.name,
(sec_to_time(sum(time_to_sec(timediff(uren.eind, uren.start))) -
30 * 60 * sum(time_to_sec(timediff(uren.eind, uren.start)) < 5*60*60)
)
),
count(distinct(date(start))) as dagen
from uren join users
on uren.user_id = users.user_id
where date(uren.start) between '2017-10-04' and '2017-11-04'
group by uren.user_id
order by name;
If one day has multiple shifts, you need to aggregate by day first:
select u.name,
(sec_to_time(sum(day_secs) -
30 * 60 * sum(day_secs < 5*60*60)
)
),
count(*) as dagen
from (select uren.user_id, users.name, date(uren.start),
sum(time_to_sec(timediff(uren.eind, uren.start))) as day_secs
from uren join
users
on uren.user_id = users.user_id
where uren.start >= '2017-10-04' and uren.start < '2017-11-05'
group by uren.user_id, date(uren.start)
) u
group by name
order by name
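To see why the per-day aggregation matters, here is a small sketch that is not part of the original answer. It runs the same two-level grouping in SQLite through Python's sqlite3 module, so `strftime('%s', ...)` stands in for MySQL's `time_to_sec(timediff(...))`, and the result is left in seconds rather than passed through `sec_to_time`. The function name and sample shifts are made up for illustration.

```python
import sqlite3

def net_worked_seconds(shifts, penalty=30 * 60, threshold=5 * 60 * 60):
    """Return {user_id: (net_seconds, days_worked)} with a per-day deduction."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE uren (user_id INTEGER, start TEXT, eind TEXT)")
    conn.executemany("INSERT INTO uren VALUES (?, ?, ?)", shifts)
    query = """
        SELECT user_id,
               SUM(day_secs) - :penalty * SUM(day_secs < :threshold) AS net_secs,
               COUNT(*) AS dagen
        FROM (
            -- First level: one row per user per calendar day.
            SELECT user_id,
                   date(start) AS dag,
                   SUM(strftime('%s', eind) - strftime('%s', start)) AS day_secs
            FROM uren
            GROUP BY user_id, date(start)
        ) AS per_day
        -- Second level: one row per user, deducting once per short day.
        GROUP BY user_id
    """
    rows = conn.execute(query, {"penalty": penalty, "threshold": threshold}).fetchall()
    conn.close()
    return {user_id: (net_secs, dagen) for user_id, net_secs, dagen in rows}

shifts = [
    (1, "2017-10-04 09:00:00", "2017-10-04 12:00:00"),  # 3 h
    (1, "2017-10-04 13:00:00", "2017-10-04 17:00:00"),  # 4 h, same day: 7 h total
    (1, "2017-10-05 09:00:00", "2017-10-05 13:00:00"),  # 4 h, a short day
]
print(net_worked_seconds(shifts))  # {1: (37800, 2)}
```

Only 2017-10-05 is penalised here. Without the inner grouping, the two shifts on 2017-10-04 would each count as a short day, even though that day totals 7 hours.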
There is a good reason that you do not have a clue: what you are asking is fairly complex.
The first thing to accept is that your calculation depends on both days and users, so you need to group by both first, and then by users only in the final result. To do this, you need a subquery that groups by days and users before the parent query can group by users.
Here is what I came up with:
SELECT
users.name,
sec_to_time(SUM(ur.timeWorked)) as tWorked,
SUM(ur.dagen) as Dagen
FROM users
INNER JOIN (
SELECT
user_id,
SUM(TIME_TO_SEC(TIMEDIFF(`eind`, `start`))) - (1800 *
IF(SUM(TIME_TO_SEC(TIMEDIFF(`eind`, `start`))) < 18000,1,0))
as `timeWorked`,
count(distinct(date(`start`))) as `dagen`
FROM uren
WHERE date(`start`)
BETWEEN CAST('2017-10-04 00:00:00' as Date)
AND CAST('2017-11-04 00:00:00' as DATE)
GROUP BY user_id, DAYOFYEAR(`start`)
) as ur ON ur.user_id = users.user_id
GROUP BY ur.user_id
ORDER BY name

MySQL - Group By Latest and Join First Instance

I've tried a few things but I've ended up confusing myself.
What I am trying to do is find the most recent records from a table and left join the first after a certain date.
An example might be
id | acct_no | created_at | some_other_column
1 | A0001 | 2017-05-21 00:00:00 | x
2 | A0001 | 2017-05-22 00:00:00 | y
3 | A0001 | 2017-05-22 00:00:00 | z
So ideally what I'd like is to find the latest record of each acct_no sorted by created_at DESC so that the results are grouped by unique account numbers, so from the above record it would be 3, but obviously there would be multiple different account numbers with records for different days.
Then, what I am trying to achieve is to join on the same table and find the first record with the same account number after a certain date.
For example, record 1 would be returned for a query joining on acct_no A0001 after or equal to 2017-05-21 00:00:00 because it is the first result on or after that date, so these are sorted by created_at ASC with created_at >= "2017-05-21 00:00:00" (and possibly AND id != latest.id).
It seems quite straight forward but I just can't get it to work.
I only have my most recent attempt after discarding multiple different queries.
Here I am trying to solve the first part which is to select the most recent of each account number:
SELECT latest.* FROM my_table latest
JOIN (SELECT acct_no, MAX(created_at) FROM my_table GROUP
BY acct_no) latest2
ON latest.acct_no = latest2.acct_no
but that still returns all rows rather than the most recent of each.
I did have something using a join on a subquery, but it took so long to run that I quit it before it finished, even though I have indexes on acct_no and created_at. I've also run into other problems where columns in the select are not in the group by. I know this check can be turned off, but I'm trying to find a way to perform the query that doesn't require that.
Just try a little edit to your initial query:
SELECT latest.* FROM my_table latest
join (SELECT acct_no, MAX(created_at) as max_time FROM my_table GROUP
BY acct_no) latest2
ON latest.acct_no = latest2.acct_no AND latest.created_at = latest2.max_time
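As a quick check of the join-on-maximum pattern above, the following sketch (not from the original answer) runs the same query shape in SQLite via Python's sqlite3 module on the sample rows from the question. The function name is made up for illustration. One thing it makes visible: when two rows share the latest created_at, the join returns both of them.

```python
import sqlite3

def latest_ids(rows):
    """Return ids of the rows holding the latest created_at per acct_no."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER PRIMARY KEY, acct_no TEXT, "
        "created_at TEXT, some_other_column TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT latest.id
        FROM my_table AS latest
        JOIN (SELECT acct_no, MAX(created_at) AS max_time
              FROM my_table
              GROUP BY acct_no) AS latest2
          ON latest.acct_no = latest2.acct_no
         AND latest.created_at = latest2.max_time
        ORDER BY latest.id
    """
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids

rows = [
    (1, "A0001", "2017-05-21 00:00:00", "x"),
    (2, "A0001", "2017-05-22 00:00:00", "y"),
    (3, "A0001", "2017-05-22 00:00:00", "z"),
]
print(latest_ids(rows))  # [2, 3]: rows 2 and 3 tie on created_at
```

If exactly one row per account is required, a tie-breaker such as the highest id among the tied rows would be needed on top of this join.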
Here is a different approach. I'm not sure about the performance impact, but I'm hoping that avoiding the self join and GROUP BY will perform better.
SELECT * FROM (
SELECT mytable1.*, IF(@temp <> acct_no, 1, 0) selector, @temp := acct_no FROM `mytable1`
JOIN (SELECT @temp := '') a
ORDER BY acct_no, created_at DESC , id DESC
) b WHERE selector = 1
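You need to get the id of the row that has the latest created_at for each account:
SELECT latest.* FROM my_table latest
join (SELECT max(t.id) as id FROM my_table t
join (SELECT acct_no, MAX(created_at) as max_time FROM my_table GROUP BY acct_no) m
ON t.acct_no = m.acct_no AND t.created_at = m.max_time
GROUP BY t.acct_no) latest2
ON latest.id = latest2.id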

MySQL query to select distinct rows based on date range overlapping

Let's say we have a table (table1) in which we store 4 values (user_id, name, start_date, end_date)
table1
------------------------------------------------
id user_id name start_date end_date
------------------------------------------------
1 1 john 2016-04-02 2016-04-03
2 2 steve 2016-04-06 2016-04-06
3 3 sarah 2016-04-03 2016-04-03
4 1 john 2016-04-12 2016-04-15
I then enter a start_date of 2016-04-03 and end_date of 2016-04-03 to see if any of the users are available to be scheduled for a job. The query that checks for and ignores overlapping dates returns the following:
table1
------------------------------------------------
id user_id name start_date end_date
------------------------------------------------
2 2 steve 2016-04-06 2016-04-06
4 1 john 2016-04-12 2016-04-15
The issue I am having is that John is being displayed on the list even though he is already booked for a job for the dates I am searching for. The query returns TRUE for the other entry because the dates don't conflict, but I would like to hide John from the list completely since he will be unavailable.
Is there a way to filter the list and prevent the user info from displaying if the dates entered conflict with another entry for the same user?
An example of the query:
SELECT DISTINCT id, user_id, name, start_date, end_date
FROM table1
WHERE ('{$startDate}' NOT BETWEEN start_date AND end_date
AND '{$endDate}' NOT BETWEEN start_date AND end_date
AND start_date NOT BETWEEN '{$startDate}' AND '{$endDate}'
AND end_date NOT BETWEEN '{$startDate}' AND '{$endDate}');
The "solution" in the question doesn't look right at all.
INSERT INTO table1 VALUES (5,2,'steve', '2016-04-01','2016-04-04')
Now there's a row with Steve having an overlap.
And the query proposed as a SOLUTION in the question will return 'steve'.
Here's a demonstration of building a query to return the users that are "available" during the requested period, because there is no row in table1 for that user that "overlaps" with the requested period.
First problem is getting the users that are not available due to the existence of a row that overlaps the requested period. Assuming that start_date <= end_date for all rows in the table...
A row overlaps the requested period if the end_date of the row is on or after the start of the requested period, and the start_date of the row is on or before the end of the requested period.
-- users that are "unavailable" due to row with overlap
SELECT t.user_id
FROM table1 t
WHERE t.end_date >= '2016-04-03' -- start of requested period
AND t.start_date <= '2016-04-03' -- end of requested_period
GROUP
BY t.user_id
(If our assumption that start_date <= end_date doesn't hold, we can add that check as a condition in the query)
To get a list of all users, we could query a table that has a distinct list of users. We don't see a table like that in the question, so we can get a list of all users that appear in table1 instead
SELECT l.user_id
FROM table1 l
GROUP BY l.user_id
To get the list of all users excluding the users that are unavailable, there are a couple of ways we can write that. The simplest is an anti-join pattern:
SELECT a.user_id
FROM ( -- list of all users
SELECT l.user_id
FROM table1 l
GROUP BY l.user_id
) a
LEFT
JOIN ( -- users that are unavailable due to overlap
SELECT t.user_id
FROM table1 t
WHERE t.end_date >= '2016-04-03' -- start of requested period
AND t.start_date <= '2016-04-03' -- end of requested_period
GROUP
BY t.user_id
) u
ON u.user_id = a.user_id
WHERE u.user_id IS NULL
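The anti-join can be tried out on the sample data with a small sketch like the one below. It is not part of the original answer: it runs in SQLite through Python's sqlite3 module, uses `SELECT DISTINCT` in place of `GROUP BY` for the two user lists, and takes the requested period as parameters. The function name is hypothetical.

```python
import sqlite3

def available_users(bookings, req_start, req_end):
    """Return user_ids with no booking that overlaps the requested period."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (id INTEGER, user_id INTEGER, name TEXT, "
        "start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", bookings)
    query = """
        SELECT a.user_id
        FROM (SELECT DISTINCT user_id FROM table1) AS a      -- all users
        LEFT JOIN (
            SELECT DISTINCT user_id                          -- unavailable users
            FROM table1
            WHERE end_date >= :req_start
              AND start_date <= :req_end
        ) AS u ON u.user_id = a.user_id
        WHERE u.user_id IS NULL                              -- keep unmatched only
        ORDER BY a.user_id
    """
    params = {"req_start": req_start, "req_end": req_end}
    result = [row[0] for row in conn.execute(query, params)]
    conn.close()
    return result

bookings = [
    (1, 1, "john", "2016-04-02", "2016-04-03"),
    (2, 2, "steve", "2016-04-06", "2016-04-06"),
    (3, 3, "sarah", "2016-04-03", "2016-04-03"),
    (4, 1, "john", "2016-04-12", "2016-04-15"),
]
print(available_users(bookings, "2016-04-03", "2016-04-03"))  # [2]
```

John drops out because one of his rows overlaps the period, even though his other row does not. Adding the extra row for Steve mentioned above (2016-04-01 to 2016-04-04) leaves nobody available.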
Will this work?
SELECT DISTINCT user_id FROM table1 WHERE (DATEDIFF(_input_,start_date) > 0 AND
DATEDIFF(_input_,end_date) > 0) OR
(DATEDIFF(_input_,start_date) < 0);

SELECT users that appear daily

I have a question that appears easy on the surface but I'm finding challenging, hence the request for help. I have a table with two columns:
table: USERS
USER_ID | LOGGED_IN_DATE
001 | 2015-05-01
002 | 2015-05-01
003 | 2015-05-01
001 | 2015-05-02
...
What I need is a query that will return all of the IDs that were present every day for a given week, say 2015-05-01 through 2015-05-07. Not just anytime during the week, but there must be a record for that user every day. I need the fastest and most concise query possible. Any ideas?
What I tried already:
Sub-queries
Union Queries
self-join
With no success.
Thanks!
Aggregation is probably the easiest way:
select u.user_id
from users u
where u.LOGGED_IN_DATE >= '2015-05-01' and u.LOGGED_IN_DATE < '2015-05-08'
group by u.user_id
having count(distinct date(u.LOGGED_IN_DATE)) = 7;
If the field is really a date with no time, then you don't need the date() function in the having clause.
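A minimal sketch of the aggregation approach, assuming SQLite via Python's sqlite3 module rather than the asker's actual database, with a made-up function name and sample data. It shows why counting distinct dates matters when a user can log in more than once per day.

```python
import sqlite3

def users_present_every_day(logins, first_day, day_after_last, n_days):
    """Return user_ids with at least one login on each of n_days distinct dates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id TEXT, logged_in_date TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", logins)
    query = """
        SELECT user_id
        FROM users
        WHERE logged_in_date >= ? AND logged_in_date < ?
        GROUP BY user_id
        HAVING COUNT(DISTINCT date(logged_in_date)) = ?
        ORDER BY user_id
    """
    result = [row[0] for row in conn.execute(query, (first_day, day_after_last, n_days))]
    conn.close()
    return result

week = [f"2015-05-{day:02d}" for day in range(1, 8)]
logins = (
    [("001", d) for d in week + ["2015-05-01"]]        # 8 rows, 7 distinct days
    + [("002", d) for d in week[:6] + ["2015-05-01"]]  # 7 rows, only 6 distinct days
)
print(users_present_every_day(logins, "2015-05-01", "2015-05-08", 7))  # ['001']
```

A plain `COUNT(*) = 7` would get both users wrong here: it would miss 001 (8 rows) and accept 002 (7 rows on 6 days).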
After thinking about it, instead of trying to do some complicated SQL query, I asked myself what does it mean to be online daily. It means that the number of unique dates in that given time period should equal 7. So this query I think works well:
select sub.user_id, sub.count
FROM (select user_id, count(1) as count from users where logged_in_date >= '2015-05-01' AND logged_in_date < '2015-05-08' group by user_id) sub
where sub.count = 7;
Any thoughts/comments?
UPDATE:
This should handle any number of logins at the day level:
SELECT DISTINCT user_id, count(1) AS total
FROM (SELECT DISTINCT user_id, logged_in_date
FROM users
WHERE logged_in_date >= '2015-05-01'
AND logged_in_date < '2015-05-08'
ORDER BY logged_in_date) sub
GROUP BY user_id
HAVING total = 7;
As well as @Gordon's answer:
SELECT u.user_id
FROM users u
WHERE u.LOGGED_IN_DATE >= '2015-05-01'
AND u.LOGGED_IN_DATE < '2015-05-08'
GROUP BY u.user_id
HAVING COUNT(DISTINCT DATE(u.LOGGED_IN_DATE)) = 7;
I like his better though. Good job.

How to write an extensible SQL query to find the users who log in continuously for n days

I have a table (Oracle or MySQL) which stores the dates on which users log in.
How can I write a SQL query (or something else) to find the users who have logged in continuously for n days?
For example:
userID | logindate
1000 2014-01-10
1000 2014-01-11
1000 2014-02-01
1000 2014-02-02
1001 2014-02-01
1001 2014-02-02
1001 2014-02-03
1001 2014-02-04
1001 2014-02-05
1002 2014-02-01
1002 2014-02-03
1002 2014-02-05
.....
We can see that user 1000 has logged in continuously for two days in 2014, user 1001 has logged in continuously for 5 days, and user 1002 never logs in on consecutive days.
The SQL should be extensible, which means I can pick any value of n and, by modifying the query a little or passing a new parameter, get the expected results.
Thank you!
As we don't know which DBMS you are using (you named both MySQL and Oracle), here are two solutions that both do the same thing: order the rows and subtract the row number, in days, from the login date (so if the 6th record is 2014-02-12 and the 7th is 2014-02-13, both result in 2014-02-06). Then we group by user and that group day and count the days, and finally group by user to find the longest series.
Here is a solution for a dbms with analytic window functions (e.g. Oracle):
select userid, max(days)
from
(
select userid, groupday, count(*) as days
from
(
select
userid, logindate - row_number() over (partition by userid order by logindate) as groupday
from mytable
)
group by userid, groupday
)
group by userid
--having max(days) >= 3
And here is a MySQL query (untested, because I don't have MySQL available):
select
userid, max(days)
from
(
select
userid, date_add(logindate, interval -rn day) as groupday, count(*) as days
from
(
select
userid, logindate,
@row_num := @row_num + 1 as rn
from mytable
cross join (select @row_num := 0) r
order by userid, logindate
) numbered
group by userid, groupday
) grouped
group by userid
-- having max(days) >= 3
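The row-number trick behind both queries can be checked outside the database. The sketch below is not from the original answer: it is plain Python with a hypothetical function name, and it de-duplicates dates per user first (an added assumption, since repeated logins on one day would otherwise break the numbering).

```python
from collections import defaultdict
from datetime import date, timedelta

def longest_streaks(logins):
    """Map each user to the length of their longest run of consecutive login days."""
    days_by_user = defaultdict(set)
    for user_id, day in logins:
        days_by_user[user_id].add(day)  # a set ignores repeated logins on one day
    result = {}
    for user_id, days in days_by_user.items():
        runs = defaultdict(int)
        for row_number, day in enumerate(sorted(days), start=1):
            # Consecutive days share the same (day - row_number) group day.
            runs[day - timedelta(days=row_number)] += 1
        result[user_id] = max(runs.values())
    return result

logins = [
    (1000, date(2014, 1, 10)), (1000, date(2014, 1, 11)),
    (1000, date(2014, 2, 1)), (1000, date(2014, 2, 2)),
    (1001, date(2014, 2, 1)), (1001, date(2014, 2, 2)), (1001, date(2014, 2, 3)),
    (1001, date(2014, 2, 4)), (1001, date(2014, 2, 5)),
    (1002, date(2014, 2, 1)), (1002, date(2014, 2, 3)), (1002, date(2014, 2, 5)),
]
streaks = longest_streaks(logins)
print(streaks)  # {1000: 2, 1001: 5, 1002: 1}

n = 3
print([user for user, length in streaks.items() if length >= n])  # [1001]
```

The final filter plays the role of the commented-out `having max(days) >= 3` clause, with `n` as the parameter the question asks for.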
I think the following query will give you a very extensible parametrization:
select z.userid, count(*) continuous_login_days
from
(
with max_dates as
( -- Get max date for every user ID
select t.userid, max(t.logindate) max_date
from test t
group by t.userid
),
ranks as
( -- Get ranks for login dates per user
select t.*,
row_number() over
(partition by t.userid order by t.logindate desc) rnk
from test t
)
-- So here, we select continuous days by checking if rank inside group
-- (per user ID) matches login date compared to max date
select r.userid, r.logindate, r.rnk, m.max_date
from ranks r, max_dates m
where m.userid = r.userid
and r.logindate + r.rnk - 1 = m.max_date -- here is the key
) z
-- Then we only group by user ID to get the number of continuous days
group by z.userid
;
Here is the result:
USERID CONTINUOUS_LOGIN_DAYS
1 1000 2
2 1001 5
3 1002 1
So you can just choose by querying field CONTINUOUS_LOGIN_DAYS.
EDIT : If you want to choose from all ranges (not only the last one), my query structure no longer works because it relied on the last range. But here is a workaround:
with w as
( -- Parameter
select 2 nb_cont_days from dual
)
select *
from
(
select t.*,
-- Get number of days around
(select count(*) from test t2
where t2.userid = t.userid
and t2.logindate between t.logindate - nb_cont_days + 1
and t.logindate) m1,
-- Get also number of days more in the past, and in the future
(select count(*) from test t2
where t2.userid = t.userid
and t2.logindate between t.logindate - nb_cont_days
and t.logindate + 1) m2,
w.nb_cont_days
from w, test t
) x
-- If these 2 fields match, then we have what we want
where x.m1 = x.nb_cont_days
and x.m2 = x.nb_cont_days
order by 1, 2
You just have to change the parameter in the WITH clause, so you can even create a function from this query to call it with this parameter.
SELECT userID,count(userID) as numOfDays FROM LOGINTABLE WHERE logindate between '2014-01-01' AND '2014-02-28'
GROUP BY userID
In this case you can check the login days per user, in a specific period