How to calculate the retention rates of my customers? - mysql

I'm trying to calculate the retention rates of my users and am having difficulty getting what I need. For any given date, I want to count all newly registered users and see how many come back over the following days, weeks, months, etc.
I therefore have several retention metrics, which I'm calling D1 (one-day retention rate), D7 (one-week retention rate), and so on up to D365 (one-year retention rate).
Below is a simplified example of my table structure:
loginDate   userId  installDate
01/01/2023  1       01/01/2023
01/01/2023  2       01/01/2023
01/01/2023  3       01/01/2023
02/01/2023  1       01/01/2023
02/01/2023  4       02/01/2023
03/01/2023  4       02/01/2023
08/01/2023  1       01/01/2023
And here is a simplified example of the result that my query should return:
Date        D1    D7
02/01/2023  33%   NULL
03/01/2023  100%  NULL
08/01/2023  NULL  33%
Finally, here is my current query:
SELECT
    loginDate AS Date,
    sum(CASE WHEN DATEDIFF(loginDate, installDate) = 1 THEN 1 END) / count(*) AS D1,
    sum(CASE WHEN DATEDIFF(loginDate, installDate) = 7 THEN 1 END) / count(*) AS D7
FROM logins
GROUP BY loginDate
While I believe the sum part is correct, dividing by count(*) looks wrong. I've tried dividing by count(distinct userId) instead, but had no luck with that either. Overall, I'm not sure how to take each relevant day into account in my calculations. For instance, for the D1 metric, I'm struggling to work out how to divide the users who came back on T+1 by the total number of users who created their account on T.
Could anyone help please?
EDIT:
Replying to @FanoFN.
Please find below the code I used based on yours:
WITH cte1 AS
(SELECT loginDate,
DATEDIFF(loginDate, installDate) Intervals,
COUNT(loginDate=installDate) userCount
FROM logins
GROUP BY Intervals, loginDate),
cte2 AS (
SELECT loginDate,
SUM(CASE WHEN Intervals=0 THEN userCount ELSE 0 END) totalInstalledUser,
SUM(CASE WHEN Intervals=1 THEN userCount ELSE 0 END) D1,
SUM(CASE WHEN Intervals=7 THEN userCount ELSE 0 END) D7
FROM cte1
GROUP BY loginDate)
SELECT loginDate,
(D1/totalInstalledUser)*100 D1Percentage,
(D7/totalInstalledUser)*100 D7Percentage
FROM cte2
GROUP BY loginDate

You can do it using WITH and COUNT ... OVER (PARTITION BY ...), as follows:
cte and cte2 get the number of logins per day.
cte3 gets the amount of users and whether the range is D1 or D7.
cte4 gets the percentages.
WITH cte AS (
SELECT *,
count(case when loginDate=installDate then 1 end)
OVER (
PARTITION BY installDate ORDER BY loginDate
) AS Count_Logins
FROM logins
),
cte2 AS (
select cte.*, count(1) OVER (PARTITION BY loginDate, installDate order by loginDate) as today_logs
from cte
),
cte3 AS (
select loginDate, installDate,
MAX(CASE WHEN DATEDIFF(loginDate, installDate) = 1 THEN 1 END) AS D1,
MAX(CASE WHEN DATEDIFF(loginDate, installDate) = 7 THEN 1 END) AS D7,
MAX(today_logs)/MAX(Count_Logins) AS Amount
from cte2
WHERE DATEDIFF(loginDate, installDate) > 0
GROUP BY loginDate, installDate
),
cte4 AS (
select loginDate, D1*Amount*100 AS D1, D7*Amount*100 AS D7
from cte3
)
select *
from cte4;

I misunderstood your expected result: I took it for the result you were getting from your current query. Your question also left out two crucial pieces of information: loginDate is in fact a DATETIME column, and "sometimes a user would login more than once in a day...... We therefore need to make sure each userId is being counted only once". I've made another attempt here for you to test:
WITH cte1 AS (
    -- cohort size: distinct users who logged in on their install date,
    -- so repeated logins on that day are not counted twice
    SELECT installDate,
           COUNT(DISTINCT CASE WHEN DATE(loginDate)=installDate THEN userId END) InstalledCount
    FROM logins
    GROUP BY installDate
),
cte2 AS (
    SELECT DATE(l.loginDate) LgnDt,
           DATEDIFF(l.loginDate, l.installDate) Intervals,
           COUNT(DISTINCT userId) TotalLogin,
           cte1.InstalledCount
    FROM logins l
    JOIN cte1
      ON l.installDate=cte1.installDate
    WHERE DATE(l.loginDate) != l.installDate
    GROUP BY LgnDt, Intervals, cte1.InstalledCount)
SELECT LgnDt,
       CASE WHEN Intervals=1 THEN (TotalLogin/InstalledCount)*100 ELSE 0 END AS D1,
       CASE WHEN Intervals=7 THEN (TotalLogin/InstalledCount)*100 ELSE 0 END AS D7
FROM cte2
With a fiddle
The changes I made are:
Using DATE() function to extract date from loginDate column wherever necessary.
Adding DISTINCT in COUNT(DISTINCT userId) TotalLogin, to make sure that a userId is only counted once a day regardless how many times they login.
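To check the logic against the sample data in the question, here is a minimal sketch that runs a cohort calculation in Python's built-in sqlite3 module. SQLite stands in for MySQL, so julianday() differences replace DATEDIFF(), dates are stored as ISO strings, and the names cohort, comeback, cohortSize, returnedUsers and dayOffset are made up for this example. It divides the distinct users who return N days after install by the distinct users sharing that install date, which is one reading of the D1/D7 definition rather than the only possible one.

```python
import sqlite3

# Sample rows from the question as (loginDate, userId, installDate), in ISO format.
ROWS = [
    ("2023-01-01", 1, "2023-01-01"),
    ("2023-01-01", 2, "2023-01-01"),
    ("2023-01-01", 3, "2023-01-01"),
    ("2023-01-02", 1, "2023-01-01"),
    ("2023-01-02", 4, "2023-01-02"),
    ("2023-01-03", 4, "2023-01-02"),
    ("2023-01-08", 1, "2023-01-01"),
]

QUERY = """
WITH cohort AS (
    -- denominator: distinct users per install date
    SELECT installDate, COUNT(DISTINCT userId) AS cohortSize
    FROM logins
    GROUP BY installDate
),
comeback AS (
    -- numerator: distinct users per login day and days since install
    SELECT loginDay, installDate, dayOffset,
           COUNT(DISTINCT userId) AS returnedUsers
    FROM (
        SELECT date(loginDate) AS loginDay, installDate, userId,
               CAST(julianday(date(loginDate)) - julianday(installDate) AS INTEGER) AS dayOffset
        FROM logins
    ) AS perLogin
    GROUP BY loginDay, installDate, dayOffset
)
SELECT b.loginDay,
       MAX(CASE WHEN b.dayOffset = 1 THEN ROUND(100.0 * b.returnedUsers / c.cohortSize) END) AS D1,
       MAX(CASE WHEN b.dayOffset = 7 THEN ROUND(100.0 * b.returnedUsers / c.cohortSize) END) AS D7
FROM comeback AS b
JOIN cohort AS c ON c.installDate = b.installDate
WHERE b.dayOffset > 0
GROUP BY b.loginDay
ORDER BY b.loginDay
"""


def retention(rows):
    """Load the rows into an in-memory table and return (loginDay, D1, D7) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE logins (loginDate TEXT, userId INTEGER, installDate TEXT)")
    con.executemany("INSERT INTO logins VALUES (?, ?, ?)", rows)
    return con.execute(QUERY).fetchall()


for row in retention(ROWS):
    print(row)
```

On the sample rows this gives 33 for D1 on 2023-01-02, 100 for D1 on 2023-01-03 and 33 for D7 on 2023-01-08, with NULL elsewhere, matching the expected result in the question.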

Related

How would I build an SQL query to select first time deposits, second time deposits and additional deposits from a transactions table

In this scenario I have two tables users and transactions. I would like to filter all the transactions for a specified time period into 3 categories, first time deposit, second time deposit and additional deposits.
To work out a first time deposit you would check if the user has no transactions before that one using the created_at field, for second time deposit they would have one other transaction before that one and for the rest they should have 2 or more before that one.
The transactions table has 2 fields we care about here:
user (user id)
created_at (time transaction was created)
Here is my attempt but I am having trouble visualising the whole query. Any ideas on how I would do this?
SELECT
COUNT(t.id) as first_time_deposits
FROM
transactions t
WHERE
status = 'approved' AND DATE(t.created_at) BETWEEN (CURDATE() - INTERVAL 0 DAY) AND CURDATE()
GROUP BY user
HAVING NOT EXISTS
(
SELECT
u.id
FROM
transactions u
WHERE
u.created_at < t.created_at
)
I use the date interval here just to filter transactions to a day, a week, etc. This query doesn't work because I am trying to reference the date of the outer query in the subquery. I am also missing second time deposits and additional deposits.
Example output I am looking for:
first_time_deposits  second_time_deposits  additional_deposits
15                   5                     6
All for a selected time period.
Any help would be greatly appreciated.
This is how I'd do it. The solution still works if, for example, several "first" transactions took place at the same time; the same goes for the others.
"first_to_last" is a recursive query that just generates the rank numbers we need transactions for (1 to 3 in your case). This makes the query easy to adjust if you suddenly need not the first 3 but the first 10 transactions.
"numbered" ranks transactions by date.
The main query joins the first two CTEs and replaces the numbers with words like "first", "second", and "third". I didn't find any way other than hardcoding the values.
with recursive first_to_last(step) as (
select 1
union all
select step + 1
from first_to_last
where step < 3 -- how many lines to display
),
numbered as (
select dense_rank() over(partition by user_id order by created_at) rnk, created_at, user_id
from transactions
)
select user_id,
concat(case when f.step = 1 then 'first_deposit: '
when f.step = 2 then 'second_deposit: '
when f.step = 3 then 'third_deposit: '
end,
count(rnk))
from numbered n
join first_to_last f
on n.rnk = f.step
group by user_id, f.step
order by user_id, f.step
UPD. Answer to the additional question: "I just want the count of all first, second and any deposit that isn't first or second"
Just remove the "first_to_last" cte
with numbered as (
select dense_rank() over(partition by user_id order by created_at) rnk, created_at, user_id
from transactions
)
select user_id,
concat(case when n.rnk = 1 then 'first_deposit: '
when n.rnk = 2 then 'second_deposit: '
else 'other_deposits: '
end,
count(rnk))
from numbered n
group by user_id, case when n.rnk = 1 then 'first_deposit: '
when n.rnk = 2 then 'second_deposit: '
else 'other_deposits: '
end
order by user_id, min(rnk) -- rnk is not in the GROUP BY list, so sort by its per-group minimum
UPD2. output in 3 columns: first, second and others
with numbered as (
select dense_rank() over(partition by user_id order by created_at) rnk, created_at, user_id
from transactions
)
select
sum(case when n.rnk = 1 then 1 else 0 end) first_deposit,
sum(case when n.rnk = 2 then 1 else 0 end) second_deposit,
sum(case when n.rnk not in (1,2) then 1 else 0 end) other_deposit
from numbered n
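The question also asks for the counts within a selected time period. A minimal sketch of one way to combine that with the ranking above: rank over the full history first, then filter by period afterward, so that a deposit made inside the period still counts as "second" when the first one happened earlier. It runs in Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) as a stand-in for MySQL; the sample rows and the :period_start / :period_end parameter names are assumptions, and the status = 'approved' filter from the question is left out for brevity.

```python
import sqlite3

# Hypothetical sample data as (user_id, created_at).
TRANSACTIONS = [
    (1, "2023-03-01 09:00:00"),
    (1, "2023-03-10 12:00:00"),
    (1, "2023-03-11 08:30:00"),
    (2, "2023-03-10 10:00:00"),
    (3, "2023-02-20 10:00:00"),
    (3, "2023-03-10 18:00:00"),
]

QUERY = """
WITH numbered AS (
    -- rank over the whole table so deposits before the period still count
    SELECT user_id, created_at,
           DENSE_RANK() OVER (PARTITION BY user_id ORDER BY created_at) AS rnk
    FROM transactions
)
SELECT COALESCE(SUM(CASE WHEN rnk = 1 THEN 1 ELSE 0 END), 0) AS first_time_deposits,
       COALESCE(SUM(CASE WHEN rnk = 2 THEN 1 ELSE 0 END), 0) AS second_time_deposits,
       COALESCE(SUM(CASE WHEN rnk > 2 THEN 1 ELSE 0 END), 0) AS additional_deposits
FROM numbered
-- the period filter is applied after ranking; the range is half-open
WHERE created_at >= :period_start AND created_at < :period_end
"""


def deposit_counts(rows, period_start, period_end):
    """Return (first, second, additional) deposit counts for the period."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE transactions (user_id INTEGER, created_at TEXT)")
    con.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    params = {"period_start": period_start, "period_end": period_end}
    return con.execute(QUERY, params).fetchone()


print(deposit_counts(TRANSACTIONS, "2023-03-10", "2023-03-12"))
```

Filtering inside the numbered CTE instead would restart the ranking at the period boundary, so every user's first deposit within the period would be reported as a first-time deposit.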

Count differently an event if the same event from the same user happened in the past

I have a table listing subscription events. When there is a "NEW" event added to the table, it means either a new subscription from a brand new customer OR the renewal of a monthly subscription from an existing customer.
I want to be able to summarize the data by month and split it depending on whether it is a new customer or just a renewal.
I am looking for a formula that says: if the user_ID is unknown and the event is "NEW", then count +1 in the "new customer" column, otherwise +1 in the "renewal" column.
SOURCE TABLE
User_id  Event  Date
2        NEW    26/9/2021
2        NEW    26/8/2021
1        NEW    15/8/2021
DESIRED OUTPUT
Sept 20: 1 renewal; 0 new subscriptions
Aug 20: 2 new subscriptions
You may use a window function MIN to determine the earliest subscription date for each user and compare that to determine whether they are a new user or not. You may then aggregate/sum on this to determine the number of new subscriptions or renewals per year and month.
SELECT
YEAR(`Date`) as `year`,
MONTH(`Date`) as `month`,
SUM(is_new=true) as `new subscriptions`,
SUM(is_new=false) as `renewals`
FROM (
SELECT
*,
`Date`=MIN(`Date`) OVER (PARTITION BY `User_id`) as is_new
FROM
events
WHERE
`Event`='NEW'
) e
GROUP BY
YEAR(`Date`),
MONTH(`Date`)
ORDER BY
YEAR(`Date`),
MONTH(`Date`);
year  month  new subscriptions  renewals
2021  8      2                  0
2021  9      0                  1
Or, if you are using a MySQL version that does not support window functions, you may perform a left join on a subquery that finds the earliest subscription date. Using the same logic, we can determine and count the number of new and renewed subscriptions.
SELECT
YEAR(`Date`) as `year`,
MONTH(`Date`) as `month`,
SUM(new_sub.min_date IS NOT NULL) as `new subscriptions`,
SUM(new_sub.min_date IS NULL) as `renewals`
FROM
events e
LEFT JOIN (
SELECT
`User_id`,
MIN(`Date`) as min_date
FROM
events
WHERE
`Event`='NEW'
GROUP BY
`User_id`
) as new_sub ON e.`User_id`=new_sub.`User_id` AND
e.`Date`=new_sub.min_date
GROUP BY
YEAR(`Date`),
MONTH(`Date`)
ORDER BY
YEAR(`Date`),
MONTH(`Date`)
year  month  new subscriptions  renewals
2021  8      2                  0
2021  9      0                  1
View working demo on DB Fiddle
Let me know if this works for you.
You can use ROW_NUMBER() to identify if a row is the first one for each client.
For example you can do:
select
year(date) as y,
month(date) as m,
sum(case when rn = 1 then 1 else 0 end) as new_subscriptions,
sum(case when rn <> 1 then 1 else 0 end) as renewals
from (
select *, row_number() over(partition by user_id order by date) as rn
from t
where event = 'NEW'
) x
group by y, m
order by y, m
Result:
y m new_subscriptions renewals
----- -- ------------------ --------
2021 8 2 0
2021 9 0 1
See running example at DB Fiddle.
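The MIN-based and ROW_NUMBER-based approaches agree on the sample data but can differ when a user has two "NEW" rows on the same earliest date: comparing against MIN flags both rows as new, while ROW_NUMBER flags exactly one. The sketch below shows this using Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for MySQL. The column name event_date and the duplicated row are assumptions added for illustration.

```python
import sqlite3

BY_MIN = """
SELECT CAST(strftime('%Y', event_date) AS INTEGER) AS y,
       CAST(strftime('%m', event_date) AS INTEGER) AS m,
       SUM(is_new) AS new_subscriptions,
       SUM(1 - is_new) AS renewals
FROM (
    SELECT event_date,
           event_date = MIN(event_date) OVER (PARTITION BY user_id) AS is_new
    FROM events
    WHERE event = 'NEW'
) AS e
GROUP BY y, m
ORDER BY y, m
"""

BY_ROW_NUMBER = """
SELECT CAST(strftime('%Y', event_date) AS INTEGER) AS y,
       CAST(strftime('%m', event_date) AS INTEGER) AS m,
       SUM(CASE WHEN rn = 1 THEN 1 ELSE 0 END) AS new_subscriptions,
       SUM(CASE WHEN rn <> 1 THEN 1 ELSE 0 END) AS renewals
FROM (
    SELECT event_date,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_date) AS rn
    FROM events
    WHERE event = 'NEW'
) AS x
GROUP BY y, m
ORDER BY y, m
"""

# Rows from the question as (user_id, event, event_date), dates in ISO format.
SAMPLE = [
    (2, "NEW", "2021-09-26"),
    (2, "NEW", "2021-08-26"),
    (1, "NEW", "2021-08-15"),
]
# Hypothetical extra row: user 1 has a second NEW event on the same first date.
WITH_TIE = SAMPLE + [(1, "NEW", "2021-08-15")]


def summarize(query, rows):
    """Run one of the two queries against an in-memory events table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (user_id INTEGER, event TEXT, event_date TEXT)")
    con.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return con.execute(query).fetchall()


print(summarize(BY_MIN, WITH_TIE))         # tie counted as two new subscriptions
print(summarize(BY_ROW_NUMBER, WITH_TIE))  # tie counted as one new, one renewal
```

Which behaviour is right depends on whether duplicate first-day events can occur in the data and what they mean.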

A query for getting results separated by a date gap

ID  TIMESTAMP
1   2020-01-01 12:00:00
2   2020-02-01 12:00:00
3   2020-05-01 12:00:00
4   2020-06-01 12:00:00
5   2020-07-01 12:00:00
I am looking for a way to get records in a MySQL database that are within a certain range of each other. In the above example, there is a month between the first two records, then a three-month gap, and then another three records a month apart.
What is a way to group these into two result sets, so that I get IDs 1, 2 and 3, 4, 5? A solution using days would probably work best, as that's easier to modify.
You can use lag() and then logic to see where a gap is big enough to start a new set of records. A cumulative sum gives you the groups you want:
-- the alias is grp because GROUPING is a reserved word in MySQL 8.0
select t.*,
       sum(case when prev_timestamp >= timestamp - interval 1 month then 0 else 1 end) over (order by timestamp) as grp
from (select t.*,
             lag(timestamp) over (order by timestamp) as prev_timestamp
      from t
     ) t;
If you want to summarize this with a start and end date:
-- the alias is grp because GROUPING is a reserved word in MySQL 8.0
select min(timestamp), max(timestamp)
from (select t.*,
             sum(case when prev_timestamp >= timestamp - interval 1 month then 0 else 1 end) over (order by timestamp) as grp
      from (select t.*,
                   lag(timestamp) over (order by timestamp) as prev_timestamp
            from t
           ) t
     ) t
group by grp;
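Since the question asks for a threshold in days, here is a minimal sketch of the same lag-plus-running-sum idea with the gap given as a number of days. It uses Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for MySQL, so julianday() replaces MySQL's interval arithmetic. The table name readings, the column name ts and the parameter max_gap_days are made up for the example.

```python
import sqlite3

# Rows from the question as (id, ts).
ROWS = [
    (1, "2020-01-01 12:00:00"),
    (2, "2020-02-01 12:00:00"),
    (3, "2020-05-01 12:00:00"),
    (4, "2020-06-01 12:00:00"),
    (5, "2020-07-01 12:00:00"),
]

QUERY = """
SELECT id,
       -- running total of "group starts" gives each row its group number
       SUM(is_start) OVER (ORDER BY ts) AS grp
FROM (
    SELECT id, ts,
           -- a row starts a new group when it has no predecessor
           -- or the predecessor is more than max_gap_days earlier
           CASE WHEN julianday(ts) - julianday(LAG(ts) OVER (ORDER BY ts)) <= :max_gap_days
                THEN 0 ELSE 1 END AS is_start
    FROM readings
) AS flagged
ORDER BY ts
"""


def group_ids(rows, max_gap_days):
    """Return the ids as a list of lists, one inner list per group."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE readings (id INTEGER, ts TEXT)")
    con.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    groups = {}
    for row_id, grp in con.execute(QUERY, {"max_gap_days": max_gap_days}):
        groups.setdefault(grp, []).append(row_id)
    return list(groups.values())


print(group_ids(ROWS, 31))
```

With a 31-day threshold the sample splits into ids 1, 2 and ids 3, 4, 5. Changing the threshold only means passing a different number.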
For example, the following query:
select group_concat(ID)
from (
select w1.ID,w1.TS,w2.ID flag
from work1 w1 left outer join work1 w2
on timestampdiff(month,w2.TS,w1.TS)=1
order by w1.ID
) w
group by
case when flag is null then @str:=ID else @str end

MySQL Date Range Multi Column Count and Group By Select Statement

I have a query I have been working on for a while but I cannot seem to get it down. The other answers on here work well for counting an amount with a certain date range then grouping by the date to get the count. However I need to have two columns counted and grouped by date.
For example here is the query I have tried to get to work:
(SELECT COUNT(*) arrived, DATE(arrived) date, 'arrived' AS source
FROM products
WHERE arrived BETWEEN '2016-01-01' AND '2016-01-31'
GROUP BY DATE(date)
ORDER BY date ASC)
UNION ALL
(SELECT COUNT(*) released, DATE(released) date, 'released' AS source
FROM products
WHERE released BETWEEN '2016-01-01' AND '2016-01-31'
GROUP BY DATE(date)
ORDER BY date ASC)
However this returns the following:
arrived date source
3 2016-01-12 arrived
2 2016-01-28 arrived
1 2016-01-29 arrived
1 2016-01-05 released
What I am requiring is something like this:
date arrived released
2016-01-05 0 1
2016-01-12 3 0
2016-01-28 2 0
2016-01-29 1 0
Any suggestions? Thank you.
You can apply conditional aggregation to a derived table obtained by a UNION ALL operation for 'arrived' and 'released' dates:
SELECT `date`,
COUNT(CASE WHEN type = 'arrived' THEN 1 END) AS arrived,
COUNT(CASE WHEN type = 'released' THEN 1 END) AS released
FROM (
SELECT arrived AS `date`, 'arrived' as type
FROM products
WHERE arrived BETWEEN '2016-01-01' AND '2016-01-31'
UNION ALL
SELECT released AS `date`, 'released' as type
FROM products
WHERE released BETWEEN '2016-01-01' AND '2016-01-31') AS t
GROUP BY `date`
Demo here
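A minimal runnable sketch of the UNION ALL plus conditional aggregation pattern follows, using Python's built-in sqlite3 module as a stand-in for MySQL. Two details are assumptions beyond the answer: the columns are treated as possibly holding a time of day, so each value is reduced to its date before grouping, and the range is written as a half-open interval because BETWEEN '2016-01-01' AND '2016-01-31' would miss any time after midnight on the 31st. The sample rows are invented to reproduce the counts in the question.

```python
import sqlite3

# Hypothetical products as (arrived, released); None means not released yet.
PRODUCTS = [
    ("2015-12-20", "2016-01-05"),
    ("2016-01-12", None),
    ("2016-01-12", None),
    ("2016-01-12", None),
    ("2016-01-28", None),
    ("2016-01-28", None),
    ("2016-01-29 14:30:00", "2016-02-03"),
]

QUERY = """
SELECT day,
       COUNT(CASE WHEN kind = 'arrived' THEN 1 END) AS arrived,
       COUNT(CASE WHEN kind = 'released' THEN 1 END) AS released
FROM (
    -- one row per event, tagged with its kind
    SELECT date(arrived) AS day, 'arrived' AS kind
    FROM products
    WHERE arrived >= :period_start AND arrived < :period_end
    UNION ALL
    SELECT date(released) AS day, 'released' AS kind
    FROM products
    WHERE released >= :period_start AND released < :period_end
) AS t
GROUP BY day
ORDER BY day
"""


def counts_by_day(rows, period_start, period_end):
    """Return (day, arrived, released) tuples for the half-open period."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE products (arrived TEXT, released TEXT)")
    con.executemany("INSERT INTO products VALUES (?, ?)", rows)
    params = {"period_start": period_start, "period_end": period_end}
    return con.execute(QUERY, params).fetchall()


for row in counts_by_day(PRODUCTS, "2016-01-01", "2016-02-01"):
    print(row)
```

COUNT ignores the NULL produced when the CASE does not match, which is why a date with only arrivals shows 0 released instead of disappearing.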

Find first business day of next month MySQL without function

We have a date_value column and another Boolean column which indicates whether the day is a business day or not.
We are trying to find the first business day of the next month (for example, for September 2015 I want it to return 2015-10-01).
We have tried a couple different methods involving last_day, intervals and subqueries but can't quite get it to work.
We also don't have the ability to create custom functions, which makes this a little more difficult.
I think you want something like this:
-- range test on the date itself, so December rolls over into January
select min(date_value) fwd
from tablename
where isWorkDay = 1 and
      date_value > last_day(curdate()) and
      date_value <= last_day(curdate() + interval 1 month)
For all months (v0.3) (please note that I can't test this right now, so it might contain errors):
select t1.month_number, min(t2.date_value)
from tablename t1 join
tablename t2 on extract(year from t1.date_value) * 12 + t1.month_number = extract(year from t2.date_value) * 12 + t2.month_number - 1
where t2.isWorkDay = 1
group by t1.month_number
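A detail worth testing in any "next month" query is the December to January rollover. The sketch below compares date_value against the first day of next month and the first day of the month after that, which needs no separate year or month arithmetic. It runs in Python's built-in sqlite3 module as a stand-in for MySQL, using SQLite's date modifiers. The table name calendar and the weekday-only definition of a business day are assumptions; a real calendar table would also mark holidays.

```python
import sqlite3
from datetime import date, timedelta

QUERY = """
SELECT MIN(date_value)
FROM calendar
WHERE isWorkDay = 1
  AND date_value >= date(:today, 'start of month', '+1 month')
  AND date_value <  date(:today, 'start of month', '+2 month')
"""


def build_calendar(first, last):
    """Create an in-memory calendar table covering first..last inclusive."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE calendar (date_value TEXT, isWorkDay INTEGER)")
    day = first
    while day <= last:
        # Assumption: Monday to Friday are business days, no holidays.
        is_work_day = int(day.weekday() < 5)
        con.execute("INSERT INTO calendar VALUES (?, ?)", (day.isoformat(), is_work_day))
        day += timedelta(days=1)
    return con


def first_business_day_next_month(con, today):
    """Return the first business day of the month after `today` (ISO string)."""
    return con.execute(QUERY, {"today": today}).fetchone()[0]


con = build_calendar(date(2015, 9, 1), date(2016, 2, 29))
print(first_business_day_next_month(con, "2015-09-15"))
print(first_business_day_next_month(con, "2015-12-10"))
```

For a reference date in September 2015 this returns 2015-10-01, as in the question, and a reference date in December 2015 resolves to January 2016.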
I was able to get it using the below
SELECT
d.year
,d.month_number
,first_business_period as next_month_first_period
,d2.previous_business_day
FROM lk_date d
JOIN (
SELECT
a.*
, MAX(CASE WHEN d2.business_period_in_days<>0 THEN d2.date_value ELSE NULL END) AS previous_business_day
FROM(
SELECT
d1.year
,d1.month_number
, MIN(CASE WHEN d1.business_period_in_days <> 0 THEN d1.date_value END) AS first_business_period
FROM lk_date d1
GROUP BY 1,2
) a
JOIN lk_date d2 ON d2.date_Value < a.first_business_period
GROUP BY 1,2,3) d2 on d2.previous_business_day = d.date_value