How can we calculate the month-on-month cumulative retention rate in SQL?
Bill_Date Customer_id
2021-01-01 1
2021-01-23 2
2021-01-29 3
2021-02-17 1
2021-02-19 2
2021-03-01 3
Retention Rate = (Total number of unique customers in the present month) / (Total number of unique customers in previous months)
Expected Output
January : 100%
February : 66.7%
March : 25%
February = (Unique customers in Feb) / (Unique customers in Jan)
March = (Unique customers in March) / ((Unique customers in Jan) + (Unique customers in Feb))
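Consider a window function that computes the cumulative sum of unique customers per year/month:
WITH sub AS (
    SELECT YEAR(c.Bill_Date) AS bill_year,
           MONTH(c.Bill_Date) AS bill_month,
           COUNT(DISTINCT c.customer_id) AS unq_customers
    FROM customer_bills c
    GROUP BY YEAR(c.Bill_Date),
             MONTH(c.Bill_Date)
)
SELECT bill_year,
       bill_month,
       unq_customers,
       -- Earlier months = running total minus the current month.
       -- In the first month the denominator is 0; MySQL returns NULL for
       -- division by zero, so IFNULL falls back to 1 (100%).
       -- Ordering by year and month keeps the running total correct across years.
       IFNULL(
           unq_customers /
           (SUM(unq_customers) OVER (ORDER BY bill_year, bill_month) -
            unq_customers),
           1
       ) * 100 AS retention_rate
FROM sub
ORDER BY bill_year, bill_month;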
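As a cross-check of the arithmetic, here is a minimal sketch in plain Python (not SQL) that applies the formula from the question to the sample rows. The function name `cumulative_retention` is made up for illustration. Under the formula as written, March works out to 1 / (3 + 2) = 20% rather than the 25% listed in the expected output, so it may be worth confirming which denominator is intended.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (Bill_Date, Customer_id)
bills = [
    (date(2021, 1, 1), 1),
    (date(2021, 1, 23), 2),
    (date(2021, 1, 29), 3),
    (date(2021, 2, 17), 1),
    (date(2021, 2, 19), 2),
    (date(2021, 3, 1), 3),
]

def cumulative_retention(rows):
    """Return {(year, month): percent}: unique customers this month divided
    by the sum of the unique-customer counts of all earlier months."""
    customers = defaultdict(set)
    for bill_date, customer_id in rows:
        customers[(bill_date.year, bill_date.month)].add(customer_id)

    rates = {}
    previous_total = 0  # running sum of monthly unique counts seen so far
    for key in sorted(customers):  # sorts by year first, then month
        unique_now = len(customers[key])
        # The first month has no earlier months; treat it as 100%.
        if previous_total == 0:
            rate = 100.0
        else:
            rate = 100.0 * unique_now / previous_total
        rates[key] = round(rate, 1)
        previous_total += unique_now
    return rates

rates = cumulative_retention(bills)
print(rates)
```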
I have a requirement to get the number of days an employee spent in a role.
Scenario 1
EmployeeId role effectiveFrom
1 A 1-Jan-2021
1 B 15-Jan-2021
There are no further rows for role A in January, so the number of days for role A would be 14.
Scenario 2
EmployeeId role effectiveFrom
1 A 1-Jan-2021
No further roles are available for January, so the number of days for role A would be 31, i.e. the entire month of January. For February I would expect 28, as the role would be effective for the entire month of February as well.
Scenario 3
EmployeeId role effectiveFrom
1 A 1-Jan-2021
1 B 15-Jan-2021
1 A 25-Jan-2021
To get the number of days for role A the logic would be:
1st to 15th is 14 days.
25th to 31st (of January) would be 6 days.
14 + 6 = 20 days
The query I have come up with so far is this:
SELECT
    DATEDIFF(MAX(effectiveFrom),
             IF(MIN(effectiveFrom) = MAX(effectiveFrom),
                MIN(effectiveFrom),
                MIN(effectiveFrom))) + 1 daysWorked
FROM
    EmployeeRoles
WHERE grade = 'A'
GROUP BY `employeeId`, effectiveFrom;
which only gives 1 day for Scenario 1. Could someone guide me on a practical way of handling these scenarios? I have looked at loops and window functions, but I am at a loss as to the best way to proceed.
If Scenario 2 counts 31 days from 1-Jan to the end of the month, then 25-Jan to the end of the month should be 7 days, not 6 as you write in Scenario 3.
The number of days per row, using that calculation:
SELECT
    employeeID,
    grade,
    effectiveFrom,
    DATEDIFF(COALESCE(LEAD(effectiveFrom)
                          OVER (PARTITION BY employeeID, grade ORDER BY effectiveFrom),
                      DATE_ADD(LAST_DAY(effectiveFrom), INTERVAL 1 DAY)),
             effectiveFrom) AS `#Days`
FROM EmployeeRole;
This can be grouped and summed, giving:
SELECT
    employeeID,
    grade,
    SUM(`#Days`)
FROM (
    SELECT
        employeeID,
        grade,
        effectiveFrom,
        DATEDIFF(COALESCE(LEAD(effectiveFrom)
                              OVER (PARTITION BY employeeID, grade ORDER BY effectiveFrom),
                          DATE_ADD(LAST_DAY(effectiveFrom), INTERVAL 1 DAY)),
                 effectiveFrom) AS `#Days`
    FROM EmployeeRole
) x
GROUP BY
    employeeID,
    grade;
output:
employeeID grade SUM(#Days)
1 A 14
1 B 17
2 A 31
3 A 21
3 B 10
EDIT: The results were incorrect because the next effectiveFrom date was determined using OVER (PARTITION BY employeeID ORDER BY effectiveFrom). This is not correct, because the grade should be taken into account too.
I corrected it to OVER (PARTITION BY employeeID, grade ORDER BY effectiveFrom).
P.S. I also corrected this in the queries above the EDIT!
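The figures in the output table can be cross-checked outside the database. The sketch below is plain Python rather than SQL, and it assumes that employees 1, 2 and 3 hold the rows of Scenarios 1, 2 and 3 (the fiddle data is not reproduced here, so that mapping is a guess). It measures each row up to the same employee's next row, whatever the role, which corresponds to a window partitioned by employeeID only; that reading reproduces 14, 17, 31, 21 and 10. If the window is also partitioned by grade, employee 3's first A row would instead run to 25-Jan (24 days), so it is worth confirming which partitioning matches your intent.

```python
from datetime import date, timedelta

# Assumed sample data: employees 1, 2, 3 stand for Scenarios 1, 2, 3.
rows = [
    (1, "A", date(2021, 1, 1)),
    (1, "B", date(2021, 1, 15)),
    (2, "A", date(2021, 1, 1)),
    (3, "A", date(2021, 1, 1)),
    (3, "B", date(2021, 1, 15)),
    (3, "A", date(2021, 1, 25)),
]

def first_of_next_month(d):
    # Same idea as DATE_ADD(LAST_DAY(d), INTERVAL 1 DAY) in the query.
    return (d.replace(day=28) + timedelta(days=4)).replace(day=1)

def days_per_role(rows):
    """Sum, per (employee, grade), the days from each row to the employee's
    next row, or to the first day of the next month for the last row."""
    by_employee = {}
    for emp, grade, start in rows:
        by_employee.setdefault(emp, []).append((start, grade))

    totals = {}
    for emp, changes in by_employee.items():
        changes.sort()  # order by effectiveFrom
        for i, (start, grade) in enumerate(changes):
            if i + 1 < len(changes):
                end = changes[i + 1][0]
            else:
                end = first_of_next_month(start)
            key = (emp, grade)
            totals[key] = totals.get(key, 0) + (end - start).days
    return totals

totals = days_per_role(rows)
print(totals)
```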
I have this query on the Uploads table:
select
    Costumer as Customer,
    max(Week) as 'Max Week',
    count(distinct(POS)) as 'Total POS'
from Uploads
where year = 2022
group by Costumer;
which returns this:
Customer    Max Week  Total POS
Customer A  3         65
Customer B  5         27
Customer C  3         33
This table has an additional column named Inventory, and I want to know the SUM(Inventory), but only for the rows in each customer's max week found above.
For example:
Customer    Max Week  Total POS  Inventory
Customer A  3         65         456
Customer B  5         27         123
Customer C  3         33         2345
You can solve this by using ROW_NUMBER(), like this:
SELECT t.[Total POS],
       t.customer,
       t.week MaxWeek,
       t.SumInventoryPerWeek SumForMaxWeek
FROM (
    select
        Costumer as Customer,
        Week as week,
        count(distinct(POS)) as 'Total POS',
        SUM(Inventory) SumInventoryPerWeek,
        ROW_NUMBER() OVER(PARTITION BY Costumer ORDER BY Week DESC) rw
    from Uploads
    where year = 2022
    group by Costumer, Week
) t
WHERE t.rw = 1
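Note that because the inner query groups by week, its POS count covers only the latest week rather than the whole year. If the yearly Total POS from the question should be kept, one possible alternative is to join each customer's max week back to the table and sum Inventory conditionally. The sketch below runs that idea through SQLite via Python's `sqlite3` purely so it is self-contained; the rows are invented for illustration, and the aliases `max_week`, `total_pos` and `inventory` are assumptions, not names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Uploads (Costumer TEXT, year INT, Week INT, POS TEXT, Inventory INT)"
)
# Hypothetical rows, only to exercise the query.
conn.executemany(
    "INSERT INTO Uploads VALUES (?, ?, ?, ?, ?)",
    [
        ("Customer A", 2022, 1, "P1", 10),
        ("Customer A", 2022, 2, "P2", 20),
        ("Customer A", 2022, 3, "P1", 30),
        ("Customer A", 2022, 3, "P3", 40),
        ("Customer B", 2022, 4, "P9", 5),
        ("Customer B", 2022, 5, "P9", 7),
        ("Customer B", 2021, 9, "P8", 100),  # other year, must be ignored
    ],
)

query = """
SELECT u.Costumer AS Customer,
       m.max_week AS max_week,
       COUNT(DISTINCT u.POS) AS total_pos,          -- across all 2022 weeks
       SUM(CASE WHEN u.Week = m.max_week
                THEN u.Inventory ELSE 0 END) AS inventory  -- max week only
FROM Uploads u
JOIN (SELECT Costumer, MAX(Week) AS max_week
      FROM Uploads
      WHERE year = 2022
      GROUP BY Costumer) m
  ON m.Costumer = u.Costumer
WHERE u.year = 2022
GROUP BY u.Costumer, m.max_week
ORDER BY u.Costumer
"""
result = conn.execute(query).fetchall()
print(result)
```

The conditional `SUM(CASE ...)` keeps a single pass over the year's rows, so the distinct POS count and the max-week inventory come out of the same grouping.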
I have an orders table
Order_id User_id Order_date
1 32 2020-07-19
2 24 2020-07-21
3 27 2020-07-27
4 24 2020-08-14
5 32 2020-08-18
6 32 2020-08-19
7 58 2020-08-20
Now I want to find how many of the users who ordered in the first month also ordered in the next month. In this case, user_ids 32, 24 and 27 ordered in the 7th month, but only 24 and 32 ordered in the next month.
I want the result to look like this:
Date Retained_Users Total_users
2020-07 Null 3
2020-08 2 3
I'm lost here. Can someone please help me with this?
In MySQL 8.0, you can do this with window functions:
select
    order_month,
    count(distinct case when cnt_orders_last_month > 0 then user_id end) retained_users,
    count(distinct user_id) total_users
from (
    select
        user_id,
        date_format(order_date, '%Y-%m-01') as order_month,
        count(*) over(
            partition by user_id
            order by date(date_format(order_date, '%Y-%m-01'))
            range between interval 1 month preceding and interval 1 day preceding
        ) cnt_orders_last_month
    from mytable
) t
group by order_month
The logic lies in the range specification of the window function: it orders records by month, and for each row counts how many orders the same user placed in the previous month. Then all that is left to do is aggregate and count distinct users.
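The same month-to-month comparison can be sketched in plain Python with one set of users per month, which makes it easy to check the expected figures by hand. This is only a cross-check under the sample data from the question; the function name `monthly_retention` is made up. It returns `None` for a month with no data in the preceding month, mirroring the Null in the requested output (the SQL above would return 0 there, since COUNT over no non-NULL values is 0).

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (Order_id, User_id, Order_date)
orders = [
    (1, 32, date(2020, 7, 19)),
    (2, 24, date(2020, 7, 21)),
    (3, 27, date(2020, 7, 27)),
    (4, 24, date(2020, 8, 14)),
    (5, 32, date(2020, 8, 18)),
    (6, 32, date(2020, 8, 19)),
    (7, 58, date(2020, 8, 20)),
]

def monthly_retention(rows):
    """Return [(YYYY-MM, retained_users, total_users)] ordered by month."""
    users = defaultdict(set)
    for _order_id, user_id, order_date in rows:
        users[(order_date.year, order_date.month)].add(user_id)

    report = []
    for year, month in sorted(users):
        prev = (year - 1, 12) if month == 1 else (year, month - 1)
        current = users[(year, month)]
        # Retained = users of this month who also ordered in the previous month.
        retained = len(current & users[prev]) if prev in users else None
        report.append((f"{year}-{month:02d}", retained, len(current)))
    return report

report = monthly_retention(orders)
print(report)
```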
Changing the question because of a misunderstanding in use case.
Amazon Redshift Query for the following problem statement.
The data structure:
id - primary key
acc_id - id unique to a loan account (this id will be the same for all EMIs of a particular loan account; it may be repeated 6 or 12 times, depending on whether the loan tenure is 6 or 12 months)
status - PAID or UNPAID (an unpaid EMI is followed by unpaid EMIs only)
s_id - a scheduling id, which is a run of consecutive numbers for a particular loan id
due_date - the due date for that particular EMI
principal - the amount that is due
The table:
id acc_id status s_id due_date principal
9999957 10003 PAID 102 2018-07-02 12:00:00 4205
9999958 10003 UNPAID 103 2018-08-02 12:00:00 4100
9999959 10003 UNPAID 104 2018-09-02 12:00:00 4266
9999960 10003 UNPAID 105 2018-10-02 12:00:00 4286
9999962 10004 PAID 106 2018-07-02 12:00:00 3200
9999963 10004 PAID 107 2018-08-02 12:00:00 3100
9999964 10004 UNPAID 108 2018-09-02 12:00:00 3266
9999965 10004 UNPAID 109 2018-10-02 12:00:00 3286
The use case -
The unpaid amount becomes delinquent (overdue) after the due_date.
So I need to calculate the delinquent amount at the end of every month, from the first due_date (in this case 2nd July) to the last due_date (assume it to be 2nd November, which is the current month).
I also need to calculate days past due at the end of each month.
Illustration from the above data:
From the sample data provided, no EMI is overdue at the end of July, so the delinquent amount is 0.
But at the end of August, id 9999958 is overdue: as of 31st August the delinquent amount is 4100 and days past due is 29 (31st August minus 2nd August).
The catch: I need to calculate this for the loan (acc_id) and not for the EMI.
To explain further, a first EMI will be 29 days past due in the first month and 59 days past due in the second month, while the second EMI will be 29 days past due in the second month. But I need this at loan level (acc_id).
Continuing the same example for 30th September, acc_id 10003 has been overdue since 2nd August, so as of 30th September the due amount is 8366 (4100 + 4266) and DPD (days_past_due) is 59 (29 + 30).
Also acc_id 10004 is due 3100 and DPD is 28 (30th September minus 2nd September).
The final output would be something like this:
Month_End DPD_Band Amount
2018/08/31 0-29 4100
2018/08/31 30-59 0
2018/08/31 60-89 0
2018/08/31 90+ 0
2018/09/30 0-29 3100
2018/09/30 30-59 8366
2018/09/30 60-89 0
2018/09/30 90+ 0
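One reading of the loan-level rule described above is: a loan's DPD at a month end is counted from its earliest unpaid due date, and its amount is the sum of all unpaid principal due before that month end. The sketch below applies that reading to the sample table in plain Python (times dropped from due_date); the helper names `loan_level` and `band` are hypothetical. Under this reading acc_id 10004 comes out at 3266 on 30th September, because the sample table lists 3266 for its unpaid September EMI (3100 is its paid August EMI), so the 3100 in the illustration may be a slip.

```python
from datetime import date

# (acc_id, status, due_date, principal) from the sample table, times dropped
emis = [
    (10003, "PAID", date(2018, 7, 2), 4205),
    (10003, "UNPAID", date(2018, 8, 2), 4100),
    (10003, "UNPAID", date(2018, 9, 2), 4266),
    (10003, "UNPAID", date(2018, 10, 2), 4286),
    (10004, "PAID", date(2018, 7, 2), 3200),
    (10004, "PAID", date(2018, 8, 2), 3100),
    (10004, "UNPAID", date(2018, 9, 2), 3266),
    (10004, "UNPAID", date(2018, 10, 2), 3286),
]

def loan_level(emis, month_end):
    """Return {acc_id: (days_past_due, amount)} as of month_end."""
    overdue = {}
    for acc_id, status, due, principal in emis:
        if status == "UNPAID" and due < month_end:
            first_due, amount = overdue.get(acc_id, (due, 0))
            overdue[acc_id] = (min(first_due, due), amount + principal)
    return {
        acc: ((month_end - first_due).days, amount)
        for acc, (first_due, amount) in overdue.items()
    }

def band(days):
    """Map days past due to the band labels used in the expected output."""
    low = min(days // 30, 3) * 30
    return "90+" if low == 90 else f"{low}-{low + 29}"

for month_end in (date(2018, 7, 31), date(2018, 8, 31), date(2018, 9, 30)):
    for acc, (dpd, amount) in sorted(loan_level(emis, month_end).items()):
        print(month_end, acc, band(dpd), amount)
```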
Query attempt: DPD bands can be created with CASE statements on delinquent days. Where I really need help is in first creating the month ends and then finding the portfolio-level amounts, as explained above, for the different delinquent-day bands.
Edited to be Redshift compatible after the OP clarified which RDBMS. (MySQL would need a different answer.)
The following creates one record for each month between your first record and the end of last month.
It then joins on to your unpaid records, and the aggregation chooses which bracket to put the results into.
WITH
first_month AS
(
    SELECT LAST_DAY(MIN(due_date)) AS end_date FROM yourTable
),
months AS
(
    SELECT
        LAST_DAY(ADD_MONTHS(first_month.end_date, s.id)) AS end_date
    FROM
        first_month
    CROSS JOIN
        generate_series(
            1,
            DATEDIFF(month, (SELECT end_date FROM first_month), CURRENT_DATE)
        )
        AS s(id)
),
monthly_delinquents AS
(
    SELECT
        yourTable.*,
        months.end_date AS month_end_date,
        DATEDIFF(DAY, yourTable.due_date, months.end_date) AS days_past_due
    FROM
        months
    LEFT JOIN
        yourTable
            ON  yourTable.status = 'UNPAID'
            AND yourTable.due_date < months.end_date
)
SELECT
    month_end_date,
    SUM(CASE WHEN days_past_due >= 00 AND days_past_due < 30 THEN principal ELSE 0 END) AS dpd_00_29,
    SUM(CASE WHEN days_past_due >= 30 AND days_past_due < 60 THEN principal ELSE 0 END) AS dpd_30_59,
    SUM(CASE WHEN days_past_due >= 60 AND days_past_due < 90 THEN principal ELSE 0 END) AS dpd_60_89,
    SUM(CASE WHEN days_past_due >= 90 THEN principal ELSE 0 END) AS dpd_90plus
FROM
    monthly_delinquents
GROUP BY
    month_end_date
ORDER BY
    month_end_date
That said, pivoting results like this is usually a bad idea. What happens when something is a year past due? It just sits in the 90plus category and never moves. And if you want to add bands, you need to change this query and every other query you ever write that depends on it.
Instead, you could normalise your output...
WITH
first_month AS
(
    SELECT LAST_DAY(MIN(due_date)) AS end_date FROM yourTable
),
months AS
(
    SELECT
        LAST_DAY(ADD_MONTHS(first_month.end_date, s.id)) AS end_date
    FROM
        first_month
    CROSS JOIN
        generate_series(
            1,
            DATEDIFF(month, (SELECT end_date FROM first_month), CURRENT_DATE)
        )
        AS s(id)
),
monthly_delinquents AS
(
    SELECT
        yourTable.*,
        months.end_date AS month_end_date,
        DATEDIFF(DAY, yourTable.due_date, months.end_date) AS days_past_due
    FROM
        months
    LEFT JOIN
        yourTable
            ON  yourTable.status = 'UNPAID'
            AND yourTable.due_date < months.end_date
)
SELECT
    month_end_date,
    (days_past_due / 30) * 30 AS days_past_due_band,
    SUM(principal) AS total_principal,
    COUNT(*) AS total_rows
FROM
    monthly_delinquents
GROUP BY
    month_end_date,
    (days_past_due / 30) * 30
ORDER BY
    month_end_date,
    (days_past_due / 30) * 30
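To see what the banding expression does, here is a small plain-Python sketch of the same steps on the sample rows: build the month ends, keep unpaid rows due before each month end, and bucket each row by `(days_past_due // 30) * 30`. The helper names are made up, and the fixed "as of" date of 15th September 2018 is an assumption standing in for CURRENT_DATE. Note that this buckets each EMI on its own days past due, as the queries above do, so on 30th September the 4100 row lands in band 30 while the 4266 and 3266 rows land in band 0.

```python
from collections import defaultdict
from datetime import date, timedelta

# (id, status, due_date, principal) from the sample table, times dropped
emis = [
    (9999957, "PAID", date(2018, 7, 2), 4205),
    (9999958, "UNPAID", date(2018, 8, 2), 4100),
    (9999959, "UNPAID", date(2018, 9, 2), 4266),
    (9999960, "UNPAID", date(2018, 10, 2), 4286),
    (9999962, "PAID", date(2018, 7, 2), 3200),
    (9999963, "PAID", date(2018, 8, 2), 3100),
    (9999964, "UNPAID", date(2018, 9, 2), 3266),
    (9999965, "UNPAID", date(2018, 10, 2), 3286),
]

def month_end(d):
    """Last day of d's month (what LAST_DAY returns)."""
    first_next = (d.replace(day=28) + timedelta(days=4)).replace(day=1)
    return first_next - timedelta(days=1)

def month_ends(first, as_of):
    """Month ends from the month after `first` through the month of `as_of`."""
    ends = []
    current = month_end(first)
    while True:
        current = month_end(current + timedelta(days=1))
        if current > month_end(as_of):
            break
        ends.append(current)
    return ends

def banded_totals(emis, ends):
    """Sum principal per (month_end, band), band = (dpd // 30) * 30."""
    totals = defaultdict(int)
    for end in ends:
        for _id, status, due, principal in emis:
            if status == "UNPAID" and due < end:
                dpd = (end - due).days
                totals[(end, (dpd // 30) * 30)] += principal
    return dict(totals)

first_due = min(due for _id, _status, due, _principal in emis)
ends = month_ends(first_due, date(2018, 9, 15))  # assumed "current date"
totals = banded_totals(emis, ends)
print(totals)
```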
I need to calculate a rank in MySQL. Suppose I have a list of the sums of my product sales for each month; I need to rank the products from the highest sales value down, i.e. 1, 2, 3, etc.
Month Product Sum of Sales
Jan Laptop 450000
Jan Laptop 150000
Jan Laptop 250000
Feb Desktop 200000
Feb Desktop 150000
Feb Desktop 180000
So from the above data the output would be:
Month Product Sum of Sales rank
Jan Laptop 450000 1
Jan Laptop 250000 2
Jan Laptop 150000 3
Feb Desktop 200000 1
Feb Desktop 180000 2
Feb Desktop 150000 3
You can do something like this:
SELECT month, product, sumOfSales, @curRank := @curRank + 1 AS `rank`
FROM products p, (
    SELECT @curRank := 0
) q
ORDER BY sumOfSales DESC;
I am assuming the table name is products and the column name is sumOfSales.
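The counter above numbers all rows in one sequence, whereas the requested output restarts at 1 for each month and product. The per-group logic can be sketched in plain Python as below; this is an illustration of the numbering only, with a made-up function name, and ties get distinct consecutive numbers. In MySQL 8.0 the equivalent would typically be a window function such as ROW_NUMBER() or RANK() with PARTITION BY.

```python
# Sample rows from the question: (month, product, sum_of_sales)
sales = [
    ("Jan", "Laptop", 450000),
    ("Jan", "Laptop", 150000),
    ("Jan", "Laptop", 250000),
    ("Feb", "Desktop", 200000),
    ("Feb", "Desktop", 150000),
    ("Feb", "Desktop", 180000),
]

def rank_within_group(rows):
    """Number rows 1, 2, 3... by descending sales within each (month, product)."""
    groups = {}
    for month, product, amount in rows:
        groups.setdefault((month, product), []).append(amount)

    ranked = []
    # Dicts keep insertion order, so groups appear in first-seen order.
    for (month, product), amounts in groups.items():
        for position, amount in enumerate(sorted(amounts, reverse=True), start=1):
            ranked.append((month, product, amount, position))
    return ranked

ranked = rank_within_group(sales)
for row in ranked:
    print(row)
```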
You can use this query:
SELECT sales FROM your_table ORDER BY sales DESC;
Here your_table stands for your table name, and sales is the column where the sum of sales is stored.
The query returns the record with the highest sales first, and so on.