Cumulative SQL to work out no of payers - sql-server-2008

Currently trying to create a query that shows how many accounts have paid month on month but on a cumulative basis (penetration). So as an example I have a table with Month paid and account number, which shows what month that account paid.
Month | AccountNo
Jan-14 | 123456
Feb-14 | 321654
So using the above the result set would show
Month | Payers
Jan-14 | 1
Feb-14 | 2
This is because one account paid in Jan and one in Feb, so by the end of Feb there have been 2 payments overall, but only 1 by the end of Jan. I tried a few inner joins back onto the table itself with t1.Month >= t2.Month, as I would for a normal cumulative query, but the result is always out.
Any questions, please ask; I'm unsure whether the above will be clear to anyone but me.

If you have a date column in the table, you can try the following query.
-- DISTINCT collapses the result to one row per month.
SELECT DISTINCT o.[Month]
,(SELECT COUNT(i.AccountNo)
FROM theTable i
-- Count everything up to the last second of the current row's month.
WHERE i.[Date] <= DATEADD(s,-1,DATEADD(mm, DATEDIFF(m,0,o.[Date])+1,0))) AS CumulativeCount
FROM theTable o

OK, several things. You need an actual date field, as you can't order by the month column you have.
You need to consider that there may be gaps in the months, i.e. some months with no payment (not sure whether that is true or not).
I'd recommend a recursive common table expression to do the actual aggregation.
Here's how it works out:
-- setup
DECLARE @t TABLE ([Month] NCHAR(6), AccountNo INT)
INSERT @t ( [Month], AccountNo )
VALUES ( 'Jan-14',123456),('Feb-14',456789),('Apr-14',567890)
-- assume no payments in march
; WITH
t2 AS -- get a date column we can sort on
(
SELECT [Month],
CONVERT(DATETIME, '01 ' + REPLACE([Month], '-',' '), 6) AS MonthStart,
AccountNo
FROM @t
),
t3 AS -- group by to get the number of payments in each month
(
SELECT [Month], MonthStart, COUNT(1) AS PaymentCount FROM t2
GROUP BY t2.[Month], t2.MonthStart
),
t4 AS -- get a row number column to order by (accounting for gaps)
(
SELECT [Month], MonthStart, PaymentCount,
ROW_NUMBER() OVER (ORDER BY MonthStart) AS rn FROM t3
),
t5 AS -- recursive common table expression to aggregate subsequent rows
(
SELECT [Month], MonthStart, PaymentCount AS CumulativePaymentCount, rn
FROM t4 WHERE rn = 1
UNION ALL
SELECT t4.[Month], t4.MonthStart,
t4.PaymentCount + t5.CumulativePaymentCount AS CumulativePaymentCount, t4.rn
FROM t5 JOIN t4 ON t5.rn + 1 = t4.rn
)
SELECT [Month], CumulativePaymentCount FROM t5 -- select desired results
and the results...
Month CumulativePaymentCount
Jan-14 1
Feb-14 2
Apr-14 3

If your month column is a date type then it's easy to work with; otherwise you need some additional conversion. Here is the query:
create table example (
MONTHS datetime,
AccountNo INT
)
GO
insert into example values ('01/Jan/2009',300345)
insert into example values ('01/Feb/2009',300346)
insert into example values ('01/Feb/2009',300347)
insert into example values ('01/Mar/2009',300348)
insert into example values ('01/Feb/2009',300349)
insert into example values ('01/Mar/2009',300350)
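-- Compare on the first day of the month rather than on the month number alone,
-- so the count stays correct when the data spans more than one year.
SELECT DISTINCT YEAR(a.MONTHS) AS Yr, MONTH(a.MONTHS) AS Mth,
(SELECT COUNT(b.AccountNo)
FROM example b
WHERE DATEADD(m, DATEDIFF(m, 0, b.MONTHS), 0) <= DATEADD(m, DATEDIFF(m, 0, a.MONTHS), 0)) AS Total
FROM example a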
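For comparison, here is a minimal non-recursive sketch for SQL Server 2008 that works directly on the text month column from the question. The table variable @Payments and the CTE names are hypothetical, and the conversion assumes the 'Mon-yy' format with English month names. It counts payments rather than distinct accounts, so an account that pays in two months is counted twice.

```sql
-- Hypothetical table variable; the rows mirror the question's sample.
DECLARE @Payments TABLE ([Month] CHAR(6), AccountNo INT);
INSERT @Payments ([Month], AccountNo)
VALUES ('Jan-14', 123456), ('Feb-14', 321654);

WITH Dated AS
(   -- turn 'Jan-14' into a real date (style 6 = dd mon yy) so months sort correctly
    SELECT [Month],
           CONVERT(DATETIME, '01 ' + REPLACE([Month], '-', ' '), 6) AS MonthStart
    FROM @Payments
),
Monthly AS
(   -- one row per month with that month's payment count
    SELECT [Month], MonthStart, COUNT(*) AS Payers
    FROM Dated
    GROUP BY [Month], MonthStart
)
-- join each month to itself and every earlier month, then add up the counts
SELECT cur.[Month], SUM(prev.Payers) AS CumulativePayers
FROM Monthly cur
JOIN Monthly prev ON prev.MonthStart <= cur.MonthStart
GROUP BY cur.[Month], cur.MonthStart
ORDER BY cur.MonthStart;
```

Comparing on the converted date instead of the text column is probably what the original self-join was missing, since 'Feb-14' sorts before 'Jan-14' as a string.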

Related

Finding the month and year with Max sales in SQL

I have a table with the following columns -
ID, Year, Month, Sales
The data is in long format. So, for 10 unique IDs and 5 years of data I will have 10(IDs) * 5(Years) * 12(Months) = 600 rows
I want to extract the following information -
Find out the Year and Month in which the sales was maximum for each ID
Find out the Year in which there was maximum sales for each ID
What should be the query in SQL. I use MySQL 5.6
Since you are using MySQL 5.6, window functions are not available. Instead, you can use a subquery in the WHERE clause to get the desired result:
create table yourtable (ID int, Year int, Month int, Sales int);
insert into yourtable values(1,2020,1,119);
insert into yourtable values(2,2020,1,105);
insert into yourtable values(1,2020,2,110);
insert into yourtable values(1,2021,1,120);
Query#1
select id, year, month, sales from yourtable a
where sales= (select max(sales) from yourtable b where a.id=b.id)
Output:
id | year | month | sales
2 | 2020 | 1 | 105
1 | 2021 | 1 | 120
Query#2:
select id,year,sum(sales)from yourtable a
group by id,year
having sum(sales)=(
select sum(sales) from yourtable b where a.id=b.id
group by id,year
order by sum(sales)
desc limit 1
)
Output:
id | year | sum(sales)
1 | 2020 | 229
2 | 2020 | 105
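As an alternative to the correlated subquery in Query#1, the same rows can be found by joining to a derived table that holds each ID's maximum. This is only a sketch; the table name sales_demo and the alias max_sales are made up for illustration, and it avoids window functions so it should run on MySQL 5.6.

```sql
-- Hypothetical demo table with the same shape and rows as the answer's sample.
CREATE TABLE sales_demo (ID INT, Year INT, Month INT, Sales INT);
INSERT INTO sales_demo VALUES (1, 2020, 1, 119), (2, 2020, 1, 105),
                              (1, 2020, 2, 110), (1, 2021, 1, 120);

-- One row per ID with its highest monthly sales, joined back to find the year and month.
SELECT a.ID, a.Year, a.Month, a.Sales
FROM sales_demo a
JOIN (SELECT ID, MAX(Sales) AS max_sales
      FROM sales_demo
      GROUP BY ID) m
  ON m.ID = a.ID
 AND m.max_sales = a.Sales;
```

Like the subquery version, this returns more than one row for an ID when several months tie for the maximum.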

SQL Find the average of 3 day closest

I have an SQL structure like this:
Create Table Transactions (
Id integer primary key not null auto_increment,
ResourceId varchar(255),
Price Integer,
TransactionTime date
);
I would like to get the time (TransactionTime) along with the average of 3 days price. For example, the 3 day average of the 22nd will be the average of the 20th, 21st, and 22nd.
Thanks so much.
Presumably, you want this information on each row and for a given resource. If so:
select t.*,
avg(price) over (partition by resourceid
order by transactiontime
range between interval 2 day preceding and current row
) as avg_3
from transactions t;
For SQL server:
SELECT AVG(Price), MAX(TransactionTime) FROM Transactions GROUP BY FLOOR(DATEDIFF(DAY, GETDATE(), TransactionTime) / 3);
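You can use a nested select:
select t.TransactionTime,
-- dividing by 3 assumes exactly one transaction on each of the three days
(select sum(t1.Price) / 3
from Transactions t1
where t1.TransactionTime between t.TransactionTime - interval 2 day and t.TransactionTime) as avg3
from Transactions t;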

Aggregating data to get a running total month on month

I have a table which holds the below data
The issue I'm having is that I need a running total for each month. I've managed to create this in an Excel sheet pretty easily, but when I try anything in SQL the result varies.
The image below shows the sum of each paid amount by month, then a total of each one added onto it. I've edited Excel to show the formula and the result of the formula. I also have the result I get from SQL 2008 when using the following (example only):
( select sum(paid) from #tmp005 t2 where t2.[monthid] <=
t5.[monthid] ) as paid
***UPDATE - The result set I'm trying to achieve (the one in the Excel document) is, for example, month 117 + month 118 gives the month 118 TotalToDate, then month 118 + month 119 gives the month 119 TotalToDate.
Not sure how else to explain this.
I really feel that this is less complicated than I think!
As I understand it, you are trying to get a running total month by month. The CTE below should do what you want.
--create table #temp (M_ID Int, Paid Float)
--Insert Into #temp VALUES (116, '50.00'), (117, '50.00'),(117, '5.00'),(117, '20.00'),(117, '10.00'),(117, '75.40'),(118, '125.00'),(118, '200.00'),(118, '5.00')
;WITH y AS
(
SELECT M_ID, paid, rn = ROW_NUMBER() OVER (ORDER BY M_ID)
FROM #temp
), x AS
(
SELECT M_ID, rn, paid, rt = paid
FROM y
WHERE rn = 1
UNION ALL
SELECT y.M_ID, y.rn, y.paid, x.rt + y.paid
FROM x INNER JOIN y
ON y.rn = x.rn + 1
)
SELECT M_ID, MAX(rt) as RunningTotal
FROM x
Group By M_ID
OPTION (MAXRECURSION 10000);
It is based on the first 3 M_ID values of your sample data; just swap #temp for your own table. I didn't know whether you had another unique identifier in the table, which is why I had to use ROW_NUMBER(), but this should order it correctly based on the M_ID field.
I guess that you are storing the month in a separate table and using M_ID to reference it. So, to get the sum of each month, do this:
SELECT [M_ID]
,sum([Paid])
FROM #tmp005
GROUP BY [M_ID]
I think I'd use a correlated sub query:-
select r.m_id,
(
select sum(csq.paid)
from #tmp005 csq
where csq.m_id<=r.m_id
)
from (
select distinct m_id
from #tmp005
) r
Hopefully you can figure out how to apply it to your circumstance/schema.
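For readers who are not tied to SQL Server 2008: from SQL Server 2012 onward, SUM() accepts an ORDER BY inside the OVER clause, which makes the running total a single pass. The sketch below is an assumption-laden illustration, not something from the thread; the temp table #paid_demo and its rows are hypothetical.

```sql
-- Requires SQL Server 2012 or later; this does not run on SQL Server 2008.
CREATE TABLE #paid_demo (M_ID INT, Paid DECIMAL(10, 2));
INSERT INTO #paid_demo VALUES (116, 50.00), (117, 50.00), (117, 5.00), (118, 125.00);

SELECT M_ID,
       SUM(Paid) AS MonthTotal,
       -- the inner SUM is the per-month total, the outer SUM accumulates it in M_ID order
       SUM(SUM(Paid)) OVER (ORDER BY M_ID ROWS UNBOUNDED PRECEDING) AS TotalToDate
FROM #paid_demo
GROUP BY M_ID
ORDER BY M_ID;
-- TotalToDate: 50.00, 105.00, 230.00
```

On 2008 itself, the correlated subquery or recursive CTE shown in the answers remains the way to go.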

T-SQL: Find the last occurrence of an aggregate state before today

I have a table with inventory transactions. A simplified example:
--Inventory Transactions
Date | Sold | Purchased | Balance (not in table)
Today | 1 | | -5
Yesterday | 6 | | -4
5 days ago | | 5 | +2
10 days ago | 103 | | -3
20 days ago | | 100 | +100
Requirements indicate that a report should contain the day since which an article has had a negative balance (stockout). In the example above the answer would be yesterday.
I'm trying to translate this into SQL but I'm having some trouble. I have tried using a CTE:
with Stockouts as (
select getdate() as [Date],
(calculation) as Balance
from [Inventory Transactions]
--some constraints to get the correct article are omitted
union all
select dateadd(dd, -1, Stockouts.[Date]) as [Date],
Stockouts.Balance - (calculation) as Balance
from [Inventory Transactions]
inner join Stockouts
)
But there is the problem that I cannot use a subquery in the recursive part (to find the last transaction before the current one) and an inner join will stop looping when there is no transaction on a certain date (so the dateadd part will fail as well).
What would be the best approach to solve this issue?
I think the best approach is to use OUTER APPLY like so:
DECLARE @InventoryTransactions TABLE ([Date] DATE, Sold INT, Purchased INT)
INSERT @InventoryTransactions VALUES
('20120504', 1, 0),
('20120503', 6, 0),
('20120501', 0, 5),
('20120425', 103, 0),
('20120415', 0, 100)
SELECT trans.Date,
trans.Sold,
trans.Purchased,
ISNULL(Balance, 0) [BalanceIn],
ISNULL(Balance, 0) + (Purchased - Sold) [BalanceOut]
FROM @InventoryTransactions trans
OUTER APPLY
( SELECT SUM(Purchased - Sold) [Balance]
FROM @InventoryTransactions bal
WHERE Bal.Date < trans.Date
) bal
Your approach is not well suited to recursion. If you require all dates then it would be best to create a date table and LEFT JOIN the results from the above to the table containing all dates. It is probably best to have a permanent table of dates (something like dbo.Calendar), as they are usable in a number of situations, but you can always create a temporary one using loops, a CTE, or system views. The question of how to generate a list of incrementing dates has been answered before.
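As a rough sketch of the "all dates" idea, the batch below generates the date list with a small recursive CTE and looks up the closing balance for every day, including days without a transaction. The variable names @StartDate and @EndDate and the CTE name Dates are made up for this illustration; a permanent calendar table would replace the CTE in practice.

```sql
DECLARE @InventoryTransactions TABLE ([Date] DATE, Sold INT, Purchased INT);
INSERT @InventoryTransactions VALUES
('20120504', 1, 0), ('20120503', 6, 0), ('20120501', 0, 5),
('20120425', 103, 0), ('20120415', 0, 100);

-- Hypothetical reporting window covering the sample rows.
DECLARE @StartDate DATE = '20120415', @EndDate DATE = '20120504';

WITH Dates AS
(   -- one row per calendar day between the two dates
    SELECT @StartDate AS [Date]
    UNION ALL
    SELECT DATEADD(DAY, 1, [Date]) FROM Dates WHERE [Date] < @EndDate
)
SELECT d.[Date],
       ISNULL(bal.Balance, 0) AS BalanceOut   -- balance at the end of that day
FROM Dates d
OUTER APPLY
(   SELECT SUM(t.Purchased - t.Sold) AS Balance
    FROM @InventoryTransactions t
    WHERE t.[Date] <= d.[Date]
) bal
ORDER BY d.[Date]
OPTION (MAXRECURSION 0);   -- lift the default limit of 100 recursion levels for longer ranges
```

The recursion here only produces the dates; the balance itself still comes from OUTER APPLY, so days with no transactions simply carry the previous balance forward.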
EDIT
Just re-read your requirements and I think this is a better approach to get what you actually want (uses the same sample data).
;WITH Transactions AS
( SELECT trans.Date,
trans.Sold,
trans.Purchased,
ISNULL(Balance, 0) [BalanceIn],
ISNULL(Balance, 0) + (Purchased - Sold) [BalanceOut]
FROM @InventoryTransactions trans
OUTER APPLY
( SELECT SUM(Purchased - Sold) [Balance]
FROM @InventoryTransactions bal
WHERE Bal.Date < trans.Date
) bal
)
SELECT DATEDIFF(DAY, MAX(Date), CURRENT_TIMESTAMP) [Days Since Negative Balance]
FROM Transactions
WHERE BalanceIn > 0
EDIT 2
I have created an SQL Fiddle to demonstrate the difference in query plans between OUTER APPLY and recursion. You can see that the CTE is far more work: when running the two in the same batch on my local machine, the OUTER APPLY method has a relative batch cost of 17%, less than a quarter of the 83% taken up by the recursive CTE method.
If you want to do it with a recursive CTE, this could be a suggestion:
Test data:
DECLARE @T TABLE(Date DATETIME,Sold INT, Purchased INT)
INSERT INTO @T
VALUES
(GETDATE(),1,NULL),
(GETDATE()-1,6,NULL),
(GETDATE()-5,NULL,5),
(GETDATE()-10,103,NULL),
(GETDATE()-20,NULL,100)
Query
;WITH CTE
AS
(
SELECT ROW_NUMBER() OVER(ORDER BY Date ASC) AS RowNbr, t.* FROM @T AS T
)
, CTE2
AS
(
SELECT
CTE.RowNbr,
CTE.Date,
CTE.Sold,
CTE.Purchased,
(ISNULL(CTE.Purchased,0)-ISNULL(CTE.Sold,0)) AS Balance
FROM
CTE
WHERE
CTE.RowNbr=1
UNION ALL
SELECT
CTE.RowNbr,
CTE.Date,
CTE.Sold,
CTE.Purchased,
CTE2.Balance+ISNULL(CTE.Purchased,0)-ISNULL(CTE.Sold,0) AS Balance
FROM
CTE
JOIN CTE2
ON CTE.RowNbr=CTE2.RowNbr+1
)
SELECT * FROM CTE2 ORDER BY CTE2.RowNbr DESC
Output
5 2012-05-04 11:49:45.497 1 NULL -5
4 2012-05-03 11:49:45.497 6 NULL -4
3 2012-04-29 11:49:45.497 NULL 5 2
2 2012-04-24 11:49:45.497 103 NULL -3
1 2012-04-14 11:49:45.497 NULL 100 100

Running total over date range - fill in the missing dates

I have the following table.
DATE | AMT
10/10 | 300
12/10 | 300
01/11 | 200
03/11 | 100
How do I get the monthly total? A result like -
DATE | TOT
1010 | 300
1110 | 300
1210 | 600
0111 | 800
0211 | 800
0311 | 900
A sql statement like
SELECT SUM(AMT) FROM TABLE1 WHERE DATE BETWEEN '1010' AND '0111'
would result in the 800 for 0111 but...
NOTE: There is no date restriction, which is my dilemma. How do I populate this column without looping over all dates, and have the missing months displayed as well?
To cater for missing months, create a template table to join against.
Think of it as caching. Rather than looping through and filling gaps, just have a calendar cached in your database.
You can even combine multiple calendars (start of month, start of week, bank holidays, working day, etc) all into one table, with a bunch of search flags and indexes.
You end up with something like...
SELECT
calendar.date,
SUM(data.amt)
FROM
calendar
LEFT JOIN
data
ON data.date >= calendar.date
AND data.date < calendar.date + INTERVAL 1 MONTH
WHERE
calendar.date >= '20110101'
AND calendar.date < '20120101'
GROUP BY
calendar.date
EDIT
I just noticed that the OP wants a running total.
This is possible in SQL, but it is extremely inefficient, because the result for one month isn't reused to calculate the following month. Instead, the whole running total has to be calculated again for every row.
For this reason it is normally strongly recommended that you calculate the monthly total as above, then use your application to iterate through the rows and build the running-total values.
If you really must do it in SQL, it would be something like...
SELECT
calendar.date,
SUM(data.amt)
FROM
calendar
LEFT JOIN
data
ON data.date >= @yourFirstDate
AND data.date < calendar.date + INTERVAL 1 MONTH
WHERE
calendar.date >= @yourFirstDate
AND calendar.date < @yourLastDate
GROUP BY
calendar.date
The main problem is the "and have the missing months displayed as well?" part.
I don't see how to do it without an auxiliary table containing the month/year combinations to be displayed:
create table table1(
date datetime,
amt int
)
insert into table1 values ('10/10/2010',100)
insert into table1 values ('12/12/2010',200)
insert into table1 values ('01/01/2011',50)
insert into table1 values ('03/03/2011',500)
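-- drop the temp table if it is left over from an earlier run
IF OBJECT_ID('tempdb..#dates') IS NOT NULL DROP TABLE #dates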
create table #dates(
_month int,
_year int
)
insert into #dates values(10,2010)
insert into #dates values(11,2010) --missing month
insert into #dates values(12,2010)
insert into #dates values(01,2011)
insert into #dates values(02,2011)--missing month
insert into #dates values(03,2011)
select D._month, D._year, sum(amt)
from #dates D left join TABLE1 T on D._month=month(T.date) and D._year=year(T.date)
group by D._month, D._year
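The query above returns each month's own total. To get the running total the question asks for, with the missing months carried forward, one option is a correlated subquery against the same kind of month list. This is a sketch in T-SQL under assumptions: the table variables @amounts and @months are hypothetical stand-ins for table1 and #dates, and the date literals use the unambiguous yyyymmdd form.

```sql
DECLARE @amounts TABLE ([date] DATETIME, amt INT);
INSERT @amounts VALUES ('20101010', 100), ('20101212', 200),
                       ('20110101', 50), ('20110303', 500);

DECLARE @months TABLE (_month INT, _year INT);
INSERT @months VALUES (10, 2010), (11, 2010), (12, 2010),
                      (1, 2011), (2, 2011), (3, 2011);

SELECT m._month, m._year,
       -- year * 12 + month gives a single number that orders months across years
       (SELECT ISNULL(SUM(a.amt), 0)
        FROM @amounts a
        WHERE YEAR(a.[date]) * 12 + MONTH(a.[date]) <= m._year * 12 + m._month) AS running_total
FROM @months m
ORDER BY m._year, m._month;
-- running_total: 100, 100, 300, 350, 350, 850
```

The subquery rescans the amounts for every month, so this suits small tables; for larger ones the earlier advice about accumulating in the application still applies.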
You can also generate a range on the fly, pass its value as the interval to DATE_ADD, and basically project a sequence of month values.
As @Dems said, you need a correlated subquery to calculate the running total, which will be very inefficient because it runs a nested loop internally.
To see how to generate the sequence, check my post here:
How to generate a range of numbers in Mysql
The end query should look something like this: (Incidentally, you should have a date column, not this varchar mess).
/*NOTE: This assumes a derived table (inline view) containing the sequence of date values and their corresponding TOT value*/
SELECT
DATEVALUES.DateValue,
(
SELECT SUM(TABLE1.AMT) FROM TABLE1 WHERE TABLE1.DateValue <= DATEVALUES.DateValue
) AS RunningSubTotal
FROM
DATEVALUES
Or something like that.
select sum(AMT) from TABLE1 group by Date