SQL Query issue with joining data don't exist in table [duplicate] - mysql

I've got a SQL Server CE 3.5 table (Transactions) with the following Schema:
ID
Transaction_Date
Category
Description
Amount
Query:
SELECT Transaction_Date, SUM(Amount)
FROM Transactions
GROUP BY Transaction_Date;
I'm trying to do a SUM(Amount) grouped by transaction_date to get the total amount for each day, but I also want rows back for days with no transactions. The record for such a day should just show $0.00 as the amount.
Thanks for the help!

You need a Calendar table to select over the dates. Alternatively, if you have a Numbers table, you could turn that effectively into a Calendar table. Basically, it's just a table with every date in it. It's easy enough to build and generate the data for it and it comes in very handy for these situations. Then you would simply use:
SELECT
C.calendar_date,
COALESCE(SUM(T.amount), 0) AS total_amount
FROM
Calendar C
LEFT OUTER JOIN Transactions T ON
T.transaction_date = C.calendar_date
GROUP BY
C.calendar_date
ORDER BY
C.calendar_date
A few things to keep in mind:
If you're sending this to a front-end or reporting engine then you should just send the dates that you have (your original query) and have the front end fill in the $0.00 days itself if that's possible.
Also, I've assumed here that the date is an exact date value with no time component (hence the "=" in the join). Your calendar table could include a "start_time" and "end_time" so that you can use BETWEEN for working with dates that include a time portion. That saves you from having to strip off time portions and potentially ruining index usage. You could also just calculate the start and end points of the day when you use it, but since it's a prefilled work table it's easier IMO to include a start_time and end_time.
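For the first point (letting the front end fill in the $0.00 days), here is a minimal sketch in Python. It assumes the original GROUP BY query has already been run and its rows loaded into a dict keyed by date; the function name fill_missing_days and the sample values are hypothetical, not from the question.

```python
from datetime import date, timedelta
from decimal import Decimal


def fill_missing_days(totals, start, end):
    """Return one (day, total) pair per day in [start, end].

    Days missing from `totals` get 0.00, which is what the query
    cannot produce on its own without a calendar table.
    """
    filled = []
    day = start
    while day <= end:
        filled.append((day, totals.get(day, Decimal("0.00"))))
        day += timedelta(days=1)
    return filled


# Hypothetical result of the original GROUP BY query, keyed by date.
query_totals = {
    date(2010, 1, 1): Decimal("12.50"),
    date(2010, 1, 3): Decimal("7.25"),
}

daily = fill_missing_days(query_totals, date(2010, 1, 1), date(2010, 1, 4))
for day, total in daily:
    print(day.isoformat(), total)
```

The database then only returns the days that actually have data, and the date range is decided by the caller rather than by a table.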

You'll need to upper and lower bound your statement somehow, but perhaps this will help.
DECLARE @Start smalldatetime, @End smalldatetime
SELECT @Start = 'Jan 1 2010', @End = 'Jan 18 2010';
--- make a CTE of range of dates we're interested in
WITH Cal AS (
SELECT CalDate = convert(datetime, @Start)
UNION ALL
SELECT CalDate = dateadd(d,1,convert(datetime, CalDate)) FROM Cal WHERE CalDate < @End
)
)
SELECT CalDate AS TransactionDate, ISNULL(SUM(Amount),0) AS TransactionAmount
FROM Cal AS C
LEFT JOIN Transactions AS T On C.CalDate = T.Transaction_Date
GROUP BY CalDate ;
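If you want to try the recursive-CTE idea without a SQL Server instance, here is a rough sketch that runs the same pattern against an in-memory SQLite database from Python. This is SQLite syntax, not T-SQL or SQL Server CE (date(x, '+1 day') stands in for DATEADD, COALESCE for ISNULL), and the sample rows are made up; dates are assumed to be stored as ISO 'YYYY-MM-DD' text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transactions (Transaction_Date TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?)",
    [("2010-01-01", 10.0), ("2010-01-01", 5.0), ("2010-01-03", 2.5)],
)

sql = """
WITH RECURSIVE Cal(CalDate) AS (
    SELECT :start                       -- first day of the range
    UNION ALL
    SELECT date(CalDate, '+1 day')      -- keep adding a day ...
    FROM Cal
    WHERE CalDate < :end                -- ... until the last day is reached
)
SELECT C.CalDate, COALESCE(SUM(T.Amount), 0) AS TransactionAmount
FROM Cal AS C
LEFT JOIN Transactions AS T ON T.Transaction_Date = C.CalDate
GROUP BY C.CalDate
ORDER BY C.CalDate
"""

rows = conn.execute(sql, {"start": "2010-01-01", "end": "2010-01-04"}).fetchall()
for cal_date, amount in rows:
    print(cal_date, amount)
```

Every day in the range comes back exactly once, and days without transactions show 0 because the LEFT JOIN keeps the calendar row and COALESCE replaces the NULL sum.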

Once you have a Calendar table (more on that later) you can then do a left join on the range of your data to fill in missing dates:
SELECT c.CalendarDate, COALESCE(SUM(t.Amount), 0)
FROM (SELECT CalendarDate FROM Calendar
WHERE CalendarDate >= (SELECT MIN(Transaction_Date) FROM Transactions) AND
CalendarDate <= (SELECT MAX(Transaction_Date) FROM Transactions)) c
LEFT JOIN
Transactions t ON t.Transaction_Date = c.CalendarDate
GROUP BY c.CalendarDate
To create a calendar table, you can use a CTE:
WITH CalendarTable
AS
(
SELECT CAST('20090601' as datetime) AS [date]
UNION ALL
SELECT DATEADD(dd, 1, [date])
FROM CalendarTable
WHERE DATEADD(dd, 1, [date]) <= '20090630' /* last date */
)
SELECT [date] FROM CalendarTable
OPTION (MAXRECURSION 0);
Combining the two, we have
WITH CalendarTable
AS
(
SELECT MIN(Transaction_Date) AS [date] FROM Transactions
UNION ALL
SELECT DATEADD(dd, 1, [date])
FROM CalendarTable
WHERE DATEADD(dd, 1, [date]) <= (SELECT MAX(Transaction_Date) FROM Transactions)
)
SELECT c.[date], COALESCE(SUM(t.Amount), 0)
FROM CalendarTable c
LEFT JOIN
Transactions t ON t.Transaction_Date = c.[date]
GROUP BY c.[date]
OPTION (MAXRECURSION 0);

Not sure if any of this works with CE.
With common table expressions
DECLARE @StartDate DATETIME
DECLARE @EndDate DATETIME
SET @StartDate = '2010-07-10'
SET @EndDate = '2010-07-20'
;WITH Dates AS (
SELECT @StartDate AS DateValue
UNION ALL
SELECT DateValue + 1
FROM Dates
WHERE DateValue + 1 <= @EndDate
)
)
SELECT Dates.DateValue, ISNULL(SUM(Transactions.Amount), 0)
FROM Dates
LEFT JOIN Transactions ON
Dates.DateValue = Transactions.Transaction_Date
GROUP BY Dates.DateValue;
With loop + temporary table
DECLARE @StartDate DATETIME
DECLARE @EndDate DATETIME
SET @StartDate = '2010-07-10'
SET @EndDate = '2010-07-20'
SELECT @StartDate AS DateValue INTO #Dates
-- use < so the loop stops after inserting @EndDate instead of one day past it
WHILE @StartDate < @EndDate
BEGIN
SET @StartDate = @StartDate + 1
INSERT INTO #Dates VALUES (@StartDate)
END
SELECT Dates.DateValue, ISNULL(SUM(Transactions.Amount), 0)
FROM #Dates AS Dates
LEFT JOIN Transactions ON
Dates.DateValue = Transactions.Transaction_Date
GROUP BY Dates.DateValue;
DROP TABLE #Dates

If you want dates that don't have transactions to appear,
you can add a dummy transaction for each day with an amount of zero.
It won't interfere with SUM and would do what you want.

Related

Expand rows to contain monthly rows between 2 dates

I have a table 'positions' with the following columns:
- id
- role
- startdate
- enddate
I need to generate a monthly series using data from another table. Is there a way to expand each row of the table 'positions' so that each role becomes n rows, one per month between start date and end date?
For example, Row 1 containing:
(0001, 'Salesperson', '2020-01', '2020-05')
I need to expand into something like:
(0001, 'Salesperson', '2020-01')
(0001, 'Salesperson', '2020-02')
(0001, 'Salesperson', '2020-03')
(0001, 'Salesperson', '2020-04')
(0001, 'Salesperson', '2020-05')
Thanks!
I tried iterating through the table but haven't been able to get the result.
Your question is tagged with 2 Databases. I am using SQL Server to answer. You can try something like this:
SQL Fiddle DEMO
DECLARE @StartDate DATETIME = '2018-01-01 00:00:00.000'; -- this can be any date below the minimum StartDate
WITH tt AS (
SELECT '0001' AS EmployeeID, '2020-01-01 00:00:00.000' AS StartDate, '2020-05-31 00:00:00.000' AS EndDate
),
MyTable AS
(
SELECT @StartDate AS myDate
UNION ALL
SELECT DATEADD(Month,1,myDate)
FROM MyTable
WHERE DATEADD(Month,1,myDate) <= '2020-05-31 00:00:00.000'
)
SELECT
EmpId.EmployeeID,
CONVERT(VARCHAR(7), a.myDate, 126)
FROM
MyTable a
INNER JOIN
(
SELECT EmployeeID, MIN(StartDate) MinStartDate
FROM tt
GROUP BY EmployeeID
) EmpId ON
a.MyDate >= EmpId.MinStartDate
LEFT JOIN
tt ON
EmpId.EmployeeID = tt.EmployeeID AND
a.myDate >= tt.StartDate AND
a.myDate <= ISNULL(tt.EndDate, GETDATE())
ORDER BY EmpId.EmployeeID DESC, a.MyDate
OPTION (MAXRECURSION 0)
Also I have assumed that your startDate and EndDate are in Date format inside your table.
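An alternative that avoids the fixed start date is to let the recursion run per row, carrying each row's own end month along. Below is a minimal sketch of that idea, run against in-memory SQLite from Python rather than SQL Server. It assumes startdate and enddate are stored as 'YYYY-MM' text as in the question's example; the second sample row and the CTE column names (month_start, last_month) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions (id TEXT, role TEXT, startdate TEXT, enddate TEXT)"
)
conn.executemany(
    "INSERT INTO positions VALUES (?, ?, ?, ?)",
    [
        ("0001", "Salesperson", "2020-01", "2020-05"),
        ("0002", "Manager", "2019-12", "2020-01"),  # hypothetical second row
    ],
)

sql = """
WITH RECURSIVE months(id, role, month_start, last_month) AS (
    -- anchor: one row per position, at the first day of its start month
    SELECT id, role, startdate || '-01', enddate || '-01'
    FROM positions
    UNION ALL
    -- step: move each position forward one month until its end month
    SELECT id, role, date(month_start, '+1 month'), last_month
    FROM months
    WHERE month_start < last_month
)
SELECT id, role, substr(month_start, 1, 7) AS month
FROM months
ORDER BY id, month
"""

rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Because the end month travels with each row, positions with different ranges expand independently and no rows are generated outside a position's own range.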

Calculate churn rate using a query instead of a stored procedure

I am writing queries for some KPIs (Key Performance Indicators) to track user engagement. One such KPI is "Churn Rate", which I am calculating for a given month by:
Churn rate = (Total users deleted in month)/(Total users on the 1st of month)
I am using a users table with the following columns:
created_at, deleted_at
My process is to get all relevant months of user activity (in this case, based on "created_at" column, since we are getting several new users per month. We also have an activity log table which might technically be more accurate to use but doesn't go back as far) and then loop over them in a stored procedure. For each month, I'm calculating who was deleted that month and who was active on the first of that month (created on or before the 1st of the month and either not deleted or deleted after the first of that month). Then I'm dividing them to find churn rate and inserting into a temporary table. Here is my stored procedure:
DROP PROCEDURE ChurnRate;
DELIMITER $$
CREATE PROCEDURE ChurnRate()
BEGIN
DECLARE start_date DATETIME;
DECLARE end_date DATETIME;
DECLARE cur_date DATETIME;
DECLARE current_month VARCHAR(255);
DECLARE end_month VARCHAR(255);
DECLARE deleted_count BIGINT;
DECLARE active_user_count BIGINT;
DECLARE churn_rate FLOAT;
SELECT created_at FROM users ORDER BY created_at ASC LIMIT 1 INTO start_date;
SELECT created_at FROM users ORDER BY created_at DESC LIMIT 1 INTO end_date;
SET cur_date = start_date;
SET current_month = SUBSTR(cur_date,1,7);
SET end_month = SUBSTR(end_date,1,7);
DROP TEMPORARY TABLE IF EXISTS churn_table;
CREATE TEMPORARY TABLE churn_table
(
user_month VARCHAR(255),
deleted_count BIGINT,
active_user_count BIGINT,
churn_rate FLOAT
);
loop_label: LOOP
SELECT COUNT(U.id) FROM users AS U WHERE SUBSTR(U.deleted_at,1,7) = current_month INTO deleted_count;
SELECT COUNT(U.id) FROM users AS U
WHERE (U.deleted_at >= DATE_ADD(DATE_ADD(LAST_DAY(cur_date),INTERVAL 1 DAY),INTERVAL -1 MONTH) OR U.deleted_at IS NULL)
AND SUBSTR(U.created_at,1,7) <= current_month
INTO active_user_count;
INSERT INTO churn_table (user_month, deleted_count, active_user_count, churn_rate) VALUES (current_month, deleted_count, active_user_count, (deleted_count/active_user_count));
SET cur_date = DATE_ADD(cur_date, INTERVAL 1 MONTH);
SET current_month = SUBSTR(cur_date,1,7);
IF current_month <= end_month THEN
ITERATE loop_label;
END IF;
LEAVE loop_label;
END LOOP;
SELECT * FROM churn_table;
END$$
DELIMITER ;
CALL ChurnRate();
Here is a sample of some data that was produced:
user_month  churn_rate_percentage
2019-12     0
2020-01     0.0396982
2020-02     0
2020-03     0
2020-04     0
2020-05     0.112116
2020-06     0.59691
2020-07     0.26689
2020-08     0.144374
2020-09     0.141767
2020-10     0.125
2020-11     0.272904
2020-12     0.14937
My problem is this: I am using an API that requires this to be a select query. I have previously tried writing select queries for this, but they have been flawed. Grouping by "deleted_at" will not work because we will not show months for which no users have been deleted. Grouping by "created_at" and using subqueries ends up being extremely slow, as we have about 50k users. Is there a clean, efficient way to write this as a select query without affecting performance?
If there is not, I will have to write a cron job to run this procedure and export the data.
Thank you
You shouldn't use loops in SQL; they are often an indication that you are doing something wrong.
Here is how to do this in a single query:
-- recursive CTE to create list of months of interest
with RECURSIVE base_months(d, y, m) AS
(
-- anchor row: first day of the month of the earliest created_at
SELECT CAST(DATE_FORMAT(min(created_at), '%Y-%m-01') AS DATE),
year(min(created_at)), month(min(created_at))
FROM users
UNION ALL
SELECT date_add(d, INTERVAL 1 MONTH), year(date_add(d, INTERVAL 1 MONTH)), month(date_add(d, INTERVAL 1 MONTH))
FROM base_months
-- stop once the next month would start after today
WHERE date_add(d, INTERVAL 1 MONTH) <= CURDATE()
)
select
b.y as year,
b.m as month,
count(u.created_at) as total_user,
sum(case when month(u.deleted_at) = b.m and year(u.deleted_at) = b.y then 1 else 0 end) as left_this_month
from base_months b
-- for each month join to the users table
join users u on u.created_at < b.d and (u.deleted_at > b.d or u.deleted_at is null)
group by b.y, b.m
If this isn't clear: first we use a recursive CTE to get all the months and years of interest. You could instead do a non-recursive query on the table with a group by if you only want to include creation months that appear in the table, but I think that would give you odd results, since months in which nobody was created would not be included.
Then I join that back to the users table with filters on the join to only include the rows we want to count for the given year and month. We use group by and aggregation functions to find the results.
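To see the shape of this approach end to end, here is a small sketch that runs it against in-memory SQLite from Python. It is SQLite syntax rather than MySQL (date(x, '+1 month') stands in for DATE_ADD), the four sample users are invented, and the timestamps are assumed to be date-only ISO text. Unlike the query above, it uses a LEFT JOIN so months with no active users still appear, and it counts every deletion in the month, as the stored procedure in the question does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO users (created_at, deleted_at) VALUES (?, ?)",
    [
        ("2020-01-10", None),
        ("2020-01-20", "2020-02-15"),
        ("2020-02-05", None),
        ("2020-02-25", "2020-04-02"),
    ],
)

sql = """
WITH RECURSIVE months(month_start) AS (
    SELECT date(MIN(created_at), 'start of month') FROM users
    UNION ALL
    SELECT date(month_start, '+1 month') FROM months
    WHERE date(month_start, '+1 month') <= :last_month
),
counts AS (
    SELECT substr(m.month_start, 1, 7) AS user_month,
           -- active on the 1st: created on or before it, not deleted before it
           COALESCE(SUM(u.created_at <= m.month_start
                        AND (u.deleted_at IS NULL
                             OR u.deleted_at >= m.month_start)), 0) AS active_user_count,
           -- deleted at any point during the month
           COALESCE(SUM(u.deleted_at >= m.month_start
                        AND u.deleted_at < date(m.month_start, '+1 month')), 0) AS deleted_count
    FROM months AS m
    LEFT JOIN users AS u ON u.created_at < date(m.month_start, '+1 month')
    GROUP BY m.month_start
)
SELECT user_month, active_user_count, deleted_count,
       deleted_count * 1.0 / NULLIF(active_user_count, 0) AS churn_rate
FROM counts
ORDER BY user_month
"""

rows = conn.execute(sql, {"last_month": "2020-04-01"}).fetchall()
for row in rows:
    print(row)
```

The users table is scanned once per generated month by the join instead of once per subquery per row, and churn_rate is NULL (None in Python) for a month with no active users rather than a division error.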
Looping is likely to be terribly slow.
Is this how you decide if a user exists on Nov 1, 2020?
WHERE created_at < '2020-11-01'
AND (deleted_at >= '2020-11-01' OR deleted_at IS NULL)
Hence, a COUNT(*) with that test would give that count?
For deletions for that month:
WHERE LEFT(deleted_at, 7) = '2020-11'
Putting those together into a single query for all months:
SELECT m.yyyymm,
( SELECT COUNT(*)
FROM users
WHERE created_at < CONCAT(m.yyyymm, '-01')
AND (deleted_at >= CONCAT(m.yyyymm, '-01') OR deleted_at IS NULL)
) AS active_users,
( SELECT COUNT(*)
FROM users
WHERE LEFT(deleted_at, 7) = m.yyyymm
) AS deleted_users
FROM ( SELECT DISTINCT LEFT(created_at, 7) AS yyyymm FROM users ) AS m
ORDER BY m.yyyymm
That gives you 3 columns; check it out. To get the churn:
SELECT m.yyyymm,
( SELECT ... ) / ( SELECT ... ) AS churn -- deleted_users subquery divided by active_users subquery
FROM ( SELECT DISTINCT LEFT(created_at, 7) AS yyyymm FROM users ) AS m
ORDER BY m.yyyymm

How do I select SQL data in buckets when data doesn't exist for one bucket?

I'm trying to get a complete set of buckets for a given dataset, even if no records exist for some buckets.
For example, I want to display totals by day of week, with zero total for days with no records.
SELECT
WEEKDAY(transaction_date) AS day_of_week,
SUM(sales) AS total_sales
FROM table1
GROUP BY day_of_week
If I have sales every day, I'll get 7 rows in my result representing total sales on days 0-6.
If I don't have sales on Day 2, I get no result for Day 2.
What's the most efficient way to force a zero value for day 2?
Should I join to a temporary table or array of defined buckets? ['0','1','2','3','4','5','6']
Or is it better to insert zeros outside of MySQL, after I've done the query?
I am using MySQL, but this is a general SQL question.
In MySQL, you could simply use a derived table of numbers from 0 to 6 (the values WEEKDAY() returns), left join it with the table, then aggregate:
select d.day_of_week, coalesce(sum(t1.sales), 0) AS total_sales
from (
select 0 day_of_week union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
) d
left join table1 t1 on weekday(t1.transaction_date) = d.day_of_week
group by d.day_of_week
MySQL 8.0.19 and later have the values row(...) syntax, which shortens the query:
select d.day_of_week, coalesce(sum(t1.sales), 0) AS total_sales
from (values row(0), row(1), row(2), row(3), row(4), row(5), row(6)) d(day_of_week)
left join table1 t1 on weekday(t1.transaction_date) = d.day_of_week
group by d.day_of_week
Basically you want the answer to be 0 when the data is actually NULL for that bucket. SUM returns NULL when a bucket has no matching rows, so you can use COALESCE to force a zero:
COALESCE(SUM(sales), 0)
as suggested by this answer
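Putting the bucket table and COALESCE together, here is a small runnable sketch using in-memory SQLite from Python. It is not MySQL: SQLite has no WEEKDAY(), so the sketch uses strftime('%w', ...), which numbers days 0 = Sunday through 6 = Saturday (MySQL's WEEKDAY() uses 0 = Monday). The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (transaction_date TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        ("2020-03-02", 10.0),  # a Monday
        ("2020-03-09", 5.0),   # a Monday
        ("2020-03-04", 7.5),   # a Wednesday
    ],
)

sql = """
WITH buckets(day_of_week) AS (
    VALUES (0), (1), (2), (3), (4), (5), (6)   -- every bucket, data or not
)
SELECT b.day_of_week, COALESCE(SUM(t.sales), 0) AS total_sales
FROM buckets AS b
LEFT JOIN table1 AS t
       ON CAST(strftime('%w', t.transaction_date) AS INTEGER) = b.day_of_week
GROUP BY b.day_of_week
ORDER BY b.day_of_week
"""

rows = conn.execute(sql).fetchall()
for day_of_week, total_sales in rows:
    print(day_of_week, total_sales)
```

All seven buckets come back; the ones without sales show 0 because the bucket row survives the LEFT JOIN and COALESCE replaces the NULL sum.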
First off you need a calendar table; something like this or this. Or create a calendar subset on the fly. I am not sure of the MySQL syntax, but here is what it would look like in SQL Server.
DECLARE
@FromDate DATE
, @ToDate DATE
-- set these variables to appropriate values
SET @FromDate = '2020-03-01';
SET @ToDate = '2020-03-31';
;WITH cteCalendar (MyDate) AS
(
SELECT CONVERT(DATE, @FromDate) AS MyDate
UNION ALL
SELECT DATEADD(DAY, 1, MyDate)
FROM cteCalendar
WHERE DATEADD(DAY, 1, MyDate) <= @ToDate
)
SELECT DATEPART(WEEKDAY, cte.MyDate) AS day_of_week,
ISNULL(SUM(t1.sales), 0) AS total_sales
FROM cteCalendar cte
LEFT JOIN table1 t1 ON cte.MyDate = t1.transaction_date
GROUP BY DATEPART(WEEKDAY, cte.MyDate)

SUM subquery with condition depends on parent query columns returns NULL

everyone!
I'm trying to calculate the sum of deal prices for each day. Here is what I do:
SET @symbols_set = "A,B,C,D";
DROP TABLE IF EXISTS temp_deals;
CREATE TABLE temp_deals AS SELECT Deal, TimeMsc, Price, VolumeExt, Symbol FROM deals WHERE TimeMsc >= "2019-04-01" AND TimeMsc <= "2019-06-30" AND FIND_IN_SET(Symbol, @symbols_set) > 0;
SELECT
DATE_FORMAT(TimeMsc, "%d/%m/%Y") AS Date,
Symbol,
(SELECT SUM(Price) FROM temp_deals dap WHERE dap.TimeMsc BETWEEN Date AND Date + INTERVAL 1 DAY AND dap.Symbol = Symbol) AS AvgPrice
FROM temp_deals
ORDER BY Date;
DROP TABLE IF EXISTS temp_deals;
But in the result I get NULL in the AvgPrice column. I can't understand what I'm doing wrong.
It looks like I can't pass the parent query's column to the subquery, am I right?
Qualify your column names. But mostly, don't use a string for comparing dates:
SELECT DATE_FORMAT(d.TimeMsc, '%d/%m/%Y') AS Date,
d.Symbol,
(SELECT SUM(dap.Price)
FROM temp_deals dap
WHERE dap.TimeMsc >= d.TimeMsc AND
dap.TimeMsc < d.TimeMsc + INTERVAL 2 DAY AND -- not sure if you want 1 day or 2 day
dap.Symbol = d.Symbol
) AS AvgPrice
FROM temp_deals d
ORDER BY d.TimeMsc;

How to display all the dates on current month in ssrs expression

Consider May is current month
I have list of dates
Ex:
Date No of Items
05/3/2016 4
05/3/2016 5
05/4/2016 7
05/10/2016 10
05/11/2016 50
05/30/2016 100
I want to display all dates in May with the sum of the items for each date; if there is no record for a date, it should be left blank.
Ex:
Date No of Items
05/1/2016
05/2/2016
05/3/2016 9
05/4/2016 7
05/5/2016
.
.
.
05/10/2016 10
05/11/2016 50
05/12/2016
05/13/2016
.
.
.
.
.
05/30/2016 100
Any Help on this
There's not a way to do this in SSRS.
Usually when I have a similar situation, I would make a table of the dates needed and then LEFT JOIN my data to it so the dates would appear when the date wasn't in the data.
I use a CTE to create the table in SQL:
DECLARE @START_DATE DATE = '01/01/2016'
DECLARE @END_DATE DATE = '05/31/2016'
;WITH GETDATES AS
(
SELECT @START_DATE AS THEDATE
UNION ALL
SELECT DATEADD(DAY,1, THEDATE) FROM GETDATES
WHERE THEDATE < @END_DATE
)
Then use the table with your data (maybe put your results from your current query in a #TEMP_TABLE).
SELECT *
FROM GETDATES D
LEFT JOIN #TEMP_TABLE T ON T.DATE_FIELD = D.THEDATE
OPTION (MAXRECURSION 0) -- the default limit of 100 recursions is too low for this date range
Exactly, we cannot do this in SSRS.
To achieve this, we need to make a table of the dates and then LEFT JOIN the data to it.
Let me show you a sample:
DECLARE @month AS INT = 5
DECLARE @Year AS INT = 2016
CREATE TABLE #Temp ( Dates Date)
;WITH N(N)AS
(SELECT 1 FROM(VALUES(1),(1),(1),(1),(1),(1))M(N)),
tally(N)AS(SELECT ROW_NUMBER()OVER(ORDER BY N.N)FROM N,N a)
INSERT INTO #Temp
SELECT DATEFROMPARTS(@Year,@month,N) dates FROM tally
WHERE N <= DAY(EOMONTH(DATEFROMPARTS(@Year,@month,1)))
SELECT T.Dates, SUM(ISNULL(S.TotalCount,0)) NoOfItems FROM #Temp T
LEFT JOIN TableName S ON S.Date = T.Dates
GROUP BY T.Dates
DROP TABLE #Temp
And this will return all dates with NoOfItems. Yes, you have to change above query as per your requirement. Thanks