MySQL Advanced Query Brainteaser - mysql

I've been asked to create a financial report, which needs to give a total commission rate between two dates for several 'referrers'. That's the easy part.
The difficult part is that the commission rate varies depending not only on the referrer but also on the type of referral and also on the number of referrals of that type that have been made by a given referrer.
The tracking of the number of referrals needs to take into account ALL referrals, rather than those in the given date range - in other words, the commission rate is on a sliding scale for each referrer, changing as their total referrals increase. Luckily, there are only a maximum of 3 commission levels for each type of referral.
The referrals are all stored in the same table, 1 row per referral, with a field denoting the referrer and the type of referral. An example to illustrate:
ID  Type  Referrer  Date
1   A     X         01/12/08
2   A     X         15/01/09
3   A     X         23/02/09
4   B     X         01/12/08
5   B     X         15/01/09
6   A     Y         01/12/08
7   A     Y         15/01/09
8   B     Y         15/01/09
9   B     Y         23/02/09
The commission rates are not stored in the referral table - and indeed may change - instead they are stored in the referrer table, like so:
Referrer  Comm_A1  Comm_A2  Comm_A3  Comm_B1  Comm_B2  Comm_B3
X         30       20       10       55       45       35
Y         45       35       25       60       40       30
[Edit] To clarify, the commission rate has three levels for each type and each referrer: the initial rate Comm_A1 applies to the first referral, Comm_A2 to the second, and Comm_A3 to all subsequent referrals.
Looking at the above referral table as an example, and assuming the commission level increases after referral number 1 and again after referral number 2 (then remains the same), running a commission report for December 2008 to February 2009 would return the following:
Referrer  Type_A_Comm  Type_A_Ref  Type_B_Comm  Type_B_Ref
X         60           3           100          2
Y         80           2           100          2
Running a commission report for just February 2009 would return:
Referrer  Type_A_Comm  Type_A_Ref  Type_B_Comm  Type_B_Ref
X         10           1           0            0
Y         0            0           40           1
[Edit] The above results have been adjusted from my original question, in terms of the column / row grouping.
I'm quite sure that any solution will involve a sub-query (perhaps for each referral type) and also some kind of aggregate / Sum If - but I'm struggling to come up with a working query.
[Edit] I'm not sure about writing an equation of my requirements, but I'll try to list the steps as I see them:
1. Determine the number of previous referrals for each type and each referrer - that is, irrespective of any date range.
2. Based on the number of previous referrals, select the appropriate commission level - 0 previous = level 1, 1 previous = level 2, 2 or more previous = level 3
   (Note: a referrer with no previous referrals but, say, 3 new referrals, would expect a commission of 1 x level 1, 1 x level 2, 1 x level 3 = total commission)
3. Filter results according to a date range - so that commission payable for a period of activity may be determined.
4. Return data with a column for referrer, and a column with the total commission for each referral type (and ideally, also a column with a count for each referral type).
Does that help to clarify my requirements?

Assuming that you have a table called type that lists your particular referral types, this should work (if not, you could substitute another subselect for getting the distinct types from referral for this purpose).
-- @max1, @max2, @start_date and @end_date must be set before running this
select
    r.referrer,
    t.type,
    (case
        when ifnull(ref_prior.referrals, 0) < @max1 then
            (case
                when ifnull(ref_prior.referrals, 0) + ifnull(ref_period.referrals, 0) < @max1 then ifnull(ref_period.referrals, 0)
                else @max1 - ifnull(ref_prior.referrals, 0)
            end)
        else 0
    end) * (case t.type when 'A' then r.Comm_A1 when 'B' then r.Comm_B1 else null end) +
    (case when ifnull(ref_prior.referrals, 0) + ifnull(ref_period.referrals, 0) > @max1 then
        (case
            when ifnull(ref_prior.referrals, 0) < @max2 then
                (case
                    when ifnull(ref_prior.referrals, 0) + ifnull(ref_period.referrals, 0) < @max2 then ifnull(ref_period.referrals, 0)
                    else @max2 - ifnull(ref_prior.referrals, 0)
                end)
            else 0
        end) -
        (case
            when ifnull(ref_prior.referrals, 0) < @max1 then
                (case
                    when ifnull(ref_prior.referrals, 0) + ifnull(ref_period.referrals, 0) < @max1 then ifnull(ref_period.referrals, 0)
                    else @max1 - ifnull(ref_prior.referrals, 0)
                end)
            else 0
        end)
    else 0 end) * (case t.type when 'A' then r.Comm_A2 when 'B' then r.Comm_B2 else null end) +
    (case when ifnull(ref_prior.referrals, 0) + ifnull(ref_period.referrals, 0) > @max2 then
        (ifnull(ref_period.referrals, 0)) -
        (
            (case when ifnull(ref_prior.referrals, 0) + ifnull(ref_period.referrals, 0) > @max1 then
                (case
                    when ifnull(ref_prior.referrals, 0) < @max2 then
                        (case
                            when ifnull(ref_prior.referrals, 0) + ifnull(ref_period.referrals, 0) < @max2 then ifnull(ref_period.referrals, 0)
                            else @max2 - ifnull(ref_prior.referrals, 0)
                        end)
                    else 0
                end) -
                (case
                    when ifnull(ref_prior.referrals, 0) < @max1 then
                        (case
                            when ifnull(ref_prior.referrals, 0) + ifnull(ref_period.referrals, 0) < @max1 then ifnull(ref_period.referrals, 0)
                            else @max1 - ifnull(ref_prior.referrals, 0)
                        end)
                    else 0
                end)
            else 0 end) +
            (case
                when ifnull(ref_prior.referrals, 0) < @max1 then
                    (case
                        when ifnull(ref_prior.referrals, 0) + ifnull(ref_period.referrals, 0) < @max1 then ifnull(ref_period.referrals, 0)
                        else @max1 - ifnull(ref_prior.referrals, 0)
                    end)
                else 0
            end)
        )
    else 0 end) * (case t.type when 'A' then r.Comm_A3 when 'B' then r.Comm_B3 else null end) as Total_Commission
from referrer r
join type t on 1 = 1 -- intentional cartesian product
left join (select referrer, type, count(1) as referrals from referral where date < @start_date group by referrer, type) ref_prior on ref_prior.referrer = r.referrer and ref_prior.type = t.type
left join (select referrer, type, count(1) as referrals from referral where date between @start_date and @end_date group by referrer, type) ref_period on ref_period.referrer = r.referrer and ref_period.type = t.type
This assumes that you have @start_date and @end_date variables, and you'll obviously have to supply the logic missing from the case statement to make the proper selection of rates based upon the type and number of referrals from ref_total.
Edit
After reviewing the question, I saw the comment about the sliding scale. This greatly increased the complexity of the query, but it's still doable. The revised query now also depends on the presence of two variables, @max1 and @max2, representing the maximum number of sales that can fall into category '1' and category '2' (for testing purposes, I used 1 and 2 respectively, and these produced the expected results).
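For comparison, here is a minimal sketch of another way to express the sliding scale: number each referral by its position among ALL referrals of the same referrer and type, pick the rate from that position, and only then filter by the report period. It is not the query above. It runs against an in-memory SQLite database from Python purely so it can be executed as-is, the dates are rewritten in ISO format so that string comparison works, and it returns one row per referrer and type rather than the pivoted columns from the question. The names seq, ranked and commission_report are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE referral (id INTEGER PRIMARY KEY, type TEXT, referrer TEXT, date TEXT);
INSERT INTO referral VALUES
  (1,'A','X','2008-12-01'), (2,'A','X','2009-01-15'), (3,'A','X','2009-02-23'),
  (4,'B','X','2008-12-01'), (5,'B','X','2009-01-15'),
  (6,'A','Y','2008-12-01'), (7,'A','Y','2009-01-15'),
  (8,'B','Y','2009-01-15'), (9,'B','Y','2009-02-23');
CREATE TABLE referrer (referrer TEXT PRIMARY KEY,
  Comm_A1 INT, Comm_A2 INT, Comm_A3 INT, Comm_B1 INT, Comm_B2 INT, Comm_B3 INT);
INSERT INTO referrer VALUES ('X',30,20,10,55,45,35), ('Y',45,35,25,60,40,30);
""")

REPORT_SQL = """
SELECT ranked.referrer, ranked.type,
       SUM(CASE ranked.type
             WHEN 'A' THEN CASE ranked.seq WHEN 1 THEN r.Comm_A1
                                           WHEN 2 THEN r.Comm_A2
                                           ELSE r.Comm_A3 END
             WHEN 'B' THEN CASE ranked.seq WHEN 1 THEN r.Comm_B1
                                           WHEN 2 THEN r.Comm_B2
                                           ELSE r.Comm_B3 END
           END) AS commission,
       COUNT(*) AS referrals
FROM (
    -- seq = position of this referral among ALL referrals of the same
    -- referrer and type, regardless of the report period (ties broken by id)
    SELECT cur.id, cur.referrer, cur.type, cur.date,
           (SELECT COUNT(*) FROM referral prev
             WHERE prev.referrer = cur.referrer AND prev.type = cur.type
               AND (prev.date < cur.date
                    OR (prev.date = cur.date AND prev.id <= cur.id))) AS seq
    FROM referral cur
) ranked
JOIN referrer r ON r.referrer = ranked.referrer
WHERE ranked.date BETWEEN ? AND ?      -- the period filter comes last
GROUP BY ranked.referrer, ranked.type
ORDER BY ranked.referrer, ranked.type
"""

def commission_report(start, end):
    """Return (referrer, type, commission, referral count) rows for the period."""
    return conn.execute(REPORT_SQL, (start, end)).fetchall()

print(commission_report("2008-12-01", "2009-02-28"))
print(commission_report("2009-02-01", "2009-02-28"))
```

With the sample data from the question this gives 60/3 and 100/2 for X and 80/2 and 100/2 for Y over the full period, and 10/1 (X, type A) and 40/1 (Y, type B) for February alone. The correlated subquery is quadratic in the number of referrals per referrer and type, so treat it as a way to check results rather than a tuned solution.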

Adam's answer is far more thorough than I'm going to be but I think trying to write this as a single query might not be the right approach.
Have you thought about creating a stored procedure which creates and then populates a temporary table, step by step?
The temporary table would have the shape of the results set you're looking for. The initial insert creates your basic data set (essentially the number of rows you're looking to return with key identifiers and then anything else you're looking to return which can be easily assembled as part of the same query).
You then have a series of updates to the temporary table assembling each section of the more complex data.
Finally select it all back and drop the temporary table.
The advantages of this are that it allows you to break it down in your mind and assemble it a bit at a time which allows you to more easily find where you've gone wrong. It also means that the more complex bits can be assembled in a couple of stages.
In addition if some poor sod comes along and has to debug the whole thing afterwards it's going to be far easier for him to trace through what's happening where.

EDIT: this answer does not take the following requirement into account, but there seems to be a bunch of new explanations so I guess I'll leave it here as is...
The tracking of the number of referrals needs to take into account ALL referrals, rather than those in the given date range
Ok, assuming the report period is monthly, and using a CASE where actually an IF could distinguish the two valid rates (for count = 1 and count > 1) as well, what about:
select
    ref.month,
    ref.referrer,
    ref.type,
    ( ref.count *
        case ref.type
            when 'A' then
                case ref.count
                    -- not useful: when 0 then com.Comm_A1
                    when 1 then com.Comm_A2
                    else com.Comm_A3
                end
            when 'B' then
                case ref.count
                    -- not useful: when 0 then com.Comm_B1
                    when 1 then com.Comm_B2
                    else com.Comm_B3
                end
        end
    ) as total_commission
from
    ( select
        date_format(date, '%Y-%m') as month,
        referrer,
        type,
        count(*) as count
      from referrals
      group by month, referrer, type
    ) as ref
    join commissions com on com.referrer = ref.referrer
(I guess the names such as 'ref' and 'count' are not too well chosen above.)

Related

Very slow MySQL COUNT DISTINCT query, even with indexes — how can this be optimised?

I have a MySQL (MariaDB 10.3) query, which takes almost 60 seconds to run. I need to optimise this significantly, as it's frustrating users of my web app.
The query returns the name of a user then 12 columns showing how many customers they signed up, by month, who are eligible to earn commission. It then returns a further 12 columns showing how many commission entries were recorded for the user within each month. (The query needs to return in this 24-column format for compatibility reasons.)
Here's the query:
SELECT
people.full_name AS "Name",
/* Count how many unique customers are eligible for commission in each month, for a rolling 12-month window */
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-08-01" AND "2020-08-31" THEN customers.id END)) AS "eligible_customers_month_1",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-09-01" AND "2020-09-30" THEN customers.id END)) AS "eligible_customers_month_2",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-10-01" AND "2020-10-31" THEN customers.id END)) AS "eligible_customers_month_3",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-11-01" AND "2020-11-30" THEN customers.id END)) AS "eligible_customers_month_4",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-12-01" AND "2020-12-31" THEN customers.id END)) AS "eligible_customers_month_5",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-01-01" AND "2021-01-31" THEN customers.id END)) AS "eligible_customers_month_6",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-02-01" AND "2021-02-28" THEN customers.id END)) AS "eligible_customers_month_7",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-03-01" AND "2021-03-31" THEN customers.id END)) AS "eligible_customers_month_8",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-04-01" AND "2021-04-30" THEN customers.id END)) AS "eligible_customers_month_9",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-05-01" AND "2021-05-31" THEN customers.id END)) AS "eligible_customers_month_10",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-06-01" AND "2021-06-30" THEN customers.id END)) AS "eligible_customers_month_11",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-07-01" AND "2021-07-31" THEN customers.id END)) AS "eligible_customers_month_12",
/* In each month of a rolling 12-month window, count how many unique commission entries were recorded. */
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-08-01" AND "2020-08-31" THEN user_commission.id END)) AS "total_sales_1",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-09-01" AND "2020-09-30" THEN user_commission.id END)) AS "total_sales_2",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-10-01" AND "2020-10-31" THEN user_commission.id END)) AS "total_sales_3",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-11-01" AND "2020-11-30" THEN user_commission.id END)) AS "total_sales_4",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-12-01" AND "2020-12-31" THEN user_commission.id END)) AS "total_sales_5",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-01-01" AND "2021-01-31" THEN user_commission.id END)) AS "total_sales_6",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-02-01" AND "2021-02-28" THEN user_commission.id END)) AS "total_sales_7",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-03-01" AND "2021-03-31" THEN user_commission.id END)) AS "total_sales_8",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-04-01" AND "2021-04-30" THEN user_commission.id END)) AS "total_sales_9",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-05-01" AND "2021-05-31" THEN user_commission.id END)) AS "total_sales_10",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-06-01" AND "2021-06-30" THEN user_commission.id END)) AS "total_sales_11",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-07-01" AND "2021-07-31" THEN user_commission.id END)) AS "total_sales_12"
FROM users
LEFT JOIN people ON people.id = users.person_id
LEFT JOIN customers ON customers.user_id = users.id
LEFT JOIN user_commission ON user_commission.user_id = users.id
WHERE users.id NOT IN (103, 2, 155, 24, 137, 141, 143, 149, 152, 3, 135)
GROUP BY users.id
And here's the output from EXPLAIN SELECT:
id  select_type  table            type    possible_keys       key         key_len  ref              rows  Extra
1   SIMPLE       users            index   PRIMARY             PRIMARY     4                         16    Using where
1   SIMPLE       people           eq_ref  PRIMARY             PRIMARY     4        users.person_id  1     Using where
1   SIMPLE       customers        ref     user_id             user_id     5        users.id         284   Using where
1   SIMPLE       user_commission  ref     comm_index,user_id  comm_index  4        users.id         465   Using index
comm_index is a UNIQUE index on the user_commission table, covering user_id,order_id,commission_paid_at.
I'm a bit stumped as to what to do next — there are indexes in place, and not many rows for the engine to parse per table.
Any clues would be much appreciated — thanks!
Let's start with the fact that this query runs for EVERY user (apart from the few you want to EXCLUDE; I did not include that exclusion list in my query). I would ask why you are showing sales and commission counts for all users at once. If I were a rep for your company, I would only care about how MY activities are going.
Next, this might be a good case for a pre-aggregation table holding the counts per user per month, so you don't have to keep recomputing them on the fly. If the data only changes when a new customer is signed up or a sales commission is entered, you may be best off computing those counts at the end of every day for the user/month/year they represent. But that is an alternative.
Now, the WHY: you are probably getting hit with long delays, and need COUNT( DISTINCT ) on the customer and commission tables, because you are getting a Cartesian result. Take a scenario with 100 users. In a given month, one user has 3 new customers and 2 commissions because they are new, while a long-term rep has 37 new customers and 45 commissions. THESE are the ones killing you. Because both left joins are on the user ID, every customer row for a given user is joined to every commission row recorded against the same user ID. So for the first rep it creates 6 rows to count against (3 * 2), but the second rep goes through 1,665 (37 * 45). This Cartesian (or cross-join) result is killing you.
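The fan-out is easy to reproduce on a toy data set. The sketch below uses an in-memory SQLite database from Python (only so that it runs as-is; the tables are cut down to the join columns and are not your real schema) with one user who has 3 customers and 2 commission rows. Joining both child tables directly yields 3 * 2 rows, while counting each child table per user first yields a single row per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE customers (id INTEGER PRIMARY KEY, user_id INT);
CREATE TABLE user_commission (id INTEGER PRIMARY KEY, user_id INT);
INSERT INTO users VALUES (1);
INSERT INTO customers (user_id) VALUES (1), (1), (1);     -- 3 customers
INSERT INTO user_commission (user_id) VALUES (1), (1);    -- 2 commissions
""")

# Both child tables joined on the user id: every customer row is paired
# with every commission row of the same user.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM users u
    LEFT JOIN customers c ON c.user_id = u.id
    LEFT JOIN user_commission uc ON uc.user_id = u.id
""").fetchone()[0]

# Aggregating each child table per user BEFORE joining keeps one row per user,
# so no DISTINCT is needed afterwards.
pre_aggregated = conn.execute("""
    SELECT u.id, COALESCE(c.n, 0), COALESCE(uc.n, 0)
    FROM users u
    LEFT JOIN (SELECT user_id, COUNT(*) AS n FROM customers GROUP BY user_id) c
           ON c.user_id = u.id
    LEFT JOIN (SELECT user_id, COUNT(*) AS n FROM user_commission GROUP BY user_id) uc
           ON uc.user_id = u.id
""").fetchall()

print(joined_rows)      # 6
print(pre_aggregated)   # [(1, 3, 2)]
```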
So that is the WHY it's failing. Now, on to the solution I have for you. You appear to have a bunch of hard-coded dates throughout the code. What happens when next month comes? Do you have to fix the hard-coded begin/end dates by hand? If so, the solution I have for you will simplify all of that.
By using "WITH" (a Common Table Expression, aka CTE), you can pre-write queries and use those "alias" names AS IF you had written each of the queries within a multi-nested query. The benefit is that the query is written once, even if you keep re-using the alias name reference.
So here is the query and I'll describe / break it down next so you can view/follow along.
with Rolling12 as
(
select
@rptMonth := @rptMonth +1 as QryMonth,
@beginDate as AtLeastDate,
date_add( @beginDate, interval 1 month ) as AndLessThanDate,
@beginDate := date_add( @beginDate, interval 1 month )
from
user_commission
JOIN ( select @rptMonth := 0,
@beginDate := date_sub(
date_add(
date_sub( curdate(),
interval day( curdate()) -1 day ),
interval 1 month ),
interval 1 year )
) sqlvars
limit 12
),
MinMaxDates as
(
select
min( AtLeastDate ) MinDate,
max( AndLessThanDate ) MaxDate
from
Rolling12
),
SumCommission as
(
select
uc.user_id,
coalesce( sum( CASE WHEN R12.QryMonth = 1 then 1 else 0 end ), 0) commission01,
coalesce( sum( CASE WHEN R12.QryMonth = 2 then 1 else 0 end ), 0) commission02,
coalesce( sum( CASE WHEN R12.QryMonth = 3 then 1 else 0 end ), 0) commission03,
coalesce( sum( CASE WHEN R12.QryMonth = 4 then 1 else 0 end ), 0) commission04,
coalesce( sum( CASE WHEN R12.QryMonth = 5 then 1 else 0 end ), 0) commission05,
coalesce( sum( CASE WHEN R12.QryMonth = 6 then 1 else 0 end ), 0) commission06,
coalesce( sum( CASE WHEN R12.QryMonth = 7 then 1 else 0 end ), 0) commission07,
coalesce( sum( CASE WHEN R12.QryMonth = 8 then 1 else 0 end ), 0) commission08,
coalesce( sum( CASE WHEN R12.QryMonth = 9 then 1 else 0 end ), 0) commission09,
coalesce( sum( CASE WHEN R12.QryMonth = 10 then 1 else 0 end ), 0) commission10,
coalesce( sum( CASE WHEN R12.QryMonth = 11 then 1 else 0 end ), 0) commission11,
coalesce( sum( CASE WHEN R12.QryMonth = 12 then 1 else 0 end ), 0) commission12
from
user_commission uc
JOIN Rolling12 R12
on uc.commission_paid_at >= R12.AtLeastDate
AND uc.commission_paid_at < R12.AndLessThanDate
-- only a single row returned for MinMaxDates source
JOIN MinMaxDates mm
where
uc.commission_paid_at >= mm.MinDate
AND uc.commission_paid_at < mm.MaxDate
group by
uc.user_id
),
SumCustomers as
(
select
c.user_id,
coalesce( sum( CASE WHEN R12.QryMonth = 1 then 1 else 0 end ), 0) customers01,
coalesce( sum( CASE WHEN R12.QryMonth = 2 then 1 else 0 end ), 0) customers02,
coalesce( sum( CASE WHEN R12.QryMonth = 3 then 1 else 0 end ), 0) customers03,
coalesce( sum( CASE WHEN R12.QryMonth = 4 then 1 else 0 end ), 0) customers04,
coalesce( sum( CASE WHEN R12.QryMonth = 5 then 1 else 0 end ), 0) customers05,
coalesce( sum( CASE WHEN R12.QryMonth = 6 then 1 else 0 end ), 0) customers06,
coalesce( sum( CASE WHEN R12.QryMonth = 7 then 1 else 0 end ), 0) customers07,
coalesce( sum( CASE WHEN R12.QryMonth = 8 then 1 else 0 end ), 0) customers08,
coalesce( sum( CASE WHEN R12.QryMonth = 9 then 1 else 0 end ), 0) customers09,
coalesce( sum( CASE WHEN R12.QryMonth = 10 then 1 else 0 end ), 0) customers10,
coalesce( sum( CASE WHEN R12.QryMonth = 11 then 1 else 0 end ), 0) customers11,
coalesce( sum( CASE WHEN R12.QryMonth = 12 then 1 else 0 end ), 0) customers12
from
customers c
JOIN Rolling12 R12
on c.commission_start_date >= R12.AtLeastDate
AND c.commission_start_date < R12.AndLessThanDate
-- only a single row returned for MinMaxDates source
JOIN MinMaxDates mm
where
c.commission_start_date >= mm.MinDate
AND c.commission_start_date < mm.MaxDate
group by
c.user_id
)
select
u.id,
p.full_name AS "Name",
com.Commission01,
com.Commission02,
com.Commission03,
com.Commission04,
com.Commission05,
com.Commission06,
com.Commission07,
com.Commission08,
com.Commission09,
com.Commission10,
com.Commission11,
com.Commission12,
cst.Customers01,
cst.Customers02,
cst.Customers03,
cst.Customers04,
cst.Customers05,
cst.Customers06,
cst.Customers07,
cst.Customers08,
cst.Customers09,
cst.Customers10,
cst.Customers11,
cst.Customers12
from
users u
JOIN People p
ON u.person_id = p.id
LEFT JOIN SumCommission com
on u.id = com.user_id
LEFT JOIN SumCustomers cst
on u.id = cst.user_id;
You state that you are running on a rolling 12-month period. For this, I have my first CTE alias "Rolling12". This query is a setup for the rest of the query. It creates MySQL variables and keeps computing an updated begin/end date for each month represented. It starts by taking the current date (e.g. July 6) and rolling it back to July 1. It then adds 1 month to get August 1, and subtracts 1 year to get Aug 1, 2020 as the beginning of your 12-month rolling period. I then simply join to the commission table and limit to 12 records, each time moving forward one month, producing a column for the beginning and ending dates of each period and assigning a month ID sequence to it.
If you highlight and just run the query inside the With Rolling12 as ( the query ), you will see what it builds out. This prevents all the hard-coding dates associated with your current 24 case/count distinct when conditions.
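If it helps to see the date arithmetic outside SQL, here is a small Python sketch of what Rolling12 is meant to produce: twelve half-open ranges [AtLeastDate, AndLessThanDate), numbered 1 to 12, ending with the current month. The function name rolling_12_months is made up for this illustration and is not part of the query.

```python
from datetime import date

def rolling_12_months(today):
    """Return 12 (month_no, at_least, less_than) tuples ending with today's month."""
    # First of the current month, plus one month, minus one year.
    if today.month == 12:
        start = date(today.year, 1, 1)           # Dec 1 + 1 month - 1 year
    else:
        start = date(today.year - 1, today.month + 1, 1)
    ranges = []
    for month_no in range(1, 13):
        # First day of the following month; December rolls over into January.
        less_than = date(start.year + (start.month == 12), start.month % 12 + 1, 1)
        ranges.append((month_no, start, less_than))
        start = less_than
    return ranges

for row in rolling_12_months(date(2021, 7, 6)):
    print(row)
```

For July 6, 2021 the first range runs from 2020-08-01 up to (but not including) 2020-09-01, and the twelfth from 2021-07-01 up to 2021-08-01. Comparing with >= and < on these bounds avoids having to know the last day of each month.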
Then a comma and the next CTE, MinMaxDates. Here, I am querying this 12-month roll to get the minimum begin date and the maximum end date for the entire period you are reporting, so when querying the customers and commissions, I can join to this as a single-row result holding the begin/end dates of the detail rows.
Next are the SumCommission and SumCustomers. These are joining against the CTE "Rolling12" records with the JOIN so we can associate the specific commission or customer to that one date range entry. So from that, I get the query month of the rolling 12 and sum() it. But since sum() of a null results in null, I wrap it with coalesce( calculation, 0 ) to show 0 as a worst-case.
The reason for each of these being run individually and grouped by user is to prevent the Cartesian result previously mentioned.
Once those individual parts are all done, I now start with the user, join to people to get the name, then LEFT-JOIN to the respective other SUM() queries. So, if a user had only a new customer for a month, but no commission, you would only have a record in that set and not the other, thus preventing the duplication of query results requiring your DISTINCT to begin with.
So, even though it looks long and may be confusing, especially the WITH CTE context, look at it to its individual parts. The SUMs() are pre-grouped by user ID, so each sum() result will only have one possible record per user for that given period.
As for indexes to help optimize the query, I would ensure the commission and customer table have an index on ( dateField, useridField ) respectively.
I would be interested in knowing how well this performs when you give it a shot.
First of all, you select about all rows instead of only the months you are interested in.
Solution: A WHERE clause to restrict the rows taken into consideration.
Then you cross join a user's customers with the user's commissions, thus building a huge intermediate result you don't need and want.
Solution: Aggregate before joining.
This can look like this, for instance:
SELECT
people.full_name AS "Name",
cu.eligible_customers_month_1,
cu.eligible_customers_month_2,
...
co.total_sales_1,
co.total_sales_2,
...
FROM users
LEFT JOIN people ON people.id = users.person_id
LEFT JOIN
(
select
user_id,
max(case when month_index = 1 then cnt else 0 end) as eligible_customers_month_1,
max(case when month_index = 2 then cnt else 0 end) as eligible_customers_month_2,
...
from
(
select
user_id,
(year(current_date) * 12 + month(current_date))
- (year(commission_start_date) * 12 + month(commission_start_date))
+ 1 as month_index,
count(*) as cnt
from customers
where commission_start_date >=
last_day(current_date) + interval 1 day - interval 1 year
group by user_id, month_index
) months
group by user_id
) cu ON cu.user_id = users.id
LEFT JOIN
(
select
user_id,
max(case when month_index = 1 then cnt else 0 end) as total_sales_1,
max(case when month_index = 2 then cnt else 0 end) as total_sales_2,
...
from
(
select
user_id,
(year(current_date) * 12 + month(current_date))
- (year(commission_paid_at) * 12 + month(commission_paid_at))
+ 1 as month_index,
count(*) as cnt
from user_commission
where commission_paid_at >=
last_day(current_date) + interval 1 day - interval 1 year
group by user_id, month_index
) months
group by user_id
) co ON co.user_id = users.id
WHERE users.id NOT IN (103, 2, 155, 24, 137, 141, 143, 149, 152, 3, 135)
ORDER BY users.id;
Recommended indexes:
create index idx1 on customers (commission_start_date, user_id);
create index idx2 on user_commission (commission_paid_at, user_id);
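One thing to watch with the month_index expression: it counts backwards from the current month, so index 1 is the current month and index 12 is the oldest, whereas in the original query month_1 is the oldest month (2020-08) and month_12 the current one (2021-07). A tiny Python sketch of the same arithmetic makes that visible; the helper names month_index and question_column are made up for this illustration.

```python
from datetime import date

def month_index(today, d):
    """Same arithmetic as the SQL expression: 1 = current month, 12 = eleven months ago."""
    return (today.year * 12 + today.month) - (d.year * 12 + d.month) + 1

def question_column(today, d):
    """Column number in the original query's ordering (1 = oldest, 12 = current month)."""
    return 13 - month_index(today, d)

today = date(2021, 7, 6)
print(month_index(today, date(2021, 7, 15)), question_column(today, date(2021, 7, 15)))  # 1 12
print(month_index(today, date(2020, 8, 1)), question_column(today, date(2020, 8, 1)))    # 12 1
```

If the 24 columns have to keep the original order, either compare against 13 - month_index in the CASE expressions or flip the subtraction in the subquery.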

Difference between sum of two products > 0

I want to select the sum of T_No where Transactions equals R, subtract from it the sum of T_No where Transactions equals D, and require the result to be greater than zero, for a CoustomerID supplied as input (an int parameter declared in a stored procedure):
((Sum(T_No) where Transactions = R - Sum(T_No) where Transactions = D ) > 0) where CoustomerID = @input
Example: for ID = 1 it would be ((20+15) - 10) > 0
I have tried so many things, but either the syntax is wrong, the value is wrong, or it is not accepted, and I am literally stuck. This was my final attempt:
SELECT
(select ( select Sum(T_No) where Transactions = R) - (select Sum(T_No) where Transactions = D) as C_T )
FROM CustomerTrans WHERE C_T > 0 ;
Conditional aggregation should help:
SELECT
SUM(CASE WHEN Transaction = 'R' THEN t_no ELSE 0 END) - SUM(CASE WHEN Transaction = 'D' THEN t_no ELSE 0 END)
FROM CustomerTrans
WHERE CoustomerID = @yourCustomerIdVariable
As you're writing a sproc, you can assign the result of this to a variable and then decide what to do if the result is negative. (I would personally log an error, for example, rather than just hide those results.) If the result is null, then there were no transactions for that customer.
PS: I used Transaction because that's what your screenshot showed, and I figured a screenshot is less likely to contain a typo than code with syntax errors. Adjust if required.
You were kinda close. I would sum like you did; only the syntax is a bit off. You can't use aggregates in WHERE, which is why you should use HAVING, and the conditions need proper CASE WHEN syntax.
Select
CoustomerID,
Sum(case when Transactions = 'R' then T_No else 0 end) -
Sum(case when Transactions = 'D' then T_No else 0 end) as C_T
FROM CustomerTrans
group by CoustomerID
having (Sum(case when Transactions = 'R' then T_No else 0 end) -
Sum(case when Transactions = 'D' then T_No else 0 end))>0
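To see both ideas together (conditional sums plus HAVING, restricted to one customer), here is a minimal sketch run against an in-memory SQLite database from Python, only so that it executes as-is. The rows for customer 1 are the example from the question, (20+15) - 10; customer 2 is invented to show a non-positive result being filtered out. The helper name net_if_positive is made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerTrans (CoustomerID INT, Transactions TEXT, T_No INT);
INSERT INTO CustomerTrans VALUES
  (1, 'R', 20), (1, 'R', 15), (1, 'D', 10),   -- example from the question
  (2, 'R', 5),  (2, 'D', 8);                  -- invented: net result is -3
""")

NET_SQL = """
SELECT CoustomerID,
       SUM(CASE WHEN Transactions = 'R' THEN T_No ELSE 0 END)
     - SUM(CASE WHEN Transactions = 'D' THEN T_No ELSE 0 END) AS C_T
FROM CustomerTrans
WHERE CoustomerID = ?            -- row filter: belongs in WHERE
GROUP BY CoustomerID
HAVING SUM(CASE WHEN Transactions = 'R' THEN T_No ELSE 0 END)
     - SUM(CASE WHEN Transactions = 'D' THEN T_No ELSE 0 END) > 0   -- aggregate filter
"""

def net_if_positive(customer_id):
    """Return [(customer_id, net)] when R minus D is positive, else an empty list."""
    return conn.execute(NET_SQL, (customer_id,)).fetchall()

print(net_if_positive(1))   # [(1, 25)]
print(net_if_positive(2))   # []
```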

Counting the number of mention depending on sentiment SQL

Apologies if this is a duplicate of anything, I wasn't finding answers which particularly did what I wanted.
I'm trying to write a SQL query which will return the count of rows which contain a positive, negative or neutral sentiment on one of the candidates in the dataset.
Here is a screenshot for reference
Sentiment is one column but the values in it define the tweet to be positive, negative, or neutral. my goal is to have the query return something like this
if anyone could give me an example on how to do this, I'd appreciate!
try using specific COUNT() functions in your query like this.
SELECT name as `Candidate Name`,
COUNT(CASE WHEN sentiment='Negative' THEN 1 END) AS `Negative`,
COUNT(CASE WHEN sentiment='Positive' THEN 1 END) AS `Positive`,
COUNT(CASE WHEN sentiment='Neutral' THEN 1 END) AS `Neutral`,
COUNT(*) AS `Total`
FROM [table]
GROUP BY candidate
I like using IF()'s or CASE WHEN's to solve this type of thing. Pivots are sometimes time consuming to think through.
SELECT
Name as CandidateName,
-- literals assumed to match the values stored in Sentiment; adjust as needed
SUM(IF(Sentiment = 'Negative', 1, 0)) as Negative,
SUM(IF(Sentiment = 'Positive', 1, 0)) as Positive,
SUM(IF(Sentiment = 'Neutral', 1, 0)) as Neutral,
COUNT(*) as Total
FROM [TABLE]
GROUP BY
Name
To use t-SQL, or to just use CASE WHEN's, that same code could look like:
SELECT
Name as CandidateName,
-- literals assumed to match the values stored in Sentiment; adjust as needed
SUM(CASE WHEN Sentiment = 'Negative' THEN 1 ELSE 0 END) as Negative,
SUM(CASE WHEN Sentiment = 'Positive' THEN 1 ELSE 0 END) as Positive,
SUM(CASE WHEN Sentiment = 'Neutral' THEN 1 ELSE 0 END) as Neutral,
COUNT(*) as Total
FROM [TABLE]
GROUP BY
Name

SSRS : Repeating table with multiple-value parameter

I'm struggling with something which is probably easy in reporting services, but could not find any online help on this.
I'm building a telephony report for my company that show statistics for our clients. The main parameters for this report are : client name, date from, date to
Each client can have several calls queues (not the same number for each) so I need to create a statistics table that shows numbers, and repeats itself for each call queue.
I created a multi-values parameter for the queues that populates itself from this query:
SELECT queue_id FROM customers WHERE customer_name = @Customer
The subreport's parameters are:
date_from = [@date_from]
date_to = [@date_to]
queueid = =Split(join(Parameters!client_call_queues.Value,","),",")
And here comes the issue: when I show the report preview, I can see the table with values, but summed for all queues, not split per queue.
If I add a custom grouping on the tablix, it returns me a wrong calculation.
To go deeper in details, here's below a first printscreen that shows the numbers for all the queues:
and the table I get from the subreport, without custom grouping:
I tried to add this custom grouping, in the subreport tablix:
And here's the result it gives with that grouping:
Here's below the query used to populate subreport tablix:
SELECT
@queueid,
ISNULL(q1.[Day],'TOTAL') AS [Day],
COUNT(*) AS [Calls In],
SUM(CASE WHEN q1.[Call Type] = 'Answered Within Threshold' THEN 1 ELSE 0 END) + SUM(CASE WHEN q1.[Call Type] = 'Answered After Threshold' THEN 1 ELSE 0 END) AS [Answered],
SUM(CASE WHEN q1.[Call Type] = 'Abandoned Within Threshold' THEN 1 ELSE 0 END) AS [Abd. Within Threshold],
SUM(CASE WHEN q1.[Call Type] IN ('Abandoned Within Threshold') THEN 1 ELSE 0 END)/CONVERT(DECIMAL(10,2),COUNT(*)) AS [Calls Abandoned Within Threshold Rate],
SUM(CASE WHEN q1.[Call Type] = 'Abandoned After Threshold' THEN 1 ELSE 0 END) AS [Abd. After Threshold],
SUM(CASE WHEN q1.[Call Type] IN ('Abandoned After Threshold') THEN 1 ELSE 0 END)/CONVERT(DECIMAL(10,2),COUNT(*)) AS [Calls Abandoned After Threshold Rate],
SUM(CASE WHEN q1.[Call Type] = 'Voicemail' THEN 1 ELSE 0 END) AS [Voice Mail]
FROM
(
SELECT
CUS.[Customer],
SUBSTRING(CONVERT(VARCHAR,ACD.[startdatetime],121),1,10) AS [Day],
ACD.[sessionid],
ACD.[contactdisposition],
ISNULL(ACD.[Squeuetime],0) AS [Squeuetime],
CUS.[ans_speed_secs],
CUS.[abd_tresh],
ACD.[businesshours],
ACD.[contacttype],
ISNULL(ACD.[connecttime],0) AS [connecttime],
ACD.[voicemail],
CUS.[ans_speed_rate],
ACD.[callednumber],
CUS.[Active],
ISNULL(ACD.[Stalktime], 0) AS [Stalktime],
ISNULL(ACD.[Sholdtime], 0) AS [Sholdtime],
CASE
WHEN ACD.[contactdisposition] = 2 AND ISNULL(ACD.[Squeuetime],0) <= CUS.[ans_speed_secs] THEN 'Answered Within Threshold'
WHEN ACD.[contactdisposition] = 2 AND ISNULL(ACD.[Squeuetime],0) > CUS.[ans_speed_secs] THEN 'Answered After Threshold'
WHEN ACD.[contactdisposition] <> 2 AND ISNULL(ACD.[Squeuetime],0) <= CUS.[abd_tresh] THEN 'Abandoned Within Threshold'
WHEN ACD.[contactdisposition] <> 2 AND ISNULL(ACD.[Squeuetime],0) > CUS.[abd_tresh] THEN 'Abandoned After Threshold'
WHEN ACD.[voicemail] = 'VoiceMail' THEN 'Voicemail'
END AS [Call Type]
FROM [ACD].[dbo].[UccxCallsQuery2] ACD
LEFT OUTER JOIN [ACD].[dbo].[Customers] CUS ON ACD.[callednumber] = CUS.[Called_id]
WHERE CUS.[ACDSelection] IN (@queueid)
AND ACD.[businesshours] <> 'NBO'
AND ACD.[contacttype] = 1
AND CUS.[Active] = 1
AND CONVERT(DATE,ACD.[startdatetime]) BETWEEN @date_from AND @date_to
) q1 GROUP BY q1.[Day] WITH ROLLUP
I guess this is only a simple thing, but I can't spot it.
Thanks in advance for any help on this !

Trying to COUNT the same table for different values

I have a table called flags from which I'm trying to extract two COUNTs.
I'd like one COUNT for the number of flags since the start of the year and a separate COUNT for this week's allocation.
The query I'm using is as follows:
SELECT
COUNT(f1.ID) AS `Total Flags`,
COUNT(f2.ID) AS `Weekly Flags`
FROM `frog_flags`.`flags` f1
LEFT JOIN `frog_flags`.`flags` f2
ON f1.`ID` = f2.`ID`
WHERE
f2.`Datetime` > '2013-07-08 00:00:00'
AND
( f1.`Staff_ID` = '12345' AND f2.`Staff_ID` = '12345')
AND
f1.`Datetime` > '2012-09-01 00:00:00'
Even though I have data in place, it's showing 0 for both the Total Flags and the Weekly Flags.
I suspect I've confused my WHERE clauses for trying to JOIN the same table twice.
Am I using my clauses incorrectly when trying to COUNT the same table for different values?
This is a cross-tab SQL query - it's a great design pattern once you get the hang of it:
SELECT
sum( case when `Datetime`> '2012-09-01 00:00:00' then 1 else 0 end) AS `Total Flags`,
sum( case when `Datetime`> '2013-07-08 00:00:00' then 1 else 0 end) AS `Weekly Flags`
FROM `frog_flags`.`flags` f1
WHERE f1.`Staff_ID` = '12345'
You use a condition to create basically boolean flags which get summed up - this allows for a number of predefined new columns instead of rows.
You could take it further and do it for all staff simultaneously:
SELECT
f1.`Staff_ID`,
sum( case when `Datetime`> '2012-09-01 00:00:00' then 1 else 0 end) AS `Total Flags`,
sum( case when `Datetime`> '2013-07-08 00:00:00' then 1 else 0 end) AS `Weekly Flags`
FROM `frog_flags`.`flags` f1
GROUP BY f1.`Staff_ID`
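Here is a minimal sketch of the per-staff version on a few invented rows, run against an in-memory SQLite database from Python so that it executes as-is. The table is cut down to the three columns the query touches, and the sample dates and the second staff ID are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flags (ID INTEGER PRIMARY KEY, Staff_ID TEXT, Datetime TEXT);
INSERT INTO flags (Staff_ID, Datetime) VALUES
  ('12345', '2012-08-15 09:00:00'),   -- before the year start: counted in neither column
  ('12345', '2012-10-01 09:00:00'),   -- this year only
  ('12345', '2013-07-09 09:00:00'),   -- this year and this week
  ('67890', '2013-07-10 09:00:00');   -- another member of staff
""")

# Each CASE yields 1 or 0 per row; SUM turns those flags into one count per column.
rows = conn.execute("""
    SELECT Staff_ID,
           SUM(CASE WHEN Datetime > '2012-09-01 00:00:00' THEN 1 ELSE 0 END) AS total_flags,
           SUM(CASE WHEN Datetime > '2013-07-08 00:00:00' THEN 1 ELSE 0 END) AS weekly_flags
    FROM flags
    GROUP BY Staff_ID
    ORDER BY Staff_ID
""").fetchall()

print(rows)   # [('12345', 2, 1), ('67890', 1, 1)]
```

A single pass over the table is enough; there is no need to join flags to itself, which is where the original query went wrong.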