I have a MySQL (MariaDB 10.3) query, which takes almost 60 seconds to run. I need to optimise this significantly, as it's frustrating users of my web app.
The query returns the name of a user then 12 columns showing how many customers they signed up, by month, who are eligible to earn commission. It then returns a further 12 columns showing how many commission entries were recorded for the user within each month. (The query needs to return in this 24-column format for compatibility reasons.)
Here's the query:
SELECT
people.full_name AS "Name",
/* Count how many unique customers are eligible for commission in each month, for a rolling 12-month window */
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-08-01" AND "2020-08-31" THEN customers.id END)) AS "eligible_customers_month_1",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-09-01" AND "2020-09-30" THEN customers.id END)) AS "eligible_customers_month_2",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-10-01" AND "2020-10-31" THEN customers.id END)) AS "eligible_customers_month_3",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-11-01" AND "2020-11-30" THEN customers.id END)) AS "eligible_customers_month_4",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-12-01" AND "2020-12-31" THEN customers.id END)) AS "eligible_customers_month_5",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-01-01" AND "2021-01-31" THEN customers.id END)) AS "eligible_customers_month_6",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-02-01" AND "2021-02-28" THEN customers.id END)) AS "eligible_customers_month_7",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-03-01" AND "2021-03-31" THEN customers.id END)) AS "eligible_customers_month_8",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-04-01" AND "2021-04-30" THEN customers.id END)) AS "eligible_customers_month_9",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-05-01" AND "2021-05-31" THEN customers.id END)) AS "eligible_customers_month_10",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-06-01" AND "2021-06-30" THEN customers.id END)) AS "eligible_customers_month_11",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-07-01" AND "2021-07-31" THEN customers.id END)) AS "eligible_customers_month_12",
/* In each month of a rolling 12-month window, count how many unique commission entries were recorded. */
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-08-01" AND "2020-08-31" THEN user_commission.id END)) AS "total_sales_1",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-09-01" AND "2020-09-30" THEN user_commission.id END)) AS "total_sales_2",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-10-01" AND "2020-10-31" THEN user_commission.id END)) AS "total_sales_3",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-11-01" AND "2020-11-30" THEN user_commission.id END)) AS "total_sales_4",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-12-01" AND "2020-12-31" THEN user_commission.id END)) AS "total_sales_5",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-01-01" AND "2021-01-31" THEN user_commission.id END)) AS "total_sales_6",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-02-01" AND "2021-02-28" THEN user_commission.id END)) AS "total_sales_7",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-03-01" AND "2021-03-31" THEN user_commission.id END)) AS "total_sales_8",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-04-01" AND "2021-04-30" THEN user_commission.id END)) AS "total_sales_9",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-05-01" AND "2021-05-31" THEN user_commission.id END)) AS "total_sales_10",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-06-01" AND "2021-06-30" THEN user_commission.id END)) AS "total_sales_11",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-07-01" AND "2021-07-31" THEN user_commission.id END)) AS "total_sales_12"
FROM users
LEFT JOIN people ON people.id = users.person_id
LEFT JOIN customers ON customers.user_id = users.id
LEFT JOIN user_commission ON user_commission.user_id = users.id
WHERE users.id NOT IN (103, 2, 155, 24, 137, 141, 143, 149, 152, 3, 135)
GROUP BY users.id
And here's the output from EXPLAIN SELECT:
id | select_type | table           | type   | possible_keys      | key        | key_len | ref             | rows | Extra
1  | SIMPLE      | users           | index  | PRIMARY            | PRIMARY    | 4       |                 | 16   | Using where
1  | SIMPLE      | people          | eq_ref | PRIMARY            | PRIMARY    | 4       | users.person_id | 1    | Using where
1  | SIMPLE      | customers       | ref    | user_id            | user_id    | 5       | users.id        | 284  | Using where
1  | SIMPLE      | user_commission | ref    | comm_index,user_id | comm_index | 4       | users.id        | 465  | Using index
comm_index is a UNIQUE index on the user_commission table, covering user_id,order_id,commission_paid_at.
I'm a bit stumped as to what to do next — there are indexes in place, and not many rows for the engine to parse per table.
Any clues would be much appreciated — thanks!
Let's start with the fact that this query runs for EVERY user (apart from the few you want to exclude; I did not include that exclusion list in my query). I would ask why you need to show sales and commission counts for all users at once. If I were a rep for your company, I would only care about how MY activities are going.
Next, this might be a good case for a pre-aggregation table of the counts per month per user, so you don't have to keep recomputing them on the fly. If the data does not change other than when a new customer is signed up or a sales commission is entered, you may be best off computing those counts at the end of every day for the user/month/year they represent. But that is an alternative.
Now, the WHY: the reason you are probably getting hit with long delays, and the reason you need COUNT( DISTINCT ) on the customer and commission tables, is that you are getting a Cartesian result. Take a scenario with 100 users. In a given month, one user has 3 new customers and 2 commissions because they are new, while a long-term rep has 37 new customers and 45 commissions. THESE are the ones killing you. Because both left joins are on user ID, every record from the customers table for a given user is joined to every record in the commission table for that same user ID. So for the first rep the join creates 6 rows to count against (3 * 2), but for the second rep it goes through 1,665 rows (37 * 45). This Cartesian (or cross-join) result is killing you.
So that is WHY it is slow. Now, on to the solution. You have hard-coded dates throughout the query. What happens when next month comes? Do you have to fix the begin/end dates by hand? If so, the solution below will simplify that as well.
By using "WITH" (a common table expression, aka CTE), you can pre-write queries and use their alias names AS IF you had written each of them as a nested subquery. The benefit is that each query is written once, even if you reference its alias several times.
So here is the query; I'll break it down afterwards so you can follow along.
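To see the fan-out concretely, here is a minimal sketch in Python, using an in-memory SQLite database as a stand-in for MariaDB. The tables are stripped down to the join column, and the row counts are the hypothetical ones from the scenario above, not real data.

```python
import sqlite3

# In-memory SQLite stands in for MariaDB; only the join column is modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE user_commission (id INTEGER PRIMARY KEY, user_id INTEGER);
""")

def add_rows(user_id, n_customers, n_commissions):
    """Insert the given number of customer and commission rows for one user."""
    conn.executemany("INSERT INTO customers (user_id) VALUES (?)",
                     [(user_id,)] * n_customers)
    conn.executemany("INSERT INTO user_commission (user_id) VALUES (?)",
                     [(user_id,)] * n_commissions)

add_rows(1, 3, 2)     # the new rep from the scenario
add_rows(2, 37, 45)   # the long-term rep

# Joining both child tables on user_id pairs every customer row with
# every commission row of the same user.
fan_out = dict(conn.execute("""
    SELECT c.user_id, COUNT(*)
    FROM customers c
    JOIN user_commission uc ON uc.user_id = c.user_id
    GROUP BY c.user_id
    ORDER BY c.user_id
""").fetchall())

print(fan_out)  # {1: 6, 2: 1665}
```

Every one of those joined rows is then evaluated by each of the 24 COUNT(DISTINCT CASE ...) expressions, so the work grows with the product of the two tables rather than their sum.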
with Rolling12 as
(
select
@rptMonth := @rptMonth +1 as QryMonth,
@beginDate as AtLeastDate,
date_add( @beginDate, interval 1 month ) as AndLessThanDate,
@beginDate := date_add( @beginDate, interval 1 month )
from
user_commission
JOIN ( select @rptMonth := 0,
@beginDate := date_sub(
date_add(
date_sub( curdate(),
interval day( curdate()) -1 day ),
interval 1 month ),
interval 1 year )
) sqlvars
limit 12
),
MinMaxDates as
(
select
min( AtLeastDate ) MinDate,
max( AndLessThanDate ) MaxDate
from
Rolling12
),
SumCommission as
(
select
uc.user_id,
coalesce( sum( CASE WHEN R12.QryMonth = 1 then 1 else 0 end ), 0) commission01,
coalesce( sum( CASE WHEN R12.QryMonth = 2 then 1 else 0 end ), 0) commission02,
coalesce( sum( CASE WHEN R12.QryMonth = 3 then 1 else 0 end ), 0) commission03,
coalesce( sum( CASE WHEN R12.QryMonth = 4 then 1 else 0 end ), 0) commission04,
coalesce( sum( CASE WHEN R12.QryMonth = 5 then 1 else 0 end ), 0) commission05,
coalesce( sum( CASE WHEN R12.QryMonth = 6 then 1 else 0 end ), 0) commission06,
coalesce( sum( CASE WHEN R12.QryMonth = 7 then 1 else 0 end ), 0) commission07,
coalesce( sum( CASE WHEN R12.QryMonth = 8 then 1 else 0 end ), 0) commission08,
coalesce( sum( CASE WHEN R12.QryMonth = 9 then 1 else 0 end ), 0) commission09,
coalesce( sum( CASE WHEN R12.QryMonth = 10 then 1 else 0 end ), 0) commission10,
coalesce( sum( CASE WHEN R12.QryMonth = 11 then 1 else 0 end ), 0) commission11,
coalesce( sum( CASE WHEN R12.QryMonth = 12 then 1 else 0 end ), 0) commission12
from
user_commission uc
JOIN Rolling12 R12
on uc.commission_paid_at >= R12.AtLeastDate
AND uc.commission_paid_at < R12.AndLessThanDate
-- only a single row returned for MinMaxDates source
JOIN MinMaxDates mm
where
uc.commission_paid_at >= mm.MinDate
AND uc.commission_paid_at < mm.MaxDate
group by
uc.user_id
),
SumCustomers as
(
select
c.user_id,
coalesce( sum( CASE WHEN R12.QryMonth = 1 then 1 else 0 end ), 0) customers01,
coalesce( sum( CASE WHEN R12.QryMonth = 2 then 1 else 0 end ), 0) customers02,
coalesce( sum( CASE WHEN R12.QryMonth = 3 then 1 else 0 end ), 0) customers03,
coalesce( sum( CASE WHEN R12.QryMonth = 4 then 1 else 0 end ), 0) customers04,
coalesce( sum( CASE WHEN R12.QryMonth = 5 then 1 else 0 end ), 0) customers05,
coalesce( sum( CASE WHEN R12.QryMonth = 6 then 1 else 0 end ), 0) customers06,
coalesce( sum( CASE WHEN R12.QryMonth = 7 then 1 else 0 end ), 0) customers07,
coalesce( sum( CASE WHEN R12.QryMonth = 8 then 1 else 0 end ), 0) customers08,
coalesce( sum( CASE WHEN R12.QryMonth = 9 then 1 else 0 end ), 0) customers09,
coalesce( sum( CASE WHEN R12.QryMonth = 10 then 1 else 0 end ), 0) customers10,
coalesce( sum( CASE WHEN R12.QryMonth = 11 then 1 else 0 end ), 0) customers11,
coalesce( sum( CASE WHEN R12.QryMonth = 12 then 1 else 0 end ), 0) customers12
from
customers c
JOIN Rolling12 R12
on c.commission_start_date >= R12.AtLeastDate
AND c.commission_start_date < R12.AndLessThanDate
-- only a single row returned for MinMaxDates source
JOIN MinMaxDates mm
where
c.commission_start_date >= mm.MinDate
AND c.commission_start_date < mm.MaxDate
group by
c.user_id
)
select
u.id,
p.full_name AS "Name",
com.Commission01,
com.Commission02,
com.Commission03,
com.Commission04,
com.Commission05,
com.Commission06,
com.Commission07,
com.Commission08,
com.Commission09,
com.Commission10,
com.Commission11,
com.Commission12,
cst.Customers01,
cst.Customers02,
cst.Customers03,
cst.Customers04,
cst.Customers05,
cst.Customers06,
cst.Customers07,
cst.Customers08,
cst.Customers09,
cst.Customers10,
cst.Customers11,
cst.Customers12
from
users u
JOIN people p
ON u.person_id = p.id
LEFT JOIN SumCommission com
on u.id = com.user_id
LEFT JOIN SumCustomers cst
on u.id = cst.user_id;
You state that you are reporting on a rolling 12-month period. For this, I have my first CTE alias, "Rolling12". This query is the setup for the rest. It creates MySQL variables and keeps computing an updated begin/end date for each month represented. It starts by taking the current date, e.g. July 6, 2021, and rolls it back to July 1. It then adds 1 month to get August 1 and subtracts 1 year to get Aug 1, 2020, the beginning of your 12-month rolling period. I then simply join to the commission table (used only as a source of rows) and limit to 12 records, each time moving forward one month, producing columns for the beginning and ending dates of each period and assigning a month sequence number to it.
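The date arithmetic described above can be checked outside the database. This is a rough Python sketch of the same window computation, not the CTE itself; the function name rolling_12_months is made up for illustration.

```python
from datetime import date

def rolling_12_months(today):
    """Return 12 tuples (month_no, at_least_date, less_than_date).

    Mirrors the description: first day of the current month, plus one
    month, minus one year, then twelve consecutive one-month windows.
    """
    # First of next month, one year back.
    if today.month < 12:
        y, m = today.year - 1, today.month + 1
    else:
        y, m = today.year, 1

    windows = []
    for month_no in range(1, 13):
        # The window ends (exclusively) on the first day of the following month.
        ny, nm = (y, m + 1) if m < 12 else (y + 1, 1)
        windows.append((month_no, date(y, m, 1), date(ny, nm, 1)))
        y, m = ny, nm
    return windows

for month_no, at_least, less_than in rolling_12_months(date(2021, 7, 6)):
    print(month_no, at_least.isoformat(), less_than.isoformat())
```

Using a half-open range (>= begin, < next begin) avoids having to know whether a month has 28, 30 or 31 days, which is what the hard-coded BETWEEN dates in the question had to encode by hand.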
If you highlight and run just the query inside WITH Rolling12 AS ( the query ), you will see what it builds. This removes all the hard-coded dates in your current 24 COUNT( DISTINCT CASE WHEN ... ) expressions.
Then comes a comma and the next CTE, MinMaxDates. Here, I query the 12-month roll to get the earliest begin date and the latest end date for the entire reporting period, so that when querying customers and commissions I can join to this single-row result to restrict the detail rows to that period.
Next are SumCommission and SumCustomers. These JOIN against the "Rolling12" rows so that each commission or customer is associated with the one date range it falls into. From that, I get the query month of the rolling 12 and sum() on it. Since sum() of a null results in null, I wrap it with coalesce( calculation, 0 ) to show 0 as a worst case.
The reason for each of these being run individually and grouped by user is to prevent the Cartesian result previously mentioned.
Once those individual parts are done, I start with the user, join to people to get the name, then LEFT JOIN to the two SUM() queries. So if a user had only a new customer in the period but no commission, there is a row in one set and not in the other, and nothing is duplicated, which removes the need for your DISTINCT in the first place.
So, even though it looks long and may be confusing, especially the WITH/CTE syntax, look at its individual parts. The SUM()s are pre-grouped by user ID, so each summary has at most one row per user for the period.
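As a small illustration of the "one row per user per summary" point, here is a sketch in Python with in-memory SQLite standing in for MariaDB. The schema is reduced to the join columns and the data is invented; the COALESCE on the outer query is an addition of this sketch, used to show 0 for a user with no rows in one of the summaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE user_commission (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users (id) VALUES (1), (2);
    INSERT INTO customers (user_id) VALUES (1), (1), (1), (2);
    INSERT INTO user_commission (user_id) VALUES (1), (1);
""")

# Each derived table is grouped by user_id first, so the outer joins
# match at most one row per user and nothing gets multiplied.
rows = conn.execute("""
    SELECT u.id,
           COALESCE(cst.n, 0) AS customers,
           COALESCE(com.n, 0) AS commissions
    FROM users u
    LEFT JOIN (SELECT user_id, COUNT(*) AS n
               FROM customers GROUP BY user_id) cst
           ON cst.user_id = u.id
    LEFT JOIN (SELECT user_id, COUNT(*) AS n
               FROM user_commission GROUP BY user_id) com
           ON com.user_id = u.id
    ORDER BY u.id
""").fetchall()

print(rows)  # [(1, 3, 2), (2, 1, 0)]
```

User 2 has a customer but no commission, so only one of the two derived tables has a row for them; the LEFT JOIN keeps the user and plain COUNT(*) is enough, with no DISTINCT.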
As for indexes to help optimize the query, I would ensure the commission and customer table have an index on ( dateField, useridField ) respectively.
I would be interested in knowing how well this performs when you give it a shot.
First of all, you read nearly all rows instead of only those from the months you are interested in.
Solution: A WHERE clause to restrict the rows taken into consideration.
Then you cross join a user's customers with the user's commissions, building a huge intermediate result you neither need nor want.
Solution: Aggregate before joining.
In order to
For instance, it can look like this:
SELECT
people.full_name AS "Name",
cu.eligible_customers_month_1,
cu.eligible_customers_month_2,
...
co.total_sales_1,
co.total_sales_2,
...
FROM users
LEFT JOIN people ON people.id = users.person_id
LEFT JOIN
(
select
user_id,
max(case when month_index = 1 then cnt else 0 end) as eligible_customers_month_1,
max(case when month_index = 2 then cnt else 0 end) as eligible_customers_month_2,
...
from
(
select
user_id,
(year(current_date) * 12 + month(current_date))
- (year(commission_start_date) * 12 + month(commission_start_date))
+ 1 as month_index,
count(*) as cnt
from customers
where commission_start_date >=
last_day(current_date) + interval 1 day - interval 1 year
group by user_id, month_index
) months
group by user_id
) cu ON cu.user_id = users.id
LEFT JOIN
(
select
user_id,
max(case when month_index = 1 then cnt else 0 end) as total_sales_1,
max(case when month_index = 2 then cnt else 0 end) as total_sales_2,
...
from
(
select
user_id,
(year(current_date) * 12 + month(current_date))
- (year(commission_paid_at) * 12 + month(commission_paid_at))
+ 1 as month_index,
count(*) as cnt
from user_commission
where commission_paid_at >=
last_day(current_date) + interval 1 day - interval 1 year
group by user_id, month_index
) months
group by user_id
) co ON co.user_id = users.id
WHERE users.id NOT IN (103, 2, 155, 24, 137, 141, 143, 149, 152, 3, 135)
ORDER BY users.id;
Recommended indexes:
create index idx1 on customers (commission_start_date, user_id);
create index idx2 on user_commission (commission_paid_at, user_id);
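The month_index expression in the query above is plain arithmetic on year * 12 + month, so it is easy to sanity-check outside SQL. This is a minimal Python sketch; the function name and the sample dates are only illustrative.

```python
from datetime import date

def month_index(today, d):
    """Same arithmetic as the SQL expression: months back from today, plus 1."""
    return (today.year * 12 + today.month) - (d.year * 12 + d.month) + 1

today = date(2021, 7, 6)
print(month_index(today, date(2021, 7, 1)))   # 1  (current month)
print(month_index(today, date(2021, 6, 30)))  # 2  (previous month)
print(month_index(today, date(2020, 8, 15)))  # 12 (oldest month in the window)
```

Note that with this formula index 1 is the current month and index 12 the oldest, whereas the column names in the question number the oldest month as 1. If the original ordering matters, 13 - month_index gives the question's numbering.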
I want to select the sum of T_No where Transactions equals R, subtract from it the sum of T_No where Transactions equals D, and require the result to be greater than zero, for a CustomerID supplied as an input (an int parameter declared in a stored procedure).
((Sum(T_No) where Transactions = R - Sum(T_No) where Transactions = D ) > 0) where CoustomerID = @input
Example: for ID = 1 it would be ((20+15) - 10) > 0
I have tried many things, but either the syntax is wrong, the value is wrong, or the query is not accepted, and I am stuck. This was my final attempt:
SELECT
(select ( select Sum(T_No) where Transactions = R) - (select Sum(T_No) where Transactions = D) as C_T )
FROM CustomerTrans WHERE C_T > 0 ;
Conditional aggregation should help:
SELECT
SUM(CASE WHEN Transaction = 'R' THEN t_no ELSE 0 END) - SUM(CASE WHEN Transaction = 'D' THEN t_no ELSE 0 END)
FROM CustomerTrans
WHERE CoustomerID = @yourCustomerIdVariable
As you're writing a sproc, you can assign the result of this to a variable and then decide what to do if the result is negative. (Personally, I would log an error, for example, rather than just hide those results.) If the result is null, there were no transactions for that customer.
PS: I used Transaction because that's what your screenshot showed, and I figured a screenshot is less likely to contain a typo than code with syntax errors. Adjust if required.
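For anyone who wants to try the conditional aggregation quickly, here is a sketch in Python with in-memory SQLite standing in for SQL Server. The rows for customer 1 are the example values from the question; the column is spelled Transactions as in the question's code, and net_total is a made-up helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerTrans (CoustomerID INTEGER, Transactions TEXT, T_No INTEGER);
    INSERT INTO CustomerTrans VALUES (1, 'R', 20), (1, 'R', 15), (1, 'D', 10);
""")

def net_total(customer_id):
    """Sum of 'R' rows minus sum of 'D' rows for one customer, or None."""
    return conn.execute("""
        SELECT SUM(CASE WHEN Transactions = 'R' THEN T_No ELSE 0 END)
             - SUM(CASE WHEN Transactions = 'D' THEN T_No ELSE 0 END)
        FROM CustomerTrans
        WHERE CoustomerID = ?
    """, (customer_id,)).fetchone()[0]

print(net_total(1))   # 25, i.e. (20 + 15) - 10
print(net_total(99))  # None: no rows for this customer, so SUM is NULL
```

The second call shows the NULL case: with no matching rows the ELSE 0 branch never runs, so both sums are NULL rather than 0.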
You were close. I would sum the same way you did; only the syntax is a bit off. You can't filter on an aggregate in WHERE, which is why you should use HAVING. Your conditional sums are also not valid syntax; use CASE WHEN instead.
Select
CoustomerID,
Sum(case when Transactions = 'R' then T_No else 0 end) -
Sum(case when Transactions = 'D' then T_No else 0 end) as C_T
FROM CustomerTrans
group by CoustomerID
having (Sum(case when Transactions = 'R' then T_No else 0 end) -
Sum(case when Transactions = 'D' then T_No else 0 end))>0
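A quick way to check the HAVING filter is to run the same statement against a couple of customers. This sketch uses Python with in-memory SQLite as a stand-in for SQL Server; customer 1 carries the example values from the question, and customer 2 is invented so that one group gets filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerTrans (CoustomerID INTEGER, Transactions TEXT, T_No INTEGER);
    INSERT INTO CustomerTrans VALUES
        (1, 'R', 20), (1, 'R', 15), (1, 'D', 10),
        (2, 'R', 5),  (2, 'D', 8);
""")

# HAVING is evaluated after grouping, so it can test the aggregated value.
positive = conn.execute("""
    SELECT CoustomerID,
           SUM(CASE WHEN Transactions = 'R' THEN T_No ELSE 0 END)
         - SUM(CASE WHEN Transactions = 'D' THEN T_No ELSE 0 END) AS C_T
    FROM CustomerTrans
    GROUP BY CoustomerID
    HAVING SUM(CASE WHEN Transactions = 'R' THEN T_No ELSE 0 END)
         - SUM(CASE WHEN Transactions = 'D' THEN T_No ELSE 0 END) > 0
    ORDER BY CoustomerID
""").fetchall()

print(positive)  # [(1, 25)]
```

Customer 2 nets out at 5 - 8 = -3 and is dropped by the HAVING clause.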
Apologies if this is a duplicate; I couldn't find answers that did quite what I wanted.
I'm trying to write a SQL query which will return the count of rows which contain a positive, negative or neutral sentiment on one of the candidates in the dataset.
Here is a screenshot for reference
Sentiment is one column, but its values mark each tweet as positive, negative, or neutral. My goal is to have the query return something like this:
If anyone could give me an example of how to do this, I'd appreciate it!
Try using conditional COUNT() expressions in your query, like this:
SELECT name as `Candidate Name`,
COUNT(CASE WHEN sentiment='Negative' THEN 1 END) AS `Negative`,
COUNT(CASE WHEN sentiment='Positive' THEN 1 END) AS `Positive`,
COUNT(CASE WHEN sentiment='Neutral' THEN 1 END) AS `Neutral`,
COUNT(*) AS `Total`
FROM [table]
GROUP BY name
I like using IF()'s or CASE WHEN's to solve this type of thing. Pivots are sometimes time consuming to think through.
SELECT
Name as CandidateName,
SUM(IF(Sentiment = 'Negative', 1, 0)) as Negative,
SUM(IF(Sentiment = 'Positive', 1, 0)) as Positive,
SUM(IF(Sentiment = 'Neutral', 1, 0)) as Neutral,
COUNT(*) as Total
FROM [TABLE]
GROUP BY
Name
In T-SQL, or if you just prefer CASE WHEN, the same code could look like this:
SELECT
Name as CandidateName,
SUM(CASE WHEN Sentiment = 'Negative' THEN 1 ELSE 0 END) as Negative,
SUM(CASE WHEN Sentiment = 'Positive' THEN 1 ELSE 0 END) as Positive,
SUM(CASE WHEN Sentiment = 'Neutral' THEN 1 ELSE 0 END) as Neutral,
COUNT(*) as Total
FROM [TABLE]
GROUP BY
Name
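Here is a small runnable sketch of the same pivot in Python, using in-memory SQLite in place of the real database. The table name tweets, the candidate names and the sentiment spellings ('Negative', 'Positive', 'Neutral') are assumptions for illustration, since the actual data is only visible in the screenshot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweets (name TEXT, sentiment TEXT);
    INSERT INTO tweets VALUES
        ('Candidate A', 'Positive'), ('Candidate A', 'Negative'),
        ('Candidate A', 'Negative'), ('Candidate B', 'Neutral'),
        ('Candidate B', 'Positive');
""")

# COUNT ignores NULLs, and a CASE without ELSE yields NULL when it
# does not match, so each column counts only its own sentiment.
pivot = conn.execute("""
    SELECT name,
           COUNT(CASE WHEN sentiment = 'Negative' THEN 1 END) AS negative,
           COUNT(CASE WHEN sentiment = 'Positive' THEN 1 END) AS positive,
           COUNT(CASE WHEN sentiment = 'Neutral'  THEN 1 END) AS neutral,
           COUNT(*) AS total
    FROM tweets
    GROUP BY name
    ORDER BY name
""").fetchall()

for row in pivot:
    print(row)
# ('Candidate A', 2, 1, 0, 3)
# ('Candidate B', 0, 1, 1, 2)
```

The SUM(CASE ... ELSE 0 END) form gives the same numbers; the COUNT form just relies on NULLs being skipped instead of adding zeros.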
I'm struggling with something that is probably easy in Reporting Services, but I could not find any help online.
I'm building a telephony report for my company that shows statistics for our clients. The main parameters for this report are: client name, date from, date to.
Each client can have several call queues (not the same number for each), so I need to create a statistics table that shows the numbers and repeats itself for each call queue.
I created a multi-value parameter for the queues that is populated from this query:
SELECT queue_id FROM customers WHERE customer_name = @Customer
The subreport's parameters are:
date_from = [@date_from]
date_to = [@date_to]
queueid = =Split(join(Parameters!client_call_queues.Value,","),",")
And here is the issue: when I preview the report, I can see the table with values, but summed for all queues, not split per queue.
If I add a custom grouping on the tablix, it returns a wrong calculation.
To go into more detail, here is a first screenshot that shows the numbers for all the queues:
and the table I get from the subreport, without custom grouping:
I tried to add this custom grouping, in the subreport tablix:
And here's the result it gives with that grouping:
Here is the query used to populate the subreport tablix:
SELECT
@queueid,
ISNULL(q1.[Day],'TOTAL') AS [Day],
COUNT(*) AS [Calls In],
SUM(CASE WHEN q1.[Call Type] = 'Answered Within Threshold' THEN 1 ELSE 0 END) + SUM(CASE WHEN q1.[Call Type] = 'Answered After Threshold' THEN 1 ELSE 0 END) AS [Answered],
SUM(CASE WHEN q1.[Call Type] = 'Abandoned Within Threshold' THEN 1 ELSE 0 END) AS [Abd. Within Threshold],
SUM(CASE WHEN q1.[Call Type] IN ('Abandoned Within Threshold') THEN 1 ELSE 0 END)/CONVERT(DECIMAL(10,2),COUNT(*)) AS [Calls Abandoned Within Threshold Rate],
SUM(CASE WHEN q1.[Call Type] = 'Abandoned After Threshold' THEN 1 ELSE 0 END) AS [Abd. After Threshold],
SUM(CASE WHEN q1.[Call Type] IN ('Abandoned After Threshold') THEN 1 ELSE 0 END)/CONVERT(DECIMAL(10,2),COUNT(*)) AS [Calls Abandoned After Threshold Rate],
SUM(CASE WHEN q1.[Call Type] = 'Voicemail' THEN 1 ELSE 0 END) AS [Voice Mail]
FROM
(
SELECT
CUS.[Customer],
SUBSTRING(CONVERT(VARCHAR,ACD.[startdatetime],121),1,10) AS [Day],
ACD.[sessionid],
ACD.[contactdisposition],
ISNULL(ACD.[Squeuetime],0) AS [Squeuetime],
CUS.[ans_speed_secs],
CUS.[abd_tresh],
ACD.[businesshours],
ACD.[contacttype],
ISNULL(ACD.[connecttime],0) AS [connecttime],
ACD.[voicemail],
CUS.[ans_speed_rate],
ACD.[callednumber],
CUS.[Active],
ISNULL(ACD.[Stalktime], 0) AS [Stalktime],
ISNULL(ACD.[Sholdtime], 0) AS [Sholdtime],
CASE
WHEN ACD.[contactdisposition] = 2 AND ISNULL(ACD.[Squeuetime],0) <= CUS.[ans_speed_secs] THEN 'Answered Within Threshold'
WHEN ACD.[contactdisposition] = 2 AND ISNULL(ACD.[Squeuetime],0) > CUS.[ans_speed_secs] THEN 'Answered After Threshold'
WHEN ACD.[contactdisposition] <> 2 AND ISNULL(ACD.[Squeuetime],0) <= CUS.[abd_tresh] THEN 'Abandoned Within Threshold'
WHEN ACD.[contactdisposition] <> 2 AND ISNULL(ACD.[Squeuetime],0) > CUS.[abd_tresh] THEN 'Abandoned After Threshold'
WHEN ACD.[voicemail] = 'VoiceMail' THEN 'Voicemail'
END AS [Call Type]
FROM [ACD].[dbo].[UccxCallsQuery2] ACD
LEFT OUTER JOIN [ACD].[dbo].[Customers] CUS ON ACD.[callednumber] = CUS.[Called_id]
WHERE CUS.[ACDSelection] IN (@queueid)
AND ACD.[businesshours] <> 'NBO'
AND ACD.[contacttype] = 1
AND CUS.[Active] = 1
AND CONVERT(DATE,ACD.[startdatetime]) BETWEEN @date_from AND @date_to
) q1 GROUP BY q1.[Day] WITH ROLLUP
I guess this is only a simple thing, but I can't spot it.
Thanks in advance for any help on this!