I have a MySQL (MariaDB 10.3) query, which takes almost 60 seconds to run. I need to optimise this significantly, as it's frustrating users of my web app.
The query returns the name of a user then 12 columns showing how many customers they signed up, by month, who are eligible to earn commission. It then returns a further 12 columns showing how many commission entries were recorded for the user within each month. (The query needs to return in this 24-column format for compatibility reasons.)
Here's the query:
SELECT
people.full_name AS "Name",
/* Count how many unique customers are eligible for commission in each month, for a rolling 12-month window */
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-08-01" AND "2020-08-31" THEN customers.id END)) AS "eligible_customers_month_1",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-09-01" AND "2020-09-30" THEN customers.id END)) AS "eligible_customers_month_2",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-10-01" AND "2020-10-31" THEN customers.id END)) AS "eligible_customers_month_3",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-11-01" AND "2020-11-30" THEN customers.id END)) AS "eligible_customers_month_4",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-12-01" AND "2020-12-31" THEN customers.id END)) AS "eligible_customers_month_5",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-01-01" AND "2021-01-31" THEN customers.id END)) AS "eligible_customers_month_6",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-02-01" AND "2021-02-28" THEN customers.id END)) AS "eligible_customers_month_7",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-03-01" AND "2021-03-31" THEN customers.id END)) AS "eligible_customers_month_8",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-04-01" AND "2021-04-30" THEN customers.id END)) AS "eligible_customers_month_9",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-05-01" AND "2021-05-31" THEN customers.id END)) AS "eligible_customers_month_10",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-06-01" AND "2021-06-30" THEN customers.id END)) AS "eligible_customers_month_11",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-07-01" AND "2021-07-31" THEN customers.id END)) AS "eligible_customers_month_12",
/* In each month of a rolling 12-month window, count how many unique commission entries were recorded. */
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-08-01" AND "2020-08-31" THEN user_commission.id END)) AS "total_sales_1",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-09-01" AND "2020-09-30" THEN user_commission.id END)) AS "total_sales_2",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-10-01" AND "2020-10-31" THEN user_commission.id END)) AS "total_sales_3",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-11-01" AND "2020-11-30" THEN user_commission.id END)) AS "total_sales_4",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-12-01" AND "2020-12-31" THEN user_commission.id END)) AS "total_sales_5",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-01-01" AND "2021-01-31" THEN user_commission.id END)) AS "total_sales_6",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-02-01" AND "2021-02-28" THEN user_commission.id END)) AS "total_sales_7",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-03-01" AND "2021-03-31" THEN user_commission.id END)) AS "total_sales_8",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-04-01" AND "2021-04-30" THEN user_commission.id END)) AS "total_sales_9",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-05-01" AND "2021-05-31" THEN user_commission.id END)) AS "total_sales_10",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-06-01" AND "2021-06-30" THEN user_commission.id END)) AS "total_sales_11",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-07-01" AND "2021-07-31" THEN user_commission.id END)) AS "total_sales_12"
FROM users
LEFT JOIN people ON people.id = users.person_id
LEFT JOIN customers ON customers.user_id = users.id
LEFT JOIN user_commission ON user_commission.user_id = users.id
WHERE users.id NOT IN (103, 2, 155, 24, 137, 141, 143, 149, 152, 3, 135)
GROUP BY users.id
And here's the output from EXPLAIN SELECT:
id
select_type
table
type
possible_keys
key
key_len
ref
rows
Extra
1
SIMPLE
users
index
PRIMARY
PRIMARY
4
16
Using where
1
SIMPLE
people
eq_ref
PRIMARY
PRIMARY
4
users.person_id
1
Using where
1
SIMPLE
customers
ref
user_id
user_id
5
users.id
284
Using where
1
SIMPLE
user_commission
ref
comm_index,user_id
comm_index
4
users.id
465
Using index
comm_index is a UNIQUE index on the user_commission table, covering user_id,order_id,commission_paid_at.
I'm a bit stumped as to what to do next — there are indexes in place, and not many rows for the engine to parse per table.
Any clues would be much appreciated — thanks!
Lets first start that this query going for EVERY user (with the few exceptions you want to EXCLUDE -- I did not include that exclusion list in my query ), I would ask why are you trying to show sales and commission counts for all users to see how all users are doing. I would think that if I was a rep for your company, I only care about how MY activities are going.
Next, this might be a good instance to suggest a pre-aggregation table of the counts per month per user so you dont have to keep re-trying to compute on the fly. If the data does not change such as when a new customer is signed-up, or a sales commission is entered, you may be best to keep those computed at the end of every day for the given user/month/year it represents. But that too is an alternative.
Now, the WHY you are probably getting hit with large delay times, and you are using COUNT( DISTINCT ) on the given customer and commission tables is you are getting a Cartesian result. So, lets go with a scenario you have 100 users. Of those users, in a given month, one user has 3 new customers, 2 commissions because they are new. Yet a long-term rep has 37 new customers and 45 commissions. THESE are the ones killing you. Because your left-join is on user ID, it is taking 1 record from the customers table for a given user and joining that to the commission table for the same user id the sale recorded against.. So the first rep it creates 6 entries to count against (3 * 2). But the second user goes through 1,665 iterations. So, this Cartesian (or cross-join) result is killing you.
So that is the WHY its failing. Now, on to the solution I have for you. You appear to have a bunch of hard-coded dates left-and-right through the code. What happens when next month comes. Do you have to hard-code fix the begin/end dates? If so, then the solution I have for you will simplify that all.
By using the "WITH" (Common-Table-Expression aka CTE), you can pre-write queries and use those "aliase" names AS-IF you wrote each of the queries within a multi-nested query. But the benefit is the query is written once, even if you keep re-using the alias name reference.
So here is the query and I'll describe / break it down next so you can view/follow along.
with Rolling12 as
(
select
#rptMonth := #rptMonth +1 as QryMonth,
#beginDate as AtLeastDate,
date_add( #beginDate, interval 1 month ) as AndLessThanDate,
#beginDate := date_add( #beginDate, interval 1 month )
from
user_commission
JOIN ( select #rptMonth := 0,
#beginDate := date_sub(
date_add(
date_sub( curdate(),
interval day( curdate()) -1 day ),
interval 1 month ),
interval 1 year )
) sqlvars
limit 12
),
MinMaxDates as
(
select
min( AtLeastDate ) MinDate,
max( AndLessThanDate ) MaxDate
from
Rolling12
),
SumCommission as
(
select
uc.user_id,
coalesce( sum( CASE WHEN R12.QryMonth = 1 then 1 else 0 end ), 0) commission01,
coalesce( sum( CASE WHEN R12.QryMonth = 2 then 1 else 0 end ), 0) commission02,
coalesce( sum( CASE WHEN R12.QryMonth = 3 then 1 else 0 end ), 0) commission03,
coalesce( sum( CASE WHEN R12.QryMonth = 4 then 1 else 0 end ), 0) commission04,
coalesce( sum( CASE WHEN R12.QryMonth = 5 then 1 else 0 end ), 0) commission05,
coalesce( sum( CASE WHEN R12.QryMonth = 6 then 1 else 0 end ), 0) commission06,
coalesce( sum( CASE WHEN R12.QryMonth = 7 then 1 else 0 end ), 0) commission07,
coalesce( sum( CASE WHEN R12.QryMonth = 8 then 1 else 0 end ), 0) commission08,
coalesce( sum( CASE WHEN R12.QryMonth = 9 then 1 else 0 end ), 0) commission09,
coalesce( sum( CASE WHEN R12.QryMonth = 10 then 1 else 0 end ), 0) commission10,
coalesce( sum( CASE WHEN R12.QryMonth = 11 then 1 else 0 end ), 0) commission11,
coalesce( sum( CASE WHEN R12.QryMonth = 12 then 1 else 0 end ), 0) commission12
from
user_commission uc
JOIN Rolling12 R12
on uc.commission_paid_at >= R12.AtLeastDate
AND uc.commission_paid_at < R12.AndLessThanDate
-- only a single row returned for MinMaxDates source
JOIN MinMaxDates mm
where
uc.commission_paid_at >= mm.MinDate
AND uc.commission_paid_at < mm.MaxDate
group by
uc.user_id
),
SumCustomers as
(
select
c.user_id,
coalesce( sum( CASE WHEN R12.QryMonth = 1 then 1 else 0 end ), 0) customers01,
coalesce( sum( CASE WHEN R12.QryMonth = 2 then 1 else 0 end ), 0) customers02,
coalesce( sum( CASE WHEN R12.QryMonth = 3 then 1 else 0 end ), 0) customers03,
coalesce( sum( CASE WHEN R12.QryMonth = 4 then 1 else 0 end ), 0) customers04,
coalesce( sum( CASE WHEN R12.QryMonth = 5 then 1 else 0 end ), 0) customers05,
coalesce( sum( CASE WHEN R12.QryMonth = 6 then 1 else 0 end ), 0) customers06,
coalesce( sum( CASE WHEN R12.QryMonth = 7 then 1 else 0 end ), 0) customers07,
coalesce( sum( CASE WHEN R12.QryMonth = 8 then 1 else 0 end ), 0) customers08,
coalesce( sum( CASE WHEN R12.QryMonth = 9 then 1 else 0 end ), 0) customers09,
coalesce( sum( CASE WHEN R12.QryMonth = 10 then 1 else 0 end ), 0) customers10,
coalesce( sum( CASE WHEN R12.QryMonth = 11 then 1 else 0 end ), 0) customers11,
coalesce( sum( CASE WHEN R12.QryMonth = 12 then 1 else 0 end ), 0) customers12
from
customers c
JOIN Rolling12 R12
on c.commission_start_date >= R12.AtLeastDate
AND c.commission_start_date < R12.AndLessThanDate
-- only a single row returned for MinMaxDates source
JOIN MinMaxDates mm
where
c.commission_start_date >= mm.MinDate
AND c.commission_start_date < mm.MaxDate
group by
c.user_id
)
select
u.id,
p.full_name AS "Name",
com.Commission01,
com.Commission02,
com.Commission03,
com.Commission04,
com.Commission05,
com.Commission06,
com.Commission07,
com.Commission08,
com.Commission09,
com.Commission10,
com.Commission11,
com.Commission12,
cst.Customers01,
cst.Customers02,
cst.Customers03,
cst.Customers04,
cst.Customers05,
cst.Customers06,
cst.Customers07,
cst.Customers08,
cst.Customers09,
cst.Customers10,
cst.Customers11,
cst.Customers12
from
users u
JOIN People p
ON u.person_id = p.id
LEFT JOIN SumCommission com
on u.id = com.user_id
LEFT JOIN SumCustomers cst
on u.id = cst.user_id;
You state that you are running on a rolling 12-month period. For this, I have my first CTE alias "Rolling12". This query is a setup for the rest of the query. It creates MySQL variables and keeps computing an updated begin/end date for each month represented. It starts by taking the current date ex: July 6 and rolls it back to July 1. Then adds 1 month to get August 1, then subtracts 1 year from that Aug 1, 2020 for the beginning period of your 12-month rolling computation. I then simple join to the commission table and limit to 12 records, each time going forward and making a column for the beginning and ending dates of the pay periods and just assigning a month ID sequence to it.
If you highlight and just run the query inside the With Rolling12 as ( the query ), you will see what it builds out. This prevents all the hard-coding dates associated with your current 24 case/count distinct when conditions.
Then a comma and the next CTE for MinMaxDates. Here, I am querying from this 12-month roll to get the minimum begin and end date for the entire period you are reporting, so when querying the sales customers and commissions, I can join to this as a single row result for the begin/end dates of details.
Next are the SumCommission and SumCustomers. These are joining against the CTE "Rolling12" records with the JOIN so we can associate the specific commission or customer to that one date range entry. So from that, I get the query month of the rolling 12 and sum() it. But since sum() of a null results in null, I wrap it with coalesce( calculation, 0 ) to show 0 as a worst-case.
The reason for each of these being run individually and grouped by user is to prevent the Cartesian result previously mentioned.
Once those individual parts are all done, I now start with the user, join to people to get the name, then LEFT-JOIN to the respective other SUM() queries. So, if a user had only a new customer for a month, but no commission, you would only have a record in that set and not the other, thus preventing the duplication of query results requiring your DISTINCT to begin with.
So, even though it looks long and may be confusing, especially the WITH CTE context, look at it to its individual parts. The SUMs() are pre-grouped by user ID, so each sum() result will only have one possible record per user for that given period.
As for indexes to help optimize the query, I would ensure the commission and customer table have an index on ( dateField, useridField ) respectively.
I would be interested in knowing how well this performs when you give it a shot.
First of all, you select about all rows instead of only the months you are interested in.
Solution: A WHERE clause to restrict the rows taken into consideration.
Then you cross join a user's customers with the user's commissions, thus building a huge intermediate result you don't need and want.
Solution: Aggregate before joining.
In order to
This can look thus for instance:
SELECT
people.full_name AS "Name",
cu.eligible_customers_month_1,
cu.eligible_customers_month_2,
...
co.total_sales_1,
co.total_sales_2,
...
FROM users
LEFT JOIN people ON people.id = users.person_id
LEFT JOIN
(
select
user_id,
max(case when month_index = 1 then cnt else 0 end) as eligible_customers_month_1,
max(case when month_index = 2 then cnt else 0 end) as eligible_customers_month_2,
...
from
(
select
user_id,
(year(current_date) * 12 + month(current_date))
- (year(commission_start_date) * 12 + month(commission_start_date))
+ 1 as month_index,
count(*) as cnt
from customers
where commission_start_date >=
last_day(current_date) + interval 1 day - interval 1 year
group by user_id, month_num
) months
group by user_id
) cu ON cu.user_id = users.id
LEFT JOIN
(
(
select
user_id,
max(case when month_index = 1 then cnt else 0 end) as total_sales_1,
max(case when month_index = 2 then cnt else 0 end) as total_sales_2,
...
from
select
user_id,
(year(current_date) * 12 + month(current_date))
- (year(commission_paid_at) * 12 + month(commission_paid_at))
+ 1 as month_index,
count(*) as cnt
from user_commission
where commission_paid_at >=
last_day(current_date) + interval 1 day - interval 1 year
group by user_id, month_num
) months
group by user_id
) co ON co.user_id = users.id
WHERE users.id NOT IN (103, 2, 155, 24, 137, 141, 143, 149, 152, 3, 135)
ORDER BY users.id;
Recommended indexes:
create index idx1 on customers (commission_start_date, user_id);
create index idx2 on user_commission (commission_paid_at, user_id);
Can anyone help me to understand this big sql query. How do I break it down in small chunks to understand it ?
select t.Instrument as Instrument ,ClearingId as ClearingId,
ISNULL( PrevQty ,0) AS PrevQty,SettlePrice,
ISNULL(TodayBuyQty,0) as TodayBuyQty,
ISNULL( TodaySellQty ,0) AS TodaySellQty,
ISNULL(PrevQty +TodayBuyQty-TodaySellQty,0) as NetQty,
TodayBuyPrice, TodaySellPrice,LTP,PnL,Token
from
(
select Instrument,w.ClearingId as ClearingId,
ISNULL( PrevQty ,0) AS PrevQty,ISNULL(TodayBuyQty,0) as TodayBuyQty,
ISNULL( TodaySellQty ,0) AS TodaySellQty,
TodayAvgBuyPrice as TodayBuyPrice,TodayAvgSellPrice as TodaySellPrice,
LTP,PnL,w.Token
from
(
select Symbol as Instrument, ClearingId,
ISNULL(TodayBuyQty,0) as TodayBuyQty,
TodayAvgBuyPrice,
ISNULL( -TodaySellQty ,0) AS TodaySellQty,
TodayAvgSellPrice, NULL as LTP ,NULL as PnL ,
w.Token as Token
from
(
select Token, sum(Qty) as NetPositionQty, ClearingId,
sum(CASE WHEN Qty < 0 THEN Qty ELSE 0 END) as TodaySellQty,
sum(CASE WHEN Qty > 0 THEN Qty ELSE 0 END) as TodayBuyQty,
sum(CASE WHEN Qty < 0 THEN Qty * Price ELSE 0 END)
/
NULLIF(sum(CASE WHEN Qty < 0 THEN Qty ELSE 0 END), 0) as TodayAvgSellPrice,
sum(CASE WHEN Qty > 0 THEN Qty * Price ELSE 0 END)
/
NULLIF(sum(CASE WHEN Qty > 0 THEN Qty ELSE 0 END), 0) as TodayAvgBuyPrice
from
(
select m.Token,
(CASE WHEN ClearingId = 'SATP' THEN 'STRAITS' ELSE CASE WHEN ClearingId = 'BATP' THEN 'BPI' ELSE 'UOB' END END ) as ClearingId ,
Price/CAST(Multiplier AS float ) as Price,Qty
from
(
select Token , Exchange as ClearingId ,
LastTradePrice as Price ,
CASE WHEN Side = 'S' THEN -LastTradeQuantity ELSE LastTradeQuantity END as Qty
from Strategy_Orders
where ExchangeStatus in (9,10) )m
JOIN TokenMap t ON ( m.Token = t.Token)
UNION ALL
select m.Token, (CASE WHEN ClearingId = 'SATP' THEN 'STRAITS' ELSE CASE WHEN ClearingId = 'BATP' THEN 'BPI' ELSE 'UOB' END END ) as ClearingId ,
Price/CAST(Multiplier AS float ) as Price,
Qty
from
(
select Token , Exchange as ClearingId ,
LastTradePrice as Price ,
CASE WHEN Side = 'S' THEN -LastTradeQuantity ELSE LastTradeQuantity END as Qty
from Manual_Orders
where ExchangeStatus in (9,10) )m
JOIN TokenMap t ON ( m.Token = t.Token)
UNION ALL
select Token , ClearingId , TodayBuyPrice ,
TodayBuyQty as Qty
from EOD_Holdings
where CurrentDate =
( select top 1 CurrentDAte from EOD_Holdings
order by CurrentDAte desc
)
UNION ALL
select Token , ClearingId , TodaySellPrice ,
TodaySellQty as Qty
from EOD_Holdings
where CurrentDate = (
select top 1 CurrentDAte from EOD_Holdings
order by CurrentDAte desc
)
) x
group by Token,ClearingId) w
JOIN(select Token, Symbol from TokenMAp ) h on w.Token = h.Token
) w
FULL OUTER JOIN(
select Token, PrevQty , ClearingId
from EOD_Holdings
where CurrentDate = ( select top 1 CurrentDAte from EOD_Holdings order by CurrentDAte desc
)
) h
on w.Token = h.Token and w.ClearingId = h.ClearingId
)t
JOIN (
select * from LatestSettlePrices
) sp
on t.Instrument = sp.Instrument
You can break the query into chunks by looking at each subquery ("select ..." ) separately and running them to see the results. You need to start with the innermost queries that do not have other select statements in the "from" or "where" clause.
Also note that this query does not seem to be a clear, neither an optimal solution.
You would want to avoid using full outer joins and union all statements for the best performance.
How do I create an entirely new column in SQL while using CASE WHEN, GROUPING_ID(), and ROLLUP() syntax?
So far I have tried:
SELECT Country, ContactTitle, COUNT(ContactTitle) AS Count
,CASE(
WHEN
GROUPING_ID(Legend) = 0 THEN ' '
WHEN
GROUPING_ID(Legend) = 1 THEN 'SUBTOTAL(Country)')
GROUP BY ROLLUP(Country, ContactTitle)
FROM dbo.Customers
SELECT Country, ContactTitle, COUNT(ContactTitle) AS Count,
CASE
WHEN GROUPING_ID(Country,ContactTitle) = 0 THEN ''
WHEN GROUPING_ID(Country,ContactTitle) = 1 THEN CONCAT('Subtotal for ',Country)
END AS Legend
FROM dbo.Customers
GROUP BY ROLLUP(Country, ContactTitle);
A CASE needs an END.
And while testing, you could output the GROUPING_ID or GROUPING to understand what they return.
SELECT Country, ContactTitle, COUNT(*) AS Count,
(CASE
WHEN GROUPING_ID(Country, ContactTitle) = 1
THEN CONCAT('SUBTOTAL(',Country,')')
WHEN GROUPING(Country) = 1
THEN 'TOTAL'
ELSE ' '
END) AS Legend
FROM Customers
GROUP BY Country, ContactTitle WITH ROLLUP;
i have the following query, tried using case statement
SELECT t.inst_id, t.inst_username, tcm.city_name,
SUM(CASE WHEN psb.pms_student_bucket_id IS NULL THEN 1 ELSE 0 END) AS not_assiged,
SUM(CASE WHEN COUNT(psb.pms_student_bucket_id) BETWEEN 1 AND 50 THEN 1 ELSE 0 END) AS '1-50',
SUM(CASE WHEN COUNT(psb.pms_student_bucket_id) > 50 THEN 1 ELSE 0 END) AS ' > 50'
FROM tbl_si_di t
JOIN tbl_city_master tcm ON tcm.city_id = t.city_ref_id
JOIN tbl_si_students tss ON tss.inst_ref_id = t.inst_id
LEFT JOIN pms_student_bucket psb ON psb.user_ref_id = tss.user_ref_id
GROUP BY t.inst_id;
I need SUM of pms_student_bucket_id column when their COUNT is '1-50' and '>50'. Right now this query is saying Invalid use of group function.
How would I SUM on COUNT of "pms_student_bucket_id" equals/between some value in mysql?
you could put it in a subquery
SELECT inst_id, inst_username, city_name,
SUM(pms_student_bucket_id IS NULL ) AS not_assiged,
SUM(num_bucket >=10 AND num_bucket <= 20) AS '10 - 20'
SUM(num_bucket <= 50) AS '1-50',
SUM(num_bucket > 50) AS ' > 50'
FROM
( SELECT t.inst_id, t.inst_username, tcm.city_name,psb.pms_student_bucket_id,
COUNT(psb.pms_student_bucket_id) as num_bucket
FROM tbl_si_di t
JOIN tbl_city_master tcm ON tcm.city_id = t.city_ref_id
JOIN tbl_si_students tss ON tss.inst_ref_id = t.inst_id
LEFT JOIN pms_student_bucket psb ON psb.user_ref_id = tss.user_ref_id
GROUP BY t.inst_id
)t1
GROUP BY inst_id;
note you can use the boolean values returned to do what you want.. aka you don't need a case statement, the boolean value returns a 1 or 0 which can then be summed..