I am trying to optimize a query that runs against a couple of very large data sets. The current query takes a while to process even a couple of days' worth of data, and it is intended to pull monthly data.
My question is: what is the best way to pull this off? (I have used one of the data sets in this example, but keep in mind there would be three with basically the same structure, all in the same query.)
select u.id, u.name,
count(distinct(case when sales.tag='event' then sales.id end)) as
eventsales,
count(distinct(case when sales.tag='onpremise' then sales.id end))
as onpresales,
count(distinct(case when sales.tag='offpremsales' then sales.id
end)) as offpresales,
count(distinct(case when sales.fulfillment='yes' and
sales.premise='on' then sales.id end)) as fullonsales,
count(distinct(case when sales.fulfillment='no' and
sales.premise='on' then sales.id end)) as fulloffsales
from users u
left join (
select * from sales where org='XXXX' and invoicedate BETWEEN '2018-04-01'
and '2018-04-10' and status='active'
) sales on sales.user=u.id
where u.status='active'
group by u.org, u.id, u.name
order by u.team
I am no pro, still learning, but is this optimal to perform?
Would it be better to ditch the derived table and utilize 5 subqueries?
Since there are only a couple subtle changes in each, should I create multiple derived tables instead?
Also, my index used in this example would be sales table: org, invoicedate, status
However, from my research, MySQL does not use indexes on derived tables. Is this accurate?
Thanks in advance, let me know if I need to provide any other info.
Entire actual query below
select t.name as team, u.name as "REP NAME",
count(distinct activity.id) as "TOTAL VISITS",
count(distinct activity.account_id) as "UNIQUE VISITS",
ROUND((select sum(s.volumece) from lpmysqldb.sales s where
s.org_id='555b918ae4b07b6ac5050852' and s.account_id IN (select
account_id from lpmysqldb.activity where
org_id='555b918ae4b07b6ac5050852' and user_id=u.id and
(completed_at between '2018-04-01' and '2018-04-04') and
tag='visit' and accountname is not null and (status='active' or
status='true' or status='1')) and (s.invoice_date between
DATE_FORMAT(CURDATE(), '%Y-01-01') and
DATE_FORMAT(CURDATE(), '%Y-%m-%d'))),2) as "CURRENT YEAR VOLUME",
ROUND((select sum(s.volumece) from lpmysqldb.sales s where
s.org_id='555b918ae4b07b6ac5050852' and s.account_id IN (select
account_id from lpmysqldb.activity where
org_id='555b918ae4b07b6ac5050852' and user_id=u.id and
(completed_at between '2018-04-01' and '2018-04-04') and
tag='visit' and accountname is not null and (status='active' or
status='true' or status='1')) and (s.invoice_date between
(DATE_FORMAT(CURDATE(), '%Y-01-01') - INTERVAL 1 YEAR) and
(DATE_FORMAT(CURDATE(), '%Y-%m-%d') - INTERVAL 1 YEAR))),2) as
"PREVIOUS YEAR VOLUME",
count(distinct placement.id) as "COMMITMENTS ADDED",
CASE WHEN
count(distinct activity.account_id) = 0 THEN (count(distinct
placement.id) / 1)
else (cast(count(distinct placement.id) as decimal(10,2)) /
cast(count(distinct activity.account_id) as decimal(10,2)))
END as "UNIQUE VISIT TO COMMITMENT %",
CASE WHEN o.mode='basic' then count(distinct placement.id) else
count(distinct(case when placement.commitmentstatus='fullfilled'
then placement.id end))
END as "COMMITMENTS FULFILLED",
CASE WHEN o.mode='basic' then 1 else
(CASE WHEN
count(distinct placement.id) = 0 THEN (count(distinct(case when
placement.commitmentstatus='fullfilled' then placement.id end)) /
1)
else (cast(count(distinct(case when
placement.commitmentstatus='fullfilled' then placement.id end))
as decimal(10,2)) / cast(count(distinct placement.id) as
decimal(10,2)))
end)
END as "COMMITMENT TO FULFILLMENT %",
CASE WHEN o.mode='basic' then count(distinct placement.id) else
count(distinct(case when placement.commitmentstatus='fullfilled'
AND (premise = 1 or premise IS NULL) then placement.id end))
END as "ON PREM COMMITMENTS FULFILLED",
CASE WHEN o.mode='basic' then count(distinct placement.id) else
count(distinct(case when placement.commitmentstatus='fullfilled'
AND premise = 0 then placement.id end))
END as "OFF PREM COMMITMENTS FULFILLED",
CASE WHEN o.mode='basic' then count(distinct placement.id) else
count(distinct(case when placement.commitmentstatus='fullfilled'
AND ispackage IN ('1','true','active') then placement.id end))
END as "PACKAGE COMMITMENTS FULFILLED",
CASE WHEN o.mode='basic' then count(distinct placement.id) else
count(distinct(case when placement.commitmentstatus='fullfilled'
AND isdraft IN ('1','true','active') then placement.id end))
END as "DRAFT COMMITMENTS FULFILLED",
(select count(distinct id) from lpmysqldb.activity where
org_id='555b918ae4b07b6ac5050852' and user_id=u.id and
(completed_at between '2018-04-01' and '2018-04-04') and
activity_name IN ('Display','Floor Display') and (activity.status
IN ('1','active','true','') OR activity.status IS NULL)) as
"DISPLAYS BUILT",
(select count(distinct id) from lpmysqldb.activity where
org_id='555b918ae4b07b6ac5050852' and user_id=u.id and
(completed_at between '2018-04-01' and '2018-04-04') and
tag='event' and (activity.status IN ('1','active','true','') OR
activity.status IS NULL)) as "EVENTS"
from lpmysqldb.users u
left join lpmysqldb.teams t on t.team_id=u.team_id
left join lpmysqldb.organizations o on o.id=t.org_id
left join (select * from lpmysqldb.activity where
org_id='555b918ae4b07b6ac5050852' and (completed_at between
'2018-04-01' and '2018-04-04') and tag='visit' and accountname is not
null and (status IN ('1','active','true','') OR status IS NULL))
activity on activity.user_id=u.id
left join (select * from lpmysqldb.placements where
orgid='555b918ae4b07b6ac5050852' and (placementdate between
'2018-04-01' and '2018-04-04') and (status IN ('1','active','true','') OR
status IS NULL)) placement on placement.userid=u.id
where u.org_id='555b918ae4b07b6ac5050852'
and u.status IN ('active','true','1')
and istestuser!='1'
group by u.org_id, t.name, u.id, u.name, o.mode
order by t.name asc, count(distinct activity.id) desc
I have a MySQL (MariaDB 10.3) query, which takes almost 60 seconds to run. I need to optimise this significantly, as it's frustrating users of my web app.
The query returns the name of a user then 12 columns showing how many customers they signed up, by month, who are eligible to earn commission. It then returns a further 12 columns showing how many commission entries were recorded for the user within each month. (The query needs to return in this 24-column format for compatibility reasons.)
Here's the query:
SELECT
people.full_name AS "Name",
/* Count how many unique customers are eligible for commission in each month, for a rolling 12-month window */
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-08-01" AND "2020-08-31" THEN customers.id END)) AS "eligible_customers_month_1",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-09-01" AND "2020-09-30" THEN customers.id END)) AS "eligible_customers_month_2",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-10-01" AND "2020-10-31" THEN customers.id END)) AS "eligible_customers_month_3",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-11-01" AND "2020-11-30" THEN customers.id END)) AS "eligible_customers_month_4",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2020-12-01" AND "2020-12-31" THEN customers.id END)) AS "eligible_customers_month_5",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-01-01" AND "2021-01-31" THEN customers.id END)) AS "eligible_customers_month_6",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-02-01" AND "2021-02-28" THEN customers.id END)) AS "eligible_customers_month_7",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-03-01" AND "2021-03-31" THEN customers.id END)) AS "eligible_customers_month_8",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-04-01" AND "2021-04-30" THEN customers.id END)) AS "eligible_customers_month_9",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-05-01" AND "2021-05-31" THEN customers.id END)) AS "eligible_customers_month_10",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-06-01" AND "2021-06-30" THEN customers.id END)) AS "eligible_customers_month_11",
COUNT(DISTINCT(CASE WHEN customers.commission_start_date BETWEEN "2021-07-01" AND "2021-07-31" THEN customers.id END)) AS "eligible_customers_month_12",
/* In each month of a rolling 12-month window, count how many unique commission entries were recorded. */
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-08-01" AND "2020-08-31" THEN user_commission.id END)) AS "total_sales_1",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-09-01" AND "2020-09-30" THEN user_commission.id END)) AS "total_sales_2",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-10-01" AND "2020-10-31" THEN user_commission.id END)) AS "total_sales_3",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-11-01" AND "2020-11-30" THEN user_commission.id END)) AS "total_sales_4",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2020-12-01" AND "2020-12-31" THEN user_commission.id END)) AS "total_sales_5",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-01-01" AND "2021-01-31" THEN user_commission.id END)) AS "total_sales_6",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-02-01" AND "2021-02-28" THEN user_commission.id END)) AS "total_sales_7",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-03-01" AND "2021-03-31" THEN user_commission.id END)) AS "total_sales_8",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-04-01" AND "2021-04-30" THEN user_commission.id END)) AS "total_sales_9",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-05-01" AND "2021-05-31" THEN user_commission.id END)) AS "total_sales_10",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-06-01" AND "2021-06-30" THEN user_commission.id END)) AS "total_sales_11",
COUNT(DISTINCT(CASE WHEN user_commission.commission_paid_at BETWEEN "2021-07-01" AND "2021-07-31" THEN user_commission.id END)) AS "total_sales_12"
FROM users
LEFT JOIN people ON people.id = users.person_id
LEFT JOIN customers ON customers.user_id = users.id
LEFT JOIN user_commission ON user_commission.user_id = users.id
WHERE users.id NOT IN (103, 2, 155, 24, 137, 141, 143, 149, 152, 3, 135)
GROUP BY users.id
And here's the output from EXPLAIN SELECT:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | users | index | PRIMARY | PRIMARY | 4 | | 16 | Using where
1 | SIMPLE | people | eq_ref | PRIMARY | PRIMARY | 4 | users.person_id | 1 | Using where
1 | SIMPLE | customers | ref | user_id | user_id | 5 | users.id | 284 | Using where
1 | SIMPLE | user_commission | ref | comm_index,user_id | comm_index | 4 | users.id | 465 | Using index
comm_index is a UNIQUE index on the user_commission table, covering user_id,order_id,commission_paid_at.
I'm a bit stumped as to what to do next — there are indexes in place, and not many rows for the engine to parse per table.
Any clues would be much appreciated — thanks!
First, this query runs for EVERY user (apart from the few you want to exclude; I did not include that exclusion list in my query). I would ask why you are trying to show sales and commission counts for all users at once. If I were a rep for your company, I would think I only care about how MY activities are going.
Next, this might be a good case for a pre-aggregation table of the counts per month per user, so you don't have to keep recomputing them on the fly. If the data only changes when a new customer is signed up or a sales commission is entered, you may be best off computing those counts at the end of every day for the user/month/year they represent. But that is an alternative.
Now, the WHY: you are probably getting hit with long delays, and having to use COUNT( DISTINCT ) on the customer and commission tables, because you are getting a Cartesian result. Say you have 100 users. In a given month, one new user has 3 new customers and 2 commissions, while a long-term rep has 37 new customers and 45 commissions. THESE are the ones killing you. Because both left joins are on user ID only, each record from the customers table for a given user is joined to every record in the commission table for that same user ID. So for the first rep it creates 6 rows to count against (3 * 2), but for the second it goes through 1,665 (37 * 45). This Cartesian (or cross-join) result is what is killing you.
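To make the fan-out concrete, here is a small sketch of the 3 x 2 and 37 x 45 scenario. It is an illustration only: it uses Python's built-in sqlite3 rather than MySQL/MariaDB, and the two tables are stripped down to the columns that matter for the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE user_commission (id INTEGER PRIMARY KEY, user_id INTEGER);
""")

# User 1: 3 customers and 2 commissions. User 2: 37 customers and 45 commissions.
conn.executemany("INSERT INTO customers (user_id) VALUES (?)",
                 [(1,)] * 3 + [(2,)] * 37)
conn.executemany("INSERT INTO user_commission (user_id) VALUES (?)",
                 [(1,)] * 2 + [(2,)] * 45)

# Joining both child tables on user_id alone pairs every customer row
# with every commission row of the same user.
fanout = dict(conn.execute("""
    SELECT c.user_id, COUNT(*)
    FROM customers c
    JOIN user_commission uc ON uc.user_id = c.user_id
    GROUP BY c.user_id
""").fetchall())

print(fanout)
```

The row count per user is the product of the two child counts, which is why the work grows so quickly for long-term reps.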
So that is WHY it is slow. Now, on to the solution. You appear to have a bunch of hard-coded dates throughout the code. What happens when next month comes? Do you have to fix the begin/end dates by hand? If so, the solution below will simplify all of that.
By using "WITH" (a Common Table Expression, aka CTE), you can pre-write queries and use their alias names AS IF you had written each of them inside a multi-nested query. The benefit is that the query is written once, even if you keep re-using the alias.
So here is the query; I'll break it down afterwards so you can follow along.
with Rolling12 as
(
select
@rptMonth := @rptMonth +1 as QryMonth,
@beginDate as AtLeastDate,
date_add( @beginDate, interval 1 month ) as AndLessThanDate,
@beginDate := date_add( @beginDate, interval 1 month )
from
user_commission
JOIN ( select @rptMonth := 0,
@beginDate := date_sub(
date_add(
date_sub( curdate(),
interval day( curdate()) -1 day ),
interval 1 month ),
interval 1 year )
) sqlvars
limit 12
),
MinMaxDates as
(
select
min( AtLeastDate ) MinDate,
max( AndLessThanDate ) MaxDate
from
Rolling12
),
SumCommission as
(
select
uc.user_id,
coalesce( sum( CASE WHEN R12.QryMonth = 1 then 1 else 0 end ), 0) commission01,
coalesce( sum( CASE WHEN R12.QryMonth = 2 then 1 else 0 end ), 0) commission02,
coalesce( sum( CASE WHEN R12.QryMonth = 3 then 1 else 0 end ), 0) commission03,
coalesce( sum( CASE WHEN R12.QryMonth = 4 then 1 else 0 end ), 0) commission04,
coalesce( sum( CASE WHEN R12.QryMonth = 5 then 1 else 0 end ), 0) commission05,
coalesce( sum( CASE WHEN R12.QryMonth = 6 then 1 else 0 end ), 0) commission06,
coalesce( sum( CASE WHEN R12.QryMonth = 7 then 1 else 0 end ), 0) commission07,
coalesce( sum( CASE WHEN R12.QryMonth = 8 then 1 else 0 end ), 0) commission08,
coalesce( sum( CASE WHEN R12.QryMonth = 9 then 1 else 0 end ), 0) commission09,
coalesce( sum( CASE WHEN R12.QryMonth = 10 then 1 else 0 end ), 0) commission10,
coalesce( sum( CASE WHEN R12.QryMonth = 11 then 1 else 0 end ), 0) commission11,
coalesce( sum( CASE WHEN R12.QryMonth = 12 then 1 else 0 end ), 0) commission12
from
user_commission uc
JOIN Rolling12 R12
on uc.commission_paid_at >= R12.AtLeastDate
AND uc.commission_paid_at < R12.AndLessThanDate
-- only a single row returned for MinMaxDates source
JOIN MinMaxDates mm
where
uc.commission_paid_at >= mm.MinDate
AND uc.commission_paid_at < mm.MaxDate
group by
uc.user_id
),
SumCustomers as
(
select
c.user_id,
coalesce( sum( CASE WHEN R12.QryMonth = 1 then 1 else 0 end ), 0) customers01,
coalesce( sum( CASE WHEN R12.QryMonth = 2 then 1 else 0 end ), 0) customers02,
coalesce( sum( CASE WHEN R12.QryMonth = 3 then 1 else 0 end ), 0) customers03,
coalesce( sum( CASE WHEN R12.QryMonth = 4 then 1 else 0 end ), 0) customers04,
coalesce( sum( CASE WHEN R12.QryMonth = 5 then 1 else 0 end ), 0) customers05,
coalesce( sum( CASE WHEN R12.QryMonth = 6 then 1 else 0 end ), 0) customers06,
coalesce( sum( CASE WHEN R12.QryMonth = 7 then 1 else 0 end ), 0) customers07,
coalesce( sum( CASE WHEN R12.QryMonth = 8 then 1 else 0 end ), 0) customers08,
coalesce( sum( CASE WHEN R12.QryMonth = 9 then 1 else 0 end ), 0) customers09,
coalesce( sum( CASE WHEN R12.QryMonth = 10 then 1 else 0 end ), 0) customers10,
coalesce( sum( CASE WHEN R12.QryMonth = 11 then 1 else 0 end ), 0) customers11,
coalesce( sum( CASE WHEN R12.QryMonth = 12 then 1 else 0 end ), 0) customers12
from
customers c
JOIN Rolling12 R12
on c.commission_start_date >= R12.AtLeastDate
AND c.commission_start_date < R12.AndLessThanDate
-- only a single row returned for MinMaxDates source
JOIN MinMaxDates mm
where
c.commission_start_date >= mm.MinDate
AND c.commission_start_date < mm.MaxDate
group by
c.user_id
)
select
u.id,
p.full_name AS "Name",
com.Commission01,
com.Commission02,
com.Commission03,
com.Commission04,
com.Commission05,
com.Commission06,
com.Commission07,
com.Commission08,
com.Commission09,
com.Commission10,
com.Commission11,
com.Commission12,
cst.Customers01,
cst.Customers02,
cst.Customers03,
cst.Customers04,
cst.Customers05,
cst.Customers06,
cst.Customers07,
cst.Customers08,
cst.Customers09,
cst.Customers10,
cst.Customers11,
cst.Customers12
from
users u
JOIN people p
ON u.person_id = p.id
LEFT JOIN SumCommission com
on u.id = com.user_id
LEFT JOIN SumCustomers cst
on u.id = cst.user_id;
You state that you are reporting on a rolling 12-month period. For this, I have my first CTE alias, "Rolling12". It is a setup for the rest of the query. It creates MySQL variables and keeps computing an updated begin/end date for each month represented. It starts by taking the current date (e.g. July 6) and rolling it back to July 1. It then adds 1 month to get August 1 and subtracts 1 year, giving Aug 1, 2020 as the beginning of your 12-month rolling window. I then simply join to the commission table and limit to 12 records (the table only serves as a source of at least 12 rows), each time moving forward one month, producing a column for the beginning and ending dates of each period and assigning a month ID sequence to it.
If you highlight and run just the query inside "with Rolling12 as ( ... )", you will see what it builds. This avoids all the hard-coded dates in your current 24 CASE / COUNT DISTINCT conditions.
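If it helps to see the date arithmetic outside of SQL, here is a rough Python sketch of what Rolling12 is meant to produce. The function name rolling_12_months is made up for illustration; the half-open windows (begin inclusive, end exclusive) mirror the AtLeastDate / AndLessThanDate columns.

```python
from datetime import date


def rolling_12_months(today):
    """Return 12 (month_no, begin_inclusive, end_exclusive) tuples.

    The first window starts on the first day of next month, one year back,
    so the last window is the current month.
    """
    # First day of next month ...
    if today.month == 12:
        year, month = today.year + 1, 1
    else:
        year, month = today.year, today.month + 1
    # ... minus one year.
    year -= 1

    windows = []
    for month_no in range(1, 13):
        begin = date(year, month, 1)
        # Step forward one month, rolling over the year when needed.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
        windows.append((month_no, begin, date(year, month, 1)))
    return windows


windows = rolling_12_months(date(2021, 7, 6))
for row in windows:
    print(row)
```

Run for July 6, 2021, the first window starts on Aug 1, 2020 and the last one covers July 2021, which matches the walk-through above.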
Then comes a comma and the next CTE, MinMaxDates. Here, I query the 12-month roll to get the minimum begin date and maximum end date for the entire period you are reporting, so when querying customers and commissions I can join to this single-row result for the overall begin/end dates.
Next are SumCommission and SumCustomers. These join against the "Rolling12" CTE so we can associate each commission or customer with the one date range it falls in. From that, I get the query month of the rolling 12 and sum() it. Since sum() of a null results in null, I wrap it in coalesce( calculation, 0 ) to show 0 as a worst case.
The reason for each of these being run individually and grouped by user is to prevent the Cartesian result previously mentioned.
Once those individual parts are done, I start with the user, join to people to get the name, then LEFT JOIN to the two SUM() queries. So if a user had only new customers in the period, but no commissions, there would be a record in one set and not the other, which avoids the duplicated rows that required your DISTINCT to begin with.
So, even though it looks long and may be confusing, especially the WITH CTE syntax, look at its individual parts. The sums are pre-grouped by user ID, so each result set has at most one record per user for the period.
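Here is a minimal sketch of the "aggregate first, then join" idea on toy data. It is only an illustration under assumptions: it runs on Python's built-in sqlite3 rather than MySQL, the tables are cut down to id and user_id, and the per-month breakdown is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE user_commission (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users (id) VALUES (1), (2), (3);
""")
conn.executemany("INSERT INTO customers (user_id) VALUES (?)",
                 [(1,)] * 3 + [(2,)] * 37)
conn.executemany("INSERT INTO user_commission (user_id) VALUES (?)",
                 [(1,)] * 2 + [(2,)] * 45)

# Each derived table is grouped by user_id first, so it holds at most one
# row per user and the outer joins cannot multiply rows.
rows = conn.execute("""
    SELECT u.id,
           COALESCE(cu.cnt, 0) AS customers,
           COALESCE(co.cnt, 0) AS commissions
    FROM users u
    LEFT JOIN (SELECT user_id, COUNT(*) AS cnt
               FROM customers GROUP BY user_id) cu ON cu.user_id = u.id
    LEFT JOIN (SELECT user_id, COUNT(*) AS cnt
               FROM user_commission GROUP BY user_id) co ON co.user_id = u.id
    ORDER BY u.id
""").fetchall()

print(rows)
```

User 3 has no rows in either child table, so the LEFT JOINs return NULL there and COALESCE turns that into 0. No DISTINCT is needed because nothing is counted twice.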
As for indexes to help optimize the query, I would ensure the commission and customer table have an index on ( dateField, useridField ) respectively.
I would be interested in knowing how well this performs when you give it a shot.
First of all, you select just about all rows instead of only the months you are interested in.
Solution: A WHERE clause to restrict the rows taken into consideration.
Then you cross join a user's customers with the user's commissions, thus building a huge intermediate result that you neither need nor want.
Solution: Aggregate before joining.
In order to
For instance, this can look like the following:
SELECT
people.full_name AS "Name",
cu.eligible_customers_month_1,
cu.eligible_customers_month_2,
...
co.total_sales_1,
co.total_sales_2,
...
FROM users
LEFT JOIN people ON people.id = users.person_id
LEFT JOIN
(
select
user_id,
max(case when month_index = 1 then cnt else 0 end) as eligible_customers_month_1,
max(case when month_index = 2 then cnt else 0 end) as eligible_customers_month_2,
...
from
(
select
user_id,
(year(current_date) * 12 + month(current_date))
- (year(commission_start_date) * 12 + month(commission_start_date))
+ 1 as month_index,
count(*) as cnt
from customers
where commission_start_date >=
last_day(current_date) + interval 1 day - interval 1 year
group by user_id, month_index
) months
group by user_id
) cu ON cu.user_id = users.id
LEFT JOIN
(
select
user_id,
max(case when month_index = 1 then cnt else 0 end) as total_sales_1,
max(case when month_index = 2 then cnt else 0 end) as total_sales_2,
...
from
(
select
user_id,
(year(current_date) * 12 + month(current_date))
- (year(commission_paid_at) * 12 + month(commission_paid_at))
+ 1 as month_index,
count(*) as cnt
from user_commission
where commission_paid_at >=
last_day(current_date) + interval 1 day - interval 1 year
group by user_id, month_index
) months
group by user_id
) co ON co.user_id = users.id
WHERE users.id NOT IN (103, 2, 155, 24, 137, 141, 143, 149, 152, 3, 135)
ORDER BY users.id;
Recommended indexes:
create index idx1 on customers (commission_start_date, user_id);
create index idx2 on user_commission (commission_paid_at, user_id);
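The month_index arithmetic in the query above can be checked in isolation. This is just a Python restatement of the same formula; the function name is made up for illustration.

```python
from datetime import date


def month_index(current, row_date):
    """Same arithmetic as the SQL: months between the two dates, plus 1."""
    return ((current.year * 12 + current.month)
            - (row_date.year * 12 + row_date.month)
            + 1)


today = date(2021, 7, 6)
newest = month_index(today, date(2021, 7, 1))   # a row from the current month
oldest = month_index(today, date(2020, 8, 15))  # a row from 11 months earlier
print(newest, oldest)
```

With this formula index 1 is the current month and 12 is the oldest one. In the question, month 1 is the oldest, so the column mapping may need to be flipped if the order matters.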
For my query, the two derived tables at the bottom are causing a crazy slowdown. The query, as is, takes about 45-55 seconds to execute. NOW, when I remove just one of those derived tables (it does not matter which one), the query goes down to 0.1 - 0.3 seconds. My questions: is there an issue with having multiple derived tables? Is there a better way to execute this? My indexes all seem to be correct; I will also include the explain from this query.
select t.name as team, u.name as "REP NAME",
count(distinct activity.id) as "TOTAL VISITS",
count(distinct activity.account_id) as "UNIQUE VISITS",
count(distinct placement.id) as "COMMITMENTS ADDED",
CASE WHEN
count(distinct activity.account_id) = 0 THEN (count(distinct
placement.id) / 1)
else (cast(count(distinct placement.id) as decimal(10,2)) /
cast(count(distinct activity.account_id) as decimal(10,2)))
end as "UNIQUE VISIT TO COMMITMENT %",
case when o.mode='basic' then count(distinct placement.id) else
count(distinct(case when placement.commitmentstatus='fullfilled'
then placement.id else 0 end))
end as "COMMITMENTS FULFILLED",
case when o.mode='basic' then 1 else
(CASE WHEN
count(distinct placement.id) = 0 THEN (count(distinct(case when
placement.commitmentstatus='fullfilled' then placement.id else 0
end)) / 1)
else (cast(count(distinct(case when
placement.commitmentstatus='fullfilled' then placement.id else 0
end)) as decimal(10,2)) / cast(count(distinct placement.id) as
decimal(10,2)))
end) end as "COMMITMENT TO FULFILLMENT %"
from lpmysqldb.users u
left join lpmysqldb.teams t on t.team_id=u.team_id
left join lpmysqldb.organizations o on o.id=t.org_id
left join (select * from lpmysqldb.activity where
org_id='555b918ae4b07b6ac5050852' and completed_at>='2018-05-01' and
completed_at<='2018-06-01' and tag='visit' and accountname is not
null and (status='active' or status='true' or status='1')) as
activity on activity.user_id=u.id
left join (select * from lpmysqldb.placements where
orgid='555b918ae4b07b6ac5050852' and placementdate>='2018-05-01' and
placementdate<='2018-06-01' and (status IN ('1','active','true') or
status is null)) as placement on placement.userid=u.id
where u.org_id='555b918ae4b07b6ac5050852'
and (u.status='active' or u.status='true' or u.status='1')
and istestuser!='1'
group by u.org_id, t.name, u.id, u.name, o.mode
order by count(distinct activity.id) desc
Thank you for assistance!
I have edited below with changing the two bottom joins from joining on subqueries to joining on the table directly. Still yielding the same result.
This is a SLIGHTLY restructured version of your query. It can be simplified because the last two subqueries pre-aggregate your respective counts and count distincts, so you can use those column names directly instead of repeating count( distinct ) throughout the query.
I also tried to simplify the division by multiplying a given count by 1.00 to force a decimal-precision result.
select
t.name as team,
u.name as "REP NAME",
Activity.DistIdCnt as "TOTAL VISITS",
Activity.UniqAccountCnt as "UNIQUE VISITS",
Placement.DistIdCnt as "COMMITMENTS ADDED",
Placement.DistIdCnt /
CASE WHEN Activity.UniqAccountCnt = 0
THEN 1.00
ELSE Activity.UniqAccountCnt * 1.00
end as "UNIQUE VISIT TO COMMITMENT %",
case when o.mode = 'basic'
then Placement.DistIdCnt
else Placement.DistFulfillCnt
end as "COMMITMENTS FULFILLED",
case when o.mode = 'basic'
then 1
else ( Placement.DistFulfillCnt /
CASE when Placement.DistIdCnt = 0
then 1.00
ELSE Placement.DistIdCnt * 1.00
END )
END as "COMMITMENT TO FULFILLMENT %"
from
lpmysqldb.users u
left join lpmysqldb.teams t
on u.team_id = t.team_id
left join lpmysqldb.organizations o
on t.org_id = o.id
left join
( select
user_id,
count(*) as AllRecs,
count( distinct id ) DistIdCnt,
count( distinct account_id) as UniqAccountCnt
from
lpmysqldb.activity
where
org_id = '555b918ae4b07b6ac5050852'
and completed_at>='2018-05-01'
and completed_at<='2018-06-01'
and tag='visit'
and accountname is not null
and status IN ( '1', 'active', 'true')
group by
user_id ) activity
on u.id = activity.user_id
left join
( select
userid,
count(*) AllRecs,
count(distinct id) as DistIdCnt,
count(distinct( case when commitmentstatus = 'fullfilled'
then id
else 0 end )) DistFulfillCnt
from
lpmysqldb.placements
where
orgid = '555b918ae4b07b6ac5050852'
and placementdate >= '2018-05-01'
and placementdate <= '2018-06-01'
and ( status is null OR status IN ('1','active','true') )
group by
userid ) as placement
on u.id = placement.userid
where
u.org_id = '555b918ae4b07b6ac5050852'
and u.status IN ( 'active', 'true', '1')
and istestuser != '1'
group by
u.org_id,
t.name,
u.id,
u.name,
o.mode
order by
activity.DistIdCnt desc
FINALLY, your inner queries run for ALL users. If you have a large number of users that are NOT active, you MIGHT exclude them from each inner query by adding the join/criteria there too, such as...
( ...
from
lpmysqldb.placements
JOIN lpmysqldb.users u2
on placements.userid = u2.id
and u2.status IN ( 'active', 'true', '1')
and u2.istestuser != '1'
where … ) as placement
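One thing that may be worth checking in the DistFulfillCnt expression (it is carried over from the question): with "else 0" inside count( distinct ... ), the 0 is itself a distinct value and gets counted. The sketch below shows the difference on three toy rows. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the placements table is reduced to two columns for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE placements (id INTEGER PRIMARY KEY, commitmentstatus TEXT);
    INSERT INTO placements (id, commitmentstatus) VALUES
        (1, 'fullfilled'), (2, 'open'), (3, 'fullfilled');
""")

# With ELSE 0, non-matching rows contribute the value 0, which is counted
# as one more distinct value. Without ELSE, they yield NULL and are ignored.
with_else, without_else = conn.execute("""
    SELECT COUNT(DISTINCT CASE WHEN commitmentstatus = 'fullfilled'
                               THEN id ELSE 0 END),
           COUNT(DISTINCT CASE WHEN commitmentstatus = 'fullfilled'
                               THEN id END)
    FROM placements
""").fetchone()

print(with_else, without_else)
```

Only two of the three rows are fulfilled, so if the count should reflect that, dropping the "else 0" looks like the safer form.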
The following query returns the correct data but I'd like to see if there's a better way of doing this. The query should return the number of cases for each month within a 12 month period where a record exists within the past 2 months. The idea is to get number of accounts that ordered during the month in question and at least one of the previous 2 months. Also, please note that every value in the table for data_date will always be the 1st of the month.
SELECT
sum(
case
WHEN a.data_date = '2013-03-01'
and exists(
select 1 from sales mth1
where mth1.client_id = a.client_id
and
data_date BETWEEN '2013-01-01'
and
'2013-02-01'
)
then case_qty
ELSE 0 END
) AS M1 ,
sum(
case
WHEN a.data_date = '2013-04-01'
and exists(
select 1 from sales mth2
where mth2.client_id = a.client_id
and mth2.data_date BETWEEN '2013-02-01'
and
'2013-03-01'
)
then case_qty
ELSE 0 END
) AS M2 ,
sum( case WHEN a.data_date = '2013-05-01' and exists( select 1 from sales mth3 where mth3.client_id = a.client_id and mth3.data_date BETWEEN '2013-03-01' and '2013-04-01' ) then case_qty ELSE 0 END) AS M3 ,
sum( case WHEN a.data_date = '2013-06-01' and exists( select 1 from sales mth4 where mth4.client_id = a.client_id and mth4.data_date BETWEEN '2013-04-01' and '2013-05-01' ) then case_qty ELSE 0 END) AS M4 ,
sum( case WHEN a.data_date = '2013-07-01' and exists( select 1 from sales mth5 where mth5.client_id = a.client_id and mth5.data_date BETWEEN '2013-05-01' and '2013-06-01' ) then case_qty ELSE 0 END) AS M5 ,
sum( case WHEN a.data_date = '2013-08-01' and exists( select 1 from sales mth6 where mth6.client_id = a.client_id and mth6.data_date BETWEEN '2013-06-01' and '2013-07-01' ) then case_qty ELSE 0 END) AS M6 ,
sum( case WHEN a.data_date = '2013-09-01' and exists( select 1 from sales mth7 where mth7.client_id = a.client_id and mth7.data_date BETWEEN '2013-07-01' and '2013-08-01' ) then case_qty ELSE 0 END) AS M7 ,
sum( case WHEN a.data_date = '2013-10-01' and exists( select 1 from sales mth8 where mth8.client_id = a.client_id and mth8.data_date BETWEEN '2013-08-01' and '2013-09-01' ) then case_qty ELSE 0 END) AS M8 ,
sum( case WHEN a.data_date = '2013-11-01' and exists( select 1 from sales mth9 where mth9.client_id = a.client_id and mth9.data_date BETWEEN '2013-09-01' and '2013-10-01' ) then case_qty ELSE 0 END) AS M9 ,
sum( case WHEN a.data_date = '2013-12-01' and exists( select 1 from sales mth10 where mth10.client_id = a.client_id and mth10.data_date BETWEEN '2013-10-01' and '2013-11-01' ) then case_qty ELSE 0 END) AS M10 ,
sum( case WHEN a.data_date = '2014-01-01' and exists( select 1 from sales mth11 where mth11.client_id = a.client_id and mth11.data_date BETWEEN '2013-11-01' and '2013-12-01' ) then case_qty ELSE 0 END) AS M11
FROM sales as a
INNER JOIN Products AS P ON P.product_id = a.product_id
WHERE a.client_id IN ('123')
AND a.data_date BETWEEN '2013-03-01' AND '2013-12-01' AND a.case_qty > 0;
Here's a screen shot of the explain
Here's a screen shot of the indexes
change
data_date BETWEEN '2013-01-01' and '2013-02-01'
to
data_date in ('2013-01-01', '2013-01-02',.....,...'2013-02-01' )
But I would turn this whole query into SUM(CASE ...) expressions and then use a wrapper to pull out the accounts I need.
From your query it seems like you want a product's sales across an entire year, or something to that effect. So you are almost trying to create a pivot from your data. I had a similar problem and solved it like this:
select
s.product_id,
s.client_id,
sum(case when DATE_FORMAT(s.data_date,'%Y%m') in (201301) then ifnull(case_qty,0) else 0 end) period1,
sum(case when DATE_FORMAT(s.data_date,'%Y%m') in (201301, 201302) then ifnull(case_qty,0) else 0 end) period2,
sum(case when DATE_FORMAT(s.data_date,'%Y%m') in (201301, 201302, 201303) then ifnull(case_qty,0) else 0 end) period3,
sum(case when DATE_FORMAT(s.data_date,'%Y%m') in (201302, 201303, 201304) then ifnull(case_qty,0) else 0 end) period4,
sum(case when DATE_FORMAT(s.data_date,'%Y%m') in (201303, 201304, 201305) then ifnull(case_qty,0) else 0 end) period5,
.
.
.
up to period12
from
sales s
where
s.data_date between '2013-03-01' and '2013-12-31' and
s.client_id = some_value
group by
s.product_id,
s.client_id
Note that the performance of the query has increased drastically since you are only doing a single scan of your sales table (depending on what's in your where clause and indexes). To speed it up you would need indexes on say client_id and data_date for example.
Ideally this query would be run for a report or something where the start and end data is fixed and all the user can change is the year of that date. I wasn't sure if you wanted to group by product_id since in your query you aren't, but you can always remove it to get total per client.
I have adjusted the query based on my understanding in your comments. I am not 100% sure if you can use the keyword 'in' in a case statement. You could alternatively have more case statements.
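For what it's worth, here is a small runnable sketch of the single-scan pivot with IN inside the CASE. It is only an illustration under assumptions: it uses Python's built-in sqlite3, where strftime('%Y%m', ...) stands in for MySQL's DATE_FORMAT, and the sales table has just three columns with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (client_id TEXT, data_date TEXT, case_qty INTEGER);
    INSERT INTO sales (client_id, data_date, case_qty) VALUES
        ('123', '2013-01-01', 5),
        ('123', '2013-02-01', 7),
        ('123', '2013-03-01', 11),
        ('456', '2013-01-01', 100);
""")

# One pass over sales: each SUM(CASE ...) picks out the months of one period.
rows = conn.execute("""
    SELECT client_id,
           SUM(CASE WHEN strftime('%Y%m', data_date) IN ('201301')
                    THEN case_qty ELSE 0 END) AS period1,
           SUM(CASE WHEN strftime('%Y%m', data_date) IN ('201301', '201302')
                    THEN case_qty ELSE 0 END) AS period2
    FROM sales
    WHERE client_id = '123'
    GROUP BY client_id
""").fetchall()

print(rows)
```

At least in SQLite, IN is accepted inside the CASE condition, and the table is read once regardless of how many period columns are added.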
In SQL how do you show 0 if record is null?
select sales_id, totalbuy, totalsell, totalbuy + totalsell as total from
(select sales_id, SUM(CASE WHEN side= 'buy' THEN 1 ELSE 0 END) AS totalbuy,
SUM(CASE WHEN side= 'sell' THEN 1 ELSE 0 END) AS totalsell
from car_orders
where sales_id in ('sales1', 'sales2', 'sales3', 'sales4')
GROUP BY sales_id)q
order by total desc
limit 0, 10;
After car_orders I have tried inserting *(car_orders+ISNULL(car_orders,0)) but get an error.
Building from Sohnee's answer, here is the SQL I think you want to use:
SELECT
sales_id,
IFNULL(totalbuy, 0),
IFNULL(totalsell, 0),
IFNULL(totalbuy, 0) + IFNULL(totalsell, 0) as total
FROM
(
SELECT
sid as sales_id,
SUM(CASE WHEN side = 'buy' THEN 1 ELSE 0 END) AS totalbuy,
SUM(CASE WHEN side = 'sell' THEN 1 ELSE 0 END) AS totalsell
FROM
( SELECT 'sales1' as sid UNION SELECT 'sales2' UNION SELECT 'sales3' UNION SELECT 'sales4' ) mysalesids
LEFT OUTER JOIN car_orders
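ON car_orders.sales_id = mysalesids.sid
GROUP BY
mysalesids.sid -- group by the derived id, so ids with no orders keep their own zero row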
) q
ORDER BY total DESC
LIMIT 0, 10;
The key to the above is the "LEFT OUTER JOIN". If you can have the 'sales1', 'sales2', 'sales3' values in their own table, that would be preferable rather than having a sub-select.
Hope this helps,
john...
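To illustrate why the LEFT OUTER JOIN matters, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The rows in car_orders are made up, and the alias names ids and o are my own; the point is only that ids with no orders still come back with zeros, whereas filtering with WHERE ... IN drops them.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car_orders (sales_id TEXT, side TEXT)")
# Hypothetical data: sales3 and sales4 have no orders at all.
conn.executemany(
    "INSERT INTO car_orders VALUES (?, ?)",
    [
        ("sales1", "buy"),
        ("sales1", "buy"),
        ("sales1", "sell"),
        ("sales2", "sell"),
        ("sales9", "buy"),
    ],
)

# Drive the query from the list of ids, then LEFT JOIN the orders onto it.
left_join_sql = """
SELECT sales_id, totalbuy, totalsell, totalbuy + totalsell AS total
FROM (
    SELECT ids.sid AS sales_id,
           SUM(CASE WHEN o.side = 'buy'  THEN 1 ELSE 0 END) AS totalbuy,
           SUM(CASE WHEN o.side = 'sell' THEN 1 ELSE 0 END) AS totalsell
    FROM (SELECT 'sales1' AS sid UNION SELECT 'sales2'
          UNION SELECT 'sales3' UNION SELECT 'sales4') ids
    LEFT OUTER JOIN car_orders o ON o.sales_id = ids.sid
    GROUP BY ids.sid
) q
ORDER BY total DESC, sales_id
"""
with_zeros = conn.execute(left_join_sql).fetchall()

# The original approach: only ids that actually have rows are returned.
where_in_sql = """
SELECT sales_id, COUNT(*) AS total
FROM car_orders
WHERE sales_id IN ('sales1', 'sales2', 'sales3', 'sales4')
GROUP BY sales_id
ORDER BY sales_id
"""
without_zeros = conn.execute(where_in_sql).fetchall()

print(with_zeros)
print(without_zeros)
```

The first result has four rows, including zero rows for sales3 and sales4; the second has only the two ids that appear in car_orders.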
In MySQL, + is purely arithmetic: its operands are converted to numbers, not concatenated. I can't tell from your question what car is, but I assume it isn't compatible with a 0.
It is better to use CONCAT in these cases.
CONCAT(car, IFNULL(car_orders, 0))
If car_orders is a table, it isn't valid here - you must use a column, i.e. car_orders.MyColumn
How To Use IFNULL
I don't think you can end up with a null here, given your query, but you would use IFNULL like this:
SELECT
sales_id,
IFNULL(totalbuy, 0),
IFNULL(totalsell, 0),
IFNULL(totalbuy, 0) + IFNULL(totalsell, 0) as total
FROM
(
SELECT
sales_id,
SUM(CASE WHEN side = 'buy' THEN 1 ELSE 0 END) AS totalbuy,
SUM(CASE WHEN side = 'sell' THEN 1 ELSE 0 END) AS totalsell
FROM
car_orders
WHERE
sales_id in ('sales1', 'sales2', 'sales3', 'sales4')
GROUP BY
sales_id
) q
ORDER BY total DESC
LIMIT 0, 10;
I have this query, which I know doesn't work, but I've left it as it is as pseudo-code to help explain what I'm doing. I'm trying to get "Booking" and "Sales" totals from a Booking table by day-of-the-week for the past week. Hence, Mon1B = Bookings for Monday and Mon1S = Sales for Monday.
SELECT
CASE WEEKDAY(b.created)
WHEN 0 THEN (SELECT COUNT(uuid) as Mon1B, SUM(amount) as Mon1S)
WHEN 1 THEN (SELECT COUNT(uuid) as Tue1B, SUM(amount) as Tue1S)
WHEN 2 THEN (SELECT COUNT(uuid) as Wed1B, SUM(amount) as Wed1S)
WHEN 3 THEN (SELECT COUNT(uuid) as Thu1B, SUM(amount) as Thu1S)
WHEN 4 THEN (SELECT COUNT(uuid) as Fri1B, SUM(amount) as Fri1S)
WHEN 5 THEN (SELECT COUNT(uuid) as Sat1B, SUM(amount) as Sat1S)
WHEN 6 THEN (SELECT COUNT(uuid) as Sun1B, SUM(amount) as Sun1S)
END CASE
FROM Bookings b
WHERE b.created > '#week1Start#' and b.created <= '#week1End#'
How can something like this be done in MySQL?
Yes, but case can only return one value. You can do it like this:
SELECT sum(case when weekday(b.created) = 0 then 1 else 0 end) as Mon1B,
sum(case when weekday(b.created) = 0 then amount else 0 end) as Mon1S,
...
FROM Bookings b
WHERE b.created > '#week1Start#' and b.created <= '#week1End#'
You might find it easier as 7 rows, though:
select WEEKDAY(b.created), count(*) as cnt, sum(amount) as amt
from Bookings b
WHERE b.created > '#week1Start#' and b.created <= '#week1End#'
group by WEEKDAY(b.created)
order by 1
I think you want something like this:
SELECT COUNT(IF(WEEKDAY(b.created)=0,uuid,NULL)) AS Mon1B
, SUM(IF(WEEKDAY(b.created)=0,amount,NULL)) AS Mon1S
, COUNT(IF(WEEKDAY(b.created)=1,uuid,NULL)) AS Tue1B
, SUM(IF(WEEKDAY(b.created)=1,amount,NULL)) AS Tue1S
Or, if you prefer the equivalent (but lengthier) CASE expression:
SELECT COUNT(CASE WEEKDAY(b.created) WHEN 0 THEN uuid END) AS Mon1B
, SUM(CASE WEEKDAY(b.created) WHEN 0 THEN amount END) AS Mon1S
, COUNT(CASE WEEKDAY(b.created) WHEN 1 THEN uuid END) AS Tue1B
, SUM(CASE WEEKDAY(b.created) WHEN 1 THEN amount END) AS Tue1S
The result of a CASE expression is a scalar; it can't return more than one value.
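Since each CASE yields a single scalar, every output column needs its own aggregate. Here is a minimal sketch of both shapes (one wide row, or one row per weekday) using Python's built-in sqlite3 module as a stand-in for MySQL. SQLite has no WEEKDAY(), so the derived column wd emulates it from strftime('%w', ...), shifting Sunday=0 to Monday=0; the sample bookings are made up.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Bookings (uuid TEXT, created TEXT, amount INTEGER)")
# Hypothetical data: 2024-01-01 is a Monday, 2024-01-02 a Tuesday.
conn.executemany(
    "INSERT INTO Bookings VALUES (?, ?, ?)",
    [
        ("a", "2024-01-01 09:00:00", 10),
        ("b", "2024-01-01 15:00:00", 20),
        ("c", "2024-01-02 10:00:00", 5),
    ],
)
week = ("2023-12-31 23:59:59", "2024-01-07 23:59:59")

# wd mimics MySQL's WEEKDAY(): 0 = Monday ... 6 = Sunday.
base = """
    SELECT uuid, amount,
           (CAST(strftime('%w', created) AS INTEGER) + 6) % 7 AS wd
    FROM Bookings
    WHERE created > ? AND created <= ?
"""

# Wide form: one scalar CASE per output column, all in a single scan.
wide_sql = f"""
SELECT COUNT(CASE WHEN wd = 0 THEN uuid END)            AS Mon1B,
       SUM(CASE WHEN wd = 0 THEN amount ELSE 0 END)     AS Mon1S,
       COUNT(CASE WHEN wd = 1 THEN uuid END)            AS Tue1B,
       SUM(CASE WHEN wd = 1 THEN amount ELSE 0 END)     AS Tue1S
FROM ({base}) b
"""
wide = conn.execute(wide_sql, week).fetchone()

# Tall form: one row per weekday that has bookings.
tall_sql = f"""
SELECT wd, COUNT(*) AS cnt, SUM(amount) AS amt
FROM ({base}) b
GROUP BY wd
ORDER BY wd
"""
tall = conn.execute(tall_sql, week).fetchall()

print(wide)
print(tall)
```

The wide form returns a fixed set of columns even for days with no bookings, while the tall form simply omits those days, so which one is easier depends on how the report consumes the result.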
SELECT WEEKDAY(b.created),
CASE
WHEN b.weekday='Monday' THEN (SELECT COUNT(b.uuid) as Mon1B, SUM(s.amount) as Mon1S from bookings b,sales s where b.day='Monday' and s.day='Monday')
WHEN b.weekday='Tuesday' THEN (SELECT COUNT(b.uuid) as Tue1B, SUM(s.amount) as Tue1S from bookings b,sales s where b.day='Tuesday' and s.day='Tuesday')
WHEN b.weekday='Wednesday' THEN (SELECT COUNT(b.uuid) as Wed1B, SUM(s.amount) as Wed1S from bookings b,sales s where b.day='Wednesday' and s.day='Wednesday')
WHEN b.weekday='Thursday' THEN (SELECT COUNT(b.uuid) as Thu1B, SUM(s.amount) as Thu1S from bookings b,sales s where b.day='Thursday' and s.day='Thursday')
WHEN b.weekday='Friday' THEN (SELECT COUNT(b.uuid) as Fri1B, SUM(s.amount) as Fri1S from bookings b,sales s where b.day='Friday' and s.day='Friday')
WHEN b.weekday='Saturday' THEN (SELECT COUNT(b.uuid) as Sat1B, SUM(s.amount) as Sat1S from bookings b,sales s where b.day='Saturday' and s.day='Saturday')
WHEN b.weekday='Sunday' THEN (SELECT COUNT(b.uuid) as sun1B, SUM(s.amount) as sun1S from bookings b,sales s where b.day='Sunday' and s.day='Sunday')
END CASE
FROM Bookings b
WHERE b.created > '#week1Start#' and b.created <= '#week1End#'