What is the best way to optimize this SQL query - MySQL

I have the following SQL query, but I noticed that it's putting some pressure on my server: every time I run it, the CPU usage jumps by a good 20%.
SELECT
c.name, c.billingaddress, c.billingcity, c.billingstate, c.billingzip,c.ifActive,
(SELECT COUNT(l.id) FROM newLoads l WHERE l.idCompany = c.id AND l.smallStatus='1') as numberLoads,
(SELECT (SUM(l.loadRate))/(SUM(l.esMiles)) FROM newLoads l WHERE l.idCompany = c.id AND l.loadRate != '0' AND l.esMiles != '0' AND l.smallStatus='1') as RPM
FROM `companies` c WHERE ifContractor ='0' $cond
ORDER BY numberLoads DESC

This might be more efficient:
SELECT c.name, c.billingaddress, c.billingcity,
c.billingstate, c.billingzip, c.ifActive,
x.numberLoads, x.RPM
FROM
( SELECT l.idCompany,
COUNT(*) AS numberLoads,
SUM(l.loadRate) / SUM(l.esMiles) AS RPM
FROM newLoads l
WHERE l.smallStatus = '1'
GROUP BY l.idCompany
) AS x
JOIN companies AS c ON c.id = x.idCompany
WHERE ifContractor = '0' $cond
ORDER BY x.numberLoads DESC;
Please provide SHOW CREATE TABLE and EXPLAIN SELECT ....
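For reference, those diagnostics can be gathered like this (table names taken from the question):
SHOW CREATE TABLE companies;
SHOW CREATE TABLE newLoads;
-- For the query plan, put EXPLAIN in front of the full SELECT shown above and post its output.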

This is your query:
SELECT c.name, c.billingaddress, c.billingcity, c.billingstate, c.billingzip, c.ifActive,
(SELECT COUNT(l.id)
FROM newLoads l
WHERE l.idCompany = c.id AND l.smallStatus = '1'
) as numberLoads,
(SELECT (SUM(l.loadRate))/(SUM(l.esMiles))
FROM newLoads l
WHERE l.idCompany = c.id AND l.loadRate <> '0' AND l.esMiles <> '0' AND l.smallStatus = '1'
) as RPM
FROM `companies` c
WHERE ifContractor = '0' $cond
ORDER BY numberLoads DESC;
I don't know what $cond is supposed to be. It is certainly not valid SQL syntax, so I'll ignore it.
For this query, you want the following indexes: companies(ifContractor, id) and newLoads(idCompany, smallStatus, loadRate, esMiles, id).
By the way, if the columns whose values look like numbers really are numbers, then drop the single quotes. Type conversion can confuse the optimizer.
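A sketch of how those composite indexes could be added, using the table and column names from the question (the index names are hypothetical; verify the effect with EXPLAIN):
-- Hypothetical index names; adjust to your own naming convention.
ALTER TABLE companies ADD INDEX idx_companies_contractor_id (ifContractor, id);
ALTER TABLE newLoads  ADD INDEX idx_newloads_company_status (idCompany, smallStatus, loadRate, esMiles, id);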

Maybe 20% isn't all that bad (especially if it's only for a short burst)? By the looks of it, the query might need to run over quite a bit of data to get its result.
I tried to merge the aggregations on the newLoads table into a single SELECT and ended up with something (very) similar to what Rick James already had. The added benefit of my construction is that it stays more in line with the original query in case there is no matching information in newLoads and/or when one of the fields there is zero. (I think; I didn't really test it out.)
SELECT c.name, c.billingaddress, c.billingcity, c.billingstate, c.billingzip, c.ifActive, agg.numberLoads, agg.RPM
FROM `companies` c
LEFT OUTER JOIN ( SELECT l.idCompany,
COUNT(l.id) AS numberLoads,
(CASE WHEN SUM((CASE WHEN l.loadRate <> '0' AND l.esMiles <> '0' THEN 1 ELSE 0 END)) = 0 THEN NULL ELSE
SUM((CASE WHEN l.loadRate <> '0' AND l.esMiles <> '0' THEN l.loadRate ELSE 0 END)) / SUM((CASE WHEN l.loadRate <> '0' AND l.esMiles <> '0' THEN l.esMiles ELSE 0 END))
END) AS RPM
FROM newLoads l
WHERE l.smallStatus = '1'
GROUP BY l.idCompany
) AS agg
ON agg.idCompany = c.id
WHERE c.ifContractor = '0' $cond
ORDER BY agg.numberLoads DESC;
Anyway, if duration is an issue, you might want to check if you have (compound) indexes on the relevant fields like Gordon Linoff rightfully suggested, and also on what might be in $cond; it probably would make sense to see what kind of filtering is going on there and what effect it has on the overall performance of the query.
PS: not having much hands-on experience with MySQL, I was wondering if l.esMiles <> '0' isn't "slower" than l.esMiles <> 0, under the assumption that l.esMiles is a numeric field (e.g. integer or decimal).
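On that last point: if loadRate and esMiles really are numeric, the quotes can simply be dropped, as the earlier answer noted. A sketch against the newLoads table from the question, assuming both columns are INT or DECIMAL:
SELECT l.idCompany,
       SUM(l.loadRate) / SUM(l.esMiles) AS RPM
  FROM newLoads l
 WHERE l.smallStatus = '1'
   AND l.loadRate <> 0
   AND l.esMiles <> 0
 GROUP BY l.idCompany;
-- The costly direction is the opposite one: comparing a string column to a numeric
-- literal forces MySQL to cast the column for every row and prevents index use on it.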

Query is neglecting one of the WHERE clauses, any idea why it is happening?

I have two tables, activity and users. I am trying to fetch data by using multiple where clauses.
SELECT SUM(activity.step_points) AS s_points
, `activity`.`user_id`
, `users`.`id`
, `users`.`app_id`
, `users`.`country_id`
FROM `activity`
LEFT JOIN `users` ON `users`.`id` = `activity`.`user_id`
WHERE `users`.`is_active` = 1 AND
`users`.`is_test_account` = 0 AND
`users`.`app_id` = 3 AND
`users`.`country_id` = 1 AND
`users`.`phone` NOT LIKE "%000000%" OR
`users`.`phone` IS NULL AND
`users`.`is_subscribed` = 1 AND
(`users`.`email` NOT LIKE "%#mycompanyname.net" OR
`users`.`email` IS NULL) AND
YEAR(`activity`.`created_at`) = "2021" AND
MONTH(`activity`.`created_at`) = "06"
GROUP BY `activity`.`user_id`
ORDER BY `s_points` DESC LIMIT 100 OFFSET 0
But I think users.country_id = 1 is being ignored. You can see I want only rows that belong to country id 1, but I am getting country ids 2 and 3 too.
Why is it happening?
You need to properly use parentheses in the WHERE clause so the OR does not dominate over the ANDs:
SELECT . . .
FROM activity a JOIN
users u
ON u.id = a.user_id
WHERE u.is_active = 1 AND
u.is_test_account = 0 AND
u.app_id = 3 AND
u.country_id = 1 AND
(u.phone NOT LIKE '%000000%' OR u.phone IS NULL) AND
u.is_subscribed = 1 AND
(u.email NOT LIKE '%#mycompanyname.net' OR u.email IS NULL) AND
(a.created_at >= '2021-06-01' AND a.created_at < '2021-07-01')
GROUP BY a.user_id
ORDER BY s_points DESC
LIMIT 100 OFFSET 0
Note the other changes to the code:
You are filtering on both tables, so an outer join is not appropriate. The WHERE clause turns it into an inner join anyway.
Table aliases make the code easier to write and to read.
All the backticks just make the code harder to write and read and are not needed.
SQL's standard delimiter for strings is single quotes. Use them unless you have a good reason for preferring double quotes.
For date comparisons, it can be faster to avoid functions on the column, hence the change from the YEAR()/MONTH() comparison to a date range. This helps the optimizer use an index on created_at.
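To make the precedence problem concrete, here is a small self-contained sketch of how MySQL groups AND and OR (the column references in the comments are from the question):
-- AND binds more tightly than OR; the first two expressions return 1,
-- showing that "a AND b OR c" is evaluated as "(a AND b) OR c":
SELECT (FALSE AND FALSE OR TRUE)   AS no_parens,
       ((FALSE AND FALSE) OR TRUE) AS and_first,
       (FALSE AND (FALSE OR TRUE)) AS or_first;  -- this one returns 0
-- In the question, "... AND country_id = 1 AND phone NOT LIKE '%000000%'
-- OR phone IS NULL AND ..." therefore splits into two branches, and the second
-- branch (phone IS NULL AND ...) never checks country_id, letting other countries through.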

MySQL - Select Field With Nested 'WHERE'

I'm working on a project which uses the SUM command to get a number of values. Now, all of this works fine but there is an issue when it comes to load time as the query takes 3.4 seconds to complete.
Here is an example of what I have so far:
SELECT
p.`player_id`,
p.`player_name` AS `name`,
d.`player_debut` AS `debut`,
SUM(a.`player_order` <= '11' OR a.`player_sub` != '0') AS `apps`,
SUM(a.`player_order` <= '11') AS `starts`,
SUM(a.`player_goals`) AS `goals`
FROM
`table1` r,
`table2` a,
`table3` p
LEFT JOIN `table4` d ON p.`player_id` = d.`player_id`
WHERE
r.`match_id` = a.`match_id` AND
a.`player_id` = p.`player_id` AND
r.`void` = '0'
GROUP BY
a.`player_id`
ORDER BY
p.`player_name` ASC
Cast your mind to line 4. That field is retrieved by making use of the LEFT JOIN further down the query. By taking those two lines out, load time decreases to less than 0.5 seconds - a significant improvement.
What I'm trying to achieve there (line 4), without success, is something similar to lines 5-7, where a sort of invisible WHERE clause has been applied.
The idea would be t4.date WHERE t2.order <= '14', but I'm not sure how I'd be able to get this to work without the aforementioned LEFT JOIN and increased load time that comes with it.
For clarification, here is how table4 was created - with the following query turned into a VIEW.
SELECT a.`player_id`, m.`date` AS `player_debut`
FROM
`table1` r,
`table2` a,
`table3` p
WHERE
a.`match_id` = m.`match_id` AND
a.`player_id` = p.`player_id` AND
m.`match_void` = '0' AND
(
a.`player_order` BETWEEN '1' AND '11' OR
a.`player_sub_on_for` != '0'
)
GROUP BY p.`player_id`
ORDER BY p.`player_name` ASC
Essentially, as I am making use of the same tables for both queries and only utilising a different WHERE clause, I'm trying to establish if there is a way to 'nest' this.
You may just need conditional aggregation
SELECT p.`player_id`, p.`player_name` AS `name`,
min(case when a.`player_order` <= '11' OR a.`player_sub` != '0' then r.date end) `debut`,
SUM(case when a.`player_order` <= '11' OR a.`player_sub` != '0' then 1 else 0 end) AS `apps`,
SUM(case when a.`player_order` <= '11' then 1 else 0 end) AS `starts`,
SUM(a.`player_goals`) AS `goals`
FROM
`table1` r
left join `table2` a on r.`match_id` = a.`match_id`
left join `table3` p on a.`player_id` = p.`player_id`
WHERE r.`void` = '0'
GROUP BY p.player_id,a.`player_id`
ORDER BY p.player_id,p.`player_name`;
There seem to be some inconsistencies in your column names (a.player_sub vs. a.player_sub_on_for, m.match_void vs. r.void = '0'), so I may not have got this quite right, and a GROUP BY clause without aggregation is pointless.
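The general pattern being applied here is conditional aggregation: the CASE expression inside the aggregate acts as a per-column WHERE clause. A minimal sketch with a hypothetical table events(grp, kind, happened_on), not the asker's schema:
SELECT grp,
       COUNT(*)                                       AS total_rows,
       SUM(CASE WHEN kind = 'A' THEN 1 ELSE 0 END)    AS kind_a_rows,   -- counts only kind 'A' rows
       MIN(CASE WHEN kind = 'A' THEN happened_on END) AS first_kind_a   -- earliest date among kind 'A' rows
  FROM events
 GROUP BY grp;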

Query gets very slow if I add a WHERE condition

SELECT a.emp_id,s.name, s.department,s.register, z.Compoff_Count as Extra, ifnull(COUNT(DISTINCT TO_DAYS(a.punchdate)),0) as Monthly_Count
FROM machinedata a left join
(SELECT a.emp_id, ifnull(COUNT(DISTINCT TO_DAYS(a.punchdate)),0) as Compoff_Count
FROM machinedata a
RIght JOIN time_dimension c on c.db_date = a.punchdate
where ( year(c.db_date) = 2016 and month(c.db_date) = 8 and (c.holiday_flag = 't' or c.weekend_flag ='t' ))
GROUP BY a.emp_id) Z
on z.emp_id = a.emp_id
RIght JOIN time_dimension c on c.db_date = a.punchdate
left join emp s on s.emp_id = a.emp_id
where (year(c.db_date) = 2016 and month(c.db_date) = 8 and c.holiday_flag = 'f' and c.weekend_flag ='f' )
GROUP BY emp_id
The above query works fine, but if I add s.department = 'yes' to the last WHERE clause, the query takes more than 40 seconds.
What shall I do to improve the query performance?
Your initial query can, I believe, be simplified by using "conditional aggregates", which place CASE expressions inside the COUNT() function. This avoids repeated scans of the data and unnecessary joins to derived tables.
You should also avoid applying functions to columns in WHERE clause conditions; i.e. instead of YEAR() and MONTH(), simply use date boundaries. This allows an index on the date column to be used during query execution.
I'm not sure if you really need to use TO_DAYS() but I suspect it isn't needed either.
SELECT
a.emp_id
, s.name
, s.department
, s.register
, COUNT(DISTINCT CASE WHEN (c.holiday_flag = 't' OR
c.weekend_flag = 't') THEN c.db_date END) AS Compoff_Count
, COUNT(DISTINCT CASE WHEN NOT (c.holiday_flag = 't' OR
c.weekend_flag = 't') THEN a.punchdate END) AS Monthly_Count
FROM time_dimension c
LEFT JOIN machinedata a ON c.db_date = a.punchdate
LEFT JOIN emp s ON a.emp_id = s.emp_id
WHERE c.db_date >= '2016-08-01'
AND c.db_date < '2016-09-01'
GROUP BY
a.emp_id
, s.name
, s.department
, s.register
If this re-write produces correct results, then you could try adding AND s.department = 'yes' into the WHERE clause to assess the impact. If it is still substantially slower, then get an explain plan and add it to the question. The most likely cause of slowness is lack of an index, but without an explain plan it's not possible to be certain.
Please note that this suggestion is just that; it was prepared without sample data and expected results.
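If an index does turn out to be missing, a sketch of candidates, using the table and column names from the question (index names are hypothetical; check the EXPLAIN output before and after adding them):
ALTER TABLE time_dimension ADD INDEX idx_td_date_flags (db_date, holiday_flag, weekend_flag);
ALTER TABLE machinedata    ADD INDEX idx_md_punch_emp  (punchdate, emp_id);
ALTER TABLE emp            ADD INDEX idx_emp_dept      (department, emp_id);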

mysql sum one table but different type

task_payments
SELECT t.id AS task_id, t.name, t.created_at
,COALESCE(SUM(tp1.amount),0) AS paid
,COALESCE(SUM(tp2.amount),0) AS paid_back
FROM tasks AS t
LEFT JOIN task_payments AS tp1 ON tp1.task_id=t.id AND tp1.type='1'
LEFT JOIN task_payments AS tp2 ON tp2.task_id=t.id AND tp2.type='0'
WHERE t.customer_id='4'
GROUP BY tp1.task_id, tp2.task_id
ORDER BY t.id ASC
Hi, there are two types (1 or 0) in task_payments. Type 0 is paid back; type 1 is paid. I want the total amount of each type separately, so I want this result:
task_id=5
paid=450
paid_back=10
I should use a join. If there is a filter request, I am going to use the paid and paid_back columns in the WHERE clause, e.g. AND paid_back > 0.
Maybe the following query will help you :D
SELECT x.*
FROM
(
SELECT a.task_id,
SUM(CASE WHEN b.type = 1 THEN b.amount ELSE 0 END) paid,
SUM(CASE WHEN b.type = 0 THEN b.amount ELSE 0 END) paidBack
FROM tasks a
LEFT JOIN task_payments b
ON a.id = b.task_id
-- WHERE a.customer_id = 4
GROUP BY a.task_id
) x
-- WHERE x.paid > 100 -- sample Request
Apart from JW's answer I would like to suggest one thing,
If your requirement is to default NULLs, then go for
nvl(sum(field), 0) instead of COALESCE(SUM(tp1.amount), 0).
If your DB doesn't support NVL (MySQL doesn't), then go for IFNULL.
Hope this also helps you :)
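For MySQL specifically, a sketch of that defaulting applied to the derived table above, using the column names from the question:
SELECT a.task_id,
       IFNULL(SUM(CASE WHEN b.type = 1 THEN b.amount END), 0) AS paid,
       IFNULL(SUM(CASE WHEN b.type = 0 THEN b.amount END), 0) AS paidBack
  FROM tasks a
  LEFT JOIN task_payments b ON a.id = b.task_id
 GROUP BY a.task_id;
-- IFNULL (or COALESCE) turns an all-NULL SUM into 0, e.g. for a task with no
-- payments, since the CASE expressions here have no ELSE branch.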

mysql sub select newbie

SELECT a.lead_id, c.state_name AS COL1DATA, count( c.state_name ) AS leadcount, (
SELECT count( won_loss ) AS wonlosscount
FROM lead_status
WHERE (won_loss = 'loss')
AND lead_id = a.lead_id
) AS losscount
FROM lead AS a
JOIN states AS c ON a.state_id = c.states_id
GROUP BY c.state_name
ORDER BY losscount DESC
The answer I get is:
lead_id COL1DATA leadcount losscount
1 Queensland 7 0
8 Victoria 3 0
lead_status
lead_id won_loss won_price won_mainreason loss_mainreason loss_attachment_id lost_dont_sell_note add_note dealer_satisfaction
5 win 4655 pricing fghfg somewhat
8 won 34543 pricing sfdgs satisfied
7 loss service Additional Notes verygood
9 loss not_in_stock Additi satisfied
But the loss count should be 1 and 1.
Any help is appreciated.
I'm guessing that there's a problem with mixing the non-aggregated lead_id into the correlated subquery while grouping on state_name. Perhaps you can describe what you're looking to get back.
EDIT: Based on OP feedback in comment below.
EDIT 2: Changed to left outer joins based on chat session. Not all leads have a lead_status.
SELECT
s.state_name AS COL1DATA, count(s.state_name) AS leadcount,
sum(case when ls.won_loss = 'loss' then 1 else 0 end) as losscount
FROM
lead AS l
INNER JOIN states AS s ON s.states_id = l.state_id
LEFT OUTER JOIN lead_status as ls on ls.lead_id = l.lead_id
GROUP BY s.state_name
ORDER BY losscount DESC
I might argue that this version is slightly better. But I didn't want to totally change your query. (I did change the aliases because A and C were confusing.)
SELECT
min(s.state_name) AS COL1DATA,
count(l.lead_id) AS leadcount, /* counting non-nullable key on the outer side */
sum(case when ls.won_loss = 'loss' then 1 else 0 end) as losscount
FROM
lead AS l
INNER JOIN states AS s ON s.states_id = l.state_id
LEFT OUTER JOIN lead_status as ls on ls.lead_id = l.lead_id
GROUP BY s.states_id /* might be better to group on the id */
ORDER BY losscount DESC
The lead_id column you have included in the output is unpredictable unless the group only has one row. Based on what you've said, I doubt that you really want it.