I have a query that contains a LEFT JOIN subquery. It takes 20 minutes to complete.
Here is my query:
UPDATE orders AS o
LEFT JOIN (
SELECT obe_order_master_id, COUNT(id) AS count_files, id, added
FROM customer_instalments
GROUP BY obe_order_master_id
) AS oci ON oci.obe_order_master_id = SUBSTRING(o.order_id, 4)
SET o.final_customer_file_id = oci.id,
o.client_work_delivered = oci.added
WHERE oci.count_files = 1
Is there any way I can make this query run faster?
Move the WHERE condition into the derived table and replace WHERE with a HAVING clause. This eliminates unnecessary rows from the temporary table before the join happens, which reduces the filtering work and may improve performance:
UPDATE orders AS o
LEFT JOIN (
SELECT obe_order_master_id, id, added
FROM customer_instalments
GROUP BY obe_order_master_id
HAVING COUNT(id) = 1
) AS oci ON oci.obe_order_master_id = SUBSTRING(o.order_id, 4)
SET o.final_customer_file_id = oci.id,
o.client_work_delivered = oci.added
I would also suggest creating a separate column for the order_id substring and indexing it, then joining on that column instead of computing SUBSTRING(o.order_id, 4) for every row.
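On MySQL 5.7+ this can be a generated column, so it stays in sync automatically. A minimal sketch, assuming the order_id prefix is always 3 characters; the column and index names are illustrative:
-- Assumption: MySQL 5.7+; order_master_id and idx_order_master_id are illustrative names.
ALTER TABLE orders
  ADD COLUMN order_master_id VARCHAR(32)
    GENERATED ALWAYS AS (SUBSTRING(order_id, 4)) STORED,
  ADD INDEX idx_order_master_id (order_master_id);
The UPDATE's join condition can then become ON oci.obe_order_master_id = o.order_master_id, which is able to use the index.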
Related
I have two versions of the same query. Both produce the same results (164 rows), but the second one takes 0.5 sec while the first one takes 17 sec. Can someone explain what's going on here?
TABLE organizations : 11988 ROWS
TABLE transaction_metas : 58232 ROWS
TABLE contracts_history : 219469 ROWS
# TAKES 17 SEC
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
LEFT JOIN `transaction_metas` as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token` and `tm`.`field` = '1'
WHERE `contracts_history`.`seller_id` = '850'
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
# TAKES .6 SEC
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
left join (select * from `transaction_metas` where contract_token in (select token from `contracts_history` where seller_id = 850)) as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token` and `tm`.`field` = '1'
WHERE `contracts_history`.`seller_id` = '850'
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
Explain Results:
First Query: https://prnt.sc/hjtiw6
Second Query: https://prnt.sc/hjtjjg
From my debugging of the first query it was clear that the LEFT JOIN to the transaction_metas table was making it slow, so I tried to limit its rows instead of joining the full table. It seems to work, but I don't understand why.
A join is a set of combinations of rows from your tables. With that in mind: in the first query the engine builds all the combinations and filters them afterwards, while in the second it applies the filter before making the combinations.
The best case would apply the filter in the JOIN clause without a subquery, much like this:
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
AND `contracts_history`.`seller_id` = '850'
LEFT JOIN `transaction_metas` as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token`
AND `tm`.`field` = 1
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
Note: when you reduce the size of the joined tables by filtering with subqueries, the rows may fit into the join buffer. That's a nice trick when the buffer limit is small. You can inspect and, if needed, raise the buffer as sketched below.
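For reference, a quick way to check and adjust the relevant setting; the 8 MB value is purely illustrative:
-- Values here are illustrative; tune to your workload.
SHOW VARIABLES LIKE 'join_buffer_size';
SET SESSION join_buffer_size = 8 * 1024 * 1024;  -- 8 MB, session scope only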
A better explanation:
https://dev.mysql.com/doc/refman/5.5/en/explain-output.html
I have the following query, and a compound index on CC.key1, CC.key2.
I am executing it against a big database:
Select * from CC where
( (
(select count(*) from Service s
where CC.key1=s.sr2 and CC.key2=s.sr1) > 2
AND
CC.key3='new'
)
OR
(
(select count(*) from Service s
where CC.key1=s.sr2 and CC.key2=s.sr1) <= 2
)
)
limit 10000;
I tried rewriting it as an inner join, but it got slower. How can I optimize this query?
The trick here is being able to articulate a query for the problem:
SELECT *
FROM CC t1
INNER JOIN
(
SELECT cc.key1, cc.key2
FROM CC cc
LEFT JOIN Service s
ON cc.key1 = s.sr2 AND
cc.key2 = s.sr1
GROUP BY cc.key1, cc.key2
HAVING COUNT(*) <= 2 OR
SUM(CASE WHEN cc.key3 = 'new' THEN 1 ELSE 0 END) > 2
) t2
ON t1.key1 = t2.key1 AND
t1.key2 = t2.key2
Explanation:
Your original two subqueries only add to the count when a given record in CC, with a given key1 and key2 value, matches a corresponding record in the Service table. The strategy behind my inner query is to use GROUP BY to count the number of times this happens and to use that count in place of your subqueries: the first HAVING condition corresponds to your bottom subquery, and the second to the top one.
The inner query finds all key1, key2 pairs in CC corresponding to records that should be retained; note that these two columns are the only criteria in your original query for deciding whether a record from CC is kept. This inner query can then be inner joined back to CC to produce the final result set.
In terms of performance, even this answer could leave something to be desired, but it should be better than a massive correlated subquery, which is what you had.
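If it is still slow, the probe side of that LEFT JOIN can benefit from a composite index on the Service join columns. A sketch, with an illustrative index name:
-- Illustrative index name; sr2 and sr1 are the Service join columns from the query.
ALTER TABLE Service ADD INDEX idx_service_sr2_sr1 (sr2, sr1);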
Basically, take the columns that must not have duplicates and join the table to itself on them. Example:
select *
FROM Table_X A
WHERE exists (SELECT 1
FROM Table_X B
WHERE 1=1
and a.SHOULD_BE_UNIQUE = b.SHOULD_BE_UNIQUE
and a.SHOULD_BE_UNIQUE2 = b.SHOULD_BE_UNIQUE2
/* excluded because these columns are null or can be Duplicated*/
--and a.GENERIC_COLUMN = b.GENERIC_COLUMN
--and a.GENERIC_COLUMN2 = b.GENERIC_COLUMN2
--and a.NULL_COLUMN = b.NULL_COLUMN
--and a.NULL_COLUMN2 = b.NULL_COLUMN2
and b.rowid > a.ROWID);
Here SHOULD_BE_UNIQUE and SHOULD_BE_UNIQUE2 are the columns that should not be repeated, while GENERIC_COLUMN and the NULL_COLUMN columns can be duplicated or null, so just leave them out of the query.
We have been using this approach whenever we run into duplicate-record issues.
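Note that ROWID is an Oracle pseudo-column; MySQL has no equivalent, so there you would compare on a primary key instead. A sketch assuming a hypothetical auto-increment primary key named id:
-- Assumption: id is an auto-increment primary key (MySQL has no ROWID).
SELECT a.*
FROM Table_X a
WHERE EXISTS (SELECT 1
              FROM Table_X b
              WHERE a.SHOULD_BE_UNIQUE = b.SHOULD_BE_UNIQUE
                AND a.SHOULD_BE_UNIQUE2 = b.SHOULD_BE_UNIQUE2
                AND b.id > a.id);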
With the limited information you've given us, this could be a rewrite using 'simplified' logic:
SELECT *
FROM CC NATURAL JOIN
( SELECT key1, key2, COUNT(*) AS tally
FROM Service
GROUP
BY key1, key2 ) AS t
WHERE key3 = 'new' OR tally <= 2;
I'm not sure whether it will perform better, but it might give you some ideas of what to try next.
I have a query which gets the correct result, but it takes 5.5 sec to produce the output. Is there any other way to write this query?
SELECT metricName, metricValue
FROM Table sm
WHERE createdtime = (
SELECT MAX(createdtime)
FROM Table b
WHERE sm.metricName = b.metricName
AND b.sinkName='xx'
)
AND sm.sinkName='xx'
In your code, the subselect has to be run for every result row of the outer query, which can be quite expensive. Instead, you could select your filter data in a separate query and join the two accordingly:
SELECT sm.`metricName`, sm.`metricValue`
FROM Table sm
INNER JOIN (
    SELECT MAX(`createdtime`) AS `maxTime`, `metricName`
    FROM Table b
    WHERE b.sinkName = 'xx'
    GROUP BY `metricName`
) filter ON sm.`createdtime` = filter.`maxTime`
       AND sm.`metricName` = filter.`metricName`
WHERE sm.sinkName = 'xx'
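If that is still not fast enough, the grouped MAX() can usually be resolved from an index alone. A sketch, assuming `Table` stands in for the real table name and using an illustrative index name:
-- Assumption: `Table` is a placeholder for the real table name.
ALTER TABLE `Table` ADD INDEX idx_sink_metric_time (sinkName, metricName, createdtime);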
The following query returns one row for each invBlueprintTypes row, with the correct information. But I'm trying to add something to it; see below the code block.
Select
blueprintType.typeID,
blueprintType.typeName Blueprint,
productType.typeID,
productType.typeName Item,
productType.portionSize,
blueprintType.basePrice * 0.9 As bpoPrice,
productGroup.groupName ItemGroup,
productCategory.categoryName ItemCategory,
blueprints.productionTime,
blueprints.techLevel,
blueprints.researchProductivityTime,
blueprints.researchMaterialTime,
blueprints.researchCopyTime,
blueprints.researchTechTime,
blueprints.productivityModifier,
blueprints.materialModifier,
blueprints.wasteFactor,
blueprints.maxProductionLimit,
blueprints.blueprintTypeID
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
What I need to bring in is the following table, so that I can use its timestamp values and sort the entire result by profitHour:
tablename: invBlueprintTypesPrices
columns: blueprintTypeID, timestamp, profitHour
I need this information with the following SELECT in mind. The SELECT shows my intention for the JOIN/in-query select, or whatever construct can do this:
SELECT * FROM invBlueprintTypesPrices
WHERE blueprintTypeID = blueprintType.typeID
ORDER BY timestamp DESC LIMIT 1
And the main row from table invBlueprintTypes must still show even if there is no result in invBlueprintTypesPrices. The LIMIT 1 is there because I want the newest row possible, and deleting the older data is not an option since the history is needed.
If I've understood correctly, I think I need a subquery select, but how do I do that? I've tried adding the exact query above with an AS blueprintPrices after the query's closing ), but it failed with an error pointing at the
WHERE blueprintTypeID = blueprintType.typeID
part. I have no idea why. Can anyone solve this?
You'll need to use a LEFT JOIN so that rows with no match in invBlueprintTypesPrices still appear. To mimic the LIMIT 1 per type id you can use MAX(), or, to truly make sure you only return a single record, a row number; which one you need depends on whether you can have multiple max timestamps for the same type id. Assuming not, this should be close:
Select
...
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Left Join (
    SELECT MAX(`timestamp`) MaxTime, blueprintTypeID
    FROM invBlueprintTypesPrices
    GROUP BY blueprintTypeID
) blueprintTypePrice On blueprints.blueprintTypeID = blueprintTypePrice.blueprintTypeID
Left Join invBlueprintTypesPrices blueprintTypePrices On
    blueprintTypePrice.blueprintTypeID = blueprintTypePrices.blueprintTypeID AND
    blueprintTypePrice.MaxTime = blueprintTypePrices.`timestamp`
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
Order By
blueprintTypePrices.profitHour
If you might have the same max timestamp on two different records, replace the two left joins above with something similar to this, which computes a row number:
Left Join (
    SELECT @rn:=IF(@prevTypeId=blueprintTypeID,@rn+1,1) rn,
        `timestamp`,
        blueprintTypeID,
        profitHour,
        @prevTypeId:=blueprintTypeID
    FROM (SELECT *
          FROM invBlueprintTypesPrices
          ORDER BY blueprintTypeID, `timestamp` DESC) t
    JOIN (SELECT @rn:=0, @prevTypeId:=NULL) t2
) blueprintTypePrices On blueprints.blueprintTypeID = blueprintTypePrices.blueprintTypeID AND blueprintTypePrices.rn=1
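On MySQL 8.0+ the user-variable trick can be replaced by a window function, which is easier to reason about. A sketch under that assumption:
-- Assumption: MySQL 8.0+ for window functions.
Left Join (
    SELECT blueprintTypeID, `timestamp`, profitHour,
           ROW_NUMBER() OVER (PARTITION BY blueprintTypeID
                              ORDER BY `timestamp` DESC) AS rn
    FROM invBlueprintTypesPrices
) blueprintTypePrices
    On blueprints.blueprintTypeID = blueprintTypePrices.blueprintTypeID
    And blueprintTypePrices.rn = 1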
You don't say where you are putting the subquery. If it's in the select clause, then you have a problem, because the subquery returns more than one value.
You can't put this into the from clause directly, because you have a correlated subquery (not allowed).
Instead, you can put it in like this:
from . . .
(select *
 from invBlueprintTypesPrices ibptp
 where ibptp.timestamp = (select ibptp2.timestamp
                          from invBlueprintTypesPrices ibptp2
                          where ibptp.blueprintTypeId = ibptp2.blueprintTypeId
                          order by ibptp2.timestamp desc
                          limit 1
                         )
) ibptp
on ibptp.blueprintTypeId = blueprintType.TypeID
This identifies the most recent record for each blueprintTypeId in the subquery; the join then pulls in the matching row.
I'm trying to generate a report using CodeIgniter and DataTables.net.
Now I'm trying to get the number of closed jobs (it's a human resources system). I used to query all the jobs and then do the calculations in a PHP foreach.
Because I want to use all the features of DataTables (sorting specifically), I'm trying to do all the calculations in MySQL.
The problem is: the second subquery is very, very slow.
SELECT
jobs.jobs_id, clients.nome_fantasia, concat_ws(' ', user_profiles.first_name, user_profiles.last_name) as fullname,
jobs.titulo_vaga, jobs.qtd_vagas, company.name as nome_company, jobs_status.name as status_name, DATEDIFF(NOW(), jobs.data_abertura) as date_idade,
(select count(job_cv.jobs_id) from job_cv where job_cv.jobs_id = jobs.jobs_id) as qtd_int,
(select count(distinct job_cv.user_id) from job_cv_history join job_cv on job_cv.job_cv_id = job_cv_history.job_cv_id where job_cv_history.status = '11' and job_cv.jobs_id = jobs.jobs_id ) as fechadas
FROM (jobs)
JOIN clients ON clients.clients_id=jobs.clients_id
JOIN user_profiles ON jobs.consultor_id=user_profiles.user_id
JOIN jobs_status ON jobs.status=jobs_status.jobs_status_id
JOIN company ON jobs.company_id=company.company_id
LIMIT 50
Can someone help me? I can provide more information if needed.
UPDATE
The idea of using a JOIN instead of a SELECT works for the first subquery, but not for the second one. Is there a way to pass a 'variable' for use inside the subquery, like the current jobs_id?
UPDATE AGAIN
This query works fine by itself, but inside the subquery it takes about a minute and returns wrong values:
SELECT job_cv.jobs_id,count(distinct job_cv.user_id) AS fechadas
FROM job_cv_history
JOIN job_cv
ON job_cv.job_cv_id = job_cv_history.job_cv_id
WHERE job_cv_history.status = '11'
GROUP BY job_cv.jobs_id
It is not the subquery itself that is slow; it's the fact that you're executing these subqueries once for each row returned by the outer query. Move them into joins instead, and you should see an increase in performance.
SELECT
jobs.jobs_id, clients.nome_fantasia, concat_ws(' ', user_profiles.first_name, user_profiles.last_name) as fullname,
jobs.titulo_vaga, jobs.qtd_vagas, company.name as nome_company, jobs_status.name as status_name, DATEDIFF(NOW(), jobs.data_abertura) as date_idade,
qtd.qtd_int,
fechadas.fechadas
FROM (jobs)
JOIN clients ON clients.clients_id=jobs.clients_id
JOIN user_profiles ON jobs.consultor_id=user_profiles.user_id
JOIN jobs_status ON jobs.status=jobs_status.jobs_status_id
JOIN company ON jobs.company_id=company.company_id
JOIN (
SELECT jobs_id, count(jobs_id) AS qtd_int FROM job_cv GROUP BY jobs_id
) AS qtd ON qtd.jobs_id = jobs.jobs_id
JOIN (
SELECT job_cv.jobs_id, count(distinct job_cv.user_id) AS fechadas
FROM job_cv_history
JOIN job_cv
ON job_cv.job_cv_id = job_cv_history.job_cv_id
WHERE job_cv_history.status = '11'
GROUP BY job_cv.jobs_id
) AS fechadas ON fechadas.jobs_id = jobs.jobs_id
LIMIT 50
You may also try creating these indexes:
ALTER TABLE `job_cv` ADD INDEX `job_cv_cindex` (`job_cv_id` ASC, `jobs_id` ASC, `user_id` ASC);
ALTER TABLE `job_cv_history` ADD INDEX `job_cv_history_cindex` (`job_cv_id` ASC, `status` ASC);
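To verify the new indexes are actually picked up, you can EXPLAIN the aggregate on its own; if they are used, the key column of the output should name them:
EXPLAIN
SELECT job_cv.jobs_id, COUNT(DISTINCT job_cv.user_id) AS fechadas
FROM job_cv_history
JOIN job_cv ON job_cv.job_cv_id = job_cv_history.job_cv_id
WHERE job_cv_history.status = '11'
GROUP BY job_cv.jobs_id;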
Use joins instead of subqueries; it significantly improves performance in MySQL.
Try a LEFT JOIN in your case (see the sketch below) and see whether performance improves.
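For instance, a minimal sketch of the fechadas part as a LEFT JOIN, assuming jobs with no closed candidates should still appear with a zero count:
-- Assumption: jobs without closed candidates should show up with fechadas = 0.
SELECT jobs.jobs_id,
       COALESCE(fechadas.fechadas, 0) AS fechadas
FROM jobs
LEFT JOIN (
    SELECT job_cv.jobs_id, COUNT(DISTINCT job_cv.user_id) AS fechadas
    FROM job_cv_history
    JOIN job_cv ON job_cv.job_cv_id = job_cv_history.job_cv_id
    WHERE job_cv_history.status = '11'
    GROUP BY job_cv.jobs_id
) AS fechadas ON fechadas.jobs_id = jobs.jobs_id
LIMIT 50;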