I have a query running against a MySQL database and it is very slow. Is there any way I can optimize the following?
SELECT mcm.merchant_name,
( ( Sum(ot.price) + Sum(ot.handling_charges)
+ Sum(ot.sales_tax_recd)
+ Sum(ot.shipping_cost) - Sum(ot.sales_tax_payable) ) -
Sum(im.break_even_cost) ) AS PL,
ot.merchant_id
FROM order_table ot,
item_master im,
merchant_master mcm
WHERE ot.item_id = im.item_id
AND ot.merchant_id = mcm.merchant_id
GROUP BY mcm.merchant_name
ORDER BY pl DESC
LIMIT 0, 10;
The above query takes more than 200 seconds to execute.
Explain Result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ot ALL "merchant_id,item_id" NULL NULL NULL 507910 "Using temporary; Using filesort"
1 SIMPLE mcm eq_ref "PRIMARY,merchant_id" PRIMARY 4 stores.ot.merchant_id 1
1 SIMPLE im eq_ref "PRIMARY,item_id" PRIMARY 4 stores.ot.item_id 1
Also, I get Error 1003 when I run EXPLAIN EXTENDED.
Use MySQL's EXPLAIN to find out why the query is taking so long, and then create some indexes or change your code accordingly.
Update
Based on this, make sure you have a composite index on order_table covering (merchant_id, item_id).
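A sketch of that composite index, assuming the order_table from the question (the index name is illustrative):

```sql
ALTER TABLE order_table
  ADD INDEX idx_merchant_item (merchant_id, item_id);
```

After creating it, re-run EXPLAIN; the full scan of ot (type ALL, 507910 rows) should ideally be replaced by an index access.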
Related
I have a datehelper table with every YYYY-MM-DD as DATE between the years 2000 and 2100. To this I'm joining a subquery for all unit transactions. unit.end is a DATETIME so my subquery simplifies it to DATE and uses that to join to the datehelper table.
In 5.6 this query takes a couple of seconds over a massive number of transactions: the optimizer derives a table that is auto-keyed on the DATE(unit.end) from the subquery and uses that key to join everything else fairly quickly.
In 5.7, it takes 600+ seconds and I can't get it to derive a table or follow the much better execution plan that 5.6 used. Is there a flag I need to set or some way to prefer the old execution plan?
Here's the query:
EXPLAIN SELECT datehelper.id AS date, MONTH(datehelper.id)-1 AS month, DATE_FORMAT(datehelper.id,'%d')-1 AS day,
IFNULL(SUM(a.total),0) AS total, IFNULL(SUM(a.tax),0) AS tax, IFNULL(SUM(a.notax),0) AS notax
FROM datehelper
LEFT JOIN
(SELECT
DATE(unit.end) AS endDate,
getFinalPrice(unit.id) AS total, tax, getFinalPrice(unit.id)-tax AS notax
FROM unit
INNER JOIN products ON products.id=unit.productID
INNER JOIN prodtypes FORCE INDEX(primary) ON prodtypes.id=products.prodtypeID
WHERE franchiseID='1' AND void=0 AND checkout=1
AND end BETWEEN '2020-01-01' AND DATE_ADD('2020-01-01', INTERVAL 1 YEAR)
AND products.prodtypeID NOT IN (1,10)
) AS a ON a.endDate=datehelper.id
WHERE datehelper.id BETWEEN '2020-01-01' AND '2020-12-31'
GROUP BY datehelper.id ORDER BY datehelper.id;
5.6 result (much faster):
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY datehelper range PRIMARY PRIMARY 3 NULL 365 Using where; Using index
1 PRIMARY <derived2> ref <auto_key0> <auto_key0> 4 datehelper.id 10 NULL
2 DERIVED prodtypes index PRIMARY PRIMARY 4 NULL 10 Using where; Using index
2 DERIVED products ref PRIMARY,prodtypeID prodtypeID 4 prodtypes.id 9 Using index
2 DERIVED unit ref productID,end,void,franchiseID productID 9 products.id 2622 Using where
5.7 result (much slower, no auto key found):
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE datehelper NULL range PRIMARY PRIMARY 3 NULL 366 100.00 Using where; Using index
1 SIMPLE unit NULL ref productID,end,void,franchiseID franchiseID 4 const 181727 100.00 Using where
1 SIMPLE products NULL eq_ref PRIMARY,prodtypeID PRIMARY 8 barkops3.unit.productID 1 100.00 Using where
1 SIMPLE prodtypes NULL eq_ref PRIMARY PRIMARY 4 barkops3.products.prodtypeID 1 100.00 Using index
I found the problem. It was the optimizer_switch flag 'derived_merge', which is new in 5.7.
https://dev.mysql.com/doc/refman/5.7/en/derived-table-optimization.html
This flag makes the optimizer merge derived tables into the outer query instead of materializing them when it thinks the outer WHERE can be pushed down into the subquery. In this case, that optimization was enormously more costly than joining a materialized table on an auto_key.
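The flag can be switched off per session or globally to restore the 5.6 materialization behaviour; a sketch, assuming sufficient privileges:

```sql
-- Disable merging of derived tables for this session only
SET SESSION optimizer_switch = 'derived_merge=off';

-- Or for the whole server (also settable in my.cnf)
SET GLOBAL optimizer_switch = 'derived_merge=off';
```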
I'm working with MySQL 5.5.52 on a Debian 8 machine, and sometimes a query that usually takes 0.1 s turns slow (>3 s). I started with the EXPLAIN command to find out what is happening.
This is the query and the EXPLAIN output:
explain
SELECT
`box`.`message_id` ID
, `messages`.`tipo`
, `messages`.`text`
, TIME_TO_SEC(TIMEDIFF(NOW(), `messages`.`date`)) `date`
FROM (`box`)
INNER JOIN `messages` ON `messages`.`id` = `box`.`message_id`
WHERE `box`.`user_id` = '1010231' AND `box`.`deleted` = 0
AND `messages`.`deleted` = 0
AND `messages`.`date` + INTERVAL 10 MINUTE > NOW()
ORDER BY `messages`.`id` ASC LIMIT 100;
id| select_type| table | type | possible_keys | key | key_len| ref | rows | Extra
1|SIMPLE |box |ref |user_id,message_id|user_id| 4|const | 2200 |Using where; Using temporary; Using filesort
1|SIMPLE |messages|eq_ref|PRIMARY |PRIMARY| 4|box.message_id| 1 |Using where
I know that a temporary table and a filesort are bad things, and I suppose the problem is that the ORDER BY key doesn't belong to the first table in the query (box). After changing it to box.message_id, the EXPLAIN info is:
id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1 SIMPLE box index user_id,message_id message_id 4 443 Using where
1 SIMPLE messages eq_ref PRIMARY PRIMARY 4 box.message_id 1 Using where
It looks better, but I don't understand why it is using the message_id index and, worse, the query now takes 1.5 s instead of the initial 0.1 s.
Edit:
Forcing the query to use the user_id index, I get the same time (0.1 s) as the initial query, but without the temporary table:
explain
SELECT
`box`.`message_id` ID
, `messages`.`tipo`
, `messages`.`text`
, TIME_TO_SEC(TIMEDIFF(NOW(), `messages`.`date`)) `date`
FROM (`box` use index(user_id) )
INNER JOIN `messages` ON `messages`.`id` = `box`.`message_id`
WHERE `box`.`user_id` = '1010231' AND `box`.`deleted` = 0
AND `messages`.`deleted` = 0
AND `messages`.`date` + INTERVAL 10 MINUTE > NOW()
ORDER BY `box`.`message_id` ASC LIMIT 100;
id| select_type| table | type | possible_keys | key | key_len| ref | rows | Extra
1|SIMPLE |box |ref |user_id,message_id|user_id| 4|const | 2200 |Using where; Using filesort
1|SIMPLE |messages|eq_ref|PRIMARY |PRIMARY| 4|box.message_id| 1 |Using where
I think that skipping the temporary table is a better solution than the initial query; the next step is to check a combined index, as ysth recommends.
It is not a good idea to apply calculations to column values in a comparison: you then get a full table scan, because MySQL must evaluate the expression for every row before it can check the condition. It is better to do the arithmetic on the constant side of the condition; then MySQL can use an index on that column (if there is one).
change from
AND messages.date + INTERVAL 10 MINUTE > NOW()
to
AND messages.date > NOW() - INTERVAL 10 MINUTE
Temporary table and filesort are not bad here; they are needed because using the best index (user_id) doesn't naturally produce records sorted in the order you ask for.
It's possible you might do better having a combined user_id,message_id index, but that might also end up worse. Depends on your exact data.
It isn't clear to me if you are seeing longer queries for certain user id's or the same user id sometimes taking much longer.
Update: it seems likely that having a combined index and changing the ORDER BY to box.user_id, box.message_id will solve your problem, at least for users who don't have a large number of deleted messages.
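A sketch of that combined index, assuming the box table from the question (the index name is illustrative):

```sql
ALTER TABLE box
  ADD INDEX idx_user_message (user_id, message_id);
```

With the equality on user_id leading, the index returns one user's rows already ordered by message_id, so in principle neither the temporary table nor the filesort is needed.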
I have a problem with the count() of paginate in CakePHP version 3.3:
My table has 6,000,000 records.
The fields involved here are name and cityf; both are indexed in MySQL.
I'm paging 10 records at a time, and although the main query is very fast, the count() for paginate takes more than 50 seconds.
How can I solve this in CakePHP 3.3? The two SQL statements and their times are below:
Select query main:
SELECT
Rr.id AS `Rr__id`,
Rr.idn AS `Rr__idn`,
Rr.aniver AS `Rr__aniver`,
Rr.pessoa AS `Rr__pessoa`,
Rr.name AS `Rr__name`,
Rr.phoner AS `Rr__phoner`,
Rr.tipolf AS `Rr__tipolf`,
Rr.addressf AS `Rr__addressf`,
Rr.num_endf AS `Rr__num_endf`,
Rr.complem AS `Rr__complem`,
Rr.bairrof AS `Rr__bairrof`,
Rr.cityf AS `Rr__cityf`,
Rr.statef AS `Rr__statef`,
Rr.cepf AS `Rr__cepf`,
Rr.n1 AS `Rr__n1`,
Rr.n2 AS `Rr__n2`,
Rr.smerc AS `Rr__smerc`,
Rr.n3 AS `Rr__n3`,
Rr.n4 AS `Rr__n4`,
Rr.fone AS `Rr__fone`,
Rr.numero AS `Rr__numero`
FROM
`MG` Rr
WHERE
(
Rr.name like 'MARCOS%'
AND Rr.cityf like 'BELO HORIZONTE%'
)
ORDER BY
name asc
LIMIT
10 OFFSET 0
= 10 ms
Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE Rr range NAME,CITYF,cityfbairrof,cityfaddressf,cityfbairrofaddressf,namen1n2n3n4 NAME 63 NULL 21345 Using index condition; Using where
-
Select query count:
SELECT
(
COUNT(*)
) AS `count`
FROM
`MG` Rr
WHERE
(
Rr.name like 'MARCOS%'
AND Rr.cityf like 'BELO HORIZONTE%'
)
= 51,247 ms
Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE Rr range NAME,CITYF,cityfbairrof,cityfaddressf,cityfbairrofaddressf,namen1n2n3n4 NAME 63 NULL 21345 Using index condition; Using where
It's happening in several other cases: the count query is always very slow.
I appreciate any help.
Marcos
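One avenue worth testing (an assumption about this schema, not something verified against it): a composite index covering both filtered columns, so the COUNT(*) can be answered from the index alone instead of visiting all the matching rows:

```sql
ALTER TABLE MG
  ADD INDEX idx_name_cityf (name, cityf);
```

If MySQL picks it up, EXPLAIN for the count query should show "Using index" in the Extra column.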
I need to optimise my query, which runs very slowly, but I don't know how. It contains a correlated subquery which makes it very slow; if I remove the inline query, it runs very well.
The query is:
EXPLAIN
SELECT t.service_date,
t.service_time,
(SELECT js.modified_date FROM rej_job_status js WHERE js.booking_id=b.booking_id ORDER BY id DESC LIMIT 1) `cancel_datetime`,
b.booking_id,
b.ref_booking_id,
b.phone, b.city,
b.booking_time,
CONCAT(rc.firstname," ",rc.lastname) customer_name,
rc.phone_no,
rs.service_id,
rs.service_name,
rct.city_name
FROM rej_job_details t
JOIN rej_booking b ON t.booking_id = b.booking_id
JOIN rej_customer rc ON rc.customer_id = b.customer
JOIN rej_service rs ON t.service_id = rs.service_id
JOIN rej_city rct ON rct.city_id=b.city
WHERE t.act_status = 0 AND DATE(b.booking_time) >= '2016-06-01'
AND DATE(b.booking_time) <= '2016-06-14'
ORDER BY b.booking_time DESC
LIMIT 0 , 50
The explain plan shows this:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY b ALL PRIMARY NULL NULL NULL 32357 Using where; Using filesort
1 PRIMARY rct eq_ref PRIMARY PRIMARY 4 crmdb.b.city 1 NULL
1 PRIMARY t ref booking_id booking_id 4 crmdb.b.booking_id 1 Using where
1 PRIMARY rs eq_ref PRIMARY,service_id PRIMARY 4 crmdb.t.service_id 1 NULL
1 PRIMARY rc eq_ref PRIMARY PRIMARY 4 crmdb.b.customer 1 Using where
2 DEPENDENT SUBQUERY js index NULL PRIMARY 4 NULL 1 Using where
a) How to read this explain plan and know what it means?
b) How can I optimize this query?
booking_time is hiding inside a function, so INDEX(booking_time) cannot be used. That leads to a costly table scan.
AND DATE(b.booking_time) >= '2016-06-01'
AND DATE(b.booking_time) <= '2016-06-14'
-->
AND b.booking_time >= '2016-06-01'
AND b.booking_time < '2016-06-15' -- note 3 differences in this line
Or, this might be simpler (by avoiding the second date calculation):
AND b.booking_time >= '2016-06-01'
AND b.booking_time < '2016-06-01' + INTERVAL 2 WEEK
In the EXPLAIN, I expect the 'ALL' to become 'range' and the 'Using filesort' to vanish.
To understand the full EXPLAIN plan you should read the documentation, but the most important information it includes is which indexes MySQL uses or, often more revealing, which it doesn't use.
For your DEPENDENT SUBQUERY (that is, your "inline query"), no good index is used, which makes your query slow, so you need to add an index on rej_job_status(booking_id).
Create it, test it, and check your EXPLAIN plan again; it should then list that new index under key for the DEPENDENT SUBQUERY.
Another optimization might be an index on rej_booking(booking_time). Whether it improves the query depends on your data, but you should try it, since right now MySQL doesn't use an index for that selection.
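The two suggested indexes can be created like this (index names are illustrative):

```sql
-- Serves the correlated subquery: WHERE js.booking_id = ... ORDER BY id DESC LIMIT 1
CREATE INDEX idx_booking_id ON rej_job_status (booking_id);

-- Range index for the booking_time filter; it can only be used once the
-- DATE() wrapper around booking_time is removed
CREATE INDEX idx_booking_time ON rej_booking (booking_time);
```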
Please consider the following query
SELECT * FROM PC_SMS_OUTBOUND_MESSAGE AS OM
JOIN MM_TEXTOUT_SERVICE AS TOS ON TOS.TEXTOUT_SERVICE_ID = OM.SERVICE_ID
JOIN PC_SERVICE_NUMBER AS SN ON OM.TO_SERVICE_NUMBER_ID = SN.SERVICE_NUMBER_ID
JOIN PC_SUBSCRIBER AS SUB ON SUB.SERVICE_NUMBER_ID = SN.SERVICE_NUMBER_ID
JOIN MM_CONTACT CON ON CON.SUBSCRIBER_ID = SUB.SUBSCRIBER_ID
-- AND CON.MM_CLIENT_ID = 1
AND OM.CLIENT_ID= 1
AND OM.CREATED>='2013-05-08 11:47:53' AND OM.CREATED<='2014-05-08 11:47:53'
ORDER BY OM.SMS_OUTBOUND_MESSAGE_ID DESC LIMIT 50
To get the data set I require, I need to filter on the (commented-out) contact's MM_CLIENT_ID as well as the outbound message's CLIENT_ID, but this is what changes the performance from milliseconds to tens of minutes.
Execution plan without "AND CON.MM_CLIENT_ID = 1":
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE OM index FK4E518EAA19F2EA2B,SERVICEID_IDX,CREATED_IDX,CLIENTID_IDX,CL_CR_ST_IDX,CL_CR_STYPE_ST_IDX,SID_TOSN_CL_CREATED_IDX PRIMARY 8 NULL 6741 3732.00 Using where
1 SIMPLE SUB ref PRIMARY,FKA1845E3459A7AEF FKA1845E3459A7AEF 9 mmlive.OM.TO_SERVICE_NUMBER_ID 1 100.00 Using where
1 SIMPLE SN eq_ref PRIMARY PRIMARY 8 mmlive.OM.TO_SERVICE_NUMBER_ID 1 100.00 Using where
1 SIMPLE CON ref FK2BEC061CA525D30,SUB_CL_IDX FK2BEC061CA525D30 8 mmlive.SUB.SUBSCRIBER_ID 1 100.00
1 SIMPLE TOS eq_ref PRIMARY,FKDB3DF298AB3EF4E2 PRIMARY 8 mmlive.OM.SERVICE_ID 1 100.00
Execution plan with "AND CON.MM_CLIENT_ID = 1":
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE CON ref FK2BEC061CA525D30,FK2BEC06134399E2A,SUB_CL_IDX FK2BEC06134399E2A 8 const 18306 100.00 Using temporary; Using filesort
1 SIMPLE SUB eq_ref PRIMARY,FKA1845E3459A7AEF PRIMARY 8 mmlive.CON.SUBSCRIBER_ID 1 100.00
1 SIMPLE OM ref FK4E518EAA19F2EA2B,SERVICEID_IDX,CREATED_IDX,CLIENTID_IDX,CL_CR_ST_IDX,CL_CR_STYPE_ST_IDX,SID_TOSN_CL_CREATED_IDX FK4E518EAA19F2EA2B 9 mmlive.SUB.SERVICE_NUMBER_ID 3 100.00 Using where
1 SIMPLE SN eq_ref PRIMARY PRIMARY 8 mmlive.SUB.SERVICE_NUMBER_ID 1 100.00 Using where
1 SIMPLE TOS eq_ref PRIMARY,FKDB3DF298AB3EF4E2 PRIMARY 8 mmlive.OM.SERVICE_ID 1 100.00
Any suggestions on how to format the above to make it a little easier on the eye would be good.
ID fields are primary keys.
There are indexes on all joining columns.
You may be able to fix this problem by using a subquery:
JOIN (SELECT C.* FROM MM_CONTACT C WHERE C.MM_CLIENT_ID = 1) C ON C.SUBSCRIBER_ID = SUB.SUBSCRIBER_ID
This will materialize the matching rows, which could have downstream effects on the query plan.
If this doesn't work, then edit your query and add:
The explain plans for both queries.
The indexes available on the table.
EDIT:
Can you try creating a composite index:
PC_SMS_OUTBOUND_MESSAGE(CLIENT_ID, CREATED, SERVICE_ID, TO_SERVICE_NUMBER_ID, SMS_OUTBOUND_MESSAGE_ID);
This may change both query plans to start on the OM table with the appropriate filtering, hopefully making the results stable and good.
I've solved the riddle! For my case, anyway, so I'll share.
It all came down to the join order changing once I added that extra clause, which you can clearly see in the execution plan: when the query is fast, the outbound-messages table is at the top of the plan, but when it is slow (after adding the clause), the contacts table is at the top.
I think this means the outbound-messages index can no longer be used for the sorting, which causes the dreaded
"Using temporary; Using filesort"
By simply adding the STRAIGHT_JOIN keyword directly after SELECT, I could force the tables to be joined in the order written in the query.
Happy for anyone with a more intimate knowledge of this field to contradict any of the above in terms of what is actually happening but it definitely worked.
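Applied to the query in the question, the fix is a one-word change (a sketch: STRAIGHT_JOIN forces MySQL to join the tables in the order they are listed, so it only helps when that order really is the better plan):

```sql
SELECT STRAIGHT_JOIN *
FROM PC_SMS_OUTBOUND_MESSAGE AS OM
JOIN MM_TEXTOUT_SERVICE AS TOS ON TOS.TEXTOUT_SERVICE_ID = OM.SERVICE_ID
JOIN PC_SERVICE_NUMBER AS SN ON OM.TO_SERVICE_NUMBER_ID = SN.SERVICE_NUMBER_ID
JOIN PC_SUBSCRIBER AS SUB ON SUB.SERVICE_NUMBER_ID = SN.SERVICE_NUMBER_ID
JOIN MM_CONTACT CON ON CON.SUBSCRIBER_ID = SUB.SUBSCRIBER_ID
  AND CON.MM_CLIENT_ID = 1
  AND OM.CLIENT_ID = 1
  AND OM.CREATED >= '2013-05-08 11:47:53' AND OM.CREATED <= '2014-05-08 11:47:53'
ORDER BY OM.SMS_OUTBOUND_MESSAGE_ID DESC LIMIT 50;
```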