MySQL query optimisation - very slow

I need to optimise my query which is running very slow, but don't know how to do it. It contains a subquery which is making it very slow. If I remove the inline query then it runs very well.
The query is:
EXPLAIN
SELECT t.service_date,
t.service_time,
(SELECT js.modified_date FROM rej_job_status js WHERE js.booking_id=b.booking_id ORDER BY id DESC LIMIT 1) `cancel_datetime`,
b.booking_id,
b.ref_booking_id,
b.phone, b.city,
b.booking_time,
CONCAT(rc.firstname," ",rc.lastname) customer_name,
rc.phone_no,
rs.service_id,
rs.service_name,
rct.city_name
FROM rej_job_details t
JOIN rej_booking b ON t.booking_id = b.booking_id
JOIN rej_customer rc ON rc.customer_id = b.customer
JOIN rej_service rs ON t.service_id = rs.service_id
JOIN rej_city rct ON rct.city_id=b.city
WHERE t.act_status = 0 AND DATE(b.booking_time) >= '2016-06-01'
AND DATE(b.booking_time) <= '2016-06-14'
ORDER BY b.booking_time DESC
LIMIT 0 , 50
The explain plan shows this:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | PRIMARY | b | ALL | PRIMARY | NULL | NULL | NULL | 32357 | Using where; Using filesort
1 | PRIMARY | rct | eq_ref | PRIMARY | PRIMARY | 4 | crmdb.b.city | 1 | NULL
1 | PRIMARY | t | ref | booking_id | booking_id | 4 | crmdb.b.booking_id | 1 | Using where
1 | PRIMARY | rs | eq_ref | PRIMARY,service_id | PRIMARY | 4 | crmdb.t.service_id | 1 | NULL
1 | PRIMARY | rc | eq_ref | PRIMARY | PRIMARY | 4 | crmdb.b.customer | 1 | Using where
2 | DEPENDENT SUBQUERY | js | index | NULL | PRIMARY | 4 | NULL | 1 | Using where
a) How to read this explain plan and know what it means?
b) How can I optimize this query?

booking_time is hiding inside a function, so INDEX(booking_time) cannot be used. That leads to a costly table scan.
AND DATE(b.booking_time) >= '2016-06-01'
AND DATE(b.booking_time) <= '2016-06-14'
-->
AND b.booking_time >= '2016-06-01'
AND b.booking_time < '2016-06-15' -- note 3 differences in this line
Or, this might be simpler (by avoiding second date calculation):
AND b.booking_time >= '2016-06-01'
AND b.booking_time < '2016-06-01' + INTERVAL 2 WEEK
In the EXPLAIN, I expect the 'ALL' to become 'range', and 'Filesort' to vanish.

To understand the full explain plan, you should read the documentation, but the most important information it contains is which indexes MySQL uses or, usually more revealing, which ones it doesn't use.
For your DEPENDENT SUBQUERY (that is your "inline query"), it doesn't use a good index, which makes your query slow, so you need to add the index rej_job_status(booking_id) on your table rej_job_status.
Create it, test it and check your explain plan again, it should then list that new index under key for your DEPENDENT SUBQUERY.
Another optimization might be to add an index rej_booking(booking_time) for your table rej_booking. It depends on your data if it improves the query, but you should try it, since right now, mysql doesn't use an index for that selection.
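The two index suggestions above could be sketched as follows. The index names idx_js_booking_id and idx_booking_time are made up for illustration; the table and column names come from the question.

```sql
-- Hypothetical index names; use whatever fits your naming scheme.

-- Lets the dependent subquery find the rows for one booking_id directly
-- instead of walking the primary key of rej_job_status.
CREATE INDEX idx_js_booking_id ON rej_job_status (booking_id);

-- Lets the range condition on booking_time use an index, provided the
-- column is compared directly and not wrapped in DATE().
CREATE INDEX idx_booking_time ON rej_booking (booking_time);
```

After creating them, run the EXPLAIN again and check the key column to see whether the optimizer actually picks the new indexes.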

Related

Why is this Mysql statement slow AND using wrong indices

the SQL at the bottom is super slow ~12-15 seconds. And I don't understand why. Before you read the whole one, just check the first Coalesce part of the first Coalesce. If I replace it with "0", then it is super fast (0.0051s). If I only query the contained Subquery with set-in "client_id", it is super fast, too.
The table "rest_io_log" which is used in the Coalesce contains a lot of entries (more than 5 Million) and therefore got lots of indices to check the contents fast.
The two most important indices for this topic are these:
timestamp - contains only this column
account_id, client_id, timestamp - contains these 3 columns in this order
When I prepend this statement with an "EXPLAIN" it says:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | PRIMARY | cl | NULL | range | PRIMARY, index_user_id | index_user_id | 485 | NULL | 2 | 100.00 | Using index condition
1 | PRIMARY | rates | NULL | eq_ref | PRIMARY | PRIMARY | 4 | oauth2.cl.rate | 1 | 100.00 | NULL
4 | DEPENDENT SUBQUERY | traffic | NULL | ref | unique, unique_account_id_client_id_date, index_date, index_account_id_warning_100_client_id_date | unique | 162 | const, const, oauth2.cl.client_id | 1 | 100.00 | Using index condition
3 | DEPENDENT SUBQUERY | traffic | NULL | ref | unique, unique_account_id_client_id_date, index_account_id_warning_100_client_id_date | unique_account_id_client_id_date | 158 | const, oauth2.cl.client_id | 56 | 100.00 | Using where; Using index; Using filesort
2 | DEPENDENT SUBQUERY | rest_io_log | NULL | index | index_client_id, index_account_id_client_id_timestamp, index_account_id_timestamp, index_account_id_duration_timestamp, index_account_id_statuscode, index_account_id_client_id_statuscode, index_account_id_rest_path, index_account_id_client_id_rest_path | timestamp | 5 | NULL | 2 | 5.00 | Using where
On the bottom line we can see there are tons of indices available, and it chooses "timestamp", which is actually not the best choice because an index that also covers account_id and client_id is available.
If I enforce the right index by adding "USE INDEX (index_account_id_client_id_timestamp)" to the subquery the execution time is reduced to 8 seconds and the EXPLAIN looks like this:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | PRIMARY | cl | NULL | range | PRIMARY, index_user_id | index_user_id | 485 | NULL | 2 | 100.00 | Using index condition
1 | PRIMARY | rates | NULL | eq_ref | PRIMARY | PRIMARY | 4 | oauth2.cl.rate | 1 | 100.00 | NULL
4 | DEPENDENT SUBQUERY | traffic | NULL | ref | unique, unique_account_id_client_id_date, index_date... | unique | 162 | const, const, oauth2.cl.client_id | 1 | 100.00 | Using index condition
3 | DEPENDENT SUBQUERY | traffic | NULL | ref | unique, unique_account_id_client_id_date, index_acco... | unique_account_id_client_id_date | 158 | const, oauth2.cl.client_id | 56 | 100.00 | Using where; Using index; Using filesort
2 | DEPENDENT SUBQUERY | rest_io_log | NULL | ref | index_account_id_client_id_timestamp | index_account_id_client_id_timestamp | 157 | const, oauth2.cl.client_id | 1972 | 100.00 | Using where; Using index; Using filesort
SELECT
cl.timestamp AS active_since,
GREATEST
(
COALESCE
(
(
SELECT
timestamp AS last_request
FROM
rest_io_log USE INDEX (index_account_id_client_id_timestamp)
WHERE
account_id = 12345 AND
client_id = cl.client_id
ORDER BY
timestamp DESC
LIMIT
1
),
"0000-00-00 00:00:00"
),
COALESCE
(
(
SELECT
CONCAT(date, " 00:00:00") AS last_request
FROM
traffic
WHERE
account_id = 12345 AND
client_id = cl.client_id
ORDER BY
date DESC
LIMIT
1
),
"0000-00-00 00:00:00"
)
) AS last_request,
(
SELECT
requests
FROM
traffic
WHERE
account_id = 12345 AND
client_id = cl.client_id AND
date=NOW()
) AS traffic_today,
cl.client_id AS user_account_name,
t.rate_name,
t.rate_traffic,
t.rate_price
FROM
clients AS cl
LEFT JOIN
(
SELECT
id AS rate_id,
name AS rate_name,
daily_max_traffic AS rate_traffic,
price AS rate_price
FROM
rates
) AS t
ON cl.rate=t.rate_id
WHERE
cl.user_id LIKE "12345|%"
AND
cl.client_id LIKE "api_%"
AND
cl.client_id LIKE "%_12345"
;
the response of the total query looks like this:
active_since | last_request | traffic_today | user_account_name | rate_name | rate_traffic | rate_price
2019-01-16 15:40:34 | 2019-04-23 00:00:00 | NULL | api_some_account_12345 | Some rate name | 1000 | 0.00
2019-01-16 15:40:34 | 2022-10-27 00:00:00 | NULL | api_some_other_account_12345 | Some rate name | 1000 | 0.00
Can you help?
Why is this Mysql statement slow
Fetching the same row multiple times. Use a JOIN instead of repeated subqueries.
Use MAX instead of ORDER BY and LIMIT 1:
SELECT MAX(timestamp)
FROM ...
WHERE a=12345 AND c=...
Don't use USE INDEX -- what helps today may hurt tomorrow.
Do you really need to fetch both date and timestamp?? Don't they mean the same thing? Or does the data entry need to simplify those down to a single column?
CONCAT(date, " 00:00:00") is identical to date. Making that change lets you combine those first two subqueries.
cl.client_id LIKE "api_%" AND cl.client_id LIKE "%_12345" ==> cl.client_id LIKE 'api%12345'.
Don't use LEFT JOIN ( SELECT ... ) ON ... Instead, simply do LEFT JOIN rates ON ...
Suggested indexes:
rest_io_log: INDEX(account_id, client_id, timestamp)
clients: INDEX(user_id, client_id, rate, timestamp)
rates: INDEX(rate_id, rate_name, rate_traffic, rate_price) -- assuming the above change
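A partial sketch of how several of these suggestions might look together: MAX instead of ORDER BY ... LIMIT 1, no USE INDEX, a single LIKE pattern, and a plain LEFT JOIN on rates. The aliases l and r and the column alias last_logged_request are invented for this example, and the traffic subqueries with the GREATEST/COALESCE wrapper are left out for brevity, so this is not a drop-in replacement for the full query.

```sql
SELECT
    cl.timestamp AS active_since,
    -- MAX() returns NULL when no row matches, just like the
    -- ORDER BY ... LIMIT 1 subquery it replaces.
    (SELECT MAX(l.timestamp)
       FROM rest_io_log AS l
      WHERE l.account_id = 12345
        AND l.client_id = cl.client_id) AS last_logged_request,  -- hypothetical alias
    cl.client_id AS user_account_name,
    r.name AS rate_name,
    r.daily_max_traffic AS rate_traffic,
    r.price AS rate_price
FROM clients AS cl
-- Join the base table directly instead of a derived table.
LEFT JOIN rates AS r ON cl.rate = r.id
WHERE cl.user_id LIKE '12345|%'
  AND cl.client_id LIKE 'api%12345';
```

With INDEX(account_id, client_id, timestamp) on rest_io_log, the MAX() lookup should be answerable from the index alone, but that is worth confirming with EXPLAIN on the real data.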

How to avoid Using temporary; Using filesort on MySql Query

Currently I am facing a rather slow query on a website, which also slows down the server on more traffic. How can I rewrite the query or what index can I write to avoid "Using temporary; Using filesort"? Without "order by" everything works fast, but without the wanted result/order.
SELECT cams.name, models.gender, TIMESTAMPDIFF(YEAR, models.birthdate, CURRENT_DATE) AS age, lcs.viewers
FROM cams
LEFT JOIN cam_tags ON cams.id = cam_tags.cam_id
INNER JOIN tags ON cam_tags.tag_id = tags.id
LEFT JOIN model_cams ON cams.id = model_cams.cam_id
LEFT JOIN models ON model_cams.model_id = models.id
LEFT JOIN latest_cam_stats lcs ON cams.id = lcs.cam_id
WHERE tags.name = '?'
ORDER BY lcs.time_stamp_id DESC, lcs.viewers DESC
LIMIT 24 OFFSET 96;
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | tags | NULL | const | PRIMARY,tags_name_uindex | tags_name_uindex | 766 | const | 1 | 100 | Using temporary; Using filesort
1 | SIMPLE | cam_tags | NULL | ref | PRIMARY,cam_tags_cams_id_fk,cam_tags_tags_id_fk | cam_tags_tags_id_fk | 4 | const | 75565047 | 100 | Using where
1 | SIMPLE | cams | NULL | eq_ref | PRIMARY | PRIMARY | 4 | cam_tags.cam_id | 1 | 100 | NULL
1 | SIMPLE | model_cams | NULL | eq_ref | model_platforms_platforms_id_fk | model_platforms_platforms_id_fk | 4 | cam_tags.cam_id | 1 | 100 | NULL
1 | SIMPLE | models | NULL | eq_ref | PRIMARY | PRIMARY | 4 | model_cams.model_id | 1 | 100 | NULL
1 | SIMPLE | lcs | NULL | eq_ref | PRIMARY,latest_cam_stats_cam_id_time_stamp_id_viewers_index | PRIMARY | 4 | cam_tags.cam_id | 1 | 100 | NULL
There are many cases where it is effectively impossible to avoid "using temporary, using filesort".
"Filesort" does not necessarily involve a "file"; it is often done in RAM. Hence performance may not be noticeably hurt.
That said, I will assume your real question is "How can this query be sped up?".
Most of the tables are accessed via PRIMARY or "eq_ref" -- all good. But the second table involves touching an estimated 75M rows! Often that happens as the first table, not second. Hmmmm.
Sounds like cam_tags is a many-to-many mapping table? And it does not have any index starting with name? See this for proper indexes for such a table: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
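For a mapping table like this, composite indexes covering both lookup directions are a common recommendation. A minimal sketch for the tag-to-cam direction follows. It assumes cam_tags holds only (cam_id, tag_id) pairs, which the question does not confirm, and the index name idx_tag_cam is made up.

```sql
-- Assumption: cam_tags is a pure mapping table of (cam_id, tag_id) pairs.
-- Hypothetical index name; skip this if an equivalent index already exists.
ALTER TABLE cam_tags
    ADD INDEX idx_tag_cam (tag_id, cam_id);
```

With both columns in one index, a lookup by tag_id can read the matching cam_id values from the index itself instead of visiting each table row.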
Since the WHERE and ORDER BY reference more than one table, it is essentially impossible to avoid "using temporary, using filesort".
Worse than that, it needs to find all the ones with "name='?'", sort the list, skip 96 rows, and only finally deliver 24.

Radically slower subquery without autokey in MySQL 5.7 vs 5.6 - any way to force index?

I have a datehelper table with every YYYY-MM-DD as DATE between the years 2000 and 2100. To this I'm joining a subquery for all unit transactions. unit.end is a DATETIME so my subquery simplifies it to DATE and uses that to join to the datehelper table.
In 5.6 this query takes a couple seconds to run a massive amount of transactions, and it derives a table that is auto keyed based on the DATE(unit.end) in the subquery and uses that to join everything else fairly quickly.
In 5.7, it takes 600+ seconds and I can't get it to derive a table or follow the much better execution plan that 5.6 used. Is there a flag I need to set or some way to prefer the old execution plan?
Here's the query:
EXPLAIN SELECT datehelper.id AS date, MONTH(datehelper.id)-1 AS month, DATE_FORMAT(datehelper.id,'%d')-1 AS day,
IFNULL(SUM(a.total),0) AS total, IFNULL(SUM(a.tax),0) AS tax, IFNULL(SUM(a.notax),0) AS notax
FROM datehelper
LEFT JOIN
(SELECT
DATE(unit.end) AS endDate,
getFinalPrice(unit.id) AS total, tax, getFinalPrice(unit.id)-tax AS notax
FROM unit
INNER JOIN products ON products.id=unit.productID
INNER JOIN prodtypes FORCE INDEX(primary) ON prodtypes.id=products.prodtypeID
WHERE franchiseID='1' AND void=0 AND checkout=1
AND end BETWEEN '2020-01-01' AND DATE_ADD('2020-01-01', INTERVAL 1 YEAR)
AND products.prodtypeID NOT IN (1,10)
) AS a ON a.endDate=datehelper.id
WHERE datehelper.id BETWEEN '2020-01-01' AND '2020-12-31'
GROUP BY datehelper.id ORDER BY datehelper.id;
5.6 result (much faster):
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | PRIMARY | datehelper | range | PRIMARY | PRIMARY | 3 | NULL | 365 | Using where; Using index
1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 4 | datehelper.id | 10 | NULL
2 | DERIVED | prodtypes | index | PRIMARY | PRIMARY | 4 | NULL | 10 | Using where; Using index
2 | DERIVED | products | ref | PRIMARY,prodtypeID | prodtypeID | 4 | prodtypes.id | 9 | Using index
2 | DERIVED | unit | ref | productID,end,void,franchiseID | productID | 9 | products.id | 2622 | Using where
5.7 result (much slower, no auto key found):
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | datehelper | NULL | range | PRIMARY | PRIMARY | 3 | NULL | 366 | 100.00 | Using where; Using index
1 | SIMPLE | unit | NULL | ref | productID,end,void,franchiseID | franchiseID | 4 | const | 181727 | 100.00 | Using where
1 | SIMPLE | products | NULL | eq_ref | PRIMARY,prodtypeID | PRIMARY | 8 | barkops3.unit.productID | 1 | 100.00 | Using where
1 | SIMPLE | prodtypes | NULL | eq_ref | PRIMARY | PRIMARY | 4 | barkops3.products.prodtypeID | 1 | 100.00 | Using index
I found the problem. It was the optimizer_switch 'derived_merge' flag which is new to 5.7.
https://dev.mysql.com/doc/refman/5.7/en/derived-table-optimization.html
This flag overrides materialization of derived tables if the optimizer thinks the outer WHERE can be pushed down into a subquery. In this case, that optimization was enormously more costly than joining a materialized table on an auto_key.
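A minimal sketch of turning the flag off for a single session, so the derived table is materialized again as in 5.6. Whether to scope it to the session or set it globally is a judgment call; the session scope shown here keeps the change away from other queries.

```sql
-- Inspect the current optimizer flags.
SELECT @@optimizer_switch;

-- Turn derived-table merging off for the current session only.
SET SESSION optimizer_switch = 'derived_merge=off';

-- ... run the report query here ...

-- Restore merging afterwards.
SET SESSION optimizer_switch = 'derived_merge=on';
```

Running EXPLAIN on the query while the flag is off should show the <derived2> row with <auto_key0> again if the optimizer goes back to the 5.6-style plan.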

I need a MySQL query to be optimized

I have a query running on a MySQL database that is very slow.
Is there any way I can optimize the following?
SELECT mcm.merchant_name,
( ( Sum(ot.price) + Sum(ot.handling_charges)
+ Sum(ot.sales_tax_recd)
+ Sum(ot.shipping_cost) - Sum(ot.sales_tax_payable) ) -
Sum(im.break_even_cost) ) AS PL,
ot.merchant_id
FROM order_table ot,
item_master im,
merchant_master mcm
WHERE ot.item_id = im.item_id
AND ot.merchant_id = mcm.merchant_id
GROUP BY mcm.merchant_name
ORDER BY pl DESC
LIMIT 0, 10;
The above query takes more than 200 seconds to execute.
Explain Result:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | ot | ALL | merchant_id,item_id | NULL | NULL | NULL | 507910 | Using temporary; Using filesort
1 | SIMPLE | mcm | eq_ref | PRIMARY,merchant_id | PRIMARY | 4 | stores.ot.merchant_id | 1 |
1 | SIMPLE | im | eq_ref | PRIMARY,item_id | PRIMARY | 4 | stores.ot.item_id | 1 |
Also, I got Error-1003 when I run EXPLAIN EXTENDED
Use MySQL's EXPLAIN plan to find out why it is taking so long, and then create some indexes or change your code as needed.
Update
Based on the plan you posted, make sure you have a composite index on order_table covering (merchant_id, item_id).
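A sketch of the suggested composite index. The index name idx_ot_merchant_item is invented for this example; the table and column names are taken from the question.

```sql
-- Hypothetical index name.
-- One index covering both join columns of order_table.
CREATE INDEX idx_ot_merchant_item
    ON order_table (merchant_id, item_id);
```

Since the query aggregates over every order row, the optimizer may still choose a full scan of order_table; re-running EXPLAIN afterwards shows whether the index is actually picked up.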

Mysql Query taking more time to execute?

SELECT BB.NAME BranchName,VI.NAME Village,COUNT(BAC.CBSACCOUNTNUMBER) "No.Of Accounts",
SUM(BAC.CURRENTBALANCE) SumOfAmount,
SUM(CASE WHEN transactiontype = 'C' THEN amount ELSE 0 END) AS CreditTotal,
SUM(CASE WHEN transactiontype = 'D' THEN amount ELSE 0 END) AS DebitTotal,
SUM(CASE WHEN transactiontype = 'C' THEN amount WHEN transactiontype = 'D' THEN -1 * amount ELSE 0 END) AS CurrentBalance
FROM CUSTOMER CU,APPLICANT AP,ADDRESS AD,VILLAGE VI,BANKBRANCH BB,BANKACCOUNT BAC
LEFT OUTER JOIN accounttransaction ACT ON ACT.BANKACCOUNT_CBSACCOUNTNUMBER=BAC.CBSACCOUNTNUMBER
AND DATE_FORMAT(ACT.TRANDATE,'%Y-%m-%d')<='2013-05-09'
AND DATE_FORMAT(BAC.ACCOUNTOPENINGDATE,'%Y-%m-%d') <'2013-05-09'
AND ACT.BANKACCOUNT_CBSACCOUNTNUMBER IS NOT NULL
WHERE CU.CODE=AP.CUSTOMER_CODE AND BAC.ENTITY='CUSTOMER' AND BAC.ENTITYCODE=CU.CODE
AND AD.ENTITY='APPLICANT' AND AD.ENTITYCODE=AP.CODE
AND AD.VILLAGE_CODE=VI.CODE AND VI.STATE_CODE=AD.STATE_CODE AND VI.DISTRICT_CODE=AD.DISTRICT_CODE
AND VI.BLOCK_CODE=AD.BLOCK_CODE AND VI.PANCHAYAT_CODE=AD.PANCHAYAT_CODE
AND CU.BANKBRANCH_CODE=BB.CODE AND BAC.CBSACCOUNTNUMBER IS NOT NULL AND ACT.TRANSACTIONTYPE IS NOT NULL
GROUP BY BB.NAME,VI.NAME LIMIT 10;
Below is my explain plan:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | AD | index | ADDRESS_ENTITYCODE | ADDRESS_ENTITYCODE | 598 | NULL | 47234 | Using where; Using index; Using temporary; Using filesort
1 | SIMPLE | VI | ref | PRIMARY | PRIMARY | 62 | fiserveraupgb.AD.VILLAGE_CODE | 1 | Using where
1 | SIMPLE | AP | eq_ref | PRIMARY,AppCodeIndex | PRIMARY | 62 | fiserveraupgb.AD.ENTITYCODE | 1 |
1 | SIMPLE | BAC | ref | BANKACCOUNT_ENTITYCODE | BANKACCOUNT_ENTITYCODE | 63 | fiserveraupgb.AP.CUSTOMER_CODE | 1 | Using where; Using index
1 | SIMPLE | CU | eq_ref | PRIMARY,CustCodeIndex | PRIMARY | 62 | fiserveraupgb.AP.CUSTOMER_CODE | 1 |
1 | SIMPLE | BB | ref | PRIMARY,Bankbranch_CodeName | PRIMARY | 62 | fiserveraupgb.CU.BANKBRANCH_CODE | 1 |
1 | SIMPLE | ACT | index | NULL | accounttransaction_sysidindes | 280 | NULL | 22981 | Using where; Using index; Using join buffer
The MySQL server version is 5.5 and I am using MySQL Workbench. The query above takes 13 minutes to execute. Please suggest the best way to speed it up; I have created indexes on all the columns involved.
You mainly need indexes on columns that are used in joins and in your where clause. Other indexes don't add value for your select statements and slow down your inserts and updates.
In this case, you're using the column values in functions. Due to this the indexes cannot be used efficiently.
An expression like this is very inefficient:
DATE_FORMAT(ACT.TRANDATE,'%Y-%m-%d')<='2013-05-09'
It causes a lot of string conversions, because all TRANDATES are converted to a string representation of their value. These values need to be temporarily stored and are not indexed, so apart from the conversion, any index on ACT.TRANDATE is no longer used. That is probably causing the rather expensive 'Using join buffer' at the end of your explain plan.
Instead, convert the string '2013-05-09' to a date value and use that value as a constant in, or a parameter for, your query.
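A minimal sketch of the rewritten date conditions, assuming TRANDATE and ACCOUNTOPENINGDATE are DATE or DATETIME columns (the question does not say). Only the join between BANKACCOUNT and accounttransaction is shown; the rest of the query is omitted.

```sql
-- Assumption: TRANDATE and ACCOUNTOPENINGDATE are DATE or DATETIME columns.
SELECT BAC.CBSACCOUNTNUMBER, ACT.TRANDATE
FROM BANKACCOUNT BAC
LEFT OUTER JOIN accounttransaction ACT
       ON ACT.BANKACCOUNT_CBSACCOUNTNUMBER = BAC.CBSACCOUNTNUMBER
      -- was: DATE_FORMAT(ACT.TRANDATE,'%Y-%m-%d') <= '2013-05-09'
      AND ACT.TRANDATE < '2013-05-10'
      -- was: DATE_FORMAT(BAC.ACCOUNTOPENINGDATE,'%Y-%m-%d') < '2013-05-09'
      AND BAC.ACCOUNTOPENINGDATE < '2013-05-09'
LIMIT 10;
```

Comparing the bare column against a constant avoids the per-row string conversion and leaves any index on the date column usable. Using "< the next day" keeps the same result as "<= '2013-05-09'" on the date part even when the column also stores a time.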
Another thing to do is to create one index for a group of columns that are used together in a WHERE clause and/or join, rather than separate indexes for separate columns. For instance, this part:
AD.ENTITY = 'APPLICANT' AND
AD.ENTITYCODE = AP.CODE AND
AD.VILLAGE_CODE = VI.CODE
Having one index on the columns ENTITY, ENTITYCODE, and VILLAGE_CODE together would be more efficient than having a separate index for each of them. And it may help to include the other columns as well.
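Such a combined index might be created like this. The index name is made up for illustration, and the column order simply follows the order in which the columns are listed above.

```sql
-- Hypothetical index name.
-- Assumption: the three columns are short enough for one composite key;
-- long VARCHAR columns may need prefix lengths to stay within key-length limits.
CREATE INDEX idx_address_entity_code_village
    ON ADDRESS (ENTITY, ENTITYCODE, VILLAGE_CODE);
```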
And last: if a column or combination of columns is guaranteed to be unique, add a unique index. It is slightly faster in selects.
A general piece of advice: don't mix old join syntax with ANSI joins. It makes your query hard to read.
These hints (apart from the last one) should speed up your query, but it can still be slow, depending on the amount of data, the hardware and the load.
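As an illustration of the advice on join syntax, the FROM and WHERE clauses of the question's query could be rewritten with ANSI joins only, roughly as below. This sketch changes only the join syntax: the select list is shortened, and the date conditions are left exactly as in the question.

```sql
SELECT BB.NAME AS BranchName,
       VI.NAME AS Village,
       SUM(BAC.CURRENTBALANCE) AS SumOfAmount  -- remaining aggregates as in the question
FROM CUSTOMER CU
JOIN APPLICANT AP    ON AP.CUSTOMER_CODE = CU.CODE
JOIN ADDRESS AD      ON AD.ENTITY = 'APPLICANT'
                    AND AD.ENTITYCODE = AP.CODE
JOIN VILLAGE VI      ON VI.CODE = AD.VILLAGE_CODE
                    AND VI.STATE_CODE = AD.STATE_CODE
                    AND VI.DISTRICT_CODE = AD.DISTRICT_CODE
                    AND VI.BLOCK_CODE = AD.BLOCK_CODE
                    AND VI.PANCHAYAT_CODE = AD.PANCHAYAT_CODE
JOIN BANKBRANCH BB   ON BB.CODE = CU.BANKBRANCH_CODE
JOIN BANKACCOUNT BAC ON BAC.ENTITY = 'CUSTOMER'
                    AND BAC.ENTITYCODE = CU.CODE
LEFT OUTER JOIN accounttransaction ACT
                     ON ACT.BANKACCOUNT_CBSACCOUNTNUMBER = BAC.CBSACCOUNTNUMBER
                    AND DATE_FORMAT(ACT.TRANDATE,'%Y-%m-%d') <= '2013-05-09'
                    AND DATE_FORMAT(BAC.ACCOUNTOPENINGDATE,'%Y-%m-%d') < '2013-05-09'
                    AND ACT.BANKACCOUNT_CBSACCOUNTNUMBER IS NOT NULL
WHERE BAC.CBSACCOUNTNUMBER IS NOT NULL
  AND ACT.TRANSACTIONTYPE IS NOT NULL
GROUP BY BB.NAME, VI.NAME
LIMIT 10;
```

Written this way, each join condition sits next to the table it belongs to, which makes it easier to see that the WHERE condition on ACT.TRANSACTIONTYPE discards the unmatched rows the LEFT OUTER JOIN would otherwise keep.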