I am trying to debug a MySQL query that takes around 4-5 minutes on one server and fails on another MySQL server.
The query is like this:
SELECT
*,
TM.tutor_id as tutor_id,
TIMESTAMPDIFF( YEAR, birthdate, CURDATE( ) ) AS age
from
tutor_master as TM
LEFT JOIN category_master as CM
on TM.category = CM.category_id
LEFT JOIN tutor_expected_rate TER
ON FIND_IN_SET( TER.tutor_id, TM.tutor_id ) > 0
LEFT JOIN admin_shortlist_master SHM
ON TM.tutor_id = SHM.tutor_id
AND (SHM.user_auth_id = 'c84258e9c39059a89ab77d846ddab909')
INNER JOIN level_master LM
ON FIND_IN_SET(LM.level_id, TER.level_id) > 0
WHERE
1=1
GROUP BY
TM.tutor_id
order by
TM.is_priority DESC,
TM.tutor_id DESC
LIMIT
0, 10
and the MySQL EXPLAIN result is like this:
1 SIMPLE TM NULL ALL PRIMARY,tutor_id NULL NULL NULL 27530 100.00 Using temporary; Using filesort
1 SIMPLE SHM NULL ref tutor_id,user_auth_id user_auth_id 257 const 1 100.00 Using where
1 SIMPLE CM NULL eq_ref PRIMARY PRIMARY 4 toprecru_portal_db.TM.category 1 100.00 NULL
1 SIMPLE LM NULL ALL NULL NULL NULL NULL 11 100.00 Using join buffer (Block Nested Loop)
1 SIMPLE TER NULL ALL NULL NULL NULL NULL 13223 100.00 Using where; Using join buffer (Block Nested Loop)
Now my question is: is there any apparent mistake that I am overlooking, and how can I make the query more efficient?
Thanks.
This MySQL query takes over 6 minutes to run. The bottleneck is the last piece which generates the "Activity Completed" column. The reason I've written it as a subquery is because if the itemtype = "course" I need to find the MAX date of all other item types for each user in each class, not the MAX date of the whole table column. I'm sure there's a much better way, but I don't know what it is. Can anyone help me?
SELECT DISTINCT
u.firstname AS 'First' , u.lastname AS 'Last', c.fullname AS 'Course',
IF (gi.itemtype = 'course',
CONCAT(c.fullname," COURSE TOTAL"),
gi.itemname) AS 'Item Name',
IFNULL(CONCAT(ROUND(gg.finalgrade,2),'/',ROUND(gi.grademax,2)), '0')
AS 'Points Earned',
IFNULL(CONCAT(ROUND((gg.finalgrade/gi.grademax),2)*100,'%'), '0%')
AS 'Percentage',
(SELECT
IF(`Percentage`>=90, 'A',
IF(`Percentage`<90 AND `Percentage`>=80, 'B',
IF(`Percentage`<80 AND `Percentage`>=70, 'C',
IF(`Percentage`<70 AND `Percentage`>=60, 'D', 'F')))))
AS 'Grade',
IF(gi.itemtype = 'course',
(SELECT
FROM_UNIXTIME(GREATEST(MAX(gg.timemodified), MAX(gg.timecreated)),
'%m/%d/%Y')
FROM mdl_course c
JOIN mdl_grade_items gi
JOIN mdl_grade_grades gg
WHERE gg.userid=u.id
AND gi.courseid=c.id),
CASE
WHEN gg.timecreated IS NULL AND gg.timemodified IS NULL THEN '-'
WHEN gg.timecreated IS NULL AND gg.timemodified IS NOT NULL THEN FROM_UNIXTIME(gg.timemodified, '%m/%d/%Y')
WHEN gg.timecreated IS NOT NULL AND gg.timemodified IS NULL THEN FROM_UNIXTIME(gg.timecreated, '%m/%d/%Y')
WHEN gg.timecreated<=gg.timemodified THEN FROM_UNIXTIME(gg.timemodified,'%m/%d/%Y')
END)
AS 'Activity Completed'
FROM mdl_user u
JOIN mdl_user_enrolments ue on ue.userid=u.id
JOIN mdl_enrol e ON e.id=ue.enrolid
JOIN mdl_course c on c.id = e.courseid
JOIN mdl_context AS ctx ON ctx.instanceid = c.id
JOIN mdl_role_assignments AS ra ON ra.contextid = ctx.id
JOIN mdl_role AS r ON r.id = e.roleid
JOIN mdl_grade_items AS gi ON gi.courseid=c.id
LEFT JOIN mdl_grade_grades AS gg ON gg.userid=u.id
AND gg.itemid=gi.id
WHERE ue.status='0'
AND gi.hidden = '0'
AND gi.itemtype <> 'category'
AND gg.excluded = '0'
OR gg.excluded IS NULL
ORDER BY u.lastname, c.fullname, gi.itemtype DESC, gi.sortorder ASC
I want the output to look like this:
First, Last, Course, Item Name, Points Earned, Percentage, Grade, Activity Completed
Test, Student1, Class1, Assignment 1, 10/10, 100%, A, 1/2/2013
Test, Student1, Class1, Assignment 2, 15/15, 100%, A, 3/4/2013
Test, Student1, Class1, Assignment 3, 0/15, 0%, F, -
Test, Student1, Class1, Class1 COURSE TOTAL, 25/40, 63%, D, 3/4/2013
Test, Student1, Class2, Assignment1, 20/30, 66%, D, 2/12/2013
Test, Student1, Class2, Assignment2, 1/5, 20%, F, 1/31/2013
Test, Student1, Class2, Class2 COURSE TOTAL, 21/35, 60%, D, 2/12/2013
Test, Student2, Class1, Assignment1, etc...
This is what I get when I prepend EXPLAIN to the query.
I couldn't copy the column headers. They are:
id, select_type, table, type, possible_keys, key, key_len, ref, rows, extra
1 PRIMARY u ALL PRIMARY 449 Using temporary; Using filesort
1 PRIMARY ue ref mdl_userenro_enruse_uix,mdl_userenro_enr_ix,mdl_userenro_use_ix mdl_userenro_use_ix 8 skol_dev.u.id 1 Using where
1 PRIMARY e eq_ref PRIMARY,mdl_enro_cou_ix PRIMARY 8 skol_dev.ue.enrolid 1
1 PRIMARY r eq_ref PRIMARY PRIMARY 8 skol_dev.e.roleid 1 Using index
1 PRIMARY c eq_ref PRIMARY PRIMARY 8 skol_dev.e.courseid 1
1 PRIMARY ctx ref PRIMARY,mdl_cont_ins_ix mdl_cont_ins_ix 8 skol_dev.c.id 1 Using where; Using index
1 PRIMARY gi ref mdl_graditem_itenee_ix,mdl_graditem_cou_ix mdl_graditem_cou_ix 9 skol_dev.e.courseid 7 Using where
1 PRIMARY ra ref mdl_roleassi_con_ix mdl_roleassi_con_ix 8 skol_dev.ctx.id 4 Using index
1 PRIMARY gg eq_ref mdl_gradgrad_useite_uix,mdl_gradgrad_ite_ix,mdl_gradgrad_use_ix mdl_gradgrad_useite_uix 16 skol_dev.u.id,skol_dev.gi.id 1 Using where
3 DEPENDENT SUBQUERY gg ref mdl_gradgrad_useite_uix,mdl_gradgrad_use_ix mdl_gradgrad_use_ix 8 skol_dev.u.id 6
3 DEPENDENT SUBQUERY c index PRIMARY mdl_cour_cat_ix 8 124 Using index; Using join buffer
3 DEPENDENT SUBQUERY gi ref mdl_graditem_cou_ix mdl_graditem_cou_ix 9 skol_dev.c.id 7 Using where; Using index
So I have this:
EXPLAIN (SELECT f.prn AS 'prn', f.id AS 'xnyz', 'f' AS 'omnf', f.wwn, (COUNT(fr.rid) > 0 OR COUNT(frr.rid) > 0) AS 'rrr', fr.rid AS 'rid', f.xyzz
FROM scmf f
LEFT JOIN scmr fr
ON fr.omnf = 'f' AND fr.xnyz = f.id
LEFT JOIN scmr frr
ON frr.fid = f.id
WHERE f.prn IN (0,43570,43571,43572,43573,43574,43575) AND f.xyzz IN (490923) AND f.id IN (0,43570,43571,43572,43573,43574,43575)
GROUP BY f.id)
UNION
(SELECT fi.fxyz AS 'prn', fi.iix AS 'xnyz', 'n' AS 'omnf', fi.wwn,(COUNT(fir.rid) > 0) AS 'rrr', 0 AS 'rid', fif.xyzz
FROM scmf_item fi
LEFT JOIN scmr fir ON fir.omnf = 'n' AND fir.xnyz = fi.iix
LEFT JOIN scmf fif ON fif.id = fi.fxyz
WHERE fi.fxyz IN (0,43570,43571,43572,43573,43574,43575) AND fif.xyzz IN (490923)
GROUP BY fi.iix
)
ORDER BY wwn ASC;
And an explain reveals:
1 PRIMARY f range PRIMARY,xyzz,prn PRIMARY 4 7 Using where
1 PRIMARY fr ref omnf_id omnf_id 9 const,s.f.id 1 Using index
1 PRIMARY frr ref Index1 Index1 4 s.f.id 1 Using index
2 UNION fi range fxyz fxyz 4 24 Using where; Using temporary; Using filesort
2 UNION fif eq_ref PRIMARY,xyzz PRIMARY 4 s.fi.fxyz 1 Using where
2 UNION fir ref omnf_id omnf_id 9 const,s.fi.iix 1 Using index
UNION RESULT <union1,2> ALL Using filesort
Why does the UNION RESULT row use filesort and have type ALL?
Is this bad?
Can I use indexes to fix this?
The following query executes in 1.6 seconds:
SET @num := 0, @current_shop_id := NULL, @current_product_id := NULL;
#this query limits the results of the query within it by row number (so that only 250 products get displayed per store)
SELECT * FROM (
#this query adds row numbers to the query within it
SELECT *, @num := IF( @current_shop_id = shop_id, IF(@current_product_id=product_id,@num,@num+1), 0) AS row_number, @current_shop_id := shop_id AS shop_dummy, @current_product_id := product_id AS product_dummy FROM (
SELECT shop, shops.shop_id AS
shop_id, p1.product_id AS
product_id
FROM products p1 LEFT JOIN #this LEFT JOIN gets the favorites count for each product
(
SELECT fav3.product_id AS product_id, SUM(CASE
WHEN fav3.current = 1 AND fav3.closeted = 1 THEN 1
WHEN fav3.current = 1 AND fav3.closeted = 0 THEN -1
ELSE 0
END) AS favorites_count
FROM favorites fav3
GROUP BY fav3.product_id
) AS fav4 ON p1.product_id=fav4.product_id
INNER JOIN sex ON sex.product_id=p1.product_id AND
sex.sex=0 AND
sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
INNER JOIN shops ON shops.shop_id = p1.shop_id
ORDER BY shop, sex.DATE, product_id
) AS testtable
) AS rowed_results WHERE
rowed_results.row_number>=0 AND
rowed_results.row_number<(7)
Adding AND shops.shop_id=86 to the final WHERE clause causes the query to execute in 292 seconds:
SET @num := 0, @current_shop_id := NULL, @current_product_id := NULL;
#this query limits the results of the query within it by row number (so that only 250 products get displayed per store)
SELECT * FROM (
#this query adds row numbers to the query within it
SELECT *, @num := IF( @current_shop_id = shop_id, IF(@current_product_id=product_id,@num,@num+1), 0) AS row_number, @current_shop_id := shop_id AS shop_dummy, @current_product_id := product_id AS product_dummy FROM (
SELECT shop, shops.shop_id AS
shop_id, p1.product_id AS
product_id
FROM products p1 LEFT JOIN #this LEFT JOIN gets the favorites count for each product
(
SELECT fav3.product_id AS product_id, SUM(CASE
WHEN fav3.current = 1 AND fav3.closeted = 1 THEN 1
WHEN fav3.current = 1 AND fav3.closeted = 0 THEN -1
ELSE 0
END) AS favorites_count
FROM favorites fav3
GROUP BY fav3.product_id
) AS fav4 ON p1.product_id=fav4.product_id
INNER JOIN sex ON sex.product_id=p1.product_id AND
sex.sex=0 AND
sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
INNER JOIN shops ON shops.shop_id = p1.shop_id AND
shops.shop_id=86
ORDER BY shop, sex.DATE, product_id
) AS testtable
) AS rowed_results WHERE
rowed_results.row_number>=0 AND
rowed_results.row_number<(7)
I would have thought limiting the shops table with AND shops.shop_id=86 would reduce execution time. Instead, execution time appears to depend upon the number of rows in the products table with products.shop_id equal to the specified shops.shop_id. There are about 34K rows in the products table with products.shop_id=86, and execution time is 292 seconds. For products.shop_id=50, there are about 28K rows, and execution time is 210 seconds. For products.shop_id=175, there are about 2K rows, and execution time is 2.8 seconds. What is going on?
EXPLAIN EXTENDED for the 1.6 second query is:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 1203 100.00 Using where
2 DERIVED <derived3> ALL NULL NULL NULL NULL 1203 100.00
3 DERIVED sex ALL product_id_2,product_id NULL NULL NULL 526846 75.00 Using where; Using temporary; Using filesort
3 DERIVED p1 eq_ref PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 PRIMARY 4 mydatabase.sex.product_id 1 100.00
3 DERIVED <derived4> ALL NULL NULL NULL NULL 14752 100.00
3 DERIVED shops eq_ref PRIMARY PRIMARY 4 mydatabase.p1.shop_id 1 100.00
4 DERIVED fav3 ALL NULL NULL NULL NULL 15356 100.00 Using temporary; Using filesort
SHOW WARNINGS for this EXPLAIN EXTENDED is
-----+
| Note | 1003 | select `rowed_results`.`shop` AS `shop`,`rowed_results`.`shop_id` AS `shop_id`,`rowed_results`.`product_id` AS `product_id`,`rowed_results`.`row_number` AS `row_number`,`rowed_results`.`shop_dummy` AS `shop_dummy`,`rowed_results`.`product_dummy` AS `product_dummy` from (select `testtable`.`shop` AS `shop`,`testtable`.`shop_id` AS `shop_id`,`testtable`.`product_id` AS `product_id`,(@num:=if(((@current_shop_id) = `testtable`.`shop_id`),if(((@current_product_id) = `testtable`.`product_id`),(@num),((@num) + 1)),0)) AS `row_number`,(@current_shop_id:=`testtable`.`shop_id`) AS `shop_dummy`,(@current_product_id:=`testtable`.`product_id`) AS `product_dummy` from (select `mydatabase`.`shops`.`shop` AS `shop`,`mydatabase`.`shops`.`shop_id` AS `shop_id`,`mydatabase`.`p1`.`product_id` AS `product_id` from `mydatabase`.`products` `p1` left join (select `mydatabase`.`fav3`.`product_id` AS `product_id`,sum((case when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 1)) then 1 when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 0)) then -(1) else 0 end)) AS `favorites_count` from `mydatabase`.`favorites` `fav3` group by `mydatabase`.`fav3`.`product_id`) `fav4` on(((`mydatabase`.`p1`.`product_id` = `mydatabase`.`sex`.`product_id`) and (`fav4`.`product_id` = `mydatabase`.`sex`.`product_id`))) join `mydatabase`.`sex` join `mydatabase`.`shops` where ((`mydatabase`.`sex`.`sex` = 0) and (`mydatabase`.`p1`.`product_id` = `mydatabase`.`sex`.`product_id`) and (`mydatabase`.`shops`.`shop_id` = `mydatabase`.`p1`.`shop_id`) and (`mydatabase`.`sex`.`date` >= (now() - interval 1 day))) order by `mydatabase`.`shops`.`shop`,`mydatabase`.`sex`.`date`,`mydatabase`.`p1`.`product_id`) `testtable`) `rowed_results` where ((`rowed_results`.`row_number` >= 0) and (`rowed_results`.`row_number` < 7)) |
+------
EXPLAIN EXTENDED for the 292 second query is:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 36 100.00 Using where
2 DERIVED <derived3> ALL NULL NULL NULL NULL 36 100.00
3 DERIVED shops const PRIMARY PRIMARY 4 1 100.00 Using temporary; Using filesort
3 DERIVED p1 ref PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 shop_id 4 11799 100.00
3 DERIVED <derived4> ALL NULL NULL NULL NULL 14752 100.00
3 DERIVED sex eq_ref product_id_2,product_id product_id_2 5 mydatabase.p1.product_id 1 100.00 Using where
4 DERIVED fav3 ALL NULL NULL NULL NULL 15356 100.00 Using temporary; Using filesort
SHOW WARNINGS for this EXPLAIN EXTENDED is
----+
| Note | 1003 | select `rowed_results`.`shop` AS `shop`,`rowed_results`.`shop_id` AS `shop_id`,`rowed_results`.`product_id` AS `product_id`,`rowed_results`.`row_number` AS `row_number`,`rowed_results`.`shop_dummy` AS `shop_dummy`,`rowed_results`.`product_dummy` AS `product_dummy` from (select `testtable`.`shop` AS `shop`,`testtable`.`shop_id` AS `shop_id`,`testtable`.`product_id` AS `product_id`,(@num:=if(((@current_shop_id) = `testtable`.`shop_id`),if(((@current_product_id) = `testtable`.`product_id`),(@num),((@num) + 1)),0)) AS `row_number`,(@current_shop_id:=`testtable`.`shop_id`) AS `shop_dummy`,(@current_product_id:=`testtable`.`product_id`) AS `product_dummy` from (select 'shop.nordstrom.com' AS `shop`,'86' AS `shop_id`,`mydatabase`.`p1`.`product_id` AS `product_id` from `mydatabase`.`products` `p1` left join (select `mydatabase`.`fav3`.`product_id` AS `product_id`,sum((case when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 1)) then 1 when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 0)) then -(1) else 0 end)) AS `favorites_count` from `mydatabase`.`favorites` `fav3` group by `mydatabase`.`fav3`.`product_id`) `fav4` on(((`fav4`.`product_id` = `mydatabase`.`p1`.`product_id`) and (`mydatabase`.`sex`.`product_id` = `mydatabase`.`p1`.`product_id`))) join `mydatabase`.`sex` join `mydatabase`.`shops` where ((`mydatabase`.`sex`.`sex` = 0) and (`mydatabase`.`sex`.`product_id` = `mydatabase`.`p1`.`product_id`) and (`mydatabase`.`p1`.`shop_id` = 86) and (`mydatabase`.`sex`.`date` >= (now() - interval 1 day))) order by 'shop.nordstrom.com',`mydatabase`.`sex`.`date`,`mydatabase`.`p1`.`product_id`) `testtable`) `rowed_results` where ((`rowed_results`.`row_number` >= 0) and (`rowed_results`.`row_number` < 7)) |
+-----
I am running MySQL client version: 5.1.56. The shops table has a primary index on shop_id:
Action Keyname Type Unique Packed Column Cardinality Collation Null Comment
Edit Drop PRIMARY BTREE Yes No shop_id 163 A
I have analyzed the shop table but this did not help.
I notice that if I remove the LEFT JOIN the difference in execution times drops to 0.12 seconds versus 0.28 seconds.
Cez's solution, namely to use the 1.6-second version of the query and remove irrelevant results by adding rowed_results.shop_dummy=86 to the outer query (as below), executes in 1.7 seconds. This circumvents the problem, but the mystery remains why the 292-second query is so slow.
SET @num := 0, @current_shop_id := NULL, @current_product_id := NULL;
#this query limits the results of the query within it by row number (so that only 250 products get displayed per store)
SELECT * FROM (
#this query adds row numbers to the query within it
SELECT *, @num := IF( @current_shop_id = shop_id, IF(@current_product_id=product_id,@num,@num+1), 0) AS row_number, @current_shop_id := shop_id AS shop_dummy, @current_product_id := product_id AS product_dummy FROM (
SELECT shop, shops.shop_id AS
shop_id, p1.product_id AS
product_id
FROM products p1 LEFT JOIN #this LEFT JOIN gets the favorites count for each product
(
SELECT fav3.product_id AS product_id, SUM(CASE
WHEN fav3.current = 1 AND fav3.closeted = 1 THEN 1
WHEN fav3.current = 1 AND fav3.closeted = 0 THEN -1
ELSE 0
END) AS favorites_count
FROM favorites fav3
GROUP BY fav3.product_id
) AS fav4 ON p1.product_id=fav4.product_id
INNER JOIN sex ON sex.product_id=p1.product_id AND sex.sex=0
INNER JOIN shops ON shops.shop_id = p1.shop_id
WHERE sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
ORDER BY shop, sex.DATE, product_id
) AS testtable
) AS rowed_results WHERE
rowed_results.row_number>=0 AND
rowed_results.row_number<(7) AND
rowed_results.shop_dummy=86;
After the chat room, and actually creating tables/columns to match the query, I've come up with the following query.
I have built my inner-most query on the sex, products (for shop_id) and favorites tables. Since you described that ProductX at ShopA has Product ID = 1 but the same ProductX at ShopB has Product ID = 2 (example only), each product is ALWAYS unique per shop and never duplicated. That said, I can get the product and shop_id WITH the count of favorites (if any) in this query, yet group on just the product_id. Because shop_id won't change per product, I am using MAX() on it. Since you are always filtering by a date of "yesterday" and by gender (sex=0, female), I would have the sex table indexed on ( date, sex, product_id ). I would guess you are not adding thousands of items every day. Products obviously would have an index on product_id (primary key), and favorites SHOULD have an index on product_id.
From that result (alias "sxFav") we can then do a direct join to the sex and products table by that "Product_ID" to get any additional information you may want, such as name of shop, date product added, product description, etc. This result is then ordered by the shop_id the product is being sold from, date and finally product ID (but you may consider grabbing a description column at inner query and using that as sort-by). This results in alias "PreQuery".
With the rows properly ordered by shop, we can now add the @MySQLVariable references to get each product assigned a row number, similar to how you originally attempted. However, the counter only resets back to 1 when the shop ID changes.
SELECT
PreQuery.*,
@num := IF( @current_shop_id = PreQuery.shop_id, @num +1, 1 ) AS RowPerShop,
@current_shop_id := PreQuery.shop_id AS shop_dummy
from
( SELECT
sxFav.product_id,
sxFav.shop_id,
sxFav.Favorites_Count
from
( SELECT
sex.product_id,
MAX( p.shop_id ) shop_id,
SUM( CASE WHEN F.current = 1 AND F.closeted = 1 THEN 1
WHEN F.current = 1 AND F.closeted = 0 THEN -1
ELSE 0 END ) AS favorites_count
from
sex
JOIN products p
ON sex.Product_ID = p.Product_ID
LEFT JOIN Favorites F
ON sex.product_id = F.product_ID
where
sex.date >= subdate( now(), interval 1 day)
and sex.sex = 0
group by
sex.product_id ) sxFav
JOIN sex
ON sxFav.Product_ID = sex.Product_ID
JOIN products p
ON sxFav.Product_ID = p.Product_ID
order by
sxFav.shop_id,
sex.date,
sxFav.product_id ) PreQuery,
( select @num :=0,
@current_shop_id := 0 ) as SQLVars
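The index suggestions made above could be created roughly as follows. This is only a sketch: the index names are made up for illustration, and it assumes the tables do not already carry equivalent indexes, so it is worth checking the existing ones first.

```sql
-- Hypothetical index names; the column order follows the suggestion above
-- (date first, then sex, then product_id).
CREATE INDEX idx_sex_date_sex_product ON sex (date, sex, product_id);

-- Lets the LEFT JOIN to favorites find rows by product_id.
CREATE INDEX idx_favorites_product ON favorites (product_id);

-- Inspect what each table has afterwards.
SHOW INDEX FROM sex;
SHOW INDEX FROM favorites;
```

Running EXPLAIN on the query before and after would show whether the optimizer actually picks these indexes up.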
Now, if you are looking for specific "paging" information (such as 7 entries per shop), wrap the ENTIRE query above into something like...
select * from ( entire query above ) AS PagedRows where RowPerShop between 1 and 7
(or between 8 and 14, 15 and 21, etc as needed)
or even
RowPerShop between RowsPerPage*(PageYouAreShowing - 1) + 1 and RowsPerPage*PageYouAreShowing (with PageYouAreShowing counted from 1)
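As a quick arithmetic check, assuming pages are numbered from 1 and RowPerShop also starts at 1, the inclusive bounds for page 2 with 7 rows per page should come out as 8 and 14, matching the ranges listed above. A minimal sketch with session variables (the variable names simply mirror the placeholders and are illustrative only):

```sql
-- Hypothetical session variables named after the placeholders above.
SET @RowsPerPage := 7, @PageYouAreShowing := 2;

-- Inclusive lower and upper RowPerShop bounds for a 1-based page number.
SELECT @RowsPerPage * (@PageYouAreShowing - 1) + 1 AS first_row,
       @RowsPerPage * @PageYouAreShowing           AS last_row;
-- first_row = 8, last_row = 14
```

Because BETWEEN is inclusive on both ends, consecutive pages must not share a boundary value, which is why the lower bound adds 1.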
You should move the shops.shop_id=86 to the JOIN condition for shops. There is no reason to put it outside the JOIN; you run the risk of MySQL joining first and filtering afterwards. A JOIN condition can do the same job that a WHERE clause does, especially if you are not referencing other tables.
....
INNER JOIN shops ON shops.shop_id = p1.shop_id AND shops.shop_id=86
....
Same thing with the sex join:
...
INNER JOIN sex ON sex.product_id=p1.product_id AND sex.sex=0
AND sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
...
Derived tables are great, but they have no indexes on them. Usually this doesn't matter since they are generally in RAM. But between filtering and sorting with no indexes, things can add up.
Note that in the second query, which takes much longer, the table processing order changes. The shops table is at the top in the slow query, and the p1 table retrieves 11799 rows instead of 1 row in the fast query. It also doesn't use the primary key any more. That's likely where your problem is.
3 DERIVED p1 eq_ref PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 PRIMARY 4 mydatabase.sex.product_id 1 100.00
3 DERIVED p1 ref PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 shop_id 4 11799 100.00
Judging by the discussion, the query planner is performing badly when specifying the shop at a lower level.
Add rowed_results.shop_dummy=86 to the outer query to get the results that you are looking for.
I have a query (see below) that uses a custom-developed UDF to calculate whether or not certain points are within a polygon (first query in the UNION) or a circular shape (second query in the UNION).
select e.inquiry_match_type_id
, a.geo_boundary_id
, GeoBoundaryContains(c.tpi_geo_boundary_coverage_type_id, 29.287437, -95.055807, a.lat, a.lon, a.geo_boundary_vertex_id ) in_out
, e.inquiry_id
, e.external_id
, COALESCE(f.inquiry_device_id,0) inquiry_device_id
, b.external_info1
, b.external_info2
, b.geo_boundary_id
, b.geo_boundary_type_id
from geo_boundary_vertex a
join geo_boundary b on b.geo_boundary_id = a.geo_boundary_id
join trackpoint_index_geo_boundary_mem c on c.geo_boundary_id = b.geo_boundary_id
join trackpoint_index_mem d on d.trackpoint_index_id = c.trackpoint_index_id
join inquiry_mem e on e.inquiry_id = b.inquiry_id
left outer join inquiry_device_mem f on f.inquiry_id = e.inquiry_id and f.device_id = 3201
where d.trackpoint_index_id = 3127
and b.geo_boundary_type_id = 3
and e.expiration_date >= now()
group by
a.geo_boundary_id
UNION
select e.inquiry_match_type_id
, b.geo_boundary_id
, GeoBoundaryContains( c.tpi_geo_boundary_coverage_type_id, 29.287437, -95.055807, b.centroid_lat, b.centoid_lon, b.radius ) in_out
, e.inquiry_id
, e.external_id
, COALESCE(f.inquiry_device_id,0) inquiry_device_id
, b.external_info1
, b.external_info2
, b.geo_boundary_id
, b.geo_boundary_type_id
from geo_boundary b
join trackpoint_index_geo_boundary_mem c on c.geo_boundary_id = b.geo_boundary_id
join trackpoint_index_mem d on d.trackpoint_index_id = c.trackpoint_index_id
join inquiry_mem e on e.inquiry_id = b.inquiry_id
left outer join inquiry_device_mem f on f.inquiry_id = e.inquiry_id and f.device_id = 3201
where d.trackpoint_index_id = 3127
and b.geo_boundary_type_id = 2
and e.expiration_date >= now()
group by
b.geo_boundary_id
When I run an explain for the query I get the following:
id select_type table type possible_keys key key_len ref rows Extra
------ -------------- ---------- ------- --------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------- ---------- ------------------------ ------- -------------------------------
1 PRIMARY d const PRIMARY PRIMARY 4 const 1 Using temporary; Using filesort
1 PRIMARY c ref PRIMARY,fk_mtp_idx_geo_boundary_mtp_idx,fk_mtp_idx_geo_boundary_geo_boundary,fk_mtp_idx_geo_boundary_mtp_mem_idx,fk_mtp_idx_geo_boundary_geo_boundary_mem fk_mtp_idx_geo_boundary_mtp_idx 4 const 9
1 PRIMARY b eq_ref PRIMARY,fk_geo_boundary_inquiry,fk_geo_boundary_geo_boundary_type PRIMARY 4 gothim.c.geo_boundary_id 1 Using where
1 PRIMARY e eq_ref PRIMARY PRIMARY 4 gothim.b.inquiry_id 1 Using where
1 PRIMARY f ref fk_inquiry_device_mem_inquiry fk_inquiry_device_mem_inquiry 4 gothim.e.inquiry_id 2
1 PRIMARY a ref fk_geo_boundary_vertex_geo_boundary fk_geo_boundary_vertex_geo_boundary 4 gothim.b.geo_boundary_id 11 Using where
2 UNION d const PRIMARY PRIMARY 4 const 1 Using temporary; Using filesort
2 UNION c ref PRIMARY,fk_mtp_idx_geo_boundary_mtp_idx,fk_mtp_idx_geo_boundary_geo_boundary,fk_mtp_idx_geo_boundary_mtp_mem_idx,fk_mtp_idx_geo_boundary_geo_boundary_mem fk_mtp_idx_geo_boundary_mtp_idx 4 const 9
2 UNION b eq_ref PRIMARY,fk_geo_boundary_inquiry,fk_geo_boundary_geo_boundary_type PRIMARY 4 gothim.c.geo_boundary_id 1 Using where
2 UNION e eq_ref PRIMARY PRIMARY 4 gothim.b.inquiry_id 1 Using where
2 UNION f ref fk_inquiry_device_mem_inquiry fk_inquiry_device_mem_inquiry 4 gothim.e.inquiry_id 2
(null) UNION RESULT <union1,2> ALL (null) (null) (null) (null) (null) Using filesort
12 record(s) selected [Fetch MetaData: 1ms] [Fetch Data: 5ms]
Now, I can split the queries up and use the ORDER BY NULL trick to get rid of the filesort; however, when I attempt to add that to the end of a UNION, it doesn't work.
I am considering splitting the query apart into 2 queries or possibly re-writing it completely not to use a UNION (though that is a bit more difficult of course). The other thing I have working against me is that we have this in production and I'd like to limit changes - I would have loved just to be able to add ORDER BY NULL to the end of the query and be done with it, but it doesn't work w/ the UNION.
Any help would be greatly appreciated.
Normally, ORDER BY can be used for the individual queries within a UNION like this:
(
SELECT *
FROM table1, …
GROUP BY
id
ORDER BY
NULL
)
UNION ALL
(
SELECT *
FROM table2, …
GROUP BY
id
ORDER BY
NULL
)
However, as the docs state:
However, use of ORDER BY for individual SELECT statements implies nothing about the order in which the rows appear in the final result because UNION by default produces an unordered set of rows. Therefore, the use of ORDER BY in this context is typically in conjunction with LIMIT, so that it is used to determine the subset of the selected rows to retrieve for the SELECT, even though it does not necessarily affect the order of those rows in the final UNION result. If ORDER BY appears without LIMIT in a SELECT, it is optimized away because it will have no effect anyway.
This is of course a smart move, but not quite smart enough, since they forgot to optimize away the ordering behavior of GROUP BY as well.
So for now, you should add a very high LIMIT to your individual queries:
(
SELECT *
FROM table1, …
GROUP BY
id
ORDER BY
NULL
LIMIT 100000000
)
UNION ALL
(
SELECT *
FROM table2, …
GROUP BY
id
ORDER BY
NULL
LIMIT 100000000
)
I'll post it as a bug to MySQL, hope they'll fix it in the next release, but meanwhile you could use this solution.
Note that a similar solution (using TOP 100%) was used to force ordering of the subqueries in SQL Server 2000, however, it stopped working in 2005 (ORDER BY has no effect in subqueries with TOP 100% for the optimizer).
It is safe to use it though since it won't break your queries even if the optimizer behavior changes in the next releases, but will just make them as slow as they are now.
Maybe try something like
SELECT *
FROM
(
[your entire query here]
) DerivedTable
ORDER BY NULL
I've never used MySQL so forgive me if I'm missing the plot :)
EDIT: What if you run each individual query separately (which, as you say, works), but insert the data into a temporary table. Then, at the end just do a select from the temp table.
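A minimal sketch of that temporary-table idea, assuming MySQL and using literal rows as stand-ins for the two halves of the UNION. The table name tmp_boundary_matches and its two columns are hypothetical; a real version would need one column per selected expression.

```sql
-- Hypothetical scratch table; it is visible only to the current connection.
CREATE TEMPORARY TABLE tmp_boundary_matches (
    geo_boundary_id INT NOT NULL,
    in_out          INT NOT NULL
);

-- Stand-in for the polygon query, run on its own.
INSERT INTO tmp_boundary_matches (geo_boundary_id, in_out)
SELECT 101, 1;

-- Stand-in for the circle query, run on its own.
INSERT INTO tmp_boundary_matches (geo_boundary_id, in_out)
SELECT 202, 0;

-- Read the combined rows back; no UNION is involved.
SELECT geo_boundary_id, in_out FROM tmp_boundary_matches;

DROP TEMPORARY TABLE tmp_boundary_matches;
```

Unlike a plain UNION, this keeps duplicate rows, so it behaves like UNION ALL unless the final SELECT adds DISTINCT.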
Have you tried changing the UNION to UNION ALL?
A UNION tries to remove duplicate rows. In order to do that, it has to sort the intermediate results, which might explain what you are seeing in your execution plan.
From MySQL Union
By default the MySQL UNION removes all duplicate rows from the result set even if you don’t explicit using DISTINCT after the keyword UNION.
If you use UNION ALL explicitly, the duplicate rows remain in the result set. You only use this in the cases that you want to keep duplicate rows or you are sure that there is no duplicate rows in the result set.
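The difference is easy to see with a tiny self-contained sketch that needs no tables (the column alias n is arbitrary):

```sql
-- UNION removes the duplicate: the result has one row (n = 1).
SELECT 1 AS n
UNION
SELECT 1;

-- UNION ALL keeps both: the result has two rows (n = 1, n = 1).
SELECT 1 AS n
UNION ALL
SELECT 1;
```

Switching to UNION ALL is only a safe drop-in when the two halves cannot produce identical rows, or when duplicates are acceptable.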
Edit
I doubt it will make any difference (it might even be worse), but could you try the following "equivalent" query?
select *
from (
select b.geo_boundary_id
, GeoBoundaryContains( c.tpi_geo_boundary_coverage_type_id, 29.287437, -95.055807, b.centroid_lat, b.centoid_lon, b.radius ) in_out
from geo_boundary b
join trackpoint_index_geo_boundary_mem c on c.geo_boundary_id = b.geo_boundary_id
where b.geo_boundary_type_id = 2
group by
b.geo_boundary_id
union all
select a.geo_boundary_id
, GeoBoundaryContains(c.tpi_geo_boundary_coverage_type_id, 29.287437, -95.055807, a.lat, a.lon, a.geo_boundary_vertex_id ) in_out
from geo_boundary_vertex a
join geo_boundary b on b.geo_boundary_id = a.geo_boundary_id
join trackpoint_index_geo_boundary_mem c on c.geo_boundary_id = b.geo_boundary_id
where b.geo_boundary_type_id = 3
group by
a.geo_boundary_id
) s
inner join (
select e.inquiry_match_type_id
, e.inquiry_id
, e.external_id
, COALESCE(f.inquiry_device_id,0) inquiry_device_id
, b.external_info1
, b.external_info2
, b.geo_boundary_id
, b.geo_boundary_type_id
from geo_boundary b
join trackpoint_index_geo_boundary_mem c on c.geo_boundary_id = b.geo_boundary_id
join trackpoint_index_mem d on d.trackpoint_index_id = c.trackpoint_index_id
join inquiry_mem e on e.inquiry_id = b.inquiry_id
left outer join inquiry_device_mem f on f.inquiry_id = e.inquiry_id and f.device_id = 3201
where d.trackpoint_index_id = 3127
and b.geo_boundary_type_id IN (2, 3)
and e.expiration_date >= now()
) r on r.geo_boundary_id = s.geo_boundary_id