I am trying to optimize sql query in mysql db. Tried different ways of rewriting it , adding/removing indexes, but nothing seems to decrease the load. Maybe I am missing something.
Query:
select co.country_name as state, ci.city_name as city, ci.city_id, ci.country_id,
count(l.id) as num
FROM cities ci
INNER JOIN countries co ON (ci.country_id = co.country_id)
INNER JOIN dancers l ON (l.city_id = ci.city_id AND l.closed = 0 AND l.approved = 1 )
WHERE 1 AND ci.main=1
GROUP BY ci.city_id
ORDER BY city
Duration : 2.01sec - 2.20sec
Optimized query:
select co.country_name as state, ci.city_name as city, ci.city_id, ci.country_id, count(l.id) as num from
(select ci1.city_name, ci1.city_id, ci1.country_id from cities ci1
where ci1.main=1) as ci
INNER JOIN countries co ON (ci.country_id = co.country_id)
INNER JOIN dancers l ON (l.city_id = ci.city_id AND l.closed = 0 AND l.approved = 1 ) GROUP BY ci.city_id ORDER BY city
Duration : 0.82sec - 0.90sec
But i feel that this query can be optimized even more but not getting the ideea how to optimized it. There are 3 tables
Table 1 : countries ( country_id, country_name)
Table 2 : cities ( city_id, city_name, main, country_id)
Table 3 : dancers ( id, country_id, city_id, closed, approved)
I am trying to get all the cities which have main=1 and for each to count all the profiles that are into those cities joining with countries to get the country_name.
Any ideas are welcomed, thank you.
Later edit : - first query explain
+----+-------------+-------+-------------+---------------------------------------------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------------+---------------------------------------------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
| 1 | SIMPLE | l | index_merge | city_id,closed,approved,city_id_2 | closed,approved | 1,2 | NULL | 75340 | Using intersect(closed,approved); Using where; Using temporary; Using filesort |
| 1 | SIMPLE | ci | eq_ref | PRIMARY,state_id_2,state_id,city_name,lat,city_name_shorter,city_id | PRIMARY | 4 | db.l.city_id | 1 | Using where |
| 1 | SIMPLE | co | eq_ref | PRIMARY | PRIMARY | 4 | db.ci.country_id | 1 | Using where |
+----+-------------+-------+-------------+---------------------------------------------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
Second query explain :
+----+-------------+------------+------+-----------------------------------+-------------+---------+------------------+-------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+-----------------------------------+-------------+---------+------------------+-------+------------------------------------+
| 1 | PRIMARY | co | ALL | PRIMARY | NULL | NULL | NULL | 51 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived2> | ref | <auto_key1> | <auto_key1> | 4 | db.co.country_id | 176 | Using where |
| 1 | PRIMARY | l | ref | city_id,closed,approved,city_id_2 | city_id_2 | 4 | ci.city_id | 44 | Using index condition; Using where |
| 2 | DERIVED | ci1 | ALL | NULL | NULL | NULL | NULL | 11765 | Using where |
+----+-------------+------------+------+-----------------------------------+-------------+---------+------------------+-------+------------------------------------+
#used_by_already query explain :
+----+-------------+------------+-------------+-----------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------------+-----------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
| 1 | PRIMARY | co | ALL | PRIMARY | NULL | NULL | NULL | 51 | NULL |
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 4 | db.co.country_id | 565 | Using where |
| 2 | DERIVED | l | index_merge | city_id,closed,approved,city_id_2 | closed,approved | 1,2 | NULL | 75341 | Using intersect(closed,approved); Using where; Using temporary; Using filesort |
| 2 | DERIVED | ci1 | eq_ref | PRIMARY,state_id_2,city_id | PRIMARY | 4 | db.l.city_id | 1 | Using where |
+----+-------------+------------+-------------+-----------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
I suggest you try this:
SELECT
co.country_name AS state
, ci.city_name AS city
, ci.city_id
, ci.country_id
, ci.num
FROM (
SELECT
ci1.city_id
, ci1.city_name
, ci1.country_id
, COUNT(l.id) AS num
FROM cities ci1
INNER JOIN dancers l ON l.city_id = ci1.city_id
AND l.closed = 0
AND l.approved = 1
WHERE ci1.main = 1
GROUP BY
ci1.city_id
, ci1.city_name
, ci1.country_id
) AS ci
INNER JOIN countries co ON ci.country_id = co.country_id
;
And that you provide the explain plan output for further analysis if needed. When optimizing knowing what indexes exist, and having he explain plan, are essentials.
Not also, that MySQL does permit non-standard GROUP BY syntax (where only one or some of the columns in the select list are included under the group by clause).
In recent versions of MySQL the default behaviour for GROUP BY has changed to SQL standard syntax (where all "non-aggregating" columns in the select list must be included under the group by clause). While your existing queries use the non-standard group by syntax, the query supplied here dis compliant.
Related
I have an issue when adding an ORDER BY on my query.
without ORDER BY query takes around 26ms, as soon as I add a ORDER BY it's taking around 20s.
I have tried a few different ways but can seem to reduce the time.
Tried FORCE INDEX (PRIMARY)
Tried wrapping SELECT * FROM (...)
Does anyone have any other ideas?
QUERY:
SELECT
orders.id AS order_number,
orders.created AS order_created,
CONCAT_WS(' ', clients.name, office.name) AS client_office,
order_item.code AS product_code,
order_item.name AS product_name,
order_item.description AS product_description,
order_item.sale_price,
jobs.rating AS job_rating,
job_role.start_booking AS book_start,
job_role.end_booking AS book_end,
CONCAT_WS(' ', admin_staffs.first_name, admin_staffs.last_name) AS soul,
admin_roles.name AS role,
services.name AS service,
COUNT(job_item.id) AS total_assets,
COALESCE(orders.project_name, CONCAT_WS(' ', orders.location_unit_no, orders.location_street_number, orders.location_street_number, orders.location_street_name
)
) AS order_title
FROM jobs
LEFT JOIN job_item ON jobs.id = job_item.job_id
LEFT JOIN job_role ON jobs.id = job_role.job_id
LEFT JOIN admin_roles ON admin_roles.id = job_role.role_id
LEFT JOIN services ON services.id = jobs.service_id
LEFT JOIN admin_staffs ON admin_staffs.id = job_role.staff_id
LEFT JOIN order_item ON jobs.item_id = order_item.id
LEFT JOIN orders ON orders.id = order_item.order_id
LEFT JOIN clients ON orders.client_id = clients.id
LEFT JOIN office ON orders.office_id = office.id
LEFT JOIN client_users ON orders.user_id = client_users.id
GROUP BY jobs.id
ORDER BY jobs.order_id DESC
LIMIT 50
EXPLAIN:
+----+-------------+--------------+--------+----------------+-----------+---------+--------------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+----------------+-----------+---------+--------------------------------------+-------+---------------------------------+
| 1 | SIMPLE | jobs | ALL | PRIMARY | NULL | NULL | NULL | 49555 | Using temporary; Using filesort |
| 1 | SIMPLE | job_item | ref | job_id | job_id | 5 | Wx3392vf_UCO_app.jobs.id | 8 | Using where; Using index |
| 1 | SIMPLE | job_role | ref | job_id | job_id | 4 | Wx3392vf_UCO_app.jobs.id | 1 | Using where |
| 1 | SIMPLE | admin_roles | eq_ref | PRIMARY | PRIMARY | 4 | Wx3392vf_UCO_app.job_role.role_id | 1 | Using where |
| 1 | SIMPLE | services | eq_ref | PRIMARY | PRIMARY | 4 | Wx3392vf_UCO_app.jobs.service_id | 1 | Using where |
| 1 | SIMPLE | admin_staffs | eq_ref | PRIMARY | PRIMARY | 4 | Wx3392vf_UCO_app.job_role.staff_id | 1 | NULL |
| 1 | SIMPLE | order_item | eq_ref | PRIMARY | PRIMARY | 4 | Wx3392vf_UCO_app.jobs.item_id | 1 | Using where |
| 1 | SIMPLE | orders | eq_ref | PRIMARY | PRIMARY | 4 | Wx3392vf_UCO_app.order_item.order_id | 1 | Using where |
| 1 | SIMPLE | clients | eq_ref | PRIMARY | PRIMARY | 4 | Wx3392vf_UCO_app.orders.client_id | 1 | Using where |
| 1 | SIMPLE | office | eq_ref | PRIMARY | PRIMARY | 4 | Wx3392vf_UCO_app.orders.office_id | 1 | Using where |
| 1 | SIMPLE | client_users | eq_ref | PRIMARY | PRIMARY | 4 | Wx3392vf_UCO_app.orders.user_id | 1 | Using index |
+----+-------------+--------------+--------+----------------+-----------+---------+--------------------------------------+-------+---------------------------------+
Try to start from this query:
SELECT * FROM jobs ORDER BY jobs.order_id DESC LIMIT 50
And then increase its complexity by adding more joins:
SELECT * FROM jobs
LEFT JOIN job_item ON jobs.id = job_item.job_id
ORDER BY jobs.order_id DESC LIMIT 50
etc.
Make profiling at each step and you will be able to find a bottleneck. Probably, it is some TEXT field in one of joined tables that slows down the query or some other reason.
I have a database which consists of three tables, with the following structure:
restaurant table: restaurant_id, location_id, rating. Example: 1325, 77, 4.5
restaurant_name table: restaurant_id, language, name. Example: 1325, 'en', 'Pizza Express'
location_name table: location_id, language, name. Example: 77, 'en', 'New York'
I would like to get the restaurant info in English, sorted by location name and restaurant name, and use the LIMIT clause to paginate the result. So my SQL is:
SELECT ln.name, rn.name
FROM restaurant r
INNER JOIN location_name ln
ON r.location_id = ln.location_id
AND ln.language = 'en'
INNER JOIN restaurant_name rn
ON r.restaurant_id = rn.restaurant_id
AND rn.language = 'en'
ORDER BY ln.name, rn.name
LIMIT 0, 50
This is terribly slow - so I refined my SQL with deferred JOIN, which make things a lot faster (from over 10 seconds to 2 seconds):
SELECT ln.name, rn.name
FROM restaurant r
INNER JOIN (
SELECT r.restaurant_id
FROM restaurant r
INNER JOIN location_name ln
ON r.location_id = ln.location_id
AND ln.language = 'en'
INNER JOIN restaurant_name rn
ON r.restaurant_id = rn.restaurant_id
AND rn.language = 'en'
ORDER BY ln.name, rn.name
LIMIT 0, 50
) r1
ON r.restaurant_id = r1.restaurant_id
INNER JOIN location_name ln
ON r.location_id = ln.location_id
AND ln.language = 'en'
INNER JOIN restaurant_name rn
ON r.restaurant_id = rn.restaurant_id
AND rn.language = 'en'
ORDER BY ln.name, rn.name
2 seconds is unfortunately still not very acceptable to the user, so I go and check the EXPLAIN of the my query, and it appears that the slow part is on the ORDER BY clause, which I see "Using temporary; Using filesort". I checked the official reference manual about ORDER BY optimization and I come across this statement:
In some cases, MySQL cannot use indexes to resolve the ORDER BY,
although it may still use indexes to find the rows that match the
WHERE clause. Examples:
The query joins many tables, and the columns in the ORDER BY are not
all from the first nonconstant table that is used to retrieve rows.
(This is the first table in the EXPLAIN output that does not have a
const join type.)
So for my case, given that the two columns I'm ordering by are from the nonconstant joined tables, index cannot be used. My question is, is there any other approach I can take to speed things up, or what I've done so far is already the best I can achieve?
Thanks in advance for your help!
EDIT 1
Below is the EXPLAIN output with the ORDER BY clause:
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 50 | |
| 1 | PRIMARY | rn | ref | idx_restaurant_name_1 | idx_restaurant_name_1 | 1538 | r1.restaurant_id,const,const | 1 | Using where |
| 1 | PRIMARY | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | r1.restaurant_id | 1 | |
| 1 | PRIMARY | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id,const,const | 1 | Using where |
| 2 | DERIVED | rn | ALL | idx_restaurant_name_1 | NULL | NULL | NULL | 8484 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | test.rn.restaurant_id | 1 | |
| 2 | DERIVED | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id | 1 | Using where |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+----------------------------------------------+
Below is the EXPLAIN output without the ORDER BY clause:
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+--------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 50 | |
| 1 | PRIMARY | rn | ref | idx_restaurant_name_1 | idx_restaurant_name_1 | 1538 | r1.restaurant_id,const,const | 1 | Using where |
| 1 | PRIMARY | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | r1.restaurant_id | 1 | |
| 1 | PRIMARY | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id,const,const | 1 | Using where |
| 2 | DERIVED | rn | index | idx_restaurant_name_1 | idx_restaurant_name_1 | 1538 | NULL | 8484 | Using where; Using index |
| 2 | DERIVED | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | test.rn.restaurant_id | 1 | |
| 2 | DERIVED | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id | 1 | Using where; Using index |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+--------------------------+
EDIT 2
Below are the DDL of the table. I built them for illustrating this problem only, the real table has much more columns.
CREATE TABLE restaurant (
restaurant_id INT NOT NULL AUTO_INCREMENT,
location_id INT NOT NULL,
rating INT NOT NULL,
PRIMARY KEY (restaurant_id),
INDEX idx_restaurant_1 (location_id)
);
CREATE TABLE restaurant_name (
restaurant_id INT NOT NULL,
language VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
INDEX idx_restaurant_name_1 (restaurant_id, language),
INDEX idx_restaurant_name_2 (name)
);
CREATE TABLE location_name (
location_id INT NOT NULL,
language VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
INDEX idx_location_name_1 (location_id, language),
INDEX idx_location_name_2 (name)
);
Based on the EXPLAIN numbers, there could be about 170 "pages" of restaurants (8484/50)? I suggest that that is impractical for paging through. I strongly recommend you rethink the UI. In doing so, the performance problem you state will probably vanish.
For example, the UI could be 2 steps instead of 170 to get to the restaurants in Zimbabwe. Step 1, pick a country. (OK, that might be page 5 of the countries.) Step 2, view the list of restaurants in that country; it would be only a few pages to flip through. Much better for the user; much better for the database.
Addenda
In order to optimize the pagination, get the paginated list of pages from a single table (so that you can 'remember where you left off'). Then join the language table(s) to look up the translations. Note that this only looks up on page's worth of translations, not thousands.
My query is running very long. I tried to improve the query performance by adding index on website_id in hh_hits table.
Query:
SELECT
w.state, xx.pretty_name, xx.member_id,
SUM(ce.total_charge) as total_charge,
SUM(ce.shipping_cost) as shipping_cost,
SUM(ce.product_price * product_count) as product_price,
SUM(ce.tax) as tax,
SUM(ce.service_charge) as service_charge,
COUNT(distinct ce.order_id) as order_count,
SUM(ce.product_count) as product_count,
SUM( (select sum(addon_price*addon_count)
from wow.cart_entry_addons_archive cean
where cean.cart_entry_id=ce.cart_entry_id
and cean.addon_type='xx'
group by ce.cart_entry_id)) as giftwrap_total,
sum( (select sum(addon_price*addon_count)
from wow.cart_entry_addons_archive cean2
where cean2.cart_entry_id=ce.cart_entry_id
and cean2.addon_type='xx'
group by ce.cart_entry_id)) as addon_total,
(select sum(number) as hits
from wow.hh_hits thts
where thts.website_id=xx.website_id
and thts.start_date >= 'xxx'
and thts.start_date <= 'xx') as visits
FROM
wow.carts_archive c,
wow.cart_entries_archive ce,
eoe.websites xx
WHERE
ce.order_date >= 'xx' and
ce.order_date <= 'xx' and
ce.website_id=xx.website_id and
lower(ce.status) != 'deleted' and
ce.order_status != 'cancelled' and
ce.cart_id = c.cart_id and
(c.cc_number <> '343334' or c.cc_number is null)
GROUP BY ce.website_id
ORDER BY ce.website_id;
Explain plan:
+----+--------------------+-------+--------+------------------+----------+---------+---------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+--------+------------------+----------+---------+---------------------------+-------+----------------------------------------------+
| 1 | PRIMARY | ce | range | id1_nn,idx_726 | idx_1049 | 9 | NULL | 33 | Using where; Using filesort |
| 1 | PRIMARY | w | ref | idx_1055 | idx_1055 | 5 | wow.ce.website_id | 1 | Using where |
| 1 | PRIMARY | c | eq_ref | PRIMARY | PRIMARY | 4 | eoe.ce.cart_id | 1 | Using where |
| 4 | DEPENDENT SUBQUERY | thts | ALL | hh_n1 | NULL | NULL | NULL | 24493 | Using where |
| 3 | DEPENDENT SUBQUERY | cean2 | ref | idx_1383 | idx_1383 | 4 | wow.ce.cart_entry_id | 1 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | cean | ref | idx_1383 | idx_1383 | 4 | wow.ce.cart_entry_id | 1 | Using where; Using temporary; Using filesort |
Query explain plan seems hh_hits table is not using indexes.
On hh_hits, add this compound index:
INDEX(website_id, start_date)
For further discussion, please provide SHOW CREATE TABLE for each of the tables.
If my suggestion does not help enough, then we should talk about building a Summary Table that contains the subtotals for each website-date pair. You would augment the table each night, then run the dependent subquery against it.
I have a query which is very slow in INNER JOIN condition, but is faster when used in WHERE IN clause:
Slower Inner join:
SELECT *
FROM cases
left join
(
select tst.team_set_id
from team_sets_teams tst
INNER JOIN team_memberships team_memberships
ON tst.team_id = team_memberships.team_id
AND team_memberships.user_id = '1'
AND team_memberships.deleted=0 group by tst.team_set_id
) cases_tf
ON cases_tf.team_set_id = cases.team_set_id
LEFT JOIN contacts_cases
ON contacts_cases.case_id = cases.id
AND contacts_cases.deleted = 0
where cases.deleted=0
ORDER BY cases.name LIMIT 0,20;
Faster where in:
SELECT *
FROM cases
LEFT JOIN contacts_cases
ON contacts_cases.case_id = cases.id
AND contacts_cases.deleted = 0
where cases.deleted=0
and cases.team_set_id in (
select tst.team_set_id
from team_sets_teams tst
INNER JOIN team_memberships team_memberships
ON tst.team_id = team_memberships.team_id
AND team_memberships.user_id = '1'
AND team_memberships.deleted=0
group by tst.team_set_id
)
ORDER BY cases.name LIMIT 0,20;
The explain plan for INNER JOIN and WHERE IN clause are below:
Inner Join:
+----+-------------+------------------+------+--------------------------------------------+---------------------+---------+-----------------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+--------------------------------------------+---------------------+---------+-----------------------------------+--------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | Using temporary; Using filesort |
| 1 | PRIMARY | cases | ref | idx_cases_tmst_id | idx_cases_tmst_id | 109 | cases_tf.team_set_id | 446976 | Using where |
| 1 | PRIMARY | contacts_cases | ref | idx_con_case_case | idx_con_case_case | 111 | sugarcrm.cases.id | 1 | |
| 2 | DERIVED | team_memberships | ref | idx_team_membership,idx_teammemb_team_user | idx_team_membership | 109 | | 2 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | tst | ref | idx_ud_team_id | idx_ud_team_id | 109 | sugarcrm.team_memberships.team_id | 1 | Using where |
+----+-------------+------------------+------+--------------------------------------------+---------------------+---------+-----------------------------------+--------+----------------------------------------------+
While in condition:
------+-----------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------------+-------+--------------------------------------------+---------------------+---------+-----------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | cases | index | NULL | idx_case_name | 768 | NULL | 20 | Using where |
| 1 | PRIMARY | contacts_cases | ref | idx_con_case_case | idx_con_case_case | 111 | sugarcrm.cases.id | 1 | |
| 2 | DEPENDENT SUBQUERY | team_memberships | ref | idx_team_membership,idx_teammemb_team_user | idx_team_membership | 109 | const | 2 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | tst | ref | idx_ud_team_id | idx_ud_team_id | 109 | sugarcrm.team_memberships.team_id | 1 | Using where |
+----+--------------------+------------------+-------+--------------------------------------------+---------------------+---------+-----------------------------------+------+----------------------------------------------+
Though there are indexes, I couldn't figure out what the problem. Please help me out. Thanks.
(This is a query in sugarcrm)
It is impossible to convert IN condition to INNER JOIN.
A query with IN condition is called semi-join.
A semi-join returns rows from one table that would join with another table, but without performing a complete join.
Here is a simple example of a semi-join query using IN operator:
SELECT *
FROM table1
WHERE some-column IN (
SELECT some-other-column
FROM table2
WHERE some-conditions
)
the above semi-join can be converted to semantically equivalent query
(eqivalent - means giving exactly same results)
using EXISTS operator and dependend subquery:
SELECT *
FROM table1
WHERE EXISTS(
SELECT 1
FROM table2
WHERE some-conditions
AND table1.some-column = table2.some-other-column
)
Most leading databases use the same plan for both of the above queries, and their speed is the same,
unfortunately this is not always true for MySql.
Joins and semi joins are totally different queries, with completely different execution plans, so comparing their speed is like comparing apples to onions.
You can try to convert the first query with IN into a query with EXIST, but not into the join.
I have been having issues with MySQL (version 5.5) left join performance on a number of queries. In all cases I have been able to work around the issue by restructuring the queries with unions and subselects (I saw some examples of this in the book High Performance MySQL). The problem is this this leads to very messy queries.
Below is an example of two queries that produce the exact same results. The first query is roughly two orders of magnitude slower than the second. The second query is much less readable than the first.
As far as I can tell these sorts of queries are not performing poorly because of bad indexing. In all cases when I restructure the query it runs just fine. I have also tried carefully looking at the indexes and using hints to no avail.
Has anyone else run into similar issues with MySQL? Are there any server parameters I should try tweaking? Has anyone found a cleaner way to work around this sort of issue?
Query 1
select
i.id,
sum(vp.measurement * pol.quantity_ordered) measurement_on_order
from items i
left join (vendor_products vp, purchase_order_lines pol, purchase_orders po) on
vp.item_id = i.id and
pol.vendor_product_id = vp.id and
pol.purchase_order_id = po.id and
po.received_at is null and
po.closed_at is null
group by i.id
explain:
+----+-------------+-------+--------+-------------------------------+-------------------+---------+-------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------+-------------------+---------+-------------------------------------+------+-------------+
| 1 | SIMPLE | i | index | NULL | PRIMARY | 4 | NULL | 241 | Using index |
| 1 | SIMPLE | po | ref | PRIMARY,received_at,closed_at | received_at | 9 | const | 2 | |
| 1 | SIMPLE | pol | ref | purchase_order_id | purchase_order_id | 4 | nutkernel_dev.po.id | 7 | |
| 1 | SIMPLE | vp | eq_ref | PRIMARY,item_id | PRIMARY | 4 | nutkernel_dev.pol.vendor_product_id | 1 | |
+----+-------------+-------+--------+-------------------------------+-------------------+---------+-------------------------------------+------+-------------+
Query 2
select
i.id,
sum(on_order.measurement_on_order) measurement_on_order
from (
(
select
i.id item_id,
sum(vp.measurement * pol.quantity_ordered) measurement_on_order
from purchase_orders po
join purchase_order_lines pol on pol.purchase_order_id = po.id
join vendor_products vp on pol.vendor_product_id = vp.id
join items i on vp.item_id = i.id
where
po.received_at is null and po.closed_at is null
group by i.id
)
union all
(select id, 0 from items)
) on_order
join items i on on_order.item_id = i.id
group by i.id
explain:
+------+--------------+------------+--------+-------------------------------+--------------------------------+---------+-------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+------------+--------+-------------------------------+--------------------------------+---------+-------------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3793 | Using temporary; Using filesort |
| 1 | PRIMARY | i | eq_ref | PRIMARY | PRIMARY | 4 | on_order.item_id | 1 | Using index |
| 2 | DERIVED | po | ALL | PRIMARY,received_at,closed_at | NULL | NULL | NULL | 20 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | pol | ref | purchase_order_id | purchase_order_id | 4 | nutkernel_dev.po.id | 7 | |
| 2 | DERIVED | vp | eq_ref | PRIMARY,item_id | PRIMARY | 4 | nutkernel_dev.pol.vendor_product_id | 1 | |
| 2 | DERIVED | i | eq_ref | PRIMARY | PRIMARY | 4 | nutkernel_dev.vp.item_id | 1 | Using index |
| 3 | UNION | items | index | NULL | index_new_items_on_external_id | 257 | NULL | 3380 | Using index |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+------+--------------+------------+--------+-------------------------------+--------------------------------+---------+-------------------------------------+------+----------------------------------------------+