This MySQL query doesn't use my indexes - mysql

The following query gets a list of computer groups. For each group it counts how many computers have state=1 and how many have state=2.
SELECT
cg.id,
cg.order,
cg.group_mode,
cg.created,
cg.updated,
SUM(
CASE WHEN c.state = 1 THEN 1 ELSE 0 END
) AS sclr7,
SUM(
CASE WHEN c.state = 2 THEN 1 ELSE 0 END
) AS sclr8
FROM
computer_group cg
LEFT JOIN computer c ON cg.id = c.group_id
WHERE
cg.group_mode <> 3
GROUP BY
cg.id
ORDER BY
cg.order ASC;
When I run EXPLAIN on this query, MySQL reports: Using where; Using temporary; Using filesort
+----+-------------+-------+------+----------------------+----------------------+---------+--------------------+------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+----------------------+----------------------+---------+--------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | cg | ALL | mode | NULL | NULL | NULL | 33 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | c | ref | IDX_39B3258BBE45D62E | IDX_39B3258BBE45D62E | 768 | test.c.id | 57 | 100.00 | |
+----+-------------+-------+------+----------------------+----------------------+---------+--------------------+------+----------+----------------------------------------------+
I have the following indexes:
computer_group table
+----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| computer_group | 0 | PRIMARY | 1 | id | A | 33 | NULL | NULL | | BTREE | |
| computer_group | 1 | mode | 1 | mode | A | 3 | NULL | NULL | | BTREE | |
| computer_group | 1 | order | 1 | order | A | 33 | NULL | NULL | | BTREE | |
+----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
(something went wrong with copy/paste the computer_group, it is now fixed)
computer table
+--------------+------------+----------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+----------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| computer | 0 | PRIMARY | 1 | id | A | 2611 | NULL | NULL | | BTREE | |
| computer | 1 | IDX_39B3258BBE45D62E | 1 | group_id | A | 32 | NULL | NULL | YES | BTREE | |
| computer | 1 | state | 1 | state | A | 1 | NULL | NULL | | BTREE | |
+--------------+------------+----------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
I tried adding various indexes, but I can't seem to prevent the filesort and the temporary table.
This is driving me nuts. I have spent several days trying to fix this. Am I doing something wrong, or is this not preventable?

You should be able to get it to use indexes if you eliminate the outer group by:
SELECT cg.*, c.sclr7, c.sclr8
FROM computer_group cg JOIN
(SELECT c.group_id, SUM(c.state = 1) as sclr7, SUM(c.state = 2) as sclr8
FROM computer c
GROUP BY c.group_id
) c
ON cg.id = c.group_id
WHERE cg.group_mode <> 3
ORDER BY cg.order ASC;
MySQL might use an index on computer_group(order, group_mode) for the query.
The join might actually confuse MySQL. A surer query is this:
SELECT cg.*,
(SELECT SUM(c.state = 1)
FROM computer c
WHERE cg.id = c.group_id
) as sclr7,
(SELECT SUM(c.state = 2)
FROM computer c
WHERE cg.id = c.group_id
) as sclr8
FROM computer_group cg
WHERE cg.group_mode <> 3
ORDER BY cg.order ASC;
You want an index on computer_group(order, group_mode, id) and computer(group_id, state).
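Before comparing timings, it may be worth checking that the correlated-subquery rewrite returns the same rows as the original LEFT JOIN query. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so it only checks results, not index usage (SQLite's planner is different). The sample rows and the index names are made up for illustration. The COALESCE wrappers are an addition: without them, a group with no computers gets NULL from the subquery instead of the 0 that the original query returns.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE computer_group (id INTEGER PRIMARY KEY, "order" INTEGER, group_mode INTEGER);
CREATE TABLE computer (id INTEGER PRIMARY KEY, group_id INTEGER, state INTEGER);
-- Index names are hypothetical; the column lists are the ones suggested in the answer.
CREATE INDEX idx_cg_order_mode_id ON computer_group ("order", group_mode, id);
CREATE INDEX idx_c_group_state ON computer (group_id, state);
-- Made-up sample data: group 3 is filtered out, group 4 has no computers.
INSERT INTO computer_group VALUES (1, 20, 1), (2, 10, 2), (3, 30, 3), (4, 5, 1);
INSERT INTO computer VALUES (1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 2, 2), (5, 3, 1);
""")

# The query from the question (LEFT JOIN + GROUP BY), reduced to the relevant columns.
original = """
SELECT cg.id,
       SUM(CASE WHEN c.state = 1 THEN 1 ELSE 0 END) AS sclr7,
       SUM(CASE WHEN c.state = 2 THEN 1 ELSE 0 END) AS sclr8
FROM computer_group cg
LEFT JOIN computer c ON cg.id = c.group_id
WHERE cg.group_mode <> 3
GROUP BY cg.id
ORDER BY cg."order" ASC
"""

# The correlated-subquery form; COALESCE turns the NULL of an empty group into 0.
rewrite = """
SELECT cg.id,
       COALESCE((SELECT SUM(c.state = 1) FROM computer c WHERE c.group_id = cg.id), 0) AS sclr7,
       COALESCE((SELECT SUM(c.state = 2) FROM computer c WHERE c.group_id = cg.id), 0) AS sclr8
FROM computer_group cg
WHERE cg.group_mode <> 3
ORDER BY cg."order" ASC
"""

original_rows = conn.execute(original).fetchall()
rewrite_rows = conn.execute(rewrite).fetchall()
print(original_rows == rewrite_rows)
```

If the two result sets match on representative data, the remaining question is only which form MySQL executes without a temporary table, which EXPLAIN on the real server has to answer.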

Related

Joined query index not applied

I have a query:
SELECT `dsd_prefix`,
`dsd_partner`,
`eev1`.`eev_dse_element_name`,
`devd_explanation`,
`devd_min`,
`eev1`.`eev_dev_value`,
`devd_max`,
`devd_format`,
`devd_not_applicable`,
`devd_not_available`,
`dsd_nid`
FROM `devdescription`
INNER JOIN ekohubelementvalue AS `eev1`
ON `eev1`.`eev_dse_element_name` = `devd_element_name`
AND `eev1`.`eev_prefix` = `devd_prefix`
LEFT JOIN `ekohubelementvalue` AS `eev2`
ON `eev1`.`eev_prefix` = `eev2`.`eev_prefix`
AND `eev1`.`eev_dse_element_name` = `eev2`.`eev_dse_element_name`
AND `eev1`.`eev_subcategory` = `eev2`.`eev_subcategory`
AND `eev1`.`eev_company_id` = `eev2`.`eev_company_id`
AND `eev2`.`eev_date_updated` > `eev1`.`eev_date_updated`
INNER JOIN `datasourcedescription`
ON `eev1`.`eev_prefix` = `dsd_prefix`
WHERE (`eev1`.`eev_company_id` = 'ADD4027'
AND `eev2`.`eev_date_updated` IS NULL
AND `dsd_type_id` != 'MAJ'
AND `dsd_hide` = 'No'
AND (`devd_supress` IS NULL OR `devd_supress` <> 'Yes'))
GROUP BY `eev1`.`eev_dse_element_name`, `eev1`.`eev_prefix`
ORDER BY dsd_prefix
EXPLAIN of this query:
+----+-------------+-----------------------+------------+------+-----------------------------------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------------------------------------------------------------------------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+------------+------+-----------------------------------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------------------------------------------------------------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | datasourcedescription | NULL | ALL | PRIMARY,datasourcedescription_dsd_type_id | NULL | NULL | NULL | 688 | 10.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | eev1 | NULL | ref | eev_prefix,eev_company_id,earliest_and_latest,slice_by_date_for_company,sources_for_special_issue | earliest_and_latest | 47 | csrhub_data_1.datasourcedescription.dsd_prefix | 607 | 0.04 | Using where |
| 1 | SIMPLE | devdescription | NULL | ref | reports,supress,devd_element_name | reports | 816 | csrhub_data_1.datasourcedescription.dsd_prefix,csrhub_data_1.eev1.eev_dse_element_name | 1 | 50.00 | Using where |
| 1 | SIMPLE | eev2 | NULL | ref | eev_prefix,eev_company_id,earliest_and_latest,slice_by_date,slice_by_date_for_company,sources_for_special_issue | eev_prefix | 861 | csrhub_data_1.datasourcedescription.dsd_prefix,csrhub_data_1.eev1.eev_dse_element_name,csrhub_data_1.eev1.eev_company_id | 17 | 19.00 | Using where |
+----+-------------+-----------------------+------------+------+-----------------------------------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------------------------------------------------------------------------------+------+----------+----------------------------------------------+
As you can see, the datasourcedescription indexes are not being used even though they appear in possible_keys. The key column is NULL.
SHOW INDEXES FROM datasourcedescription;
+-----------------------+------------+-----------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------------------+------------+-----------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| datasourcedescription | 0 | PRIMARY | 1 | dsd_prefix | A | 688 | NULL | NULL | | BTREE | | | YES | NULL |
| datasourcedescription | 1 | datasourcedescription_dsd_type_id | 1 | dsd_type_id | A | 8 | NULL | NULL | YES | BTREE | | | YES | NULL |
+-----------------------+------------+-----------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
How can I make the optimizer use the datasourcedescription indexes?
In response to @O. Jones:
The datasourcedescription columns are dsd_prefix, dsd_type_id and dsd_hide
The table datasourcedescription has 727 rows.
The table ekohubelementvalue has nearly 300,000,000 (300M) rows
You mention that ekohubelementvalue has nearly 300M rows. Your WHERE clause is based on a specific company ID. I would rewrite the query slightly, and also ensure the ekohubelementvalue table has an index with the company ID in the leading position, followed by other columns that help cover the join/WHERE criteria where possible. Also, with MySQL, I would add the STRAIGHT_JOIN keyword to tell MySQL to join the tables in the order you listed them rather than letting it guess an order.
I would have the following indexes available
ekohubelementvalue index on ( eev_company_id, eev_prefix, eev_dse_element_name, eev_subcategory, eev_date_updated )
devdescription index on ( devd_element_name, devd_prefix, devd_supress )
datasourcedescription index on ( dsd_prefix, dsd_type_id, dsd_hide )
Since the ORDER BY was on dsd_prefix, which is joined to eev_prefix, order by eev_prefix from the primary table instead. It is already a component of the index, so the primary table (not the lookups) becomes the basis of the GROUP BY and ORDER BY.
I also cleaned up the query somewhat. Giving long table names short aliases makes it easier to qualify each column in the query and its joins.
SELECT STRAIGHT_JOIN
dsd.dsd_prefix,
dsd.dsd_partner,
eev1.eev_dse_element_name,
devd.devd_explanation,
devd.devd_min,
eev1.eev_dev_value,
devd.devd_max,
devd.devd_format,
devd.devd_not_applicable,
devd.devd_not_available,
dsd.dsd_nid
FROM
ekohubelementvalue AS eev1
INNER JOIN devdescription devd
ON eev1.eev_prefix = devd.devd_prefix
AND eev1.eev_dse_element_name = devd.devd_element_name
LEFT JOIN ekohubelementvalue AS eev2
ON eev1.eev_company_id = eev2.eev_company_id
AND eev1.eev_prefix = eev2.eev_prefix
AND eev1.eev_dse_element_name = eev2.eev_dse_element_name
AND eev1.eev_subcategory = eev2.eev_subcategory
AND eev1.eev_date_updated < eev2.eev_date_updated
INNER JOIN datasourcedescription dsd
ON eev1.eev_prefix = dsd.dsd_prefix
AND dsd.dsd_type_id != 'MAJ'
AND dsd.dsd_hide = 'No'
WHERE
eev1.eev_company_id = 'ADD4027'
AND ( devd.devd_supress IS NULL
OR devd.devd_supress <> 'Yes')
AND eev2.eev_date_updated IS NULL
GROUP BY
eev1.eev_prefix,
eev1.eev_dse_element_name
ORDER BY
eev1.eev_prefix

Adding SUM's to SQL query makes it last 7 minutes instead of 3 seconds

I have the following SQL query (generated by Doctrine ORM):
SELECT
DISTINCT
s0_.id AS id0,
SUM(
s1_.price * s1_.amount * (1 + s1_.tax + s1_.retax)
) AS sclr1,
s0_.id AS id2
FROM
fr_order s0_
INNER JOIN fr_store s2_ ON s0_.store_id = s2_.id
LEFT JOIN fr_orderline s1_ ON s0_.id = s1_.order_id AND (s1_.rejected = 0)
LEFT JOIN fr_order_provider_warn s3_ ON s0_.id = s3_.order_id
WHERE
s0_.state >= 3
GROUP BY
s0_.id,
s0_.date,
s0_.shipment_limit_date,
s0_.state,
s0_.state_changed_date,
s0_.received,
s0_.shipment_cost,
s0_.username,
s0_.notes,
s0_.user_id,
s0_.store_id,
s0_.storedata_id,
s3_.id,
s3_.createdDate,
s3_.comments,
s3_.order_id
ORDER BY
s0_.id DESC
LIMIT
10 OFFSET 0
It takes approximately 3 seconds to run on an 18,000-row table (fr_order). I need a couple more summed values, so I modified the DQL, and Doctrine added the following lines to the SELECT after the first SUM:
SUM(s1_.price * s1_.amount) AS sclr2,
SUM(s1_.price * s1_.amount * s1_.tax) AS sclr3,
SUM(s1_.price * s1_.amount * s1_.retax) AS sclr4,
Now, the query takes 7 minutes, so the application becomes unusable. Is this performance drop normal? I'm using MySQL 5 as the database server.
EDIT
I have run an EXPLAIN on both queries. The result is the same:
+----+-------------+-------+--------+----------------------+----------------------+---------+----------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------+----------------------+---------+----------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | s0_ | ALL | IDX_F4A5D9B092A811 | NULL | NULL | NULL | 16823 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | s2_ | eq_ref | PRIMARY | PRIMARY | 4 | companydb_new.s0_.store_id | 1 | Using index |
| 1 | SIMPLE | s1_ | ref | IDX_252BF9D78D9F6D38 | IDX_252BF9D78D9F6D38 | 5 | companydb_new.s0_.id | 3 | |
| 1 | SIMPLE | s3_ | ref | IDX_20FC41F28D9F6D38 | IDX_20FC41F28D9F6D38 | 5 | companydb_new.s0_.id | 1 | |
+----+-------------+-------+--------+----------------------+----------------------+---------+----------------------------+-------+----------------------------------------------+
These are the indexes for the biggest tables, fr_order (s0_) and fr_orderline (s1_):
mysql> show indexes from fr_order;
+----------+------------+--------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+--------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| fr_order | 0 | PRIMARY | 1 | id | A | 14986 | NULL | NULL | | BTREE | | |
| fr_order | 1 | IDX_F4A5D9B092A811 | 1 | store_id | A | 71 | NULL | NULL | YES | BTREE | | |
| fr_order | 1 | IDX_F4A5D9AAD1D029 | 1 | storedata_id | A | 405 | NULL | NULL | YES | BTREE | | |
| fr_order | 1 | IDX_F4A5D9A76ED395 | 1 | user_id | A | 86 | NULL | NULL | YES | BTREE | | |
+----------+------------+--------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
mysql> show indexes from fr_orderline;
+--------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| fr_orderline | 0 | PRIMARY | 1 | id | A | 114799 | NULL | NULL | | BTREE | | |
| fr_orderline | 1 | IDX_252BF9D7A53A8AA | 1 | provider_id | A | 88 | NULL | NULL | YES | BTREE | | |
| fr_orderline | 1 | IDX_252BF9D78D9F6D38 | 1 | order_id | A | 28699 | NULL | NULL | YES | BTREE | | |
| fr_orderline | 1 | IDX_252BF9D72989F1FD | 1 | invoice_id | A | 28699 | NULL | NULL | YES | BTREE | | |
+--------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
From the EXPLAIN output, it seems that MySQL is not using s0_ index... I've tried regenerating all the tables indexes but the result is the same.
Thanks!
Assuming s0_.id and s3_.id are the primary keys of tables s0_ and s3_:
SELECT
DISTINCT s0_.id AS id0,
SUM( s1_.price * s1_.amount * (1 + s1_.tax + s1_.retax) ) AS sclr1,
SUM( s1_.price * s1_.amount) AS sclr2,
SUM( s1_.price * s1_.amount * s1_.tax) AS sclr3,
SUM( s1_.price * s1_.amount * s1_.retax) AS sclr4,
s0_.id AS id2
FROM fr_order s0_
INNER JOIN fr_store s2_ ON s0_.store_id = s2_.id
LEFT JOIN fr_orderline s1_ ON s0_.id = s1_.order_id
AND (s1_.rejected = 0)
LEFT JOIN fr_order_provider_warn s3_ ON s0_.id = s3_.order_id
WHERE s0_.state >= 3
GROUP BY s0_.id, s3_.id
ORDER BY s0_.id DESC
LIMIT 10 OFFSET 0
You don't need DISTINCT for grouped values, and you don't need to list a table's other columns in GROUP BY if you group by that table's primary key.
Be sure you have proper indexes on the tables involved. Composite indexes could help here:
for table s1_, a composite index on (order_id, rejected, price, amount, tax, retax)
for table s0_, a composite index on (state, store_id, id)
for table s3_, an index on (order_id)
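The claim that DISTINCT and the long GROUP BY list are redundant can be checked on a toy schema. The following is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. It keeps only a few columns of fr_order and fr_orderline, leaves out fr_store and fr_order_provider_warn, and uses made-up rows, so it illustrates the result equivalence only and says nothing about timing.

```python
import sqlite3

# Reduced, hypothetical versions of the tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fr_order (id INTEGER PRIMARY KEY, state INTEGER, username TEXT);
CREATE TABLE fr_orderline (id INTEGER PRIMARY KEY, order_id INTEGER, rejected INTEGER,
                           price REAL, amount INTEGER);
INSERT INTO fr_order VALUES (1, 3, 'ann'), (2, 4, 'bob'), (3, 1, 'cid');
INSERT INTO fr_orderline VALUES
    (1, 1, 0, 10.0, 2), (2, 1, 0, 5.0, 1), (3, 1, 1, 99.0, 1), (4, 2, 0, 4.0, 3);
""")

# Shape of the ORM-generated query: DISTINCT plus every column in GROUP BY.
wide_group_by = """
SELECT DISTINCT o.id, SUM(l.price * l.amount) AS total
FROM fr_order o
LEFT JOIN fr_orderline l ON o.id = l.order_id AND l.rejected = 0
WHERE o.state >= 3
GROUP BY o.id, o.state, o.username
ORDER BY o.id DESC
"""

# Shape suggested in the answer: group by the primary key only, no DISTINCT.
narrow_group_by = """
SELECT o.id, SUM(l.price * l.amount) AS total
FROM fr_order o
LEFT JOIN fr_orderline l ON o.id = l.order_id AND l.rejected = 0
WHERE o.state >= 3
GROUP BY o.id
ORDER BY o.id DESC
"""

wide_rows = conn.execute(wide_group_by).fetchall()
narrow_rows = conn.execute(narrow_group_by).fetchall()
print(wide_rows == narrow_rows)
```

Because the other columns are functionally dependent on the primary key, adding them to GROUP BY cannot split any group, which is why both forms return the same rows.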

MAX(ID) with IN() comparison function performance

I have stumbled upon a performance issue with this query. I've stared at this problem for a long time now scratching my head. This query was actually pretty fast at one point, but once data grew, it became slower and slower. The 'Posts' table has +5 million rows, the 'Items' table has +6000 rows. These tables are growing constantly on a daily basis.
SELECT Posts.itemID, Items.itemName, Items.itemImage, Items.guid, Posts.price,
Posts.quantity, Posts.date, Games.name, Items.profit FROM Items
INNER JOIN Posts ON Items.itemID=Posts.itemID
INNER JOIN Games ON Posts.gameID=Games.gameID
WHERE Posts.postID IN (SELECT MAX(postID) FROM Posts GROUP BY itemID) AND Posts.gameID=:gameID
AND Posts.price BETWEEN :price_min AND :price_max
AND Posts.quantity BETWEEN :quant_min AND :quant_max
AND Items.profit BETWEEN :profit_min AND :profit_max
ORDER BY Items.profit DESC LIMIT 0, 20
In the code I've split the query and the subquery into two, because together they performed more slowly. This worked well until the data in both Posts and Items started growing. The WHERE conditions that I've put in ** get concatenated depending on which filters are set.
Here's the EXPLAIN that I get. (This is the query without the sub query)
https://docs.google.com/file/d/0B1jxMdMfC35VeDBEbnJISmNGb3c/edit?usp=sharing
SHOW INDEX FROM Posts:
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Posts | 0 | PRIMARY | 1 | postID | A | 5890249 | NULL | NULL | | BTREE | | |
| Posts | 1 | itemID | 1 | itemID | A | 16453 | NULL | NULL | YES | BTREE | | |
| Posts | 1 | gameID | 1 | gameID | A | 18 | NULL | NULL | YES | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
SHOW INDEX FROM Items;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Items | 0 | PRIMARY | 1 | itemID | A | 6452 | NULL | NULL | | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
SHOW INDEX FROM Games;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Games | 0 | PRIMARY | 1 | gameID | A | 2487 | NULL | NULL | | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Is there any way I can make this query faster? Do you guys have any advice? Is there a better way of writing this query? All help is appreciated.
EXPLAIN Proposed Query:
+----+-------------+------------+--------+-----------------------+---------+---------+----------------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+-----------------------+---------+---------+----------------------------+---------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 19 | Using temporary; Using filesort |
| 1 | PRIMARY | p | eq_ref | PRIMARY,itemID,gameID | PRIMARY | 4 | q.postID | 1 | |
| 1 | PRIMARY | i | eq_ref | PRIMARY | PRIMARY | 2 | db323245342342345.p.itemID | 1 | Using where |
| 1 | PRIMARY | g | eq_ref | PRIMARY | PRIMARY | 4 | db323245342342345.p.gameID | 1 | Using where |
| 2 | DERIVED | p | ref | itemID,gameID | gameID | 2 | | 2945124 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | i | eq_ref | PRIMARY | PRIMARY | 2 | db323245342342345.p.itemID | 1 | Using where |
+----+-------------+------------+--------+-----------------------+---------+---------+----------------------------+---------+----------------------------------------------+
Try to rewrite it with JOIN. Something like
SELECT p.itemID,
i.itemName,
i.itemImage,
i.guid,
p.price,
p.quantity,
p.date,
g.name,
i.profit
FROM
(
SELECT MAX(postID) postID
FROM Posts p JOIN Items i
ON p.itemID = i.itemID
WHERE p.gameID = :gameID
AND p.price BETWEEN :price_min AND :price_max
AND p.quantity BETWEEN :quant_min AND :quant_max
AND i.profit BETWEEN :profit_min AND :profit_max
GROUP BY p.itemID
) q JOIN Posts p
ON q.postID = p.postID JOIN Items i
ON p.itemID = i.itemID JOIN Games g
ON p.gameID = g.gameID
ORDER BY i.profit DESC
LIMIT 0, 20
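One thing to check before adopting this rewrite: it applies the filters first and then takes the latest post per item, whereas the original query takes the latest post per item first and then applies the filters. The two can return different rows. The sketch below shows this on made-up data, using Python's sqlite3 module as a stand-in for MySQL. The tables are cut down to the columns needed for the comparison, and only the gameID and price filters are kept.

```python
import sqlite3

# Cut-down, hypothetical versions of Items and Posts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items (itemID INTEGER PRIMARY KEY, itemName TEXT, profit INTEGER);
CREATE TABLE Posts (postID INTEGER PRIMARY KEY, itemID INTEGER, gameID INTEGER, price INTEGER);
INSERT INTO Items VALUES (1, 'sword', 50), (2, 'shield', 30);
-- Item 1's latest post (postID 2) falls outside the price filter; its older post does not.
INSERT INTO Posts VALUES (1, 1, 1, 10), (2, 1, 1, 100), (3, 2, 1, 15);
""")
params = {"gameID": 1, "price_min": 5, "price_max": 20}

# Original shape: latest post per item, then filter.
latest_then_filter = """
SELECT p.postID, p.itemID, p.price
FROM Items i
JOIN Posts p ON i.itemID = p.itemID
WHERE p.postID IN (SELECT MAX(postID) FROM Posts GROUP BY itemID)
  AND p.gameID = :gameID
  AND p.price BETWEEN :price_min AND :price_max
ORDER BY i.profit DESC
"""

# Rewritten shape: filter, then latest matching post per item.
filter_then_latest = """
SELECT p.postID, p.itemID, p.price
FROM (SELECT MAX(p.postID) AS postID
      FROM Posts p JOIN Items i ON p.itemID = i.itemID
      WHERE p.gameID = :gameID
        AND p.price BETWEEN :price_min AND :price_max
      GROUP BY p.itemID) q
JOIN Posts p ON q.postID = p.postID
JOIN Items i ON p.itemID = i.itemID
ORDER BY i.profit DESC
"""

original_rows = conn.execute(latest_then_filter, params).fetchall()
rewritten_rows = conn.execute(filter_then_latest, params).fetchall()
print(original_rows)
print(rewritten_rows)
```

Here the original form drops item 1 entirely, while the rewritten form shows its older post. Whether that difference matters depends on what the listing is supposed to display.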
Not sure if this helps, but try moving the subquery to the end of your where clause and also try making it a correlated subquery. Move the filter on Items to the top.
SELECT
p1.itemID,
Items.itemName,
Items.itemImage,
Items.guid,
p1.price,
p1.quantity,
p1.date,
Games.name,
Items.profit
FROM Items
INNER JOIN Posts p1 ON Items.itemID=p1.itemID
INNER JOIN Games ON p1.gameID=Games.gameID
WHERE Items.profit BETWEEN :profit_min AND :profit_max
AND p1.gameID=:gameID
AND p1.price BETWEEN :price_min AND :price_max
AND p1.quantity BETWEEN :quant_min AND :quant_max
AND p1.postID IN (SELECT MAX(p2.postID) FROM Posts p2 WHERE p2.itemID = p1.itemID GROUP BY p2.itemID)
ORDER BY
Items.profit DESC
LIMIT 0, 20
Also, make sure you create an index on Posts(itemID, gameID, postID)

MySQL left join performance issues

I have been having issues with MySQL (version 5.5) left join performance on a number of queries. In all cases I have been able to work around the issue by restructuring the queries with unions and subselects (I saw some examples of this in the book High Performance MySQL). The problem is that this leads to very messy queries.
Below is an example of two queries that produce the exact same results. The first query is roughly two orders of magnitude slower than the second. The second query is much less readable than the first.
As far as I can tell these sorts of queries are not performing poorly because of bad indexing. In all cases when I restructure the query it runs just fine. I have also tried carefully looking at the indexes and using hints to no avail.
Has anyone else run into similar issues with MySQL? Are there any server parameters I should try tweaking? Has anyone found a cleaner way to work around this sort of issue?
Query 1
select
i.id,
sum(vp.measurement * pol.quantity_ordered) measurement_on_order
from items i
left join (vendor_products vp, purchase_order_lines pol, purchase_orders po) on
vp.item_id = i.id and
pol.vendor_product_id = vp.id and
pol.purchase_order_id = po.id and
po.received_at is null and
po.closed_at is null
group by i.id
explain:
+----+-------------+-------+--------+-------------------------------+-------------------+---------+-------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------+-------------------+---------+-------------------------------------+------+-------------+
| 1 | SIMPLE | i | index | NULL | PRIMARY | 4 | NULL | 241 | Using index |
| 1 | SIMPLE | po | ref | PRIMARY,received_at,closed_at | received_at | 9 | const | 2 | |
| 1 | SIMPLE | pol | ref | purchase_order_id | purchase_order_id | 4 | nutkernel_dev.po.id | 7 | |
| 1 | SIMPLE | vp | eq_ref | PRIMARY,item_id | PRIMARY | 4 | nutkernel_dev.pol.vendor_product_id | 1 | |
+----+-------------+-------+--------+-------------------------------+-------------------+---------+-------------------------------------+------+-------------+
Query 2
select
i.id,
sum(on_order.measurement_on_order) measurement_on_order
from (
(
select
i.id item_id,
sum(vp.measurement * pol.quantity_ordered) measurement_on_order
from purchase_orders po
join purchase_order_lines pol on pol.purchase_order_id = po.id
join vendor_products vp on pol.vendor_product_id = vp.id
join items i on vp.item_id = i.id
where
po.received_at is null and po.closed_at is null
group by i.id
)
union all
(select id, 0 from items)
) on_order
join items i on on_order.item_id = i.id
group by i.id
explain:
+------+--------------+------------+--------+-------------------------------+--------------------------------+---------+-------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+------------+--------+-------------------------------+--------------------------------+---------+-------------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3793 | Using temporary; Using filesort |
| 1 | PRIMARY | i | eq_ref | PRIMARY | PRIMARY | 4 | on_order.item_id | 1 | Using index |
| 2 | DERIVED | po | ALL | PRIMARY,received_at,closed_at | NULL | NULL | NULL | 20 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | pol | ref | purchase_order_id | purchase_order_id | 4 | nutkernel_dev.po.id | 7 | |
| 2 | DERIVED | vp | eq_ref | PRIMARY,item_id | PRIMARY | 4 | nutkernel_dev.pol.vendor_product_id | 1 | |
| 2 | DERIVED | i | eq_ref | PRIMARY | PRIMARY | 4 | nutkernel_dev.vp.item_id | 1 | Using index |
| 3 | UNION | items | index | NULL | index_new_items_on_external_id | 257 | NULL | 3380 | Using index |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+------+--------------+------------+--------+-------------------------------+--------------------------------+---------+-------------------------------------+------+----------------------------------------------+

SQL Query Optimization When using Multiple Joins and Large Record Set

I am making a message board and I am trying to retrieve regular topics (i.e., topics that are not stickied) and sort them by the date of the last posted message. I am able to accomplish this; however, with about 10,000 messages and 1,500 topics the query time is >60 seconds.
My question is, is there anything I can do to my query to increase performance or is my design fundamentally flawed?
Here is the query that I am using.
SELECT Messages.topic_id,
Messages.posted,
Topics.title,
Topics.user_id,
Users.username
FROM Messages
LEFT JOIN
Topics USING(topic_id)
LEFT JOIN
Users on Users.user_id = Topics.user_id
WHERE Messages.message_id IN (
SELECT MAX(message_id)
FROM Messages
GROUP BY topic_id)
AND Messages.topic_id
NOT IN (
SELECT topic_id
FROM StickiedTopics)
AND Messages.posted IN (
SELECT MIN(posted)
FROM Messages
GROUP BY message_id)
AND Topics.board_id=1
ORDER BY Messages.posted DESC LIMIT 50
Edit Here is the Explain Plan
+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+
| 1 | PRIMARY | Topics | ref | PRIMARY,board_id | board_id | 4 | const | 641 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | Users | eq_ref | PRIMARY | PRIMARY | 4 | spergs3.Topics.user_id | 1 | |
| 1 | PRIMARY | Messages | ref | topic_id | topic_id | 4 | spergs3.Topics.topic_id | 3 | Using where |
| 4 | DEPENDENT SUBQUERY | Messages | index | NULL | PRIMARY | 8 | NULL | 1 | |
| 3 | DEPENDENT SUBQUERY | StickiedTopics | index_subquery | topic_id | topic_id | 4 | func | 1 | Using index |
| 2 | DEPENDENT SUBQUERY | Messages | index | NULL | topic_id | 4 | NULL | 3 | Using index |
+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+
Indexes
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Messages | 0 | PRIMARY | 1 | message_id | A | 9956 | NULL | NULL | | BTREE | |
| Messages | 0 | PRIMARY | 2 | revision_no | A | 9956 | NULL | NULL | | BTREE | |
| Messages | 1 | user_id | 1 | user_id | A | 432 | NULL | NULL | | BTREE | |
| Messages | 1 | topic_id | 1 | topic_id | A | 3318 | NULL | NULL | | BTREE | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Topics | 0 | PRIMARY | 1 | topic_id | A | 1205 | NULL | NULL | | BTREE | |
| Topics | 1 | user_id | 1 | user_id | A | 133 | NULL | NULL | | BTREE | |
| Topics | 1 | board_id | 1 | board_id | A | 1 | NULL | NULL | | BTREE | |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Users | 0 | PRIMARY | 1 | user_id | A | 2051 | NULL | NULL | | BTREE | |
| Users | 0 | username_UNIQUE | 1 | username | A | 2051 | NULL | NULL | | BTREE | |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
I would start by establishing the base set of qualifying topics, get those IDs, and then join out from there.
The inner query pre-qualifies the rows: it groups by topic_id and takes the max message ID, so it returns one row per topic. I've also applied a LEFT JOIN to StickiedTopics. Why? With a left join I can detect the topics that are FOUND there (the ones you want to exclude), so the WHERE clause keeps only rows where the StickiedTopics topic ID is NULL (i.e., NOT found). This pares the list down SIGNIFICANTLY without several nested subqueries. From THAT result, we can join to Messages, Topics (including the board_id = 1 qualifier), and Users, and pull the columns needed. Finally, apply a single WHERE IN sub-select for your MIN(posted) qualifier. I don't understand the purpose of that condition, but I left it in because it is part of your original query. Then come the ORDER BY and LIMIT.
SELECT STRAIGHT_JOIN
M.topic_id,
M.posted,
T.title,
T.user_id,
U.username
FROM
( select
M1.Topic_ID,
MAX( M1.Message_id ) MaxMsgPerTopic
from
Messages M1
LEFT Join StickiedTopics ST
ON M1.Topic_ID = ST.Topic_ID
where
ST.Topic_ID IS NULL
group by
M1.Topic_ID ) PreQuery
JOIN Messages M
ON PreQuery.MaxMsgPerTopic = M.Message_ID
JOIN Topics T
ON M.Topic_ID = T.Topic_ID
AND T.Board_ID = 1
LEFT JOIN Users U
on T.User_ID = U.user_id
WHERE
M.posted IN ( SELECT MIN(posted)
FROM Messages
GROUP BY message_id)
ORDER BY
M.posted DESC
LIMIT 50
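The LEFT JOIN ... IS NULL pattern in the inner query is an anti-join, and it should select the same topics as the original NOT IN subquery as long as StickiedTopics.topic_id is never NULL. A minimal sketch of that equivalence, using Python's sqlite3 module as a stand-in for MySQL, with reduced tables and made-up rows:

```python
import sqlite3

# Reduced, hypothetical versions of Messages and StickiedTopics.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Messages (message_id INTEGER PRIMARY KEY, topic_id INTEGER, posted TEXT);
CREATE TABLE StickiedTopics (topic_id INTEGER PRIMARY KEY);
INSERT INTO Messages VALUES
    (1, 10, '2013-01-01'), (2, 10, '2013-01-05'),
    (3, 20, '2013-01-02'), (4, 30, '2013-01-03'), (5, 30, '2013-01-04');
-- Topic 20 is stickied and must be excluded.
INSERT INTO StickiedTopics VALUES (20);
""")

# Exclusion written as in the question: NOT IN with a subquery.
not_in_version = """
SELECT topic_id, MAX(message_id) AS MaxMsgPerTopic
FROM Messages
WHERE topic_id NOT IN (SELECT topic_id FROM StickiedTopics)
GROUP BY topic_id
ORDER BY topic_id
"""

# Exclusion written as in the answer: LEFT JOIN and keep only the unmatched rows.
anti_join_version = """
SELECT M1.topic_id, MAX(M1.message_id) AS MaxMsgPerTopic
FROM Messages M1
LEFT JOIN StickiedTopics ST ON M1.topic_id = ST.topic_id
WHERE ST.topic_id IS NULL
GROUP BY M1.topic_id
ORDER BY M1.topic_id
"""

not_in_rows = conn.execute(not_in_version).fetchall()
anti_join_rows = conn.execute(anti_join_version).fetchall()
print(not_in_rows == anti_join_rows)
```

Note that the IS NULL test has to sit in the WHERE clause. Placed in the ON clause of the LEFT JOIN it would only affect which rows match, and no Messages rows would be removed.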
I would guess that a big part of your problem lies in your subqueries. Try something like this:
SELECT Messages.topic_id,
Messages.posted,
Topics.title,
Topics.user_id,
Users.username
FROM Messages
LEFT JOIN
Topics USING(topic_id)
LEFT JOIN
StickiedTopics ON StickiedTopics.topic_id = Topics.topic_id
LEFT JOIN
Users on Users.user_id = Topics.user_id
WHERE Messages.message_id IN (
SELECT MAX(message_id)
FROM Messages m1
WHERE m1.topic_id = Messages.topic_id)
AND Messages.posted IN (
SELECT MIN(posted)
FROM Messages m2
GROUP BY message_id)
AND StickiedTopics.topic_id IS NULL
AND Topics.board_id=1
ORDER BY Messages.posted DESC LIMIT 50
I optimized the first subquery by removing the grouping. The second subquery was unnecessary because it can be replaced with a JOIN.
I'm not quite sure what this third subquery is supposed to do:
AND Messages.posted IN (
SELECT MIN(posted)
FROM Messages m2
GROUP BY message_id)
I might be able to help optimize this if I know what it's supposed to do. What exactly is posted - a date, integer, etc? What does it represent?