MAX(ID) with IN() comparison function performance - mysql

I have stumbled upon a performance issue with this query. I've been staring at this problem for a long time, scratching my head. This query was actually pretty fast at one point, but as the data grew it became slower and slower. The 'Posts' table has over 5 million rows and the 'Items' table has over 6,000 rows. Both tables grow every day.
SELECT Posts.itemID, Items.itemName, Items.itemImage, Items.guid, Posts.price,
Posts.quantity, Posts.date, Games.name, Items.profit FROM Items
INNER JOIN Posts ON Items.itemID=Posts.itemID
INNER JOIN Games ON Posts.gameID=Games.gameID
WHERE Posts.postID IN (SELECT MAX(postID) FROM Posts GROUP BY itemID) AND Posts.gameID=:gameID
AND Posts.price BETWEEN :price_min AND :price_max
AND Posts.quantity BETWEEN :quant_min AND :quant_max
AND Items.profit BETWEEN :profit_min AND :profit_max
ORDER BY Items.profit DESC LIMIT 0, 20
In the code, I've split the query and the subquery into two separate queries, because combined they performed worse. This worked well until the data in both Posts and Items started growing. Some of the WHERE conditions are appended dynamically, depending on which filters are set.
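The asker's application code isn't shown. The following is a hypothetical sketch of what "splitting the query and subquery into two" typically looks like, using Python's built-in sqlite3 module as a stand-in for MySQL with a trimmed schema and made-up rows. It also makes the drawback visible: the second query needs one placeholder per item, so the IN() list grows with the Items table.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; schema trimmed, rows made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items (itemID INTEGER PRIMARY KEY, itemName TEXT, profit REAL);
CREATE TABLE Posts (postID INTEGER PRIMARY KEY, itemID INTEGER, gameID INTEGER,
                    price REAL, quantity INTEGER);
INSERT INTO Items VALUES (1, 'sword', 5.0), (2, 'shield', 9.0);
INSERT INTO Posts VALUES
  (1, 1, 10, 100, 3),
  (2, 1, 10, 120, 1),
  (3, 2, 10,  80, 2),
  (4, 2, 10,  90, 5);
""")

# Query 1: the former subquery, run on its own (newest post per item).
latest_ids = [row[0] for row in
              conn.execute("SELECT MAX(postID) FROM Posts GROUP BY itemID")]

# Query 2: the outer query with one "?" placeholder per ID. Only placeholders
# are formatted into the SQL text; the IDs themselves are bound as parameters.
placeholders = ", ".join("?" for _ in latest_ids)
sql = f"""
SELECT Posts.itemID, Items.itemName, Posts.price, Items.profit
FROM Items
JOIN Posts ON Items.itemID = Posts.itemID
WHERE Posts.postID IN ({placeholders})
  AND Posts.gameID = ?
ORDER BY Items.profit DESC
LIMIT 20
"""
rows = conn.execute(sql, [*latest_ids, 10]).fetchall()
print(len(latest_ids), "IDs passed to query 2:", rows)
```

With thousands of items, query 1 returns thousands of IDs and query 2 has to parse and probe all of them on every request, which is one reason this approach degrades as the data grows.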
Here's the EXPLAIN output I get (for the query without the subquery):
https://docs.google.com/file/d/0B1jxMdMfC35VeDBEbnJISmNGb3c/edit?usp=sharing
SHOW INDEX FROM Posts:
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Posts | 0 | PRIMARY | 1 | postID | A | 5890249 | NULL | NULL | | BTREE | | |
| Posts | 1 | itemID | 1 | itemID | A | 16453 | NULL | NULL | YES | BTREE | | |
| Posts | 1 | gameID | 1 | gameID | A | 18 | NULL | NULL | YES | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
SHOW INDEX FROM Items;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Items | 0 | PRIMARY | 1 | itemID | A | 6452 | NULL | NULL | | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
SHOW INDEX FROM Games;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Games | 0 | PRIMARY | 1 | gameID | A | 2487 | NULL | NULL | | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Is there any way I can make this query faster? Do you have any advice, or is there a better way to write this query? All help is appreciated.
EXPLAIN of the proposed query:
+----+-------------+------------+--------+-----------------------+---------+---------+----------------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+-----------------------+---------+---------+----------------------------+---------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 19 | Using temporary; Using filesort |
| 1 | PRIMARY | p | eq_ref | PRIMARY,itemID,gameID | PRIMARY | 4 | q.postID | 1 | |
| 1 | PRIMARY | i | eq_ref | PRIMARY | PRIMARY | 2 | db323245342342345.p.itemID | 1 | Using where |
| 1 | PRIMARY | g | eq_ref | PRIMARY | PRIMARY | 4 | db323245342342345.p.gameID | 1 | Using where |
| 2 | DERIVED | p | ref | itemID,gameID | gameID | 2 | | 2945124 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | i | eq_ref | PRIMARY | PRIMARY | 2 | db323245342342345.p.itemID | 1 | Using where |
+----+-------------+------------+--------+-----------------------+---------+---------+----------------------------+---------+----------------------------------------------+

Try rewriting it with a JOIN, something like this:
SELECT p.itemID,
i.itemName,
i.itemImage,
i.guid,
p.price,
p.quantity,
p.date,
g.name,
i.profit
FROM
(
SELECT MAX(postID) postID
FROM Posts p JOIN Items i
ON p.itemID = i.itemID
WHERE p.gameID = :gameID
AND p.price BETWEEN :price_min AND :price_max
AND p.quantity BETWEEN :quant_min AND :quant_max
AND i.profit BETWEEN :profit_min AND :profit_max
GROUP BY p.itemID
) q JOIN Posts p
ON q.postID = p.postID JOIN Items i
ON p.itemID = i.itemID JOIN Games g
ON p.gameID = g.gameID
ORDER BY i.profit DESC
LIMIT 0, 20
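One thing worth checking before adopting this rewrite: because the filters moved inside the derived table, it returns the newest post that matches the filters for each item, whereas the original IN() version takes each item's newest post and then drops it if that post fails the filters. A small sketch shows the difference, using SQLite through Python's sqlite3 as a stand-in for MySQL, made-up rows, and the Items join left out for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (postID INTEGER PRIMARY KEY, itemID INTEGER, gameID INTEGER,
                    price REAL, quantity INTEGER);
INSERT INTO Posts VALUES
  (1, 1, 10, 100, 3),
  (2, 1, 10, 300, 1),  -- newest post for item 1, outside the price filter
  (3, 2, 10,  80, 2);
""")
params = {"gameID": 10, "price_min": 50, "price_max": 150}

# Newest post per item first, then filter (semantics of the original IN() query).
latest_then_filter = conn.execute("""
SELECT p.itemID, p.postID, p.price
FROM (SELECT MAX(postID) AS postID FROM Posts GROUP BY itemID) q
JOIN Posts p ON p.postID = q.postID
WHERE p.gameID = :gameID AND p.price BETWEEN :price_min AND :price_max
ORDER BY p.itemID
""", params).fetchall()

# Filter first, then newest matching post per item (the derived-table rewrite).
filter_then_latest = conn.execute("""
SELECT p.itemID, p.postID, p.price
FROM (SELECT MAX(p2.postID) AS postID
      FROM Posts p2
      WHERE p2.gameID = :gameID AND p2.price BETWEEN :price_min AND :price_max
      GROUP BY p2.itemID) q
JOIN Posts p ON p.postID = q.postID
ORDER BY p.itemID
""", params).fetchall()

print(latest_then_filter)   # item 1 is dropped: its newest post costs 300
print(filter_then_latest)   # item 1 appears with its older post (price 100)
```

Either result may be what the application wants; the point is that the two forms are not interchangeable, so pick the one that matches the intended meaning of "latest post".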

Not sure if this helps, but try moving the subquery to the end of your WHERE clause and making it a correlated subquery. Also move the filter on Items to the top.
SELECT
p1.itemID,
Items.itemName,
Items.itemImage,
Items.guid,
p1.price,
p1.quantity,
p1.date,
Games.name,
Items.profit
FROM Items
INNER JOIN Posts p1 ON Items.itemID=p1.itemID
INNER JOIN Games ON p1.gameID=Games.gameID
WHERE Items.profit BETWEEN :profit_min AND :profit_max
AND p1.gameID=:gameID
AND p1.price BETWEEN :price_min AND :price_max
AND p1.quantity BETWEEN :quant_min AND :quant_max
AND p1.postID = (SELECT MAX(p2.postID) FROM Posts p2 WHERE p2.itemID = p1.itemID)
ORDER BY
Items.profit DESC
LIMIT 0, 20
Also, make sure you create an index on Posts(itemID, gameID, postID).
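Before swapping in the correlated version, it is worth confirming that it returns the same rows as the original. A minimal check follows, again with SQLite through Python's sqlite3 standing in for MySQL and randomly generated hypothetical data. The index mirrors the one suggested above, but SQLite's planner will not necessarily use it the way MySQL would, so this checks correctness only, not speed.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (postID INTEGER PRIMARY KEY, itemID INTEGER, "
             "gameID INTEGER, price REAL)")
conn.execute("CREATE INDEX idx_item_game_post ON Posts (itemID, gameID, postID)")

# Hypothetical data: 1,000 posts spread over 50 items and 3 games.
rng = random.Random(42)
conn.executemany("INSERT INTO Posts VALUES (?, ?, ?, ?)",
                 [(i, rng.randint(1, 50), rng.randint(1, 3), rng.randint(1, 500))
                  for i in range(1, 1001)])

FILTERS = "p1.gameID = 2 AND p1.price BETWEEN 100 AND 400"

# Original form: IN() over an uncorrelated GROUP BY subquery.
original = conn.execute(f"""
SELECT p1.postID FROM Posts p1
WHERE {FILTERS}
  AND p1.postID IN (SELECT MAX(postID) FROM Posts GROUP BY itemID)
ORDER BY p1.postID""").fetchall()

# Correlated form: compare each row with the newest post for its own item.
correlated = conn.execute(f"""
SELECT p1.postID FROM Posts p1
WHERE {FILTERS}
  AND p1.postID = (SELECT MAX(p2.postID) FROM Posts p2 WHERE p2.itemID = p1.itemID)
ORDER BY p1.postID""").fetchall()

assert original == correlated
print(len(original), "rows match")
```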

Related

Joined query index not applied

I have a query:
SELECT `dsd_prefix`,
`dsd_partner`,
`eev1`.`eev_dse_element_name`,
`devd_explanation`,
`devd_min`,
`eev1`.`eev_dev_value`,
`devd_max`,
`devd_format`,
`devd_not_applicable`,
`devd_not_available`,
`dsd_nid`
FROM `devdescription`
INNER JOIN ekohubelementvalue AS `eev1`
ON `eev1`.`eev_dse_element_name` = `devd_element_name`
AND `eev1`.`eev_prefix` = `devd_prefix`
LEFT JOIN `ekohubelementvalue` AS `eev2`
ON `eev1`.`eev_prefix` = `eev2`.`eev_prefix`
AND `eev1`.`eev_dse_element_name` = `eev2`.`eev_dse_element_name`
AND `eev1`.`eev_subcategory` = `eev2`.`eev_subcategory`
AND `eev1`.`eev_company_id` = `eev2`.`eev_company_id`
AND `eev2`.`eev_date_updated` > `eev1`.`eev_date_updated`
INNER JOIN `datasourcedescription`
ON `eev1`.`eev_prefix` = `dsd_prefix`
WHERE (`eev1`.`eev_company_id` = 'ADD4027'
AND `eev2`.`eev_date_updated` IS NULL
AND `dsd_type_id` != 'MAJ'
AND `dsd_hide` = 'No'
AND (`devd_supress` IS NULL OR `devd_supress` <> 'Yes'))
GROUP BY `eev1`.`eev_dse_element_name`, `eev1`.`eev_prefix`
ORDER BY dsd_prefix
EXPLAIN of this query:
+----+-------------+-----------------------+------------+------+-----------------------------------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------------------------------------------------------------------------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+------------+------+-----------------------------------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------------------------------------------------------------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | datasourcedescription | NULL | ALL | PRIMARY,datasourcedescription_dsd_type_id | NULL | NULL | NULL | 688 | 10.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | eev1 | NULL | ref | eev_prefix,eev_company_id,earliest_and_latest,slice_by_date_for_company,sources_for_special_issue | earliest_and_latest | 47 | csrhub_data_1.datasourcedescription.dsd_prefix | 607 | 0.04 | Using where |
| 1 | SIMPLE | devdescription | NULL | ref | reports,supress,devd_element_name | reports | 816 | csrhub_data_1.datasourcedescription.dsd_prefix,csrhub_data_1.eev1.eev_dse_element_name | 1 | 50.00 | Using where |
| 1 | SIMPLE | eev2 | NULL | ref | eev_prefix,eev_company_id,earliest_and_latest,slice_by_date,slice_by_date_for_company,sources_for_special_issue | eev_prefix | 861 | csrhub_data_1.datasourcedescription.dsd_prefix,csrhub_data_1.eev1.eev_dse_element_name,csrhub_data_1.eev1.eev_company_id | 17 | 19.00 | Using where |
+----+-------------+-----------------------+------------+------+-----------------------------------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------------------------------------------------------------------------------+------+----------+----------------------------------------------+
As you can see, the datasourcedescription indexes are not used even though they are listed in possible_keys: the key column is NULL.
SHOW INDEXES FROM datasourcedescription;
+-----------------------+------------+-----------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------------------+------------+-----------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| datasourcedescription | 0 | PRIMARY | 1 | dsd_prefix | A | 688 | NULL | NULL | | BTREE | | | YES | NULL |
| datasourcedescription | 1 | datasourcedescription_dsd_type_id | 1 | dsd_type_id | A | 8 | NULL | NULL | YES | BTREE | | | YES | NULL |
+-----------------------+------------+-----------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
How to make the optimizer utilize datasourcedescription indexes?
In response to @O. Jones:
The datasourcedescription columns are dsd_prefix, dsd_type_id and dsd_hide
The table datasourcedescription has 727 rows.
The table ekohubelementvalue has nearly 300,000,000 (300M) rows
You mention that ekohubelementvalue has nearly 3M rows. Your WHERE clause is based on a specific company ID. I would rewrite the query slightly, but also make sure the ekohubelementvalue table has an index with the company ID in the first position, plus other columns that help cover the JOIN/WHERE criteria where possible. Also, with MySQL I would add the STRAIGHT_JOIN keyword to tell MySQL to join the tables in the order you listed them, rather than letting it guess the order.
I would have the following indexes available
ekohubelementvalue index on ( eev_company_id, eev_prefix, eev_dse_element_name, eev_subcategory, eev_date_updated )
devdescription index on ( devd_element_name, devd_prefix, devd_supress )
datasourcedescription index on ( dsd_prefix, dsd_type_id, dsd_hide )
Since the ORDER BY was on dsd_prefix, which is joined to eev_prefix, use eev_prefix from the primary (driving) table instead: it is already a component of that table's index, so the primary table, not the lookup tables, becomes the basis of the GROUP BY and ORDER BY.
I also cleaned up the query a bit. It is easier to give long table names aliases, so you can use the alias to qualify every column in the query and in the respective joins.
SELECT STRAIGHT_JOIN
dsd.dsd_prefix,
dsd.dsd_partner,
eev1.eev_dse_element_name,
devd.devd_explanation,
devd.devd_min,
eev1.eev_dev_value,
devd.devd_max,
devd.devd_format,
devd.devd_not_applicable,
devd.devd_not_available,
dsd.dsd_nid
FROM
ekohubelementvalue AS eev1
INNER JOIN devdescription devd
ON eev1.eev_prefix = devd.devd_prefix
AND eev1.eev_dse_element_name = devd.devd_element_name
LEFT JOIN ekohubelementvalue AS eev2
ON eev1.eev_company_id = eev2.eev_company_id
AND eev1.eev_prefix = eev2.eev_prefix
AND eev1.eev_dse_element_name = eev2.eev_dse_element_name
AND eev1.eev_subcategory = eev2.eev_subcategory
AND eev1.eev_date_updated < eev2.eev_date_updated
INNER JOIN datasourcedescription dsd
ON eev1.eev_prefix = dsd.dsd_prefix
AND dsd.dsd_type_id != 'MAJ'
AND dsd.dsd_hide = 'No'
WHERE
eev1.eev_company_id = 'ADD4027'
AND ( devd.devd_supress IS NULL
OR devd.devd_supress <> 'Yes')
AND eev2.eev_date_updated IS NULL
GROUP BY
eev1.eev_prefix,
eev1.eev_dse_element_name
ORDER BY
eev1.eev_prefix
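The eev1/eev2 LEFT JOIN combined with `eev2.eev_date_updated IS NULL` is the "exclusion join" form of greatest-per-group: a row survives only if no newer row exists for the same company, prefix, element and subcategory. A reduced sketch follows, with SQLite via Python's sqlite3 standing in for MySQL, a single hypothetical table `eev` holding only the relevant columns, and invented rows; the index mirrors the company-first index suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE eev (
  eev_company_id TEXT, eev_prefix TEXT, eev_dse_element_name TEXT,
  eev_subcategory TEXT, eev_date_updated TEXT, eev_dev_value TEXT);
CREATE INDEX eev_company_first ON eev
  (eev_company_id, eev_prefix, eev_dse_element_name, eev_subcategory, eev_date_updated);
INSERT INTO eev VALUES
  ('ADD4027', 'P1', 'E1', 'S', '2020-01-01', 'old'),
  ('ADD4027', 'P1', 'E1', 'S', '2021-06-30', 'new'),
  ('ADD4027', 'P2', 'E9', 'S', '2019-03-15', 'only'),
  ('OTHER',   'P1', 'E1', 'S', '2022-01-01', 'other company');
""")

# Exclusion join: keep eev1 rows for which no newer row (eev2) exists in the same group.
latest = conn.execute("""
SELECT eev1.eev_prefix, eev1.eev_dse_element_name, eev1.eev_dev_value
FROM eev AS eev1
LEFT JOIN eev AS eev2
  ON  eev1.eev_company_id       = eev2.eev_company_id
  AND eev1.eev_prefix           = eev2.eev_prefix
  AND eev1.eev_dse_element_name = eev2.eev_dse_element_name
  AND eev1.eev_subcategory      = eev2.eev_subcategory
  AND eev1.eev_date_updated     < eev2.eev_date_updated
WHERE eev1.eev_company_id = 'ADD4027'
  AND eev2.eev_date_updated IS NULL
ORDER BY eev1.eev_prefix, eev1.eev_dse_element_name
""").fetchall()
print(latest)
```

The 'OTHER' row is newer but belongs to a different company, so it does not knock out ADD4027's latest row. If two rows in a group share the same latest date, both survive, which is one reason the original query still needs its GROUP BY.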

Slow query to count tags

This query counts the videos for each tag (top 50). It runs very slowly (the videos table has around 800k records). I have set all the appropriate indexes/keys.
SELECT `tags`.`id_tag`, `tags`.`tag_text`, COUNT(`video_tags`.`id_video`) AS `total_video_count`
FROM `tags`
INNER JOIN `video_tags` ON ( `tags`.`id_tag` = `video_tags`.`id_tag` )
INNER JOIN `videos` ON ( `video_tags`.`id_video` = `videos`.`id_video` )
GROUP BY `tags`.`id_tag`
ORDER BY `total_video_count` DESC
LIMIT 50;
Any ideas about what could be contributing to the poor performance, or alternative ways to structure the query?
----Update ----
+--------+------------+------------------------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+------------------------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| videos | 0 | PRIMARY | 1 | id_video | A | 812967 | NULL | NULL | | BTREE | | |
+--------+------------+------------------------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
+------------+------------+---------------------------------------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+---------------------------------------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| video_tags | 0 | PRIMARY | 1 | id_video_tag | A | 4113266 | NULL | NULL | | BTREE | | |
| video_tags | 1 | video_tags_id_tag_7e0eba6ebf2ab1be_fk_tags_id_tag | 1 | id_tag | A | 10852 | NULL | NULL | | BTREE | | |
| video_tags | 1 | video_tags_id_video_6fa83a06b3a6ec45_fk_videos_id_video | 1 | id_video | A | 1371088 | NULL | NULL | | BTREE | | |
+------------+------------+---------------------------------------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tags | 0 | PRIMARY | 1 | id_tag | A | 35186 | NULL | NULL | | BTREE | | |
| tags | 0 | tag_text | 1 | tag_text | A | 35186 | NULL | NULL | | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
The last comment above is correct.
Add an index on video_tags (id_tag, id_video). The first join should then use this new index: it can read id_video straight from the index, so the value needed for the second join's predicate is already in memory.
What I believe is happening now is that MySQL makes an additional read of the video_tags table to fetch the id_video column, and that read is not served from memory. Then, if the physical layout of the rows in video_tags does not match the order of the PRIMARY KEY (which is probably what it's using), you end up thrashing I/O.
I would try adding a composite unique key on video_tags (id_tag, id_video) and give it another shot.
SELECT
`tags`.`id_tag`,
`tags`.`tag_text`,
`ranking`.`total_video_count` as `total_video_count`
FROM tags
INNER JOIN (
SELECT
`video_tags`.`id_tag`,
count(*) as total_video_count
FROM `video_tags`
GROUP BY `video_tags`.`id_tag`
ORDER BY `total_video_count` DESC
LIMIT 50
) ranking ON ( `tags`.`id_tag` = `ranking`.`id_tag` )
ORDER BY `ranking`.`total_video_count` DESC;
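There may be some typos, but you get the basic idea: count per id_tag in video_tags first, keep only the top 50, and then join that small result to tags. This should improve performance because you no longer join two large tables before grouping.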
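A small sketch comparing the original join-then-group query with the aggregate-first rewrite, using SQLite through Python's sqlite3 as a stand-in for MySQL and a handful of made-up tags. The rewrite drops the join to videos, so the two only agree when every video_tags row points at an existing video (the foreign-key names in the index listing suggest that holds here); ties in the counts could also make the two pick different rows at the LIMIT boundary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (id_tag INTEGER PRIMARY KEY, tag_text TEXT UNIQUE);
CREATE TABLE videos (id_video INTEGER PRIMARY KEY);
CREATE TABLE video_tags (id_video_tag INTEGER PRIMARY KEY,
                         id_video INTEGER, id_tag INTEGER);
-- Composite unique key suggested above.
CREATE UNIQUE INDEX video_tags_tag_video ON video_tags (id_tag, id_video);
""")
conn.executemany("INSERT INTO tags VALUES (?, ?)",
                 [(1, 'music'), (2, 'cats'), (3, 'news')])
conn.executemany("INSERT INTO videos VALUES (?)", [(v,) for v in range(1, 7)])
# Hypothetical tagging: music on 6 videos, cats on 3, news on 1.
pairs = [(v, 1) for v in range(1, 7)] + [(v, 2) for v in (1, 2, 3)] + [(4, 3)]
conn.executemany("INSERT INTO video_tags (id_video, id_tag) VALUES (?, ?)", pairs)

TOP_N = 2

# Original shape: join all three tables, then group and sort.
original = conn.execute("""
SELECT tags.id_tag, tags.tag_text, COUNT(video_tags.id_video) AS total_video_count
FROM tags
JOIN video_tags ON tags.id_tag = video_tags.id_tag
JOIN videos ON video_tags.id_video = videos.id_video
GROUP BY tags.id_tag
ORDER BY total_video_count DESC
LIMIT ?
""", (TOP_N,)).fetchall()

# Rewrite: aggregate video_tags alone, keep the top N, then join the small result.
ranked = conn.execute("""
SELECT t.id_tag, t.tag_text, r.total_video_count
FROM tags t
JOIN (SELECT id_tag, COUNT(*) AS total_video_count
      FROM video_tags
      GROUP BY id_tag
      ORDER BY total_video_count DESC
      LIMIT ?) r ON t.id_tag = r.id_tag
ORDER BY r.total_video_count DESC
""", (TOP_N,)).fetchall()
print(original, ranked)
```

The outer ORDER BY matters: without it, the order of the joined rows is not guaranteed even though the derived table was sorted.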

This MySQL query doesn't use my indexes

The following query gets a list of computer groups. For each group, it counts how many computers have state=1 and how many have state=2.
SELECT
cg.id,
cg.order,
cg.group_mode,
cg.created,
cg.updated,
SUM(
CASE WHEN c.state = 1 THEN 1 ELSE 0 END
) AS sclr7,
SUM(
CASE WHEN c.state = 2 THEN 1 ELSE 0 END
) AS sclr8
FROM
computer_group cg
LEFT JOIN computer c ON cg.id = c.group_id
WHERE
cg.group_mode <> 3
GROUP BY
cg.id
ORDER BY
cg.order ASC;
When I run EXPLAIN on this query, MySQL returns: Using where; Using temporary; Using filesort
+----+-------------+-------+------+----------------------+----------------------+---------+--------------------+------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+----------------------+----------------------+---------+--------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | cg | ALL | mode | NULL | NULL | NULL | 33 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | c | ref | IDX_39B3258BBE45D62E | IDX_39B3258BBE45D62E | 768 | test.c.id | 57 | 100.00 | |
+----+-------------+-------+------+----------------------+----------------------+---------+--------------------+------+----------+----------------------------------------------+
I have the following indexes:
computer_group table
+----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| computer_group | 0 | PRIMARY | 1 | id | A | 33 | NULL | NULL | | BTREE | |
| computer_group | 1 | mode | 1 | mode | A | 3 | NULL | NULL | | BTREE | |
| computer_group | 1 | order | 1 | order | A | 33 | NULL | NULL | | BTREE | |
+----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
(Something went wrong when I copied and pasted the computer_group indexes; this is now fixed.)
computer table
+--------------+------------+----------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+----------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| computer | 0 | PRIMARY | 1 | id | A | 2611 | NULL | NULL | | BTREE | |
| computer | 1 | IDX_39B3258BBE45D62E | 1 | group_id | A | 32 | NULL | NULL | YES | BTREE | |
| computer | 1 | state | 1 | state | A | 1 | NULL | NULL | | BTREE | |
+--------------+------------+----------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
I have tried adding various indexes, but I can't seem to prevent the filesort and the temporary table.
This is driving me nuts. I have spent several days trying to fix this. Am I doing something wrong, or is this not preventable?
You should be able to get it to use indexes if you eliminate the outer GROUP BY:
SELECT cg.*, c.sclr7, c.sclr8
FROM computer_group cg JOIN
(SELECT c.group_id, SUM(c.state = 1) AS sclr7, SUM(c.state = 2) AS sclr8
FROM computer c
GROUP BY c.group_id
) c
ON cg.id = c.group_id
WHERE cg.group_mode <> 3
ORDER BY cg.order ASC;
MySQL might use an index on computer_group(order, group_mode) for the query.
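The derived-table version relies on `SUM(c.state = 1)` counting matches, since a comparison evaluates to 1 or 0. One behavioral difference from the original: the original uses LEFT JOIN, so groups with no computers still appear with zeros, while an inner JOIN to the derived table drops them. A sketch with SQLite via Python's sqlite3 standing in for MySQL and invented rows shows this; the LEFT JOIN plus COALESCE variant is an assumption about what the asker wants, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- "order" is quoted because ORDER is a keyword in SQLite.
CREATE TABLE computer_group (id INTEGER PRIMARY KEY, "order" INTEGER, group_mode INTEGER);
CREATE TABLE computer (id INTEGER PRIMARY KEY, group_id INTEGER, state INTEGER);
-- Group 3 is filtered out by group_mode; group 4 has no computers.
INSERT INTO computer_group VALUES (1, 2, 1), (2, 1, 1), (3, 3, 3), (4, 4, 1);
INSERT INTO computer VALUES (1, 1, 1), (2, 1, 2), (3, 1, 1), (4, 2, 2), (5, 3, 1);
""")

original = conn.execute("""
SELECT cg.id,
       SUM(CASE WHEN c.state = 1 THEN 1 ELSE 0 END) AS sclr7,
       SUM(CASE WHEN c.state = 2 THEN 1 ELSE 0 END) AS sclr8
FROM computer_group cg
LEFT JOIN computer c ON cg.id = c.group_id
WHERE cg.group_mode <> 3
GROUP BY cg.id
ORDER BY cg."order"
""").fetchall()

# Derived table aggregated once per group_id, inner-joined: empty groups disappear.
inner_join = conn.execute("""
SELECT cg.id, c.sclr7, c.sclr8
FROM computer_group cg
JOIN (SELECT group_id, SUM(state = 1) AS sclr7, SUM(state = 2) AS sclr8
      FROM computer GROUP BY group_id) c ON cg.id = c.group_id
WHERE cg.group_mode <> 3
ORDER BY cg."order"
""").fetchall()

# LEFT JOIN plus COALESCE reproduces the original output, zeros included.
left_join = conn.execute("""
SELECT cg.id, COALESCE(c.sclr7, 0), COALESCE(c.sclr8, 0)
FROM computer_group cg
LEFT JOIN (SELECT group_id, SUM(state = 1) AS sclr7, SUM(state = 2) AS sclr8
           FROM computer GROUP BY group_id) c ON cg.id = c.group_id
WHERE cg.group_mode <> 3
ORDER BY cg."order"
""").fetchall()
print(original, inner_join, left_join, sep="\n")
```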
The join might actually confuse MySQL. A surer query is this:
SELECT cg.*,
(SELECT SUM(c.state = 1)
FROM computer c
WHERE cg.id = c.group_id
) as sclr7,
(SELECT SUM(c.state = 2)
FROM computer c
WHERE cg.id = c.group_id
) as sclr8
FROM computer_group cg
WHERE cg.group_mode <> 3
ORDER BY cg.order ASC;
You want an index on computer_group(order, group_mode, id) and computer(group_id, state).
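A caveat on the scalar-subquery form: SUM over zero rows returns NULL, so a group with no computers gets NULL instead of the 0 the original LEFT JOIN query produces. Counting with `COUNT(*)` and moving the state test into the WHERE clause returns 0 instead. A sketch with SQLite via Python's sqlite3 as a stand-in for MySQL and invented rows; the indexes mirror the ones recommended above, though SQLite may use them differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- "order" is quoted because ORDER is a keyword in SQLite.
CREATE TABLE computer_group (id INTEGER PRIMARY KEY, "order" INTEGER, group_mode INTEGER);
CREATE TABLE computer (id INTEGER PRIMARY KEY, group_id INTEGER, state INTEGER);
CREATE INDEX cg_order_mode_id ON computer_group ("order", group_mode, id);
CREATE INDEX computer_group_state ON computer (group_id, state);
-- Group 4 has no computers.
INSERT INTO computer_group VALUES (1, 2, 1), (2, 1, 1), (3, 3, 3), (4, 4, 1);
INSERT INTO computer VALUES (1, 1, 1), (2, 1, 2), (3, 1, 1), (4, 2, 2), (5, 3, 1);
""")

# Scalar subqueries with SUM: NULL for a group without computers.
with_sum = conn.execute("""
SELECT cg.id,
       (SELECT SUM(c.state = 1) FROM computer c WHERE c.group_id = cg.id) AS sclr7,
       (SELECT SUM(c.state = 2) FROM computer c WHERE c.group_id = cg.id) AS sclr8
FROM computer_group cg
WHERE cg.group_mode <> 3
ORDER BY cg."order"
""").fetchall()

# Scalar subqueries with COUNT(*): 0 for a group without computers.
with_count = conn.execute("""
SELECT cg.id,
       (SELECT COUNT(*) FROM computer c WHERE c.group_id = cg.id AND c.state = 1) AS sclr7,
       (SELECT COUNT(*) FROM computer c WHERE c.group_id = cg.id AND c.state = 2) AS sclr8
FROM computer_group cg
WHERE cg.group_mode <> 3
ORDER BY cg."order"
""").fetchall()
print(with_sum, with_count, sep="\n")
```

The COUNT(*) form also lets each subquery be answered as a lookup on (group_id, state), which is what the suggested computer(group_id, state) index is for.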

Query Optimization (WHERE, GROUP BY, LEFT JOINs)

I am using InnoDB.
QUERY, EXPLAIN & INDEXES
SELECT
stories.*,
count(comments.id) AS comments,
GROUP_CONCAT(
DISTINCT classifications2.name SEPARATOR ';'
) AS classifications_name,
GROUP_CONCAT(
DISTINCT images.id
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_id,
GROUP_CONCAT(
DISTINCT images.caption
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_caption,
GROUP_CONCAT(
DISTINCT images.thumbnail
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_thumbnail,
GROUP_CONCAT(
DISTINCT images.medium
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_medium,
GROUP_CONCAT(
DISTINCT images.large
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_large,
GROUP_CONCAT(
DISTINCT users.id
ORDER BY users.id SEPARATOR ';'
) AS authors_id,
GROUP_CONCAT(
DISTINCT users.display_name
ORDER BY users.id SEPARATOR ';'
) AS authors_display_name,
GROUP_CONCAT(
DISTINCT users.url
ORDER BY users.id SEPARATOR ';'
) AS authors_url
FROM
stories
LEFT JOIN classifications
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
LEFT JOIN image_story
ON stories.id = image_story.story_id
LEFT JOIN images
ON images.id = image_story.`image_id`
LEFT JOIN author_story
ON stories.id = author_story.story_id
LEFT JOIN users
ON users.id = author_story.author_id
WHERE classifications.`name` LIKE 'Home:Top%'
AND stories.status = 1
GROUP BY stories.id
ORDER BY classifications.`name`, classifications.`position`
+----+-------------+------------------+--------+---------------+----------+---------+------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+--------+---------------+----------+---------+------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | stories | ref | status | status | 1 | const | 434792 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | classifications | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | classifications2 | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | comments | ref | story_id | story_id | 8 | stories.id | 6 | Using where; Using index |
| 1 | SIMPLE | image_story | ref | story_id | story_id | 4 | stories.id | 1 | NULL |
| 1 | SIMPLE | images | eq_ref | PRIMARY | PRIMARY | 4 | image_story.image_id | 1 | NULL |
| 1 | SIMPLE | author_story | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | author_story.author_id | 1 | Using where |
+----+-------------+------------------+--------+---------------+----------+---------+------------------------+--------+----------------------------------------------+
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type |
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| stories | 0 | PRIMARY | 1 | id | A | 869584 | NULL | NULL | | BTREE |
| stories | 1 | created_at | 1 | created_at | A | 434792 | NULL | NULL | | BTREE |
| stories | 1 | source | 1 | source | A | 2 | NULL | NULL | YES | BTREE |
| stories | 1 | source_id | 1 | source_id | A | 869584 | NULL | NULL | YES | BTREE |
| stories | 1 | type | 1 | type | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | status | 1 | status | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | type_status | 1 | type | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | type_status | 2 | status | A | 2 | NULL | NULL | | BTREE |
| classifications | 0 | PRIMARY | 1 | id | A | 207 | NULL | NULL | | BTREE |
| classifications | 1 | story_id | 1 | story_id | A | 207 | NULL | NULL | | BTREE |
| classifications | 1 | name | 1 | name | A | 103 | NULL | NULL | | BTREE |
| classifications | 1 | name | 2 | position | A | 207 | NULL | NULL | YES | BTREE |
| comments | 0 | PRIMARY | 1 | id | A | 239336 | NULL | NULL | | BTREE |
| comments | 1 | status | 1 | status | A | 2 | NULL | NULL | | BTREE |
| comments | 1 | date | 1 | date | A | 239336 | NULL | NULL | | BTREE |
| comments | 1 | story_id | 1 | story_id | A | 39889 | NULL | NULL | | BTREE |
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
QUERY TIMES
It takes on average 0.035 seconds to run.
If I remove only the GROUP BY, the time drops to 0.007 on average.
If I remove only the stories.status=1 filter, the time drops to 0.025 on average. This one seems like it can be easily optimized.
And if I remove only the LIKE filter and ORDER BY clause, the time drops to 0.006 on average.
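Part of the GROUP BY cost here comes from the fan-out of the LEFT JOINs: each story row is multiplied by its classifications, comments, images and authors before GROUP BY collapses it, which is also why count(comments.id) can come out larger than the real number of comments (the GROUP_CONCAT columns hide this because they use DISTINCT). A reduced sketch, with SQLite via Python's sqlite3 as a stand-in for MySQL, only two of the joined tables and made-up rows, shows the effect and one common remedy: aggregating a one-to-many table in a derived table before joining it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stories (id INTEGER PRIMARY KEY, status INTEGER);
CREATE TABLE comments (id INTEGER PRIMARY KEY, story_id INTEGER);
CREATE TABLE image_story (id INTEGER PRIMARY KEY, story_id INTEGER, image_id INTEGER);
INSERT INTO stories VALUES (1, 1);
INSERT INTO comments VALUES (10, 1), (11, 1);                          -- 2 comments
INSERT INTO image_story VALUES (1, 1, 100), (2, 1, 101), (3, 1, 102);  -- 3 images
""")

JOINS = """
FROM stories
LEFT JOIN comments ON stories.id = comments.story_id
LEFT JOIN image_story ON stories.id = image_story.story_id
WHERE stories.status = 1
"""

# Every comment row is repeated once per image row before GROUP BY runs.
joined_rows = conn.execute("SELECT COUNT(*) " + JOINS).fetchone()[0]

naive = conn.execute("SELECT stories.id, COUNT(comments.id) " + JOINS +
                     "GROUP BY stories.id").fetchall()

# Pre-aggregating comments keeps one row per story in that branch of the join.
pre_aggregated = conn.execute("""
SELECT stories.id, COALESCE(cc.n, 0) AS comments
FROM stories
LEFT JOIN (SELECT story_id, COUNT(*) AS n FROM comments GROUP BY story_id) cc
  ON cc.story_id = stories.id
WHERE stories.status = 1
""").fetchall()

print(joined_rows, naive, pre_aggregated)
```

With several one-to-many joins the multiplication compounds, so the number of rows the GROUP BY has to sort can be far larger than the number of stories returned.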
UPDATE 1: 2013-04-13
My understanding has improved a great deal after going through the answers.
I added indexes to author_story and image_story, which seemed to improve the query to 0.025 seconds, while for some strange reason the EXPLAIN plan looks a whole lot better. At this point, removing ORDER BY drops the query to 0.015 seconds, and dropping both ORDER BY and GROUP BY improves it to 0.006. I assume these are the two things to focus on right now? I may move the ORDER BY into application logic if needed.
Here are the revised EXPLAIN and INDEXES
+----+-------------+------------------+--------+---------------------------------+----------+---------+--------------------------+------+--------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+--------+---------------------------------+----------+---------+--------------------------+------+--------------------------------------------------------+
| 1 | SIMPLE | classifications | range | story_id,name | name | 102 | NULL | 14 | Using index condition; Using temporary; Using filesort |
| 1 | SIMPLE | stories | eq_ref | PRIMARY,status | PRIMARY | 4 | classifications.story_id | 1 | Using where |
| 1 | SIMPLE | classifications2 | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | author_story | ref | author_id,story_id,author_story | story_id | 4 | stories.id | 1 | Using index condition |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | author_story.author_id | 1 | Using where |
| 1 | SIMPLE | comments | ref | story_id | story_id | 8 | stories.id | 8 | Using where; Using index |
| 1 | SIMPLE | image_story | ref | story_id,story_id_2 | story_id | 4 | stories.id | 1 | NULL |
| 1 | SIMPLE | images | eq_ref | PRIMARY,position_id | PRIMARY | 4 | image_story.image_id | 1 | NULL |
+----+-------------+------------------+--------+---------------------------------+----------+---------+--------------------------+------+--------------------------------------------------------+
+-----------------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| author_story | 0 | PRIMARY | 1 | id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 0 | story_author | 1 | story_id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 0 | story_author | 2 | author_id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 1 | author_id | 1 | author_id | A | 2179 | NULL | NULL | | BTREE | | |
| author_story | 1 | story_id | 1 | story_id | A | 220116 | NULL | NULL | | BTREE | | |
| image_story | 0 | PRIMARY | 1 | id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 0 | story_image | 1 | story_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 0 | story_image | 2 | image_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 1 | story_id | 1 | story_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 1 | image_id | 1 | image_id | A | 148902 | NULL | NULL | | BTREE | | |
| classifications | 0 | PRIMARY | 1 | id | A | 257 | NULL | NULL | | BTREE | | |
| classifications | 1 | story_id | 1 | story_id | A | 257 | NULL | NULL | | BTREE | | |
| classifications | 1 | name | 1 | name | A | 128 | NULL | NULL | | BTREE | | |
| classifications | 1 | name | 2 | position | A | 257 | NULL | NULL | YES | BTREE | | |
| stories | 0 | PRIMARY | 1 | id | A | 962570 | NULL | NULL | | BTREE | | |
| stories | 1 | created_at | 1 | created_at | A | 481285 | NULL | NULL | | BTREE | | |
| stories | 1 | source | 1 | source | A | 4 | NULL | NULL | YES | BTREE | | |
| stories | 1 | source_id | 1 | source_id | A | 962570 | NULL | NULL | YES | BTREE | | |
| stories | 1 | type | 1 | type | A | 2 | NULL | NULL | | BTREE | | |
| stories | 1 | status | 1 | status | A | 4 | NULL | NULL | | BTREE | | |
| stories | 1 | type_status | 1 | type | A | 2 | NULL | NULL | | BTREE | | |
| stories | 1 | type_status | 2 | status | A | 6 | NULL | NULL | | BTREE | | |
| comments | 0 | PRIMARY | 1 | id | A | 232559 | NULL | NULL | | BTREE | | |
| comments | 1 | status | 1 | status | A | 6 | NULL | NULL | | BTREE | | |
| comments | 1 | date | 1 | date | A | 232559 | NULL | NULL | | BTREE | | |
| comments | 1 | story_id | 1 | story_id | A | 29069 | NULL | NULL | | BTREE | | |
| images | 0 | PRIMARY | 1 | id | A | 147206 | NULL | NULL | | BTREE | | |
| images | 0 | source_id | 1 | source_id | A | 147206 | NULL | NULL | YES | BTREE | | |
| images | 1 | position | 1 | position | A | 4 | NULL | NULL | | BTREE | | |
| images | 1 | position_id | 1 | id | A | 147206 | NULL | NULL | | BTREE | | |
| images | 1 | position_id | 2 | position | A | 147206 | NULL | NULL | | BTREE | | |
| users | 0 | PRIMARY | 1 | id | A | 981 | NULL | NULL | | BTREE | | |
| users | 0 | users_email_unique | 1 | email | A | 981 | NULL | NULL | | BTREE | | |
+-----------------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
SELECT
stories.*,
count(comments.id) AS comments,
GROUP_CONCAT(DISTINCT users.id ORDER BY users.id SEPARATOR ';') AS authors_id,
GROUP_CONCAT(DISTINCT users.display_name ORDER BY users.id SEPARATOR ';') AS authors_display_name,
GROUP_CONCAT(DISTINCT users.url ORDER BY users.id SEPARATOR ';') AS authors_url,
GROUP_CONCAT(DISTINCT classifications2.name SEPARATOR ';') AS classifications_name,
GROUP_CONCAT(DISTINCT images.id ORDER BY images.position,images.id SEPARATOR ';') AS images_id,
GROUP_CONCAT(DISTINCT images.caption ORDER BY images.position,images.id SEPARATOR ';') AS images_caption,
GROUP_CONCAT(DISTINCT images.thumbnail ORDER BY images.position,images.id SEPARATOR ';') AS images_thumbnail,
GROUP_CONCAT(DISTINCT images.medium ORDER BY images.position,images.id SEPARATOR ';') AS images_medium,
GROUP_CONCAT(DISTINCT images.large ORDER BY images.position,images.id SEPARATOR ';') AS images_large
FROM
classifications
INNER JOIN stories
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
LEFT JOIN image_story
ON stories.id = image_story.story_id
LEFT JOIN images
ON images.id = image_story.`image_id`
INNER JOIN author_story
ON stories.id = author_story.story_id
INNER JOIN users
ON users.id = author_story.author_id
WHERE classifications.`name` LIKE 'Home:Top%'
AND stories.status = 1
GROUP BY stories.id
ORDER BY NULL
UPDATE 2: 2013-04-14
I noticed one other thing. If I don't SELECT stories.content (LONGTEXT) and stories.content_html (LONGTEXT), the query drops from 0.015 seconds to 0.006 seconds. For now I am considering whether I can do without content and content_html, or replace them with something else.
I have updated the query, indexes and explain in the 2013-04-13 update above instead of re-posting in this one since they were minor and incremental. The query is still using filesort. I can't get rid of GROUP BY but have gotten rid of ORDER BY.
UPDATE 3: 2013-04-16
As requested, I dropped the story_id indexes from both image_story and author_story, as they are redundant. The only change in the EXPLAIN output was in the possible_keys column. Unfortunately, it still didn't show the "Using index" optimization.
Also changed LONGTEXT to TEXT and am now fetching LEFT(stories.content, 500) instead of stories.content which is making a very significant difference in query execution time.
+----+-------------+------------------+--------+-----------------------------+--------------+---------+--------------------------+------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+--------+-----------------------------+--------------+---------+--------------------------+------+---------------------------------------------------------------------+
| 1 | SIMPLE | classifications | ref | story_id,name,name_position | name | 102 | const | 10 | Using index condition; Using where; Using temporary; Using filesort |
| 1 | SIMPLE | stories | eq_ref | PRIMARY,status | PRIMARY | 4 | classifications.story_id | 1 | Using where |
| 1 | SIMPLE | classifications2 | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | author_story | ref | story_author | story_author | 4 | stories.id | 1 | Using where; Using index |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | author_story.author_id | 1 | Using where |
| 1 | SIMPLE | comments | ref | story_id | story_id | 8 | stories.id | 8 | Using where; Using index |
| 1 | SIMPLE | image_story | ref | story_image | story_image | 4 | stories.id | 1 | Using index |
| 1 | SIMPLE | images | eq_ref | PRIMARY,position_id | PRIMARY | 4 | image_story.image_id | 1 | NULL |
+----+-------------+------------------+--------+-----------------------------+--------------+---------+--------------------------+------+---------------------------------------------------------------------+
innodb_buffer_pool_size
134217728
TABLE_NAME INDEX_LENGTH
image_story 10010624
image_story 4556800
image_story 0
TABLE_NAME INDEX_NAMES SIZE
dawn/image_story story_image 13921
I can see two opportunities for optimization right away:
Change an OUTER JOIN to INNER JOIN
Your query is currently scanning 434792 stories, and you should be able to narrow that down better, assuming not every story has a classification matching 'Home:Top%'. It would be better to use an index to find the classifications you're looking for, and then look up the matching stories.
But you're using LEFT OUTER JOIN for classifications, meaning all stories will be scanned whether they have a matching classification or not. Then you're defeating that by putting a condition on classifications in the WHERE clause, effectively making it mandatory that there be a classification matching your pattern with LIKE. So it's no longer an outer join, it's an inner join.
If you put the classifications table first, and make it an inner join, the optimizer will use that to narrow down the search for stories just to those that have a matching classification.
. . .
FROM
classifications
INNER JOIN stories
ON stories.id = classifications.story_id
. . .
The optimizer is supposed to be able to figure out when it's advantageous to re-order tables, so you may not have to change the order in your query. But you do need to use an INNER JOIN in this case.
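The claim that a WHERE condition on the outer-joined table turns a LEFT JOIN into an inner join can be checked on a toy data set. This is a minimal sketch that uses an in-memory SQLite database as a stand-in for MySQL; the cut-down stories and classifications tables and their rows are hypothetical.

```python
import sqlite3

# Hypothetical miniature schema: only the columns the join needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stories (id INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE classifications (id INTEGER PRIMARY KEY, story_id INTEGER, name TEXT);
    INSERT INTO stories VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO classifications (story_id, name) VALUES
        (1, 'Home:Top'), (2, 'Sports'), (1, 'World');
""")

outer = conn.execute("""
    SELECT stories.id FROM stories
    LEFT JOIN classifications ON stories.id = classifications.story_id
    WHERE classifications.name LIKE 'Home:Top%' ORDER BY stories.id
""").fetchall()

inner = conn.execute("""
    SELECT stories.id FROM classifications
    INNER JOIN stories ON stories.id = classifications.story_id
    WHERE classifications.name LIKE 'Home:Top%' ORDER BY stories.id
""").fetchall()

# Story 3 has no classification: the LEFT JOIN keeps it with a NULL name,
# and the WHERE condition on that NULL name then throws it away again.
print(outer, inner)  # [(1,)] [(1,)]
```

Both forms return the same rows, so writing INNER JOIN costs nothing in correctness and states the intent directly.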
Add compound indexes
Your intersection tables image_story and author_story don't have compound indexes. It's often a big advantage to add compound indexes to the intersection tables in a many-to-many relationship, so that they can perform the join and get the "Using index" optimization.
ALTER TABLE image_story ADD UNIQUE KEY imst_st_im (story_id, image_id);
ALTER TABLE author_story ADD UNIQUE KEY aust_st_au (story_id, author_id);
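To see why the compound key matters, here is a sketch that again uses SQLite as a stand-in: its EXPLAIN QUERY PLAN reports "COVERING INDEX" in roughly the situation where MySQL's EXPLAIN reports "Using index". The table layout mirrors image_story from the question but is otherwise an assumption.

```python
import sqlite3

def plan(conn, sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable "detail" column.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT image_id FROM image_story WHERE story_id = 42"

# Only a single-column index: the lookup must still visit the table for image_id.
single = sqlite3.connect(":memory:")
single.executescript("""
    CREATE TABLE image_story (id INTEGER PRIMARY KEY, story_id INTEGER, image_id INTEGER);
    CREATE INDEX story_id ON image_story (story_id);
""")

# Compound index on (story_id, image_id): the index alone answers the query.
compound = sqlite3.connect(":memory:")
compound.executescript("""
    CREATE TABLE image_story (id INTEGER PRIMARY KEY, story_id INTEGER, image_id INTEGER);
    CREATE UNIQUE INDEX imst_st_im ON image_story (story_id, image_id);
""")

plan_single = plan(single, query)
plan_compound = plan(compound, query)
print(plan_single)
print(plan_compound)
```

The second plan should mention a covering index, while the first only uses the story_id index and then reads the row.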
Re your comments and update:
I'm not sure you created the new indexes correctly. Your dump of the indexes doesn't show the columns, and according to the updated EXPLAIN, the new indexes aren't being used, whereas I would expect them to be. Using the new indexes would result in "Using index" in the Extra field of EXPLAIN, which should help performance.
Output of SHOW CREATE TABLE for each table would be more complete information than a dump of the indexes (without column names) as you have shown.
You may have to run ANALYZE TABLE once on each of those tables after creating the indexes. Also, run the query more than once, to make sure the indexes are in the buffer pool. Is this table InnoDB or MyISAM?
I also notice in your EXPLAIN output that the rows column shows a lot fewer rows being touched. That's an improvement.
Do you really need the ORDER BY? If you use ORDER BY NULL you should be able to get rid of the "Using filesort" and that may improve performance.
Re your update:
You still aren't getting the "Using index" optimization from your image_story and author_story tables. One suggestion I'd have is to eliminate the redundant indexes:
ALTER TABLE image_story DROP KEY story_id;
ALTER TABLE author_story DROP KEY story_id;
The reason is that any query that could benefit from the single-column index on story_id can also benefit from the two-column index on (story_id,image_id). Eliminating the redundant index helps the optimizer make a better decision (as well as saving some space). This is the theory behind a tool like pt-duplicate-key-checker.
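The left-prefix rule behind that advice is simple enough to sketch. The helper below is hypothetical and much cruder than pt-duplicate-key-checker (it ignores index types and column prefix lengths), but it shows the core check applied to the image_story indexes from the question.

```python
def redundant_indexes(indexes, unique=frozenset()):
    """Return non-unique indexes whose columns are a strict left prefix of another index.

    `indexes` maps index name -> tuple of column names in index order, and
    `unique` names indexes that enforce a UNIQUE constraint (never flagged here).
    Simplified sketch: it ignores index types and column prefix lengths, and it
    does not flag exact duplicates.
    """
    redundant = []
    for name, cols in indexes.items():
        if name in unique:
            continue  # dropping it would remove a constraint, not just an index
        for other, other_cols in indexes.items():
            if other != name and len(other_cols) > len(cols) and other_cols[:len(cols)] == cols:
                redundant.append(name)
                break
    return sorted(redundant)


# The image_story indexes from the question's SHOW INDEX output.
image_story = {
    "PRIMARY": ("id",),
    "story_image": ("story_id", "image_id"),
    "story_id": ("story_id",),
    "image_id": ("image_id",),
}
print(redundant_indexes(image_story, unique={"PRIMARY", "story_image"}))  # ['story_id']
```

The image_id index is kept because image_id is not the leading column of any other index, which matches the advice to drop only the story_id key.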
I'd also check to make sure your buffer pool is large enough to hold your indexes. You don't want indexes to be paging in and out of the buffer pool during a query.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size'
Check the size of indexes for your image_story table:
SELECT TABLE_NAME, INDEX_LENGTH FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'image_story';
And compare that to how much of those indexes are currently residing in the buffer pool:
SELECT TABLE_NAME, GROUP_CONCAT(DISTINCT INDEX_NAME) AS INDEX_NAMES, SUM(DATA_SIZE) AS SIZE
FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE_LRU
WHERE TABLE_NAME = '`test`.`image_story`' AND INDEX_NAME <> 'PRIMARY'
Of course, change `test` above to the database name your table belongs to.
That information_schema table is new in MySQL 5.6. I assume you're using MySQL 5.6 because your EXPLAIN shows "Using index condition" which is also new in MySQL 5.6.
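The numbers posted in the question can be put side by side with a little arithmetic. This sketch converts innodb_buffer_pool_size and the first image_story INDEX_LENGTH value to MiB; the `index_fits` helper and its 50% headroom threshold are hypothetical illustrations, not a MySQL rule, since the pool must also hold data pages and other tables' indexes.

```python
MIB = 1024 * 1024

def index_fits(index_length_bytes, buffer_pool_bytes, headroom=0.5):
    """Rough check: does the index use at most `headroom` of the buffer pool?

    The 0.5 default is an arbitrary illustrative threshold (an assumption).
    """
    return index_length_bytes <= buffer_pool_bytes * headroom

pool = 134217728        # innodb_buffer_pool_size from the question
index_len = 10010624    # first INDEX_LENGTH row reported for image_story
print(pool / MIB, round(index_len / MIB, 2), index_fits(index_len, pool))
# 128.0 9.55 True
```

So a 128 MiB pool comfortably exceeds this roughly 9.5 MiB index; whether it also fits alongside everything else the query touches is what the INNODB_BUFFER_PAGE_LRU query above helps to answer.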
I don't use LONGTEXT at all unless I really need to store very long strings. Keep in mind:
TEXT holds up to 64KB
MEDIUMTEXT holds up to 16MB
LONGTEXT holds up to 4GB
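Those limits are byte counts (2^8-1, 2^16-1, 2^24-1 and 2^32-1 bytes, including TINYTEXT at the bottom of the family), so multi-byte characters use up the budget faster. A small hypothetical helper makes the choice explicit; the 3-bytes-per-character figure assumes MySQL's utf8 (utf8mb3) charset.

```python
# Maximum byte lengths of MySQL's TEXT family.
TEXT_LIMITS = [
    ("TINYTEXT", 2**8 - 1),
    ("TEXT", 2**16 - 1),
    ("MEDIUMTEXT", 2**24 - 1),
    ("LONGTEXT", 2**32 - 1),
]

def smallest_text_type(max_bytes):
    """Return the smallest TEXT type whose byte limit covers max_bytes."""
    for name, limit in TEXT_LIMITS:
        if max_bytes <= limit:
            return name
    raise ValueError("value too large for any TEXT type")

print(smallest_text_type(500 * 3))   # 500 utf8 chars at up to 3 bytes each -> TEXT
print(smallest_text_type(100_000))   # MEDIUMTEXT
```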
Since you are using MySQL, you can take advantage of STRAIGHT_JOIN.
STRAIGHT_JOIN forces the optimizer to join the tables in the order in which they are listed in the FROM clause. You can use this to speed up a query if the optimizer joins the tables in a nonoptimal order.
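STRAIGHT_JOIN itself is MySQL-only, so this sketch swaps in SQLite, where a CROSS JOIN plays a similar role by pinning the left-hand table as the outer loop. It only illustrates the idea of forcing classifications to drive the join; the schema is hypothetical and the plan wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stories (id INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE classifications (id INTEGER PRIMARY KEY, story_id INTEGER, name TEXT);
""")

# SQLite analog of STRAIGHT_JOIN: CROSS JOIN keeps the written table order,
# so classifications is scanned first and stories is probed by primary key.
sql = """
    SELECT stories.id FROM classifications
    CROSS JOIN stories ON stories.id = classifications.story_id
    WHERE classifications.name LIKE 'Home:Top%'
"""
details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
print(details)
```

In MySQL you would instead write SELECT STRAIGHT_JOIN ... FROM classifications INNER JOIN stories ..., and confirm the order with EXPLAIN.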
Another area for improvement is filtering the stories table up front, since you only need rows with status 1.
So in the FROM clause, instead of joining the whole stories table, join only the rows you need: your query plan shows 434792 rows being examined for stories, and the same goes for the classifications table.
FROM
(SELECT
*
FROM
stories
WHERE
STORIES.status = 1) stories
INNER JOIN
(SELECT
*
FROM
classifications
WHERE
classifications.`name` LIKE 'Home:Top%') classifications
ON stories.id = classifications.story_id
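Pre-filtering inside derived tables must return the same rows as filtering in the outer WHERE clause; what changes is how much data reaches the join. This sketch checks the equivalence with an in-memory SQLite database on hypothetical rows. Whether it is actually faster depends on the optimizer, and newer optimizers may merge such derived tables back into the outer query anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stories (id INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE classifications (id INTEGER PRIMARY KEY, story_id INTEGER, name TEXT);
    INSERT INTO stories VALUES (1, 1), (2, 0), (3, 1);
    INSERT INTO classifications (story_id, name) VALUES
        (1, 'Home:Top'), (2, 'Home:Top'), (3, 'Sports');
""")

# Filter after joining the full tables.
filter_late = conn.execute("""
    SELECT stories.id FROM stories
    JOIN classifications ON stories.id = classifications.story_id
    WHERE stories.status = 1 AND classifications.name LIKE 'Home:Top%'
""").fetchall()

# Filter each side first, then join the reduced row sets.
filter_early = conn.execute("""
    SELECT s.id
    FROM (SELECT * FROM stories WHERE status = 1) AS s
    JOIN (SELECT * FROM classifications WHERE name LIKE 'Home:Top%') AS c
      ON s.id = c.story_id
""").fetchall()

print(filter_late, filter_early)  # [(1,)] [(1,)]
```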
One more suggestion: you can increase sort_buffer_size, since the query uses both GROUP BY and ORDER BY and therefore sorts. Be careful when increasing it, though: each session that performs a sort allocates its own buffer of that size, so the memory cost grows with the number of connections.
If possible, you could also sort the records in your application, since you mentioned that removing the ORDER BY clause makes the query take only about 1/6 of the original time.
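Moving the sort into the application is straightforward when the result set is small and no LIMIT depends on the ordering (with ORDER BY ... LIMIT, the database must sort before it can pick the rows). A minimal sketch with Python's sqlite3 and a hypothetical created_at column:

```python
import sqlite3
from operator import itemgetter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stories (id INTEGER PRIMARY KEY, created_at TEXT);
    INSERT INTO stories VALUES (1, '2013-04-12'), (2, '2013-04-14'), (3, '2013-04-13');
""")

# Fetch without ORDER BY, then sort the (small) result set in Python.
rows = conn.execute("SELECT id, created_at FROM stories").fetchall()
newest_first = sorted(rows, key=itemgetter(1), reverse=True)
print([r[0] for r in newest_first])  # [2, 3, 1]
```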
EDIT
Add indexes on image_story.image_id and author_story.story_id, since these columns are used in joins.
Also create an index on images (position, id), since the query orders by those columns.
EDIT 16/4
Judging by your update, I think you have almost optimized your query.
One place you can still improve is using appropriate data types, as Bill Karwin mentioned.
You can use ENUM or TINYINT for columns like status, and others that have no scope for growth. This will help both query performance and the storage footprint of your table.
Hope this helps.
Computing
GROUP_CONCAT(DISTINCT classifications2.name SEPARATOR ';')
is probably the most time-consuming operation because classifications is a big table and the number of rows to work with is multiplied because of all the joins.
So I would recommend using a temporary table for that information.
Also, to avoid computing the LIKE condition twice (once for the temporary table and once for the "real" query), I would also create a temporary table for that.
Your original query, in a very simplified version (without the images and users tables, so that it's easier to read), is:
SELECT
stories.*,
count(DISTINCT comments.id) AS comments,
GROUP_CONCAT(DISTINCT classifications2.name ORDER BY 1 SEPARATOR ';' )
AS classifications_name
FROM
stories
LEFT JOIN classifications
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
WHERE
classifications.`name` LIKE 'Home:Top%'
AND stories.status = 1
GROUP BY stories.id
ORDER BY stories.id, classifications.`name`, classifications.`position`;
I would replace it with the following queries, using temporary tables _tmp_filtered_classifications (the ids of classifications with name LIKE 'Home:Top%') and _tmp_classifications_of_story (for each story id "contained" in _tmp_filtered_classifications, all classification names):
DROP TABLE IF EXISTS `_tmp_filtered_classifications`;
CREATE TEMPORARY TABLE _tmp_filtered_classifications
SELECT id FROM classifications WHERE name LIKE 'Home:Top%';
DROP TABLE IF EXISTS `_tmp_classifications_of_story`;
CREATE TEMPORARY TABLE _tmp_classifications_of_story ENGINE=MEMORY
SELECT stories.id AS story_id, classifications2.name
FROM
_tmp_filtered_classifications
INNER JOIN classifications
ON _tmp_filtered_classifications.id=classifications.id
INNER JOIN stories
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
GROUP BY 1,2;
SELECT
stories.*,
count(DISTINCT comments.id) AS comments,
GROUP_CONCAT(DISTINCT classifications2.name ORDER BY 1 SEPARATOR ';')
AS classifications_name
FROM
_tmp_filtered_classifications
INNER JOIN classifications
ON _tmp_filtered_classifications.id=classifications.id
INNER JOIN stories
ON stories.id = classifications.story_id
LEFT JOIN _tmp_classifications_of_story AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
WHERE
stories.status = 1
GROUP BY stories.id
ORDER BY stories.id, classifications.`name`, classifications.`position`;
Note that I added some more "order by" clauses to your query in order to check that both queries provide the same results (using diff). I also changed count(comments.id) to count(DISTINCT comments.id) otherwise the number of comments the query computes is wrong (again, because of the joins that multiply the number of rows).
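The row multiplication behind the COUNT fix is easy to reproduce. In this sketch (SQLite as a stand-in, hypothetical rows), one story with two classifications and three comments produces 2 × 3 = 6 joined rows, so a plain COUNT reports 6 while COUNT(DISTINCT) reports the true 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stories (id INTEGER PRIMARY KEY);
    CREATE TABLE classifications (id INTEGER PRIMARY KEY, story_id INTEGER, name TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, story_id INTEGER);
    INSERT INTO stories VALUES (1);
    INSERT INTO classifications (story_id, name) VALUES (1, 'Home:Top'), (1, 'World');
    INSERT INTO comments (story_id) VALUES (1), (1), (1);
""")

naive, distinct = conn.execute("""
    SELECT COUNT(comments.id), COUNT(DISTINCT comments.id)
    FROM stories
    LEFT JOIN classifications ON stories.id = classifications.story_id
    LEFT JOIN comments ON stories.id = comments.story_id
    GROUP BY stories.id
""").fetchone()
print(naive, distinct)  # 6 3
```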
I don't know enough about your data to experiment, but I do know that you should first perform the operation that matches the least data, and therefore eliminates the most data for subsequent operations.
Depending on how complex your overall query is, you may not be able to re-order the operations in this way. However, you can perform two separate queries, where the first one just eliminates data that is definitely not going to be needed, and then feeds its result to the second query. Someone else suggested using temporary tables, and that is a good way to handle that situation.
If you need any clarification of this strategy, let me know.
Update: A similar tactic, used when each operation matches approximately the same percentage of data, is to time each operation separately and run the fastest one first. Some search operations are quicker than others, and the fastest ones should come first if all other factors are equal. That way, the slower search operations have less data to work with, and overall performance improves.
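The measure-then-order idea can be sketched as a small timing helper. Everything here is hypothetical: in practice each callable would run one filtering query through your database driver against realistic data, whereas the stand-ins below just burn CPU.

```python
import time

def order_by_speed(operations, repeat=3):
    """Time each zero-argument callable and return (name, seconds) pairs, fastest first.

    Each callable runs `repeat` times and the minimum is kept, to reduce noise
    from other activity on the machine.
    """
    timings = []
    for name, run in operations.items():
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            run()
            best = min(best, time.perf_counter() - start)
        timings.append((name, best))
    return sorted(timings, key=lambda pair: pair[1])

# Hypothetical stand-ins for two filtering operations.
ops = {
    "status_filter": lambda: sum(range(1_000)),
    "like_filter": lambda: sum(range(200_000)),
}
for name, seconds in order_by_speed(ops):
    print(f"{name}: {seconds:.6f} s")
```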
My bet is that the LIKE condition is the worst part of your query.
Are you sure you have to do this?
Four steps:
Create an indexed boolean column IsHomeTop on the classifications table
Run UPDATE classifications SET IsHomeTop = 1 WHERE name LIKE 'Home:Top%'
Run your initial query with WHERE classifications.IsHomeTop = 1
Enjoy
Your query is too critical to let the LIKE operator drag down its performance.
Even if stories is updated a lot, I don't think the same is true of your classifications table. So give it a try and get rid of the LIKE operator.
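Here is a sketch of the flag-column steps, run against an in-memory SQLite database as a stand-in for MySQL (in MySQL the column would typically be TINYINT(1) or BOOLEAN). The rows are hypothetical. One caveat: rows inserted or renamed later also need their flag set, for example by the application or a trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classifications (id INTEGER PRIMARY KEY, story_id INTEGER, name TEXT);
    INSERT INTO classifications (story_id, name) VALUES
        (1, 'Home:Top'), (2, 'Home:Top:2'), (3, 'Sports'), (4, 'World:Home:Top');
    -- Steps 1 and 2: add the flag, index it, and populate it once.
    ALTER TABLE classifications ADD COLUMN IsHomeTop INTEGER NOT NULL DEFAULT 0;
    CREATE INDEX idx_is_home_top ON classifications (IsHomeTop);
    UPDATE classifications SET IsHomeTop = 1 WHERE name LIKE 'Home:Top%';
""")

# Step 3: filter on the indexed flag instead of re-evaluating LIKE.
by_flag = conn.execute(
    "SELECT story_id FROM classifications WHERE IsHomeTop = 1 ORDER BY story_id"
).fetchall()
by_like = conn.execute(
    "SELECT story_id FROM classifications WHERE name LIKE 'Home:Top%' ORDER BY story_id"
).fetchall()
print(by_flag, by_like)  # [(1,), (2,)] [(1,), (2,)]
```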
Some things you can try here:
1) Create a covering index on classifications.`name`
You can speed up the query by creating a covering index.
A covering index is one that contains all the columns a query selects, so the engine can answer the query from the index alone and never has to read the table rows, which significantly speeds up the SELECT.
CREATE TABLE classifications (
KEY class_name (name,...all columns)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
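2) Instead of classifications.name LIKE 'Home:Top%'
use LOCATE('Home:Top', classifications.name) = 1. LOCATE returns the 1-based position of the substring (0 if absent), so the = 1 keeps the prefix semantics of the LIKE; without it, names containing 'Home:Top' anywhere would also match. Measure both versions, though: unlike LIKE with a constant prefix, a condition that wraps the column in a function cannot use an index on name.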
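The difference between a prefix match and a substring match can be checked with SQLite's instr(), which, like MySQL's LOCATE(), returns a 1-based position or 0. Note that instr() takes its arguments in the opposite order (haystack first). This sketch uses hypothetical classification names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classifications (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO classifications (name) VALUES
        ('Home:Top'), ('Home:Top:2'), ('World:Home:Top'), ('Sports');
""")

def names(where):
    sql = "SELECT name FROM classifications WHERE " + where + " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

prefix_like = names("name LIKE 'Home:Top%'")
instr_is_1 = names("instr(name, 'Home:Top') = 1")   # prefix match, same as the LIKE
instr_any = names("instr(name, 'Home:Top') > 0")    # matches anywhere in the string
print(prefix_like, instr_is_1, instr_any)
```

Only the "= 1" form agrees with the LIKE; the "> 0" form also picks up 'World:Home:Top'.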

SQL Query Optimization When using Multiple Joins and Large Record Set

I am making a message board, and I am trying to retrieve regular topics (i.e., topics that are not stickied) and sort them by the date of the last posted message. I am able to accomplish this; however, when I have about 10,000 messages and 1500 topics, the query takes more than 60 seconds.
My question is, is there anything I can do to my query to increase performance or is my design fundamentally flawed?
Here is the query that I am using.
SELECT Messages.topic_id,
Messages.posted,
Topics.title,
Topics.user_id,
Users.username
FROM Messages
LEFT JOIN
Topics USING(topic_id)
LEFT JOIN
Users on Users.user_id = Topics.user_id
WHERE Messages.message_id IN (
SELECT MAX(message_id)
FROM Messages
GROUP BY topic_id)
AND Messages.topic_id
NOT IN (
SELECT topic_id
FROM StickiedTopics)
AND Messages.posted IN (
SELECT MIN(posted)
FROM Messages
GROUP BY message_id)
AND Topics.board_id=1
ORDER BY Messages.posted DESC LIMIT 50
Edit: Here is the EXPLAIN plan
+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+
| 1 | PRIMARY | Topics | ref | PRIMARY,board_id | board_id | 4 | const | 641 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | Users | eq_ref | PRIMARY | PRIMARY | 4 | spergs3.Topics.user_id | 1 | |
| 1 | PRIMARY | Messages | ref | topic_id | topic_id | 4 | spergs3.Topics.topic_id | 3 | Using where |
| 4 | DEPENDENT SUBQUERY | Messages | index | NULL | PRIMARY | 8 | NULL | 1 | |
| 3 | DEPENDENT SUBQUERY | StickiedTopics | index_subquery | topic_id | topic_id | 4 | func | 1 | Using index |
| 2 | DEPENDENT SUBQUERY | Messages | index | NULL | topic_id | 4 | NULL | 3 | Using index |
+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+
Indexes
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Messages | 0 | PRIMARY | 1 | message_id | A | 9956 | NULL | NULL | | BTREE | |
| Messages | 0 | PRIMARY | 2 | revision_no | A | 9956 | NULL | NULL | | BTREE | |
| Messages | 1 | user_id | 1 | user_id | A | 432 | NULL | NULL | | BTREE | |
| Messages | 1 | topic_id | 1 | topic_id | A | 3318 | NULL | NULL | | BTREE | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Topics | 0 | PRIMARY | 1 | topic_id | A | 1205 | NULL | NULL | | BTREE | |
| Topics | 1 | user_id | 1 | user_id | A | 133 | NULL | NULL | | BTREE | |
| Topics | 1 | board_id | 1 | board_id | A | 1 | NULL | NULL | | BTREE | |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Users | 0 | PRIMARY | 1 | user_id | A | 2051 | NULL | NULL | | BTREE | |
| Users | 0 | username_UNIQUE | 1 | username | A | 2051 | NULL | NULL | | BTREE | |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
I would start by getting the IDs of the qualifying topics, then join out to the other tables afterwards.
My inner query does a pre-qualification grouped by topic_id, taking the max message_id, so it returns just the distinct pre-qualified IDs. I've also applied a LEFT JOIN to StickiedTopics. Why? With a left join, I can see which topics are FOUND (those you want to exclude), so I've added a WHERE clause requiring the StickiedTopics topic_id to be NULL (i.e., NOT found). By doing this, we've ALREADY pared down the list SIGNIFICANTLY without several nested sub-queries. From THAT result, we can join to Messages, Topics (including the board_id = 1 qualifier) and Users, and get the parts we need. Finally, I apply a single WHERE IN sub-select for your MIN(posted) qualifier. I don't understand the basis of that, but I left it in as part of your original query. Then come the ORDER BY and LIMIT.
SELECT STRAIGHT_JOIN
M.topic_id,
M.posted,
T.title,
T.user_id,
U.username
FROM
( select
M1.Topic_ID,
MAX( M1.Message_id ) MaxMsgPerTopic
from
Messages M1
LEFT Join StickiedTopics ST
ON M1.Topic_ID = ST.Topic_ID
where
ST.Topic_ID IS NULL
group by
M1.Topic_ID ) PreQuery
JOIN Messages M
ON PreQuery.MaxMsgPerTopic = M.Message_ID
JOIN Topics T
ON M.Topic_ID = T.Topic_ID
AND T.Board_ID = 1
LEFT JOIN Users U
on T.User_ID = U.user_id
WHERE
M.posted IN ( SELECT MIN(posted)
FROM Messages
GROUP BY message_id)
ORDER BY
M.posted DESC
LIMIT 50
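The LEFT JOIN ... IS NULL anti-join can be checked against NOT IN on a toy data set. This sketch uses an in-memory SQLite database with hypothetical rows. It also shows that the IS NULL test only filters when it sits in the WHERE clause: inside ON it makes the join condition unsatisfiable, so every topic survives. Besides speed, the anti-join also avoids a NOT IN pitfall: if the subquery ever returns a NULL topic_id, NOT IN matches no rows at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Topics (topic_id INTEGER PRIMARY KEY, board_id INTEGER);
    CREATE TABLE StickiedTopics (topic_id INTEGER);
    INSERT INTO Topics VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO StickiedTopics VALUES (2);
""")

def ids(sql):
    return [row[0] for row in conn.execute(sql)]

not_in = ids("""SELECT topic_id FROM Topics
    WHERE topic_id NOT IN (SELECT topic_id FROM StickiedTopics) ORDER BY topic_id""")

# Anti-join: keep topics for which no StickiedTopics row was found.
anti_join = ids("""SELECT T.topic_id FROM Topics T
    LEFT JOIN StickiedTopics ST ON T.topic_id = ST.topic_id
    WHERE ST.topic_id IS NULL ORDER BY T.topic_id""")

# IS NULL inside ON: the join never matches, so nothing is excluded.
in_on_clause = ids("""SELECT T.topic_id FROM Topics T
    LEFT JOIN StickiedTopics ST ON T.topic_id = ST.topic_id AND ST.topic_id IS NULL
    ORDER BY T.topic_id""")

print(not_in, anti_join, in_on_clause)  # [1, 3] [1, 3] [1, 2, 3]
```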
I would guess that a big part of your problem lies in your subqueries. Try something like this:
SELECT Messages.topic_id,
Messages.posted,
Topics.title,
Topics.user_id,
Users.username
FROM Messages
LEFT JOIN
Topics USING(topic_id)
LEFT JOIN
StickiedTopics ON StickiedTopics.topic_id = Topics.topic_id
LEFT JOIN
Users on Users.user_id = Topics.user_id
WHERE Messages.message_id IN (
SELECT MAX(message_id)
FROM Messages m1
WHERE m1.topic_id = Messages.topic_id)
AND Messages.posted IN (
SELECT MIN(posted)
FROM Messages m2
GROUP BY message_id)
AND Topics.board_id=1
AND StickiedTopics.topic_id IS NULL
ORDER BY Messages.posted DESC LIMIT 50
I optimized the first subquery by removing the grouping. The second subquery was unnecessary because it can be replaced with a JOIN.
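The correlated form of the "latest message per topic" subquery returns the same rows as the grouped IN list. This sketch checks that on hypothetical rows with an in-memory SQLite database. With an index on topic_id, each correlated lookup can become an index probe, but whether it beats the grouped version in MySQL depends on the version and the data, so measure both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Messages (message_id INTEGER PRIMARY KEY, topic_id INTEGER, posted TEXT);
    CREATE INDEX idx_topic_id ON Messages (topic_id);
    INSERT INTO Messages VALUES
        (1, 10, '2013-01-01'), (2, 10, '2013-01-02'),
        (3, 20, '2013-01-03'), (4, 10, '2013-01-04'), (5, 20, '2013-01-05');
""")

# Original shape: build the list of per-topic maxima with GROUP BY.
grouped = conn.execute("""
    SELECT message_id FROM Messages
    WHERE message_id IN (SELECT MAX(message_id) FROM Messages GROUP BY topic_id)
    ORDER BY message_id
""").fetchall()

# Correlated shape: look up the max only for the current row's topic.
correlated = conn.execute("""
    SELECT message_id FROM Messages
    WHERE message_id = (SELECT MAX(m1.message_id) FROM Messages m1
                        WHERE m1.topic_id = Messages.topic_id)
    ORDER BY message_id
""").fetchall()

print(grouped, correlated)  # [(4,), (5,)] [(4,), (5,)]
```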
I'm not quite sure what this third subquery is supposed to do:
AND Messages.posted IN (
SELECT MIN(posted)
FROM Messages m2
GROUP BY message_id)
I might be able to help optimize this if I know what it's supposed to do. What exactly is posted - a date, integer, etc? What does it represent?