Proper index/query when using INNER JOIN - mysql

I am not sure how to make a decent index that will capture category/log_code properly. Maybe I also need to change my query? I appreciate any input!
All SELECTS contain:
SELECT logentry_id, date, log_codes.log_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
ORDER BY logentry_id DESC
The query can be as above, but it usually has a WHERE clause to specify the log_codes category to show, and/or the partner, and/or the customer. Examples of WHERE clauses:
WHERE partner_id = 1
WHERE log_codes.category_overview = 1
WHERE partner_id = 1 AND log_codes.category_overview = 1
WHERE partner_id = 1 AND customer_id = 1 AND log_codes.category_overview = 1
Database structure:
CREATE TABLE IF NOT EXISTS `log_codes` (
`log_code` smallint(6) NOT NULL,
`log_desc` varchar(255),
`category_mail` tinyint(1) NOT NULL,
`category_overview` tinyint(1) NOT NULL,
`category_cron` tinyint(1) NOT NULL,
`category_documents` tinyint(1) NOT NULL,
`category_error` tinyint(1) NOT NULL,
PRIMARY KEY (`log_code`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `log_entries` (
`logentry_id` int(11) NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL,
`log_code` smallint(6) NOT NULL,
`partner_id` int(11) NOT NULL,
`customer_id` int(11) NOT NULL,
PRIMARY KEY (`logentry_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;
EDIT: Added indexes on fields, here is output of SHOW INDEXES:
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| log_codes | 0 | PRIMARY | 1 | log_code | A | 97 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_mail | 1 | category_mail | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_overview | 1 | category_overview | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_cron | 1 | category_cron | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_documents | 1 | category_documents | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_error | 1 | category_error | A | 1 | NULL | NULL | | BTREE | | |
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| log_entries | 0 | PRIMARY | 1 | logentry_id | A | 163020 | NULL | NULL | | BTREE | | |
| log_entries | 1 | log_code | 1 | log_code | A | 90 | NULL | NULL | | BTREE | | |
| log_entries | 1 | partner_id | 1 | partner_id | A | 6 | NULL | NULL | YES | BTREE | | |
| log_entries | 1 | customer_id | 1 | customer_id | A | 20377 | NULL | NULL | YES | BTREE | | |
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
EDIT 2: Added composite indexes: (log_code, category_overview) and (category_overview, log_code) on log_codes, and (customer_id, partner_id) on log_entries.
Here is some EXPLAIN output (the query returns 66818 rows):
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| 1 | SIMPLE | log_entries | ref | log_code,partner_id | partner_id | 2 | const | 156110 | Using where; Using filesort |
| 1 | SIMPLE | log_codes | eq_ref | PRIMARY,code_overview,overview_code | PRIMARY | 2 | log_entries.log_code | 1 | Using where |
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
I also have some LEFT JOINs that I did not think would affect the index design, but they cause a "Using temporary" problem. Here is the EXPLAIN output (the query returns 66818 rows):
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
LEFT JOIN partners ON log_entries.partner_id = partners.partner_id
LEFT JOIN joined_table1 ON log_entries.t1_id = joined_table1.t1_id
LEFT JOIN joined_table2 ON log_entries.t2_id = joined_table2.t2_id
LEFT JOIN joined_table3 ON log_entries.t3_id = joined_table3.t3_id
LEFT JOIN joined_table4 ON joined_table3.t4_id = joined_table4.t4_id
LEFT JOIN joined_table5 ON log_entries.t5_id = joined_table5.t5_id
LEFT JOIN joined_table6 ON log_entries.t6_id = joined_table6.t6_id
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC;
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
| 1 | SIMPLE | log_codes | ref | PRIMARY,code_overview,overview_code | overview_code | 1 | const | 54 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | log_entries | ref | log_code,partner_id | log_code | 2 | log_codes.log_code | 1811 | Using where |
| 1 | SIMPLE | partners | const | PRIMARY | PRIMARY | 2 | const | 1 | Using index |
| 1 | SIMPLE | joined_table1 | eq_ref | PRIMARY | PRIMARY | 1 | log_entries.t1_id | 1 | Using index |
| 1 | SIMPLE | joined_table2 | eq_ref | PRIMARY | PRIMARY | 1 | log_entries.t2_id | 1 | Using index |
| 1 | SIMPLE | joined_table3 | eq_ref | PRIMARY | PRIMARY | 3 | log_entries.t3_id | 1 | |
| 1 | SIMPLE | joined_table4 | eq_ref | PRIMARY | PRIMARY | 3 | joined_table3.t4_id | 1 | Using index |
| 1 | SIMPLE | joined_table5 | eq_ref | PRIMARY | PRIMARY | 4 | log_entries.t5_id | 1 | Using index |
| 1 | SIMPLE | joined_table6 | eq_ref | PRIMARY | PRIMARY | 4 | log_entries.t6_id | 1 | Using index |
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
I don't know if it's a good or bad idea, but a subquery seems to get rid of the "Using temporary". Here is the EXPLAIN output for two common scenarios. This query returns 66818 rows:
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| 1 | PRIMARY | log_entries | ref | log_code,partner_id | partner_id | 2 | const | 156110 | Using where; Using filesort |
| 1 | PRIMARY | log_codes | eq_ref | PRIMARY,code_overview | PRIMARY | 2 | log_entries.log_code | 1 | |
| 2 | DEPENDENT SUBQUERY | log_codes | unique_subquery | PRIMARY,code_overview,overview_code | PRIMARY | 2 | func | 1 | Using where |
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
And an overview for a customer; this query returns 12 rows:
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_entries.customer_id = 10000
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
| 1 | PRIMARY | log_entries | ref | log_code,partner_id,customer_id,customer_partner | customer_id | 4 | const | 27 | Using where; Using filesort |
| 1 | PRIMARY | log_codes | eq_ref | PRIMARY,code_overview | PRIMARY | 2 | log_entries.log_code | 1 | |
| 2 | DEPENDENT SUBQUERY | log_codes | unique_subquery | PRIMARY,code_overview,overview_code | PRIMARY | 2 | func | 1 | Using where |
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+

There isn't a simple rule for guaranteed success when it comes to indexing - you need to look at a reasonable period of typical calls to work out what will help in terms of performance.
The following comments are therefore guidelines rather than absolute rules:
An index is "good" if it quickly gets you to a small subset of the data rather than if it eliminates only half of the data (e.g. there is rarely value in an index on a gender column where there are only M/F as the possible entries). So how unique are the values within e.g. log_code, category_overview and partner_id?
For a given query it is often helpful to have a "covering" index, that is one that includes all the fields that are used by the query - however, if there are too many fields from a single table in a query you instead want an index that includes the fields in the "where" or "join" clause to identify the row and then join back to the table storage to get all the fields required.
So given the information you've provided, a candidate index on log_codes would include log_code and category_overview. Similarly on log_entries for log_code and partner_id. However these would need to be evaluated for how they affect performance.
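As a sketch, those candidate indexes might be created like this (index names are illustrative; column order matters, so check each against EXPLAIN):
ALTER TABLE log_codes   ADD INDEX IX_code_overview (log_code, category_overview);
ALTER TABLE log_entries ADD INDEX IX_code_partner  (log_code, partner_id);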
Bear in mind that any given index may improve the read performance of a single query retrieving data but it will also slow down writes to the table where there is then a requirement to write more information i.e. where the new row fits in the additional index. This is why you need to look at the big picture of activity on the database to determine where indexes are worth it.

Well done for taking the time to update your question with the detail requested. I am sorry if that sounds patronising, but it is amazing the number of people who are not prepared to take the time to help themselves.
Adding a composite index across (customer_id, partner_id) on the log_entries table should give a significant benefit for the last of your example where clauses.
The output of your SHOW INDEXES for the log_codes table would suggest that it is not currently populated as it shows NULL for all but the PK. Is this the case?
EDIT Sorry. Just read your comment to KAJ's answer detailing table content. It might be worth running that SHOW INDEXES statement again as it looks like MySQL may have been building its stats.
Adding a composite index across (log_code, category_overview) for the log_codes table should help but you will need to check the explain output to see if it is being used.
As a very crude general rule you want to create composite indices starting with the columns with the highest cardinality but this is not always the case. It will depend heavily on data distribution and query structure.
UPDATE I have created a mockup of your dataset and added the following indices. They give significant improvement based on your sample WHERE clauses -
ALTER TABLE `log_codes`
ADD INDEX `IX_overview_code` (`category_overview`, `log_code`);
ALTER TABLE `log_entries`
ADD INDEX `IX_partner_code` (`partner_id`, `log_code`),
ADD INDEX `IX_customer_partner_code` (`customer_id`, `partner_id`, `log_code`);
The last index is quite expensive in terms of disk space and degradation of insert performance but gives very fast SELECT based on your final WHERE clause example. My sample dataset has just over 1M records in the log_entries table with quite even distribution across the partner and customer IDs. Three of your sample WHERE clauses execute in less than a second but the one with category_overview as the only criterion is very slow although still sub-second with only 200k rows.

Related

Joined query index not applied

I have a query:
SELECT `dsd_prefix`,
`dsd_partner`,
`eev1`.`eev_dse_element_name`,
`devd_explanation`,
`devd_min`,
`eev1`.`eev_dev_value`,
`devd_max`,
`devd_format`,
`devd_not_applicable`,
`devd_not_available`,
`dsd_nid`
FROM `devdescription`
INNER JOIN ekohubelementvalue AS `eev1`
ON `eev1`.`eev_dse_element_name` = `devd_element_name`
AND `eev1`.`eev_prefix` = `devd_prefix`
LEFT JOIN `ekohubelementvalue` AS `eev2`
ON `eev1`.`eev_prefix` = `eev2`.`eev_prefix`
AND `eev1`.`eev_dse_element_name` = `eev2`.`eev_dse_element_name`
AND `eev1`.`eev_subcategory` = `eev2`.`eev_subcategory`
AND `eev1`.`eev_company_id` = `eev2`.`eev_company_id`
AND `eev2`.`eev_date_updated` > `eev1`.`eev_date_updated`
INNER JOIN `datasourcedescription`
ON `eev1`.`eev_prefix` = `dsd_prefix`
WHERE (`eev1`.`eev_company_id` = 'ADD4027'
AND `eev2`.`eev_date_updated` IS NULL
AND `dsd_type_id` != 'MAJ'
AND `dsd_hide` = 'No'
AND (`devd_supress` IS NULL OR `devd_supress` <> 'Yes'))
GROUP BY `eev1`.`eev_dse_element_name`, `eev1`.`eev_prefix`
ORDER BY dsd_prefix
EXPLAIN of this query:
+----+-------------+-----------------------+------------+------+-----------------------------------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------------------------------------------------------------------------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+------------+------+-----------------------------------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------------------------------------------------------------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | datasourcedescription | NULL | ALL | PRIMARY,datasourcedescription_dsd_type_id | NULL | NULL | NULL | 688 | 10.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | eev1 | NULL | ref | eev_prefix,eev_company_id,earliest_and_latest,slice_by_date_for_company,sources_for_special_issue | earliest_and_latest | 47 | csrhub_data_1.datasourcedescription.dsd_prefix | 607 | 0.04 | Using where |
| 1 | SIMPLE | devdescription | NULL | ref | reports,supress,devd_element_name | reports | 816 | csrhub_data_1.datasourcedescription.dsd_prefix,csrhub_data_1.eev1.eev_dse_element_name | 1 | 50.00 | Using where |
| 1 | SIMPLE | eev2 | NULL | ref | eev_prefix,eev_company_id,earliest_and_latest,slice_by_date,slice_by_date_for_company,sources_for_special_issue | eev_prefix | 861 | csrhub_data_1.datasourcedescription.dsd_prefix,csrhub_data_1.eev1.eev_dse_element_name,csrhub_data_1.eev1.eev_company_id | 17 | 19.00 | Using where |
+----+-------------+-----------------------+------------+------+-----------------------------------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------------------------------------------------------------------------------+------+----------+----------------------------------------------+
As you can see, the datasourcedescription indexes are not being used even though they appear in possible_keys. The key column is NULL.
SHOW INDEXES FROM datasourcedescription;
+-----------------------+------------+-----------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------------------+------------+-----------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| datasourcedescription | 0 | PRIMARY | 1 | dsd_prefix | A | 688 | NULL | NULL | | BTREE | | | YES | NULL |
| datasourcedescription | 1 | datasourcedescription_dsd_type_id | 1 | dsd_type_id | A | 8 | NULL | NULL | YES | BTREE | | | YES | NULL |
+-----------------------+------------+-----------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
How to make the optimizer utilize datasourcedescription indexes?
In response to @O. Jones:
The datasourcedescription columns are dsd_prefix, dsd_type_id and dsd_hide
The table datasourcedescription has 727 rows.
The table ekohubelementvalue has nearly 300,000,000 (300M) rows
You mention the ekohubelementvalue table has nearly 300M rows. Your WHERE clause is based on a specific company ID. I would rewrite the query slightly, but also ensure the ekohubelementvalue table has an index with the company ID in the primary position and other columns to help cover the join/where criteria where possible. Also, with MySQL, I would add the STRAIGHT_JOIN keyword to tell MySQL to join the tables in the order you provided rather than guessing the order.
I would have the following indexes available (see the DDL sketch after this list):
ekohubelementvalue index on ( eev_company_id, eev_prefix, eev_dse_element_name, eev_subcategory, eev_date_updated )
devdescription index on ( devd_element_name, devd_prefix, devd_supress )
datasourcedescription index on ( dsd_prefix, dsd_type_id, dsd_hide )
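Expressed as DDL, those suggestions would look roughly like this (index names are illustrative):
ALTER TABLE ekohubelementvalue
  ADD INDEX IX_company_prefix_element (eev_company_id, eev_prefix, eev_dse_element_name, eev_subcategory, eev_date_updated);
ALTER TABLE devdescription
  ADD INDEX IX_element_prefix_supress (devd_element_name, devd_prefix, devd_supress);
ALTER TABLE datasourcedescription
  ADD INDEX IX_prefix_type_hide (dsd_prefix, dsd_type_id, dsd_hide);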
Since the ordering was by dsd_prefix, which is joined to eev_prefix, use eev_prefix from the primary table, which already has an optimized index component; let the primary table (not the lookup tables) be the basis of the GROUP BY and ORDER BY.
I also cleaned up the query a bit. It is easier to give long table names aliases so you can use the alias to qualify each column in the query and its joins.
SELECT STRAIGHT_JOIN
dsd.dsd_prefix,
dsd.dsd_partner,
eev1.eev_dse_element_name,
devd.devd_explanation,
devd.devd_min,
eev1.eev_dev_value,
devd.devd_max,
devd.devd_format,
devd.devd_not_applicable,
devd.devd_not_available,
dsd.dsd_nid
FROM
ekohubelementvalue AS eev1
INNER JOIN devdescription devd
ON eev1.eev_prefix = devd.devd_prefix
AND eev1.eev_dse_element_name = devd.devd_element_name
LEFT JOIN ekohubelementvalue AS eev2
ON eev1.eev_company_id = eev2.eev_company_id
AND eev1.eev_prefix = eev2.eev_prefix
AND eev1.eev_dse_element_name = eev2.eev_dse_element_name
AND eev1.eev_subcategory = eev2.eev_subcategory
AND eev1.eev_date_updated < eev2.eev_date_updated
INNER JOIN datasourcedescription dsd
ON eev1.eev_prefix = dsd.dsd_prefix
AND dsd.dsd_type_id != 'MAJ'
AND dsd.dsd_hide = 'No'
WHERE
eev1.eev_company_id = 'ADD4027'
AND ( devd.devd_supress IS NULL
OR devd.devd_supress <> 'Yes')
AND eev2.eev_date_updated IS NULL
GROUP BY
eev1.eev_prefix,
eev1.eev_dse_element_name
ORDER BY
eev1.eev_prefix

Query uses temporary table without force index

Query
SELECT SQL_NO_CACHE contacts.id,
contacts.date_modified contacts__date_modified
FROM contacts
INNER JOIN
(SELECT tst.team_set_id
FROM team_sets_teams tst
INNER JOIN team_memberships team_membershipscontacts ON (team_membershipscontacts.team_id = tst.team_id)
AND (team_membershipscontacts.user_id = '5daa2e92-c347-11e9-afc5-525400a80916')
AND (team_membershipscontacts.deleted = 0)
GROUP BY tst.team_set_id) contacts_tf ON contacts_tf.team_set_id = contacts.team_set_id
LEFT JOIN contacts_cstm contacts_cstm ON contacts_cstm.id_c = contacts.id
WHERE contacts.deleted = 0
ORDER BY contacts.date_modified DESC,
contacts.id DESC
LIMIT 21;
It takes extremely long (2 minutes on 2M records). I can't change this query, since it is system generated.
This is its EXPLAIN:
+----+-------------+--------------------------+------------+--------+-------------------------------------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------------------+------------+--------+-------------------------------------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
| 1 | PRIMARY | contacts | NULL | ref | idx_contacts_tmst_id,idx_del_date_modified,idx_contacts_del_last,idx_cont_del_reports,idx_del_id_user | idx_del_date_modified | 2 | const | 1113718 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 50.00 | Using where; Using join buffer (Block Nested Loop) |
| 1 | PRIMARY | contacts_cstm | NULL | eq_ref | PRIMARY | PRIMARY | 144 | sugarcrm.contacts.id | 1 | 100.00 | Using index |
| 3 | DERIVED | team_membershipscontacts | NULL | ref | idx_team_membership,idx_teammemb_team_user,idx_del_team_user | idx_team_membership | 145 | const | 2 | 99.36 | Using index condition; Using where; Using temporary; Using filesort |
| 3 | DERIVED | tst | NULL | ref | idx_ud_set_id,idx_ud_team_id,idx_ud_team_set_id,idx_ud_team_id_team_set_id | idx_ud_team_id_team_set_id | 144 | sugarcrm.team_membershipscontacts.team_id | 1 | 100.00 | Using index |
+----+-------------+--------------------------+------------+--------+-------------------------------------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
But when I use FORCE INDEX(idx_del_date_modified) (which is the same index used in the EXPLAIN above), the query takes just 0.01s and I get a slightly different EXPLAIN.
+----+-------------+--------------------------+------------+--------+----------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------------------+------------+--------+----------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
| 1 | PRIMARY | contacts | NULL | ref | idx_del_date_modified | idx_del_date_modified | 2 | const | 1113718 | 100.00 | Using where |
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 50.00 | Using where |
| 1 | PRIMARY | contacts_cstm | NULL | eq_ref | PRIMARY | PRIMARY | 144 | sugarcrm.contacts.id | 1 | 100.00 | Using index |
| 2 | DERIVED | team_membershipscontacts | NULL | ref | idx_team_membership,idx_teammemb_team_user,idx_del_team_user | idx_team_membership | 145 | const | 2 | 99.36 | Using index condition; Using where; Using temporary; Using filesort |
| 2 | DERIVED | tst | NULL | ref | idx_ud_set_id,idx_ud_team_id,idx_ud_team_set_id,idx_ud_team_id_team_set_id | idx_ud_team_id_team_set_id | 144 | sugarcrm.team_membershipscontacts.team_id | 1 | 100.00 | Using index |
+----+-------------+--------------------------+------------+--------+----------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
The first query uses a temporary table and filesort, but the query with FORCE INDEX uses just "Using where". Shouldn't the plans be the same? Why is the query with FORCE INDEX so much faster when the index used is the same?
According to the MySQL manual:
Temporary tables can be created under conditions such as these:
If there is an ORDER BY clause and a different GROUP BY clause, or if
the ORDER BY or GROUP BY contains columns from tables other than the
first table in the join queue, a temporary table is created.
DISTINCT combined with ORDER BY may require a temporary table.
If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory
temporary table, unless the query also contains elements (described
later) that require on-disk storage.
The performance difference is likely down to MySQL's query optimizer.
Even when an index exists, the optimizer may decide not to use it.
With FORCE INDEX(...) you force MySQL to use the index instead.
Please consider a detailed example here.
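For reference, the hint goes right after the table name in the FROM clause; a minimal sketch based on the query above:
SELECT SQL_NO_CACHE contacts.id,
       contacts.date_modified contacts__date_modified
FROM contacts FORCE INDEX (idx_del_date_modified)  -- hint placed immediately after the table name
INNER JOIN
  (SELECT tst.team_set_id
   FROM team_sets_teams tst
   INNER JOIN team_memberships team_membershipscontacts ON (team_membershipscontacts.team_id = tst.team_id)
     AND (team_membershipscontacts.user_id = '5daa2e92-c347-11e9-afc5-525400a80916')
     AND (team_membershipscontacts.deleted = 0)
   GROUP BY tst.team_set_id) contacts_tf ON contacts_tf.team_set_id = contacts.team_set_id
LEFT JOIN contacts_cstm contacts_cstm ON contacts_cstm.id_c = contacts.id
WHERE contacts.deleted = 0
ORDER BY contacts.date_modified DESC,
         contacts.id DESC
LIMIT 21;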

Query Optimization (WHERE, GROUP BY, LEFT JOINs)

I am using InnoDB.
QUERY, EXPLAIN & INDEXES
SELECT
stories.*,
count(comments.id) AS comments,
GROUP_CONCAT(
DISTINCT classifications2.name SEPARATOR ';'
) AS classifications_name,
GROUP_CONCAT(
DISTINCT images.id
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_id,
GROUP_CONCAT(
DISTINCT images.caption
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_caption,
GROUP_CONCAT(
DISTINCT images.thumbnail
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_thumbnail,
GROUP_CONCAT(
DISTINCT images.medium
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_medium,
GROUP_CONCAT(
DISTINCT images.large
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_large,
GROUP_CONCAT(
DISTINCT users.id
ORDER BY users.id SEPARATOR ';'
) AS authors_id,
GROUP_CONCAT(
DISTINCT users.display_name
ORDER BY users.id SEPARATOR ';'
) AS authors_display_name,
GROUP_CONCAT(
DISTINCT users.url
ORDER BY users.id SEPARATOR ';'
) AS authors_url
FROM
stories
LEFT JOIN classifications
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
LEFT JOIN image_story
ON stories.id = image_story.story_id
LEFT JOIN images
ON images.id = image_story.`image_id`
LEFT JOIN author_story
ON stories.id = author_story.story_id
LEFT JOIN users
ON users.id = author_story.author_id
WHERE classifications.`name` LIKE 'Home:Top%'
AND stories.status = 1
GROUP BY stories.id
ORDER BY classifications.`name`, classifications.`position`
+----+-------------+------------------+--------+---------------+----------+---------+------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+--------+---------------+----------+---------+------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | stories | ref | status | status | 1 | const | 434792 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | classifications | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | classifications2 | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | comments | ref | story_id | story_id | 8 | stories.id | 6 | Using where; Using index |
| 1 | SIMPLE | image_story | ref | story_id | story_id | 4 | stories.id | 1 | NULL |
| 1 | SIMPLE | images | eq_ref | PRIMARY | PRIMARY | 4 | image_story.image_id | 1 | NULL |
| 1 | SIMPLE | author_story | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | author_story.author_id | 1 | Using where |
+----+-------------+------------------+--------+---------------+----------+---------+------------------------+--------+----------------------------------------------+
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type |
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| stories | 0 | PRIMARY | 1 | id | A | 869584 | NULL | NULL | | BTREE |
| stories | 1 | created_at | 1 | created_at | A | 434792 | NULL | NULL | | BTREE |
| stories | 1 | source | 1 | source | A | 2 | NULL | NULL | YES | BTREE |
| stories | 1 | source_id | 1 | source_id | A | 869584 | NULL | NULL | YES | BTREE |
| stories | 1 | type | 1 | type | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | status | 1 | status | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | type_status | 1 | type | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | type_status | 2 | status | A | 2 | NULL | NULL | | BTREE |
| classifications | 0 | PRIMARY | 1 | id | A | 207 | NULL | NULL | | BTREE |
| classifications | 1 | story_id | 1 | story_id | A | 207 | NULL | NULL | | BTREE |
| classifications | 1 | name | 1 | name | A | 103 | NULL | NULL | | BTREE |
| classifications | 1 | name | 2 | position | A | 207 | NULL | NULL | YES | BTREE |
| comments | 0 | PRIMARY | 1 | id | A | 239336 | NULL | NULL | | BTREE |
| comments | 1 | status | 1 | status | A | 2 | NULL | NULL | | BTREE |
| comments | 1 | date | 1 | date | A | 239336 | NULL | NULL | | BTREE |
| comments | 1 | story_id | 1 | story_id | A | 39889 | NULL | NULL | | BTREE |
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
QUERY TIMES
It takes on average 0.035 seconds to run.
If I remove only the GROUP BY, the time drops to 0.007 on average.
If I remove only the stories.status=1 filter, the time drops to 0.025 on average. This one seems like it can be easily optimized.
And if I remove only the LIKE filter and ORDER BY clause, the time drops to 0.006 on average.
UPDATE 1: 2013-04-13
My understanding has improved manifold going through the answers.
I added indexes to author_story and image_story, which only improved the query to 0.025 seconds, but for some strange reason the EXPLAIN plan looks a whole lot better. At this point, removing ORDER BY drops the query to 0.015 seconds, and dropping both ORDER BY and GROUP BY improves query performance to 0.006. I assume these are the two things to focus on right now? I may move the ORDER BY into app logic if needed.
Here are the revised EXPLAIN and INDEXES
+----+-------------+------------------+--------+---------------------------------+----------+---------+--------------------------+------+--------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+--------+---------------------------------+----------+---------+--------------------------+------+--------------------------------------------------------+
| 1 | SIMPLE | classifications | range | story_id,name | name | 102 | NULL | 14 | Using index condition; Using temporary; Using filesort |
| 1 | SIMPLE | stories | eq_ref | PRIMARY,status | PRIMARY | 4 | classifications.story_id | 1 | Using where |
| 1 | SIMPLE | classifications2 | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | author_story | ref | author_id,story_id,author_story | story_id | 4 | stories.id | 1 | Using index condition |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | author_story.author_id | 1 | Using where |
| 1 | SIMPLE | comments | ref | story_id | story_id | 8 | stories.id | 8 | Using where; Using index |
| 1 | SIMPLE | image_story | ref | story_id,story_id_2 | story_id | 4 | stories.id | 1 | NULL |
| 1 | SIMPLE | images | eq_ref | PRIMARY,position_id | PRIMARY | 4 | image_story.image_id | 1 | NULL |
+----+-------------+------------------+--------+---------------------------------+----------+---------+--------------------------+------+--------------------------------------------------------+
+-----------------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| author_story | 0 | PRIMARY | 1 | id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 0 | story_author | 1 | story_id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 0 | story_author | 2 | author_id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 1 | author_id | 1 | author_id | A | 2179 | NULL | NULL | | BTREE | | |
| author_story | 1 | story_id | 1 | story_id | A | 220116 | NULL | NULL | | BTREE | | |
| image_story | 0 | PRIMARY | 1 | id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 0 | story_image | 1 | story_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 0 | story_image | 2 | image_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 1 | story_id | 1 | story_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 1 | image_id | 1 | image_id | A | 148902 | NULL | NULL | | BTREE | | |
| classifications | 0 | PRIMARY | 1 | id | A | 257 | NULL | NULL | | BTREE | | |
| classifications | 1 | story_id | 1 | story_id | A | 257 | NULL | NULL | | BTREE | | |
| classifications | 1 | name | 1 | name | A | 128 | NULL | NULL | | BTREE | | |
| classifications | 1 | name | 2 | position | A | 257 | NULL | NULL | YES | BTREE | | |
| stories | 0 | PRIMARY | 1 | id | A | 962570 | NULL | NULL | | BTREE | | |
| stories | 1 | created_at | 1 | created_at | A | 481285 | NULL | NULL | | BTREE | | |
| stories | 1 | source | 1 | source | A | 4 | NULL | NULL | YES | BTREE | | |
| stories | 1 | source_id | 1 | source_id | A | 962570 | NULL | NULL | YES | BTREE | | |
| stories | 1 | type | 1 | type | A | 2 | NULL | NULL | | BTREE | | |
| stories | 1 | status | 1 | status | A | 4 | NULL | NULL | | BTREE | | |
| stories | 1 | type_status | 1 | type | A | 2 | NULL | NULL | | BTREE | | |
| stories | 1 | type_status | 2 | status | A | 6 | NULL | NULL | | BTREE | | |
| comments | 0 | PRIMARY | 1 | id | A | 232559 | NULL | NULL | | BTREE | | |
| comments | 1 | status | 1 | status | A | 6 | NULL | NULL | | BTREE | | |
| comments | 1 | date | 1 | date | A | 232559 | NULL | NULL | | BTREE | | |
| comments | 1 | story_id | 1 | story_id | A | 29069 | NULL | NULL | | BTREE | | |
| images | 0 | PRIMARY | 1 | id | A | 147206 | NULL | NULL | | BTREE | | |
| images | 0 | source_id | 1 | source_id | A | 147206 | NULL | NULL | YES | BTREE | | |
| images | 1 | position | 1 | position | A | 4 | NULL | NULL | | BTREE | | |
| images | 1 | position_id | 1 | id | A | 147206 | NULL | NULL | | BTREE | | |
| images | 1 | position_id | 2 | position | A | 147206 | NULL | NULL | | BTREE | | |
| users | 0 | PRIMARY | 1 | id | A | 981 | NULL | NULL | | BTREE | | |
| users | 0 | users_email_unique | 1 | email | A | 981 | NULL | NULL | | BTREE | | |
+-----------------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
SELECT
stories.*,
count(comments.id) AS comments,
GROUP_CONCAT(DISTINCT users.id ORDER BY users.id SEPARATOR ';') AS authors_id,
GROUP_CONCAT(DISTINCT users.display_name ORDER BY users.id SEPARATOR ';') AS authors_display_name,
GROUP_CONCAT(DISTINCT users.url ORDER BY users.id SEPARATOR ';') AS authors_url,
GROUP_CONCAT(DISTINCT classifications2.name SEPARATOR ';') AS classifications_name,
GROUP_CONCAT(DISTINCT images.id ORDER BY images.position,images.id SEPARATOR ';') AS images_id,
GROUP_CONCAT(DISTINCT images.caption ORDER BY images.position,images.id SEPARATOR ';') AS images_caption,
GROUP_CONCAT(DISTINCT images.thumbnail ORDER BY images.position,images.id SEPARATOR ';') AS images_thumbnail,
GROUP_CONCAT(DISTINCT images.medium ORDER BY images.position,images.id SEPARATOR ';') AS images_medium,
GROUP_CONCAT(DISTINCT images.large ORDER BY images.position,images.id SEPARATOR ';') AS images_large
FROM
classifications
INNER JOIN stories
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
LEFT JOIN image_story
ON stories.id = image_story.story_id
LEFT JOIN images
ON images.id = image_story.`image_id`
INNER JOIN author_story
ON stories.id = author_story.story_id
INNER JOIN users
ON users.id = author_story.author_id
WHERE classifications.`name` LIKE 'Home:Top%'
AND stories.status = 1
GROUP BY stories.id
ORDER BY NULL
UPDATE 2: 2013-04-14
I noticed one other thing. If I don't SELECT stories.content (LONGTEXT) and stories.content_html (LONGTEXT) the query drops from 0.015 seconds to 0.006 seconds. For now I am considering if I can do without content and content_html or replace them with something else.
I have updated the query, indexes and explain in the 2013-04-13 update above instead of re-posting in this one since they were minor and incremental. The query is still using filesort. I can't get rid of GROUP BY but have gotten rid of ORDER BY.
UPDATE 3: 2013-04-16
As requested, I dropped the story_id indexes from both image_story and author_story as they were redundant. The only change in the EXPLAIN output was in the possible_keys column; it still doesn't show the "Using index" optimization, unfortunately.
Also changed LONGTEXT to TEXT and am now fetching LEFT(stories.content, 500) instead of stories.content which is making a very significant difference in query execution time.
+----+-------------+------------------+--------+-----------------------------+--------------+---------+--------------------------+------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+--------+-----------------------------+--------------+---------+--------------------------+------+---------------------------------------------------------------------+
| 1 | SIMPLE | classifications | ref | story_id,name,name_position | name | 102 | const | 10 | Using index condition; Using where; Using temporary; Using filesort |
| 1 | SIMPLE | stories | eq_ref | PRIMARY,status | PRIMARY | 4 | classifications.story_id | 1 | Using where |
| 1 | SIMPLE | classifications2 | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | author_story | ref | story_author | story_author | 4 | stories.id | 1 | Using where; Using index |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | author_story.author_id | 1 | Using where |
| 1 | SIMPLE | comments | ref | story_id | story_id | 8 | stories.id | 8 | Using where; Using index |
| 1 | SIMPLE | image_story | ref | story_image | story_image | 4 | stories.id | 1 | Using index |
| 1 | SIMPLE | images | eq_ref | PRIMARY,position_id | PRIMARY | 4 | image_story.image_id | 1 | NULL |
+----+-------------+------------------+--------+-----------------------------+--------------+---------+--------------------------+------+---------------------------------------------------------------------+
innodb_buffer_pool_size
134217728
TABLE_NAME INDEX_LENGTH
image_story 10010624
image_story 4556800
image_story 0
TABLE_NAME INDEX_NAMES SIZE
dawn/image_story story_image 13921
I can see two opportunities for optimization right away:
Change an OUTER JOIN to INNER JOIN
Your query is currently scanning 434792 stories, and you should be able to narrow that down better, assuming not every story has a classification matching 'Home:Top%'. It would be better to use an index to find the classifications you're looking for, and then look up the matching stories.
But you're using LEFT OUTER JOIN for classifications, meaning all stories will be scanned whether they have a matching classification or not. Then you're defeating that by putting a condition on classifications in the WHERE clause, effectively making it mandatory that there be a classification matching your pattern with LIKE. So it's no longer an outer join, it's an inner join.
If you put the classifications table first, and make it an inner join, the optimizer will use that to narrow down the search for stories just to those that have a matching classification.
. . .
FROM
classifications
INNER JOIN stories
ON stories.id = classifications.story_id
. . .
The optimizer is supposed to be able to figure out when it's advantageous to re-order tables, so you may not have to change the order in your query. But you do need to use an INNER JOIN in this case.
Add compound indexes
Your intersection tables image_story and author_story don't have compound indexes. It's often a big advantage to add compound indexes to the intersection tables in a many-to-many relationship, so that they can perform the join and get the "Using index" optimization.
ALTER TABLE image_story ADD UNIQUE KEY imst_st_im (story_id, image_id);
ALTER TABLE author_story ADD UNIQUE KEY aust_st_au (story_id, author_id);
Re your comments and update:
I'm not sure you created the new indexes correctly. Your dump of the indexes doesn't show the columns, and according to the updated EXPLAIN, the new indexes aren't being used, which I would expect to happen. Using the new indexes would result in "Using index" in the extra field of EXPLAIN, which should help performance.
Output of SHOW CREATE TABLE for each table would be more complete information than a dump of the indexes (without column names) as you have shown.
You may have to run ANALYZE TABLE once on each of those tables after creating the indexes. Also, run the query more than once, to make sure the indexes are in the buffer pool. Is this table InnoDB or MyISAM?
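For example, assuming the table names above:
SHOW CREATE TABLE image_story;
SHOW CREATE TABLE author_story;
ANALYZE TABLE image_story, author_story;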
I also notice in your EXPLAIN output that the rows column shows a lot fewer rows being touched. That's an improvement.
Do you really need the ORDER BY? If you use ORDER BY NULL you should be able to get rid of the "Using filesort" and that may improve performance.
Re your update:
You still aren't getting the "Using index" optimization from your image_story and author_story tables. One suggestion I'd have is to eliminate the redundant indexes:
ALTER TABLE image_story DROP KEY story_id;
ALTER TABLE author_story DROP KEY story_id;
The reason is that any query that could benefit from the single-column index on story_id can also benefit from the two-column index on (story_id,image_id). Eliminating the redundant index helps the optimizer make a better decision (as well as saving some space). This is the theory behind a tool like pt-duplicate-key-checker.
I'd also check to make sure your buffer pool is large enough to hold your indexes. You don't want indexes to be paging in and out of the buffer pool during a query.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size'
Check the size of indexes for your image_story table:
SELECT TABLE_NAME, INDEX_LENGTH FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'image_story';
And compare that to how much of those indexes are currently residing in the buffer pool:
SELECT TABLE_NAME, GROUP_CONCAT(DISTINCT INDEX_NAME) AS INDEX_NAMES, SUM(DATA_SIZE) AS SIZE
FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE_LRU
WHERE TABLE_NAME = '`test`.`image_story`' AND INDEX_NAME <> 'PRIMARY'
Of course, change `test` above to the database name your table belongs to.
That information_schema table is new in MySQL 5.6. I assume you're using MySQL 5.6 because your EXPLAIN shows "Using index condition" which is also new in MySQL 5.6.
I don't use LONGTEXT at all unless I really need to store very long strings. Keep in mind:
TEXT holds up to 64KB
MEDIUMTEXT holds up to 16MB
LONGTEXT holds up to 4GB
As you are using MySQL, you can take advantage of STRAIGHT_JOIN:
STRAIGHT_JOIN forces the optimizer to join the tables in the order in which they are listed in the FROM clause. You can use this to speed up a query if the optimizer joins the tables in nonoptimal order
Another scope for improvement is filtering the stories table early, since you only need rows with status = 1.
So in the FROM clause, instead of adding the whole stories table, add only the needed records, as your query plan shows 434792 rows there; do the same for the classifications table:
FROM
(SELECT
*
FROM
STORIES
WHERE
STORIES.status = 1) stories
LEFT JOIN
(SELECT
*
FROM
classifications
WHERE
classifications.`name` LIKE 'Home:Top%') classifications
ON stories.id = classifications.story_id
One more suggestion: you can increase sort_buffer_size since your plan shows ORDER BY and GROUP BY, but be careful when increasing it because the buffer is allocated per session.
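If you do experiment with it, you can raise it for the current session only rather than globally; the value here is just an example:
SET SESSION sort_buffer_size = 4 * 1024 * 1024;  -- 4MB for this connection only; tune to your workload
SHOW VARIABLES LIKE 'sort_buffer_size';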
Also, if possible, order the records in your application, since you mentioned that removing the ORDER BY clause cuts the query time to about 1/6 of the original...
EDIT
Add indexes on image_story.image_id and author_story.story_id, as these columns are used in the joins.
Also, an index on (images.position, images.id) should be created, as you are ordering by those columns.
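As a sketch (the index name is illustrative):
ALTER TABLE images ADD INDEX IX_images_position_id (position, id);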
EDIT 16/4
I think you have almost fully optimized your query, judging by your update...
One more place you can improve is using appropriate data types, as Bill Karwin has mentioned...
You can use ENUM or TINYINT for columns like status and others that don't have much scope for growth; it will help both query performance and the storage footprint of your table...
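For example, something along these lines (the target type and default are assumptions; check the column's actual values first):
ALTER TABLE stories MODIFY status TINYINT NOT NULL DEFAULT 0;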
Hope this helps....
Computing
GROUP_CONCAT(DISTINCT classifications2.name SEPARATOR ';')
is probably the most time-consuming operation because classifications is a big table and the number of rows to work with is multiplied because of all the joins.
So I would recommend using a temporary table for that information.
Also, to avoid computing the LIKE condition twice (once for the temporary table and once for the "real" query), I would also create a temporary table for that.
Your original query, in a very simplified version (without the images and users table so that it's easier to read) is:
SELECT
stories.*,
count(DISTINCT comments.id) AS comments,
GROUP_CONCAT(DISTINCT classifications2.name ORDER BY 1 SEPARATOR ';' )
AS classifications_name
FROM
stories
LEFT JOIN classifications
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
WHERE
classifications.`name` LIKE 'Home:Top%'
AND stories.status = 1
GROUP BY stories.id
ORDER BY stories.id, classifications.`name`, classifications.`position`;
I would replace it with the following queries, with temporary tables _tmp_filtered_classifications (the ids of classifications with name LIKE Home:Top%') and _tmp_classifications_of_story (for each story id 'contained' in _tmp_filtered_classifications, all classification names):
DROP TABLE IF EXISTS `_tmp_filtered_classifications`;
CREATE TEMPORARY TABLE _tmp_filtered_classifications
SELECT id FROM classifications WHERE name LIKE 'Home:Top%';
DROP TABLE IF EXISTS `_tmp_classifications_of_story`;
CREATE TEMPORARY TABLE _tmp_classifications_of_story ENGINE=MEMORY
SELECT stories.id AS story_id, classifications2.name
FROM
_tmp_filtered_classifications
INNER JOIN classifications
ON _tmp_filtered_classifications.id=classifications.id
INNER JOIN stories
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
GROUP BY 1,2;
SELECT
stories.*,
count(DISTINCT comments.id) AS comments,
GROUP_CONCAT(DISTINCT classifications2.name ORDER BY 1 SEPARATOR ';')
AS classifications_name
FROM
_tmp_filtered_classifications
INNER JOIN classifications
ON _tmp_filtered_classifications.id=classifications.id
INNER JOIN stories
ON stories.id = classifications.story_id
LEFT JOIN _tmp_classifications_of_story AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
WHERE
stories.status = 1
GROUP BY stories.id
ORDER BY stories.id, classifications.`name`, classifications.`position`;
Note that I added some more "order by" clauses to your query in order to check that both queries provide the same results (using diff). I also changed count(comments.id) to count(DISTINCT comments.id) otherwise the number of comments the query computes is wrong (again, because of the joins that multiply the number of rows).
I don't know all the details of your data to experiment, but I do know that you should perform the operation first that will match the least amount of data and therefore eliminate the most amount of data for subsequent operations.
Depending on how complex your overall query is, you may not be able to re-order the operations in this way. However, you can perform two separate queries, where the first one just eliminates data that is definitely not going to be needed, and then feeds its result to the second query. Someone else suggested using temporary tables, and that is a good way to handle that situation.
If you need any clarification of this strategy, let me know.
Update: A similar tactic, used when each operation matches approximately the same percentage of data as the other operations, is to time each operation separately, and then run the operation first that uses the least amount of time. Some searching operations are quicker than others, and the fastest ones should be first if all other factors are equal. This way, the slower searching operations will have less data to work with, and the overall result will be higher performance.
My bet is that the LIKE condition is the worst thing in your query.
Are you sure you have to do this?
4 steps:
Create an indexed boolean column IsTopHome on the classifications table (see the sketch after this list)
Run UPDATE classifications SET IsTopHome = 1 WHERE NAME LIKE 'Home:Top%'
Run your initial query with WHERE classifications.IsTopHome = 1
Enjoy
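A sketch of steps 1 and 2 (the column and index names follow the steps above; test on a copy first):
ALTER TABLE classifications
  ADD COLUMN IsTopHome TINYINT(1) NOT NULL DEFAULT 0,
  ADD INDEX IX_IsTopHome (IsTopHome);
UPDATE classifications SET IsTopHome = 1 WHERE `name` LIKE 'Home:Top%';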
Your query is too critical for letting the LIKE operator decrease your performance.
And even if stories is updated a lot, I don't think that is the case for your classifications table. So give it a chance and eradicate the LIKE operator.
Some ways you can attempt here:
1) Create a covering index on classifications.`name`
You can speed up the query by creating a covering index.
A covering index refers to the case where all fields selected in a query are covered by an index; in that case InnoDB (not MyISAM) will never read the data in the table but will only use the data in the index, significantly speeding up the select.
CREATE TABLE classifications (
KEY class_name (name, story_id, position) -- plus any other columns the query reads from this table
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
2) Instead of classifications.name LIKE 'Home:Top%',
use LOCATE('Home:Top', classifications.name).
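If you try that, note that to keep the same prefix-match semantics as the LIKE, the match must start at position 1, e.g.:
SELECT stories.id
FROM stories
INNER JOIN classifications ON stories.id = classifications.story_id
WHERE LOCATE('Home:Top', classifications.`name`) = 1  -- same rows as name LIKE 'Home:Top%'
  AND stories.status = 1;
Be aware that LOCATE() generally cannot use an index on name the way a prefix LIKE can, so compare the EXPLAIN output for both forms.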

MySQL index slowing down query

MySQL Server version: 5.0.95
Tables All: InnoDB
I am having an issue with a MySQL db query. Basically I am finding that if I index a particular varchar(50) field tag.name, my queries take longer (x10) than not indexing the field. I am trying to speed this query up; however, my efforts seem to be counterproductive.
The culprit line and field seems to be:
WHERE `t`.`name` IN ('news','home')
I have noticed that if I query the tag table directly without a join, using the same criteria and with the name field indexed, I do not have the issue. It actually works faster, as expected.
EXAMPLE Query
SELECT `a`.*, `u`.`pen_name`
FROM `tag_link` `tl`
INNER JOIN `tag` `t`
ON `t`.`tag_id` = `tl`.`tag_id`
INNER JOIN `article` `a`
ON `a`.`article_id` = `tl`.`link_id`
INNER JOIN `user` `u`
ON `a`.`user_id` = `u`.`user_id`
WHERE `t`.`name` IN ('news','home')
AND `tl`.`type` = 'article'
AND `a`.`featured` = 'featured'
GROUP BY `article_id`
LIMIT 0 , 5
EXPLAIN with index
+----+-------------+-------+--------+--------------------------+---------+---------+-------------------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------+---------+---------+-------------------+------+-----------------------------------------------------------+
| 1 | SIMPLE | t | range | PRIMARY,name | name | 152 | NULL | 4 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | tl | ref | tag_id,link_id,link_id_2 | tag_id | 4 | portal.t.tag_id | 10 | Using where |
| 1 | SIMPLE | a | eq_ref | PRIMARY,fk_article_user1 | PRIMARY | 4 | portal.tl.link_id | 1 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | portal.a.user_id | 1 | |
+----+-------------+-------+--------+--------------------------+---------+---------+-------------------+------+-----------------------------------------------------------+
EXPLAIN without index
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
| 1 | SIMPLE | a | index | PRIMARY,fk_article_user1 | PRIMARY | 4 | NULL | 8742 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | portal.a.user_id | 1 | |
| 1 | SIMPLE | tl | ref | tag_id,link_id,link_id_2 | link_id | 4 | portal.a.article_id | 3 | Using where |
| 1 | SIMPLE | t | eq_ref | PRIMARY | PRIMARY | 4 | portal.tl.tag_id | 1 | Using where |
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
TABLE CREATE
CREATE TABLE `tag` (
`tag_id` int(11) NOT NULL auto_increment,
`name` varchar(50) NOT NULL,
`type` enum('layout','image') NOT NULL,
`create_dttm` datetime default NULL,
PRIMARY KEY (`tag_id`)
) ENGINE=InnoDB AUTO_INCREMENT=43077 DEFAULT CHARSET=utf8
INDEXES
SHOW INDEX FROM tag_link;
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tag_link | 0 | PRIMARY | 1 | tag_link_id | A | 42023 | NULL | NULL | | BTREE | |
| tag_link | 1 | tag_id | 1 | tag_id | A | 10505 | NULL | NULL | | BTREE | |
| tag_link | 1 | link_id | 1 | link_id | A | 14007 | NULL | NULL | | BTREE | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
SHOW INDEX FROM article;
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| article | 0 | PRIMARY | 1 | article_id | A | 5723 | NULL | NULL | | BTREE | |
| article | 1 | fk_article_user1 | 1 | user_id | A | 1 | NULL | NULL | | BTREE | |
| article | 1 | create_dttm | 1 | create_dttm | A | 5723 | NULL | NULL | YES | BTREE | |
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Final Solution
It seems that MySQL was simply processing the tables in a bad order for this query. In the end it turned out faster to treat the tag table as a subquery that returns just the ids.
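A rough sketch of that refactor, reusing the tables and filters from the example query above (the exact final query is not shown here, so treat this purely as an illustration):
-- Sketch: resolve the matching tag ids in a derived table first,
-- then join outwards to tag_link, article and user as before.
SELECT `a`.*, `u`.`pen_name`
FROM (SELECT `tag_id` FROM `tag` WHERE `name` IN ('news','home')) `t`
INNER JOIN `tag_link` `tl` ON `tl`.`tag_id` = `t`.`tag_id` AND `tl`.`type` = 'article'
INNER JOIN `article` `a` ON `a`.`article_id` = `tl`.`link_id` AND `a`.`featured` = 'featured'
INNER JOIN `user` `u` ON `u`.`user_id` = `a`.`user_id`
GROUP BY `a`.`article_id`
LIMIT 0, 5;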
It seems that article_id is the primary key for the article table.
Since you're grouping by article_id, MySQL needs the records ordered by that column in order to perform the GROUP BY.
You can see that without the index, it scans all records in the article table, but they're at least in order by article_id, so no later sort is required. The LIMIT optimization can be applied here: since the data is already in order, MySQL can simply stop after it gets five rows.
In the query with the index on tag.name, instead of scanning the entire article table, it uses the index, but against the tag table, and starts there. Unfortunately, when it does this, the records must later be sorted by article.article_id to complete the GROUP BY clause. The LIMIT optimization can't be applied, because the entire result set must be produced and then ordered before the first 5 rows can be returned.
In this case, MySQL just guesses wrongly.
Without the LIMIT clause, I'm guessing that using the index is faster, which is perhaps what MySQL was assuming.
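If you want to keep the index on tag.name, one way to test that theory (a sketch, not something from the original post; an index hint such as IGNORE INDEX (name) on the tag table would be another option) is to pin the join order so the scan still starts from article in primary-key order:
-- STRAIGHT_JOIN forces MySQL to join the tables in the order listed,
-- so article is read first (in PRIMARY KEY order) and the LIMIT can stop early.
SELECT STRAIGHT_JOIN `a`.*, `u`.`pen_name`
FROM `article` `a`
INNER JOIN `tag_link` `tl` ON `tl`.`link_id` = `a`.`article_id` AND `tl`.`type` = 'article'
INNER JOIN `tag` `t` ON `t`.`tag_id` = `tl`.`tag_id`
INNER JOIN `user` `u` ON `u`.`user_id` = `a`.`user_id`
WHERE `t`.`name` IN ('news','home')
AND `a`.`featured` = 'featured'
GROUP BY `a`.`article_id`
LIMIT 0, 5;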
How big are your tables?
I noticed in the first EXPLAIN you have "Using temporary; Using filesort", which is bad. Your query is likely being spilled to disk, which makes it much slower than an in-memory query.
Also try to avoid SELECT * and instead query only the minimum fields needed.
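For example, if the page only needs the article id, its creation date and the author's pen name, the example query could be trimmed to something like this (the column choice here is just an illustration):
-- Same query as above, selecting only the columns actually used.
SELECT `a`.`article_id`, `a`.`create_dttm`, `u`.`pen_name`
FROM `tag_link` `tl`
INNER JOIN `tag` `t` ON `t`.`tag_id` = `tl`.`tag_id`
INNER JOIN `article` `a` ON `a`.`article_id` = `tl`.`link_id`
INNER JOIN `user` `u` ON `a`.`user_id` = `u`.`user_id`
WHERE `t`.`name` IN ('news','home')
AND `tl`.`type` = 'article'
AND `a`.`featured` = 'featured'
GROUP BY `a`.`article_id`
LIMIT 0, 5;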

When ordering by date desc, "Using temporary" slows down query

I have a table for log entries, and a description table for the about 100 possible log codes:
CREATE TABLE `log_entries` (
`logentry_id` int(11) NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL,
`partner_id` smallint(4) NOT NULL,
`log_code` smallint(4) NOT NULL,
PRIMARY KEY (`logentry_id`),
KEY `IX_code` (`log_code`),
KEY `IX_partner_code` (`partner_id`,`log_code`)
) ENGINE=MyISAM ;
CREATE TABLE IF NOT EXISTS `log_codes` (
`log_code` smallint(4) NOT NULL DEFAULT '0',
`log_desc` varchar(255) DEFAULT NULL,
`category_overview` tinyint(1) NOT NULL DEFAULT '0',
`category_error` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`log_code`),
KEY `IX_overview_code` (`category_overview`,`log_code`),
KEY `IX_error_code` (`category_error`,`log_code`)
) ENGINE=MyISAM ;
The following query (matching 10k of 20k rows) executes in 0.0034 sec (using LIMIT 0,20):
SELECT log_entries.date, log_codes.log_desc FROM log_entries
INNER JOIN log_codes ON log_codes.log_code = log_entries.log_code
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1;
But when adding ORDER BY log_entries.logentry_id DESC, which is of course necessary, it slows down to 0.6 sec. Probably because "Using temporary" is used on the log_codes table? Removing the indexes actually makes the query perform faster, but still slow (0.3 sec).
EXPLAIN output of the query without ORDER BY:
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+-------------+
| 1 | SIMPLE | log_codes | ref | PRIMARY,IX_overview_code | IX_overview_code | 1 | const | 56 | |
| 1 | SIMPLE | log_entries | ref | IX_code,IX_partner_code | IX_partner_code | 7 | const,log_codes.log_code | 25 | Using where |
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+-------------+
And including the ORDER BY:
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+---------------------------------+
| 1 | SIMPLE | log_codes | ref | PRIMARY,IX_overview_code | IX_overview_code | 1 | const | 56 | Using temporary; Using filesort |
| 1 | SIMPLE | log_entries | ref | IX_code,IX_partner_code | IX_partner_code | 7 | const,log_codes.log_code | 25 | Using where |
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+---------------------------------+
Any hints on how to get this query to perform faster? I can't see why "Using temporary" should be needed, as the log codes should be chosen before fetching and sorting the appropriate log entries?
UPDATE #Eugen Rieck:
SELECT log_entries.date, lc.log_desc FROM log_entries
INNER JOIN (SELECT log_desc, log_code FROM log_codes WHERE category_overview = 1) AS lc
ON lc.log_code = log_entries.log_code
WHERE log_entries.partner_id = 1
ORDER BY log_entries.logentry_id;
+----+-------------+-------------+------+-------------------------+------------------+---------+-------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+-------------------------+------------------+---------+-------------------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 57 | Using temporary; Using filesort |
| 1 | PRIMARY | log_entries | ref | IX_code,IX_partner_code | IX_partner_code | 7 | const,lc.log_code | 25 | Using where |
| 2 | DERIVED | log_codes | ref | IX_overview_code | IX_overview_code | 1 | | 56 | |
+----+-------------+-------------+------+-------------------------+------------------+---------+-------------------+------+---------------------------------+
UPDATE #RolandoMySQLDBA:
With my original indexes, ORDER BY date DESC:
SELECT log_entries.date, log_codes.log_desc FROM
(SELECT log_code,date FROM log_entries WHERE partner_id = 1) log_entries
INNER JOIN
(SELECT log_code,log_desc FROM log_codes WHERE category_overview = 1) log_codes
USING (log_code)
ORDER BY log_entries.date DESC;
+----+-------------+-------------+------+------------------+------------------+---------+------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+------------------+------------------+---------+------+-------+---------------------------------+
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 57 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 21937 | Using where; Using join buffer |
| 3 | DERIVED | log_codes | ref | IX_overview_code | IX_overview_code | 1 | | 56 | |
| 2 | DERIVED | log_entries | ALL | IX_partner_code | NULL | NULL | NULL | 22787 | Using where |
+----+-------------+-------------+------+------------------+------------------+---------+------+-------+---------------------------------+
With your indexes, no ordering:
SELECT log_entries.date, log_codes.log_desc FROM
(SELECT log_code,date FROM log_entries WHERE partner_id = 1) log_entries
INNER JOIN
(SELECT log_code,log_desc FROM log_codes WHERE category_overview = 1) log_codes
USING (log_code);
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+--------------------------------+
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 57 | |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 21937 | Using where; Using join buffer |
| 3 | DERIVED | log_codes | index | IX_overview_code_desc | IX_overview_code_desc | 771 | NULL | 80 | Using where; Using index |
| 2 | DERIVED | log_entries | index | IX_partner_code_date | IX_partner_code_date | 15 | NULL | 22787 | Using where; Using index |
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+--------------------------------+
With your indexes, ORDER BY date DESC:
SELECT log_entries.date, log_codes.log_desc FROM
(SELECT log_code,date FROM log_entries WHERE partner_id = 1) log_entries
INNER JOIN
(SELECT log_code,log_desc FROM log_codes WHERE category_overview = 1) log_codes
USING (log_code)
ORDER BY log_entries.date DESC;
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+---------------------------------+
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 57 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 21937 | Using where; Using join buffer |
| 3 | DERIVED | log_codes | index | IX_overview_code_desc | IX_overview_code_desc | 771 | NULL | 80 | Using where; Using index |
| 2 | DERIVED | log_entries | index | IX_partner_code_date | IX_partner_code_date | 15 | NULL | 22787 | Using where; Using index |
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+---------------------------------+
UPDATE #Joe Stefanelli:
SELECT log_entries.date, log_codes.log_desc FROM log_entries
INNER JOIN log_codes ON log_codes.log_code = log_entries.log_code
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1
ORDER BY date DESC;
+----+-------------+-------------+------+--------------------------+-----------------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+--------------------------+-----------------+---------+--------------------------+------+----------------------------------------------+
| 1 | SIMPLE | log_codes | ALL | PRIMARY,IX_code_overview | NULL | NULL | NULL | 80 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | log_entries | ref | IX_code,IX_code_partner | IX_code_partner | 7 | log_codes.log_code,const | 25 | Using where |
+----+-------------+-------------+------+--------------------------+-----------------+---------+--------------------------+------+----------------------------------------------+
I think most of the problems here and in similar questions come from a misunderstanding of how MySQL (and other databases) use indexes for sorting. The answer is: MySQL does not use an index to sort data; it can only read data in the order of an index, or in the opposite direction. If you happen to want the data ordered in the order of the index currently being used, you are lucky; otherwise the result has to be sorted (hence the filesort in EXPLAIN).
That is, the order of the whole result mostly depends on which table comes first in the join. And if you look at your EXPLAIN you will see that the join starts from the log_codes table (because it is much smaller).
Basically, what you need is a composite index (partner_id, date) on log_entries, a covering composite index (log_code, category_overview, log_desc) on log_codes, a change from INNER JOIN to STRAIGHT_JOIN to force the join order, and ORDER BY date DESC (that index will fortunately be covering too).
UPD1: I am sorry, I mistyped the index for the first table: it should be (partner_id, log_code, date).
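As a concrete sketch of that suggestion (with the UPD1 correction applied), the two indexes could be created like this; the index names here are placeholders, not taken from the answer:
-- Composite index on log_entries, as described above (partner_id first, then log_code, then date).
ALTER TABLE log_entries ADD INDEX IX_partner_code_date (partner_id, log_code, date);
-- Covering index on log_codes so the join, the category_overview filter and log_desc
-- can all be served from the index without touching the table rows.
ALTER TABLE log_codes ADD INDEX IX_code_overview_desc (log_code, category_overview, log_desc);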
But I still struggle to understand why MySQL chooses to "use temporary" on the log_codes table (increasing the query time 100x) when I try to sort on a column in another table?
MySQL can either output data directly, as long as you accept the order in which it reads it, or put the data in a temporary table, sort it, and output it afterwards. When you order by a field from any table other than the first one in the join, MySQL has to actually sort the data (not just output it in the order of an index), and to sort data it needs a temporary table.
But as I get further into the dataset it is slower (6 sec for LIMIT 50000,25). Do you know why?
To output rows 50000,25 MySQL still needs to fetch the first 50000 rows and skip them. Since I missed a column in the index, MySQL did not just scan the index: for each item it also made an additional on-disk lookup for the log_code value. With the covering index that should be much faster, since all the data can be fetched from the index.
UPD2: try to force the index:
SELECT log_entries.date, log_codes.log_desc
FROM log_entries FORCE INDEX (IX_partner_code_date)
STRAIGHT_JOIN log_codes
ON log_codes.log_code = log_entries.log_code
WHERE log_entries.partner_id = 1
AND log_codes.category_overview = 1
ORDER BY log_entries.date DESC;
You are going to need two things
REFACTOR THE QUERY
SELECT log_entries.date, log_codes.log_desc FROM
(SELECT log_code,date FROM log_entries WHERE partner_id = 1) log_entries
INNER JOIN
(SELECT log_code,log_desc FROM log_codes WHERE category_overview = 1) log_codes
USING (log_code);
CREATE INDEXES TO SUPPORT SUBQUERIES AND REDUCE TABLE ACCESS
Before creating these indexes, run these
SELECT COUNT(1) rowcount,partner_id FROM log_entries GROUP BY partner_id;
SELECT COUNT(1) rowcount,category_overview FROM log_codes GROUP BY category_overview;
If none of the counts from all possible partner_id values exceed 5% of the log_entries table, create this index
ALTER TABLE log_entries ADD INDEX (partner_id,log_code,date);
If none of the counts from all possible category_overview values exceed 5% of the log_codes table, create this index
ALTER TABLE log_codes ADD INDEX (category_overview,log_code,log_desc);
Give it a Try !!!
Please try this refactored query with LIMIT 0,25 included
SELECT log_entries.date, log_codes.log_desc FROM
(
SELECT A.log_code FROM
(SELECT log_code FROM log_entries WHERE partner_id = 1) A INNER JOIN
(SELECT log_code FROM log_codes WHERE category_overview = 1) B USING (log_code)
LIMIT 0,25
) log_code_keys
INNER JOIN log_entries USING (log_code)
INNER JOIN log_codes USING (log_code);
I'd start by reversing the columns in the IX_partner_code and IX_overview_code indexes. That should make them better suited to support both the JOIN and the WHERE clause.
...
KEY `IX_code_partner` (`log_code`,`partner_id`)
...
KEY `IX_code_overview` (`log_code`,`category_overview`),
...
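If it helps, that change could be applied with something like the following (a sketch; it assumes the key names from the CREATE TABLE statements in the question, and the new names match the EXPLAIN output above):
-- Drop the original keys and recreate them with the column order reversed.
ALTER TABLE log_entries DROP INDEX IX_partner_code, ADD INDEX IX_code_partner (log_code, partner_id);
ALTER TABLE log_codes DROP INDEX IX_overview_code, ADD INDEX IX_code_overview (log_code, category_overview);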