I am using InnoDB.
QUERY, EXPLAIN & INDEXES
SELECT
stories.*,
count(comments.id) AS comments,
GROUP_CONCAT(
DISTINCT classifications2.name SEPARATOR ';'
) AS classifications_name,
GROUP_CONCAT(
DISTINCT images.id
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_id,
GROUP_CONCAT(
DISTINCT images.caption
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_caption,
GROUP_CONCAT(
DISTINCT images.thumbnail
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_thumbnail,
GROUP_CONCAT(
DISTINCT images.medium
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_medium,
GROUP_CONCAT(
DISTINCT images.large
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_large,
GROUP_CONCAT(
DISTINCT users.id
ORDER BY users.id SEPARATOR ';'
) AS authors_id,
GROUP_CONCAT(
DISTINCT users.display_name
ORDER BY users.id SEPARATOR ';'
) AS authors_display_name,
GROUP_CONCAT(
DISTINCT users.url
ORDER BY users.id SEPARATOR ';'
) AS authors_url
FROM
stories
LEFT JOIN classifications
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
LEFT JOIN image_story
ON stories.id = image_story.story_id
LEFT JOIN images
ON images.id = image_story.`image_id`
LEFT JOIN author_story
ON stories.id = author_story.story_id
LEFT JOIN users
ON users.id = author_story.author_id
WHERE classifications.`name` LIKE 'Home:Top%'
AND stories.status = 1
GROUP BY stories.id
ORDER BY classifications.`name`, classifications.`position`
+----+-------------+------------------+--------+---------------+----------+---------+------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+--------+---------------+----------+---------+------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | stories | ref | status | status | 1 | const | 434792 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | classifications | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | classifications2 | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | comments | ref | story_id | story_id | 8 | stories.id | 6 | Using where; Using index |
| 1 | SIMPLE | image_story | ref | story_id | story_id | 4 | stories.id | 1 | NULL |
| 1 | SIMPLE | images | eq_ref | PRIMARY | PRIMARY | 4 | image_story.image_id | 1 | NULL |
| 1 | SIMPLE | author_story | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | author_story.author_id | 1 | Using where |
+----+-------------+------------------+--------+---------------+----------+---------+------------------------+--------+----------------------------------------------+
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type |
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| stories | 0 | PRIMARY | 1 | id | A | 869584 | NULL | NULL | | BTREE |
| stories | 1 | created_at | 1 | created_at | A | 434792 | NULL | NULL | | BTREE |
| stories | 1 | source | 1 | source | A | 2 | NULL | NULL | YES | BTREE |
| stories | 1 | source_id | 1 | source_id | A | 869584 | NULL | NULL | YES | BTREE |
| stories | 1 | type | 1 | type | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | status | 1 | status | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | type_status | 1 | type | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | type_status | 2 | status | A | 2 | NULL | NULL | | BTREE |
| classifications | 0 | PRIMARY | 1 | id | A | 207 | NULL | NULL | | BTREE |
| classifications | 1 | story_id | 1 | story_id | A | 207 | NULL | NULL | | BTREE |
| classifications | 1 | name | 1 | name | A | 103 | NULL | NULL | | BTREE |
| classifications | 1 | name | 2 | position | A | 207 | NULL | NULL | YES | BTREE |
| comments | 0 | PRIMARY | 1 | id | A | 239336 | NULL | NULL | | BTREE |
| comments | 1 | status | 1 | status | A | 2 | NULL | NULL | | BTREE |
| comments | 1 | date | 1 | date | A | 239336 | NULL | NULL | | BTREE |
| comments | 1 | story_id | 1 | story_id | A | 39889 | NULL | NULL | | BTREE |
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
QUERY TIMES
It takes on average 0.035 seconds to run.
If I remove only the GROUP BY, the time drops to 0.007 on average.
If I remove only the stories.status=1 filter, the time drops to 0.025 on average. This one seems like it can be easily optimized.
And if I remove only the LIKE filter and ORDER BY clause, the time drops to 0.006 on average.
UPDATE 1: 2013-04-13
My understanding has improved manifold going through the answers.
I added indexes to author_story and image_story, which only seemed to improve the query to 0.025 seconds, although the EXPLAIN plan looks a whole lot better. At this point, removing ORDER BY drops the query to 0.015 seconds, and dropping both ORDER BY and GROUP BY improves it to 0.006 seconds. I assume these are the two things to focus on right now? I may move ORDER BY into app logic if needed.
Here are the revised EXPLAIN and INDEXES
+----+-------------+------------------+--------+---------------------------------+----------+---------+--------------------------+------+--------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+--------+---------------------------------+----------+---------+--------------------------+------+--------------------------------------------------------+
| 1 | SIMPLE | classifications | range | story_id,name | name | 102 | NULL | 14 | Using index condition; Using temporary; Using filesort |
| 1 | SIMPLE | stories | eq_ref | PRIMARY,status | PRIMARY | 4 | classifications.story_id | 1 | Using where |
| 1 | SIMPLE | classifications2 | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | author_story | ref | author_id,story_id,author_story | story_id | 4 | stories.id | 1 | Using index condition |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | author_story.author_id | 1 | Using where |
| 1 | SIMPLE | comments | ref | story_id | story_id | 8 | stories.id | 8 | Using where; Using index |
| 1 | SIMPLE | image_story | ref | story_id,story_id_2 | story_id | 4 | stories.id | 1 | NULL |
| 1 | SIMPLE | images | eq_ref | PRIMARY,position_id | PRIMARY | 4 | image_story.image_id | 1 | NULL |
+----+-------------+------------------+--------+---------------------------------+----------+---------+--------------------------+------+--------------------------------------------------------+
+-----------------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| author_story | 0 | PRIMARY | 1 | id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 0 | story_author | 1 | story_id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 0 | story_author | 2 | author_id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 1 | author_id | 1 | author_id | A | 2179 | NULL | NULL | | BTREE | | |
| author_story | 1 | story_id | 1 | story_id | A | 220116 | NULL | NULL | | BTREE | | |
| image_story | 0 | PRIMARY | 1 | id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 0 | story_image | 1 | story_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 0 | story_image | 2 | image_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 1 | story_id | 1 | story_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 1 | image_id | 1 | image_id | A | 148902 | NULL | NULL | | BTREE | | |
| classifications | 0 | PRIMARY | 1 | id | A | 257 | NULL | NULL | | BTREE | | |
| classifications | 1 | story_id | 1 | story_id | A | 257 | NULL | NULL | | BTREE | | |
| classifications | 1 | name | 1 | name | A | 128 | NULL | NULL | | BTREE | | |
| classifications | 1 | name | 2 | position | A | 257 | NULL | NULL | YES | BTREE | | |
| stories | 0 | PRIMARY | 1 | id | A | 962570 | NULL | NULL | | BTREE | | |
| stories | 1 | created_at | 1 | created_at | A | 481285 | NULL | NULL | | BTREE | | |
| stories | 1 | source | 1 | source | A | 4 | NULL | NULL | YES | BTREE | | |
| stories | 1 | source_id | 1 | source_id | A | 962570 | NULL | NULL | YES | BTREE | | |
| stories | 1 | type | 1 | type | A | 2 | NULL | NULL | | BTREE | | |
| stories | 1 | status | 1 | status | A | 4 | NULL | NULL | | BTREE | | |
| stories | 1 | type_status | 1 | type | A | 2 | NULL | NULL | | BTREE | | |
| stories | 1 | type_status | 2 | status | A | 6 | NULL | NULL | | BTREE | | |
| comments | 0 | PRIMARY | 1 | id | A | 232559 | NULL | NULL | | BTREE | | |
| comments | 1 | status | 1 | status | A | 6 | NULL | NULL | | BTREE | | |
| comments | 1 | date | 1 | date | A | 232559 | NULL | NULL | | BTREE | | |
| comments | 1 | story_id | 1 | story_id | A | 29069 | NULL | NULL | | BTREE | | |
| images | 0 | PRIMARY | 1 | id | A | 147206 | NULL | NULL | | BTREE | | |
| images | 0 | source_id | 1 | source_id | A | 147206 | NULL | NULL | YES | BTREE | | |
| images | 1 | position | 1 | position | A | 4 | NULL | NULL | | BTREE | | |
| images | 1 | position_id | 1 | id | A | 147206 | NULL | NULL | | BTREE | | |
| images | 1 | position_id | 2 | position | A | 147206 | NULL | NULL | | BTREE | | |
| users | 0 | PRIMARY | 1 | id | A | 981 | NULL | NULL | | BTREE | | |
| users | 0 | users_email_unique | 1 | email | A | 981 | NULL | NULL | | BTREE | | |
+-----------------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
SELECT
stories.*,
count(comments.id) AS comments,
GROUP_CONCAT(DISTINCT users.id ORDER BY users.id SEPARATOR ';') AS authors_id,
GROUP_CONCAT(DISTINCT users.display_name ORDER BY users.id SEPARATOR ';') AS authors_display_name,
GROUP_CONCAT(DISTINCT users.url ORDER BY users.id SEPARATOR ';') AS authors_url,
GROUP_CONCAT(DISTINCT classifications2.name SEPARATOR ';') AS classifications_name,
GROUP_CONCAT(DISTINCT images.id ORDER BY images.position,images.id SEPARATOR ';') AS images_id,
GROUP_CONCAT(DISTINCT images.caption ORDER BY images.position,images.id SEPARATOR ';') AS images_caption,
GROUP_CONCAT(DISTINCT images.thumbnail ORDER BY images.position,images.id SEPARATOR ';') AS images_thumbnail,
GROUP_CONCAT(DISTINCT images.medium ORDER BY images.position,images.id SEPARATOR ';') AS images_medium,
GROUP_CONCAT(DISTINCT images.large ORDER BY images.position,images.id SEPARATOR ';') AS images_large
FROM
classifications
INNER JOIN stories
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
LEFT JOIN image_story
ON stories.id = image_story.story_id
LEFT JOIN images
ON images.id = image_story.`image_id`
INNER JOIN author_story
ON stories.id = author_story.story_id
INNER JOIN users
ON users.id = author_story.author_id
WHERE classifications.`name` LIKE 'Home:Top%'
AND stories.status = 1
GROUP BY stories.id
ORDER BY NULL
UPDATE 2: 2013-04-14
I noticed one other thing. If I don't SELECT stories.content (LONGTEXT) and stories.content_html (LONGTEXT) the query drops from 0.015 seconds to 0.006 seconds. For now I am considering if I can do without content and content_html or replace them with something else.
I have updated the query, indexes and explain in the 2013-04-13 update above instead of re-posting in this one since they were minor and incremental. The query is still using filesort. I can't get rid of GROUP BY but have gotten rid of ORDER BY.
UPDATE 3: 2013-04-16
As requested, I dropped the story_id indexes from both image_story and author_story, as they are redundant. The only change in the EXPLAIN output was in the possible_keys column. Unfortunately, it still didn't show the "Using index" optimization.
Also changed LONGTEXT to TEXT and am now fetching LEFT(stories.content, 500) instead of stories.content which is making a very significant difference in query execution time.
+----+-------------+------------------+--------+-----------------------------+--------------+---------+--------------------------+------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+--------+-----------------------------+--------------+---------+--------------------------+------+---------------------------------------------------------------------+
| 1 | SIMPLE | classifications | ref | story_id,name,name_position | name | 102 | const | 10 | Using index condition; Using where; Using temporary; Using filesort |
| 1 | SIMPLE | stories | eq_ref | PRIMARY,status | PRIMARY | 4 | classifications.story_id | 1 | Using where |
| 1 | SIMPLE | classifications2 | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | author_story | ref | story_author | story_author | 4 | stories.id | 1 | Using where; Using index |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | author_story.author_id | 1 | Using where |
| 1 | SIMPLE | comments | ref | story_id | story_id | 8 | stories.id | 8 | Using where; Using index |
| 1 | SIMPLE | image_story | ref | story_image | story_image | 4 | stories.id | 1 | Using index |
| 1 | SIMPLE | images | eq_ref | PRIMARY,position_id | PRIMARY | 4 | image_story.image_id | 1 | NULL |
+----+-------------+------------------+--------+-----------------------------+--------------+---------+--------------------------+------+---------------------------------------------------------------------+
innodb_buffer_pool_size
134217728
TABLE_NAME INDEX_LENGTH
image_story 10010624
image_story 4556800
image_story 0
TABLE_NAME INDEX_NAMES SIZE
dawn/image_story story_image 13921
I can see two opportunities for optimization right away:
Change an OUTER JOIN to INNER JOIN
Your query is currently scanning 434792 stories, and you should be able to narrow that down better, assuming not every story has a classification matching 'Home:Top%'. It would be better to use an index to find the classifications you're looking for, and then look up the matching stories.
But you're using LEFT OUTER JOIN for classifications, meaning all stories will be scanned whether they have a matching classification or not. Then you're defeating that by putting a condition on classifications in the WHERE clause, effectively making it mandatory that there be a classification matching your pattern with LIKE. So it's no longer an outer join, it's an inner join.
If you put the classifications table first, and make it an inner join, the optimizer will use that to narrow down the search for stories just to those that have a matching classification.
. . .
FROM
classifications
INNER JOIN stories
ON stories.id = classifications.story_id
. . .
The optimizer is supposed to be able to figure out when it's advantageous to re-order tables, so you may not have to change the order in your query. But you do need to use an INNER JOIN in this case.
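The equivalence described above can be checked on a toy data set. This is a minimal sketch, assuming an in-memory SQLite database as a stand-in for MySQL and three made-up stories; only the table and column names come from the question. It shows that the LEFT JOIN form with a WHERE condition on classifications returns the same rows as the INNER JOIN form.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stories (id INTEGER PRIMARY KEY, status INTEGER NOT NULL);
    CREATE TABLE classifications (
        id INTEGER PRIMARY KEY,
        story_id INTEGER NOT NULL,
        name TEXT NOT NULL
    );
    -- Hypothetical rows: story 1 is on the home page, story 2 is classified
    -- elsewhere, story 3 has no classification at all.
    INSERT INTO stories (id, status) VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO classifications (id, story_id, name) VALUES
        (1, 1, 'Home:Top:1'),
        (2, 2, 'Sports');
""")

# Outer join, then a WHERE condition on the outer-joined table.
outer_sql = """
    SELECT stories.id
    FROM stories
    LEFT JOIN classifications ON stories.id = classifications.story_id
    WHERE classifications.name LIKE 'Home:Top%' AND stories.status = 1
    ORDER BY stories.id
"""

# Same filter written as an inner join with classifications first.
inner_sql = """
    SELECT stories.id
    FROM classifications
    INNER JOIN stories ON stories.id = classifications.story_id
    WHERE classifications.name LIKE 'Home:Top%' AND stories.status = 1
    ORDER BY stories.id
"""

outer_rows = conn.execute(outer_sql).fetchall()
inner_rows = conn.execute(inner_sql).fetchall()
print(outer_rows, inner_rows)
```

Story 3 survives the LEFT JOIN with a NULL classification name, but NULL never satisfies the LIKE condition, so the WHERE clause discards it again. That is why the outer join buys nothing here.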
Add compound indexes
Your intersection tables image_story and author_story don't have compound indexes. It's often a big advantage to add compound indexes to the intersection tables in a many-to-many relationship, so that they can perform the join and get the "Using index" optimization.
ALTER TABLE image_story ADD UNIQUE KEY imst_st_im (story_id, image_id);
ALTER TABLE author_story ADD UNIQUE KEY aust_st_au (story_id, author_id);
Re your comments and update:
I'm not sure you created the new indexes correctly. Your dump of the indexes doesn't show the columns, and according to the updated EXPLAIN, the new indexes aren't being used, which I would expect to happen. Using the new indexes would result in "Using index" in the extra field of EXPLAIN, which should help performance.
Output of SHOW CREATE TABLE for each table would be more complete information than a dump of the indexes (without column names) as you have shown.
You may have to run ANALYZE TABLE once on each of those tables after creating the indexes. Also, run the query more than once, to make sure the indexes are in the buffer pool. Is this table InnoDB or MyISAM?
I also notice in your EXPLAIN output that the rows column shows a lot fewer rows being touched. That's an improvement.
Do you really need the ORDER BY? If you use ORDER BY NULL you should be able to get rid of the "Using filesort" and that may improve performance.
Re your update:
You still aren't getting the "Using index" optimization from your image_story and author_story tables. One suggestion I'd have is to eliminate the redundant indexes:
ALTER TABLE image_story DROP KEY story_id;
ALTER TABLE author_story DROP KEY story_id;
The reason is that any query that could benefit from the single-column index on story_id can also benefit from the two-column index on (story_id,image_id). Eliminating the redundant index helps the optimizer make a better decision (as well as saving some space). This is the theory behind a tool like pt-duplicate-key-checker.
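The leftmost-prefix argument can be illustrated with a small sketch. It assumes SQLite as a stand-in (its planner and EXPLAIN QUERY PLAN output differ from MySQL's EXPLAIN), and it reuses the story_image index name from the question. With only the two-column index present, a lookup on story_id alone can still use it.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image_story (
        id INTEGER PRIMARY KEY,
        story_id INTEGER NOT NULL,
        image_id INTEGER NOT NULL
    );
    -- Only the compound index exists; there is no single-column story_id index.
    CREATE UNIQUE INDEX story_image ON image_story (story_id, image_id);
""")

# Ask the planner how it would find the images of one story.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT image_id FROM image_story WHERE story_id = ?",
    (1,),
).fetchall()

# The human-readable description is the fourth column of each plan row.
details = [row[3] for row in plan]
for detail in details:
    print(detail)
```

The printed plan should name story_image as the index used for the story_id lookup. Because the query reads only story_id and image_id, both of which are in the index, this is also the situation in which MySQL can report "Using index".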
I'd also check to make sure your buffer pool is large enough to hold your indexes. You don't want indexes to be paging in and out of the buffer pool during a query.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size'
Check the size of indexes for your image_story table:
SELECT TABLE_NAME, INDEX_LENGTH FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'image_story';
And compare that to how much of those indexes are currently residing in the buffer pool:
SELECT TABLE_NAME, GROUP_CONCAT(DISTINCT INDEX_NAME) AS INDEX_NAMES, SUM(DATA_SIZE) AS SIZE
FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE_LRU
WHERE TABLE_NAME = '`test`.`image_story`' AND INDEX_NAME <> 'PRIMARY'
Of course, change `test` above to the database name your table belongs to.
That information_schema table is new in MySQL 5.6. I assume you're using MySQL 5.6 because your EXPLAIN shows "Using index condition" which is also new in MySQL 5.6.
I don't use LONGTEXT at all unless I really need to store very long strings. Keep in mind:
TEXT holds up to 64KB
MEDIUMTEXT holds up to 16MB
LONGTEXT holds up to 4GB
Since you are using MySQL, you can take advantage of STRAIGHT_JOIN.
STRAIGHT_JOIN forces the optimizer to join the tables in the order in which they are listed in the FROM clause. You can use this to speed up a query if the optimizer joins the tables in a non-optimal order.
Another possible improvement is to filter the stories table up front, since you only need rows with status 1.
So in the FROM clause, instead of joining the whole stories table, join only the needed records, since your query plan shows 434792 rows being examined. Do the same for the classifications table:
FROM
(SELECT
*
FROM
stories
WHERE
stories.status = 1) stories
LEFT JOIN
(SELECT
*
FROM
classifications
WHERE
classifications.`name` LIKE 'Home:Top%') classifications
ON stories.id = classifications.story_id
One more suggestion: you can increase sort_buffer_size, since your plan shows sorting for both ORDER BY and GROUP BY. Be careful when increasing it, though, because this buffer is allocated per session, so a larger value costs memory for every session that sorts.
Also, if possible, sort the records in your application, since you mentioned that removing the ORDER BY clause cuts the query to about 1/6 of the original time.
EDIT
Add indexes on image_story.image_id and on author_story.story_id, as these columns are used in the joins.
An index on (images.position, images.id) should also be created, since you order by those columns.
EDIT 16/4
Judging by your update, I think you have almost fully optimized your query.
One place you can still improve is choosing appropriate data types, as BillKarwin has mentioned.
You can use ENUM or TINYINT for columns like status, and for other columns whose set of values is not going to grow. This helps both query performance and the storage size of your table.
Hope this helps....
Computing
GROUP_CONCAT(DISTINCT classifications2.name SEPARATOR ';')
is probably the most time-consuming operation because classifications is a big table and the number of rows to work with is multiplied because of all the joins.
So I would recommend using a temporary table for that information.
Also, to avoid computing the LIKE condition twice (once for the temporary table and once for the "real" query), I would also create a temporary table for that.
Your original query, in a very simplified version (without the images and users table so that it's easier to read) is:
SELECT
stories.*,
count(DISTINCT comments.id) AS comments,
GROUP_CONCAT(DISTINCT classifications2.name ORDER BY 1 SEPARATOR ';' )
AS classifications_name
FROM
stories
LEFT JOIN classifications
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
WHERE
classifications.`name` LIKE 'Home:Top%'
AND stories.status = 1
GROUP BY stories.id
ORDER BY stories.id, classifications.`name`, classifications.`position`;
I would replace it with the following queries, using two temporary tables: _tmp_filtered_classifications (the ids of classifications with name LIKE 'Home:Top%') and _tmp_classifications_of_story (for each story id 'contained' in _tmp_filtered_classifications, all of its classification names):
DROP TABLE IF EXISTS `_tmp_filtered_classifications`;
CREATE TEMPORARY TABLE _tmp_filtered_classifications
SELECT id FROM classifications WHERE name LIKE 'Home:Top%';
DROP TABLE IF EXISTS `_tmp_classifications_of_story`;
CREATE TEMPORARY TABLE _tmp_classifications_of_story ENGINE=MEMORY
SELECT stories.id AS story_id, classifications2.name
FROM
_tmp_filtered_classifications
INNER JOIN classifications
ON _tmp_filtered_classifications.id=classifications.id
INNER JOIN stories
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
GROUP BY 1,2;
SELECT
stories.*,
count(DISTINCT comments.id) AS comments,
GROUP_CONCAT(DISTINCT classifications2.name ORDER BY 1 SEPARATOR ';')
AS classifications_name
FROM
_tmp_filtered_classifications
INNER JOIN classifications
ON _tmp_filtered_classifications.id=classifications.id
INNER JOIN stories
ON stories.id = classifications.story_id
LEFT JOIN _tmp_classifications_of_story AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
WHERE
stories.status = 1
GROUP BY stories.id
ORDER BY stories.id, classifications.`name`, classifications.`position`;
Note that I added some more "order by" clauses to your query in order to check that both queries provide the same results (using diff). I also changed count(comments.id) to count(DISTINCT comments.id) otherwise the number of comments the query computes is wrong (again, because of the joins that multiply the number of rows).
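The row multiplication behind the wrong comment count is easy to reproduce. This is a minimal sketch, assuming an in-memory SQLite database as a stand-in for MySQL and hypothetical data: one story with two comments and three classifications.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stories (id INTEGER PRIMARY KEY);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, story_id INTEGER NOT NULL);
    CREATE TABLE classifications (
        id INTEGER PRIMARY KEY,
        story_id INTEGER NOT NULL,
        name TEXT NOT NULL
    );
    -- Hypothetical data: one story, two comments, three classifications.
    INSERT INTO stories (id) VALUES (1);
    INSERT INTO comments (id, story_id) VALUES (10, 1), (11, 1);
    INSERT INTO classifications (id, story_id, name) VALUES
        (100, 1, 'Home:Top:1'),
        (101, 1, 'Sports'),
        (102, 1, 'World');
""")

# Joining both child tables yields 3 x 2 = 6 rows for the single story,
# so a plain COUNT sees every comment three times.
plain_count, distinct_count = conn.execute("""
    SELECT COUNT(comments.id), COUNT(DISTINCT comments.id)
    FROM stories
    LEFT JOIN classifications ON stories.id = classifications.story_id
    LEFT JOIN comments ON stories.id = comments.story_id
    GROUP BY stories.id
""").fetchone()

print(plain_count, distinct_count)
```

With this data the plain count is 6 while the distinct count is 2, the real number of comments. Every additional one-to-many join in the original query multiplies the plain count further.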
I don't know all the details of your data to experiment, but I do know that you should perform the operation first that will match the least amount of data and therefore eliminate the most amount of data for subsequent operations.
Depending on how complex your overall query is, you may not be able to re-order the operations in this way. However, you can perform two separate queries, where the first one just eliminates data that is definitely not going to be needed, and then feeds its result to the second query. Someone else suggested using temporary tables, and that is a good way to handle that situation.
If you need any clarification of this strategy, let me know.
** Update: A similar tactic, used when each operation matches approximately the same percentage of data as the other operations, is to time each operation separately, and then run the operation first that uses the least amount of time. Some searching operations are quicker than others, and the fastest ones should be first if all other factors are equal. This way, the slower searching operations will have less data to work with, and the overall result will be higher performance.
My bet is that the LIKE condition is the worst part of your query.
Are you sure you have to do this?
4 steps:
Create an indexed boolean column IsHomeTop on the classifications table
Run UPDATE classifications SET IsHomeTop = 1 WHERE name LIKE 'Home:Top%'
Run your initial query with WHERE classifications.IsHomeTop = 1
Enjoy
Your query is too critical to let the LIKE operator hurt its performance.
And even if stories is updated a lot, I don't think that is the case for your classifications table. So give yourself a chance and get rid of the LIKE operator.
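The steps above can be sketched end to end. This is a minimal illustration, assuming an in-memory SQLite database as a stand-in for MySQL, hypothetical classification rows, and a hypothetical index name idx_is_home_top. It checks that the flag column selects the same rows as the LIKE filter.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classifications (
        id INTEGER PRIMARY KEY,
        story_id INTEGER NOT NULL,
        name TEXT NOT NULL
    );
    -- Hypothetical rows; only the first two match 'Home:Top%'.
    INSERT INTO classifications (id, story_id, name) VALUES
        (1, 1, 'Home:Top:1'),
        (2, 2, 'Home:Top:2'),
        (3, 3, 'Sports'),
        (4, 4, 'Home:Bottom');

    -- Step 1: add the flag column and index it (index name is made up).
    ALTER TABLE classifications ADD COLUMN IsHomeTop INTEGER NOT NULL DEFAULT 0;
    CREATE INDEX idx_is_home_top ON classifications (IsHomeTop);

    -- Step 2: populate the flag once from the existing names.
    UPDATE classifications SET IsHomeTop = 1 WHERE name LIKE 'Home:Top%';
""")

# Step 3: the equality filter should select the same rows as the LIKE filter.
like_ids = [row[0] for row in conn.execute(
    "SELECT id FROM classifications WHERE name LIKE 'Home:Top%' ORDER BY id")]
flag_ids = [row[0] for row in conn.execute(
    "SELECT id FROM classifications WHERE IsHomeTop = 1 ORDER BY id")]

print(like_ids, flag_ids)
```

The flag only stays correct if it is also set whenever a classification is inserted or renamed, so the application (or a trigger) would have to maintain it.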
Here are some approaches you can try:
1) Create a covering index on classifications.`name`
You can speed up the query by creating a covering index.
A covering index is one that contains every column the query reads from that table. In that case InnoDB can answer the query from the index alone, without reading the table rows, which significantly speeds up the SELECT.
CREATE TABLE classifications (
KEY class_name (name,...all columns)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
2) Instead of classifications.name LIKE 'Home:Top%'
use LOCATE('Home:Top', classifications.name) = 1, which matches only when the name starts with 'Home:Top'
Related
I have a query:
SELECT `dsd_prefix`,
`dsd_partner`,
`eev1`.`eev_dse_element_name`,
`devd_explanation`,
`devd_min`,
`eev1`.`eev_dev_value`,
`devd_max`,
`devd_format`,
`devd_not_applicable`,
`devd_not_available`,
`dsd_nid`
FROM `devdescription`
INNER JOIN ekohubelementvalue AS `eev1`
ON `eev1`.`eev_dse_element_name` = `devd_element_name`
AND `eev1`.`eev_prefix` = `devd_prefix`
LEFT JOIN `ekohubelementvalue` AS `eev2`
ON `eev1`.`eev_prefix` = `eev2`.`eev_prefix`
AND `eev1`.`eev_dse_element_name` = `eev2`.`eev_dse_element_name`
AND `eev1`.`eev_subcategory` = `eev2`.`eev_subcategory`
AND `eev1`.`eev_company_id` = `eev2`.`eev_company_id`
AND `eev2`.`eev_date_updated` > `eev1`.`eev_date_updated`
INNER JOIN `datasourcedescription`
ON `eev1`.`eev_prefix` = `dsd_prefix`
WHERE (`eev1`.`eev_company_id` = 'ADD4027'
AND `eev2`.`eev_date_updated` IS NULL
AND `dsd_type_id` != 'MAJ'
AND `dsd_hide` = 'No'
AND (`devd_supress` IS NULL OR `devd_supress` <> 'Yes'))
GROUP BY `eev1`.`eev_dse_element_name`, `eev1`.`eev_prefix`
ORDER BY dsd_prefix
EXPLAIN of this query:
+----+-------------+-----------------------+------------+------+-----------------------------------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------------------------------------------------------------------------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+------------+------+-----------------------------------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------------------------------------------------------------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | datasourcedescription | NULL | ALL | PRIMARY,datasourcedescription_dsd_type_id | NULL | NULL | NULL | 688 | 10.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | eev1 | NULL | ref | eev_prefix,eev_company_id,earliest_and_latest,slice_by_date_for_company,sources_for_special_issue | earliest_and_latest | 47 | csrhub_data_1.datasourcedescription.dsd_prefix | 607 | 0.04 | Using where |
| 1 | SIMPLE | devdescription | NULL | ref | reports,supress,devd_element_name | reports | 816 | csrhub_data_1.datasourcedescription.dsd_prefix,csrhub_data_1.eev1.eev_dse_element_name | 1 | 50.00 | Using where |
| 1 | SIMPLE | eev2 | NULL | ref | eev_prefix,eev_company_id,earliest_and_latest,slice_by_date,slice_by_date_for_company,sources_for_special_issue | eev_prefix | 861 | csrhub_data_1.datasourcedescription.dsd_prefix,csrhub_data_1.eev1.eev_dse_element_name,csrhub_data_1.eev1.eev_company_id | 17 | 19.00 | Using where |
+----+-------------+-----------------------+------------+------+-----------------------------------------------------------------------------------------------------------------+---------------------+---------+--------------------------------------------------------------------------------------------------------------------------+------+----------+----------------------------------------------+
As you can see, the datasourcedescription indexes are not being used even though they are listed in possible_keys. The key column is NULL.
SHOW INDEXES FROM datasourcedescription;
+-----------------------+------------+-----------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------------------+------------+-----------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| datasourcedescription | 0 | PRIMARY | 1 | dsd_prefix | A | 688 | NULL | NULL | | BTREE | | | YES | NULL |
| datasourcedescription | 1 | datasourcedescription_dsd_type_id | 1 | dsd_type_id | A | 8 | NULL | NULL | YES | BTREE | | | YES | NULL |
+-----------------------+------------+-----------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
How to make the optimizer utilize datasourcedescription indexes?
In response to @O. Jones:
The datasourcedescription columns are dsd_prefix, dsd_type_id and dsd_hide
The table datasourcedescription has 727 rows.
The table ekohubelementvalue has nearly 300,000,000 (300M) rows
You mention that ekohubelementvalue has nearly 300M rows, and your WHERE clause is based on a specific company ID. I would rewrite the query slightly, but also ensure the ekohubelementvalue table has an index with the company ID in the leading position, followed by other columns that help cover the join/WHERE criteria where possible. Also, with MySQL, I would add the STRAIGHT_JOIN keyword to tell MySQL to join the tables in the order you provided rather than letting it guess the order.
I would have the following indexes available
ekohubelementvalue index on ( eev_company_id, eev_prefix, eev_dse_element_name, eev_subcategory, eev_date_updated )
devdescription index on ( devd_element_name, devd_prefix, devd_supress )
datasourcedescription index on ( dsd_prefix, dsd_type_id, dsd_hide )
Since the ORDER BY was on dsd_prefix, which is joined to eev_prefix, use eev_prefix from the primary table instead, since it is already part of the optimized index. Let the primary table (not the lookup tables) be the basis of the GROUP BY and ORDER BY.
I also cleaned-up the query some. Easier to give aliases to long table names so you can use the alias for qualifying each column in the query and respective joins.
SELECT STRAIGHT_JOIN
dsd.dsd_prefix,
dsd.dsd_partner,
eev1.eev_dse_element_name,
devd.devd_explanation,
devd.devd_min,
eev1.eev_dev_value,
devd.devd_max,
devd.devd_format,
devd.devd_not_applicable,
devd.devd_not_available,
dsd.dsd_nid
FROM
ekohubelementvalue AS eev1
INNER JOIN devdescription devd
ON eev1.eev_prefix = devd.devd_prefix
AND eev1.eev_dse_element_name = devd.devd_element_name
LEFT JOIN ekohubelementvalue AS eev2
ON eev1.eev_company_id = eev2.eev_company_id
AND eev1.eev_prefix = eev2.eev_prefix
AND eev1.eev_dse_element_name = eev2.eev_dse_element_name
AND eev1.eev_subcategory = eev2.eev_subcategory
AND eev1.eev_date_updated < eev2.eev_date_updated
INNER JOIN datasourcedescription dsd
ON eev1.eev_prefix = dsd.dsd_prefix
AND dsd.dsd_type_id != 'MAJ'
AND dsd.dsd_hide = 'No'
WHERE
eev1.eev_company_id = 'ADD4027'
AND ( devd.devd_supress IS NULL
OR devd.devd_supress <> 'Yes')
AND eev2.eev_date_updated IS NULL
GROUP BY
eev1.eev_prefix,
eev1.eev_dse_element_name
ORDER BY
eev1.eev_prefix
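The LEFT JOIN against eev2 combined with `eev2.eev_date_updated IS NULL` is what keeps only the newest row per key: a row survives only when no later row exists for the same key. A minimal sketch of that pattern follows, using Python's built-in sqlite3 as a stand-in for MySQL. The `eev` table, its reduced column list, and the sample rows are assumptions made for illustration, not the real schema.

```python
import sqlite3

# SQLite stands in for MySQL; "eev" is a hypothetical cut-down version of
# ekohubelementvalue holding only the columns the self-join needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE eev (
    eev_company_id       TEXT,
    eev_prefix           TEXT,
    eev_dse_element_name TEXT,
    eev_date_updated     TEXT,   -- ISO dates compare correctly as text
    eev_dev_value        INTEGER
);
INSERT INTO eev VALUES
    ('ADD4027', 'P1', 'co2', '2013-01-01', 10),
    ('ADD4027', 'P1', 'co2', '2013-06-01', 20),
    ('ADD4027', 'P2', 'h2o', '2013-03-01', 30),
    ('OTHER',   'P1', 'co2', '2013-09-01', 99);
""")

# e2 matches any strictly newer row with the same key. When there is no such
# row, the LEFT JOIN yields NULLs, so "IS NULL" keeps exactly the newest rows.
latest = conn.execute("""
SELECT e1.eev_prefix, e1.eev_dse_element_name, e1.eev_dev_value
FROM eev AS e1
LEFT JOIN eev AS e2
       ON e1.eev_company_id       = e2.eev_company_id
      AND e1.eev_prefix           = e2.eev_prefix
      AND e1.eev_dse_element_name = e2.eev_dse_element_name
      AND e1.eev_date_updated     < e2.eev_date_updated
WHERE e1.eev_company_id = 'ADD4027'
  AND e2.eev_date_updated IS NULL
ORDER BY e1.eev_prefix
""").fetchall()

print(latest)
```

Only the 2013-06-01 row survives for P1/co2, and the row belonging to the other company never interferes because the company ID is part of the join key.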
I have stumbled upon a performance issue with this query, and I've been staring at it for a long time, scratching my head. The query was actually pretty fast at one point, but as the data grew it became slower and slower. The 'Posts' table has over 5 million rows and the 'Items' table has over 6,000 rows. Both tables grow daily.
SELECT Posts.itemID, Items.itemName, Items.itemImage, Items.guid, Posts.price,
Posts.quantity, Posts.date, Games.name, Items.profit FROM Items
INNER JOIN Posts ON Items.itemID=Posts.itemID
INNER JOIN Games ON Posts.gameID=Games.gameID
WHERE Posts.postID IN (SELECT MAX(postID) FROM Posts GROUP BY itemID) AND Posts.gameID=:gameID
AND Posts.price BETWEEN :price_min AND :price_max
AND Posts.quantity BETWEEN :quant_min AND :quant_max
AND Items.profit BETWEEN :profit_min AND :profit_max
ORDER BY Items.profit DESC LIMIT 0, 20
In the code I've split the query and the subquery into two separate queries, because together they performed more slowly. This was all well and good until the data in both Posts and Items started growing. The WHERE conditions that I've marked with ** get concatenated depending on which filters are set.
Here's the EXPLAIN that I get. (This is the query without the sub query)
https://docs.google.com/file/d/0B1jxMdMfC35VeDBEbnJISmNGb3c/edit?usp=sharing
SHOW INDEX FROM Posts:
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Posts | 0 | PRIMARY | 1 | postID | A | 5890249 | NULL | NULL | | BTREE | | |
| Posts | 1 | itemID | 1 | itemID | A | 16453 | NULL | NULL | YES | BTREE | | |
| Posts | 1 | gameID | 1 | gameID | A | 18 | NULL | NULL | YES | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
SHOW INDEX FROM Items;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Items | 0 | PRIMARY | 1 | itemID | A | 6452 | NULL | NULL | | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
SHOW INDEX FROM Games;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Games | 0 | PRIMARY | 1 | gameID | A | 2487 | NULL | NULL | | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Is there any way I can make this query faster? Do you have any advice? Is there a better way of writing this query? All help is appreciated.
EXPLAIN Proposed Query:
+----+-------------+------------+--------+-----------------------+---------+---------+----------------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+-----------------------+---------+---------+----------------------------+---------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 19 | Using temporary; Using filesort |
| 1 | PRIMARY | p | eq_ref | PRIMARY,itemID,gameID | PRIMARY | 4 | q.postID | 1 | |
| 1 | PRIMARY | i | eq_ref | PRIMARY | PRIMARY | 2 | db323245342342345.p.itemID | 1 | Using where |
| 1 | PRIMARY | g | eq_ref | PRIMARY | PRIMARY | 4 | db323245342342345.p.gameID | 1 | Using where |
| 2 | DERIVED | p | ref | itemID,gameID | gameID | 2 | | 2945124 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | i | eq_ref | PRIMARY | PRIMARY | 2 | db323245342342345.p.itemID | 1 | Using where |
+----+-------------+------------+--------+-----------------------+---------+---------+----------------------------+---------+----------------------------------------------+
Try rewriting it with a JOIN. Something like:
SELECT p.itemID,
i.itemName,
i.itemImage,
i.guid,
p.price,
p.quantity,
p.date,
g.name,
i.profit
FROM
(
SELECT MAX(postID) postID
FROM Posts p JOIN Items i
ON p.itemID = i.itemID
WHERE p.gameID = :gameID
AND p.price BETWEEN :price_min AND :price_max
AND p.quantity BETWEEN :quant_min AND :quant_max
AND i.profit BETWEEN :profit_min AND :profit_max
GROUP BY p.itemID
) q JOIN Posts p
ON q.postID = p.postID JOIN Items i
ON p.itemID = i.itemID JOIN Games g
ON p.gameID = g.gameID
ORDER BY i.profit DESC
LIMIT 0, 20
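One thing to check before adopting this rewrite: moving the filters inside the derived table changes which post counts as "latest". The original query picks the newest post per item and then filters it; the rewrite filters first and then picks the newest surviving post. The sketch below, run on SQLite through Python's sqlite3 as a stand-in for MySQL, shows the two shapes returning different rows. The reduced `Posts` table and its three sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (
    postID INTEGER PRIMARY KEY,
    itemID INTEGER,
    gameID INTEGER,
    price  INTEGER
);
INSERT INTO Posts VALUES
    (1, 100, 1,  50),   -- older post for item 100, price inside the range
    (2, 100, 1, 500),   -- newest post for item 100, price outside the range
    (3, 200, 1,  60);   -- only post for item 200
""")

# Original shape: take the newest post per item, then apply the filters.
filter_after = conn.execute("""
SELECT p.postID
FROM Posts p
WHERE p.postID IN (SELECT MAX(postID) FROM Posts GROUP BY itemID)
  AND p.gameID = 1
  AND p.price BETWEEN 0 AND 100
ORDER BY p.postID
""").fetchall()

# Rewritten shape: apply the filters, then take the newest surviving post.
filter_before = conn.execute("""
SELECT p.postID
FROM (SELECT MAX(postID) AS postID
      FROM Posts
      WHERE gameID = 1
        AND price BETWEEN 0 AND 100
      GROUP BY itemID) q
JOIN Posts p ON p.postID = q.postID
ORDER BY p.postID
""").fetchall()

print(filter_after, filter_before)
```

With this data the original shape drops item 100 entirely (its newest post is too expensive), while the rewrite falls back to the older post 1. Which behaviour is wanted depends on what the listing is supposed to show.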
Not sure if this helps, but try moving the subquery to the end of your where clause and also try making it a correlated subquery. Move the filter on Items to the top.
SELECT
p1.itemID,
Items.itemName,
Items.itemImage,
Items.guid,
p1.price,
p1.quantity,
p1.date,
Games.name,
Items.profit
FROM Items
INNER JOIN Posts p1 ON Items.itemID=p1.itemID
INNER JOIN Games ON p1.gameID=Games.gameID
WHERE Items.profit BETWEEN :profit_min AND :profit_max
AND p1.gameID=:gameID
AND p1.price BETWEEN :price_min AND :price_max
AND p1.quantity BETWEEN :quant_min AND :quant_max
AND p1.postID IN (SELECT MAX(p2.postID) FROM Posts p2 WHERE p2.itemID = p1.itemID GROUP BY p2.itemID)
ORDER BY
Items.profit DESC
LIMIT 0, 20
Also, make sure you create an index on Posts(itemID, gameID, postID).
MySQL Server version: 5.0.95
Tables All: InnoDB
I am having an issue with a MySQL query. If I index a particular varchar(50) field, tag.name, my queries take about 10 times longer than when the field is not indexed. I am trying to speed this query up, but my efforts seem to be counterproductive.
The culprit line and field seems to be:
WHERE `t`.`name` IN ('news','home')
I have noticed that if I query the tag table directly, without a join, using the same criteria and with the name field indexed, I do not have the issue. It is actually faster, as expected.
EXAMPLE Query
SELECT `a`.*, `u`.`pen_name`
FROM `tag_link` `tl`
INNER JOIN `tag` `t`
ON `t`.`tag_id` = `tl`.`tag_id`
INNER JOIN `article` `a`
ON `a`.`article_id` = `tl`.`link_id`
INNER JOIN `user` `u`
ON `a`.`user_id` = `u`.`user_id`
WHERE `t`.`name` IN ('news','home')
AND `tl`.`type` = 'article'
AND `a`.`featured` = 'featured'
GROUP BY `article_id`
LIMIT 0 , 5
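EXPLAIN with index
+----+-------------+-------+--------+--------------------------+---------+---------+-------------------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |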
+----+-------------+-------+--------+--------------------------+---------+---------+-------------------+------+-----------------------------------------------------------+
| 1 | SIMPLE | t | range | PRIMARY,name | name | 152 | NULL | 4 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | tl | ref | tag_id,link_id,link_id_2 | tag_id | 4 | portal.t.tag_id | 10 | Using where |
| 1 | SIMPLE | a | eq_ref | PRIMARY,fk_article_user1 | PRIMARY | 4 | portal.tl.link_id | 1 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | portal.a.user_id | 1 | |
+----+-------------+-------+--------+--------------------------+---------+---------+-------------------+------+-----------------------------------------------------------+
EXPLAIN without index
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
| 1 | SIMPLE | a | index | PRIMARY,fk_article_user1 | PRIMARY | 4 | NULL | 8742 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | portal.a.user_id | 1 | |
| 1 | SIMPLE | tl | ref | tag_id,link_id,link_id_2 | link_id | 4 | portal.a.article_id | 3 | Using where |
| 1 | SIMPLE | t | eq_ref | PRIMARY | PRIMARY | 4 | portal.tl.tag_id | 1 | Using where |
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
TABLE CREATE
CREATE TABLE `tag` (
`tag_id` int(11) NOT NULL auto_increment,
`name` varchar(50) NOT NULL,
`type` enum('layout','image') NOT NULL,
`create_dttm` datetime default NULL,
PRIMARY KEY (`tag_id`)
) ENGINE=InnoDB AUTO_INCREMENT=43077 DEFAULT CHARSET=utf8
INDEXES
SHOW INDEX FROM tag_link;
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tag_link | 0 | PRIMARY | 1 | tag_link_id | A | 42023 | NULL | NULL | | BTREE | |
| tag_link | 1 | tag_id | 1 | tag_id | A | 10505 | NULL | NULL | | BTREE | |
| tag_link | 1 | link_id | 1 | link_id | A | 14007 | NULL | NULL | | BTREE | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
SHOW INDEX FROM article;
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| article | 0 | PRIMARY | 1 | article_id | A | 5723 | NULL | NULL | | BTREE | |
| article | 1 | fk_article_user1 | 1 | user_id | A | 1 | NULL | NULL | | BTREE | |
| article | 1 | create_dttm | 1 | create_dttm | A | 5723 | NULL | NULL | YES | BTREE | |
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Final Solution
It seems that MySQL was simply sorting the data in the wrong way. In the end it turned out to be faster to query the tag table in a subquery that returns the tag IDs.
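The final query was not posted, so the following is only a guess at its shape: the join to `tag` is replaced by an `IN` subquery that returns the matching tag IDs. It runs on SQLite through Python's sqlite3 as a stand-in for MySQL, and the cut-down tables and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag      (tag_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE article  (article_id INTEGER PRIMARY KEY, featured TEXT);
CREATE TABLE tag_link (tag_link_id INTEGER PRIMARY KEY,
                       tag_id INTEGER, link_id INTEGER, type TEXT);
INSERT INTO tag VALUES (1, 'news'), (2, 'home'), (3, 'sport');
INSERT INTO article VALUES (10, 'featured'), (11, 'featured'), (12, 'normal');
INSERT INTO tag_link VALUES
    (1, 1, 10, 'article'),
    (2, 2, 10, 'article'),
    (3, 3, 11, 'article'),
    (4, 1, 12, 'article');
""")

# The tag lookup is a subquery returning IDs instead of a joined table.
rows = conn.execute("""
SELECT a.article_id
FROM tag_link tl
JOIN article a ON a.article_id = tl.link_id
WHERE tl.tag_id IN (SELECT tag_id FROM tag WHERE name IN ('news', 'home'))
  AND tl.type = 'article'
  AND a.featured = 'featured'
GROUP BY a.article_id
ORDER BY a.article_id
LIMIT 5
""").fetchall()

print(rows)
```

Whether this form is faster on a given MySQL version should be confirmed with EXPLAIN and timing on real data; the sketch only shows that it selects the same articles as the joined form.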
It seems that article_id is the primary key for the article table.
Since you're grouping by article_id, MySQL needs to return the records in order by that column, in order to perform the GROUP BY.
You can see that without the index, it scans all records in the article table, but they're at least in order by article_id, so no later sort is required. The LIMIT optimization can be applied here, since it's already in order, it can just stop after it gets five rows.
In the query with the index on tag.name, instead of scanning the entire article table, MySQL uses the index on the tag table and starts there. Unfortunately, when it does this, the records must later be sorted by article.article_id to complete the GROUP BY. The LIMIT optimization can't be applied, because MySQL must build the entire result set and sort it before it can return the first 5 rows.
In this case, MySQL just guesses wrongly.
Without the LIMIT clause, I'm guessing that using the index is faster, which is maybe what MySQL was guessing.
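The difference between the two plans can be imitated outside the database. The sketch below is plain Python, not MySQL internals: it counts how many rows have to be read to produce the first five groups when the input already arrives in group order, compared with input in arbitrary order. The function names and the synthetic rows are made up for illustration.

```python
from itertools import groupby, islice

def first_groups_sorted(rows, limit):
    """Input is ordered by key, so reading can stop once `limit` groups are seen."""
    read = 0

    def counted():
        nonlocal read
        for row in rows:
            read += 1
            yield row

    grouped = groupby(counted(), key=lambda row: row[0])
    keys = [key for key, _ in islice(grouped, limit)]
    return keys, read

def first_groups_unsorted(rows, limit):
    """Input is in arbitrary order, so every row must be read before sorting."""
    read = 0
    seen = set()
    for row in rows:
        read += 1
        seen.add(row[0])
    return sorted(seen)[:limit], read

# 100 hypothetical article ids with 3 joined rows each.
ordered = [(article_id, n) for article_id in range(100) for n in range(3)]
shuffled = ordered[::-1]  # any order other than ascending by key

sorted_keys, sorted_read = first_groups_sorted(iter(ordered), 5)
unsorted_keys, unsorted_read = first_groups_unsorted(iter(shuffled), 5)

print(sorted_read, unsorted_read)
```

Both calls return the same five keys, but the ordered input needs only a handful of reads while the unordered input needs all 300. That is the trade-off the optimizer misjudged here.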
How big are your tables?
I noticed that the first EXPLAIN shows "Using temporary; Using filesort", which is bad. Your query is likely being written to disk, which makes it much slower than a query that runs in memory.
Also try to avoid using "select *" and instead query the minimum fields needed.
I am not sure how to build a decent index that will handle category/log_code properly. Maybe I also need to change my query? I'd appreciate any input!
All SELECTS contain:
SELECT logentry_id, date, log_codes.log_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
ORDER BY logentry_id DESC
Query can be as above, but usually has a WHERE to specify the category of log_codes to show, and/or partner, and/or customer. Examples of WHERE:
WHERE partner_id = 1
WHERE log_codes.category_overview = 1
WHERE partner_id = 1 AND log_codes.category_overview = 1
WHERE partner_id = 1 AND customer_id = 1 AND log_codes.category_overview = 1
Database structure:
CREATE TABLE IF NOT EXISTS `log_codes` (
`log_code` smallint(6) NOT NULL,
`log_desc` varchar(255),
`category_mail` tinyint(1) NOT NULL,
`category_overview` tinyint(1) NOT NULL,
`category_cron` tinyint(1) NOT NULL,
`category_documents` tinyint(1) NOT NULL,
`category_error` tinyint(1) NOT NULL,
PRIMARY KEY (`log_code`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `log_entries` (
`logentry_id` int(11) NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL,
`log_code` smallint(6) NOT NULL,
`partner_id` int(11) NOT NULL,
`customer_id` int(11) NOT NULL,
PRIMARY KEY (`logentry_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;
EDIT: Added indexes on fields, here is output of SHOW INDEXES:
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| log_codes | 0 | PRIMARY | 1 | log_code | A | 97 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_mail | 1 | category_mail | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_overview | 1 | category_overview | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_cron | 1 | category_cron | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_documents | 1 | category_documents | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_error | 1 | category_error | A | 1 | NULL | NULL | | BTREE | | |
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| log_entries | 0 | PRIMARY | 1 | logentry_id | A | 163020 | NULL | NULL | | BTREE | | |
| log_entries | 1 | log_code | 1 | log_code | A | 90 | NULL | NULL | | BTREE | | |
| log_entries | 1 | partner_id | 1 | partner_id | A | 6 | NULL | NULL | YES | BTREE | | |
| log_entries | 1 | customer_id | 1 | customer_id | A | 20377 | NULL | NULL | YES | BTREE | | |
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
EDIT 2: Added composite indexes: (log_code, category_overview) and (category_overview, log_code) on log_codes, and (customer_id, partner_id) on log_entries.
Here are some EXPLAIN output (query returns 66818 rows):
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| 1 | SIMPLE | log_entries | ref | log_code,partner_id | partner_id | 2 | const | 156110 | Using where; Using filesort |
| 1 | SIMPLE | log_codes | eq_ref | PRIMARY,code_overview,overview_code | PRIMARY | 2 | log_entries.log_code | 1 | Using where |
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
But I also have some LEFT JOINs that I did not think would affect the index design, but they cause a "Using temporary" problem. Here is EXPLAIN output (query returns 66818 rows):
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
LEFT JOIN partners ON log_entries.partner_id = partners.partner_id
LEFT JOIN joined_table1 ON log_entries.t1_id = joined_table1.t1_id
LEFT JOIN joined_table2 ON log_entries.t2_id = joined_table2.t2_id
LEFT JOIN joined_table3 ON log_entries.t3_id = joined_table3.t3_id
LEFT JOIN joined_table4 ON joined_table3.t4_id = joined_table4.t4_id
LEFT JOIN joined_table5 ON log_entries.t5_id = joined_table5.t5_id
LEFT JOIN joined_table6 ON log_entries.t6_id = joined_table6.t6_id
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC;
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
| 1 | SIMPLE | log_codes | ref | PRIMARY,code_overview,overview_code | overview_code | 1 | const | 54 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | log_entries | ref | log_code,partner_id | log_code | 2 | log_codes.log_code | 1811 | Using where |
| 1 | SIMPLE | partners | const | PRIMARY | PRIMARY | 2 | const | 1 | Using index |
| 1 | SIMPLE | joined_table1 | eq_ref | PRIMARY | PRIMARY | 1 | log_entries.t1_id | 1 | Using index |
| 1 | SIMPLE | joined_table2 | eq_ref | PRIMARY | PRIMARY | 1 | log_entries.t2_id | 1 | Using index |
| 1 | SIMPLE | joined_table3 | eq_ref | PRIMARY | PRIMARY | 3 | log_entries.t3_id | 1 | |
| 1 | SIMPLE | joined_table4 | eq_ref | PRIMARY | PRIMARY | 3 | joined_table3.t4_id | 1 | Using index |
| 1 | SIMPLE | joined_table5 | eq_ref | PRIMARY | PRIMARY | 4 | log_entries.t5_id | 1 | Using index |
| 1 | SIMPLE | joined_table6 | eq_ref | PRIMARY | PRIMARY | 4 | log_entries.t6_id | 1 | Using index |
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
Don't know if it's a good or bad idea, but a subquery seems to get rid of the "Using temporary". Here is EXPLAIN output of two common scenarios. This query returns 66818 rows:
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| 1 | PRIMARY | log_entries | ref | log_code,partner_id | partner_id | 2 | const | 156110 | Using where; Using filesort |
| 1 | PRIMARY | log_codes | eq_ref | PRIMARY,code_overview | PRIMARY | 2 | log_entries.log_code | 1 | |
| 2 | DEPENDENT SUBQUERY | log_codes | unique_subquery | PRIMARY,code_overview,overview_code | PRIMARY | 2 | func | 1 | Using where |
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
And an overview for a single customer; this query returns 12 rows:
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_entries.customer_id = 10000
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
| 1 | PRIMARY | log_entries | ref | log_code,partner_id,customer_id,customer_partner | customer_id | 4 | const | 27 | Using where; Using filesort |
| 1 | PRIMARY | log_codes | eq_ref | PRIMARY,code_overview | PRIMARY | 2 | log_entries.log_code | 1 | |
| 2 | DEPENDENT SUBQUERY | log_codes | unique_subquery | PRIMARY,code_overview,overview_code | PRIMARY | 2 | func | 1 | Using where |
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
There isn't a simple rule that guarantees success when it comes to indexing. You need to look at a reasonable period of typical calls to work out what will help performance.
The following comments are therefore not to be taken as absolute rules:
An index is "good" if it quickly gets you to a small subset of the data rather than if it eliminates only half of the data (e.g. there is rarely value in an index on a gender column where there are only M/F as the possible entries). So how unique are the values within e.g. log_code, category_overview and partner_id?
For a given query it is often helpful to have a "covering" index, that is one that includes all the fields that are used by the query - however, if there are too many fields from a single table in a query you instead want an index that includes the fields in the "where" or "join" clause to identify the row and then join back to the table storage to get all the fields required.
So given the information you've provided, a candidate index on log_codes would include log_code and category_overview. Similarly on log_entries for log_code and partner_id. However these would need to be evaluated for how they affect performance.
Bear in mind that any given index may improve the read performance of a single query retrieving data but it will also slow down writes to the table where there is then a requirement to write more information i.e. where the new row fits in the additional index. This is why you need to look at the big picture of activity on the database to determine where indexes are worth it.
Well done for taking the time to update your question with the detail requested. I am sorry if that sounds patronising, but it is amazing how many people are not prepared to take the time to help themselves.
Adding a composite index across (customer_id, partner_id) on the log_entries table should give a significant benefit for the last of your example where clauses.
The output of your SHOW INDEXES for the log_codes table would suggest that it is not currently populated as it shows NULL for all but the PK. Is this the case?
EDIT Sorry. Just read your comment to KAJ's answer detailing table content. It might be worth running that SHOW INDEXES statement again as it looks like MySQL may have been building its stats.
Adding a composite index across (log_code, category_overview) for the log_codes table should help but you will need to check the explain output to see if it is being used.
As a very crude general rule you want to create composite indices starting with the columns with the highest cardinality but this is not always the case. It will depend heavily on data distribution and query structure.
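One rough way to compare candidate columns is to compute each column's selectivity, that is, the number of distinct values divided by the number of rows. The sketch below does this with Python's sqlite3 as a stand-in for MySQL. The `selectivity` helper and the synthetic data are assumptions for illustration, and the resulting ranking is only a starting point for the crude rule above, not a substitute for checking EXPLAIN.

```python
import sqlite3

def selectivity(conn, table, columns):
    """Return {column: distinct_values / total_rows}.

    Table and column names are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    result = {}
    for col in columns:
        distinct = conn.execute(
            f"SELECT COUNT(DISTINCT {col}) FROM {table}"
        ).fetchone()[0]
        result[col] = distinct / total if total else 0.0
    return result

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE log_entries (
    logentry_id INTEGER PRIMARY KEY,
    log_code    INTEGER,
    partner_id  INTEGER,
    customer_id INTEGER
)
""")
# Synthetic rows: 10 log codes, 2 partners, 50 customers over 100 entries.
conn.executemany(
    "INSERT INTO log_entries (log_code, partner_id, customer_id) VALUES (?, ?, ?)",
    [(i % 10, i % 2, i % 50) for i in range(100)],
)

stats = selectivity(conn, "log_entries", ["log_code", "partner_id", "customer_id"])
ranked = sorted(stats, key=stats.get, reverse=True)  # most selective first

print(ranked)
```

Here customer_id ranks first and partner_id last, which mirrors the cardinalities reported by SHOW INDEXES in the question.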
UPDATE I have created a mockup of your dataset and added the following indices. They give significant improvement based on your sample WHERE clauses -
ALTER TABLE `log_codes`
ADD INDEX `IX_overview_code` (`category_overview`, `log_code`);
ALTER TABLE `log_entries`
ADD INDEX `IX_partner_code` (`partner_id`, `log_code`),
ADD INDEX `IX_customer_partner_code` (`customer_id`, `partner_id`, `log_code`);
The last index is quite expensive in terms of disk space and insert performance, but it gives a very fast SELECT for your final WHERE clause example. My sample dataset has just over 1M records in the log_entries table, with a fairly even distribution across the partner and customer IDs. Three of your sample WHERE clauses execute in less than a second, but the one with category_overview as the only criterion is very slow, although still sub-second with only 200k rows.
I've recently noticed that a query I have is running quite slowly, at almost 1 second per query.
The query looks like this
SELECT eventdate.id,
eventdate.eid,
eventdate.date,
eventdate.time,
eventdate.title,
eventdate.address,
eventdate.rank,
eventdate.city,
eventdate.state,
eventdate.name,
source.link,
type,
eventdate.img
FROM source
RIGHT OUTER JOIN
(
SELECT event.id,
event.date,
users.name,
users.rank,
users.eid,
event.address,
event.city,
event.state,
event.lat,
event.`long`,
GROUP_CONCAT(types.type SEPARATOR ' | ') AS type
FROM event FORCE INDEX (latlong_idx)
JOIN users ON event.uid = users.id
JOIN types ON users.tid=types.id
WHERE `long` BETWEEN -74.36829174058 AND -73.64365405942
AND lat BETWEEN 40.35195025942 AND 41.07658794058
AND event.date >= '2009-10-15'
GROUP BY event.id, event.date
ORDER BY event.date, users.rank DESC
LIMIT 0, 20
)eventdate
ON eventdate.uid = source.uid
AND eventdate.date = source.date;
and the explain is
+----+-------------+------------+--------+---------------+-------------+---------+------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+-------------+---------+------------------------------+-------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 20 | |
| 1 | PRIMARY | source | ref | iddate_idx | iddate_idx | 7 | eventdate.id,eventdate.date | 156 | |
| 2 | DERIVED | event | ALL | latlong_idx | NULL | NULL | NULL | 19500 | Using temporary; Using filesort |
| 2 | DERIVED | types | ref | eid_idx | eid_idx | 4 | active.event.id | 10674 | Using index |
| 2 | DERIVED | users | eq_ref | id_idx | id_idx | 4 | active.types.id | 1 | Using where |
+----+-------------+------------+--------+---------------+-------------+---------+------------------------------+-------+---------------------------------+
I've tried using 'force index' on latlong, but that doesn't seem to speed things up at all.
Is it the derived table that is causing the slow responses? If so, is there a way to improve the performance of this?
--------EDIT-------------
I've attempted to improve the formatting to make it more readable.
I also ran the same query, changing only the WHERE clause to:
WHERE users.id = (
SELECT users.id
FROM users
WHERE uidname = 'frankt1'
ORDER BY users.approved DESC , users.rank DESC
LIMIT 1 )
AND date >= '2009-10-15'
GROUP BY date
ORDER BY date)
That query runs in 0.006 seconds.
The EXPLAIN looks like this:
+----+-------------+------------+-------+---------------+---------------+---------+------------------------------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+---------------+---------+------------------------------+------+----------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 42 | |
| 1 | PRIMARY | source | ref | iddate_idx | iddate_idx | 7 | eventdate.id,eventdate.date | 156 | |
| 2 | DERIVED | users | const | id_idx | id_idx | 4 | | 1 | |
| 2 | DERIVED | event | range | eiddate_idx | eiddate_idx | 7 | NULL | 24 | Using where |
| 2 | DERIVED | types | ref | eid_idx | eid_idx | 4 | active.event.bid | 3 | Using index |
| 3 | SUBQUERY | users | ALL | idname_idx | idname_idx | 767 | | 5 | Using filesort |
+----+-------------+------------+-------+---------------+---------------+---------+------------------------------+------+----------------+
The only way to clean up that mammoth SQL statement is to go back to the drawing board and carefully work through your database design and requirements. As soon as you start joining 6 tables and using an inner select, you should expect very long execution times.
As a start, ensure that all your id fields are indexed, but it is better to ensure that your design is valid. I don't know where to START looking at your SQL, even after I reformatted it for you.
Note that 'using indexes' means you need to issue the correct instructions when you CREATE or ALTER the tables you are using. See, for instance, the MySQL 5.0 documentation on creating indexes.