I have a table for log entries, and a description table for the about 100 possible log codes:
CREATE TABLE `log_entries` (
`logentry_id` int(11) NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL,
`partner_id` smallint(4) NOT NULL,
`log_code` smallint(4) NOT NULL,
PRIMARY KEY (`logentry_id`),
KEY `IX_code` (`log_code`),
KEY `IX_partner_code` (`partner_id`,`log_code`)
) ENGINE=MyISAM ;
CREATE TABLE IF NOT EXISTS `log_codes` (
`log_code` smallint(4) NOT NULL DEFAULT '0',
`log_desc` varchar(255) DEFAULT NULL,
`category_overview` tinyint(1) NOT NULL DEFAULT '0',
`category_error` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`log_code`),
KEY `IX_overview_code` (`category_overview`,`log_code`),
KEY `IX_error_code` (`category_error`,`log_code`)
) ENGINE=MyISAM ;
The follwing query (matching 10k of 20k rows) executes in 0.0034 sec (using LIMIT 0,20):
SELECT log_entries.date, log_codes.log_desc FROM log_entries
INNER JOIN log_codes ON log_codes.log_code = log_entries.log_code
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1;
But when adding ORDER BY log_entries.logentry_id DESC, which is of course necessary, it slows down to 0.6 sec. Probably because "Using temporary" is used on the log_codes table? Removing the indexes actually makes the query perform faster, but still slow (0.3 sec).
EXPLAIN output of the query without ORDER BY:
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+-------------+
| 1 | SIMPLE | log_codes | ref | PRIMARY,IX_overview_code | IX_overview_code | 1 | const | 56 | |
| 1 | SIMPLE | log_entries | ref | IX_code,IX_partner_code | IX_partner_code | 7 | const,log_codes.log_code | 25 | Using where |
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+-------------+
And including the ORDER BY:
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+---------------------------------+
| 1 | SIMPLE | log_codes | ref | PRIMARY,IX_overview_code | IX_overview_code | 1 | const | 56 | Using temporary; Using filesort |
| 1 | SIMPLE | log_entries | ref | IX_code,IX_partner_code | IX_partner_code | 7 | const,log_codes.log_code | 25 | Using where |
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+---------------------------------+
Any hints on how to get this query to perform faster? I can't see why "using temporary" should be needed, as the log codes should be chosen before fetching and sorting the appropiate log entries?
UPDATE #Eugen Rieck:
SELECT log_entries.date, lc.log_desc FROM log_entries INNER JOIN (SELECT log_desc, log_code FROM log_codes WHERE category_overview = 1) AS lc ON lc.log_code = log_entries.log_code WHERE log_entries.partner_id = 1 ORDER BY log_entries.logentry_id;
+----+-------------+-------------+------+-------------------------+------------------+---------+-------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+-------------------------+------------------+---------+-------------------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 57 | Using temporary; Using filesort |
| 1 | PRIMARY | log_entries | ref | IX_code,IX_partner_code | IX_partner_code | 7 | const,lc.log_code | 25 | Using where |
| 2 | DERIVED | log_codes | ref | IX_overview_code | IX_overview_code | 1 | | 56 | |
+----+-------------+-------------+------+-------------------------+------------------+---------+-------------------+------+---------------------------------+
UPDATE #RolandoMySQLDBA:
With my original indexes, ORDER BY date DESC:
SELECT log_entries.date, log_codes.log_desc FROM (SELECT log_code,date FROM log_entries WHERE partner_id = 1) log_entries INNER JOIN (SELECT log_code,log_desc FROM log_codes WHERE category_overview = 1) log_codes USING (log_code) ORDER BY log_entries.date DESC;
+----+-------------+-------------+------+------------------+------------------+---------+------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+------------------+------------------+---------+------+-------+---------------------------------+
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 57 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 21937 | Using where; Using join buffer |
| 3 | DERIVED | log_codes | ref | IX_overview_code | IX_overview_code | 1 | | 56 | |
| 2 | DERIVED | log_entries | ALL | IX_partner_code | NULL | NULL | NULL | 22787 | Using where |
+----+-------------+-------------+------+------------------+------------------+---------+------+-------+---------------------------------+
With your indexes, no ordering:
SELECT log_entries.date, log_codes.log_desc FROM (SELECT log_code,date FROM log_entries WHERE partner_id = 1) log_entries INNER JOIN (SELECT log_code,log_desc FROM log_codes WHERE category_overview = 1) log_codes USING (log_code);
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+--------------------------------+
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 57 | |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 21937 | Using where; Using join buffer |
| 3 | DERIVED | log_codes | index | IX_overview_code_desc | IX_overview_code_desc | 771 | NULL | 80 | Using where; Using index |
| 2 | DERIVED | log_entries | index | IX_partner_code_date | IX_partner_code_date | 15 | NULL | 22787 | Using where; Using index |
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+--------------------------------+
With your indexes, ORDER BY date DESC:
SELECT log_entries.date, log_codes.log_desc FROM (SELECT log_code,date FROM log_entries WHERE partner_id = 1) log_entries INNER JOIN (SELECT log_code,log_desc FROM log_codes WHERE category_overview = 1) log_codes USING (log_code) ORDER BY log_entries.date DESC;
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+---------------------------------+
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 57 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 21937 | Using where; Using join buffer |
| 3 | DERIVED | log_codes | index | IX_overview_code_desc | IX_overview_code_desc | 771 | NULL | 80 | Using where; Using index |
| 2 | DERIVED | log_entries | index | IX_partner_code_date | IX_partner_code_date | 15 | NULL | 22787 | Using where; Using index |
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+---------------------------------+
UPDATE #Joe Stefanelli:
SELECT log_entries.date, log_codes.log_desc FROM log_entries INNER JOIN log_codes ON log_codes.log_code = log_entries.log_code WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY date DESC;
+----+-------------+-------------+------+--------------------------+-----------------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+--------------------------+-----------------+---------+--------------------------+------+----------------------------------------------+
| 1 | SIMPLE | log_codes | ALL | PRIMARY,IX_code_overview | NULL | NULL | NULL | 80 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | log_entries | ref | IX_code,IX_code_partner | IX_code_partner | 7 | log_codes.log_code,const | 25 | Using where |
+----+-------------+-------------+------+--------------------------+-----------------+---------+--------------------------+------+----------------------------------------------+
I think, most of problems here and in similar questions come from misunderstanding how MySQL (and other databases) uses indexes for sorting. The answer is: MySQL does not use indexes for sorting, it just can read data in the order of an index or in the opposite direction. If you happened to want the data to be oredered in the order of the currently used index - you are lucky, otherwise the result will be sorted (hence filesort in EXPLAIN)
That is order of the whole result mostly depends on which table was the first in the join. And if you look at your EXPLAIN you will see that the join starts from 'log_codes' table (because it is much smaller).
Basically, what you need is a composite index (partner_id, date) on 'log_entries', a covering composite index (log_code, category_overview, log_desc) for 'log_codes', change 'INNER JOIN' to 'STRAIGHT_JOIN' to force the join order, and order by 'date' DESC (this index will fortunately be covering too).
UPD1: I am sorry, I mistyped the index for the first table: it should be (partner_id, log_code, date).
But I still struggle to understand why MySQL choose to "use temporary" on the log_codes table (and 100x query time) when I try to sort on a column in another table?
MySQL can either directly output data as long as you agree with the ordering in which it gets it, or put data in a temporary table, apply sorting and output then. When you order by a field from any non-first table in joins, MySQL has to sort data (not just output in the order of an index) and to sort data it needs a temporary table.
But as I get further into the dataset it is slower (6 sec for LIMIT 50000,25). Do you know why?
To output rows 50000,25 MySQL anyway needs to fetch the first 50000 and skip them. Since I missed a column in the index, MySQL not just skanned the index but for each item made an additional on disc lookup for log_code value. With the covering index that should be much faster, since all data can be fetched from the index.
UPD2: try to force the index:
SELECT log_entries.date, log_codes.log_desc
FROM log_entries FORCE INDEX (IX_partner_code_date)
STRAIGHT_JOIN log_codes
ON log_codes.log_code = log_entries.log_code
WHERE log_entries.partner_id = 1
AND log_codes.category_overview = 1
ORDER BY log_entries.date DESC;
You are going to need two things
REFACTOR THE QUERY
SELECT log_entries.date, log_codes.log_desc FROM
(SELECT log_code,date FROM log_entries WHERE partner_id = 1) log_entries
INNER JOIN
(SELECT log_code,log_desc FROM log_codes WHERE category_overview = 1) log_codes
USING (log_code);
CREATE INDEXES TO SUPPORT SUBQUERIES AND REDUCE TABLE ACCESS
Before creating these indexes, run these
SELECT COUNT(1) rowcount,partner_id FROM log_entries GROUP BY partner_id;
SELECT COUNT(1) rowcount,category_overview FROM log_codes GROUP BY category_overview;
If none of the counts from all possible partner_id values exceed 5% of the log_entries table, create this index
ALTER TABLE log_entries ADD INDEX (partner_id,log_code,date);
If none of the counts from all possible category_overview values exceed 5% of the log_codes table, create this index
ALTER TABLE log_codes ADD INDEX (category_overview,log_code,log_desc);
Give it a Try !!!
Please try this refactored query with LIMIT 0,25 included
SELECT log_entries.date, log_codes.log_desc FROM
(
SELECT A.log_code FROM
(SELECT log_code FROM log_entries WHERE partner_id = 1) A INNER JOIN
(SELECT log_code FROM log_codes WHERE category_overview = 1) B USING (log_code)
LIMIT 0,25
) log_code_keys
INNER JOIN log_entries USING (log_code)
INNER JOIN log_code USING (log_code);
I'd start by reversing the columns in the IX_partner_code and IX_overview_code indexes. That should make them better suited to support both the JOIN and the WHERE clause.
...
KEY `IX_code_partner` (`log_code`,`partner_id`)
...
KEY `IX_code_overview` (`log_code`,`category_overview`),
...
Related
Query
SELECT SQL_NO_CACHE contacts.id,
contacts.date_modified contacts__date_modified
FROM contacts
INNER JOIN
(SELECT tst.team_set_id
FROM team_sets_teams tst
INNER JOIN team_memberships team_membershipscontacts ON (team_membershipscontacts.team_id = tst.team_id)
AND (team_membershipscontacts.user_id = '5daa2e92-c347-11e9-afc5-525400a80916')
AND (team_membershipscontacts.deleted = 0)
GROUP BY tst.team_set_id) contacts_tf ON contacts_tf.team_set_id = contacts.team_set_id
LEFT JOIN contacts_cstm contacts_cstm ON contacts_cstm.id_c = contacts.id
WHERE contacts.deleted = 0
ORDER BY contacts.date_modified DESC,
contacts.id DESC
LIMIT 21;
Takes extremely long (2 minutes on 2M records). I cant change this query, since it is system generated
This is it's explain:
+----+-------------+--------------------------+------------+--------+-------------------------------------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------------------+------------+--------+-------------------------------------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
| 1 | PRIMARY | contacts | NULL | ref | idx_contacts_tmst_id,idx_del_date_modified,idx_contacts_del_last,idx_cont_del_reports,idx_del_id_user | idx_del_date_modified | 2 | const | 1113718 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 50.00 | Using where; Using join buffer (Block Nested Loop) |
| 1 | PRIMARY | contacts_cstm | NULL | eq_ref | PRIMARY | PRIMARY | 144 | sugarcrm.contacts.id | 1 | 100.00 | Using index |
| 3 | DERIVED | team_membershipscontacts | NULL | ref | idx_team_membership,idx_teammemb_team_user,idx_del_team_user | idx_team_membership | 145 | const | 2 | 99.36 | Using index condition; Using where; Using temporary; Using filesort |
| 3 | DERIVED | tst | NULL | ref | idx_ud_set_id,idx_ud_team_id,idx_ud_team_set_id,idx_ud_team_id_team_set_id | idx_ud_team_id_team_set_id | 144 | sugarcrm.team_membershipscontacts.team_id | 1 | 100.00 | Using index |
+----+-------------+--------------------------+------------+--------+-------------------------------------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
But when I use force index(idx_del_date_modified) (which is the same index used in explain), the query takes just 0.01s and I get slightly different explain.
+----+-------------+--------------------------+------------+--------+----------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------------------+------------+--------+----------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
| 1 | PRIMARY | contacts | NULL | ref | idx_del_date_modified | idx_del_date_modified | 2 | const | 1113718 | 100.00 | Using where |
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 50.00 | Using where |
| 1 | PRIMARY | contacts_cstm | NULL | eq_ref | PRIMARY | PRIMARY | 144 | sugarcrm.contacts.id | 1 | 100.00 | Using index |
| 2 | DERIVED | team_membershipscontacts | NULL | ref | idx_team_membership,idx_teammemb_team_user,idx_del_team_user | idx_team_membership | 145 | const | 2 | 99.36 | Using index condition; Using where; Using temporary; Using filesort |
| 2 | DERIVED | tst | NULL | ref | idx_ud_set_id,idx_ud_team_id,idx_ud_team_set_id,idx_ud_team_id_team_set_id | idx_ud_team_id_team_set_id | 144 | sugarcrm.team_membershipscontacts.team_id | 1 | 100.00 | Using index |
+----+-------------+--------------------------+------------+--------+----------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
The first query uses temporary table and file sort, but the query with force index uses just where. Shouldn't the queries be the same? Why is the query with force index so much faster - used index is still the same.
According to MySQL manual:
Temporary tables can be created under conditions such as these:
If there is an ORDER BY clause and a different GROUP BY clause, or if
the ORDER BY or GROUP BY contains columns from tables other than the
first table in the join queue, a temporary table is created.
DISTINCT combined with ORDER BY may require a temporary table.
If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory
temporary table, unless the query also contains elements (described
later) that require on-disk storage.
Likely, you have better performance because in MySQL there is the query optimizer component.
If you create index the query optimizer could not use the index column even though the index exists.
Using force index(..) you are forcing MySql to use index, instead.
Please consider a detailed example here.
MySQL version 5.7.17
When I use JOIN for small tables, MySQL don't allow me to use index on the main table.
Compare:
EXPLAIN SELECT `ko_base_client_orders`.*
FROM `ko_base_client_orders`
LEFT JOIN `ko_base_new_perspe` AS `ko_of`
ON (`ko_of`.`id` = `ko_base_client_orders`.`office`)
ORDER BY `order_time` DESC
LIMIT 10 OFFSET 0;
+----+-------------+-----------------------+------------+-------+---------------+---------+---------+------+-------+----------+-----------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+------------+-------+---------------+---------+---------+------+-------+----------+-----------------------------------------------------------------+
| 1 | SIMPLE | ko_base_client_orders | NULL | ALL | NULL | NULL | NULL | NULL | 60737 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | ko_of | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 2 | 100.00 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-----------------------+------------+-------+---------------+---------+---------+------+-------+----------+-----------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
and the same query with index hint:
EXPLAIN SELECT `ko_base_client_orders`.*
FROM `ko_base_client_orders`
LEFT JOIN `ko_base_new_perspe` AS `ko_of`
FORCE INDEX FOR JOIN (`PRIMARY`)
ON (`ko_of`.`id` = `ko_base_client_orders`.`office`)
ORDER BY `order_time` DESC
LIMIT 10 OFFSET 0;
+----+-------------+-----------------------+------------+--------+---------------+------------+---------+---------------------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+------------+--------+---------------+------------+---------+---------------------------------+------+----------+-------------+
| 1 | SIMPLE | ko_base_client_orders | NULL | index | NULL | order_time | 6 | NULL | 10 | 100.00 | NULL |
| 1 | SIMPLE | ko_of | NULL | eq_ref | PRIMARY | PRIMARY | 4 | e3.ko_base_client_orders.office | 1 | 100.00 | Using index |
+----+-------------+-----------------------+------------+--------+---------------+------------+---------+---------------------------------+------+----------+-------------+
In second case MySQL uses key for order_time, but in first example it ignores index and scans the entire table.
I can not predict which table would be small, and which would not (it differs from project to project), so I don't want to use index hints all the time. Is there another solution for this problem?
Have found an inefficient query in our system. content holds versions of slides, and this is supposed to select the highest version of a slide by id.
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901'
group by `slide_id`
) c ON `c`.`version` = `content`.`version`;
EXPLAIN
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | content | NULL | ref | PRIMARY,version | PRIMARY | 8 | const | 9703 | 100.00 | NULL |
| 2 | DERIVED | content | NULL | ref | PRIMARY,fk_content_slides_idx,thumbnail_asset_id,version,slide_id | fk_content_slides_idx | 8 | const | 1 | 100.00 | Using where; Using index |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
One big issue is that it returns almost all the slides in the system as the outer query does not filter by slide id. After adding that I get...
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901' group by `slide_id`
) c ON `c`.`version` = `content`.`version`
WHERE `slide_id` = '16901';
EXPLAIN
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | content | NULL | const | PRIMARY,fk_content_slides_idx,version,slide_id | PRIMARY | 16 | const,const | 1 | 100.00 | NULL |
| 2 | DERIVED | content | NULL | ref | PRIMARY,fk_content_slides_idx,thumbnail_asset_id,version,slide_id | fk_content_slides_idx | 8 | const | 1 | 100.00 | Using where; Using index |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
That reduces the amount of rows down to one correctly, but doesnt really speed things up.
There are indexes on version, slide_id and a unique key on version AND slide_id.
Is there anything else I can do to speed this up?
Use a TOP LIMIT 1 insetead of Max ?
m
MySQL seems to take an index (version, slide_id) to join the tables. You should get a better result with
SELECT `content`.*
FROM `content`
FORCE INDEX FOR JOIN (fk_content_slides_idx)
join (
SELECT `slide_id`, max(version) as `version` from `content`
WHERE `slide_id` = '16901' group by `slide_id`
) c ON `c`.`slide_id` = `content`.`slide_id` and `c`.`version` = `content`.`version`
You need an index that has slide_id as first column, I just guessed that's fk_content_slides_idx, if not, take another one.
The part FORCE INDEX FOR JOIN (fk_content_slides_idx) is just to enforce it, you should try if mysql takes it by itself without forcing (it should).
You might get even a slightly better result with an index (slide_id, version), it depends on the amount of data (e.g. the number of versions per id) if you see a difference (but you should not spam indexes, and you already have a lot on this table, but you can try it for fun.)
Just a suggestion i think you should avoid the group by slide_id because you are filter by one slide_id only (16901)
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901'
) c ON `c`.`version` = `content`.`version`
WHERE `slide_id` = '16901';
I have a MySQL query on which I believe I already have all the indexes and so I would expect the query to be processed instantly. However, the query takes several seconds to finish. Using EXPLAIN shows the query uses INDEX CONDITION:
+----+-------------+-------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------+-------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------+-------------------+------+----------------------------------------------+
| 1 | SIMPLE | il | fulltext | PRIMARY,FK32F0579FBAF38C6F,name_ftx | name_ftx | 0 | NULL | 1 | Using where; Using temporary |
| 1 | SIMPLE | i | eq_ref | PRIMARY,FK5C6729AF5D52323,... | PRIMARY | 8 | goout.il.thing_id | 1 | Using where |
| 1 | SIMPLE | s | ref | state,event_id,parent_id | parent_id | 9 | const | 5 | Using index condition; Using where; Distinct |
+----+-------------+-------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------+-------------------+------+----------------------------------------------+
Please notice the third row which uses this INDEX CONDITION. Is there a way to find what causes it?
SELECT DISTINCT i.id
FROM event i
JOIN event_locale il ON i.id = il.thing_id
LEFT JOIN event_schedule s ON i.id = s.event_id
WHERE (i.parent_id IS NULL)
AND (s.state = 2 OR s.state = 3)
AND s.parent_id IS NULL
AND (i.state = 2 OR i.state = 3)
AND (MATCH(il.name) AGAINST ("jan"))
LIMIT 10;
I am positive I have indexes on all used columns and fulltext index on the match-against column.
I am not sure on how to make a decent index that will capture category/log_code properly. Maybe I also need to change my query? Appreciate any input!
All SELECTS contain:
SELECT logentry_id, date, log_codes.log_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
ORDER BY logentry_id DESC
Query can be as above, but usually has a WHERE to specify the category of log_codes to show, and/or partner, and/or customer. Examples of WHERE:
WHERE partner_id = 1
WHERE log_codes.category_overview = 1
WHERE partner_id = 1 AND log_codes.category_overview = 1
WHERE partner_id = 1 AND customer_id = 1 AND log_codes.category_overview = 1
Database structure:
CREATE TABLE IF NOT EXISTS `log_codes` (
`log_code` smallint(6) NOT NULL,
`log_desc` varchar(255),
`category_mail` tinyint(1) NOT NULL,
`category_overview` tinyint(1) NOT NULL,
`category_cron` tinyint(1) NOT NULL,
`category_documents` tinyint(1) NOT NULL,
`category_error` tinyint(1) NOT NULL,
PRIMARY KEY (`log_code`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `log_entries` (
`logentry_id` int(11) NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL,
`log_code` smallint(6) NOT NULL,
`partner_id` int(11) NOT NULL,
`customer_id` int(11) NOT NULL,
PRIMARY KEY (`logentry_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;
EDIT: Added indexes on fields, here is output of SHOW INDEXES:
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| log_codes | 0 | PRIMARY | 1 | log_code | A | 97 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_mail | 1 | category_mail | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_overview | 1 | category_overview | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_cron | 1 | category_cron | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_documents | 1 | category_documents | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_error | 1 | category_error | A | 1 | NULL | NULL | | BTREE | | |
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| log_entries | 0 | PRIMARY | 1 | logentry_id | A | 163020 | NULL | NULL | | BTREE | | |
| log_entries | 1 | log_code | 1 | log_code | A | 90 | NULL | NULL | | BTREE | | |
| log_entries | 1 | partner_id | 1 | partner_id | A | 6 | NULL | NULL | YES | BTREE | | |
| log_entries | 1 | customer_id | 1 | customer_id | A | 20377 | NULL | NULL | YES | BTREE | | |
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
EDIT 2: Added composite indexes: (log_code, category_overview) and (log_code, category_overview) on log_codes. (customer_id, partner_id) on log_entries.
Here are some EXPLAIN output (query returns 66818 rows):
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| 1 | SIMPLE | log_entries | ref | log_code,partner_id | partner_id | 2 | const | 156110 | Using where; Using filesort |
| 1 | SIMPLE | log_codes | eq_ref | PRIMARY,code_overview,overview_code | PRIMARY | 2 | log_entries.log_code | 1 | Using where |
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
But I also have some LEFT JOINs that I did not think would affect the index design, but they cause a "Using temporary" problem. Here is EXPLAIN output (query returns 66818 rows):
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
LEFT JOIN partners ON log_entries.partner_id = partners.partner_id
LEFT JOIN joined_table1 ON log_entries.t1_id = joined_table1.t1_id
LEFT JOIN joined_table2 ON log_entries.t2_id = joined_table2.t2_id
LEFT JOIN joined_table3 ON log_entries.t3_id = joined_table3.t3_id
LEFT JOIN joined_table4 ON joined_table3.t4_id = joined_table4.t4_id
LEFT JOIN joined_table5 ON log_entries.t5_id = joined_table5.t5_id
LEFT JOIN joined_table6 ON log_entries.t6_id = joined_table6.t6_id
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC;
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
| 1 | SIMPLE | log_codes | ref | PRIMARY,code_overview,overview_code | overview_code | 1 | const | 54 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | log_entries | ref | log_code,partner_id | log_code | 2 | log_codes.log_code | 1811 | Using where |
| 1 | SIMPLE | partners | const | PRIMARY | PRIMARY | 2 | const | 1 | Using index |
| 1 | SIMPLE | joined_table1 | eq_ref | PRIMARY | PRIMARY | 1 | log_entries.t1_id | 1 | Using index |
| 1 | SIMPLE | joined_table2 | eq_ref | PRIMARY | PRIMARY | 1 | log_entries.t2_id | 1 | Using index |
| 1 | SIMPLE | joined_table3 | eq_ref | PRIMARY | PRIMARY | 3 | log_entries.t3_id | 1 | |
| 1 | SIMPLE | joined_table4 | eq_ref | PRIMARY | PRIMARY | 3 | joined_table3.t4_id | 1 | Using index |
| 1 | SIMPLE | joined_table5 | eq_ref | PRIMARY | PRIMARY | 4 | log_entries.t5_id | 1 | Using index |
| 1 | SIMPLE | joined_table6 | eq_ref | PRIMARY | PRIMARY | 4 | log_entries.t6_id | 1 | Using index |
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
Don't know if it's a good or bad idea, but a subquery seems to get rid of the "Using temporary". Here is EXPLAIN output of two common scenarios. This query returns 66818 rows:
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| 1 | PRIMARY | log_entries | ref | log_code,partner_id | partner_id | 2 | const | 156110 | Using where; Using filesort |
| 1 | PRIMARY | log_codes | eq_ref | PRIMARY,code_overview | PRIMARY | 2 | log_entries.log_code | 1 | |
| 2 | DEPENDENT SUBQUERY | log_codes | unique_subquery | PRIMARY,code_overview,overview_code | PRIMARY | 2 | func | 1 | Using where |
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
And a overview on customer, query returns 12 rows:
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_entries.customer_id = 10000
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
| 1 | PRIMARY | log_entries | ref | log_code,partner_id,customer_id,customer_partner | customer_id | 4 | const | 27 | Using where; Using filesort |
| 1 | PRIMARY | log_codes | eq_ref | PRIMARY,code_overview | PRIMARY | 2 | log_entries.log_code | 1 | |
| 2 | DEPENDENT SUBQUERY | log_codes | unique_subquery | PRIMARY,code_overview,overview_code | PRIMARY | 2 | func | 1 | Using where |
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
There isn't a simple rule for guaranteed success when it comes to indexing - you need to look at a reasonable period of typical calls to work out what will help in terms of performance.
All subsequent comments are therefore to be taken not as absolute rules:
An index is "good" if it quickly gets you to a small subset of the data rather than if it eliminates only half of the data (e.g. there is rarely value in an index on a gender column where there are only M/F as the possible entries). So how unique are the values within e.g. log_code, category_overview and partner_id?
For a given query it is often helpful to have a "covering" index, that is one that includes all the fields that are used by the query - however, if there are too many fields from a single table in a query you instead want an index that includes the fields in the "where" or "join" clause to identify the row and then join back to the table storage to get all the fields required.
So given the information you've provided, a candidate index on log_codes would include log_code and category_overview. Similarly on log_entries for log_code and partner_id. However these would need to be evaluated for how they affect performance.
Bear in mind that any given index may improve the read performance of a single query retrieving data but it will also slow down writes to the table where there is then a requirement to write more information i.e. where the new row fits in the additional index. This is why you need to look at the big picture of activity on the database to determine where indexes are worth it.
Well done for taking the time to update your question with the detail requested. I am sorry if that sounds patronising but it is amazing the number people who are not prepared to take the time to help themselves.
Adding a composite index across (customer_id, partner_id) on the log_entries table should give a significant benefit for the last of your example where clauses.
The output of your SHOW INDEXES for the log_codes table would suggest that it is not currently populated as it shows NULL for all but the PK. Is this the case?
EDIT Sorry. Just read your comment to KAJ's answer detailing table content. It might be worth running that SHOW INDEXES statement again as it looks like MySQL may have been building its stats.
Adding a composite index across (log_code, category_overview) for the log_codes table should help but you will need to check the explain output to see if it is being used.
As a very crude general rule you want to create composite indices starting with the columns with the highest cardinality but this is not always the case. It will depend heavily on data distribution and query structure.
UPDATE I have created a mockup of your dataset and added the following indices. They give significant improvement based on your sample WHERE clauses -
ALTER TABLE `log_codes`
ADD INDEX `IX_overview_code` (`category_overview`, `log_code`);
ALTER TABLE `log_entries`
ADD INDEX `IX_partner_code` (`partner_id`, `log_code`),
ADD INDEX `IX_customer_partner_code` (`customer_id`, `partner_id`, `log_code`);
The last index is quite expensive in terms of disk space and degradation of insert performance but gives very fast SELECT based on your final WHERE clause example. My sample dataset has just over 1M records in the log_entries table with quite even distribution across the partner and customer IDs. Three of your sample WHERE clauses execute in less than a second but the one with category_overview as the only criterion is very slow although still sub-second with only 200k rows.