Why does the output of EXPLAIN change after each SHOW index? - mysql

I was trying to improve the performance of some queries with indexes, using EXPLAIN, and I noticed that each time I ran SHOW INDEX FROM TableB; the value in the rows column of the EXPLAIN output for a query changed.
Example:
mysql> EXPLAIN Select A.id
From TableA A
Inner join TableB B
On A.address = B.address And A.code = B.code
Group by A.id
Having count(distinct B.id) = 1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | B | index | test_index | PRIMARY | 518 | NULL | 10561 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | A | eq_ref | PRIMARY | PRIMARY | 514 | db.B.address,db.B.code | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)
mysql> show index from TableB;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| TableB | 0 | PRIMARY | 1 | id | A | 7 | NULL | NULL | | BTREE | |
| TableB | 0 | PRIMARY | 2 | address | A | 21 | NULL | NULL | | BTREE | |
| TableB | 0 | PRIMARY | 3 | code | A | 10402 | NULL | NULL | | BTREE | |
| TableB | 1 | test_index | 1 | address | A | 1 | NULL | NULL | | BTREE | |
| TableB | 1 | test_index | 2 | code | A | 10402 | NULL | NULL | | BTREE | |
| TableB | 1 | test_index | 3 | id | A | 10402 | NULL | NULL | | BTREE | |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.03 sec)
and...
mysql> EXPLAIN Select A.id
From TableA A
Inner join TableB B
On A.address = B.address And A.code = B.code Group by A.id
Having count(distinct B.id) = 1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | B | index | test_index | PRIMARY | 518 | NULL | 9800 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | A | eq_ref | PRIMARY | PRIMARY | 514 | db.B.address,db.B.code | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)
Why does this happen?

The rows column should be taken as a rough estimate only. It's not a precise number.
It's based on statistical estimates of how many rows will be examined during a query. The actual number of rows cannot be known until you actually execute the query.
The statistics are based on samples read from the table periodically. These samples are re-read occasionally, for example after you run ANALYZE TABLE or certain INFORMATION_SCHEMA queries, or certain SHOW statements.
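As a hedged illustration of those refresh triggers: for InnoDB tables, the server variable innodb_stats_on_metadata controls whether metadata statements such as SHOW INDEX re-sample the statistics (it defaulted to ON in older MySQL versions and to OFF from 5.6.6). The sketch below assumes TableB is an InnoDB table and that your account may set global variables; it has not been run against the asker's server.

```sql
-- Check whether metadata statements (SHOW INDEX, SHOW TABLE STATUS,
-- INFORMATION_SCHEMA queries) trigger a re-sampling of InnoDB statistics.
SHOW VARIABLES LIKE 'innodb_stats_on_metadata';

-- Optional: stop metadata statements from refreshing the estimates,
-- so the rows column stays stable between SHOW INDEX calls.
SET GLOBAL innodb_stats_on_metadata = OFF;

-- Refresh the statistics explicitly when you want a new sample.
ANALYZE TABLE TableB;
```

Even with an explicit ANALYZE TABLE the figures remain estimates based on sampled pages, so two runs can still report slightly different row counts.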

I don't find a 20% variation in statistics to be a big deal. In many situations, think of the graph as an upturned parabola: you only need to know which side of the minimum point you are on. In complex queries, where the optimizer is likely to goof, it needs a lot more than simple statistics, such as the histograms in MariaDB 10.0 / 10.1. (I don't have enough experience with them to say whether they make much headway.)
Your particular query is probably going to be performed in only one way, regardless of the statistics. An example of a complicated query would be a JOIN with WHERE clauses filtering each table. The optimizer has to decide which table to start with. Another case is a single table with a WHERE and ORDER BY and they cannot both be handled by a single index -- should it use an index to filter, but then have to sort? or should it use an index for ORDER BY, but then have to filter on the fly?

Related

How to make an efficient UPDATE like my SELECT in MariaDB

Background
I made a small table of 10 rows (SavedAnimals) from a SELECT that I had already run.
I have a massive table (animals) which I would like to UPDATE using the rows with the same id as each row in my new table.
What I have tried so far
I can quickly SELECT the desired rows from the big table like this:
mysql> EXPLAIN SELECT * FROM animals WHERE ignored=0 and id IN (SELECT animal_id FROM SavedAnimals);
+------+--------------+-------------------------------+--------+---------------+---------+---------+----------------------------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+-------------------------------+--------+---------------+---------+---------+----------------------------------------------------------+------+-------------+
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 10 | |
| 1 | PRIMARY | animals | eq_ref | PRIMARY | PRIMARY | 8 | db_staging.SavedAnimals.animal_id | 1 | Using where |
| 2 | MATERIALIZED | SavedAnimals | ALL | NULL | NULL | NULL | NULL | 10 | |
+------+--------------+-------------------------------+--------+---------------+---------+---------+----------------------------------------------------------+------+-------------+
But the "same" command on the UPDATE is not quick:
mysql> EXPLAIN UPDATE animals SET ignored=1, ignored_when=CURRENT_TIMESTAMP WHERE ignored=0 and id IN (SELECT animal_id FROM SavedAnimals);
+------+--------------------+-------------------------------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+-------------------------------+-------+---------------+---------+---------+------+----------+-------------+
| 1 | PRIMARY | animals | index | NULL | PRIMARY | 8 | NULL | 34269464 | Using where |
| 2 | DEPENDENT SUBQUERY | SavedAnimals | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
+------+--------------------+-------------------------------+-------+---------------+---------+---------+------+----------+-------------+
2 rows in set (0.00 sec)
The UPDATE command never finishes if I run it.
QUESTION
How do I make MariaDB use the MATERIALIZED select_type for the UPDATE like it does for the SELECT?
OR
Is there a totally separate way that I should approach this which would be quick?
Notes
Version: 10.3.23-MariaDB-log
Use JOIN rather than WHERE...IN. MySQL tends to optimize them better.
UPDATE animals AS a
JOIN SavedAnimals AS sa ON a.id = sa.animal_id
SET a.ignored=1, a.ignored_when=CURRENT_TIMESTAMP
WHERE a.ignored = 0
You should find an EXISTS clause more efficient than an IN clause. For example:
UPDATE animals a
SET a.ignored = 1,
a.ignored_when = CURRENT_TIMESTAMP
WHERE a.ignored = 0
AND EXISTS (SELECT * FROM SavedAnimals sa WHERE sa.animal_id = a.id)
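Before running either rewrite against the big table, it may be worth confirming that the plan actually changed. A minimal sketch, assuming the same table and column names as in the question and relying on the EXPLAIN UPDATE support already used there:

```sql
-- Show the plan for the JOIN form of the UPDATE without executing it.
EXPLAIN
UPDATE animals AS a
JOIN SavedAnimals AS sa ON a.id = sa.animal_id
SET a.ignored = 1,
    a.ignored_when = CURRENT_TIMESTAMP
WHERE a.ignored = 0;
```

The thing to look for is the animals row: ideally it shows type eq_ref on PRIMARY with a rows estimate of about 1, rather than the index scan over roughly 34 million rows seen in the original plan. The same check can be applied to the EXISTS version.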

Comparing performance of two queries in MySQL

I am trying to optimize a query on a MySQL table I've created. I expect that there will be a great many rows in the table. For a similar question, the accepted answer and the top-voted answer suggest two different approaches.
I wrote these two queries and want to know which one is more performant.
SELECT uv.*
FROM UserVisit uv INNER JOIN
(SELECT ID,MAX(visitDate) visitDate
FROM UserVisit GROUP BY ID) last
ON (uv.ID = last.ID AND uv.visitDate = last.visitDate);
Running this with EXPLAIN yields:
+----+-------------+------------+--------+---------------+---------+---------+--------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+--------------------------------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | |
| 1 | PRIMARY | uv | eq_ref | PRIMARY | PRIMARY | 11 | last.playscanID,last.visitDate | 1 | |
| 2 | DERIVED | UserVisit | index | NULL | PRIMARY | 11 | NULL | 4 | Using index |
+----+-------------+------------+--------+---------------+---------+---------+--------------------------------+------+-------------+
3 rows in set (0.01 sec)
And the other query:
SELECT lastVisits.*
FROM ( SELECT * FROM UserVisit ORDER BY visitDate DESC ) lastVisits
GROUP BY lastVisits.ID
Running that with EXPLAIN yields:
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | Using temporary; Using filesort |
| 2 | DERIVED | UserVisit | ALL | NULL | NULL | NULL | NULL | 4 | Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
2 rows in set (0.00 sec)
I am uncertain how to interpret the result of the two EXPLAINs.
Which of these queries can I expect to be faster and why?
EDIT:
This is the way UserVisit table looks:
+----------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+---------------------+------+-----+---------+-------+
| ID | bigint(20) unsigned | NO | PRI | NULL | |
| visitDate | date | NO | PRI | NULL | |
| visitTime | time | NO | | NULL | |
| analysisResult | decimal(3,2) | NO | | NULL | |
+----------------+---------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
Firstly, you might want to read the manual on EXPLAIN. It's a dense read, but it should provide most of the information you want.
Secondly, as Strawberry says, the second query works by accident. The behaviour may change in future versions, and your query would not return an error, just different data. That's nearly always a bad thing.
Finally, the EXPLAIN output suggests that version 1 will be faster. In the Extra column, it reports Using index, which is much faster than Using filesort. Without a schema it's hard to be sure, but I think you will also benefit from a compound key on ID and visitDate.
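To see why the second query only "works by accident", one option is to enable the ONLY_FULL_GROUP_BY SQL mode for the current session. This is a sketch under the assumption that the UserVisit table from the question exists; the statement is expected to be rejected rather than return rows.

```sql
-- Replaces the SQL mode for this connection only; other sessions are unaffected.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- With the mode enabled, this is expected to fail, because the selected
-- columns other than ID are neither aggregated nor listed in GROUP BY,
-- so the server has no defined rule for which row's values to return.
SELECT lastVisits.*
FROM (SELECT * FROM UserVisit ORDER BY visitDate DESC) lastVisits
GROUP BY lastVisits.ID;
```

The first query, which joins back on MAX(visitDate), does not depend on that undefined choice and is not affected by the mode.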

Is this SQL considered a JOIN?

I am using MySQL to get the category name from one table based on the category ID in a "Module" table.
The SQL below works fine for my needs, but I want to know whether it is considered a JOIN or not,
since it does not use the JOIN keyword.
SELECT `mo_category_fk` , `mo_name_vc` , `mc_name_vc`
FROM x_modcats mc, x_modules m
WHERE mc.mc_id_pk = m.mo_category_fk
AND m.mo_folder_vc = :module
Yes - In MySQL implicit and explicit joins have identical execution plans. You can verify this with EXPLAIN. But here is a sample from another thread:
mysql> explain select * from table1 a inner join table2 b on a.pid = b.pid;
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
| 1 | SIMPLE | b | ALL | PRIMARY | NULL | NULL | NULL | 986 | |
| 1 | SIMPLE | a | ref | pid | pid | 4 | schema.b.pid | 70 | |
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
2 rows in set (0.02 sec)
mysql> explain select * from table1 a, table2 b where a.pid = b.pid;
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
| 1 | SIMPLE | b | ALL | PRIMARY | NULL | NULL | NULL | 986 | |
| 1 | SIMPLE | a | ref | pid | pid | 4 | schema.b.pid | 70 | |
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
2 rows in set (0.00 sec)
Yes, you are joining. Per the documentation, a comma (,) can be used as a substitute for the JOIN keyword, except that you can't use the very helpful ON clause with it. However, you have a condition in the WHERE clause that connects the tables. In my opinion, it makes more sense to express it as part of the FROM clause:
SELECT mo_category_fk, mo_name_vc, mc_name_vc
FROM x_modcats mc
JOIN x_modules m ON (mc.mc_id_pk = m.mo_category_fk)
WHERE m.mo_folder_vc = :module

MySQL index slowing down query

MySQL Server version: 5.0.95
Tables All: InnoDB
I am having an issue with a MySQL query. Basically, I am finding that if I index a particular varchar(50) field, tag.name, my queries take about 10 times longer than without the index. I am trying to speed this query up, but my efforts seem to be counterproductive.
The culprit line and field seems to be:
WHERE `t`.`name` IN ('news','home')
I have noticed that if I query the tag table directly, without a join, using the same criteria and with the name field indexed, I do not have the issue. It actually runs faster, as expected.
EXAMPLE Query
SELECT `a`.*, `u`.`pen_name`
FROM `tag_link` `tl`
INNER JOIN `tag` `t`
ON `t`.`tag_id` = `tl`.`tag_id`
INNER JOIN `article` `a`
ON `a`.`article_id` = `tl`.`link_id`
INNER JOIN `user` `u`
ON `a`.`user_id` = `u`.`user_id`
WHERE `t`.`name` IN ('news','home')
AND `tl`.`type` = 'article'
AND `a`.`featured` = 'featured'
GROUP BY `article_id`
LIMIT 0 , 5
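EXPLAIN with index
+----+-------------+-------+--------+--------------------------+---------+---------+-------------------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |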
+----+-------------+-------+--------+--------------------------+---------+---------+-------------------+------+-----------------------------------------------------------+
| 1 | SIMPLE | t | range | PRIMARY,name | name | 152 | NULL | 4 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | tl | ref | tag_id,link_id,link_id_2 | tag_id | 4 | portal.t.tag_id | 10 | Using where |
| 1 | SIMPLE | a | eq_ref | PRIMARY,fk_article_user1 | PRIMARY | 4 | portal.tl.link_id | 1 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | portal.a.user_id | 1 | |
+----+-------------+-------+--------+--------------------------+---------+---------+-------------------+------+-----------------------------------------------------------+
EXPLAIN without index
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
| 1 | SIMPLE | a | index | PRIMARY,fk_article_user1 | PRIMARY | 4 | NULL | 8742 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | portal.a.user_id | 1 | |
| 1 | SIMPLE | tl | ref | tag_id,link_id,link_id_2 | link_id | 4 | portal.a.article_id | 3 | Using where |
| 1 | SIMPLE | t | eq_ref | PRIMARY | PRIMARY | 4 | portal.tl.tag_id | 1 | Using where |
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
TABLE CREATE
CREATE TABLE `tag` (
`tag_id` int(11) NOT NULL auto_increment,
`name` varchar(50) NOT NULL,
`type` enum('layout','image') NOT NULL,
`create_dttm` datetime default NULL,
PRIMARY KEY (`tag_id`)
) ENGINE=InnoDB AUTO_INCREMENT=43077 DEFAULT CHARSET=utf8
INDEXES
SHOW INDEX FROM tag_link;
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tag_link | 0 | PRIMARY | 1 | tag_link_id | A | 42023 | NULL | NULL | | BTREE | |
| tag_link | 1 | tag_id | 1 | tag_id | A | 10505 | NULL | NULL | | BTREE | |
| tag_link | 1 | link_id | 1 | link_id | A | 14007 | NULL | NULL | | BTREE | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
SHOW INDEX FROM article;
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| article | 0 | PRIMARY | 1 | article_id | A | 5723 | NULL | NULL | | BTREE | |
| article | 1 | fk_article_user1 | 1 | user_id | A | 1 | NULL | NULL | | BTREE | |
| article | 1 | create_dttm | 1 | create_dttm | A | 5723 | NULL | NULL | YES | BTREE | |
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Final Solution
It seems that MySQL was simply sorting the data incorrectly. In the end it turned out to be faster to query the tag table in a subquery that returns the IDs.
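The final query was not posted. One possible reading of "the tag table as a subquery returning the IDs" is sketched below; the exact shape is an assumption, built only from the tables and columns shown in the question.

```sql
-- Hypothetical rewrite: the tag names are resolved in a subquery instead of
-- a join, so the tag table is no longer a candidate for the first table in
-- the join order.
SELECT `a`.*, `u`.`pen_name`
FROM `tag_link` `tl`
INNER JOIN `article` `a`
  ON `a`.`article_id` = `tl`.`link_id`
INNER JOIN `user` `u`
  ON `a`.`user_id` = `u`.`user_id`
WHERE `tl`.`tag_id` IN (SELECT `tag_id` FROM `tag` WHERE `name` IN ('news','home'))
  AND `tl`.`type` = 'article'
  AND `a`.`featured` = 'featured'
GROUP BY `a`.`article_id`
LIMIT 0, 5;
```

Whether this is faster depends on the server version and the data, so it is worth comparing the EXPLAIN output with the two plans shown above.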
It seems that article_id is the primary key for the article table.
Since you're grouping by article_id, MySQL needs to return the records in order by that column, in order to perform the GROUP BY.
You can see that without the index, it scans all records in the article table, but they're at least in order by article_id, so no later sort is required. The LIMIT optimization can be applied here, since it's already in order, it can just stop after it gets five rows.
In the query with the index on tag.name, instead of scanning the entire articles table, it utilizes the index, but against the tag table, and starts there. Unfortunately, when doing this, the records must later be sorted by article.article_id in order to complete the GROUP BY clause. The LIMIT optimization can't be applied since it must return the entire result set, then order it, in order to get the first 5 rows.
In this case, MySQL just guesses wrongly.
Without the LIMIT clause, I'm guessing that using the index is faster, which is maybe what MySQL was guessing.
How big are your tables?
I noticed that the first EXPLAIN shows "Using temporary; Using filesort", which is usually a bad sign. The intermediate result is likely being written to disk, which makes the query much slower than one handled entirely in memory.
Also try to avoid using "select *" and instead query the minimum fields needed.

Proper index/query when using INNER JOIN

I am not sure how to make a decent index that will capture category/log_code properly. Maybe I also need to change my query? I appreciate any input!
All SELECTS contain:
SELECT logentry_id, date, log_codes.log_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
ORDER BY logentry_id DESC
Query can be as above, but usually has a WHERE to specify the category of log_codes to show, and/or partner, and/or customer. Examples of WHERE:
WHERE partner_id = 1
WHERE log_codes.category_overview = 1
WHERE partner_id = 1 AND log_codes.category_overview = 1
WHERE partner_id = 1 AND customer_id = 1 AND log_codes.category_overview = 1
Database structure:
CREATE TABLE IF NOT EXISTS `log_codes` (
`log_code` smallint(6) NOT NULL,
`log_desc` varchar(255),
`category_mail` tinyint(1) NOT NULL,
`category_overview` tinyint(1) NOT NULL,
`category_cron` tinyint(1) NOT NULL,
`category_documents` tinyint(1) NOT NULL,
`category_error` tinyint(1) NOT NULL,
PRIMARY KEY (`log_code`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `log_entries` (
`logentry_id` int(11) NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL,
`log_code` smallint(6) NOT NULL,
`partner_id` int(11) NOT NULL,
`customer_id` int(11) NOT NULL,
PRIMARY KEY (`logentry_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;
EDIT: Added indexes on fields, here is output of SHOW INDEXES:
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| log_codes | 0 | PRIMARY | 1 | log_code | A | 97 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_mail | 1 | category_mail | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_overview | 1 | category_overview | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_cron | 1 | category_cron | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_documents | 1 | category_documents | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_error | 1 | category_error | A | 1 | NULL | NULL | | BTREE | | |
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| log_entries | 0 | PRIMARY | 1 | logentry_id | A | 163020 | NULL | NULL | | BTREE | | |
| log_entries | 1 | log_code | 1 | log_code | A | 90 | NULL | NULL | | BTREE | | |
| log_entries | 1 | partner_id | 1 | partner_id | A | 6 | NULL | NULL | YES | BTREE | | |
| log_entries | 1 | customer_id | 1 | customer_id | A | 20377 | NULL | NULL | YES | BTREE | | |
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
EDIT 2: Added composite indexes: (log_code, category_overview) and (category_overview, log_code) on log_codes, and (customer_id, partner_id) on log_entries.
Here are some EXPLAIN output (query returns 66818 rows):
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| 1 | SIMPLE | log_entries | ref | log_code,partner_id | partner_id | 2 | const | 156110 | Using where; Using filesort |
| 1 | SIMPLE | log_codes | eq_ref | PRIMARY,code_overview,overview_code | PRIMARY | 2 | log_entries.log_code | 1 | Using where |
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
But I also have some LEFT JOINs that I did not think would affect the index design, but they cause a "Using temporary" problem. Here is EXPLAIN output (query returns 66818 rows):
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
LEFT JOIN partners ON log_entries.partner_id = partners.partner_id
LEFT JOIN joined_table1 ON log_entries.t1_id = joined_table1.t1_id
LEFT JOIN joined_table2 ON log_entries.t2_id = joined_table2.t2_id
LEFT JOIN joined_table3 ON log_entries.t3_id = joined_table3.t3_id
LEFT JOIN joined_table4 ON joined_table3.t4_id = joined_table4.t4_id
LEFT JOIN joined_table5 ON log_entries.t5_id = joined_table5.t5_id
LEFT JOIN joined_table6 ON log_entries.t6_id = joined_table6.t6_id
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC;
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
| 1 | SIMPLE | log_codes | ref | PRIMARY,code_overview,overview_code | overview_code | 1 | const | 54 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | log_entries | ref | log_code,partner_id | log_code | 2 | log_codes.log_code | 1811 | Using where |
| 1 | SIMPLE | partners | const | PRIMARY | PRIMARY | 2 | const | 1 | Using index |
| 1 | SIMPLE | joined_table1 | eq_ref | PRIMARY | PRIMARY | 1 | log_entries.t1_id | 1 | Using index |
| 1 | SIMPLE | joined_table2 | eq_ref | PRIMARY | PRIMARY | 1 | log_entries.t2_id | 1 | Using index |
| 1 | SIMPLE | joined_table3 | eq_ref | PRIMARY | PRIMARY | 3 | log_entries.t3_id | 1 | |
| 1 | SIMPLE | joined_table4 | eq_ref | PRIMARY | PRIMARY | 3 | joined_table3.t4_id | 1 | Using index |
| 1 | SIMPLE | joined_table5 | eq_ref | PRIMARY | PRIMARY | 4 | log_entries.t5_id | 1 | Using index |
| 1 | SIMPLE | joined_table6 | eq_ref | PRIMARY | PRIMARY | 4 | log_entries.t6_id | 1 | Using index |
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
Don't know if it's a good or bad idea, but a subquery seems to get rid of the "Using temporary". Here is EXPLAIN output of two common scenarios. This query returns 66818 rows:
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| 1 | PRIMARY | log_entries | ref | log_code,partner_id | partner_id | 2 | const | 156110 | Using where; Using filesort |
| 1 | PRIMARY | log_codes | eq_ref | PRIMARY,code_overview | PRIMARY | 2 | log_entries.log_code | 1 | |
| 2 | DEPENDENT SUBQUERY | log_codes | unique_subquery | PRIMARY,code_overview,overview_code | PRIMARY | 2 | func | 1 | Using where |
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
And an overview for a single customer; this query returns 12 rows:
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_entries.customer_id = 10000
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
| 1 | PRIMARY | log_entries | ref | log_code,partner_id,customer_id,customer_partner | customer_id | 4 | const | 27 | Using where; Using filesort |
| 1 | PRIMARY | log_codes | eq_ref | PRIMARY,code_overview | PRIMARY | 2 | log_entries.log_code | 1 | |
| 2 | DEPENDENT SUBQUERY | log_codes | unique_subquery | PRIMARY,code_overview,overview_code | PRIMARY | 2 | func | 1 | Using where |
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
There isn't a simple rule for guaranteed success when it comes to indexing - you need to look at a reasonable period of typical calls to work out what will help in terms of performance.
All subsequent comments are therefore to be taken not as absolute rules:
An index is "good" if it quickly gets you to a small subset of the data rather than if it eliminates only half of the data (e.g. there is rarely value in an index on a gender column where there are only M/F as the possible entries). So how unique are the values within e.g. log_code, category_overview and partner_id?
For a given query it is often helpful to have a "covering" index, that is one that includes all the fields that are used by the query - however, if there are too many fields from a single table in a query you instead want an index that includes the fields in the "where" or "join" clause to identify the row and then join back to the table storage to get all the fields required.
So given the information you've provided, a candidate index on log_codes would include log_code and category_overview. Similarly on log_entries for log_code and partner_id. However these would need to be evaluated for how they affect performance.
Bear in mind that any given index may improve the read performance of a single query retrieving data but it will also slow down writes to the table where there is then a requirement to write more information i.e. where the new row fits in the additional index. This is why you need to look at the big picture of activity on the database to determine where indexes are worth it.
Well done for taking the time to update your question with the detail requested. I am sorry if that sounds patronising, but it is amazing the number of people who are not prepared to take the time to help themselves.
Adding a composite index across (customer_id, partner_id) on the log_entries table should give a significant benefit for the last of your example where clauses.
The output of your SHOW INDEXES for the log_codes table would suggest that it is not currently populated as it shows NULL for all but the PK. Is this the case?
EDIT Sorry. Just read your comment to KAJ's answer detailing table content. It might be worth running that SHOW INDEXES statement again as it looks like MySQL may have been building its stats.
Adding a composite index across (log_code, category_overview) for the log_codes table should help but you will need to check the explain output to see if it is being used.
As a very crude general rule you want to create composite indices starting with the columns with the highest cardinality but this is not always the case. It will depend heavily on data distribution and query structure.
UPDATE I have created a mockup of your dataset and added the following indices. They give significant improvement based on your sample WHERE clauses -
ALTER TABLE `log_codes`
ADD INDEX `IX_overview_code` (`category_overview`, `log_code`);
ALTER TABLE `log_entries`
ADD INDEX `IX_partner_code` (`partner_id`, `log_code`),
ADD INDEX `IX_customer_partner_code` (`customer_id`, `partner_id`, `log_code`);
The last index is quite expensive in terms of disk space and degradation of insert performance but gives very fast SELECT based on your final WHERE clause example. My sample dataset has just over 1M records in the log_entries table with quite even distribution across the partner and customer IDs. Three of your sample WHERE clauses execute in less than a second but the one with category_overview as the only criterion is very slow although still sub-second with only 200k rows.