MySQL 5.6 long WHERE IN query very slow - mysql

Since version 5.6 of MySQL, a very simple albeit long query takes several orders of magnitude longer than in 5.5.
The schema: three tables, one with elements, one with categories and an M:N table between those. Create statements:
CREATE TABLE element (
id int(11) NOT NULL AUTO_INCREMENT,
name varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=4257455 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE category (
id int(11) NOT NULL AUTO_INCREMENT,
name varchar(255) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=76 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE elements_categories (
id int(11) NOT NULL AUTO_INCREMENT,
element_id int(11) NOT NULL,
category_id int(11) NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY element_id (element_id,category_id),
KEY elements_categories_element_id (element_id),
KEY elements_categories_category_id (category_id),
CONSTRAINT D7d489b06a407a0c1c70f108712c815e FOREIGN KEY (category_id) REFERENCES category (id),
CONSTRAINT co_element_id_57f4f2ec0db9441c_fk_element_id FOREIGN KEY (element_id) REFERENCES element (id)
) ENGINE=InnoDB AUTO_INCREMENT=88131737 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The query:
SELECT elements_categories.element_id, category.id, category.name
FROM category
INNER JOIN elements_categories
ON category.id = elements_categories.category_id
WHERE elements_categories.element_id IN (1, 2, 3, ...)
So, the element table does not even play a role in this query; I already got a bunch of IDs from it with a previous query. (Disclaimer: I'm using an ORM, and inlining the first query did not make things faster.) The number of values in the IN clause can become very big, in my example 14240. That's not a problem; it takes a tenth of a second or so. This is the execution plan:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+--------+---------------------------------------------------------------------------+------------+---------+---------------------------------+-------+--------------------------+
| 1 | SIMPLE | elements_categories | range | element_id,elements_categories_element_id,elements_categories_category_id | element_id | 4 | NULL | 42720 | Using where; Using index |
| 1 | SIMPLE | category | eq_ref | PRIMARY | PRIMARY | 4 | elements_categories.category_id | 1 | NULL |
When I add one more element, the execution time explodes to 60 seconds plus a fetch time of 200 seconds. The execution plan also changes to this:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+------+---------------------------------------------------------------------------+---------------------------------+---------+-------------+------+-------------+
| 1 | SIMPLE | category | ALL | PRIMARY | NULL | NULL | NULL | 75 | NULL |
| 1 | SIMPLE | elements_categories | ref | element_id,elements_categories_element_id,elements_categories_category_id | elements_categories_category_id | 4 | category.id | 760 | Using where |
The range and eq_ref lookups are exchanged for ALL and ref, the order of the tables is switched, and elements_categories.category_id is no longer used as ref, although it is the foreign key between those two tables. I don't get why the plan changes like this.
There are 75 categories and 4,300,000 elements and 1,600,000 assignments.
My guess is that I'm exceeding some size limit here, but cannot figure out which one. Also I didn't change anything from the MySQL 5.5 installation which stuck to the former execution plan all the time.

There are several ways to trick the optimizer into using the correct plan:
Add an index hint: ... JOIN elements_categories FORCE INDEX (element_id)...
Swap the tables around and make category a LEFT JOIN (assuming every elements_categories has a category). This is not a generic solution, but should work in this case.
Make a temp table with the element_id's and JOIN it in all of your queries instead of using IN (1,2,3...). You should also be able to use IN (SELECT id FROM <temp table>) instead of literals.
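The first and third options above might look roughly like this. This is a minimal sketch against the schema from the question; the temporary table name tmp_element_ids and the short ID lists are placeholders, not something from the original post.

```sql
-- Option 1: force the (element_id, category_id) unique key, which is named element_id.
SELECT ec.element_id, c.id, c.name
FROM category AS c
INNER JOIN elements_categories AS ec FORCE INDEX (element_id)
        ON c.id = ec.category_id
WHERE ec.element_id IN (1, 2, 3);

-- Option 3: put the IDs in a temporary table and join against it.
-- tmp_element_ids is a hypothetical name.
CREATE TEMPORARY TABLE tmp_element_ids (
    id INT NOT NULL PRIMARY KEY
);

INSERT INTO tmp_element_ids (id) VALUES (1), (2), (3);

SELECT ec.element_id, c.id, c.name
FROM tmp_element_ids AS t
INNER JOIN elements_categories AS ec ON ec.element_id = t.id
INNER JOIN category AS c ON c.id = ec.category_id;
```

Running EXPLAIN on each variant should show whether the optimizer actually starts from elements_categories via the element_id key; that is worth checking rather than assuming.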

The reason the optimizer chooses another plan when you have different parameters is that it looks at statistics from the tables and guesses which index will remove the most rows, but this is a guess and can often be wrong.
If you know better, you need to tell the optimizer what to do with an index hint, like the first example @Vatev gives.
An interesting thing about the optimizer is that since an index adds an extra layer of indirection and thus potentially more reads it has to remove more than half the table to be considered useful by the optimizer. (I don't remember how much more than half...)
Another interesting feature of the optimizer is that if the index contains all the information needed from a table, it can avoid looking up the actual row, so depending on your situation you might benefit from adding an extra column to the index. This optimization is used in the first query plan ("Using index"), but not the second. Thus adding "element_id" to your index "elements_categories_category_id" might speed things up. See http://dev.mysql.com/doc/refman/5.6/en/explain-output.html
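A sketch of that covering-index idea, assuming the schema from the question; the index name below is made up for illustration. InnoDB secondary indexes carry the primary key (id here), not element_id, so element_id has to be added explicitly.

```sql
-- Hypothetical index name; (category_id, element_id) covers the columns the
-- second plan needs from elements_categories.
ALTER TABLE elements_categories
    ADD INDEX elements_categories_category_element (category_id, element_id);

-- Re-check the plan; "Using index" in Extra for elements_categories would
-- indicate that the row lookups are avoided.
EXPLAIN
SELECT elements_categories.element_id, category.id, category.name
FROM category
INNER JOIN elements_categories
        ON category.id = elements_categories.category_id
WHERE elements_categories.element_id IN (1, 2, 3);
```

Whether this makes the second plan fast enough with the full 14,000-value IN list is something only a test on the real data can show.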

Related

MySQL: why is a query using a VIEW less efficient compared to a query directly using the view's underlying JOIN?

I have three tables, bug, bugrule and bugtrace, for which relationships are:
bug 1--------N bugrule
id = bugid
bugrule 0---------N bugtrace
id = ruleid
Because I'm almost always interested in relations between bug <---> bugtrace I have created an appropriate VIEW which is used as part of several queries. Interestingly, queries using this VIEW have significantly worse performance than equivalent queries using the underlying JOIN explicitly.
VIEW definition:
CREATE VIEW bugtracev AS
SELECT t.*, r.bugid
FROM bugtrace AS t
LEFT JOIN bugrule AS r ON t.ruleid=r.id
WHERE r.version IS NULL
Execution plan for a query using the VIEW (bad performance):
mysql> explain
SELECT c.id,state,
(SELECT COUNT(DISTINCT(t.id)) FROM bugtracev AS t
WHERE t.bugid=c.id)
FROM bug AS c
WHERE c.version IS NULL
AND c.id<10;
+----+--------------------+-------+-------+---------------+--------+---------+-----------------+---------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+--------+---------+-----------------+---------+-----------------------+
| 1 | PRIMARY | c | range | id_2,id | id_2 | 8 | NULL | 3 | Using index condition |
| 2 | DEPENDENT SUBQUERY | t | index | NULL | ruleid | 9 | NULL | 1426004 | Using index |
| 2 | DEPENDENT SUBQUERY | r | ref | id_2,id | id_2 | 8 | bugapp.t.ruleid | 1 | Using where |
+----+--------------------+-------+-------+---------------+--------+---------+-----------------+---------+-----------------------+
3 rows in set (0.00 sec)
Execution plan for a query using the underlying JOIN directly (good performance):
mysql> explain
SELECT c.id,state,
(SELECT COUNT(DISTINCT(t.id))
FROM bugtrace AS t
LEFT JOIN bugrule AS r ON t.ruleid=r.id
WHERE r.version IS NULL
AND r.bugid=c.id)
FROM bug AS c
WHERE c.version IS NULL
AND c.id<10;
+----+--------------------+-------+-------+---------------+--------+---------+-------------+--------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+--------+---------+-------------+--------+-----------------------+
| 1 | PRIMARY | c | range | id_2,id | id_2 | 8 | NULL | 3 | Using index condition |
| 2 | DEPENDENT SUBQUERY | r | ref | id_2,id,bugid | bugid | 8 | bugapp.c.id | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | t | ref | ruleid | ruleid | 9 | bugapp.r.id | 713002 | Using index |
+----+--------------------+-------+-------+---------------+--------+---------+-------------+--------+-----------------------+
3 rows in set (0.00 sec)
CREATE TABLE statements (with irrelevant columns removed) are:
mysql> show create table bug;
CREATE TABLE `bug` (
`id` bigint(20) NOT NULL,
`version` int(11) DEFAULT NULL,
`state` varchar(16) DEFAULT NULL,
UNIQUE KEY `id_2` (`id`,`version`),
KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
mysql> show create table bugrule;
CREATE TABLE `bugrule` (
`id` bigint(20) NOT NULL,
`version` int(11) DEFAULT NULL,
`bugid` bigint(20) NOT NULL,
UNIQUE KEY `id_2` (`id`,`version`),
KEY `id` (`id`),
KEY `bugid` (`bugid`),
CONSTRAINT `bugrule_ibfk_1` FOREIGN KEY (`bugid`) REFERENCES `bug` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
mysql> show create table bugtrace;
CREATE TABLE `bugtrace` (
`id` bigint(20) NOT NULL,
`ruleid` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `ruleid` (`ruleid`),
CONSTRAINT `bugtrace_ibfk_1` FOREIGN KEY (`ruleid`) REFERENCES `bugrule` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
You're asking why the optimizer treats a couple of complex queries, with COUNT(DISTINCT val) and dependent subqueries, differently. It's hard to know why for sure.
You will probably fix most of your performance problem by getting rid of your dependent subquery, though. Try something like this:
SELECT c.id,state, cnt.cnt
FROM bug AS c
LEFT JOIN (
SELECT bugid, COUNT(DISTINCT id) cnt
FROM bugtracev
GROUP BY bugid
) cnt ON c.id = cnt.bugid
WHERE c.version IS NULL
AND c.id<10;
Why does this help? To satisfy the query the optimizer can choose to run the GROUP BY subquery just once, rather than many times. And, you can use EXPLAIN on the GROUP BY subquery to understand its performance.
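For instance, the inner query from the rewrite above can be examined on its own, using the view name from the question:

```sql
-- Plan for the GROUP BY subquery alone, independent of the outer query on bug.
EXPLAIN
SELECT bugid, COUNT(DISTINCT id) AS cnt
FROM bugtracev
GROUP BY bugid;
```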
You may also get a performance boost by creating a compound index on bugrule that matches the query in your view. Try this one.
CREATE INDEX bugrule_v ON bugrule (version, id, bugid)
and try switching the last two columns like so
CREATE INDEX bugrule_v2 ON bugrule (version, bugid, id)
These indexes are called covering indexes because they contain all the columns needed to satisfy your query. version appears first because that helps optimize WHERE version IS NULL in your view definition. That makes it faster.
Pro tip: Avoid using SELECT * in views and queries, especially when you have performance problems. Instead, list the columns you actually need. The * may force the query optimizer to avoid a covering index, even when the index would help.
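As a sketch of that tip, the view could list its columns explicitly. This assumes the queries only need id and ruleid from bugtrace, which are the only bugtrace columns shown in the question; the real table has more.

```sql
-- Same view as in the question, but with an explicit column list
-- instead of t.* (column choice is an assumption).
CREATE OR REPLACE VIEW bugtracev AS
SELECT t.id, t.ruleid, r.bugid
FROM bugtrace AS t
LEFT JOIN bugrule AS r ON t.ruleid = r.id
WHERE r.version IS NULL;
```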
If you are on MySQL 5.6 (or older), try MySQL 5.7 or later. According to What's New in MySQL 5.7?:
We have to a large extent unified the handling of derived tables and views. Until now, subqueries in the FROM clause (derived tables) were unconditionally materialized, while views created from the same query expressions were sometimes materialized and sometimes merged into the outer query. This behavior, beside being inconsistent, can lead to a serious performance penalty.

MySQL - Created index isn't showing up as possible key

I have the following table (it has more data columns, removed them because it would be a long post):
CREATE TABLE `members` (
`memberid` int(11) NOT NULL AUTO_INCREMENT,
`firstname` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
`lastname` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`memberid`),
KEY `members_lname_ix` (`lastname`)
) ENGINE=InnoDB AUTO_INCREMENT=1019 DEFAULT CHARSET=utf8
COLLATE=utf8_unicode_ci;
By default, a user only ever accesses 10-20 rows from this table at a time, and it is usually sorted by the lastname column; it's all paginated server side. So I decided to add an index on lastname to help with sorting. However, the index does not seem to be working the way I would expect. When I run EXPLAIN SELECT * FROM members ORDER BY lastname ASC I get:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra
1 | simple | members | ALL | null | null | null | null | 711 | using filesort
I can at least confirm the index exists because if I run SHOW INDEX FROM members I get:
Table | Non_Unique | Key_name | Seq_in_ix | Col_name | Collation | Cardinality | Sub part | Packed | Null | Ix type
members | 0 | PRIMARY | 1 | memberid | A | 711 | null | null | (blank) | BTREE
members | 1 | members_lname_ix | 1 | lastname | A | 711 | null | null | YES | BTREE
If I add USE INDEX (members_lname_ix), both possible_keys and key remain null. However, if I add FORCE INDEX (members_lname_ix), possible_keys remains null and key shows members_lname_ix. This is my first time trying to apply indexing, and to me this doesn't seem very intuitive: it feels like MySQL should know that I created an index for lastname, no? I can't quite figure out what I'm doing wrong here, unless I am misunderstanding something. Is the solution here to just keep using FORCE INDEX?
There are two ways to perform that query:
Plan A (as you were expecting):
Scan through the index sequentially, reading the entire (estimated) 711 rows.
Randomly look up each row in the data BTree. This involves reading the entire dataset.
Deliver the data in order.
Plan B (what it does):
Scan through the data, reading all 711 rows.
Sort the data
Deliver the sorted data.
Plan B does not touch the index at all; this was deemed to be a bigger savings than not having to sort the data.
In a table as tiny as yours, it would be hard to see a difference in speed. (In my test case, it took under 10 milliseconds either way.) In huge tables, the difference could be significant.
For optimal pagination, see http://mysql.rjweb.org/doc.php/pagination
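Since the application only shows 10-20 rows per page, the query worth examining is the paginated one rather than the full-table ORDER BY. A sketch, assuming a page size of 20 and a last-seen row of ('Smith', 123), both made up for illustration:

```sql
-- With a small LIMIT the optimizer may prefer walking members_lname_ix
-- over a filesort; compare this plan with the one for the unlimited query.
EXPLAIN
SELECT memberid, firstname, lastname
FROM members
ORDER BY lastname ASC, memberid ASC
LIMIT 20;

-- "Remember where you left off" pagination: fetch the page after the
-- last row already shown instead of using a growing OFFSET.
SELECT memberid, firstname, lastname
FROM members
WHERE lastname > 'Smith'
   OR (lastname = 'Smith' AND memberid > 123)
ORDER BY lastname ASC, memberid ASC
LIMIT 20;
```

Adding memberid to the ORDER BY makes the ordering deterministic when several members share a last name; in InnoDB the secondary index on lastname already carries memberid, so no extra index should be needed for this.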

MySQL with JOIN not using index

I have a problem with MySQL version 5.7.18. Earlier versions of MySQL behave as expected.
Here are two tables. Table 1:
CREATE TABLE `test_events` (
`id` int(11) NOT NULL,
`event` int(11) DEFAULT '0',
`manager` int(11) DEFAULT '0',
`base_id` int(11) DEFAULT '0',
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`client` int(11) DEFAULT '0',
`event_time` datetime DEFAULT '0000-00-00 00:00:00'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `test_events`
ADD PRIMARY KEY (`id`),
ADD KEY `client` (`client`),
ADD KEY `event_time` (`event_time`),
ADD KEY `manager` (`manager`),
ADD KEY `base_id` (`base_id`),
ADD KEY `create_time` (`create_time`);
And the second table:
CREATE TABLE `test_event_types` (
`id` int(11) NOT NULL,
`name` varchar(255) DEFAULT NULL,
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`base` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `test_event_types`
ADD PRIMARY KEY (`id`);
Let's try to select the last event from base "314":
EXPLAIN SELECT `test_events`.`create_time`
FROM `test_events`
LEFT JOIN `test_event_types`
ON ( `test_events`.`event` = `test_event_types`.`id` )
WHERE base = 314
ORDER BY `test_events`.`create_time` DESC
LIMIT 1;
+----+-------------+------------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
| 1 | SIMPLE | test_events | NULL | ALL | NULL | NULL | NULL | NULL | 434928 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | test_event_types | NULL | ALL | PRIMARY | NULL | NULL | NULL | 44 | 2.27 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+------------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
MySQL is not using an index and reads the whole table.
Without the WHERE clause:
EXPLAIN SELECT `test_events`.`create_time`
FROM `test_events`
LEFT JOIN `test_event_types`
ON ( `test_events`.`event` = `test_event_types`.`id` )
ORDER BY `test_events`.`create_time` DESC
LIMIT 1;
+----+-------------+------------------+------------+--------+---------------+-------------+---------+-----------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+---------------+-------------+---------+-----------------------+------+----------+-------------+
| 1 | SIMPLE | test_events | NULL | index | NULL | create_time | 4 | NULL | 1 | 100.00 | NULL |
| 1 | SIMPLE | test_event_types | NULL | eq_ref | PRIMARY | PRIMARY | 4 | m16.test_events.event | 1 | 100.00 | Using index |
+----+-------------+------------------+------------+--------+---------------+-------------+---------+-----------------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
Now it uses the index.
MySQL 5.5.55 uses the index in both cases. Why is that, and what can I do about it?
I don't know the difference you are seeing between your previous and current installations, but the server's behaviour makes sense.
SELECT test_events.create_time FROM test_events LEFT JOIN test_event_types ON ( test_events.event = test_event_types.id ) ORDER BY test_events.create_time DESC LIMIT 1;
In this query you do not have a where clause but you are fetching one row only. And that's after sorting by create_time which happens to have an index. And that index can be used for sorting. But let's see the second query.
SELECT test_events.create_time FROM test_events LEFT JOIN test_event_types ON ( test_events.event = test_event_types.id ) WHERE base = 314 ORDER BY test_events.create_time DESC LIMIT 1
You don't have an index on the base column, so no index can be used for that condition. To find the relevant records, MySQL has to do a table scan. Having identified the relevant rows, it needs to sort them. But in this case the query planner has decided that it's just not worth it to use the index on create_time.
I see several problems with your setup, the first being not having an index on base, as already mentioned. But why is base varchar? You appear to be storing integers in it.
ALTER TABLE test_events
ADD PRIMARY KEY (id),
ADD KEY client (client),
ADD KEY event_time (event_time),
ADD KEY manager (manager),
ADD KEY base_id (base_id),
ADD KEY create_time (create_time);
And making multiple single-column indexes like this doesn't make much sense in MySQL. That's because MySQL generally uses only one index per table in a query (index merge is the main exception). You would be far better off with one or two indexes, possibly multi-column ones.
I think your ideal index would contain both the create_time and event fields.
base = 314 with base VARCHAR... is a performance problem. Either put quotes around 314 or make base some integer type.
You appear not to need LEFT. If not, then do a plain JOIN so that the optimizer has the freedom to start with an INDEX(base), which is currently missing and needed.
As for the differences between 5.5 and 5.6 and 5.7, there have been a number of Optimization changes; you may have encountered a regression. But I don't want to chase that until you have improved the query and indexes.
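Put together, the suggestions above might look like the following sketch. The index names are invented for illustration, and the composite index on test_events is only one plausible choice; EXPLAIN on the real data should decide.

```sql
-- Index the filter column (idx_base is a hypothetical name).
ALTER TABLE test_event_types ADD INDEX idx_base (base);

-- Composite index so the rows for one event type can be read in
-- create_time order (hypothetical name).
ALTER TABLE test_events ADD INDEX idx_event_create_time (event, create_time);

-- Plain JOIN instead of LEFT JOIN, and a quoted literal because base is VARCHAR.
EXPLAIN
SELECT e.create_time
FROM test_events AS e
JOIN test_event_types AS t ON e.event = t.id
WHERE t.base = '314'
ORDER BY e.create_time DESC
LIMIT 1;
```

Comparing the string column against the string '314' avoids the implicit type conversion that prevents an index on base from being used.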
I stumbled upon the same scenario, where MySQL was using a table scan instead of an index search.
This could be because of one of the reasons mentioned in the MySQL docs:
The table is so small that it is faster to perform a table scan than to bother with a key lookup. This is common for tables with fewer than 10 rows and a short row length.
And when I checked the EXPLAIN of the query on the production server, with a large number of rows, it used an index search as expected.
It's one of the MySQL optimizations under the hood :)

mysql difference in index usage between MyISAM and InnoDB

I have these small tables, item and category:
CREATE TABLE `item` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(150) NOT NULL,
`category_id` mediumint(8) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `category_id` (`category_id`)
) CHARSET=utf8
CREATE TABLE `category` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(150) NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`)
) CHARSET=utf8
I have inserted 100 categories and 1000 items.
If I run this:
EXPLAIN SELECT item.id,category.name AS category_name FROM item JOIN category ON item.category_id=category.id;
Then, if the tables' engine is InnoDB I get:
+----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-------------+
| 1 | SIMPLE | category | index | PRIMARY | name | 452 | NULL | 103 | Using index |
| 1 | SIMPLE | item | ref | category_id | category_id | 3 | dbname.category.id | 5 | Using index |
+----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-------------+
Whereas, if I switch to MyISAM (with alter table engine=myisam) I get:
+----+-------------+----------+--------+---------------+---------+---------+-------------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+---------------+---------+---------+-------------------------+------+-------+
| 1 | SIMPLE | item | ALL | category_id | NULL | NULL | NULL | 1003 | |
| 1 | SIMPLE | category | eq_ref | PRIMARY | PRIMARY | 3 | dbname.item.category_id | 1 | |
+----+-------------+----------+--------+---------------+---------+---------+-------------------------+------+-------+
My question is, why this difference in the way indexes are handled?
In InnoDB, any secondary index internally contains the primary key column of the table. So the index name on column (name) is implicitly on columns (name, id).
This means that EXPLAIN shows your access to the category table as an "index-scan" (this is shown in the type column as "index"). By scanning the index, it also has access to the id column, which it uses to look up rows in the second table, item.
Then it also takes advantage of the item index on (category_id) which is really (category_id, id), and it is able to fetch item.id for your select-list simply by reading the index. No need to read the table at all (this is shown in the Extra column as "Using index").
MyISAM doesn't store primary keys with the secondary key in this way, so it can't get the same optimizations. In the MyISAM plan, the access to the item table is type "ALL", which means a table-scan.
I would expect the access to the MyISAM table item would be "ref" as it looks up rows using the index on (category_id). But the optimizer may get skewed results if you have very few rows in the table, or if you haven't done ANALYZE TABLE item since creating the index.
Re your update:
It looks like the optimizer prefers an index-scan over a table-scan, so it takes the opportunity to do an index-scan in InnoDB, and puts the category table first. The optimizer decides to re-order the tables instead of using the tables in the order you gave them in your query.
In the MyISAM tables, there will be one table-scan whichever table it chooses to access first, but by putting the category table second, it joins to category's PRIMARY key index instead of item's secondary index. The optimizer prefers lookups to a unique or primary key (type "eq_ref").
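To rule out stale statistics as the cause, as suggested above, the statistics can be refreshed before comparing the plans again. A small sketch using the tables from the question:

```sql
-- Refresh index statistics for both tables, then re-run the EXPLAIN.
ANALYZE TABLE item;
ANALYZE TABLE category;

EXPLAIN
SELECT item.id, category.name AS category_name
FROM item
JOIN category ON item.category_id = category.id;
```

With only 100 and 1000 rows, the plan may still differ between engines after this, since either join order is cheap at that size.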

MySQL JOIN extremely poor performance

I've been messing around all day trying to find why my query performance is terrible. It is extremely simple, yet can take over 15 minutes to execute (I abort the query at that stage). I am joining a table with over 2 million records.
This is the select:
SELECT
audit.MessageID, alerts.AlertCount
FROM
audit
LEFT JOIN (
SELECT MessageID, COUNT(ID) AS 'AlertCount'
FROM alerts
GROUP BY MessageID
) AS alerts ON alerts.MessageID = audit.MessageID
This is the EXPLAIN:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
| 1 | PRIMARY | AL | index | NULL | IDX_audit_MessageID | 4 | NULL | 2330944 | 100.00 | Using index |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 124140 | 100.00 | |
| 2 | DERIVED | alerts | index | NULL | IDX_alerts_MessageID | 5 | NULL | 124675 | 100.00 | Using index |
This is the schema:
# Not joining, just showing types
CREATE TABLE messages (
ID int NOT NULL AUTO_INCREMENT,
MessageID varchar(255) NOT NULL,
PRIMARY KEY (ID),
INDEX IDX_messages_MessageID (MessageID)
);
# 2,324,931 records
CREATE TABLE audit (
ID int NOT NULL AUTO_INCREMENT,
MessageID int NOT NULL,
LogTimestamp timestamp NOT NULL,
PRIMARY KEY (ID),
INDEX IDX_audit_MessageID (MessageID),
CONSTRAINT FK_audit_MessageID FOREIGN KEY(MessageID) REFERENCES messages(ID)
);
# 124,140 records
CREATE TABLE alerts (
ID int NOT NULL AUTO_INCREMENT,
AlertLevel int NOT NULL,
Text nvarchar(4096) DEFAULT NULL,
MessageID int DEFAULT 0,
PRIMARY KEY (ID),
INDEX IDX_alert_MessageID (MessageID),
CONSTRAINT FK_alert_MessageID FOREIGN KEY(MessageID) REFERENCES messages(ID)
);
A few very important things to note - the MessageID is not 1:1 in either 'audit' or 'alerts'; The MessageID can exist in one table, but not the other, or may exist in both (which is the purpose of my join); In my test DB, none of the MessageID exist in both. In other words, my query will return 2.3 million records with 0 as the count.
Another thing to note is that the 'audit' and 'alert' tables used to use MessageID as varchar(255). I created the 'messages' table expecting that it would fix the join. It actually made it worse. Previously, it would take 78 seconds, now, it never returns.
What am I missing about MySQL?
Subqueries are very hard for the MySQL engine to optimize. Try:
SELECT
audit.MessageID, COUNT(alerts.ID) AS AlertCount
FROM
audit
LEFT JOIN alerts ON alerts.MessageID = audit.MessageID
GROUP BY audit.MessageID
You're joining to a subquery.
The subquery results are effectively a temporary table - note the <derived2> in the query execution plan. As you can see there, they're not indexed, since they're ephemeral.
You should execute the query as a single unit with a join, rather than joining to the results of a second query.
EDIT: Andrew has posted an answer with one example of how to do your work in a normal join query, instead of in two steps.