I'm trying to run the following query in my database:
SELECT * FROM ts_cards WHERE ( cardstatus= 2 OR cardstatus= 3 ) AND ( cardtype= 1 OR cardtype= 2 ) ORDER BY cardserial DESC LIMIT 10;
All three fields (cardstatus, cardtype and cardserial) are indexed:
mysql> SHOW INDEX FROM ts_cards;
+----------+------------+----------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| ts_cards | 0 | PRIMARY | 1 | card_id | A | 15000134 | NULL | NULL | | BTREE | |
| ts_cards | 1 | CardID | 1 | cardserial | A | 15000134 | NULL | NULL | | BTREE | |
| ts_cards | 1 | CardType | 1 | cardtype | A | 17 | NULL | NULL | | BTREE | |
| ts_cards | 1 | CardHolder | 1 | cardstatusholder | A | 17 | NULL | NULL | | BTREE | |
| ts_cards | 1 | CardExpiration | 1 | cardexpiredstatus | A | 17 | NULL | NULL | | BTREE | |
| ts_cards | 1 | CardStatus | 1 | cardstatus | A | 17 | NULL | NULL | | BTREE | |
+----------+------------+----------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.22 sec)
(Yes, I know the index names suck.)
However, by default, MySQL uses only the index on cardstatus:
mysql> EXPLAIN SELECT * FROM `ts_cards` WHERE ( cardstatus= 2 OR cardstatus= 3 ) AND ( cardtype= 1 OR cardtype= 2 ) ORDER BY cardserial DESC LIMIT 10;
+----+-------------+----------+-------+---------------------+------------+---------+------+---------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------------+------------+---------+------+---------+-----------------------------+
| 1 | SIMPLE | ts_cards | range | CardType,CardStatus | CardStatus | 1 | NULL | 3215967 | Using where; Using filesort |
+----+-------------+----------+-------+---------------------+------------+---------+------+---------+-----------------------------+
1 row in set (0.00 sec)
(It doesn't even consider the index on cardserial but I guess that's another problem.)
Using "USE KEY" or "FORCE KEY" can make it use cardtype's index, but not both cardtype and cardstatus:
mysql> EXPLAIN SELECT * FROM `ts_cards` FORCE KEY (CardType) WHERE ( cardstatus= 2 OR cardstatus= 3 ) AND ( cardtype= 1 OR cardtype= 2 ) ORDER BY cardserial DESC LIMIT 10;
+----+-------------+----------+-------+---------------+----------+---------+------+---------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+----------+---------+------+---------+-----------------------------+
| 1 | SIMPLE | ts_cards | range | CardType | CardType | 1 | NULL | 6084861 | Using where; Using filesort |
+----+-------------+----------+-------+---------------+----------+---------+------+---------+-----------------------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM `ts_cards` FORCE KEY (CardType,CardStatus) WHERE ( cardstatus= 2 OR cardstatus= 3 ) AND ( cardtype= 1 OR cardtype= 2 ) ORDER BY cardserial DESC LIMIT 10;
+----+-------------+----------+-------+---------------------+------------+---------+------+---------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------------+------------+---------+------+---------+-----------------------------+
| 1 | SIMPLE | ts_cards | range | CardType,CardStatus | CardStatus | 1 | NULL | 3215967 | Using where; Using filesort |
+----+-------------+----------+-------+---------------------+------------+---------+------+---------+-----------------------------+
1 row in set (0.00 sec)
How can I force MySQL to use BOTH indexes to speed up the query? Both cardtype and cardstatus indexes seem to be defined in the same way yet cardstatus seems to take precedence over cardtype.
IIRC, MySQL normally uses only one index per table in a query. To make use of both indexes, MySQL would need to combine them with the index merge optimization (link to manual). Here is an example of such a merge (click on "View Execution Plan"). Notice the "index_merge" of the first SELECT.
Disclaimer: I'm not absolutely sure about the above information.
In your case, despite your hints, the optimizer still considers a range scan on a single index to be faster than merging the two indexes (your table has around 15 million rows according to SHOW INDEX, hence very large, costly-to-manipulate indexes).
I advise:
ALTER TABLE ts_cards ADD INDEX CardTypeStatus (cardtype, cardstatus);
This creates an index on both columns. Your query will probably be able to use this index. You may want to drop your CardType index afterwards: queries can still use the two-column index even if they search on the cardtype column only (but not if they search on cardstatus only).
More information about multiple-column indexes: http://dev.mysql.com/doc/refman/5.5/en/multiple-column-indexes.html
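The leftmost-prefix behaviour of a two-column index can be checked with a small sketch. It uses SQLite through Python's standard library as a stand-in, so it only illustrates the indexing principle; MySQL's optimizer makes its own cost-based choices and may decide differently. The reduced four-column table is an assumption, and the OR conditions are written as the equivalent IN lists.

```python
import sqlite3

def plan(conn, sql):
    """Return the detail strings of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# Simplified stand-in for ts_cards (assumption: only the columns used here).
conn.execute(
    "CREATE TABLE ts_cards ("
    " card_id INTEGER PRIMARY KEY,"
    " cardserial INTEGER, cardtype INTEGER, cardstatus INTEGER)"
)
conn.execute("CREATE INDEX CardTypeStatus ON ts_cards (cardtype, cardstatus)")

# Filter on both columns: the two-column index can serve both conditions.
both = plan(conn, "SELECT * FROM ts_cards"
                  " WHERE cardtype IN (1, 2) AND cardstatus IN (2, 3)")
# Filter on the leading column only: the index is still usable.
type_only = plan(conn, "SELECT * FROM ts_cards WHERE cardtype = 1")
# Filter on the second column only: the index prefix does not match.
status_only = plan(conn, "SELECT * FROM ts_cards WHERE cardstatus = 2")

for name, detail in (("both", both), ("type only", type_only),
                     ("status only", status_only)):
    print(name, "->", detail)
```

In this sketch the first two plans mention CardTypeStatus, while the cardstatus-only query falls back to a full scan, which is why a separate CardStatus index may still be worth keeping.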
Related
I created a MySQL table with an index on create_time (int), but my SELECT does not use it. Why?
mysql> EXPLAIN SELECT
-> player_id,
-> COUNT(*) AS count_num,
-> SUM( add_gold ) AS sum_add_gold
-> FROM
-> cloud_data_player_gold_log
-> WHERE
-> 1 = 1
-> AND create_time >= 1561046400
-> GROUP BY
-> player_id
-> ORDER BY
-> sum_add_gold ASC
-> LIMIT 0, 10;
+----+-------------+----------------------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------+
| 1 | SIMPLE | cloud_data_player_gold_log | NULL | ALL | create_time | NULL | NULL | NULL | 555659 | 44.47 | Using where; Using temporary; Using filesort |
+----+-------------+----------------------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> show index from cloud_data_player_gold_log;
+----------------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| cloud_data_player_gold_log | 0 | PRIMARY | 1 | log_id | A | 555659 | NULL | NULL | | BTREE | | | YES | NULL |
| cloud_data_player_gold_log | 1 | channel_id | 1 | channel_id | A | 6 | NULL | NULL | | BTREE | | | YES | NULL |
| cloud_data_player_gold_log | 1 | game_id | 1 | game_id | A | 12 | NULL | NULL | | BTREE | | | YES | NULL |
| cloud_data_player_gold_log | 1 | game_id_2 | 1 | game_id | A | 12 | NULL | NULL | | BTREE | | | YES | NULL |
| cloud_data_player_gold_log | 1 | game_id_2 | 2 | room_id | A | 14 | NULL | NULL | | BTREE | | | YES | NULL |
| cloud_data_player_gold_log | 1 | create_time | 1 | create_time | A | 15356 | NULL | NULL | | BTREE | | | YES | NULL |
+----------------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
6 rows in set (0.04 sec)
The table has about 560,000 rows in total, and the query takes over 0.6 s to run.
MySQL estimates that a significant share of the rows (44%) satisfy create_time >= 1561046400. If it used the index, it would have to read almost half of your table by jumping back and forth inside it. As data is stored in blocks (each holding more than one row), it could end up reading the equivalent of the whole table several times over, since for every useful row it also reads several unneeded rows (although much of this is mitigated by caching).
In such a case, it's faster to just read the whole table from start to end once and throw the unneeded rows away, which MySQL has decided to do here.
You can prevent MySQL from having to read the actual table data by providing all the data it needs in the index, by having a covering index (create_time, player_id, add_gold). Then it can just read the index from create_time >= 1561046400 to end in one go without the time consuming jumping inside the table.
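A minimal sketch of the covering-index idea, using SQLite through Python's standard library as a stand-in for MySQL (so the plan text and the optimizer's choices are SQLite's, not MySQL's). The index name idx_time_player_gold and the reduced column list are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the log table (assumption: reduced column list).
conn.execute(
    "CREATE TABLE cloud_data_player_gold_log ("
    " log_id INTEGER PRIMARY KEY,"
    " player_id INTEGER, add_gold INTEGER,"
    " create_time INTEGER, game_id INTEGER)"
)
# Every column the query touches is part of the index.
conn.execute(
    "CREATE INDEX idx_time_player_gold"
    " ON cloud_data_player_gold_log (create_time, player_id, add_gold)"
)

query = (
    "SELECT player_id, COUNT(*) AS count_num, SUM(add_gold) AS sum_add_gold"
    " FROM cloud_data_player_gold_log"
    " WHERE create_time >= 1561046400"
    " GROUP BY player_id ORDER BY sum_add_gold ASC LIMIT 0, 10"
)
covering_plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for detail in covering_plan:
    print(detail)
```

The plan reports a covering index search on create_time, meaning the table rows themselves are never visited. In MySQL the corresponding sign is "Using index" in the Extra column of EXPLAIN.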
If MySQL estimated incorrectly, e.g. there might actually be only a handful of rows in your time range, or if you just want to test the execution time with the index, you can force MySQL to use it with e.g.
... FROM cloud_data_player_gold_log FORCE INDEX (create_time) WHERE ...
This has the general disadvantage that MySQL cannot adapt to changed data, different create_time parameters or additional filters, e.g. if using a different index would actually be faster.
An alternative index you could try would be (player_id, create_time) (or, covering, (player_id, create_time, add_gold)), which supports the group by.
Which one is faster will depend on your data distribution: an index starting with create_time has to read fewer rows, while an index starting with player_id saves one sort. Depending on how many rows there are, one will offset the other.
I was trying to improve the performance of some queries through indexes using EXPLAIN, and I noticed that each time I ran SHOW INDEX FROM TableB; the value in the rows column of the EXPLAIN output for a query changed.
Ex:
mysql> EXPLAIN Select A.id
From TableA A
Inner join TableB B
On A.address = B.address And A.code = B.code
Group by A.id
Having count(distinct B.id) = 1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | B | index | test_index | PRIMARY | 518 | NULL | 10561 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | A | eq_ref | PRIMARY | PRIMARY | 514 | db.B.address,db.B.code | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)
mysql> show index from TableB;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| TableB | 0 | PRIMARY | 1 | id | A | 7 | NULL | NULL | | BTREE | |
| TableB | 0 | PRIMARY | 2 | address | A | 21 | NULL | NULL | | BTREE | |
| TableB | 0 | PRIMARY | 3 | code | A | 10402 | NULL | NULL | | BTREE | |
| TableB | 1 | test_index | 1 | address | A | 1 | NULL | NULL | | BTREE | |
| TableB | 1 | test_index | 2 | code | A | 10402 | NULL | NULL | | BTREE | |
| TableB | 1 | test_index | 3 | id | A | 10402 | NULL | NULL | | BTREE | |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.03 sec)
and...
mysql> EXPLAIN Select A.id
From TableA A
Inner join TableB B
On A.address = B.address And A.code = B.code Group by A.id
Having count(distinct B.id) = 1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | B | index | test_index | PRIMARY | 518 | NULL | 9800 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | A | eq_ref | PRIMARY | PRIMARY | 514 | db.B.address,db.B.code | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)
Why does this happen?
The rows column should be taken as a rough estimate only. It's not a precise number.
It's based on statistical estimates of how many rows will be examined during a query. The actual number of rows cannot be known until you actually execute the query.
The statistics are based on samples read from the table periodically. These samples are re-read occasionally, for example after you run ANALYZE TABLE or certain INFORMATION_SCHEMA queries, or certain SHOW statements.
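A toy model of why a sampled estimate moves between runs: the estimator below picks a few random "pages", averages their row counts, and scales up. The page count, the rows-per-page range and the sample size of 8 are assumptions chosen for illustration, and this is not how the storage engine is actually implemented.

```python
import random

def estimate_rows(rows_per_page, sample_size, rng):
    """Estimate the table's row count from a random sample of pages."""
    sampled = rng.sample(rows_per_page, sample_size)
    return round(sum(sampled) / sample_size * len(rows_per_page))

# Hypothetical table layout: 150 pages holding between 50 and 100 rows each.
layout = random.Random(42)
rows_per_page = [layout.randint(50, 100) for _ in range(150)]
actual_rows = sum(rows_per_page)

# Two "statistics refreshes" that happen to sample different pages.
first = estimate_rows(rows_per_page, 8, random.Random(1))
second = estimate_rows(rows_per_page, 8, random.Random(2))
print("actual:", actual_rows, "estimates:", first, second)
```

Each refresh samples different pages, so the estimate usually lands on a different value near the true count, much like the 10561 and 9800 in the two EXPLAIN outputs.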
I don't find a 20% variation in statistics to be a big deal. In many situations, think of the graph as being like an upturned parabola: you need to know which side of the minimum point you are on. In complex queries, where the Optimizer is likely to goof, it needs a lot more than simple stats, such as the histograms of MariaDB 10.0 / 10.1. (I don't have enough experience with them to say whether they make much headway.)
Your particular query is probably going to be performed in only one way, regardless of the statistics. An example of a complicated query would be a JOIN with WHERE clauses filtering each table. The optimizer has to decide which table to start with. Another case is a single table with a WHERE and ORDER BY and they cannot both be handled by a single index -- should it use an index to filter, but then have to sort? or should it use an index for ORDER BY, but then have to filter on the fly?
SELECT id, name, detail FROM student WHERE id NOT IN (1,788,103,100) ORDER BY id DESC LIMIT 1000,10
The table is tiny (10,000 rows). I have to consider two points: the "IN" condition and the "LIMIT" clause.
Here are the DDLs and the EXPLAIN. I'm using MySQL 5.6.4.
CREATE TABLE student
( id int(11) NOT NULL AUTO_INCREMENT
, name varchar(45) NOT NULL
, detail varchar(255) NOT NULL
, PRIMARY KEY (id)
) ENGINE = MyISAM;
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | student| ALL | Primary,id | NULL | NULL | NULL | 13 | |
The LIMIT and ORDER BY clauses mean that the query has to build the whole result set, sort it, go to record 1000, and then extract the next 10 records.
Why are you looking for 10 records starting at record 1000?
Removing the ORDER BY clause would make it faster as the query would only need to extract 1010 records.
I cannot replicate this finding...
SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 5.5.16 |
+-----------+
SELECT COUNT(*) FROM student;
+----------+
| COUNT(*) |
+----------+
| 131072 |
+----------+
SELECT id
FROM student
WHERE id
NOT IN (1,788,103,100)
ORDER
BY id DESC
LIMIT 1000,10;
+--------+
| id |
+--------+
| 195591 |
| 195590 |
| 195589 |
| 195588 |
| 195587 |
| 195586 |
| 195585 |
| 195584 |
| 195583 |
| 195582 |
+--------+
10 rows in set (0.00 sec)
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+
| 1 | SIMPLE | student | range | PRIMARY | PRIMARY | 4 | NULL | 131069 | Using where; Using index |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+
I have a table defined as follows:
+-------------------+----------------+------+-----+---------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+----------------+------+-----+---------------------+-----------------------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| runtime_id | bigint(20) | NO | MUL | NULL | |
| place_id | bigint(20) | NO | MUL | NULL | |
| amended_timestamp | varchar(50) | YES | | NULL | |
| applicable_at | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| schedule_time | timestamp | NO | MUL | 0000-00-00 00:00:00 | |
| quality_indicator | varchar(10) | NO | | NULL | |
| flow_rate | decimal(15,10) | NO | | NULL | |
+-------------------+----------------+------+-----+---------------------+-----------------------------+
I have an index on schedule_time, created as:
create index table_index on table(schedule_time asc);
The table currently has 2121552+ records.
The thing I fail to understand is when I do explain
explain select runtime_id from table where schedule_time >= now() - INTERVAL 1 DAY;
+----+-------------+----------+-------+------------------------------+------------------------------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+------------------------------+------------------------------+---------+------+-------+-------------+
| 1 | SIMPLE | table | range | table_index | table_index | 4 | NULL | 38088 | Using where |
+----+-------------+----------+-------+------------------------------+------------------------------+---------+------+-------+-------------+
1 row in set (0.00 sec)
Above, the index is used, but in the query below it is not.
mysql> explain select runtime_id from table where schedule_time >= now() - INTERVAL 30 DAY;
+----+-------------+----------+------+------------------------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+------------------------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | table | ALL | table_index | NULL | NULL | NULL | 2118107 | Using where |
+----+-------------+----------+------+------------------------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)
I'd really appreciate it if someone could point out what's wrong here. The data is updated every 12 minutes, so as time passes, a query covering 30 or maybe 60 days will get very slow.
The final query where I plan to use it is as follows
select avg(flow_rate),c.group from table a ,(select runtime_id from table where schedule_time >= now() - INTERVAL 1 DAY group by schedule_time ) b,place c where a.runtime_id = b.runtime_id and a.place_id = c.id group by c.group;
Update =====>
As suggested in the comments, I tried BETWEEN, and it fails too.
mysql> explain select runtime_id from table where schedule_time between '2013-07-17 12:48:00' and '2013-08-17 12:48:00';
+----+-------------+----------+------+------------------------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+------------------------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | table | ALL | table_index | NULL | NULL | NULL | 2118431 | Using where |
+----+-------------+----------+------+------------------------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)
mysql> explain select runtime_id from table where schedule_time between '2013-08-16 12:48:00' and '2013-08-17 12:48:00';
+----+-------------+----------+-------+------------------------------+------------------------------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+------------------------------+------------------------------+---------+------+-------+-------------+
| 1 | SIMPLE | table | range | table_index | table_index | 4 | NULL | 38770 | Using where |
+----+-------------+----------+-------+------------------------------+------------------------------+---------+------+-------+-------------+
1 row in set (0.00 sec)
Update 2 =======>
mysql> select count(*) from table where schedule_time between '2013-08-16 12:48:00' and '2013-08-17 12:48:00';
+----------+
| count(*) |
+----------+
| 19440 |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from table where schedule_time between '2013-07-17 12:48:00' and '2013-08-17 12:48:00';
+----------+
| count(*) |
+----------+
| 597132 |
+----------+
1 row in set (0.00 sec)
Server version: 5.5.24-0ubuntu0.12.04.1 (Ubuntu)
The MySQL optimizer tries to do the fastest thing. Where it thinks that using the index will take as long or longer than doing a table scan, it abandons the available index.
This is what you see it doing in your examples:
where the range is small (1 day) the index will be faster;
where the range is large, you're going to be hitting so much more of the table you might as well scan the table directly (remember, using the index involves searching the index and then grabbing the indexed records from the table - two sets of seeks).
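The two cases above can be put into numbers using the counts posted in the question. This is only arithmetic on those figures; there is no fixed cutoff at which the optimizer switches plans, since its cost model takes more than this ratio into account.

```python
def selectivity(matching_rows, total_rows):
    """Fraction of the table that a range condition matches."""
    return matching_rows / total_rows

# Row estimate for the whole table, taken from the EXPLAIN output above.
TOTAL_ROWS = 2118431

# COUNT(*) results posted in "Update 2" of the question.
one_day = selectivity(19440, TOTAL_ROWS)
thirty_days = selectivity(597132, TOTAL_ROWS)

print(f"1 day:   {one_day:.1%} of the table")
print(f"30 days: {thirty_days:.1%} of the table")
```

The one-day range touches under 1% of the rows, while the 30-day range touches more than a quarter of them, which is where index lookups followed by row fetches tend to lose out to a single sequential scan.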
If you think you know better than the optimizer (it isn't perfect), use INDEX hints:
The USE INDEX (index_list) hint tells MySQL to use only one of the
named indexes to find rows in the table. The alternative syntax IGNORE
INDEX (index_list) tells MySQL to not use some particular index or
indexes. These hints are useful if EXPLAIN shows that MySQL is using
the wrong index from the list of possible indexes.
MySQL Server version: 5.0.95
Tables All: InnoDB
I am having an issue with a MySQL db query. Basically, I am finding that if I index a particular varchar(50) field, tag.name, my queries take longer (x10) than when the field is not indexed. I am trying to speed this query up, but my efforts seem to be counterproductive.
The culprit line and field seems to be:
WHERE `t`.`name` IN ('news','home')
I have noticed that if I query the tag table directly, without a join, using the same criteria and with the name field indexed, I do not have the issue. It actually works faster, as expected.
EXAMPLE Query
SELECT `a`.*, `u`.`pen_name`
FROM `tag_link` `tl`
INNER JOIN `tag` `t`
ON `t`.`tag_id` = `tl`.`tag_id`
INNER JOIN `article` `a`
ON `a`.`article_id` = `tl`.`link_id`
INNER JOIN `user` `u`
ON `a`.`user_id` = `u`.`user_id`
WHERE `t`.`name` IN ('news','home')
AND `tl`.`type` = 'article'
AND `a`.`featured` = 'featured'
GROUP BY `article_id`
LIMIT 0 , 5
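EXPLAIN with index
+----+-------------+-------+--------+--------------------------+---------+---------+-------------------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |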
+----+-------------+-------+--------+--------------------------+---------+---------+-------------------+------+-----------------------------------------------------------+
| 1 | SIMPLE | t | range | PRIMARY,name | name | 152 | NULL | 4 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | tl | ref | tag_id,link_id,link_id_2 | tag_id | 4 | portal.t.tag_id | 10 | Using where |
| 1 | SIMPLE | a | eq_ref | PRIMARY,fk_article_user1 | PRIMARY | 4 | portal.tl.link_id | 1 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | portal.a.user_id | 1 | |
+----+-------------+-------+--------+--------------------------+---------+---------+-------------------+------+-----------------------------------------------------------+
EXPLAIN without index
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
| 1 | SIMPLE | a | index | PRIMARY,fk_article_user1 | PRIMARY | 4 | NULL | 8742 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | portal.a.user_id | 1 | |
| 1 | SIMPLE | tl | ref | tag_id,link_id,link_id_2 | link_id | 4 | portal.a.article_id | 3 | Using where |
| 1 | SIMPLE | t | eq_ref | PRIMARY | PRIMARY | 4 | portal.tl.tag_id | 1 | Using where |
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
TABLE CREATE
CREATE TABLE `tag` (
`tag_id` int(11) NOT NULL auto_increment,
`name` varchar(50) NOT NULL,
`type` enum('layout','image') NOT NULL,
`create_dttm` datetime default NULL,
PRIMARY KEY (`tag_id`)
) ENGINE=InnoDB AUTO_INCREMENT=43077 DEFAULT CHARSET=utf8
INDEXES
SHOW INDEX FROM tag_link;
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tag_link | 0 | PRIMARY | 1 | tag_link_id | A | 42023 | NULL | NULL | | BTREE | |
| tag_link | 1 | tag_id | 1 | tag_id | A | 10505 | NULL | NULL | | BTREE | |
| tag_link | 1 | link_id | 1 | link_id | A | 14007 | NULL | NULL | | BTREE | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
SHOW INDEX FROM article;
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| article | 0 | PRIMARY | 1 | article_id | A | 5723 | NULL | NULL | | BTREE | |
| article | 1 | fk_article_user1 | 1 | user_id | A | 1 | NULL | NULL | | BTREE | |
| article | 1 | create_dttm | 1 | create_dttm | A | 5723 | NULL | NULL | YES | BTREE | |
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Final Solution
It seems that MySQL was just sorting the data inefficiently. In the end it turned out to be faster to query the tag table in a subquery that returns the ids.
It seems that article_id is the primary key for the article table.
Since you're grouping by article_id, MySQL needs to return the records in order by that column, in order to perform the GROUP BY.
You can see that without the index, it scans all records in the article table, but they're at least in order by article_id, so no later sort is required. The LIMIT optimization can be applied here, since it's already in order, it can just stop after it gets five rows.
In the query with the index on tag.name, instead of scanning the entire article table, it utilizes the index, but against the tag table, and starts there. Unfortunately, when doing this, the records must later be sorted by article.article_id in order to complete the GROUP BY clause. The LIMIT optimization can't be applied, since MySQL must produce the entire result set and then order it before it can return the first 5 rows.
In this case, MySQL just guesses wrongly.
Without the LIMIT clause, I'm guessing that using the index is faster, which is maybe what MySQL was guessing.
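The difference between the two plans can be sketched as a toy model in plain Python: when rows arrive already ordered by the grouping key, reading can stop once five groups have been seen; otherwise every row has to be read and sorted first. The data (100 articles with 3 tag links each) and the function names are hypothetical, and this is not a description of MySQL's executor.

```python
from itertools import groupby, islice

def first_groups(rows, limit):
    """Return the first `limit` group keys and the number of rows read."""
    read = 0

    def counted():
        nonlocal read
        for row in rows:
            read += 1
            yield row

    grouped = groupby(counted(), key=lambda r: r[0])
    keys = [key for key, _ in islice(grouped, limit)]
    return keys, read

def first_groups_unordered(rows, limit):
    """Same result, but the input must be fully read and sorted first."""
    materialized = sorted(rows)  # stands in for "Using temporary; Using filesort"
    keys, _ = first_groups(materialized, limit)
    return keys, len(materialized)

# Hypothetical data: (article_id, tag_id) pairs ordered by article_id.
ordered = [(a, t) for a in range(1, 101) for t in range(3)]
# The same rows in a different arrival order (by tag_id).
by_tag = sorted(ordered, key=lambda r: r[1])

keys_a, read_a = first_groups(ordered, 5)
keys_b, read_b = first_groups_unordered(by_tag, 5)
print(keys_a, read_a)
print(keys_b, read_b)
```

Both calls return the same five article ids, but the ordered input needs only a handful of rows while the unordered one has to touch all 300.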
How big are your tables?
I noticed that the first EXPLAIN shows "Using temporary; Using filesort", which is usually a bad sign. Your query may be spilling to disk, which makes it much slower than an in-memory query.
Also try to avoid using "select *" and instead query the minimum fields needed.