mysql difference in index usage between MyISAM and InnoDB

I have these small tables, item and category:
CREATE TABLE `item` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(150) NOT NULL,
`category_id` mediumint(8) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `category_id` (`category_id`)
) CHARSET=utf8
CREATE TABLE `category` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(150) NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`)
) CHARSET=utf8
I have inserted 100 categories and 1000 items.
If I run this:
EXPLAIN SELECT item.id,category.name AS category_name FROM item JOIN category ON item.category_id=category.id;
Then, if the tables' engine is InnoDB I get:
+----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-------------+
| 1 | SIMPLE | category | index | PRIMARY | name | 452 | NULL | 103 | Using index |
| 1 | SIMPLE | item | ref | category_id | category_id | 3 | dbname.category.id | 5 | Using index |
+----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-------------+
Whereas, if I switch to MyISAM (with alter table engine=myisam) I get:
+----+-------------+----------+--------+---------------+---------+---------+-------------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+---------------+---------+---------+-------------------------+------+-------+
| 1 | SIMPLE | item | ALL | category_id | NULL | NULL | NULL | 1003 | |
| 1 | SIMPLE | category | eq_ref | PRIMARY | PRIMARY | 3 | dbname.item.category_id | 1 | |
+----+-------------+----------+--------+---------------+---------+---------+-------------------------+------+-------+
My question is, why this difference in the way indexes are handled?

In InnoDB, any secondary index internally contains the primary key column of the table. So the index `name` on the column (name) is implicitly an index on the columns (name, id).
This means that EXPLAIN shows your access to the category table as an "index-scan" (this is shown in the type column as "index"). By scanning the index, it also has access to the id column, which it uses to look up rows in the second table, item.
Then it also takes advantage of the item index on (category_id) which is really (category_id, id), and it is able to fetch item.id for your select-list simply by reading the index. No need to read the table at all (this is shown in the Extra column as "Using index").
MyISAM doesn't store primary keys with the secondary key in this way, so it can't get the same optimizations. The access to the category table is type "ALL" which means a table-scan.
I would expect the access to the MyISAM table item to be "ref", as it looks up rows using the index on (category_id). But the optimizer may get skewed results if you have very few rows in the table, or if you haven't done ANALYZE TABLE item since creating the index.
Re your update:
It looks like the optimizer prefers an index-scan over a table-scan, so it takes the opportunity to do an index-scan in InnoDB, and puts the category table first. The optimizer decides to re-order the tables instead of using the tables in the order you gave them in your query.
In the MyISAM tables, there will be one table-scan whichever table it chooses to access first, but by putting the category table second, it joins to category's PRIMARY key index instead of item's secondary index. The optimizer prefers lookups to a unique or primary key (type "eq_ref").
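If you want to experiment with this yourself, here is a rough sketch (reusing the tables from the question; exact plans will vary by version and statistics). Refresh the statistics first, then compare the plan the optimizer picks on its own with one where STRAIGHT_JOIN forces the tables to be read in the order you wrote them:
ANALYZE TABLE item, category;
-- Optimizer chooses the join order:
EXPLAIN SELECT item.id, category.name AS category_name
FROM item JOIN category ON item.category_id = category.id;
-- Join order forced to item first, category second:
EXPLAIN SELECT STRAIGHT_JOIN item.id, category.name AS category_name
FROM item JOIN category ON item.category_id = category.id;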

Related

MySQL UNIQUE for 2 columns and index [duplicate]

I create the following table:
CREATE TABLE ta
(
id BIGINT NOT NULL auto_increment,
company_id BIGINT NOT NULL,
language VARCHAR(10) NOT NULL,
created_at DATETIME,
modified_at DATETIME,
version BIGINT,
PRIMARY KEY (id),
UNIQUE KEY unique_ta (company_id, language)
) engine = InnoDB;
CREATE INDEX ta_company_id on `ta` (company_id);
My question is: do I need this line?
CREATE INDEX ta_company_id on `ta` (company_id);
Does UNIQUE create an index on (company_id, language) automatically?
You probably don't need the extra index on company_id.
The UNIQUE KEY creates an index on the pair of columns (company_id, language) in that order. So any query you would run searching for a specific value of company_id would be able to use that index, even though it only references the first column of the unique key index.
You can see this in EXPLAIN:
mysql> EXPLAIN SELECT * FROM ta WHERE company_id = 1234;
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
| 1 | SIMPLE | ta | NULL | ref | unique_ta | unique_ta | 8 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
You can see key_len: 8 meaning it is using 8 bytes of the index, and the first BIGINT for company_id is 8 bytes.
Whereas searching for both columns will use the full 50-byte size of the index (8 bytes for the BIGINT + 10 characters for the VARCHAR, 4 bytes per character using utf8mb4, plus a couple of bytes for the VARCHAR length):
mysql> EXPLAIN SELECT * FROM ta WHERE company_id = 1234 AND language = 'EN';
+----+-------------+-------+------------+-------+---------------+-----------+---------+-------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-----------+---------+-------------+------+----------+-------+
| 1 | SIMPLE | ta | NULL | const | unique_ta | unique_ta | 50 | const,const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+-----------+---------+-------------+------+----------+-------+
I said at the top "probably" because there is an exception case, for a specific form of query:
SELECT * FROM ta WHERE company_id = 1234 ORDER BY id;
This type of query would need id to be the second column of the index, so it could be assured of reading rows in primary key order. All indexes implicitly have the primary key column appended, even if you don't declare it. So your unique key index would really store the columns (company_id, language, id), and the single-column index really stores the columns (company_id, id). The latter index would optimize the query I show above, sorting by primary key efficiently.
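A rough way to check whether you need the redundant index, reusing the names from the question (the exact plan depends on your MySQL version and statistics):
-- Served by unique_ta, i.e. (company_id, language, id); for a fixed company_id the rows
-- come back ordered by (language, id), so MySQL may need an extra sort for ORDER BY id:
EXPLAIN SELECT * FROM ta WHERE company_id = 1234 ORDER BY id;
-- ta_company_id is effectively (company_id, id) and returns rows already in id order.
-- If this query shape doesn't matter to you, that index can be dropped:
-- DROP INDEX ta_company_id ON ta;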

Implementation of composite clustered index in MySQL

I need to create a composite clustered index like: username, name, id. Is it possible to implement such a thing? I need to boost the performance of queries like WHERE username = ? AND name = ? by using clustered indexes in InnoDB. But I think it won't work because id stays in third place, so it won't be used.
It's fine to define a clustered index with multiple columns.
CREATE TABLE mytable (
username VARCHAR(64) NOT NULL,
name VARCHAR(64) NOT NULL,
id BIGINT,
PRIMARY KEY (username, name, id)
);
If you query against the first two columns, it will use the clustered index, so it will avoid the overhead of lookups via secondary indexes.
But if you use EXPLAIN to report the optimizer's plan for the query, you'll see that the access is type: ref which means an index lookup, but not a unique index lookup. That is, it will potentially match multiple rows.
mysql> explain select * from mytable where username = 'user' and name = 'name';
+----+-------------+---------+------------+------+---------------+---------+---------+-------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+---------+---------+-------------+------+----------+-------+
| 1 | SIMPLE | mytable | NULL | ref | PRIMARY | PRIMARY | 516 | const,const | 1 | 100.00 | NULL |
+----+-------------+---------+------------+------+---------------+---------+---------+-------------+------+----------+-------+
When doing lookups against a PRIMARY KEY, we'd like to see type: eq_ref or type: const which means it is doing a unique lookup, and the query is guaranteed to match either 0 or 1 row.
mysql> explain select * from mytable where username = 'user' and name = 'name' and id = 1;
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+
| 1 | SIMPLE | mytable | NULL | const | PRIMARY | PRIMARY | 524 | const,const,const | 1 | 100.00 | NULL |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+
Both queries are using the clustered index.
Re your comment:
InnoDB requires the auto-increment column be the first column of a key in the table. It doesn't have to be the primary key. So you can do this for example:
CREATE TABLE `mytable` (
`username` varchar(64) NOT NULL,
`name` varchar(64) NOT NULL,
`id` bigint NOT NULL AUTO_INCREMENT,
`x` int DEFAULT NULL,
PRIMARY KEY (`username`,`name`,`id`),
KEY (`id`)
) ENGINE=InnoDB;
Notice I added an extra KEY (id) to satisfy InnoDB's requirement. But in the primary key, id is still at the end.
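If you want to convince yourself that the AUTO_INCREMENT still behaves normally with id at the end of the primary key, here is a quick sketch with made-up sample rows:
INSERT INTO mytable (username, name, x) VALUES ('alice', 'first', 1), ('alice', 'second', 2);
SELECT username, name, id FROM mytable ORDER BY id;
-- id values are generated as usual (1, 2, ...), while the rows themselves are clustered by (username, name, id).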

MySQL 5.6 long WHERE IN query very slow

Since version 5.6 of MySQL, a very simple albeit long query takes several orders of magnitude longer than it did in 5.5.
The schema: three tables, one with elements, one with categories, and an M:N table between those. Create statements:
CREATE TABLE element (
id int(11) NOT NULL AUTO_INCREMENT,
name varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=4257455 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE category (
id int(11) NOT NULL AUTO_INCREMENT,
name varchar(255) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=76 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE elements_categories (
id int(11) NOT NULL AUTO_INCREMENT,
element_id int(11) NOT NULL,
category_id int(11) NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY element_id (element_id,category_id),
KEY elements_categories_element_id (element_id),
KEY elements_categories_category_id (category_id),
CONSTRAINT D7d489b06a407a0c1c70f108712c815e FOREIGN KEY (category_id) REFERENCES category (id),
CONSTRAINT co_element_id_57f4f2ec0db9441c_fk_element_id FOREIGN KEY (element_id) REFERENCES element (id)
) ENGINE=InnoDB AUTO_INCREMENT=88131737 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The query:
SELECT elements_categories.element_id, category.id, category.name
FROM category
INNER JOIN elements_categories
ON category.id = elements_categories.category_id
WHERE elements_categories.element_id IN (1, 2, 3, ...)
So, the element table does not even play a role in this query; I already got a bunch of IDs with a previous query. (Disclaimer: I'm using an ORM, and inlining the first query did not make things faster either.) The number of values in the IN clause can become very big, 14240 in my example. That's not a problem; it takes a tenth of a second or so. This is the execution plan:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+--------+---------------------------------------------------------------------------+------------+---------+---------------------------------+-------+--------------------------+
| 1 | SIMPLE | elements_categories | range | element_id,elements_categories_element_id,elements_categories_category_id | element_id | 4 | NULL | 42720 | Using where; Using index |
| 1 | SIMPLE | category | eq_ref | PRIMARY | PRIMARY | 4 | elements_categories.category_id | 1 | NULL |
When I add one more element, the execution time explodes to 60 seconds plus a fetch time of 200 seconds. The execution plan also changes to this:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+------+---------------------------------------------------------------------------+---------------------------------+---------+-------------+------+-------------+
| 1 | SIMPLE | category | ALL | PRIMARY | NULL | NULL | NULL | 75 | NULL |
| 1 | SIMPLE | elements_categories | ref | element_id,elements_categories_element_id,elements_categories_category_id | elements_categories_category_id | 4 | category.id | 760 | Using where |
The range and eq_ref lookups are exchanged for ALL and ref, the order of the tables is switched, and elements_categories.category_id is not used as the ref although it is the foreign key between those two tables. I don't get why the plan gets changed like this.
There are 75 categories, 4,300,000 elements, and 1,600,000 assignments.
My guess is that I'm exceeding some size limit here, but I cannot figure out which one. Also, I didn't change anything from the MySQL 5.5 installation, which stuck to the former execution plan all the time.
There are several ways to trick the optimizer into using the correct plan:
Add an index hint: ... JOIN elements_categories FORCE INDEX (element_id)...
Swap the tables around and make category a LEFT JOIN (assuming every elements_categories has a category). This is not a generic solution, but should work in this case.
Make a temp table with the element_id values and JOIN it in all of your queries instead of using IN (1,2,3...). You should also be able to use IN (SELECT id FROM <temp table>) instead of literals (a sketch of this approach follows below).
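Here is a rough sketch of that temp-table variant (the temp table name and its column are made up; the other names come from the question):
CREATE TEMPORARY TABLE wanted_elements (element_id INT NOT NULL PRIMARY KEY);
INSERT INTO wanted_elements (element_id) VALUES (1), (2), (3); -- ...and the rest of your IDs
SELECT elements_categories.element_id, category.id, category.name
FROM wanted_elements
INNER JOIN elements_categories
ON elements_categories.element_id = wanted_elements.element_id
INNER JOIN category
ON category.id = elements_categories.category_id;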
The reason that the optimizer chooses another plan when you have different parameters is that it looks at statistics from the tables and guesses which index will remove the most rows, but this is a guess and can often be wrong.
If you know better, you need to tell the optimizer what to do with an index hint, like the first example @Vatev gives.
An interesting thing about the optimizer is that, since an index adds an extra layer of indirection and thus potentially more reads, an index has to remove more than half the table to be considered useful by the optimizer. (I don't remember exactly how much more than half.)
Another interesting feature of the optimizer is that if the index contains all the information needed from a table, it can avoid looking up the actual row, so depending on your situation you might benefit from adding an extra column to the index. This optimization is used in the first query plan ("Using index") but not in the second. Thus adding element_id to your index elements_categories_category_id might speed things up. See http://dev.mysql.com/doc/refman/5.6/en/explain-output.html
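For example, something along these lines might help (a sketch, not tested against your data; the index name is made up, and the old single-column index on category_id becomes redundant once the composite one exists):
ALTER TABLE elements_categories
ADD INDEX ec_category_element (category_id, element_id);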

mysql foreign key does not work

I have two tables:
CREATE TABLE IF NOT EXISTS treaties(
id INT NOT NULL AUTO_INCREMENT,
name varchar(50) NOT NULL,
PRIMARY KEY(id)
)ENGINE=InnoDB;
and
CREATE TABLE IF NOT EXISTS items(
id INT NOT NULL AUTO_INCREMENT,
treaty INT NOT NULL,
item varchar(20),
PRIMARY KEY(id),
FOREIGN KEY (treaty) REFERENCES treaties(id)
ON UPDATE RESTRICT
ON DELETE RESTRICT
)ENGINE=InnoDB;
After that I inserted a few rows into each table, and the values of treaties.id and items.treaty were the same.
When I run
EXPLAIN SELECT *
FROM `items`
JOIN `treaties` ON `items`.`treaty` = `treaties`.`id`
WHERE 1
I obtained:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | treaties| ALL | PRIMARY | NULL| NULL | NULL| 3 |
1 | SIMPLE | items | ALL | treaty | NULL| NULL | NULL| 4 | Using where; Using join buffer
I thought that if I have a foreign key between items.treaty and treaties.id, this key must be used and the type must not be ALL.
What is wrong?
Please, help me!
Thank you!
As explained in the manual:
The output from EXPLAIN shows ALL in the type column when MySQL uses a table scan to resolve a query. This usually happens under the following conditions:
[...]
The table is so small that it is faster to perform a table scan than to bother with a key lookup. This is common for tables with fewer than 10 rows and a short row length. Don't worry in this case.
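If you want to see the plan change for yourself, a rough sketch: load a few hundred rows into both tables, refresh the statistics, and run the same EXPLAIN again. With enough rows the optimizer should start using the treaty index instead of scanning:
ANALYZE TABLE treaties, items;
EXPLAIN SELECT *
FROM `items`
JOIN `treaties` ON `items`.`treaty` = `treaties`.`id`;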

MySQL JOIN extremely poor performance

I've been messing around all day trying to find why my query performance is terrible. It is extremely simple, yet can take over 15 minutes to execute (I abort the query at that stage). I am joining a table with over 2 million records.
This is the select:
SELECT
audit.MessageID, alerts.AlertCount
FROM
audit
LEFT JOIN (
SELECT MessageID, COUNT(ID) AS 'AlertCount'
FROM alerts
GROUP BY MessageID
) AS alerts ON alerts.MessageID = audit.MessageID
This is the EXPLAIN
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
| 1 | PRIMARY | AL | index | NULL | IDX_audit_MessageID | 4 | NULL | 2330944 | 100.00 | Using index |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 124140 | 100.00 | |
| 2 | DERIVED | alerts | index | NULL | IDX_alerts_MessageID | 5 | NULL | 124675 | 100.00 | Using index |
This is the schema:
# Not joining, just showing types
CREATE TABLE messages (
ID int NOT NULL AUTO_INCREMENT,
MessageID varchar(255) NOT NULL,
PRIMARY KEY (ID),
INDEX IDX_messages_MessageID (MessageID)
);
# 2,324,931 records
CREATE TABLE audit (
ID int NOT NULL AUTO_INCREMENT,
MessageID int NOT NULL,
LogTimestamp timestamp NOT NULL,
PRIMARY KEY (ID),
INDEX IDX_audit_MessageID (MessageID),
CONSTRAINT FK_audit_MessageID FOREIGN KEY(MessageID) REFERENCES messages(ID)
);
# 124,140
CREATE TABLE alerts (
ID int NOT NULL AUTO_INCREMENT,
AlertLevel int NOT NULL,
Text nvarchar(4096) DEFAULT NULL,
MessageID int DEFAULT 0,
PRIMARY KEY (ID),
INDEX IDX_alert_MessageID (MessageID),
CONSTRAINT FK_alert_MessageID FOREIGN KEY(MessageID) REFERENCES messages(ID)
);
A few very important things to note: MessageID is not 1:1 in either 'audit' or 'alerts'; a MessageID can exist in one table but not the other, or may exist in both (which is the purpose of my join); in my test DB, none of the MessageIDs exist in both. In other words, my query will return 2.3 million records with 0 as the count.
Another thing to note is that the 'audit' and 'alerts' tables used to use MessageID as varchar(255). I created the 'messages' table expecting that it would fix the join. It actually made it worse. Previously, it would take 78 seconds; now, it never returns.
What am I missing about MySQL?
Subqueries are very hard for the MySQL engine to optimize. Try:
SELECT
audit.MessageID, COUNT(alerts.ID) AS AlertCount
FROM
audit
LEFT JOIN alerts ON alerts.MessageID = audit.MessageID
GROUP BY audit.MessageID
You're joining to a subquery.
The subquery results are effectively a temporary table - note the <derived2> in the query execution plan. As you can see there, they're not indexed, since they're ephemeral.
You should execute the query as a single unit with a join, rather than joining to the results of a second query.
EDIT: Andrew has posted an answer with one example of how to do your work in a normal join query, instead of in two steps.
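One more hedged variant, not taken from either answer above: a correlated subquery also avoids the unindexed derived table and, like the original query, returns one row per audit row (with 0 instead of NULL when a message has no alerts):
SELECT
audit.MessageID,
(SELECT COUNT(*) FROM alerts WHERE alerts.MessageID = audit.MessageID) AS AlertCount
FROM audit;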