I need to create a composite clustered index such as (username, name, id). Is it possible to implement this? I want to improve the performance of queries like WHERE username = ? AND name = ? by using the clustered index in InnoDB. But I think it won't work, because id is in third place and won't be used.
It's fine to define a clustered index with multiple columns.
CREATE TABLE mytable (
username VARCHAR(64) NOT NULL,
name VARCHAR(64) NOT NULL,
id BIGINT,
PRIMARY KEY (username, name, id)
);
If you query against the first two columns, it will use the clustered index, so it will avoid the overhead of lookups via secondary indexes.
But if you use EXPLAIN to report the optimizer's plan for the query, you'll see that the access is type: ref which means an index lookup, but not a unique index lookup. That is, it will potentially match multiple rows.
mysql> explain select * from mytable where username = 'user' and name = 'name';
+----+-------------+---------+------------+------+---------------+---------+---------+-------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+---------+---------+-------------+------+----------+-------+
| 1 | SIMPLE | mytable | NULL | ref | PRIMARY | PRIMARY | 516 | const,const | 1 | 100.00 | NULL |
+----+-------------+---------+------------+------+---------------+---------+---------+-------------+------+----------+-------+
When doing lookups against a PRIMARY KEY, we'd like to see type: eq_ref or type: const which means it is doing a unique lookup, and the query is guaranteed to match either 0 or 1 row.
mysql> explain select * from mytable where username = 'user' and name = 'name' and id = 1;
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+
| 1 | SIMPLE | mytable | NULL | const | PRIMARY | PRIMARY | 524 | const,const,const | 1 | 100.00 | NULL |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+
Both queries are using the clustered index.
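The key_len values in the two plans can be reproduced with a little arithmetic. This is a sketch that assumes the table uses the utf8mb4 character set (up to 4 bytes per character), which the output above is consistent with. Each VARCHAR(64) contributes 64 * 4 bytes plus 2 length bytes, and the BIGINT contributes 8 bytes:

```sql
SELECT
  2 * (64 * 4 + 2)     AS key_len_username_name,     -- 516, first two columns only
  2 * (64 * 4 + 2) + 8 AS key_len_username_name_id;  -- 524, all three columns
```

So key_len: 516 tells you that only username and name were used for the lookup, while 524 means the id column was used as well.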
Re your comment:
InnoDB requires the auto-increment column be the first column of a key in the table. It doesn't have to be the primary key. So you can do this for example:
CREATE TABLE `mytable` (
`username` varchar(64) NOT NULL,
`name` varchar(64) NOT NULL,
`id` bigint NOT NULL AUTO_INCREMENT,
`x` int DEFAULT NULL,
PRIMARY KEY (`username`,`name`,`id`),
KEY (`id`)
) ENGINE=InnoDB;
Notice I added an extra KEY (id) to satisfy InnoDB's requirement. But in the primary key, id is still at the end.
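As a quick check of that definition, here is a small sketch with made-up values. It only assumes the mytable definition shown above:

```sql
-- id is omitted, so AUTO_INCREMENT generates it.
INSERT INTO mytable (username, name, x) VALUES ('user', 'name', 42);
INSERT INTO mytable (username, name, x) VALUES ('user', 'name', 43);

-- Both rows share (username, name); the generated id keeps the primary key unique.
SELECT username, name, id, x
FROM mytable
WHERE username = 'user' AND name = 'name'
ORDER BY id;
```

Because id is the last column of the primary key, rows with the same username and name are stored next to each other in the clustered index.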
I created the following table:
CREATE TABLE ta
(
id BIGINT NOT NULL auto_increment,
company_id BIGINT NOT NULL,
language VARCHAR(10) NOT NULL,
created_at DATETIME,
modified_at DATETIME,
version BIGINT,
PRIMARY KEY (id),
UNIQUE KEY unique_ta (company_id, language)
) engine = InnoDB;
CREATE INDEX ta_company_id on `ta` (company_id);
My question is whether I need this line:
CREATE INDEX ta_company_id on `ta` (company_id);
Does the UNIQUE KEY automatically create an index on company_id and language?
You probably don't need the extra index on company_id.
The UNIQUE KEY creates an index on the pair of columns (company_id, language) in that order. So any query you would run searching for a specific value of company_id would be able to use that index, even though it only references the first column of the unique key index.
You can see this in EXPLAIN:
mysql> EXPLAIN SELECT * FROM ta WHERE company_id = 1234;
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
| 1 | SIMPLE | ta | NULL | ref | unique_ta | unique_ta | 8 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
You can see key_len: 8 meaning it is using 8 bytes of the index, and the first BIGINT for company_id is 8 bytes.
Whereas searching for both columns will use the full 50-byte size of the index (8 bytes for the BIGINT, plus 10 characters for the VARCHAR at 4 bytes per character in utf8mb4, plus 2 bytes for the VARCHAR length):
mysql> EXPLAIN SELECT * FROM ta WHERE company_id = 1234 AND language = 'EN';
+----+-------------+-------+------------+-------+---------------+-----------+---------+-------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-----------+---------+-------------+------+----------+-------+
| 1 | SIMPLE | ta | NULL | const | unique_ta | unique_ta | 50 | const,const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+-----------+---------+-------------+------+----------+-------+
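The 50-byte figure depends on the character set of the language column. One way to check this on your own server is to query information_schema.COLUMNS. This is a sketch that assumes you are connected to the schema containing ta:

```sql
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_SET_NAME,
       CHARACTER_MAXIMUM_LENGTH, CHARACTER_OCTET_LENGTH
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'ta'
  AND COLUMN_NAME IN ('company_id', 'language');
```

With utf8mb4, language should report a CHARACTER_OCTET_LENGTH of 40 (10 characters * 4 bytes), which gives 8 + 40 + 2 = 50. With a different character set the key_len would differ accordingly.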
I said at the top "probably" because there is an exception case, for a specific form of query:
SELECT * FROM ta WHERE company_id = 1234 ORDER BY id;
This type of query would need id to be the second column of the index, so it could be assured of reading rows in primary key order. In InnoDB, every secondary index implicitly has the primary key column appended, even if you don't declare it. So your unique key index really stores the columns (company_id, language, id), and the single-column index really stores the columns (company_id, id). The latter index would optimize the query I show above, sorting by primary key efficiently.
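To see whether the exception matters for your data, you could compare the two plans side by side. This is only a sketch: it assumes both indexes from the question exist, and the result depends on your data and MySQL version.

```sql
-- Plan when the optimizer is free to use the single-column index on company_id.
EXPLAIN SELECT * FROM ta WHERE company_id = 1234 ORDER BY id;

-- Plan when that index is hidden from the optimizer, leaving unique_ta.
EXPLAIN SELECT * FROM ta IGNORE INDEX (ta_company_id)
WHERE company_id = 1234 ORDER BY id;
```

If the second plan shows Using filesort in the Extra column and the first does not, the extra index is doing useful work for this form of query. If you never run such a query, the index is likely redundant.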
Table (about 21 million rows):
CREATE TABLE `prefix` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`number` int(11) NOT NULL,
`string` varchar(750) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `idx_string_prefix10` (`string`(10)),
KEY `idx_string` (`string`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Selectivity of the 10-character prefix:
select count(distinct(left(string,10)))/count(*) from prefix;
+-------------------------------------------+
| count(distinct(left(string,10)))/count(*) |
+-------------------------------------------+
| 0.9999 |
+-------------------------------------------+
Results:
select sql_no_cache count(*) from prefix force index(idx_string_prefix10)
where string < "1505d28b";
Time: 243.96s, 241.88s
select sql_no_cache count(*) from prefix force index(idx_string)
where string < "1505d28b";
Time: 7.96s, 7.21s, 7.53s
Why is the prefix index so much slower than the full index in MySQL?
explain select sql_no_cache count(*) from prefix force index(idx_string_prefix10)
where string < "1505d28b";
+----+-------------+--------+------------+-------+---------------------+---------------------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------------+---------------------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | prefix | NULL | range | idx_string_prefix10 | idx_string_prefix10 | 42 | NULL | 3489704 | 100.00 | Using where |
+----+-------------+--------+------------+-------+---------------------+---------------------+---------+------+---------+----------+-------------+
When you use a prefix index, MySQL reads each matching index entry and then also has to read the corresponding row. The index holds only the first 10 characters of each value, so MySQL does not rely on it alone to decide whether the value is selected by the WHERE condition. That's two reads per row, and a lot more data to scan.
When you use a non-prefix index, MySQL can read the whole string value from the index, so it knows immediately whether the value is selected by the condition or can be skipped.
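You can see the difference in the Extra column by running EXPLAIN for both variants of the query from the question. This is a sketch; the exact output may vary by MySQL version:

```sql
-- Full index: the whole value is in the index, so the table rows are not needed.
EXPLAIN SELECT COUNT(*) FROM prefix FORCE INDEX (idx_string)
WHERE string < '1505d28b';

-- Prefix index: the index finds candidates, then each row is read to check the condition.
EXPLAIN SELECT COUNT(*) FROM prefix FORCE INDEX (idx_string_prefix10)
WHERE string < '1505d28b';
```

For the full index you would expect Extra to include Using index, meaning the query is answered from the index alone. For the prefix index, Extra shows only Using where, as in the plan above.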
I have two simple tables:
CREATE TABLE cat_urls (
Id int(11) NOT NULL AUTO_INCREMENT,
SIL_Id int(11) NOT NULL,
SiteId int(11) NOT NULL,
AsCatId int(11) DEFAULT NULL,
Href varchar(2048) NOT NULL,
ReferrerHref varchar(2048) NOT NULL DEFAULT '',
AddedOn datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
GroupId int(11) DEFAULT NULL,
PRIMARY KEY (Id),
INDEX SIL (SIL_Id, AsCatId)
);
CREATE TABLE products (
Id int(11) NOT NULL AUTO_INCREMENT,
CatUrlId int(11) NOT NULL,
Href varchar(2048) NOT NULL,
SiteIdentity varchar(2048) NOT NULL,
Price decimal(12, 2) NOT NULL,
IsAvailable bit(1) NOT NULL,
ClientCode varchar(256) NOT NULL,
PRIMARY KEY (Id),
INDEX CatUrl (CatUrlId)
);
And I have pretty simple query:
SELECT cu.Href, COUNT(p.CatUrlId) FROM cat_urls cu
JOIN products p ON p.CatUrlId=cu.Id
WHERE sil_id=4601038
GROUP by cu.Id
EXPLAIN says:
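id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
---+-------------+-------+------+---------------+--------+---------+------------------------+------+----------------------------------------------
1 | SIMPLE | cu | ref | PRIMARY,SIL | SIL | 4 | const | 303 | Using where; Using temporary; Using filesort
1 | SIMPLE | p | ref | CatUrl | CatUrl | 4 | blue_collar_logs.cu.Id | 6 | Using index
Is there any way to get rid of "Using where; Using temporary; Using filesort" and improve the performance of this query?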
It looks like, for some reason, MySQL chooses the index SIL on the first table and uses it for both the lookup (WHERE sil_id = 4601038) and the grouping (GROUP BY cu.Id).
You can tell it to use the PK of the table:
SELECT cu.Href, COUNT(p.CatUrlId) FROM cat_urls cu
USE INDEX FOR JOIN (PRIMARY)
JOIN products p ON p.CatUrlId=cu.Id
WHERE sil_id=4601038
GROUP by cu.Id
and it will produce this execution plan:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
---+-------------+-------+-------+---------------+---------+---------+------------------+------+-------------
1 | SIMPLE | cu | index | PRIMARY | PRIMARY | 4 | NULL | 1 | Using where
1 | SIMPLE | p | ref | CatUrl | CatUrl | 4 | cbs-test-1.cu.Id | 1 | Using index
Ignore the values reported in the rows column; they are not correct because my tables are empty.
Notice that the Extra column now contains only Using where, but also notice that the join type changed from ref (very good) to index (a full index scan, which is not as good).
A better solution is to add an index on column SIL_Id. I know, SIL_Id is a prefix of the index SIL (SIL_Id, AsCatId), and in theory another index on column SIL_Id is completely useless. But it seems to solve the issue in this case.
ALTER TABLE cat_urls
ADD INDEX (SIL_Id)
;
Now use it in the query:
SELECT cu.Href, COUNT(p.CatUrlId) FROM cat_urls cu
USE INDEX FOR JOIN (SIL_Id)
JOIN products p ON p.CatUrlId=cu.Id
WHERE sil_id=4601038
GROUP by cu.Id
The query execution plan looks much better now:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
---+-------------+-------+------+---------------+--------+---------+------------------+------+-------------
1 | SIMPLE | cu | ref | SIL_Id | SIL_Id | 4 | const | 1 | Using where
1 | SIMPLE | p | ref | CatUrl | CatUrl | 4 | cbs-test-1.cu.Id | 1 | Using index
The drawback is that we have an extra index that is (theoretically) useless. It occupies storage space and consumes processor cycles every time a row is added, deleted, or has its SIL_Id field modified.
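One possible explanation, assuming cat_urls is an InnoDB table: a secondary index implicitly ends with the primary key, so the index on (SIL_Id) is effectively (SIL_Id, Id) and returns the rows for one SIL_Id already ordered by Id, which is what GROUP BY cu.Id needs. In SIL (SIL_Id, AsCatId) the same rows are ordered by AsCatId first. The sketch below makes that ordering explicit. It is meant as an alternative to ADD INDEX (SIL_Id), not an addition to it, and the index name SIL_Id_Id is made up for illustration:

```sql
-- Hypothetical index name; declares the primary key column explicitly.
ALTER TABLE cat_urls
ADD INDEX SIL_Id_Id (SIL_Id, Id);

EXPLAIN SELECT cu.Href, COUNT(p.CatUrlId) FROM cat_urls cu
USE INDEX (SIL_Id_Id)
JOIN products p ON p.CatUrlId = cu.Id
WHERE cu.SIL_Id = 4601038
GROUP BY cu.Id;
```

If this reasoning holds for your server, the plan should use ref access on cu without Using temporary or Using filesort. Check the EXPLAIN output on real data before keeping either index.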
I have these small tables, item and category:
CREATE TABLE `item` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(150) NOT NULL,
`category_id` mediumint(8) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `category_id` (`category_id`)
) CHARSET=utf8;
CREATE TABLE `category` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(150) NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`)
) CHARSET=utf8;
I have inserted 100 categories and 1000 items.
If I run this:
EXPLAIN SELECT item.id,category.name AS category_name FROM item JOIN category ON item.category_id=category.id;
Then, if the tables' engine is InnoDB I get:
+----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-------------+
| 1 | SIMPLE | category | index | PRIMARY | name | 452 | NULL | 103 | Using index |
| 1 | SIMPLE | item | ref | category_id | category_id | 3 | dbname.category.id | 5 | Using index |
+----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-------------+
Whereas, if I switch to MyISAM (with alter table engine=myisam) I get:
+----+-------------+----------+--------+---------------+---------+---------+-------------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+---------------+---------+---------+-------------------------+------+-------+
| 1 | SIMPLE | item | ALL | category_id | NULL | NULL | NULL | 1003 | |
| 1 | SIMPLE | category | eq_ref | PRIMARY | PRIMARY | 3 | dbname.item.category_id | 1 | |
+----+-------------+----------+--------+---------------+---------+---------+-------------------------+------+-------+
My question is: why is there this difference in the way indexes are handled?
In InnoDB, any secondary index internally contains the primary key column of the table. So the index name on column (name) is implicitly on columns (name, id).
This means that EXPLAIN shows your access to the category table as an "index-scan" (this is shown in the type column as "index"). By scanning the index, it also has access to the id column, which it uses to look up rows in the second table, item.
Then it also takes advantage of the item index on (category_id) which is really (category_id, id), and it is able to fetch item.id for your select-list simply by reading the index. No need to read the table at all (this is shown in the Extra column as "Using index").
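A small way to observe the implicit primary key column, assuming the category table is InnoDB (the value 'Books' is made up):

```sql
-- name is the only declared column of the index, but id is stored with it in InnoDB.
EXPLAIN SELECT id FROM category WHERE name = 'Books';
```

You would expect Extra to show Using index here, since both the searched column and the selected column are available from the name index without reading the table rows.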
MyISAM doesn't store primary keys with the secondary key in this way, so it can't get the same optimizations. The access to the item table is type "ALL", which means a table-scan.
I would expect the access to the MyISAM table item to be "ref", as it looks up rows using the index on (category_id). But the optimizer may get skewed results if you have very few rows in the table, or if you haven't run ANALYZE TABLE item since creating the index.
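To rule out stale statistics, you could refresh them for both tables and then look at the plan again. This is a sketch using the tables and query from the question:

```sql
ANALYZE TABLE item;
ANALYZE TABLE category;

EXPLAIN SELECT item.id, category.name AS category_name
FROM item JOIN category ON item.category_id = category.id;
```

With only 100 categories and 1000 items, the optimizer may still choose a table-scan even with fresh statistics, because scanning such a small table is cheap.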
Re your update:
It looks like the optimizer prefers an index-scan over a table-scan, so it takes the opportunity to do an index-scan in InnoDB, and puts the category table first. The optimizer decides to re-order the tables instead of using the tables in the order you gave them in your query.
In the MyISAM tables, there will be one table-scan whichever table it chooses to access first, but by putting the category table second, it joins to category's PRIMARY key index instead of item's secondary index. The optimizer prefers lookups to a unique or primary key (type "eq_ref").
I have the following MySQL table (table size - around 10K records):
CREATE TABLE `tmp_index_test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`m_id` int(11) DEFAULT NULL,
`r_id` int(11) DEFAULT NULL,
`price` float DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `m_key` (`m_id`),
KEY `r_key` (`r_id`),
KEY `price_key` (`price`)
) ENGINE=InnoDB AUTO_INCREMENT=16390 DEFAULT CHARSET=utf8;
As you can see, I have two INTEGER fields (r_id and m_id) and one FLOAT field (price).
For each of these fields I have an index.
Now, when I run a query with condition on the first integer AND on the second one, everything is fine:
mysql> explain select * from tmp_index_test where m_id=1 and r_id=2;
+----+-------------+----------------+-------------+---------------+-------------+---------+------+------+-------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+-------------+---------------+-------------+---------+------+------+-------------------------------------------+
| 1 | SIMPLE | tmp_index_test | index_merge | m_key,r_key | r_key,m_key | 5,5 | NULL | 1 | Using intersect(r_key,m_key); Using where |
+----+-------------+----------------+-------------+---------------+-------------+---------+------+------+-------------------------------------------+
Seems like MySQL performs it very well since there is the Using intersect(r_key,m_key) in the Extra field.
I'm not a MySQL expert, but according to what I understand, MySQL is first making the intersection on indexes, and only then collects the result of the intersection from the table itself.
HOWEVER, when I run a very similar query, but with a condition on an integer and a float instead of on two integers, MySQL refuses to intersect the indexes:
mysql> explain select * from tmp_index_test where m_id=3 and price=100;
+----+-------------+----------------+------+-----------------+-----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+------+-----------------+-----------+---------+-------+------+-------------+
| 1 | SIMPLE | tmp_index_test | ref | m_key,price_key | price_key | 5 | const | 1 | Using where |
+----+-------------+----------------+------+-----------------+-----------+---------+-------+------+-------------+
As you can see, MySQL decides to use the index of price only.
My first question is why, and how to fix it?
In addition, I need to run queries with the greater-than sign (>) instead of the equal sign (=) on price. Currently EXPLAIN shows that for such queries, MySQL uses the integer key only.
mysql> explain select * from tmp_index_test where m_id=3 and price > 100;
+----+-------------+----------------+------+-----------------+-------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+------+-----------------+-------+---------+-------+------+-------------+
| 1 | SIMPLE | tmp_index_test | ref | m_key,price_key | m_key | 5 | const | 2 | Using where |
+----+-------------+----------------+------+-----------------+-------+---------+-------+------+-------------+
I need to somehow make MySQL do the intersection on the indexes first. Does anybody have any idea how?
Thanks a lot in advance!
From the MySQL manual:
ref is used if the join uses only a leftmost prefix of the key or if
the key is not a PRIMARY KEY or UNIQUE index (in other words, if the
join cannot select a single row based on the key value). If the key
that is used matches only a few rows, this is a good join type.
price is not unique or primary, so ref is chosen. I don't believe you can force an intersect.
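A common alternative to relying on index merge, which is not part of the manual excerpt above, is a single composite index that covers both conditions. This is a sketch; the index name m_price_key is made up for illustration, and whether the optimizer picks it depends on your data:

```sql
-- Equality column first, range column last, so both = and > on price can use it.
ALTER TABLE tmp_index_test
ADD INDEX m_price_key (m_id, price);

EXPLAIN SELECT * FROM tmp_index_test WHERE m_id = 3 AND price = 100;
EXPLAIN SELECT * FROM tmp_index_test WHERE m_id = 3 AND price > 100;
```

With this index, the first query can be a ref lookup on both columns, and the second can be a range scan within the rows for m_id = 3, so no intersection of separate indexes is needed.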